AI News, Machine Learning algorithms: Working with text data

JAXenter: What is the difference between image and text from a machine’s point of view?

Christoph Henkelmann: Almost all ML methods, especially neural networks, want tensors (multidimensional arrays of numbers) as input.

In the case of an image the transformation is obvious: we already have a three-dimensional array of pixels (width x height x color channel).
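To make the image case concrete, here is a minimal sketch in plain Python. It assumes a tiny made-up RGB image, and the nested-list layout and the `shape` helper are illustrative only. Real libraries often order the axes differently, for example height before width.

```python
# A made-up RGB image, 3 pixels wide and 2 pixels high, indexed as
# image[x][y][channel] to follow the width x height x color channel
# order used in the text. The pixel values are arbitrary.
WIDTH, HEIGHT, CHANNELS = 3, 2, 3

image = [
    [[(x * 40 + y * 20 + c * 10) % 256 for c in range(CHANNELS)]
     for y in range(HEIGHT)]
    for x in range(WIDTH)
]


def shape(tensor):
    """Return the dimensions of a regular nested list (hypothetical helper)."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]
    return tuple(dims)


print(shape(image))  # (3, 2, 3)
```

The image needs no further encoding step: the pixel values are already the numbers a network consumes.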

Text and words exist at a higher level of meaning. For example, if you simply feed Unicode-encoded letters into the network as numbers, the jump from encoding to semantics is too “high”.
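A small sketch of why raw code points say little about meaning. The example words and the `distance` measure are arbitrary choices made for illustration, not something taken from the interview.

```python
def code_points(word):
    """Encode a word as the list of its Unicode code points."""
    return [ord(ch) for ch in word]


def distance(a, b):
    """Sum of absolute differences between two equally long encodings
    (a deliberately naive measure, used only for this illustration)."""
    return sum(abs(x - y) for x, y in zip(a, b))


cat = code_points("cat")  # [99, 97, 116]
car = code_points("car")  # [99, 97, 114]
dog = code_points("dog")  # [100, 111, 103]

print(distance(cat, car))  # 2
print(distance(cat, dog))  # 28
```

Numerically, "cat" sits right next to "car" and far away from "dog", although most readers would group the words the other way round. A network fed with these numbers has to learn that mismatch on its own.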

After all, Unicode is intended to finally solve all the encoding problems from the early days of word processing.

If you use standard methods of some programming languages to split text from different sources, you suddenly wonder why words still stick together.
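One way this can happen, sketched in Python: text copied from web pages or word processors often contains whitespace characters other than the plain ASCII space. The sample string is made up for illustration. It contains a no-break space (U+00A0) and a thin space (U+2009).

```python
import re

# Made-up sample: "machine" and "learning" are joined by a no-break space,
# "with" and "text" by a thin space. Only one ASCII space is present.
text = "machine\u00a0learning with\u2009text"

# Naive approach: split on the ASCII space only.
naive = text.split(" ")

# Unicode-aware approach: for str patterns, \s matches any Unicode whitespace.
robust = re.split(r"\s+", text)

print(naive)   # ['machine\xa0learning', 'with\u2009text']
print(robust)  # ['machine', 'learning', 'with', 'text']
```

Calling `text.split()` without an argument also splits on Unicode whitespace in Python. Characters that are not classified as whitespace at all, such as the zero-width space (U+200B), still need separate handling.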

The approaches range from encoding individual characters as numbers, basically the same as with a text file, through methods where individual words are encoded as the smallest unit, to methods where a tensor is generated from an entire document, which is actually more of a “fingerprint”.
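The word-level and whole-document ends of this range can be sketched as follows. This is a minimal illustration under stated assumptions: the two toy documents, the integer word ids and the bag-of-words counts used as the "fingerprint" are choices made here, not the specific methods the interviewee has in mind.

```python
from collections import Counter

# Two made-up toy documents.
docs = ["the cat sat", "the dog sat on the cat"]

# Word level: every distinct word gets an integer id,
# assigned in order of first appearance.
vocab = {}
for doc in docs:
    for word in doc.split():
        vocab.setdefault(word, len(vocab))


def encode_words(doc):
    """Encode a document as a sequence of word ids (order is preserved)."""
    return [vocab[word] for word in doc.split()]


def fingerprint(doc):
    """Encode a whole document as one fixed-length vector of word counts
    (a bag of words; word order is lost)."""
    counts = Counter(doc.split())
    return [counts[word] for word in vocab]


print(vocab)                   # {'the': 0, 'cat': 1, 'sat': 2, 'dog': 3, 'on': 4}
print(encode_words(docs[1]))   # [0, 3, 2, 4, 0, 1]
print(fingerprint(docs[0]))    # [1, 1, 1, 0, 0]
print(fingerprint(docs[1]))    # [2, 1, 1, 1, 1]
```

The word-level encoding grows with the length of the document, while the fingerprint always has one entry per vocabulary word, which is why it summarizes a document rather than reproducing it.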

Christoph Henkelmann: Exactly. Much more than with images or audio, the preprocessing of text affects the semantic level at which the method operates.

Sometimes preprocessing is itself already a kind of machine learning, so that we can answer some questions simply because we have encoded the text differently.
