AI News, Machine Learning algorithms: Working with text data
- On Sunday, June 3, 2018
JAXenter: What is the difference between image and text from a machine’s point of view?
Christoph Henkelmann: Almost all ML methods, especially neural networks, want tensors (multidimensional arrays of numbers) as input.
In the case of an image, the transformation is obvious: we already have a three-dimensional array of pixels (width x height x color channel).
Text and words exist at a higher level of meaning. For example, if you simply feed Unicode-encoded letters into the network as numbers, the jump from encoding to semantics is too “high”.
... after all, Unicode is intended to finally solve all the encoding problems from the early days of word processing.
If you use standard methods of some programming languages to split text from different sources, you suddenly wonder why words still stick together.
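The splitting problem is easy to reproduce. The following Python sketch is an illustration, not code from the interview; the sample string and the helper name `split_words` are made up for this example. It assumes text that contains a no-break space (U+00A0), a thin space (U+2009) and a zero-width space (U+200B), as can happen when text is copied from web pages or word processors.

```python
# Sample text with three "invisible" separators besides the ordinary space:
# U+00A0 no-break space, U+2009 thin space, U+200B zero-width space.
text = "machine\u00a0learning with\u2009text\u200bdata"

# Splitting on the ASCII space only: U+00A0 and U+2009 are not separators,
# so most words stay glued together.
naive = text.split(" ")

# str.split() without arguments splits on any character Python considers
# whitespace. That covers U+00A0 and U+2009, but not U+200B, which is a
# format character rather than whitespace.
unicode_aware = text.split()


def split_words(raw):
    """Hypothetical helper: treat the zero-width space as a separator too."""
    return raw.replace("\u200b", " ").split()


print(len(naive))          # 2
print(len(unicode_aware))  # 4
print(split_words(text))   # ['machine', 'learning', 'with', 'text', 'data']
```

Which characters should count as separators depends on the source of the text, so the replacement list in `split_words` is a starting point rather than a complete rule.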
... basically the same as with a text file, through methods where individual words are encoded as the smallest unit, to methods where a tensor is generated from an entire document, which is actually more of a “fingerprint”.
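The three levels of granularity mentioned here can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the method from the interview: the function names, the whitespace tokenization, the lowercase normalization and the use of simple word counts as the document "fingerprint" are all choices made for this example.

```python
from collections import Counter


def char_level(text):
    """Character level: one integer (Unicode code point) per character."""
    return [ord(ch) for ch in text]


def build_vocab(docs):
    """Assign each distinct lowercase word an id in order of first appearance."""
    vocab = {}
    for doc in docs:
        for word in doc.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab


def word_level(text, vocab):
    """Word level: one id per word; unknown words map to -1."""
    return [vocab.get(word, -1) for word in text.lower().split()]


def doc_level(text, vocab):
    """Document level: a fixed-length vector of word counts (bag of words)."""
    counts = Counter(text.lower().split())
    vector = [0] * len(vocab)
    for word, idx in vocab.items():
        vector[idx] = counts[word]
    return vector


docs = ["the cat sat", "the dog sat on the mat"]
vocab = build_vocab(docs)

print(char_level("cat"))                 # [99, 97, 116]
print(word_level("the dog sat", vocab))  # [0, 3, 2]
print(doc_level(docs[1], vocab))         # [2, 0, 1, 1, 1, 1]
```

The character and word encodings grow with the length of the input and keep its order, while the document vector always has one entry per vocabulary word and discards word order, which is why it behaves more like a fingerprint than a transcription.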
Christoph Henkelmann: Exactly. Much more than with images or audio, the preprocessing of text affects the semantic level at which the method operates.
Sometimes preprocessing is itself already a kind of machine learning, so that we can answer some questions simply because we have encoded the text differently.
- On Monday, March 25, 2019
Rob Speer of Luminoso Discusses Natural Language Processing
Rob Speer is the chief scientist at Luminoso. He is an alumnus of the MIT Media Lab, where he worked on the ConceptNet project, an open, multilingual ...
Manuel Ebert - Putting 1 million new words into the dictionary - PyCon 2016
Speaker: Manuel Ebert. 2015 was the year of spocking, amabots, dadbuds, and smol. Like half of all English words used every day, these words are not in the ...