Reading a neural network’s mind

Neural networks, which learn to perform computational tasks by analyzing huge sets of training data, have been responsible for the most impressive recent advances in artificial intelligence, including speech-recognition and automatic-translation systems.

In several recent papers, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and the Qatar Computing Research Institute have used a recently developed interpretive technique, which had been applied in other areas, to analyze neural networks trained to do machine translation and speech recognition.

The MIT and QCRI researchers’ technique consists of taking a trained network and using the output of each of its layers, in response to individual training examples, to train another neural network to perform a particular task. How well that second network performs the task indicates how well the layer’s output encodes the information the task requires.
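In the interpretability literature this second network is often called a probe or diagnostic classifier. Below is a minimal, hypothetical sketch of the idea in PyTorch; the toy architecture, the phone-labeling task, and every name in it are illustrative assumptions, not the researchers’ code.

```python
# Hypothetical probing ("diagnostic classifier") sketch in PyTorch.
import torch
import torch.nn as nn

# Stand-in for a trained network under analysis; its weights stay frozen.
trained_net = nn.Sequential(
    nn.Linear(40, 128), nn.ReLU(),   # layer 0
    nn.Linear(128, 128), nn.ReLU(),  # layer 1
    nn.Linear(128, 30),              # output layer
)
for p in trained_net.parameters():
    p.requires_grad = False          # the analyzed network is never updated

def layer_output(x, upto):
    """Run x through the first `upto` modules and return the activations."""
    for module in list(trained_net)[:upto]:
        x = module(x)
    return x

# Probe: a small classifier trained on one layer's activations to predict
# a property of interest (e.g. the phone label of a speech frame).
probe = nn.Linear(128, 50)           # 50 hypothetical phone classes
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Toy tensors standing in for (input frame, phone label) pairs.
x = torch.randn(256, 40)
y = torch.randint(0, 50, (256,))

for _ in range(100):
    feats = layer_output(x, upto=2)  # activations after layer 0 + ReLU
    loss = loss_fn(probe(feats), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
# The probe's held-out accuracy measures how much phone information
# that layer's representation exposes.
```

Comparing the probe’s accuracy when it is attached to different layers is what lets researchers say that one layer “knows” more about a property than another.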

The “t” sounds in the words “tea,” “tree,” and “but,” for instance, might be classified as separate phones (distinct speech sounds), but a speech recognition system has to transcribe all of them using the letter “t.” And indeed, Yonatan Belinkov and James Glass found that lower levels of the network were better at recognizing phones than higher levels, where, presumably, the distinction is less important.
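Measuring that layer-by-layer difference amounts to attaching one probe per depth and comparing accuracies. A hypothetical continuation of the sketch above (it reuses the same toy model, data, and `layer_output` helper):

```python
# One probe per depth; comparing their accuracies is the layer-by-layer
# measurement. On real speech data, phone-probe accuracy is the quantity
# expected to fall at higher layers; these toy tensors carry no signal.
def probe_accuracy(upto, dim, n_classes=50, steps=200):
    probe = nn.Linear(dim, n_classes)
    opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
    feats = layer_output(x, upto)            # frozen-network activations
    for _ in range(steps):
        loss = nn.functional.cross_entropy(probe(feats), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (probe(feats).argmax(dim=1) == y).float().mean().item()

for depth, dim in ((2, 128), (4, 128)):      # after layer 0, after layer 1
    print(f"depth {depth}: probe accuracy {probe_accuracy(depth, dim):.2f}")
```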

Similarly, in an earlier paper, presented last summer at the Annual Meeting of the Association for Computational Linguistics, Glass, Belinkov, and their QCRI colleagues showed that the lower levels of a machine-translation network were particularly good at recognizing parts of speech and morphology — features such as tense, number, and conjugation.

As Belinkov explains, a part-of-speech tagger will recognize that “herself” is a pronoun, but the meaning of that pronoun — its semantic sense — is very different in the sentences “she bought the book herself” and “she herself bought the book.” A semantic tagger would assign different tags to those two instances of “herself,” just as a machine translation system might find different translations for them in a given target language.
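For illustration, an off-the-shelf part-of-speech tagger really does give both occurrences the same tag. The snippet below uses NLTK, purely as an assumption; it is not the tool used in the papers.

```python
# Both occurrences of "herself" receive the same part-of-speech tag,
# PRP (pronoun), even though their semantic roles differ.
import nltk

# Tagger resource names vary across NLTK versions; fetch both quietly.
nltk.download("averaged_perceptron_tagger", quiet=True)
nltk.download("averaged_perceptron_tagger_eng", quiet=True)

for sentence in ("She bought the book herself",
                 "She herself bought the book"):
    tags = nltk.pos_tag(sentence.split())
    print([tag for word, tag in tags if word == "herself"], "<-", sentence)
```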

In such machine-translation systems, the input, in the source language, passes through several layers of the network — known as the encoder — to produce a vector, a string of numbers that somehow represents the semantic content of the input.
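A minimal sketch of such an encoder, assuming a GRU-based sequence model (the papers’ actual systems differ in detail):

```python
# Hypothetical encoder sketch: source-language token IDs in,
# one summary vector out. All architecture details are assumptions.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size=10_000, emb_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden_dim, num_layers=2, batch_first=True)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer IDs for source-language words
        states, last_hidden = self.rnn(self.embed(token_ids))
        # last_hidden[-1]: one vector per sentence, the "string of numbers"
        # that a decoder would unpack into the target language.
        return last_hidden[-1]

enc = Encoder()
src = torch.randint(0, 10_000, (1, 7))   # a toy 7-token source sentence
print(enc(src).shape)                     # torch.Size([1, 512])
```

A decoder network then expands this vector into a sentence in the target language; probing the encoder’s intermediate layers is what reveals where part-of-speech and semantic information lives.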

arXiv:1706.02480 [stat.ML]

Forward Thinking: Building and Training Neural Networks One Layer at a Time

Chris Hettinger, Tanner Christensen, Ben Ehlert, Jeffrey Humpherys, Tyler Jarvis, Sean Wade (submitted 8 Jun 2017)

We present a general framework for training deep neural networks without backpropagation.

This substantially decreases training time and also allows for the construction of deep networks from many sorts of learners, including networks whose layers are defined by functions that are not easily differentiable, such as decision trees.

In this scheme, each layer is trained in turn, together with a temporary output layer, on the data as transformed by the layers already trained; once trained, the layer is frozen and the temporary output layer discarded. The process is repeated, transforming the data through multiple layers, one at a time, rendering a new data set, which is expected to be better behaved, and on which a final output layer can achieve good performance.
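A minimal sketch of this layer-wise scheme in PyTorch; the layer sizes, the optimizer, and the exact form of the temporary head are assumptions for illustration, not details from the paper.

```python
# Hypothetical "one layer at a time" training sketch (not the paper's code).
# Each stage: train (new hidden layer + temporary output head), then freeze
# the hidden layer, discard the head, and feed the transformed data forward.
import torch
import torch.nn as nn

def train_stage(inputs, labels, in_dim, hidden_dim, n_classes, steps=200):
    layer = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
    head = nn.Linear(hidden_dim, n_classes)        # temporary output layer
    opt = torch.optim.Adam([*layer.parameters(), *head.parameters()], lr=1e-3)
    for _ in range(steps):
        loss = nn.functional.cross_entropy(head(layer(inputs)), labels)
        opt.zero_grad()
        loss.backward()
        opt.step()
    for p in layer.parameters():
        p.requires_grad = False                    # freeze; head is discarded
    return layer, layer(inputs).detach()           # the new, transformed data

# Toy data standing in for a real classification problem.
x, y = torch.randn(512, 20), torch.randint(0, 4, (512,))

frozen_layers = []
data, dim = x, 20
for hidden in (64, 64):                            # two stacked stages
    layer, data = train_stage(data, y, dim, hidden, n_classes=4)
    frozen_layers.append(layer)
    dim = hidden

# Final output layer trained on the last transformed data set.
final = nn.Linear(dim, 4)
opt = torch.optim.Adam(final.parameters(), lr=1e-3)
for _ in range(200):
    loss = nn.functional.cross_entropy(final(data), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because later stages never backpropagate into frozen layers, any learner that maps inputs to features could stand in for a layer, which is how the framework accommodates non-differentiable components such as decision trees.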