AI News, More from r/MachineLearning
- On Tuesday, June 5, 2018
Currently, I think that recurrent neural nets, specifically LSTMs, offer a lot more hope than when I made that comment.
To raise the log probability of a particular translation you just have to backpropagate the derivatives of the log probabilities of the individual picks through the combination of encoder and decoder.
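Written out, the idea looks roughly like this. The notation is an assumption introduced here, not part of the original comment: x is the source sentence and y_1, ..., y_T are the output words the decoder picks one at a time.

```latex
\log p(y_1, \dots, y_T \mid x) \;=\; \sum_{t=1}^{T} \log p\bigl(y_t \mid y_1, \dots, y_{t-1}, x\bigr)
```

Because the total is a sum of per-pick terms, its gradient with respect to the network weights is the sum of the per-pick gradients, which is what gets backpropagated through the decoder and then the encoder.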
The amazing thing is that when an encoder and decoder net are trained on a fairly big set of translated pairs (WMT'14), the quality of the translations beats the former state-of-the-art for systems trained with the same amount of data.
Given what happened in 2009 when acoustic models that used deep neural nets matched the state-of-the-art acoustic models that used Gaussian mixtures, I think the writing is clearly on the wall for phrase-based translation.
One nice aspect of this approach is that it should learn to represent thoughts in a language-independent way and it will be able to translate between pairs of foreign languages without having to go via English.
How to Implement a Beam Search Decoder for Natural Language Processing
Natural language processing tasks, such as caption generation and machine translation, involve generating sequences of words.
Models developed for these problems often operate by generating probability distributions across the vocabulary of output words and it is up to decoding algorithms to sample the probability distributions to generate the most likely sequences of words.
In natural language processing tasks such as caption generation, text summarization, and machine translation, the prediction required is a sequence of words.
It is common for models developed for these types of problems to output a probability distribution over each word in the vocabulary for each word in the output sequence.
You are likely to encounter this when working with recurrent neural networks on natural language processing tasks where text is generated as an output.
The final layer in the neural network model has one neuron for each word in the output vocabulary and a softmax activation function is used to output a likelihood of each word in the vocabulary being the next word in the sequence.
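As a rough illustration of that final layer, here is a minimal softmax sketch in plain Python. The three-word vocabulary and the raw scores are made-up values for the example, not taken from any real model.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max first so exp() cannot overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

vocab = ["the", "cat", "sat"]   # toy vocabulary (hypothetical)
logits = [2.0, 1.0, 0.1]        # hypothetical raw outputs, one per vocabulary word

probs = softmax(logits)
next_word = vocab[probs.index(max(probs))]
print(next_word, [round(p, 3) for p in probs])
```

Each entry of `probs` is the model's estimated likelihood that the corresponding vocabulary word comes next.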
Each individual prediction has an associated score (or probability) and we are interested in output sequence with maximal score (or maximal probability) […] One popular approximate technique is using greedy prediction, taking the highest scoring item at each stage.
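Greedy prediction can be sketched in a few lines. This example assumes, as a simplification, that the per-step probabilities are already available as a fixed table (rows are time steps, columns are vocabulary indices); in a real model each row would depend on the words chosen so far.

```python
def greedy_decode(step_probs):
    """Pick the index of the highest-probability word at every time step."""
    return [max(range(len(row)), key=row.__getitem__) for row in step_probs]

# Hypothetical probabilities: 3 time steps over a 3-word vocabulary
step_probs = [
    [0.1, 0.2, 0.7],
    [0.5, 0.4, 0.1],
    [0.3, 0.6, 0.1],
]

print(greedy_decode(step_probs))  # [2, 0, 1]
```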
Instead of greedily choosing the most likely next step as the sequence is constructed, the beam search expands all possible next steps and keeps the k most likely, where k is a user-specified parameter and controls the number of beams or parallel searches through the sequence of probabilities.
Common beam width values are 1 for a greedy search and values of 5 or 10 for common benchmark problems in machine translation.
Larger beam widths often improve the performance of a model, because keeping multiple candidate sequences increases the likelihood of better matching a target sequence, but they also slow down decoding.
In NMT, new sentences are translated by a simple beam search decoder that finds a translation that approximately maximizes the conditional probability of a trained NMT model.
The beam search strategy generates the translation word by word from left-to-right while keeping a fixed number (beam) of active candidates at each time step.
The search process can halt for each candidate separately either by reaching a maximum length, by reaching an end-of-sequence token, or by reaching a threshold likelihood.
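The procedure described above can be sketched as follows. This is a simplified illustration, not a production decoder: it assumes a fixed table of per-step probabilities (rows are time steps, columns are vocabulary indices), scores candidates by summed log probabilities, and stops only when the table runs out, so end-of-sequence tokens and likelihood thresholds are left out. The function name and data are hypothetical.

```python
import math

def beam_search_decode(step_probs, k):
    """Return the k best (sequence, log_prob) pairs, best first."""
    beams = [([], 0.0)]  # each beam: (word indices so far, summed log probability)
    for row in step_probs:
        candidates = []
        for seq, score in beams:
            for idx, p in enumerate(row):
                if p > 0.0:  # skip impossible words; log(0) is undefined
                    candidates.append((seq + [idx], score + math.log(p)))
        # keep only the k highest-scoring candidates for the next step
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:k]
    return beams

# Hypothetical probabilities: 3 time steps over a 3-word vocabulary
step_probs = [
    [0.1, 0.2, 0.7],
    [0.5, 0.4, 0.1],
    [0.3, 0.6, 0.1],
]

for seq, score in beam_search_decode(step_probs, k=3):
    print(seq, round(math.exp(score), 3))
```

With `k=1` this reduces to greedy search. Because the toy table does not condition on earlier choices, the best beam here matches the greedy result; with a real model the two can differ.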
To avoid underflowing the floating point numbers, the natural logarithms of the probabilities are added together (which is equivalent to multiplying the probabilities themselves), keeping the scores in a manageable range.
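A small experiment shows why log space helps. Working with logarithms turns a product of probabilities into a sum of log probabilities. The sequence length and per-word probability below are made-up values chosen to force the underflow.

```python
import math

probs = [1e-5] * 100  # hypothetical per-word probabilities for a 100-word sequence

# Multiplying raw probabilities: the true value (1e-500) is too small for a float
product = 1.0
for p in probs:
    product *= p

# Summing log probabilities: the same quantity in log space stays representable
log_score = sum(math.log(p) for p in probs)

print(product)    # 0.0
print(log_score)  # roughly -1151.29
```

Since the logarithm is monotonic, ranking candidates by summed log probability gives the same order as ranking them by the product of probabilities.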
Machine Learning is Fun Part 5: Language Translation with Deep Learning and the Magic of Sequences
Some of the smartest linguists in the world labored for years during the Cold War to create translation systems as a way to interpret Russian communications more easily.
The way we speak English is more influenced by who invaded whom hundreds of years ago than it is by someone sitting down and defining grammar rules.
After the failure of rule-based systems, new translation approaches were developed using models based on probability and statistics instead of grammar rules.
In the same way that the Rosetta Stone was used by scientists in the 1800s to figure out Egyptian hieroglyphs from Greek, computers can use parallel corpora to guess how to convert text from one language to another.
Here’s how it works. First, we break up our sentence into simple chunks that can each be easily translated. Next, we translate each of these chunks by finding all the ways humans have translated those same chunks of words in our training data.
For example, it’s much more common for someone to say “Quiero” to mean “I want” than to mean “I try.” So we can use how frequently “Quiero” was translated to “I want” in our training data to give that translation more weight than a less frequent translation.
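The weighting idea can be sketched with simple counting. The aligned chunk pairs below are invented for illustration; a real system would extract them from a large parallel corpus.

```python
from collections import Counter

# Hypothetical (source chunk, human translation) pairs seen in training data
observed = [
    ("Quiero", "I want"),
    ("Quiero", "I want"),
    ("Quiero", "I want"),
    ("Quiero", "I try"),
    ("ir", "to go"),
]

def translation_weights(pairs, source_chunk):
    """Relative frequency of each translation seen for source_chunk."""
    counts = Counter(target for source, target in pairs if source == source_chunk)
    total = sum(counts.values())
    return {target: count / total for target, count in counts.items()}

print(translation_weights(observed, "Quiero"))  # {'I want': 0.75, 'I try': 0.25}
```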
Just from the chunk translations we listed in Step 2, we can already generate nearly 2,500 different variations of our sentence by combining the chunks in different ways.
But in a real-world system, there will be even more possible chunk combinations because we’ll also try different orderings of words and different ways of chunking the sentence. Now we need to scan through all of these generated sentences to find the one that sounds the “most human.” To do this, we compare each generated sentence to millions of real sentences from books and news stories written in English.
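A toy version of the combine-and-rank step might look like this. Everything here is an assumption made for the sketch: the chunk options, the three-sentence "corpus", and the scoring rule (the fraction of a candidate's word pairs that appear in the corpus), which stands in for the much larger statistical language models real systems use.

```python
from itertools import product
from collections import Counter

# Hypothetical translation options for each chunk of the source sentence
chunk_options = [
    ["I want", "I try"],
    ["to go", "to leave"],
    ["to the beach", "to the most pretty beach", "to the prettiest beach"],
]

# Every way of picking one option per chunk: 2 * 2 * 3 = 12 candidates
candidates = [" ".join(parts) for parts in product(*chunk_options)]

# Tiny stand-in for "millions of real sentences"
corpus = [
    "I want to go home",
    "we want to go to the prettiest beach",
    "I want to see the sea",
]

def bigrams(sentence):
    """Adjacent word pairs of a sentence, lowercased."""
    words = sentence.lower().split()
    return list(zip(words, words[1:]))

bigram_counts = Counter(b for s in corpus for b in bigrams(s))

def fluency(sentence):
    """Fraction of the sentence's word pairs that were seen in the corpus."""
    pairs = bigrams(sentence)
    return sum(1 for b in pairs if b in bigram_counts) / len(pairs)

best = max(candidates, key=fluency)
print(len(candidates), best)
```

The number of candidates is the product of the option counts, which is why it grows so quickly once more chunks, orderings, and chunkings are considered.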
In the early days, it was surprising to everyone that the “dumb” approach to translating based on probability worked better than rule-based systems designed by linguists.
If you are asking Google to translate Georgian to Telugu, it has to internally translate it into English as an intermediate step because there are not enough Georgian-to-Telugu translations happening to justify investing heavily in that language pair.
The holy grail of machine translation is a black box system that learns how to translate by itself, just by looking at training data.
A regular (non-recurrent) neural network is a generic machine learning algorithm that takes in a list of numbers and calculates a result (based on previous training).
For example, we can use a neural network to calculate the approximate value of a house based on attributes of that house. But like most machine learning algorithms, neural networks are stateless.
A recurrent neural network (or RNN for short) is a slightly tweaked version of a neural network where the previous state of the neural network is one of the inputs to the next calculation.
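To make the "previous state as input" idea concrete, here is a deliberately tiny sketch with a single-number state. The weights are arbitrary, untrained values picked for the example; a real RNN uses vectors and learned weight matrices.

```python
import math

def rnn_step(x, h_prev, w_x=0.5, w_h=0.8, b=0.0):
    """One recurrent step: the new state depends on the input AND the previous state."""
    return math.tanh(w_x * x + w_h * h_prev + b)

def run_rnn(inputs, h0=0.0):
    """Feed inputs one at a time, carrying the state forward."""
    h = h0
    states = []
    for x in inputs:
        h = rnn_step(x, h)
        states.append(h)
    return states

a = run_rnn([1.0, 0.0])
b = run_rnn([0.0, 0.0])

# Both runs see the same second input (0.0), yet end in different states,
# because the first run remembers having seen 1.0 earlier.
print(a[1], b[1])
```

A stateless network would produce the same output for the same input every time; here the output for the second input depends on what came before it.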
You’re probably already familiar with this idea from watching any primetime detective show like CSI. The idea of turning a face into a list of measurements is an example of an encoding.
We can come up with an encoding that represents every possible different sentence as a series of unique numbers. To generate this encoding, we’ll feed the sentence into the RNN, one word at a time.
The final result after the last word is processed will be the values that represent the entire sentence. Great, so now we have a way to represent an entire sentence as a set of unique numbers!
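The encoding step can be sketched like this. The word vectors and weight matrices are small hand-picked values, not trained ones, and the two-number state is far smaller than anything used in practice, so the output only illustrates the mechanics.

```python
import math

# Hypothetical 2-number vectors for a tiny vocabulary
EMBED = {
    "machine": [1.0, 0.0],
    "learning": [0.0, 1.0],
    "is": [0.5, 0.5],
    "fun": [-1.0, 0.5],
}
W_X = [[0.6, -0.2], [0.1, 0.7]]   # input-to-state weights (arbitrary, untrained)
W_H = [[0.5, 0.3], [-0.4, 0.5]]   # state-to-state weights (arbitrary, untrained)

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def encode(sentence):
    """Feed the sentence through the RNN one word at a time; return the final state."""
    h = [0.0, 0.0]
    for word in sentence.lower().split():
        x = EMBED[word]
        pre = [a + b for a, b in zip(matvec(W_X, x), matvec(W_H, h))]
        h = [math.tanh(p) for p in pre]
    return h

print(encode("machine learning is fun"))
print(encode("learning machine"), encode("machine learning"))
```

Because each step mixes the new word with the running state, the same words in a different order produce a different final encoding.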
- On Tuesday, January 15, 2019
How to Make a Text Summarizer - Intro to Deep Learning #10
I'll show you how you can turn an article into a one-sentence ..
How to Make a Language Translator - Intro to Deep Learning #11
Let's build our own language translator using Tensorflow
Lecture 10: Neural Machine Translation and Models with Attention
Lecture 10 introduces translation, machine translation, and neural machine translation. Google's new NMT is highlighted followed by sequence models with ...
Cryptography: The Science of Making and Breaking Codes
There are lots of different ways to encrypt a message, from early, simple ciphers to the famous Enigma machine. But it's tough to make a code truly unbreakable.
Secrets Hidden in Images (Steganography) - Computerphile
Secret texts buried in a picture of your dog? Image Analyst Dr. Mike Pound explains the art of steganography in digital images. The Problem with JPEG: ...
Hex Editor for Windows - Edit Binary Files easily - Synalysis
A hex editor for Windows, also called ..
Lecture 9: Machine Translation and Advanced Recurrent LSTMs and GRUs
Lecture 9 recaps the most important concepts and equations covered so far followed by machine translation and fancy RNN models tackling MT. Key phrases: ...
Sequence Models and the RNN API (TensorFlow Dev Summit 2017)
In this talk, Eugene Brevdo discusses the creation of flexible and high-performance sequence-to-sequence models. He covers reading and batching sequence ...
How To Use the Translation Features of Microsoft Word
Do you have a Microsoft Word document that you want translated into another language? This quick tutorial walks you through all of the translation features of ...
Automatic Speech Recognition - An Overview
An overview of how Automatic Speech Recognition systems work and some of the challenges. See more on this video at ...