AI News, BOOK REVIEW: PetrochukM/PyTorch-NLP


PyTorch-NLP, or torchnlp for short, is a library of neural network layers, text processing modules and datasets designed to accelerate Natural Language Processing (NLP) research.

Add PyTorch-NLP to your project by following one of the common use cases: load a dataset such as IMDB, apply a layer from the neural network package such as a Simple Recurrent Unit (SRU), or tokenize and encode text as a tensor.

Read our contributing guide to learn about our development process, how to propose bugfixes and improvements, and how to build and test your changes to PyTorch-NLP.

From an architecture standpoint, torchtext is object-oriented with external coupling while PyTorch-NLP is object-oriented with low coupling.

Nvidia developer blog

But many linguists think that language is best understood as a hierarchical tree of phrases, so a significant amount of research has gone into deep learning models known as recursive neural networks that take this structure into account.

PyTorch, the work of developers at Facebook AI Research and several other labs, combines the efficient and flexible GPU-accelerated backend libraries from Torch7 with an intuitive Python frontend that focuses on rapid prototyping, readable code, and support for the widest possible variety of deep learning models.

This post walks through the PyTorch implementation of a recursive neural network with a recurrent tracker and TreeLSTM nodes, also known as SPINN—an example of a deep learning model from natural language processing that is difficult to build in many popular frameworks.

The task is to classify pairs of sentences into three categories: assuming that sentence one is an accurate caption for an unseen image, then is sentence two (a) definitely, (b) possibly, or (c) definitely not also an accurate caption?

For example, suppose sentence one is “two dogs are running through a field.” Then a sentence that would make the pair an entailment might be “there are animals outdoors,” one that would make the pair neutral might be “some puppies are running to catch a stick,” and one that would make it a contradiction could be “the pets are sitting on a couch.” In particular, the goal of the research that led to SPINN was to do this by encoding each sentence into a fixed-length vector representation before determining their relationship (there are other ways, such as attentional models that compare individual parts of each sentence with each other using a kind of soft focus).

Each sentence in the dataset comes with its parse tree, represented by nested parentheses. One way to encode such a sentence using a neural network that takes the parse tree into account would be to build a neural network layer Reduce that combines pairs of words (represented by word embeddings like GloVe) and/or phrases, then apply this layer recursively, taking the result of the last Reduce operation as the encoding of the sentence. But what if I want the network to work in an even more humanlike way, reading from left to right and maintaining sentence context while still combining phrases using the parse tree?
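The recursive encoding described above can be sketched in a few lines of plain Python. This is only an illustration: the sentence is a made-up example rather than one taken from the dataset, `parse` and `encode` are hypothetical helper names, and `toy_reduce` is a stand-in that records which items were combined instead of running a learned Reduce layer on word embeddings.

```python
def parse(text):
    """Turn a binary-branching parse like '( ( a b ) c )' into nested tuples."""
    tokens = text.split()
    pos = 0

    def read():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token != "(":
            return token  # a leaf word
        left = read()
        right = read()
        assert tokens[pos] == ")", "expected a binary-branching tree"
        pos += 1
        return (left, right)

    tree = read()
    assert pos == len(tokens), "trailing tokens after the tree"
    return tree


def toy_reduce(left, right):
    # Stand-in for the learned Reduce layer: just records what was combined.
    return f"[{left} {right}]"


def encode(tree, reduce_fn=toy_reduce):
    """Apply reduce_fn bottom-up; the last call yields the sentence encoding."""
    if isinstance(tree, str):
        return tree  # a real model would look up a word embedding here
    left, right = tree
    return reduce_fn(encode(left, reduce_fn), encode(right, reduce_fn))


sentence = "( ( two dogs ) ( are running ) )"  # hypothetical example
print(encode(parse(sentence)))  # prints [[two dogs] [are running]]
```

Swapping `toy_reduce` for a function that combines two vectors would give the same traversal order with numeric encodings.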

The same parse tree can be written in another, equivalent way: remove the open parentheses, then tag words with “S” for “shift” and replace close parentheses with “R” for “reduce.” Now the information can be read from left to right as a set of instructions for manipulating a stack and a stack-like buffer, with exactly the same results as the recursive method described above.
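As a rough sketch of that rewriting, the snippet below converts a parenthesized parse into a word list plus a sequence of “S” and “R” instructions, then replays them on a stack and a buffer. The sentence, the helper names `to_transitions` and `run`, and the string-building reduce function are all illustrative assumptions, not code from the post.

```python
def to_transitions(text):
    """Drop '(' tokens, tag each word with 'S' and each ')' with 'R'."""
    words, transitions = [], []
    for token in text.split():
        if token == "(":
            continue
        if token == ")":
            transitions.append("R")
        else:
            words.append(token)
            transitions.append("S")
    return words, transitions


def run(words, transitions, reduce_fn):
    """Replay shift/reduce instructions from left to right."""
    buffer = list(reversed(words))  # the next word to read sits at the end
    stack = []
    for op in transitions:
        if op == "S":
            stack.append(buffer.pop())  # shift: move one word onto the stack
        else:
            right = stack.pop()  # reduce: combine the top two stack entries
            left = stack.pop()
            stack.append(reduce_fn(left, right))
    assert len(stack) == 1 and not buffer, "malformed transition sequence"
    return stack[0]


words, transitions = to_transitions("( ( two dogs ) ( are running ) )")
print(transitions)  # ['S', 'S', 'R', 'S', 'S', 'R', 'R']
print(run(words, transitions, lambda l, r: f"[{l} {r}]"))  # [[two dogs] [are running]]
```

The final stack entry has the same nesting as the original parentheses, which is the sense in which the two formulations agree.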

The Tracker produces a new state at every step of the stack manipulation (i.e., after reading each word or close parenthesis) given the current sentence context state, the top entry b in the buffer, and the top two entries s1, s2 in the stack. You could easily imagine writing code to do these things in your favorite programming language.

I iterate over the set of “shift” and “reduce” operations contained in transitions, running the Tracker if it exists and going through each example in the batch to apply the “shift” operation if requested or add it to a list of examples that need the “reduce” operation.

It makes sense to operate independently on the various examples here in the main forward method, keeping separate buffers and stacks for each of the examples in the batch, since all of the math-heavy, GPU-accelerated operations that benefit from batched execution take place in Tracker and Reduce.
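The loop described above might look roughly like the following framework-free sketch. It keeps one buffer and one stack per example, applies “shift” immediately, and collects every pending “reduce” so that a single batched call can handle them. The Tracker is left out, `batched_reduce` is a toy stand-in for the real Reduce layer, and the `"-"` padding marker for shorter examples is an assumption made for this illustration.

```python
def batched_reduce(lefts, rights):
    # Stand-in for the Reduce layer; a real model would run one batched call here.
    return [f"[{l} {r}]" for l, r in zip(lefts, rights)]


def forward(batch_words, batch_transitions):
    buffers = [list(reversed(words)) for words in batch_words]
    stacks = [[] for _ in batch_words]
    # zip(*...) yields one tuple per time step, holding an op for every example.
    for step in zip(*batch_transitions):
        lefts, rights = [], []
        for op, buf, stack in zip(step, buffers, stacks):
            if op == "S":
                stack.append(buf.pop())
            elif op == "R":
                rights.append(stack.pop())
                lefts.append(stack.pop())
        if rights:
            # Hand results back to the examples that asked for a reduce, in order.
            reduced = iter(batched_reduce(lefts, rights))
            for op, stack in zip(step, stacks):
                if op == "R":
                    stack.append(next(reduced))
    return [stack[-1] for stack in stacks]


batch_words = [["two", "dogs", "run"], ["cats", "sleep"]]
batch_transitions = [
    ["S", "S", "R", "S", "R"],
    ["S", "S", "R", "-", "-"],  # "-" pads the shorter example (assumed convention)
]
print(forward(batch_words, batch_transitions))  # ['[[two dogs] run]', '[cats sleep]']
```

Only `batched_reduce` touches more than one example at a time, which mirrors the point that the per-example bookkeeping is cheap and the heavy computation is batched.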

This composition function requires that the state of each of the children actually consist of two tensors, a hidden state h and a memory cell state c, while the function is defined using two linear layers (nn.Linear) operating on the children’s hidden states and a nonlinear combination function tree_lstm that combines the result of the linear layers with the children’s memory cell states.
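To make the role of the memory cells concrete, here is a plain-Python sketch of a binary TreeLSTM combination step. It assumes the usual gating scheme (a candidate value, an input gate, one forget gate per child, and an output gate) and assumes the two linear layers have already produced the summed pre-activations `lstm_in`; the gate ordering inside `lstm_in` is a choice made for this illustration, not something the text specifies.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def tree_lstm(c_left, c_right, lstm_in):
    """Combine two children's memory cells with pre-activations lstm_in.

    lstm_in holds 5 * size values, assumed to be laid out as: candidate a,
    input gate i, forget gates f1 and f2 (one per child), output gate o.
    """
    size = len(c_left)
    a, i, f1, f2, o = (lstm_in[k * size:(k + 1) * size] for k in range(5))
    c = [
        math.tanh(a[j]) * sigmoid(i[j])
        + sigmoid(f1[j]) * c_left[j]
        + sigmoid(f2[j]) * c_right[j]
        for j in range(size)
    ]
    h = [sigmoid(o[j]) * math.tanh(c[j]) for j in range(size)]
    return h, c


# With all-zero pre-activations every gate is 0.5 and the candidate is 0,
# so the new cell is the average of the two children's cells.
h, c = tree_lstm([1.0], [3.0], [0.0] * 5)
print(c)  # [2.0]
```

In a PyTorch module the same arithmetic would run on tensors, with `lstm_in` coming from the `nn.Linear` layers applied to the children's hidden states.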

The forward code for SPINN and its submodules produces an extraordinarily complex computation graph (Figure 3) culminating in loss, whose details are completely different for every batch in the dataset, but which can be automatically backpropagated each time with very little overhead simply by calling loss.backward(), a function built into PyTorch that performs backpropagation from any point in a graph.

While the original implementation takes 21 minutes to compile the computation graph (meaning that the debugging cycle during implementation is at least that long), then about five days to train, the version described here has no compilation step and takes about 13 hours to train on a Tesla K40 GPU, or about 9 hours on a Quadro GP100.

In addition, it would be effectively impossible to build a version of the SPINN whose Tracker decides how to parse the input sentence as it reads it, since the graph structures in TensorFlow Fold, while they depend on the structure of an input example, must be completely fixed once an input example is loaded.

The researchers wrote that they “use batch size 1 since the computation graph needs to be reconstructed for every example at every iteration depending on the samples from the policy network [Tracker]”—but PyTorch would enable them to use batched training even on a network like this one with complex, stochastically varying structure.

Then, once the batch has run all the way through and the model knows how accurately it predicted its categories, I can send reward signals back through these stochastic computation graph nodes in addition to backpropagating through the rest of the graph in the traditional way. The Google researchers reported results from SPINN plus RL that were a little bit better than what the original SPINN obtained on SNLI, despite the RL version using no precomputed parse tree information.
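The idea of sending a reward signal back through a stochastic choice can be illustrated with a small score-function (REINFORCE) update, written here without any framework. This is not the code from the post or from the Google researchers: the two-action softmax policy, the learning rate, and the reward function that always favors “shift” are all assumptions chosen to keep the example tiny.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(logits, reward_fn, rng, lr=0.5):
    """Sample an action, then move logits along reward * grad log p(action)."""
    probs = softmax(logits)
    action = 0 if rng.random() < probs[0] else 1
    reward = reward_fn(action)
    # For a softmax policy, d log p(action) / d logit_k = 1[k == action] - p_k.
    return [
        logit + lr * reward * ((1.0 if k == action else 0.0) - probs[k])
        for k, logit in enumerate(logits)
    ]


rng = random.Random(0)
logits = [0.0, 0.0]  # actions: 0 = shift, 1 = reduce
for _ in range(200):
    # Hypothetical reward: pretend that choosing "shift" is what helps accuracy.
    logits = reinforce_step(logits, lambda a: 1.0 if a == 0 else 0.0, rng)
print(softmax(logits))  # the probability of "shift" ends up close to 1
```

In the model described above, the reward would come from classification accuracy at the end of the batch rather than from a fixed rule, and the update would sit alongside ordinary backpropagation through the deterministic parts of the graph.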

PyTorch in 5 Minutes

I'll explain PyTorch's key features and compare it to the current most popular deep learning framework in the world (Tensorflow). We'll then write out a short ...

How to Make a Text Summarizer - Intro to Deep Learning #10

I'll show you how you can turn an article into a one-sentence summary in Python with the Keras machine learning library. We'll go over word embeddings, ...

PyTorch Python Tutorial | Deep Learning Using PyTorch | Image Classifier Using PyTorch | Edureka

This Edureka PyTorch Tutorial video will help you in understanding various ...

Lecture 16: Dynamic Neural Networks for Question Answering

Lecture 16 addresses the question "Can all NLP tasks be seen as question answering problems?". Key phrases: Coreference Resolution, Dynamic Memory ...

How to Do Sentiment Analysis - Intro to Deep Learning #3

In this video, we'll use machine learning to help classify emotions! The example we'll use is classifying a movie review as either positive or negative via TF Learn ...

How to Make a Simple Tensorflow Speech Recognizer

In this video, we'll make a super simple speech recognizer in 20 lines of Python using the Tensorflow machine learning library. I go over the history of speech ...

Deep Learning Methods for Emotion Detection from Text - Dr. Liron Allerhand

Sentiment analysis is an active research field where researchers aim to automatically determine the polarity of text [1], either as a binary problem or as a ...

Text Classification - Natural Language Processing With Python and NLTK p.11

Now that we understand some of the basics of natural language processing with the Python NLTK module, we're ready to try out text classification. This is ...

TDLS: Abstractive Text Summarization Using Sequence-to-Sequence RNNs and Beyond

Toronto Deep Learning Series, 7 August 2018. Paper review.

How to Predict Stock Prices Easily - Intro to Deep Learning #7

We're going to predict the closing price of the S&P 500 using a special type of recurrent neural network called an LSTM network. I'll explain why we use ...