AI News, Text Classifier Algorithms in Machine Learning

Text Classifier Algorithms in Machine Learning

In fields such as computer vision, there’s a strong consensus about a general way of designing models: deep networks with lots of residual connections.

The toolbox of a modern machine learning practitioner who focuses on text mining spans from TF-IDF features and Linear SVMs, to word embeddings (word2vec) and attention-based neural architectures.

The practical lesson here is that, despite the results certain methods achieve in published research, getting the best performance on a particular real-world task is closer to art than to science, requiring careful tuning of complicated pipelines.

Their functionality is really straightforward, and since the actual semantics of those vectors are not interesting for our problem, the only remaining question is “What is the best way to initialize the weights?” Depending on the problem, the answers may be as counterintuitive as the advice “generate your own synthetic labels, train word2vec on them, and init the embedding layer with them.” But for all practical purposes you can use a pre-trained set of embeddings and jointly fine-tune it for your particular model.

The go-to solution here is to use pretrained word2vec embeddings and try to use lower learning rates for the embedding layer (multiply general learning rate by 0.1).
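
As a rough illustration of this setup, here is a minimal sketch of initializing a Keras Embedding layer from pretrained word2vec vectors. The file path "vectors.bin" and the toy word_index vocabulary are assumptions for illustration, and the 0.1 learning-rate multiplier is not shown because stock Keras optimizers do not expose per-layer learning rates out of the box.

```python
# Minimal sketch (assumptions noted above): load pretrained word2vec vectors
# with gensim and use them to initialize a trainable Keras Embedding layer.
import numpy as np
from gensim.models import KeyedVectors
from keras.layers import Embedding

word_index = {"movie": 1, "great": 2, "boring": 3}   # toy vocabulary for illustration

w2v = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)  # hypothetical path
embedding_dim = w2v.vector_size

# Rows for unknown words keep a small random init; known words get their pretrained vector.
embedding_matrix = np.random.normal(scale=0.1, size=(len(word_index) + 1, embedding_dim))
for word, i in word_index.items():
    if word in w2v:
        embedding_matrix[i] = w2v[word]

embedding_layer = Embedding(len(word_index) + 1, embedding_dim,
                            weights=[embedding_matrix],
                            trainable=True)  # fine-tune jointly with the rest of the model
```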

When the problem consists of obtaining a single prediction for a given document (spam/not spam), the most straightforward and reliable architecture is a multilayer fully connected text classifier applied to the hidden state of a recurrent network.

There’s no single trick here, but keeping a lot of feature maps in the beginning and reducing their number exponentially later helps to avoid learning irrelevant patterns.

Sequence Classification with LSTM Recurrent Neural Networks in Python with Keras

Sequence classification is a predictive modeling problem where you have some sequence of inputs over space or time and the task is to predict a category for the sequence.

What makes this problem difficult is that the sequences can vary in length, be comprised of a very large vocabulary of input symbols and may require the model to learn the long-term context or dependencies between symbols in the input sequence.

In this post, you will discover how you can develop LSTM recurrent neural network models for sequence classification problems in Python using the Keras deep learning library.

The data was collected by Stanford researchers and was used in a 2011 paper where an even 50-50 split of the data was used for training and testing.

We will map each movie review into a real vector domain, a popular technique when working with text called word embedding.

This is a technique where words are encoded as real-valued vectors in a high dimensional space, where the similarity between words in terms of meaning translates to closeness in the vector space.

Finally, the sequence length (number of words) in each review varies, so we will constrain each review to be 500 words, truncating longer reviews and padding shorter reviews with zero values.

Let’s start off by importing the classes and functions required for this model and initializing the random number generator to a constant value to ensure we can easily reproduce the results.

The model will learn that the zero values carry no information, so although the sequences are not the same length in terms of content, Keras requires same-length vectors to perform the computation.
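
A minimal sketch of these preparation steps, assuming the IMDB dataset bundled with Keras and the standalone Keras API of that era:

```python
# Load the IMDB reviews, fix the random seed, and pad/truncate every review
# to 500 words so Keras receives same-length input vectors.
import numpy as np
from keras.datasets import imdb
from keras.preprocessing.sequence import pad_sequences

np.random.seed(7)  # make runs easier to reproduce

top_words = 5000  # keep only the 5,000 most frequent words
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=top_words)

max_review_length = 500
X_train = pad_sequences(X_train, maxlen=max_review_length)  # zero-pad / truncate
X_test = pad_sequences(X_test, maxlen=max_review_length)
```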

Finally, because this is a classification problem we use a Dense output layer with a single neuron and a sigmoid activation function to make 0 or 1 predictions for the two classes (good and bad) in the problem.
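
Continuing the sketch above, a minimal version of the model described here: a learned 32-dimensional word embedding, a single LSTM layer, and a sigmoid output unit trained with binary cross-entropy.

```python
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

embedding_vector_length = 32
model = Sequential()
model.add(Embedding(top_words, embedding_vector_length, input_length=max_review_length))
model.add(LSTM(100))                         # 100 recurrent units
model.add(Dense(1, activation='sigmoid'))    # single 0/1 prediction per review
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=3, batch_size=64)
```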

For example, we can modify the first example to add dropout to the input and recurrent connections of the LSTM layer itself; this more precise form of LSTM dropout is shown in the sketch after the next paragraph.

Dropout is a powerful technique for combating overfitting in your LSTM models, and it is a good idea to try both methods, but you may get better results with the gate-specific dropout provided in Keras.
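
A sketch of both variants, using the Keras 2 argument names: the layer-based version wraps the LSTM in generic Dropout layers, while the gate-specific version uses the LSTM's own dropout and recurrent_dropout parameters.

```python
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense, Dropout

# Variant 1: generic Dropout layers before and after the LSTM.
model = Sequential()
model.add(Embedding(top_words, 32, input_length=max_review_length))
model.add(Dropout(0.2))
model.add(LSTM(100))
model.add(Dropout(0.2))
model.add(Dense(1, activation='sigmoid'))

# Variant 2: dropout applied inside the LSTM to its input and recurrent connections.
model = Sequential()
model.add(Embedding(top_words, 32, input_length=max_review_length))
model.add(LSTM(100, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
```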

The IMDB review data does have a one-dimensional spatial structure in the sequence of words in reviews and the CNN may be able to pick out invariant features for good and bad sentiment.
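
A sketch of that idea: a 1D convolution and max pooling in front of the LSTM, so the network can pick up local n-gram features before modeling the longer-range sequence.

```python
from keras.models import Sequential
from keras.layers import Embedding, Conv1D, MaxPooling1D, LSTM, Dense

model = Sequential()
model.add(Embedding(top_words, 32, input_length=max_review_length))
model.add(Conv1D(filters=32, kernel_size=3, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=2))         # halve the sequence length
model.add(LSTM(100))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
```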

The Keras Blog

In this tutorial, we will walk you through the process of solving a text classification problem using pre-trained word embeddings and a convolutional neural network.

The task we will try to solve is to classify posts coming from 20 different newsgroups into their original 20 categories: the infamous '20 Newsgroups dataset'.

Here's how we will solve the classification problem: first, we will simply iterate over the folders in which our text samples are stored and format them into a list of samples.

At the same time, we will prepare a list of class indices matching the samples. Then we can format our text samples and labels into tensors that can be fed into a neural network.

Next, we compute an index mapping words to known embeddings by parsing the data dump of pre-trained embeddings. At this point we can leverage our embedding_index dictionary and our word_index to compute our embedding matrix, which we then load into an Embedding layer, as sketched below.
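
A sketch of those two steps, assuming embedding_index maps words to vectors (parsed from the pre-trained dump), word_index maps words to integer ids, and EMBEDDING_DIM and MAX_SEQUENCE_LENGTH match the vectors and the padded input length.

```python
import numpy as np
from keras.layers import Embedding

embedding_matrix = np.zeros((len(word_index) + 1, EMBEDDING_DIM))
for word, i in word_index.items():
    embedding_vector = embedding_index.get(word)
    if embedding_vector is not None:
        embedding_matrix[i] = embedding_vector   # words without a pretrained vector stay all-zeros

embedding_layer = Embedding(len(word_index) + 1,
                            EMBEDDING_DIM,
                            weights=[embedding_matrix],
                            input_length=MAX_SEQUENCE_LENGTH,
                            trainable=False)     # keep the pre-trained vectors frozen
```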

These input sequences should be padded so that they all have the same length in a batch of input data (although an Embedding layer is capable of processing sequences of heterogeneous length if you don't pass an explicit input_length argument to the layer).

Finally, we can build a small 1D convnet (sketched below) to solve our classification problem; this model reaches 95% classification accuracy on the validation set after only 2 epochs.
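
One possible shape for that small 1D convnet, reusing the embedding_layer built above; the exact filter counts and pooling sizes here are illustrative rather than taken from the post.

```python
from keras.models import Model
from keras.layers import Input, Conv1D, MaxPooling1D, GlobalMaxPooling1D, Dense

sequence_input = Input(shape=(MAX_SEQUENCE_LENGTH,), dtype='int32')
x = embedding_layer(sequence_input)              # (batch, seq_len, EMBEDDING_DIM)
x = Conv1D(128, 5, activation='relu')(x)
x = MaxPooling1D(5)(x)
x = Conv1D(128, 5, activation='relu')(x)
x = GlobalMaxPooling1D()(x)                      # collapse to one vector per document
x = Dense(128, activation='relu')(x)
preds = Dense(20, activation='softmax')(x)       # 20 newsgroup categories

model = Model(sequence_input, preds)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['acc'])
```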

In general, using pre-trained embeddings is relevant for natural language processing tasks where little training data is available (functionally, the embeddings act as an injection of outside information which might prove useful for your model).

Implementing a CNN for Text Classification in TensorFlow

The model presented in the paper achieves good classification performance across a range of text classification tasks (like Sentiment Analysis) and has since become a standard baseline for new text classification architectures.

Also, the dataset doesn’t come with an official train/test split, so we simply use 10% of the data as a dev set.

I won't go over the data pre-processing code in this post, but it is available on GitHub. The network we will build embeds the words in low-dimensional vectors and then applies convolutions with several filter sizes over the result.

Next, we max-pool the result of the convolutional layer into a long feature vector, add dropout regularization, and classify the result using a softmax layer.

Because this is an educational post, I decided to simplify the model from the original paper a little; it is relatively straightforward (a few dozen lines of code) to add those extensions back to the code here.

To allow various hyperparameter configurations we put our code into a TextCNN class, generating the model graph in the init function.

To instantiate the class, we pass the relevant hyperparameters as arguments. We start by defining the input data that we pass to our network: tf.placeholder creates a placeholder variable that we feed to the network when we execute it at train or test time.

TensorFlow’s conv2d convolution operation expects a 4-dimensional tensor with dimensions corresponding to batch, width, height and channel.

The result of our embedding doesn’t contain the channel dimension, so we add it manually, leaving us with a layer of shape [None, sequence_length, embedding_size, 1].
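
A TensorFlow 1.x-style sketch of this input pipeline (the placeholder sizes are illustrative): integer word-id placeholders, an embedding lookup, and a manually added channel dimension so conv2d receives a 4-dimensional tensor.

```python
import tensorflow as tf  # assumes TensorFlow 1.x, matching the post's era

sequence_length, vocab_size, embedding_size = 56, 20000, 128   # illustrative sizes
num_classes = 2

input_x = tf.placeholder(tf.int32, [None, sequence_length], name="input_x")
input_y = tf.placeholder(tf.float32, [None, num_classes], name="input_y")

W_emb = tf.Variable(tf.random_uniform([vocab_size, embedding_size], -1.0, 1.0), name="W_emb")
embedded_chars = tf.nn.embedding_lookup(W_emb, input_x)        # [None, seq_len, emb_size]
embedded_chars_expanded = tf.expand_dims(embedded_chars, -1)   # [None, seq_len, emb_size, 1]
```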

Because each convolution produces tensors of different shapes we need to iterate through them, create a layer for each of them, and then merge the results into one big feature vector.

'VALID' padding means that we slide the filter over our sentence without padding the edges, performing a narrow convolution that gives us an output of shape [1, sequence_length - filter_size + 1, 1, 1].

Performing max-pooling over the output of a specific filter size leaves us with a tensor of shape [batch_size, 1, 1, num_filters].

Once we have all the pooled output tensors from each filter size we combine them into one long feature vector of shape [batch_size, num_filters_total].
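
Continuing the sketch, one convolution-and-pooling branch per filter size, followed by the concatenation into a single feature vector:

```python
filter_sizes, num_filters = [3, 4, 5], 128
pooled_outputs = []
for filter_size in filter_sizes:
    filter_shape = [filter_size, embedding_size, 1, num_filters]
    W = tf.Variable(tf.truncated_normal(filter_shape, stddev=0.1))
    b = tf.Variable(tf.constant(0.1, shape=[num_filters]))
    conv = tf.nn.conv2d(embedded_chars_expanded, W,
                        strides=[1, 1, 1, 1], padding="VALID")   # narrow convolution
    h = tf.nn.relu(tf.nn.bias_add(conv, b))
    pooled = tf.nn.max_pool(h,
                            ksize=[1, sequence_length - filter_size + 1, 1, 1],
                            strides=[1, 1, 1, 1], padding="VALID")
    pooled_outputs.append(pooled)                                # [batch, 1, 1, num_filters]

num_filters_total = num_filters * len(filter_sizes)
h_pool = tf.concat(pooled_outputs, 3)                            # concatenate along channels
h_pool_flat = tf.reshape(h_pool, [-1, num_filters_total])        # [batch, num_filters_total]
```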

Using the feature vector from max-pooling (with dropout applied) we can generate predictions by doing a matrix multiplication and picking the class with the highest score.

Here, tf.nn.softmax_cross_entropy_with_logits is a convenience function that calculates the cross-entropy loss for each class, given our scores and the correct input labels.
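
And the output layer and loss, still in the same sketch: dropout on the pooled features, a linear layer producing class scores, argmax predictions, and the mean cross-entropy loss.

```python
dropout_keep_prob = tf.placeholder(tf.float32, name="dropout_keep_prob")
h_drop = tf.nn.dropout(h_pool_flat, dropout_keep_prob)

W_out = tf.Variable(tf.truncated_normal([num_filters_total, num_classes], stddev=0.1))
b_out = tf.Variable(tf.constant(0.1, shape=[num_classes]))
scores = tf.nn.xw_plus_b(h_drop, W_out, b_out, name="scores")    # unnormalized class scores
predictions = tf.argmax(scores, 1, name="predictions")

losses = tf.nn.softmax_cross_entropy_with_logits(logits=scores, labels=input_y)
loss = tf.reduce_mean(losses)
correct = tf.equal(predictions, tf.argmax(input_y, 1))
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32), name="accuracy")
```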

The allow_soft_placement setting allows TensorFlow to fall back on a device with a certain operation implemented when the preferred device doesn’t exist.

When we instantiate our TextCNN model, all the variables and operations defined will be placed into the default graph and session we’ve created above.

Checkpoints can be used to continue training at a later point, or to pick the best parameter settings using early stopping.
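
A sketch of the session and checkpointing setup described in the last few paragraphs (still TF 1.x style); the optimizer choice and checkpoint path are assumptions.

```python
session_conf = tf.ConfigProto(allow_soft_placement=True, log_device_placement=False)
sess = tf.Session(config=session_conf)

global_step = tf.Variable(0, name="global_step", trainable=False)
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss, global_step=global_step)

saver = tf.train.Saver(tf.global_variables(), max_to_keep=5)
sess.run(tf.global_variables_initializer())
# e.g. every 100 steps:
# saver.save(sess, "runs/checkpoints/model", global_step=global_step)
```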

Let’s now define a function for a single training step, evaluating the model on a batch of data and updating the model parameters.

We write a similar function to evaluate the loss and accuracy on an arbitrary data set, such as a validation set or the whole training set.

We iterate over batches of our data, call the train_step function for each batch, and occasionally evaluate and checkpoint our model. Here, batch_iter is a helper function I wrote to batch the data, and tf.train.global_step is a convenience function that returns the value of global_step.
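
Putting the loop together: a train_step function that runs one optimization step on a batch, plus a simple stand-in for the batch_iter helper mentioned above (my re-implementation, not the original); x_train and y_train are assumed to be arrays of padded word ids and one-hot labels.

```python
import numpy as np

def train_step(x_batch, y_batch):
    """Run a single training step on one batch and report progress."""
    feed_dict = {input_x: x_batch, input_y: y_batch, dropout_keep_prob: 0.5}
    _, step, batch_loss, batch_acc = sess.run(
        [train_op, global_step, loss, accuracy], feed_dict)
    print("step {}, loss {:g}, acc {:g}".format(step, batch_loss, batch_acc))

def batch_iter(x, y, batch_size):
    """Yield shuffled mini-batches for one epoch (stand-in for the post's helper)."""
    idx = np.random.permutation(len(x))
    for start in range(0, len(x), batch_size):
        sel = idx[start:start + batch_size]
        yield x[sel], y[sel]

for x_batch, y_batch in batch_iter(x_train, y_train, batch_size=64):
    train_step(x_batch, y_batch)
    current_step = tf.train.global_step(sess, global_step)  # read the current step counter
```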

Our training script writes summaries to an output directory, and by pointing TensorBoard to that directory we can visualize the graph and the summaries we created.

Running the training procedure with default parameters (128-dimensional embeddings, filter sizes of 3, 4 and 5, dropout of 0.5 and 128 filters per filter size) results in the following loss and accuracy plots (blue is training data, red is 10% dev data).

How to Make an Image Classifier - Intro to Deep Learning #6

Only a few days left to sign up for my Decentralized Applications course! We're going to make our own Image Classifier for cats & dogs ...

Text Analytics - Ep. 25 (Deep Learning SIMPLIFIED)

Unstructured textual data is ubiquitous, but standard Natural Language Processing (NLP) techniques are often insufficient tools to properly analyze this data.

Train an Image Classifier with TensorFlow for Poets - Machine Learning Recipes #6

Monet or Picasso? In this episode, we'll train our own image classifier, using TensorFlow for Poets. Along the way, I'll introduce Deep Learning, and add context ...

Word Embedding Explained and Visualized - word2vec and wevi

This is a talk I gave at Ann Arbor Deep Learning Event (a2-dlearn) hosted by Daniel Pressel et al. I gave an introduction to the working mechanism of the ...

Sequential Model - Keras

Here we go over the sequential model, the basic building block of doing anything that's related to Deep Learning in Keras. (this is super important to understand ...

Sequence Models and the RNN API (TensorFlow Dev Summit 2017)

In this talk, Eugene Brevdo discusses the creation of flexible and high-performance sequence-to-sequence models. He covers reading and batching sequence ...

Deep Learning Approach for Extreme Multi-label Text Classification

Extreme classification is a rapidly growing research area focusing on multi-class and multi-label problems involving an extremely large number of labels.

Neural Models for Information Retrieval

In the last few years, neural representation learning approaches have achieved very good performance on many natural language processing (NLP) tasks, such ...

High-Accuracy Neural-Network Models for Speech Enhancement

In this talk we will discuss our recent work on AI techniques that improve the quality of audio signals for both machine understanding and sensory perception.

SPACY'S ENTITY RECOGNITION MODEL: incremental parsing with Bloom embeddings & residual CNNs

spaCy v2.0's Named Entity Recognition system features a sophisticated word embedding strategy using subword features and "Bloom" embeddings, a deep ...