Machine Learning Tutorial: The Max Entropy Text Classifier

Implementing Max Entropy in a standard programming language such as Java, C++ or PHP is non-trivial, primarily because of the numerical optimization problem that must be solved in order to estimate the weights of the model.

The Max Entropy classifier can be used to solve a large variety of text classification problems such as language detection, topic classification, sentiment analysis and more.

Our target is to use the contextual information of the document (unigrams, bigrams, and other characteristics within the text) in order to categorize it into a given class (positive/neutral/negative, objective/subjective, etc.).

Following the standard bag-of-words framework that is commonly used in natural language processing and information retrieval, let {w1,…,wm} be the m words that can appear in a document.

Each document is then represented by a sparse array of 1s and 0s indicating whether or not each particular word wi appears in the document.
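
A minimal sketch of this encoding (not code from the article; the vocabulary and the encode() helper are hypothetical choices for illustration):

```python
# Hypothetical vocabulary {w1, ..., wm}; in practice it is built from the corpus.
vocabulary = ["awful", "good", "great", "movie", "plot"]

def encode(document):
    """Sparse binary bag-of-words: indices of the vocabulary words present in the text."""
    tokens = set(document.lower().split())
    return {i: 1 for i, word in enumerate(vocabulary) if word in tokens}

print(encode("A great movie with a good plot"))  # {1: 1, 2: 1, 3: 1, 4: 1}
```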

Our target is to construct a stochastic model, as described by Adam Berger (1996), which accurately represents the behavior of the random process: it takes as input the contextual information x of a document and produces the output value y.

As in the case of Naive Bayes, the first step in constructing this model is to collect a large amount of training data consisting of samples in the format (xi, yi), where xi is the contextual information of the document (the sparse array) and yi is its class.
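
A minimal sketch of this format, reusing the hypothetical encode() helper above (the documents and labels are illustrative):

```python
# Each training sample is a pair (x_i, y_i): the sparse array of the document
# and its class.
training_data = [
    (encode("a great movie with a good plot"), "positive"),
    (encode("an awful movie"), "negative"),
]
```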

We will use the empirical probability distribution of these training samples, p̃(x,y) (the fraction of the N samples in which the pair (x,y) occurs), in order to construct the statistical model of the random process which assigns texts to a particular class by taking into account their contextual information.
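
A sketch of this formulation in the standard notation of Berger et al. (1996) (the notation here is assumed, not quoted from the article): the model maximizes conditional entropy while matching the empirical expectations of the features fi(x,y).

```latex
% Empirical distribution over the N training pairs
\tilde{p}(x,y) = \frac{\mathrm{count}(x,y)}{N}

% Choose the model with maximum conditional entropy ...
p^{*} = \arg\max_{p \in \mathcal{C}} H(p), \qquad
H(p) = -\sum_{x,y} \tilde{p}(x)\, p(y \mid x) \log p(y \mid x)

% ... among the models consistent with the observed feature expectations
\mathcal{C} = \Bigl\{\, p : \sum_{x,y} \tilde{p}(x)\, p(y \mid x)\, f_i(x,y)
  = \sum_{x,y} \tilde{p}(x,y)\, f_i(x,y), \;\; i = 1,\dots,n \,\Bigr\}
```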

To solve the above optimization problem we introduce Lagrange multipliers, focus on the unconstrained dual problem, and estimate the lambda free variables {λ1,…,λn} with the Maximum Likelihood Estimation method.
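
Sketching the standard result behind this step (Berger et al., 1996): the solution has the parametric exponential form below, and the same λi maximize the log-likelihood of the training data under pλ.

```latex
p_{\lambda}(y \mid x) = \frac{1}{Z_{\lambda}(x)}
  \exp\Bigl(\sum_{i=1}^{n} \lambda_i f_i(x,y)\Bigr), \qquad
Z_{\lambda}(x) = \sum_{y'} \exp\Bigl(\sum_{i=1}^{n} \lambda_i f_i(x,y')\Bigr)
```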

Thus, given that we have found the lambda parameters of our model, all we need to do in order to classify a new document is use the “maximum a posteriori” decision rule and select the category with the highest probability.
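
A minimal sketch of this decision rule (the weight layout with one λ per word/class pair is an assumption for illustration, not the article's implementation):

```python
import math

def classify(sparse_x, lambdas):
    """MAP rule: return the class y with the highest p(y|x) under the exponential model."""
    scores = {y: math.exp(sum(weights.get(i, 0.0) for i in sparse_x))
              for y, weights in lambdas.items()}
    z = sum(scores.values())                      # normalization constant Z(x)
    posteriors = {y: s / z for y, s in scores.items()}
    return max(posteriors, key=posteriors.get)

# Toy weights for the hypothetical vocabulary above
lambdas = {"positive": {1: 1.2, 2: 0.8}, "negative": {0: 1.5}}
print(classify(encode("a great movie"), lambdas))  # -> "positive"
```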

Thus we can select as C the maximum number of active features over all (x,y) pairs within our training dataset. Making the above adaptations to the standard version of IIS can help us find the {λ1,…,λn} parameters and build our model relatively quickly.
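
A minimal sketch of this choice of C, assuming the binary feature encoding used above (so the number of active features of a pair equals the number of distinct vocabulary words present):

```python
def max_active_features(training_data):
    """C = maximum number of active features over all (x, y) pairs in the training set."""
    return max(len(sparse_x) for sparse_x, _y in training_data)

C = max_active_features(training_data)  # 4 for the toy training_data above
```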

Machine Learning Lecture 2: Sentiment Analysis (text classification)

In this video we work on an actual sentiment analysis dataset (which is an instance of text classification), for which I also provide Python code (see below).

Lecture 3 | Loss Functions and Optimization

Lecture 3 continues our discussion of linear classifiers. We introduce the idea of a loss function to quantify our unhappiness with a model's predictions, and ...

Entropy Calculation Part 1 - Intro to Machine Learning

This video is part of an online course, Intro to Machine Learning. Check out the course here: This course was designed ..

Lecture 11 | Detection and Segmentation

In Lecture 11 we move beyond image classification, and show how convolutional networks can be applied to other core computer vision tasks. We show how ...

Text Classification 2: Maximum Margin Hyperplane

There will be infinitely many hyperplanes that classify your training documents perfectly. Which one should we pick? The one with the largest ..

Deep Learning Approach for Extreme Multi-label Text Classification

Extreme classification is a rapidly growing research area focusing on multi-class and multi-label problems involving an extremely large number of labels.

How to load a custom dataset with tf.data [Tensorflow]

We look into how to create TFRecords and handle images from a custom dataset. Later we load these records into a model and do some predictions. Github ...

Lesson 1: Deep Learning 2018

NB: Please go to to view this video since there is important updated information there. If you have questions, use the forums at ..

Signal Analysis using Matlab - A Heart Rate example

A demonstration showing how MATLAB can be used to analyse an ECG (heart signal) to determine the average beats per minute. Code available at ...

Text Categorization Discriminative Classifier Part 1
