AI News, Machine Learning Blog Software Development News
- On Friday, June 8, 2018
Sentiment Analysis is an application of Natural Language Processing that aims to identify the sentiment (positive vs. negative vs. neutral), the subjectivity (objective vs. subjective) and the emotional states expressed in a document.
I worked on this project for over 9 months and used several different statistical methods and techniques under the supervision of Professors Tsiamyrtzis and Kakadiaris.
During my thesis, I had the opportunity to learn about new machine learning techniques, but I also bumped into some interesting and non-obvious matters.
In this article I discuss the things that I found most interesting while working on the Sentiment Analysis project, and I provide some tips that you should keep in mind while working on similar Natural Language Processing problems.
The lexicon-based technique uses dictionaries of words annotated with their semantic orientation (polarity and strength) and calculates a score for the polarity of the document.
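As a minimal sketch of this dictionary-based scoring, the snippet below sums the signed strengths of the words it finds in a tiny lexicon. The lexicon entries, the function names `score_document` and `classify`, and the choice of zero as the neutral threshold are illustrative assumptions, not taken from any published lexicon or from the original project.

```python
import re

# Hypothetical toy lexicon: word -> signed strength (sign = polarity).
LEXICON = {
    "good": 1.0,
    "great": 2.0,
    "excellent": 3.0,
    "bad": -1.0,
    "awful": -2.0,
    "terrible": -3.0,
}


def score_document(text, lexicon=LEXICON):
    """Sum the semantic orientation of every lexicon word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(token, 0.0) for token in tokens)


def classify(text, lexicon=LEXICON):
    """Map the score to a label; a score of exactly 0 counts as neutral."""
    score = score_document(text, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For instance, `classify("The food was great but the service was bad")` adds 2.0 for "great" and -1.0 for "bad" and labels the sentence positive. A real lexicon would also need to handle negation and intensifiers, which this sketch ignores.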
Learning-based techniques require a training phase. This means that you must first gather a dataset with examples for the positive, negative and neutral classes, extract the features/words from the examples and then train the algorithm on them.
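The gather, extract and train steps above can be sketched with a small multinomial Naive Bayes classifier. This is only one possible learning algorithm, chosen here for brevity; the six training sentences, the add-one smoothing and all function names are assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Extract lowercase word features from a text."""
    return re.findall(r"[a-z']+", text.lower())


def train(examples):
    """examples: list of (text, label). Returns the counts used by predict."""
    class_docs = Counter()              # number of documents per class
    word_counts = defaultdict(Counter)  # word frequencies per class
    vocabulary = set()
    for text, label in examples:
        class_docs[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocabulary.add(token)
    return class_docs, word_counts, vocabulary


def predict(model, text):
    """Return the class with the highest log posterior for the text."""
    class_docs, word_counts, vocabulary = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        # Log prior plus log likelihoods with add-one (Laplace) smoothing.
        score = math.log(class_docs[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # ignore words never seen during training
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocabulary))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy dataset with one label per example.
TRAINING_DATA = [
    ("I love this phone, it is great", "positive"),
    ("great battery and great screen", "positive"),
    ("I hate this phone, it is awful", "negative"),
    ("awful battery and terrible screen", "negative"),
    ("the phone has a battery and a screen", "neutral"),
    ("it is a phone", "neutral"),
]

model = train(TRAINING_DATA)
```

Calling `predict(model, "the phone has a screen")` returns the neutral class on this toy data, because those words occur most often in the neutral examples. A usable classifier would need far more examples per class.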
On the other hand, learning-based techniques deliver good results; nevertheless, they require obtaining datasets and training the classifier.
Syntactic techniques can deliver better accuracy because they use the syntactic rules of the language to detect the verbs, adjectives and nouns.
Unfortunately, such techniques depend heavily on the language of the document, and as a result the classifiers cannot be ported to other languages.
Statistical techniques have two significant benefits over syntactic ones: we can use them in other languages with minor or no adaptations, and we can apply Machine Translation to the original dataset and still get quite good results.
Training the classifier to detect only the two classes (positive and negative) forces several neutral words to be classified as either positive or negative, which leads to overfitting.
Usually, binarized versions of the algorithms (word occurrences clipped to 1) perform better than the ones that use multiple occurrences.
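To make the difference concrete, here is a small sketch of the two feature representations. The helper name `extract_features` and the sample review are assumptions for illustration; the only point is that the binarized variant records whether a word occurs, not how often.

```python
import re
from collections import Counter


def extract_features(text, binarized=False):
    """Return word -> occurrence count; binarized=True clips every count to 1."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    if binarized:
        return {word: 1 for word in counts}
    return dict(counts)


review = "good good good food, bad service"
raw = extract_features(review)                      # "good" is counted 3 times
clipped = extract_features(review, binarized=True)  # "good" is counted once
```

With raw counts, the repeated "good" carries three times the weight of "bad"; after clipping, both words contribute equally.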
For example, you might find that Max Entropy with Chi-square feature selection is the best combination for restaurant reviews, while for Twitter the Binarized Naïve Bayes with Mutual Information feature selection outperforms even the SVMs.
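As an illustration of Chi-square feature selection, the sketch below scores each term with the standard chi-square statistic of a 2x2 contingency table (term present or absent versus document in the class or not) and keeps the top-scoring terms. Counting presence per document, the four toy reviews and the names `chi_square` and `select_features` are assumptions; the original project may have computed the statistic differently.

```python
import re


def tokenize(text):
    """Return the set of distinct lowercase words in a text."""
    return set(re.findall(r"[a-z']+", text.lower()))


def chi_square(term, target_class, examples):
    """Chi-square statistic of the 2x2 table: term presence x class membership."""
    a = b = c = d = 0
    for text, label in examples:
        has_term = term in tokenize(text)
        in_class = label == target_class
        if has_term and in_class:
            a += 1  # term present, document in class
        elif has_term:
            b += 1  # term present, document in another class
        elif in_class:
            c += 1  # term absent, document in class
        else:
            d += 1  # term absent, document in another class
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        return 0.0  # an empty row or column gives no evidence either way
    return n * (a * d - c * b) ** 2 / denominator


def select_features(examples, target_class, k):
    """Keep the k terms with the highest score (ties broken alphabetically)."""
    vocabulary = set().union(*(tokenize(text) for text, _ in examples))
    ranked = sorted(
        vocabulary, key=lambda t: (-chi_square(t, target_class, examples), t)
    )
    return ranked[:k]


# Hypothetical toy reviews.
EXAMPLES = [
    ("great food and great service", "positive"),
    ("great place", "positive"),
    ("awful food and slow service", "negative"),
    ("awful place", "negative"),
]
```

On this toy data, "great" and "awful" both score 4.0 for the positive class, since the statistic measures dependence in either direction, while words such as "food" that appear equally in both classes score 0.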
Particularly in the case of Twitter, avoid using lexicon-based techniques, because users are known to use idioms, jargon and Twitter slang that heavily affect the polarity of the tweet.
Ask yourself whether it delivers the results that you expect, or whether it makes your algorithm unnecessarily complicated and its results difficult to explain.
One of the most powerful techniques for building highly accurate classifiers is using ensemble learning and combining the results of different classifiers.
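One simple way to combine classifiers is majority voting, sketched below. The three stand-in classifiers are deliberately trivial and hypothetical; in practice each would be a trained model, and other combination schemes (such as weighting votes by classifier confidence) are also possible.

```python
from collections import Counter


def majority_vote(classifiers, text):
    """Return the label predicted by the largest number of classifiers."""
    votes = Counter(classifier(text) for classifier in classifiers)
    return votes.most_common(1)[0][0]


# Hypothetical stand-in classifiers; each maps a text to a label.
def keyword_classifier(text):
    return "positive" if "good" in text.lower() else "negative"


def exclamation_classifier(text):
    return "positive" if "!" in text else "negative"


def length_classifier(text):
    return "negative" if len(text.split()) > 8 else "positive"


ENSEMBLE = [keyword_classifier, exclamation_classifier, length_classifier]
```

With an odd number of two-class voters there is never a tie; for example, `majority_vote(ENSEMBLE, "good")` returns "positive" because two of the three stand-ins vote that way.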
- On Wednesday, January 16, 2019
Import Data and Analyze with Python
The Python programming language allows sophisticated data analysis and visualization. This tutorial is a basic step-by-step introduction on how to import a text file ...
K Nearest Neighbor (kNN) Algorithm | R Programming | Data Prediction Algorithm
In this video I've talked about how you can implement kNN or k Nearest Neighbor algorithm in R with the help of an example data set freely available on UCL ...
Robert Meyer - Analysing user comments with Doc2Vec and Machine Learning classification
Description I used the Doc2Vec framework to analyze user comments on German online news articles and uncovered some interesting relations among the data ...
Scikit Learn Machine Learning SVM Tutorial with Python p. 2 - Example
In this machine learning tutorial, we cover a very basic, yet powerful example of machine learning for image recognition. The point of this video is to get you ...
LDA Topic Models
LDA topic models are a powerful tool for extracting meaning from text. In this video I talk about the idea behind LDA itself, why it works, what are the free ...
Sentiment Analysis: Feelings, not Facts
Brief explanation of Sentiment Analysis along with a few basic examples.
Linear Regression With R | Edureka
Watch Sample Class recording: This course is designed for professionals who aspire to learn 'R' language for Analytics. The course starts ..
Sentiment Based Movie Rating System
Data mining is the process of discovering knowledge or hidden patterns in large databases. The overall goal of data mining is to extract and obtain ...
Weka Text Classification for First Time & Beginner Users
59-minute beginner-friendly tutorial on text classification in WEKA; all text changes to numbers and categories after 1-2, so 3-5 relate to many other data analysis ...
Matti Lyra - Evaluating Topic Models
Description Unsupervised models in natural language processing (NLP) have become very popular recently. Word2vec, GloVe and LDA provide powerful ...