AI News

Microsoft is taking autocorrect to the next level

The standard version didn’t pick up on three missing determiners, while the prototype Windows ML-powered version highlighted the three nouns that were missing their determiners.  “We’ve trained the grammar checker and it now can suggest corrections that I can take action on and fix,” said Kevin Gallo, who gave the demo.

“We’re running this on Windows ML, which enables Word to build an experience that is low-latency, has high scalability because there are a lot of Word users out there, and it can work offline.”  The big news here is that Microsoft’s products, such as Word, are now relying on machine learning algorithms running locally on a Windows 10 device, and not in the cloud.

And because these algorithms run locally within apps installed on a device, the results are extremely quick.  Group program manager Kam VedBrat introduced the new Windows ML application programming interface (API) in March, a platform that enables developers to integrate pre-trained machine learning models into their apps and experiences.
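Windows ML itself is a WinRT API consumed from C# or C++, but the pattern it enables, evaluating a pre-trained ONNX model entirely on the local device, can be sketched in Python with the onnxruntime package. This is a minimal illustration of local inference, not Word's grammar checker; the model file name and feature shape are placeholders.

    import numpy as np
    import onnxruntime as ort

    # Load a pre-trained ONNX model from local disk (placeholder file name).
    session = ort.InferenceSession("grammar_model.onnx")

    # Ask the model what input it expects, then run inference with no cloud round-trip.
    input_name = session.get_inputs()[0].name
    features = np.zeros((1, 128), dtype=np.float32)  # placeholder feature vector
    outputs = session.run(None, {input_name: features})
    print(outputs[0])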

Microsoft working on machine-learning grammar features for Word

In its day-two keynote for Build 2018, Microsoft gave a brief demonstration of how it plans to bring the power of machine learning to Word.

On stage, Kevin Gallo, head of the Windows developer platform, compared an older version of Word with a newer version that has a machine learning-powered grammar checker built in.

The demo was relatively short and there's no word on when to expect the Windows ML features in Word, but it's an interesting, albeit small, look at how Microsoft is using machine learning in its own apps.

Create text analytics models in Azure Machine Learning Studio

In a text analytics experiment, you would typically clean and preprocess the text data, extract numeric feature vectors from it, train a classification or regression model, score and validate that model, and deploy it to production. In this tutorial, you learn these steps as we walk through a sentiment analysis model using the Amazon Book Reviews dataset (see the research paper “Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification” by John Blitzer, Mark Dredze, and Fernando Pereira).

You can find the experiments covered in this tutorial in the Azure AI Gallery: Predict Book Reviews and Predict Book Reviews - Predictive Experiment. We begin the experiment by dividing the review scores into categorical low and high buckets to formulate the problem as two-class classification.
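In Studio this bucketing is configured on the experiment canvas; purely as an illustration of the same step in code, here is a pandas sketch. The column names and score cutoffs are assumptions, not taken from the tutorial.

    import pandas as pd

    # Hypothetical schema: a "score" column with 1-5 star ratings and a "text" column.
    df = pd.read_csv("book_reviews.csv")

    # Two-class formulation: low vs. high review scores.
    df = df[df["score"] != 3]                     # drop ambiguous middle scores (assumed cutoff)
    df["label"] = (df["score"] >= 4).astype(int)  # 1 = high bucket, 0 = low bucket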

The cleaning reduces the noise in the dataset, helps you find the most important features, and improves the accuracy of the final model.

You can also use custom regular expressions with C# syntax to replace substrings, and remove words by part of speech: nouns, verbs, or adjectives.
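The tutorial performs this cleaning with Studio's Preprocess Text module; as a rough Python analog of the same ideas (the URL-stripping regex and the choice to drop verbs are arbitrary examples, not the tutorial's settings):

    import re
    import nltk

    # Resource names vary across NLTK versions.
    nltk.download("punkt", quiet=True)
    nltk.download("averaged_perceptron_tagger", quiet=True)

    def clean(text: str) -> str:
        text = re.sub(r"https?://\S+", "", text.lower())  # regex substring replacement
        tagged = nltk.pos_tag(nltk.word_tokenize(text))
        # Remove words by part of speech, here verbs (tags starting with VB).
        kept = [word for word, tag in tagged if not tag.startswith("VB")]
        return " ".join(kept)

    print(clean("I loved this book! http://example.com"))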

In this tutorial, we set the N-gram size to 2, so our feature vectors include single words and combinations of two consecutive words.

This approach gives extra weight to words that appear frequently in a single record but are rare across the entire dataset, in the style of TF-IDF weighting.

Therefore, we add an Extract N-Gram Features module to the scoring branch of the experiment, connect the output vocabulary from the training branch, and set the vocabulary mode to read-only.

We also disable the filtering of N-grams by frequency, setting the minimum to 1 instance and the maximum to 100%, and turn off feature selection.
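Outside Studio, this training/scoring split of the vocabulary is what scikit-learn expresses with fit versus transform; a minimal sketch (the module's frequency and selection settings don't map one-to-one onto these defaults):

    from sklearn.feature_extraction.text import TfidfVectorizer

    train_texts = ["loved this book", "terrible plot and boring prose"]
    test_texts = ["boring book"]

    # Unigrams and bigrams (N-gram size 2), TF-IDF weighted.
    vectorizer = TfidfVectorizer(ngram_range=(1, 2))

    # Training branch: learn the N-gram vocabulary.
    X_train = vectorizer.fit_transform(train_texts)

    # Scoring branch: reuse the learned vocabulary read-only, so transform() only.
    X_test = vectorizer.transform(test_texts)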

After the text column in the test data has been transformed to numeric feature columns, we exclude the string columns from previous stages, as in the training branch.

The predictive experiment uses the learned N-gram vocabulary to transform the text to features, and the trained logistic regression model to make a prediction from those features.

To set up the predictive experiment, we first save the N-gram vocabulary as a dataset, along with the trained logistic regression model, from the training branch of the experiment.

That way, the web service does not request the label it is trying to predict, and does not echo the input features in its response.
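Once published, a classic Azure ML Studio request-response service is called over HTTPS with a JSON body. The endpoint, key, and column schema below are placeholders; the exact request shape depends on how the service was deployed.

    import requests

    url = ("https://<region>.services.azureml.net/workspaces/<workspace-id>"
           "/services/<service-id>/execute?api-version=2.0")
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer <api-key>",  # placeholder key
    }
    body = {
        "Inputs": {
            "input1": {
                "ColumnNames": ["Text"],          # only the review text, no label column
                "Values": [["loved this book"]],
            }
        },
        "GlobalParameters": {},
    }
    response = requests.post(url, json=body, headers=headers)
    print(response.json())  # scored label and probability, without echoing the inputs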

Applying NLP in Sentiment Classification and Entity Recognition Using Azure ML and the Team Data Science Process

We recently published two real-world scenarios demonstrating how to use Azure Machine Learning alongside the Team Data Science Process (TDSP) to execute AI projects involving Natural Language Processing (NLP) use cases, namely sentiment classification and entity extraction.

The samples use a variety of Azure data platforms, such as Data Science Virtual Machines (DSVMs) to train DNN models for sentiment classification and entity extraction using GPUs, and HDInsight Spark for data processing and word embedding model training at scale.

The samples show how domain-specific word embeddings, generated from domain-specific and labeled training datasets, outperform generic word embeddings trained on general, unlabeled data, leading to improved accuracy in classification and entity extraction tasks.

The data used in this project is the Sentiment140 dataset, which contains the text of tweets (with emoticons removed) along with the polarity of each tweet (positive or negative; neutral tweets are removed for this project).

Skip-gram is a shallow neural network that takes the target word, encoded as a one-hot vector, as input and uses it to predict nearby words.
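As a concrete illustration, a skip-gram model can be trained with gensim's Word2Vec by setting sg=1; the corpus and hyperparameters here are toy values, not the project's:

    from gensim.models import Word2Vec

    # Toy corpus: each sentence is a list of tokens.
    sentences = [
        ["great", "movie", "loved", "it"],
        ["terrible", "movie", "hated", "it"],
    ]

    # sg=1 selects skip-gram: predict context words from the target word.
    model = Word2Vec(sentences, sg=1, vector_size=50, window=2, min_count=1)
    print(model.wv["movie"])  # the learned 50-dimensional vector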

Sentiment-Specific Word Embedding (SSWE; Tang et al., 2014) tries to overcome a weakness of the Word2vec algorithm whereby words with similar contexts but opposite polarity can have similar word vectors.

We are using a simplified variant of SSWE here, implemented as a convolutional neural network (CNN) that optimizes the cross-entropy of the sentiment classes as its loss function.
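A toy version of such a network, a Keras CNN trained with cross-entropy over the sentiment classes, looks roughly like this; the layer sizes are illustrative and this is not the project's SSWE implementation:

    import tensorflow as tf
    from tensorflow.keras import layers

    model = tf.keras.Sequential([
        layers.Embedding(input_dim=20000, output_dim=100),  # learned word embeddings
        layers.Conv1D(128, kernel_size=3, activation="relu"),
        layers.GlobalMaxPooling1D(),
        layers.Dense(2, activation="softmax"),              # positive / negative
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",   # cross-entropy loss
                  metrics=["accuracy"])
    model.summary()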

Our use case scenario focuses on how a large corpus of unstructured data, such as Medline PubMed abstracts, can be analyzed to train a word embedding model.

Our results show that the biomedical entity extraction model trained on the domain-specific word embedding features outperforms the model trained on generic features (word embeddings from Google News data).

We also showed that customizing the word embedding approach, or using domain-specific datasets for word embeddings, can improve the accuracy of downstream tasks such as classification or entity extraction.

Hello World - Machine Learning Recipes #1

Six lines of Python is all it takes to write your first machine learning program! In this episode, we'll briefly introduce what machine learning is and why it's ...
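In that spirit, a classifier really can fit in a handful of scikit-learn lines; this decision-tree sketch on toy fruit data is close to the episode's example, though the exact code may differ:

    from sklearn import tree

    features = [[140, 1], [130, 1], [150, 0], [170, 0]]  # [weight in grams, 1=smooth, 0=bumpy]
    labels = [0, 0, 1, 1]                                # 0 = apple, 1 = orange
    clf = tree.DecisionTreeClassifier()
    clf = clf.fit(features, labels)
    print(clf.predict([[160, 0]]))                       # expect [1], i.e. orange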

OCR, Deep Learning & Algorithms: Building Tanmay's Word Search using Tesseract and OCR.Space!

I hope you enjoyed this tutorial! If you did, please make sure to leave a like, comment, and subscribe! It really does help out a lot! Links: tWordSearch Swift Script: ...
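The video works in Swift against Tesseract and the OCR.Space API; as a hedged Python analog of just the OCR step (requires a local Tesseract install, and puzzle.png is a placeholder image of the letter grid):

    from PIL import Image
    import pytesseract

    # --psm 6 treats the image as a single uniform block of text,
    # a reasonable mode for a word-search letter grid.
    grid_text = pytesseract.image_to_string(Image.open("puzzle.png"), config="--psm 6")
    print(grid_text)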

Natural Language Processing With Python and NLTK p.1 Tokenizing words and Sentences

Natural Language Processing is the task we give computers to read and understand (process) written text (natural language). By far, the most popular toolkit or ...
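Part 1's topic, tokenizing, is the step that splits raw text into sentences and then words; with NLTK it takes a few lines:

    import nltk
    nltk.download("punkt", quiet=True)  # tokenizer models (resource name varies by version)
    from nltk.tokenize import sent_tokenize, word_tokenize

    text = "NLTK makes tokenizing easy. It splits text into sentences, then words."
    print(sent_tokenize(text))  # two sentences
    print(word_tokenize(text))  # ['NLTK', 'makes', 'tokenizing', ...]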

TensorFlow Tutorial | Deep Learning Using TensorFlow | TensorFlow Tutorial Python | Edureka

This Edureka TensorFlow tutorial video will help ...

MarI/O - Machine Learning for Video Games

MarI/O is a program made of neural networks and genetic algorithms that kicks butt at Super Mario World. Source Code: "NEAT" ..
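MarI/O specifically uses NEAT, which also evolves network topology; the underlying genetic-algorithm loop it builds on can be sketched generically. This toy evolves bit strings toward all ones and is not the NEAT algorithm itself.

    import random

    def fitness(genome):
        return sum(genome)  # toy objective: maximize the number of 1s

    def evolve(pop_size=50, genes=20, generations=40):
        pop = [[random.randint(0, 1) for _ in range(genes)] for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=fitness, reverse=True)
            survivors = pop[: pop_size // 2]      # selection
            children = []
            while len(children) < pop_size - len(survivors):
                a, b = random.sample(survivors, 2)
                cut = random.randrange(genes)
                child = a[:cut] + b[cut:]         # crossover
                if random.random() < 0.1:         # mutation
                    child[random.randrange(genes)] ^= 1
                children.append(child)
            pop = survivors + children
        return max(pop, key=fitness)

    print(fitness(evolve()))  # approaches 20 as the population converges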

Sentiment Analysis in 4 Minutes

Link to the full Kaggle tutorial w/ code: Sentiment Analysis in 5 lines of ..
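A few lines is realistic when an off-the-shelf library does the heavy lifting; for example with TextBlob (not necessarily the library the Kaggle tutorial uses):

    from textblob import TextBlob

    review = "This book was a delight from start to finish."
    polarity = TextBlob(review).sentiment.polarity  # -1.0 (negative) to 1.0 (positive)
    print("positive" if polarity > 0 else "negative", polarity)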

Lecture 4: Word Window Classification and Neural Networks

Lecture 4 introduces single and multilayer neural networks, and how they can be used for classification purposes. Key phrases: Neural networks. Forward ...
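The word-window classifier from the lecture concatenates the embeddings of a center word and its neighbors and feeds them through a small network; a numpy sketch of the forward pass with random toy parameters:

    import numpy as np

    rng = np.random.default_rng(0)
    vocab, dim, window = 10, 4, 1      # toy sizes; one context word on each side

    E = rng.normal(size=(vocab, dim))                 # embedding matrix
    W = rng.normal(size=(8, dim * (2 * window + 1)))  # hidden-layer weights
    u = rng.normal(size=8)                            # scoring vector

    ids = [3, 7, 2]                                   # word IDs: [left, center, right]
    x = np.concatenate([E[i] for i in ids])           # concatenated window embeddings
    h = np.tanh(W @ x)                                # hidden activations
    prob = 1 / (1 + np.exp(-(u @ h)))                 # e.g. P(center word is an entity)
    print(prob)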

Lecture 3 | GloVe: Global Vectors for Word Representation

Lecture 3 introduces the GloVe model for training word vectors. Then it extends our discussion of word vectors (interchangeably called word embeddings) by ...
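Pre-trained GloVe vectors are easy to explore through gensim's downloader; the model name below is one of the published gensim-data bundles, and the first call downloads roughly 100 MB:

    import gensim.downloader as api

    # 100-dimensional GloVe vectors trained on Wikipedia and Gigaword.
    glove = api.load("glove-wiki-gigaword-100")

    print(glove.most_similar("frog", topn=3))
    print(glove.similarity("king", "queen"))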

How to Make a Simple Tensorflow Speech Recognizer

In this video, we'll make a super simple speech recognizer in 20 lines of Python using the Tensorflow machine learning library. I go over the history of speech ...
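Whatever those 20 lines contain, simple recognizers usually share the same front-end: converting audio into MFCC features. A librosa sketch of that step only (the file name is a placeholder, and this is not the video's code):

    import numpy as np
    import librosa

    # Load a short spoken-word clip and extract 13 MFCCs per frame.
    audio, sr = librosa.load("spoken_digit.wav", sr=16000)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13)  # shape: (13, n_frames)

    # Crude fixed-length summary a small classifier could consume.
    features = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])
    print(features.shape)  # (26,)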

How to Predict Stock Prices Easily - Intro to Deep Learning #7

We're going to predict the closing price of the S&P 500 using a special type of recurrent neural network called an LSTM network. I'll explain why we use ...