AI News

Google Prediction API: a Machine Learning black box for developers

This is my third article on how to build Machine Learning models in the Cloud; I previously explored Amazon Machine Learning and Azure Machine Learning.

Google Prediction API, on the other hand, was released all the way back in 2011, and offers a very stable and simple way to train Machine Learning models via a RESTful interface, although it might seem less friendly if you generally prefer browser interfaces.

I am not going to explore the wide range of services offered by Google Cloud Platform here. You can easily check out the Developers Console by yourself for free, sign up for the Free Trial offered by Google ($300 in credit to use for 2 months), and check out Cloud Academy’s courses on Google Cloud Platform.

Let me clarify a few basic concepts that will help you specifically with Google Prediction API. Your input features (your columns) can contain any type of data, although some types are easier to work with than others.

The dataset is composed of more than 10,000 records, each one defined by 560 input features and one target column, which can take one of the following values: 1, 2, 3, 4, 5 and 6 (walking, walking upstairs, walking downstairs, sitting, standing, laying down).
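The numeric coding above maps to activities as follows; a trivial lookup table (names taken from the list above) keeps predictions readable:

```python
# Map the dataset's numeric target codes to human-readable activities.
ACTIVITIES = {
    1: "walking",
    2: "walking upstairs",
    3: "walking downstairs",
    4: "sitting",
    5: "standing",
    6: "laying down",
}

print(ACTIVITIES[4])  # sitting
```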

We are going to build a multi-class model to understand whether a given record of sensor data (generated in real time) can be definitively associated with walking, standing, sitting, laying, etc.

I have already gone through the process of manipulating the original dataset to create one single CSV file, since the original dataset was split into smaller datasets (for training and testing), with input features separated from target values.
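That manipulation can be sketched with the standard library alone. The file names and shapes here are hypothetical, not the exact ones from the original archive, and the demo runs on a tiny in-memory example:

```python
# Merge separated feature and target files into one CSV,
# with the target value as the last column.
import csv
import io

def merge(features_file, targets_file, out_file):
    """Each features line: whitespace-separated floats; each targets line: one label."""
    writer = csv.writer(out_file)
    for x_line, y_line in zip(features_file, targets_file):
        writer.writerow(x_line.split() + [y_line.strip()])

# Tiny in-memory demo: 2 records, 3 features each.
features = io.StringIO("0.1 0.2 0.3\n0.4 0.5 0.6\n")
targets = io.StringIO("1\n5\n")
out = io.StringIO()
merge(features, targets, out)
print(out.getvalue())  # 0.1,0.2,0.3,1 / 0.4,0.5,0.6,5
```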

Eventually, you might use a WebApp Client ID even for a server-to-server application, but your code will end up being slightly more complicated and you will need to go through the typical OAuth flow (either using your browser or copying and pasting OAuth codes on your terminal).

The trainedmodels.insert method expects a body parameter containing a model ID (that you choose), your model type (classification or regression), and your dataset (either a Cloud Storage location or a set of inline instances).
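For reference, here is a sketch of such a body as a Python dict. The model ID and the Cloud Storage path are placeholders, and the field names follow my reading of the v1.6 API, so double-check them against the reference docs:

```python
# Request body for trainedmodels.insert, pointing at a CSV in Cloud Storage
# (bucket/path and model ID are placeholders for your own values).
body = {
    "id": "activity-recognition-model",             # model ID you choose
    "modelType": "classification",                  # or "regression"
    "storageDataLocation": "mybucket/dataset.csv",  # CSV in Cloud Storage
}

# Alternatively, embed the records directly instead of pointing at
# Cloud Storage: each instance is a target ("output") plus its
# input features ("csvInstance").
inline_example = {
    "id": "activity-recognition-model",
    "modelType": "classification",
    "trainingInstances": [
        {"output": "4", "csvInstance": [0.2885, -0.0203, -0.1329]},
    ],
}

print(body["modelType"])  # classification
```

With the google-api-python-client, you would pass a dict like this as the body argument of the insert call.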

With this dataset and the applied data split, here is how the Confusion Matrix looks. It is not too bad, considering the effort required and the total absence of configuration and data normalization.
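If you hold out part of the records yourself, you can rebuild such a matrix from (actual, predicted) pairs. This is a generic sketch, not the API's own analysis output:

```python
# Build a confusion matrix from true and predicted class labels.
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """Row = true class, column = predicted class, cell = record count."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

# Toy example with three of the six activity classes.
actual    = ["1", "1", "2", "3", "3", "3"]
predicted = ["1", "2", "2", "3", "3", "1"]
print(confusion_matrix(actual, predicted, labels=["1", "2", "3"]))
# [[1, 1, 0], [0, 1, 0], [1, 0, 2]]
```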

In case you can’t blindly trust your model, or if you can make more advanced decisions based on your application context, you may want to inspect outputMulti and base your final decision on each class’s score.
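A sketch of that inspection, using a made-up predict response (note that in the JSON the scores arrive as strings, so they need converting):

```python
# Hypothetical trainedmodels.predict response for one record.
response = {
    "outputLabel": "4",
    "outputMulti": [
        {"label": "4", "score": "0.541"},
        {"label": "5", "score": "0.442"},
        {"label": "6", "score": "0.017"},
    ],
}

# Convert scores to floats and pick the winning class ourselves.
scores = {c["label"]: float(c["score"]) for c in response["outputMulti"]}
best_label, best_score = max(scores.items(), key=lambda kv: kv[1])

# Only trust the prediction when the winning class is clearly ahead;
# the 0.5 threshold is an arbitrary application-level choice.
confident = best_score >= 0.5
print(best_label, confident)  # 4 True
```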

I believe Google’s black box has reached a pretty high level of abstraction for developers, although a more flexible dataset configuration and better analysis visualization would make the product easier to use for everyone, especially non-coders.

This is especially nice for systems that span long periods of time, so that you can easily adapt your model to new data and conditions, without the need for a new modelling phase.

It achieved the highest accuracy, taking only a couple of minutes for training, with an average response time below 1.3 seconds for real-time predictions.


Research datasets regularly disappear, change over time, become obsolete or come without a sane implementation to handle the data format reading and processing.

For this reason, Gensim launched its own dataset storage, committed to long-term support, a sane standardized usage API and focused on datasets for unstructured text processing (no images or audio).

To load a model or corpus, use either the Python or command-line interface of Gensim (you'll need Gensim installed first). Gensim-data is open source software released under the LGPL 2.1 license.

8 Useful Databases to Dig for Data (and 100 more)

You already know that data is the bread and butter of reports and presentations.

To make your life easier, we’ve put together a list of useful databases that you can use to find the data you seek.

This database contains large datasets, comprising virtually all the public data collected by the United Nations.

Some other topics are included as well. The site is leading the way in democratizing public sector data and driving innovation.

Citing journal publishers, university research papers, and other scholarly materials doesn’t just make your content look smarter; it also makes it more trustworthy.

It’s a good place to explore data related to economics, healthcare, food and agriculture, and the automotive industry.

Since our original post, we’ve come across a few more sources of data that might be useful for you. You can also get a crazy amount of datasets and related information from Datamob.

Now that you have an abundance of data on hand, find out how to avoid these common mistakes when transforming them into infographics.

Google's trained Word2Vec model in Python

In this post I’m going to describe how to get Google’s pre-trained Word2Vec model up and running in Python to play with.

It includes word vectors for a vocabulary of 3 million words and phrases, trained on roughly 100 billion words from a Google News dataset.

You just need to pass in the path to the model file (update the path in the code below to wherever you’ve placed the file).
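Under the hood, the .bin file uses word2vec's compact binary layout, which gensim's KeyedVectors.load_word2vec_format(path, binary=True) parses for you. Here is a simplified, self-contained sketch of that layout with tiny made-up vectors (the real file stores 3 million entries):

```python
# Minimal writer/reader for (a simplified form of) the word2vec binary
# format: a "vocab_size dim" header line, then each word followed by a
# space and dim little-endian float32 values.
import io
import struct

def write_word2vec_bin(buf, vectors):
    """Write {word: [floats]} in word2vec's binary layout."""
    dim = len(next(iter(vectors.values())))
    buf.write(f"{len(vectors)} {dim}\n".encode("utf8"))
    for word, vec in vectors.items():
        buf.write(word.encode("utf8") + b" ")
        buf.write(struct.pack(f"<{dim}f", *vec))

def read_word2vec_bin(buf):
    """Parse the header, then each word plus its float32 vector."""
    vocab_size, dim = map(int, buf.readline().split())
    vectors = {}
    for _ in range(vocab_size):
        word = bytearray()
        while (ch := buf.read(1)) != b" ":
            word.extend(ch)
        vec = struct.unpack(f"<{dim}f", buf.read(4 * dim))
        vectors[word.decode("utf8")] = vec
    return vectors

buf = io.BytesIO()
write_word2vec_bin(buf, {"king": [0.5, 0.25, -1.0], "queen": [0.5, 0.5, -1.0]})
buf.seek(0)
model = read_word2vec_bin(buf)
print(sorted(model))   # ['king', 'queen']
print(model["king"])   # (0.5, 0.25, -1.0)
```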

This is because gensim allocates a big matrix to hold all of the word vectors, and if you do the math, that’s a big matrix!
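Roughly, assuming the usual 300-dimensional GoogleNews vectors stored as 32-bit floats:

```python
# "Do the math" on the matrix gensim allocates for the full model:
# 3 million vocabulary entries x 300 dimensions x 4 bytes per float32.
vocab, dims, bytes_per_float = 3_000_000, 300, 4
matrix_bytes = vocab * dims * bytes_per_float
print(matrix_bytes)                    # 3600000000 bytes
print(round(matrix_bytes / 2**30, 2))  # about 3.35 GiB
```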

If you’d like to browse the 3M word list in Google’s pre-trained model, you can just look at the text files in the vocabulary folder of that project.

I split the word list across 50 files, and each text file contains 100,000 entries from the model.

It looks to be actively supported, and includes all of the features I cared about from Python(x, y) (it includes the Spyder IDE and scikit-learn with all its dependencies).

Then, in the same command window, you can install gensim easily by executing the following on the command line: easy_install -U gensim. That should do it!

The GDELT Project

Monitoring nearly the entire world's news media is only the beginning - even the largest team of humans could not begin to read and analyze the billions upon billions of words and images published each day.

Underlying the streams are a vast array of sources, from hundreds of thousands of global media outlets to special collections like 215 years of digitized books, 21 billion words of academic literature spanning 70 years, human rights archives and even saturation processing of the raw closed captioning stream of almost 100 television stations across the US in collaboration with the Internet Archive's Television News Archive.

Intro and Getting Stock Price Data - Python Programming for Finance p.1

Welcome to a Python for Finance tutorial series. In this series, we're going to run through the basics of importing financial (stock) data into Python using the ...

How to Train Your Models in the Cloud

Let's discuss whether you should train your models locally or in the cloud. I'll go through several dedicated GPU options, then compare three cloud options: AWS ...

Fast, flexible, and easy-to-use input pipelines (TensorFlow Dev Summit 2018)

Derek Murray discusses tf.data, the recommended API for building input pipelines in TensorFlow. In this talk, he introduces the library and presents some ...

Data Modeling for BigQuery (Google Cloud Next '17)

BigQuery is a different kind of data warehouse, permitting new approaches to data modeling. To get the most out of this system, Dan McClary and Daniel Mintz examine ...

10.4: Loading JSON data from a URL (Asynchronous Callbacks!) - p5.js Tutorial

This video begins the process of working with APIs. The first step is just using a URL instead of a local JSON file. How does this change your code?

Easily Analyze and Visualize Large Datasets with Cloud Datalab | Google Cloud Labs

Visualize and explore your big data on Google Cloud. Try the Cloud Datalab: Qwik Start lab here: ...

How to Deploy Keras Models to Production

We'll train an image classifier in Keras using a TensorFlow backend, then serve it to the browser using a super simple Flask backend. We can then deploy this ...

[Data on the Mind 2017] Collecting data from the web

Abstract: This workshop will explore different ways to collect data from the web with Python. Have you ever needed to copy and paste hundreds (or thousands!)

Deepfakes guide: FakeApp 2.2 tutorial. Installation (totally simplified, model folder included)


Predicting the Winning Team with Machine Learning

Can we predict the outcome of a football game given a dataset of past games? That's the question that we'll answer in this episode by using the scikit-learn ...