AI News

Meet Michelangelo: Uber’s Machine Learning Platform

Year in Review: 2017 Highlights from Uber Open Source

Year in Review: 2017 Highlights from the Uber Engineering Blog

Growing the Data Visualization Community with v5

Build an end-to-end churn prediction model

Churn prediction is one of the most well-known applications of machine learning and data science in the Customer Relationship Management (CRM) and Marketing fields. Simply put, it consists of detecting customers who are likely to stop doing business with you.

Churn applications are common across several sectors. To read more about how Dataiku's customers fight churn, feel free to consult our Churn and Lifetime Value success stories.

We start by opening the events dataset: it contains a user id, a timestamp, an event type, a product id, and a seller id.

Unsurprisingly, the most common event type is viewing a product, followed by viewing a seller page (remember, this is a marketplace e-business company).

When building a churn prediction model, a critical step is to define churn for your particular problem, and determine how it can be translated into a variable that can be used in a machine learning model.

For this tutorial, we’ll use a simple definition: a churner is someone who hasn’t taken any action on the website for 4 months.

We have to use a short time frame here, because our dataset contains a limited history (1 year and 3 months of data).

Note that you can convert the visual join recipe into an editable SQL code recipe by navigating to the recipe's Output section (in the left bar), clicking View query, then Convert to SQL recipe.

The idea here is to join the previous list of “best” customers with those who will buy something in the next four months (found using the subquery “loyal”). The churners are flagged using the CASE statement: if a best customer makes no purchase in the next 4 months, they will not be found in the list built by the subquery, hence the NULL value.
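The same flagging logic can be sketched in pandas, as a hypothetical equivalent of the SQL described above (the column names and toy data are assumptions, not the tutorial's actual schema):

```python
import pandas as pd

# Hypothetical inputs: the "best" customers at the reference date,
# and the users who purchase in the following 4 months (the "loyal" subquery).
best_customers = pd.DataFrame({"user_id": [1, 2, 3, 4]})
future_buyers = pd.DataFrame({"user_id": [2, 4]})

# Left join: best customers absent from the loyal list come back with a
# null, mirroring the SQL CASE on the NULL value.
labeled = best_customers.merge(
    future_buyers.assign(loyal=1), on="user_id", how="left"
)
labeled["churn"] = labeled["loyal"].isna().astype(int)  # 1 = churner
```

Here users 1 and 3 never repurchase in the future window, so they come out flagged as churners.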

This marks the end of the first big step: defining what churn is, and translating it into an actual variable that can be used by a machine learning algorithm.

To make sure the model will remain robust over time, and to replicate what is going to happen when using it (new records arrive on a regular basis), we can create a test set shifted in time.

Because we need to keep 4 months' worth of future data to create the target variable in our training set (did the customer repurchase?), the target will be generated from the data in the [T, T + 4 months] time interval, and the train features will use values in the [T - 4 months, T] interval. This way, data “from the future” cannot leak into the features.
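The two windows can be sketched with simple timestamp filtering; this is a toy illustration with made-up events, not the tutorial's actual tables:

```python
import pandas as pd

T = pd.Timestamp("2014-08-01")   # reference date
window = pd.DateOffset(months=4)

events = pd.DataFrame({
    "user_id": [1, 1, 2],
    "ts": pd.to_datetime(["2014-05-10", "2014-09-15", "2014-07-01"]),
})

# Train features come only from the past window [T - 4 months, T) ...
feature_events = events[(events.ts >= T - window) & (events.ts < T)]

# ... while the target is derived from the future window [T, T + 4 months),
# so no information from the future leaks into the features.
target_events = events[(events.ts >= T) & (events.ts < T + window)]
```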

Since we originally chose the reference date to be 2014-08-01, let's choose T as 2014-12-01 for our test set, to stay aligned with what we have already done.

Go to the Administration menu at the top right of your screen, then select Settings (top right) and Variables (top left). In the variables editor, the first parameter is our reference date: the date at which we actually want to score our clients.

We previously used the default random 80% (train) / 20% (test) split, but this assumes that our flow and model are time-independent, an assumption clearly violated here.

Note that the performance (measured here via the AUC) decreased a lot, even though we are using more data (because there is no train split, we have 1.25 times more data in our train set). The reason for this decrease is a combination of two factors. The issue can be solved by designing smarter features, such as the count of products viewed in the past k weeks before the reference date, or by creating ratios between such counts and their overall totals.
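Features of that kind can be sketched in pandas as follows (the column names and the k = 2 weeks window are illustrative assumptions):

```python
import pandas as pd

ref_date = pd.Timestamp("2014-08-01")
k = 2  # look-back window, in weeks

events = pd.DataFrame({
    "user_id": [1, 1, 1, 2],
    "event_type": ["product_view", "product_view", "purchase", "product_view"],
    "ts": pd.to_datetime(["2014-07-25", "2014-06-01", "2014-07-28", "2014-05-01"]),
})

# Count of products viewed in the past k weeks before the reference date.
recent = events[events.ts >= ref_date - pd.Timedelta(weeks=k)]
recent_views = (recent[recent.event_type == "product_view"]
                .groupby("user_id").size().rename("views_last_2w"))

# Ratio of recent activity to total activity, per user.
total = events.groupby("user_id").size()
ratio = (recent.groupby("user_id").size() / total).fillna(0).rename("recent_ratio")
```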

We can then transform the code used to generate the train_enriched dataset accordingly. The resulting SQL recipe looks rather cumbersome, but looking at it in more detail, we see that it generates three kinds of variables. By adapting the test_enriched dataset in the same way (modifying the table names and dates in the query), you should be able to retrain your model and reach similar performance.

Build and deploy forecasting models with Azure Machine Learning

In this article, learn how to use Azure Machine Learning Package for Forecasting (AMLPF) to quickly build and deploy a forecasting model.

Consult the package reference documentation for the full list of transformers and models, as well as the detailed reference for each module and class.

To run the code samples described here yourself, download the accompanying notebook.

The machine learning forecasting examples in the following code samples rely on the University of Chicago's Dominick's Finer Foods dataset to forecast orange juice sales.

After importing the required dependencies, the first code snippet shows the typical process of starting with a raw data set, in this case the data from Dominick's Finer Foods.

To model the time series, you need to extract several elements from this dataframe. The data contains approximately 250 different combinations of store and brand.

You can use the TimeSeriesDataFrame class to conveniently model multiple series in a single data structure using the grain.

Internal package functions use group to build a single model from multiple time series if the user believes this grouping helps improve model performance.

In the TimeSeriesDataFrame representation, the time axis and grain are now part of the data frame index, and allow easy access to pandas datetime slicing functionality.
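The TimeSeriesDataFrame class itself is not reproduced here, but the underlying idea can be sketched with a plain pandas MultiIndex in which the time axis and grain are index levels (the store and brand values are made up):

```python
import pandas as pd

# Weekly sales indexed by (date, store, brand): the grain is (store, brand).
idx = pd.MultiIndex.from_product(
    [pd.date_range("1990-06-01", periods=3, freq="W"),
     ["store_1", "store_2"],
     ["tropicana"]],
    names=["date", "store", "brand"],
)
sales = pd.DataFrame({"quantity": range(6)}, index=idx)

# With the time axis in the index, datetime-based selection applies directly:
# pick every series' observation for the first week.
first_week = sales.loc[pd.Timestamp("1990-06-03")]
```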

Check to see if the series are regular, meaning that they have a time index sampled at regular intervals, using the check_regularity_by_grain function.

The estimators in AMLPF follow the same API as scikit-learn estimators: a fit method for model training and a predict method for generating forecasts.
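That convention is easy to picture with a minimal hand-rolled estimator; this is a naive last-value forecaster for illustration, not one of AMLPF's actual classes:

```python
# Minimal estimator following the scikit-learn fit/predict convention.
class NaiveForecaster:
    def fit(self, y):
        # Training just remembers the last observed value of the series.
        self.last_ = y[-1]
        return self

    def predict(self, horizon):
        # Forecast by repeating the last observed value over the horizon.
        return [self.last_] * horizon

model = NaiveForecaster().fit([12.0, 15.0, 14.0])
forecast = model.predict(3)
```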

Forecast sales on test data

Similar to the fit method, you can create predictions for all 249 series in the testing data set with one call to the predict function.

One model for a group of series often uses the data from longer series to improve forecasts for short series.

Some machine learning models were able to take advantage of the added features and the similarities between series to get better forecast accuracy.

To score a large dataset, use the parallel scoring mode to submit multiple web service calls, one for each group of data.
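The fan-out can be sketched with a thread pool, one web service call per group; the call below is stubbed out, whereas the real one would hit the deployed scoring endpoint:

```python
from concurrent.futures import ThreadPoolExecutor

def score_group(group_id):
    # Stand-in for one web service call scoring one group of data.
    return {"group": group_id, "status": "scored"}

groups = ["store_1", "store_2", "store_3"]

# Submit one call per group; map preserves the input order of results.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(score_group, groups))
```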

Forecasting Time Series With R

We’ll show how to explore time series data, choose an appropriate modeling method and deploy the model in DSS.

There is also a yearly cycle, with the lowest number of passengers around the new year and the highest during late summer.

To start a notebook, I go back to the flow, click on the international_airline_passengers_prepared data set, click on Lab, New Code Notebook, R, and then Create.

The forecast library has the functions we need for training models to predict time series.

Now that we’ve loaded our data, let’s create a time series object using the ts() function.

For us, these values are the number of international passengers, a start of 1949 (the year in which the measurements begin), and a frequency of 12 (months in a year).

In general, it’s good practice to test several different modeling methods and choose the method that provides the best performance.

The ets() function in the forecast package fits exponential smoothing state space (ETS) models.

The forecast is shown in blue, with the grey area representing the 95% prediction interval.

Maybe this is because of a better fit to the data, but let’s train a third model before doing a model comparison.

AIC (the Akaike Information Criterion) is a common method for determining how well a model fits the data while penalizing more complex models.
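Concretely, AIC = 2k - 2 ln(L), where k is the number of fitted parameters and L the maximized likelihood; lower values are better. A short sketch with made-up likelihood values:

```python
def aic(log_likelihood, n_params):
    # AIC = 2k - 2 ln(L): fit quality is rewarded, extra parameters penalized.
    return 2 * n_params - 2 * log_likelihood

# A more complex model must gain enough likelihood to justify its parameters.
simple = aic(log_likelihood=-120.0, n_params=3)
complex_ = aic(log_likelihood=-119.5, n_params=6)
best = min(simple, complex_)
```

Here the slightly better fit of the complex model does not offset its three extra parameters, so the simpler model wins on AIC.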

Now we have everything we need to deploy the model onto DSS: the code to create the forecast for the next 24 months and the code to convert the result into a data.frame.

Ensure that the international_airline_passengers_prepared dataset is the input dataset, and create a new managed dataset, forecast, for the output of the recipe.

We can optimize the code in the recipe to only run the portions that will output to the forecast dataset, but for now run the recipe and then return to the Flow where we see our newly created dataset.

API Endpoints Model Deployment

NOTE: We are now rolling out our next generation of API Endpoints that replaces this current feature.

Domino lets you publish R or Python models as web services and invoke them using our REST API, so you can easily integrate your data science projects into existing applications and business processes without involving engineers or devops resources.

Because the script is only sourced once — at publish time — any more expensive initialization can happen up front, rather than happening on each call.
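The pattern looks roughly like this in Python (the pickle load is commented out and replaced with a stand-in dict so the sketch is self-contained; results/classifier.pkl is the path this article's Python example uses):

```python
# Runs once, when the script is sourced at publish time:
# any expensive setup (such as loading a model from disk) belongs here.
# import pickle
# model = pickle.load(open("results/classifier.pkl", "rb"))
model = {"threshold": 0.5}  # stand-in for the loaded classifier

def predict(x):
    # Only this function runs on each API call.
    return 1 if x > model["threshold"] else 0
```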

Once the request is received, Domino takes care of converting the inputs to the proper types in the language of your endpoint function.

Updating your Endpoint

You can publish a new version of the endpoint at any time.

Python

The model is first trained and stored at results/classifier.pkl. The endpoint is bound to the predict(...) function, which loads and uses results/classifier.pkl.

R

The model is trained using train.R and stored at classifier.Rda. The endpoint is bound to the predictQuality(...) function, defined in predict.R, which loads and uses classifier.Rda.

Example API Request

In either case, once published, a request might look something like this. Note that the v1 in the URL denotes the version of the Domino API, not the version of your API endpoint.
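The request snippet itself is elided above, so here is a sketch of the request and response shapes based only on the fields this section describes ('parameters', 'result', 'release'); the values are made up:

```python
import json

# Hypothetical request body: inputs travel in a "parameters" array,
# POSTed to a .../v1/... URL (v1 = Domino API version).
request_body = json.dumps({"parameters": [7.4, 0.7, 9.4]})

# Hypothetical response: the prediction is in "result", and "release"
# identifies the endpoint version plus the file and function behind it.
response = json.loads(
    '{"result": 5.1,'
    ' "release": {"version": 2, "file": "predict.R",'
    ' "function": "predictQuality"}}'
)
prediction = response["result"]
```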

The model's output is returned in the 'result' object, while the 'release' object contains information about the version of your API endpoint and the file and function that power it.

Nested data

You can send data of varying structure to your endpoint by nesting it inside an array or JSON object that occupies a single element of the 'parameters' array.

For instance, if your endpoint function expects a single argument, it can accept a flat list, a nested array, or a JSON object. To access the elements in Python, use standard list and dictionary access syntax.
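A sketch of such a nested payload and the standard access syntax (the names and values are illustrative):

```python
# The whole nested structure occupies one element of the "parameters" array.
parameters = [{"features": [7.4, 0.7], "meta": {"id": "wine-42"}}]

def endpoint(arg):
    # Standard list and dictionary access recovers the nested values.
    return arg["features"][0], arg["meta"]["id"]

first_feature, record_id = endpoint(parameters[0])
```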

Build, Train and Deploy ML Models at Scale with Amazon SageMaker

Amazon SageMaker is a fully managed service that enables data scientists and developers to quickly and easily build, ...

Regression Training and Testing - Practical Machine Learning Tutorial with Python p.4

Welcome to part four of the Machine Learning with Python tutorial series. In the previous tutorials, we got our initial data, we transformed and manipulated it a bit ...

Ramesh Sampath | Build Data Apps by Deploying ML Models as API Services

PyData SF 2016. As data scientists, we love building models using IPython ...

Deploying Python Machine Learning Models in Production | SciPy 2015 | Krishna Sridhar

Shawn Scully: Production and Beyond: Deploying and Managing Machine Learning Models

PyData NYC 2015. Machine learning has become the key component in building intelligence-infused applications. However, as companies increase the number ...

Live Coding with AWS | Training and Deploying AI with Amazon SageMaker

Check out the upcoming schedule, previous recordings, and links to the resources discussed. Build a model to predict a time series ...

Building Robust Machine Learning Models

Modern machine learning libraries make model building look deceptively easy. An unnecessary emphasis (admittedly, annoying to the speaker) on tools like R, ...

Pickling and Scaling - Practical Machine Learning Tutorial with Python p.6

In the previous Machine Learning with Python tutorial we finished up making a forecast of stock prices using regression, and then visualizing the forecast with ...

Adrin Jalali - The path between developing and serving machine learning models.

As a data scientist, one of the challenges after you develop and train your model is to deploy it to production, where other systems would use the ...

The Anatomy of a Production-Scale Continuously-Training Machine Learning Platform

Denis Baylor (Google Inc.), Eric Breck (Google Inc.), Heng-Tze Cheng (Google Inc.), ...