AI News, Meet Michelangelo: Uber’s Machine Learning Platform

Meet Michelangelo: Uber’s Machine Learning Platform

Year in Review: 2017 Highlights from Uber Open Source

Year in Review: 2017 Highlights from the Uber Engineering Blog

Welcoming Peter Dayan to Uber AI Labs

Engineering More Reliable Transportation with Machine Learning and AI at Uber

Turbocharging Analytics at Uber with our Data Science Workbench

Meet Horovod: Uber’s Open Source Distributed Deep Learning Framework for TensorFlow

Build an end-to-end churn prediction model

Churn prediction is one of the most well-known applications of machine learning and data science in the Customer Relationship Management (CRM) and marketing fields.

Churn applications are common in several sectors. To read more about how Dataiku’s customers fight churn, feel free to consult our Churn and Lifetime Value success stories.

Start by opening the events dataset: it contains a user id, a timestamp, an event type, a product id, and a seller id.

Unsurprisingly, the most common event type is viewing a product, followed by viewing a seller page (remember, this is a marketplace e-business company).

When building a churn prediction model, a critical step is to define churn for your particular problem, and determine how it can be translated into a variable that can be used in a machine learning model.

For this tutorial, we’ll use a simple definition: a churner is someone who hasn’t taken any action on the website for 4 months.

We have to use a short time frame here, because our dataset contains a limited history (1 year and 3 months of data).

Note that you can convert the visual join recipe into an editable SQL code recipe by navigating to the Output section (from the left bar) of the recipe, clicking on View query, then Convert to SQL recipe:

The idea here is to join the previous list of “best” customers with those that will buy something in the next 4 months (found using the subquery “loyal”). The churners are then flagged using the CASE statement: if a best customer doesn’t make a purchase in the next 4 months, they will not appear in the list built by the subquery, hence the NULL value.
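The left-join-plus-NULL logic described above can be sketched in pandas as well (the column names and toy data here are hypothetical, not the tutorial’s actual schema):

```python
import pandas as pd

# Hypothetical "best customers" table and "purchased again in the
# next 4 months" table (the latter plays the role of the "loyal" subquery).
best = pd.DataFrame({"user_id": [1, 2, 3, 4]})
loyal = pd.DataFrame({"user_id": [2, 4]})

# Left join: customers absent from `loyal` get a missing value,
# which is exactly the NULL the CASE statement turns into a churn flag.
flagged = best.merge(loyal.assign(repurchased=1), on="user_id", how="left")
flagged["churn"] = flagged["repurchased"].isna().astype(int)
```

Here users 1 and 3 never reappear on the right-hand side, so they come out flagged as churners.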

This marks the end of the first big step: defining what churn is, and translating it into an actual variable that can be used by a machine learning algorithm.

To make sure the model will remain robust over time, and to replicate what is going to happen when using it (new records arrive on a regular basis), we can create train and test sets based on two different reference dates.

Since we need to keep 4 months’ worth of future data to create the target variable in our training set (did the customer repurchase?), the target will be generated from the data in the [T, T + 4 months] time interval, and the train features will use values in the [T - 4 months, T] interval. This way, data “from the future” never leaks into the features.

Since we originally chose the reference date to be 2014-08-01, let’s choose T as 2014-12-01 for our test set to stay aligned with what we have already done.
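The windowing scheme above can be sketched in a few lines of pandas (the events table and its columns are hypothetical; only the dates match the tutorial):

```python
import pandas as pd

# Toy events table: one row per user action, with a timestamp.
events = pd.DataFrame({
    "user_id": [1, 1, 2],
    "ts": pd.to_datetime(["2014-06-15", "2014-09-10", "2014-07-01"]),
})

T = pd.Timestamp("2014-08-01")        # reference date for the train set
horizon = pd.DateOffset(months=4)

# Features come strictly from the past: [T - 4 months, T)
feat_window = events[(events.ts >= T - horizon) & (events.ts < T)]
# The target is observed in the future: [T, T + 4 months)
target_window = events[(events.ts >= T) & (events.ts < T + horizon)]
```

For the test set, the same code would simply be rerun with T = 2014-12-01.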

Click on the Administration menu on the top right of your screen, then select Settings on the top right and Variables on the top left. The content of the variables editor is the following: the first parameter is our reference date, i.e. when we actually want to score our clients.
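The point of such a project variable is to parameterize the recipes rather than hard-code dates. A minimal sketch of the idea, with a hypothetical variables dict and query template standing in for the DSS mechanism:

```python
# Hypothetical stand-in for the project variables editor.
variables = {"reference_date": "2014-08-01"}

# A query template that reads the reference date from the variable,
# so changing one setting re-dates every recipe that uses it.
query_template = (
    "SELECT user_id, COUNT(*) AS n_events "
    "FROM events WHERE event_ts < DATE '{reference_date}' "
    "GROUP BY user_id"
)
query = query_template.format(**variables)
```

Rescoring at a later date then only requires editing the variable, not every query.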

We previously used the default random 80% (train) / 20% (test) split, but this assumes that our flow and model are time-independent, an assumption clearly violated in this case.

Note that the performance (here measured via the AUC) decreased a lot, even though we are using more data (because there is no random split, we have 1.25 times more data in our train set). The reason for this decrease is the combination of 2 factors: This issue can be solved by designing smarter features, like the count of products seen in the past “k” weeks before the reference date, or by creating ratios between such counts.
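One way to sketch such windowed count features and ratios in pandas (the events table, column names, and window lengths are all hypothetical):

```python
import pandas as pd

events = pd.DataFrame({
    "user_id": [1, 1, 1, 2],
    "ts": pd.to_datetime(["2014-07-25", "2014-07-30", "2014-06-20", "2014-07-01"]),
})
T = pd.Timestamp("2014-08-01")  # reference date

def window_counts(weeks):
    """Events per user in the last `weeks` weeks before the reference date."""
    recent = events[events.ts >= T - pd.Timedelta(weeks=weeks)]
    return recent.groupby("user_id").size()

feats = pd.DataFrame({
    "last_1w": window_counts(1),
    "last_8w": window_counts(8),
}).fillna(0)
# Ratio feature: how concentrated is the recent activity?
feats["recency_ratio"] = feats["last_1w"] / feats["last_8w"]
```

Because every feature is anchored to the reference date rather than to absolute time, the same recipe stays meaningful when it is rerun at a later date.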

We can then transform the code used to generate the train_enriched dataset this way:

This SQL recipe looks rather cumbersome, but if we look at it in more detail, we see that we generated three kinds of variables:

By adapting the test_enriched dataset in the same way (modifying table names and dates in the query), you should be able to retrain your model and obtain performance similar to this:

Model deployment (Legacy)

NOTE: We are now rolling out our next generation of API Endpoints that replaces this current feature.

Domino lets you publish R or Python models as web services and invoke them using our REST API, so you can easily integrate your data science projects into existing applications and business processes without involving engineering or DevOps resources.

Because the script is only sourced once — at publish time — any expensive initialization can happen up front, rather than happening on each call.
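The load-once pattern can be sketched as follows. This is a minimal illustration, not Domino’s actual scaffolding; the model path mirrors the results/classifier.pkl example later in this section, and the fallback branch only exists so the sketch runs standalone:

```python
import pickle

# Expensive setup runs exactly once, when the script is sourced at publish time.
try:
    with open("results/classifier.pkl", "rb") as f:
        MODEL = pickle.load(f)
except FileNotFoundError:
    MODEL = None  # placeholder so this sketch works without the file

def predict(features):
    """Bound to the endpoint; called on every request, so it stays cheap."""
    if MODEL is None:
        return "no-model"
    return MODEL.predict([features]).tolist()
```

Each incoming request then only pays for the `predict` call, not for loading the model.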

Once the request is received, Domino will take care of converting the inputs to the proper types in the language of your endpoint function.

Updating your Endpoint

You can publish a new version of the endpoint at any time.

Python

The model is first trained and stored at results/classifier.pkl. The endpoint is bound to the predict(...) function, which loads and uses results/classifier.pkl:

R

The model is trained using train.R, and stored at classifier.Rda. The endpoint is bound to the predictQuality(...) function, defined in predict.R (which loads and uses classifier.Rda):

Example API Request

In either case, once published, a request might look something like this:

Note that the v1 in the URL denotes the version of the Domino API and not your API endpoint.

The model’s output is returned in the 'result' object, while the 'release' object contains information about the version of your API endpoint and the file and function that power it.

Nested data

You can send data of varying structure to your endpoint by nesting it inside an array or JSON object which occupies a single element of the 'parameters' array.

For instance, if your endpoint function expects a single argument, it can accept a nested array or JSON object as that argument. To access the elements in Python, use standard list and dictionary access syntax.
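A sketch of what this looks like end to end. The field names and values here are made up for illustration; only the 'parameters'/'result'/'release' envelope follows the structure described above:

```python
import json

# Hypothetical request body: the endpoint's single argument is a nested
# object packed into one element of the 'parameters' array.
payload = {"parameters": [{"fixed_acidity": 7.4, "scores": [1, 2, 3]}]}
body = json.dumps(payload)

def predict(data):
    # Standard dict/list access on the nested argument.
    return data["fixed_acidity"] + data["scores"][0]

arg = json.loads(body)["parameters"][0]
result = predict(arg)

# A response carries the output under 'result', plus 'release' metadata.
response = {"result": result, "release": {"version": 1}}
```

The endpoint function never needs to parse JSON itself; it just indexes into the already-converted structure.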

Forecasting Time Series With R

We’ll show how to explore time series data, choose an appropriate modeling method and deploy the model in DSS.

Second, there is a yearly cycle with the lowest number of passengers occurring around the new year and the highest number of passengers during the late-summer.

To start a notebook, I go back to the flow, click on the international_airline_passengers_prepared data set, click on Lab, New Code Notebook, R, and then Create.

The forecast library has the functions we need for training models to predict time series.

Now that we’ve loaded our data, let’s create a time series object using the ts() function.

For us, these values are the number of international passengers, 1949 (the year for which the measurements begin) and a frequency of 12 (months in a year).

In general, it’s good practice to test several different modeling methods and choose the method that provides the best performance.

The ets() function in the forecast package fits exponential smoothing state space (ETS) models.

The forecast is shown in blue, with the grey area representing a 95% prediction interval.

Maybe this is because of a better fit to the data, but let’s train a third model before doing a model comparison.

AIC is a common method for determining how well a model fits the data while penalizing more complex models.
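The criterion itself is simple: AIC = 2k − 2·ln(L), where k is the number of fitted parameters and L the model’s maximized likelihood; the lower, the better. A small Python sketch with made-up log-likelihoods (the R fits in this tutorial report AIC directly, so this is only to show the trade-off):

```python
def aic(log_likelihood, n_params):
    """AIC = 2k - 2*ln(L): lower is better; extra parameters are penalized."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical fits: the second model fits slightly better (higher
# log-likelihood) but uses twice as many parameters.
candidates = {"ets": aic(-120.0, 4), "arima": aic(-118.0, 8)}
best = min(candidates, key=candidates.get)
```

Here the likelihood gain of the bigger model does not cover its parameter penalty, so the simpler model wins, which is exactly the behavior the criterion is designed for.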

Now we have everything we need to deploy the model onto DSS: the code to create the forecast for the next 24 months and the code to convert the result into a data.frame.

Ensure that the international_airline_passengers_prepared dataset is the input dataset, and create a new managed dataset, forecast, for the output of the recipe.

We can optimize the code in the recipe to only run the portions that will output to the forecast dataset, but for now run the recipe and then return to the Flow where we see our newly created dataset.

Build, Train and Deploy ML Models at Scale with Amazon SageMaker

Learn more at - Amazon SageMaker is a fully-managed service that enables data scientists and developers to quickly and easily build, ..

Regression Training and Testing - Practical Machine Learning Tutorial with Python p.4

Welcome to part four of the Machine Learning with Python tutorial series. In the previous tutorials, we got our initial data, we transformed and manipulated it a bit ...

Ramesh Sampath | Build Data Apps by Deploying ML Models as API Services

PyData SF 2016 Ramesh Sampath | Build Data Apps by Deploying ML Models as API Services As data scientists, we love building models using IPython ...

Shawn Scully: Production and Beyond: Deploying and Managing Machine Learning Models

PyData NYC 2015 Machine learning has become the key component in building intelligence-infused applications. However, as companies increase the number ...

Live Coding with AWS | Training and Deploying AI with Amazon SageMaker

Check out the upcoming schedule, previous recordings, and links to the resources discussed at - Build a model to predict a time series ..

Building Robust Machine Learning Models

Modern machine learning libraries make model building look deceptively easy. An unnecessary emphasis (admittedly, annoying to the speaker) on tools like R, ...

Adrin Jalali - The path between developing and serving machine learning models.

Description As a data scientist, one of the challenges after you develop and train your model, is to deploy it in production where other systems would use the ...

Training Custom Object Detector - TensorFlow Object Detection API Tutorial p.5

Welcome to part 5 of the TensorFlow Object Detection API tutorial series. In this part of the tutorial, we will train our object detection model to detect our custom ...

Building, Training and Deploying Custom Algorithms Such as with Amazon SageMaker

Learn more about AWS at - AWS London Summit 2018 - Breakout Session: A number of new open-source Deep Learning frameworks ..

Operationalize your models with Azure Machine Learning - BRK2290

This is the introduction to the new Azure ML features.