AI News, Zero-overhead scalable machine learning

By Peter Zhokhov, Senior Data Scientist

We analyze the complexity overhead and the learning curve associated with the transition from quick-and-dirty machine learning experiments to large-scale, production-grade models, using the recently released Amazon SageMaker and the open-source project Studio.ML.

With the number of machine learning experiments and models growing, data science teams in big and small companies realize the need for a unifying framework, one that lets data scientists build on top of their own models and experiments as well as leverage the efforts of other community teams and members in an efficient manner.

This concept, however, faces multiple challenges, mainly related to the use of large datasets, large amounts of computing resources, and custom hardware.

In this blog series, we’ll build several models ranging from very simple toy examples to state-of-the-art deep neural networks presented at NIPS 2017.

The story revolves around a fairly simple exercise, chosen from the SageMaker getting-started guide to ensure that my unfamiliarity with SageMaker does not affect the results.

Each K-means step consists of assigning data samples (in our case, 28×28 grayscale MNIST images flattened into 784-dimensional vectors) to the nearest cluster center, and then moving each cluster center to the average of the coordinates of the data samples in its cluster.
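To make the two-part step concrete, here is a minimal NumPy sketch of a single K-means iteration. It is not the code from the SageMaker guide or from the notebooks discussed here; the function name kmeans_step and the tiny random dataset standing in for MNIST are assumptions made purely for illustration.

```python
import numpy as np


def kmeans_step(samples, centers):
    """Run one K-means iteration (hypothetical helper, for illustration only).

    samples: array of shape (n_samples, n_features)
    centers: array of shape (n_clusters, n_features)
    """
    # Squared Euclidean distance from every sample to every center,
    # giving an array of shape (n_samples, n_clusters).
    dists = ((samples[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)

    # Assignment: each sample goes to its nearest center.
    labels = dists.argmin(axis=1)

    # Update: move each center to the mean of the samples assigned to it.
    new_centers = centers.copy()
    for k in range(centers.shape[0]):
        members = samples[labels == k]
        if len(members) > 0:  # an empty cluster keeps its old center
            new_centers[k] = members.mean(axis=0)
    return labels, new_centers


# Toy stand-in for flattened MNIST: six 784-dimensional samples
# drawn around two well-separated points.
rng = np.random.default_rng(0)
samples = np.vstack([
    rng.normal(0.0, 0.01, size=(3, 784)),
    rng.normal(1.0, 0.01, size=(3, 784)),
])
centers = samples[[0, 3]].copy()  # one initial center per group
labels, centers = kmeans_step(samples, centers)
```

Repeating kmeans_step until the labels stop changing gives the full algorithm. On the real dataset the pairwise-distance array gets large, so a library implementation such as the sklearn one is the practical choice.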

The first cell deals with imports and downloading the MNIST data, while the second cell converts and uploads the data to an S3 bucket (the SageMaker training routine assumes the data resides in S3).

Studio.ML gives you all of the above (serving, custom hardware in the cloud, hyperparameter optimization, model provenance) without leaving the comfort zone of a Jupyter notebook or Python command line running locally, or wherever you prefer.

K-Means with Studio.ML

Let us install the studioml package via pip install studioml. Then we use the same Jupyter notebook as in the last exercise (K-means with sklearn) and add a single line to the imports section that imports the cell magics from studio.

To start an experiment with studio, we simply add the cell magic %%studio_run to the notebook cell. (Technically, %%studio_run returns all the variables created in the cell, so I also erase the train_set and test_set variables to prevent them from being returned.)

The link at the beginning of the experiment output sends us to a central repository of experiments, to the experiment page that shows the experiment's progress, its artifacts, and the list of Python packages necessary to reproduce it.

The code in the cell runs in 6 minutes, slightly longer than plain sklearn, mainly because of compressing the validation data, storing it in the cloud, and returning it.

The cool part is that once the training is complete, the rest of the notebook code is exactly the same as it used to be, including prediction and displaying cluster centers.

We then run the following command:

studio serve 1513115524_jupyter_7afb38f0-9918-48b8-9921-0d29f44f421d

This command serves the model locally, so the predictions can be generated directly from our notebook.

In contrast, Studio.ML can seamlessly extend Jupyter notebooks to provide experiment provenance, training and/or inference on custom cloud compute (including spot instances), and serving.

Fader Networks require a lot of computational resources to train. That makes them a tempting yet hard-to-chew fruit for data scientists outside large corporations such as Google or Facebook, and a good demo of the full power of machine learning provenance frameworks such as SageMaker or Studio.ML.


Amazon SageMaker is a fully managed service that enables developers and data scientists to quickly and easily build, train, and deploy machine learning models at any scale.

During a workshop, you'll explore various data sets, create model training jobs using SageMaker's hosted training feature, and create endpoints to serve predictions from your models using SageMaker's hosted endpoint feature.

To complete this workshop you'll need an AWS account and an AWS IAM user in that account with at least full permissions to the AWS services that the workshop uses.

Use Your Own Account: The code and instructions in this workshop assume only one student is using a given AWS account at a time.

If an account is shared, some resources will fail to create because of naming conflicts. You can work around this by appending a unique suffix to those resources, but the instructions do not provide details on the changes required to make this work.
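As a rough illustration of the suffix workaround, the following Python sketch appends a short random suffix to a base resource name. The helper with_unique_suffix and the base name sagemaker-workshop are hypothetical; the workshop itself does not define them, and each resource type has its own naming rules that this sketch does not check.

```python
import uuid


def with_unique_suffix(base_name, length=8):
    """Return base_name plus a random hex suffix (hypothetical helper).

    The suffix makes it unlikely that two students sharing an account
    end up requesting the same resource name.
    """
    suffix = uuid.uuid4().hex[:length]
    return f"{base_name}-{suffix}"


# "sagemaker-workshop" is an assumed base name, used only for illustration.
bucket_name = with_unique_suffix("sagemaker-workshop")
print(bucket_name)
```

The same suffixed name then has to be used everywhere the instructions refer to that resource, which is the part the workshop leaves to the reader.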

Use a personal account or create a new AWS account for this workshop rather than using an organization’s account to ensure you have full access to the necessary services and to ensure you do not leave behind any resources from the workshop.

Do NOT attempt to use a locally installed AWS CLI during a live workshop because there is insufficient time during a live workshop to resolve related issues.


This repository contains example notebooks that show how to apply machine learning and deep learning in Amazon SageMaker. These examples provide a gentle introduction to machine learning concepts as they are applied in practical use cases across a variety of sectors.

These examples introduce SageMaker's hyperparameter tuning functionality which helps deliver the best possible predictions by running a large number of training jobs to determine which hyperparameter values are the most impactful.
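The idea of running many training jobs and comparing their results can be sketched in plain Python. The sketch below is an exhaustive grid search over a made-up scoring function, not SageMaker's tuning service or its API. The function train_and_score, the hyperparameter names, and the candidate values are all assumptions chosen for illustration.

```python
import itertools


def train_and_score(learning_rate, num_clusters):
    """Hypothetical stand-in for one training job.

    Returns a validation score where higher is better. A real job would
    train a model with these hyperparameters and evaluate it.
    """
    return -(learning_rate - 0.1) ** 2 - 0.01 * (num_clusters - 10) ** 2


# Assumed search space: every combination becomes one "training job".
search_space = {
    "learning_rate": [0.01, 0.1, 1.0],
    "num_clusters": [5, 10, 20],
}

results = []
for values in itertools.product(*search_space.values()):
    params = dict(zip(search_space.keys(), values))
    results.append((train_and_score(**params), params))

# Keep the hyperparameters of the best-scoring job.
best_score, best_params = max(results, key=lambda r: r[0])
print(best_params)
```

A managed tuning service typically chooses the next candidates more cleverly than a full grid and runs the jobs in parallel, but the bookkeeping is the same: record the score of each job and keep the best configuration.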

They cover a broad range of topics and will utilize a variety of methods, but aim to provide the user with sufficient insight or inspiration to develop within Amazon SageMaker.
