Automated Machine Learning — A Paradigm Shift That Accelerates Data Scientist Productivity @ Airbnb

These repetitive tasks include, but are not limited to:

Exploratory Data Analysis: Visualizing data before embarking on a modeling exercise is a crucial step in machine learning.

Automating tasks such as plotting every variable against the target variable being predicted, and computing summary statistics, can save a lot of time.
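As a rough illustration of this kind of automation, the sketch below computes summary statistics for every feature and its linear correlation with the target in one pass. It uses only the Python standard library. The column names and values are hypothetical, and the correlation is a numeric stand-in for the plots described above, which would normally come from a plotting library.

```python
import math
import statistics

def summarize(values):
    """Basic summary statistics for one numeric column."""
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "median": statistics.median(values),
        "max": max(values),
    }

def pearson(xs, ys):
    """Pearson correlation; assumes neither column is constant."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def eda_report(columns, target):
    """Summarize every feature and its linear correlation with the target."""
    y = columns[target]
    return {
        name: {**summarize(values), "corr_with_target": pearson(values, y)}
        for name, values in columns.items()
        if name != target
    }

# Hypothetical toy dataset; the names and numbers are made up for illustration.
columns = {
    "nights_booked": [1, 2, 3, 4, 5],
    "party_size": [2, 1, 4, 2, 3],
    "spend": [100, 210, 290, 405, 500],
}

report = eda_report(columns, target="spend")
for name, stats in report.items():
    print(name, stats)
```

Looping over every column, rather than inspecting variables one at a time, is what saves the time: the same report function works unchanged as columns are added.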

There is a growing community around creating tools that automate the tasks outlined above, as well as other tasks that are part of the machine learning workflow.

There is no universally agreed-upon scope of AML. However, the folks who routinely organize the AML workshop at the annual ICML conference define a reasonable scope on their website, which includes automating all of the repetitive tasks described above.

Our view is that it is difficult to perform wholesale replacement of a data scientist with an AML framework, because most machine learning problems require domain knowledge and human judgement to set up correctly.

Also, we have found AML tools to be most useful for regression and classification problems involving tabular datasets, although this area is advancing quickly.

We have experimented with the following tools:

At Airbnb, we use machine learning to build customer lifetime value (LTV) models for guests and hosts.

The reasons for our biases were the following:

Being aware of our bias, we fed our raw training data through an AML platform to perform a sanity check and to benchmark our model’s error.

It turned out that the AML platform tested a plethora of alternative feature engineering steps and performed more rigorous hyperparameter tuning, neither of which we had time to explore manually.
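To make the idea concrete, here is a minimal sketch of benchmarking a hand-built model against an automated search over feature transforms and a hyperparameter. It is not the platform we used. The model (a one-feature ridge regression without an intercept), the candidate transforms, the synthetic data, and the search budget are all assumptions chosen to keep the example small and standard-library only.

```python
import math
import random

# Hypothetical candidate feature transforms; a real AML tool would try many more.
TRANSFORMS = {
    "identity": lambda x: x,
    "sqrt": math.sqrt,
    "log1p": math.log1p,
}

def fit_ridge_1d(xs, ys, penalty):
    """Closed-form slope for y ~ w * x with an L2 penalty and no intercept."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + penalty)

def rmse(xs, ys, w):
    """Root mean squared error of the predictions w * x."""
    return math.sqrt(sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs))

def evaluate(config, train, valid):
    """Fit on the training split and return validation RMSE for one config."""
    name, penalty = config
    f = TRANSFORMS[name]
    (tx, ty), (vx, vy) = train, valid
    w = fit_ridge_1d([f(x) for x in tx], ty, penalty)
    return rmse([f(x) for x in vx], vy, w)

def search(baseline, train, valid, n_trials=50, seed=0):
    """Random search that starts from the manual baseline configuration."""
    rng = random.Random(seed)
    best_config, best_err = baseline, evaluate(baseline, train, valid)
    for _ in range(n_trials):
        # Sample a transform and a log-uniform penalty between 1e-3 and 1e3.
        config = (rng.choice(sorted(TRANSFORMS)), 10 ** rng.uniform(-3, 3))
        err = evaluate(config, train, valid)
        if err < best_err:
            best_config, best_err = config, err
    return best_config, best_err

# Synthetic data in which the target depends on sqrt(x); purely illustrative.
data_rng = random.Random(42)
xs = [data_rng.uniform(0.0, 100.0) for _ in range(200)]
ys = [5.0 * math.sqrt(x) + data_rng.gauss(0.0, 1.0) for x in xs]
train, valid = (xs[:150], ys[:150]), (xs[150:], ys[150:])

baseline = ("identity", 1.0)  # stands in for the manually built model
baseline_err = evaluate(baseline, train, valid)
best_config, best_err = search(baseline, train, valid)
print(f"baseline RMSE={baseline_err:.3f}")
print(f"best transform={best_config[0]}, RMSE={best_err:.3f}")
```

Because the search starts from the manual baseline, its result can never be worse than that baseline on the validation split, which is what makes it usable as a sanity check. In practice the final comparison should be made on a separate held-out set, since the search itself is tuned against the validation data.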

AML is a powerful set of techniques for exploring data faster and for improving model accuracy through model tuning and better diagnostics.

