AI News, A Taxonomy of Data Science


The goal of machine learning for digit recognition is not to build a theory of ‘3.’ In the natural sciences, however, the ability to predict complex phenomena is different from what most mean by ‘understanding’ or ‘interpreting.’ The predictive power of a model lies in its ability to generalize in the quantitative sense: to make accurate quantitative predictions of data in new experiments.

The interpretability of a model lies in its ability to generalize in the qualitative sense: to suggest to the modeler which would be the most interesting experiments to perform next.

Startups building products without the perspective of multi-year research cycles are often exploring the data and constructing systems on that data at the same time.

With further exploration, we realized that people were using links on images embedded in a page to monitor their own real-time metrics.

The role of model interpretability in data science

In data science, models can involve abstract features in high-dimensional spaces, or they can be more concrete, lower-dimensional, and more readily understood by humans.

Last week, I was working on a typical cold-start predictive modeling problem for e-commerce: how do you predict initial sales for new products if you’ve never sold them before?

One can also find similar products that have been sold before, make the naïve assumption that the market will respond to the new product as it did to them, and create predictors based on their historical sales.
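As a rough illustration of that similar-products heuristic (the data, attributes, and distance metric here are invented, not from the original analysis), it can be sketched as a nearest-neighbor average over historical sales:

```python
# Hypothetical sketch: predict a new product's initial sales from the
# historical sales of its nearest neighbors in attribute space.
import numpy as np

attrs = np.array([[1.0, 0.2], [0.9, 0.3], [5.0, 4.0]])  # previously sold products
sales = np.array([120.0, 100.0, 10.0])                   # their early sales

new_attr = np.array([1.1, 0.25])                         # the unseen product
dist = np.linalg.norm(attrs - new_attr, axis=1)          # similarity by distance
nearest = np.argsort(dist)[:2]                           # two most similar products
prediction = sales[nearest].mean()                       # naive market-response guess
print(prediction)                                        # 110.0
```

The naïveté is explicit: the estimate is only as good as the assumption that similar attributes imply similar demand.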

I created a model that contained both readily understood temporal features and the more abstract third principal component, “pc03,” which turned out to be a good predictor. (If you don’t understand what that means, don’t worry; that is the whole point of this story.)
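For readers wondering what a feature like “pc03” even is, here is a minimal sketch, assuming scikit-learn and purely synthetic product attributes (nothing here comes from the original model): principal component analysis projects the attributes onto new axes, and “pc03” would be each product’s score on the third axis.

```python
# Illustrative only: deriving a principal-component feature like "pc03".
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))        # 200 products, 10 numeric attributes

pca = PCA(n_components=5)
components = pca.fit_transform(X)     # each row: one product's scores on the PCs

pc03 = components[:, 2]               # third principal component (0-indexed)
print(pc03.shape)                     # one abstract feature value per product
```

The resulting column is a perfectly usable predictor, but it is a weighted blend of all ten raw attributes, which is exactly why it resists a plain-language explanation.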

I was creating it for the benefit of the business: in particular, a team whose job is to estimate demand and place purchase orders for initial inventory and ongoing replenishment.

This detailed example was meant to highlight a few reasons why a less accurate but more interpretable model might be favored. My team often makes this tradeoff, choosing an interpretable model when working in close partnership with a business team, where we care more about understanding the system or the customer than about pure predictive power.

We may use classification trees because we can sit with the business owner and interpret the model together and, importantly, discuss the actionable insights that tie naturally to the decision points, the splits, in those trees.
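A minimal sketch of that workflow, using scikit-learn with its built-in iris data as a stand-in for business data: a shallow tree can be dumped as plain-text rules and walked through with a business owner split by split.

```python
# Fit a deliberately shallow classification tree and print its splits
# so they read as plain decision rules.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(data.data, data.target)

# Each line is a split threshold -- a concrete, discussable decision point.
rules = export_text(tree, feature_names=list(data.feature_names))
print(rules)
```

Capping the depth is the interpretability tradeoff in miniature: a deeper tree would usually score better but would no longer fit on one whiteboard.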


In the context of machine learning and data science, interpretability refers to how easy it is for humans to understand the processes an algorithm uses to arrive at its outcomes.

DataRobot includes several components that produce highly human-interpretable models, minimizing model risk and making it easier for any enterprise to comply with regulations and best practices.

Interpretability is crucial for trusting AI and machine learning

As machine learning takes its place in many recent advances in science and technology, the interpretability of machine learning models grows in importance.

We are surrounded with applications powered by machine learning, and we’re personally affected by the decisions made by machines more and more every day.

From the mundane to the lifesaving, we ask machine learning models for answers to everyday questions, and these and many other questions are answered by predictive models that most users know very little about.

Data scientists often put emphasis on the prediction accuracy of their models – not on understanding how those predictions are actually made.

However, with the recent advances in machine learning and artificial intelligence, models have become very complex, including deep neural networks and ensembles of different models.

Unfortunately, the complexity that gives extraordinary predictive abilities to black box models also makes them very difficult to understand and trust.

Sometimes there are thousands (even millions) of model parameters, there’s no one-to-one relationship between input features and parameters, and often combinations of multiple models using many parameters affect the prediction.

In machine learning, accuracy is measured by comparing the output of a machine learning model to the known actual values from the input data set.
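That comparison reduces to a few lines of code; the toy predictions and labels below are illustrative only:

```python
# Accuracy as described: the fraction of model outputs that match the
# known actual values.
def accuracy(y_pred, y_true):
    correct = sum(p == t for p, t in zip(y_pred, y_true))
    return correct / len(y_true)

print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```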

Even if the training data is sufficiently representative initially, the data in the production environment is not stationary, so the model can become outdated very quickly.
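One way to catch this drift (an assumption on my part; the excerpt names no method) is a two-sample test comparing a feature's training-time distribution against what the model sees in production. The synthetic data below deliberately shifts the mean:

```python
# Illustrative drift check: compare a feature's training distribution
# against its live distribution with a two-sample Kolmogorov-Smirnov test.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)
train_feature = rng.normal(loc=0.0, size=5000)
live_feature = rng.normal(loc=0.5, size=5000)   # distribution has shifted

stat, p_value = ks_2samp(train_feature, live_feature)
if p_value < 0.01:
    print("feature drift detected; model may be outdated")
```

In practice such checks run on a schedule, and a rejection triggers retraining or at least a human look at the model.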

We need to demystify the black box machine learning models and improve transparency and interpretability to make them more trustworthy and reliable.
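One common demystification technique, offered here as an illustration rather than as the excerpt's own method, is permutation importance: shuffle one feature at a time and measure how much the model's score degrades, treating the model itself as a black box.

```python
# Probe a black-box model with permutation importance.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = make_classification(n_samples=500, n_features=5, n_informative=2,
                           random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

# Shuffling an important feature should hurt the score; shuffling a
# noise feature should not.
result = permutation_importance(model, X, y, n_repeats=5, random_state=0)
for i, imp in enumerate(result.importances_mean):
    print(f"feature {i}: importance {imp:.3f}")
```

Because it only needs predictions, this works for ensembles and deep networks alike, which is what makes it useful for exactly the opaque models the paragraph above describes.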

In a typical machine learning pipeline, we have control over the data set used to train the model, we have control over the model that we use, and we have control over how we assess and deploy those models.

An Engagement-Based Customer Lifetime Value System for E-commerce

KDD2016 paper 755. Authors: Ali Vanderveld*, Angela Han, Addhyan Pandey, Groupon, Inc. Abstract: A comprehensive understanding of individual customer value is crucial to any successful customer relationship ...

15. Factor Modeling

MIT 18.S096 Topics in Mathematics with Applications in Finance, Fall 2013 View the complete course: Instructor: Peter ..

Lecture 13 | Generative Models

In Lecture 13 we move beyond supervised learning, and discuss generative modeling as a form of unsupervised learning. We cover the autoregressive ...



Lecture 15: Coreference Resolution

Lecture 15 covers what is coreference via a working example. Also includes research highlight "Summarizing Source Code", an introduction to coreference ...

Improving Machine Learning Beyond the Algorithm

User interaction data is at the heart of interactive machine learning systems (IMLSs), such as voice-activated digital assistants, e-commerce destinations, news ...

High-Accuracy Neural-Network Models for Speech Enhancement

In this talk we will discuss our recent work on AI techniques that improve the quality of audio signals for both machine understanding and sensory perception.

Panel Discussion: Applications of Machine Learning

Dr. Yoshua Bengio, Professor, Department of Computer Science & Operations Research, Université de Montréal Dr. Fei-Fei Li, Associate Professor, Computer ...

Lecture 10 | Recurrent Neural Networks

In Lecture 10 we discuss the use of recurrent neural networks for modeling sequence data. We show how recurrent neural networks can be used for language ...