What is Minimum Viable (Data) Product?
A couple of months ago I left Pivotal to join idealo.de (the leading price comparison website in Europe and one of the largest portals in the German e-commerce market) to help them integrate machine learning (ML) into their products.
Besides the usual tasks like building out the data science team, setting up the infrastructure and plenty of other administrative work, I had to define the ML-powered product roadmap.
Its central idea is that building products or services iteratively, while constantly integrating customer feedback, reduces the risk that the product or service will fail (build-measure-learn).
An integral part of the build-measure-learn concept is the MVP, which is essentially a “version of a new product which allows a team to collect the maximum amount of validated learning about customers with the least effort”.
What they did was take pictures of their apartment and put them online on a simple website (see image below), and soon they had three paying guests for the duration of the conference.
This little test gave them the valuable insight that people would be willing to pay to stay in someone else’s home rather than a hotel, and that not just recent college grads would sign up.
They raised $120m from investors to create a well-designed juice-squeezing machine, which they released after some time of development at a very high price (it was originally priced at $699 and subsequently reduced to $399).
As some of you may know, the company has already shut down because it didn’t realize that you don’t really need a high-priced juice-squeezing machine to squeeze out the juice packs.
The MVP concept can also be applied to machine learning because at the end of the day, machine learning is also part of the overall product or the end product itself.
However, for most of the classification problems out there, unless they are specialized problems like the ones that we face in computer vision or natural language understanding, this is not the best way to solve this kind of problem.
Moreover, you often spend a huge amount of time tuning the parameters of a neural network while the gain in model performance is negligible.
Although in many real-world applications the assumption of linearity is unrealistic, logistic regression does fairly well and serves as a good benchmark, a.k.a. baseline model.
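To make the idea of a baseline model concrete, here is a minimal sketch of logistic regression trained by plain gradient descent. It is written in dependency-free Python for illustration only; the toy data, the function names and the hyperparameters are assumptions, and in a real project you would most likely reach for an existing library implementation instead.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


def fit_logistic_regression(X, y, learning_rate=0.1, epochs=5000):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    n_samples, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for row, target in zip(X, y):
            score = sum(w * v for w, v in zip(weights, row)) + bias
            error = sigmoid(score) - target  # predicted probability minus label
            for j, value in enumerate(row):
                grad_w[j] += error * value
            grad_b += error
        weights = [w - learning_rate * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


def predict(X, weights, bias, threshold=0.5):
    """Return hard 0/1 predictions for every row in X."""
    return [
        1 if sigmoid(sum(w * v for w, v in zip(weights, row)) + bias) >= threshold else 0
        for row in X
    ]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical, linearly separable toy data (two features, binary label).
X = [[0.5, 1.0], [1.0, 1.5], [1.5, 0.5], [3.0, 3.5], [3.5, 3.0], [4.0, 4.5]]
y = [0, 0, 0, 1, 1, 1]

weights, bias = fit_logistic_regression(X, y)
baseline_accuracy = accuracy(y, predict(X, weights, bias))
# Trivial reference point: always predict the most frequent class.
majority_accuracy = max(sum(y), len(y) - sum(y)) / len(y)

print("logistic regression accuracy:", baseline_accuracy)
print("majority-class accuracy:", majority_accuracy)
```

Any more sophisticated model then has to beat this number to justify its extra complexity and tuning effort.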
Essentially, there are many examples of data products, such as chatbots, spam detectors and many others; the list is long (check out Neal Lathia’s excellent article for more ML product examples).
Then, after verifying that users are really inclined to click on those recommendations (sometimes they have to get used to them first, especially if it’s a new product feature) and eventually might also buy the recommended items, we could try more sophisticated approaches such as collaborative filtering techniques.
For example, we could create recommendations of the form “users who bought this item are also interested in those items” or “users who viewed this item are also interested in those items”.
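A “users who bought this item also bought those items” recommendation can be sketched with simple co-occurrence counting. This is only one possible, deliberately naive way to do it; the basket data and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_co_occurrence(baskets):
    """Count, for every item, how often each other item appears in the same basket."""
    co_counts = defaultdict(Counter)
    for basket in baskets:
        # set() ensures an item bought twice in one basket is counted once.
        for item, other in permutations(set(basket), 2):
            co_counts[item][other] += 1
    return co_counts


def recommend(co_counts, item, top_n=3):
    """Return the items most often bought together with `item`."""
    # Sort by count (descending), then by name so that ties are deterministic.
    ranked = sorted(co_counts.get(item, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:top_n]]


# Hypothetical purchase history: one list of items per order.
purchases = [
    ["laptop", "mouse", "laptop bag"],
    ["laptop", "mouse"],
    ["laptop", "usb hub"],
    ["mouse", "mouse pad"],
]

co_counts = build_co_occurrence(purchases)
print(recommend(co_counts, "laptop"))  # ['mouse', 'laptop bag', 'usb hub']
```

Swapping purchases for page views gives the “users who viewed this item” variant without changing the code.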
Moreover, to give a clearer understanding of what I mean by MVP for machine learning products, I discussed three major dimensions that I find critical for a good MVP data product. Hopefully, in your next machine learning project you will keep those three dimensions in mind as well.
#3: Developing a Machine Learning Model from Start to Finish
At a high level, building a good ML model is like building any other product: You start with ideation, where you align on the problem you’re trying to solve and some potential approaches.
Since data is an integral part of ML, we need to layer data on top of this product development process, so our new process looks as follows: The goal of this phase is to align as a team on the key problem the model solves, the objective function and the potential inputs to the model.
“Apple”), you need to separate the brand chatter from the general chatter (about the fruit) and run it through a sentiment analysis model, and all that before you can begin to build your prototype.
Say your product is a movie recommendation tool: You may want to only open access to a handful of users but provide a complete experience for each user, in which case your model needs to rank every movie in your database by relevance to each of the users.
You may get unsatisfactory results from your data sourcing efforts and have to rethink the approach, or productize the model and see that it works so poorly with production data that you need to go back to prototyping etc.
For example, labeling hundreds or thousands of data points with the right categories as input for a classification algorithm and then testing whether the output of the classification model is correct.
Supervised learning algorithms are trained using labeled examples, such as an input where the desired output is known.
The learning algorithm receives a set of inputs along with the corresponding correct outputs, and the algorithm learns by comparing its actual output with correct outputs to find errors.
Through methods like classification, regression, prediction and gradient boosting, supervised learning uses patterns to predict the values of the label on additional unlabeled data.
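The loop described above (produce an output, compare it with the correct output, adjust on errors, then predict labels for unlabeled data) can be illustrated with a perceptron, one of the simplest supervised learners. This is a sketch on made-up data, not a recommendation of the perceptron for real problems; all names are illustrative.

```python
def predict_one(weights, bias, features):
    """Return 1 if the weighted sum is positive, else 0."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0


def train_perceptron(examples, labels, epochs=20, learning_rate=1.0):
    """Learn weights by correcting every prediction that disagrees with its label."""
    weights, bias = [0.0] * len(examples[0]), 0.0
    for _ in range(epochs):
        mistakes = 0
        for features, target in zip(examples, labels):
            error = target - predict_one(weights, bias, features)  # 0 if correct
            if error != 0:
                mistakes += 1
                weights = [w + learning_rate * error * x for w, x in zip(weights, features)]
                bias += learning_rate * error
        if mistakes == 0:  # every training example is classified correctly
            break
    return weights, bias


# Hypothetical labeled examples: inputs with known desired outputs.
examples = [[1, 1], [2, 1], [1, 2], [4, 4], [5, 4], [4, 5]]
labels = [0, 0, 0, 1, 1, 1]

weights, bias = train_perceptron(examples, labels)

# Predict labels for additional, unlabeled data.
unlabeled = [[0, 1], [5, 5]]
print([predict_one(weights, bias, row) for row in unlabeled])  # [0, 1]
```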
Popular techniques include self-organizing maps, nearest-neighbor mapping, k-means clustering and singular value decomposition.
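Of the techniques listed, k-means clustering is the easiest to sketch in a few lines. The version below is a minimal illustration that assumes 2-D points and a naive initialization (the first k points serve as starting centroids); production implementations use smarter initialization and handle many more edge cases.

```python
import math


def k_means(points, k, max_iter=100):
    """Cluster points into k groups; returns (labels, centroids)."""
    # Assumption for this sketch: the first k points are the initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # assignments are stable, so we have converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


# Hypothetical unlabeled data with two visible groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
labels, centroids = k_means(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

Note that no labels are passed in: the grouping is derived from the data alone.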
How to Tell If Machine Learning Can Solve Your Business Problem
“AI,” “big data,” and “machine learning” are all trending buzzwords, and you might be curious about how they apply to your domain.
These are the kinds of tasks where you have a clear, predefined sequence of steps that is currently being executed by a human, but that could conceivably be transitioned to a machine.
Screening incoming data from an outside data provider for well-defined potential errors is an example of a problem ready for automation.
(For example, hedge funds automatically filtering out bad data in the form of a negative value for trading volume, which can’t be negative.) On the other hand, encoding human language into a structured dataset is something that is just a tad too ambitious for a straightforward set of rules.
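The trading-volume check is a good illustration of a problem that needs a rule rather than a model. A minimal sketch, assuming each incoming record is a dictionary with a hypothetical "volume" field:

```python
def screen_records(records):
    """Split records into (clean, rejected) using a fixed, well-defined rule."""
    clean, rejected = [], []
    for record in records:
        volume = record.get("volume")
        is_number = isinstance(volume, (int, float)) and not isinstance(volume, bool)
        # Rule: trading volume must be present, numeric and non-negative.
        if is_number and volume >= 0:
            clean.append(record)
        else:
            rejected.append(record)
    return clean, rejected


# Hypothetical feed from an outside data provider.
feed = [
    {"ticker": "AAA", "volume": 1200},
    {"ticker": "BBB", "volume": -50},   # impossible value
    {"ticker": "CCC", "volume": None},  # missing value
    {"ticker": "DDD", "volume": 0},
]

clean, rejected = screen_records(feed)
print([r["ticker"] for r in clean])     # ['AAA', 'DDD']
print([r["ticker"] for r in rejected])  # ['BBB', 'CCC']
```

No comparably short set of rules exists for turning free-form human language into a structured dataset, which is why that task falls outside plain automation.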
For example, researchers at the University of Pittsburgh in the late 1990s evaluated machine learning algorithms for predicting mortality rates from pneumonia.
The algorithms recommended that hospitals send home pneumonia patients who were also asthma sufferers, estimating their risk of death from pneumonia to be lower.
It turned out that the dataset fed into the algorithms did not account for the fact that asthma sufferers had been immediately sent to intensive care, and had fared better only due to the additional attention.
If, in the future, the thing you’re trying to predict changes unexpectedly – and no longer matches prior patterns in the data – the algorithm will not know what to make of it.
Examples of good machine learning problems include predicting the likelihood that a certain type of user will click on a certain kind of ad, or evaluating the extent to which a piece of text is similar to previous texts you have seen.
Bad examples include predicting profits from the introduction of a completely new and revolutionary product line, or extrapolating next year’s sales from past data, when an important new competitor just entered the market.
Getting Started with Apache Spark™ on Databricks
In this section, we will fit two linear regression models using different regularization parameters and compare their efficacy.
For example, the code below takes the first model (modelA) and shows you both the label (original sales price) and prediction (predicted sales price) based on the features (population).
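As a stand-in for the notebook code, which is not reproduced here, the following sketch shows the same idea in plain Python rather than Spark: two single-feature linear regressions that differ only in their regularization parameter, followed by a label-versus-prediction listing for the first model. It assumes an L2 (ridge) penalty on the slope, uses made-up population and sales price values, and all names are illustrative. Spark's own objective is scaled differently, so the parameter values are not comparable with Spark's.

```python
import math


def fit_ridge_line(x, y, reg_param):
    """Fit y = intercept + slope * x, minimizing squared error + reg_param * slope**2."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / (sxx + reg_param)  # the penalty shrinks the slope toward zero
    intercept = y_mean - slope * x_mean  # the intercept is left unpenalized
    return slope, intercept


def predict(model, x):
    slope, intercept = model
    return [intercept + slope * xi for xi in x]


def rmse(labels, predictions):
    return math.sqrt(sum((l - p) ** 2 for l, p in zip(labels, predictions)) / len(labels))


# Made-up data: feature (population) and label (sales price).
population = [1.0, 2.0, 3.0, 4.0, 5.0]
sales_price = [2.0, 4.1, 5.9, 8.2, 9.8]

model_a = fit_ridge_line(population, sales_price, reg_param=0.0)   # no regularization
model_b = fit_ridge_line(population, sales_price, reg_param=10.0)  # strong regularization

# Show label next to prediction for the first model.
for feature, label, prediction in zip(population, sales_price, predict(model_a, population)):
    print(f"population={feature} label={label} prediction={prediction:.2f}")

rmse_a = rmse(sales_price, predict(model_a, population))
rmse_b = rmse(sales_price, predict(model_b, population))
print(f"RMSE model A: {rmse_a:.3f}, RMSE model B: {rmse_b:.3f}")
```

On the training data the unregularized model always has the lower error; the regularized one only pays off when it generalizes better to unseen data.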
As Databricks supports Python pandas and ggplot, the code below uses a pandas DataFrame (pydf) and ggplot to draw a scatterplot of the data together with the two regression models.
- On Thursday, September 19, 2019
What is machine learning and how to learn it?
Machine learning is just to give trained data to a program and get better result for complex problems. It is very close to data ..
How to Start an AI Startup
How are you supposed to get in on the AI hype? Deep learning has enabled a whole new breed of applications, and there are still so many different ...
Elena Grewal - Before the Model: How Machine Learning Products Start... - MLconf SF 2016
Presentation slides: Before the Model: How Machine ..
How to formulate and solve Machine Learning problems: Sandeep Khurana, Head Analytics, Call Health
UpX Academy is an exclusive ed-tech venture under the umbrella of Tech Mahindra. We provide live, online and interactive courses on big data and Data ...
Learn Machine Learning in 3 Months (with curriculum)
How is a total beginner supposed to get started learning machine learning? I'm going to describe a 3 month curriculum to help you go from beginner to ...
Data Science: Reality vs Expectations ($100k+ Starting Salary 2018)
Skillshare might not like this. You can sign up for a 2 month trial for Skillshare, complete the data science course and then cancel your membership before being ...
9 Cool Deep Learning Applications | Two Minute Papers #35
Machine learning provides us an incredible set of tools. If you have a difficult problem at hand, you don't need to hand craft an algorithm for it. It finds out by itself ...
Where to Start with Machine Learning
Kirk Borne, PhD Principal Data Scientist with Booz Allen Hamilton, shares his approach of "think big, but start small" when ..
How to Make a Data Science Project with Kaggle
It can take a lot of tools to do data science, but Kaggle is a one-stop shop that provides all the tools to share and collaborate on data science projects.
How To Get Started With Machine Learning? | Two Minute Papers #51
I get a lot of messages from you Fellow Scholars that you would like to get started in machine learning and are looking for materials. Below you find a ton of ...