AI News, Visualizing Machine Learning Thresholds to Make Better Business Decisions
- On Monday, June 4, 2018
Visualizing Machine Learning Thresholds to Make Better Business Decisions
As data scientists, when we build a machine learning model our ultimate goal is to create value: We want to leverage our model’s predictions to do something better than we were doing it before, when we didn’t have a model or when our model was more primitive.
There are three factors to consider when choosing a threshold point: To understand all these factors at once, I like to draw a chart that shows queue rate, precision, and recall as a function of classifier threshold.
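The three curves can be computed directly from scored cases and their true labels. The sketch below is a minimal illustration, not the article's own code: the helper name `threshold_metrics` and the toy scores and labels are assumptions made up for this example, and a case counts as queued when its score is strictly above the threshold.

```python
def threshold_metrics(scores, labels, threshold):
    """Return (queue_rate, precision, recall) for one threshold.

    A case is "queued" (reviewed / treated) when its score is strictly
    above the threshold. Labels are 1 for positive cases, 0 otherwise.
    """
    queued = [s > threshold for s in scores]
    n_queued = sum(queued)
    true_pos = sum(1 for q, y in zip(queued, labels) if q and y == 1)
    n_pos = sum(labels)

    queue_rate = n_queued / len(scores)
    precision = true_pos / n_queued if n_queued else float("nan")
    recall = true_pos / n_pos if n_pos else float("nan")
    return queue_rate, precision, recall


# Hypothetical toy data, for illustration only.
scores = [0.05, 0.10, 0.30, 0.45, 0.55, 0.70, 0.85, 0.95]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

print("threshold  queue_rate  precision  recall")
for t in (0.2, 0.4, 0.6, 0.8):
    q, p, r = threshold_metrics(scores, labels, t)
    print(f"{t:9.1f}  {q:10.3f}  {p:9.3f}  {r:6.3f}")
```

Plotting the three columns against the threshold gives the kind of chart described above: raising the threshold shrinks the queue and usually raises precision, at the expense of recall.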
The answer depends on company priorities and the value of churn prevention, but some ideas are: send a $10 off coupon to high-risk users, offer high-risk users a discount if they sign a new 12-month contract now, offer to subsidize a new phone purchase to such users if they remain in contract for another 12 months, etc.) Let’s build a classifier to solve this task.
For example, if we choose a threshold of 0.4 (all cases with a score above 0.4 get reviewed / sent an offer / whatever treatment we have at hand), then: Of course, 0.4 might not be the best threshold value for this particular company.
If, for example, you only have the capacity to act on 500 cases (5% of the hypothetical 10k scored cases) per period, then you’re bound to choose a threshold around 0.8, in which case your precision jumps to 100% but your recall drops to ~35%.
As a simple example, let’s say that your company’s churn prevention strategy is to make direct phone calls to each churn-risky user (defined as users whose churn score is above the threshold you set).
Right now, your total cost is around $20 x 1000 = $20,000 and, since your precision is 96%, you’re making about 0.96 x $100 x 1000 = $96,000 in saved account “revenue”, making the total profit of your operation about $76,000.
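The arithmetic above can be written as a small function so that the same calculation is easy to repeat for other queue sizes and precisions. This is a sketch under the assumptions implied by the figures in the text: a flat $20 cost per contacted case, $100 of value per correctly targeted account, and every correctly targeted account counted as saved. The function name `expected_profit` is made up for this example.

```python
def expected_profit(n_queued, precision, cost_per_case, value_per_save):
    """Profit of acting on n_queued cases at a given precision.

    Assumes every queued case costs the same to act on and every true
    positive in the queue yields value_per_save.
    """
    total_cost = cost_per_case * n_queued
    total_value = precision * value_per_save * n_queued
    return total_value - total_cost


# Figures from the example: 1000 calls at $20 each, 96% precision,
# $100 per saved account.
profit = expected_profit(n_queued=1000, precision=0.96,
                         cost_per_case=20, value_per_save=100)
print(f"Estimated profit: ${profit:,.0f}")
```

Evaluating the same function at the queue size and precision of each candidate threshold turns the chart into a direct profit comparison.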
queue cases with churn score above 0.5, ignore cases below 0.5), which is a common default value for classifiers, we would have created far less value than we were able to by using our thresholding chart.
With a threshold of 0.5 and review capacity to match, we would have made about $84,000 rather than $89,200, more than a $5,000 difference per period!) In general, using the queue rate / precision / recall graph is an easy way to perform “what if” analysis on the operational and strategic decision of how your model can be best used.
Once you’ve decided to threshold, data visualization techniques like the ones in this post will let you understand the tradeoffs you face for each possible threshold choice — and thus help you choose the most value-creating threshold for your particular business application.
Introduction to Anomaly Detection
Simple Statistical Methods
The simplest approach to identifying irregularities in data is to flag the data points that deviate from common statistical properties of a distribution, including mean, median, mode, and quantiles.
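One common way to act on quantiles is the interquartile-range rule. The sketch below is only one possible choice, not necessarily the method the article has in mind: the 1.5 multiplier is a widely used convention, and the function name `flag_outliers` and the sample data are assumptions for illustration.

```python
from statistics import quantiles


def flag_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)  # first, second, third quartile
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical readings with one obvious irregularity.
data = [10, 12, 11, 13, 12, 11, 10, 12, 45, 11]
print(flag_outliers(data))
```

Because quartiles are barely affected by a single extreme value, this rule is more robust than one based on the mean and standard deviation of the same data.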
you can find a very intuitive explanation of it here.)
Challenges
The low pass filter allows you to identify anomalies in simple use cases, but there are certain situations where this technique won't work.
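For reference, a low pass filter of this kind can be approximated with a trailing moving average: each point is compared with the mean of the preceding window and flagged when it deviates by more than a few standard deviations. This is a minimal sketch; the window size, the 3-sigma cutoff, the function name `moving_average_anomalies`, and the sample series are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def moving_average_anomalies(values, window=5, n_sigmas=3.0):
    """Return indices whose value deviates strongly from the trailing mean.

    Each point is compared with the mean and standard deviation of the
    `window` points immediately before it.
    """
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = pstdev(history)
        # Skip perfectly flat windows to avoid flagging on zero variance.
        if sigma > 0 and abs(values[i] - mu) > n_sigmas * sigma:
            anomalies.append(i)
    return anomalies


# Hypothetical series with a single spike.
series = [10, 11, 10, 12, 11, 10, 11, 30, 11, 10, 12, 11]
print(moving_average_anomalies(series))
```

The sketch also hints at the limitations mentioned above: once a spike enters the window it inflates the local standard deviation, and data with seasonality or a changing baseline would need a different model.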
The nearest set of data points is evaluated using a score, which could be Euclidean distance or a similar measure dependent on the type of the data (categorical or numerical).
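A distance-based score of this kind can be sketched in a few lines. The example below assumes numerical data and uses the mean Euclidean distance to the k nearest neighbours as the anomaly score. The function name `knn_scores`, the value of k, and the sample points are made up for illustration, and the brute-force search is only suitable for small data sets.

```python
import math


def knn_scores(points, k=2):
    """Score each point by its mean Euclidean distance to its k nearest neighbours."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


# Hypothetical 2-D data: a tight cluster plus one distant point.
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.0), (5.0, 5.0)]
scores = knn_scores(points, k=2)

# The point with the largest score is the most isolated one.
outlier_index = max(range(len(points)), key=scores.__getitem__)
print(outlier_index, points[outlier_index])
```

Points inside a dense region get small scores, while isolated points get large ones; a threshold on the score then separates normal data from anomalies.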
The algorithm learns a soft boundary in order to cluster the normal data instances using the training set, and then, using the testing instance, it tunes itself to identify the abnormalities that fall outside the learned region.
Simple guide to confusion matrix terminology
A confusion matrix is a table that is often used to describe the performance of a classification model (or 'classifier') on a set of test data for which the true values are known.
I wanted to create a 'quick reference guide' for confusion matrix terminology because I couldn't find an existing resource that suited my requirements: compact in presentation, using numbers instead of arbitrary variables, and explained both in terms of formulas and sentences.
A couple of other terms are also worth mentioning: And finally, for those of you from the world of Bayesian statistics, here's a quick summary of these terms from Applied Predictive Modeling: In relation to Bayesian statistics, the sensitivity and specificity are the conditional probabilities, the prevalence is the prior, and the positive/negative predicted values are the posterior probabilities.
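The Bayesian reading can be checked numerically: applying Bayes' theorem to the sensitivity, specificity, and prevalence reproduces the positive and negative predictive values computed directly from the confusion matrix. The counts below are hypothetical and chosen only for illustration, and the function names are assumptions of this sketch.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(actually positive | predicted positive), via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def negative_predictive_value(sensitivity, specificity, prevalence):
    """P(actually negative | predicted negative), via Bayes' theorem."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


# Hypothetical confusion matrix counts.
tp, fn, fp, tn = 100, 5, 10, 50
total = tp + fn + fp + tn

sensitivity = tp / (tp + fn)      # P(predicted positive | actually positive)
specificity = tn / (tn + fp)      # P(predicted negative | actually negative)
prevalence = (tp + fn) / total    # prior probability of the positive class

ppv = positive_predictive_value(sensitivity, specificity, prevalence)
npv = negative_predictive_value(sensitivity, specificity, prevalence)
print(f"PPV = {ppv:.3f} (direct: {tp / (tp + fp):.3f})")
print(f"NPV = {npv:.3f} (direct: {tn / (tn + fn):.3f})")
```

The agreement between the Bayes-derived and directly counted values shows why predictive values depend on prevalence, whereas sensitivity and specificity do not.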
- On Friday, January 18, 2019
RaceDay makes it easy for you to figure out your "threshold power", "threshold pace" or Critical Power or Pace for the purposes of calculating training zones.
Data Mining with Weka (4.4: Logistic regression)
Data Mining with Weka: online course from the University of Waikato Class 4 - Lesson 4: Logistic regression Slides (PDF): ..