AI News

Visualizing Machine Learning Thresholds to Make Better Business Decisions

As data scientists, when we build a machine learning model, our ultimate goal is to create value: we want to leverage our model’s predictions to do something better than we were doing before, when we didn’t have a model or when our model was more primitive.

There are three factors to consider when choosing a threshold: queue rate, precision, and recall. To understand all three at once, I like to draw a chart that shows queue rate, precision, and recall as a function of the classifier threshold.
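The chart's underlying quantities can be sketched with synthetic scores (a stand-in for a real churn model's output); for each candidate threshold we compute all three directly:

```python
# Queue rate, precision, and recall as a function of the classifier threshold.
# The scores and labels below are synthetic, standing in for a real churn model.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
y_true = rng.random(n) < 0.1                       # ~10% of users actually churn
# Churners tend to receive higher scores than non-churners.
scores = np.where(y_true, rng.beta(5, 2, n), rng.beta(2, 5, n))

def threshold_metrics(scores, y_true, threshold):
    flagged = scores >= threshold                  # cases placed in the queue
    queue_rate = flagged.mean()
    precision = y_true[flagged].mean() if flagged.any() else 1.0
    recall = (flagged & y_true).sum() / y_true.sum()
    return queue_rate, precision, recall

for t in [0.2, 0.4, 0.6, 0.8]:
    q, p, r = threshold_metrics(scores, y_true, t)
    print(f"threshold={t:.1f}  queue_rate={q:.2f}  precision={p:.2f}  recall={r:.2f}")
```

Plotting these three series against the threshold yields the chart described in the text.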

The answer depends on company priorities and the value of churn prevention, but some ideas are: send a \$10 off coupon to high-risk users, offer high-risk users a discount if they sign a new 12-month contract now, offer to subsidize a new phone purchase for such users if they remain in contract for another 12 months, and so on. Let’s build a classifier to solve this task.

For example, if we choose a threshold of 0.4, then all cases with a score above 0.4 get reviewed, sent an offer, or given whatever treatment we have at hand. Of course, 0.4 might not be the best threshold value for this particular company.

If, for example, you only have the capacity to act on 500 cases (5% of the hypothetical 10k scored cases) per period, then you’re bound to choose a threshold around 0.8, in which case your precision jumps to 100% but your recall drops to ~35%.

As a simple example, let’s say that your company’s churn prevention strategy is to make direct phone calls to each churn-risky user (defined as users whose churn score is above the threshold you set).

Right now, your total cost is around \$20 x 1000 = \$20,000 and, since your precision is 96%, you’re making about 0.96 x \$100 x 1000 = \$96,000 in saved account “revenue”, making the total profit of your operation about \$76,000.
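That arithmetic generalizes into a tiny profit function; the \$20 cost per call and \$100 of saved account revenue are the hypothetical figures from the example:

```python
# Back-of-the-envelope campaign profit: revenue from correctly identified
# churners minus the cost of contacting everyone in the queue.
def campaign_profit(n_calls, precision, cost_per_call=20.0, value_per_save=100.0):
    cost = cost_per_call * n_calls
    revenue = precision * value_per_save * n_calls
    return revenue - cost

print(campaign_profit(1000, 0.96))  # 0.96 x $100 x 1000 - $20 x 1000 -> 76000.0
```

Evaluating this function at each candidate threshold (using that threshold's queue size and precision) is exactly the "what if" analysis the thresholding chart supports.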

If we had simply used a threshold of 0.5 (queue cases with a churn score above 0.5, ignore cases below 0.5), which is a common default value for classifiers, we would have created far less value than we were able to by using our thresholding chart.

With a threshold of 0.5 and review capacity to match, we would have made about \$84,000 rather than \$89,200, more than a \$5,000 difference per period! In general, using the queue rate / precision / recall graph is an easy way to perform “what if” analysis on the operational and strategic decision of how your model can best be used.

Once you’ve decided to threshold, data visualization techniques like the ones in this post will let you understand the tradeoffs you face for each possible threshold choice — and thus help you choose the most value-creating threshold for your particular business application.

Introduction to Anomaly Detection

Simple Statistical Methods

The simplest approach to identifying irregularities in data is to flag the data points that deviate from common statistical properties of a distribution, including the mean, median, mode, and quantiles.

(You can find a very intuitive explanation of it here.)

Challenges

The low-pass filter allows you to identify anomalies in simple use cases, but there are certain situations where this technique won't work.
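One common instance of the low-pass idea is a moving-average filter: smooth the series and flag points that sit far from the smoothed value. A minimal sketch, with illustrative window and cutoff choices:

```python
# Low-pass-filter anomaly detection: smooth the series with a moving average
# and flag points whose residual exceeds k standard deviations of the residuals.
import numpy as np

def lowpass_anomalies(series, window=5, k=3.0):
    series = np.asarray(series, dtype=float)
    kernel = np.ones(window) / window
    # Rolling mean via convolution (same length; edges use zero-padded windows).
    smoothed = np.convolve(series, kernel, mode="same")
    residuals = series - smoothed
    threshold = k * residuals.std()
    return np.where(np.abs(residuals) > threshold)[0]

data = [10, 11, 10, 12, 11, 10, 95, 11, 10, 12, 11, 10]  # 95 is an injected spike
print(lowpass_anomalies(data))  # flags index 6, the spike
```

The situations where this breaks down (trends, seasonality, changing variance) are exactly the challenges mentioned above.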

The nearest set of data points is evaluated using a score, which could be Euclidean distance or a similar measure, depending on the type of the data (categorical or numerical).
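A sketch of the distance-based variant, using Euclidean distance on synthetic numerical points; a point whose nearest neighbours are all far away gets a high anomaly score:

```python
# Score each point by the mean Euclidean distance to its k nearest neighbours;
# unusually large scores suggest anomalies.
import numpy as np

def knn_anomaly_scores(X, k=3):
    X = np.asarray(X, dtype=float)
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))   # pairwise Euclidean distances
    np.fill_diagonal(dists, np.inf)              # exclude the point itself
    nearest = np.sort(dists, axis=1)[:, :k]      # k smallest distances per point
    return nearest.mean(axis=1)

# A tight cluster near the origin plus one far-away outlier.
X = [[0, 0], [0.1, 0], [0, 0.1], [0.1, 0.1], [5, 5]]
scores = knn_anomaly_scores(X, k=2)
print(scores.argmax())  # the outlier at (5, 5) has the largest score
```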

The algorithm learns a soft boundary around the normal data instances using the training set, and then flags testing instances that fall outside the learned region as abnormalities.
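This is the one-class SVM approach; a minimal sketch using scikit-learn's OneClassSVM (assumed available, and assumed to match the algorithm the author has in mind):

```python
# One-class SVM: fit a soft boundary on normal training data only, then label
# test points as inside (+1) or outside (-1) the learned region.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(0, 1, size=(200, 2))      # normal behaviour only
X_test = np.array([[0.1, -0.2], [8.0, 8.0]])   # one normal point, one outlier

model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)
model.fit(X_train)
print(model.predict(X_test))  # +1 = inside the boundary, -1 = outside
```

The `nu` parameter roughly bounds the fraction of training points allowed to fall outside the boundary, which is what makes the boundary "soft".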

Machine Learning: Predicting Customer Churn

9 out of 10 customers who were predicted to stay by the model ended up staying, while 9 out of 10 customers predicted to churn by the model ended up churning.

The variables available to predict whether the customer churned were listed in the data set. KNN only works with numerical variables, so for this model I will remove all non-numerical variables. (There are techniques for handling categorical variables, such as one-hot encoding, but we’ll ignore them here.)
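The preprocessing step described above might look like this; the data frame and its column names are hypothetical, not the post's actual data set:

```python
# Keep only numerical columns, then fit a KNN classifier on the result.
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    "tenure_months":  [1, 40, 3, 55, 2, 60, 5, 48, 4, 52],
    "monthly_charge": [80, 30, 95, 25, 85, 20, 90, 35, 88, 28],
    "plan_type":      ["A", "B", "A", "B", "A", "B", "A", "B", "A", "B"],  # non-numerical
    "churned":        [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
})

# Drop the target, then keep only numerical predictors (drops plan_type).
X = df.drop(columns="churned").select_dtypes(include="number")
y = df["churned"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)
model = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print(model.score(X_test, y_test))
```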

After using k = 5, model performance improved to 90%. Not all variables are useful in predicting whether a customer will churn.

Information gain looks at each variable individually and asks, “If we split the data set by this variable alone, how much easier does it become to predict the outcome?”
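Information gain can be computed directly from entropies: the entropy of the outcome minus the weighted entropy after splitting on the variable. A small sketch for a binary outcome, with made-up variables:

```python
# Information gain of a variable for a binary outcome.
from collections import Counter
import math

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    n = len(labels)
    split_entropy = 0.0
    for v in set(values):
        subset = [lab for x, lab in zip(values, labels) if x == v]
        split_entropy += len(subset) / n * entropy(subset)
    return entropy(labels) - split_entropy

churned  = [1, 1, 1, 0, 0, 0, 0, 0]
contract = ["month", "month", "month", "year", "year", "year", "year", "year"]
gender   = ["m", "f", "m", "f", "m", "f", "m", "f"]
print(information_gain(contract, churned))  # splits churn perfectly: full gain
print(information_gain(gender, churned))    # tells us almost nothing
```

Variables with gain near zero, like `gender` here, are candidates to drop before fitting the model.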

If you reduce the probability threshold, more people will be predicted to churn, which gives you a larger pool of “at-risk” customers to target.

The choice of probability threshold will be based on the business context: if the company wants to target a large number of customers, then a low threshold will be set.

However, if the company wants its spending to be more efficient, a higher threshold will be set, at the cost of a smaller number of customers to target.
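The reach-versus-efficiency trade-off can be read directly off the predicted probabilities; the probabilities below are synthetic, standing in for the output of a real model's `predict_proba`:

```python
# Lower thresholds flag more customers as at-risk; higher thresholds flag fewer.
import numpy as np

rng = np.random.default_rng(1)
probs = rng.random(1000)  # stand-in for model.predict_proba(...)[:, 1]

for threshold in (0.3, 0.5, 0.7):
    n_targeted = int((probs >= threshold).sum())
    print(f"threshold={threshold}: target {n_targeted} customers")
```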

The company can then take steps to retain these customers, especially if it has segmented its customer base and understands which types of customers have the highest lifetime value.

Companies live or die by their customers and being truly customer centric means knowing your customers and treating each one, especially the most valuable ones, uniquely.

Two-step Cluster Analysis in SPSS

This is a two-step cluster analysis using SPSS. I do this to demonstrate how to explore profiles of responses. These profiles can then be used as a moderator in ...

Data Mining with Weka (4.4: Logistic regression)

Data Mining with Weka: online course from the University of Waikato Class 4 - Lesson 4: Logistic regression Slides (PDF): ..