AI News, The Challenges of Building a Predictive Churn Model

The Challenges of Building a Predictive Churn Model

There's nothing wrong with doing a quick web search for a solution, but in many cases, what you'll find isn't technical or specific enough to solve your problem. Take the majority of online materials about churn modeling, for example.

Even the term 'churn modeling' has multiple meanings: it can refer to calculating the proportion of customers who are churning, forecasting a future churn rate, or predicting the risk of churn for particular individuals.

The two most popular broad approaches to churn modeling are machine learning techniques and survival analysis, each of which requires distinct data structures and feature selection procedures.

In addition to these two approaches, there are many others: ensemble models can provide superior accuracy, but could be time consuming to train and tune; rule-based techniques, latent probability models, and network-based models have all also shown some promising results.

When estimating model accuracy, it's important to choose the correct metric to optimize and the right validation dataset to train the model on. Both class imbalance and the model's monetary impact should inform which metric you optimize.

This could mean, for example, that maximizing model precision is important; lift captures how well a model identifies churners compared with the results you'd see from sending a retention discount to a random group of customers.
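As a rough, purely illustrative example with made-up numbers: if 20% of the customers a model flags actually go on to churn, while the overall churn rate is 5%, the lift is 0.20 / 0.05 = 4, meaning the model concentrates churners four times better than random targeting would.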

In an ideal case, you'd monitor a deployed model, or several versions of the model, to identify problems. But when a live test is too costly, careful construction of a validation set can provide a realistic estimate of model performance.

In the coming weeks, we'll continue to explore the intricacies of churn modeling to help equip your team with the right tools to accurately measure when and why your customers churn.

Customer Analytics: Using Deep Learning With Keras To Predict Customer Churn

Customer churn is a problem that all companies need to monitor, especially those that depend on subscription-based revenue streams.

The simple fact is that most organizations have data that can be used to target these individuals and to understand the key drivers of churn. Better still, Keras for deep learning is now available in R (yes, in R!), and in this analysis it predicted customer churn with 82% accuracy.

As for most business problems, it’s equally important to explain what features drive the model, which is why we’ll use the lime package for explainability.

In addition, we use three new packages to assist with Machine Learning (ML): recipes for preprocessing, rsample for sampling data and yardstick for model metrics.
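As a minimal sketch of the setup (the tidyverse load is an assumption, added here for data wrangling and the %>% pipe), the packages used throughout the analysis can be loaded like this:

# Load the modeling and explainability packages used in this analysis
library(keras)      # deep learning (ANN)
library(lime)       # model explainability
library(recipes)    # preprocessing steps
library(rsample)    # train/test splitting
library(yardstick)  # model metrics
library(tidyverse)  # data wrangling and the %>% pipe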

Customer churn refers to the situation when a customer ends their relationship with a company, and it’s a costly problem.

For many businesses that offer subscription-based services, it’s critical to both predict customer churn and explain what features relate to customer churn.

Older techniques such as logistic regression can be less accurate than newer techniques such as deep learning, which is why we are going to show you how to model an ANN in R with the keras package.

A telecommunications company [Telco] is concerned about the number of customers leaving their landline business for cable competitors.

The dataset includes information about which customers left within the last month, the services each customer has signed up for, customer account information, and customer demographics. In this example we show you how to use keras to develop a sophisticated and highly accurate deep learning model in R.

We inspect the various classification metrics, and show that an un-tuned ANN model can easily get 82% accuracy on the unseen data.

Neural networks used to be frowned upon because of their “black box” nature: these sophisticated models, while highly accurate, are difficult to explain using traditional methods.

We saw that just last week the same Telco customer churn dataset was used in the article, Predict Customer Churn – Logistic Regression, Decision Tree and Random Forest.

We encourage readers to check out both articles because, although the problem is the same, both solutions are beneficial to those learning data science and advanced modeling.

The data has a few columns and rows we’d like to remove; we’ll perform the cleaning operation with one tidyverse pipe (%>%) chain, as sketched below.
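A minimal sketch of that cleaning chain, assuming the raw data has been read into a tibble named churn_data_raw and that customerID is the identifier column to drop:

# Remove the ID column, drop rows with missing values, and put the target first
churn_data_tbl <- churn_data_raw %>%    # churn_data_raw: assumed name of the raw tibble
  select(-customerID) %>%               # drop the customer identifier (not predictive)
  drop_na() %>%                         # remove rows with missing values
  select(Churn, everything())           # move the target column to the front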

This phase of the analysis is often called exploratory analysis, but basically we are trying to answer the question, “What steps are needed to prepare for ML?” The key concept is knowing what transformations are needed to run the algorithm most effectively.

Numeric features like age, years worked, and length of time in a position can be used to generalize a group (or cohort).

We can split into six cohorts that divide up the user base by tenure in roughly one year (12 month) increments.

The correlation between “Churn” and “LogTotalCharges” is greatest in magnitude, indicating that the log transformation should improve the accuracy of the ANN model we build.
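As a quick, hedged illustration of that check (column names follow the Telco dataset; churn_data_tbl is the cleaned tibble from above):

# Compare the correlation of churn with TotalCharges and with its log transform
churn_num <- ifelse(churn_data_tbl$Churn == "Yes", 1, 0)   # numeric churn indicator
cor(churn_num, churn_data_tbl$TotalCharges)                # correlation with the raw feature
cor(churn_num, log(churn_data_tbl$TotalCharges))           # larger magnitude after the log transform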

One-hot encoding is the process of converting categorical data to a sparse representation whose columns contain only zeros and ones (this is also called creating “dummy variables” or a “design matrix”).

It becomes slightly more complicated with multiple categories, which requires creating a new column of 1’s and 0’s for each category (actually one less, since one level can serve as the baseline).
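A tiny illustration of the idea with hypothetical Contract values; model.matrix() is base R's way of building a design matrix:

# One-hot encode a three-level factor
contract <- factor(c("Month-to-month", "One year", "Two year", "One year"))
model.matrix(~ contract - 1)    # one 0/1 indicator column per category
model.matrix(~ contract)[, -1]  # "one less": one level becomes the baseline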

ANNs typically train faster, and often with higher accuracy, when the features are scaled and/or normalized (i.e., centered and scaled, also known as standardizing).

According to Sebastian Raschka, an expert in the field of deep learning, there are several situations in which feature scaling is important; the interested reader can refer to Sebastian Raschka’s article for a full discussion of the scaling/normalization topic.

Max Kuhn (creator of caret) has been putting some work into R-language ML tools lately, and the payoff is beginning to take shape.

The recipe() function takes a familiar object argument, which is a modeling formula such as object = Churn ~ . (predict Churn using all other features).

The prep() step is used to “estimate the required parameters from a training set that can later be applied to other data sets”.

Pro Tip: We can save the recipe object as an RDS file using saveRDS(), and then use it to bake() (discussed next) future raw data into ML-ready data in production!

We can apply the “recipe” to any data set with the bake() function, and it processes the data following our recipe steps.
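A sketch of a recipe mirroring the steps described above (six tenure cohorts, log transform, one-hot encoding, centering and scaling); train_tbl and test_tbl are assumed to come from an rsample split, and argument names may differ slightly across recipes versions:

# Recipe: estimate preprocessing parameters on the training set only
rec_obj <- recipe(Churn ~ ., data = train_tbl) %>%
  step_discretize(tenure, options = list(cuts = 6)) %>%   # six roughly one-year cohorts (newer versions use num_breaks)
  step_log(TotalCharges) %>%                              # log transform TotalCharges
  step_dummy(all_nominal(), -all_outcomes()) %>%          # one-hot encode categoricals
  step_center(all_predictors()) %>%                       # center ...
  step_scale(all_predictors()) %>%                        # ... and scale
  prep(training = train_tbl)

# Apply ("bake") the same steps to any data set
x_train_tbl <- bake(rec_obj, new_data = train_tbl) %>% select(-Churn)
x_test_tbl  <- bake(rec_obj, new_data = test_tbl)  %>% select(-Churn)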

We add “vec” to the name so we can easily remember the class of the object (it’s easy to get confused when working with tibbles, vectors, and matrix data types).
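For example, the response can be stored as numeric vectors (Yes = 1, No = 0), assuming the split tibbles from above:

# Pull the target out of the tibbles as plain 0/1 vectors
y_train_vec <- ifelse(pull(train_tbl, Churn) == "Yes", 1, 0)
y_test_vec  <- ifelse(pull(test_tbl,  Churn) == "Yes", 1, 0)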

MLPs are one of the simplest forms of deep learning, but they are both highly accurate and serve as a jumping-off point for more complex algorithms.

The batch_size = 50 sets the number of samples per gradient update within each epoch.

We set validation_split = 0.30 to hold out 30% of the training data for model validation, which helps guard against overfitting.
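A hedged sketch of such an MLP in keras; the layer sizes, dropout rate, and epoch count are illustrative choices rather than the article's exact settings:

# A simple MLP: two hidden layers with dropout, sigmoid output for binary churn
model_keras <- keras_model_sequential()
model_keras %>%
  layer_dense(units = 16, activation = "relu", input_shape = ncol(x_train_tbl)) %>%
  layer_dropout(rate = 0.1) %>%                        # dropout to limit overfitting
  layer_dense(units = 16, activation = "relu") %>%
  layer_dropout(rate = 0.1) %>%
  layer_dense(units = 1, activation = "sigmoid") %>%   # churn probability output
  compile(optimizer = "adam", loss = "binary_crossentropy", metrics = c("accuracy"))

history <- fit(
  model_keras,
  x = as.matrix(x_train_tbl),   # predictors as a numeric matrix
  y = y_train_vec,              # 0/1 churn labels
  batch_size       = 50,        # samples per gradient update
  epochs           = 35,        # assumption: number of passes over the training data
  validation_split = 0.30       # hold out 30% to monitor overfitting
)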

Now let’s make some predictions from our keras model on the test data set, which was unseen during modeling (we use this for the true performance assessment).

We have two functions to generate predictions: one returns the predicted class and the other the class probability. The yardstick package has a collection of handy functions for measuring the performance of machine learning models.

We create a data frame with the truth (actual values as factors), estimate (predicted values as factors), and the class probability (probability of yes as numeric).
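A sketch of those steps; the original post used dedicated class- and probability-prediction helpers, while this version uses predict() with a 0.5 cutoff and assumes the objects defined above:

# Class predictions and probabilities on the unseen test set
prob_vec  <- as.vector(predict(model_keras, as.matrix(x_test_tbl)))  # churn probabilities
class_vec <- ifelse(prob_vec > 0.5, "yes", "no")                     # class labels at a 0.5 cutoff

estimates_tbl <- tibble(
  truth      = factor(ifelse(test_tbl$Churn == "Yes", "yes", "no"), levels = c("yes", "no")),
  estimate   = factor(class_vec, levels = c("yes", "no")),
  class_prob = prob_vec
)

# yardstick treats the first factor level ("yes") as the event of interest by default
estimates_tbl %>% conf_mat(truth, estimate)    # confusion matrix
estimates_tbl %>% metrics(truth, estimate)     # accuracy and kappa
estimates_tbl %>% roc_auc(truth, class_prob)   # area under the ROC curve
estimates_tbl %>% precision(truth, estimate)   # of predicted churners, how many actually churn
estimates_tbl %>% recall(truth, estimate)      # of actual churners, how many the model catches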

Precision and recall are very important to the business case: the organization is concerned with balancing the cost of targeting and retaining customers at risk of leaving against the cost of inadvertently targeting customers that are not planning to leave (and potentially decreasing revenue from this group).

For those new to LIME, this YouTube video does a really nice job explaining how LIME helps to identify feature importance with black box machine learning models (e.g., deep learning).

The trick here is to realize that its inputs must be x (a model), newdata (a data frame object, this is important), and type (which is not used here but can be used to switch the output type).

We could tell the algorithm to bin continuous variables, but this may not make sense for categorical numeric data that we didn’t change to factors.

Finally, setting kernel_width = 0.5 allows us to increase the “model_r2” value by shrinking the localized evaluation.
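A hedged sketch of the lime setup under those constraints; the S3 class suffix for the keras model is an assumption (check class(model_keras) and adjust), and the ten test cases are illustrative:

# lime needs to know the model type and how to get predictions from the keras model
model_type.keras.engine.sequential.Sequential <- function(x, ...) {
  "classification"
}
predict_model.keras.engine.sequential.Sequential <- function(x, newdata, type, ...) {
  pred <- predict(x, as.matrix(newdata))    # churn probability
  data.frame(Yes = pred, No = 1 - pred)     # lime expects one column per class
}

# Build the explainer on the training data, then explain a handful of unseen cases
explainer <- lime::lime(x_train_tbl, model_keras, bin_continuous = FALSE)
explanation <- lime::explain(
  x_test_tbl[1:10, ],        # first ten test customers
  explainer    = explainer,
  n_labels     = 1,          # explain the predicted class only
  n_features   = 4,          # top four features per case
  kernel_width = 0.5         # shrink the local neighbourhood to raise model_r2
)
plot_features(explanation)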

The plot_explanations() function is a more condensed version of plot_features(), but we need to be careful because it does not provide exact statistics and it makes binned features harder to investigate (notice that “tenure” would not be identified as a contributor even though it shows up as a top feature in 7 of 10 cases).

We can investigate the features that appear most frequently in the LIME feature importance visualization (red bars increase the likelihood of churn, blue bars decrease it), along with those that the correlation analysis shows have an above-normal magnitude.

We’ll investigate tenure first: the LIME cases indicate that the ANN model is using this feature frequently, and the high correlation agrees that it is important.

While we did not implement the CLV methodology here, a full customer churn analysis would tie churn to a classification cutoff (threshold) optimization that maximizes CLV with the predictive ANN model, along the lines of the sketch below.
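This was not part of the analysis, but as a purely hypothetical sketch of what such a cutoff search could look like (clv_saved and offer_cost are made-up business figures, and estimates_tbl comes from the evaluation step above):

# Pick the churn-probability cutoff that maximizes expected value of a retention campaign
clv_saved  <- 1000   # assumption: value retained when an actual churner is saved
offer_cost <- 50     # assumption: cost of the retention offer per targeted customer
thresholds <- seq(0.05, 0.95, by = 0.05)
expected_value <- sapply(thresholds, function(t) {
  targeted <- estimates_tbl$class_prob > t                 # customers we would target
  true_pos <- sum(targeted & estimates_tbl$truth == "yes") # targeted customers who would churn
  true_pos * clv_saved - sum(targeted) * offer_cost        # value saved minus campaign cost
})
thresholds[which.max(expected_value)]   # cutoff with the highest expected value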

The application walks the user through the machine learning journey for how the model was developed, what it means to stakeholders, and how it can be used in production.

We are happy to announce a new project for 2018: Business Science University, an online school dedicated to helping data science learners.

We built an ANN model using the new keras package that achieved 82% predictive accuracy (without tuning)!

We used three new machine learning packages to help with preprocessing and measuring performance: recipes, rsample and yardstick.

For the IBM Telco dataset, tenure, contract type, internet service type, payment method, senior citizen status, and online security status were useful in diagnosing customer churn.

You learn everything you need to know about how to apply data science in a business context: “If you’ve been looking for a program like this, I’m happy to say it’s finally here!

It’s why I created Business Science University.” (Matt Dancho, Founder of Business Science)

Did you know that an organization that loses 200 high-performing employees per year is essentially losing $15M/year in lost productivity?

What if you could use data science to predict and explain turnover in a way that managers could make better decisions and executives would see results?

Shiny App That Predicts Attrition and Recommends Management Strategies (Taught in HR 301)

Our first Data Science For Business (HR 201) Virtual Workshop teaches you how to solve this employee attrition problem in four fully integrated courses. The Virtual Workshop is intended for intermediate and advanced R users.

We help businesses that seek to add this competitive advantage but may not have the resources currently to implement predictive analytics.

What makes predicting customer churn a challenge?

There are three main challenges here. The dataset used to model customer churn is typically in ({features}, label) form, where features is a set of various customer metrics and label is set to 1 if the customer is considered a churner and 0 otherwise.

Churn modelling approaches can be grouped into three categories. For the sake of completeness, we should also mention an interesting recent method called WTTE-RNN, where the author essentially turns the churn modelling strategy on its head.

Simply put, a churn model that works well today could cease to perform in the future due to changes in customers’ behavioural patterns driving churn.

In order to achieve better churn prediction results we plan to enhance our training dataset, as well as experiment with churn modelling using advanced hybrid approaches.

In addition, we plan to experiment with modelling churn using a dataset augmented with time-series features, such as customers’ usage stats over their last six months.

The hybrid churn modelling methods discussed earlier might also prove useful, as they allow for capturing non-linear relationships between features and customer churn risk.

Lastly, the recently proposed WTTE-RNN approach appears to be a promising alternative for churn modelling, as it can handle censoring and time-series data and is based on a mathematically sound formulation of churn.


Random Forest in R - Classification and Prediction Example with Definition & Steps

Provides steps for applying random forest to do classification and prediction in R.

IPPCR 2015: Conceptual Approach to Survival Analysis

Air date: Monday, November 16, 2015. Runtime: 01:30:11.

Playtime 2016 - Predicting lifetime value in the apps world

Deep dive into lifetime value models and predictive analytics in the apps ecosystem. Tactics to get the most out of identified segments and how to upgrade their ...

Customer Life Time Value (CLTV) Prediction Using Embeddings

Ben Chamberlain (Imperial College London), Angelo Cardoso (ASOS), Bryan Liu (ASOS), Marc ...

How Random Forest algorithm works

In this video I explain very briefly how the Random Forest algorithm works, with a simple example composed of 4 decision trees.

Logistic Regression Using Excel

Predict who survives the Titanic disaster using Excel. Logistic regression allows us to predict a categorical outcome using categorical and numeric data.


15 Finding the Right Model

Download the sample tutorial files at