Add Machine Learning For an Effective Marketing Campaign
To minimize cost, the company wants to reach out to as few customers as possible while still reaching most (a user-defined share) of the customers who are likely to respond.
In this technical blog, we will look at the Cumulative Gains and Lift charts created in Oracle Data Visualization for binary classification machine learning models, and at how these charts are useful in evaluating the performance of a classification model.
Cumulative Gains and Lift charts measure the effectiveness of a binary classification predictive model, calculated as the ratio between the results obtained with and without the model.
Now they want to find out how good the model is at identifying the largest number of likely subscribers while selecting a relatively small campaign base (i.e., 50,000 customers).
The cumulative Actuals chart shows that by the time we have covered 40 percent of the population we have already identified 80 percent of the subscribers, and by reaching close to 70 percent of the population we have 90 percent of the subscribers.
If we compare one model with another using a cumulative gains chart, the model with the greater area between its Cumulative Actuals line and the baseline is more effective at identifying a larger portion of subscribers from a relatively smaller portion of the total population.
How to Compare Two Models Using Cumulative Gains and Lift Charts in Oracle Data Visualization: To compare how well two machine learning models have performed, we can use the Lift Calculation data flow (included in the .dva project) as a template and plug the output of the Apply Model data flow in as a data source/input to the flow.
Motivating Example. Suppose we have a direct marketing campaign. The population is very big, so we want to select only a fraction of it for marketing: those who are likely to respond. We build a model that scores receivers, assigning each one a probability that they will reply, and we want to evaluate the performance of this model.
Id  Cls  Score
 9   P   0.42
10   N   0.39
11   P   0.33
12   N   0.31
13   P   0.23
14   N   0.22
15   N   0.19
16   N   0.15
17   P   0.12
18   N   0.11
19   P   0.04
20   N   0.01
(the lower-scoring rows of a 20-record example; the higher-scoring rows are not part of this excerpt)
Sort the table by score, descending: max on top, min at the bottom. If the model works well, we expect responders at the top and non-responders at the bottom; the better the model, the clearer the separation between positives and negatives.
Intuition: suppose now we select the top 20% of records. We see that out of these 4 examples, 3 are positive. In total there are 10 responders (positive classes), so with only 20% of the data (4 records) we can target 3/10 = 30% of the responders. We can also compare against a random model: if you randomly sample 20% of the records, you can expect to target only 20% of your responders (20% of 10 = 2). So we are doing better than random. We can do this for all possible fractions of our data set and get this chart:
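The fraction-by-fraction calculation described above can be sketched in Python. The labels and scores below are illustrative stand-ins, not the table's values (1 = responder, 0 = non-responder):

```python
import numpy as np

def cumulative_gains(labels, scores):
    """For each fraction of the population selected (sorted by score,
    highest first), return the fraction of all responders captured."""
    order = np.argsort(scores)[::-1]                 # sort by score, descending
    sorted_labels = np.asarray(labels)[order]
    captured = np.cumsum(sorted_labels) / sorted_labels.sum()
    population = np.arange(1, len(sorted_labels) + 1) / len(sorted_labels)
    return population, captured

labels = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]              # 5 responders out of 10
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
pop, gain = cumulative_gains(labels, scores)
# Selecting the top 40% of the population (4 records) captures 4/5 of the
# responders, versus an expected 40% under random selection.
print(gain[3])   # → 0.8
```

A random model would trace the diagonal (gain equal to population fraction); the gap between the two curves is what the chart visualizes.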
Example: a scored data set before and after sorting by score, descending (rows 9–20 shown; rows 1–8 are not part of this excerpt):
Id  Cls  Score          Id  Cls  Score (sorted)
 9   N   0.6            20   P   0.92
10   N   0.7            19   P   0.9
11   N   0.75     ⇒     18   P   0.88
12   N   0.85           12   N   0.85
13   P   0.52           17   P   0.82
14   P   0.72           16   P   0.79
15   P   0.73           11   N   0.75
16   P   0.79           15   P   0.73
17   P   0.82           14   P   0.72
18   P   0.88           10   N   0.7
19   P   0.9             9   N   0.6
20   P   0.92
- On Thursday, December 14, 2017
- By Shubham Jain
7 Important Model Evaluation Error Metrics Everyone should know
Once analysts are finished building a model, they hurriedly map predicted values onto unseen data.
The choice of metric completely depends on the type of model and the implementation plan of the model.
After you have finished building your model, these 7 metrics will help you evaluate its accuracy.
Warming up: Types of Predictive models When we talk about predictive models, we are talking either about a regression model (continuous output) or a classification model (nominal or binary output).
In classification problems, we use two types of algorithms (depending on the kind of output they create): Class output: algorithms like SVM and KNN create a class output. Probability output: algorithms like logistic regression and random forest give probability outputs, which can be converted to class outputs by choosing a threshold.
Illustrative Example: For the discussion of classification model evaluation metrics, I have used my predictions for the BCI challenge on Kaggle.
The solution of the problem is irrelevant to the discussion; however, the final predictions on the training set have been used for this article.
The predictions made for this problem were probability outputs, which have been converted to class outputs assuming a threshold of 0.5.
Here are a few definitions you need to remember for a confusion matrix: Accuracy: the proportion of the total number of predictions that were correct.
As you can see from the above two tables, the positive predictive value is high, but the negative predictive value is quite low.
On the other hand, an attrition model will be more concerned with sensitivity. Confusion matrices are generally used only with class output models.
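The confusion-matrix definitions above can be sketched as a small helper; the TP/FP/TN/FN counts below are hypothetical, not the BCI-challenge numbers:

```python
# Compute the standard confusion-matrix metrics from raw counts.
def confusion_metrics(tp, fp, tn, fn):
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "positive_predictive_value": tp / (tp + fp),  # precision
        "negative_predictive_value": tn / (tn + fn),
        "sensitivity": tp / (tp + fn),                # true positive rate
        "specificity": tn / (tn + fp),                # true negative rate
    }

m = confusion_metrics(tp=40, fp=10, tn=30, fn=20)
print(m["accuracy"])      # → 0.7
print(m["sensitivity"])   # → 0.666...
```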
Here are the steps to build a Lift/Gain chart: Step 1: Calculate the probability for each observation. Step 2: Rank these probabilities in decreasing order. Step 3: Build deciles, with each group holding almost 10% of the observations. Step 4: Calculate the response rate at each decile for responders, non-responders, and total.
For the case in hand, here is the graph: This graph tells you how well your model is segregating responders from non-responders.
Here is the plot for the case in hand. You can also plot decile-wise lift against decile number. What does this graph tell you?
Any model whose lift per decile stays above 100% up to at least the 3rd decile, and at most the 7th decile, is a good model.
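The decile procedure described above can be sketched in Python; the scores and labels here are synthetic stand-ins for a real model's output, generated so that higher scores are more likely to be responders:

```python
import numpy as np

rng = np.random.default_rng(0)
scores = rng.random(1000)
labels = (rng.random(1000) < scores).astype(int)   # higher score → more likely responder

order = np.argsort(scores)[::-1]                   # rank probabilities, decreasing
sorted_labels = labels[order]
deciles = sorted_labels.reshape(10, 100)           # 10 deciles of 100 observations each

overall_rate = labels.mean()
decile_rates = deciles.mean(axis=1)                # response rate per decile
lift = 100 * decile_rates / overall_rate           # lift @ decile, in percent
print(np.round(lift, 1))
```

For a useful model the top deciles show lift well above 100% and the bottom deciles fall below it.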
The K-S is 100 if the scores partition the population into two separate groups, one containing all the positives and the other all the negatives.
On the other hand, if the model cannot differentiate between positives and negatives, then it is as if the model selects cases randomly from the population, and the K-S would be 0.
In most classification models the K-S will fall between 0 and 100; the higher the value, the better the model is at separating the positive from the negative cases.
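The K-S description above can be sketched as a direct computation: the maximum separation between the cumulative distributions of positives and negatives, on the 0–100 scale used here. The example labels and scores are illustrative:

```python
import numpy as np

def ks_statistic(labels, scores):
    order = np.argsort(scores)[::-1]              # sort by score, descending
    y = np.asarray(labels)[order]
    cum_pos = np.cumsum(y) / y.sum()              # cumulative share of positives
    cum_neg = np.cumsum(1 - y) / (1 - y).sum()    # cumulative share of negatives
    return 100 * np.max(np.abs(cum_pos - cum_neg))

# Perfectly separated scores give K-S = 100:
ks = ks_statistic([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.7, 0.3, 0.2, 0.1])
print(ks)   # → 100.0
```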
Following is a sample plot. The metrics covered up to this point are mostly used in classification problems.
So far, we have learnt about the confusion matrix, lift and gain charts, and the Kolmogorov-Smirnov chart.
The biggest advantage of using ROC curve is that it is independent of the change in proportion of responders.
If we look at the confusion matrix below, we observe that for a probabilistic model we get a different value for each metric depending on the chosen threshold.
(1 − specificity) is also known as the false positive rate, and sensitivity is also known as the true positive rate.
Following are a few thumb rules:
- .90–1 = excellent (A)
- .80–.90 = good (B)
- .70–.80 = fair (C)
- .60–.70 = poor (D)
- .50–.60 = fail (F)
We see that we fall under the excellent band for the current model.
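As a sketch of computing AUC, here is scikit-learn's roc_auc_score on made-up labels and probabilities (not the article's BCI predictions):

```python
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1, 0, 1, 1, 0, 1, 1]
y_prob = [0.1, 0.3, 0.35, 0.8, 0.2, 0.9, 0.75, 0.4, 0.65, 0.7]
auc = roc_auc_score(y_true, y_prob)
print(auc)   # ≈ 0.958, which lands in the .90–1 "excellent" band
```

AUC equals the probability that a randomly chosen positive is scored above a randomly chosen negative; here 23 of the 24 (positive, negative) pairs are ordered correctly, so AUC = 23/24.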
For instance, a model giving the point (0.2, 0.8) and one giving (0.8, 0.2) can be coming out of the same underlying model at different thresholds, hence these metrics should not be directly compared.
A solution to this concern can be a true lift chart (finding the ratio of lift to perfect-model lift at each decile).
The numerator and denominator of both the x and y axes will change on a similar scale in case of a response rate shift.
Let's see what happens in our case: pair AB is concordant and pair BC is discordant. Hence, we have 50% concordant cases in this example.
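The pair-counting behind concordant and discordant cases can be sketched as follows. The labels and scores are hypothetical, chosen so that exactly half the pairs are concordant, as in the example above:

```python
# Count concordant/discordant (responder, non-responder) pairs:
# a pair is concordant when the responder has the higher score.
def concordance(labels, scores):
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    concordant = discordant = ties = 0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1
            elif p < n:
                discordant += 1
            else:
                ties += 1
    total = concordant + discordant + ties
    return concordant / total, discordant / total

c, d = concordance([1, 0, 1, 0], [0.9, 0.5, 0.2, 0.4])
print(c, d)   # → 0.5 0.5
```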
Here are the key points to consider on RMSE: The 'square' term makes this metric highlight large deviations, while the square root brings it back to the scale of the original values.
The 'squared' nature of this metric also prevents positive and negative error values from cancelling each other out.
Compared to mean absolute error, RMSE gives higher weight to, and penalizes, large errors.
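A minimal sketch of the RMSE-vs-MAE contrast described above, on made-up residuals:

```python
import numpy as np

errors = np.array([1.0, 1.0, 1.0, 10.0])    # one large deviation
mae = np.mean(np.abs(errors))               # (1 + 1 + 1 + 10) / 4 = 3.25
rmse = np.sqrt(np.mean(errors ** 2))        # sqrt((1 + 1 + 1 + 100) / 4) ≈ 5.07
print(mae, rmse)                            # RMSE > MAE: the outlier dominates
```

Because the errors are squared before averaging, the single large residual pulls RMSE well above MAE.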
Though cross-validation isn't really an evaluation metric that is used openly to communicate model accuracy, ...
For the TFI competition, the following were three of my solutions and their scores (lower is better): You will notice that the third entry, which has the worst public score, turned out to be the best model on the private ranking.
In the following section, I will discuss how you can know if a solution is an over-fit or not before we actually know the test results.
It simply says: leave out a sample on which you do not train the model, and test the model on this sample before finalizing the model.
This reduces bias because of sample selection to some extent but gives a smaller sample to train the model on.
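The hold-out idea can be sketched with scikit-learn's train_test_split; X and y here are synthetic stand-ins for your training data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(50, 2)     # 50 samples, 2 features
y = np.arange(50) % 2                 # toy binary target

# Hold out 30% of the data for validation; fix the seed for reproducibility.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=42)
print(len(X_train), len(X_val))   # → 35 15
```

The trade-off stated above is visible here: the model now trains on only 35 of the 50 samples.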
Now we train models on 6 samples (Green boxes) and validate on 1 sample (grey box).
Here is how you code a k-fold in Python:

    from sklearn.model_selection import KFold
    from sklearn.ensemble import RandomForestClassifier
    import numpy as np

    model = RandomForestClassifier(n_estimators=100)

    # Simple K-Fold cross validation with 5 folds
    # (Note: very old scikit-learn versions exposed this as
    # cross_validation.KFold with an 'n_folds'/'k' argument.)
    cv = KFold(n_splits=5)
    results = []
    # 'model' can be replaced by your model object
    # 'error_function' can be replaced by the error function of your analysis
    for traincv, testcv in cv.split(train):
        probas = model.fit(train[traincv], target[traincv]).predict_proba(train[testcv])
        results.append(error_function(target[testcv], probas))

    # print out the mean of the cross-validated results
    print('Results: ' + str(np.array(results).mean()))
We have n samples, and the modelling is repeated n times, leaving only one observation out each time for cross-validation.
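A minimal sketch of leave-one-out with scikit-learn's LeaveOneOut on toy data:

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut

X = np.arange(10).reshape(5, 2)    # 5 toy samples, 2 features
loo = LeaveOneOut()
n_rounds = 0
for train_idx, test_idx in loo.split(X):
    assert len(test_idx) == 1      # exactly one observation held out per round
    n_rounds += 1
print(n_rounds)   # → 5, one round per sample
```

With n samples this yields n training rounds, which is why leave-one-out is usually reserved for small data sets.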
In addition, the metrics covered in this article are some of the most commonly used evaluation metrics for classification and regression problems.