AI News, Response Modeling using Machine Learning Techniques in R

Response Modeling using Machine Learning Techniques in R

Though we have selected a credit scoring problem as the case study in this article, the same process applies to a wide range of classification and regression problems: response modeling, risk management, attrition/churn management, cross-sell/up-sell, usage patterns, net present value, lifetime value, predictive maintenance and condition-based monitoring, warranty, reliability, failure prediction, image/video processing, crime, medical experiments, hidden pattern recognition, and more.

The basic difference between traditional modeling and machine learning is that in traditional modeling we set up a modeling framework and try to establish relationships ourselves, while in machine learning we allow the model to learn from the data by uncovering its hidden patterns.

Hence the first approach requires the analyst to have a solid understanding of statistical techniques and the business, while the latter is more complex and computationally intensive, and therefore demands greater computing power and a more technically adept analyst.

How to Build Credit Risk Models Using AI and Machine Learning - FICO

Which works better for modeling credit risk: traditional scorecards or artificial intelligence and machine learning?

While some new market entrants may have a vested interest in pushing AI solutions, the fact is that traditional scorecard methods and AI bring different advantages to credit risk modeling — if you know how to use them together.

How FICO Uses AI to Build Better Credit Risk Models

As with our other origination products, Origination Manager Essentials includes credit risk models, and these models are segmented — different types of small business customers and different credit products require different models to assess their credit risk.

This allows us to apply AI to improve risk prediction without creating “black box” models that don’t give risk managers, customers and regulators the required insights into why individuals score the way they do.

For example, collaborative profiles derive behavioral archetype distributions — these could include archetypes that point to credit seekers building credit histories as opposed to other behavioral patterns.

Another approach is to use AI and machine learning to “train” models to discover maximum predictive power, and find new relationships amongst input features that could produce a stronger model.

The option to include this interaction as a nonlinear input feature, in an interpretable fashion, in a scorecard led to a substantial improvement (~10%) in the lift measure used to characterize the performance of attrition models.

By combining this technology with scorecard technology, we created a strong, robust, palatable solution and saw a 20% improvement in model performance (KS) over a traditional scorecard model alone.

Innovation is great, but you don't want to naively throw a lot of new data sources (many of which may not be permissible in credit decision-making, and might be easily manipulated, like social media data) into an AI model that produces a score that may not be explainable.

Most AI technology remains “black box” and can’t provide an answer when a customer asks, “How did I get this score?” That said, if I can use machine learning to expose powerful and predictive new latent features of credit risk, I can then directly incorporate them into a scorecard model.
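To make the idea concrete, here is a minimal sketch (on synthetic data, with made-up feature names, and not FICO's actual method) of letting a tree ensemble surface a nonlinear interaction and then adding that interaction as an explicit input to an interpretable, scorecard-style logistic model:

```python
# Sketch only: synthetic data and made-up feature names, not FICO's actual method.
# Idea: let a tree ensemble surface a nonlinear interaction, then add that
# interaction as an explicit input to an interpretable, scorecard-style logistic model.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
utilization = rng.uniform(0, 1, n)        # hypothetical revolving utilization
inquiries = rng.poisson(1.5, n)           # hypothetical recent credit inquiries

# Default risk driven partly by an interaction between the two inputs.
true_logit = -2 + 2.5 * utilization + 0.4 * inquiries + 3.0 * utilization * (inquiries > 3)
y = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))

X = np.column_stack([utilization, inquiries])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def add_interaction(X):
    # Append the surfaced interaction (utilization x high inquiry count) as a column.
    return np.column_stack([X, X[:, 0] * (X[:, 1] > 3)])

base = LogisticRegression().fit(X_tr, y_tr)                       # plain scorecard-style model
gbm = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)  # black-box benchmark
enriched = LogisticRegression().fit(add_interaction(X_tr), y_tr)  # scorecard + surfaced feature

print("AUC, raw logistic:          ", roc_auc_score(y_te, base.predict_proba(X_te)[:, 1]))
print("AUC, gradient boosting:     ", roc_auc_score(y_te, gbm.predict_proba(X_te)[:, 1]))
print("AUC, logistic + interaction:", roc_auc_score(y_te, enriched.predict_proba(add_interaction(X_te))[:, 1]))
```

The enriched logistic model typically recovers most of the ensemble's lift while remaining explainable feature by feature.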

Machine Learning

Supervised learning algorithms are trained using labeled examples, that is, inputs for which the desired output is known.

The learning algorithm receives a set of inputs along with the corresponding correct outputs, and the algorithm learns by comparing its actual output with correct outputs to find errors.

Through methods like classification, regression, prediction and gradient boosting, supervised learning uses patterns to predict the values of the label on additional unlabeled data.
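A minimal scikit-learn sketch of that supervised workflow (synthetic data; names are illustrative):

```python
# Supervised learning sketch: fit on labeled examples, compare predictions
# with the known correct outputs, then score unlabeled (held-out) data.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_new, y_train, y_new = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier(random_state=0)
model.fit(X_train, y_train)                       # learn from labeled examples

train_error = 1 - accuracy_score(y_train, model.predict(X_train))
print("training error:", round(train_error, 3))   # errors found by comparing outputs

predicted_labels = model.predict(X_new)           # apply learned patterns to unlabeled data
```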

Unsupervised learning, by contrast, works with data that has no historical labels; popular techniques include self-organizing maps, nearest-neighbor mapping, k-means clustering and singular value decomposition.
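And a correspondingly small unsupervised sketch, assuming k-means clustering on unlabeled synthetic data:

```python
# Unsupervised learning sketch: k-means finds structure in unlabeled data.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)   # labels are never used
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

print(kmeans.cluster_centers_)   # discovered group centres
print(kmeans.labels_[:10])       # cluster assignment for the first 10 rows
```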

Practical techniques for interpreting machine learning models

Along the way, Patrick, Avni, and Mark draw on their applied experience to highlight crucial success factors and common pitfalls not typically discussed in blog posts and open source software documentation, such as the importance of both local and global interpretability and the approximate nature of nearly all machine learning interpretability techniques.

Outline:
Enhancing transparency in machine learning models with Python and XGBoost (example Jupyter notebook)
Increasing transparency and accountability in your machine learning project with Python (example Jupyter notebook)
Explaining your predictive models to business stakeholders with local interpretable model-agnostic explanations (LIME) using Python and H2O (example Jupyter notebook)
Testing machine learning models for accuracy, trustworthiness, and stability with Python and H2O (example Jupyter notebook)
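For instance, the LIME step in the outline might look roughly like the sketch below; it substitutes a scikit-learn model and a bundled dataset for the H2O setup used in the notebooks, so treat the names and arguments as illustrative:

```python
# LIME sketch: explain one prediction of a tabular classifier locally.
# Uses scikit-learn instead of H2O; data and names are placeholders.
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
X, y = data.data, data.target

model = RandomForestClassifier(random_state=0).fit(X, y)

explainer = LimeTabularExplainer(
    X,
    feature_names=list(data.feature_names),
    class_names=list(data.target_names),
    mode="classification",
)

# Local explanation for a single row: which features pushed its score up or down?
exp = explainer.explain_instance(X[0], model.predict_proba, num_features=5)
print(exp.as_list())
```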

Building A Deep Learning Model using Keras

Deep learning is an increasingly popular subset of machine learning.

A neural network takes in inputs, which are then processed in hidden layers using weights that are adjusted during training.

We will build a regression model to predict an employee’s wage per hour, and we will build a classification model to predict whether or not a patient has diabetes.

Note: The datasets we will be using are relatively clean, so we will not need to perform any data preprocessing to get them ready for modeling.

Datasets that you will use in future projects may not be so clean — for example, they may have missing values — so you may need to use data preprocessing techniques to alter your datasets to get more accurate results.

I will not go into detail on Pandas, but it is a library you should become familiar with if you’re looking to dive further into data science and machine learning.

The ‘head()’ function will show the first 5 rows of the dataframe so you can check that the data has been read in properly and can take an initial look at how the data is structured.
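A minimal sketch of that first step (the file name is assumed for illustration):

```python
# Read the data with Pandas and take a first look at its structure.
import pandas as pd

# File name is an assumption for illustration; substitute your own dataset.
train_df = pd.read_csv("hourly_wages.csv")

print(train_df.head())    # first 5 rows: check the data was read in properly
print(train_df.shape)     # number of rows and columns
```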

For example, if you are predicting diabetes in patients, going from age 10 to 11 is different from going from age 60 to 61.

A smaller learning rate may lead to more accurate weights (up to a certain point), but the time it takes to compute the weights will be longer.

To train, we will use the ‘fit()’ function on our model with the following five parameters: training data (train_X), target data (train_y), validation split, the number of epochs and callbacks.

During training, we will be able to see the validation loss, which gives the mean squared error of our model on the validation set.

We will set the validation split to 0.2, which means that 20% of the training data we provide to the model will be set aside for evaluating model performance.

Early stopping will stop the model from training before the number of epochs is reached if the model stops improving.

Sometimes the validation loss stops improving and then improves in the next epoch, but after 3 epochs in which the validation loss doesn't improve, it usually won't improve again.
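Putting those pieces together, a sketch of the training setup described above might look as follows; the layer sizes, column names and file name are assumptions, not necessarily the article's exact model:

```python
# Sketch of the training setup described above: a small regression network
# trained with a 20% validation split and early stopping (patience of 3).
# Layer sizes, file name and the target column are illustrative assumptions.
import pandas as pd
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

df = pd.read_csv("hourly_wages.csv")                 # assumed file name
train_y = df["wage_per_hour"].values                 # target: wage per hour
train_X = df.drop(columns=["wage_per_hour"]).values

n_cols = train_X.shape[1]

model = Sequential([
    Dense(50, activation="relu", input_shape=(n_cols,)),
    Dense(50, activation="relu"),
    Dense(1),                                        # single output for regression
])
model.compile(optimizer="adam", loss="mean_squared_error")

# Stop training when validation loss has not improved for 3 consecutive epochs.
early_stopping_monitor = EarlyStopping(patience=3)

model.fit(
    train_X,
    train_y,
    validation_split=0.2,     # 20% of the training data held out for validation
    epochs=30,
    callbacks=[early_stopping_monitor],
)
```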

We can see that by increasing our model capacity, we have improved our validation loss from 32.63 in our old model to 28.06 in our new model.

Predictive analytics

Predictive analytics encompasses a variety of statistical techniques from data mining, predictive modelling, and machine learning, that analyze current and historical facts to make predictions about future or otherwise unknown events.[1][2]

The defining functional effect of these technical approaches is that predictive analytics provides a predictive score (probability) for each individual (customer, employee, healthcare patient, product SKU, vehicle, component, machine, or other organizational unit) in order to determine, inform, or influence organizational processes that pertain across large numbers of individuals, such as in marketing, credit risk assessment, fraud detection, manufacturing, healthcare, and government operations including law enforcement.

Scoring models process a customer's credit history, loan application, customer data, etc., in order to rank-order individuals by their likelihood of making future credit payments on time.

The core of predictive analytics relies on capturing relationships between explanatory variables and the predicted variables from past occurrences, and exploiting them to predict the unknown outcome.

In future industrial systems, the value of predictive analytics will be to predict and prevent potential issues to achieve near-zero break-down and further be integrated into prescriptive analytics for decision optimization.[citation needed]

This category encompasses models in many areas, such as marketing models that seek out subtle data patterns to answer questions about customer performance, or fraud detection models.

With advancements in computing speed, individual agent modeling systems have become capable of simulating human behaviour or reactions to given stimuli or scenarios.

For example, the training sample may consist of literary attributes of writings by Victorian authors, with known attribution, and the out-of-sample unit may be newly found writing with unknown authorship.

Another example is given by analysis of blood splatter in simulated crime scenes, in which the out-of-sample unit is the actual blood splatter pattern from a crime scene.

Decision models describe the relationship between all the elements of a decision—the known data (including results of predictive models), the decision, and the forecast results of the decision—in order to predict the results of decisions involving many variables.

Methods of predictive analysis are applied to customer data to pursue CRM objectives, which involve constructing a holistic view of the customer no matter where their information resides in the company or the department involved.

They must analyze and understand the products that are in demand or have the potential for high demand, predict customers' buying habits in order to promote relevant products at multiple touch points, and proactively identify and mitigate issues that could lose customers or reduce the company's ability to gain new ones.

Experts use predictive analysis in health care primarily to determine which patients are at risk of developing certain conditions, like diabetes, asthma, heart disease, and other lifetime illnesses.

Clinical decision support (CDS) provides clinicians, staff, patients, or other individuals with knowledge and person-specific information, intelligently filtered or presented at appropriate times, to enhance health and health care.

It encompasses a variety of tools and interventions such as computerized alerts and reminders, clinical guidelines, order sets, patient data reports and dashboards, documentation templates, diagnostic support, and clinical workflow tools.

Using large, multi-source imaging, genetics, clinical and demographic data, investigators have developed decision support systems that can predict the state of a disease with high accuracy, consistency and precision.

Predictive analytics can help optimize the allocation of collection resources by identifying the most effective collection agencies, contact strategies, legal actions and other strategies for each customer, thus significantly increasing recovery while reducing collection costs.

For an organization that offers multiple products, predictive analytics can help analyze customers' spending, usage and other behavior, leading to efficient cross sales, or selling additional products to current customers.[2]

With the number of competing services available, businesses need to focus efforts on maintaining continuous customer satisfaction, rewarding consumer loyalty and minimizing customer attrition.

By a frequent examination of a customer's past service usage, service performance, spending and other behavior patterns, predictive models can determine the likelihood of a customer terminating service sometime soon.[7]

Apart from identifying prospects, predictive analytics can also help to identify the most effective combination of product versions, marketing material, communication channels and timing that should be used to target a given consumer.

Fraud is a big problem for many businesses and can be of various types: inaccurate credit applications, fraudulent transactions (both offline and online), identity thefts and false insurance claims.

These problems can also be addressed via machine learning approaches that transform the original time series into a feature vector space, where the learning algorithm finds patterns that have predictive power.[27][28]

For a health insurance provider, predictive analytics can analyze a few years of past medical claims data, as well as lab, pharmacy and other records where available, to predict how expensive an enrollee is likely to be in the future.

Predictive analytics in the form of credit scores have reduced the amount of time it takes for loan approvals, especially in the mortgage market where lending decisions are now made in a matter of hours rather than days or even weeks.

Examples of big data sources include web logs, RFID, sensor data, social networks, Internet search indexing, call detail records, military surveillance, and complex data in astronomic, biogeochemical, genomics, and atmospheric sciences.

Thanks to technological advances in computer hardware—faster CPUs, cheaper memory, and MPP architectures—and new technologies such as Hadoop, MapReduce, and in-database and text analytics for processing big data, it is now feasible to collect, analyze, and mine massive amounts of structured and unstructured data for new insights.[25]

Today, exploring big data and using predictive analytics is within reach of more organizations than ever before, and new methods capable of handling such datasets are being proposed.[31][32]

While mathematically it is feasible to apply multiple regression to discrete ordered dependent variables, some of the assumptions behind the theory of multiple linear regression no longer hold, and there are other techniques such as discrete choice models which are better suited for this type of analysis.

In a classification setting, assigning outcome probabilities to observations can be achieved through the use of a logistic model, which is essentially a method that transforms information about the binary dependent variable into an unbounded continuous variable and estimates a regular multivariate model (see Allison's Logistic Regression for more information on the theory of logistic regression).

The multinomial logit model is the appropriate technique in these cases, especially when the dependent variable categories are not ordered (for example, colors like red, blue, green).
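As an illustration (not from the source text), a small multinomial logit fit on synthetic data with the statsmodels package:

```python
# Multinomial logit sketch for an unordered categorical outcome; data is synthetic.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 3))

# Synthetic 3-category choice generated from linear utilities plus Gumbel noise,
# which is exactly the data-generating process behind the multinomial logit.
scores = X @ rng.normal(size=(3, 3)) + rng.gumbel(size=(600, 3))
y = scores.argmax(axis=1)

mnl = sm.MNLogit(y, sm.add_constant(X)).fit(disp=0)
print(mnl.summary())                          # coefficients relative to the base category
print(mnl.predict(sm.add_constant(X))[:3])    # predicted category probabilities
```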

A good way to understand the key difference between probit and logit models is to assume that the dependent variable is driven by a latent variable z, which is the sum of a linear combination of explanatory variables and a random noise term.
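In symbols (notation assumed here, not taken from the source), the shared latent-variable setup and the two resulting link functions are:

```latex
% Latent-variable formulation shared by probit and logit (notation assumed here):
% z_i is the latent index, x_i the explanatory variables, \varepsilon_i the noise term.
z_i = x_i'\beta + \varepsilon_i, \qquad
y_i = \begin{cases} 1 & \text{if } z_i > 0, \\ 0 & \text{otherwise,} \end{cases}
\qquad
P(y_i = 1 \mid x_i) =
\begin{cases}
\Phi(x_i'\beta), & \text{probit: } \varepsilon_i \sim N(0,1), \\
\dfrac{e^{x_i'\beta}}{1 + e^{x_i'\beta}}, & \text{logit: } \varepsilon_i \sim \text{standard logistic}.
\end{cases}
```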

As a result, standard regression techniques cannot be applied directly to time series data, and methodology has been developed to decompose the trend, seasonal and cyclical components of the series.

The identification stage involves determining whether the series is stationary and whether seasonality is present, by examining plots of the series and its autocorrelation and partial autocorrelation functions.
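A sketch of that identification stage in Python with statsmodels (file and column names are placeholders):

```python
# Identification-stage sketch: check stationarity and inspect ACF/PACF plots.
# The file name is a placeholder for a single-column time series.
import matplotlib.pyplot as plt
import pandas as pd
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.stattools import adfuller

series = pd.read_csv("monthly_sales.csv", index_col=0, parse_dates=True).squeeze()

# Augmented Dickey-Fuller test: a small p-value suggests the series is stationary.
adf_stat, p_value, *_ = adfuller(series.dropna())
print("ADF p-value:", round(p_value, 4))

fig, axes = plt.subplots(2, 1, figsize=(8, 6))
plot_acf(series.dropna(), ax=axes[0])     # autocorrelation function
plot_pacf(series.dropna(), ax=axes[1])    # partial autocorrelation function
plt.tight_layout()
plt.show()
```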

In recent years time series models have become more sophisticated and attempt to model conditional heteroskedasticity with models such as ARCH (autoregressive conditional heteroskedasticity) and GARCH (generalized autoregressive conditional heteroskedasticity) models frequently used for financial time series.

In addition, time series models are also used to understand inter-relationships among economic variables represented by systems of equations, using VAR (vector autoregression) and structural VAR models.

These techniques were primarily developed in the medical and biological sciences, but they are also widely used in the social sciences like economics, as well as in engineering (reliability and failure time analysis).

Censoring and non-normality, which are characteristic of survival data, generate difficulty when trying to analyze the data using conventional statistical models such as multiple linear regression.

The normal distribution, being a symmetric distribution, takes positive as well as negative values, but duration by its very nature cannot be negative and therefore normality cannot be assumed when dealing with duration/survival data.

In survival analysis, censored observations arise whenever the dependent variable of interest represents the time to a terminal event, and the duration of the study is limited in time.

A distribution whose hazard function slopes upward is said to have positive duration dependence, a decreasing hazard shows negative duration dependence, whereas a constant hazard is a memoryless process usually characterized by the exponential distribution.
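A small sketch of reading duration dependence off a fitted parametric survival model, assuming the lifelines package and synthetic censored durations:

```python
# Sketch: fit a Weibull survival model and read off duration dependence.
# Durations and censoring flags are synthetic; the lifelines package is assumed.
import numpy as np
from lifelines import WeibullFitter

rng = np.random.default_rng(0)
durations = rng.weibull(1.5, size=500) * 10     # time to the terminal event
observed = rng.binomial(1, 0.8, size=500)       # 0 marks a right-censored observation

wf = WeibullFitter().fit(durations, event_observed=observed)

# rho_ > 1: hazard slopes upward (positive duration dependence);
# rho_ < 1: negative duration dependence; rho_ = 1: exponential, memoryless case.
print("estimated shape parameter rho:", wf.rho_)
```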

Globally-optimal classification tree analysis (GO-CTA) (also called hierarchical optimal discriminant analysis) is a generalization of optimal discriminant analysis that may be used to identify the statistical model that has maximum accuracy for predicting the value of a categorical dependent variable for a dataset consisting of categorical and continuous variables.

The output of HODA is a non-orthogonal tree that combines categorical variables and cut points for continuous variables that yields maximum predictive accuracy, an assessment of the exact Type I error rate, and an evaluation of potential cross-generalizability of the statistical model.

However, in ANOVA and regression analysis the dependent variable is a numerical variable, while in hierarchical optimal discriminant analysis the dependent variable is a class variable.

Classification and regression trees (CART) are a non-parametric decision tree learning technique that produces either classification or regression trees, depending on whether the dependent variable is categorical or numeric, respectively.
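A minimal scikit-learn sketch (its tree implementation is CART-based) showing both flavors on synthetic data:

```python
# CART sketch: the same decision-tree approach yields a classification tree or a
# regression tree depending on whether the target is categorical or numeric.
from sklearn.datasets import make_classification, make_regression
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

Xc, yc = make_classification(n_samples=500, n_features=8, random_state=0)
clf_tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(Xc, yc)

Xr, yr = make_regression(n_samples=500, n_features=8, random_state=0)
reg_tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(Xr, yr)

print(clf_tree.predict(Xc[:3]))   # predicted classes
print(reg_tree.predict(Xr[:3]))   # predicted numeric values
```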

Today, since machine learning includes a number of advanced statistical methods for regression and classification, it finds application in a wide variety of fields including medical diagnostics, credit card fraud detection, face and speech recognition and analysis of the stock market.

The performance of the kNN algorithm is influenced by three main factors: (1) the distance measure used to locate the nearest neighbours, (2) the decision rule used to derive a classification from the k-nearest neighbours, and (3) the number of neighbours used to classify the new sample.

As the size of the training set increases, if the observations are independent and identically distributed (i.i.d.), then regardless of the distribution from which the sample is drawn, the predicted class will converge to the class assignment that minimizes misclassification error.
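A short scikit-learn sketch that makes those three choices explicit (data is synthetic):

```python
# kNN sketch making the three factors explicit: the distance measure,
# the decision rule (uniform vs. distance-weighted voting), and k itself.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=800, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(
    n_neighbors=7,         # (3) number of neighbours used to classify a new sample
    metric="euclidean",    # (1) distance measure used to locate the nearest neighbours
    weights="distance",    # (2) decision rule: closer neighbours get larger votes
).fit(X_tr, y_tr)

print("test accuracy:", knn.score(X_te, y_te))
```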

Occurrences of events are neither uniform nor random in distribution—there are spatial environment factors (infrastructure, sociocultural, topographic, etc.) that constrain and influence where the locations of events occur.

Geospatial predictive modeling attempts to describe those constraints and influences by spatially correlating occurrences of historical geospatial locations with environmental factors that represent those constraints and influences.

As more organizations adopt predictive analytics into decision-making processes and integrate it into their operations, they are creating a shift in the market toward business users as the primary consumers of the information.

Vendors are responding by creating new software that removes the mathematical complexity, provides user-friendly graphic interfaces and/or builds in short cuts that can, for example, recognize the kind of data available and suggest an appropriate predictive model.[33]

In a study of 1072 papers published in Information Systems Research and MIS Quarterly between 1990 and 2006, only 52 empirical papers attempted predictive claims, of which only 7 carried out proper predictive modeling or testing.[43]

R tutorial: Intro to Credit Risk Modeling

Learn more about credit risk modeling with R: Hi, and welcome to the first video of ..

Credit Risk Modeling

Credit Risk Analytics is undoubtedly one of the most crucial activities in the field of financial risk management at the moment. With the recent financial downturn ...

Logistic Regression in R | Machine Learning Algorithms | Data Science Training | Edureka

Data Science Training. This Logistic Regression Tutorial will give you a clear understanding of how a logistic ..

Logistic Regression Modelling using SAS for beginners

Logistic regression is a popular technique for classifying data into categories. It is simple and yet powerful. It is used in credit scoring, ...

Data Mining Techniques to Prevent Credit Card Fraud

Includes a brief introduction to credit card fraud, types of credit card fraud, how fraud is detected, applicable data mining techniques, as well as drawbacks.

Model Validation:Simple ways of validating predictive models

In this video you will learn a number of simple ways of validating predictive models. Model validation is an important step in Analytics. It is important to validate ...
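As a quick, generic illustration of such simple validation steps (not taken from the video), a scikit-learn sketch with a hold-out split and k-fold cross-validation:

```python
# Simple validation sketch: hold-out split plus k-fold cross-validation.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=1000, n_features=12, random_state=0)

# Hold-out validation: fit on one part of the data, score on unseen data.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("hold-out AUC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))

# 5-fold cross-validation: repeat the split to get a more stable estimate.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5, scoring="roc_auc")
print("cross-validated AUC:", scores.mean().round(3))
```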

Technical Course: Decision Trees: Decision Tree Analysis

Decision Tree Tutorial and Introduction by Jigsaw Academy. This is part one of the Decision Tree tutorial from our Foundation Analytics course ...

Kaggle Competition No.1 Solution: Predicting Consumer Credit Default

Contributed by Bernard Ong, Jielei Emma Zhu, Miaozhi Trinity Yu, Nanda Trichy Rajarathinam. They enrolled in the NYC Data Science Academy 12-Week Full ...

AI in Financial Services–Final Mile of a Debit Card Fraud Machine Learning Model

This meetup was recorded on August 21, 2018 in Mountain View, CA. Description: Today, credit and debit card fraud detection machine learning models are a ...

Ajinkya More | Resampling techniques and other strategies

PyData SF 2016, Ajinkya More: Resampling techniques and other strategies for handling highly unbalanced datasets in classification. Many real world machine ...
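As a generic illustration of one such strategy (not taken from the talk), a scikit-learn sketch that up-samples the minority class before fitting:

```python
# Resampling sketch for a highly unbalanced classification problem:
# up-sample the minority class in the training data before fitting.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

X, y = make_classification(n_samples=5000, weights=[0.97, 0.03], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Up-sample minority-class rows (with replacement) to match the majority count.
minority = np.where(y_tr == 1)[0]
majority = np.where(y_tr == 0)[0]
upsampled = resample(minority, replace=True, n_samples=len(majority), random_state=0)
idx = np.concatenate([majority, upsampled])

model = LogisticRegression(max_iter=1000).fit(X_tr[idx], y_tr[idx])
print("minority recall on test:",
      (model.predict(X_te)[y_te == 1] == 1).mean().round(3))
```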