# AI News, Comparing supervised learning algorithms ## Comparing supervised learning algorithms

In the data science course that I instruct, we cover most of the data science pipeline but focus especially on machine learning.

Near the end of this 11-week course, we spend a few hours reviewing the material that has been covered throughout the course, with the hope that students will start to construct mental connections between all of the different things they have learned.

decided to create a game for the students, in which I gave them a blank table listing the supervised learning algorithms we covered and asked them to compare the algorithms across a dozen different dimensions.

realize that the characteristics and relative performance of each algorithm can vary based upon the particulars of the data (and how well it is tuned), and thus some may argue that attempting to construct an 'objective' comparison is an ill-advised task.

## Essentials of Machine Learning Algorithms (with Python and R Codes)

Note: This article was originally published on Aug 10, 2015 and updated on Sept 9th, 2017

Google&#8217;s self-driving cars and robots get a lot of press, but the company&#8217;s real future is in machine learning, the technology that enables computers to get smarter and more personal.

The idea behind creating this guide is to simplify the journey of aspiring data scientists and machine learning enthusiasts across the world.

How it works: This algorithm consist of a target / outcome variable (or dependent variable) which is to be predicted from a given set of predictors (independent variables).

Using these set of variables, we generate a function that map inputs to desired outputs. The training process continues until the model achieves a desired level of accuracy on the training data.

This machine learns from past experience and tries to capture the best possible knowledge to make accurate business decisions.

These algorithms can be applied to almost any data problem: It is used to estimate real values (cost of houses, number of calls, total sales etc.) based on continuous variable(s).

In this equation: These coefficients a and b are derived based on minimizing the sum of squared difference of distance between data points and regression line.

And, Multiple Linear Regression(as the name suggests) is characterized by multiple (more than 1) independent variables. While finding best fit line, you can fit a polynomial or curvilinear regression.

It is a classification not a regression algorithm. It is used to estimate discrete values ( Binary values like 0/1, yes/no, true/false ) based on given set of independent variable(s).

It chooses parameters that maximize the likelihood of observing the sample values rather than that minimize the sum of squared errors (like in ordinary regression).

source: statsexchange In the image above, you can see that population is classified into four different groups based on multiple attributes to identify &#8216;if they will play or not&#8217;.

In this algorithm, we plot each data item as a point in n-dimensional space (where n is number of features you have) with the value of each feature being the value of a particular coordinate.

For example, if we only had two features like Height and Hair length of an individual, we&#8217;d first plot these two variables in two dimensional space where each point has two co-ordinates (these co-ordinates are known as Support Vectors)

In the example shown above, the line which splits the data into two differently classified groups is the black line, since the two closest points are the farthest apart from the line.

It is a classification technique based on Bayes’ theorem with an assumption of independence between predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.

Step 1: Convert the data set to frequency table Step 2: Create Likelihood table by finding the probabilities like Overcast probability = 0.29 and probability of playing is 0.64.

Yes) * P(Yes) / P (Sunny) Here we have P (Sunny |Yes) = 3/9 = 0.33, P(Sunny) = 5/14 = 0.36, P( Yes)= 9/14 = 0.64 Now, P (Yes |

However, it is more widely used in classification problems in the industry. K nearest neighbors is a simple algorithm that stores all available cases and classifies new cases by a majority vote of its k neighbors.

Its procedure follows a simple and easy  way to classify a given data set through a certain number of  clusters (assume k clusters).

We know that as the number of cluster increases, this value keeps on decreasing but if you plot the result you may see that the sum of squared distance decreases sharply up to some value of k, and then much more slowly after that.

grown as follows: For more details on this algorithm, comparing with decision tree and tuning model parameters, I would suggest you to read these articles: Python R Code

For example: E-commerce companies are capturing more details about customer like their demographics, web crawling history, what they like or dislike, purchase history, feedback and many others to give them personalized attention more than your nearest grocery shopkeeper.

How&#8217;d you identify highly significant variable(s) out 1000 or 2000? In such cases, dimensionality reduction algorithm helps us along with various other algorithms like Decision Tree, Random Forest, PCA, Factor Analysis, Identify based on correlation matrix, missing value ratio and others.

GBM is a boosting algorithm used when we deal with plenty of data to make a prediction with high prediction power. Boosting is actually an ensemble of learning algorithms which combines the prediction of several base estimators in order to improve robustness over a single estimator.

The XGBoost has an immensely high predictive power which makes it the best choice for accuracy in events as it possesses both linear model and the tree learning algorithm, making the algorithm almost 10x faster than existing gradient booster techniques.

It is designed to be distributed and efficient with the following advantages: The framework is a fast and high-performance gradient boosting one based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.

Since the LightGBM is based on decision tree algorithms, it splits the tree leaf wise with the best fit whereas other boosting algorithms split the tree depth wise or level wise rather than leaf-wise.

So when growing on the same leaf in Light GBM, the leaf-wise algorithm can reduce more loss than the level-wise algorithm and hence results in much better accuracy which can rarely be achieved by any of the existing boosting algorithms.

Catboost can automatically deal with categorical variables without showing the type conversion error, which helps you to focus on tuning your model better rather than sorting out trivial errors.

My sole intention behind writing this article and providing the codes in R and Python is to get you started right away. If you are keen to master machine learning, start right away.

## Modern Machine Learning Algorithms: Strengths and Weaknesses

In this guide, we&#8217;ll take a practical, concise tour through modern machine learning algorithms.

For example, Scikit-Learn&#8217;s documentation page groups algorithms by their learning mechanism. This produces categories such as: However, from our experience, this isn&#8217;t always the most practical way to group algorithms.

That&#8217;s because for applied machine learning, you&#8217;re usually not thinking, &#8220;boy do I want to train a support vector machine today!&#8221;

Of course, the algorithms you try must be appropriate for your problem, which is where picking the right machine learning task comes in.

As an analogy, if you need to clean your house, you might use a vacuum, a broom, or a mop, but you wouldn't bust out a shovel and start digging.

They are: In Part 2, we will cover dimensionality reduction, including: Two notes before continuing: Regression is the supervised learning task for modeling and predicting continuous, numeric variables. Examples include predicting real-estate prices, stock price movements, or student test scores.

decision trees) learn in a hierarchical fashion by repeatedly splitting your dataset into separate branches that maximize the information gain of each split.

We won't go into their underlying mechanics here, but in practice, RF's often perform very well out-of-the-box while GBM's are harder to tune but tend to have higher performance ceilings.

They use 'hidden layers' between inputs and outputs in order to model intermediary representations of the data that other algorithms cannot easily learn.

However, deep learning still requires much more data to train compared to other algorithms because the models have orders of magnitudes more parameters to estimate.

These algorithms are memory-intensive, perform poorly for high-dimensional data, and require a meaningful distance function to calculate similarity.

Examples include predicting employee churn, email spam, financial fraud, or student letter grades.

Predictions are mapped to be between 0 and 1 through the logistic function, which means that predictions can be interpreted as class probabilities.

The models themselves are still 'linear,' so they work well when your classes are linearly separable (i.e. they can be separated by a single decision surface).

To predict a new observation, you'd simply 'look up' the class probabilities in your 'probability table' based on its feature values.

However, we want to leave you with a few words of advice based on our experience: If you'd like to learn more about the applied machine learning workflow and how to efficiently train professional-grade models, we invite you to sign up for our free 7-day email crash course.

For more over-the-shoulder guidance, we also offer a comprehensive masterclass that further explains the intuition behind many of these algorithms and teaches you how to apply them to real-world problems.

## Machine Learning Crash Course, Part I: Supervised Machine Learning

A few notable quotes include: Addressing each quote in order: With all the nonsense the media uses to describe machine learning (ML) and artificial intelligence (AI), it&#8217;s time we do a deep dive into what these technologies actually do.

Instead, a machine learning program might say something like, &#8220;examine the last 1000 games of checkers I&#8217;ve played and pick the move that maximizes the probability that I will win the game&#8221;.

In this article, we&#8217;ll cover just the first of the three.  Supervised learning algorithms try to find a formula that accurately predicts the output label from input variables.

The table below lists the dollars spent on TV ads and the resulting sales from 200 advertising campaigns.  First, we feed the historical data into our linear regression model.

This produces a mathematical formula that predicts sales based on our input variable, TV ad spending: Sales = 7.03 + 0.047(TV) In the above graph, we have plotted both the historical data points (the black dots) as well as the formula our ML algorithm produces (the red line).

To answer our original question of expected revenue, we can simply plug \$100 in for the variable TV to get, \$11.73 = 7.03 + 0.047(\$100) In other words, after spending 100 dollars on TV advertising, we can expect to generate only \$11.73 in sales, based on past data.

In summary, we used machine learning (specifically, linear regression) to predict how much revenue a TV advertising campaign would generate, based on historical data.

For example, a credit card company might want to predict whether a customer will default in the upcoming 3 months based on their current balance.

From the logistic regression equation, you could check their balance, see that it’s only \$400, and safely conclude they probably won’t default in three months.

Machine Learning - Supervised VS Unsupervised Learning

Enroll in the course for free at: Machine Learning can be an incredibly beneficial tool to uncover hidden insights and predict..

13. Classification

MIT 6.0002 Introduction to Computational Thinking and Data Science, Fall 2016 View the complete course: Instructor: John Guttag Prof. Guttag introduces supervised..

Machine Learning and Predictive Analytics - Analytics Base Table (ABT) - #MachineLearning

Machine Learning and Predictive Analytics. #MachineLearning Learn More: (Fundamentals Of Machine Learning for Predictive Data Analytics). Analytics Base Table (ABT)..

Hello World - Machine Learning Recipes #1

Six lines of Python is all it takes to write your first machine learning program! In this episode, we'll briefly introduce what machine learning is and why it's important. Then, we'll follow...

Introduction to Machine Learning with MATLAB!

Get The MATLAB Course Bundle! Limited FREE course coupons available! Get the complete course for only \$10!.

kNN Machine Learning Algorithm - Excel

kNN, k Nearest Neighbors Machine Learning Algorithm tutorial. Follow this link for an entire Intro course on Machine Learning using R, did I mention it's FREE:

Understanding Decision Tree Algorithm | Edureka

Data Science Training - ) Watch Sample Class Recording:

Machine Learning and Predictive Analytics - Generalization (Algorithms) - #MachineLearning

Machine Learning and Predictive Analytics. #MachineLearning Learn More: (Fundamentals Of Machine Learning for Predictive Data Analytics). Generalization (Algorithms)..

Machine Learning - Hierarchical Clustering

Enroll in the course for free at: Machine Learning can be an incredibly beneficial tool to uncover hidden insights and predict..

Supervised Learning Quiz Quiz Solution - Georgia Tech - Machine Learning