# AI News, How To Implement Machine Learning Algorithm Performance Metrics From Scratch With Python

This article was written by Jason Brownlee. Jason is the editor-in-chief at MachineLearningMastery.com. He has a Master's and PhD in Artificial Intelligence, has published books on Machine Learning, and has written operational code that is running in production.

Knowing how good a set of predictions is allows you to estimate how good a given machine learning model of your problem is. In this tutorial, you will discover how to implement four standard prediction evaluation metrics from scratch in Python.

## How to Use Machine Learning to Predict the Quality of Wines

A scenario where you need to identify benign tumor cells vs malignant tumor cells would be a classification problem.

The job of the machine learning classifier is to use the training data to learn the line, curve, or decision boundary that most efficiently separates the two classes.

Most problems in supervised learning can be solved in three stages, the first of which is to analyze the data and prepare it to be used for training purposes.

Now, before we dive into details of some machine learning algorithms, we need to learn about some possible errors that could happen in classification or regression problems.

An example might be when we’re trying to identify Game of Thrones characters as nobles or peasants based on the characters’ heights and attire.

It would therefore consistently mislabel future objects, for example labeling rainbows as Easter eggs because they are colorful.

With respect to our wine dataset, our machine learning classifier will underfit if it is too “drunk” on only one kind of wine. Another example would be continuous data that is polynomial in nature, fitted with a model that can only represent linear relationships.

If we repeatedly train a model with randomly selected subsets of data, we would expect its predictions to be different based on the specific examples given to it.

For the programmer, the challenge is to use the right type of algorithm which optimally solves the problem, while avoiding high bias or high variance.

So unlike traditional programming, Machine Learning and Deep Learning problems can often involve a lot of trial-and-error based approaches when you’re trying to find the best model.

Naive Bayes can be applied effectively for some classification problems, such as marking emails as spam or not spam, or classifying news articles.

A random forest uses a multitude of different decision trees, each making its own predictions, and combines the results of those individual trees to give the final outcome.

Random forest applies an ensemble algorithm called bagging to the decision trees, which helps reduce variance and overfitting.
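The bagging idea can be sketched in plain Python. This is a minimal sketch under our own assumptions: the base learner is a hypothetical one-feature threshold "stump" rather than a full decision tree, the stumps vote by simple majority, and the random feature sub-sampling that a real random forest adds on top of bagging is omitted. It is not how scikit-learn implements its random forest.

```python
import random
from collections import Counter


def fit_stump(rows):
    """Hypothetical base learner: a single-threshold classifier on one feature.

    rows is a list of (x, label) pairs. Every observed x is tried as a
    threshold, and the one with the fewest training errors is kept.
    """
    best = None
    for threshold, _ in rows:
        left = [label for x, label in rows if x <= threshold]
        right = [label for x, label in rows if x > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        # If nothing lies above the threshold, fall back to the left label
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(
            1 for x, label in rows
            if (left_label if x <= threshold else right_label) != label
        )
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    return best[1:]  # (threshold, left_label, right_label)


def predict_stump(stump, x):
    threshold, left_label, right_label = stump
    return left_label if x <= threshold else right_label


def bagging_fit(rows, n_estimators=25, seed=1):
    """Train one stump per bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_estimators):
        sample = [rng.choice(rows) for _ in range(len(rows))]
        models.append(fit_stump(sample))
    return models


def bagging_predict(models, x):
    """Combine the individual stumps' predictions by majority vote."""
    votes = Counter(predict_stump(model, x) for model in models)
    return votes.most_common(1)[0][0]


rows = [(1, 0), (2, 0), (3, 0), (7, 1), (8, 1), (9, 1)]
models = bagging_fit(rows)
print(bagging_predict(models, 0.5), bagging_predict(models, 10))
```

Each stump sees a slightly different bootstrap sample, so individual stumps can disagree; averaging their votes is what reduces variance.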

All wines with ratings less than 5 will fall under the 0 (poor) category, wines with ratings 5 and 6 will be classified with the value 1 (average), and wines rated 7 and above will be classified as great quality (2).

As you can see, there’s a new column called quality_categorical that classifies the quality ratings based on the range you chose earlier.
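The binning rule above can be written as a small function. This is a plain-Python sketch that assumes integer ratings; the name `quality_category` is our own, and with pandas a function like this could be applied to a rating column using `Series.apply`.

```python
def quality_category(rating):
    """Map an integer wine rating to 0 (poor), 1 (average) or 2 (great)."""
    if rating < 5:
        return 0  # poor: below 5
    if rating <= 6:
        return 1  # average: 5 or 6
    return 2      # great: 7 and above


ratings = [3, 4, 5, 6, 7, 8]
print([quality_category(r) for r in ratings])  # [0, 0, 1, 1, 2, 2]
```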

Next, you’ll create training and testing subsets for the data. To do this, we use the sklearn train_test_split method and give it our features data (X) and the target labels (y).

Finally, you’ll write a function where you’ll initialize any 3 algorithms of your choice, and run the training on each of them using the above function.

The first row shows the performance metrics on the training data, and the second row shows the metrics for the testing data (data which hasn’t been seen before).

There are other metrics that can help you evaluate your models better in certain situations. Precision tells us what proportion of messages classified as spam actually were spam.

It is a ratio of true positives (emails classified as spam, and which are actually spam) to all positives (all emails classified as spam, irrespective of whether that was the correct classification).

Recall tells us what proportion of messages that actually were spam were classified as spam. It is a ratio of true positives (emails classified as spam, and which are actually spam) to all the emails that were actually spam (irrespective of whether we classified them correctly).
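Both ratios can be computed directly from the actual and predicted labels. This is a minimal sketch; the function name and the choice to return 0.0 when a ratio is undefined (no predicted or no actual positives) are our own assumptions.

```python
def precision_recall(actual, predicted, positive="spam"):
    """Return (precision, recall) for the class treated as positive."""
    # True positives: predicted positive and actually positive
    tp = sum(1 for a, p in zip(actual, predicted) if p == positive and a == positive)
    predicted_positives = sum(1 for p in predicted if p == positive)
    actual_positives = sum(1 for a in actual if a == positive)
    # Convention (assumed here): report 0.0 instead of dividing by zero
    precision = tp / predicted_positives if predicted_positives else 0.0
    recall = tp / actual_positives if actual_positives else 0.0
    return precision, recall


actual = ["spam", "spam", "ham", "ham", "spam"]
predicted = ["spam", "ham", "spam", "spam", "spam"]
print(precision_recall(actual, predicted))  # (0.5, 0.6666666666666666)
```

Here 2 of the 4 messages flagged as spam really were spam (precision 0.5), and 2 of the 3 real spam messages were caught (recall about 0.67).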

Plotting the relative rankings of the features shows the five most important features that determine how good a wine is.

Alcohol content and volatile acidity levels seem to be the most influential factors, followed by sulphates, citric acid and fixed acidity levels.

## How To Build a Machine Learning Classifier in Python with Scikit-learn

Machine learning is a research field in computer science, artificial intelligence, and statistics.

Banks use machine learning to detect fraudulent activity in credit card transactions, and healthcare companies are beginning to use machine learning to monitor, assess, and diagnose patients.

Make sure you’re in the directory where your environment is located and activate it. With your programming environment activated, check to see if the Scikit-learn module is already installed; if sklearn is installed, the check will complete with no error.

If it is not installed, you will see an error message indicating that sklearn is not installed, so download the library using pip. Once the installation completes, launch Jupyter Notebook, and in Jupyter, create a new Python Notebook called ML Tutorial.

The dataset has 569 instances, or data on 569 tumors, and includes information on 30 attributes, or features, such as the radius of the tumor, texture, smoothness, and area.

The important dictionary keys to consider are the classification label names (target_names), the actual labels (target), the attribute/feature names (feature_names), and the attributes (data).

Given the label we are trying to predict (malignant versus benign tumor), possible useful attributes include the size, radius, and texture of the tumor.

To get a better understanding of our dataset, let's take a look at our data by printing our class labels, the first data instance's label, our feature names, and the feature values for the first data instance.

As the output shows, our class names are malignant and benign, which are mapped to binary values of 0 and 1, where 0 represents malignant tumors and 1 represents benign tumors.

Then initialize the model with the GaussianNB() function and train it by fitting it to the data using gnb.fit(). After we train the model, we can use it to make predictions on our test set, which we do using the predict() function.

As you see in the Jupyter Notebook output, the predict() function returned an array of 0s and 1s, which represent our predicted values for the tumor class (malignant vs. benign).

Using the array of true class labels, we can evaluate the accuracy of our model's predicted values by comparing the two arrays: test_labels and the predictions.

## How To Implement Machine Learning Algorithm Performance Metrics From Scratch With Python


Performance metrics like classification accuracy and root mean squared error can give you a clear objective idea of how good a set of predictions is, and in turn how good the model is that generated them.

This is important because it allows you to tell the difference between, and select among, different machine learning models and configurations. As such, performance metrics are a required building block in implementing machine learning algorithms from scratch.

This tutorial is divided into 4 parts, which together provide the foundations you need to handle evaluating predictions made by machine learning algorithms.

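Classification accuracy is the number of correct predictions divided by the total number of predictions made. It is often presented as a percentage between 0% for the worst possible accuracy and 100% for the best possible accuracy.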

Because each prediction is simply tested for equality with the corresponding actual value, we can compare integers or strings, two main data types that we may choose to use when loading classification data.
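A from-scratch accuracy function only needs a counter and an equality test. The sketch below is our own minimal version; the name `accuracy_metric` is illustrative.

```python
def accuracy_metric(actual, predicted):
    """Percentage of predictions that exactly match the actual class values."""
    correct = 0
    for a, p in zip(actual, predicted):
        if a == p:  # plain equality works for ints and strings alike
            correct += 1
    return correct / len(actual) * 100.0


actual = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
predicted = [0, 1, 0, 0, 0, 1, 0, 1, 1, 1]
print(accuracy_metric(actual, predicted))       # 80.0
print(accuracy_metric(["a", "b"], ["a", "a"]))  # 50.0
```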

Accuracy is a good metric to use when you have a small number of class values, such as 2, also called a binary classification problem.

Accuracy starts to lose its meaning when you have more class values, and you may need to review a different perspective on the results, such as a confusion matrix.

In a confusion matrix, the counts of actual class values are summarized horizontally, whereas the counts of predictions for each class value are presented vertically.

We can start off by defining the function to calculate the confusion matrix given a list of actual class values and a list of predictions.

It first makes a list of all of the unique class values and assigns each class value a unique integer or index into the confusion matrix.

The confusion matrix is always square, with the number of class values indicating the number of rows and columns required.

After the square confusion matrix is created and initialized to zero counts in each cell, it is a matter of looping through all predictions and incrementing the count in each cell.
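The steps above translate into a short function. In this sketch (our own, not the tutorial's code) rows are indexed by the predicted class and columns by the actual class, matching the layout described earlier; it also includes classes that appear only in the predictions, and it assumes all class values share one sortable type so they can be ordered consistently.

```python
def confusion_matrix(actual, predicted):
    """Return (unique_classes, matrix) where matrix[predicted][actual] is a count."""
    # Assign each class value an index; include predicted-only classes too
    unique = sorted(set(actual) | set(predicted))
    lookup = {value: i for i, value in enumerate(unique)}
    # Square matrix of zeros: one row and one column per class value
    matrix = [[0] * len(unique) for _ in unique]
    for a, p in zip(actual, predicted):
        row = lookup[p]  # predictions run down the side (rows)
        col = lookup[a]  # actual values run across the top (columns)
        matrix[row][col] += 1
    return unique, matrix


actual = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
predicted = [0, 1, 1, 0, 0, 1, 0, 1, 1, 1]
print(confusion_matrix(actual, predicted))  # ([0, 1], [[3, 1], [2, 4]])
```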

The printed matrix is laid out with the expectation that each class label is a single character or single-digit integer and that the counts are also single-digit integers.
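One possible text layout is sketched below, under the same single-character assumption: actual values run across the top after an `(A)` marker, and predictions run down the side after a `(P)` marker. The function name and exact markers are our own choices.

```python
def format_confusion_matrix(unique, matrix):
    """Render the matrix as text: actual values across, predictions down."""
    lines = ["(A)" + " ".join(str(value) for value in unique), "(P)---"]
    for value, row in zip(unique, matrix):
        lines.append(f"{value}| " + " ".join(str(count) for count in row))
    return "\n".join(lines)


print(format_confusion_matrix([0, 1], [[3, 1], [2, 4]]))
# (A)0 1
# (P)---
# 0| 3 1
# 1| 2 4
```

Multi-character labels or counts would push the columns out of alignment, which is why the layout relies on that expectation.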

Looking down the diagonal of the matrix from the top left to bottom right, we can see that 3 predictions of 0 were correct and 4 predictions of 1 were correct.

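Root mean squared error (RMSE) is the square root of the mean of the squared differences between predicted and actual values. Squaring each error forces the values to be non-negative, and the square root of the mean squared error returns the error metric back to the original units for comparison.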

It uses the sqrt() function from the math module and uses the ** operator to raise the error to the 2nd power.
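A minimal version of that calculation, using `math.sqrt` and the `**` operator as described (the function name is our own):

```python
from math import sqrt


def rmse_metric(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    sum_error = 0.0
    for a, p in zip(actual, predicted):
        prediction_error = p - a
        sum_error += prediction_error ** 2  # squaring makes every term non-negative
    mean_error = sum_error / len(actual)
    return sqrt(mean_error)  # back to the units of the output variable


print(rmse_metric([1, 2, 3, 4], [1, 2, 3, 8]))  # 2.0
```

A single error of 4 among four predictions gives an RMSE of 2.0, showing how squaring lets large errors dominate the score.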

## Machine Learning with Python: A Tutorial

Machine learning is a field that uses algorithms to learn from data and make predictions.

Not only is machine learning interesting, it's also starting to be widely used, making it an extremely practical skill to learn.


The first step in our exploration is to read in the data and print some quick summary statistics.

Pandas provides data structures and data analysis tools that make manipulating data in Python much quicker and more effective.

Our data file is in a format called csv, or comma-separated values.

Each row of the data is a different board game, and different data points about each board game are separated by commas within the row.

We can easily conceptualize a csv file as a matrix, with one row per game and one column per data point about that game (some columns were removed for display purposes).

The first row starts with id, the second row starts with 12333, and the third row starts with 120677.

This means that we can't effectively store our board game data in a matrix -- the name column contains strings, and the yearpublished column contains integers, which means that we can't store them both in the same matrix.

We can also see the shape of the data, which shows that it has 81312 rows, or games, and 20 columns, or data points describing each game.

There's a fairly normal distribution of ratings, with some right skew, and a mean rating around 6 (if you remove the zeros).

As an example, filtering games on a boolean condition creates a new dataframe containing only the rows where the value of the average_rating column equals 0.

We can also pass in multiple index values at once — games.iloc[0,0] will return the first column in the first row of games.

This shows us that the main difference between a game with a 0 rating and a game with a rating above 0 is that the 0 rated game has no reviews.

Clustering enables you to find patterns within your data easily by grouping similar rows (in this case, games) together.

Scikit-learn is the primary machine learning library in Python, and contains implementations of most common algorithms, including random forests, support vector machines, and logistic regression.

In order to use the clustering algorithm in Scikit-learn, we'll first initialize it using two parameters: n_clusters defines how many clusters of games we want, and random_state is a random seed we set in order to reproduce our results later.
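Scikit-learn's KMeans does this work for you. To illustrate what n_clusters and random_state control, here is a from-scratch sketch of the standard k-means (Lloyd's) algorithm. It is our own simplification, not scikit-learn's implementation: it starts from randomly chosen rows instead of scikit-learn's default k-means++ initialization, and it runs a single initialization.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, n_clusters, random_state=None, max_iter=100):
    """Lloyd's k-means on a list of numeric tuples; returns (labels, centroids)."""
    rng = random.Random(random_state)  # seed makes the result reproducible
    centroids = rng.sample(points, n_clusters)  # start from randomly chosen rows
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        labels = [
            min(range(n_clusters), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members
        new_centroids = []
        for k in range(n_clusters):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster where it was
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return labels, centroids


labels, centroids = kmeans([(0.0,), (1.0,), (10.0,), (11.0,)], n_clusters=2, random_state=1)
print(labels, centroids)
```

Which integer each cluster receives depends on the random initialization, which is exactly why fixing random_state matters for reproducing results.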

Most machine learning algorithms can't directly operate on text data, and can only take numbers as input.

One sticking point is that our data has many columns -- it's outside of the realm of human understanding and physics to be able to visualize things in more than 3 dimensions.

PCA takes multiple columns, and turns them into fewer columns while trying to preserve the unique information in each column.

Generally, when we're doing regression, and predicting continuous variables, we'll need a different error metric than when we're performing classification, and predicting discrete values.

ids are presumably assigned when the game is added to the database, so this likely indicates that games created later score higher in the ratings.

Using columns that can only be computed with knowledge of the target can lead to overfitting, where your model is good in a training set, but doesn't generalize well to future data.

If you memorize that 1+1=2, and 2+2=4, you'll be able to perfectly answer any questions about 1+1 and 2+2.

However, the second anyone asks you something outside of your training set, like 3+3, you won't be able to solve it.

On the other hand, if you're able to generalize and learn addition, you'll make occasional mistakes because you haven't memorized the solutions -- maybe you'll get 3453 + 353535 off by one, but you'll be able to solve any addition problem thrown at you.

In order to prevent overfitting, we'll train our algorithm on a set consisting of 80% of the data, and test it on another set consisting of 20% of the data.

To do this, we first randomly sample 80% of the rows to be in the training set, then put everything else in the testing set.

Above, we exploit the fact that every Pandas row has a unique index to select any row not in the training set to be in the testing set.
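The same index-based idea works on a plain list of rows. This is a minimal sketch with a hypothetical helper name; with pandas, the equivalent is usually written with `DataFrame.sample` for the training rows and `Index.isin` to select everything else.

```python
import random


def train_test_split_rows(rows, train_fraction=0.8, seed=1):
    """Randomly put train_fraction of the rows in a training set, the rest in a test set."""
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    n_train = int(len(rows) * train_fraction)
    train_index = set(rng.sample(range(len(rows)), n_train))
    # Any row whose index was not sampled goes into the testing set
    train = [row for i, row in enumerate(rows) if i in train_index]
    test = [row for i, row in enumerate(rows) if i not in train_index]
    return train, test


train, test = train_test_split_rows(list(range(100)))
print(len(train), len(test))  # 80 20
```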

This new data has to be in the exact same format as the training data, or the model won't make accurate predictions.

The random forest algorithm can find nonlinearities in data that a linear regression wouldn't be able to pick up on.


## Metrics To Evaluate Machine Learning Algorithms in Python

The metrics that you choose to evaluate your machine learning algorithms are very important.

Several machine learning evaluation metrics are demonstrated in this post using small code recipes in Python and scikit-learn.

A 10-fold cross-validation test harness is used to demonstrate each metric, because this is the most likely scenario where you will be employing different algorithm evaluation metrics.

A caveat in these recipes is the cross_val_score() function used to report the performance in each recipe. It does allow the use of the different scoring metrics that will be discussed, but all scores are reported so that they can be sorted in ascending order (the largest score is best).

Some evaluation metrics (like mean squared error) are naturally descending scores (the smallest score is best) and as such are reported as negative by the cross_val_score() function.

You can learn more about machine learning algorithm performance metrics supported by scikit-learn on the page Model evaluation: quantifying the quality of predictions.

Classification problems are perhaps the most common type of machine learning problem and as such there are a myriad of metrics that can be used to evaluate predictions for these problems.

Classification accuracy is really only suitable when there is an equal number of observations in each class (which is rarely the case) and when all predictions and prediction errors are equally important, which is often not the case.

This can be converted into a percentage by multiplying the value by 100, giving an accuracy of approximately 77%.

Predictions for 0 that were actually 0 appear in the cell for prediction=0 and actual=0, whereas predictions for 0 that were actually 1 appear in the cell for prediction=0 and actual=1.

Scikit-learn does provide a convenience report when working on classification problems to give you a quick idea of the accuracy of a model using a number of measures.

In this section we will review 3 of the most common metrics for evaluating predictions on regression machine learning problems. The first, the Mean Absolute Error (or MAE), is the average of the absolute differences between predictions and actual values.
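The definition translates directly into code. This sketch is our own plain-Python version; scikit-learn's `sklearn.metrics.mean_absolute_error` computes the same quantity.

```python
def mae_metric(actual, predicted):
    """Mean absolute error: the average of |predicted - actual|."""
    total = sum(abs(p - a) for a, p in zip(actual, predicted))
    return total / len(actual)


print(mae_metric([1, 2, 3, 4], [1, 2, 3, 8]))  # 1.0
```

Because the errors are not squared, the single error of 4 is simply averaged over the four predictions.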

Taking the square root of the mean squared error converts the units back to the original units of the output variable and can be meaningful for description and presentation.

You learned about 3 classification metrics, 2 convenience methods for classification prediction results, and 3 regression metrics. Do you have any questions about metrics for evaluating machine learning algorithms or this post?
