AI News, Machine Learning 2

This content was created from Coursera's introductory machine learning course.

This notebook contains some theory, but most of the work here is about how to use the caret package for applying machine learning; we will progress into more technical discussion elsewhere.

The caret package provides a unified framework for applying all sorts of machine learning algorithms from different developers.

The p argument specifies the proportion of data that goes into the training chunk after splitting; in this case we split into two chunks of 75% and 25%.
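The text describes caret's createDataPartition in R; a rough scikit-learn analog (with made-up toy data, since the original code isn't shown) looks like this:

```python
from sklearn.model_selection import train_test_split

# Toy data: 8 observations, one feature, binary outcome.
X = [[1], [2], [3], [4], [5], [6], [7], [8]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

# train_size=0.75 mirrors caret's p = 0.75; stratify=y keeps the class
# balance similar in both chunks, as createDataPartition does by default.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.75, stratify=y, random_state=42)

print(len(X_train), len(X_test))  # 6 and 2
```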

K-fold cross-validation and other split methods: split the data into k folds, stratifying on the outcome we want to predict (y).
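A minimal scikit-learn sketch of a stratified k-fold split (toy data assumed; the course itself uses caret's createFolds in R):

```python
from sklearn.model_selection import StratifiedKFold

X = [[i] for i in range(10)]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# Split into k = 5 folds, stratifying on the outcome y so each fold
# preserves the class proportions of the full data set.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(skf.split(X, y)):
    print(f"fold {fold}: train={len(train_idx)} test={len(test_idx)}")
```

Each of the five folds holds out 2 observations (one per class) and trains on the remaining 8.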

Further resources: the caret tutorial. Usually it is best to do some exploratory plotting to get a sense of the basic structure of your data and any quirks or anomalies.

Again, we will split the data into testing and training sets, and we will only use the test data for evaluating the final model; all exploration and modeling should be done on the training data only.

Here we see that the variable is highly skewed. Skew, especially severe skew, is likely to trip up many machine learning techniques, since the variance is inflated by a few extreme values.

Note that when we standardize the test set for our final model, we must subtract the mean and divide by the standard deviation computed from the training set.
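A short scikit-learn sketch of this rule (toy numbers assumed): the scaler learns its statistics from the training data only and then applies those same statistics to the test set.

```python
from sklearn.preprocessing import StandardScaler

X_train = [[1.0], [2.0], [3.0], [4.0]]
X_test = [[2.5], [10.0]]

scaler = StandardScaler()
scaler.fit(X_train)                     # learn mean and sd from training data only
X_test_std = scaler.transform(X_test)   # apply the *training* mean/sd to the test set

# 2.5 equals the training mean, so it maps to exactly 0;
# 10.0 is far outside the training range, so it maps to a large z-score.
print(X_test_std)
```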

This algorithm finds the closest observations across all the variables and then takes the average of the variable being imputed across those neighbors.
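The course uses caret's knnImpute preprocessing in R; scikit-learn's KNNImputer does the same kind of nearest-neighbor averaging (toy data assumed):

```python
import numpy as np
from sklearn.impute import KNNImputer

X = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, np.nan],   # missing value to fill in
              [4.0, 8.0]])

# For each missing entry, find the k nearest rows (measured on the
# observed columns) and average their values for the missing column.
imputer = KNNImputer(n_neighbors=2)
X_filled = imputer.fit_transform(X)

# Row 2's nearest neighbors (by column 0) are rows 1 and 3,
# so the missing value becomes (4 + 8) / 2 = 6.
print(X_filled)
```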

Creating dummy variables: notice that the values of these two variables are qualitative, so it makes more sense to encode them as dummy (indicator) variables.
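In R this is caret's dummyVars; a pandas equivalent with hypothetical columns (the actual variable names aren't shown in the excerpt) is:

```python
import pandas as pd

# Hypothetical qualitative variables standing in for the ones in the text.
df = pd.DataFrame({"fuel": ["gas", "diesel", "gas"],
                   "doors": ["two", "four", "four"]})

# One 0/1 indicator column per category level, replacing the strings.
dummies = pd.get_dummies(df)
print(list(dummies.columns))
```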

Non-linear relationships Once we have the variables, it still may be necessary to transform these data to make them more representative of the features we are trying to capture with the data.

For example, age may be an important variable in our model, but age^2 or age^3 may also serve as useful predictors, depending on the application.

The first column of the output represents standardized age, the second quadratic age, and the third cubic age.
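The text describes R's poly(), which returns orthogonalized polynomial columns; a plain Python sketch of the underlying idea (raw, non-orthogonal terms, with made-up ages) is:

```python
import numpy as np

age = np.array([20.0, 30.0, 40.0, 50.0, 60.0])

# Raw polynomial terms: age, age^2, age^3. R's poly() would return
# orthogonal (decorrelated) versions of these same three columns.
poly = np.column_stack([age, age**2, age**3])
print(poly.shape)  # one row per observation, one column per power
```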

We see here that these two variables are highly correlated; they correspond to the number of times the numbers 415 and 857 occur in the messages.

Including both of these predictors is not very useful, but what if we could combine them into a single variable with higher predictive power than either variable alone?

With the rotation we can see that the best principal component combination is a weighted sum of the two variables, and the second best combination is taking their difference.
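The course uses R's prcomp; a scikit-learn PCA sketch on simulated correlated data (assumed here, since the spam data isn't included) shows the same sum/difference pattern in the loadings:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)   # highly correlated with x1
X = np.column_stack([x1, x2])

pca = PCA(n_components=2)
pca.fit(X)

# components_ plays the role of prcomp's "rotation": the first row is
# roughly (0.71, 0.71), a sum of the two variables; the second is
# roughly (0.71, -0.71), their difference.
print(pca.components_)
print(pca.explained_variance_ratio_)
```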

How to Make Predictions with scikit-learn

Once you choose and fit a final machine learning model in scikit-learn, you can use it to make predictions on new data instances.

In this tutorial, you will discover exactly how you can make classification and regression predictions with a finalized machine learning model in the scikit-learn Python library.

You can learn more about how to train a final model here. Classification problems are those where the model learns a mapping between input features and an output feature that is a label, such as “spam”.

A class prediction: given the finalized model and one or more data instances, predict the class for each instance.

We can predict the class for new data instances using our finalized classification model in scikit-learn using the predict() function.

Running the example predicts the class for the three new data instances, then prints the data and the predictions together.
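A self-contained version of that pattern (the model and the three new instances here are made up for illustration, not the tutorial's own data):

```python
from sklearn.linear_model import LogisticRegression

# Fit a small "finalized" model on toy training data.
X_train = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
y_train = [0, 0, 1, 1]
model = LogisticRegression().fit(X_train, y_train)

# Predict the class for three new data instances with predict().
X_new = [[0.0, 0.1], [0.5, 0.5], [0.9, 0.9]]
preds = model.predict(X_new)
for x, label in zip(X_new, preds):
    print(x, "->", label)
```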

This is called a probability prediction where given a new instance, the model returns the probability for each outcome class as a value between 0 and 1.

You can make these types of predictions in scikit-learn by calling the predict_proba() function. This function is only available on classification models capable of making a probability prediction, which is most, but not all, of them.

Running the example makes the probability predictions and then prints each input data instance alongside its probability of belonging to the first and second classes (0 and 1).
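A runnable sketch of predict_proba() with assumed toy data:

```python
from sklearn.linear_model import LogisticRegression

X_train = [[0.0], [0.2], [0.8], [1.0]]
y_train = [0, 0, 1, 1]
model = LogisticRegression().fit(X_train, y_train)

# For each new instance, one probability per class, each between 0 and 1.
probs = model.predict_proba([[0.1], [0.9]])
for row in probs:
    print(f"P(class 0) = {row[0]:.3f}, P(class 1) = {row[1]:.3f}")
```

The two probabilities in each row always sum to 1.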

Regression is a supervised learning problem where, given input examples, the model learns a mapping to suitable output quantities, such as “0.1”

The same function can be used to make a prediction for a single data instance as long as it is suitably wrapped in a surrounding list or array.
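A minimal regression example of that wrapping rule (toy y = 2x data assumed):

```python
from sklearn.linear_model import LinearRegression

# Fit on a few points that follow y = 2x exactly.
X_train = [[1.0], [2.0], [3.0]]
y_train = [2.0, 4.0, 6.0]
model = LinearRegression().fit(X_train, y_train)

# A single data instance must still be wrapped in a 2-D list/array:
# [[4.0]], not [4.0], because predict() expects rows of features.
yhat = model.predict([[4.0]])
print(yhat[0])  # approximately 8.0
```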

How to Use Machine Learning to Predict the Quality of Wines

A scenario where you need to identify benign tumor cells vs malignant tumor cells would be a classification problem.

So the job of the machine learning classifier would be to use the training data to learn, and find this line, curve, or decision boundary that most efficiently separates the two classes.

Most problems in supervised learning can be solved in three stages. The first step is to analyze the data and prepare it to be used for training.

Now, before we dive into details of some machine learning algorithms, we need to learn about some possible errors that could happen in classification or regression problems.

An example might be when we’re trying to identify Game of Thrones characters as nobles or peasants, based on the heights and attires of the character.

It would therefore consistently mislabel future objects — for example, labeling rainbows as Easter eggs because they are colorful.

With respect to our wine data-set, our machine learning classifier will underfit if it is too “drunk” with only one kind of wine. :P Another example would be continuous data that is polynomial in nature, with a model that can only represent linear relationships.

If we repeatedly train a model with randomly selected subsets of data, we would expect its predictions to be different based on the specific examples given to it.

For the programmer, the challenge is to use the right type of algorithm which optimally solves the problem, while avoiding high bias or high variance.

So unlike traditional programming, Machine Learning and Deep Learning problems can often involve a lot of trial-and-error when you’re trying to find the best model.

Naive Bayes can be applied effectively for some classification problems, such as marking emails as spam or not spam, or classifying news articles.

Random forests work by using multiple decision trees: a multitude of different trees, each making its own prediction, whose results the forest combines to give the final outcome.

Random forest applies an ensemble algorithm called bagging to the decision trees, which helps reduce variance and overfitting.
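A small scikit-learn sketch of that bagging behavior (generated toy data, not the wine data set):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=1)

# Each of the 100 trees is trained on a bootstrap sample of the data
# (bagging); the forest averages their votes for the final prediction,
# which reduces the variance of any single overfit tree.
forest = RandomForestClassifier(n_estimators=100, bootstrap=True,
                                random_state=1).fit(X, y)
print(len(forest.estimators_))  # 100 individual decision trees
```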

All wines with ratings less than 5 will fall under the 0 (poor) category, wines with ratings of 5 and 6 will be classified as 1 (average), and wines rated 7 and above will be classified as great quality (2).

As you can see, there’s a new column called quality_categorical that classifies the quality ratings based on the range you chose earlier.
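One way to create such a column, sketched with pandas on a few assumed quality ratings (the article's exact code isn't shown):

```python
import pandas as pd

df = pd.DataFrame({"quality": [3, 4, 5, 6, 7, 8]})

# Bin the ratings: < 5 -> 0 (poor), 5-6 -> 1 (average), >= 7 -> 2 (great).
df["quality_categorical"] = pd.cut(df["quality"],
                                   bins=[0, 4.5, 6.5, 10],
                                   labels=[0, 1, 2])
print(df)
```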

Next, you’ll create training and testing subsets for the data. In the above code cell, we use the sklearn train_test_split method and give it our feature data (X) and the target labels (y).

Finally, you’ll write a function where you’ll initialize any 3 algorithms of your choice, and run the training on each of them using the above function.

The first row shows the performance metrics on the training data, and the second row shows the metrics for the testing data (data which hasn’t been seen before).
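A compact sketch of that pattern: initialize three classifiers of your choice, train each, and record metrics on both the seen training data and the unseen test data (generated toy data assumed):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

results = {}
for clf in (GaussianNB(), DecisionTreeClassifier(random_state=0),
            RandomForestClassifier(random_state=0)):
    clf.fit(X_train, y_train)
    # First number: accuracy on the training data; second: on held-out test data.
    results[type(clf).__name__] = (clf.score(X_train, y_train),
                                   clf.score(X_test, y_test))
print(results)
```

A gap between the two numbers (perfect on training, worse on test) is the overfitting signature discussed above.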

There are other metrics that can help you evaluate your models better in certain situations: Precision tells us what proportion of messages classified as spam actually were spam.

It is a ratio of true positives (emails classified as spam, and which are actually spam) to all positives (all emails classified as spam, irrespective of whether that was the correct classification).

Recall, by contrast, is a ratio of true positives (emails classified as spam, and which are actually spam) to all the emails that were actually spam (irrespective of whether we classified them correctly).

This will give you the following relative rankings: as you can clearly see, the graph shows the five most important features that determine how good a wine is.

Alcohol content and volatile acidity levels seem to be the most influential factors, followed by sulphates, citric acid and fixed acidity levels.
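Rankings like these come from the fitted forest's feature_importances_ attribute; a minimal sketch with simulated data (one informative column, one noise column — not the wine features themselves):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Hypothetical features: column 0 drives the label, column 1 is pure noise.
X = rng.normal(size=(300, 2))
y = (X[:, 0] > 0).astype(int)

forest = RandomForestClassifier(random_state=0).fit(X, y)

# Importances sum to 1; the informative column should dominate,
# just as alcohol and volatile acidity do in the wine model.
print(forest.feature_importances_)
```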

Intro to Azure ML: Splitting & Categorical Casting

Before we can feed this dataset into a machine learning model there are two things we have to take care of. First we have to make sure all the categorical ...

R - kNN - k nearest neighbor (part 1)

In this module we introduce the kNN k nearest neighbor model in R using the famous iris data set. We also introduce random number generation, splitting the ...

Basic Excel Business Analytics #56: Forecasting with Linear Regression: Trend & Seasonal Pattern

Download file from “Highline BI 348 Class” section: Learn: 1) (00:11) Forecasting using Regression when we ..

Probability Theory - The Math of Intelligence #6

We'll build a Spam Detector using a machine learning model called a Naive Bayes Classifier! This is our first real dip into probability theory in the series; I'll talk ...

Machine Learning: Inference for High-Dimensional Regression

At the Becker Friedman Institute's machine learning conference, Larry Wasserman of Carnegie Mellon University discusses the differences between machine ...

R 2.3 - if() Statements, Logical Operators, and the which() Function

if-else statements are a key component to any programming language. This video introduces how to effectively use these statements in R and also clarifies some ...

What is a Directed Acyclic Graph (DAG)? IOTA, Byteball, SPECTRE reviewed

In this video, I explain the concept of a Directed Acyclic Graph (DAG) vs. traditional blockchain data structures. I cover some popular projects using DAGs such ...

Weka Tutorial 10: Feature Selection with Filter (Data Dimensionality)

This tutorial shows how to select features from a set of features that performs best with a classification algorithm using filter method.

How to Use Tensorflow for Classification (LIVE)

In this live session I'll introduce & give an overview of Google's Deep Learning library, Tensorflow. Then we'll use it to build a neural network capable of ...

Calculating with hours, minutes, and time of day | Excel Tips | lynda.com

Learn how to use Microsoft Excel to calculate differences not just between dates, but times: hours, minutes, and seconds. Watch more Excel tutorials at ...