Machine Learning 2

This content was created from the Coursera course on introduction to machine learning.

This notebook contains some theory, but most of the work here is about how to use the caret package to apply machine learning. More technical discussion is left for elsewhere.

The caret package is a great unified framework for applying all sorts of different machine learning algorithms from different developers.

The argument p specifies the proportion of the data that goes into the first chunk when splitting. In this case we split into two chunks of 75% and 25%.
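As a rough illustration of the 75%/25% split, here is a minimal Python sketch using only the standard library. It is not caret's implementation: the function name split_indices, the fixed seed, and the example labels are all assumptions made for this example. It samples the fraction p within each outcome class so that both chunks keep a similar class balance.

```python
import random
from collections import defaultdict


def split_indices(y, p=0.75, seed=0):
    """Return (train_idx, test_idx), sampling a fraction p within each class of y."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)

    train = []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(p * len(indices))
        train.extend(indices[:n_train])

    train_set = set(train)
    test = [i for i in range(len(y)) if i not in train_set]
    return sorted(train), test


# Hypothetical outcome vector: 60 "spam" and 40 "nonspam" labels.
y = ["spam"] * 60 + ["nonspam"] * 40
train_idx, test_idx = split_indices(y, p=0.75)
```

With these hypothetical labels the training chunk holds 75 observations and the testing chunk holds the remaining 25, with no index appearing in both.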

K-fold cross-validation and other split methods

Split the data into k folds based on the outcome we want to split on (y).
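The fold idea can be sketched in plain Python as follows. The helper names k_fold_indices and train_test_pairs are hypothetical, and this simplified version assigns folds at random without looking at the outcome; a stratified version would shuffle within each class of y first.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle range(n) and deal the indices into k folds of near-equal size."""
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    return [sorted(indices[i::k]) for i in range(k)]


def train_test_pairs(folds):
    """Yield (train, test) index lists, holding out one fold at a time."""
    for held_out, test in enumerate(folds):
        train = sorted(
            i for j, fold in enumerate(folds) if j != held_out for i in fold
        )
        yield train, test


# Hypothetical data set of 20 observations split into 5 folds.
folds = k_fold_indices(n=20, k=5)
pairs = list(train_test_pairs(folds))
```

Each observation lands in exactly one fold, so every observation is used for testing once and for training k - 1 times.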

Further resources: caret tutorial

Usually it is best to do some exploratory plotting to get a sense of the basic structure of your data, and of any quirks or anomalies.

Again, we will split the data into testing and training data, and we will only use the testing data for testing the final model. All exploration and modeling should be done on the training data only.

Here we see that the variable is highly skewed. Skew, especially severe skew, is likely to trip up many machine learning techniques, since a few extreme values inflate the variance.

Note that when we standardize the test set for our final model, we must subtract the mean and divide by the standard deviation computed from the training set.
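A minimal Python sketch of this rule, assuming a single numeric predictor. The function names fit_scaler and standardize and the example values are hypothetical. The point is that the centering and scaling constants are learned once, on the training data, and then reused unchanged on the test data.

```python
from statistics import mean, stdev


def fit_scaler(train_values):
    """Learn the centering and scaling constants from the training data only."""
    return mean(train_values), stdev(train_values)


def standardize(values, center, scale):
    """Apply (x - center) / scale using the constants learned on training data."""
    return [(x - center) / scale for x in values]


# Hypothetical values for one predictor.
train_x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
test_x = [3.0, 10.0]

center, scale = fit_scaler(train_x)
train_z = standardize(train_x, center, scale)
test_z = standardize(test_x, center, scale)  # uses the training mean and sd
```

The standardized training values have mean 0 and standard deviation 1 by construction. The standardized test values generally will not, and that is expected.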

This algorithm finds the closest observations across all the other variables and then takes the average, over those observations, of the variable that is being imputed.
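The imputation step described above can be sketched in plain Python. This is an illustration under stated assumptions, not the routine the notebook calls: the function name knn_impute, the choice of Euclidean distance, k = 3, and the example rows are all hypothetical, and the sketch assumes that only the target column has missing values.

```python
import math


def knn_impute(rows, target_col, k=3):
    """Fill None in target_col with the mean of that column over the k nearest
    complete rows, where distance is Euclidean over all other columns."""
    complete = [r for r in rows if r[target_col] is not None]
    other_cols = [c for c in range(len(rows[0])) if c != target_col]

    def distance(a, b):
        return math.sqrt(sum((a[c] - b[c]) ** 2 for c in other_cols))

    imputed = []
    for row in rows:
        if row[target_col] is None:
            neighbors = sorted(complete, key=lambda r: distance(row, r))[:k]
            fill = sum(r[target_col] for r in neighbors) / len(neighbors)
            row = row[:target_col] + [fill] + row[target_col + 1:]
        imputed.append(row)
    return imputed


# Hypothetical data: three columns, one missing value in the last column.
rows = [
    [1.0, 1.0, 10.0],
    [1.1, 0.9, 12.0],
    [0.9, 1.2, 11.0],
    [5.0, 5.0, 50.0],
    [1.0, 1.1, None],
]
filled = knn_impute(rows, target_col=2, k=3)
```

In practice the predictors would usually be standardized first so that no single variable dominates the distance.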

Creating dummy variables

Notice that the values of these two variables are qualitative, so it makes more sense to encode them as dummy (indicator) variables.
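As a small illustration of the encoding, the Python sketch below turns one qualitative variable into 0/1 indicator columns. The function name dummy_encode, the variable name job_type, and its levels are hypothetical and are not taken from the notebook's data.

```python
def dummy_encode(values):
    """Return (levels, rows): one 0/1 indicator column per distinct level."""
    levels = sorted(set(values))
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


# Hypothetical qualitative variable with two levels.
job_type = ["industrial", "information", "information", "industrial"]
levels, indicators = dummy_encode(job_type)
```

Each row has exactly one 1, in the column of its level. For a linear model with an intercept, one of the columns is usually dropped because it is redundant.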

Non-linear relationships

Once we have the variables, it may still be necessary to transform them to make them more representative of the features we are trying to capture with the data.

For example, age may be an important variable in our model, but age^2 or age^3 may also serve as useful predictors, depending on the application.

In the output, the first column represents the standardized age, the second column the quadratic (squared) age, and the third column the cubic age.
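The three-column layout described above can be imitated with a short Python sketch. How the notebook actually generated its columns is not shown here, so using plain powers of the standardized age is an assumption, as are the function name polynomial_features and the example ages.

```python
from statistics import mean, stdev


def polynomial_features(values, degree=3):
    """Standardize values, then return rows [z, z**2, ..., z**degree]."""
    center, scale = mean(values), stdev(values)
    z_scores = [(v - center) / scale for v in values]
    return [[z ** d for d in range(1, degree + 1)] for z in z_scores]


# Hypothetical ages.
ages = [20, 30, 40, 50, 60]
features = polynomial_features(ages)
```

As with standardization in general, the center and scale used to build these columns for a test set should be the ones computed on the training set.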

We see here that these two variables are highly correlated. They correspond to the number of times that the numbers 415 and 857 occur in the messages.

Including both of these predictors is not very useful, but what if we could combine them into a single variable with higher predictive power than either variable alone?

From the rotation we can see that the best principal component is the sum of the two variables, each multiplied by some coefficient, and the second best is their difference.
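To see why the components come out as a sum and a difference, here is a minimal Python sketch of principal components for two variables, using the closed-form eigendecomposition of the 2x2 covariance matrix. The function name pca_2d and the example data are hypothetical, and the sketch assumes the two variables have non-zero covariance.

```python
import math


def pca_2d(x, y):
    """Principal components of two variables via the 2x2 covariance matrix.

    Returns [(variance, (loading_x, loading_y)), ...], largest variance first.
    Assumes the covariance between x and y is non-zero.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / (n - 1)
    syy = sum((b - my) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

    # Closed-form eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2
    radius = math.hypot((sxx - syy) / 2, sxy)
    components = []
    for eigenvalue in (half_trace + radius, half_trace - radius):
        # (sxy, eigenvalue - sxx) is an eigenvector whenever sxy != 0.
        vx, vy = sxy, eigenvalue - sxx
        norm = math.hypot(vx, vy)
        components.append((eigenvalue, (vx / norm, vy / norm)))
    return components


# Hypothetical pair of highly correlated variables.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.1, 0.9, 2.1, 2.9, 4.0]
(var1, load1), (var2, load2) = pca_2d(x, y)
```

For this data the first component's loadings have the same sign and similar size (roughly a sum), the second component's loadings have opposite signs (roughly a difference), and almost all of the variance sits in the first component.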
