Open Machine Learning Course. Topic 3. Classification, Decision Trees and k Nearest Neighbors

Before we dive into the material for this week’s article, let’s talk about the kind of problem that we are going to solve and its place in the exciting field of machine learning.

Mitchell’s book Machine Learning (1997) gives a classic, general definition of machine learning as follows: a computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E. In different problem settings, T, P, and E can refer to completely different things.

Here, the experience E is the available training data: a set of instances (clients), a collection of features (such as age, salary, type of loan, past loan defaults, etc.) for each, and a target variable (whether they defaulted on the loan).

In terms of machine learning, one can see this as a simple classifier that determines the appropriate form of publication (book, article, book chapter, preprint, publication in the “Higher School of Economics and the Media”) based on the content (book, pamphlet, paper), the type of journal, the original publication type (scientific journal, proceedings), and so on.

For example, before the introduction of scalable machine learning algorithms, the credit scoring task in the banking sector was solved by experts.

For example, using the above scheme, the bank can explain to the client why they were denied a loan: e.g., the client does not own a house, and their income is less than 5,000.
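To make this concrete, here is a minimal sketch of how such a human-readable explanation could be produced with scikit-learn’s export_text. The tiny synthetic dataset, the feature names, and the income cutoff are illustrative assumptions, not data from the article.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical training set: [owns_house (0/1), monthly income]
X = np.array([[0, 3000], [0, 4500], [1, 4000],
              [1, 7000], [0, 8000], [1, 9000]])
y = np.array([0, 0, 1, 1, 1, 1])  # 1 = loan approved, 0 = denied

tree = DecisionTreeClassifier(max_depth=2, random_state=17).fit(X, y)

# Print the learned rules as nested if/else conditions
print(export_text(tree, feature_names=["owns_house", "income"]))
```

The printed rules read exactly like the bank’s explanation above: a short chain of if/else conditions on the client’s attributes.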

As we’ll see later, many other models, although more accurate, do not have this property and can be regarded as more of a “black box” approach, where it is harder to interpret how the input data was transformed into the output.

Due to this “understandability” and similarity to human decision-making (you can easily explain your model to your boss), decision trees have gained immense popularity.

That is to say, the “gender” feature separates the celebrity dataset much better than other features like “Angelina Jolie”, “Spanish”, or “loves football.” This reasoning corresponds to the concept of information gain based on entropy.

If we randomly pull out a ball, then it will be blue with probability p1 = 9/20 and yellow with probability p2 = 11/20, which gives us an entropy S0 = -9/20 log2(9/20) - 11/20 log2(11/20) ≈ 1.
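As a quick check, this calculation can be reproduced in a few lines of Python (a sketch; only the counts 9, 11, and 20 come from the example above):

```python
from math import log2

# Entropy of the initial state: 9 blue and 11 yellow balls out of 20
p1, p2 = 9 / 20, 11 / 20
s0 = -p1 * log2(p1) - p2 * log2(p2)
print(round(s0, 3))  # 0.993, i.e. almost exactly 1 bit
```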

This value by itself may not tell us much, but let’s see how the value changes if we were to break the balls into two groups: with the position less than or equal to 12 and greater than 12.

Formally, the information gain (IG) for a split based on the variable Q (in this example, the variable “x ≤ 12”) is defined as

IG(Q) = S0 − Σ (Ni / N) · Si,

where the sum runs over the q groups produced by the split, Ni is the number of objects from the sample in which the variable Q is equal to its i-th value, N is the total number of objects, and Si is the entropy of the i-th group.

In our example, our split yielded two groups (q = 2), one with 13 elements (N1 = 13), the other with 7 (N2 = 7).

Therefore, we can compute the information gain by plugging the entropy of each group into the formula above. It turns out that dividing the balls into two groups by splitting on “coordinate less than or equal to 12” gave us a more ordered system, i.e., a positive information gain.
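Here is a sketch of the computation in Python. Note that the text gives only the group sizes (13 and 7), not the colour composition of each group, so the arrangement below is a hypothetical illustration of the formula rather than the exact numbers from the figure.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, groups):
    """IG = entropy of the parent minus the weighted entropy of the groups."""
    n = len(parent)
    return entropy(parent) - sum(len(g) / n * entropy(g) for g in groups)

# Hypothetical arrangement: the first 13 positions hold 9 blue and 4 yellow
# balls, the last 7 are all yellow (the real figure may differ)
balls = ["blue"] * 9 + ["yellow"] * 11
left, right = balls[:13], balls[13:]
print(round(information_gain(balls, [left, right]), 3))  # ~0.414 here
```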

We can make sure that the tree built in the previous example is optimal: it took only 5 “questions” (conditioned on the variable x) to perfectly fit a decision tree to the training set.

At the heart of the popular algorithms for decision tree construction, such as ID3 or C4.5, lies the principle of greedy maximization of information gain: at each step, the algorithm chooses the variable that gives the greatest information gain upon splitting.
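A toy sketch of that greedy step, reusing the entropy and information-gain helpers from the previous snippet: scan candidate thresholds on a single feature x and keep the one with the highest gain. (This is a simplified illustration, not the actual ID3/C4.5 implementation.)

```python
def best_split(xs, labels):
    """Greedy step: choose the threshold on x with the highest information gain."""
    best_t, best_ig = None, 0.0
    for t in sorted(set(xs)):
        left = [lab for x, lab in zip(xs, labels) if x <= t]
        right = [lab for x, lab in zip(xs, labels) if x > t]
        if left and right:  # skip degenerate splits with an empty group
            ig = information_gain(labels, [left, right])
            if ig > best_ig:
                best_t, best_ig = t, ig
    return best_t, best_ig

# With the hypothetical arrangement above (coordinates 1..20), the best
# threshold is x <= 9, which separates the colours perfectly
print(best_split(list(range(1, 21)), balls))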

A straight line will be too simple, while a complex curve that snakes past every red dot will be too complex and will lead to mistakes on new samples.
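The usual remedy is to cap the tree’s complexity. A minimal sketch with synthetic data (the dataset and parameter values here are assumptions for illustration): an unconstrained scikit-learn tree memorizes the training set, while limiting max_depth usually generalizes better.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic 2-D data; the parameters are arbitrary assumptions
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=17)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=17)

for depth in (None, 3):  # unconstrained vs. depth-limited
    tree = DecisionTreeClassifier(criterion="entropy", max_depth=depth,
                                  random_state=17).fit(X_train, y_train)
    print(depth, tree.score(X_train, y_train), tree.score(X_test, y_test))
```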
