AI News, March Machine Learning Mania

March Machine Learning Mania

In this post, we again use a third-party data project taken from Kaggle, a company that hosts data science competitions.

Because this was a competition for a prize rather than a learning exercise, users can no longer submit their predictions to Kaggle and receive a score.

We hope this serves as another didactic example for people to follow along with, and since we are learners ourselves, we’d appreciate any feedback!

It’s a little more challenging than the Titanic data project, and we’ll do our best to explain everything as concisely as possible.

Where the microscope enabled us to see things too small for the human eye, data analytics now enables us to see things that were previously too big.

Hal Varian, Chief Economist at Google, said this about the field of Data Analytics and Data Science: “If you are looking for a career where your services will be in high demand, you should find something where you provide a scarce, complementary service to something that is getting ubiquitous and cheap.”

“So my recommendation is to take lots of courses about how to manipulate and analyze data: databases, machine learning, econometrics, statistics, visualization, and so on.”

We are recent UC Berkeley grads who studied Statistics (among other things) and realized two things: (1) how essential an understanding of Statistics and Data Analysis was to almost every industry and (2) how teachable these analytic practices could be!

While copying and pasting allows you to run the code, you should read through and have an intuitive understanding of what is happening in the code.

Our goal isn’t necessarily to teach R syntax, but to provide a sense of the process of digging into data and to enable you to use other resources to learn R better.

We again use the read.csv() function and set stringsAsFactors = FALSE, which keeps the text columns of our data as plain character strings instead of converting them to factors (R’s categorical type) and makes them easier to manipulate.
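To see what this setting changes, here is a small sketch. The CSV contents and the column names are made up for illustration and are not one of Kaggle's files. Note that since R 4.0 the default is already stringsAsFactors = FALSE, so setting it explicitly mostly matters on older versions of R.

```r
# Write a tiny, made-up CSV to a temporary file so the example is self-contained.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,name", "501,Team A", "502,Team B"), tmp)

# With stringsAsFactors = FALSE, text columns stay as character vectors.
teams_chr <- read.csv(tmp, stringsAsFactors = FALSE)
class(teams_chr$name)

# With stringsAsFactors = TRUE, the same column is converted to a factor.
teams_fct <- read.csv(tmp, stringsAsFactors = TRUE)
class(teams_fct$name)
```

Character columns can be pasted, split, and compared directly, which is usually what we want while cleaning data.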

In the first stage of the Kaggle competition, you must make a prediction for every possible first-round tournament matchup between every pair of teams for seasons N, O, P, Q, and R (each letter represents a season).

In the prediction column, there will be numbers ranging from 0 to 1 representing the probability of the first team ID winning (the left-side team).

For example, for a matchup between team IDs 503 and 506, the prediction column represents the probability of team ID 503 winning (or, equivalently, team ID 506 losing).
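The sketch below shows one way to read such a row. We assume, purely for illustration, that each matchup is identified by a string made of the season letter and the two team IDs joined by underscores. Both that format and the probability value are hypothetical.

```r
# Hypothetical matchup identifier: season letter, then the two team IDs.
matchup_id <- "N_503_506"

# Split the identifier into its three parts.
parts <- strsplit(matchup_id, "_", fixed = TRUE)[[1]]
season <- parts[1]
first_team <- as.integer(parts[2])   # the left-side team
second_team <- as.integer(parts[3])  # the right-side team

# Hypothetical prediction: probability that the first (left-side) team wins.
p_first_wins <- 0.7

# The two outcomes are complementary, so the other team's chance is 1 - p.
p_second_wins <- 1 - p_first_wins
```

Because exactly one team wins, a single number per matchup is enough to describe both teams' chances.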

The code below will create a file in your working directory (the Kaggle folder on your desktop) that you can submit to the competition!

Here we simply guess 50% for every possible matchup, which is the equivalent of flipping a fair coin to predict each game!
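A minimal sketch of such a coin-flip submission might look like the following. The team IDs, the id and pred column names, the identifier format, and the output file name are all assumptions for illustration, not Kaggle's confirmed file layout. We also write to a temporary folder here rather than to the working directory.

```r
# Season letters from the competition description and a few made-up team IDs.
seasons <- c("N", "O", "P", "Q", "R")
team_ids <- c(503, 506, 511, 514)

# All unordered pairs of teams, lower ID first (one column per pair).
pairs <- combn(sort(team_ids), 2)

# One row per season and pair; the "id" and "pred" column names are assumptions.
submission <- do.call(rbind, lapply(seasons, function(s) {
  data.frame(
    id = paste(s, pairs[1, ], pairs[2, ], sep = "_"),
    pred = 0.5,  # the coin-flip guess for every matchup
    stringsAsFactors = FALSE
  )
}))

# Write the file without row names so it contains only the two columns.
out_file <- file.path(tempdir(), "coin_flip_submission.csv")
write.csv(submission, out_file, row.names = FALSE)
head(submission)
```

A baseline like this is useful mainly as a yardstick: any model we build later should score better than guessing 50% everywhere.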

Specifically, how can we use the data Kaggle has given us to predict each matchup, and, more broadly, what are the indicators for any given team winning a game in March Madness?

There is certain data that Kaggle doesn’t offer but that we may find intuitively significant, as well as data we can create ourselves using the datasets Kaggle gives us.
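As one example of data we can create ourselves, the sketch below derives each team's winning percentage from a table of game results. The results are made up, and the wteam and lteam column names (winning and losing team IDs) are assumptions about how such a table might be laid out.

```r
# Made-up game results; the column names "wteam" and "lteam" are assumptions.
results <- data.frame(
  wteam = c(503, 503, 506, 511, 503),
  lteam = c(506, 511, 511, 506, 506)
)

# Every team that appears as a winner or a loser.
teams <- sort(unique(c(results$wteam, results$lteam)))

# Count wins and losses per team; fixing the factor levels keeps teams with
# zero wins (or zero losses) in the counts.
wins <- as.integer(table(factor(results$wteam, levels = teams)))
losses <- as.integer(table(factor(results$lteam, levels = teams)))

# Derived feature: share of games won.
team_stats <- data.frame(team = teams, wins = wins, losses = losses)
team_stats$win_pct <- team_stats$wins / (team_stats$wins + team_stats$losses)
team_stats
```

A feature like this could then be compared between the two teams in a matchup, for instance by taking the difference in winning percentage.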

Predicting the Winning Team with Machine Learning

Can we predict the outcome of a football game given a dataset of past games? That's the question that we'll answer in this episode by using the scikit-learn ...

How to do the Titanic Kaggle competition in R - Part 1

As part of submitting to Data Science Dojo's Kaggle competition you need to create a model out of the titanic data set. We will show you how to do this using ...

Data Science and Medicine: What’s Possibly at the Cutting Edge?

Presented by: Anthony Goldbloom, Founder and CEO of Kaggle, and Dr. Andrew Arai, Senior Investigator, National Heart, Lung, and Blood Institute (NHLBI) ...

Azure Machine Learning: Predict Who Survives the Titanic - Jennifer Marsman - Duo Tech Talk

Interested in doing machine learning in the cloud? In this demo-heavy talk, I will set the stage with some information on the different types of machine learning ...

Intro to Azure ML: Subscriptions & Workspaces

To begin data mining in Azure Machine Learning Studio, we must first set up an Azure subscription. Once we have a subscription, create an Azure Machine ...

Intro to Azure ML: Renaming & Replicating Data

Now that we have a better understanding of our dataset, let's go out and gather more data. There is an additional dataset inside of Azure ML we can use to look ...

Kaggle with Wendy Kan: GCPPodcast 84

Wendy Kan joins your co-hosts Francesc and Mark today to talk about ..

Lesson 1: Deep Learning 2018

NB: Please go to to view this video since there is important updated information there. If you have questions, use the forums at ..

Intro to Big Data, Data Science & Predictive Analytics

We introduce you to the wide world of Big Data, throwing back the curtain on the diversity and ubiquity of data science in the modern world. We also give you a ...

Lesson 2: Deep Learning 2018

NB: Please go to to view this video since there is important updated information there. If you have questions, use the forums at ..