AI News, How to Kick Ass in Competitive Machine Learning

How to Kick Ass in Competitive Machine Learning

In it, David distills advice from three Kaggle masters (Tim Salimans, Steve Donoho, and Anil Thomas) and an analysis of the results from 10 competitions into an approach for performing well on Kaggle, then tests these lessons by participating in two case-study competitions.

Feature engineering is the data preparation step that involves the transformation, aggregation and decomposition of attributes into those features that best characterize the structure in the data for the modeling problem.
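To make those three operations concrete, here is a minimal pandas sketch; the dataset and column names are invented purely for illustration:

    import numpy as np
    import pandas as pd

    # Invented raw attributes, purely for illustration.
    df = pd.DataFrame({
        "user_id": [1, 1, 2, 2],
        "purchase_amount": [20.0, 35.0, 5.0, 100.0],
        "timestamp": pd.to_datetime([
            "2017-03-01 09:15", "2017-03-02 18:40",
            "2017-03-01 12:00", "2017-03-04 07:30",
        ]),
    })

    # Transformation: compress a skewed attribute.
    df["log_amount"] = np.log1p(df["purchase_amount"])

    # Aggregation: per-user statistics joined back onto each row.
    df["user_mean_amount"] = df.groupby("user_id")["purchase_amount"].transform("mean")

    # Decomposition: split a timestamp into model-friendly parts.
    df["hour"] = df["timestamp"].dt.hour
    df["day_of_week"] = df["timestamp"].dt.dayofweek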

This extends to the scores observed on the public leaderboard, which evaluates models on only a sample (typically around 20%) of the held-out test data, while the remainder is reserved to identify the competition winners.
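One way to see why chasing that sample is risky: the same set of predictions can score noticeably differently on a 20% subsample than on the full data. A toy sketch, with synthetic labels and predictions invented for illustration:

    import numpy as np
    from sklearn.metrics import log_loss

    rng = np.random.default_rng(0)

    # Synthetic stand-in for a hidden test set and one model's predictions.
    y = rng.integers(0, 2, size=10_000)
    p = np.clip(y * 0.6 + rng.uniform(0.0, 0.4, size=10_000), 0.01, 0.99)

    full_score = log_loss(y, p)

    # Re-score the same predictions on random 20% subsamples.
    subsample_scores = []
    for _ in range(100):
        idx = rng.choice(10_000, size=2_000, replace=False)
        subsample_scores.append(log_loss(y[idx], p[idx]))

    print(f"full: {full_score:.4f}, "
          f"20% subsamples: {np.mean(subsample_scores):.4f} "
          f"+/- {np.std(subsample_scores):.4f}")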

Ensembles refer to the combination of the predictions from multiple models into a single set of predictions, typically a blend weighted by the skill of each contributing model (such as on the public leaderboard).
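For example, a simple weighted blend might look like this; the predictions and skill scores below are made up for illustration:

    import numpy as np

    # Invented predictions from three models, one column per test row.
    preds = np.array([
        [0.80, 0.10, 0.55],   # model A
        [0.70, 0.20, 0.60],   # model B
        [0.90, 0.05, 0.40],   # model C
    ])

    # Weights proportional to each model's (invented) leaderboard score.
    skill = np.array([0.85, 0.80, 0.75])
    weights = skill / skill.sum()

    blend = weights @ preds  # one blended prediction per test row
    print(blend)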

In this post you discovered a framework of 5 concerns when participating in competitive machine learning: feature engineering, overfitting, use of simple models, ensembles and predicting the right thing.

March Machine Learning Mania, 5th Place Winner's Interview: David Scott

Kaggle's annual March Machine Learning Mania competition drew 442 teams to predict the outcomes of the 2017 NCAA Men's Basketball tournament.

In this winner's interview, Kaggler David Scott describes how he came in 5th place by stepping back from solution mode and taking the time to plan out his approach to the project methodically.

I have been working in credit risk model development in the banking industry for approximately 10 years.

I have been lucky to gain exposure to big data and data science through previous roles, but decided I wanted to teach myself R and improve my machine learning knowledge.

This gave me an understanding of what commentators look at when evaluating a good team, and I made sure this information was included in my final model.

I spent most of my time creating a linear model rating the teams based on their regular-season results, using the points difference as the target variable.

This included splitting the development data into build and validation samples, and holding out the provided data for the last 4 years as a test set.
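A rough sketch of what such a setup could look like follows; the file name, feature columns, and season cut-offs are assumptions for illustration, not David's actual pipeline:

    import pandas as pd
    from sklearn.linear_model import LinearRegression

    # Assumed file and invented feature columns, for illustration only.
    games = pd.read_csv("regular_season_games.csv")
    features = ["seed_diff", "off_rating_diff", "def_rating_diff"]

    # Build/validation split, keeping the last 4 provided years as a test set.
    build = games[games["season"] <= 2010]
    valid = games[(games["season"] > 2010) & (games["season"] <= 2012)]
    test = games[games["season"] > 2012]

    # Linear model with the points difference as the target variable.
    model = LinearRegression().fit(build[features], build["points_diff"])
    print(model.score(valid[features], valid["points_diff"]))  # R^2 on validation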

I stuck with a basic logistic regression technique for the model development and it appeared to work well.
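A minimal sketch of that step, again with an assumed file and invented column names, since the competition is scored on predicted win probabilities:

    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    # Same assumed file and invented columns as the sketch above.
    games = pd.read_csv("regular_season_games.csv")
    features = ["seed_diff", "off_rating_diff", "def_rating_diff"]

    build = games[games["season"] <= 2012]
    holdout = games[games["season"] > 2012]

    # Binary target: did the first-listed team win the game?
    clf = LogisticRegression().fit(build[features], build["team1_won"])

    # The competition is scored on log loss, so submit calibrated probabilities.
    win_prob = clf.predict_proba(holdout[features])[:, 1]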

I ran out of time to account for the fact that when teams meet in the Final Four, the game is likely to be close regardless of their pre-tournament ratings.

scikit-learn video #1: Intro to machine learning with scikit-learn

As a data science instructor and the founder of Data School, I spend a lot of my time figuring out how to distill complex topics like 'machine learning' into small, hands-on lessons that aspiring data scientists can use to advance their data science skills.

As a practitioner of machine learning, there's a lot to like about scikit-learn: It provides a robust set of machine learning models with a consistent interface, all of the functionality is thoughtfully designed and organized, and the documentation is thorough and well-written.

However, I personally believe that getting started with machine learning in scikit-learn is more difficult than in a language such as R, as I explain here: In R, getting started with your first model is easy: read your data into a data frame, use a built-in model (such as linear regression) along with R's easy-to-read formula language, and then review the model's summary output.
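For comparison, the scikit-learn version of that first-model workflow looks roughly like this; a bundled dataset is used here so the sketch stays self-contained:

    from sklearn.datasets import load_diabetes
    from sklearn.linear_model import LinearRegression

    # A bundled dataset keeps the sketch self-contained.
    X, y = load_diabetes(return_X_y=True)

    model = LinearRegression().fit(X, y)

    # No formula language or summary() here; you inspect fitted attributes directly.
    print(model.intercept_, model.coef_)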

My primary goal with this video series, 'Introduction to machine learning with scikit-learn', is to help motivated individuals to gain a thorough grasp of both machine learning fundamentals and the scikit-learn workflow.

(The series does presume basic familiarity with Python, though next week I'll suggest some resources for learning Python if you're new to the language.) For those who successfully master the basics (or are already intermediate-level scikit-learn users), my secondary goal is to dive into more advanced functionality later in the series.

12-Week Data Science Bootcamp Students' Machine Learning Project Demo: Rossmann Kaggle

Project Description: Our team, the Gradient Boosters, was challenged by Rossmann, the second largest chain of German drug stores, to predict the daily sales ...

data.bythebay.io: Anthony Goldbloom, What Kaggle has learned from 2MM machine learning models

Scalæ By the Bay 2016 conference

Beginner's Guide to Machine Learning Competitions

This tutorial will offer a hands-on introduction to machine learning and the process of applying these concepts in a Kaggle competition. We will introduce ...

Allstate Purchase Prediction Challenge on Kaggle

This is a presentation I gave about my participation in Kaggle's "Allstate Purchase Prediction Challenge." Kaggle is a website that hosts machine learning ...

Kaggle Workshop – Branden Murray and Mark Landry

Branden and Dmitry are ranked 55 and 56 respectively among the top 100 Kagglers worldwide, and have a wealth of experience winning data science competitions.

Introduction to Data Science with R - Data Analysis Part 1

Part 1 of an in-depth, hands-on tutorial introducing the viewer to data science with R programming. The video provides end-to-end data science training, including ...

Which Python Package Manager Should You Use?

In this episode of AI Adventures, Yufeng discusses some of the options available when it comes to managing your Python environment for machine learning and ...

David Chudzicki, Christine Doig - Winning Machine Learning Competitions With Scikit-Learn

"Speaker: Ben Hamner This tutorial will offer an introduction machine learning and how to apply it to a Kaggle competition. We will cover methodologies that ...

The Kaggle Challenge - Higgs Boson

This video is part of the Udacity course "Deep Learning". Watch the full course at

Kaggle Founder's Talk: What Kaggle Has Learned from 2MM Machine Learning Models

Anthony Goldbloom is the founder and CEO of Kaggle. In 2011 and 2012, Forbes Magazine named Anthony one of the 30 under 30 in technology; in 2013 the ...