AI News, The Power of Ensembles in Machine Learning

The Power of Ensembles in Machine Learning

Amit and Bargava discuss various strategies for building ensemble models, demonstrating how to combine the outputs of various base models (logistic regression, support vector machines, decision trees, etc.) to create a stronger, better model output.

Using an example, they cover bagging, boosting, voting and stacking to explore where ensemble models can consistently produce better results than the best-performing single models.

Boosting and Bagging: How To Develop A Robust Machine Learning Algorithm

by Ben Rogojan

Machine learning and data science require more than just throwing data into a Python library and utilizing whatever comes out.

Data scientists need to actually understand the data and the processes behind the data to be able to implement a successful system.

Ensemble learning works much like a choir, except that instead of several people singing at different octaves to create one beautiful harmony (each voice filling in the void of the other), it uses hundreds to thousands of models of the same algorithm that work together to find the correct classification.

Using techniques like boosting and bagging has led to increased robustness of statistical models and decreased variance.

This allows the model or algorithm to get a better understanding of the various biases, variances and features that exist in the resample.

Bootstrapping can be a solution in this case because algorithms that utilize bootstrapping can be more robust and handle new data sets, depending on the methodology chosen (boosting or bagging). The reason to use the bootstrap method is that it can test the stability of a solution.

By using multiple sample data sets and then testing multiple models, it can increase robustness.
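As a rough illustration of using the bootstrap to test stability, the sketch below resamples a small data set with replacement many times and looks at how much the estimate (here, the mean) moves between resamples. The data values and the function name `bootstrap_means` are made up for this example.

```python
import random
import statistics


def bootstrap_means(data, n_resamples=1000, seed=0):
    """Return the mean of each bootstrap resample of `data`."""
    rng = random.Random(seed)
    n = len(data)
    means = []
    for _ in range(n_resamples):
        # Draw n observations with replacement from the original sample.
        resample = [rng.choice(data) for _ in range(n)]
        means.append(statistics.fmean(resample))
    return means


# Hypothetical measurements, used only for illustration.
sample = [2.1, 2.5, 1.9, 3.2, 2.8, 2.4, 2.2, 3.0, 2.7, 2.3]
means = bootstrap_means(sample)
spread = statistics.stdev(means)

print(f"sample mean: {statistics.fmean(sample):.3f}")
print(f"bootstrap standard error of the mean: {spread:.3f}")
```

A small spread across resamples suggests the estimate is stable; a large spread suggests it depends heavily on which observations happened to be sampled.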

Almost any paper or post that references bagging algorithms will also reference Leo Breiman, who wrote a paper in 1996 called “Bagging Predictors”.

Leo describes bagging as follows: “Bagging predictors is a method for generating multiple versions of a predictor and using these to get an aggregated predictor.” What bagging does is help reduce variance from models that might be very accurate, but only on the data they were trained on.

Thus, if the data set is changed to a new data set that has some bias or a different spread of underlying features compared to the previous set, such a model may no longer be as accurate.

Bagging gets around this by creating its own variance amongst the data by sampling with replacement while it tests multiple hypotheses (models).

In turn, this reduces the noise by utilizing multiple samples that would most likely be made up of data with various attributes (median, average, etc.).
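The bagging idea can be sketched in a few lines: train one simple model per bootstrap sample, then let the models vote. The example below is a minimal sketch, not Breiman's implementation; the one-dimensional threshold classifier ("stump"), the toy data and all function names are assumptions chosen to keep it short.

```python
import random
from collections import Counter


def fit_stump(points):
    """Fit a 1-D threshold classifier that predicts 1 when x > threshold."""
    best_threshold, best_errors = None, None
    for threshold, _ in points:
        errors = sum((x > threshold) != bool(y) for x, y in points)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def bagged_stumps(points, n_models=25, seed=0):
    """Train one stump per bootstrap sample (sampling with replacement)."""
    rng = random.Random(seed)
    thresholds = []
    for _ in range(n_models):
        bootstrap = [rng.choice(points) for _ in range(len(points))]
        thresholds.append(fit_stump(bootstrap))
    return thresholds


def predict(thresholds, x):
    """Aggregate the stumps by majority vote."""
    votes = Counter(int(x > t) for t in thresholds)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: the label is 1 for x >= 10, with two noisy labels.
points = [(x, int(x >= 10)) for x in range(20)]
points[3] = (3, 1)
points[15] = (15, 0)

thresholds = bagged_stumps(points)
print("prediction for x=1: ", predict(thresholds, 1))
print("prediction for x=18:", predict(thresholds, 18))
```

Each stump sees a slightly different sample, so a single noisy label shifts only some of the thresholds, and the vote smooths those shifts out.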

Boosting refers to a group of algorithms that utilize weighted averages to make weak learners into stronger learners.

In boosting, the model’s error rates are kept track of because better models are given better weights.

That way, when the “voting” occurs, like in bagging, the models with better outcomes have a stronger pull on the final output.
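To make the weighted vote concrete, here is a small sketch that uses the AdaBoost weighting rule, where a model with error rate e receives the weight 0.5 * ln((1 - e) / e). The text above does not name a specific rule, so AdaBoost is an assumption here, as are the error rates and predictions.

```python
import math


def model_weight(error_rate):
    """AdaBoost-style weight: a lower error rate gives a larger say in the vote."""
    return 0.5 * math.log((1.0 - error_rate) / error_rate)


def weighted_vote(predictions, error_rates):
    """Combine +1/-1 predictions, weighting each model by its tracked error rate."""
    score = sum(model_weight(e) * p for p, e in zip(predictions, error_rates))
    return 1 if score >= 0 else -1


# Hypothetical error rates and predictions for three weak learners.
error_rates = [0.10, 0.40, 0.45]
predictions = [+1, -1, -1]

for e in error_rates:
    print(f"error rate {e:.2f} -> weight {model_weight(e):.3f}")
print("weighted vote:", weighted_vote(predictions, error_rates))
```

In this example the single accurate model outweighs the two weaker models that disagree with it, which a simple majority vote would not allow.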

The results an algorithm is getting, and the support that is available, drive the choice of the final algorithm.

5 Easy questions on Ensemble Modeling everyone should know

If you’ve ever participated in data science competitions, you must be aware of the pivotal role that ensemble modeling plays.

After analyzing various data science forums, I have identified the 5 most common questions related to ensemble modeling. These questions are highly relevant to data scientists new to ensemble modeling.

Solution: We can generate various rules for classifying spam emails. Let’s look at some of them:

Above, I’ve listed some common rules for filtering spam e-mails.

If you want to relate this to real life, a group of people is likely to make better decisions than individuals, especially when group members come from diverse backgrounds.

A good example of how ensemble methods are commonly used to solve data science problems is the random forest algorithm (which has multiple CART models).

It performs better than an individual CART model: to classify a new object, each tree “votes” for a class and the forest chooses the classification having the most votes (over all the trees in the forest).

Being an iterative process, it continues to add classifier learners until a limit is reached in the number of models or accuracy. Boosting has shown better predictive accuracy than bagging, but it also tends to over-fit the training data. The most common examples of boosting are AdaBoost and Gradient Boosting.

Second, a new learner is used to combine their predictions with the aim of reducing the generalization error.

Yes, we can combine multiple models of the same ML algorithm, but combining multiple predictions generated by different algorithms would normally give you better predictions.

For example, the predictions of a random forest, a KNN, and a Naive Bayes may be combined to create a stronger final prediction set than combining three random forest models.

I am listing some of the methods below: You can also look at the winning solution of Kaggle / data science competitions to understand other methods to deal with this challenge.

In this article, we have looked at the 5 frequently asked questions on ensemble models. While answering these questions, we have discussed “Ensemble Models”, “Methods of Ensemble”, “Why should we ensemble diverse models?” and “Methods to identify optimal weight for ensemble”.

How to Build an Ensemble Of Machine Learning Algorithms in R (ready to use boosting, bagging and stacking)

It assumes you are generally familiar with machine learning algorithms and ensemble methods and that you are looking for information on how to create ensembles with R.

There are three main techniques you can use to create an ensemble of machine learning algorithms in R: Boosting, Bagging and Stacking. In this section, we will look at each in turn.

This dataset describes high-frequency antenna returns from high energy particles in the atmosphere and whether the return shows structure or not.

We can look at two of the most popular boosting machine learning algorithms: Below is an example of the C5.0 and Stochastic Gradient Boosting (using the Gradient Boosting Modeling implementation) algorithms in R.

Given a list of caret models, the caretStack() function can be used to specify a higher-order model to learn how to best combine the predictions of sub-models together.

Let’s first look at creating 5 sub-models for the ionosphere dataset, specifically: Below is an example that creates these 5 sub-models.

Note the new helpful caretList() function provided by the caretEnsemble package for creating a list of standard caret models.

Comparison of Sub-Models for Stacking Ensemble in R

When we combine the predictions of different models using stacking, it is desirable that the predictions made by the sub-models have low correlation.

This would suggest that the models are skillful but in different ways, allowing a new classifier to figure out how to get the best from each model for an improved score.

If the predictions for the sub-models were highly correlated (>0.75), then they would be making the same or very similar predictions most of the time, reducing the benefit of combining the predictions.
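The correlation check can be sketched outside R as well. The snippet below computes pairwise Pearson correlations between sub-model predictions and flags pairs above the 0.75 threshold mentioned above. The model names and predicted probabilities are hypothetical, and this is plain Python, not the caret API.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length lists of predictions."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def highly_correlated_pairs(predictions, threshold=0.75):
    """Return (model, model, r) for every pair whose correlation exceeds threshold."""
    names = sorted(predictions)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(predictions[first], predictions[second])
            if r > threshold:
                pairs.append((first, second, round(r, 3)))
    return pairs


# Hypothetical predicted probabilities from three sub-models on six rows.
predictions = {
    "lda": [0.9, 0.2, 0.8, 0.4, 0.7, 0.1],
    "glm": [0.8, 0.3, 0.9, 0.3, 0.6, 0.2],
    "knn": [0.4, 0.6, 0.9, 0.2, 0.3, 0.7],
}

pairs = highly_correlated_pairs(predictions)
print(pairs)
```

A flagged pair is a candidate for dropping one of its two models before stacking, since both contribute nearly the same information.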

The caret and caretEnsemble packages allow you to start creating and experimenting with ensembles even if you don’t have a deep understanding of how they work.

You can always adapt it to your specific cases or try out new ideas with custom code at a later time.

You discovered three types of ensembles of machine learning algorithms that you can build in R: boosting, bagging and stacking. You can use the code in this case study as a template on your current or next machine learning project in R.

How to build Ensemble Models in machine learning? (with code in R)

Over the last 12 months, I have been participating in a number of machine learning hackathons on Analytics Vidhya and Kaggle competitions.

If you are starting with machine learning, I would advise you to lay emphasis on these two areas, as I have found them equally important for doing well in machine learning.

Most of the time, I was able to crack the feature engineering part but probably didn’t use the ensemble of multiple models. If you are a beginner, it’s even better to get familiar with ensembling as early as possible.

In general, ensembling is a technique of combining two or more algorithms of similar or dissimilar types called base learners.

Once we have these multiple bootstrapped samples, we can grow trees for each of these bootstrapped samples and use the majority vote or averaging concepts to get the final prediction.

Now, random forest actually uses this concept, but it goes a step further to reduce the variance by also randomly choosing a subset of features for each bootstrapped sample to make the splits while training.

Here, we have two layers of machine learning models: Here, we have used only two layers but it can be any number of layers and any number of models in each layer.

Two of the key principles for selecting the models: One thing that you might have realized is that we have used the top layer model which takes as input the predictions of the bottom layer models.

We now define the training controls and the predictor and outcome variables. Now let’s get started with training a random forest and testing its accuracy on the test set that we have created. As you can see, we got 0.81 accuracy with the individual random forest model.

Now, let’s try out different ways of forming an ensemble with these models as we have discussed. Before proceeding further, I would like you to recall the two important criteria that we previously discussed, individual model accuracy and inter-model prediction correlation, both of which must be fulfilled.

For a regression problem, we can use linear regression to build a linear formula that maps the bottom layer model predictions to the outcome; for a classification problem, we can similarly use logistic regression.

Moreover, we don’t need to restrict ourselves here: we can also use more complex models like GBM or neural nets to develop a non-linear mapping from the predictions of bottom layer models to the outcome.

Remember the steps that we’ll take. One extremely important thing to note in step 2 is that you should always make out-of-bag predictions for the training data; otherwise, the importance of the base layer models will only be a function of how well each base layer model can recall the training data.
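The point about out-of-bag (held-out) predictions can be illustrated with a small sketch. It uses a 1-nearest-neighbour base model, which recalls its training data perfectly, and compares its in-sample predictions with predictions made on rows it never saw. The data, the fold scheme and all function names are assumptions for illustration, and the top layer model itself is left out.

```python
import random


def one_nn_predict(train, x):
    """Predict the label of the training point nearest to x (1-nearest neighbour)."""
    nearest = min(train, key=lambda point: abs(point[0] - x))
    return nearest[1]


def out_of_fold_predictions(data, n_folds=5):
    """For each row, predict with a model that never saw that row."""
    preds = [None] * len(data)
    for fold in range(n_folds):
        held_out = set(range(fold, len(data), n_folds))
        train = [row for i, row in enumerate(data) if i not in held_out]
        for i in held_out:
            preds[i] = one_nn_predict(train, data[i][0])
    return preds


def accuracy(preds, data):
    return sum(p == y for p, (_, y) in zip(preds, data)) / len(data)


rng = random.Random(0)
# Hypothetical noisy data: the label is 1 when x > 0.5, with about 20% of labels flipped.
data = []
for _ in range(200):
    x = rng.random()
    y = int(x > 0.5)
    if rng.random() < 0.2:
        y = 1 - y
    data.append((x, y))

in_sample = [one_nn_predict(data, x) for x, _ in data]
out_of_fold = out_of_fold_predictions(data)

print("in-sample accuracy:  ", accuracy(in_sample, data))
print("out-of-fold accuracy:", accuracy(out_of_fold, data))
```

If the top layer model were trained on the in-sample predictions, it would see a base model that looks perfect and would lean on it far too heavily; the held-out predictions give a more honest picture of how the base model behaves on new data.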

Ensembling is a very popular and effective technique that is very frequently used by data scientists for beating the accuracy benchmark of even the best of individual algorithms.

Ensemble Machine Learning Algorithms in Python with scikit-learn

It assumes you are generally familiar with machine learning algorithms and ensemble methods and that you are looking for information on how to create ensembles in Python.

Each ensemble algorithm is demonstrated using 10-fold cross validation, a standard technique used to estimate the performance of any machine learning algorithm on unseen data.

Bootstrap Aggregation or bagging involves taking multiple samples from your training dataset (with replacement) and training a model for each sample.

The three bagging models covered in this section are as follows: Bagging performs best with algorithms that have high variance.

Samples of the training dataset are taken with replacement, but the trees are constructed in a way that reduces the correlation between individual classifiers.

Specifically, rather than greedily choosing the best split point in the construction of the tree, only a random subset of features is considered for each split.

The example below demonstrates Random Forest for classification with 100 trees and split points chosen from a random selection of 3 features.
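A minimal sketch of such a Random Forest run with scikit-learn is shown here. The article's own dataset is not included in this excerpt, so a synthetic dataset from `make_classification` stands in for it; the sample size, feature counts and random seeds are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data (an assumption; the original dataset is not shown here).
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=5, random_state=7
)

# 100 trees, with each split chosen from a random selection of 3 features.
model = RandomForestClassifier(n_estimators=100, max_features=3, random_state=7)

# 10-fold cross validation to estimate accuracy on unseen data.
kfold = KFold(n_splits=10, shuffle=True, random_state=7)
scores = cross_val_score(model, X, y, cv=kfold)

print(f"mean accuracy over 10 folds: {scores.mean():.3f}")
```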

The example below provides a demonstration of extra trees with the number of trees set to 100 and splits chosen from 7 random features.

It generally works by weighting instances in the dataset by how easy or difficult they are to classify, allowing the algorithm to pay more or less attention to them in the construction of subsequent models.

More advanced methods can learn how to best weight the predictions from submodels; this is called stacking (stacked generalization). It was not provided in scikit-learn when this was written, but releases from 0.22 onward include StackingClassifier and StackingRegressor.

The code below provides an example of combining the predictions of logistic regression, classification and regression trees and support vector machines together for a classification problem.
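A voting ensemble of this kind might look as follows with scikit-learn's `VotingClassifier`. This is a sketch under assumptions: synthetic data from `make_classification` replaces the article's dataset, and the estimator names, hard voting and random seeds are choices made for this example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption; the original dataset is not shown here).
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=5, random_state=7
)

# The three sub-models: logistic regression, a CART decision tree and an SVM.
estimators = [
    ("logistic", LogisticRegression(max_iter=1000)),
    ("cart", DecisionTreeClassifier(random_state=7)),
    ("svm", SVC(random_state=7)),
]

# Hard voting: each sub-model casts one vote and the majority class wins.
ensemble = VotingClassifier(estimators=estimators, voting="hard")

kfold = KFold(n_splits=10, shuffle=True, random_state=7)
scores = cross_val_score(ensemble, X, y, cv=kfold)

print(f"mean accuracy over 10 folds: {scores.mean():.3f}")
```

Hard voting is used because `SVC` does not produce class probabilities by default; soft voting would require constructing it with `probability=True`.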

The Power of Ensembles - Machine Learning

Creating better models is a critical component to building a good data science product. It is relatively easy to build a first-cut machine-learning model, but what ...

Machine Learning: Python and the Power of Ensembles by Bargava Raman Subramanian

"It is relatively easy to build a first-cut machine learning model. But what does it take to build a reasonably good model, or even a state-of-art model ? Ensemble ...

Bargava Subramanian - Machine Learning: Power of Ensembles

Bargava Subramanian - Machine Learning: Power of Ensembles [EuroPython 2016] [22 July 2016] [Bilbao, Euskadi, Spain] ...

Bootstrap aggregating bagging

This video is part of the Udacity course "Machine Learning for Trading". Watch the full course at

Ensemble Learning Boosting - Georgia Tech - Machine Learning

Watch on Udacity: Check out the full Advanced Operating Systems course for free ..

17. Learning: Boosting

MIT 6.034 Artificial Intelligence, Fall 2010 View the complete course: Instructor: Patrick Winston Can multiple weak classifiers be ..


Weka Tutorial 13: Stacking Multiple Classifiers (Classification)

In this tutorial I have shown how to use Weka for combining multiple classification algorithms. Both ensembles (bagging and boosting) and voting combining ...
