AI News, Lessons Learned From Benchmarking Fast Machine Learning Algorithms

Lessons Learned From Benchmarking Fast Machine Learning Algorithms

Gradient boosting is a machine learning technique that produces a prediction model in the form of an ensemble of weak classifiers, optimizing a differentiable loss function.

One of the most popular forms of gradient boosting is boosted decision trees, which internally consist of an ensemble of weak decision trees.

A key challenge in training boosted decision trees is the computational cost of finding the best split for each leaf.

A common optimization is to bucket continuous feature values into a bounded number of histogram bins. That way, the algorithm doesn't need to evaluate every single value of the features to compute the split, but only the bins of the histogram, which are bounded in number.
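To make the idea concrete, here is a minimal pure-Python sketch of histogram-based split finding for a single feature (an illustration of the technique, not code from XGBoost or LightGBM): values are bucketed into equal-width bins, and only the bin boundaries are tried as candidate thresholds, so the scan costs O(bins) rather than O(samples).

```python
def best_histogram_split(values, targets, n_bins=8):
    """Find the best split threshold by scanning histogram bins.

    Instead of testing every distinct feature value, bucket values into
    n_bins equal-width bins and test only the bin boundaries.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0
    # Per-bin statistics: sample count and target sum.
    counts = [0] * n_bins
    sums = [0.0] * n_bins
    for v, t in zip(values, targets):
        b = min(int((v - lo) / width), n_bins - 1)
        counts[b] += 1
        sums[b] += t
    total_n, total_s = sum(counts), sum(sums)
    best_gain, best_thresh = 0.0, None
    left_n, left_s = 0, 0.0
    parent_score = total_s ** 2 / total_n  # variance-reduction surrogate
    for b in range(n_bins - 1):
        left_n += counts[b]
        left_s += sums[b]
        right_n, right_s = total_n - left_n, total_s - left_s
        if left_n == 0 or right_n == 0:
            continue
        gain = left_s ** 2 / left_n + right_s ** 2 / right_n - parent_score
        if gain > best_gain:
            best_gain, best_thresh = gain, lo + (b + 1) * width
    return best_thresh, best_gain
```

For example, feature values clustered at 0-3 (target 0) and 10-13 (target 1) yield a threshold between the two clusters after scanning only 7 candidate boundaries.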

Originally, XGBoost was based on a level-wise growth algorithm, but it recently added an option for leaf-wise growth that implements split approximation using histograms.

LightGBM is based on a leaf-wise algorithm and histogram approximation, and it has attracted a lot of attention due to its speed (Disclaimer: Guolin Ke, a co-author of this blog post, is a key contributor to LightGBM).

In all experiments, we found XGBoost and LightGBM had similar accuracy metrics (F1-scores are shown here), so we focused on training times in this blog post.

This is due to the large size of the datasets, as well as the large number of features, which causes considerable memory overhead for XGBoost hist.

Finally, comparing LightGBM and XGBoost, we found that LightGBM was faster in all tests where XGBoost and XGBoost hist finished, by as much as 25x over XGBoost and 15x over XGBoost hist.

As expected, with small datasets, the additional I/O overhead of copying data between RAM and GPU memory overshadows the speed benefit of running the computation on the GPU.

Here, we did not observe any performance gain from using XGBoost hist on GPU. As a side note, the standard implementation of XGBoost (exact splits instead of histogram-based) does not benefit from a GPU either, compared to multi-core CPU, per this recent paper.

The significant speed advantage of LightGBM translates into the ability to do more iterations and/or quicker hyperparameter search, which can be very useful if you have a limited time budget for optimizing your model or want to experiment with different feature engineering ideas.

Which algorithm takes the crown: Light GBM vs XGBOOST?

If you are an active member of the Machine Learning community, you must be aware of Boosting Machines and their capabilities.

Light GBM is a fast, distributed, high-performance gradient boosting framework based on decision tree algorithms, used for ranking, classification, and many other machine learning tasks.

Since it is based on decision tree algorithms, it splits the tree leaf-wise with the best fit, whereas other boosting algorithms split the tree depth-wise or level-wise.

So when growing from the same leaf, the leaf-wise algorithm can reduce more loss than the level-wise algorithm, and hence can achieve better accuracy for the same number of splits.
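The difference can be sketched with a toy growth loop (illustrative pseudologic, not LightGBM's actual implementation; the function and its inputs are hypothetical): leaf-wise growth always splits the single leaf with the largest loss reduction, so a fixed budget of splits goes where it helps most, whereas level-wise growth would expand every leaf at the current depth regardless of gain.

```python
import heapq

def grow_leaf_wise(root_gain, child_gains, max_leaves):
    """Toy leaf-wise growth: always split the leaf with the largest gain.

    root_gain: loss reduction from splitting the root.
    child_gains: maps a leaf id to [(child_id, gain), ...] for the
        candidate splits that become available once that leaf is split.
    Returns the total loss reduction after growing to max_leaves leaves.
    """
    # heapq is a min-heap, so store negated gains to pop the largest first.
    heap = [(-root_gain, "root")]
    total, leaves = 0.0, 1
    while heap and leaves < max_leaves:
        neg_gain, leaf = heapq.heappop(heap)
        total += -neg_gain
        leaves += 1  # one split turns one leaf into two
        for child_id, g in child_gains.get(leaf, []):
            heapq.heappush(heap, (-g, child_id))
    return total
```

With gains of 10 at the root, then 5 and 0.1 for its children, leaf-wise growth spends its second split on the gain-5 leaf; a level-wise scheme would have to expand both children at that depth before going deeper.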

Leaf-wise splits increase model complexity and may lead to overfitting; this can be mitigated by specifying another parameter, max_depth, which limits the depth to which splitting will occur.
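As a hedged sketch, a LightGBM parameter set capping tree complexity might look like this (num_leaves, max_depth, and learning_rate are real LightGBM parameters; the values here are illustrative, not tuned):

```python
# Illustrative LightGBM parameters (example values, not tuned):
params = {
    "objective": "binary",
    "num_leaves": 31,    # main complexity control for leaf-wise growth
    "max_depth": 7,      # caps depth to curb overfitting from deep leaf-wise trees
    "learning_rate": 0.1,
}
```

Because trees grow leaf-wise, num_leaves is the primary knob; max_depth acts as a safety cap on top of it.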

Now, before we dive head-first into building our first Light GBM model, let us look at some of the parameters of Light GBM to get an understanding of its underlying procedures.

Our target is to predict whether a person makes <=50K or >50K annually on the basis of the other available information.

There is only a slight increase in accuracy and AUC score from applying Light GBM over XGBOOST, but there is a significant difference in the execution time of the training procedure.

Light GBM uses leaf-wise splitting rather than depth-wise splitting, which enables it to converge much faster but can also lead to overfitting.

catboost/benchmarks

For the CPU version, we used a dual-socket server with two Intel Xeon CPUs (E5-2650v2, 2.60GHz) and 256GB RAM, and ran CatBoost with 32 threads (equal to the number of logical cores).

We use the Epsilon dataset (400K samples for training, 100K samples for testing) to give some insight into how fast our GPU implementation can train a model of fixed size.

For this dataset, we measure the mean tree construction time achievable without feature subsampling and/or bagging for CatBoost and two open-source GPU-enabled boosting implementations: XGBoost (we use the histogram-based version; the exact version is very slow) and LightGBM.

This bin count gives the best performance and the lowest memory usage for LightGBM and CatBoost (using a 128-255 bin count usually makes both algorithms run 2-4 times slower).

The GPU implementation of CatBoost contains a mode based on the classic scheme for those who need the best training performance; in this benchmark we used the classic scheme.

These timings are a very rough speed comparison, because the construction time of a single tree depends on the distribution of the features and on the ensemble size.

Getting the most of xgboost and LightGBM speed: Compiler, CPU pinning

Currently, xgboost and LightGBM are two of the best-performing machine learning algorithms for large datasets (both in speed and in metric performance).

xgboost and LightGBM were made primarily for speed: it is better to iterate quickly at high accuracy and try more different things than to wait hours for a neural network to finish.

As we already know the answer to this question, we are going to look at a more exotic setup: changing the compiler, and pinning CPU cores.
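As a minimal sketch of what CPU pinning looks like on Linux (taskset is the standard util-linux tool; the inline Python command stands in for an actual training script, which would be your own):

```shell
# Pin the process to CPU core 0 with util-linux taskset; a trivial
# Python command stands in here for the real training script.
# (numactl --cpunodebind/--membind is an alternative that also binds
# memory to the local NUMA node.)
taskset -c 0 python3 -c "import os; print(sorted(os.sched_getaffinity(0)))"
```

Pinning keeps the training threads from migrating between cores (and across NUMA nodes), which can reduce cache misses and run-to-run variance in benchmarks.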

Microsoft Data Amp 2017 | Joseph Sirosh Keynote

Watch Joseph Sirosh, Corporate Vice President for the Data Group at Microsoft, give his keynote at Microsoft Data Amp 2017.

Microsoft Data Amp 2017 | Keynotes

Microsoft Data Amp features keynotes by Microsoft executives Scott Guthrie and Joseph Sirosh, who will demonstrate how data and intelligence can help you build breakthrough applications.