Ensemble Learning & its Methods in Machine Learning

Reading Time: 5 minutes

What is ensemble learning?

Ensemble models in machine learning combine the decisions from multiple models to improve overall performance. They operate on the same idea we use while buying headphones: gathering and weighing several opinions before making a decision.

The main causes of error in learning models are noise, bias, and variance.

Ensemble methods help minimize these sources of error. They are designed to improve the stability and accuracy of machine learning algorithms.


Scenario 1: Suppose you have developed a health and fitness app. Before making it public, you want critical feedback to close any potential loopholes. You could resort to one of the following methods; read them and decide which one is best:

  • Take the opinion of your spouse or your closest friends.
  • Ask a bunch of your friends and office colleagues.
  • Launch a beta version of the app and receive feedback from the web development
    community and non-biased users.

No brownie points for guessing the answer: yes, of course we will roll with the third option. Now pause and think about what you just did. You took multiple opinions from a large enough group of people and then made an informed decision based on them. This is exactly what ensemble methods do.

Just as you combined many opinions into one decision, ensemble models combine the decisions from multiple models to improve overall performance. Ensembles are essentially a divide-and-conquer approach to improving performance.

Ensemble Model Types

  1. Simple: Max Voting, Averaging, Weighted Averaging
  2. Advanced: Stacking, Blending, Bagging (built on bootstrap sampling), and Boosting

Simple Ensemble Methods

Max Voting

The max voting method is generally used for classification problems. In this technique, multiple models make a prediction for each data point, and each prediction is treated as a 'vote'. The prediction made by the majority of the models is used as the final prediction.

In short, max voting takes the mode of all the predictions. The result of max voting looks like this:

Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Final Result
   5    |    4    |    5    |    4    |    4    |      4
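
To make this concrete, here is a minimal max-voting sketch using scikit-learn's VotingClassifier with voting='hard'; the synthetic dataset and the choice of three base classifiers are illustrative assumptions, not part of the original example.

    # Max voting: each model casts a 'vote' and the mode of the votes wins.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import VotingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, random_state=42)  # toy data (assumption)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    # voting='hard' takes the majority vote (the mode) of the individual predictions
    ensemble = VotingClassifier(
        estimators=[
            ("lr", LogisticRegression(max_iter=1000)),
            ("knn", KNeighborsClassifier()),
            ("dt", DecisionTreeClassifier(random_state=42)),
        ],
        voting="hard",
    )
    ensemble.fit(X_train, y_train)
    print("Max-voting accuracy:", ensemble.score(X_test, y_test))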

Averaging

Similar to max voting, multiple predictions are made for each data point. In this method, we take the average of the predictions from all the models and use it as the final prediction. Averaging can be used for regression problems or for calculating class probabilities in classification problems.

Thus the averaging method takes the average of all the predicted values.

Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Final Result
   5    |    4    |    5    |    4    |    4    |     4.4
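
A minimal averaging sketch could look like the following; it assumes three regressors trained on the same synthetic data, with the final prediction taken as the plain mean of the individual predictions.

    # Averaging: train several models and average their predictions.
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsRegressor
    from sklearn.tree import DecisionTreeRegressor

    X, y = make_regression(n_samples=500, noise=10.0, random_state=42)  # toy data (assumption)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    models = [
        LinearRegression(),
        KNeighborsRegressor(),
        DecisionTreeRegressor(random_state=42),
    ]
    predictions = [m.fit(X_train, y_train).predict(X_test) for m in models]

    # Final prediction = plain average of the individual model predictions
    final_prediction = np.mean(predictions, axis=0)
    print(final_prediction[:5])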

Weighted Average

This is an extension of the averaging method. Each model is assigned a different weight that defines its importance in the final prediction.

For example: (5 * 0.23) + (4 * 0.23) + (5 * 0.18) + (4 * 0.18) + (4 * 0.18) = 4.41

            | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Final Result
weights     |  0.23   |  0.23   |  0.18   |  0.18   |  0.18   |
predictions |   5     |   4     |   5     |   4     |   4     |    4.41
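
The same calculation in code, as a minimal weighted-average sketch that reproduces the numbers from the table above (the weights and predictions are simply the illustrative values shown there):

    # Weighted average: sum(prediction * weight) / sum(weights)
    import numpy as np

    predictions = np.array([5, 4, 5, 4, 4])
    weights = np.array([0.23, 0.23, 0.18, 0.18, 0.18])

    final = np.average(predictions, weights=weights)
    print(round(final, 2))  # 4.41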

Advanced Ensemble Methods

Bagging – Bootstrap Aggregating

Bootstrap Aggregating is an ensemble method. First, we create random samples of the training data set with replacement (subsets of the training data set). Then, we build a model (a classifier or a decision tree) for each sample. Finally, the results of these multiple models are combined using averaging or majority voting.

Since each model is exposed to a different subset of the data and we use their collective output at the end, the ensemble does not cling too closely to the training data set, which keeps over-fitting in check. Thus, bagging helps us reduce the variance error.

Combining multiple models decreases variance, especially in the case of unstable models, and may produce a more reliable prediction than any single model.
The random forest technique uses this concept but goes a step further: for each bootstrapped sample it also randomly chooses a subset of features to consider when making splits during training, which reduces the variance even more.

Bagging Algorithms:

  1. Bagging meta-estimator
  2. Random Forest
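
As a rough sketch, both of these bagging algorithms are available in scikit-learn; the synthetic dataset and hyperparameters below are illustrative assumptions.

    # Bagging: fit models on bootstrap samples and combine them by voting.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=500, random_state=42)  # toy data (assumption)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    # Bagging meta-estimator: decision trees (the default base learner)
    # trained on bootstrap samples, combined by majority vote
    bagging = BaggingClassifier(n_estimators=50, random_state=42)

    # Random forest: bagging plus a random subset of features at each split
    forest = RandomForestClassifier(n_estimators=100, random_state=42)

    for name, model in [("Bagging meta-estimator", bagging), ("Random Forest", forest)]:
        model.fit(X_train, y_train)
        print(name, model.score(X_test, y_test))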

Boosting

Boosting is an iterative technique that adjusts the weight of an observation based on the most recent classification. If an observation was classified incorrectly, its weight is increased, and vice versa.
Boosting generally decreases the bias error and builds strong predictive models.


Boosting has shown better predictive accuracy than bagging, but it also tends to over-fit the training data. Thus, parameter tuning becomes a crucial part of boosting algorithms so that they avoid over-fitting.

Boosting is a sequential technique in which the first algorithm is trained on the entire data set and each subsequent algorithm is built by fitting the residuals of the previous ones, thus giving higher weight to the observations that were poorly predicted by the earlier models.
It relies on creating a series of weak learners, each of which may not perform well on the entire data set but does well on some part of it. Thus, each model actually boosts the performance of the ensemble.

Boosting Algorithms:

  1. AdaBoost
  2. GBM
  3. XGBoost
  4. LightGBM
  5. CatBoost
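
Here is a minimal boosting sketch using the first two algorithms from this list, AdaBoost and GBM, as implemented in scikit-learn; the synthetic data and hyperparameters are illustrative assumptions. XGBoost, LightGBM, and CatBoost live in their own libraries but follow a similar fit/predict workflow.

    # Boosting: models are built sequentially, each focusing on the errors
    # (re-weighted or residual observations) of the previous ones.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=500, random_state=42)  # toy data (assumption)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    # AdaBoost increases the weight of misclassified observations at each round
    ada = AdaBoostClassifier(n_estimators=100, random_state=42)

    # GBM fits each new tree to the residual errors of the ensemble so far
    gbm = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, random_state=42)

    for name, model in [("AdaBoost", ada), ("GBM", gbm)]:
        model.fit(X_train, y_train)
        print(name, model.score(X_test, y_test))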

Difference between Bagging and Boosting

Similarities between bagging and boosting: both use a voting (or averaging) method to combine predictions, and both combine models of the same type.

The differentiating points between bagging and boosting are:

  1. In bagging, the individual models are built independently of one another. In boosting, each new model is influenced by the performance of those built previously.
  2. In bagging, all models are given equal weight. In boosting, each model's weight in the final prediction depends on its performance.

Advantages of Ensemble Methods

  1. More accurate predictions: the working of ensemble methods can be compared to the diversification of a financial portfolio. It is advisable to keep a portfolio mixed across debt and equity to reduce variability and hence minimize risk. Similarly, an ensemble of models will, in most cases, perform better on unseen test data than the individual models.
  2. A stable and more robust model: the aggregate result of multiple models is usually less noisy than that of the individual models, which leads to greater model stability and robustness.
  3. Ensemble models can capture both linear and non-linear relationships in the data. This can be accomplished by using two different types of models and forming an ensemble of the two.

Disadvantages of Ensemble Methods

  1. Reduced model interpretability: ensemble methods reduce interpretability due to their increased complexity, making it difficult to draw crucial business insights at the end.
  2. Computation and design time are high.
  3. Selecting the models for an ensemble is an art that is hard to master.

Conclusion

Here we learned what ensemble learning is and went through some of its methods. Below is sample code containing implementations of the different algorithms mentioned above. Happy learning!

https://github.com/Mayura-Zadane/ensemble-learning

References/Further Learning:

https://towardsdatascience.com/basic-ensemble-learning-random-forest-adaboost-gradient-boosting-step-by-step-explained-95d49d1e2725
