Introduction to Ensemble Learning

Reading Time: 4 minutes

Ensemble methods are techniques that create multiple models and then combine them to produce improved results. Ensemble learning usually produces more accurate solutions than a single model would. This has been the case in a number of machine learning competitions and, where the winning solutions used ensemble methods.

Ensemble methods

You must ensure that your models are independent of one another and when creating a machine learning ensemble so (or as independent of each other as possible). One method is to build your ensemble from different algorithms, as shown in the preceding example.

Another ensemble method is to use instances of the same machine learning algorithms and train them on different data sets. Because for instance, you can create an ensemble composed of 12 linear regression models, each trained on a subset of your training data.

There are two key methods for sampling data from your training set. “Bootstrap aggregation,” aka “bagging,” takes random samples from the training set “with replacement.” The other method, “pasting,” draws samples “without replacement.” After training all your machine learning models, you’ll have to choose an aggregation method. If you’re tackling a classification problem, the usual aggregation method is “statistical mode,” or the class whose prediction is more than others. In regression problems, ensembles usually use the average of the predictions made by the models.

Simple Ensemble Techniques

Majority Voting

Each model makes one prediction (voting) for each test instance, and the final output prediction is. The maximum voting method is commonly use for classification questions. This technique uses multiple models to make predictions for each data point. Predictions from each model are “votes”. The predictions obtained from the majority of models are use as the final prediction. This is a common technique. As a final prediction, you can try the prediction with the highest number of votes (even if it is less than half the number of votes). In some articles, this method may be reference to as a “majority vote.”

Simple Averaging

In simple averaging method, for every instance of test dataset, the average predictions are calculated. This method often reduces overfit and creates a smoother regression model. The following pseud code shows this simple averaging method:

final_predictions = []
for row_number in len(predictions):
        mean(prediction[row_number, ])

Weighted Averaging

Weighted averaging is slightly modified version of simple averaging, where the prediction of each model is multiplied by the weight and then their average is calculated. The following pseudocode code shows the weighted averaging:

weights = [..., ..., ...] #length is equal to len(algorithms)
final_predictions = []
for row_number in len(predictions):
        mean(prediction[row_number, ]*weights)

Advanced Ensemble techniques

Stacking Ensemble Learning

Stacking is an ensemble learning technique that builds a new model by combining predictions from multiple models. This model makes predictions on the test set. Here’s a step-by-step breakdown of a simple stacked ensemble:

1. First Split the train set in 10 parts.

2. Then base model (suppose a decision tree) is fitted on 9 parts and predictions are made for the 10th part, this is done for each part of the train set.

3. The base model (in this case, decision tree) is then fitted on the whole train dataset.

4. Using this model the predictions is for the test sets completes

5 . Steps 2–4 are repeated for a different base model (say, knn), yielding a new set of predictions for the train and test sets.

6. The predictions from the training set will used as features to build a new model.

7. We use this model to make final predictions.


The idea of bagging is then simple: we want to fit several independent models and “average” their predictions in order to obtain a model with a lower variance. However, we can’t, in practice, fit fully independent models because it would require too much data. So, we rely on the good “approximate properties” of bootstrap samples (representatively and independence) to fit models that are almost independent. Visit this link to learn more about Bootstrap aggregation.

  1. Multiple subsets are design from the original dataset, selecting observations with replacement.
  2. A base model is design on each of these subsets.
  3. The models run in parallel and are independent of each other.
  4. The final predictions are combining the predictions from all the models.

Boosting Ensemble

Another popular ensemble technique is “boost”. So in the traditional ensemble , machine learning models are trained in parallel, boosting methods train them sequentially, and each new model builds on the previous model, by eliminating its inefficiencies.

AdaBoost (short for “adaptive boosting”), one of the more popular boosting methods, improves the accuracy of ensemble models by adapting new models to the mistakes of previous ones. After training your first machine learning model, you single out the training examples misclassified or wrongly predicted by the model. When training the next model, you put more emphasis on these examples. This results in a machine learning model that performs better where the previous one failed. The process repeats itself for as many models you want to add to the ensemble. The final ensemble contains several machine learning models of different accuracies, which together can provide better accuracy.


In conclusion ensemble techniques can win the competition in machine learning by developing sophisticated algorithms and producing highly accurate results, but they are often unfavorable in industries where interpretability is more important. Nevertheless, the effectiveness of these methods is undeniable and their benefits in the right application can be enormous.