Stacking in Machine Learning
There are many ways to ensemble models in machine learning, such as Bagging, Boosting, and stacking. Stacking is one of the most popular ensemble machine learning techniques used to predict multiple nodes to build a new model and improve model performance. Stacking enables us to train multiple models to solve similar problems, and based on their combined output, it builds a new model with improved performance.
In this topic, "Stacking in Machine Learning", we will discuss a few important concepts related to stacking, the general architecture of stacking, important key points to implement stacking, and how stacking differs from bagging and boosting in machine learning. Before starting this topic, first, understand the concepts of the ensemble in machine learning. So, let's start with the definition of ensemble learning in machine learning.
What is Ensemble learning in Machine Learning?
Ensemble learning is one of the most powerful machine learning techniques that use the combined output of two or more models/weak learners and solve a particular computational intelligence problem. E.g., a Random Forest algorithm is an ensemble of various decision trees combined.
Ensemble learning is primarily used to improve the model performance, such as classification, prediction, function approximation, etc. In simple words, we can summarise the ensemble learning as follows:
"An ensembled model is a machine learning model that combines the predictions from two or more models.”
There are 3 most common ensemble learning methods in machine learning. These are as follows:
However, we will mainly discuss Stacking on this topic.
Bagging is a method of ensemble modeling, which is primarily used to solve supervised machine learning problems. It is generally completed in two steps as follows:
Example: In the Random Forest method, predictions from multiple decision trees are ensembled parallelly. Further, in regression problems, we use an average of these predictions to get the final output, whereas, in classification problems, the model is selected as the predicted class.
Boosting is an ensemble method that enables each member to learn from the preceding member's mistakes and make better predictions for the future. Unlike the bagging method, in boosting, all base learners (weak) are arranged in a sequential format so that they can learn from the mistakes of their preceding learner. Hence, in this way, all weak learners get turned into strong learners and make a better predictive model with significantly improved performance.
We have a basic understanding of ensemble techniques in machine learning and their two common methods, i.e., bagging and boosting. Now, let's discuss a different paradigm of ensemble learning, i.e., Stacking.
Stacking is one of the popular ensemble modeling techniques in machine learning. Various weak learners are ensembled in a parallel manner in such a way that by combining them with Meta learners, we can predict better predictions for the future.
This ensemble technique works by applying input of combined multiple weak learners' predictions and Meta learners so that a better output prediction model can be achieved.
In stacking, an algorithm takes the outputs of sub-models as input and attempts to learn how to best combine the input predictions to make a better output prediction.
Stacking is also known as a stacked generalization and is an extended form of the Model Averaging Ensemble technique in which all sub-models equally participate as per their performance weights and build a new model with better predictions. This new model is stacked up on top of the others; this is the reason why it is named stacking.
Architecture of Stacking
The architecture of the stacking model is designed in such as way that it consists of two or more base/learner's models and a meta-model that combines the predictions of the base models. These base models are called level 0 models, and the meta-model is known as the level 1 model. So, the Stacking ensemble method includes original (training) data, primary level models, primary level prediction, secondary level model, and final prediction. The basic architecture of stacking can be represented as shown below the image.
Steps to implement Stacking models:
There are some important steps to implementing stacking models in machine learning. These are as follows:
Stacking Ensemble Family
There are some other ensemble techniques that can be considered the forerunner of the stacking method. For better understanding, we have divided them into the different frameworks of essential stacking so that we can easily understand the differences between methods and the uniqueness of each technique. Let's discuss a few commonly used ensemble techniques related to stacking.
This is one of the simplest stacking ensemble methods, which uses different algorithms to prepare all members individually. Unlike the stacking method, the voting ensemble uses simple statistics instead of learning how to best combine predictions from base models separately.
It is significant to solve regression problems where we need to predict the mean or median of the predictions from base models. Further, it is also helpful in various classification problems according to the total votes received for prediction. The label with the higher numbers of votes is referred to as hard voting, whereas the label that receives the largest sums of probability or lesser votes is referred to as soft voting.
The voting ensemble differs from than stacking ensemble in terms of weighing models based on each member's performance because here, all models are considered to have the same skill levels.
Member Assessment: In the voting ensemble, all members are assumed to have the same skill sets.
Combine with Model: Instead of using combined prediction from each member, it uses simple statistics to get the final prediction, e.g., mean or median.
Weighted Average Ensemble
The weighted average ensemble is considered the next level of the voting ensemble, which uses a diverse collection of model types as contributing members. This method uses some training datasets to find the average weight of each ensemble member based on their performance. An improvement over this naive approach is to weigh each member based on its performance on a hold-out dataset, such as a validation set or out-of-fold predictions during k-fold cross-validation. Furthermore, it may also involve tuning the coefficient weightings for each model using an optimization algorithm and performance on a holdout dataset.
Member Assessment: Weighted average ensemble method uses member performance based on the training dataset.
Combine With Model: It considers the weighted average of prediction from each member separately.
Blending is a similar approach to stacking with a specific configuration. It is considered a stacking method that uses k-fold cross-validation to prepare out-of-sample predictions for the meta-model. In this method, the training dataset is first to split into different training sets and validation sets then we train learner models on the training sets. Further, predictions are made on the validation set and sample set, where validation predictions are used as features to build a new model, which is later used to make final predictions on the test set using the prediction values as features.
Member Predictions: The blending stacking ensemble uses out-of-sample predictions on a validation set.
Combine With Model: Linear model (e.g., linear regression or logistic regression).
Super Learner Ensemble:
This method is quite similar to blending, which has a specific configuration of a stacking ensemble. It uses out-of-fold predictions from learner models and prepares a meta-model. However, it is considered a modified form of blending, which only differs in the selection of how out-of-sample predictions are prepared for the meta learner.
Summary of Stacking Ensemble
Stacking is an ensemble method that enables the model to learn how to use combine predictions given by learner models with meta-models and prepare a final model with accurate prediction. The main benefit of stacking ensemble is that it can shield the capabilities of a range of well-performing models to solve classification and regression problems. Further, it helps to prepare a better model having better predictions than all individual models. In this topic, we have learned various ensemble techniques and their definitions, the stacking ensemble method, the architecture of stacking models, and steps to implement stacking models in machine learning.