
Blending vs Stacking

Introduction

Stacking and Blending are two powerful and popular ensemble methods in machine learning. They are very similar, differing mainly in how the training data is allocated between the base models and the meta-model. Both are best known for their strong performance in winning Kaggle competitions.

Stacking

Stacking, or stacked generalisation, was introduced by Wolpert. In essence, stacking makes predictions using a meta-model trained on the outputs of a pool of base models. The base models are trained on the training data and asked for their predictions; a separate meta-model is then trained to combine those base-model outputs into the final prediction.


How Stacking Works

  1. You have Train Data and Test Data. Assume we use 4-fold cross-validation to train the base models; train_data is then divided into 4 parts.
  2. Using the 4-part train_data, the 1st base model (assume it is a decision tree) is fitted on 3 parts and predictions are made for the 4th part. This is done for each part of the training data, so that every training row receives an out-of-fold prediction; these predictions form a new feature, pred_m1.
  3. Model 1 (the decision tree) is then fitted to all the training data. This trained model is used to predict the Test Data.
  4. Steps 2 to 3 are repeated for the 2nd model (e.g. KNN) and the 3rd model (e.g. SVM). This gives train_data and test_data two more prediction features, pred_m2 and pred_m3.
  5. To train the meta-model (assume it is a logistic regression), we use only the newly added features from the base models, [pred_m1, pred_m2, pred_m3], and fit the meta-model on train_data.
  6. The final prediction for test_data is given by the trained meta-model.

Example (Python):
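A minimal sketch of the procedure described above, assuming scikit-learn's breast cancer dataset as the train/test data, a decision tree, KNN and SVM as base models, and logistic regression as the meta-model. The dataset, hyperparameters and random seed are illustrative assumptions, so the exact accuracy will depend on them.

# Sketch of stacking with scikit-learn's StackingClassifier.
# Dataset, base models, seed and hyperparameters are illustrative assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import StackingClassifier
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Base models whose out-of-fold predictions become the new features
base_models = [
    ("dt", DecisionTreeClassifier(random_state=42)),
    ("knn", KNeighborsClassifier()),
    ("svm", SVC(probability=True, random_state=42)),
]

# cv=4 reproduces the 4-fold scheme from the steps above; the meta-model is
# trained on the out-of-fold predictions [pred_m1, pred_m2, pred_m3]
stacked = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=4,
)
stacked.fit(X_train, y_train)

y_pred = stacked.predict(X_test)
print("Accuracy of the stacked model:", round(accuracy_score(y_test, y_pred), 2))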

Output:

Accuracy of the stacked model: 0.88


Blending

Blending is very similar to stacking. It also uses base models to produce base predictions as new features, and a meta-model is trained on these new features to give the final prediction. The only difference is that the meta-model is trained on predictions made for a separate holdout set (e.g. 10% of train_data) rather than on out-of-fold predictions from the full, folded training set.


How Blending Works

  1. The train set is split into training and validation sets.
  2. We train the base models on the training set.
  3. We make predictions only on the validation and test sets.
  4. The validation predictions are used as features to build a new model.
  5. This model makes final predictions on the test set using the prediction values as features.

Example (Python):
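A minimal sketch of the blending steps above, under the same illustrative assumptions as the stacking example (breast cancer dataset, the same base models and meta-model) with a 10% holdout taken from the training data; the exact accuracy will vary with these choices.

# Sketch of blending: base models are trained on the training part only,
# and the meta-model is trained on their holdout (validation) predictions.
# Dataset, split sizes, models and seed are illustrative assumptions.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Step 1: split the train set into a training part and a holdout (validation) part
X_tr, X_val, y_tr, y_val = train_test_split(
    X_train, y_train, test_size=0.1, random_state=42)

base_models = [
    DecisionTreeClassifier(random_state=42),
    KNeighborsClassifier(),
    SVC(probability=True, random_state=42),
]

val_preds, test_preds = [], []
for model in base_models:
    # Step 2: train each base model on the training part only
    model.fit(X_tr, y_tr)
    # Step 3: predict on the validation and test sets
    val_preds.append(model.predict_proba(X_val)[:, 1])
    test_preds.append(model.predict_proba(X_test)[:, 1])

# Step 4: the validation predictions become features for the meta-model
meta_X_val = np.column_stack(val_preds)
meta_X_test = np.column_stack(test_preds)
meta_model = LogisticRegression(max_iter=1000)
meta_model.fit(meta_X_val, y_val)

# Step 5: final predictions on the test set from the meta-model
y_pred = meta_model.predict(meta_X_test)
print("Accuracy of the blended model:", round(accuracy_score(y_test, y_pred), 3))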

Output:

Accuracy of the blended model: 0.885


Advantages and Disadvantages

Stacking

Advantages

  1. Performance: Stacking often performs better than any single base model.
  2. Diversity: It can combine different models, making it flexible to various datasets.

Disadvantages

  1. Complexity: Stacking adds a layer of complexity to your model.
  2. Computationally expensive: It requires training multiple models, which can be computationally expensive.

Blending

Advantages

  1. Simplicity: Blending is simpler than stacking as it avoids the need for cross-validation.
  2. Less leakage: There is less chance of data leakage compared to stacking.

Disadvantages

  1. Use of Data: Unlike stacking, blending uses a holdout set, which may result in under-utilization of the data.
  2. Performance: It may not perform as well as stacking when the number of base models is large.

When to Use Stacking or Blending

The choice between stacking and blending depends on the specific problem and the available computational resources. Stacking is typically favoured for better performance when computation and time are not a constraint. However, blending may be the better choice when you have many base models or are worried about data leakage.

Variations of Stacking

There are several variations of stacking, including weighted stacking, in which the predictions of the base models are weighted according to their performance. Another variation is stacking with feature selection, in which only a subset of the base-model predictions is used as input features for the meta-model.
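As a rough illustration of weighted stacking, the sketch below weights each base model's predicted probabilities by its 4-fold cross-validation accuracy. The models, dataset and weighting scheme here are illustrative assumptions rather than a fixed recipe.

# Sketch of a performance-weighted ensemble: base-model probabilities are
# averaged with weights taken from each model's cross-validation accuracy.
# Dataset, models and the use of CV accuracy as the weight are assumptions.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

models = [DecisionTreeClassifier(random_state=42),
          KNeighborsClassifier(),
          LogisticRegression(max_iter=1000)]

# Weight each model by its 4-fold cross-validation accuracy on the training data
weights = np.array([cross_val_score(m, X_train, y_train, cv=4).mean() for m in models])
weights /= weights.sum()

# Weighted average of the base models' predicted probabilities on the test set
probas = np.array([m.fit(X_train, y_train).predict_proba(X_test)[:, 1] for m in models])
y_pred = (weights @ probas >= 0.5).astype(int)
print("Weighted-ensemble accuracy:", round(accuracy_score(y_test, y_pred), 3))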

Variations of Blending

Blending can also be varied by changing the size of the holdout set or by using different holdout sets for different base models. This can help to reduce overfitting and improve the overall performance of the blended model.

Conclusion

Both stacking and blending are effective ensemble strategies that can improve the performance of machine learning models. They work by combining the predictions of multiple base models to make the final prediction. The choice between stacking and blending depends on the specific requirements of your machine learning problem and the available resources.

