Ensemble Learning in machine learning combines multiple models, called weak learners, into a single, more effective predictive model. The technique is used to improve accuracy, reduce variance, and mitigate overfitting. In this article we will look at the main ensemble techniques and the algorithms commonly used for each.
Main types of ensemble models
1. Bagging - Bootstrap Aggregating.
Bagging is a technique that involves creating multiple versions of a model and combining their outputs to improve overall performance.
In bagging, several base models are trained on different subsets of the training data, and their predictions are then aggregated to make the final decision. The subsets are created using bootstrapping, a statistical technique in which samples are drawn with replacement, so some data points can appear more than once in a subset.
The final prediction from the ensemble is typically made by either:
- Averaging the predictions (for regression problems), or
- Majority voting (for classification problems).
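As a quick illustration of the aggregation step, the sketch below combines the outputs of three hypothetical base models with majority voting and with averaging; the prediction arrays are made-up numbers, not output from real models.
# aggregation sketch with made-up base-model predictions
import numpy as np
clf_preds = np.array([[0, 1, 1],   # model 1's class predictions for 3 samples
                      [0, 1, 0],   # model 2
                      [1, 1, 1]])  # model 3
# majority vote per sample (classification)
print(np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, clf_preds))  # [0 1 1]
reg_preds = np.array([[2.1, 3.0],  # model 1's numeric predictions for 2 samples
                      [1.9, 3.4],  # model 2
                      [2.0, 3.2]])  # model 3
# mean per sample (regression)
print(reg_preds.mean(axis=0))  # [2.  3.2]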
Common Bagging Algorithms.
1. Random Forest.
- Random forest is an ensemble method based on decision trees. Multiple decision trees are trained using different bootstrapped samples of the data.
- In addition to bagging, Random Forest also introduces randomness by selecting a random subset of features at each node, further reducing variance and overfitting.
Loading data from sklearn datasets.
#loading data
from sklearn.datasets import load_iris
data = load_iris()
Splitting data.
#split data
X = data.data
y = data.target
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
Training the Random Forest model.
#Training the random forest
from sklearn.ensemble import RandomForestClassifier
rf_model = RandomForestClassifier(
    n_estimators=100,  # number of trees
    max_depth=5,
    random_state=42)
rf_model.fit(X_train, y_train)
Evaluating accuracy.
# evaluate accuracy on the train and test sets
from sklearn.metrics import accuracy_score
print(f"Train accuracy: {accuracy_score(y_train, rf_model.predict(X_train)):.2%}")
print(f"Test accuracy: {accuracy_score(y_test, rf_model.predict(X_test)):.2%}")
2. Bagged Decision Trees.
In Bagged Decision Trees, multiple decision trees are trained using bootstrapped samples of the data.
Each tree is trained independently, and the final prediction is made by aggregating the outputs of all the trees in the ensemble (averaging for regression, majority voting for classification). The code below first trains single decision trees to show the effect of constraining tree depth; an actual bagged ensemble follows after the visualization.
Splitting data.
X = data.data
y = data.target
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
Training the models.
# unconstrained tree (will overfit the training data)
from sklearn.tree import DecisionTreeClassifier, plot_tree
tree_big = DecisionTreeClassifier(random_state=42)
tree_big.fit(X_train, y_train)
# constrained tree (better generalization)
tree_small = DecisionTreeClassifier(max_depth=3, random_state=42)
tree_small.fit(X_train, y_train)
Predictions
# compare train/test accuracy of the two trees
print("Unconstrained tree results")
print(f"Train accuracy: {accuracy_score(y_train, tree_big.predict(X_train)):.2%}")
print(f"Test accuracy: {accuracy_score(y_test, tree_big.predict(X_test)):.2%}")
print("Constrained tree results")
print(f"Train accuracy: {accuracy_score(y_train, tree_small.predict(X_train)):.2%}")
print(f"Test accuracy: {accuracy_score(y_test, tree_small.predict(X_test)):.2%}")
Visualization
# Visualization
import matplotlib.pyplot as plt
plt.figure(figsize=(15, 6))
plot_tree(tree_small, feature_names=data.feature_names, class_names=data.target_names, filled=True, rounded=True, fontsize=10)
plt.title('Decision tree (depth = 3)')
plt.show()
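The trees above are trained individually. An actual bagged ensemble of decision trees can be built with scikit-learn's BaggingClassifier, which uses a decision tree as its default base learner; a minimal sketch reusing the train/test split from above (n_estimators=50 is an arbitrary choice for illustration):
#Bagged decision trees
from sklearn.ensemble import BaggingClassifier
bag_model = BaggingClassifier(
    n_estimators=50,   # number of trees, each fit on a bootstrap sample
    random_state=42)
bag_model.fit(X_train, y_train)
print(f"Bagged trees test accuracy: {accuracy_score(y_test, bag_model.predict(X_test)):.2%}")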
2. Boosting.
Boosting is an ensemble technique where multiple models are trained sequentially, with each new model attempting to correct the errors made by the previous ones.
Boosting focuses on adjusting the weights of incorrectly classified data points, so the next model pays more attention to those difficult cases. By combining the outputs of these models, boosting typically improves the accuracy of the final prediction.
Common Boosting Algorithms.
1. AdaBoost - Adaptive Boosting.
AdaBoost works by adjusting the weights of misclassified instances and combining the predictions of weak learners (usually decision trees). Each subsequent model is trained to correct the mistakes of the previous model.
AdaBoost can significantly improve the performance of weak models, especially when used for classification problems.
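A minimal AdaBoost sketch on the iris dataset (the hyperparameter values are illustrative, not tuned):
#AdaBoost training
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)
ada = AdaBoostClassifier(
    n_estimators=100,   # number of weak learners (decision stumps by default)
    learning_rate=0.5,  # shrinks each learner's contribution
    random_state=42)
ada.fit(X_train, y_train)
print(f"AdaBoost accuracy score: {accuracy_score(y_test, ada.predict(X_test)):.2%}")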
2. Gradient Boosting.
Gradient Boosting is a more general approach to boosting that builds models sequentially, with each new model fitting the residual errors of the previous model.
The models are trained to minimize a loss function, which can be customized based on the specific task.
We can perform regression and classification tasks using Gradient Boosting.
Loading data.
#loading data
from sklearn.datasets import load_iris
iris = load_iris()
Splitting data.
#split data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)
Gradient boosting training.
#Gradient boosting training
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
gb = GradientBoostingClassifier(
    n_estimators=100,
    learning_rate=0.1,  # how much each tree contributes
    max_depth=3,        # shallow trees work best for boosting
    random_state=42
)
gb.fit(X_train, y_train)
print(f"Gradient boosting accuracy score: {accuracy_score(y_test, gb.predict(X_test)):.2%}")
3. XGBoost - Extreme Gradient Boosting.
XGBoost is an optimized version of gradient boosting. It includes regularization to prevent overfitting and supports parallelization to speed up training.
XGBoost has become a popular choice in machine learning competitions due to its high performance.
# create the model: XGBClassifier (eXtreme Gradient Boosting classifier)
from xgboost import XGBClassifier
xgb = XGBClassifier(
    n_estimators=100,
    learning_rate=0.1,
    max_depth=3,
    eval_metric='mlogloss',
    random_state=42,
    verbosity=0
)
#model training
xgb.fit(X_train, y_train)
print(f"XGBoost accuracy score: {accuracy_score(y_test, xgb.predict(X_test)):.2%}")
3. Stacking.
Stacking combines multiple models of different types, where each model makes independent predictions and a meta-model is trained to combine these predictions. Instead of simply averaging or voting, as in bagging and boosting, stacking trains a higher-level model (meta-model) to learn how to best combine the predictions of the base models.
In stacking, the base models are trained on the original data and their predictions are then used as features for the meta-model, which learns how to combine them effectively. The final prediction is made by the meta-model based on the combined outputs of all the base models.
A meta-model is a model that learns how to combine the predictions of the base models to generate the final output.
Common Stacking Algorithms.
1. Generalized Stacking.
In Generalized Stacking, multiple different models (e.g., decision trees, logistic regression, neural networks) are trained on the same dataset, and a meta-model (such as a logistic regression or another decision tree) is trained on the predictions made by these base models.
The meta-model learns how to combine the predictions of the base models to make the final prediction.
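A minimal sketch of generalized stacking with scikit-learn's StackingClassifier, using a decision tree and a k-nearest-neighbors classifier as base models and logistic regression as the meta-model (the particular models are illustrative choices):
#Generalized stacking on iris
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import StackingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)
stack = StackingClassifier(
    estimators=[('tree', DecisionTreeClassifier(max_depth=3, random_state=42)),
                ('knn', KNeighborsClassifier(n_neighbors=5))],   # base models
    final_estimator=LogisticRegression(max_iter=1000))           # meta-model
stack.fit(X_train, y_train)
print(f"Stacking accuracy score: {accuracy_score(y_test, stack.predict(X_test)):.2%}")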
2. Stacking with Cross-Validation.
In stacking with cross-validation, the base models are trained using cross-validation and their predictions on the validation set are used to train the meta-model.
This helps prevent overfitting and ensures that the meta-model is trained on predictions for data the base models did not see during training.
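scikit-learn's StackingClassifier works this way out of the box: its cv argument controls the cross-validation used to generate the out-of-fold predictions that train the meta-model. A small sketch, reusing the imports and data split from the previous example:
#Stacking with explicit 5-fold cross-validation for the meta-features
stack_cv = StackingClassifier(
    estimators=[('tree', DecisionTreeClassifier(max_depth=3, random_state=42)),
                ('knn', KNeighborsClassifier(n_neighbors=5))],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5)  # meta-features come from 5-fold out-of-fold predictions of the base models
stack_cv.fit(X_train, y_train)
print(f"Stacking (cv=5) accuracy score: {accuracy_score(y_test, stack_cv.predict(X_test)):.2%}")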
3. Multi-Layer Stacking.
Multi-layer stacking involves multiple levels of base models, where the outputs of the first level of models are fed into a second level of base models and so on, before reaching the meta-model.
This approach creates a more complex ensemble that can capture a wider variety of patterns in the data.
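One way to sketch a two-layer stack with scikit-learn is to nest one StackingClassifier inside another; this is purely illustrative, and deep stacks rarely pay off on a dataset as small as iris. It reuses the imports and data split from the examples above.
#Two-layer stacking: a stacked ensemble used as a base model of another stack
from sklearn.ensemble import RandomForestClassifier
layer1 = StackingClassifier(
    estimators=[('tree', DecisionTreeClassifier(max_depth=3, random_state=42)),
                ('knn', KNeighborsClassifier(n_neighbors=5))],
    final_estimator=LogisticRegression(max_iter=1000))
layer2 = StackingClassifier(
    estimators=[('layer1', layer1),   # first-level stacked ensemble
                ('rf', RandomForestClassifier(n_estimators=100, random_state=42))],
    final_estimator=LogisticRegression(max_iter=1000))  # meta-model on top
layer2.fit(X_train, y_train)
print(f"Multi-layer stacking accuracy score: {accuracy_score(y_test, layer2.predict(X_test)):.2%}")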