Timothy Cummins

Ensemble Methods: Gradient Boosting

Ensemble

According to Webster, the word ensemble means "a group producing a single effect," and that is exactly what we see in ensemble methods in machine learning. These methods combine multiple learning algorithms to gain better performance than any single learning algorithm could achieve alone. The three main "meta-algorithms" for combining weak learners are Bagging, Boosting and Stacking. If you have read my previous blog on Random Forest then you are already familiar with one of the Bagging methods, though today we are going to focus on Gradient Boosting, which, as you can probably guess, is a Boosting algorithm.

Weak Learners

Since Boosting is a way of combining weak learners, we should probably go over what those are. A weak learner is a simple model whose predictions are only slightly better than random chance. Common weak learners are small decision trees or decision stumps (single-level decision trees); as I said earlier, any model that predicts even slightly better than chance (just over 50% accuracy on a balanced binary problem, for example) would be included.
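To make this concrete, here is a quick side illustration (my own addition, using the breast cancer dataset from Scikit-learn purely as a convenient stand-in): it compares a decision stump against a baseline that always predicts the most common class. Any model that reliably beats that baseline can serve as a weak learner.

from sklearn.datasets import load_breast_cancer
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Compare a single-split stump against a "predict the majority class" baseline;
# this dataset is just a convenient stand-in for the illustration.
Xb, yb = load_breast_cancer(return_X_y=True)

baseline = DummyClassifier(strategy="most_frequent")
stump = DecisionTreeClassifier(max_depth=1)

print("Baseline accuracy:", cross_val_score(baseline, Xb, yb, cv=5).mean())
print("Stump accuracy:", cross_val_score(stump, Xb, yb, cv=5).mean())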

Boosting

Boosting algorithms start with a single weak learner and then:

1) Train that weak learner
2) Check what this weak learner got wrong
3) Build another weak learner to improve upon the areas the first one got wrong
4) Repeat steps 2 and 3 until the model reaches a predetermined stopping point or its performance plateaus (a rough sketch of this loop follows below)
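To see what that loop can look like in code, here is a rough sketch of the classic re-weighting approach to steps 2 and 3 (this is essentially how AdaBoost works, not gradient boosting yet). It assumes binary labels encoded as -1 and +1, and names like simple_boost and n_rounds are my own for the sketch, not anything from a library.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def simple_boost(X, y, n_rounds=10):
    n = len(y)
    sample_weights = np.full(n, 1 / n)   # start by treating every point equally
    learners, alphas = [], []

    for _ in range(n_rounds):
        # Step 1: train a weak learner (a decision stump) on the weighted data
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=sample_weights)

        # Step 2: check what it got wrong
        pred = stump.predict(X)
        err = np.sum(sample_weights[pred != y]) / np.sum(sample_weights)
        if err >= 0.5:                    # no better than chance, so stop early
            break

        # Step 3: up-weight the mistakes so the next learner focuses on them
        alpha = 0.5 * np.log((1 - err) / (err + 1e-10))
        sample_weights *= np.exp(-alpha * y * pred)
        sample_weights /= sample_weights.sum()

        learners.append(stump)
        alphas.append(alpha)

    # Final prediction: weighted vote of all the weak learners
    def predict(X_new):
        votes = sum(a * l.predict(X_new) for a, l in zip(alphas, learners))
        return np.sign(votes)

    return predict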

Gradient Boosting

Now that we have a layout of how boosting works, gradient boosting is simple to understand: it adds in some gradient descent. At step 2, where the model checks which predictions the weak learner got wrong, a gradient boosting model calculates the residuals for each data point to determine how far off the predictions were. The model then plugs these residuals into a loss function so that it can compute the gradient of the loss. From there it moves on to step 3 and uses these gradients and loss values to guide the next weak learner.
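Here is a minimal sketch of that idea for regression with a squared-error loss, where the negative gradient of the loss is simply the residual (the true value minus the current prediction). The function name gradient_boost_regressor and its parameters are my own for the sketch; Scikit-learn's real implementation does much more.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_regressor(X, y, n_estimators=100, learning_rate=0.1):
    # Start from a constant prediction (the mean of the targets)
    prediction = np.full(len(y), y.mean())
    trees = []

    for _ in range(n_estimators):
        # Step 2: for squared error, the negative gradient of the loss
        # with respect to the predictions is just the residual
        residuals = y - prediction

        # Step 3: fit the next weak learner to those residuals
        tree = DecisionTreeRegressor(max_depth=1)
        tree.fit(X, residuals)

        # Nudge the predictions a small step in the direction that lowers the loss
        prediction += learning_rate * tree.predict(X)
        trees.append(tree)

    base = y.mean()

    def predict(X_new):
        return base + learning_rate * sum(t.predict(X_new) for t in trees)

    return predict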

Example

For this example I will be using the Wine dataset from Scikit-learn for ease of access, so let's get started with our imports.

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_wine
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import GradientBoostingClassifier

Then load in our data.

wine = load_wine()
data = pd.DataFrame(data= np.c_[wine['data'], wine['target']],
                     columns= wine['feature_names'] + ['target'])
data.describe()

We can see here that all of our columns have the same count (so there are no missing values), and if you know anything about wine, you will recognize all sorts of metrics describing the characteristics of each one.
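If you want to double-check that before modeling, a quick count (an optional addition on my part) confirms it:

# Optional sanity check: confirm there are no missing values and
# see how many wines fall into each of the three target classes.
print(data.isna().sum().sum(), "missing values")
print(data['target'].value_counts())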

Here we are going to split our data and get ready to see how our gradient boosting model does against the decision stump model.

X=data.drop('target',axis=1)
y=data['target']
Xtrain, Xtest, ytrain, ytest = train_test_split(X,y)

So let's first try out the stump to get a baseline for our predictions.

DT = DecisionTreeClassifier(max_depth=1,criterion='gini')

DT.fit(Xtrain,ytrain)
DT.predict(Xtest)
print("Mean Accuracy:",DT.score(Xtest,ytest))

On my kernel I got a mean accuracy score of .71, which is not bad for a decision tree with a depth of 1. Now for the gradient boosting model.

GB = GradientBoostingClassifier(n_estimators=100,max_depth=1,criterion='friedman_mse')

GB.fit(Xtrain,ytrain)
GB.predict(Xtest)
print("Mean Accuracy:",GB.score(Xtest,ytest))

Here you can see I made sure the max depth was set to one to keep the models similar, and after restarting the kernel a couple of times I ended up with accuracy scores between .97 and 1.
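Because a single random train/test split changes every time the kernel restarts, a quick cross-validation (an optional extra, not part of the walkthrough above) gives a steadier picture of how the two models compare:

from sklearn.model_selection import cross_val_score

# Average accuracy over 5 folds instead of a single random split.
stump_scores = cross_val_score(DecisionTreeClassifier(max_depth=1), X, y, cv=5)
gb_scores = cross_val_score(GradientBoostingClassifier(n_estimators=100, max_depth=1), X, y, cv=5)

print("Stump mean accuracy:", stump_scores.mean().round(3))
print("Gradient boosting mean accuracy:", gb_scores.mean().round(3))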

Conclusion

As you can see from the example above, the capabilities of all of these weak models working together are pretty impressive. Gradient Boosting is an awesome advanced boosting algorithm that I would recommend trying on some of your Machine Learning projects, and one I have personally had great success with. Just remember, in the words of the great Walter Payton: "We are stronger together than we are alone."
