Photo by Michael Dziedzic on Unsplash
In the previous post, I briefly explained Gradient Boosting using a classification problem. Here I give a step-by-step explanation of how the Gradient Boosting Regressor works, using sklearn and Python, to complement the theory given here. I did this exercise mainly to build an intuition for the processes inside gradient boosted trees, so that I would not use the algorithm as a 'black box'.
For this purpose I used a dataset of car prices (source). To make the processes inside the gradient boosted trees easy to track, I used a small portion of the data, the minimum number of trees (m=2), and a shallow tree depth (max_depth=2).
1) First, we initialize the model by computing the initial predictions Pred_0, which are simply the mean price in the training dataset. Then we calculate the initial residuals: Res_0 = train['price'] - Pred_0. See below.
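A minimal pandas sketch of step 1. It assumes a hypothetical eight-row toy table with made-up `year`, `mileage` and `price` columns in place of the real car-price data, so the numbers are illustrative only.

```python
import pandas as pd

# Hypothetical toy data standing in for the car-price training set.
train = pd.DataFrame({
    "year":    [2010, 2012, 2014, 2015, 2017, 2018, 2019, 2020],
    "mileage": [150_000, 90_000, 120_000, 80_000, 60_000, 30_000, 40_000, 10_000],
    "price":   [5_000, 6_500, 8_000, 9_500, 13_000, 15_000, 18_000, 21_000],
})

# Step 1: the initial prediction is one constant for every row,
# the mean price of the training set.
pred_0 = train["price"].mean()

# Initial residuals: how far each true price lies from that constant.
res_0 = train["price"] - pred_0

print(pred_0)  # 12000.0
print(res_0.tolist())
```

Because Pred_0 is the mean, the residuals Res_0 sum to zero; they are the targets the first tree will try to fit.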
2) Next, we fit the first tree on all data points, using each row's features as inputs and Res_0 as the target. This tree is built using 'MSE' as the split criterion.
The value in each leaf (output_value_1 for every row that lands there) is the mean of the residuals that fall into that leaf. The prediction is then updated:
Pred_1 = Pred_0 + learning_rate*output_value_1
Then we calculate the new residuals:
Res_1 = train['price']-Pred_1
Predictions and residuals in leaf nodes #2, 3, 5, and 6:
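3) Next, we fit the second tree on all data points, using each row's features as inputs and Res_1 as the target. This tree is also built using 'MSE' as the criterion, and the predictions are updated the same way: Pred_2 = Pred_1 + learning_rate*output_value_2.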
The predictions for node #5 are shown below.

4) We continue this iterative training process, adding one tree per iteration, each fit to the residuals of the previous one. Here I used only two trees for simplicity.
The detailed code and the full step-by-step walkthrough are in chapter 5 here.