*This is a follow-up article to our previous post.*

## Stonksmaster - Predict Stock prices using Python & ML 📈

### Nirvik Agarwal ・ Dec 2 ・ 8 min read

In our previous post, we covered **Machine Learning** and its *regression* models for *stock price prediction*. Today, let us talk about *ensemble methods* and *boosting models* used in *supervised Machine Learning*.

## Ensemble Methods

**Ensemble methods** are Machine Learning techniques that combine multiple machine learning algorithms to obtain better **predictive performance** than could be obtained from any of the constituent algorithms alone. The ways the base models are combined range from simple methods like *averaging* or *max voting* to more complex ones like *Boosting* or *Stacking*.

To draw a *basic comparison*, imagine a *parliament* whose members vote on a particular bill, each expressing their own opinion. Let the *members be models* and their opinions be the results of those models. Whether the bill passes depends on the members and their *votes*. Similarly, in ensemble learning the final outcome is based on the outcomes of its base models. This maps exactly onto binary classification problems, but the basic idea carries over to all other ensemble methods too. We hope this makes the picture clearer.

Over the past few years, ensemble learning techniques have seen a huge jump in popularity, and the reason behind this is quite obvious: they help us build a really robust model from a few *weak* base models.

### Ensemble Approaches

There are multiple ways to use ensemble models. There are simple ones like max voting or averaging as well as more complex ones like boosting, bagging or stacking.

**Majority Voting**

There are three versions of majority voting, in which the ensemble chooses the class:

- on which all classifiers agree
- predicted by at least one more than half the number of classifiers
- that receives the highest number of votes, whether or not the sum of those votes exceeds 50%
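The third variant above (plurality voting) is what scikit-learn's `VotingClassifier` implements with `voting="hard"`. Here is a minimal sketch on a toy dataset (the estimators and data here are illustrative, not the article's stock dataset):

```python
# Hard (majority) voting: each base classifier is one "member of parliament".
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=42)

voter = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("knn", KNeighborsClassifier()),
        ("tree", DecisionTreeClassifier(random_state=42)),
    ],
    voting="hard",  # the class with the most votes wins
)
voter.fit(X, y)
print(voter.predict(X[:5]))
```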

**Averaging**

Averaging is a simple method that is generally used for *regression* problems. Here we simply take the average of the predictions to get a more robust result. Although this method might look pretty simple, it almost always gives better results than a single model, hence serving the purpose of ensemble learning.
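With NumPy this is a one-liner. The predictions below are made-up toy numbers standing in for three hypothetical regression models:

```python
import numpy as np

# Predictions from three hypothetical models for the same three samples.
pred_a = np.array([101.0, 99.5, 102.3])
pred_b = np.array([100.2, 98.9, 101.7])
pred_c = np.array([101.8, 99.1, 102.9])

# Element-wise mean across the models gives the ensemble prediction.
ensemble_pred = np.mean([pred_a, pred_b, pred_c], axis=0)
print(ensemble_pred)  # → [101.  99.16666667 102.3]
```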

**Weighted Average**

Averaging the predictions provides great results, but it has a major flaw: in most cases one model has more predictive power, or performs better on the given problem, than the others, and hence we want to give it more weight in the final prediction.
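NumPy's `np.average` supports this directly via its `weights` argument. The weights below (0.7 / 0.3) are arbitrary illustrative choices; in practice they would come from validation performance:

```python
import numpy as np

pred_strong = np.array([101.0, 99.5])  # hypothetical stronger model
pred_weak = np.array([100.0, 98.0])    # hypothetical weaker model

# Give the stronger model more say in the final prediction.
ensemble_pred = np.average([pred_strong, pred_weak], axis=0, weights=[0.7, 0.3])
print(ensemble_pred)  # → [100.7  99.05]
```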

**Bagging**

Bagging is a popular ensemble method that is used in algorithms like *Random Forest*. It gains accuracy not only by averaging the models but by also creating uncorrelated models which are made possible by giving them different training sets.
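A minimal sketch of bagging with scikit-learn's `BaggingRegressor` (toy regression data; the default base estimator is a decision tree, which is what Random Forest builds on too):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=10, random_state=42)

# Each base tree sees a different bootstrap sample of the training data,
# which decorrelates the models before their predictions are averaged.
bag = BaggingRegressor(n_estimators=50, random_state=42)
bag.fit(X, y)
print(bag.predict(X[:3]))
```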

**Boosting**

Boosting is a sequential process, where each subsequent model tries to correct the errors of the previous model. For this reason, the succeeding models depend on the previous models, so we train the models sequentially instead of training them in parallel.
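One classic boosting algorithm is AdaBoost; a minimal sketch with scikit-learn on toy data (illustrative only, not the article's stock dataset):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=10, random_state=42)

# Models are trained one after another; each stage puts more emphasis
# on the examples the previous stages predicted poorly.
booster = AdaBoostRegressor(n_estimators=50, random_state=42)
booster.fit(X, y)
print(booster.predict(X[:3]))
```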

### Gradient Boosting Regressor

In this tutorial, we will be focusing on and using **Gradient Boosting Regression (GBR)** on the dataset we used in the previous tutorial which was used for stock prediction.

The term **gradient** in `gradient boosting` comes from the fact that the algorithm uses *gradient descent* to minimize the loss.

Read more on Gradient Descent.

**Decision Trees** are used as weak learners or base models in the gradient boosting algorithm. A decision tree transforms data into a tree representation: each internal node denotes an attribute, and each leaf node denotes a class label (or, for regression, a predicted value).


Read more on Decision Trees.

**GBR** calculates the difference between the current prediction and the known correct target value; this difference is called the *residual*.

Gradient Boosting Regression then trains a weak model that maps features to that residual. The residual predicted by the weak model is added to the existing model's output, nudging the model toward the correct target. Repeating this step again and again improves the overall prediction.

The general steps that we follow to implement GBR are:

- Selecting a weak learner (base model)
- Using an additive model
- Defining the loss function
- Minimizing the loss function
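The residual-fitting loop described above can be sketched by hand in a few lines. This is a toy illustration of the idea, not a production implementation (variable names and data are made up):

```python
# Manual gradient boosting: shallow trees fit to residuals, summed with shrinkage.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=5, random_state=0)

learning_rate = 0.1
prediction = np.full(len(y), y.mean())  # start from a constant model
trees = []

for _ in range(100):
    residual = y - prediction                      # error of the current ensemble
    tree = DecisionTreeRegressor(max_depth=3, random_state=0)
    tree.fit(X, residual)                          # weak learner maps features -> residuals
    prediction += learning_rate * tree.predict(X)  # nudge toward the target
    trees.append(tree)

# Training error shrinks as boosting stages are added.
print(np.mean((y - prediction) ** 2))
```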

### Advantages of Gradient Boosting

- Often better accuracy than other strong methods such as Random Forests
- Less pre-processing of data is required
- High flexibility: it performs well on a variety of input data types

### Gradient Boosting parameters

*Number of Estimators*

It is denoted as `n_estimators`. The default value of this parameter is 100.

This is the number of boosting stages to perform. Gradient boosting is fairly robust to over-fitting, so a large number usually results in better performance. In other words, the number of estimators denotes the number of trees in the ensemble. More trees help the model learn the data better, but they also increase training time. Hence we need to find the right, balanced value of `n_estimators` for optimal performance.
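One convenient way to study this trade-off is `staged_predict`, which yields the ensemble's predictions after each boosting stage, so you can watch test error evolve as trees are added (toy data, illustrative only):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=300, n_features=5, noise=10, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

model = GradientBoostingRegressor(n_estimators=200, random_state=42)
model.fit(X_tr, y_tr)

# One test-set error value per boosting stage.
errors = [mean_squared_error(y_te, p) for p in model.staged_predict(X_te)]
print(len(errors))  # → 200
```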

*Maximum Depth*

It is denoted as `max_depth`. The default value of `max_depth` is 3 and it is an optional parameter. The maximum depth is the depth of each decision tree estimator in the gradient boosting regressor; it limits the number of nodes in the tree. Tune this parameter for best performance; the optimal value depends on the interaction of the input variables.

*Learning Rate*

It is denoted as `learning_rate`. The default value of `learning_rate` is 0.1 and it is an optional parameter. The learning rate is a hyper-parameter in the gradient boosting regressor algorithm that determines the step size at each iteration while moving toward a minimum of the loss function.

*Criterion*

It is denoted as `criterion`. The default value of `criterion` is *friedman_mse* and it is an optional parameter. It is the function used to measure the quality of a split in the decision trees; *mse* stands for **mean squared error**. The default *friedman_mse* is generally the best choice, as it can provide a better approximation in some cases.

*Loss*

It is denoted as `loss`. The default value of `loss` is *ls* and it is an optional parameter. This parameter indicates the loss function to be optimized. *ls* stands for least squares regression; least absolute deviation, abbreviated *lad*, is another loss function; *huber* is a combination of the two; and *quantile* allows quantile regression (use `alpha` to specify the quantile). Note that newer versions of scikit-learn rename *ls* to *squared_error* and *lad* to *absolute_error*.
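For example, switching to the Huber loss makes the model less sensitive to outliers, since it blends least squares with least absolute deviation (toy data below, illustrative only):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=10, random_state=42)

# 'huber' is accepted by both older and newer scikit-learn versions.
model = GradientBoostingRegressor(loss="huber", random_state=42)
model.fit(X, y)
print(model.predict(X[:3]))
```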

*Subsample*

It is denoted as `subsample`. The default value of `subsample` is 1.0 and it is an optional parameter. Subsample is the fraction of samples used for fitting the individual tree learners. If `subsample` is smaller than 1.0, this leads to a reduction of variance and an increase in bias.

*Number of Iterations with No Change*

It is denoted by `n_iter_no_change`. The default value of `n_iter_no_change` is None and it is an optional parameter. This parameter decides whether early stopping is used to terminate training when the validation score stops improving. If it is enabled, the model sets aside a `validation_fraction`-sized slice of the training data as a validation set and terminates training when the validation score is not improving.
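A minimal early-stopping sketch on toy data (parameter values here are illustrative, not tuned):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=300, n_features=5, noise=10, random_state=42)

# Stop adding trees once the held-out validation score stops improving.
model = GradientBoostingRegressor(
    n_estimators=500,           # upper bound on boosting stages
    n_iter_no_change=10,        # patience, in boosting stages
    validation_fraction=0.1,    # slice of training data held out for validation
    random_state=42,
)
model.fit(X, y)
print(model.n_estimators_)  # actual number of stages trained, at most 500
```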

For more insight on the parameters of this model, refer to the scikit-learn documentation.

### Improvements to Basic Gradient Boosting

Gradient boosting is a greedy algorithm and can overfit a training dataset very quickly. Therefore we need to tune the model or improve it using various techniques.

Following are a few ways to enhance the performance of a basic gradient boosting algorithm :

- Tree Constraints
- Shrinkage
- Random sampling
- Penalized Learning

For a reference on GBR tuning, read here.
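Several of these techniques map directly onto the model's parameters (`max_depth` for tree constraints, `learning_rate` for shrinkage, `subsample` for random sampling), so a small grid search can cover them together. A sketch on toy data, with an intentionally tiny illustrative grid:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=200, n_features=5, noise=10, random_state=42)

param_grid = {
    "max_depth": [2, 3],           # tree constraints
    "learning_rate": [0.05, 0.1],  # shrinkage
    "subsample": [0.8, 1.0],       # random sampling
}
search = GridSearchCV(GradientBoostingRegressor(random_state=42), param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```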

### Basic Implementation

Following is a basic implementation of the Gradient Boosting Regressor on the iexfinance dataset imported in the previous part of this article:

```
# Gradient Boosting Regressor on the train/test split from the previous part
from sklearn import ensemble
from sklearn.metrics import mean_squared_error, r2_score

# Fit regression model
# Note: on scikit-learn < 1.0, use 'ls' instead of 'squared_error'
params = {'n_estimators': 500, 'max_depth': 4, 'min_samples_split': 2,
          'learning_rate': 0.01, 'loss': 'squared_error'}
model = ensemble.GradientBoostingRegressor(**params)
model.fit(x_training_set, y_training_set)

# Have a look at R^2 to get an idea of the fit;
# explained variance score: 1 is perfect prediction
model_score = model.score(x_training_set, y_training_set)
print('R2 sq: ', model_score)

y_predicted = model.predict(x_test_set)
# The mean squared error
print("Mean squared error: %.2f" % mean_squared_error(y_test_set, y_predicted))
# Explained variance score: 1 is perfect prediction
print('Test Variance score: %.2f' % r2_score(y_test_set, y_predicted))
```

*We hope you found this insightful.*

Do visit our website to know more about us, and follow us on social media.

Also do not forget to like and comment.

Until then,

*stay safe, and May the Source Be With You!*

*This article is co-written by*