Stonksmaster - Predict Stock prices using Python & ML 📈
Nirvik Agarwal for GNU/Linux Users' Group, NIT Durgapur ・ Dec 2 '20
Ensemble Methods
Ensemble learning is a Machine Learning technique that combines multiple machine learning algorithms to obtain better predictive performance than any of the constituent algorithms could achieve alone. The way the base models are combined can range from simple methods like averaging or max voting to more complex ones like Boosting or Stacking.
To draw a basic comparison, imagine a parliament whose members vote on a particular bill. Let the members be the models and their votes be the models' predictions: whether the bill passes depends on the members and how they vote. Similarly, in Ensemble learning the final outcome is based on the outcome of each of the base models. This maps exactly onto a two-class classification problem, but the basic idea is the same for all other ensemble methods too. We hope this makes the picture clearer.
Over the past few years, Ensemble learning techniques have seen a huge jump in popularity, and the reason is quite obvious: they help us build a really robust model from a few weak base models.
Ensemble Approaches
There are multiple ways to use ensemble models. There are simple ones like max voting or averaging as well as more complex ones like boosting, bagging or stacking.
Majority Voting
There are three versions of majority voting, in which the ensemble chooses the class (a quick sketch follows the list below):
- on which all classifiers agree
- predicted by at least one more than half the number of classifiers
- that receives the highest number of votes, whether or not the sum of those votes exceeds 50%
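As an illustration, scikit-learn's VotingClassifier can perform hard (majority) voting over a few base classifiers. The toy dataset and the choice of base models below are assumptions for the sketch only, not part of the stock-prediction pipeline:
# Hard (majority) voting -- a minimal sketch on a synthetic dataset
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)

# Each base model casts one "vote"; the class with the most votes wins
voter = VotingClassifier(
    estimators=[('lr', LogisticRegression(max_iter=1000)),
                ('dt', DecisionTreeClassifier(random_state=0)),
                ('knn', KNeighborsClassifier())],
    voting='hard')
voter.fit(X, y)
print(voter.predict(X[:5]))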
Averaging
Averaging is a simple method that is generally used for regression problems. Here we simply take the average of the predictions to get a more robust result. Although this method might look pretty simple, it almost always gives better results than a single model, hence serving the purpose of ensemble learning.
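A minimal sketch of this idea, assuming pred_1, pred_2 and pred_3 are hypothetical prediction arrays obtained from three already-fitted regressors:
# Simple averaging of predictions from (hypothetical) fitted models
import numpy as np

pred_1 = np.array([101.2, 98.7, 105.4])
pred_2 = np.array([100.8, 99.1, 104.9])
pred_3 = np.array([102.0, 98.2, 106.1])

# The ensemble prediction is the element-wise mean of the base predictions
ensemble_pred = np.mean([pred_1, pred_2, pred_3], axis=0)
print(ensemble_pred)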
Weighted Average
Averaging the predictions gives good results, but it has a major flaw: in most cases one model has more predictive power, or performs better on the given problem, than the others, and we therefore want to give it more weight in the final prediction.
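Continuing the sketch above, weights (chosen here purely for illustration) let the stronger model dominate the final prediction:
# Weighted averaging -- the first (stronger) model gets a larger weight
import numpy as np

pred_1 = np.array([101.2, 98.7, 105.4])   # stronger model
pred_2 = np.array([100.8, 99.1, 104.9])   # weaker model

weights = [0.7, 0.3]                      # assumed weights; they should sum to 1
weighted_pred = np.average([pred_1, pred_2], axis=0, weights=weights)
print(weighted_pred)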
Bagging
Bagging is a popular ensemble method that is used in algorithms like Random Forest. It gains accuracy not only by averaging the models but also by creating uncorrelated models, which is made possible by giving them different training sets.
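For instance, scikit-learn's BaggingRegressor trains each tree on a bootstrap sample of the data and averages their predictions; the synthetic dataset below is only for illustration.
# Bagging -- a minimal sketch with decision trees as base models
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, noise=10, random_state=0)

# Each tree sees a different bootstrap sample, keeping the models relatively uncorrelated
bagger = BaggingRegressor(DecisionTreeRegressor(), n_estimators=50, random_state=0)
bagger.fit(X, y)
print(bagger.predict(X[:3]))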
Boosting
Boosting is a sequential process, where each subsequent model tries to correct the errors of the previous model. For this reason, the succeeding models depend on the previous ones, so we train the models sequentially instead of in parallel.
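As a quick illustration of the sequential idea, here is a minimal sketch with AdaBoost (a boosting variant different from the gradient boosting covered below): each new tree concentrates on the samples the previous trees handled poorly. The dataset is again synthetic.
# Boosting -- a minimal AdaBoost sketch; models are built one after another
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, noise=10, random_state=0)

booster = AdaBoostRegressor(DecisionTreeRegressor(max_depth=3),
                            n_estimators=50, random_state=0)
booster.fit(X, y)
print(booster.predict(X[:3]))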
Gradient Boosting Regressor
In this tutorial, we will be focusing on and using Gradient Boosting Regression (GBR) on the dataset we used in the previous tutorial which was used for stock prediction.
The term gradient in gradient boosting comes from the fact that the algorithm uses gradient descent to minimize the loss.
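To make the idea concrete, here is a minimal, hand-rolled sketch of gradient descent minimising a simple one-variable loss, f(x) = (x - 3)^2; it has nothing to do with the stock data itself.
# Gradient descent on f(x) = (x - 3)^2
x = 0.0
learning_rate = 0.1
for _ in range(100):
    gradient = 2 * (x - 3)            # derivative of (x - 3)^2
    x = x - learning_rate * gradient  # step against the gradient
print(x)  # converges towards 3, the minimiser of the loss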
Read more on Gradient Descent :
Decision Trees are used as weak learners or base models in the gradient boosting algorithm. A Decision Tree transforms the data into a tree representation: each internal node denotes a test on an attribute, and each leaf node denotes a class label (or, for regression, a predicted value).
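A shallow tree of the kind gradient boosting typically uses as a weak learner can be fitted and printed with scikit-learn; the synthetic data below is only an assumption for the sketch.
# A shallow decision tree -- the kind of weak learner GBR builds on
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor, export_text

X, y = make_regression(n_samples=100, n_features=3, random_state=0)

tree = DecisionTreeRegressor(max_depth=2, random_state=0)
tree.fit(X, y)
# Internal nodes test a feature, leaves hold the predicted value
print(export_text(tree))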
Read more on Decision Trees.
GBR calculates the difference between the current prediction and the known correct target value; this difference is called the residual.
Gradient Boosting Regression then trains a weak model that maps the features to that residual. The residual predicted by the weak model is added to the existing model's output, which nudges the model towards the correct target. Repeating this step again and again improves the overall prediction.
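The loop below is a minimal, hand-rolled sketch of that residual-fitting process on synthetic data; real gradient boosting libraries do the same thing with many refinements.
# Gradient boosting "by hand": repeatedly fit a weak tree to the residuals
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, noise=10, random_state=0)

prediction = np.full(len(y), y.mean())   # start from a constant prediction
learning_rate = 0.1
for _ in range(50):
    residual = y - prediction                     # current errors
    weak_model = DecisionTreeRegressor(max_depth=2)
    weak_model.fit(X, residual)                   # weak model maps features to residuals
    prediction += learning_rate * weak_model.predict(X)  # nudge towards the target

print(np.mean((y - prediction) ** 2))  # training error shrinks as trees are added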
The general steps that we follow to implement GBR are:
- Selecting a weak learner (base model)
- Using an additive model
- Defining the loss function
- Minimizing the loss function
Advantages of Gradient Boosting
- Often better accuracy than comparable ensemble methods like Random Forests
- Less pre-processing of data is required
- Higher flexibility: it performs well on a wide variety of input data
Gradient Boosting parameters
Number of Estimators
It is denoted as n_estimators. The default value of this parameter is 100. It is the number of boosting stages to perform. Gradient boosting is fairly robust to over-fitting, so a large number usually results in better performance. In other words, the number of estimators denotes the number of trees in the ensemble; more trees help the model learn the data better, but they also increase the training time. Hence we need to find a balanced value of n_estimators for optimal performance.
Maximum Depth
It is denoted as max_depth. The default value of max_depth is 3 and it is an optional parameter. The maximum depth is the depth of each decision tree estimator in the gradient boosting regressor, and it limits the number of nodes in the tree. The best value depends on the interaction of the input variables, so we need to tune this parameter to find the optimum value for the best performance.
Learning Rate
It is denoted as learning_rate. The default value of learning_rate is 0.1 and it is an optional parameter. The learning rate is a hyper-parameter of the gradient boosting regressor algorithm that determines the step size at each iteration while moving toward a minimum of the loss function.
Criterion
It is denoted as criterion. The default value of criterion is friedman_mse and it is an optional parameter. The criterion is the function used to measure the quality of a split in the decision trees; mse stands for mean squared error. The default, friedman_mse, is generally the best choice as it can provide a better approximation in some cases.
Loss
It is denoted as loss. The default value of loss is ls and it is an optional parameter. This parameter indicates the loss function to be optimized. There are various loss functions: ls stands for least squares regression; lad, short for least absolute deviation, is another loss function; huber is a combination of the two; and quantile allows quantile regression (use alpha to specify the quantile).
Subsample
It is denoted as subsample. The default value of subsample is 1.0 and it is an optional parameter. Subsample is the fraction of samples used for fitting the individual tree learners. If subsample is smaller than 1.0, this leads to a reduction of variance and an increase in bias.
Number of Iterations with No Change
It is denoted by n_iter_no_change. The default value of n_iter_no_change is None and it is an optional parameter. This parameter decides whether early stopping is used to terminate training when the validation score stops improving with further iterations. If this parameter is enabled, it sets aside a validation_fraction-sized portion of the training data as a validation set and terminates training when the validation score stops improving.
For more insight into the parameters of this model, refer to this documentation.
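Putting the parameters above together, a minimal sketch of constructing the model with them set explicitly could look like this; the values are illustrative, not tuned for the stock dataset.
# GradientBoostingRegressor with the parameters discussed above set explicitly
from sklearn.ensemble import GradientBoostingRegressor

model = GradientBoostingRegressor(
    n_estimators=100,          # number of boosting stages (trees)
    max_depth=3,               # depth of each tree estimator
    learning_rate=0.1,         # step size of each boosting stage
    criterion='friedman_mse',  # split quality measure
    loss='ls',                 # least squares ('squared_error' in newer scikit-learn)
    subsample=0.8,             # fraction of samples used to fit each tree
    n_iter_no_change=5,        # enable early stopping
    validation_fraction=0.1)   # fraction of training data held out for early stopping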
Improvements to Basic Gradient Boosting
Gradient boosting is a greedy algorithm and can overfit a training dataset very quickly. Therefore we need to tune the model or improve it using various techniques.
Following are a few ways to enhance the performance of a basic gradient boosting algorithm:
- Tree Constraints
- Shrinkage
- Random sampling
- Penalized Learning
For reference on GBR tuning read here.
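One common way to apply these ideas is a grid search over the tree constraints, shrinkage and subsampling rate. The sketch below assumes the x_training_set and y_training_set variables from the previous part of this article and uses an illustrative grid.
# Hyper-parameter tuning with a grid search -- an illustrative sketch
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

param_grid = {
    'n_estimators': [100, 300, 500],   # tree constraints / capacity
    'max_depth': [2, 3, 4],
    'learning_rate': [0.01, 0.1],      # shrinkage
    'subsample': [0.8, 1.0],           # random sampling of rows
}
search = GridSearchCV(GradientBoostingRegressor(), param_grid,
                      cv=5, scoring='neg_mean_squared_error')
search.fit(x_training_set, y_training_set)
print(search.best_params_)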
Basic Implementation
Following is the basic implementation of the Gradient Boosting Regressor on the iexfinance dataset imported in the previous part of this article:
# GBR
from sklearn import ensemble
from sklearn.metrics import mean_squared_error, r2_score

# Fit the regression model
params = {'n_estimators': 500, 'max_depth': 4, 'min_samples_split': 2,
          'learning_rate': 0.01, 'loss': 'ls'}
model = ensemble.GradientBoostingRegressor(**params)
model.fit(x_training_set, y_training_set)

# R^2 on the training set gives an idea of the fit;
# an explained variance score of 1 means a perfect prediction
model_score = model.score(x_training_set, y_training_set)
print('R2 score: ', model_score)

# Evaluate on the test set
y_predicted = model.predict(x_test_set)
# The mean squared error
print("Mean squared error: %.2f" % mean_squared_error(y_test_set, y_predicted))
# Explained variance score: 1 is perfect prediction
print('Test Variance score: %.2f' % r2_score(y_test_set, y_predicted))
We hope you found this insightful.
Do visit our website to know more about us and also follow us on :
Also do not forget to like and comment.
Until then,
stay safe, and May the Source Be With You!