Dear Reader,
I hate long articles, and I know you do too.
So, this article has been broken into three parts for easier understanding.
This part introduces the theory behind our topic, Gradient Descent.
Part two explains the mathematics,
while part three walks through the code. Enjoy!
Introduction To Gradient Descent
When training a model, our priority is to minimize the difference between the actual values and the predicted values. This difference is measured by a function, and reducing it is called minimizing the function.
Gradient descent is an optimization algorithm used by machine learning models to minimize a function; it allows us to find the best values for the model’s parameters.
How Does It Work?
Step 1: Initializing Parameters
First, we initialize the parameters. Parameters are the variables learnt from data that affect the prediction. To initialize them, we start with random values, which the model uses to discover how right or how wrong it is.
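As a minimal sketch (the linear model, variable names, and the range of the random values are my assumptions, not from the article), initializing parameters with random guesses might look like:

```python
import random

random.seed(42)  # fixed seed so the run is reproducible

# Hypothetical linear model: prediction = weight * x + bias
weight = random.uniform(-1, 1)  # random starting guess for the weight
bias = random.uniform(-1, 1)    # random starting guess for the bias

def predict(x, weight, bias):
    """Predicted value for input x using the current parameters."""
    return weight * x + bias
```

The model will almost certainly be wrong with these starting values; the whole point is that the next steps tell us how wrong, and in which direction to adjust.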
It is like adding salt to a pot of soup: you first add a certain amount, taste it, and then keep adding (or reducing) until you get the perfect amount.
Step 2: Calculating the Loss Function
Next, we calculate the loss function (error). This is a measurement of how wrong the model’s predicted value is: the difference between the actual value and the predicted value.
For example, if the model predicted $4000 for a $5000 house, the error is $1000. The error tells us how wrong or how right our model is.
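The house example can be written directly in code. One caveat: in practice the difference is usually squared (so the sign of the error does not matter), which is a common choice and an assumption on my part, since the article only uses the raw difference:

```python
def loss(actual, predicted):
    """Squared error between the actual and predicted values."""
    return (actual - predicted) ** 2

# The $5000 house predicted at $4000: the raw difference is $1000
raw_error = 5000 - 4000
print(raw_error)          # 1000
print(loss(5000, 4000))   # 1000000 (the squared error)
```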
Step 3: Finding the Gradient
For the next step, we determine the direction and rate of change of the error with respect to each parameter.
In case you’re thinking, “what jargon is this now?”, I’ll explain.
While driving a car, the faster you go, the more distance you cover; less speed means less distance covered. See how a change in speed affects the distance covered?
To cover a particular distance within a time frame, you tweak (increase or decrease) the speed.
In the same vein, parameters (weight, bias) affect a model’s predicted values; to change a model’s predicted value, you have to change its parameters.
Remember, our aim is to reduce the error (the loss function). To do this, we have to change the predicted value, which is affected by the parameters. So in this step we find out how the error changes whenever the parameters change. This rate of change of the loss function with respect to the parameters is called the gradient.
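For the hypothetical linear model (prediction = weight * x + bias) with a squared-error loss, the gradient can be written out by hand. The formulas below are the standard calculus derivatives of (y - (w·x + b))², which the article derives properly in part two; here it is just a sketch:

```python
def gradients(x, y, weight, bias):
    """Rate of change of the squared-error loss with respect to each parameter."""
    predicted = weight * x + bias
    error = y - predicted
    # Derivatives of (y - (weight*x + bias))**2 with respect to weight and bias
    grad_w = -2 * x * error
    grad_b = -2 * error
    return grad_w, grad_b

# Example: actual value 10, prediction 2 -> both gradients are negative,
# which tells us to increase the parameters to raise the prediction
print(gradients(2, 10, 1, 0))  # (-32, -16)
```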
Step 4: Adjust the Parameters
After we find out how the loss function is affected by a change in a parameter (the gradient), we adjust the parameters to reduce the error:
If the gradient is positive, we decrease the parameter.
If the gradient is negative, we increase the parameter.
If the gradient is zero, we have the perfect parameters for the lowest error.
We now repeat the process until the loss function is at its minimum.
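Putting the four steps together gives the full loop. This is a toy sketch under my earlier assumptions (a linear model, squared error, a single data point), and the step size used for each adjustment, commonly called the learning rate, is not covered in this article:

```python
def gradient_descent(x, y, learning_rate=0.01, steps=1000):
    """Repeat: predict, measure the error, find the gradient, adjust the parameters."""
    weight, bias = 0.0, 0.0                  # Step 1: initialize the parameters
    for _ in range(steps):
        predicted = weight * x + bias        # model's current prediction
        error = y - predicted                # Step 2: how wrong are we?
        grad_w = -2 * x * error              # Step 3: gradient w.r.t. the weight
        grad_b = -2 * error                  #         ...and w.r.t. the bias
        weight -= learning_rate * grad_w     # Step 4: move against the gradient
        bias -= learning_rate * grad_b
    return weight, bias

# Toy example: one data point (x=2, actual value 7);
# after enough steps the prediction 2*w + b converges toward 7
w, b = gradient_descent(x=2.0, y=7.0)
```

Note that a single data point cannot pin down unique values for the weight and bias; it only drives the prediction toward the actual value. Fitting real parameters needs many points, which part three covers.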
Conclusion
I hope this has given you the introduction to gradient descent you need to understand its mathematical concept. See you in the next article, Ciao!