Gradient Descent for people in a hurry!

Gradient Descent is one of the key algorithms that machine learning practitioners everywhere use, yet not many of them can explain what it actually is. Surprising, isn't it? This relatively simple concept is often muddled with so much mathematics and statistics that it can confuse the best of us. Here is a brief but simple explanation of this puzzling yet simple concept, one that requires nothing more than high school mathematics to understand.

COST FUNCTION: In Machine Learning, the primary goal is to learn from the available data. This is usually done by minimizing errors, i.e., the machine learns to tell the right choice from the wrong one. The mathematical function that measures how far the machine's output is from the truth is called the Cost Function. Since this function works on the basis of quantifying errors, it is also called the Error Function. So when we train a Machine Learning model, what we are really trying to do is find the "parameters" or "weights" that minimize the cost function.
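To make this concrete, here is a minimal sketch of one common cost function, the mean squared error of a simple linear model. This is just an illustrative choice, not the only possible cost function, and the names (`mse_cost`, `X`, `y`, `weights`) are mine for the example:

```python
import numpy as np

def mse_cost(weights, X, y):
    """Mean squared error: one common cost (error) function.

    weights : the parameters the model is learning
    X       : input features, shape (n_samples, n_features)
    y       : true target values, shape (n_samples,)
    """
    predictions = X @ weights      # the model's current guesses
    errors = predictions - y       # how wrong each guess is
    return np.mean(errors ** 2)    # average squared error
```

The lower this number, the better the current weights fit the data; training is the search for the weights that drive it down.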

GRADIENT DESCENT: Now we know that a machine learns by minimizing the cost function. But how does it do this? Voila! Enter Gradient Descent. Gradient Descent is the behind-the-scenes optimization algorithm that allows us to achieve the minimum value, i.e., the local or global minimum of the function.
The gradient, in simple words, is a direction: it points the way the cost function increases most steeply, so stepping against it takes us downhill fastest. The Gradient Descent algorithm identifies this direction, then repeatedly updates the parameters by taking a step in the opposite direction. The size of each step is controlled by a set value known as the learning rate, i.e., the distance the model covers in each iteration. As the model iterates, it moves towards the minimum and finally converges at a point where the cost function is (at least locally) minimized and the error can shrink no further.
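In symbols, each iteration performs weights = weights - learning_rate * gradient. Here is a minimal sketch of that loop in Python, reusing the illustrative mean-squared-error setup from above; the learning rate, iteration count, and toy data are arbitrary values I chose for demonstration:

```python
import numpy as np

def gradient_descent(X, y, learning_rate=0.01, n_iters=1000):
    """Minimize the MSE cost by repeatedly stepping against its gradient."""
    weights = np.zeros(X.shape[1])      # start from an arbitrary point
    for _ in range(n_iters):
        predictions = X @ weights
        # Gradient of the mean squared error with respect to the weights
        gradient = 2 * X.T @ (predictions - y) / len(y)
        # Step in the opposite direction of the gradient,
        # scaled by the learning rate
        weights -= learning_rate * gradient
    return weights

# Tiny usage example: recover y = 2x from noisy data
rng = np.random.default_rng(0)
X = rng.random((100, 1))
y = 2 * X[:, 0] + rng.normal(0, 0.05, 100)
print(gradient_descent(X, y))           # should land close to [2.0]
```

Notice that nothing here enumerates candidate weights; each step uses only the local slope, which is exactly what makes the search cheap.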

So Gradient Descent is that magical tool that enables us to obtain the lowest value of the cost function without much trouble. The alternative to using gradient descent would be DEATH! This is not an exaggeration. Computing the minimum error without gradient descent would mean exhaustively trying a practically infinite number of parameter combinations to finally arrive at the minimum, i.e., the least error. This, for obvious reasons, would be impractical. Gradient descent thus iteratively updates the learned parameters and moves the machine towards the ideal values as quickly as possible, making it an "integral" (:3) aspect of machine learning problems.
