In this article, we will explore gradient descent.
Gradient descent is a strategy used in machine learning to optimize models.
For example, it helps in finding the best possible values for things like the slope and intercept of a line.
Let’s go through how this works, one step at a time.
The example we will be using
Let’s assume this is our real equation:
y = 2x + 1
Slope m = 2
Intercept b = 1
However, assume we do not know the value of the intercept. Our goal is to find it.
At present, we only see the data points—just a bunch of numbers.
import numpy as np
x = np.array([0, 1, 2, 3, 4])
y = np.array([1, 3, 5, 7, 9])
1. Start with a random guess
To begin, you pick a random value for the parameter you want to find (for example, the intercept of the line).
This initial guess gives the algorithm a starting point to improve upon.
Let’s guess that the intercept b = 10.
Our model becomes:
y = 2x + 10
b = 10 # random guess
m = 2 # assume slope is known for now
y_pred = m * x + b
print(y_pred)
This gives the following output:
[10 12 14 16 18]
Comparing this with the actual y values:
[1 3 5 7 9]
This is a huge difference. But don’t worry—we will gradually fix it.
2. Measure the error (The Loss Function)
We have seen that our guess was pretty far off.
Now we need to measure how bad it was so we can start making improvements.
This is done using a loss function, specifically the sum of squared errors.
This calculation measures the difference between your model’s predictions and the actual data points.
loss = np.sum((y - y_pred) ** 2)
print(loss)
This produces the following output:
405
A large loss means a bad guess, and a small loss means a good guess.
Our goal is to reduce this loss as much as possible.
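To build some intuition for what we are minimizing, here is a minimal sketch (not part of the original walkthrough; the candidate values of b are arbitrary) that evaluates this loss for a handful of intercepts.
import numpy as np

x = np.array([0, 1, 2, 3, 4])
y = np.array([1, 3, 5, 7, 9])
m = 2  # slope assumed known

# Sum of squared errors for a few candidate intercepts
for b_candidate in [10, 5, 2, 1, 0]:
    y_pred = m * x + b_candidate
    loss = np.sum((y - y_pred) ** 2)
    print(f"b = {b_candidate:2d} | loss = {loss}")
Running this should print a loss of 405 for b = 10 (matching the number above), shrinking all the way to 0 at the true intercept b = 1.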
3. Find the slope of the loss function
To know which direction to move your guess in order to improve it, you take the derivative of the loss function with respect to the parameter you are adjusting (here, the intercept b). For our sum of squared errors, that derivative works out to -2 times the sum of the errors, which is exactly what the code below computes.
gradient_b = -2 * np.sum(y - y_pred)
print(gradient_b)
This gives the following output:
90
This value tells us:
- which direction to move (the sign: since the gradient is positive here, b should decrease)
- how strongly to move (the magnitude)
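If you want to convince yourself that -2 * np.sum(y - y_pred) really is the derivative of the loss with respect to b, a quick numerical check is to nudge b by a tiny amount and see how the loss changes. This is only a sanity-check sketch; the step size h and the helper loss_at are illustrative choices, not part of the article’s code.
import numpy as np

x = np.array([0, 1, 2, 3, 4])
y = np.array([1, 3, 5, 7, 9])
m, b = 2, 10

def loss_at(b_value):
    # Sum of squared errors for a given intercept
    return np.sum((y - (m * x + b_value)) ** 2)

# Analytic gradient, as used above
y_pred = m * x + b
gradient_b = -2 * np.sum(y - y_pred)

# Numerical estimate: how much does the loss change per tiny change in b?
h = 1e-6
numerical_gradient = (loss_at(b + h) - loss_at(b - h)) / (2 * h)

print(gradient_b, numerical_gradient)  # both should be close to 90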
4. Taking a step
Gradient descent moves toward the lowest point of the loss function by taking steps.
The size of the step is determined by:
- The slope
  - A steeper slope means you are far from the goal and should take a bigger step.
  - A flatter slope means you are close to the goal and should take a smaller step.
- The learning rate
  - This is a small number (such as 0.1 or 0.01) that you multiply the slope by to ensure the steps aren’t too large.
  - If the step is too large, you might accidentally increase the error instead of decreasing it (demonstrated in a short sketch at the end of this step).
This is the formula:
learning_rate = 0.01
b_new = b - learning_rate * gradient_b
print(b_new)
This produces:
9.1
The value of b changed from 10 to 9.1. This is better than before, even though it is not perfect.
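As a quick aside, the warning about the learning rate being too large is easy to demonstrate. The sketch below reuses the same data but with a hypothetically large learning rate of 0.25; each update overshoots the minimum, so the loss grows instead of shrinking.
import numpy as np

x = np.array([0, 1, 2, 3, 4])
y = np.array([1, 3, 5, 7, 9])
m, b = 2, 10
learning_rate = 0.25  # deliberately too large, just to show the failure mode

for step in range(5):
    y_pred = m * x + b
    loss = np.sum((y - y_pred) ** 2)
    gradient_b = -2 * np.sum(y - y_pred)
    b = b - learning_rate * gradient_b
    print(f"Step {step} | b = {b:.2f} | loss = {loss:.1f}")
With the small learning rate of 0.01, the same loop shrinks the loss on every step, as we will see next.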
5. Repeat until finished
Now we repeat the same process:
- Measure the loss with the new b value
- Compute the gradient
- Update b
The algorithm stops when one of the following happens:
- The step size becomes very small
- A maximum number of steps is reached
Let’s run the full version of the code.
import numpy as np
# Data
x = np.array([0, 1, 2, 3, 4])
y = np.array([1, 3, 5, 7, 9])
# Initial guess
b = 10
m = 2
learning_rate = 0.01
for step in range(30):
    y_pred = m * x + b
    loss = np.sum((y - y_pred) ** 2)
    gradient_b = -2 * np.sum(y - y_pred)
    b = b - learning_rate * gradient_b
    print(f"Step {step:2d} | b = {b:.4f} | loss = {loss:.2f}")
This produces the following output:
Step 0 | b = 9.1000 | loss = 405.00
Step 1 | b = 8.2900 | loss = 328.05
Step 2 | b = 7.5610 | loss = 265.72
Step 3 | b = 6.9049 | loss = 215.23
Step 4 | b = 6.3144 | loss = 174.34
Step 5 | b = 5.7830 | loss = 141.21
Step 6 | b = 5.3047 | loss = 114.38
Step 7 | b = 4.8742 | loss = 92.65
Step 8 | b = 4.4868 | loss = 75.05
Step 9 | b = 4.1381 | loss = 60.79
Step 10 | b = 3.8243 | loss = 49.24
Step 11 | b = 3.5419 | loss = 39.88
Step 12 | b = 3.2877 | loss = 32.31
Step 13 | b = 3.0589 | loss = 26.17
Step 14 | b = 2.8530 | loss = 21.20
Step 15 | b = 2.6677 | loss = 17.17
Step 16 | b = 2.5009 | loss = 13.91
Step 17 | b = 2.3509 | loss = 11.26
Step 18 | b = 2.2158 | loss = 9.12
Step 19 | b = 2.0942 | loss = 7.39
Step 20 | b = 1.9848 | loss = 5.99
Step 21 | b = 1.8863 | loss = 4.85
Step 22 | b = 1.7977 | loss = 3.93
Step 23 | b = 1.7179 | loss = 3.18
Step 24 | b = 1.6461 | loss = 2.58
Step 25 | b = 1.5815 | loss = 2.09
Step 26 | b = 1.5233 | loss = 1.69
Step 27 | b = 1.4710 | loss = 1.37
Step 28 | b = 1.4239 | loss = 1.11
Step 29 | b = 1.3815 | loss = 0.90
You can see that the value of b reaches 1.38, steadily converging toward the true value of 1 after 30 gradient descent steps.
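The loop above always runs for exactly 30 steps. To illustrate the first stopping rule mentioned in step 5, here is a variant that stops as soon as the update to b becomes very small; the tolerance of 1e-6 and the cap of 10,000 steps are arbitrary illustrative choices, not part of the original code.
import numpy as np

x = np.array([0, 1, 2, 3, 4])
y = np.array([1, 3, 5, 7, 9])
b = 10
m = 2
learning_rate = 0.01
tolerance = 1e-6   # stop once the step is smaller than this
max_steps = 10000  # safety cap so the loop always terminates

for step in range(max_steps):
    y_pred = m * x + b
    gradient_b = -2 * np.sum(y - y_pred)
    step_size = learning_rate * gradient_b
    b = b - step_size
    if abs(step_size) < tolerance:
        break

print(f"Stopped after {step + 1} steps with b = {b:.6f}")
Run this way, b gets as close to the true intercept of 1 as the tolerance allows.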
Wrapping up
This is essentially how gradient descent works. It is a foundational idea that helps in understanding many core machine learning concepts.
We will build on this and move on to larger topics in the upcoming articles.
You can try out the examples on My Colab Notebook
If you’ve ever struggled with repetitive tasks, obscure commands, or debugging headaches, this platform is here to make your life easier. It’s free, open-source, and built with developers in mind.
👉 Explore the tools: FreeDevTools
👉 Star the repo: freedevtools