Rijul Rajesh
Understanding Gradients: The Engine Behind Neural Network Learning

In the previous article, we explored activation functions and visualized them using Python.

Now, let’s see what gradients are.

Neural networks use activation functions to transform the inputs that flow through their layers.

But if a neural network gives a wrong output, how does it know what to fix?

This is where gradients come in.


What is a gradient?

Imagine you are walking on a hill. If the ground is steep, you can feel which direction goes up or down.

If the ground is almost flat, it is hard to tell where to go.

A gradient can simply be thought of as a number that tells us how steep a curve is at a given point, and its sign tells us whether the curve is going up or down.
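
To make this concrete, here is a small sketch that approximates the slope of a simple curve, f(x) = x², by nudging the input a tiny amount. The function and the step size h are just illustrative choices:

def f(x):
    return x ** 2  # a simple curve whose steepness we want to measure

def slope_at(x, h=1e-6):
    # finite-difference approximation of the gradient at x
    return (f(x + h) - f(x)) / h

print(slope_at(3.0))   # ~6.0  -> steep, going up
print(slope_at(0.0))   # ~0.0  -> almost flat
print(slope_at(-2.0))  # ~-4.0 -> steep, going down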

How does this apply in the case of neural networks? Let’s see.


Why gradients matter in neural networks

In neural networks:

  • Gradients tell us how much a parameter should change
  • The bigger the gradient, the bigger the update
  • If the gradient is 0, then learning stops

When training a neural network:

  1. We make a prediction
  2. We calculate how wrong it is
  3. We update weights to reduce the loss
  4. This update depends entirely on gradients, as the small sketch after this list shows
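
To see steps 1–4 working together, here is a minimal sketch with a single weight and a single training example. The data, the learning rate, and the squared-error loss are just assumptions for illustration:

x_in, target = 2.0, 10.0   # one training example (made-up values)
w = 0.5                    # a single trainable weight
learning_rate = 0.1

for step in range(5):
    prediction = w * x_in                     # 1. make a prediction
    loss = (prediction - target) ** 2         # 2. calculate how wrong it is
    grad = 2 * (prediction - target) * x_in   # 4. gradient of the loss w.r.t. w
    w = w - learning_rate * grad              # 3. update the weight to reduce the loss
    print(f"step {step}: loss={loss:.3f}, grad={grad:.3f}, w={w:.3f}")

After a few steps, w moves toward 5.0, at which point the prediction matches the target and the gradient shrinks toward zero.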

Gradients of activation functions

Each activation function has:

  • A curve
  • A gradient curve

Let’s check this in Python, starting with ReLU.
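
The snippets below assume the setup from the previous article. If you are running them on their own, a minimal setup could look like this (the input range for x is an arbitrary choice):

import numpy as np
import matplotlib.pyplot as plt

# Input range used for all the plots below
x = np.linspace(-10, 10, 500)

# Sigmoid from the previous article; sigmoid_grad further down relies on it
def sigmoid(x):
    return 1 / (1 + np.exp(-x))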


Gradient of ReLU

We can define the ReLU gradient as:

def relu_grad(x):
    # 1 where the input is positive, 0 everywhere else
    return np.where(x > 0, 1, 0)

This means:

  • Gradient = 0 when input ≤ 0
  • Gradient = 1 when input > 0

Let’s plot it:

plt.figure()
plt.plot(x, relu_grad(x), label="ReLU Gradient")
plt.title("Gradient of ReLU")
plt.xlabel("Input")
plt.ylabel("Gradient")
plt.grid(True)
plt.legend()
plt.show()

This is our gradient. From the above, you can observe the following:

  • The entire negative side has zero gradient
  • The positive side has a constant gradient of 1

From this, we can further understand that:

  • ReLU learns very fast when active
  • ReLU neurons can die if they always receive negative inputs (see the sketch after this list)
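
To make the second point concrete, here is a small sketch (with made-up inputs and weight) showing why a neuron that only ever sees negative pre-activations gets zero gradient, so its weight never updates:

# A neuron whose pre-activations are always negative for this data
inputs = np.array([-3.0, -1.5, -0.2, -4.0])
weight = 1.0

pre_activations = weight * inputs
grads = relu_grad(pre_activations)

print(grads)        # [0 0 0 0]
print(grads.sum())  # 0 -> no learning signal, so the weight stays frozen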

Gradient of Softplus

We can define the Softplus gradient as:

def softplus_grad(x):
    # derivative of softplus(x) = ln(1 + e^x)
    return 1 / (1 + np.exp(-x))

Let’s plot it:

plt.figure()
plt.plot(x, softplus_grad(x), label="Softplus Gradient")
plt.title("Gradient of Softplus")
plt.xlabel("Input")
plt.ylabel("Gradient")
plt.grid(True)
plt.legend()
plt.show()

You can observe that it is exactly the sigmoid activation function: the derivative of softplus(x) = ln(1 + e^x) works out to 1 / (1 + e^(-x)), which is sigmoid(x).
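
You can verify this numerically, using the sigmoid from the setup above:

# The softplus gradient and the sigmoid function agree everywhere on our grid
print(np.allclose(softplus_grad(x), sigmoid(x)))  # True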

You can also observe that:

  • There is a smooth transition
  • Learning always continues
  • This avoids dying neurons and adds stability, but it is slower to compute than ReLU

Gradient of Sigmoid

The sigmoid gradient looks like this:

def sigmoid_grad(x):
    # the derivative of sigmoid can be written in terms of sigmoid itself
    s = sigmoid(x)
    return s * (1 - s)

Let’s plot it:

plt.figure()
plt.plot(x, sigmoid_grad(x), label="Sigmoid Gradient")
plt.title("Gradient of Sigmoid")
plt.xlabel("Input")
plt.ylabel("Gradient")
plt.grid(True)
plt.legend()
plt.show()

From the above, you can observe the following:

  • The gradient is very small at both extremes
  • It is strong only around the middle
  • It is almost zero for large positive or negative values, as the quick check below shows
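
A quick check with a few arbitrary input values makes this concrete:

for value in [0.0, 2.0, 5.0, 10.0]:
    print(f"sigmoid_grad({value}) = {sigmoid_grad(value):.6f}")
# roughly 0.25, 0.105, 0.0066, 0.000045 -- shrinking fast as we move away from zero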

This leads to a famous problem called the vanishing gradient problem. We will explore this more in the next article.


You can try the examples out via the Colab notebook.

If you’ve ever struggled with repetitive tasks, obscure commands, or debugging headaches, FreeDevTools is here to make your life easier. It’s free, open-source, and built with developers in mind.

👉 Explore the tools: FreeDevTools
👉 Star the repo: freedevtools
