
shangkyu shin

Posted on • Originally published at zeromathai.com

How Backpropagation Works — From Forward Pass to Gradient Updates

Backpropagation sounds like “sending errors backward.”

But that explanation is too vague.

The real question is:

How does a neural network know which weights caused the error?

That is what backpropagation solves.

Core Idea

Backpropagation is the mechanism that tells each parameter how much it contributed to the final loss.

A neural network first makes a prediction.

Then it measures the error.

Then it sends gradient information backward through the network.

That gradient tells the optimizer how to update the weights.

The Key Structure

The full training flow looks like this:

Input → Forward Propagation → Prediction → Loss → Backpropagation → Gradients → Weight Update

In simple terms:

  1. Forward pass computes the output
  2. Loss measures the error
  3. Backpropagation computes gradients
  4. Optimizer updates parameters

The forward pass answers:

“What did the model predict?”

The backward pass answers:

“What should change to reduce the error?”

Implementation View

At a high level, training works like this:

```python
# PyTorch-style sketch; model, loss_function, optimizer, dataloader are assumed
for inputs, targets in dataloader:
    prediction = model(inputs)                 # forward pass
    loss = loss_function(prediction, targets)  # measure the error

    optimizer.zero_grad()                      # clear gradients from the last step
    loss.backward()                            # backpropagation computes gradients
    optimizer.step()                           # optimizer updates the weights
```

This is why backpropagation matters in practice.

Without gradients, the optimizer does not know which direction to move.

Without backpropagation, deep learning becomes guesswork.

Concrete Example

Imagine a model predicts:

prediction = 0.80

But the correct target is:

target = 1.00

The prediction is off by 0.20.

But the network may have thousands or millions of weights.

Backpropagation answers a more specific question:

Which weights pushed the prediction away from the target?

Not every weight deserves the same update.

Some weights contributed more to the error.

Some contributed less.

Backpropagation distributes responsibility backward through the computation.
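A minimal sketch of that responsibility assignment, using a hypothetical two-weight linear model (the inputs and weights here are made up for illustration):

```python
# Hypothetical model: prediction = w1*x1 + w2*x2
x1, x2 = 1.0, 0.1        # inputs with very different magnitudes
w1, w2 = 0.5, 3.0
target = 1.0

prediction = w1 * x1 + w2 * x2        # 0.8, matching the example above
loss = (prediction - target) ** 2

# Chain rule: dL/dw_i = 2 * (prediction - target) * x_i
grad_w1 = 2 * (prediction - target) * x1   # approx -0.4
grad_w2 = 2 * (prediction - target) * x2   # approx -0.04

print(grad_w1, grad_w2)
```

Both weights pushed the prediction in the same direction, but w1 receives a gradient ten times larger than w2, so the optimizer will adjust it more.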

Forward Propagation vs Backpropagation

Forward propagation and backpropagation are opposite flows.

Forward propagation:

  • moves from input to output
  • computes activations
  • produces a prediction
  • calculates loss

Backpropagation:

  • moves from loss back toward earlier layers
  • computes gradients
  • assigns error responsibility
  • prepares weight updates

Forward pass is prediction.

Backward pass is learning.

You need both.

Why the Chain Rule Matters

Backpropagation is built on the chain rule.

A deep neural network is a chain of operations.

Each layer depends on the previous layer.

So the effect of an early weight on the final loss must pass through many steps.

The chain rule lets us compute that effect systematically.

Conceptually:

loss depends on output

output depends on hidden layers

hidden layers depend on weights

So we trace the dependency backward.

That is backpropagation.
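The same tracing can be done by hand on a tiny chain. The numbers below are arbitrary; the point is that the gradient of an early weight is a product of local derivatives:

```python
# Chain: x -> h = v*x -> y = w*h -> loss = (y - t)^2
x, v, w, t = 2.0, 0.5, 3.0, 4.0

# Forward pass
h = v * x              # 1.0
y = w * h              # 3.0
loss = (y - t) ** 2    # 1.0

# Backward pass: one chain-rule factor per step
dloss_dy = 2 * (y - t)   # -2.0
dy_dh = w                # 3.0
dh_dv = x                # 2.0

dloss_dv = dloss_dy * dy_dh * dh_dv   # -12.0
print(dloss_dv)
```

The early weight v never touches the loss directly, yet its gradient falls out of multiplying the local derivatives along the path.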

Learning and Weight Updates

Backpropagation does not update weights by itself.

It computes gradients.

The optimizer uses those gradients to update parameters.

A simple update looks like this:

new weight = old weight - learning rate × gradient

The gradient gives the direction.

The learning rate controls the step size.

This is why training can fail even when backpropagation is correct.

If the learning rate is too large, updates can become unstable.

If it is too small, learning can be painfully slow.
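A quick way to see both failure modes is gradient descent on a one-dimensional bowl (a toy loss chosen for illustration, not a real training setup):

```python
# Toy loss: loss(w) = (w - 3)^2, minimum at w = 3
def grad(w):
    return 2 * (w - 3.0)

def train(lr, steps=20, w=0.0):
    for _ in range(steps):
        w = w - lr * grad(w)   # new weight = old weight - learning rate * gradient
    return w

print(train(lr=0.1))    # converges toward 3.0
print(train(lr=1.1))    # overshoots further on every step and diverges
```

With lr=0.1 each step shrinks the error by a constant factor; with lr=1.1 each step flips past the minimum and lands farther away than it started.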

Why Training Can Still Be Unstable

Understanding backpropagation does not automatically make training stable.

Gradients can be noisy.

Gradient estimates can vary from batch to batch.

This is especially visible in mini-batch training.

One batch may suggest one direction.

Another batch may suggest a slightly different direction.

That variation is called gradient variance.

This is why training loss can wobble instead of moving smoothly downward.
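Gradient variance is easy to observe directly. The sketch below fits nothing; it just computes the gradient estimate on many mini-batches of synthetic data (all numbers are illustrative):

```python
import random
random.seed(0)

# Toy data drawn from y = 2x + noise
xs = [random.uniform(-1.0, 1.0) for _ in range(1024)]
data = [(x, 2.0 * x + random.gauss(0.0, 1.0)) for x in xs]

w = 1.0  # current weight of the model y_hat = w * x

def batch_gradient(batch):
    """Mean gradient of (w*x - y)^2 with respect to w over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

# One gradient estimate per mini-batch of 32
grads = [batch_gradient(data[i:i + 32]) for i in range(0, len(data), 32)]

print(f"gradient estimates range from {min(grads):.3f} to {max(grads):.3f}")
```

Every batch agrees roughly on the direction, but the estimates spread out noticeably, which is exactly the wobble you see in a training-loss curve.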

Deeper View: Computational Graphs

A useful way to understand backpropagation is through a computational graph.

Each operation becomes a node.

Each connection shows dependency.

Backpropagation moves backward through that graph and applies differentiation step by step.

This is also why automatic differentiation works.

Modern deep learning frameworks do not manually derive every gradient.

They build a computation graph and apply the chain rule automatically.
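A toy sketch of that idea, nothing like a real framework's implementation: each node records its parents and the local derivative with respect to each parent, and the backward pass pushes gradients through the graph with the chain rule.

```python
class Node:
    """One value in a computational graph."""
    def __init__(self, value, parents=(), local_grads=()):
        self.value = value              # forward result
        self.parents = parents          # nodes this value depends on
        self.local_grads = local_grads  # d(value)/d(parent) for each parent
        self.grad = 0.0                 # accumulated dL/d(this node)

def mul(a, b):
    return Node(a.value * b.value, (a, b), (b.value, a.value))

def sub(a, b):
    return Node(a.value - b.value, (a, b), (1.0, -1.0))

def backward(node, upstream=1.0):
    # Reverse-mode chain rule: accumulate, then push to each parent.
    # (Path-by-path recursion: correct, but only practical for small graphs.)
    node.grad += upstream
    for parent, local in zip(node.parents, node.local_grads):
        backward(parent, upstream * local)

# Graph for loss = (w*x - t)^2 with w=3, x=2, t=4
w, x, t = Node(3.0), Node(2.0), Node(4.0)
diff = sub(mul(w, x), t)   # 2.0
loss = mul(diff, diff)     # 4.0
backward(loss)
print(w.grad)              # dL/dw = 2*(w*x - t)*x = 8.0
```

Real frameworks do the same accumulation in topological order over the recorded graph, which is why you never have to derive a gradient by hand.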

Recommended Learning Order

If backpropagation feels abstract, learn it in this order:

  1. Forward Propagation
  2. Backpropagation
  3. Learning and Backpropagation
  4. Gradient Estimate
  5. Gradient Variance
  6. Computational Graph
  7. Automatic Differentiation

This order works because you first understand the forward computation.

Then you understand the backward learning flow.

Then you connect it to training instability and implementation tools.

Takeaway

Backpropagation is not just “error moving backward.”

It is the gradient computation system that makes neural networks trainable.

Forward propagation makes the prediction.

Loss measures the mistake.

Backpropagation computes responsibility.

The optimizer updates the weights.

If you remember one sentence, remember this:

Backpropagation tells each parameter how it should change to reduce the loss.

Discussion

When you first learned backpropagation, was the hardest part the chain rule, the gradient flow, or the connection to actual weight updates?

Original article: https://zeromathai.com/en/backpropagation-overview-hub-en/
