Backpropagation sounds like “sending errors backward.”
But that explanation is too vague.
The real question is:
How does a neural network know which weights caused the error?
That is what backpropagation solves.
Core Idea
Backpropagation is the mechanism that tells each parameter how much it contributed to the final loss.
A neural network first makes a prediction.
Then it measures the error.
Then it sends gradient information backward through the network.
That gradient tells the optimizer how to update the weights.
The Key Structure
The full training flow looks like this:
Input → Forward Propagation → Prediction → Loss → Backpropagation → Gradients → Weight Update
In simple terms:
- Forward pass computes the output
- Loss measures the error
- Backpropagation computes gradients
- Optimizer updates parameters
The forward pass answers:
“What did the model predict?”
The backward pass answers:
“What should change to reduce the error?”
Implementation View
At a high level, training works like this:
for each batch:
    prediction = model(input)
    loss = loss_function(prediction, target)
    gradients = backpropagate(loss)
    update_weights(gradients)
This is why backpropagation matters in practice.
Without gradients, the optimizer does not know which direction to move.
Without backpropagation, deep learning becomes guesswork.
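In a framework such as PyTorch, this loop maps almost one-to-one onto real API calls. The sketch below is only an illustration of that mapping: model, loss_fn, optimizer, and data_loader are assumed to be defined elsewhere and are not part of the pseudocode above.

    for inputs, targets in data_loader:
        predictions = model(inputs)              # forward pass
        loss = loss_fn(predictions, targets)     # measure the error
        optimizer.zero_grad()                    # clear gradients from the previous batch
        loss.backward()                          # backpropagation: compute gradients
        optimizer.step()                         # optimizer updates the weights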
Concrete Example
Imagine a model predicts:
prediction = 0.80
But the correct target is:
target = 1.00
The prediction is off by 0.20.
But the network may have thousands or millions of weights.
Backpropagation answers a more specific question:
Which weights pushed the prediction away from the target?
Not every weight deserves the same update.
Some weights contributed more to the error.
Some contributed less.
Backpropagation distributes responsibility backward through the computation.
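To make this concrete, assume a squared-error loss (the article does not fix a particular loss function; this choice is only for illustration):

    prediction, target = 0.80, 1.00
    loss = (prediction - target) ** 2               # 0.04
    dloss_dprediction = 2 * (prediction - target)   # -0.40

Backpropagation takes this -0.40 signal at the output and carries it backward, splitting the responsibility among the weights according to how strongly each one influenced the prediction.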
Forward Propagation vs Backpropagation
Forward propagation and backpropagation are opposite flows.
Forward propagation:
- moves from input to output
- computes activations
- produces a prediction
- calculates loss
Backpropagation:
- moves from loss back toward earlier layers
- computes gradients
- assigns error responsibility
- prepares weight updates
Forward pass is prediction.
Backward pass is learning.
You need both.
Why the Chain Rule Matters
Backpropagation is built on the chain rule.
A deep neural network is a chain of operations.
Each layer depends on the previous layer.
So the effect of an early weight on the final loss must pass through many steps.
The chain rule lets us compute that effect systematically.
Conceptually:
- the loss depends on the output
- the output depends on the hidden layers
- the hidden layers depend on the weights
So we trace the dependency backward.
That is backpropagation.
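As a toy illustration (the numbers and functions here are invented purely to show the pattern), suppose a single weight w feeds a hidden value h, which feeds the loss:

    w = 0.5
    h = w * 3.0                   # the hidden value depends on the weight
    loss = (h - 2.0) ** 2         # the loss depends on the hidden value

    # Chain rule: dloss/dw = dloss/dh * dh/dw
    dloss_dh = 2 * (h - 2.0)      # -1.0
    dh_dw = 3.0
    dloss_dw = dloss_dh * dh_dw   # -3.0

A real network has many more links in the chain, but backpropagation repeats exactly this multiply-and-pass-backward step at every one of them.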
Learning and Weight Updates
Backpropagation does not update weights by itself.
It computes gradients.
The optimizer uses those gradients to update parameters.
A simple update looks like this:
new weight = old weight - learning rate × gradient
The gradient gives the direction.
The learning rate controls the step size.
This is why training can fail even when backpropagation is correct.
If the learning rate is too large, updates can become unstable.
If it is too small, learning can be painfully slow.
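In plain Python, one step of that update looks like this (the numbers are made up for illustration):

    weight = 0.9
    gradient = -0.4
    learning_rate = 0.1

    weight = weight - learning_rate * gradient   # 0.9 - 0.1 * (-0.4) = 0.94

With learning_rate = 10.0 the same gradient would move the weight by 4.0 in a single step, which is how oversized learning rates make updates unstable.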
Why Training Can Still Be Unstable
Understanding backpropagation does not automatically make training stable.
Gradients can be noisy.
Gradient estimates can vary from batch to batch.
This is especially visible in mini-batch training.
One batch may suggest one direction.
Another batch may suggest a slightly different direction.
That variation is called gradient variance.
This is why training loss can wobble instead of moving smoothly downward.
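A tiny sketch of that effect, using invented data for a one-weight model with a squared-error loss:

    # Model: prediction = w * x. Gradient of the mean squared error w.r.t. w.
    def batch_gradient(w, xs, ts):
        return sum(2 * (w * x - t) * x for x, t in zip(xs, ts)) / len(xs)

    w = 0.5
    batch_a = ([1.0, 2.0], [1.0, 2.5])   # (inputs, targets)
    batch_b = ([3.0, 0.5], [2.0, 0.2])

    print(batch_gradient(w, *batch_a))   # -3.5
    print(batch_gradient(w, *batch_b))   # about -1.48

Both batches agree on the direction here, but not on the size of the step, and with noisier data they can disagree on direction as well.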
Deeper View: Computational Graphs
A useful way to understand backpropagation is through a computational graph.
Each operation becomes a node.
Each connection shows dependency.
Backpropagation moves backward through that graph and applies differentiation step by step.
This is also how automatic differentiation works.
Modern deep learning frameworks do not derive every gradient by hand.
They build a computational graph and apply the chain rule automatically.
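In PyTorch, for example, the graph is recorded during the forward pass and differentiated when backward() is called. A minimal sketch, reusing the toy numbers from the chain-rule example above:

    import torch

    w = torch.tensor(0.5, requires_grad=True)   # a single trainable weight
    h = w * 3.0                                  # forward pass records the graph
    loss = (h - 2.0) ** 2

    loss.backward()                              # chain rule applied automatically
    print(w.grad)                                # tensor(-3.), matching the manual result

No gradient was derived by hand; the framework traced the recorded graph backward.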
Recommended Learning Order
If backpropagation feels abstract, learn it in this order:
- Forward Propagation
- Backpropagation
- Learning and Backpropagation
- Gradient Estimate
- Gradient Variance
- Computational Graph
- Automatic Differentiation
This order works because you first understand the forward computation.
Then you understand the backward learning flow.
Then you connect it to training instability and implementation tools.
Takeaway
Backpropagation is not just “error moving backward.”
It is the gradient computation system that makes neural networks trainable.
Forward propagation makes the prediction.
Loss measures the mistake.
Backpropagation computes responsibility.
The optimizer updates the weights.
If you remember one sentence, remember this:
Backpropagation tells each parameter how it should change to reduce the loss.
Discussion
When you first learned backpropagation, was the hardest part the chain rule, the gradient flow, or the connection to actual weight updates?
Originally published at zeromathai.com.
Original article: https://zeromathai.com/en/backpropagation-overview-hub-en/