Backpropagation From Scratch: How a Neural Network Actually Learns

#deeplearning #machinelearning #javascript #beginners

Yesterday's neural network could make a prediction — the forward pass pushed numbers through layers and out came an answer. But it was guessing: the weights were random. Today we add the one idea that makes a network actually learn from its mistakes — backpropagation.

This is Day 4 of DeepLearningFromZero, building neural nets from scratch in plain JavaScript.

The loop in one sentence

Run the forward pass, measure how wrong it was, send that error backward to find each weight's share of the blame, then nudge every weight a little in the direction that reduces the error. Repeat thousands of times.

That loop — forward, loss, backward, nudge — is the entire engine of deep learning.

Step 1 — measure the wrongness

After the forward pass gives a prediction, compare it to the true answer y with a loss function. Here, squared error:

const out = forward(x);
const loss = (out - y) ** 2;   // big when very wrong

The whole goal of training is to make this number small.
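For reference, here is a minimal sketch of what the forward pass and the loss might look like for a 2 → 3 → 1 network. The names forwardPass and squaredError, the example weights, and the choice to leave out bias terms are assumptions for illustration; your forward from the previous day may be shaped differently.

```javascript
// Sketch of a forward pass for a 2 -> 3 -> 1 network with sigmoid activations.
// Assumption: no bias terms, so only W1 and W2 hold learnable values.
const sigmoid = (z) => 1 / (1 + Math.exp(-z));

function forwardPass(x, W1, W2) {
  // hidden[j] = sigmoid(sum over i of W1[j][i] * x[i])
  const hidden = W1.map((row) =>
    sigmoid(row.reduce((sum, w, i) => sum + w * x[i], 0)));
  // out = sigmoid(sum over j of W2[j] * hidden[j])
  const out = sigmoid(hidden.reduce((sum, h, j) => sum + W2[j] * h, 0));
  return { hidden, out };
}

const squaredError = (out, y) => (out - y) ** 2;

// Example with made-up weights and the XOR input [1, 0], whose target is 1.
const W1 = [[0.5, -0.4], [-0.3, 0.6], [0.2, 0.1]];
const W2 = [0.9, -0.6, 0.8];
const { out } = forwardPass([1, 0], W1, W2);
console.log(out, squaredError(out, 1));
```

Returning hidden alongside out is a convenience: the backward pass needs both.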

Step 2 — which way is downhill?

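For each weight we ask: "if I nudge you up, does the loss go up or down, and how steeply?" That's the gradient (a derivative). At the output, it's the error times the slope of the sigmoid there. (Strictly, the derivative of the squared error also carries a factor of 2, but a constant like that is usually folded into the learning rate.)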

const dOut = (out - y) * out * (1 - out);
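One way to convince yourself the formula is right is to compare it with a numerical slope. The sketch below uses made-up values for the output neuron's pre-activation z and the target y, nudges z a tiny bit in each direction, and measures how the loss changes. The numerical slope should match 2 * dOut, since the formula above drops the constant 2 from the squared-error derivative.

```javascript
// Finite-difference check of the output gradient (illustrative values only).
const sigmoid = (z) => 1 / (1 + Math.exp(-z));
const lossAt = (z, y) => (sigmoid(z) - y) ** 2;

const z = 0.3; // hypothetical pre-activation of the output neuron
const y = 1;   // target
const out = sigmoid(z);

// The formula from the post: error times the sigmoid's slope.
const dOut = (out - y) * out * (1 - out);

// Numerical slope of the loss with respect to z (central difference).
const eps = 1e-5;
const numeric = (lossAt(z + eps, y) - lossAt(z - eps, y)) / (2 * eps);

console.log({ dOut, twiceDOut: 2 * dOut, numeric });
```

Here the prediction is below the target, so the gradient comes out negative: pushing z up would lower the loss.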

Step 3 — the chain rule sends blame backward

This is the "back" in backprop. A hidden neuron didn't touch the loss directly — it acted through the output weight. So its blame is (its weight to the output) × (the output's gradient) × (its own slope):

const dHidden = hidden.map((h, j) =>
  W2[j] * dOut * h * (1 - h));

The error flows backward through the exact same connections it flowed forward through. Add more layers and you just keep chaining the rule.
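As a rough sketch of what "keep chaining" means: when a layer feeds several neurons in the next layer, each neuron's blame is the sum of what comes back through all of its outgoing weights, times its own slope. The function name layerDeltas and the nextW[k][j] weight layout are assumptions made for this example.

```javascript
// Sketch: blame for one sigmoid layer, given the layer after it.
// nextW[k][j] is the weight from neuron j in this layer to neuron k in the
// next layer; nextDelta[k] is that next neuron's gradient. Layout is assumed.
function layerDeltas(activations, nextW, nextDelta) {
  return activations.map((a, j) => {
    let blame = 0;
    for (let k = 0; k < nextDelta.length; k++) {
      blame += nextW[k][j] * nextDelta[k];
    }
    return blame * a * (1 - a); // times this neuron's own sigmoid slope
  });
}

// With a single output neuron this reduces to W2[j] * dOut * h * (1 - h).
const hidden = [0.5, 0.5, 0.5];
const W2 = [1, -2, 0.5];
const dOut = 0.1;
console.log(layerDeltas(hidden, [W2], [dOut]));
```

For a deeper network you would call this once per layer, from the last hidden layer back to the first, feeding each result in as the next call's nextDelta.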

Step 4 — nudge every weight downhill

Now each weight has a gradient. Subtract the gradient scaled by a small factor (the learning rate) so the weight moves toward lower loss:

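for (let j = 0; j < hidden.length; j++) {
  W2[j] -= lr * dOut * hidden[j];   // hidden neuron j → output
  for (let i = 0; i < x.length; i++) W1[j][i] -= lr * dHidden[j] * x[i];   // input i → hidden neuron j
}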

Too small a learning rate and it crawls; too big and it overshoots and the loss bounces around. (In the demo there's a slider — drag it up and watch training get unstable.)
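You can see all three behaviours on a much simpler problem. The sketch below is a toy stand-in, not the demo's code: it runs gradient descent on loss(w) = w², whose gradient is 2w and whose minimum is at w = 0. The function name descend and the three learning rates are picked purely for illustration.

```javascript
// Toy illustration: gradient descent on loss(w) = w^2, whose gradient is 2w.
// This is a stand-in for the network's loss, not the network itself.
function descend(lr, steps = 20, w = 1) {
  for (let s = 0; s < steps; s++) {
    w -= lr * 2 * w; // same update rule: weight minus lr times gradient
  }
  return w;
}

console.log("lr = 0.01:", descend(0.01)); // still far from the minimum at 0
console.log("lr = 0.4: ", descend(0.4));  // very close to 0
console.log("lr = 1.1: ", descend(1.1));  // magnitude has grown: unstable
```

With lr = 1.1 every step jumps past the minimum and lands further away on the other side, which is the overshoot described above in its most extreme form.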

Step 5 — repeat until it learns

One nudge barely helps. But loop over the data thousands of times and the weights settle into values that solve the task.

I trained a tiny 2 → 3 → 1 network on XOR, the classic problem a single straight line cannot solve. Watch the loss slide toward zero and the red/blue regions bend into the XOR checkerboard. That bending is possible because the hidden layer gives the network a non-linear boundary to shape, and backprop finds the weights that shape it.
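Putting the five steps together, a training run might look roughly like the sketch below. It is not the demo's source: the bias terms (b1, b2), the fixed starting weights, the learning rate, and the epoch count are all assumptions chosen so the run is repeatable. How close the loss gets to zero depends on those choices.

```javascript
// Sketch: train a 2 -> 3 -> 1 sigmoid network on XOR, one example at a time.
// Biases, starting weights, lr and epochs are illustrative assumptions.
const sigmoid = (z) => 1 / (1 + Math.exp(-z));

const data = [
  { x: [0, 0], y: 0 },
  { x: [0, 1], y: 1 },
  { x: [1, 0], y: 1 },
  { x: [1, 1], y: 0 },
];

function createNet() {
  // Fixed, asymmetric starting weights instead of random ones.
  return {
    W1: [[0.5, -0.4], [-0.3, 0.6], [0.2, 0.1]],
    b1: [0, 0, 0],
    W2: [0.9, -0.6, 0.8],
    b2: 0,
  };
}

function forward(net, x) {
  const hidden = net.W1.map((row, j) =>
    sigmoid(row[0] * x[0] + row[1] * x[1] + net.b1[j]));
  const out = sigmoid(hidden.reduce((sum, h, j) => sum + net.W2[j] * h, net.b2));
  return { hidden, out };
}

// Sum of squared errors over the four XOR examples.
function totalLoss(net) {
  return data.reduce((sum, { x, y }) => sum + (forward(net, x).out - y) ** 2, 0);
}

function trainStep(net, x, y, lr) {
  const { hidden, out } = forward(net, x);                               // forward
  const dOut = (out - y) * out * (1 - out);                              // output gradient
  const dHidden = hidden.map((h, j) => net.W2[j] * dOut * h * (1 - h));  // backward
  for (let j = 0; j < hidden.length; j++) {                              // nudge
    net.W2[j] -= lr * dOut * hidden[j];
    net.b1[j] -= lr * dHidden[j];
    for (let i = 0; i < x.length; i++) net.W1[j][i] -= lr * dHidden[j] * x[i];
  }
  net.b2 -= lr * dOut;
}

function trainXor(epochs = 5000, lr = 0.5) {
  const net = createNet();
  const initialLoss = totalLoss(net);
  for (let e = 0; e < epochs; e++) {
    for (const { x, y } of data) trainStep(net, x, y, lr);
  }
  return { net, initialLoss, finalLoss: totalLoss(net) };
}

const { initialLoss, finalLoss } = trainXor();
console.log({ initialLoss, finalLoss });
```

Note that dHidden is computed before any weight changes, so the backward pass uses the same W2 values the forward pass used.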

Why this matters

GPT, image generators, self-driving vision — every one of them is trained by this exact loop. The networks are vastly bigger, the math is vectorised on GPUs, and there are clever optimizers (Adam) and tricks (dropout, batch norm) layered on top. But strip all that away and the engine is what you just watched: forward, loss, backprop, nudge.

Once backprop clicks, deep learning stops being magic and starts being "a very patient hill-descent."

👉 Watch a network teach itself XOR (live loss + boundary, learning-rate slider): https://dev48v.infy.uk/dl/day4-backprop.html

🌐 All days: https://dev48v.infy.uk/deeplearningfromzero.php

Tomorrow: XOR — why a single layer can't solve it, and hidden layers can.