How a Neural Network Actually Learns — Training, in Plain Words

#machinelearning #ai #beginners #datascience

"The model learns from data." Everyone says it — but what does learning actually mean for a neural network? It's simpler and more mechanical than it sounds: guess, see how wrong you were, nudge, repeat. No math degree required. Here's the whole loop in plain words, with an interactive demo you can train yourself.

📉 Watch it learn: https://dev48v.infy.uk/ai/days/day5-training.html

This is Day 5 of my AIFromZero series: AI literacy, one concept a day, no coding needed to follow along.

1. It starts out knowing nothing

A fresh network's dials (its "weights") are random numbers. So its first guesses are nonsense. This matters: the network is not programmed with the answer. It has to discover it from examples.
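If you like seeing things in code (entirely optional), here is a minimal Python sketch of that starting point. It assumes the simplest possible model, a straight line with two dials. The names w and b and the range -1 to 1 are my own choices for illustration, not something the demo prescribes.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable (an illustrative choice)

# A line y = w * x + b has two dials: the slope w and the offset b.
# Both start as random numbers, so the first line is essentially arbitrary.
w = random.uniform(-1.0, 1.0)
b = random.uniform(-1.0, 1.0)

print(f"starting dials: w={w:.3f}, b={b:.3f}")
```

Nothing in these two numbers knows anything about the data yet.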

2. Make a guess

Show it an example and let it produce an output with its current dials. No learning yet — just a prediction. In the demo, that's where a line sits relative to the data points.
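In code, a guess is just the current dials applied to an input. This is a rough sketch for the straight-line case; the function name predict and the starting values are hypothetical, picked only to make the example concrete.

```python
def predict(w, b, x):
    """Guess y for input x using the current dials (no learning happens here)."""
    return w * x + b

# Hypothetical starting dials, chosen arbitrarily for illustration
w, b = 0.5, -0.2

guess = predict(w, b, 3.0)
print(f"guess for x=3.0: {guess:.2f}")
```

The dials are only read here, never changed. Changing them comes later.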

3. Measure how wrong it was (the "loss")

Compare the guess to the real answer and turn the gap into a single number: the loss. The loss is high when the guess is far off and near zero when it's close. In the demo, the red lines are the loss; their total length is how badly the model is doing right now.
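Here is one way that single number could be computed. This sketch uses mean squared error, a common choice that I'm assuming for illustration; the demo may measure the gap differently. The data points are made up and follow y = 2x + 1.

```python
def mse_loss(w, b, xs, ys):
    """Average of the squared gaps between guesses and real answers."""
    total = 0.0
    for x, y in zip(xs, ys):
        error = (w * x + b) - y  # gap between the guess and the real answer
        total += error ** 2      # squaring makes every gap count as positive
    return total / len(xs)

# Made-up data that follows the rule y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

bad_loss = mse_loss(0.0, 0.0, xs, ys)   # dials that ignore the data
good_loss = mse_loss(2.0, 1.0, xs, ys)  # dials that match the rule exactly
print(bad_loss, good_loss)
```

Bad dials give a large number and the matching dials give zero, which is all the next step needs to know.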

4. Nudge the dials downhill

Here's the only "magic." For each dial the network asks: if I turn you up a little, does the wrongness go up or down? Then it turns every dial a tiny bit in the direction that reduces the loss. That's gradient descent — always step downhill on the wrongness.

Picture a blindfolded hiker finding the valley by always stepping in the steepest downhill direction. That's training.
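The "turn the dial up a little and see what happens" question can be written almost literally in code. This is a sketch under assumptions: the helper names, the step size of 0.01, and the size of the tiny test turn are all mine. Real libraries work out the same slopes with calculus instead of trying each dial, but the idea is the same.

```python
def loss(w, b, xs, ys):
    """Mean squared error of the line y = w * x + b on the data."""
    return sum(((w * x + b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def nudge(w, b, xs, ys, step_size=0.01, tiny=1e-6):
    """One downhill step: test each dial, then move against the slope."""
    base = loss(w, b, xs, ys)
    # Turn each dial up a tiny bit and see how much the loss changes
    slope_w = (loss(w + tiny, b, xs, ys) - base) / tiny
    slope_b = (loss(w, b + tiny, xs, ys) - base) / tiny
    # Positive slope means "turning up hurts", so step the other way
    return w - step_size * slope_w, b - step_size * slope_b

# Made-up data that follows y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

w, b = 0.0, 0.0  # assumed starting dials
before = loss(w, b, xs, ys)
w, b = nudge(w, b, xs, ys)
after = loss(w, b, xs, ys)
print(f"loss before: {before:.2f}, after one nudge: {after:.2f}")
```

One nudge lowers the loss a little. It does not solve the problem, which is why the next step exists.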

5. Repeat thousands of times

One nudge barely helps. But loop over many examples, many times (each full pass is an "epoch"), and the dials inch toward values that make good guesses everywhere. In the demo, the line creeps onto the data over hundreds of tiny steps. That slow convergence is training.
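Put together, the whole loop is short. The sketch below is one possible version for the straight-line case, nudging after every single example. The epoch count, step size, starting dials, and data are all assumptions chosen to keep the example small.

```python
def train(xs, ys, epochs=1000, step_size=0.01):
    """Loop over every example, many times, nudging the dials each time."""
    w, b = 0.0, 0.0  # assumed starting dials; a real run would start random
    for _ in range(epochs):          # each full pass over the data is an epoch
        for x, y in zip(xs, ys):
            error = (w * x + b) - y  # guess minus real answer
            # Slope of this example's squared error for each dial
            w -= step_size * 2 * error * x
            b -= step_size * 2 * error
    return w, b

# Made-up data that follows y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

w, b = train(xs, ys)
print(f"learned w={w:.2f}, b={b:.2f}")
```

The code never mentions the rule y = 2x + 1, yet the learned dials should land very close to 2 and 1.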

6. The dials about the dials

Two things make or break training:

Learning rate (step size): too big and it overshoots and wobbles; too small and it crawls forever. In the demo, set the step size to "big" and reset a few times — you'll see it overshoot.
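Overfitting: train too long on too little data and the network starts memorising its training examples instead of learning the general pattern. It then does well on data it has already seen and poorly on anything new.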
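The learning-rate point is easy to try for yourself. This sketch runs the same 20 steps with three step sizes on made-up data. The three values (0.0001, 0.05, 0.2) are assumptions that happen to suit this tiny dataset, not universal settings.

```python
def loss(w, b, xs, ys):
    """Mean squared error of the line y = w * x + b on the data."""
    return sum(((w * x + b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def loss_after_training(step_size, xs, ys, steps=20):
    """Run a fixed number of downhill steps and report the final loss."""
    w, b = 0.0, 0.0  # assumed starting dials
    n = len(xs)
    for _ in range(steps):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= step_size * grad_w
        b -= step_size * grad_b
    return loss(w, b, xs, ys)

# Made-up data that follows y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

start_loss = loss(0.0, 0.0, xs, ys)
results = {
    "too small": loss_after_training(0.0001, xs, ys),
    "reasonable": loss_after_training(0.05, xs, ys),
    "too big": loss_after_training(0.2, xs, ys),
}
for name, final_loss in results.items():
    print(f"{name}: loss went from {start_loss:.1f} to {final_loss:.4g}")
```

With the tiny step the loss has barely moved, with the middle one it drops close to zero, and with the big one it ends up worse than where it started because every step overshoots the valley.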

Why this matters

A real network does this exact loop — guess, measure, nudge — across millions of dials at once. The recipe never changes; only the number of dials does. Understand this loop and "the model was trained on X" stops being a black box and becomes a process you can actually picture.

Press Train in the demo and watch a random line settle onto the data. It was never told the rule; it found it.