People say a neural network "learns to see" or "understands images," and it sounds like sci-fi. It isn't. A neural net does something much more mechanical — and once you see the shape of it, the mystery evaporates. No math in this one, just the intuition.
This is Day 4 of AIFromZero, my concept-a-day series explaining how AI actually works.
A neuron is a tiny detector
Forget the brain metaphor. A single artificial neuron looks at some numbers, weighs them up, and outputs one number that answers: "how much do I see the thing I look for?" That's it — a little detector with a dial for how strongly it fires.
Stack them into layers, and a hierarchy appears
Here's the whole trick, in three beats:
- Early layers spot simple features. Fed raw pixels, the first layer's neurons each learn to fire on something simple — an edge here, a corner there, a blob of colour. Alone, none of them understand the picture.
- Deeper layers combine them. The next layer doesn't see pixels — it sees the first layer's findings. So it can combine "edge + edge + corner" into "a loop" or "a junction." The layer after that combines those into "an eye" or "a wheel."
- The last layer votes. Take all those high-level features and cast a vote for each possible label — "80% a cat, 15% a dog…" The highest vote wins.
Pixels → edges → parts → objects → label. Depth builds understanding, one simple step at a time.
Weights = how much each clue matters
Every connection carries a weight — a number saying how much that clue counts toward the next detector. "Has a closed loop" might count a lot toward the digit 8 and against the digit 1. A neural network is, at heart, nothing but a giant pile of these learned importances.
"Learning" just tunes the clues
Nobody hand-writes those detectors. The network starts random and sees thousands of labelled examples. Each mistake nudges the weights so the helpful clues count more and the misleading ones count less (that's backpropagation — I build it from scratch over in DeepLearningFromZero). Train long enough and useful feature detectors emerge on their own. That's the part that still amazes me: we don't program the features, we let them grow.
See it for yourself
In the interactive demo on this page, you draw a shape on a 5×5 grid. Watch it turn your pixels into a few simple measurements (top-heavy? has a centre cross? corners lit?) and then cast a vote for which shape it most resembles. It's a faked, hand-coded version — but the flow, pixels → features → vote, is exactly what a real image model does, just with millions of learned features instead of four hand-written ones.
The one takeaway
A neural network turns raw input into a stack of ever-higher-level features, then votes for an answer — and it learns those features from examples rather than being programmed with them.
That sentence covers image recognition, speech, and a surprising amount of what's inside a language model too. Not magic. Just detectors, stacked and tuned.
👉 Try the demo (draw a shape, watch the features fire and the vote land): https://dev48v.infy.uk/ai/days/day4-neural-net-intuition.html
🌐 All concepts: https://dev48v.infy.uk/aifromzero.php
Tomorrow: how a neural net actually learns — training, in plain words.
Top comments (0)