Gradient descent is the engine under almost all of machine learning: linear regression, neural nets, and most models in between. And at its core it's a tiny loop you can fully picture: a blindfolded hiker finding the valley by always stepping downhill.
⛰️ Interactive demo: https://dev48v.infy.uk/ml/day7-gradient-descent.html
The whole algorithm
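const loss = (x) => (x - 5) ** 2; // a bowl, minimum at x = 5
const grad = (x) => 2 * (x - 5); // the slope at x
let x = 0; // any starting point works for this bowl
const learningRate = 0.1; // step size
for (let i = 0; i < 100; i++) x = x - learningRate * grad(x); // step downhill, that's it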
The gradient points uphill, so you subtract it. Repeat until the slope is ~0; for this bowl, that means you're at the minimum.
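"Repeat until the slope is ~0" can be sketched as a loop with a stopping test. This is a minimal illustration on the same bowl, not the demo's code: the `descend` helper, the `tolerance` threshold, and the `maxSteps` safety cap are all names and values assumed here.

```javascript
const loss = (x) => (x - 5) ** 2; // same bowl, minimum at x = 5
const grad = (x) => 2 * (x - 5);  // its slope at x

// Hypothetical helper: keep stepping until the slope is nearly flat.
// tolerance and maxSteps are illustrative choices, not tuned values.
function descend(start, learningRate, tolerance = 1e-6, maxSteps = 10000) {
  let x = start;
  let steps = 0;
  while (Math.abs(grad(x)) > tolerance && steps < maxSteps) {
    x = x - learningRate * grad(x); // step against the gradient
    steps += 1;
  }
  return { x, steps };
}

const result = descend(0, 0.1);
console.log(`x = ${result.x.toFixed(4)}, loss = ${loss(result.x)}, steps = ${result.steps}`);
```

The `maxSteps` cap is there because the slope test alone never fires if the learning rate is too large and the run diverges.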
Learning rate is everything
- Too small → it crawls (thousands of tiny steps).
- Too big → it overshoots the minimum and can diverge to infinity.
- There's a sweet spot. In the demo, push the rate past ~1.0 and watch the ball fly off — a real failure mode of training.
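The three cases can be checked numerically on the bowl `(x - 5) ** 2`. For this particular loss each update multiplies the distance to the minimum by `(1 - 2 * learningRate)`, so a rate above 1.0 makes the distance grow; whether the demo uses exactly this bowl is an assumption. The `distanceAfter` helper and the rates tried below are illustrative.

```javascript
const grad = (x) => 2 * (x - 5); // slope of the bowl (x - 5) ** 2

// Hypothetical helper: run a fixed number of steps and report
// how far x ends up from the minimum at 5.
function distanceAfter(learningRate, steps = 50, start = 0) {
  let x = start;
  for (let i = 0; i < steps; i++) {
    x = x - learningRate * grad(x);
  }
  return Math.abs(x - 5);
}

// Too small, reasonable, overshooting but still converging, diverging.
for (const lr of [0.001, 0.1, 0.9, 1.1]) {
  console.log(`lr = ${lr}: distance from minimum = ${distanceAfter(lr).toExponential(2)}`);
}
```

With a rate of 0.9 the iterate jumps across the minimum on every step yet still closes in, because the distance shrinks by a factor of 0.8 each time; at 1.1 the same jump grows by a factor of 1.2 instead.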
Local minima
A bowl has one bottom, so descent with a sensible learning rate always finds it. Real loss landscapes are bumpy: switch the demo to the non-convex surface and the ball settles into whichever valley lies downhill from its starting point, which is not necessarily the deepest one. That's why initialization, momentum, and restarts matter.
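A small sketch of the same effect in one dimension. The function `x^4 - 3x^2 + x` is chosen here only for illustration (it is not the demo's surface); it has a deeper valley near x = -1.30 and a shallower one near x = 1.13. The `descendFrom` helper, step count, and learning rate are likewise assumptions.

```javascript
// Illustrative non-convex loss with two valleys.
const bumpyLoss = (x) => x ** 4 - 3 * x ** 2 + x;
const bumpyGrad = (x) => 4 * x ** 3 - 6 * x + 1;

// Hypothetical helper: plain gradient descent from a given start.
function descendFrom(start, learningRate = 0.01, steps = 2000) {
  let x = start;
  for (let i = 0; i < steps; i++) {
    x = x - learningRate * bumpyGrad(x);
  }
  return x;
}

const fromRight = descendFrom(2);  // starts on the right-hand slope
const fromLeft = descendFrom(-2);  // starts on the left-hand slope
console.log(`start  2 -> x = ${fromRight.toFixed(2)}, loss = ${bumpyLoss(fromRight).toFixed(2)}`);
console.log(`start -2 -> x = ${fromLeft.toFixed(2)}, loss = ${bumpyLoss(fromLeft).toFixed(2)}`);
```

Same algorithm, same learning rate, different answers: the run that starts on the right stops in the shallower valley and never sees the deeper one on the left.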
It scales to everything
For millions of weights the gradient is a vector (one slope per weight, from backprop), and you step in every direction at once:
weights = weights.map((w, i) => w - lr * grads[i]);
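To see that update in a full loop, here is a sketch that fits a straight line with two weights. The toy data, the `gradients` function, and the learning rate are made up for illustration, and the gradient is written out by hand for mean squared error rather than coming from backprop.

```javascript
// Toy data generated from y = 2x + 1 (invented for this example).
const xs = [0, 1, 2, 3, 4];
const ys = xs.map((x) => 2 * x + 1);

// Gradient of mean squared error for the model y = w0 * x + w1,
// one slope per weight.
function gradients([w0, w1]) {
  let g0 = 0;
  let g1 = 0;
  for (let i = 0; i < xs.length; i++) {
    const error = w0 * xs[i] + w1 - ys[i];
    g0 += (2 * error * xs[i]) / xs.length;
    g1 += (2 * error) / xs.length;
  }
  return [g0, g1];
}

let weights = [0, 0];
const lr = 0.05;
for (let step = 0; step < 2000; step++) {
  const grads = gradients(weights);
  weights = weights.map((w, i) => w - lr * grads[i]); // every weight steps at once
}
console.log(weights.map((w) => w.toFixed(3)));
```

The weights should end up close to `[2, 1]`, the slope and intercept used to generate the data. A network with millions of weights runs the same loop; only the way `grads` is computed changes.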
Run the descent and feel the learning rate.