Nearly every modern AI model learns the same way: by rolling downhill. And one of the most common reasons a model fails to learn comes down to one number: the learning rate. So I built an animation that runs the exact same gradient descent three times, from the identical starting point, changing only that number.
What you're actually seeing
- A loss curve — height is how wrong the model's prediction is. The ball starts high (a random guess); the goal is the bottom.
- The cyan tangent line is the gradient — the slope under the ball. It points uphill, so the algorithm steps the opposite way.
- Three runs, one variable: too low (crawls, never arrives), tuned (settles right on the global minimum), too high (overshoots, swings, and never settles: it fails to converge).
- A deliberate second dip on the left — a local minimum, the trap a too-cautious model can get stuck in.
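The three runs can be reproduced outside the animation with a few lines of plain JavaScript. This is a minimal sketch under stated assumptions: the `loss` function below is a hypothetical tilted double-well quartic, and the start point, step count, and three rates are illustrative values chosen for that curve, not the ones used in the video.

```javascript
// Hypothetical tilted double-well loss (an assumption, not the exact curve from the video).
// It has a shallow local minimum near x = -0.96 and a deeper global one near x = 1.04.
const loss = (x) => x ** 4 - 2 * x ** 2 - 0.3 * x;
const grad = (x) => 4 * x ** 3 - 4 * x - 0.3; // analytic derivative of loss

// Plain gradient descent: repeat x <- x - rate * gradient.
function descend(x0, rate, steps) {
  let x = x0;
  for (let i = 0; i < steps; i++) {
    x = x - rate * grad(x);
  }
  return x;
}

// Same start, same number of steps; only the rate changes (illustrative values).
const START = 1.8;
const STEPS = 30;
const low = descend(START, 0.001, STEPS);
const tuned = descend(START, 0.05, STEPS);
const high = descend(START, 0.3, STEPS);

// "Settled" means a finite position where the slope is essentially flat.
const settled = (x) => Number.isFinite(x) && Math.abs(grad(x)) < 1e-4;
console.log({ low: settled(low), tuned: settled(tuned), high: settled(high) });
```

On this particular curve the walls are steep, so the too-high run blows up numerically rather than swinging gently; a flatter landscape would show a bounded oscillation instead.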
The stack
React (a small phase state machine: idle → gradient → step → done), Framer Motion for the ball + tangent, the Web Audio API for synthesized tones (no asset files), and deterministic, recording-friendly timing so every take is identical.
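One way to picture the phase state machine is as a pure transition function. The sketch below is an assumption about how the phases connect (in particular, that `step` loops back to `gradient` until the run is out of steps); `nextPhase` and `trace` are hypothetical names, not taken from the project.

```javascript
// Hypothetical transition function for the four phases named above.
// Assumption: "step" returns to "gradient" while steps remain, then ends in "done".
function nextPhase(phase, stepsLeft) {
  switch (phase) {
    case "idle":
      return "gradient";
    case "gradient":
      return "step";
    case "step":
      return stepsLeft > 0 ? "gradient" : "done";
    default:
      return "done";
  }
}

// Walk the machine from "idle" and record every phase it visits.
function trace(totalSteps) {
  const phases = ["idle"];
  let phase = "idle";
  let stepsLeft = totalSteps;
  while (phase !== "done") {
    if (phase === "step") stepsLeft -= 1; // one descent step has just been taken
    phase = nextPhase(phase, stepsLeft);
    phases.push(phase);
  }
  return phases;
}

console.log(trace(2).join(" -> "));
```

Keeping the transitions in one pure function makes the animation timing easy to reason about, because the UI only ever reacts to the current phase.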
Three things that were trickier than expected
1. The SVG ball stuttered until I stopped animating cx/cy
Animating an SVG circle's cx/cy attributes is janky — they're not GPU-composited. The fix is to keep the circle at the origin and animate its transform (x/y) instead, which Framer Motion springs smoothly:
<motion.circle
  cx={0}
  cy={0}
  r={10}
  animate={{ x: sx(ballX), y: sy(L(ballX)) }}
  transition={{ type: "spring", stiffness: 110, damping: 16 }}
/>
sx and sy map math-space (weight, loss) to screen-space pixels; L(ballX) is the loss at the ball's current weight.
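For readers who want to see what such a mapping looks like, here is a minimal sketch of two linear scales. The `VIEW` and `DOMAIN` values are hypothetical (the real plot bounds are not given); only the names `sx` and `sy` come from the snippet above.

```javascript
// Hypothetical viewport and plot bounds; the project's real values are not stated.
const VIEW = { width: 800, height: 400, pad: 40 };
const DOMAIN = { xMin: -2, xMax: 2, lossMin: -2, lossMax: 8 };

// Weight -> horizontal pixel: a linear scale across the padded width.
const sx = (x) =>
  VIEW.pad +
  ((x - DOMAIN.xMin) / (DOMAIN.xMax - DOMAIN.xMin)) * (VIEW.width - 2 * VIEW.pad);

// Loss -> vertical pixel. SVG y grows downward, so a higher loss maps to a smaller y.
const sy = (lossValue) =>
  VIEW.height -
  VIEW.pad -
  ((lossValue - DOMAIN.lossMin) / (DOMAIN.lossMax - DOMAIN.lossMin)) *
    (VIEW.height - 2 * VIEW.pad);

console.log(sx(0), sy(8));
```

The flipped sign in `sy` is the detail that is easiest to get wrong: without it the loss curve renders upside down.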
2. Recording needs identical-length runs — so I drift-corrected the clock
setTimeout drifts: a loop of await delay(2000) slowly desyncs because each step also pays for render time. For a video where every run must be the same length, I pin each step to an absolute wall-clock budget instead of a relative sleep:
// Sleep until `targetMs` have elapsed since the run started, not "wait N ms".
// `after(ms)` is assumed to be a promise-based sleep helper (not shown here).
function waitUntil(targetMs) {
  const remaining = targetMs - (Date.now() - runStartRef.current);
  return after(Math.max(remaining, 0));
}

// ...inside the loop:
await waitUntil((step + 1) * STEP_BUDGET_MS);
Any per-step jitter gets absorbed, so the run always lands on the same total duration.
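The difference between the two strategies is easy to check on a fake clock. The sketch below is a simulation, not the project's code: `simulateRun` is a hypothetical helper, the per-step render costs are made-up numbers, and timers are assumed to fire exactly on time.

```javascript
// Simulate a run on a fake clock. Each step does some render work, then waits.
// mode "relative": always wait a full budget (render time is paid on top).
// mode "absolute": wait only until the step's deadline, measured from the run start.
function simulateRun(steps, budgetMs, renderCosts, mode) {
  let now = 0; // fake clock: ms since the run started
  for (let step = 0; step < steps; step++) {
    now += renderCosts[step % renderCosts.length]; // work done before waiting
    if (mode === "relative") {
      now += budgetMs;
    } else {
      const target = (step + 1) * budgetMs;
      now += Math.max(target - now, 0); // sleep only the remainder
    }
  }
  return now;
}

const costs = [12, 30, 7]; // made-up per-step render times in ms
const relativeTotal = simulateRun(5, 2000, costs, "relative");
const absoluteTotal = simulateRun(5, 2000, costs, "absolute");
console.log({ relativeTotal, absoluteTotal });
```

With relative sleeps, the total grows by the sum of the render costs; with absolute deadlines, it stays at steps × budget as long as no single step overruns its budget.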
3. A stale closure made my "auto" sequence ignore the learning rate
To record all three runs hands-free, I built a choreographed sequence that flips the learning-rate preset between runs. It silently used the wrong rate every time. The culprit: the descent loop read the lr React state directly, but that value was captured once when the async sequence started — classic stale closure. The fix is to pass the rate in as an argument instead of reading state mid-flight:
// Before: read `lr` from state inside the loop → frozen at sequence start.
// After: the caller passes the rate explicitly.
async function runDescent(rate = lr) {
  // ...
  x = x - rate * gradient; // uses the rate for THIS run, not a stale one
}

await runDescent(0.02); // low
await runDescent(0.46); // good
await runDescent(0.6); // high
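The same trap can be shown without React. In this sketch, `render` is a hypothetical stand-in for a single component render: every function it creates closes over that render's `lr`, which mirrors what happens when a long-running async sequence keeps using functions from the render it started in.

```javascript
// Stand-in for one React render: functions created here capture THIS render's `lr`.
function render(lr) {
  return {
    runFromState: () => lr, // reads the captured value
    runWithArg: (rate = lr) => rate, // prefers an explicit argument
  };
}

// The sequence starts during the first render and keeps these functions.
const firstRender = render(0.02);

// A state update produces a new render with fresh closures,
// but the already-running sequence never sees them.
const laterRender = render(0.6);

console.log(firstRender.runFromState()); // still the value from the first render
console.log(firstRender.runWithArg(0.6)); // the explicit argument wins
console.log(laterRender.runFromState()); // only new closures see the new value
```

Passing the rate as an argument sidesteps the question of which render a closure belongs to, which is why it is the more robust fix for a choreographed sequence.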
What I learned
- The whole lesson of the video is a one-liner in code: x = x - rate * gradient. Everything dramatic on screen is that single update, looped.
- Designing the loss landscape was its own mini-problem: I needed a function with a shallow local minimum and a deeper global one so "stuck in a local min" is visible at a glance. A tilted double-well quartic did it.
- Synthesizing audio whose pitch tracks the loss makes the descent audible — you can hear it settle (or never settle) with your eyes closed.
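A pitch that tracks the loss comes down to one mapping function plus a short oscillator tone. The sketch below is one possible approach, with assumptions: `LOSS_RANGE` and `PITCH_HZ` are hypothetical values, `lossToFrequency` and `playTone` are illustrative names, and `playTone` only runs in a browser, where an `AudioContext` is available.

```javascript
// Hypothetical ranges; the project's actual loss bounds and pitches are not stated.
const LOSS_RANGE = { min: -1.5, max: 8 };
const PITCH_HZ = { low: 220, high: 880 };

// Map a loss value to a frequency: low loss -> low pitch, high loss -> high pitch.
function lossToFrequency(lossValue) {
  const raw = (lossValue - LOSS_RANGE.min) / (LOSS_RANGE.max - LOSS_RANGE.min);
  const t = Math.min(Math.max(raw, 0), 1); // clamp to [0, 1]
  // Interpolate exponentially so equal drops in loss sound like equal musical intervals.
  return PITCH_HZ.low * Math.pow(PITCH_HZ.high / PITCH_HZ.low, t);
}

// Browser-only: play a short synthesized tone for the current loss (no asset files).
function playTone(audioCtx, lossValue, durationSec = 0.15) {
  const osc = audioCtx.createOscillator();
  const gain = audioCtx.createGain();
  osc.frequency.setValueAtTime(lossToFrequency(lossValue), audioCtx.currentTime);
  gain.gain.setValueAtTime(0.1, audioCtx.currentTime); // keep the volume modest
  osc.connect(gain).connect(audioCtx.destination);
  osc.start();
  osc.stop(audioCtx.currentTime + durationSec);
}

console.log(lossToFrequency(3.25));
```

In a browser this would be called once per step, for example `playTone(new AudioContext(), currentLoss)`, ideally reusing a single `AudioContext` for the whole run.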
Watch / try it
🎥 Full walkthrough: https://youtu.be/wng0ddn0wPo
I'm animating a series — neural network training ✅, gradient descent (this one), and backpropagation is next. What should I visualize after that — transformers, CNNs, or something classic like Dijkstra? 👇