DEV Community

Amar Gul


I Visualized Why AI Models Fail to Train — It's One Number

Nearly every modern AI model learns the same way: by rolling downhill. And one of the most common reasons a model fails to learn comes down to one number: the learning rate. So I built an animation that runs the exact same gradient descent three times, from the identical starting point, changing only that number.

What you're actually seeing

  • A loss curve — height is how wrong the model's prediction is. The ball starts high (a random guess); the goal is the bottom.
  • The cyan tangent line is the gradient — the slope under the ball. It points uphill, so the algorithm steps the opposite way.
  • Three runs, one variable: too low (crawls, never arrives), tuned (settles dead-on the global minimum), too high (overshoots, swings, never settles — divergence).
  • A deliberate second dip on the left — a local minimum, the trap a too-cautious model can get stuck in.
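The three behaviours listed above can be reproduced in a few lines. This is a minimal sketch, not the animation's code: it swaps the post's curve for a plain quadratic L(w) = w², and the rates, the start point, and the `descend` helper are illustrative assumptions rather than the presets used in the video.

```javascript
// Stand-in loss: L(w) = w^2, so the gradient is 2w and the minimum is at w = 0.
const gradient = (w) => 2 * w;

// Run `steps` updates of w = w - rate * gradient(w) and return the whole path.
function descend(rate, start = 3, steps = 20) {
  const path = [start];
  let w = start;
  for (let i = 0; i < steps; i++) {
    w = w - rate * gradient(w);
    path.push(w);
  }
  return path;
}

const tooLow = descend(0.01);  // shrinks by 2% per step: still far away after 20 steps
const tuned = descend(0.5);    // on this quadratic, the first step lands exactly on 0
const tooHigh = descend(1.05); // every step overshoots further, so |w| keeps growing

const last = (path) => path[path.length - 1];
console.log(last(tooLow), last(tuned), last(tooHigh));
```

The rate 0.5 is "tuned" only because this quadratic has curvature 2; a different curve needs a different value, which is why the number has to be found per problem.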

The stack

React (a small phase state machine: idle → gradient → step → done), Framer Motion for the ball + tangent, the Web Audio API for synthesized tones (no asset files), and deterministic, recording-friendly timing so every take is identical.
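The phase cycle can be written as a pure transition function. This is a minimal sketch, assuming the machine alternates between gradient and step until a step budget runs out; `nextPhase`, `tracePhases`, and `stepsLeft` are hypothetical names, not taken from the project.

```javascript
// Phases named in the post: idle -> gradient -> step -> done.
// Assumption: after each step the machine returns to "gradient"
// until no steps remain, then finishes.
function nextPhase(phase, stepsLeft) {
  switch (phase) {
    case "idle":
      return "gradient"; // run starts: show the tangent first
    case "gradient":
      return "step"; // tangent shown: move the ball
    case "step":
      return stepsLeft > 0 ? "gradient" : "done";
    default:
      return "done"; // "done" (or anything unknown) is terminal
  }
}

// Walk one run and record every phase visited.
function tracePhases(totalSteps) {
  const visited = ["idle"];
  let phase = "idle";
  let stepsLeft = totalSteps;
  while (phase !== "done") {
    if (phase === "gradient") stepsLeft -= 1; // the upcoming step consumes one
    phase = nextPhase(phase, stepsLeft);
    visited.push(phase);
  }
  return visited;
}

console.log(tracePhases(2).join(" -> "));
```

Keeping the transition pure makes it easy to drive from a React reducer or an async loop, and easy to check without rendering anything.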

Three things that were trickier than expected

1. The SVG ball stuttered until I stopped animating cx/cy

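Animating an SVG circle's cx/cy attributes was janky for me: attribute changes are not GPU-composited, so the browser has to repaint on every frame. The fix is to keep the circle at the origin and animate its transform (x/y) instead, which browsers can usually handle more cheaply and which Framer Motion springs smoothly: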

<motion.circle
  cx={0}
  cy={0}
  r={10}
  animate={{ x: sx(ballX), y: sy(L(ballX)) }}
  transition={{ type: "spring", stiffness: 110, damping: 16 }}
/>

sx/sy map math-space (weight, loss) to screen-space pixels, and L is the loss function evaluated at the ball's current weight.
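For readers who have not written such a mapping, here is one possible shape for sx/sy: a linear scale from a math-space domain to a pixel range. The viewport size, the domains, and the `linearScale` helper are assumptions for illustration; the post does not show its own definitions.

```javascript
// Build a function that maps the interval [d0, d1] linearly onto [r0, r1].
function linearScale(d0, d1, r0, r1) {
  return (v) => r0 + ((v - d0) / (d1 - d0)) * (r1 - r0);
}

// Hypothetical 800x400 viewBox; weight in [-2, 2], loss in [-2, 6].
const sx = linearScale(-2, 2, 0, 800);
// SVG's y axis grows downward, so the pixel range is flipped:
// low loss is drawn near the bottom edge, high loss near the top.
const sy = linearScale(-2, 6, 400, 0);

console.log(sx(0), sy(-2), sy(6)); // 400 400 0
```

Flipping the range in sy is what lets "downhill" in loss look like downhill on screen.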

2. Recording needs identical-length runs — so I drift-corrected the clock

setTimeout drifts: a loop of await delay(2000) slowly desyncs because each step also pays for render time. For a video where every run must be the same length, I pin each step to an absolute wall-clock budget instead of a relative sleep:

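// Sleep until `targetMs` have elapsed since the run started, rather than "wait N ms".
// `after(ms)` is assumed to return a promise that resolves after `ms` milliseconds (helper not shown).
function waitUntil(targetMs) {
  const remaining = targetMs - (Date.now() - runStartRef.current);
  return after(Math.max(remaining, 0)); // already past the deadline? don't sleep at all
}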
// ...inside the loop:
await waitUntil((step + 1) * STEP_BUDGET_MS);

Any per-step jitter gets absorbed, so the run always lands on the same total duration.
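To see why the absolute budget helps, the two strategies can be compared on a simulated clock. This sketch uses no real timers; the render costs, the 100 ms budget, and the `simulate` and `remainingDelay` helpers are made up for illustration.

```javascript
const STEP_BUDGET_MS = 100; // hypothetical per-step budget

// How long to sleep so a step ends on its absolute deadline (never negative).
function remainingDelay(targetMs, elapsedMs) {
  return Math.max(targetMs - elapsedMs, 0);
}

// Advance a fake clock through the steps: each step "renders" for `cost` ms, then sleeps.
function simulate(renderCosts, strategy) {
  let clock = 0;
  renderCosts.forEach((cost, step) => {
    clock += cost; // time spent rendering before the sleep begins
    clock += strategy === "relative"
      ? STEP_BUDGET_MS // fixed sleep: render time is added on top
      : remainingDelay((step + 1) * STEP_BUDGET_MS, clock); // sleep only what is left
  });
  return clock;
}

const costs = [7, 12, 3, 9, 15]; // made-up render times in ms
const relativeTotal = simulate(costs, "relative");
const absoluteTotal = simulate(costs, "absolute");
console.log(relativeTotal, absoluteTotal); // 546 500
```

The relative version ends 46 ms late, exactly the sum of the render costs, while the absolute version ends on the 500 ms mark. If a single render ever exceeds the budget, the absolute version sleeps for 0 ms and catches up on later steps.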

3. A stale closure made my "auto" sequence ignore the learning rate

To record all three runs hands-free, I built a choreographed sequence that flips the learning-rate preset between runs. It silently used the wrong rate every time. The culprit: the descent loop read the lr React state directly, but that value was captured once when the async sequence started — classic stale closure. The fix is to pass the rate in as an argument instead of reading state mid-flight:

// Before: read `lr` from state inside the loop → frozen at sequence start.
// After: the caller passes the rate explicitly.
async function runDescent(rate = lr) {
  // ...
  x = x - rate * gradient; // uses the rate for THIS run, not a stale one
}

await runDescent(0.02); // low
await runDescent(0.46); // good
await runDescent(0.6);  // high
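The bug is easier to see outside React. The following is a plain-JavaScript stand-in, not the component's code: `currentLr` plays the role of the state value, `render` plays the role of one render pass, and both handler names are hypothetical.

```javascript
let currentLr = 0.02; // stands in for the `lr` state value

// Each "render" captures the value of lr that was current when it ran.
function render() {
  const lr = currentLr;
  return {
    descendStale: () => lr,          // reads the captured value
    descendExplicit: (rate) => rate, // uses whatever the caller passes in
  };
}

// The async sequence starts once and keeps the handlers from that render.
const handlers = render();

const usedStale = [];
const usedExplicit = [];
for (const preset of [0.02, 0.46, 0.6]) {
  currentLr = preset; // like calling the state setter: only later renders see it
  usedStale.push(handlers.descendStale());
  usedExplicit.push(handlers.descendExplicit(preset));
}

console.log(usedStale);    // [ 0.02, 0.02, 0.02 ]
console.log(usedExplicit); // [ 0.02, 0.46, 0.6 ]
```

Passing the rate as an argument sidesteps the question of which render's closure is running. Keeping the latest value in a ref is another common workaround, though the post does not say whether it was considered.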

What I learned

  • The whole lesson of the video is a one-liner in code: x = x - rate * gradient. Everything dramatic on screen is that single update, looped.
  • Designing the loss landscape was its own mini-problem — I needed a function with a shallow local minimum and a deeper global one so "stuck in a local min" is visible at a glance. A tilted double-well quartic did it.
  • Synthesizing audio whose pitch tracks the loss makes the descent audible — you can hear it settle (or never settle) with your eyes closed.
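For anyone who wants to experiment with a similar landscape, here is one way a tilted double-well quartic could look. The coefficients, start points, and the `settle` helper are my own guesses for illustration; the post does not give the function used in the video.

```javascript
// Hypothetical tilted double well: x^4 - 2x^2 has two wells near x = -1 and x = 1;
// the -0.3x tilt lowers the right well and raises the left one.
const L = (x) => x ** 4 - 2 * x ** 2 - 0.3 * x;
const dL = (x) => 4 * x ** 3 - 4 * x - 0.3; // analytic derivative of L

// Plain gradient descent with a small rate; returns where the ball ends up.
function settle(start, rate = 0.01, steps = 2000) {
  let x = start;
  for (let i = 0; i < steps; i++) x = x - rate * dL(x);
  return x;
}

const left = settle(-1.8); // starts on the left slope, ends in the shallower well
const right = settle(1.8); // starts on the right slope, ends in the deeper well

console.log(left.toFixed(3), L(left).toFixed(3));
console.log(right.toFixed(3), L(right).toFixed(3));
```

With these coefficients the wells sit near x ≈ -0.96 and x ≈ 1.04, and the right one is clearly lower. A cautious run that starts on the left never leaves the left well, which is the "stuck in a local min" picture.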

Watch / try it

🎥 Full walkthrough: https://youtu.be/wng0ddn0wPo

I'm animating a series: neural network training ✅, gradient descent (this one), and backpropagation next. What should I visualize after that: transformers, CNNs, or something classic like Dijkstra? 👇
