Everyone learns backpropagation as "apply the chain rule." Almost nobody explains why it's fast — and that "why" is the whole reason deep learning is computationally possible at all.
So I animated one full training step to show the part most explanations skip.
What you're actually seeing
- Forward pass: a single signal travels through 3 weights → a prediction → compared to the target = the loss.
- Backward pass: the error (δ) flows back through the network. δ₃ is computed at the output, then reused to get δ₂, which is reused to get δ₁ — never recalculated from scratch.
- The point: one forward pass + one backward pass produces every weight's gradient with zero redundant work, no matter how deep the network goes.
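Here is a minimal sketch of that step in plain JavaScript. It assumes a chain of linear units (no activation) and the squared-error loss L = ½(ŷ − y)², so the network and loss in the animation may differ. The names forward and backward are illustrative. Each δ is computed once and then reused to get the next one upstream.

```javascript
// Minimal sketch: gradients for a chain x → h1 → h2 → ŷ with weights w = [w1, w2, w3].
// Assumptions: linear units (no activation) and loss L = ½(ŷ − y)².

// Forward pass: acts[i] is the input to weight w[i]; the last entry is ŷ.
function forward(w, x) {
  const acts = [x];
  for (const wi of w) acts.push(wi * acts[acts.length - 1]);
  return acts;
}

// Backward pass: start from δ at the output and reuse it layer by layer.
function backward(w, acts, y) {
  const n = w.length;
  const grads = new Array(n);
  let delta = acts[n] - y; // δ₃ = ∂L/∂ŷ when there are 3 weights
  for (let i = n - 1; i >= 0; i--) {
    grads[i] = delta * acts[i]; // ∂L/∂w[i] = δ × (input to w[i])
    delta = delta * w[i]; // the next δ upstream reuses this one
  }
  return grads;
}

const w = [0.5, -1, 2];
const acts = forward(w, 1); // [1, 0.5, -0.5, -1]
console.log(backward(w, acts, 0)); // [ 2, -1, 0.5 ]
```

As a check, ∂L/∂w₁ written out in full is (ŷ − y)·w₂·w₃·x = (−1)(−1)(2)(1) = 2, which matches, yet the loop never multiplies that full product out again for each weight.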
That reuse — store a result once, reuse it instead of recomputing — is the exact definition of dynamic programming. The only difference from a Fibonacci memo is that the stored value is a derivative.
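To make the "no redundant work" point concrete, the sketch below computes the same gradients twice for a 10-weight chain and counts multiplications. The first pass recomputes each gradient from the output δ. The second stores δ and extends it by one weight per layer. The function names and the linear-units, squared-error setup are assumptions for illustration.

```javascript
// Sketch: count multiplications with and without reusing δ.
// Assumptions: chain of linear units, L = ½(ŷ − y)², so δ at the output is ŷ − y.
function chainForward(w, x) {
  const acts = [x]; // acts[i] is the input to weight w[i]; last entry is ŷ
  for (const wi of w) acts.push(wi * acts[acts.length - 1]);
  return acts;
}

// No reuse: each ∂L/∂w[i] re-multiplies every downstream weight from scratch.
function gradsFromScratch(w, acts, y) {
  const n = w.length;
  const grads = [];
  let mults = 0;
  for (let i = 0; i < n; i++) {
    let g = acts[n] - y;
    for (let j = n - 1; j > i; j--) { g *= w[j]; mults++; }
    grads.push(g * acts[i]); mults++;
  }
  return { grads, mults };
}

// Reuse (the dynamic-programming version): δ is stored once and extended by one weight per layer.
function gradsWithReuse(w, acts, y) {
  const n = w.length;
  const grads = new Array(n);
  let mults = 0;
  let delta = acts[n] - y;
  for (let i = n - 1; i >= 0; i--) {
    grads[i] = delta * acts[i]; mults++;
    if (i > 0) { delta *= w[i]; mults++; }
  }
  return { grads, mults };
}

const w10 = Array.from({ length: 10 }, (_, i) => 1 + i / 10);
const acts10 = chainForward(w10, 1);
console.log(gradsFromScratch(w10, acts10, 0).mults); // 55
console.log(gradsWithReuse(w10, acts10, 0).mults); // 19
```

Both versions return identical gradients. For n weights, the from-scratch version needs n(n+1)/2 multiplications and the reuse version needs 2n − 1, so the cost drops from quadratic to linear in depth.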
The stack
- React (a phase state machine: idle → forward → loss → backward → done)
- Framer Motion for the signal particles and edge transitions
- Web Audio API — every tone is synthesized, no audio files
- Deterministic timing so each run records identically for video
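A minimal sketch of how a phase machine like that could be written as a reducer for React's useReducer. The function name and the "advance" and "reset" action types are hypothetical, not taken from the project.

```javascript
// Hypothetical phase reducer for the training-step animation.
// Phases run in a fixed order; "done" is terminal until a reset.
const PHASES = ["idle", "forward", "loss", "backward", "done"];

function phaseReducer(phase, action) {
  switch (action.type) {
    case "advance": {
      const i = PHASES.indexOf(phase);
      return i >= 0 && i < PHASES.length - 1 ? PHASES[i + 1] : phase;
    }
    case "reset":
      return "idle";
    default:
      return phase;
  }
}

// In a component: const [phase, dispatch] = useReducer(phaseReducer, "idle");
// then dispatch({ type: "advance" }) when each phase's animation finishes.
```

Because the order lives in one array, adding a phase is a one-line change, and an invalid jump such as idle straight to backward cannot happen.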
Three things that were trickier than expected
1. My animated particle was blinking on every frame
The signal dot flickered constantly. The cause: I'd defined the particle component inside the main component, so every state update created a new function identity and React remounted it.
```jsx
import { motion } from "framer-motion";

// ❌ inside the component → new identity each render → remount → blink
function Backprop() {
  function FlowParticle() { /* ... */ }
  // ...uses <FlowParticle /> in its JSX
}

// ✅ module scope → stable identity → smooth animation
function FlowParticle({ x1, y1, x2, y2, color, particleKey }) {
  return <motion.circle /* ... */ />;
}
```
Lifting it to module scope fixed it instantly.
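You can see the same effect without React. A function declared inside another function is a fresh object on every call. React compares component types by reference when it reconciles, so a new reference in the same spot means the old subtree is unmounted and a new one is mounted. The names below are hypothetical.

```javascript
// Module scope: created once, so every render sees the same function object.
function ParticleAtModuleScope() { return null; }

// Stand-in for a component render: the inner function is re-created on each call.
function render() {
  function ParticleInsideRender() { return null; }
  return { inner: ParticleInsideRender, outer: ParticleAtModuleScope };
}

const first = render();
const second = render();
console.log(first.inner === second.inner); // false: a "new" component type each render
console.log(first.outer === second.outer); // true: same type, so React keeps the instance
```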
2. The sound wouldn't play
Browsers block audio that isn't triggered by a user gesture: an AudioContext created before the user has interacted with the page starts in the "suspended" state. The fix is to create it lazily and call resume() on the first real click:
```javascript
// Assumes audioRef is a React ref declared in the component, e.g. useRef(null).
function getAudio() {
  if (!audioRef.current) {
    const Ctx = window.AudioContext || window.webkitAudioContext;
    if (Ctx) audioRef.current = new Ctx();
  }
  return audioRef.current;
}

function handleStart() {
  const ctx = getAudio();
  if (ctx && ctx.state === "suspended") ctx.resume(); // unlock on gesture
  startCountdown();
}
```
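Once the context is running, a tone needs no audio file. An OscillatorNode feeding a GainNode with a short envelope is enough. This is a minimal sketch, not the project's synth, and playTone and its default values are hypothetical.

```javascript
// Hypothetical helper: synthesize a short tone with the Web Audio API.
// ctx: a running AudioContext; freq in Hz; durationSec and peak gain are defaults chosen for illustration.
function playTone(ctx, freq, durationSec = 0.15, peak = 0.2) {
  const t0 = ctx.currentTime;
  const osc = ctx.createOscillator();
  const gain = ctx.createGain();

  osc.type = "sine";
  osc.frequency.setValueAtTime(freq, t0);

  // Fast attack, then exponential decay. Exponential ramps can't start or end at 0,
  // so the envelope uses 0.0001 as "silent".
  gain.gain.setValueAtTime(0.0001, t0);
  gain.gain.exponentialRampToValueAtTime(peak, t0 + 0.01);
  gain.gain.exponentialRampToValueAtTime(0.0001, t0 + durationSec);

  // oscillator → gain → speakers
  osc.connect(gain);
  gain.connect(ctx.destination);
  osc.start(t0);
  osc.stop(t0 + durationSec);
}
```

In a click handler, after the context has been resumed, something like playTone(ctx, 440, 0.2) plays a fifth of a second of A4. Scheduling against ctx.currentTime keeps the tone on the audio clock rather than on JavaScript timers.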
3. Syncing a fixed-timer animation to a recorded voiceover
This was the real headache. The narration isn't uniform — it lingers on δ₃ and the first gradient, then races from δ₂ to δ₁. One global delay constant couldn't fit that, so every beat got its own duration:
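```javascript
// ms each ∂L/∂w stays on screen, keyed by edge. w₂ is short so δ₁
// lands right on "delta one"; w₁ is long so all gradients hold
// through the dynamic-programming wrap.
// Edge keys are 0-based: 2 → w₃, 1 → w₂, 0 → w₁.
const GRAD_HOLD_BY = { 2: 8500, 1: 1000, 0: 14000 };
// Edges not listed fall back to the default T.gradCalc (T is defined elsewhere).
const gradMs = (edge) => GRAD_HOLD_BY[edge] ?? T.gradCalc;
```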
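One way to sanity-check per-beat durations before recording is to turn them into absolute start times and compare the total against the length of the voiceover. This sketch assumes the backward pass shows edges 2, 1, 0 in that order. buildTimeline and the 3000 ms fallback, which stands in for T.gradCalc since its value isn't given here, are hypothetical.

```javascript
// Hypothetical helper: per-beat hold times → absolute start offsets (ms).
function buildTimeline(beats, durationOf) {
  let t = 0;
  const entries = beats.map((beat) => {
    const entry = { beat, at: t };
    t += durationOf(beat);
    return entry;
  });
  return { entries, totalMs: t };
}

const HOLDS = { 2: 8500, 1: 1000, 0: 14000 }; // same values as GRAD_HOLD_BY
const holdMs = (edge) => HOLDS[edge] ?? 3000; // 3000 is a placeholder default

const { entries, totalMs } = buildTimeline([2, 1, 0], holdMs);
console.log(entries.map((e) => e.at)); // [ 0, 8500, 9500 ]
console.log(totalMs); // 23500
```

Because the whole sequence is a pure function of the durations, two runs produce the same offsets, which is what makes repeated takes line up.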
One more gotcha: browsers throttle setTimeout when the tab is hidden (for example, in a background tab), so the animation drifts during a long take. Keep the recording tab in the foreground.
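Keeping the tab in front is the practical fix. Separately, if each beat schedules the next with a fixed delay, every late callback adds to the drift. Scheduling against absolute offsets from a single start time keeps that error from piling up. This is a minimal sketch, and runTimeline and its injectable now and schedule options are hypothetical names.

```javascript
// Hypothetical runner: beats (sorted by `at`, in ms) fire at absolute offsets from one start time.
// Each wait is re-derived from the clock, so a late wakeup doesn't push every
// later beat back; overdue beats fire together instead. It can't stop the
// browser from throttling a hidden tab.
function runTimeline(beats, { now = () => performance.now(), schedule = setTimeout } = {}) {
  const start = now();
  let i = 0;
  function tick() {
    // Fire everything whose time has come (catches up after a late timer).
    while (i < beats.length && now() - start >= beats[i].at) {
      beats[i].run();
      i += 1;
    }
    // Wait only as long as the next beat actually needs.
    if (i < beats.length) {
      schedule(tick, Math.max(0, start + beats[i].at - now()));
    }
  }
  tick();
}

// Usage (showDelta3 and showDelta2 are placeholder callbacks):
// runTimeline([{ at: 0, run: showDelta3 }, { at: 8500, run: showDelta2 }]);
```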
What I learned
- The chain rule was always true — backpropagation is just the dynamic-programming version of it. Reframing it that way made it click in a way "here's the partial derivative" never did.
- Building the network data-driven (one HIDDEN constant drives the layout, math, and animation) meant I could change depth without touching the render code.
- For explainer animations, deterministic, hand-tunable timing beats physics-based motion every time: you need each beat to land on a spoken word.
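Here is a sketch of what "one constant drives everything" can look like for a single-chain network like the one in the animation. buildNetwork, the 600 px width, and the node labels are assumptions, not the project's code.

```javascript
// Hypothetical sketch: one HIDDEN constant drives the node list, edges, and layout.
const HIDDEN = 2; // hidden units in the chain; 2 gives x → h1 → h2 → ŷ, i.e. 3 weights

function buildNetwork(hidden, width = 600, y = 100) {
  const labels = ["x", ...Array.from({ length: hidden }, (_, i) => `h${i + 1}`), "ŷ"];
  const gap = width / (labels.length - 1); // spread nodes evenly across the width
  const nodes = labels.map((label, i) => ({ label, x: i * gap, y }));
  // One weighted edge between each consecutive pair of nodes.
  const edges = nodes.slice(1).map((to, i) => ({ from: nodes[i], to, weight: `w${i + 1}` }));
  return { nodes, edges };
}

const net = buildNetwork(HIDDEN);
console.log(net.edges.map((e) => e.weight)); // [ 'w1', 'w2', 'w3' ]
```

Rendering code that only maps over nodes and edges never needs to change when HIDDEN does.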
Watch it
🎥 Full 2-minute walkthrough: https://youtu.be/ZV5HKbCsdfo
This is part of my "AI, Visualized" series — neural networks → gradient descent → backprop, with Transformers next. What should I animate after that? 👇