Harish Kotra (he/him)

Posted on Jun 13

Play & Learn ML Expands: Gradient Descent, Confusion Matrix, and the Overfitting Simulator

#ai #programming #tutorial #dailybuild2026

Three new interactive modules join the playground — making optimization, evaluation metrics, and the bias-variance tradeoff as tangible as everything else.

What's New

The original Play & Learn ML shipped with 5 modules covering the ML fundamentals: linear regression, k-means, decision trees, ensemble learning, and neural networks. But I knew there were more concepts that deserved the physical-metaphor treatment.

Today I'm adding 3 more:

Module	Concept	Metaphor	What You Do
Roller Coaster	Gradient Descent	A ball rolling down the loss landscape	Adjust learning rate & momentum, press Play, watch the ball descend
Sorting Machine	Confusion Matrix	Drag prediction cards into TP/TN/FP/FN bins	Sort cards by actual vs predicted value — see error types visually
Emperor's Tailor	Overfitting	A tailor who perfectly fits the noise	Slide polynomial degree from 1 to 14, watch the curve go from underfit to wildly overfit

Each module has 5 progressive levels, the DefinitionGuide, LiveHint messages, and auto-detected completion — just like the originals.

Roller Coaster — Gradient Descent

The Metaphor

Gradient descent is hard to feel when it's just a math formula:

$$w_{t+1} = w_t - \eta \nabla L(w_t)$$

But everyone has watched a ball roll down a hill. You know intuitively that:

A steeper hill means the ball rolls faster (larger gradient)
A heavier ball is harder to stop (momentum)
If you push it too hard, it might fly off the track (divergence)
It can get stuck in a dip (local minimum) and miss the deeper valley (global minimum)

I turned this into a contour plot of a 2D loss landscape — a heatmap where darker regions mean lower loss. The ball starts at a high point and follows gradient descent to the minimum.

What You Control

Learning rate — A slider from 0.005 to 0.8. Low values make slow, steady progress. High values make fast progress — or send the ball flying off the landscape entirely.

Momentum — A toggle with a slider (0.1–0.95). Momentum accumulates velocity across steps, letting the ball roll past shallow local minima and find deeper valleys. This is the same idea as SGD with momentum in deep learning; Adam builds on a related running average of past gradients.

Technical Implementation

The loss landscape is rendered as a D3 heatmap grid:

function lossFn(x, y) {
  return 0.3 * (x * x + 2 * y * y)
       + 1.2 * Math.sin(1.8 * x) * Math.cos(1.2 * y)
       + 2 + 0.4 * Math.cos(3 * x) * Math.sin(2 * y);
}

This creates a surface with multiple local minima and one global minimum. The contour plot renders a 40×40 grid of colored rectangles using a sequential log scale (Viridis interpolation), plus contour lines for visual depth.

The ball's motion is computed via finite-difference gradients:

function gradFn(x, y) {
  const h = 0.01;
  return {
    dx: (lossFn(x + h, y) - lossFn(x - h, y)) / (2 * h),
    dy: (lossFn(x, y + h) - lossFn(x, y - h)) / (2 * h),
  };
}

And the update rule includes momentum:

v.x = mu * v.x - lr * g.dx;
v.y = mu * v.y - lr * g.dy;
pos.x += v.x;
pos.y += v.y;
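
To see the update rule in motion outside the browser, here is a minimal sketch that runs it in a plain loop. `lossFn` and `gradFn` are copied from above; the `descend` helper, the starting point, and the step count are assumptions made for illustration, not the module's actual code.

```javascript
// Loss surface and central-difference gradient, copied from the article.
function lossFn(x, y) {
  return 0.3 * (x * x + 2 * y * y)
       + 1.2 * Math.sin(1.8 * x) * Math.cos(1.2 * y)
       + 2 + 0.4 * Math.cos(3 * x) * Math.sin(2 * y);
}

function gradFn(x, y) {
  const h = 0.01;
  return {
    dx: (lossFn(x + h, y) - lossFn(x - h, y)) / (2 * h),
    dy: (lossFn(x, y + h) - lossFn(x, y - h)) / (2 * h),
  };
}

// Hypothetical helper: apply the momentum update `steps` times
// and record every visited point together with its loss.
function descend(start, lr, mu, steps) {
  const pos = { x: start.x, y: start.y };
  const v = { x: 0, y: 0 };
  const trail = [{ x: pos.x, y: pos.y, loss: lossFn(pos.x, pos.y) }];
  for (let i = 0; i < steps; i++) {
    const g = gradFn(pos.x, pos.y);
    v.x = mu * v.x - lr * g.dx;
    v.y = mu * v.y - lr * g.dy;
    pos.x += v.x;
    pos.y += v.y;
    trail.push({ x: pos.x, y: pos.y, loss: lossFn(pos.x, pos.y) });
  }
  return trail;
}

// Same start and learning rate, without and with momentum.
const plain = descend({ x: 2.5, y: 2.0 }, 0.05, 0, 200);
const heavy = descend({ x: 2.5, y: 2.0 }, 0.05, 0.9, 200);
console.log("start loss:", plain[0].loss.toFixed(3));
console.log("mu = 0   ->", plain[plain.length - 1].loss.toFixed(3));
console.log("mu = 0.9 ->", heavy[heavy.length - 1].loss.toFixed(3));
```

Setting `mu` to 0 reduces the rule to plain gradient descent, which is a handy way to compare the two trails from the same start.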

The animation loop uses requestAnimationFrame with a 30ms throttle, updating the ball position, trail path, and live loss readout every other frame. React state is batched to avoid flooding the render cycle.
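
The throttle itself can be sketched independently of React and D3. The `createThrottledTicker` name and its shape are assumptions for illustration; the idea is only that a frame callback skips work until at least 30ms have passed since the last update.

```javascript
// Hypothetical helper: wraps `onTick` so it runs at most once per `intervalMs`.
// The returned function has the same signature as a requestAnimationFrame
// callback: it receives a timestamp in milliseconds.
function createThrottledTicker(intervalMs, onTick) {
  let last = -Infinity;
  return function frame(now) {
    if (now - last < intervalMs) return false; // too soon, skip this frame
    last = now;
    onTick(now);
    return true;
  };
}

// Simulate five frames at roughly 60 Hz instead of calling requestAnimationFrame,
// so the sketch also runs outside a browser.
let ticks = 0;
const frame = createThrottledTicker(30, () => { ticks += 1; });
[0, 16, 32, 48, 64].forEach((t) => frame(t));
console.log(ticks); // 3

// In a browser the wiring would look roughly like this:
//   function loop(now) { frame(now); requestAnimationFrame(loop); }
//   requestAnimationFrame(loop);
```

With frames arriving about every 16ms, a 30ms threshold lets every second frame through, which matches the "every other frame" behavior described above.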

The 5 Levels

Level	Challenge	What It Teaches
1	Press Play and watch the ball roll	GD is just "follow the slope"
2	Reach the global minimum	Find where loss is lowest
3	Reach it with LR ≤ 0.05	Low LR = safe but slow
4	Escape local min with momentum	Momentum lets you bypass shallow valleys
5	Reach global min in ≤ 30 steps	Aggressive LR + skill = speed run

The global minimum is pre-computed by brute-forcing the landscape at 0.05 resolution. Level detection checks both position proximity to the true minimum and the step/learning-rate constraints.
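
A brute-force search like this is only a few lines. The sketch below assumes a search domain of [-3, 3] on both axes and a proximity tolerance of 0.15; neither value comes from the module, and `findGlobalMin` and `isNear` are hypothetical names.

```javascript
// Loss surface copied from the article.
function lossFn(x, y) {
  return 0.3 * (x * x + 2 * y * y)
       + 1.2 * Math.sin(1.8 * x) * Math.cos(1.2 * y)
       + 2 + 0.4 * Math.cos(3 * x) * Math.sin(2 * y);
}

// Hypothetical helper: evaluate the loss on a regular grid and keep the lowest point.
// Grid coordinates are derived from integer indices to avoid accumulating float error.
function findGlobalMin(lo, hi, step) {
  const n = Math.round((hi - lo) / step);
  let best = { x: lo, y: lo, loss: Infinity };
  for (let i = 0; i <= n; i++) {
    for (let j = 0; j <= n; j++) {
      const x = lo + i * step;
      const y = lo + j * step;
      const loss = lossFn(x, y);
      if (loss < best.loss) best = { x, y, loss };
    }
  }
  return best;
}

// Hypothetical proximity check a level detector could use.
function isNear(pos, target, tol) {
  return Math.hypot(pos.x - target.x, pos.y - target.y) <= tol;
}

const target = findGlobalMin(-3, 3, 0.05); // assumed domain, 0.05 resolution
console.log(target);
console.log(isNear({ x: target.x + 0.1, y: target.y }, target, 0.15));
```

At 0.05 resolution this is 121 × 121 evaluations, cheap enough to run once at load time.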

Sorting Machine — Confusion Matrix

The Metaphor

A confusion matrix looks intimidating for newcomers — four boxes with cryptic abbreviations (TP, TN, FP, FN) that you're supposed to memorize by rote. But the idea is simple: did the model get it right, and if it got it wrong, what kind of wrong?

I turned it into a physical sorting task. Each item is a card showing two numbers: A (actual/ground truth) and P (predicted by the model). Your job is to drag each card into the correct bin.

The four bins have intuitive labels:

True Positive (A=1, P=1) — model said yes, and it was right
True Negative (A=0, P=0) — model said no, and it was right
False Positive (A=0, P=1) — model cried wolf (Type I error)
False Negative (A=1, P=0) — model missed it (Type II error)
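
The four cases map directly onto a small lookup. This is a sketch under the assumption that each card carries `actual` and `predicted` fields holding 0 or 1; the function names are hypothetical.

```javascript
// Hypothetical helper: return the bin a card belongs in.
function binFor(actual, predicted) {
  if (actual === 1 && predicted === 1) return "TP"; // said yes, was right
  if (actual === 0 && predicted === 0) return "TN"; // said no, was right
  if (actual === 0 && predicted === 1) return "FP"; // cried wolf (Type I)
  return "FN";                                      // missed it (Type II)
}

// Hypothetical helper: did the player drop the card into the right bin?
function isCorrectDrop(card, binId) {
  return binFor(card.actual, card.predicted) === binId;
}

console.log(binFor(0, 1));                                // "FP"
console.log(isCorrectDrop({ actual: 1, predicted: 0 }, "FN")); // true
```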

Drag-and-Drop with D3

The interaction is pure D3 drag behavior:

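const drag = d3.drag()
  // Raise the card to the top of the SVG z-order so it stays visible.
  .on("start", function () {
    d3.select(this).raise();
  })
  // Move the card with the pointer.
  .on("drag", function (event, d) {
    d.x = event.x;
    d.y = event.y;
    d3.select(this).attr("transform", `translate(${d.x},${d.y})`);
  })
  // Hit-test the card center against every bin rectangle.
  .on("end", function (event, d) {
    const mx = event.x + ITEM_W / 2;
    const my = event.y + ITEM_H / 2;
    for (const bin of bins) {
      if (mx >= bin.x && mx <= bin.x + bin.w &&
          my >= bin.y && my <= bin.y + bin.h) {
        handleDrop(d, bin.id);
        d3.select(this).remove();
        break;
      }
    }
  });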

On drag.start, the card is raised to the top of the SVG z-order. During drag, the card follows the mouse. On drag.end, hit-testing checks if the card center falls within any bin rectangle — if so, it's removed from the unplaced pool and the bin count increments.
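
The hit test is easy to pull out into a pure function, which also makes it testable without a DOM. The card size and bin layout below are made-up values for illustration; only the names `ITEM_W`, `ITEM_H`, and `bins` come from the snippet above, and `findBin` is hypothetical.

```javascript
// Hypothetical card size and 2x2 bin layout; the real values live in the module.
const ITEM_W = 60;
const ITEM_H = 40;
const bins = [
  { id: "TP", x: 0,   y: 0,   w: 200, h: 150 },
  { id: "FP", x: 200, y: 0,   w: 200, h: 150 },
  { id: "FN", x: 0,   y: 150, w: 200, h: 150 },
  { id: "TN", x: 200, y: 150, w: 200, h: 150 },
];

// Return the id of the bin containing the card's center, or null if it
// landed outside every bin. (cardX, cardY) is the card's top-left corner.
function findBin(cardX, cardY) {
  const mx = cardX + ITEM_W / 2;
  const my = cardY + ITEM_H / 2;
  for (const bin of bins) {
    if (mx >= bin.x && mx <= bin.x + bin.w &&
        my >= bin.y && my <= bin.y + bin.h) {
      return bin.id;
    }
  }
  return null;
}

console.log(findBin(10, 10));   // "TP"
console.log(findBin(250, 200)); // "TN"
console.log(findBin(500, 500)); // null
```

Because the comparison is inclusive on all sides, a center that lands exactly on a shared edge goes to whichever bin comes first in the array.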

What You Learn

As you sort cards, the sidebar shows live metrics:

Placed / total
Correct count (cards sorted into the right bin)
Accuracy percentage (color-coded: green ≥ 80%, yellow ≥ 50%, pink < 50%)
Per-bin counts (TP, TN, FP, FN)

Each level increases the number of items (4 → 8 → 12 → 16 → 16) and the challenge is to sort more accurately. The model's accuracy is simulated via trueRate and falseRate parameters that vary by difficulty — so some levels have more ambiguous predictions.

Technical Note: Generating Items

Each card is generated with a controlled accuracy distribution:

function generateItems(count, difficulty) {
  const accBase = Math.max(0.45, 0.7 - difficulty * 0.06);
  const trueRate = Math.min(0.9, accBase + 0.15);
  // ... generate actual labels randomly, then predict correctly
  // or incorrectly based on trueRate
}

This means harder levels have noisier predictions, making the sorting task genuinely more challenging — you have to think about each card rather than pattern-matching.
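
One way the elided part could work is sketched below. This is an assumption, not the module's code: it uses a single `trueRate` for both classes, draws actual labels with a 50/50 split, and accepts an injectable `rng` so the output can be made deterministic. The `generateItemsSketch` name and the card shape are hypothetical.

```javascript
// Hypothetical sketch: build `count` cards whose predictions are correct
// with probability `trueRate`. The two formulas are copied from the article.
function generateItemsSketch(count, difficulty, rng = Math.random) {
  const accBase = Math.max(0.45, 0.7 - difficulty * 0.06);
  const trueRate = Math.min(0.9, accBase + 0.15);
  const items = [];
  for (let i = 0; i < count; i++) {
    const actual = rng() < 0.5 ? 1 : 0;              // assumed 50/50 class balance
    const correct = rng() < trueRate;                // does the model get this one right?
    const predicted = correct ? actual : 1 - actual; // flip the label on a miss
    items.push({ id: i, actual, predicted });
  }
  return items;
}

console.log(generateItemsSketch(4, 1));
```

With these formulas, `trueRate` falls from about 0.79 at difficulty 1 to 0.6 at difficulty 5, so later levels contain noticeably more FP and FN cards.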

Emperor's Tailor — Overfitting Simulator

The Metaphor

There's an old story about an emperor whose tailor made him a suit that fit perfectly — as long as he stood absolutely still in the exact pose he was measured in. The moment he moved, the seams ripped.

That's overfitting. A model that fits the training data perfectly (including all the noise) but fails on new data. The "Emperor's Tailor" lets you be the tailor — you control the polynomial degree, and you watch the model go from underfit (too simple) to "just right" (generalizes) to overfit (wildly wiggly).

Polynomial Regression

The core computation is polynomial regression solved via the normal equations:

$$X^T X\,\theta = X^T y \quad \Rightarrow \quad \theta = (X^T X)^{-1} X^T y$$

The Vandermonde matrix X has rows [1, x, x², ..., xᵈ] for each training point. The system is solved with Gaussian elimination (with partial pivoting) since JavaScript doesn't have numpy:

function polyfit(points, degree) {
  // Build Vandermonde matrix
  // Solve (X^T X) * coeffs = X^T y via Gaussian elimination
  // Return coefficient array
}

function polyEval(coeffs, x) {
  return coeffs.reduce((s, c, i) => s + c * Math.pow(x, i), 0);
}

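For degree 14 with 16 training points, the 15×15 system is solved directly, with no iterative optimizer. In exact arithmetic this gives the polynomial that minimizes MSE for the given degree. In floating point, the normal equations become ill-conditioned at high degrees, so the highest-degree fits are only approximately optimal.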
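
For readers who want to see the stubbed-out steps, here is a minimal sketch of a normal-equations solver with partial pivoting. It is an illustration under the assumption that points are `{ x, y }` objects, not the module's actual `polyfit`; the `polyfitSketch` name is hypothetical.

```javascript
// Hypothetical sketch: least-squares polynomial fit via the normal equations.
// Returns coefficients in ascending order: [c0, c1, ..., c_degree].
function polyfitSketch(points, degree) {
  const n = degree + 1;
  // Accumulate A = X^T X and b = X^T y without materializing X.
  const A = Array.from({ length: n }, () => new Array(n).fill(0));
  const b = new Array(n).fill(0);
  for (const { x, y } of points) {
    const row = Array.from({ length: n }, (_, j) => Math.pow(x, j)); // Vandermonde row
    for (let i = 0; i < n; i++) {
      b[i] += row[i] * y;
      for (let j = 0; j < n; j++) A[i][j] += row[i] * row[j];
    }
  }
  // Forward elimination with partial pivoting.
  for (let col = 0; col < n; col++) {
    let pivot = col;
    for (let r = col + 1; r < n; r++) {
      if (Math.abs(A[r][col]) > Math.abs(A[pivot][col])) pivot = r;
    }
    [A[col], A[pivot]] = [A[pivot], A[col]];
    [b[col], b[pivot]] = [b[pivot], b[col]];
    for (let r = col + 1; r < n; r++) {
      const f = A[r][col] / A[col][col];
      for (let c = col; c < n; c++) A[r][c] -= f * A[col][c];
      b[r] -= f * b[col];
    }
  }
  // Back substitution.
  const coeffs = new Array(n).fill(0);
  for (let i = n - 1; i >= 0; i--) {
    let s = b[i];
    for (let j = i + 1; j < n; j++) s -= A[i][j] * coeffs[j];
    coeffs[i] = s / A[i][i];
  }
  return coeffs;
}

// Four noiseless samples of y = 1 + 2x + 3x^2.
const samples = [-1, 0, 1, 2].map((x) => ({ x, y: 1 + 2 * x + 3 * x * x }));
console.log(polyfitSketch(samples, 2)); // approximately [1, 2, 3]
```

The sketch does not guard against a singular system, which would occur if there were fewer distinct x values than coefficients.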

The Visualization

The D3 canvas shows:

Training points (blue circles) — used to fit the polynomial
Test points (green circles) — held out, never seen by the model
Fit curve (pink line) — the polynomial of the chosen degree
True sine wave (white dashed line) — the underlying signal, showing what the model should learn

Train MSE and Test MSE are shown in the bottom-left of the canvas and in the sidebar. The sidebar also has an underfit/good fit/overfit indicator that lights up based on the current degree.
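
Both readouts come from the same calculation applied to different point sets. A minimal sketch, assuming points are `{ x, y }` objects and coefficients are in ascending order as `polyEval` above expects; the `mse` helper and the sample data are hypothetical.

```javascript
// Copied from the article: evaluate a polynomial with ascending coefficients.
function polyEval(coeffs, x) {
  return coeffs.reduce((s, c, i) => s + c * Math.pow(x, i), 0);
}

// Hypothetical helper: mean squared error of the polynomial on a set of points.
function mse(coeffs, points) {
  const total = points.reduce((s, p) => {
    const err = polyEval(coeffs, p.x) - p.y;
    return s + err * err;
  }, 0);
  return total / points.length;
}

// Made-up data: the line y = x scored on two tiny point sets.
const coeffs = [0, 1];
const trainPts = [{ x: 0, y: 0 }, { x: 1, y: 1 }];
const testPts = [{ x: 0, y: 0 }, { x: 1, y: 2 }];
console.log(mse(coeffs, trainPts)); // 0
console.log(mse(coeffs, testPts));  // 0.5
```

The gap between the two numbers is what the overfit indicator is really about: a model can score perfectly on the points it was fit to and still miss held-out ones.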

The Bias-Variance Tradeoff in Action

Here's the magic moment every learner should experience:

Degree 1 — Straight line. Train error is high. Test error is high. The model is too simple (underfitting).
Degree 3–5 — The curve follows the sine wave without chasing noise. Train error drops. Test error drops too. The model generalizes (sweet spot).
Degree 10+ — The curve wiggles through every training point. Train error approaches zero. But test error skyrockets because the model has memorized noise that doesn't exist in the test set (overfitting).

The sidebar coefficient count also increases with degree — showing visually that the model has more "knobs" to turn, and more capacity to memorize.

The 5 Levels

Level	Challenge	What It Teaches
1	Fit any polynomial (deg ≥ 1)	Models can be tuned
2	Set degree to exactly 1	Underfitting — a line can't capture curves
3	Set degree ≥ 10	Overfitting — wiggly curves that memorize noise
4	Find degree 3–6 with test error < 0.08	The sweet spot generalizes
5	Achieve test MSE < 0.03	The best fit captures signal, not noise

What the Architecture Looks Like Now

src/workbenches/
├── LinearRegression/      # Stretchy Rope
├── KMeans/                # Magnetic Clusters
├── DecisionTrees/         # 20 Questions
├── Ensemble/              # Jury Room
├── NeuralNetworks/        # Lego Blocks (React Three Fiber 3D)
├── GradientDescent/       # Roller Coaster (contour + animation)
├── ConfusionMatrix/       # Sorting Machine (drag + drop)
└── Overfitting/           # Emperor's Tailor (polynomial regression)

All three new modules follow the exact same structural pattern as the originals:

Single self-contained .jsx file
D3.js for SVG rendering
useLevelSystem hook for progression
DefinitionGuide + LiveHint for inline education
Auto-detected level completion

Bundle Impact

The three new modules added ~50KB of JavaScript to the production bundle (gzipped). Since they're all D3-based (not Three.js), they add no new heavy dependencies.

Reflections

What Worked Well

The gradient descent contour plot — heatmaps are intuitive. Everyone immediately understands "dark = low, light = high" and reads it as a landscape. The animation makes the optimization process visceral in a way that static diagrams can't match.
Drag-and-drop sorting — the confusion matrix went from "memorize 4 boxes" to "play a game." Multiple people who tested it said they finally understood the difference between FP and FN after sorting just a few cards.
The polynomial degree slider — watching the curve morph smoothly from a straight line to a wiggly mess is hypnotic. The "aha moment" comes when test error goes up while train error goes down, which is the core insight of the bias-variance tradeoff.

What I'd Improve

Gradient Descent on mobile — the contour plot works but the sliders are fiddly. This is a general problem with the project's fixed 700×500 canvas sizing.
More loss landscapes — I'd love to add preset landscapes (the classic "bowl", the "Rosenbrock banana", a saddle point) that users can switch between to see how different functions affect gradient descent.
Drag-and-drop polish — the sorting machine currently snaps cards to a grid when they're placed in a bin. It would be nicer to animate them into overlapping stacks that show the count.

Try It

git clone https://github.com/harishkotra/play-learn-ml.git
cd play-learn-ml
npm install && npm run dev

Or read the source — every module is a single self-contained file. The newest ones are:

src/workbenches/GradientDescent/RollerCoaster.jsx
src/workbenches/ConfusionMatrix/SortingMachine.jsx
src/workbenches/Overfitting/EmperorsTailor.jsx

Try it here: https://play-learn-ml.vercel.app/

Code & more: https://www.dailybuild.xyz/project/162-play-learn-ml
