DEV Community

Devanshu Biswas
Devanshu Biswas

Posted on

Decision Trees From Scratch: Yes/No Questions That Carve Up Data

A decision tree is the most human machine-learning model: a flowchart of yes/no questions. No black box — you can read the whole thing aloud. I built one from scratch (Gini impurity, greedy splits, recursion) and visualised exactly how it carves up data — including the moment it starts to over-fit.

🌳 Interactive demo: https://dev48v.infy.uk/ml/day5-decision-tree.html

This is Day 5 of my MachineLearningFromZero series — algorithms built from scratch, no scikit-learn.

1. Measure how "mixed" a group is (Gini)

The tree needs a way to score a group of points. Gini impurity is 0 when all one class, 0.5 when it's a coin flip:

function gini(pts) {
  const p1 = pts.filter(p => p.label === 1).length / pts.length;
  return 1 - (p1 * p1 + (1 - p1) * (1 - p1));
}
Enter fullscreen mode Exit fullscreen mode

2. Try every cut, keep the purest split

A node can only ask about one feature at a time, so each split is a straight horizontal or vertical line. Try them all and keep the one that makes the two resulting groups as pure as possible:

for (const axis of [0, 1])
  for (const thr of candidateThresholds) {
    const [L, R] = partition(pts, axis, thr);
    const impurity = (L.length * gini(L) + R.length * gini(R)) / pts.length;
    if (impurity < best.impurity) best = { axis, thr };
  }
Enter fullscreen mode Exit fullscreen mode

That's why decision regions look like rectangles, never diagonals — every cut is axis-aligned.

3. Recurse — a tree is splits all the way down

After the first cut, run the exact same routine on each side, independently:

node.left  = build(L, depth + 1);
node.right = build(R, depth + 1);
Enter fullscreen mode Exit fullscreen mode

4. Stop → make a leaf that votes

Recursion stops when a region is pure or you hit a depth limit. That region becomes a leaf predicting the majority class:

if (depth === maxDepth || gini(pts) === 0)
  return { leaf: true, label: majorityVote(pts) };
Enter fullscreen mode Exit fullscreen mode

5. Predict = walk the questions down

while (!node.leaf)
  node = x[node.axis] < node.thr ? node.left : node.right;
return node.label;
Enter fullscreen mode Exit fullscreen mode

The part you can SEE: over-fitting

Depth is the dial that controls everything. In the demo, slide it to 6 and the boundary fractures into tiny boxes hugging individual noisy points. Train accuracy climbs toward 100% — but the model has memorised noise and will generalise badly.

That's over-fitting, made visible. The fix is pruning / depth limits, and the bigger fix is averaging many trees into a Random Forest (a later lesson).

The takeaway

Greedy "purest cut" + recursion gives you a readable, surprisingly powerful classifier — and a perfect, visual intuition for the bias/variance tradeoff that haunts all of ML.

Open the demo, drag the depth slider, and watch the tree (rendered live) and its boxes change.

Top comments (0)