Random Forest From Scratch: Many Weak Trees Beat One Strong Tree

#machinelearning #javascript #ai #beginners

One deep decision tree overfits — a jagged boundary chasing every noisy point. Average a bunch of them, each trained a little differently, and the noise cancels into a smooth, robust boundary. That's a random forest, and you can watch it happen.

🌲 Add trees, watch it smooth: https://dev48v.infy.uk/ml/day11-random-forest.html

The wisdom-of-crowds idea

If many models each make DIFFERENT mistakes, averaging cancels the errors while keeping the shared signal. The trick is making the trees genuinely different — and a forest uses two kinds of randomness:
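To put a number on "errors cancel", here is a small worked sketch under an idealized assumption the post doesn't make: n voters whose mistakes are fully independent, each right with probability p. The helper names `binomial` and `majorityAccuracy` are hypothetical. The code sums the binomial probability that more than half of the votes are correct.

```javascript
// Binomial coefficient C(n, k), built multiplicatively to stay in range.
function binomial(n, k) {
  let c = 1;
  for (let i = 1; i <= k; i++) c = (c * (n - k + i)) / i;
  return c;
}

// P(majority correct) for n independent voters (n odd, so no ties),
// each correct with probability p.
function majorityAccuracy(n, p) {
  let total = 0;
  for (let k = Math.floor(n / 2) + 1; k <= n; k++) {
    total += binomial(n, k) * p ** k * (1 - p) ** (n - k);
  }
  return total;
}

for (const n of [1, 5, 25, 101]) {
  console.log(n, majorityAccuracy(n, 0.6).toFixed(3));
}
```

With p = 0.6 the crowd goes from 0.600 for one voter to 0.683 for five, and keeps climbing as n grows. Real trees are trained on overlapping data, so their errors are correlated and the gain is smaller. That gap is what the two kinds of randomness below try to close. Note the flip side: if p were below 0.5, the majority would do worse than a single voter.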

1. Bagging — random data per tree

const sample = data.map(() => data[Math.floor(Math.random() * data.length)]);  // bootstrap: n draws WITH replacement

Every tree sees a slightly different dataset and grows a different shape.
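A runnable version of the bootstrap step, sketched under a couple of assumptions: a seeded `mulberry32` PRNG stands in for `Math.random` so runs repeat, and `bootstrap` is a hypothetical helper that also returns the rows a tree never saw. Each of the n draws misses a given row with probability 1 - 1/n, so a row is left out entirely with probability (1 - 1/n)^n ≈ 1/e ≈ 0.368. In other words, each tree sees roughly 63% of the distinct rows.

```javascript
// Small seeded PRNG (mulberry32) so runs are reproducible.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Bootstrap: n row indices drawn uniformly WITH replacement.
// Rows never drawn form the tree's out-of-bag (OOB) set.
function bootstrap(data, rand = Math.random) {
  const n = data.length;
  const picked = Array.from({ length: n }, () => Math.floor(rand() * n));
  const inBag = new Set(picked);
  return {
    sample: picked.map((i) => data[i]),
    outOfBag: data.filter((_, i) => !inBag.has(i)),
  };
}

const rows = Array.from({ length: 10000 }, (_, i) => i);
const { sample, outOfBag } = bootstrap(rows, mulberry32(42));
console.log(sample.length); // 10000 draws, with repeats
console.log((outOfBag.length / rows.length).toFixed(3)); // expected to land near 0.368
```

The out-of-bag rows act as a free held-out set for that tree, which is how random forests commonly estimate generalization error without a separate validation split.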

2. Random features per split

const candidates = pickRandom(features, Math.max(1, Math.floor(Math.sqrt(features.length))));  // k distinct features, no replacement

Considering only a random feature subset at each split DE-CORRELATES the trees, so they don't all lean on the same dominant feature.
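The post doesn't define `pickRandom`. One way to write it, as a sketch: a partial Fisher-Yates shuffle that draws k distinct indices without replacement. `randomFeatureSubset` and `featuresPerSplit` are hypothetical names, and floor(sqrt(p)) is assumed as the count because it is a common default for classification (scikit-learn's `max_features="sqrt"`, for instance). `Math.sqrt` on its own would return a non-integer count.

```javascript
// Draw k distinct feature indices from 0..p-1 with a partial Fisher-Yates
// shuffle: only the first k slots are shuffled, which is all we need.
function randomFeatureSubset(p, k, rand = Math.random) {
  const pool = Array.from({ length: p }, (_, i) => i);
  for (let i = 0; i < k; i++) {
    const j = i + Math.floor(rand() * (p - i)); // j in [i, p)
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, k);
}

// Common classification default: k = floor(sqrt(p)), but never below 1.
const featuresPerSplit = (p) => Math.max(1, Math.floor(Math.sqrt(p)));

const p = 16;
const candidates = randomFeatureSubset(p, featuresPerSplit(p));
console.log(candidates); // 4 distinct indices in [0, 16), different on each call
```

The important detail is when this runs: a forest draws a fresh subset at every split, not once per tree. A dominant feature is therefore simply unavailable at some splits, and other features get a chance to shape the tree.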

3. Vote → variance melts

return mode(forest.map(tree => tree.predict(x)));
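`mode` isn't a JavaScript built-in. A minimal sketch of the majority vote, assuming hard class labels; the tie-breaking rule here (the label that first reaches the winning count) is an arbitrary choice.

```javascript
// Majority vote over hard class labels. Ties go to whichever label
// reached the winning count first (an arbitrary but deterministic rule).
function mode(labels) {
  const counts = new Map();
  let best;
  let bestCount = 0;
  for (const label of labels) {
    const c = (counts.get(label) ?? 0) + 1;
    counts.set(label, c);
    if (c > bestCount) {
      best = label;
      bestCount = c;
    }
  }
  return best;
}

console.log(mode(["red", "blue", "red", "red", "blue"])); // red
```

For regression forests you would average the numeric predictions instead. Some libraries also skip hard votes for classification: scikit-learn's `RandomForestClassifier`, for example, averages the trees' predicted class probabilities.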

In the demo, drag from 1 tree (jagged, overfit) to 40 (smooth, stable): accuracy on unseen data climbs with zero tuning. Each deep tree has low bias but high variance; voting over many of them crushes the variance while keeping the low bias.
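For averaged predictions (regression, or averaged class probabilities), a standard identity makes "variance melts" precise. Assume, as an idealization, B trees whose outputs T_b(x) at a point x each have variance σ² and every pair has correlation ρ:

```latex
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} T_b(x)\right)
  = \frac{1}{B^2}\Big[B\sigma^2 + B(B-1)\rho\sigma^2\Big]
  = \rho\sigma^2 + \frac{1-\rho}{B}\,\sigma^2
```

Adding trees shrinks only the second term. The first, ρσ², is a floor that more trees cannot remove; only de-correlating the trees lowers it, which is the job of bagging and random feature subsets.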

Why it's a default workhorse

Little tuning, handles mixed feature types, needs no feature scaling, resists overfitting (adding more trees doesn't make it overfit), and gives feature importance for free. Often the first strong baseline for tabular data. Grow a forest.
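To see all three ideas in one place, here is a compact end-to-end sketch, not a production implementation. Assumptions that are not from the post: numeric features, Gini impurity as the split criterion, every observed value tried as a threshold, a depth cap of 8 and a minimum node size of 2 to keep it quick, a seeded `mulberry32` PRNG, and a noise-free toy dataset labeled by the diagonal a + b = 1. All function names are hypothetical.

```javascript
// Seeded PRNG (mulberry32) so the whole run is reproducible.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Gini impurity: 1 minus the sum of squared class shares.
function gini(labels) {
  const counts = new Map();
  for (const y of labels) counts.set(y, (counts.get(y) ?? 0) + 1);
  let sumSq = 0;
  for (const c of counts.values()) sumSq += (c / labels.length) ** 2;
  return 1 - sumSq;
}

// Majority vote: most frequent label (ties go to the first to reach the top count).
function majority(labels) {
  const counts = new Map();
  let best;
  let bestCount = 0;
  for (const y of labels) {
    const c = (counts.get(y) ?? 0) + 1;
    counts.set(y, c);
    if (c > bestCount) { best = y; bestCount = c; }
  }
  return best;
}

// One CART-style tree; every split scans a fresh random subset of
// floor(sqrt(p)) features and keeps the lowest weighted Gini.
function buildTree(X, y, rand, depth, opts) {
  if (depth >= opts.maxDepth || y.length < opts.minSize || gini(y) === 0) {
    return { leaf: majority(y) };
  }
  const p = X[0].length;
  const k = Math.max(1, Math.floor(Math.sqrt(p)));
  const pool = Array.from({ length: p }, (_, i) => i);
  for (let i = 0; i < k; i++) {
    const j = i + Math.floor(rand() * (p - i)); // partial Fisher-Yates
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  let best = null;
  for (const f of pool.slice(0, k)) {
    for (const row of X) {
      const t = row[f]; // every observed value is a candidate threshold
      const left = [];
      const right = [];
      X.forEach((r, i) => (r[f] <= t ? left : right).push(y[i]));
      if (left.length === 0 || right.length === 0) continue;
      const score = (left.length * gini(left) + right.length * gini(right)) / y.length;
      if (best === null || score < best.score) best = { f, t, score };
    }
  }
  if (best === null) return { leaf: majority(y) }; // no usable split
  const L = [];
  const R = [];
  X.forEach((r, i) => (r[best.f] <= best.t ? L : R).push(i));
  const grow = (ids) =>
    buildTree(ids.map((i) => X[i]), ids.map((i) => y[i]), rand, depth + 1, opts);
  return { f: best.f, t: best.t, left: grow(L), right: grow(R) };
}

function predictTree(node, x) {
  while (!("leaf" in node)) node = x[node.f] <= node.t ? node.left : node.right;
  return node.leaf;
}

// Bagging: each tree is grown on its own bootstrap sample of the rows.
function trainForest(X, y, nTrees, opts, rand) {
  const n = X.length;
  return Array.from({ length: nTrees }, () => {
    const idx = Array.from({ length: n }, () => Math.floor(rand() * n));
    return buildTree(idx.map((i) => X[i]), idx.map((i) => y[i]), rand, 0, opts);
  });
}

const predictForest = (trees, x) => majority(trees.map((tree) => predictTree(tree, x)));

// Hypothetical toy data: label 1 above the diagonal a + b = 1, no noise.
const rand = mulberry32(7);
const points = (m) => Array.from({ length: m }, () => [rand(), rand()]);
const label = ([a, b]) => (a + b > 1 ? 1 : 0);
const Xtrain = points(200);
const Xtest = points(200);
const forest = trainForest(Xtrain, Xtrain.map(label), 25, { maxDepth: 8, minSize: 2 }, rand);
const accuracy = Xtest.filter((x) => predictForest(forest, x) === label(x)).length / Xtest.length;
console.log("held-out accuracy:", accuracy.toFixed(3));
```

Changing `25` to `1` in the `trainForest` call gives a quick comparison between a single bagged tree and the vote; the exact numbers depend on the seed and the toy data.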