One deep decision tree overfits — a jagged boundary chasing every noisy point. Average a bunch of them, each trained a little differently, and the noise cancels into a smooth, robust boundary. That's a random forest, and you can watch it happen.
🌲 Add trees, watch it smooth: https://dev48v.infy.uk/ml/day11-random-forest.html
The wisdom-of-crowds idea
If many models each make DIFFERENT mistakes, averaging cancels the errors while keeping the shared signal. The trick is making the trees genuinely different. A forest does that with two kinds of randomness, then combines the trees with a vote:
1. Bagging — random data per tree
const sample = data.map(() => randomChoice(data)); // bootstrap: sample WITH replacement
Every tree sees a slightly different dataset and grows a different shape.
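Here is a minimal runnable sketch of that bootstrap step. The helper name `bootstrapSample`, the optional `rng` parameter, and the toy `rows` array are assumptions made for illustration, not code from the demo.

```javascript
// Hypothetical helper: draw data.length items from `data` uniformly WITH replacement.
// `rng` is any function returning a float in [0, 1); it defaults to Math.random.
function bootstrapSample(data, rng = Math.random) {
  return data.map(() => data[Math.floor(rng() * data.length)]);
}

// Illustrative only: a tiny dataset of row ids.
const rows = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
const sample = bootstrapSample(rows);
const distinct = new Set(sample).size;

// Some rows repeat and some are left out, so each tree trains on a different view.
console.log(sample, `${distinct} of ${rows.length} distinct rows`);
```

For large datasets a bootstrap sample contains roughly 63% of the distinct rows; the rows a tree never saw are its "out-of-bag" rows.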
2. Random features per split
const candidates = pickRandom(features, Math.floor(Math.sqrt(features.length))); // about sqrt(#features), rounded down to a whole count
Considering only a random feature subset at each split DE-CORRELATES the trees, so they don't all lean on the same dominant feature.
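One way `pickRandom` could work is a partial Fisher-Yates shuffle, sketched below. The implementation, the `rng` parameter, and the feature names are illustrative assumptions; the demo may do this differently.

```javascript
// Hypothetical helper: pick k distinct items from `items` (no replacement)
// by partially Fisher-Yates shuffling a copy and keeping the first k slots.
function pickRandom(items, k, rng = Math.random) {
  const pool = items.slice();
  const n = Math.min(k, pool.length);
  for (let i = 0; i < n; i++) {
    // Swap slot i with a random slot from the not-yet-chosen tail.
    const j = i + Math.floor(rng() * (pool.length - i));
    const tmp = pool[i];
    pool[i] = pool[j];
    pool[j] = tmp;
  }
  return pool.slice(0, n);
}

// Illustrative feature names, not taken from the demo.
const features = ['age', 'income', 'height', 'weight', 'tenure', 'score', 'region', 'visits', 'clicks'];

// Square root of the feature count, rounded down, but never fewer than one.
const k = Math.max(1, Math.floor(Math.sqrt(features.length))); // 9 features -> 3 candidates
const candidates = pickRandom(features, k);
console.log(candidates);
```

Because the subset is redrawn at every split, a dominant feature is simply unavailable at some splits, which forces different trees to find different structure.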
3. Vote → variance melts
return mode(forest.map(tree => tree.predict(x)));
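A runnable sketch of that vote follows. The `mode` and `forestPredict` helpers and the three threshold "stumps" standing in for trained trees are assumptions for illustration only.

```javascript
// Hypothetical helper: most frequent value in an array.
// On a tie, the value that reaches the top count first wins.
function mode(values) {
  const counts = new Map();
  let best = values[0];
  let bestCount = 0;
  for (const v of values) {
    const c = (counts.get(v) || 0) + 1;
    counts.set(v, c);
    if (c > bestCount) {
      best = v;
      bestCount = c;
    }
  }
  return best;
}

// Majority vote across every tree in the forest.
function forestPredict(forest, x) {
  return mode(forest.map(tree => tree.predict(x)));
}

// Stand-in "trees": each one thresholds x at a slightly different point.
const forest = [0.4, 0.5, 0.6].map(t => ({ predict: x => (x > t ? 'A' : 'B') }));

console.log(forestPredict(forest, 0.55)); // two of the three stumps say 'A'
```

The stump at 0.6 gets this point "wrong" relative to the others, but it is outvoted, which is the error-cancelling effect in miniature.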
In the demo, drag from 1 tree (jagged, overfit) to 40 (smooth, stable): accuracy on unseen data typically climbs with zero tuning, then levels off as extra trees add little. Averaging crushes variance while keeping the trees' low bias.
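The variance claim can be checked with a toy simulation. This sketch is not a forest: it assumes each "tree" is just the true value plus independent uniform noise, and all function names are hypothetical.

```javascript
// Sample variance of an array of numbers.
function variance(xs) {
  const mean = xs.reduce((a, b) => a + b, 0) / xs.length;
  return xs.reduce((a, x) => a + (x - mean) ** 2, 0) / (xs.length - 1);
}

// One noisy "tree": the true value plus independent uniform noise in [-0.5, 0.5).
function noisyEstimate(truth, rng = Math.random) {
  return truth + (rng() - 0.5);
}

// Average of `nTrees` independent noisy estimates.
function averagedEstimate(truth, nTrees, rng = Math.random) {
  let sum = 0;
  for (let i = 0; i < nTrees; i++) sum += noisyEstimate(truth, rng);
  return sum / nTrees;
}

// Repeat the experiment many times for 1 "tree" and for 40.
const trials = 2000;
const single = Array.from({ length: trials }, () => averagedEstimate(1, 1));
const averaged = Array.from({ length: trials }, () => averagedEstimate(1, 40));

// Both are centred on the truth; only the spread differs.
console.log(variance(single).toFixed(4), variance(averaged).toFixed(4));
```

With fully independent noise the variance of the average falls as 1/n, so 40 estimates have about one fortieth of the spread. Real trees trained on overlapping data are correlated, so the gain is smaller, which is why the random feature subsets matter.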
Why it's a default workhorse
Little tuning, handles mixed feature types, no scaling needed, resists overfitting, free feature-importance. Often the first strong baseline for tabular data. Grow a forest.