Same data, two different "nearest neighbours" — just because one feature is measured in tens of thousands and another in tens. That's why feature scaling matters, and it's a one-liner most beginners skip.
⚖️ Toggle scaling, watch the answer flip: https://dev48v.infy.uk/ml/day10-feature-scaling.html
The problem
Many models measure DISTANCE (k-NN, k-means, RBF-kernel SVM), compare variances (PCA), or walk a loss surface (gradient descent). In all of them, a feature's numeric SIZE, not its importance, decides its influence.
const d = Math.sqrt((a.income - b.income) ** 2 + (a.age - b.age) ** 2); // Euclidean distance on raw features
Income spans ~80,000; age ~30. Squared, income's range outweighs age's by about seven million to one (80,000² / 30² ≈ 7.1 million), so "nearest neighbour" silently becomes "nearest income" and age is ignored. The model isn't broken; the scales are.
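To put numbers on that, here is a minimal sketch with two made-up people (the values are hypothetical, chosen only for illustration). Their incomes differ by about 1% while one is twice the other's age, yet income still supplies almost all of the squared distance.

```javascript
// Hypothetical people, made up for illustration.
const a = { income: 52000, age: 25 };
const b = { income: 52500, age: 50 };

// Squared gap contributed by each feature.
const incomeSq = (a.income - b.income) ** 2; // 500^2 = 250000
const ageSq = (a.age - b.age) ** 2;          // 25^2  = 625

// Euclidean distance on the raw, unscaled features.
const d = Math.sqrt(incomeSq + ageSq);

// Fraction of the squared distance that comes from income alone.
const incomeShare = incomeSq / (incomeSq + ageSq);

console.log(d.toFixed(1), (incomeShare * 100).toFixed(2) + "%");
// prints "500.6 99.75%"
```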
The fix: two one-liners
const z = (x, mean, std) => (x - mean) / std; // standardization (z-score): mean 0, std 1
const minmax = (x, lo, hi) => (x - lo) / (hi - lo); // squash into [0, 1]
Z-score is the default; use min-max when you need bounded inputs.
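A small sketch of the flip itself, assuming made-up data and hypothetical feature ranges (`INCOME`, `AGE`) that roughly match the spans mentioned earlier. It uses min-max only because the ranges are easy to state up front; the helper names `scale`, `dist` and `nearest` are illustrative, not from any library.

```javascript
// Hypothetical feature ranges: income spans 80,000, age spans 30.
const INCOME = { lo: 20000, hi: 100000 };
const AGE = { lo: 20, hi: 50 };

// Hypothetical query point and candidates, made up for illustration.
const query = { income: 52000, age: 25 };
const candidates = [
  { name: "B", income: 52500, age: 50 }, // close in income, far in age
  { name: "C", income: 56000, age: 26 }, // farther in income, close in age
];

const minmax = (x, lo, hi) => (x - lo) / (hi - lo);

// Map both features of a point into [0, 1].
const scale = (p) => ({
  ...p,
  income: minmax(p.income, INCOME.lo, INCOME.hi),
  age: minmax(p.age, AGE.lo, AGE.hi),
});

// Euclidean distance over the two features.
const dist = (p, q) => Math.hypot(p.income - q.income, p.age - q.age);

// Return the candidate closest to q.
const nearest = (q, pts) =>
  pts.reduce((best, p) => (dist(q, p) < dist(q, best) ? p : best));

const rawNearest = nearest(query, candidates).name;
const scaledNearest = nearest(scale(query), candidates.map(scale)).name;

console.log(rawNearest, scaledNearest);
// prints "B C"
```

On raw values B wins because a 500 income gap beats a 4,000 one, no matter the 25-year age difference. After scaling, that age difference counts, and C becomes the neighbour.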
Two gotchas
- Fit on train, apply to test. Compute mean/std from training data only and reuse those numbers — scaling with the test set's stats leaks information.
- Trees don't care. Decision trees and forests split one feature at a time, so they're scale-immune. If scaling changes your result, you're using a distance/gradient model and MUST scale.
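The first gotcha can be sketched as a tiny fit/transform pair. This is a minimal illustration under stated assumptions, not a library API: `fitScaler` and `transform` are hypothetical helper names, the rows are made up, and the standard deviation is the population one (divide by n).

```javascript
// Fit: compute per-feature mean and std from the TRAINING rows only.
function fitScaler(rows, features) {
  const stats = {};
  for (const f of features) {
    const values = rows.map((r) => r[f]);
    const mean = values.reduce((s, v) => s + v, 0) / values.length;
    const variance =
      values.reduce((s, v) => s + (v - mean) ** 2, 0) / values.length;
    // Fall back to 1 for constant features to avoid dividing by zero.
    stats[f] = { mean, std: Math.sqrt(variance) || 1 };
  }
  return stats;
}

// Transform: reuse the stored training stats on any row, train or test.
function transform(row, stats) {
  const out = { ...row };
  for (const [f, { mean, std }] of Object.entries(stats)) {
    out[f] = (row[f] - mean) / std;
  }
  return out;
}

// Hypothetical data, made up for illustration.
const trainRows = [
  { income: 30000, age: 20 },
  { income: 50000, age: 30 },
  { income: 70000, age: 40 },
];
const testRows = [{ income: 90000, age: 35 }];

const stats = fitScaler(trainRows, ["income", "age"]); // fit on train
const testScaled = testRows.map((r) => transform(r, stats)); // apply to test

console.log(testScaled[0].income.toFixed(2), testScaled[0].age.toFixed(2));
// prints "2.45 0.61"
```

The test row's income lands above 2 because it sits outside the training range. That is expected: the scaler describes the training data, and the test set is never consulted.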
Flip the toggle and watch the nearest neighbour change.