A flexible model fit to noisy data will wiggle through every point and generalize terribly. Regularization is the fix: penalize big coefficients so the model prefers a simpler, smoother answer. The interactive demo below shows it live, with L1 and L2 side by side.
🪢 Drag λ and watch the overfitting get tamed: https://dev48v.infy.uk/ml/day17-regularization.html
The idea
Add a penalty on coefficient size to the loss:
- L2 (Ridge): penalize the sum of squares → shrinks all coefficients smoothly toward zero.
- L1 (Lasso): penalize the sum of absolute values → drives some coefficients to exactly zero (automatic feature selection).
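Written out, the two penalized objectives look like this. The notation is an assumption for illustration (squared-error loss over n points, feature vectors x_i, coefficient vector w with p entries) and is not taken from the demo page. Conventions also differ by constant factors such as 1/2 or 1/n on the loss, which only rescale λ.

```latex
\begin{aligned}
\text{Ridge (L2):}\quad & \min_{w}\ \sum_{i=1}^{n} \left(y_i - x_i^\top w\right)^2 + \lambda \sum_{j=1}^{p} w_j^2 \\
\text{Lasso (L1):}\quad & \min_{w}\ \sum_{i=1}^{n} \left(y_i - x_i^\top w\right)^2 + \lambda \sum_{j=1}^{p} \lvert w_j \rvert
\end{aligned}
```

The intercept is usually left out of the penalty, which is why the sums run over the p feature coefficients only.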
In the demo a degree-9 polynomial overfits 12 noisy points at λ=0 (coefficients in the thousands). Slide λ up and the curve smooths toward the true shape; push too far and it underfits. Watch the coefficient bars: L2 shrinks them all, L1 zeroes several out.
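Here is a minimal sketch of that experiment in NumPy, not the demo's actual code. The sine target, the noise level, the seed, and the helper names `poly_features` and `ridge_fit` are all assumptions made for illustration. To stay numerically stable at λ=0 it solves the ridge problem as an augmented least-squares system, which has the same solution as the closed form (XᵀX + λI)⁻¹Xᵀy. Features are left unstandardized here to keep the sketch short.

```python
import numpy as np

def poly_features(x, degree):
    """Columns x^1 .. x^degree (no constant column; the intercept is fit separately)."""
    return np.column_stack([x ** d for d in range(1, degree + 1)])

def ridge_fit(X, y, lam):
    """Ridge with an unpenalized intercept.

    Stacking sqrt(lam)*I under the centered X and zeros under the centered y
    gives a least-squares problem whose solution is (X^T X + lam*I)^-1 X^T y.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    p = X.shape[1]
    X_aug = np.vstack([Xc, np.sqrt(lam) * np.eye(p)])
    y_aug = np.concatenate([yc, np.zeros(p)])
    w, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
    b = y_mean - x_mean @ w
    return w, b

# Assumed toy data: 12 noisy samples of a sine wave (the demo's true curve may differ).
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 12)
y = np.sin(np.pi * x) + 0.2 * rng.normal(size=x.size)
X = poly_features(x, degree=9)

lams = [0.0, 1e-3, 1e-1, 10.0]
norms = []
for lam in lams:
    w, b = ridge_fit(X, y, lam)
    norms.append(float(np.linalg.norm(w)))
    print(f"lambda={lam:g}  ||w||={norms[-1]:.3f}  max|w|={np.abs(w).max():.3f}")
```

The printed coefficient norm should fall as λ grows, which is the numeric version of watching the bars shrink in the demo.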
λ is the dial
λ controls the bias-variance trade-off. Too small → overfit (high variance). Too big → underfit (high bias). The sweet spot is the λ with the lowest held-out error, so pick it with cross-validation rather than by training error.
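One way to pick λ is k-fold cross-validation over a grid. The sketch below is an illustration under assumptions, not the demo's code: the toy data, the grid, the fold count, and the helper names `ridge_fit` and `cv_error` are all hypothetical choices.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge with an unpenalized intercept, solved as an augmented least-squares problem."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    p = X.shape[1]
    X_aug = np.vstack([Xc, np.sqrt(lam) * np.eye(p)])
    y_aug = np.concatenate([yc, np.zeros(p)])
    w, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
    return w, y_mean - x_mean @ w

def cv_error(X, y, lam, k=4, seed=0):
    """Mean squared error on held-out folds, averaged over k folds."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    errs = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        w, b = ridge_fit(X[train], y[train], lam)
        errs.append(np.mean((X[val] @ w + b - y[val]) ** 2))
    return float(np.mean(errs))

# Assumed toy data: degree-9 polynomial features of 12 noisy sine samples.
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 12)
y = np.sin(np.pi * x) + 0.2 * rng.normal(size=x.size)
X = np.column_stack([x ** d for d in range(1, 10)])

lams = np.logspace(-6, 2, 9)                 # candidate values, 1e-6 .. 1e2
errors = [cv_error(X, y, lam) for lam in lams]
best_lam = float(lams[int(np.argmin(errors))])
for lam, err in zip(lams, errors):
    print(f"lambda={lam:.0e}  cv_mse={err:.4f}")
print("picked lambda:", best_lam)
```

The same seed is reused for every λ so all candidates are scored on identical folds. With only 12 points the estimate is noisy, so leave-one-out (k=12) may be a reasonable alternative.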
Practical notes
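Standardize your features first. The penalty treats every coefficient the same, so a feature measured in large units gets a small coefficient and is barely penalized. Ridge has a clean closed form, w = (XᵀX + λI)⁻¹Xᵀy. Lasso has none, because the absolute value is not differentiable at zero, so it needs an iterative solver such as coordinate descent. Elastic Net blends the L1 and L2 penalties.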
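To make the Lasso half concrete, here is a minimal sketch of standardization followed by cyclic coordinate descent with soft-thresholding. It is an illustration under assumptions, not the demo's code: the synthetic data, the value of λ, the iteration count, and the helper names are all hypothetical. The objective here carries a factor of 1/2 on the squared error, which only rescales λ.

```python
import numpy as np

def standardize(X):
    """Zero mean, unit variance per column (assumes no constant columns)."""
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    return (X - mu) / sigma, mu, sigma

def soft_threshold(rho, t):
    """Shrink rho toward zero by t, returning exactly 0 when |rho| <= t."""
    return np.sign(rho) * max(abs(rho) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimize 0.5*||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent.

    Expects standardized X and centered y, so no intercept is needed.
    """
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    r = y - X @ w                              # current residual
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * w[j]                # residual without feature j
            rho = X[:, j] @ r
            w[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * w[j]                # add feature j back with its new weight
    return w

# Assumed synthetic data: 8 features on very different scales, only 2 of them relevant.
rng = np.random.default_rng(0)
n, p = 200, 8
X_raw = rng.normal(size=(n, p)) * np.array([1, 10, 100, 1, 10, 100, 1, 10])
Xs, mu, sigma = standardize(X_raw)
true_w = np.array([3.0, 0.0, 0.0, -2.0, 0.0, 0.0, 0.0, 0.0])
y = Xs @ true_w + 0.1 * rng.normal(size=n)
y = y - y.mean()

w_lasso = lasso_cd(Xs, y, lam=40.0)
print(np.round(w_lasso, 3))
print("exact zeros:", int(np.sum(w_lasso == 0)))
```

The soft-threshold step is what produces exact zeros: whenever a feature's correlation with the residual is no larger than λ in magnitude, its coefficient is set to 0 rather than merely made small.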
🔨 Built from scratch (polynomial features → ridge closed-form with λI → L1 coordinate descent) on the page: https://dev48v.infy.uk/ml/day17-regularization.html
Part of MachineLearningFromZero. 🌐 https://dev48v.infy.uk