DEV Community

Devanshu Biswas

Regularization From Scratch: L1 vs L2, Visualized

A flexible model fit to noisy data will wiggle through every point and generalize terribly. Regularization is the fix: penalize big coefficients so the model prefers a simpler, smoother answer. Here it is, live, with L1 vs L2.

🪢 Drag λ and watch overfitting tame: https://dev48v.infy.uk/ml/day17-regularization.html

The idea

Add a penalty on coefficient size to the loss:

  • L2 (Ridge): penalize the sum of squares → shrinks all coefficients smoothly toward zero.
  • L1 (Lasso): penalize the sum of absolute values → drives some coefficients to exactly zero (automatic feature selection).
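
Why does L1 hit exactly zero while L2 only shrinks? A minimal sketch for the simplest possible case, a single coefficient whose unpenalized value is z. The function names and the sample values are made up for illustration, and the 0.5 scaling of the loss is just one common convention:

```python
import numpy as np

def ridge_shrink(z, lam):
    # Minimizer of 0.5*(z - w)**2 + 0.5*lam*w**2:
    # every coefficient is scaled by the same factor, never reaching zero.
    return z / (1.0 + lam)

def lasso_shrink(z, lam):
    # Minimizer of 0.5*(z - w)**2 + lam*abs(w) (soft-thresholding):
    # anything with |z| <= lam is set to exactly zero.
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

z = np.array([3.0, -0.4, 0.2, -2.0])  # hypothetical unpenalized coefficients
lam = 0.5

w_l2 = ridge_shrink(z, lam)
w_l1 = lasso_shrink(z, lam)
print("L2:", w_l2)
print("L1:", w_l1)
```

The two small coefficients survive under L2 (just smaller) but are removed entirely under L1, which is the feature-selection effect described above.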

In the demo a degree-9 polynomial overfits 12 noisy points at λ=0 (coefficients in the thousands). Slide λ up and the curve smooths toward the true shape; push too far and it underfits. Watch the coefficient bars: L2 shrinks them all, L1 zeroes several out.
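
To try the same experiment offline, here is a rough sketch of the ridge side: degree-9 polynomial features on 12 noisy points, solved in closed form with λI. The sine target, the noise level, the seed, and the λ values are assumptions for illustration, not the demo's actual data, so the exact numbers will differ from the page:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 12)                                # 12 sample points
y = np.sin(np.pi * x) + rng.normal(scale=0.3, size=x.size)    # assumed noisy target

def poly_features(x, degree=9):
    # Columns are 1, x, x^2, ..., x^degree.
    return np.vander(x, degree + 1, increasing=True)

def ridge_fit(X, y, lam):
    # Closed form: w = (X^T X + lam * I)^(-1) X^T y.
    # This penalizes every coefficient, intercept included, to keep the sketch short.
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

X = poly_features(x)
coef_norms = {}
for lam in (0.0, 1e-3, 1e-1, 10.0):
    w = ridge_fit(X, y, lam)
    coef_norms[lam] = float(np.linalg.norm(w))
    print(f"lambda={lam:g}  coefficient norm={coef_norms[lam]:.3f}")
```

The coefficient norm falls as λ grows, which is the numeric version of the bars shrinking in the demo.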

λ is the dial

λ controls the bias-variance trade-off. Too small → overfit (high variance). Too big → underfit (high bias). The sweet spot is the λ that minimizes error on held-out data; pick it with cross-validation.
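
Picking λ by cross-validation can be sketched in a few lines. This assumes the same kind of toy setup as the demo (degree-9 polynomial, 12 points); the sine target, the 4-fold split, the seeds, and the λ grid are all illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 12)
y = np.sin(np.pi * x) + rng.normal(scale=0.3, size=x.size)    # assumed noisy target

def ridge_fit(X, y, lam):
    # With lam > 0 the system is solvable even when a training fold
    # has fewer points than there are coefficients.
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def cv_error(x, y, lam, degree=9, k=4):
    # Fixed seed so every lambda is scored on the same folds.
    folds = np.array_split(np.random.default_rng(1).permutation(x.size), k)
    errors = []
    for fold in folds:
        train = np.setdiff1d(np.arange(x.size), fold)
        X_train = np.vander(x[train], degree + 1, increasing=True)
        X_test = np.vander(x[fold], degree + 1, increasing=True)
        w = ridge_fit(X_train, y[train], lam)
        errors.append(np.mean((X_test @ w - y[fold]) ** 2))
    return float(np.mean(errors))

lams = np.logspace(-6, 2, 9)                                  # strictly positive grid
cv_errors = np.array([cv_error(x, y, lam) for lam in lams])
best_lam = lams[np.argmin(cv_errors)]
print("held-out MSE per lambda:", np.round(cv_errors, 3))
print("best lambda:", best_lam)
```

With only 12 points the held-out estimate is noisy, so treat the chosen λ as a rough guide rather than a precise optimum.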

Practical notes

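Standardize your features first: the penalty acts on raw coefficient magnitudes, so a feature measured in large units needs only a small coefficient and gets penalized less. Ridge has a clean closed form; Lasso has no closed form in general because the absolute value is not differentiable at zero, so it needs an iterative solver such as coordinate descent. Elastic Net blends both penalties.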
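
Here is one way the standardize-then-Lasso step might look, as a bare-bones coordinate descent with soft-thresholding. The synthetic data (five features on wildly different scales, only two of them relevant), the helper names, λ, and the iteration count are assumptions for illustration; the demo page's own solver may differ in its details:

```python
import numpy as np

def standardize(X):
    # Zero mean, unit variance per column so the penalty treats features equally.
    return (X - X.mean(axis=0)) / X.std(axis=0)

def soft_threshold(rho, lam):
    return np.sign(rho) * max(abs(rho) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    # Minimizes (1/(2n)) * ||y - Xw||^2 + lam * ||w||_1 one coordinate at a time.
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            residual = y - X @ w + X[:, j] * w[j]   # residual ignoring feature j
            rho = X[:, j] @ residual / n
            w[j] = soft_threshold(rho, lam) / col_sq[j]
    return w

rng = np.random.default_rng(0)
scales = np.array([1.0, 10.0, 100.0, 0.1, 1.0])      # deliberately mismatched units
X_raw = rng.normal(size=(200, 5)) * scales
y = 3.0 * X_raw[:, 0] - 0.02 * X_raw[:, 2] + rng.normal(scale=0.5, size=200)

Z = standardize(X_raw)
w = lasso_cd(Z, y - y.mean(), lam=0.5)                # center y instead of fitting an intercept
print(w)
```

Only features 0 and 2 drive y here, and after standardization the L1 penalty should leave the other three coefficients at exactly zero.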

🔨 Built from scratch (polynomial features → ridge closed-form with λI → L1 coordinate descent) on the page: https://dev48v.infy.uk/ml/day17-regularization.html

Part of MachineLearningFromZero. 🌐 https://dev48v.infy.uk
