
shangkyu shin

Posted on • Originally published at zeromathai.com

Regularization in Machine Learning — How to Actually Prevent Overfitting (L1, L2, Dropout)

What is regularization in machine learning, and how do you actually prevent overfitting in practice? This guide explains L1 vs L2, dropout, and early stopping with real-world intuition and code.

Cross-posted from Zeromath. Original article: https://zeromathai.com/en/regularization-generalization-en/


The Problem Every ML Engineer Hits

You train a model:

  • training loss → near zero
  • validation loss → terrible

This is not a bug.

It’s overfitting.

Powerful models memorize by default.


The Core Idea

E_aug(w) = E_train(w) + λΩ(w)

Fit the data, but control complexity: Ω(w) measures how complex the weights are, and λ controls how heavily that complexity is penalized.
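In code, the augmented objective is just the data loss plus a weighted penalty. A minimal PyTorch sketch, using the squared L2 norm as Ω for concreteness (the function name and the `lam` default are illustrative, not part of any library API):

```python
import torch

def augmented_loss(data_loss, params, lam=1e-4):
    # E_aug = E_train + lam * Omega(w), with Omega(w) = sum of squared weights
    omega = sum(p.pow(2).sum() for p in params)
    return data_loss + lam * omega
```

In practice you would compute `data_loss` with your usual criterion and pass `model.parameters()` as `params`.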


L2 Regularization (Start Here)

  • smooth weights
  • stable training
  • works almost everywhere

L1 Regularization

  • sparse weights
  • feature selection
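PyTorch's built-in `weight_decay` implements L2, not L1, so an L1 penalty is usually added to the loss by hand. A minimal sketch (the helper name and `lam` value are assumptions for illustration):

```python
import torch

def l1_penalty(model, lam=1e-4):
    # Sum of absolute weight values: pushes small weights to exactly zero,
    # which is what produces sparsity / implicit feature selection.
    return lam * sum(p.abs().sum() for p in model.parameters())

# usage inside a training step (sketch):
# loss = criterion(model(x), y) + l1_penalty(model)
```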

L1 vs L2 (Quick Decision)

L1 → sparse

L2 → stable


Early Stopping

Stop training when validation loss stops improving, even while training loss keeps falling.

  • early → generalization
  • late → memorization

Dropout

  • randomly disables neurons during training
  • reduces co-adaptation between neurons

Practical Setup (PyTorch)

```python
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-3,
    weight_decay=1e-4,  # L2 regularization via decoupled weight decay
)

best_val = float("inf")
patience, patience_counter = 5, 0

for epoch in range(epochs):
    train(...)
    val = validate(...)

    if val < best_val:
        best_val = val
        patience_counter = 0  # reset when validation improves
    else:
        patience_counter += 1
        if patience_counter > patience:
            break  # early stopping
```

What Should You Actually Do? (Real Guide)

  1. Start with L2 (weight decay)
  2. Add early stopping
  3. If still overfitting → add dropout
  4. If sparsity needed → use L1

Common Mistakes (Important)

  • stacking dropout + strong L2 + early stopping together
  • assuming more regularization is always better
  • tuning λ without validation

Too much regularization = underfitting.


Real Insight

Regularization is not about reducing error.

It is about controlling model behavior.


Final Thought

Overfitting is not a bug.

It’s what models do by default.

Regularization is how you control it.


What worked best for you — weight decay, dropout, or early stopping?
