What is regularization in machine learning, and how do you actually prevent overfitting in practice? This guide explains L1 vs L2, dropout, and early stopping with real-world intuition and code.
Cross-posted from Zeromath. Original article: https://zeromathai.com/en/regularization-generalization-en/
The Problem Every ML Engineer Hits
You train a model:
- training loss → near zero
- validation loss → terrible
This is not a bug.
It’s overfitting.
Powerful models memorize by default.
The Core Idea
E_aug(w) = E_train(w) + λΩ(w)
Fit the data (E_train), but pay a price for complexity: Ω(w) measures how complex the weights are, and λ controls how much that price matters.
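Here is a minimal sketch of that objective in plain Python (names are illustrative, and Ω is taken to be the sum of squared weights, i.e. L2):

```python
def augmented_loss(train_loss, weights, lam):
    """E_aug = E_train + lambda * Omega(w), with Omega(w) = sum of squared weights."""
    omega = sum(w * w for w in weights)  # complexity penalty Omega(w)
    return train_loss + lam * omega

# Same training loss, very different complexity costs:
small = augmented_loss(0.10, [0.5, -0.5], lam=0.1)  # 0.10 + 0.1 * 0.5  = 0.15
large = augmented_loss(0.10, [3.0, -3.0], lam=0.1)  # 0.10 + 0.1 * 18.0 = 1.90
```

Two models that fit the data equally well are no longer tied: the one with smaller weights wins.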
L2 Regularization (Start Here)
- smooth weights
- stable training
- works almost everywhere
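Why "smooth"? The gradient of the L2 penalty is proportional to the weight itself, so every step shrinks every weight by the same fraction. A tiny sketch (assumed names, plain Python):

```python
def decay_step(weights, lam, lr):
    # gradient descent on the L2 penalty alone: w <- w - lr * 2*lam*w
    # i.e. every weight is scaled by the same factor (1 - 2*lr*lam)
    return [w - lr * 2 * lam * w for w in weights]

w = decay_step([4.0, -2.0], lam=0.1, lr=0.5)
# each weight scaled by 0.9 -> roughly [3.6, -1.8]:
# proportional shrinkage, no weight driven to exactly zero
```

This proportional shrinkage is why L2 gives small-but-nonzero weights everywhere. In PyTorch you usually get it for free via the optimizer's `weight_decay` argument.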
L1 Regularization
- sparse weights
- feature selection
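L1 behaves differently: its gradient has constant magnitude, so it subtracts a fixed amount from every weight, and weights smaller than that amount get clamped to exactly zero. A sketch of the standard soft-thresholding (proximal) update, with illustrative names:

```python
def l1_prox_step(weights, lam, lr):
    # soft-threshold update for the penalty lam * sum(|w_i|):
    # shrink each |w_i| by a constant t = lr*lam, clamping small weights to 0
    t = lr * lam
    return [max(abs(w) - t, 0.0) * (1 if w > 0 else -1) for w in weights]

w = l1_prox_step([0.03, -0.8, 0.005], lam=0.5, lr=0.1)
# threshold t = 0.05: the two small weights become exactly 0.0,
# the large one shrinks to -0.75 -> a sparse model, i.e. implicit feature selection
```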
L1 vs L2 (Quick Decision)
L1 → sparse
L2 → stable
Early Stopping
Stop when validation loss increases.
- early → generalization
- late → memorization
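The rule above fits in a few lines. A minimal sketch (assumed names) that watches a validation curve and stops after `patience` epochs without improvement:

```python
def early_stop_epoch(val_losses, patience):
    """Return the epoch (index) at which training would stop."""
    best, bad = float("inf"), 0
    for epoch, val in enumerate(val_losses):
        if val < best:
            best, bad = val, 0  # improvement: reset the patience counter
        else:
            bad += 1
            if bad > patience:
                return epoch  # no improvement for too long: stop
    return len(val_losses) - 1

# validation loss improves, then climbs as the model starts memorizing:
early_stop_epoch([0.9, 0.7, 0.6, 0.65, 0.7, 0.8], patience=2)  # -> 5
```

In practice you would also checkpoint the model at the best epoch (epoch 2 here), not the one you stopped at.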
Dropout
- disables neurons randomly
- reduces co-adaptation
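A sketch of "inverted" dropout, the variant used by modern frameworks (plain Python, illustrative names):

```python
import random

def dropout(activations, p, training=True):
    # zero each unit with probability p, scale survivors by 1/(1-p)
    # so the expected activation is unchanged; at inference time it's a no-op
    if not training:
        return list(activations)
    return [0.0 if random.random() < p else a / (1 - p) for a in activations]

h = dropout([1.0, 1.0, 1.0, 1.0], p=0.5)
# each unit is either 0.0 (dropped) or 2.0 (survived, rescaled)
```

Because each forward pass sees a different random subnetwork, no neuron can rely on a specific partner being present. In PyTorch this is `nn.Dropout(p)`, active only in `model.train()` mode.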
Practical Setup (PyTorch)
```python
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-3,
    weight_decay=1e-4,  # L2 regularization via decoupled weight decay
)

best_val = float("inf")
patience_counter = 0
for epoch in range(epochs):
    train(...)
    val = validate(...)
    if val < best_val:
        best_val = val
        patience_counter = 0  # improvement: reset patience
    else:
        patience_counter += 1
        if patience_counter > patience:
            break  # early stopping
```
What Should You Actually Do? (Real Guide)
- Start with L2 (weight decay)
- Add early stopping
- If still overfitting → add dropout
- If sparsity needed → use L1
Common Mistakes (Important)
- stacking dropout + strong L2 + early stopping together
- assuming more regularization is always better
- tuning λ without validation
Too much regularization = underfitting.
Real Insight
Regularization is not about reducing error.
It is about controlling model behavior.
Final Thought
Overfitting is not a bug.
It’s what models do by default.
Regularization is how you control it.
What worked best for you — weight decay, dropout, or early stopping?