
shangkyu shin

Posted on • Originally published at zeromathai.com

Optimization vs Regularization — The Real Reason Your Model Overfits (and How to Fix It)

Most deep learning problems are not architecture problems.

They are training problems.

Specifically:

  • Optimization issues
  • Missing regularization

Cross-posted from Zeromath. Original article:
https://zeromathai.com/en/dl-optimization-regularization-integrated-lecture-en/


Optimization = Learning

Training loop:

  1. Compute gradient
  2. Update weights
  3. Repeat
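The three steps above can be sketched in a few lines of plain Python. Here it's gradient descent on a toy loss f(w) = (w − 3)², so the loop's behavior is easy to verify; the loss, gradient, and learning rate are illustrative choices, not a recipe.

```python
# Minimal sketch of the training loop: gradient descent on f(w) = (w - 3)^2.
def grad(w):
    return 2.0 * (w - 3.0)   # d/dw of (w - 3)^2

w = 0.0      # initial weight
lr = 0.1     # learning rate (illustrative)

for _ in range(100):         # 3. repeat
    g = grad(w)              # 1. compute gradient
    w -= lr * g              # 2. update weights

# w has converged to the minimum at 3.0
```

Every deep learning framework is this loop at scale: many weights, a learned loss, minibatched gradients.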

Why Training Becomes Unstable

Real issues:

  • Noisy gradients (minibatch)
  • Complex loss landscape
  • Different parameter scales

Fix:

  • Momentum → smoother updates
  • RMSProp → adaptive step size
  • Adam → default optimizer
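A hedged sketch of two of these update rules in plain Python, on the same toy loss f(w) = (w − 3)². The hyperparameters (lr, beta1, beta2, eps) are common defaults, not prescriptions, and the loop is a demonstration, not a real training run.

```python
import math

lr, beta1, beta2, eps = 0.01, 0.9, 0.999, 1e-8  # common defaults (assumed)

def momentum_step(w, g, v):
    """SGD with momentum: v averages recent gradients, smoothing noise."""
    v = beta1 * v + g
    return w - lr * v, v

def adam_step(w, g, m, s, t):
    """Adam: momentum (m) plus an RMSProp-style adaptive step (s)."""
    m = beta1 * m + (1 - beta1) * g          # first moment (momentum)
    s = beta2 * s + (1 - beta2) * g * g      # second moment (RMSProp)
    m_hat = m / (1 - beta1 ** t)             # bias correction
    s_hat = s / (1 - beta2 ** t)
    return w - lr * m_hat / (math.sqrt(s_hat) + eps), m, s

# Both optimizers on f(w) = (w - 3)^2:
w_m, v = 0.0, 0.0
w_a, m, s = 0.0, 0.0, 0.0
for t in range(1, 2001):
    w_m, v = momentum_step(w_m, 2.0 * (w_m - 3.0), v)
    w_a, m, s = adam_step(w_a, 2.0 * (w_a - 3.0), m, s, t)
# Both w_m and w_a end near the minimum at 3.0
```

Note how Adam divides by the running gradient scale: that is what makes its step size per-parameter adaptive, which helps when parameter scales differ.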

Learning Rate — #1 Debugging Variable

Too large:

  • Loss explodes

Too small:

  • Training stalls

Practical rule

Start high → decay over time
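"Start high, decay over time" can be as simple as a step-decay schedule. The base rate, decay factor, and interval below are illustrative numbers.

```python
def lr_at(step, base_lr=0.1, decay=0.5, every=1000):
    """Step decay: halve the learning rate every `every` steps."""
    return base_lr * (decay ** (step // every))

lr_at(0)     # → 0.1
lr_at(1000)  # → 0.05
lr_at(2500)  # → 0.025
```

Frameworks ship ready-made schedulers (step, cosine, warmup), but they all reduce to a function of the step count like this one.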


Overfitting — The Main Failure Mode

Pattern:

  • Training loss ↓
  • Validation loss ↓ then ↑

Meaning:

  • Model memorized the training data
  • Generalization failed

Regularization = Control

Goal:
Limit model flexibility


Tools You Actually Use

L2 (weight decay)

  • Shrinks weights toward zero → stable training

L1

  • Drives some weights to exactly zero → sparse models

Dropout

  • Prevents co-adaptation (no reliance on single neurons)

Early stopping

  • Stops training once validation loss stops improving
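Two of these tools fit in a few lines. Below, L2 decay is folded into the gradient, and early stopping uses a patience counter; the decay strength, patience value, and validation curve are hypothetical.

```python
lam = 1e-2   # L2 strength (assumed value)

def l2_grad(w, data_grad):
    """Weight decay: add the gradient of lam * w^2 to the data gradient."""
    return data_grad + 2.0 * lam * w

# Patience-based early stopping on a hypothetical validation curve:
val_losses = [0.9, 0.7, 0.6, 0.62, 0.65, 0.7]
best_val, patience, bad_epochs = float("inf"), 3, 0
stopped_at = None
for epoch, val in enumerate(val_losses):
    if val < best_val:
        best_val, bad_epochs = val, 0   # improvement: reset the counter
    else:
        bad_epochs += 1
        if bad_epochs >= patience:       # no improvement for `patience` epochs
            stopped_at = epoch
            break
# Stops at epoch 5, keeping the best validation loss of 0.6
```

In practice you would also checkpoint the weights at `best_val` and restore them when stopping.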

Debugging Checklist

If model overfits:

  1. Add L2
  2. Add dropout
  3. Use early stopping

If model doesn’t learn:

  1. Check learning rate
  2. Try Adam
  3. Normalize inputs
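Item 3 of the checklist, as code: standardize each input feature to zero mean and unit variance. The data below is a toy example; in a real pipeline the mean and std must come from the training set only.

```python
def normalize(rows):
    """Standardize each column to zero mean, unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [(sum((x - m) ** 2 for x in c) / len(c)) ** 0.5 or 1.0
            for c, m in zip(cols, means)]          # guard against zero std
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]

# Two features on wildly different scales — a common cause of stalled training:
data = [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]
norm = normalize(data)
# Each column now has mean 0 and unit variance
```

Without this, the feature on the 100s scale dominates the gradients, and no single learning rate suits both parameters, exactly the "different parameter scales" problem from earlier.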

Mental Model

  • Optimization = engine
  • Regularization = brakes

You need both.


Common Mistakes

  • Training longer always helps → ❌
  • Bigger model always better → ❌
  • Adam solves everything → ❌

Final Thought

Good models are not just trained.

They are controlled.


Discussion

What helped your model the most?

  • Learning rate tuning?
  • Regularization?
  • Optimizer choice?

Let’s discuss.
