Girma
The Heart of Machine Learning: Underfitting, Overfitting, and How Models Actually Learn

Imagine you’re teaching a kid math.

If the kid just memorizes every single example you give → he aces the homework but bombs the test.
If the kid barely understands anything → he fails both homework and the test.

That’s exactly what happens with machine learning models.
These three ideas — underfitting, overfitting, and generalization — are the real “physics” behind why some models work in the real world and others don’t.
Let’s break them down in the simplest, clearest way possible.

1. Generalization – The Only Thing That Actually Matters

Generalization = the model performs well on new, unseen data, not just the data it was trained on.

You train on Dataset A (D_train) and test on Dataset B (D_test). If the accuracy is almost the same → great generalization. If it crashes on the test set → poor generalization.

The model isn’t learning your specific photos, numbers, or sentences. It’s trying to learn the hidden rules (the underlying distribution) of the world.
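Here’s a minimal sketch of how you’d measure generalization in practice (a toy NumPy example with made-up data, not any particular library’s workflow): fit on D_train only, then compare the error on D_train against the error on D_test.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression task: the "hidden rule" is y = 2x, plus noise
x = rng.uniform(-1, 1, 200)
y = 2 * x + rng.normal(0, 0.1, 200)

# Split into D_train and D_test
x_train, x_test = x[:150], x[150:]
y_train, y_test = y[:150], y[150:]

# Fit a straight line on D_train only
coeffs = np.polyfit(x_train, y_train, deg=1)

def mse(xs, ys):
    return np.mean((np.polyval(coeffs, xs) - ys) ** 2)

train_err, test_err = mse(x_train, y_train), mse(x_test, y_test)
# A small gap between the two errors = good generalization
print(f"train MSE: {train_err:.4f}  test MSE: {test_err:.4f}")
```

Because the model matches the true rule, both errors land near the noise floor and stay close to each other — that closeness is the whole point.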
2. Overfitting – The Student Who Memorized Everything

The model becomes a parrot. It learns:

Real patterns ✅
Random noise ❌

Result?
Training error → almost zero
Test error → sky high
Classic signs:

Way too many parameters (a huge neural net)
Not enough training data
No regularization

Think of it as a student who memorizes every past exam question word-for-word instead of understanding the concepts.
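You can watch the memorization happen with nothing but NumPy (a toy sketch — the degrees and noise level are chosen purely for illustration): give a polynomial enough parameters to pass through every noisy training point, and the training error collapses while the test error blows up.

```python
import numpy as np

rng = np.random.default_rng(1)

# 15 noisy samples of a simple quadratic
x_train = np.linspace(-1, 1, 15)
y_train = x_train**2 + rng.normal(0, 0.2, 15)
x_test = np.linspace(-0.95, 0.95, 50)
y_test = x_test**2  # noise-free ground truth

def errors(deg):
    c = np.polyfit(x_train, y_train, deg)
    train = np.mean((np.polyval(c, x_train) - y_train) ** 2)
    test = np.mean((np.polyval(c, x_test) - y_test) ** 2)
    return train, test

good_train, good_test = errors(2)   # right capacity for a quadratic
memo_train, memo_test = errors(14)  # enough parameters to memorize all 15 points

# Degree 14 interpolates the noise: near-zero training error, much worse test error
print(good_train, good_test, memo_train, memo_test)
```

The degree-14 fit is the parrot: it reproduces the training set almost exactly, including the noise, and pays for it on unseen points.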

3. Underfitting – The Student Who Gave Up

The model is too dumb or too lazy. It can’t even capture the basic patterns in the training data. You get:

High training error
High test error

Causes:

Model too simple (tiny linear regression on complex data)
Training stopped too early
Bad features

It’s like trying to predict house prices using only the color of the front door.
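The same NumPy setup shows underfitting (again a toy sketch with made-up data): force a straight line onto a sine wave and both errors stay high, because a line simply cannot bend to follow the pattern.

```python
import numpy as np

rng = np.random.default_rng(2)

# The true pattern is a sine wave -- clearly non-linear
x = np.linspace(0, 2 * np.pi, 200)
y = np.sin(x) + rng.normal(0, 0.1, 200)
x_train, y_train = x[::2], y[::2]
x_test, y_test = x[1::2], y[1::2]

# A straight line has too little capacity for this data
c = np.polyfit(x_train, y_train, deg=1)
train_err = np.mean((np.polyval(c, x_train) - y_train) ** 2)
test_err = np.mean((np.polyval(c, x_test) - y_test) ** 2)

# Both errors stay high: the classic underfitting signature
print(train_err, test_err)
```

Contrast this with overfitting: there, training error was near zero and test error exploded; here, both are bad from the start.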

4. Bias–Variance Tradeoff (The Real Engine)

This is the fundamental law.

Bias = error because your model makes wrong assumptions (too simple)
Variance = error because your model is too sensitive to small changes in the training data (too complex)

The sweet spot in the middle = best generalization.
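You can estimate both terms numerically (a toy NumPy sketch; keeping the inputs fixed and resampling only the noise is a simplifying assumption): refit the same model on many noisy resamples of the data, then measure how far the *average* prediction is from the truth (bias²) and how much individual fits scatter around that average (variance).

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(-1, 1, 20)   # fixed training inputs
f = np.sin(2 * x)            # the true function

def bias2_and_variance(deg, n_trials=300, noise=0.3):
    # Resample the noise many times and refit the same model each time
    preds = np.empty((n_trials, len(x)))
    for t in range(n_trials):
        y = f + rng.normal(0, noise, len(x))
        preds[t] = np.polyval(np.polyfit(x, y, deg), x)
    bias2 = np.mean((preds.mean(axis=0) - f) ** 2)  # error from wrong assumptions
    variance = np.mean(preds.var(axis=0))           # error from data sensitivity
    return bias2, variance

b_simple, v_simple = bias2_and_variance(deg=1)     # too simple
b_complex, v_complex = bias2_and_variance(deg=10)  # too complex
# As capacity grows, bias falls while variance rises -- the tradeoff itself
print(b_simple, v_simple, b_complex, v_complex)
```

The straight line misses the sine wave the same way every time (high bias, low variance); the degree-10 polynomial tracks the truth on average but chases each trial’s noise (low bias, high variance).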
5. The Modern Surprise: Double Descent

Old textbooks said: “More complexity = worse generalization after a point.” Deep learning laughed at them. Today we see the double descent curve: error goes down → up (classic overfitting) → then down again when the model becomes ridiculously huge. This is part of why massively overparameterized models like GPT-4 and Stable Diffusion work at all.
6. The 4 Pillars That Create Generalization

Generalization doesn’t come from magic. It emerges from four things working together:

Data → size, quality, diversity
Model Architecture → right inductive bias (CNNs love images, Transformers love sequences)
Objective Function → loss + regularization terms
Optimization → SGD, Adam, learning rate tricks

Change any one pillar and the whole building shakes.
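Here’s how the four pillars line up inside even the tiniest training script (a NumPy sketch with made-up data and hand-picked hyperparameters, not a real pipeline):

```python
import numpy as np

rng = np.random.default_rng(4)

# Pillar 1 -- Data: size, quality, diversity
X = rng.normal(size=(100, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + rng.normal(0, 0.1, 100)

# Pillar 2 -- Model architecture: here, plain linear regression
w = np.zeros(3)

# Pillar 3 -- Objective: squared loss + an L2 regularization term
lam = 0.01
def loss(w):
    return np.mean((X @ w - y) ** 2) + lam * np.sum(w ** 2)

# Pillar 4 -- Optimization: vanilla gradient descent
lr = 0.1
for _ in range(500):
    grad = 2 * X.T @ (X @ w - y) / len(y) + 2 * lam * w
    w -= lr * grad

print(np.round(w, 2))  # close to true_w, slightly shrunk by the regularizer
```

Swap any pillar — less data, a mismatched architecture, no regularization, a bad learning rate — and the recovered weights drift away from the truth. That is the “whole building shakes” effect in miniature.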
Best Books – From Zero to Research Level
Beginner (build intuition)

Hands-On Machine Learning – Aurélien Géron (practical gold)
Pattern Recognition and Machine Learning – Christopher Bishop (bias-variance explained perfectly)

Intermediate (theory)

Understanding Machine Learning: From Theory to Algorithms – Shalev-Shwartz & Ben-David
The Elements of Statistical Learning – Hastie, Tibshirani, Friedman (the bible)

Advanced / Research (what experts read)

Deep Learning – Goodfellow, Bengio, Courville
Deep Learning Generalization: Theoretical Foundations and Practical Strategies – Liu Peng (the book that goes deep into double descent, NTK, overparameterization)
Information Theory, Inference, and Learning Algorithms – David MacKay

Recommended learning order
Géron → Bishop → Shalev-Shwartz → Goodfellow → Liu Peng

If you found this article helpful and want to dive deeper into machine learning, deep learning, and practical projects, you can connect with me:

Kaggle – Explore my notebooks, datasets, and competitions:
https://www.kaggle.com/girmawakeyo
GitHub – Check out my code, experiments, and open-source projects:
https://github.com/Girma35
X – Follow for insights, updates, and discussions on AI and software development:
https://x.com/Girma880731631

Feel free to follow, explore, or reach out. I look forward to sharing knowledge and building projects together.
