The One-Line Summary: Underfitting = didn't learn enough. Overfitting = learned too much (including the wrong things). Your job is to find the middle ground.
The Story of Three Chefs
Let me tell you about three chefs learning to cook pasta.
Chef 1: The Lazy Learner
Chef 1 reads one recipe and concludes: "Pasta is just boiling stuff in water."
Now every dish is the same. Spaghetti? Boiled. Lasagna? Boiled. Ravioli? You guessed it — boiled.
Customers complain. Chef 1 says, "But I followed the rule!"
The problem: Chef 1 learned too little. The rule is too simple.
Chef 2: The Obsessive Memorizer
Chef 2 memorizes everything from cooking school:
- "On March 15th at 2:47 PM, I added salt and it was perfect"
- "The pasta was great when I wore my blue apron"
- "It worked when the customer had brown hair"
Now Chef 2 can only cook well under those exact conditions. Different day? Different apron? Disaster.
The problem: Chef 2 learned too much — including things that don't matter.
Chef 3: The Smart Learner
Chef 3 understands the principles:
- Al dente means slight resistance when bitten
- Salt the water generously
- Save pasta water for the sauce
- Different shapes hold different sauces
Chef 3 can cook any pasta dish, even ones never seen before.
This is what we want.
Now here's the thing: Your machine learning model is one of these chefs.
- Chef 1 = Underfitting
- Chef 2 = Overfitting
- Chef 3 = Just right
Let's dive deeper.
What is Underfitting?
Underfitting happens when your model is too simple to capture the patterns in your data.
It's like trying to describe a rainbow with just one color. Sure, "it's colorful" is technically true, but you're missing so much.
Real-Life Examples of Underfitting
- Predicting exam scores using only "hours studied" — ignoring sleep, stress, teaching quality, prior knowledge
- Diagnosing illness based only on temperature — ignoring symptoms, history, age, lifestyle
- Predicting traffic using only time of day — ignoring weather, accidents, events, construction
The model under-learns. It's too lazy to capture complexity.
What Underfitting Looks Like
Your model's logic:
Input: Complex real-world data
↓
Model: "Everything is average"
↓
Output: Wrong predictions (on everything)
The Key Sign
Both training AND test performance are bad.
The model hasn't even learned the training data properly. It's failing at everything.
What is Overfitting?
Overfitting happens when your model memorizes the training data instead of learning the underlying patterns.
It's like a student who memorizes that "the answer to question 3 is B" instead of understanding why it's B. Change the question slightly, and they're lost.
Real-Life Examples of Overfitting
- Memorizing that "John bought milk on Tuesday" — and expecting ALL Johns to buy milk on Tuesdays
- Noticing your team won when you wore lucky socks — and believing socks determine game outcomes
- A spam filter that learns "emails from Bob are spam" — because the only spam in training was from someone named Bob
The model over-learns. It memorizes noise as if it were signal.
What Overfitting Looks Like
Your model's logic:
Training data: "Learn this!"
↓
Model: "I memorized EVERYTHING, including:
- The typos
- The outliers
- The measurement errors
- The random coincidences"
↓
New data: "Predict this!"
↓
Model: "I've never seen this exact thing before. PANIC!"
↓
Output: Wrong predictions (on new data)
The Key Sign
Training performance is GREAT, but test performance is BAD.
The model aced the practice test by memorizing answers. But the real exam has different questions.
The Perfect Analogy: Studying for an Exam
This is my favorite way to explain it.
| Student Type | Study Method | Practice Test | Real Exam | ML Term |
|---|---|---|---|---|
| Lazy Student | Skimmed notes once | 55% | 52% | Underfitting |
| Memorizer | Memorized everything word-for-word | 99% | 45% | Overfitting |
| Smart Student | Understood concepts | 88% | 85% | Good fit |
The Lazy Student (Underfitting):
- "History is about old stuff"
- Fails practice test
- Fails real exam
- Never actually learned anything
The Memorizer (Overfitting):
- "Page 47, paragraph 3, says..."
- Aces practice test (it's the same material!)
- Fails real exam (questions are rephrased)
- Learned the wrong things
The Smart Student (Good Fit):
- "The French Revolution happened because of economic inequality and..."
- Does well on practice test
- Does well on real exam
- Learned the right things
How to Detect Overfitting and Underfitting
This is actually simple. You just need to compare two numbers:
- Training performance — How well does the model do on data it learned from?
- Test performance — How well does the model do on data it's never seen?
The Detection Cheat Sheet
| Training Score | Test Score | Gap | Diagnosis |
|---|---|---|---|
| Low (60%) | Low (58%) | Small | Underfitting |
| High (99%) | Low (50%) | Large | Overfitting |
| Good (87%) | Good (84%) | Small | Good fit |
The gap between training and test is everything.
- Small gap + both low = Underfitting
- Large gap = Overfitting
- Small gap + both high = Perfect!
Visual Detection: Learning Curves
One of the best ways to detect these problems is by plotting learning curves.
What You Plot
- X-axis: Training set size (or training epochs)
- Y-axis: Performance (accuracy, loss, etc.)
- Two lines: Training score and Validation score
Underfitting Pattern
Performance
|
80% |
|
60% | - - - - - - - - - - - - Training
| . . . . . . . . . . . . Validation
40% |
| (Both stuck low, close together)
|
└─────────────────────────────────
Training Progress →
Both lines are low and flat. The model can't learn no matter how much data you give it.
Overfitting Pattern
Performance
|
| __________________ Training (99%)
95% |
|
75% |
| BIG GAP!
55% | .................. Validation (55%)
|
└─────────────────────────────────
Training Progress →
Training keeps improving, but validation plateaus or gets worse. The gap keeps growing.
Good Fit Pattern
Performance
|
| __________________ Training (88%)
85% | .................. Validation (84%)
| (Small gap, both high)
65% |
|
45% |
|
└─────────────────────────────────
Training Progress →
Both lines converge to high values with a small gap. Beautiful!
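Want to draw curves like these for your own model? Here's a minimal sketch using scikit-learn's learning_curve helper (the linear model and synthetic data below are just placeholders for whatever you're actually training):
# A minimal sketch: plot training vs validation score as the training set grows.
# The model and data here are placeholders; swap in your own.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import learning_curve

X = np.linspace(0, 10, 200).reshape(-1, 1)
y = 3 * X.squeeze() + np.sin(X.squeeze() * 3) * 2 + np.random.randn(200)

train_sizes, train_scores, val_scores = learning_curve(
    LinearRegression(), X, y,
    cv=5, shuffle=True, random_state=42,
    train_sizes=np.linspace(0.1, 1.0, 10), scoring="r2")

# Average across the cross-validation folds and plot both curves
plt.plot(train_sizes, train_scores.mean(axis=1), label="Training score")
plt.plot(train_sizes, val_scores.mean(axis=1), label="Validation score")
plt.xlabel("Training set size")
plt.ylabel("R^2 score")
plt.legend()
plt.show()
If the two lines hug each other at a low value, think underfitting; if they end up far apart, think overfitting.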
Let's See It In Code
Here's how to detect these problems in Python:
import numpy as np
from sklearn.model_selection import train_test_split, learning_curve
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
import matplotlib.pyplot as plt
# Generate sample data
np.random.seed(42)
X = np.linspace(0, 10, 200).reshape(-1, 1)
y = 3 * X.squeeze() + np.sin(X.squeeze() * 3) * 2 + np.random.randn(200)
# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# Model 1: Underfitting (too simple)
model_underfit = LinearRegression()
model_underfit.fit(X_train, y_train)
print("=== UNDERFITTING MODEL ===")
print(f"Training Score: {model_underfit.score(X_train, y_train):.3f}")
print(f"Test Score: {model_underfit.score(X_test, y_test):.3f}")
print(f"Gap: {abs(model_underfit.score(X_train, y_train) - model_underfit.score(X_test, y_test)):.3f}")
print()
# Model 2: Overfitting (too complex)
model_overfit = make_pipeline(PolynomialFeatures(degree=20), LinearRegression())
model_overfit.fit(X_train, y_train)
print("=== OVERFITTING MODEL ===")
print(f"Training Score: {model_overfit.score(X_train, y_train):.3f}")
print(f"Test Score: {model_overfit.score(X_test, y_test):.3f}")
print(f"Gap: {abs(model_overfit.score(X_train, y_train) - model_overfit.score(X_test, y_test)):.3f}")
print()
# Model 3: Good fit
model_good = make_pipeline(PolynomialFeatures(degree=4), LinearRegression())
model_good.fit(X_train, y_train)
print("=== GOOD FIT MODEL ===")
print(f"Training Score: {model_good.score(X_train, y_train):.3f}")
print(f"Test Score: {model_good.score(X_test, y_test):.3f}")
print(f"Gap: {abs(model_good.score(X_train, y_train) - model_good.score(X_test, y_test)):.3f}")
Output:
=== UNDERFITTING MODEL ===
Training Score: 0.891
Test Score: 0.879
Gap: 0.012 <- Small gap, but both scores sit below the good-fit model's (mild underfitting)
=== OVERFITTING MODEL ===
Training Score: 0.987
Test Score: 0.642
Gap: 0.345 <- HUGE gap! Overfitting!
=== GOOD FIT MODEL ===
Training Score: 0.953
Test Score: 0.941
Gap: 0.012 <- Small gap, high scores. Perfect!
How to Fix Underfitting
Your model is too simple. It needs to be smarter.
Solution 1: Add More Features
Give your model more information to work with.
# Before: Only using square footage
features = ['square_feet']
# After: Rich feature set
features = ['square_feet', 'bedrooms', 'bathrooms',
'location', 'age', 'garage', 'school_rating']
Solution 2: Use a More Complex Model
Upgrade your model's brain.
# Before: Simple linear model
from sklearn.linear_model import LinearRegression
model = LinearRegression()
# After: More powerful model
from sklearn.ensemble import RandomForestRegressor
model = RandomForestRegressor(n_estimators=100)
Solution 3: Reduce Regularization
If you're using regularization, you might be holding your model back too much.
# Before: Heavy regularization
from sklearn.linear_model import Ridge
model = Ridge(alpha=100) # Too restrictive
# After: Lighter regularization
model = Ridge(alpha=0.1) # More freedom to learn
Solution 4: Train Longer
Sometimes your model just needs more time.
# (This applies to iteratively trained models such as neural networks in Keras;
#  for scikit-learn estimators the equivalent knob is usually max_iter.)
# Before: Not enough training
model.fit(X_train, y_train, epochs=10)
# After: More training
model.fit(X_train, y_train, epochs=100)
Solution 5: Engineer Better Features
Create new features that capture hidden patterns.
# Combine raw columns into new features that expose hidden patterns
df['price_per_sqft'] = df['price'] / df['square_feet']
df['age_category'] = pd.cut(df['age'], bins=[0, 10, 30, 100])
df['location_score'] = df['school_rating'] * df['crime_safety']
How to Fix Overfitting
Your model is memorizing. It needs to calm down.
Solution 1: Get More Training Data
The more examples you have, the harder it is to memorize them all.
# With 100 samples: Easy to memorize
# With 100,000 samples: Must learn patterns instead
# Techniques to get more data:
# - Collect more real data
# - Data augmentation (images: flip, rotate, crop)
# - Synthetic data generation
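For image tasks, augmentation is usually done on the fly. Here's a minimal sketch with Keras preprocessing layers (assuming TensorFlow 2.x; the flip, rotation, and zoom amounts are illustrative, not tuned values):
# A minimal sketch of on-the-fly image augmentation with Keras preprocessing
# layers (assumes TensorFlow 2.x; the amounts below are illustrative).
import tensorflow as tf

augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),  # mirror images left/right
    tf.keras.layers.RandomRotation(0.1),       # rotate by up to ~10% of a full turn
    tf.keras.layers.RandomZoom(0.1),           # zoom in/out by up to 10%
])

# Typically placed at the front of the model so every epoch sees slightly
# different versions of the same images:
# model = tf.keras.Sequential([augment, ...the rest of the network...])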
Solution 2: Simplify Your Model
Use a less complex model.
# Before: Overly complex
model = make_pipeline(PolynomialFeatures(degree=15), LinearRegression())
# After: Simpler
model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
Solution 3: Add Regularization
Penalize complexity to prevent memorization.
# L2 Regularization (Ridge) - Shrinks coefficients
from sklearn.linear_model import Ridge
model = Ridge(alpha=1.0)
# L1 Regularization (Lasso) - Can zero out features
from sklearn.linear_model import Lasso
model = Lasso(alpha=0.1)
# ElasticNet - Combination of both
from sklearn.linear_model import ElasticNet
model = ElasticNet(alpha=0.1, l1_ratio=0.5)
Solution 4: Use Dropout (Neural Networks)
Randomly ignore neurons during training.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

model = Sequential([
    Dense(128, activation='relu'),
    Dropout(0.3),  # Randomly drop 30% of the neurons on each training step
    Dense(64, activation='relu'),
    Dropout(0.3),
    Dense(1)
])
Solution 5: Early Stopping
Stop training before overfitting kicks in.
from tensorflow.keras.callbacks import EarlyStopping
early_stop = EarlyStopping(
monitor='val_loss',
patience=10, # Stop if no improvement for 10 epochs
restore_best_weights=True
)
model.fit(X_train, y_train,
validation_split=0.2,
epochs=1000, # Will stop early
callbacks=[early_stop])
Solution 6: Cross-Validation
Test on multiple different splits to get honest performance.
from sklearn.model_selection import cross_val_score
# Instead of one train/test split, use 5 different splits
scores = cross_val_score(model, X, y, cv=5)
print(f"Scores: {scores}")
print(f"Mean: {scores.mean():.3f} (+/- {scores.std() * 2:.3f})")
Solution 7: Remove Noisy Features
Less noise = less to memorize.
# Remove features that don't help
# - Features with low correlation to target
# - Features with high correlation to each other
# - Features that are mostly noise
from sklearn.feature_selection import SelectKBest, f_regression
selector = SelectKBest(f_regression, k=10)
X_selected = selector.fit_transform(X, y)
Quick Diagnosis Flowchart
Here's how to diagnose your model in 30 seconds:
Step 1: Check training score
- If training score is LOW → Underfitting. Stop here.
- If training score is HIGH → Go to Step 2.
Step 2: Check test score
- If test score is also HIGH (similar to training) → Good fit!
- If test score is LOW (much worse than training) → Overfitting.
Step 3: Calculate the gap
- Gap > 10-15% → Likely overfitting
- Gap < 5% but both scores low → Underfitting
- Gap < 5% and both scores high → You're done!
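If you like having this as code, here's a tiny helper that encodes the flowchart (the 0.85 and 0.05 thresholds are rough rules of thumb picked for illustration, not universal constants):
# A tiny sketch of the flowchart as a function; the default thresholds are
# illustrative rules of thumb, not hard limits.
def diagnose_fit(train_score, test_score, high=0.85, max_gap=0.05):
    gap = train_score - test_score
    if train_score < high:
        return f"Underfitting: train score {train_score:.2f} is low (gap {gap:.2f})"
    if gap > max_gap:
        return f"Overfitting: gap {gap:.2f} is large"
    return f"Good fit: train {train_score:.2f}, test {test_score:.2f}, gap {gap:.2f}"

print(diagnose_fit(0.60, 0.58))  # Underfitting
print(diagnose_fit(0.99, 0.50))  # Overfitting
print(diagnose_fit(0.87, 0.84))  # Good fit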
The Fix-It Cheat Sheet
Save this for reference:
Underfitting Fixes
| Problem | Solution |
|---|---|
| Too few features | Add more relevant features |
| Model too simple | Use more complex model |
| Too much regularization | Reduce regularization strength |
| Not enough training | Train for more epochs |
| Poor features | Engineer better features |
Overfitting Fixes
| Problem | Solution |
|---|---|
| Too little data | Collect more data / augment |
| Model too complex | Use simpler model |
| No regularization | Add L1/L2 regularization |
| Training too long | Use early stopping |
| Too many features | Feature selection / removal |
| Neural network | Add dropout layers |
Real-World Example: Image Classification
Let's see this in a realistic scenario.
The Task
Build a model to classify cats vs dogs.
Attempt 1: Underfitting
# Using a tiny model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense

model = Sequential([
    Flatten(input_shape=(150, 150, 3)),
    Dense(2, activation='softmax')  # Way too simple!
])
# Results:
# Training accuracy: 62%
# Validation accuracy: 60%
# Diagnosis: UNDERFITTING - model can't learn the patterns
Fix: Add more layers and neurons.
Attempt 2: Overfitting
# Using a huge model with no regularization
model = Sequential([
Flatten(input_shape=(150, 150, 3)),
Dense(4096, activation='relu'),
Dense(4096, activation='relu'),
Dense(4096, activation='relu'),
Dense(2, activation='softmax')
])
# Results:
# Training accuracy: 99.8%
# Validation accuracy: 58%
# Diagnosis: OVERFITTING - memorized training images
Fix: Add dropout, regularization, get more data.
Attempt 3: Good Fit
# Balanced model with regularization
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)),
    MaxPooling2D(2, 2),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D(2, 2),
    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.5),
    Dense(2, activation='softmax')
])
# With data augmentation and early stopping
# Results:
# Training accuracy: 92%
# Validation accuracy: 89%
# Diagnosis: GOOD FIT!
The Connection to Bias-Variance
If you've read about bias-variance tradeoff, here's how it connects:
| Concept | Bias | Variance |
|---|---|---|
| Underfitting | HIGH | LOW |
| Overfitting | LOW | HIGH |
| Good fit | LOW | LOW |
- Underfitting = High bias (wrong assumptions, too simple)
- Overfitting = High variance (too sensitive to training data)
They're two sides of the same coin!
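Want to see the variance half of that table with your own eyes? Here's a hedged sketch: refit a simple model and a very flexible one on bootstrap resamples of the synthetic data from earlier, and watch how much their predictions at a single point move around.
# A minimal sketch (reusing the earlier synthetic data) of the variance story:
# refit a simple and a very flexible model on bootstrap resamples and compare
# how much their predictions at one point change between fits.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = np.linspace(0, 10, 200).reshape(-1, 1)
y = 3 * X.squeeze() + np.sin(X.squeeze() * 3) * 2 + rng.normal(size=200)

x_query = np.array([[5.0]])        # the point where we compare predictions
preds_simple, preds_flexible = [], []

for _ in range(50):                # 50 bootstrap resamples of the same data
    idx = rng.integers(0, len(X), len(X))
    simple = LinearRegression().fit(X[idx], y[idx])
    flexible = make_pipeline(PolynomialFeatures(degree=15),
                             LinearRegression()).fit(X[idx], y[idx])
    preds_simple.append(simple.predict(x_query)[0])
    preds_flexible.append(flexible.predict(x_query)[0])

# The underfit model barely changes between resamples (high bias, low variance);
# the overfit model's predictions typically swing far more (low bias, high variance).
print(f"Std of simple model predictions:   {np.std(preds_simple):.3f}")
print(f"Std of flexible model predictions: {np.std(preds_flexible):.3f}")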
Common Mistakes to Avoid
Mistake 1: Only Looking at Training Accuracy
# WRONG: "My model has 99% accuracy!"
print(f"Training accuracy: {model.score(X_train, y_train)}")
# RIGHT: Check both
print(f"Training accuracy: {model.score(X_train, y_train)}")
print(f"Test accuracy: {model.score(X_test, y_test)}")
Mistake 2: Testing on Training Data
# WRONG: Using same data for train and test
model.fit(X, y)
model.score(X, y) # This only measures memorization, not generalization
# RIGHT: Proper train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model.fit(X_train, y_train)
model.score(X_test, y_test) # Honest evaluation
Mistake 3: Data Leakage
# WRONG: Scaling before splitting
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # The scaler sees ALL the data, including the test rows
X_train, X_test = train_test_split(X_scaled)
# RIGHT: Scale after splitting
X_train, X_test = train_test_split(X)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train) # Only sees training
X_test_scaled = scaler.transform(X_test) # Uses training statistics
Key Takeaways
Let's summarize everything:
- Underfitting = Model is too simple, fails on everything
- Overfitting = Model memorized training data, fails on new data
- Detection = Compare training vs test performance
- The gap = Large gap means overfitting, small gap with low scores means underfitting
- Fix underfitting = More features, complex model, less regularization
- Fix overfitting = More data, simpler model, regularization, dropout
Your Action Items
After reading this article:
- Train a model on any dataset
- Check both training and test scores
- Calculate the gap between them
- Diagnose using the cheat sheet
- Apply the fix based on the diagnosis
- Repeat until you get a good fit
What's Next?
Now that you understand overfitting and underfitting, you're ready to learn:
- Regularization (L1, L2) — The overfitting killer
- Cross-validation — Better ways to evaluate
- Ensemble methods — Combining models for stability
- Hyperparameter tuning — Finding the sweet spot automatically
Follow me for the next article in this series!
Let's Connect!
If this helped you understand overfitting and underfitting, drop a heart!
Questions? Ask in the comments — I read and respond to every one.
Found a mistake? Let me know. I love improving these articles.
The difference between a junior and senior ML engineer? The senior knows how to diagnose and fix these problems quickly. Now you have that skill too.
Share this with a friend learning ML. These concepts are fundamental, and the right explanation makes all the difference.
Happy learning!