Sachin Kr. Rajput

Overfitting & Underfitting: Why Your Model Aced the Practice Test But Failed the Real Exam

The One-Line Summary: Underfitting = didn't learn enough. Overfitting = learned too much (including the wrong things). Your job is to find the middle ground.


The Story of Three Chefs

Let me tell you about three chefs learning to cook pasta.

Chef 1: The Lazy Learner

Chef 1 reads one recipe and concludes: "Pasta is just boiling stuff in water."

Now every dish is the same. Spaghetti? Boiled. Lasagna? Boiled. Ravioli? You guessed it — boiled.

Customers complain. Chef 1 says, "But I followed the rule!"

The problem: Chef 1 learned too little. The rule is too simple.


Chef 2: The Obsessive Memorizer

Chef 2 memorizes everything from cooking school:

  • "On March 15th at 2:47 PM, I added salt and it was perfect"
  • "The pasta was great when I wore my blue apron"
  • "It worked when the customer had brown hair"

Now Chef 2 can only cook well under those exact conditions. Different day? Different apron? Disaster.

The problem: Chef 2 learned too much — including things that don't matter.


Chef 3: The Smart Learner

Chef 3 understands the principles:

  • Al dente means slight resistance when bitten
  • Salt the water generously
  • Save pasta water for the sauce
  • Different shapes hold different sauces

Chef 3 can cook any pasta dish, even ones never seen before.

This is what we want.


Now here's the thing: Your machine learning model is one of these chefs.

  • Chef 1 = Underfitting
  • Chef 2 = Overfitting
  • Chef 3 = Just right

Let's dive deeper.


What is Underfitting?

Underfitting happens when your model is too simple to capture the patterns in your data.

It's like trying to describe a rainbow with just one color. Sure, "it's colorful" is technically true, but you're missing so much.

Real-Life Examples of Underfitting

  • Predicting exam scores using only "hours studied" — ignoring sleep, stress, teaching quality, prior knowledge
  • Diagnosing illness based only on temperature — ignoring symptoms, history, age, lifestyle
  • Predicting traffic using only time of day — ignoring weather, accidents, events, construction

The model under-learns. It's too lazy to capture complexity.

What Underfitting Looks Like

Your model's logic:

Input: Complex real-world data
       ↓
Model: "Everything is average"
       ↓
Output: Wrong predictions (on everything)

The Key Sign

Both training AND test performance are bad.

The model hasn't even learned the training data properly. It's failing at everything.


What is Overfitting?

Overfitting happens when your model memorizes the training data instead of learning the underlying patterns.

It's like a student who memorizes that "the answer to question 3 is B" instead of understanding why it's B. Change the question slightly, and they're lost.

Real-Life Examples of Overfitting

  • Memorizing that "John bought milk on Tuesday" — and expecting ALL Johns to buy milk on Tuesdays
  • Noticing your team won when you wore lucky socks — and believing socks determine game outcomes
  • A spam filter that learns "emails from Bob are spam" — because the only spam in training was from someone named Bob

The model over-learns. It memorizes noise as if it were signal.

What Overfitting Looks Like

Your model's logic:

Training data: "Learn this!"
               ↓
Model: "I memorized EVERYTHING, including:
        - The typos
        - The outliers  
        - The measurement errors
        - The random coincidences"
               ↓
New data: "Predict this!"
               ↓
Model: "I've never seen this exact thing before. PANIC!"
               ↓
Output: Wrong predictions (on new data)

The Key Sign

Training performance is GREAT, but test performance is BAD.

The model aced the practice test by memorizing answers. But the real exam has different questions.


The Perfect Analogy: Studying for an Exam

This is my favorite way to explain it.

| Student Type | Study Method | Practice Test | Real Exam | ML Term |
|---|---|---|---|---|
| Lazy Student | Skimmed notes once | 55% | 52% | Underfitting |
| Memorizer | Memorized everything word-for-word | 99% | 45% | Overfitting |
| Smart Student | Understood concepts | 88% | 85% | Good fit |

The Lazy Student (Underfitting):

  • "History is about old stuff"
  • Fails practice test
  • Fails real exam
  • Never actually learned anything

The Memorizer (Overfitting):

  • "Page 47, paragraph 3, says..."
  • Aces practice test (it's the same material!)
  • Fails real exam (questions are rephrased)
  • Learned the wrong things

The Smart Student (Good Fit):

  • "The French Revolution happened because of economic inequality and..."
  • Does well on practice test
  • Does well on real exam
  • Learned the right things

How to Detect Overfitting and Underfitting

This is actually simple. You just need to compare two numbers:

  1. Training performance — How well does the model do on data it learned from?
  2. Test performance — How well does the model do on data it's never seen?

The Detection Cheat Sheet

| Training Score | Test Score | Gap | Diagnosis |
|---|---|---|---|
| Low (60%) | Low (58%) | Small | Underfitting |
| High (99%) | Low (50%) | Large | Overfitting |
| Good (87%) | Good (84%) | Small | Good fit |

The gap between training and test is everything.

  • Small gap + both low = Underfitting
  • Large gap = Overfitting
  • Small gap + both high = Perfect!

Visual Detection: Learning Curves

One of the best ways to detect these problems is by plotting learning curves.

What You Plot

  • X-axis: Training set size (or training epochs)
  • Y-axis: Performance (accuracy, loss, etc.)
  • Two lines: Training score and Validation score

Underfitting Pattern

Performance
    |
80% |
    |
60% |  - - - - - - - - - - - - Training
    |  . . . . . . . . . . . . Validation  
40% |
    |  (Both stuck low, close together)
    |
    └─────────────────────────────────
              Training Progress →

Both lines are low and flat. The model can't learn no matter how much data you give it.

Overfitting Pattern

Performance
    |
    |  __________________ Training (99%)
95% |
    |
75% |
    |        BIG GAP!
55% |  .................. Validation (55%)
    |
    └─────────────────────────────────
              Training Progress →

Training keeps improving, but validation plateaus or gets worse. The gap keeps growing.

Good Fit Pattern

Performance
    |
    |  __________________ Training (88%)
85% |  .................. Validation (84%)
    |     (Small gap, both high)
65% |
    |
45% |
    |
    └─────────────────────────────────
              Training Progress →

Both lines converge to high values with a small gap. Beautiful!
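
If you want to draw these curves for your own model, scikit-learn's learning_curve utility will compute them for you. Here's a minimal sketch; the toy dataset and the deliberately flexible degree-20 polynomial model are just placeholders so a visible gap shows up:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import learning_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Toy data: a noisy curve (swap in your own X and y)
rng = np.random.RandomState(0)
X = np.linspace(0, 10, 200).reshape(-1, 1)
y = 3 * X.squeeze() + 2 * np.sin(3 * X.squeeze()) + rng.randn(200)

# A deliberately flexible model so the train/validation gap is visible
model = make_pipeline(PolynomialFeatures(degree=20), LinearRegression())

# Train on growing slices of the data, scoring each slice with 5-fold CV
train_sizes, train_scores, val_scores = learning_curve(
    model, X, y, cv=5, train_sizes=np.linspace(0.1, 1.0, 8))

plt.plot(train_sizes, train_scores.mean(axis=1), label='Training score')
plt.plot(train_sizes, val_scores.mean(axis=1), label='Validation score')
plt.xlabel('Training set size')
plt.ylabel('R² score')
plt.legend()
plt.show()

If both lines converge while staying low, you're looking at the underfitting picture above; if they stay far apart, the overfitting one.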


Let's See It In Code

Here's how to detect these problems in Python:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

# Generate sample data
np.random.seed(42)
X = np.linspace(0, 10, 200).reshape(-1, 1)
y = 3 * X.squeeze() + np.sin(X.squeeze() * 3) * 2 + np.random.randn(200)

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Model 1: Underfitting (too simple)
model_underfit = LinearRegression()
model_underfit.fit(X_train, y_train)
print("=== UNDERFITTING MODEL ===")
print(f"Training Score: {model_underfit.score(X_train, y_train):.3f}")
print(f"Test Score:     {model_underfit.score(X_test, y_test):.3f}")
print(f"Gap:            {abs(model_underfit.score(X_train, y_train) - model_underfit.score(X_test, y_test)):.3f}")
print()

# Model 2: Overfitting (too complex)
model_overfit = make_pipeline(PolynomialFeatures(degree=20), LinearRegression())
model_overfit.fit(X_train, y_train)
print("=== OVERFITTING MODEL ===")
print(f"Training Score: {model_overfit.score(X_train, y_train):.3f}")
print(f"Test Score:     {model_overfit.score(X_test, y_test):.3f}")
print(f"Gap:            {abs(model_overfit.score(X_train, y_train) - model_overfit.score(X_test, y_test)):.3f}")
print()

# Model 3: Good fit
model_good = make_pipeline(PolynomialFeatures(degree=4), LinearRegression())
model_good.fit(X_train, y_train)
print("=== GOOD FIT MODEL ===")
print(f"Training Score: {model_good.score(X_train, y_train):.3f}")
print(f"Test Score:     {model_good.score(X_test, y_test):.3f}")
print(f"Gap:            {abs(model_good.score(X_train, y_train) - model_good.score(X_test, y_test)):.3f}")

Output:

=== UNDERFITTING MODEL ===
Training Score: 0.891
Test Score:     0.879
Gap:            0.012   <- Small gap, but scores could be better

=== OVERFITTING MODEL ===
Training Score: 0.987
Test Score:     0.642
Gap:            0.345   <- HUGE gap! Overfitting!

=== GOOD FIT MODEL ===
Training Score: 0.953
Test Score:     0.941
Gap:            0.012   <- Small gap, high scores. Perfect!

How to Fix Underfitting

Your model is too simple. It needs to be smarter.

Solution 1: Add More Features

Give your model more information to work with.

# Before: Only using square footage
features = ['square_feet']

# After: Rich feature set
features = ['square_feet', 'bedrooms', 'bathrooms', 
            'location', 'age', 'garage', 'school_rating']

Solution 2: Use a More Complex Model

Upgrade your model's brain.

# Before: Simple linear model
from sklearn.linear_model import LinearRegression
model = LinearRegression()

# After: More powerful model
from sklearn.ensemble import RandomForestRegressor
model = RandomForestRegressor(n_estimators=100)

Solution 3: Reduce Regularization

If you're using regularization, you might be holding your model back too much.

# Before: Heavy regularization
from sklearn.linear_model import Ridge
model = Ridge(alpha=100)  # Too restrictive

# After: Lighter regularization
model = Ridge(alpha=0.1)  # More freedom to learn

Solution 4: Train Longer

Sometimes your model just needs more time. This applies to iterative learners such as neural networks (the Keras-style fit below); a plain LinearRegression fits in one pass no matter what.

# Before: Not enough training
model.fit(X_train, y_train, epochs=10)

# After: More training
model.fit(X_train, y_train, epochs=100)

Solution 5: Engineer Better Features

Create new features that capture hidden patterns.

import pandas as pd

# New features derived from the raw columns (assumes a DataFrame df)
df['price_per_sqft'] = df['price'] / df['square_feet']
df['age_category'] = pd.cut(df['age'], bins=[0, 10, 30, 100])
df['location_score'] = df['school_rating'] * df['crime_safety']

How to Fix Overfitting

Your model is memorizing. It needs to calm down.

Solution 1: Get More Training Data

The more examples you have, the harder it is to memorize them all.

# With 100 samples: Easy to memorize
# With 100,000 samples: Must learn patterns instead

# Techniques to get more data:
# - Collect more real data
# - Data augmentation (images: flip, rotate, crop)
# - Synthetic data generation
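
For images specifically, augmentation is usually the cheapest way to multiply your data. A rough sketch with Keras preprocessing layers (assuming TensorFlow 2.6+; the transformation factors are just illustrative):

from tensorflow.keras import Sequential
from tensorflow.keras.layers import RandomFlip, RandomRotation, RandomZoom

# Each training image gets a random transform every epoch,
# so the model never sees exactly the same pixels twice
data_augmentation = Sequential([
    RandomFlip("horizontal"),   # mirror left/right
    RandomRotation(0.1),        # rotate up to ±10% of a full turn
    RandomZoom(0.1),            # zoom in/out by up to 10%
])

# Prepend it to your model; these layers are only active during training
model = Sequential([
    data_augmentation,
    # ... the rest of your network goes here
])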

Solution 2: Simplify Your Model

Use a less complex model.

# Before: Overly complex
model = make_pipeline(PolynomialFeatures(degree=15), LinearRegression())

# After: Simpler
model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())

Solution 3: Add Regularization

Penalize complexity to prevent memorization.

# L2 Regularization (Ridge) - Shrinks coefficients
from sklearn.linear_model import Ridge
model = Ridge(alpha=1.0)

# L1 Regularization (Lasso) - Can zero out features
from sklearn.linear_model import Lasso
model = Lasso(alpha=0.1)

# ElasticNet - Combination of both
from sklearn.linear_model import ElasticNet
model = ElasticNet(alpha=0.1, l1_ratio=0.5)

Solution 4: Use Dropout (Neural Networks)

Randomly ignore neurons during training.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

model = Sequential([
    Dense(128, activation='relu'),
    Dropout(0.3),  # Randomly drop 30% of neurons during training
    Dense(64, activation='relu'),
    Dropout(0.3),
    Dense(1)
])

Solution 5: Early Stopping

Stop training before overfitting kicks in.

from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(
    monitor='val_loss',
    patience=10,  # Stop if no improvement for 10 epochs
    restore_best_weights=True
)

model.fit(X_train, y_train, 
          validation_split=0.2,
          epochs=1000,  # Will stop early
          callbacks=[early_stop])

Solution 6: Cross-Validation

Test on multiple different splits to get honest performance.

from sklearn.model_selection import cross_val_score

# Instead of one train/test split, use 5 different splits
scores = cross_val_score(model, X, y, cv=5)
print(f"Scores: {scores}")
print(f"Mean: {scores.mean():.3f} (+/- {scores.std() * 2:.3f})")

Solution 7: Remove Noisy Features

Less noise = less to memorize.

# Remove features that don't help
# - Features with low correlation to target
# - Features with high correlation to each other
# - Features that are mostly noise

from sklearn.feature_selection import SelectKBest, f_regression

selector = SelectKBest(f_regression, k=10)
X_selected = selector.fit_transform(X, y)

Quick Diagnosis Flowchart

Here's how to diagnose your model in 30 seconds:

Step 1: Check training score

  • If training score is LOW → Underfitting. Stop here.
  • If training score is HIGH → Go to Step 2.

Step 2: Check test score

  • If test score is also HIGH (similar to training) → Good fit!
  • If test score is LOW (much worse than training) → Overfitting.

Step 3: Calculate the gap

  • Gap > 10-15% → Likely overfitting
  • Gap < 5% but both scores low → Underfitting
  • Gap < 5% and both scores high → You're done!
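
If you'd rather automate the 30-second check, here's a tiny helper that encodes the same rules of thumb. The gap thresholds mirror the flowchart above; the 0.85 "high score" cutoff is an arbitrary choice you should adjust to your problem:

def diagnose(train_score, test_score, high=0.85):
    """Rough diagnosis following the flowchart above (scores between 0 and 1)."""
    gap = train_score - test_score
    if train_score < high:
        return "Underfitting: the model hasn't even learned the training data."
    if gap > 0.10:
        return "Overfitting: great on training data, much worse on new data."
    if gap < 0.05 and test_score >= high:
        return "Good fit: small gap, both scores high."
    return "Borderline: keep tuning and re-check the gap."

# Example: the overfitting model's numbers from earlier
print(diagnose(0.987, 0.642))  # -> Overfitting: ...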

The Fix-It Cheat Sheet

Save this for reference:

Underfitting Fixes

| Problem | Solution |
|---|---|
| Too few features | Add more relevant features |
| Model too simple | Use a more complex model |
| Too much regularization | Reduce regularization strength |
| Not enough training | Train for more epochs |
| Poor features | Engineer better features |

Overfitting Fixes

| Problem | Solution |
|---|---|
| Too little data | Collect more data / augment |
| Model too complex | Use a simpler model |
| No regularization | Add L1/L2 regularization |
| Training too long | Use early stopping |
| Too many features | Feature selection / removal |
| Neural network | Add dropout layers |

Real-World Example: Image Classification

Let's see this in a realistic scenario.

The Task

Build a model to classify cats vs dogs.

Attempt 1: Underfitting

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense

# Using a tiny model
model = Sequential([
    Flatten(input_shape=(150, 150, 3)),
    Dense(2, activation='softmax')  # Way too simple!
])

# Results:
# Training accuracy: 62%
# Validation accuracy: 60%
# Diagnosis: UNDERFITTING - model can't learn the patterns

Fix: Add more layers and neurons.

Attempt 2: Overfitting

# Using a huge model with no regularization
model = Sequential([
    Flatten(input_shape=(150, 150, 3)),
    Dense(4096, activation='relu'),
    Dense(4096, activation='relu'),
    Dense(4096, activation='relu'),
    Dense(2, activation='softmax')
])

# Results:
# Training accuracy: 99.8%
# Validation accuracy: 58%
# Diagnosis: OVERFITTING - memorized training images

Fix: Add dropout, regularization, get more data.

Attempt 3: Good Fit

from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout

# Balanced model with regularization
model = Sequential([
    Conv2D(32, (3,3), activation='relu', input_shape=(150, 150, 3)),
    MaxPooling2D(2,2),
    Conv2D(64, (3,3), activation='relu'),
    MaxPooling2D(2,2),
    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.5),
    Dense(2, activation='softmax')
])

# With data augmentation and early stopping

# Results:
# Training accuracy: 92%
# Validation accuracy: 89%
# Diagnosis: GOOD FIT!

The Connection to Bias-Variance

If you've read about bias-variance tradeoff, here's how it connects:

| Concept | Bias | Variance |
|---|---|---|
| Underfitting | HIGH | LOW |
| Overfitting | LOW | HIGH |
| Good fit | LOW | LOW |

  • Underfitting = High bias (wrong assumptions, too simple)
  • Overfitting = High variance (too sensitive to training data)

They're two sides of the same coin!
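
One way to watch the tradeoff happen is to sweep model complexity and record both scores at each step. Here's a sketch using scikit-learn's validation_curve on the same kind of polynomial pipeline as in the earlier code section; the degree range is arbitrary:

import numpy as np
from sklearn.model_selection import validation_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Toy data, same shape of problem as earlier
rng = np.random.RandomState(0)
X = np.linspace(0, 10, 200).reshape(-1, 1)
y = 3 * X.squeeze() + 2 * np.sin(3 * X.squeeze()) + rng.randn(200)

degrees = [1, 2, 4, 8, 12, 16, 20]
model = make_pipeline(PolynomialFeatures(), LinearRegression())

# Cross-validated train/validation score for each polynomial degree
train_scores, val_scores = validation_curve(
    model, X, y,
    param_name="polynomialfeatures__degree",
    param_range=degrees, cv=5)

for d, tr, va in zip(degrees, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"degree={d:2d}  train={tr:.3f}  val={va:.3f}")

# Low degrees: both scores poor (high bias / underfitting).
# Very high degrees: training climbs while validation drops (high variance / overfitting).
# The sweet spot sits somewhere in between.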


Common Mistakes to Avoid

Mistake 1: Only Looking at Training Accuracy

# WRONG: "My model has 99% accuracy!"
print(f"Training accuracy: {model.score(X_train, y_train)}")

# RIGHT: Check both
print(f"Training accuracy: {model.score(X_train, y_train)}")
print(f"Test accuracy: {model.score(X_test, y_test)}")

Mistake 2: Testing on Training Data

# WRONG: Using same data for train and test
model.fit(X, y)
model.score(X, y)  # This is meaningless!

# RIGHT: Proper train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model.fit(X_train, y_train)
model.score(X_test, y_test)  # Honest evaluation

Mistake 3: Data Leakage

from sklearn.preprocessing import StandardScaler

# WRONG: Scaling before splitting
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # Sees ALL data, including the future test rows
X_train, X_test = train_test_split(X_scaled)

# RIGHT: Scale after splitting
X_train, X_test = train_test_split(X)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # Only sees training data
X_test_scaled = scaler.transform(X_test)  # Uses training statistics
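
A tidy way to make this mistake hard to commit is to wrap the scaler and the model in a Pipeline: scikit-learn then re-fits the scaler on the training portion of every split, even inside cross-validation. A minimal sketch, with Ridge standing in for whatever model you actually use and X, y being your features and target:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# The scaler is fit on the training folds only, so the held-out fold
# never leaks into the scaling statistics
pipeline = make_pipeline(StandardScaler(), Ridge(alpha=1.0))

scores = cross_val_score(pipeline, X, y, cv=5)
print(f"Mean CV score: {scores.mean():.3f}")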

Key Takeaways

Let's summarize everything:

  1. Underfitting = Model is too simple, fails on everything
  2. Overfitting = Model memorized training data, fails on new data
  3. Detection = Compare training vs test performance
  4. The gap = Large gap means overfitting, small gap with low scores means underfitting
  5. Fix underfitting = More features, complex model, less regularization
  6. Fix overfitting = More data, simpler model, regularization, dropout

Your Action Items

After reading this article:

  1. Train a model on any dataset
  2. Check both training and test scores
  3. Calculate the gap between them
  4. Diagnose using the cheat sheet
  5. Apply the fix based on the diagnosis
  6. Repeat until you get a good fit

What's Next?

Now that you understand overfitting and underfitting, you're ready to learn:

  • Regularization (L1, L2) — The overfitting killer
  • Cross-validation — Better ways to evaluate
  • Ensemble methods — Combining models for stability
  • Hyperparameter tuning — Finding the sweet spot automatically

Follow me for the next article in this series!


Let's Connect!

If this helped you understand overfitting and underfitting, drop a heart!

Questions? Ask in the comments — I read and respond to every one.

Found a mistake? Let me know. I love improving these articles.


The difference between a junior and senior ML engineer? The senior knows how to diagnose and fix these problems quickly. Now you have that skill too.


Share this with a friend learning ML. These concepts are fundamental, and the right explanation makes all the difference.

Happy learning!
