Sachin Kr. Rajput

Overfitting & Underfitting: Why Your Model Aced the Practice Test But Failed the Real Exam

The One-Line Summary: Underfitting = didn't learn enough. Overfitting = learned too much (including the wrong things). Your job is to find the middle ground.


The Story of Three Chefs

Let me tell you about three chefs learning to cook pasta.

Chef 1: The Lazy Learner

Chef 1 reads one recipe and concludes: "Pasta is just boiling stuff in water."

Now every dish is the same. Spaghetti? Boiled. Lasagna? Boiled. Ravioli? You guessed it — boiled.

Customers complain. Chef 1 says, "But I followed the rule!"

The problem: Chef 1 learned too little. The rule is too simple.


Chef 2: The Obsessive Memorizer

Chef 2 memorizes everything from cooking school:

  • "On March 15th at 2:47 PM, I added salt and it was perfect"
  • "The pasta was great when I wore my blue apron"
  • "It worked when the customer had brown hair"

Now Chef 2 can only cook well under those exact conditions. Different day? Different apron? Disaster.

The problem: Chef 2 learned too much — including things that don't matter.


Chef 3: The Smart Learner

Chef 3 understands the principles:

  • Al dente means slight resistance when bitten
  • Salt the water generously
  • Save pasta water for the sauce
  • Different shapes hold different sauces

Chef 3 can cook any pasta dish, even ones never seen before.

This is what we want.


Now here's the thing: Your machine learning model is one of these chefs.

  • Chef 1 = Underfitting
  • Chef 2 = Overfitting
  • Chef 3 = Just right

Let's dive deeper.


What is Underfitting?

Underfitting happens when your model is too simple to capture the patterns in your data.

It's like trying to describe a rainbow with just one color. Sure, "it's colorful" is technically true, but you're missing so much.

Real-Life Examples of Underfitting

  • Predicting exam scores using only "hours studied" — ignoring sleep, stress, teaching quality, prior knowledge
  • Diagnosing illness based only on temperature — ignoring symptoms, history, age, lifestyle
  • Predicting traffic using only time of day — ignoring weather, accidents, events, construction

The model under-learns. It's too lazy to capture complexity.

What Underfitting Looks Like

Your model's logic:

Input: Complex real-world data
       ↓
Model: "Everything is average"
       ↓
Output: Wrong predictions (on everything)

The Key Sign

Both training AND test performance are bad.

The model hasn't even learned the training data properly. It's failing at everything.


What is Overfitting?

Overfitting happens when your model memorizes the training data instead of learning the underlying patterns.

It's like a student who memorizes that "the answer to question 3 is B" instead of understanding why it's B. Change the question slightly, and they're lost.

Real-Life Examples of Overfitting

  • Memorizing that "John bought milk on Tuesday" — and expecting ALL Johns to buy milk on Tuesdays
  • Noticing your team won when you wore lucky socks — and believing socks determine game outcomes
  • A spam filter that learns "emails from Bob are spam" — because the only spam in training was from someone named Bob

The model over-learns. It memorizes noise as if it were signal.

What Overfitting Looks Like

Your model's logic:

Training data: "Learn this!"
               ↓
Model: "I memorized EVERYTHING, including:
        - The typos
        - The outliers  
        - The measurement errors
        - The random coincidences"
               ↓
New data: "Predict this!"
               ↓
Model: "I've never seen this exact thing before. PANIC!"
               ↓
Output: Wrong predictions (on new data)

The Key Sign

Training performance is GREAT, but test performance is BAD.

The model aced the practice test by memorizing answers. But the real exam has different questions.


The Perfect Analogy: Studying for an Exam

This is my favorite way to explain it.

| Student Type | Study Method | Practice Test | Real Exam | ML Term |
|---|---|---|---|---|
| Lazy Student | Skimmed notes once | 55% | 52% | Underfitting |
| Memorizer | Memorized everything word-for-word | 99% | 45% | Overfitting |
| Smart Student | Understood concepts | 88% | 85% | Good fit |

The Lazy Student (Underfitting):

  • "History is about old stuff"
  • Fails practice test
  • Fails real exam
  • Never actually learned anything

The Memorizer (Overfitting):

  • "Page 47, paragraph 3, says..."
  • Aces practice test (it's the same material!)
  • Fails real exam (questions are rephrased)
  • Learned the wrong things

The Smart Student (Good Fit):

  • "The French Revolution happened because of economic inequality and..."
  • Does well on practice test
  • Does well on real exam
  • Learned the right things

How to Detect Overfitting and Underfitting

This is actually simple. You just need to compare two numbers:

  1. Training performance — How well does the model do on data it learned from?
  2. Test performance — How well does the model do on data it's never seen?

The Detection Cheat Sheet

| Training Score | Test Score | Gap | Diagnosis |
|---|---|---|---|
| Low (60%) | Low (58%) | Small | Underfitting |
| High (99%) | Low (50%) | Large | Overfitting |
| Good (87%) | Good (84%) | Small | Good fit |

The gap between training and test is everything.

  • Small gap + both low = Underfitting
  • Large gap = Overfitting
  • Small gap + both high = Perfect!

Visual Detection: Learning Curves

One of the best ways to detect these problems is by plotting learning curves.

What You Plot

  • X-axis: Training set size (or training epochs)
  • Y-axis: Performance (accuracy, loss, etc.)
  • Two lines: Training score and Validation score

Underfitting Pattern

Performance
    |
80% |
    |
60% |  - - - - - - - - - - - - Training
    |  . . . . . . . . . . . . Validation  
40% |
    |  (Both stuck low, close together)
    |
    └─────────────────────────────────
              Training Progress →

Both lines are low and flat. The model can't learn no matter how much data you give it.

Overfitting Pattern

Performance
    |
    |  __________________ Training (99%)
95% |
    |
75% |
    |        BIG GAP!
55% |  .................. Validation (55%)
    |
    └─────────────────────────────────
              Training Progress →

Training keeps improving, but validation plateaus or gets worse. The gap keeps growing.

Good Fit Pattern

Performance
    |
    |  __________________ Training (88%)
85% |  .................. Validation (84%)
    |     (Small gap, both high)
65% |
    |
45% |
    |
    └─────────────────────────────────
              Training Progress →

Both lines converge to high values with a small gap. Beautiful!
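
If you want to draw these curves for your own model, scikit-learn's learning_curve utility will compute them for you. Here's a minimal sketch; the toy dataset and the deliberately flexible degree-20 polynomial model are just placeholders so a visible gap shows up:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import learning_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Toy data: a noisy curve (swap in your own X and y)
rng = np.random.RandomState(0)
X = np.linspace(0, 10, 200).reshape(-1, 1)
y = 3 * X.squeeze() + 2 * np.sin(3 * X.squeeze()) + rng.randn(200)

# A deliberately flexible model so the train/validation gap is visible
model = make_pipeline(PolynomialFeatures(degree=20), LinearRegression())

# Train on growing slices of the data, scoring each slice with 5-fold CV
train_sizes, train_scores, val_scores = learning_curve(
    model, X, y, cv=5, train_sizes=np.linspace(0.1, 1.0, 8))

plt.plot(train_sizes, train_scores.mean(axis=1), label='Training score')
plt.plot(train_sizes, val_scores.mean(axis=1), label='Validation score')
plt.xlabel('Training set size')
plt.ylabel('R² score')
plt.legend()
plt.show()

If both lines converge while staying low, you're looking at the underfitting picture above; if they stay far apart, the overfitting one.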


Let's See It In Code

Here's how to detect these problems in Python:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

# Generate sample data
np.random.seed(42)
X = np.linspace(0, 10, 200).reshape(-1, 1)
y = 3 * X.squeeze() + np.sin(X.squeeze() * 3) * 2 + np.random.randn(200)

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Model 1: Underfitting (too simple)
model_underfit = LinearRegression()
model_underfit.fit(X_train, y_train)
print("=== UNDERFITTING MODEL ===")
print(f"Training Score: {model_underfit.score(X_train, y_train):.3f}")
print(f"Test Score:     {model_underfit.score(X_test, y_test):.3f}")
print(f"Gap:            {abs(model_underfit.score(X_train, y_train) - model_underfit.score(X_test, y_test)):.3f}")
print()

# Model 2: Overfitting (too complex)
model_overfit = make_pipeline(PolynomialFeatures(degree=20), LinearRegression())
model_overfit.fit(X_train, y_train)
print("=== OVERFITTING MODEL ===")
print(f"Training Score: {model_overfit.score(X_train, y_train):.3f}")
print(f"Test Score:     {model_overfit.score(X_test, y_test):.3f}")
print(f"Gap:            {abs(model_overfit.score(X_train, y_train) - model_overfit.score(X_test, y_test)):.3f}")
print()

# Model 3: Good fit
model_good = make_pipeline(PolynomialFeatures(degree=4), LinearRegression())
model_good.fit(X_train, y_train)
print("=== GOOD FIT MODEL ===")
print(f"Training Score: {model_good.score(X_train, y_train):.3f}")
print(f"Test Score:     {model_good.score(X_test, y_test):.3f}")
print(f"Gap:            {abs(model_good.score(X_train, y_train) - model_good.score(X_test, y_test)):.3f}")

Output:

=== UNDERFITTING MODEL ===
Training Score: 0.891
Test Score:     0.879
Gap:            0.012   <- Small gap, but scores could be better

=== OVERFITTING MODEL ===
Training Score: 0.987
Test Score:     0.642
Gap:            0.345   <- HUGE gap! Overfitting!

=== GOOD FIT MODEL ===
Training Score: 0.953
Test Score:     0.941
Gap:            0.012   <- Small gap, high scores. Perfect!

How to Fix Underfitting

Your model is too simple. It needs to be smarter.

Solution 1: Add More Features

Give your model more information to work with.

# Before: Only using square footage
features = ['square_feet']

# After: Rich feature set
features = ['square_feet', 'bedrooms', 'bathrooms', 
            'location', 'age', 'garage', 'school_rating']

Solution 2: Use a More Complex Model

Upgrade your model's brain.

# Before: Simple linear model
from sklearn.linear_model import LinearRegression
model = LinearRegression()

# After: More powerful model
from sklearn.ensemble import RandomForestRegressor
model = RandomForestRegressor(n_estimators=100)

Solution 3: Reduce Regularization

If you're using regularization, you might be holding your model back too much.

# Before: Heavy regularization
from sklearn.linear_model import Ridge
model = Ridge(alpha=100)  # Too restrictive

# After: Lighter regularization
model = Ridge(alpha=0.1)  # More freedom to learn

Solution 4: Train Longer

Sometimes your model just needs more time. This applies to iterative learners such as neural networks (the Keras-style fit below); a plain LinearRegression fits in one pass no matter what.

# Before: Not enough training
model.fit(X_train, y_train, epochs=10)

# After: More training
model.fit(X_train, y_train, epochs=100)

Solution 5: Engineer Better Features

Create new features that capture hidden patterns.

import pandas as pd

# New features derived from the raw columns (assumes a DataFrame df)
df['price_per_sqft'] = df['price'] / df['square_feet']
df['age_category'] = pd.cut(df['age'], bins=[0, 10, 30, 100])
df['location_score'] = df['school_rating'] * df['crime_safety']

How to Fix Overfitting

Your model is memorizing. It needs to calm down.

Solution 1: Get More Training Data

The more examples you have, the harder it is to memorize them all.

# With 100 samples: Easy to memorize
# With 100,000 samples: Must learn patterns instead

# Techniques to get more data:
# - Collect more real data
# - Data augmentation (images: flip, rotate, crop)
# - Synthetic data generation
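
For images specifically, augmentation is usually the cheapest way to multiply your data. A rough sketch with Keras preprocessing layers (assuming TensorFlow 2.6+; the transformation factors are just illustrative):

from tensorflow.keras import Sequential
from tensorflow.keras.layers import RandomFlip, RandomRotation, RandomZoom

# Each training image gets a random transform every epoch,
# so the model never sees exactly the same pixels twice
data_augmentation = Sequential([
    RandomFlip("horizontal"),   # mirror left/right
    RandomRotation(0.1),        # rotate up to ±10% of a full turn
    RandomZoom(0.1),            # zoom in/out by up to 10%
])

# Prepend it to your model; these layers are only active during training
model = Sequential([
    data_augmentation,
    # ... the rest of your network goes here
])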

Solution 2: Simplify Your Model

Use a less complex model.

# Before: Overly complex
model = make_pipeline(PolynomialFeatures(degree=15), LinearRegression())

# After: Simpler
model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())

Solution 3: Add Regularization

Penalize complexity to prevent memorization.

# L2 Regularization (Ridge) - Shrinks coefficients
from sklearn.linear_model import Ridge
model = Ridge(alpha=1.0)

# L1 Regularization (Lasso) - Can zero out features
from sklearn.linear_model import Lasso
model = Lasso(alpha=0.1)

# ElasticNet - Combination of both
from sklearn.linear_model import ElasticNet
model = ElasticNet(alpha=0.1, l1_ratio=0.5)

Solution 4: Use Dropout (Neural Networks)

Randomly ignore neurons during training.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

model = Sequential([
    Dense(128, activation='relu'),
    Dropout(0.3),  # Randomly drop 30% of neurons during training
    Dense(64, activation='relu'),
    Dropout(0.3),
    Dense(1)
])

Solution 5: Early Stopping

Stop training before overfitting kicks in.

from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(
    monitor='val_loss',
    patience=10,  # Stop if no improvement for 10 epochs
    restore_best_weights=True
)

model.fit(X_train, y_train, 
          validation_split=0.2,
          epochs=1000,  # Will stop early
          callbacks=[early_stop])

Solution 6: Cross-Validation

Test on multiple different splits to get honest performance.

from sklearn.model_selection import cross_val_score

# Instead of one train/test split, use 5 different splits
scores = cross_val_score(model, X, y, cv=5)
print(f"Scores: {scores}")
print(f"Mean: {scores.mean():.3f} (+/- {scores.std() * 2:.3f})")

Solution 7: Remove Noisy Features

Less noise = less to memorize.

# Remove features that don't help
# - Features with low correlation to target
# - Features with high correlation to each other
# - Features that are mostly noise

from sklearn.feature_selection import SelectKBest, f_regression

selector = SelectKBest(f_regression, k=10)
X_selected = selector.fit_transform(X, y)

Quick Diagnosis Flowchart

Here's how to diagnose your model in 30 seconds:

Step 1: Check training score

  • If training score is LOW → Underfitting. Stop here.
  • If training score is HIGH → Go to Step 2.

Step 2: Check test score

  • If test score is also HIGH (similar to training) → Good fit!
  • If test score is LOW (much worse than training) → Overfitting.

Step 3: Calculate the gap

  • Gap > 10-15% → Likely overfitting
  • Gap < 5% but both scores low → Underfitting
  • Gap < 5% and both scores high → You're done!
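
If you'd rather automate the 30-second check, here's a tiny helper that encodes the same rules of thumb. The gap thresholds mirror the flowchart above; the 0.85 "high score" cutoff is an arbitrary choice you should adjust to your problem:

def diagnose(train_score, test_score, high=0.85):
    """Rough diagnosis following the flowchart above (scores between 0 and 1)."""
    gap = train_score - test_score
    if train_score < high:
        return "Underfitting: the model hasn't even learned the training data."
    if gap > 0.10:
        return "Overfitting: great on training data, much worse on new data."
    if gap < 0.05 and test_score >= high:
        return "Good fit: small gap, both scores high."
    return "Borderline: keep tuning and re-check the gap."

# Example: the overfitting model's numbers from earlier
print(diagnose(0.987, 0.642))  # -> Overfitting: ...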

The Fix-It Cheat Sheet

Save this for reference:

Underfitting Fixes

| Problem | Solution |
|---|---|
| Too few features | Add more relevant features |
| Model too simple | Use a more complex model |
| Too much regularization | Reduce regularization strength |
| Not enough training | Train for more epochs |
| Poor features | Engineer better features |

Overfitting Fixes

| Problem | Solution |
|---|---|
| Too little data | Collect more data / augment |
| Model too complex | Use a simpler model |
| No regularization | Add L1/L2 regularization |
| Training too long | Use early stopping |
| Too many features | Feature selection / removal |
| Neural network | Add dropout layers |

Real-World Example: Image Classification

Let's see this in a realistic scenario.

The Task

Build a model to classify cats vs dogs.

Attempt 1: Underfitting

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense

# Using a tiny model
model = Sequential([
    Flatten(input_shape=(150, 150, 3)),
    Dense(2, activation='softmax')  # Way too simple!
])

# Results:
# Training accuracy: 62%
# Validation accuracy: 60%
# Diagnosis: UNDERFITTING - model can't learn the patterns

Fix: Add more layers and neurons.

Attempt 2: Overfitting

# Using a huge model with no regularization
model = Sequential([
    Flatten(input_shape=(150, 150, 3)),
    Dense(4096, activation='relu'),
    Dense(4096, activation='relu'),
    Dense(4096, activation='relu'),
    Dense(2, activation='softmax')
])

# Results:
# Training accuracy: 99.8%
# Validation accuracy: 58%
# Diagnosis: OVERFITTING - memorized training images

Fix: Add dropout, regularization, get more data.

Attempt 3: Good Fit

from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout

# Balanced model with regularization
model = Sequential([
    Conv2D(32, (3,3), activation='relu', input_shape=(150, 150, 3)),
    MaxPooling2D(2,2),
    Conv2D(64, (3,3), activation='relu'),
    MaxPooling2D(2,2),
    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.5),
    Dense(2, activation='softmax')
])

# With data augmentation and early stopping

# Results:
# Training accuracy: 92%
# Validation accuracy: 89%
# Diagnosis: GOOD FIT!

The Connection to Bias-Variance

If you've read about bias-variance tradeoff, here's how it connects:

| Concept | Bias | Variance |
|---|---|---|
| Underfitting | HIGH | LOW |
| Overfitting | LOW | HIGH |
| Good fit | LOW | LOW |

  • Underfitting = High bias (wrong assumptions, too simple)
  • Overfitting = High variance (too sensitive to training data)

They're two sides of the same coin!
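
One way to watch the tradeoff happen is to sweep model complexity and record both scores at each step. Here's a sketch using scikit-learn's validation_curve on the same kind of polynomial pipeline as in the earlier code section; the degree range is arbitrary:

import numpy as np
from sklearn.model_selection import validation_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Toy data, same shape of problem as earlier
rng = np.random.RandomState(0)
X = np.linspace(0, 10, 200).reshape(-1, 1)
y = 3 * X.squeeze() + 2 * np.sin(3 * X.squeeze()) + rng.randn(200)

degrees = [1, 2, 4, 8, 12, 16, 20]
model = make_pipeline(PolynomialFeatures(), LinearRegression())

# Cross-validated train/validation score for each polynomial degree
train_scores, val_scores = validation_curve(
    model, X, y,
    param_name="polynomialfeatures__degree",
    param_range=degrees, cv=5)

for d, tr, va in zip(degrees, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"degree={d:2d}  train={tr:.3f}  val={va:.3f}")

# Low degrees: both scores poor (high bias / underfitting).
# Very high degrees: training climbs while validation drops (high variance / overfitting).
# The sweet spot sits somewhere in between.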


Common Mistakes to Avoid

Mistake 1: Only Looking at Training Accuracy

# WRONG: "My model has 99% accuracy!"
print(f"Training accuracy: {model.score(X_train, y_train)}")

# RIGHT: Check both
print(f"Training accuracy: {model.score(X_train, y_train)}")
print(f"Test accuracy: {model.score(X_test, y_test)}")

Mistake 2: Testing on Training Data

# WRONG: Using same data for train and test
model.fit(X, y)
model.score(X, y)  # This is meaningless!

# RIGHT: Proper train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model.fit(X_train, y_train)
model.score(X_test, y_test)  # Honest evaluation

Mistake 3: Data Leakage

from sklearn.preprocessing import StandardScaler

# WRONG: Scaling before splitting
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # Sees ALL data, including the future test rows
X_train, X_test = train_test_split(X_scaled)

# RIGHT: Scale after splitting
X_train, X_test = train_test_split(X)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # Only sees training data
X_test_scaled = scaler.transform(X_test)  # Uses training statistics
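
A tidy way to make this mistake hard to commit is to wrap the scaler and the model in a Pipeline: scikit-learn then re-fits the scaler on the training portion of every split, even inside cross-validation. A minimal sketch, with Ridge standing in for whatever model you actually use and X, y being your features and target:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# The scaler is fit on the training folds only, so the held-out fold
# never leaks into the scaling statistics
pipeline = make_pipeline(StandardScaler(), Ridge(alpha=1.0))

scores = cross_val_score(pipeline, X, y, cv=5)
print(f"Mean CV score: {scores.mean():.3f}")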

Key Takeaways

Let's summarize everything:

  1. Underfitting = Model is too simple, fails on everything
  2. Overfitting = Model memorized training data, fails on new data
  3. Detection = Compare training vs test performance
  4. The gap = Large gap means overfitting, small gap with low scores means underfitting
  5. Fix underfitting = More features, complex model, less regularization
  6. Fix overfitting = More data, simpler model, regularization, dropout

Your Action Items

After reading this article:

  1. Train a model on any dataset
  2. Check both training and test scores
  3. Calculate the gap between them
  4. Diagnose using the cheat sheet
  5. Apply the fix based on the diagnosis
  6. Repeat until you get a good fit

What's Next?

Now that you understand overfitting and underfitting, you're ready to learn:

  • Regularization (L1, L2) — The overfitting killer
  • Cross-validation — Better ways to evaluate
  • Ensemble methods — Combining models for stability
  • Hyperparameter tuning — Finding the sweet spot automatically

Follow me for the next article in this series!


Let's Connect!

If this helped you understand overfitting and underfitting, drop a heart!

Questions? Ask in the comments — I read and respond to every one.

Found a mistake? Let me know. I love improving these articles.


The difference between a junior and senior ML engineer? The senior knows how to diagnose and fix these problems quickly. Now you have that skill too.


Share this with a friend learning ML. These concepts are fundamental, and the right explanation makes all the difference.

Happy learning!
