pythonassignmenthelp.com

Why My First Convolutional Neural Network Kept Overfitting (And How I Fixed It)

You’re staring at your assignment deadline, watching your Convolutional Neural Network (CNN) run through epochs. The accuracy on your training set climbs to almost 100%—you’re feeling good. But then you check the validation set accuracy and your heart sinks: it’s barely above random guessing. If you’ve been here, trust me, you’re not alone. When I built my first CNN for image classification, I thought high training accuracy was all I needed. Turns out, that’s usually a sign you’re overfitting—memorizing the training data instead of learning to generalize.

This is the story of how I spotted overfitting in my first image classifier, what I did to fix it, and how you can save yourself a lot of frustration by catching these issues early.

What Overfitting Looks Like (And Why It Happens)

Before we jump into code, let’s break down what’s going wrong.

Overfitting is when your model does great on the data it’s seen (training data), but poorly on new, unseen data (validation or test data). Imagine studying only past exam questions, then facing totally new ones on the real test—you’d get stuck, because you never learned the underlying concepts.

CNNs are powerful, but if you’re not careful, they’ll just memorize the training images. This happens especially when:

  • Your dataset is small
  • Your model is too complex (too many parameters)
  • You train for too long (too many epochs)
  • There’s not enough variety in your training data

Here’s what this looked like for me:

  • Training accuracy: 99%
  • Validation accuracy: 55%
  • Validation loss: Actually going up as training loss went down

Sound familiar? If so, here’s what you can do.

Step 1: Visualize the Problem

The first thing I learned is that staring at loss and accuracy numbers isn’t enough. It helps a lot to plot them.

Code Example: Plotting Training vs Validation Performance

import matplotlib.pyplot as plt

# model.fit returns a History object with per-epoch metrics for plotting
history = model.fit(
    train_images, train_labels,
    epochs=20,
    validation_data=(val_images, val_labels)
)

# Plot loss
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.legend()
plt.title('Loss over epochs')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.show()

# Plot accuracy
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.legend()
plt.title('Accuracy over epochs')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.show()

What to look for:

If your training loss is dropping steadily but your validation loss starts increasing after a few epochs, you’re overfitting. Similarly, if training accuracy skyrockets while validation accuracy plateaus or drops, that’s another red flag.

This visualization step is crucial. I missed it at first—don’t make my mistake.
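
If you’d rather have a number than eyeball the curves, here’s a small sketch (using the same history object from the code above) that reports the epoch with the lowest validation loss and the final gap between training and validation accuracy:

import numpy as np

# Epoch (1-indexed) where validation loss bottomed out before rising again
best_epoch = int(np.argmin(history.history['val_loss'])) + 1
print(f"Lowest validation loss at epoch {best_epoch}")

# A large, growing gap between these two numbers is the classic overfitting signal
gap = history.history['accuracy'][-1] - history.history['val_accuracy'][-1]
print(f"Final train/val accuracy gap: {gap:.2f}")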

Step 2: Simplify Your Model

My first instinct was to build a big, deep network because “more layers = better,” right? Honestly, that’s usually not true, especially with small datasets.

Here’s an example of a simple CNN architecture that’s much less prone to overfitting:

Code Example: A Smaller, Simpler CNN

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(16, (3,3), activation='relu', input_shape=(32, 32, 3)),
    layers.MaxPooling2D((2,2)),
    layers.Conv2D(32, (3,3), activation='relu'),
    layers.MaxPooling2D((2,2)),
    layers.Flatten(),
    layers.Dense(32, activation='relu'),  # Fewer dense units
    layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

Notice how there are only two convolutional layers and a small dense layer. When I switched to something like this, my model started generalizing better, especially on small datasets.

Why does this help?

With fewer parameters, the network is forced to learn general patterns, not memorize every pixel.
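
You can check this for yourself: Keras will print the parameter count for whatever model you’ve built, so you can compare your original architecture against the smaller one above.

model.summary()              # per-layer output shapes and parameter counts
print(model.count_params())  # total parameter count as a single number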

Step 3: Use Data Augmentation

If your dataset is small, your model doesn’t see enough variety. Data augmentation creates new, slightly altered copies of your existing images—rotating, flipping, zooming, etc.

Code Example: Adding Data Augmentation

from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=15,      # Randomly rotate images by up to 15 degrees
    width_shift_range=0.1,  # Shift images horizontally by 10%
    height_shift_range=0.1, # Shift images vertically by 10%
    horizontal_flip=True    # Randomly flip images horizontally
)

# datagen.fit is only needed for options that compute dataset statistics
# (e.g. featurewise_center); with the settings above it's optional
datagen.fit(train_images)

# Use the generator in model training
model.fit(
    datagen.flow(train_images, train_labels, batch_size=32),
    epochs=20,
    validation_data=(val_images, val_labels)
)

After adding augmentation, my validation accuracy stopped flatlining. Now my model was seeing more diverse examples, and it wasn’t so easy for it to memorize the dataset.

Why does this help?

Augmentation makes it harder for the model to just memorize, and it exposes the network to more realistic “noise” and variety.
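
If you’re curious what the generator actually produces, here’s a quick sketch (assuming the datagen and arrays defined above; the uint8 cast assumes pixel values in the 0–255 range) that plots one augmented batch:

import matplotlib.pyplot as plt

# Pull a single augmented batch and display the first nine images
augmented_batch, _ = next(datagen.flow(train_images, train_labels, batch_size=9))

plt.figure(figsize=(6, 6))
for i in range(9):
    plt.subplot(3, 3, i + 1)
    plt.imshow(augmented_batch[i].astype('uint8'))  # drop the cast if your images are 0-1 floats
    plt.axis('off')
plt.suptitle('Augmented training samples')
plt.show()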

Step 4: Add Regularization

There are a couple of ways to “penalize” your model for getting too confident or complex:

  • Dropout: Randomly sets some activations to zero during training, so the network can’t rely on any single feature.
  • L2 regularization: Adds a penalty to large weights in the network (there’s a short example after the dropout snippet below).

Here’s how you add dropout:

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(16, (3,3), activation='relu', input_shape=(32, 32, 3)),
    layers.MaxPooling2D((2,2)),
    layers.Conv2D(32, (3,3), activation='relu'),
    layers.MaxPooling2D((2,2)),
    layers.Flatten(),
    layers.Dense(32, activation='relu'),
    layers.Dropout(0.5),  # Drops 50% of nodes randomly during training
    layers.Dense(10, activation='softmax')
])

When I first started, I skipped dropout because I didn’t understand what it was for. After adding it, my model finally stopped overfitting so badly.
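
If you want to try L2 regularization as well, here’s a minimal sketch on the same small architecture; the 0.01 penalty is just a common starting point, not a value from my original project:

from tensorflow.keras import layers, models, regularizers

model = models.Sequential([
    layers.Conv2D(16, (3,3), activation='relu', input_shape=(32, 32, 3)),
    layers.MaxPooling2D((2,2)),
    layers.Conv2D(32, (3,3), activation='relu'),
    layers.MaxPooling2D((2,2)),
    layers.Flatten(),
    # kernel_regularizer adds a loss penalty proportional to the squared weights
    layers.Dense(32, activation='relu',
                 kernel_regularizer=regularizers.l2(0.01)),
    layers.Dropout(0.5),
    layers.Dense(10, activation='softmax')
])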

Step 5: Use Early Stopping

This is a safety net: if your validation loss starts increasing, you can automatically stop training before things get worse.

from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)

model.fit(
    datagen.flow(train_images, train_labels, batch_size=32),
    epochs=50,
    validation_data=(val_images, val_labels),
    callbacks=[early_stop]
)

What’s going on here?

  • monitor='val_loss': Watch the validation loss
  • patience=3: Stop only after the validation loss has failed to improve for 3 epochs in a row
  • restore_best_weights=True: Roll back to the weights from the best epoch

This helped me avoid wasting time and over-training.
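
One thing worth checking afterwards (assuming you assign the return value of model.fit to history, as in Step 1) is how long training actually ran:

# The history only contains the epochs that were actually executed
print(f"Training ran for {len(history.history['loss'])} of the 50 requested epochs")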

Step 6: Check Your Data

This sounds basic, but I’ve tutored students who spent hours tweaking their model when their data was actually the problem—labels were swapped, or images were corrupted. Always check a few samples and labels visually.
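
Here’s the kind of quick check I mean (assuming the same train_images and train_labels arrays as above; the uint8 cast assumes 0–255 pixel values):

import numpy as np
import matplotlib.pyplot as plt

# Look at one image alongside its raw label to make sure they actually match
plt.imshow(train_images[0].astype('uint8'))
plt.title(f"Label: {train_labels[0]}")
plt.axis('off')
plt.show()

# Confirm every class is present and the counts look roughly balanced
print(np.unique(train_labels, return_counts=True))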

If you’re stuck on a similar Deep Learning project, this resource has helped students work through these concepts with more example projects and code walkthroughs. Sometimes seeing another full project helps everything click.

Common Mistakes Students Make

Even after learning about overfitting, I still fell into some traps. Here are a few I see most often:

  1. Ignoring validation loss:

    Don’t just look at training accuracy. If you never check validation performance, you’ll think your model is great—until it fails in the real world.

  2. Training for too many epochs:

    More training isn’t always better. If your validation loss starts increasing, stop training or use early stopping.

  3. Skipping data augmentation:

    Especially with small datasets, not augmenting your data almost guarantees overfitting. Even simple flips and rotations can help a lot.

Key Takeaways

  • Overfitting is when your model memorizes training data but can’t handle new data—watch for a gap between training and validation accuracy/loss.
  • Use simple model architectures first, especially with small datasets.
  • Data augmentation and dropout are two of the easiest, most effective ways to prevent overfitting.
  • Always plot and compare both training and validation performance during training.
  • Early stopping can prevent wasted time and worse overfitting.

Building good models takes practice, but every frustrating bug is a chance to learn. Don’t be discouraged—everyone hits the overfitting wall at some point. Keep experimenting, and you’ll break through.


Want more Deep Learning tutorials and project walkthroughs? Check out https://pythonassignmenthelp.com/programming-help/deep-learning.
