Stephen Ogundero
Understanding Overfitting in Neural Networks (TensorFlow CNN)

πŸ“˜ Understanding Overfitting in Neural Networks and Techniques to Prevent It

Using Fashion-MNIST Experiments

Overfitting is a fundamental challenge when developing neural networks. A model that performs extremely well on the training dataset may fail to generalize to unseen data, leading to poor real-world performance. This post presents a structured investigation of overfitting using the Fashion-MNIST dataset and evaluates several mitigation strategies, including Dropout, L2 Regularisation, and Early Stopping.

All experiments, code, and plots in this post are taken directly from the accompanying notebook.


πŸ“‚ Dataset Overview: Fashion-MNIST

The Fashion-MNIST dataset contains:

  • 60,000 training images
  • 10,000 test images
  • 28Γ—28 grayscale format
  • 10 output classes

A significantly smaller subset of the training data is intentionally used to make overfitting behaviour more visible.
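For reference, here is a minimal sketch of how such a subset can be prepared. The subset size of 5,000 is an assumption for illustration; the accompanying notebook may use a different value:

import numpy as np
from tensorflow import keras

# Load Fashion-MNIST directly from Keras
(x_train, y_train), (x_test, y_test) = keras.datasets.fashion_mnist.load_data()

# Scale pixel values to [0, 1] and add a channel dimension for the CNN
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0
x_train = x_train[..., np.newaxis]   # shape: (60000, 28, 28, 1)
x_test = x_test[..., np.newaxis]

# Keep only a small slice of the training set to make overfitting visible
SUBSET_SIZE = 5_000  # assumed value, not taken from the notebook
x_train_small = x_train[:SUBSET_SIZE]
y_train_small = y_train[:SUBSET_SIZE]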


🧠 Model Architecture Used Throughout

All experiments share the same CNN architecture, with optional L2 regularisation and Dropout:

from tensorflow import keras
from tensorflow.keras import layers, regularizers

def create_cnn_model(l2_lambda=0.0, dropout_rate=0.0):
    """Build a small CNN; L2 and Dropout are disabled by default."""
    model = keras.Sequential([
        layers.Conv2D(32, (3,3), activation='relu', input_shape=(28,28,1),
                      kernel_regularizer=regularizers.l2(l2_lambda)),
        layers.MaxPooling2D((2,2)),
        layers.Conv2D(64, (3,3), activation='relu',
                      kernel_regularizer=regularizers.l2(l2_lambda)),
        layers.MaxPooling2D((2,2)),
        layers.Flatten(),
        layers.Dense(64, activation='relu',
                     kernel_regularizer=regularizers.l2(l2_lambda)),
        layers.Dropout(dropout_rate),  # no-op when dropout_rate == 0.0
        layers.Dense(10, activation='softmax')
    ])

    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # integer class labels
        metrics=["accuracy"]
    )
    return model

πŸ“Š Plotting Function

All performance diagrams were generated using the following utility:

def plot_history(history, title_prefix=""):
    hist = history.history
    plt.figure(figsize=(12,5))

    plt.subplot(1,2,1)
    plt.plot(hist["loss"], label="Train Loss")
    plt.plot(hist["val_loss"], label="Val Loss")
    plt.title(f"{title_prefix} Loss")
    plt.legend()

    plt.subplot(1,2,2)
    plt.plot(hist["accuracy"], label="Train Accuracy")
    plt.plot(hist["val_accuracy"], label="Val Accuracy")
    plt.title(f"{title_prefix} Accuracy")
    plt.legend()

    plt.tight_layout()
    plt.show()

πŸ” 1. Baseline Model (No Regularisation)

baseline_model = create_cnn_model(l2_lambda=0.0, dropout_rate=0.0)
history_baseline = baseline_model.fit(
    x_train_small, y_train_small,
    validation_split=0.2,
    epochs=20
)
plot_history(history_baseline, title_prefix="Baseline (no regularisation)")

Baseline Performance Plot

Observations

  • Training accuracy continues to increase steadily.
  • Validation accuracy peaks early and then declines.
  • Training loss decreases, while validation loss increases.

➑ This is clear evidence of overfitting.
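One way to make this concrete is to compare the final training and validation metrics from the History object. A small sketch; the exact numbers depend on the run:

hist = history_baseline.history
gap = hist["accuracy"][-1] - hist["val_accuracy"][-1]
print(f"Final train accuracy: {hist['accuracy'][-1]:.3f}")
print(f"Final val accuracy:   {hist['val_accuracy'][-1]:.3f}")
print(f"Generalisation gap:   {gap:.3f}")  # a large positive gap signals overfitting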


πŸ›  2. Dropout (rate = 0.5)

dropout_model = create_cnn_model(dropout_rate=0.5)
history_dropout = dropout_model.fit(
    x_train_small, y_train_small,
    validation_split=0.2,
    epochs=20
)
plot_history(history_dropout, title_prefix="Dropout (0.5)")

Dropout Plot

Observations

  • Training accuracy increases more slowly (expected due to Dropout).
  • Validation accuracy tracks the training curve more closely.
  • Divergence between training and validation loss is significantly reduced.

➑ Dropout is highly effective in this experiment, producing noticeably improved generalisation.
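Worth noting: Keras applies Dropout only during training. At inference the layer passes activations through unchanged (inverted dropout already rescales at training time), so no manual correction is needed. A quick sketch to verify this, assuming the x_train_small subset from earlier:

import numpy as np

sample = x_train_small[:1]
# With training=True the Dropout mask is applied, so outputs vary per call;
# with training=False (the default at inference) they are deterministic.
print(np.allclose(dropout_model(sample, training=True),
                  dropout_model(sample, training=True)))   # usually False
print(np.allclose(dropout_model(sample, training=False),
                  dropout_model(sample, training=False)))  # True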


🧱 3. L2 Regularisation (λ = 0.001)

l2_model = create_cnn_model(l2_lambda=0.001)
history_l2 = l2_model.fit(
    x_train_small, y_train_small,
    validation_split=0.2,
    epochs=20
)
plot_history(history_l2, title_prefix="L2 Regularisation")

L2 Plot

Observations

  • Training loss is noticeably higher due to weight penalisation.
  • Validation loss trends are more stable compared to the baseline.
  • Validation accuracy improves moderately.

➑ L2 regularisation produces smoother learning dynamics and alleviates overfitting, though its impact is milder than Dropout in this setup.
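Each kernel_regularizer adds its penalty (Ξ» times the sum of squared weights) to the training loss, and Keras exposes these terms through model.losses. A small sketch to inspect how much the penalty contributes:

import tensorflow as tf

# The training loss is the cross-entropy plus the sum of these penalty terms.
l2_penalty = tf.add_n(l2_model.losses)
print("L2 penalty contribution to the loss:", float(l2_penalty))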


⏳ 4. Early Stopping

earlystop_model = create_cnn_model()
early_stop = keras.callbacks.EarlyStopping(
    monitor='val_loss',
    patience=3,
    restore_best_weights=True
)

history_early = earlystop_model.fit(
    x_train_small, y_train_small,
    validation_split=0.2,
    epochs=20,
    callbacks=[early_stop]
)
plot_history(history_early, title_prefix="Early Stopping")

Early Stopping Plot

Observations

  • Training terminates after validation loss stops improving.
  • Avoids the late-epoch overfitting seen in the baseline.
  • Produces one of the cleanest validation curves among all models.

➑ Early stopping is a simple and effective generalisation technique.
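Because restore_best_weights=True, the model keeps the weights from its best validation epoch rather than the last one. The History object shows how early training actually stopped. A sketch:

import numpy as np

epochs_run = len(history_early.history["val_loss"])
best_epoch = int(np.argmin(history_early.history["val_loss"])) + 1
print(f"Stopped after {epochs_run} epochs; best validation loss at epoch {best_epoch}")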


πŸ“¦ (Optional) TensorFlow Lite Conversion

import tensorflow as tf

# Plain float32 conversion; no quantisation is applied at this point.
converter = tf.lite.TFLiteConverter.from_keras_model(baseline_model)
tflite_model = converter.convert()
print("TFLite model size (bytes):", len(tflite_model))

This step demonstrates model size reduction for deployment purposes, although it is not a regularisation strategy.
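To actually quantise the model and shrink it further, dynamic-range quantisation can be enabled on the converter. A sketch, not part of the original experiments:

converter = tf.lite.TFLiteConverter.from_keras_model(baseline_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # dynamic-range quantisation
quantised_model = converter.convert()
print("Quantised model size (bytes):", len(quantised_model))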


🧾 Conclusion

The experimental results highlight the following:

  • The baseline model exhibits clear overfitting.
  • Dropout provides the largest improvement in validation behaviour.
  • L2 regularisation helps stabilise training dynamics.
  • Early Stopping prevents late-epoch divergence and improves generalisation.

Combining Dropout + Early Stopping produces the most robust performance on the reduced Fashion-MNIST dataset.
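A minimal sketch of that combination, reusing create_cnn_model and plot_history from above:

combined_model = create_cnn_model(dropout_rate=0.5)
history_combined = combined_model.fit(
    x_train_small, y_train_small,
    validation_split=0.2,
    epochs=20,
    callbacks=[keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=3, restore_best_weights=True)]
)
plot_history(history_combined, title_prefix="Dropout + Early Stopping")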

