Understanding Overfitting in Neural Networks and Techniques to Prevent It
Using Fashion-MNIST Experiments
Overfitting is a fundamental challenge when developing neural networks. A model that performs extremely well on the training dataset may fail to generalize to unseen data, leading to poor real-world performance. This post presents a structured investigation of overfitting using the Fashion-MNIST dataset and evaluates several mitigation strategies, including Dropout, L2 Regularisation, and Early Stopping.
All experiments, code, and plots in this post are taken directly from the accompanying notebook.
Dataset Overview: Fashion-MNIST
The Fashion-MNIST dataset contains:
- 60,000 training images
- 10,000 test images
- 28×28 grayscale images
- 10 output classes
A significantly smaller subset of the training data is intentionally used to make overfitting behaviour more visible.
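The notebook's data-preparation cell is not reproduced here, so below is a minimal sketch of how such a subset can be built. The subset size of 5,000 is an assumption for illustration, not a value taken from the notebook:

```python
import tensorflow as tf
from tensorflow import keras

# Load Fashion-MNIST and scale pixel values to [0, 1].
(x_train, y_train), (x_test, y_test) = keras.datasets.fashion_mnist.load_data()
x_train = x_train[..., None].astype("float32") / 255.0
x_test = x_test[..., None].astype("float32") / 255.0

# Keep only a small slice of the training data so that overfitting
# becomes visible within a few epochs.
SUBSET_SIZE = 5000  # assumed value
x_train_small = x_train[:SUBSET_SIZE]
y_train_small = y_train[:SUBSET_SIZE]
```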
🧠 Model Architecture Used Throughout
All experiments share the same CNN architecture, with optional L2 regularisation and Dropout:
```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

def create_cnn_model(l2_lambda=0.0, dropout_rate=0.0):
    """Small CNN; L2 and Dropout are disabled when their arguments are 0."""
    model = keras.Sequential([
        layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1),
                      kernel_regularizer=regularizers.l2(l2_lambda)),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation='relu',
                      kernel_regularizer=regularizers.l2(l2_lambda)),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(64, activation='relu',
                     kernel_regularizer=regularizers.l2(l2_lambda)),
        layers.Dropout(dropout_rate),
        layers.Dense(10, activation='softmax')
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"]
    )
    return model
```
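As a quick sanity check, the model can be instantiated and summarised. The parameter count in the comment is derived from the layer sizes above, not quoted from the notebook:

```python
model = create_cnn_model(l2_lambda=0.001, dropout_rate=0.5)
model.summary()  # ~122k trainable parameters for this architecture
```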
Plotting Function
All performance diagrams were generated using the following utility:
```python
import matplotlib.pyplot as plt

def plot_history(history, title_prefix=""):
    """Plot training/validation loss and accuracy side by side."""
    hist = history.history
    plt.figure(figsize=(12, 5))

    plt.subplot(1, 2, 1)
    plt.plot(hist["loss"], label="Train Loss")
    plt.plot(hist["val_loss"], label="Val Loss")
    plt.title(f"{title_prefix} Loss")
    plt.legend()

    plt.subplot(1, 2, 2)
    plt.plot(hist["accuracy"], label="Train Accuracy")
    plt.plot(hist["val_accuracy"], label="Val Accuracy")
    plt.title(f"{title_prefix} Accuracy")
    plt.legend()

    plt.tight_layout()
    plt.show()
```
1. Baseline Model (No Regularisation)
```python
baseline_model = create_cnn_model(l2_lambda=0.0, dropout_rate=0.0)
history_baseline = baseline_model.fit(
    x_train_small, y_train_small,
    validation_split=0.2,
    epochs=20
)
plot_history(history_baseline, title_prefix="Baseline (no regularisation)")
```

Figure: Baseline (no regularisation) loss and accuracy curves.
Observations
- Training accuracy continues to increase steadily.
- Validation accuracy peaks early and then declines.
- Training loss decreases, while validation loss increases.
➡ This is clear evidence of overfitting.
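One way to quantify this (a sketch using the `history_baseline` object above; exact numbers vary from run to run) is to compare final-epoch training and validation accuracy:

```python
# A large train/validation gap at the final epoch is a simple
# numeric signature of overfitting.
train_acc = history_baseline.history["accuracy"][-1]
val_acc = history_baseline.history["val_accuracy"][-1]
print(f"train: {train_acc:.3f}  val: {val_acc:.3f}  gap: {train_acc - val_acc:.3f}")
```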
2. Dropout (0.5 Rate)
```python
dropout_model = create_cnn_model(dropout_rate=0.5)
history_dropout = dropout_model.fit(
    x_train_small, y_train_small,
    validation_split=0.2,
    epochs=20
)
plot_history(history_dropout, title_prefix="Dropout (0.5)")
```

Figure: Dropout (0.5) loss and accuracy curves.
Observations
- Training accuracy increases more slowly (expected due to Dropout).
- Validation accuracy tracks the training curve more closely.
- Divergence between training and validation loss is significantly reduced.
➡ Dropout is highly effective in this experiment, producing noticeably improved generalisation.
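Keras applies (inverted) Dropout only in training mode, which is why the training curve is dampened while inference is unaffected. A small sketch, not from the notebook, makes this visible:

```python
import numpy as np

x = np.ones((1, 28, 28, 1), dtype="float32")

# Training-mode calls drop units at random, so repeated calls differ;
# inference-mode calls are deterministic because Dropout is a no-op there.
print(dropout_model(x, training=True)[0, :3])
print(dropout_model(x, training=True)[0, :3])
print(dropout_model(x, training=False)[0, :3])
```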
🧱 3. L2 Regularisation (λ = 0.001)
```python
l2_model = create_cnn_model(l2_lambda=0.001)
history_l2 = l2_model.fit(
    x_train_small, y_train_small,
    validation_split=0.2,
    epochs=20
)
plot_history(history_l2, title_prefix="L2 Regularisation")
```

Figure: L2 regularisation loss and accuracy curves.
Observations
- Training loss is noticeably higher due to weight penalisation.
- Validation loss trends are more stable compared to the baseline.
- Validation accuracy improves moderately.
➡ L2 regularisation produces smoother learning dynamics and alleviates overfitting, though its impact is milder than Dropout in this setup.
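Under the hood, each `kernel_regularizer` adds a penalty of the form λ·Σw² to the training objective; Keras tracks these terms in `model.losses`, which is why the reported training loss sits higher than the baseline's. A quick check (a sketch, assuming the trained `l2_model` above):

```python
# One penalty tensor per regularised layer (three in this architecture).
print("penalty terms:", len(l2_model.losses))
print("total L2 penalty:", float(tf.add_n(l2_model.losses)))
```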
⏳ 4. Early Stopping
```python
earlystop_model = create_cnn_model()
early_stop = keras.callbacks.EarlyStopping(
    monitor='val_loss',
    patience=3,
    restore_best_weights=True
)
history_early = earlystop_model.fit(
    x_train_small, y_train_small,
    validation_split=0.2,
    epochs=20,
    callbacks=[early_stop]
)
plot_history(history_early, title_prefix="Early Stopping")
```

Figure: Early Stopping loss and accuracy curves.
Observations
- Training terminates once validation loss has failed to improve for three consecutive epochs (patience=3).
- Avoids the late-epoch overfitting seen in the baseline.
- Produces one of the cleanest validation curves among all models.
➡ Early stopping is a simple and effective generalisation technique.
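Because `restore_best_weights=True`, the model ends up with the weights from the best validation-loss epoch rather than the last epoch executed. The number of epochs actually run can be read off the history (a sketch; the count depends on the run):

```python
# Training stops once val_loss has failed to improve for `patience` epochs.
epochs_run = len(history_early.history["loss"])
print(f"stopped after {epochs_run} of 20 epochs (patience=3)")
```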
📦 (Optional) TensorFlow Lite Conversion
```python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_keras_model(baseline_model)
# Enable default (dynamic-range) quantisation; without this line the
# converted model would be a plain float32 TFLite model.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()
print("Quantised model size (bytes):", len(tflite_model))
```
This step demonstrates model size reduction for deployment purposes, although it is not a regularisation strategy.
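To deploy the converted model, the flatbuffer is written to disk and loaded by the TFLite interpreter on the target device; a minimal sketch (the file name is an assumption):

```python
from pathlib import Path

# Persist the flatbuffer; the TFLite interpreter loads this file directly.
Path("fashion_mnist.tflite").write_bytes(tflite_model)  # assumed file name
```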
🧾 Conclusion
The experimental results highlight the following:
- The baseline model exhibits clear overfitting.
- Dropout provides the largest improvement in validation behaviour.
- L2 regularisation helps stabilise training dynamics.
- Early Stopping prevents late-epoch divergence and improves generalisation.
Combining Dropout + Early Stopping produces the most robust performance on the reduced Fashion-MNIST dataset.
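For readers who want to reproduce that combined setup, a minimal sketch (this exact cell is not shown in the notebook):

```python
combined_model = create_cnn_model(dropout_rate=0.5)
history_combined = combined_model.fit(
    x_train_small, y_train_small,
    validation_split=0.2,
    epochs=20,
    callbacks=[keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=3, restore_best_weights=True)]
)
plot_history(history_combined, title_prefix="Dropout + Early Stopping")
```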