🧠 Understanding CNN Generalisation with Data Augmentation (TensorFlow – CIFAR-10)

Maxwell Ororho — Wed, 25 Mar 2026 19:55:20 +0000

📘 Data Augmentation in CNNs and Its Impact on Generalisation (Using CIFAR-10 Experiments)

Data augmentation is widely used when training convolutional neural networks, especially for image classification tasks.

The idea is simple: by transforming training images (rotating, flipping, or shifting them), we introduce more variation into the data and help the model generalise better to images it has not seen.

However, one question that is often overlooked is:

👉 Does more augmentation always improve performance?

In this post, I investigate how different levels of data augmentation affect a CNN trained on the CIFAR-10 dataset.

All experiments, code, and plots shown here are taken directly from my notebook.

📂 Dataset Overview: CIFAR-10

The CIFAR-10 dataset contains:

60,000 colour images (50,000 for training, 10,000 for testing)
10 output classes
32×32 resolution
A balanced distribution across classes

One key detail is the image resolution.

At 32×32 pixels, fine details are limited, and some classes (like cats and dogs) can look very similar. This becomes important when analysing model performance later.

⚙️ Data Preparation

Before training, the dataset was preprocessed to ensure stable learning.

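# Imports used in this section
from sklearn.model_selection import train_test_split
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical

# Load the CIFAR-10 dataset
(x_train_full, y_train_full), (x_test, y_test) = cifar10.load_data()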

# Define the class names
class_names = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck"
]

# Scale pixel values to the range [0, 1]
x_train_full = x_train_full.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0

# Convert class labels to one-hot encoded format
y_train_full_cat = to_categorical(y_train_full, 10)
y_test_cat = to_categorical(y_test, 10)

# Split the training data into training and validation sets
x_train, x_val, y_train_cat, y_val_cat, y_train, y_val = train_test_split(
    x_train_full,
    y_train_full_cat,
    y_train_full,
    test_size=0.2,
    random_state=42,
    stratify=y_train_full
)

# Print dataset shapes
print("Training set shape:", x_train.shape)
print("Validation set shape:", x_val.shape)
print("Test set shape:", x_test.shape)

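Pixel values are scaled from integers in [0, 255] to floats in [0, 1]
Labels are converted into one-hot vectors of length 10
The 50,000 original training images are split 80/20 into training (40,000) and validation (10,000) sets, stratified by class

Scaling keeps input magnitudes small for stable optimisation, a held-out validation set is available for monitoring training, and the 10,000 test images are reserved for the final comparison.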
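To see what the scaling and one-hot steps do without downloading the dataset, here is a small NumPy-only sketch on a synthetic batch. The random images and labels are placeholders, and `np.eye(10)[labels]` is used here as a stand-in for Keras's `to_categorical`; for integer class labels both produce the same one-hot rows.

```python
import numpy as np

# Synthetic stand-in for a CIFAR-10 batch: 4 RGB images of 32x32 uint8 pixels
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(4, 32, 32, 3)).astype(np.uint8)
labels = np.array([[3], [5], [0], [9]])  # same (N, 1) shape as cifar10.load_data() labels

# Scale pixels from [0, 255] to [0, 1]
scaled = images.astype("float32") / 255.0

# One-hot encode: row i of the 10x10 identity matrix is the vector for class i
one_hot = np.eye(10, dtype="float32")[labels.ravel()]

print(scaled.min() >= 0.0, scaled.max() <= 1.0)  # True True
print(one_hot.shape)                             # (4, 10)
print(one_hot.argmax(axis=1))                    # [3 5 0 9]
```

Taking `argmax` of a one-hot row recovers the original class index, which is also how predicted probabilities are turned back into class labels at evaluation time.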

🧠 Model Architecture Used

All experiments use the same CNN architecture to ensure a fair comparison.

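from tensorflow.keras.layers import (
    BatchNormalization, Conv2D, Dense, Dropout, Flatten, MaxPooling2D
)
from tensorflow.keras.models import Sequential

def build_cnn_model():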
    model = Sequential([
        Conv2D(
            32, (3, 3), activation="relu",
            padding="same", input_shape=(32, 32, 3)
        ),
        BatchNormalization(),
        MaxPooling2D((2, 2)),

        Conv2D(64, (3, 3), activation="relu", padding="same"),
        BatchNormalization(),
        MaxPooling2D((2, 2)),

        Conv2D(128, (3, 3), activation="relu", padding="same"),
        BatchNormalization(),
        MaxPooling2D((2, 2)),

        Flatten(),
        Dense(128, activation="relu"),
        Dropout(0.5),
        Dense(10, activation="softmax")
    ])

    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy",
        metrics=["accuracy"]
    )

    return model

# Training settings shared by all three experiments
epochs = 15
batch_size = 64
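To see where this model's capacity sits, the feature-map shapes and parameter counts can be traced by hand. This plain-Python sketch mirrors the layer list above, assuming "same" padding (each Conv2D keeps the spatial size) and 2×2 pooling (each MaxPooling2D halves it); the helper names are hypothetical, and `model.summary()` on the real model should report the same totals.

```python
# Trace feature-map shapes and parameter counts through build_cnn_model()

def conv_params(kernel, in_ch, out_ch):
    # kernel x kernel x in_ch weights per filter, plus one bias per filter
    return kernel * kernel * in_ch * out_ch + out_ch

def bn_params(ch):
    # gamma and beta (trainable) + moving mean and variance (non-trainable)
    return 4 * ch

def dense_params(n_in, n_out):
    # weight matrix plus one bias per output unit
    return n_in * n_out + n_out

size, channels, total = 32, 3, 0
for filters in (32, 64, 128):
    total += conv_params(3, channels, filters) + bn_params(filters)
    channels = filters
    size //= 2  # MaxPooling2D((2, 2)) halves height and width
    print(f"after block with {filters} filters: {size}x{size}x{channels}")

flat = size * size * channels  # Flatten(); Dropout adds no parameters
total += dense_params(flat, 128) + dense_params(128, 10)
print("flattened features:", flat)
print("total parameters:", total)
```

Three poolings shrink 32×32 to 4×4, so Flatten produces 4 × 4 × 128 = 2,048 features and the model has 357,706 parameters in total. The Dense(128) layer alone holds 262,272 of them (roughly 73%), and 448 of the BatchNormalization parameters (the moving means and variances) are non-trainable.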

🔍 Experiment Setup

Three models were trained:

Baseline → No augmentation
Light augmentation → small transformations
Strong augmentation → larger transformations

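Only the augmentation changes between the three runs (same architecture, 15 epochs, batch size 64), so differences in test accuracy can be attributed mainly to augmentation.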

🛠 Augmentation Setup

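from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Create a light augmentation generator
light_datagen = ImageDataGenerator(
    rotation_range=10,        # random rotation of up to ±10 degrees
    width_shift_range=0.1,    # horizontal shift of up to 10% of image width
    height_shift_range=0.1,   # vertical shift of up to 10% of image height
    horizontal_flip=True      # randomly mirror images left-right
)

# Create a stronger augmentation generator
strong_datagen = ImageDataGenerator(
    rotation_range=20,
    width_shift_range=0.15,
    height_shift_range=0.15,
    zoom_range=0.2,           # random zoom factor in [0.8, 1.2]
    horizontal_flip=True
)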

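These parameters control how strongly each training image is transformed. Compared with the light generator, the strong one doubles the rotation range, widens the shift ranges from 0.1 to 0.15, and adds random zoom.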

Smaller values → subtle variation
Larger values → stronger distortion
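To make "smaller vs larger values" concrete: on a 32×32 image, a shift range of 0.1 allows shifts of up to 3.2 pixels, while 0.15 allows up to 4.8 pixels. The NumPy sketch below imitates only the shift and flip parts, under simplifying assumptions: shifts are rounded down to whole pixels, and uncovered borders are filled by repeating edge pixels (similar in spirit to the generator's default `fill_mode="nearest"`). Rotation, zoom, and interpolation are omitted, and `random_shift_flip` is a hypothetical helper, not a Keras function.

```python
import numpy as np

def random_shift_flip(image, shift_frac, rng):
    """Simplified stand-in for width/height shifts plus horizontal_flip.

    Shifts are whole pixels drawn from [-int(shift_frac * size), +int(shift_frac * size)];
    uncovered border pixels repeat the nearest edge pixel.
    """
    h, w, _ = image.shape
    max_dy, max_dx = int(shift_frac * h), int(shift_frac * w)
    dy = int(rng.integers(-max_dy, max_dy + 1))
    dx = int(rng.integers(-max_dx, max_dx + 1))
    # Pad by the maximum shift using edge replication, then crop a shifted window
    padded = np.pad(image, ((max_dy, max_dy), (max_dx, max_dx), (0, 0)), mode="edge")
    top, left = max_dy - dy, max_dx - dx
    shifted = padded[top:top + h, left:left + w]
    if rng.random() < 0.5:
        shifted = shifted[:, ::-1]  # horizontal flip: reverse the width axis
    return shifted

rng = np.random.default_rng(42)
image = rng.random((32, 32, 3)).astype("float32")
light = random_shift_flip(image, 0.10, rng)   # whole-pixel shifts of up to 3 px
strong = random_shift_flip(image, 0.15, rng)  # whole-pixel shifts of up to 4 px
print(light.shape, strong.shape)              # (32, 32, 3) (32, 32, 3)
```

Even the "strong" shift moves content by about 15% of the frame, which on a 32×32 image can push small objects partly out of view. This is one plausible reason stronger settings hurt more at this resolution.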

📊 Results

Test accuracy comparison across models

Observations

Baseline → 0.752
Light augmentation → 0.750
Strong augmentation → 0.692

Interpretation

Light augmentation had almost no effect (0.750 vs 0.752)
Strong augmentation reduced test accuracy by about 6 percentage points (0.692)

➡ More augmentation does not always mean better performance.
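One way to check whether these gaps exceed test-set noise is a quick binomial standard-error calculation. This is a rough sketch that is not part of the original notebook. It assumes the standard 10,000-image CIFAR-10 test set and treats each accuracy as a single measurement. It ignores run-to-run variation from weight initialisation and random augmentation, which would require repeated training runs to estimate.

```python
import math

N_TEST = 10_000  # assumed: size of the standard CIFAR-10 test set
accuracy = {"baseline": 0.752, "light": 0.750, "strong": 0.692}  # results reported above

def accuracy_variance(p, n):
    # variance of an accuracy estimate over n images, each treated as a Bernoulli trial
    return p * (1 - p) / n

z_scores = {}
for name in ("light", "strong"):
    drop = accuracy["baseline"] - accuracy[name]
    # Independent-samples approximation: usually conservative, since both
    # models are actually scored on the same test images
    se_drop = math.sqrt(accuracy_variance(accuracy["baseline"], N_TEST)
                        + accuracy_variance(accuracy[name], N_TEST))
    z_scores[name] = drop / se_drop
    print(f"{name}: drop = {drop:.3f}, z = {z_scores[name]:.1f}")
```

Under these assumptions, the light-augmentation gap gives z ≈ 0.3, well within noise. The strong-augmentation drop gives z ≈ 9.5, far outside it.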

🔍 Model Behaviour (Confusion Matrix)

Confusion matrix showing class-level performance

Observations

Strong performance:
- airplane, ship, truck
Weak performance (frequent confusions):
- cat vs dog
- automobile vs truck
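Confused pairs such as cat/dog can also be read off a confusion matrix programmatically. This NumPy sketch uses hypothetical helper names (`confusion_matrix`, `top_confusions`), not the notebook's code, and toy labels instead of real predictions. In the notebook's setting, `y_pred` would presumably come from `model.predict(x_test).argmax(axis=1)` and `y_true` from `y_test.ravel()`.

```python
import numpy as np

CLASS_NAMES = ["airplane", "automobile", "bird", "cat", "deer",
               "dog", "frog", "horse", "ship", "truck"]

def confusion_matrix(y_true, y_pred, n_classes=10):
    # rows = true class, columns = predicted class
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

def top_confusions(cm, k=3):
    # Add both directions (i->j and j->i) so each unordered pair is counted once
    sym = cm + cm.T
    n = len(cm)
    pairs = [(int(sym[i, j]), i, j) for i in range(n) for j in range(i + 1, n)]
    pairs.sort(reverse=True)
    return [(CLASS_NAMES[i], CLASS_NAMES[j], count) for count, i, j in pairs[:k]]

# Toy labels (class indices), not real model output
y_true = np.array([3, 3, 5, 5, 1, 9, 0, 8])
y_pred = np.array([5, 3, 3, 5, 9, 1, 0, 8])
cm = confusion_matrix(y_true, y_pred)
print(top_confusions(cm, k=2))  # [('cat', 'dog', 2), ('automobile', 'truck', 2)]
```

Summing both directions treats "cat predicted as dog" and "dog predicted as cat" as one pair, which matches how the weak spots are described above.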

Insight

Most errors are likely driven by:

low resolution
visual similarity between classes

➡ Some limitations come from the dataset itself.

🧾 Conclusion

The experiments highlight the following:

The baseline model already performs well (75.2% test accuracy)
Light augmentation has minimal impact (75.0%)
Strong augmentation reduces test accuracy to 69.2% in this 15-epoch setup
Augmentation strength must be tuned to the dataset, not simply maximised

DEV Community: Maxwell Ororho
