Mahi Dhiman
Building a Food Classifier: What I Learned from Overfitting

I Built a Food Classifier That Can't Tell Ramen from... Ramen? (Part 1)

Or: What I learned from my first overfitting disaster (and why transfer learning saved me)


You know that feeling when you follow a recipe, it looks perfect in the pan, and then you taste it and realize you've created something that only you (or your mother) could love? That's basically what happened with my first food classification model.

Let me tell you a story about ambition, augmentation, and why teaching a neural network the difference between my favorite foods turned into a masterclass in overfitting.

What You'll Need

Before we dive in, here's what you should have:

  • Python 3.8+
  • PyTorch installed
  • Basic understanding of neural networks (but I'll explain as we go!)
  • ~2GB of disk space for the dataset
  • A GPU (or Google Colab) - training on CPU will take forever
  • About 30 minutes to follow along

The Idea

Build a model that could classify the foods I actually care about. Not the entire Food-101 dataset with 101 classes of foods I've never even heard of, but my foods:

  • 🍜 Ramen (obviously)
  • 🍦 Ice cream (essential)
  • 🧀 Nachos (comfort food supreme)
  • 🥞 Pancakes (breakfast champion)

Four classes. How hard could it be?

Narrator: It was harder than she thought.


Part 1: Building the Dataset

Downloading Food-101

First, I imported the essentials:

import torch
from torchvision import datasets, transforms
from pathlib import Path
import zipfile
import requests

# Setup data directory
data_dir = Path("Data/")
image_dir = data_dir / "food101"

Breaking this down:

  • data_dir points to a folder called "Data/" where all my datasets will live
  • image_dir is specifically for the Food-101 dataset
  • I used the / operator to join paths - much cleaner than string concatenation!

# Check if directory exists (being polite to my internet connection)
if image_dir.is_dir():
    print(f"{image_dir} directory already exists.. skipping download")
else:
    print(f"Did not find {image_dir} directory, creating one...")
    image_dir.mkdir(parents=True, exist_ok=True)

Before downloading gigabytes of food images, I check if the directory already exists:

  • parents=True - creates any missing parent directories
  • exist_ok=True - doesn't throw an error if the directory already exists

# Download Food-101 dataset
train_data = datasets.Food101(root=data_dir,
                              split="train",
                              download=True)
test_data = datasets.Food101(root=data_dir,
                             split="test",
                             download=True)

PyTorch's datasets.Food101 handles everything:

  • root=data_dir - where to save everything
  • split="train" or split="test" - Food-101 comes pre-split
  • download=True - downloads if not already present

Reality check: This downloads ~5GB of data. First run? Go grab a coffee. Maybe two.

Picking My Classes

class_names = train_data.classes
print(f"Total classes available: {len(class_names)}")

# Out of all 101, I chose my favorites
target_classes = ["ice_cream", "pancakes", "ramen", "nachos"]

Each class in Food-101 has about 750 training images and 250 test images. But I didn't want all of them (my laptop would cry), so I grabbed a subset.

The Manual Extraction Process

import random

data_path = data_dir / "food-101" / "images"
target_classes = ["ice_cream", "pancakes", "ramen", "nachos"]

# Taking 20% of available data per class
amount_to_get = 0.2

The amount_to_get = 0.2 means I'm taking 20% of available images - enough to train on, but not so much that my laptop starts smoking.

The Subset Selection Function

def get_subset(image_path=data_path,
               data_splits=["train", "test"],
               target_classes=["ice_cream", "pancakes", "ramen", "nachos"],
               amount=0.1,
               seed=42):

    random.seed(seed)  # For reproducibility
    label_splits = {}

    for data_split in data_splits:
        print(f"[INFO] Creating image split for: {data_split}...")

        # Food-101 provides text files listing train/test images
        label_path = data_dir / "food-101" / "meta" / f"{data_split}.txt"

        # Read and filter for our target classes
        with open(label_path, "r") as f:
            labels = [line.strip() for line in f.readlines() 
                     if line.split("/")[0] in target_classes]

        # Calculate sample size (the amount fraction of available images - 20% in my case)
        number_to_sample = round(amount * len(labels))
        print(f"[INFO] Getting random subset of {number_to_sample} images...")

        # Randomly sample
        sampled_images = random.sample(labels, k=number_to_sample)

        # Convert to full file paths
        image_paths = [image_path / f"{sample_image}.jpg" 
                      for sample_image in sampled_images]

        label_splits[data_split] = image_paths

    return label_splits

Breaking this down:

  • random.seed(42) - This ensures I get the same "random" results every time. Reproducibility is crucial in ML!
  • The label files - Food-101 provides .txt files listing which images belong to train/test. Each line looks like "ramen/123456.jpg"
  • Filtering - line.split("/")[0] grabs the class name (the part before the /), keeping only my target classes
  • Sampling - random.sample() picks exactly number_to_sample random items with no duplicates
  • Path building - Converts labels like "ramen/123456" into full paths

# Run the function
label_splits = get_subset(amount=amount_to_get)
print(f"Training images: {len(label_splits['train'])}")
print(f"Test images: {len(label_splits['test'])}")

This gave me:

  • Training: ~600 images (150 per class)
  • Test: ~200 images (50 per class)
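
To double-check the class balance, a quick tally over the sampled paths works (a minimal sketch, assuming the label_splits dictionary from above). Because sampling is done over the combined list, the per-class counts land close to, but not exactly at, 150 and 50:

from collections import Counter

# Count how many sampled images each class got in each split
for split, paths in label_splits.items():
    counts = Counter(path.parent.name for path in paths)
    print(f"{split}: {dict(counts)}")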

Creating the Custom Dataset Directory

# Create target directory with descriptive name
target_dir_name = f"Data/ic_pancake_ramen_nachos{str(int(amount_to_get*100))}_percent"
print(f"Creating directory: '{target_dir_name}'")

target_dir = Path(target_dir_name)
target_dir.mkdir(parents=True, exist_ok=True)

I'm creating a directory name that tells me exactly what's in it: ic_pancake_ramen_nachos20_percent

Pro tip: Descriptive directory names are your future self's best friend. When you have 5 different dataset versions, you'll thank yourself!

Copying the Files

import shutil

for image_split in label_splits.keys():  # "train" and "test"
    for image_path in label_splits[str(image_split)]:
        # Build destination path
        dest_dir = target_dir / image_split / image_path.parent.stem / image_path.name

        # Create directory if needed
        if not dest_dir.parent.is_dir():
            dest_dir.parent.mkdir(parents=True, exist_ok=True)

        print(f"[INFO] Copying {image_path} to {dest_dir}...")
        shutil.copy2(image_path, dest_dir)

The dest_dir construction creates paths like:
ic_pancake_ramen_nachos20_percent/train/ramen/123456.jpg

Reality check: This took about 5-10 minutes to copy all 800 images. Watching the progress messages scroll by was oddly therapeutic.

Packaging Everything

# Create a zip file for easy sharing
zip_file_name = data_dir / f"ic_pancake_ramen_nachos{str(int(amount_to_get*100))}_percent"
shutil.make_archive(zip_file_name,
                    format="zip",
                    root_dir=target_dir)

print(f"Created {zip_file_name}.zip!")

Now I had a portable dataset I could upload to GitHub and share!

Final folder structure:

ic_pancake_ramen_nachos20_percent/
  ├── train/
  │   ├── ramen/ (150 images)
  │   ├── ice_cream/ (150 images)
  │   ├── nachos/ (150 images)
  │   └── pancakes/ (150 images)
  └── test/
      ├── ramen/ (50 images)
      ├── ice_cream/ (50 images)
      ├── nachos/ (50 images)
      └── pancakes/ (50 images)

Resources:


Part 2: Getting Ready to Train

Device Setup: GPU or CPU?

import torch

# Device agnostic code
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

This checks if you have a GPU available (CUDA). Training on GPU is 10-50x faster than CPU. Think of it like checking if you have a sports car before a road trip - if yes, great! If not, the regular car still works.

My setup: I used Google Colab's free T4 GPU. Training time per epoch: ~30 seconds on GPU vs ~10 minutes on CPU.
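
If you're curious which GPU Colab handed you, one extra line reports it (using torch.cuda.get_device_name):

# Only meaningful when device == "cuda"
if device == "cuda":
    print(f"GPU: {torch.cuda.get_device_name(0)}")  # e.g. "Tesla T4" on free Colab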

Downloading from GitHub

Since I uploaded my dataset to GitHub, I needed to download it:

import requests
import zipfile

data_path = Path("Data/")
image_path = data_path / "ic_pancake_ramen_nachos"

# Check if folder exists
if image_path.is_dir():
    print(f"{image_path} already exists")
else:
    print(f"Did not find {image_path}, creating it...")
    image_path.mkdir(parents=True, exist_ok=True)

# Download from GitHub
url = "https://raw.githubusercontent.com/mahidhiman12/Deep_learning_with_PyTorch/main/ic_pancake_ramen_nachos20_percent.zip"

with open(data_path / "ic_pancake_ramen_nachos20_percent.zip", "wb") as f:
    request = requests.get(url)
    f.write(request.content)

print("Download complete!")

# Unzip
with zipfile.ZipFile(data_path / "ic_pancake_ramen_nachos20_percent.zip", "r") as zip_ref:
    print(f"Unzipping to {image_path}")
    zip_ref.extractall(image_path)

The "wb" mode means "write binary" - crucial for zip files!
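
If you want the download to fail loudly on a bad URL instead of silently writing an HTML error page to disk, a slightly more defensive variant (a sketch, reusing the same url and data_path) streams the response and checks the status code:

# Hypothetical hardened version of the download above
zip_path = data_path / "ic_pancake_ramen_nachos20_percent.zip"

with requests.get(url, stream=True) as response:
    response.raise_for_status()  # raises on 404/500 instead of saving the error page
    with open(zip_path, "wb") as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)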

Exploring the Dataset

import os

def walkthrough_dir(dir_path):
    """Walk through directory and print info"""
    for dirpath, dirnames, filenames in os.walk(dir_path):
        print(f"There are {len(dirnames)} directories and {len(filenames)} images in '{dirpath}'")

walkthrough_dir(image_path)

Output:

There are 2 directories and 0 images in 'Data/ic_pancake_ramen_nachos'
There are 4 directories and 0 images in 'Data/ic_pancake_ramen_nachos/train'
There are 0 directories and 150 images in 'Data/ic_pancake_ramen_nachos/train/ice_cream'
There are 0 directories and 150 images in 'Data/ic_pancake_ramen_nachos/train/nachos'
...

Perfect! Everything's organized correctly.

# Set up train and test directories
train_dir = image_path / "train"
test_dir = image_path / "test"

Visualizing the Data

Always look at your data before training:

import random
from PIL import Image
import matplotlib.pyplot as plt

# Get all image paths
image_path_list = list(image_path.glob("*/*/*.jpg"))
print(f"Total images: {len(image_path_list)}")

# Display random images
random_image_path = random.choice(image_path_list)
img = Image.open(random_image_path)

plt.figure(figsize=(8, 6))
plt.imshow(img)
plt.title(f"Class: {random_image_path.parent.stem}")
plt.axis('off')
plt.show()

print(f"Image dimensions: {img.height}x{img.width}")

Key observation: Images have varying sizes (512x384, 384x512, 640x480, etc.). This is why we need to resize everything - neural networks require consistent input dimensions.
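
To see that spread for yourself, loop over a handful of random images and print their dimensions (a small sketch reusing image_path_list from above):

# Sample a few images and report their (width, height)
for path in random.sample(image_path_list, k=10):
    with Image.open(path) as img:
        print(f"{path.parent.stem:>10}: {img.size}")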

Preprocessing & Augmentation

from torchvision import transforms

# Version 1: Basic transforms (what I started with)
basic_transform = transforms.Compose([
    transforms.Resize(size=(64, 64)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor()
])

Breaking it down:

  • Compose([]) - Chains transformations together, applied in order
  • Resize((64, 64)) - Squishes/stretches all images to 64x64 pixels (I started small for faster training - this was a mistake!)
  • RandomHorizontalFlip(p=0.5) - Randomly flips images 50% of the time. A ramen bowl looks the same flipped, right? This is data augmentation.
  • ToTensor() - Converts PIL images to PyTorch tensors AND normalizes pixel values from 0-255 to 0.0-1.0

Later, when fighting overfitting, I upgraded to:

# Version 2: Enhanced transforms (added when model was overfitting)
enhanced_transform = transforms.Compose([
    transforms.Resize(size=(64, 64)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(15),  # Random rotation ±15 degrees
    transforms.ColorJitter(brightness=0.2, contrast=0.2),  # Vary brightness/contrast
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # ImageNet stats
                         std=[0.229, 0.224, 0.225])
])

The Normalize() uses ImageNet's per-channel mean and standard deviation. For a model trained from scratch this mainly standardizes the inputs, but it matters much more later, when we bring in a network pre-trained on ImageNet - it's like speaking the same language that network already understands.
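
Concretely, Normalize computes (pixel - mean) / std per channel. A tiny worked example on a fake "image" (a sketch, not part of the training pipeline) makes the effect visible:

import torch

# A fake 2x2 RGB image where every pixel is 0.5 after ToTensor scaling
fake_img = torch.full((3, 2, 2), 0.5)

mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

normalized = (fake_img - mean) / std
print(normalized[:, 0, 0])  # per-channel results, e.g. ~0.066 for the red channel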

Visualizing Transformations

I created a function to see the effect of transforms:

def plot_transformed_images(image_paths, transform, n=3, seed=42):
    """Plot original vs transformed images side by side"""
    random.seed(seed)
    random_image_paths = random.sample(image_paths, k=n)

    for image_path in random_image_paths:
        with Image.open(image_path) as f:
            fig, ax = plt.subplots(1, 2, figsize=(10, 5))

            # Original
            ax[0].imshow(f)
            ax[0].set_title(f"Original\nSize: {f.size}")
            ax[0].axis("off")

            # Transformed
            transformed = transform(f).permute(1, 2, 0)  # CHW -> HWC for matplotlib
            ax[1].imshow(transformed)
            ax[1].set_title(f"Transformed\nSize: {tuple(transformed.shape)}")
            ax[1].axis("off")

            fig.suptitle(f"Class: {image_path.parent.stem}", fontsize=16)
            plt.tight_layout()

plot_transformed_images(image_path_list, basic_transform, n=3)

The .permute(1, 2, 0) is crucial! PyTorch tensors are (Channels, Height, Width), but matplotlib expects (Height, Width, Channels).
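
A quick shape check shows exactly what that rearrangement does:

t = torch.randn(3, 64, 64)        # (C, H, W) - how PyTorch stores images
print(t.permute(1, 2, 0).shape)   # torch.Size([64, 64, 3]) - (H, W, C) for matplotlib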

Creating Datasets and DataLoaders

from torchvision import datasets

# Using basic transforms to start
train_dataset = datasets.ImageFolder(root=train_dir,
                                     transform=basic_transform)

test_dataset = datasets.ImageFolder(root=test_dir,
                                    transform=basic_transform)

class_names = train_dataset.classes
class_to_idx = train_dataset.class_to_idx

print(f"Train dataset: {len(train_dataset)} images")
print(f"Test dataset: {len(test_dataset)} images")
print(f"Classes: {class_names}")
print(f"Class to index mapping: {class_to_idx}")

ImageFolder is magical! It automatically:

  • Creates labels based on folder names
  • Maps classes to indices
  • Applies transforms to each image

As long as your data follows the train/class_name/image.jpg structure, it just works.

from torch.utils.data import DataLoader

BATCH_SIZE = 32

train_dataloader = DataLoader(dataset=train_dataset,
                              batch_size=BATCH_SIZE,
                              shuffle=True,
                              num_workers=2)

test_dataloader = DataLoader(dataset=test_dataset,
                             batch_size=BATCH_SIZE,
                             shuffle=False,
                             num_workers=2)

print(f"Length of train dataloader: {len(train_dataloader)} batches")
print(f"Length of test dataloader: {len(test_dataloader)} batches")

DataLoader explained:

  • batch_size=32 - Process 32 images at once (faster than one-by-one)
  • shuffle=True - Randomize training data each epoch (helps prevent overfitting)
  • num_workers=2 - Parallel data loading (speeds things up)

Let's peek at a batch:

img, label = next(iter(train_dataloader))
print(f"Image batch shape: {img.shape}")  # torch.Size([32, 3, 64, 64])
print(f"Label batch shape: {label.shape}")  # torch.Size([32])

Perfect! 32 images, 3 color channels (RGB), 64x64 pixels each.


Part 3: Building the Model

The Architecture: TinyVGG

I based this on the CNN Explainer website's model - replicating existing architectures is common practice in ML!

import torch
from torch import nn

class TinyVGG(nn.Module):
    def __init__(self, input_shape, hidden_units, output_shape):
        super().__init__()

        # Convolutional block 1
        self.conv_block_1 = nn.Sequential(
            nn.Conv2d(in_channels=input_shape,
                      out_channels=hidden_units,
                      kernel_size=3,
                      stride=1,
                      padding=1),
            nn.ReLU(),
            nn.Conv2d(in_channels=hidden_units,
                      out_channels=hidden_units,
                      kernel_size=3,
                      stride=1,
                      padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2)
        )

        # Convolutional block 2
        self.conv_block_2 = nn.Sequential(
            nn.Conv2d(in_channels=hidden_units,
                      out_channels=hidden_units,
                      kernel_size=3,
                      stride=1,
                      padding=1),
            nn.ReLU(),
            nn.Conv2d(in_channels=hidden_units,
                      out_channels=hidden_units,
                      kernel_size=3,
                      stride=1,
                      padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2)
        )

        # Classifier
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_features=hidden_units * 16 * 16,
                      out_features=output_shape)
        )

    def forward(self, x):
        x = self.conv_block_1(x)
        x = self.conv_block_2(x)
        x = self.classifier(x)
        return x

Architecture breakdown:

Each convolutional block:

  1. Conv2d - Detects patterns (edges, textures) using 3x3 filters
  2. ReLU() - Activation function (adds non-linearity)
  3. Another Conv2d - Learns more complex patterns
  4. MaxPool2d(2) - Shrinks spatial dimensions by 2x (64→32→16)

The classifier:

  1. Flatten() - Converts 2D feature maps into 1D vector
  2. Linear() - Final layer outputs 4 values (one per class)

The hidden_units * 16 * 16 mystery:

Where did 16x16 come from? Here's the debug trick:

# Create dummy data matching your input shape
dummy_x = torch.randn(size=[1, 3, 64, 64]).to(device)

# Try passing through model (will error if dimensions wrong)
# The error message tells you the actual dimensions needed!
model_0(dummy_x)

The math: Start with 64x64 → MaxPool2d twice → 64/2/2 = 16x16
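
If you'd rather compute that number than read it out of an error message, you can push a dummy batch through just the conv blocks and inspect the shape (a small sketch using the TinyVGG class defined above):

# Instantiate a throwaway model purely to probe feature-map shapes
probe = TinyVGG(input_shape=3, hidden_units=10, output_shape=4)

with torch.inference_mode():
    features = probe.conv_block_2(probe.conv_block_1(torch.randn(1, 3, 64, 64)))

print(features.shape)        # torch.Size([1, 10, 16, 16])
print(features[0].numel())   # 10 * 16 * 16 = 2560 -> in_features for the Linear layer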

Creating the Model

model_0 = TinyVGG(input_shape=3,      # RGB channels
                  hidden_units=10,     # Number of feature maps
                  output_shape=len(class_names))  # 4 classes

model_0 = model_0.to(device)

print(model_0)
print(f"Number of parameters: {sum(p.numel() for p in model_0.parameters()):,}")

I started with just 10 hidden units to keep it simple. This was my first mistake - the model was too simple for the task!


Part 4: Training Pipeline

The Training Step

def train_step(model: torch.nn.Module,
               dataloader: torch.utils.data.DataLoader,
               loss_fn: torch.nn.Module,
               optimizer: torch.optim.Optimizer,
               device=device):

    model.train()  # Enable dropout, batch norm, etc.
    train_loss, train_acc = 0, 0

    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)

        # 1. Forward pass
        y_pred = model(X)
        loss = loss_fn(y_pred, y)
        train_loss += loss.item()

        # 2. Backward pass
        optimizer.zero_grad()  # Clear old gradients
        loss.backward()        # Calculate new gradients
        optimizer.step()       # Update weights

        # 3. Calculate accuracy
        y_pred_class = torch.argmax(y_pred, dim=1)
        train_acc += (y_pred_class == y).sum().item() / len(y_pred)

    # Average loss and accuracy
    train_loss /= len(dataloader)
    train_acc /= len(dataloader)

    return train_loss, train_acc

The training loop:

  1. Forward pass - Feed data through model, calculate loss
  2. Backward pass - The magic of backpropagation:
    • zero_grad() - Clear previous gradients (they accumulate!)
    • backward() - Calculate how wrong we were
    • step() - Update weights to be less wrong
  3. Calculate accuracy - Convert predictions to class labels and compare

The Test Step

def test_step(model: torch.nn.Module,
              dataloader: torch.utils.data.DataLoader,
              loss_fn: torch.nn.Module,
              device=device):

    model.eval()  # Disable dropout
    test_loss, test_acc = 0, 0

    with torch.inference_mode():  # Disable gradient tracking (saves memory)
        for batch, (X, y) in enumerate(dataloader):
            X, y = X.to(device), y.to(device)

            # Forward pass only
            test_pred = model(X)
            loss = loss_fn(test_pred, y)
            test_loss += loss.item()

            # Calculate accuracy
            test_pred_labels = torch.argmax(test_pred, dim=1)
            test_acc += (test_pred_labels == y).sum().item() / len(test_pred)

    test_loss /= len(dataloader)
    test_acc /= len(dataloader)

    return test_loss, test_acc

Key differences from training:

  1. model.eval() - Disables dropout (we want consistency)
  2. torch.inference_mode() - Saves memory by not tracking gradients
  3. No optimizer - We're not updating weights, just evaluating

The Main Training Loop

from tqdm.auto import tqdm

def train(model: torch.nn.Module,
          train_dataloader: torch.utils.data.DataLoader,
          test_dataloader: torch.utils.data.DataLoader,
          optimizer: torch.optim.Optimizer,
          loss_fn: torch.nn.Module = nn.CrossEntropyLoss(),
          epochs: int = 5,
          device=device):

    results = {
        "train_loss": [],
        "train_accuracy": [],
        "test_loss": [],
        "test_accuracy": []
    }

    for epoch in tqdm(range(epochs)):
        train_loss, train_acc = train_step(model=model,
                                           dataloader=train_dataloader,
                                           loss_fn=loss_fn,
                                           optimizer=optimizer,
                                           device=device)

        test_loss, test_acc = test_step(model=model,
                                        dataloader=test_dataloader,
                                        loss_fn=loss_fn,
                                        device=device)

        print(f"Epoch: {epoch+1} | "
              f"Train loss: {train_loss:.4f} | "
              f"Train Acc: {train_acc:.4f} | "
              f"Test loss: {test_loss:.4f} | "
              f"Test Acc: {test_acc:.4f}")

        # Store results
        results["train_loss"].append(train_loss)
        results["train_accuracy"].append(train_acc)
        results["test_loss"].append(test_loss)
        results["test_accuracy"].append(test_acc)

    return results

This orchestrates everything:

  • Trains for the specified number of epochs
  • Tests after each epoch
  • Prints metrics (watching numbers is addictive! 📈)
  • Stores everything for later plotting

Let's Train!

# Set up optimizer and loss function
optimizer = torch.optim.Adam(model_0.parameters(), lr=0.001)
loss_fn = nn.CrossEntropyLoss()

# Start training
torch.manual_seed(42)

results_0 = train(model=model_0,
                  train_dataloader=train_dataloader,
                  test_dataloader=test_dataloader,
                  optimizer=optimizer,
                  loss_fn=loss_fn,
                  epochs=20,
                  device=device)

I chose Adam optimizer with learning rate 0.001 - a solid default for most problems.


Part 5: The Overfitting Disaster

The Results That Made Me Cry

Here's what happened after 20 epochs:

Epoch: 17 | Train loss: 1.0987 | Train Acc: 0.5280 | Test loss: 1.1463 | Test Acc: 0.5045
Epoch: 18 | Train loss: 1.0703 | Train Acc: 0.5439 | Test loss: 1.1714 | Test Acc: 0.5179
Epoch: 19 | Train loss: 1.1060 | Train Acc: 0.5351 | Test loss: 1.2098 | Test Acc: 0.4911
Epoch: 20 | Train loss: 1.0435 | Train Acc: 0.5609 | Test loss: 1.1909 | Test Acc: 0.5089

56% training accuracy. 51% test accuracy.

For a 4-class problem, random guessing would give 25%. So I was doing better than random... barely.

Let's visualize the carnage:

import matplotlib.pyplot as plt

def plot_loss_curves(results):
    """Plot training and test loss/accuracy curves"""
    epochs = range(len(results["train_loss"]))

    plt.figure(figsize=(15, 5))

    # Loss
    plt.subplot(1, 2, 1)
    plt.plot(epochs, results["train_loss"], label="Train Loss")
    plt.plot(epochs, results["test_loss"], label="Test Loss")
    plt.title("Loss")
    plt.xlabel("Epochs")
    plt.legend()

    # Accuracy
    plt.subplot(1, 2, 2)
    plt.plot(epochs, results["train_accuracy"], label="Train Accuracy")
    plt.plot(epochs, results["test_accuracy"], label="Test Accuracy")
    plt.title("Accuracy")
    plt.xlabel("Epochs")
    plt.legend()

    plt.tight_layout()
    plt.show()

plot_loss_curves(results_0)

Training curves showing struggling performance

Look at those curves! The training and test lines are practically on top of each other, bouncing around aimlessly. This isn't overfitting - this is underfitting. The model is too simple to learn the patterns.
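
One rough way to put numbers on that diagnosis (a heuristic, not a formal test): compare the final train and test accuracies. A big gap with high train accuracy points to overfitting; both being low points to underfitting.

final_train_acc = results_0["train_accuracy"][-1]
final_test_acc = results_0["test_accuracy"][-1]

print(f"Train: {final_train_acc:.2%} | Test: {final_test_acc:.2%} | Gap: {final_train_acc - final_test_acc:.2%}")
# Here: ~56% train, ~51% test - a tiny gap, but both are low. Classic underfitting.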

My Desperate Attempts to Fix It

I tried everything I could think of:

Attempt 1: More Augmentation

# Added rotation, color jitter, normalization
enhanced_transform = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(15),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

Result: Accuracy improved to ~55%. Still terrible.

Attempt 2: More Hidden Units

# Increased from 10 to 32 hidden units
model_1 = TinyVGG(input_shape=3, hidden_units=32, output_shape=4).to(device)

Result: Training accuracy improved to ~62%, test accuracy stuck at ~53%. Better, but not good enough.

Attempt 3: Adding Dropout

# Added Dropout(0.4) after each pooling layer and before final classifier
self.conv_block_1 = nn.Sequential(
    nn.Conv2d(...),
    nn.ReLU(),
    nn.Conv2d(...),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
    nn.Dropout(p=0.4)  # NEW
)

Result: Worse! Accuracy dropped to ~48%. Dropout was hurting because the model was already struggling.

Attempt 4: Bigger Images

# Increased from 64x64 to 128x128
transforms.Resize((128, 128))

Result: Training slowed to a crawl (4x longer per epoch) and accuracy only improved to ~58%. Still not worth it. (Note: bigger inputs also mean updating the classifier - two MaxPool layers on 128x128 leave 32x32 feature maps, so the Linear layer needs hidden_units * 32 * 32 input features.)

Attempt 5: Different Learning Rates

Tried 0.01, 0.0001, 0.00001... nothing made a meaningful difference.
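
For completeness, here's roughly how I'd structure that sweep today (a sketch; each learning rate gets a fresh model so the runs don't contaminate each other):

# Hypothetical learning-rate sweep using the train() function from above
for lr in [0.01, 0.001, 0.0001, 0.00001]:
    torch.manual_seed(42)
    model_lr = TinyVGG(input_shape=3, hidden_units=10, output_shape=4).to(device)
    optimizer = torch.optim.Adam(model_lr.parameters(), lr=lr)

    results_lr = train(model=model_lr,
                       train_dataloader=train_dataloader,
                       test_dataloader=test_dataloader,
                       optimizer=optimizer,
                       epochs=5,
                       device=device)

    print(f"lr={lr}: final test acc = {results_lr['test_accuracy'][-1]:.4f}")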

The Moment of Clarity

After hours of frustration, I finally understood the problem:

💡 The Real Issue: I only had 600 training images split across 4 classes. That's just 150 images per class to learn from scratch. My tiny model couldn't extract meaningful features from such limited data.

The solution wasn't more augmentation or more layers. I needed either:

  1. Way more data (thousands of images per class)
  2. Transfer learning (use a model pre-trained on millions of images)

Gathering thousands more images? That would take weeks. Transfer learning? That could work today.


Part 6: The Solution - Transfer Learning

Coming in Part 2!

I know, I know, I left you on a cliffhanger. But trust me, the transfer learning solution is worth its own dedicated post.

Until then, try building your own classifier from scratch. Experience the pain. Appreciate the solution even more.

Quick Recap: What We Covered in Part 1

  • ✅ Downloaded and prepared the Food-101 dataset
  • ✅ Created a custom subset (4 classes, 800 images)
  • ✅ Built a TinyVGG model from scratch
  • ✅ Trained for 20 epochs and got... 56% accuracy
  • ✅ Tried EVERYTHING to improve it (spoiler: nothing worked)
  • ✅ Realized the fundamental problem: not enough data to train from scratch

Resources

Code & Dataset:

Learning Resources:


📢 Don't Miss Part 2!

Follow me here on dev.to to get notified when Part 2 drops!

In the meantime, check out my other ML projects on GitHub


Your Turn!

Have you dealt with small datasets? What worked for you? Have any transfer learning horror stories or success stories?

Drop your experiences in the comments! I'd love to hear about your ML journeys, especially the failures that taught you something. 👇

And remember: The best ML projects are often the ones that don't work perfectly on the first try - because that's when you actually learn something.


Thanks for reading! If you found this helpful, consider giving it a ❤️ and following for more ML adventures (and misadventures).

Tags: #machinelearning #pytorch #python #beginners #deeplearning #tutorial #computerVision #transferlearning
