I Built a Food Classifier That Can't Tell Ramen from... Ramen? (Part 1)
Or: What I learned from my first overfitting disaster (and why transfer learning saved me)
You know that feeling when you follow a recipe, it looks perfect in the pan, and then you taste it and realize you've created something that only you (or your mother) could love? That's basically what happened with my first food classification model.
Let me tell you a story about ambition, augmentation, and why teaching a neural network the difference between my favorite foods turned into a masterclass in overfitting.
What You'll Need
Before we dive in, here's what you should have:
- Python 3.8+
- PyTorch installed
- Basic understanding of neural networks (but I'll explain as we go!)
- ~2GB of disk space for the dataset
- A GPU (or Google Colab) - training on CPU will take forever
- About 30 minutes to follow along
The Idea
Build a model that could classify the foods I actually care about. Not the entire Food-101 dataset with 101 classes of foods I've never even heard of, but my foods:
- 🍜 Ramen (obviously)
- 🍦 Ice cream (essential)
- 🧀 Nachos (comfort food supreme)
- 🥞 Pancakes (breakfast champion)
Four classes. How hard could it be?
Narrator: It was harder than she thought.
Part 1: Building the Dataset
Downloading Food-101
First, I imported the essentials:
import torch
from torchvision import datasets, transforms
from pathlib import Path
import zipfile
import requests
# Setup data directory
data_dir = Path("Data/")
image_dir = data_dir / "food-101"
Breaking this down:
- data_dir points to a folder called "Data/" where all my datasets will live
- image_dir is specifically for the Food-101 dataset
- I used the / operator to join paths - much cleaner than string concatenation (quick comparison below)
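If the / operator looks unfamiliar, here's a tiny side-by-side (just an illustration, not from my notebook):
from pathlib import Path
# String concatenation vs pathlib - same result, but pathlib reads like the path itself
old_way = "Data" + "/" + "food-101" + "/" + "images"
new_way = Path("Data") / "food-101" / "images"
print(old_way)  # Data/food-101/images
print(new_way)  # Data/food-101/images (a Path object, with OS-appropriate separators)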
# Check if directory exists (being polite to my internet connection)
if image_dir.is_dir():
print(f"{image_dir} directory already exists.. skipping download")
else:
print(f"Did not find {image_dir} directory, creating one...")
image_dir.mkdir(parents=True, exist_ok=True)
Before downloading gigabytes of food images, I check if the directory already exists:
- parents=True - creates any missing parent directories
- exist_ok=True - doesn't throw an error if the directory already exists
# Download Food-101 dataset
train_data = datasets.Food101(root=data_dir,
split="train",
download=True)
test_data = datasets.Food101(root=data_dir,
split="test",
download=True)
PyTorch's datasets.Food101 handles everything:
- root=data_dir - where to save everything
- split="train" or split="test" - Food-101 comes pre-split
- download=True - downloads if not already present
Reality check: This downloads ~5GB of data. First run? Go grab a coffee. Maybe two.
Picking My Classes
class_names = train_data.classes
print(f"Total classes available: {len(class_names)}")
# Out of all 101, I chose my favorites
target_classes = ["ice_cream", "pancakes", "ramen", "nachos"]
Each class in Food-101 has about 750 training images and 250 test images. But I didn't want all of them (my laptop would cry), so I grabbed a subset.
The Manual Extraction Process
import random
data_path = data_dir / "food-101" / "images"
target_classes = ["ice_cream", "pancakes", "ramen", "nachos"]
# Taking 20% of available data per class
amount_to_get = 0.2
The amount_to_get = 0.2 means I'm taking 20% of available images - enough to train on, but not so much that my laptop starts smoking.
The Subset Selection Function
def get_subset(image_path=data_path,
data_splits=["train", "test"],
target_classes=["ice_cream", "pancakes", "ramen", "nachos"],
amount=0.1,
seed=42):
random.seed(seed) # For reproducibility
label_splits = {}
for data_split in data_splits:
print(f"[INFO] Creating image split for: {data_split}...")
# Food-101 provides text files listing train/test images
label_path = data_dir / "food-101" / "meta" / f"{data_split}.txt"
# Read and filter for our target classes
with open(label_path, "r") as f:
labels = [line.strip() for line in f.readlines()
if line.split("/")[0] in target_classes]
# Calculate sample size (20% of available)
number_to_sample = round(amount * len(labels))
print(f"[INFO] Getting random subset of {number_to_sample} images...")
# Randomly sample
sampled_images = random.sample(labels, k=number_to_sample)
# Convert to full file paths
image_paths = [image_path / f"{sample_image}.jpg"
for sample_image in sampled_images]
label_splits[data_split] = image_paths
return label_splits
Breaking this down:
- random.seed(42) - This ensures I get the same "random" results every time. Reproducibility is crucial in ML! (There's a tiny demo of this right after the list.)
- The label files - Food-101 provides .txt files listing which images belong to train/test. Each line looks like "ramen/123456" (no file extension, which is why the code appends ".jpg")
- Filtering - line.split("/")[0] grabs the class name (the part before the /), keeping only my target classes
- Sampling - random.sample() picks exactly number_to_sample random items with no duplicates
- Path building - Converts labels like "ramen/123456" into full paths
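If the seeding part feels abstract, this tiny demo (not project code, just an illustration) shows what reproducibility buys you:
import random
# Same seed -> the exact same "random" sample, run after run
random.seed(42)
first_run = random.sample(range(10), k=3)
random.seed(42)
second_run = random.sample(range(10), k=3)
print(first_run == second_run)  # True - identical samples every time
Back to the real dataset: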
# Run the function
label_splits = get_subset(amount=amount_to_get)
print(f"Training images: {len(label_splits['train'])}")
print(f"Test images: {len(label_splits['test'])}")
This gave me:
- Training: ~600 images (150 per class)
- Test: ~200 images (50 per class)
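To sanity-check the class balance, a quick count over label_splits does the trick (a small check I'd add, not in my original notebook; the exact numbers wobble slightly since sampling happens over all four classes pooled together):
from collections import Counter
# Count how many sampled images landed in each class, per split
for split, paths in label_splits.items():
    counts = Counter(path.parent.name for path in paths)
    print(f"{split}: {dict(counts)}")  # roughly 150 per class for train, 50 for test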
Creating the Custom Dataset Directory
# Create target directory with descriptive name
target_dir_name = f"Data/ic_pancake_ramen_nachos{str(int(amount_to_get*100))}_percent"
print(f"Creating directory: '{target_dir_name}'")
target_dir = Path(target_dir_name)
target_dir.mkdir(parents=True, exist_ok=True)
I'm creating a directory name that tells me exactly what's in it: ic_pancake_ramen_nachos20_percent
Pro tip: Descriptive directory names are your future self's best friend. When you have 5 different dataset versions, you'll thank yourself!
Copying the Files
import shutil
for image_split in label_splits.keys(): # "train" and "test"
for image_path in label_splits[str(image_split)]:
# Build destination path
dest_dir = target_dir / image_split / image_path.parent.stem / image_path.name
# Create directory if needed
if not dest_dir.parent.is_dir():
dest_dir.parent.mkdir(parents=True, exist_ok=True)
print(f"[INFO] Copying {image_path} to {dest_dir}...")
shutil.copy2(image_path, dest_dir)
The dest_dir construction creates paths like:
ic_pancake_ramen_nachos20_percent/train/ramen/123456.jpg
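If the .parent.stem / .name combination looks cryptic, here's the anatomy of one source path (the file name is made up for illustration):
from pathlib import Path
example = Path("Data/food-101/images/ramen/123456.jpg")
print(example.parent.stem)  # 'ramen'      -> becomes the class sub-folder in the destination
print(example.name)         # '123456.jpg' -> the file name, kept as-is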
Reality check: This took about 5-10 minutes to copy all 800 images. Watching the progress messages scroll by was oddly therapeutic.
Packaging Everything
# Create a zip file for easy sharing
zip_file_name = data_dir / f"ic_pancake_ramen_nachos{str(int(amount_to_get*100))}_percent"
shutil.make_archive(zip_file_name,
format="zip",
root_dir=target_dir)
print(f"Created {zip_file_name}.zip!")
Now I had a portable dataset I could upload to GitHub and share!
Final folder structure:
ic_pancake_ramen_nachos20_percent/
├── train/
│ ├── ramen/ (150 images)
│ ├── ice_cream/ (150 images)
│ ├── nachos/ (150 images)
│ └── pancakes/ (150 images)
└── test/
├── ramen/ (50 images)
├── ice_cream/ (50 images)
├── nachos/ (50 images)
└── pancakes/ (50 images)
Part 2: Getting Ready to Train
Device Setup: GPU or CPU?
import torch
# Device agnostic code
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")
This checks if you have a GPU available (CUDA). Training on GPU is 10-50x faster than CPU. Think of it like checking if you have a sports car before a road trip - if yes, great! If not, the regular car still works.
My setup: I used Google Colab's free T4 GPU. Training time per epoch: ~30 seconds on GPU vs ~10 minutes on CPU.
Downloading from GitHub
Since I uploaded my dataset to GitHub, I needed to download it:
import requests
import zipfile
data_path = Path("Data/")
image_path = data_path / "ic_pancake_ramen_nachos"
# Check if folder exists
if image_path.is_dir():
print(f"{image_path} already exists")
else:
print(f"Did not find {image_path}, creating it...")
image_path.mkdir(parents=True, exist_ok=True)
# Download from GitHub
url = "https://raw.githubusercontent.com/mahidhiman12/Deep_learning_with_PyTorch/main/ic_pancake_ramen_nachos20_percent.zip"
with open(data_path / "ic_pancake_ramen_nachos20_percent.zip", "wb") as f:
request = requests.get(url)
f.write(request.content)
print("Download complete!")
# Unzip
with zipfile.ZipFile(data_path / "ic_pancake_ramen_nachos20_percent.zip", "r") as zip_ref:
print(f"Unzipping to {image_path}")
zip_ref.extractall(image_path)
The "wb" mode means "write binary" - crucial for zip files!
Exploring the Dataset
import os
def walkthrough_dir(dir_path):
"""Walk through directory and print info"""
for dirpath, dirnames, filenames in os.walk(dir_path):
print(f"There are {len(dirnames)} directories and {len(filenames)} images in '{dirpath}'")
walkthrough_dir(image_path)
Output:
There are 2 directories and 0 images in 'Data/ic_pancake_ramen_nachos'
There are 4 directories and 0 images in 'Data/ic_pancake_ramen_nachos/train'
There are 0 directories and 150 images in 'Data/ic_pancake_ramen_nachos/train/ice_cream'
There are 0 directories and 150 images in 'Data/ic_pancake_ramen_nachos/train/nachos'
...
Perfect! Everything's organized correctly.
# Set up train and test directories
train_dir = image_path / "train"
test_dir = image_path / "test"
Visualizing the Data
Always look at your data before training:
import random
from PIL import Image
import matplotlib.pyplot as plt
# Get all image paths
image_path_list = list(image_path.glob("*/*/*.jpg"))
print(f"Total images: {len(image_path_list)}")
# Display random images
random_image_path = random.choice(image_path_list)
img = Image.open(random_image_path)
plt.figure(figsize=(8, 6))
plt.imshow(img)
plt.title(f"Class: {random_image_path.parent.stem}")
plt.axis('off')
plt.show()
print(f"Image dimensions: {img.height}x{img.width}")
Key observation: Images have varying sizes (512x384, 384x512, 640x480, etc.). This is why we need to resize everything - neural networks require consistent input dimensions.
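If you want to see the size variety for yourself, a quick spot-check does it (a small loop, not in my original walkthrough):
# Print the dimensions of a few random images - they won't all match
for path in random.sample(image_path_list, k=5):
    with Image.open(path) as sample_img:
        print(f"{path.parent.stem:<12} {sample_img.width}x{sample_img.height}")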
Preprocessing & Augmentation
from torchvision import transforms
# Version 1: Basic transforms (what I started with)
basic_transform = transforms.Compose([
transforms.Resize(size=(64, 64)),
transforms.RandomHorizontalFlip(p=0.5),
transforms.ToTensor()
])
Breaking it down:
- Compose([]) - Chains transformations together, applied in order
- Resize((64, 64)) - Squishes/stretches all images to 64x64 pixels (I started small for faster training - this was a mistake!)
- RandomHorizontalFlip(p=0.5) - Randomly flips images 50% of the time. A ramen bowl looks the same flipped, right? This is data augmentation.
- ToTensor() - Converts PIL images to PyTorch tensors AND normalizes pixel values from 0-255 to 0.0-1.0
Later, when fighting overfitting, I upgraded to:
# Version 2: Enhanced transforms (added when model was overfitting)
enhanced_transform = transforms.Compose([
transforms.Resize(size=(64, 64)),
transforms.RandomHorizontalFlip(p=0.5),
transforms.RandomRotation(15), # Random rotation ±15 degrees
transforms.ColorJitter(brightness=0.2, contrast=0.2), # Vary brightness/contrast
transforms.ToTensor(),
transforms.Normalize(mean=[0.485, 0.456, 0.406], # ImageNet stats
std=[0.229, 0.224, 0.225])
])
The Normalize() uses ImageNet's mean and standard deviation - like speaking the same language the neural network understands.
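If you'd rather normalize with your own dataset's statistics instead of borrowing ImageNet's, here's one way to estimate them (a rough sketch I didn't end up needing - the ImageNet numbers work fine for food photos):
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
# Load the training images with only Resize + ToTensor (no augmentation, no Normalize)
stats_dataset = datasets.ImageFolder(root=train_dir,
                                     transform=transforms.Compose([
                                         transforms.Resize((64, 64)),
                                         transforms.ToTensor()
                                     ]))
stats_loader = DataLoader(stats_dataset, batch_size=32)
# Accumulate per-channel means of x and x^2, then combine (approximate but close enough)
channel_sum, channel_sq_sum, n_batches = 0.0, 0.0, 0
for images, _ in stats_loader:
    channel_sum += images.mean(dim=[0, 2, 3])
    channel_sq_sum += (images ** 2).mean(dim=[0, 2, 3])
    n_batches += 1
mean = channel_sum / n_batches
std = (channel_sq_sum / n_batches - mean ** 2).sqrt()
print(f"Dataset mean: {mean}, std: {std}")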
Visualizing Transformations
I created a function to see the effect of transforms:
def plot_transformed_images(image_paths, transform, n=3, seed=42):
"""Plot original vs transformed images side by side"""
random.seed(seed)
random_image_paths = random.sample(image_paths, k=n)
for image_path in random_image_paths:
with Image.open(image_path) as f:
fig, ax = plt.subplots(1, 2, figsize=(10, 5))
# Original
ax[0].imshow(f)
ax[0].set_title(f"Original\nSize: {f.size}")
ax[0].axis("off")
# Transformed
transformed = transform(f).permute(1, 2, 0) # CHW -> HWC for matplotlib
ax[1].imshow(transformed)
ax[1].set_title(f"Transformed\nSize: {tuple(transformed.shape)}")
ax[1].axis("off")
fig.suptitle(f"Class: {image_path.parent.stem}", fontsize=16)
plt.tight_layout()
plot_transformed_images(image_path_list, basic_transform, n=3)
The .permute(1, 2, 0) is crucial! PyTorch tensors are (Channels, Height, Width), but matplotlib expects (Height, Width, Channels).
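A quick shape check makes the difference obvious (using the random image from earlier):
sample_tensor = basic_transform(Image.open(random_image_path))
print(sample_tensor.shape)                   # torch.Size([3, 64, 64])  -> (Channels, Height, Width)
print(sample_tensor.permute(1, 2, 0).shape)  # torch.Size([64, 64, 3])  -> (Height, Width, Channels)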
Creating Datasets and DataLoaders
from torchvision import datasets
# Using basic transforms to start
train_dataset = datasets.ImageFolder(root=train_dir,
transform=basic_transform)
test_dataset = datasets.ImageFolder(root=test_dir,
transform=basic_transform)
class_names = train_dataset.classes
class_to_idx = train_dataset.class_to_idx
print(f"Train dataset: {len(train_dataset)} images")
print(f"Test dataset: {len(test_dataset)} images")
print(f"Classes: {class_names}")
print(f"Class to index mapping: {class_to_idx}")
ImageFolder is magical! It automatically:
- Creates labels based on folder names
- Maps classes to indices
- Applies transforms to each image
As long as your data follows the train/class_name/image.jpg structure, it just works.
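To see what ImageFolder actually hands back, peek at a single sample (a quick check, not in my original notebook):
# Each dataset item is an (image_tensor, label_index) pair
img_tensor, label_idx = train_dataset[0]
print(f"Image tensor shape: {img_tensor.shape}")          # torch.Size([3, 64, 64])
print(f"Label: {label_idx} -> {class_names[label_idx]}")  # e.g. 0 -> 'ice_cream'
With the datasets confirmed, the next step is wrapping them in DataLoaders: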
from torch.utils.data import DataLoader
BATCH_SIZE = 32
train_dataloader = DataLoader(dataset=train_dataset,
batch_size=BATCH_SIZE,
shuffle=True,
num_workers=2)
test_dataloader = DataLoader(dataset=test_dataset,
batch_size=BATCH_SIZE,
shuffle=False,
num_workers=2)
print(f"Length of train dataloader: {len(train_dataloader)} batches")
print(f"Length of test dataloader: {len(test_dataloader)} batches")
DataLoader explained:
- batch_size=32 - Process 32 images at once (faster than one-by-one)
- shuffle=True - Randomize training data each epoch (helps prevent overfitting)
- num_workers=2 - Parallel data loading (speeds things up)
Let's peek at a batch:
img, label = next(iter(train_dataloader))
print(f"Image batch shape: {img.shape}") # torch.Size([32, 3, 64, 64])
print(f"Label batch shape: {label.shape}") # torch.Size([32])
Perfect! 32 images, 3 color channels (RGB), 64x64 pixels each.
Part 3: Building the Model
The Architecture: TinyVGG
I based this on the CNN Explainer website's model - replicating existing architectures is common practice in ML!
import torch
from torch import nn
class TinyVGG(nn.Module):
def __init__(self, input_shape, hidden_units, output_shape):
super().__init__()
# Convolutional block 1
self.conv_block_1 = nn.Sequential(
nn.Conv2d(in_channels=input_shape,
out_channels=hidden_units,
kernel_size=3,
stride=1,
padding=1),
nn.ReLU(),
nn.Conv2d(in_channels=hidden_units,
out_channels=hidden_units,
kernel_size=3,
stride=1,
padding=1),
nn.ReLU(),
nn.MaxPool2d(kernel_size=2)
)
# Convolutional block 2
self.conv_block_2 = nn.Sequential(
nn.Conv2d(in_channels=hidden_units,
out_channels=hidden_units,
kernel_size=3,
stride=1,
padding=1),
nn.ReLU(),
nn.Conv2d(in_channels=hidden_units,
out_channels=hidden_units,
kernel_size=3,
stride=1,
padding=1),
nn.ReLU(),
nn.MaxPool2d(kernel_size=2)
)
# Classifier
self.classifier = nn.Sequential(
nn.Flatten(),
nn.Linear(in_features=hidden_units * 16 * 16,
out_features=output_shape)
)
def forward(self, x):
x = self.conv_block_1(x)
x = self.conv_block_2(x)
x = self.classifier(x)
return x
Architecture breakdown:
Each convolutional block:
- Conv2d - Detects patterns (edges, textures) using 3x3 filters
- ReLU() - Activation function (adds non-linearity)
- Another Conv2d - Learns more complex patterns
- MaxPool2d(2) - Shrinks spatial dimensions by 2x (64→32→16)
The classifier:
- Flatten() - Converts 2D feature maps into 1D vector
- Linear() - Final layer outputs 4 values (one per class)
The hidden_units * 16 * 16 mystery:
Where did 16x16 come from? Here's the debug trick:
# Create dummy data matching your input shape
dummy_x = torch.randn(size=[1, 3, 64, 64]).to(device)
# Try passing through model (will error if dimensions wrong)
# The error message tells you the actual dimensions needed!
model_0(dummy_x)
The math: Start with 64x64 → MaxPool2d twice → 64/2/2 = 16x16
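If you'd rather see it than do the arithmetic, tracing a dummy batch through each block spells it out (a quick sanity check on a throwaway model instance, using the same 10 hidden units and 4 classes as below):
tmp_model = TinyVGG(input_shape=3, hidden_units=10, output_shape=4)
dummy = torch.randn(1, 3, 64, 64)
block_1_out = tmp_model.conv_block_1(dummy)
block_2_out = tmp_model.conv_block_2(block_1_out)
print(block_1_out.shape)                        # torch.Size([1, 10, 32, 32]) - one MaxPool2d
print(block_2_out.shape)                        # torch.Size([1, 10, 16, 16]) - two MaxPool2d -> the 16x16
print(tmp_model.classifier(block_2_out).shape)  # torch.Size([1, 4]) - one score per class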
Creating the Model
model_0 = TinyVGG(input_shape=3, # RGB channels
hidden_units=10, # Number of feature maps
output_shape=len(class_names)) # 4 classes
model_0 = model_0.to(device)
print(model_0)
print(f"Number of parameters: {sum(p.numel() for p in model_0.parameters()):,}")
I started with just 10 hidden units to keep it simple. This was my first mistake - the model was too simple for the task!
Part 4: Training Pipeline
The Training Step
def train_step(model: torch.nn.Module,
dataloader: torch.utils.data.DataLoader,
loss_fn: torch.nn.Module,
optimizer: torch.optim.Optimizer,
device=device):
model.train() # Enable dropout, batch norm, etc.
train_loss, train_acc = 0, 0
for batch, (X, y) in enumerate(dataloader):
X, y = X.to(device), y.to(device)
# 1. Forward pass
y_pred = model(X)
loss = loss_fn(y_pred, y)
train_loss += loss.item()
# 2. Backward pass
optimizer.zero_grad() # Clear old gradients
loss.backward() # Calculate new gradients
optimizer.step() # Update weights
# 3. Calculate accuracy
y_pred_class = torch.argmax(y_pred, dim=1)
train_acc += (y_pred_class == y).sum().item() / len(y_pred)
# Average loss and accuracy
train_loss /= len(dataloader)
train_acc /= len(dataloader)
return train_loss, train_acc
The training loop:
- Forward pass - Feed data through model, calculate loss
- Backward pass - The magic of backpropagation:
  - zero_grad() - Clear previous gradients (they accumulate!)
  - backward() - Calculate how wrong we were
  - step() - Update weights to be less wrong
- Calculate accuracy - Convert predictions to class labels and compare
The Test Step
def test_step(model: torch.nn.Module,
dataloader: torch.utils.data.DataLoader,
loss_fn: torch.nn.Module,
device=device):
model.eval() # Disable dropout
test_loss, test_acc = 0, 0
with torch.inference_mode(): # Disable gradient tracking (saves memory)
for batch, (X, y) in enumerate(dataloader):
X, y = X.to(device), y.to(device)
# Forward pass only
test_pred = model(X)
loss = loss_fn(test_pred, y)
test_loss += loss.item()
# Calculate accuracy
test_pred_labels = torch.argmax(test_pred, dim=1)
test_acc += (test_pred_labels == y).sum().item() / len(test_pred)
test_loss /= len(dataloader)
test_acc /= len(dataloader)
return test_loss, test_acc
Key differences from training:
- model.eval() - Disables dropout (we want consistency)
- torch.inference_mode() - Saves memory by not tracking gradients
- No optimizer - We're not updating weights, just evaluating
The Main Training Loop
from tqdm.auto import tqdm
def train(model: torch.nn.Module,
train_dataloader: torch.utils.data.DataLoader,
test_dataloader: torch.utils.data.DataLoader,
optimizer: torch.optim.Optimizer,
loss_fn: torch.nn.Module = nn.CrossEntropyLoss(),
epochs: int = 5,
device=device):
results = {
"train_loss": [],
"train_accuracy": [],
"test_loss": [],
"test_accuracy": []
}
for epoch in tqdm(range(epochs)):
train_loss, train_acc = train_step(model=model,
dataloader=train_dataloader,
loss_fn=loss_fn,
optimizer=optimizer,
device=device)
test_loss, test_acc = test_step(model=model,
dataloader=test_dataloader,
loss_fn=loss_fn,
device=device)
print(f"Epoch: {epoch+1} | "
f"Train loss: {train_loss:.4f} | "
f"Train Acc: {train_acc:.4f} | "
f"Test loss: {test_loss:.4f} | "
f"Test Acc: {test_acc:.4f}")
# Store results
results["train_loss"].append(train_loss)
results["train_accuracy"].append(train_acc)
results["test_loss"].append(test_loss)
results["test_accuracy"].append(test_acc)
return results
This orchestrates everything:
- Trains for the specified number of epochs
- Tests after each epoch
- Prints metrics (watching numbers is addictive! 📈)
- Stores everything for later plotting
Let's Train!
# Set up optimizer and loss function
optimizer = torch.optim.Adam(model_0.parameters(), lr=0.001)
loss_fn = nn.CrossEntropyLoss()
# Start training
torch.manual_seed(42)
results_0 = train(model=model_0,
train_dataloader=train_dataloader,
test_dataloader=test_dataloader,
optimizer=optimizer,
loss_fn=loss_fn,
epochs=20,
device=device)
I chose Adam optimizer with learning rate 0.001 - a solid default for most problems.
Part 5: The Overfitting Disaster
The Results That Made Me Cry
Here's what happened after 20 epochs:
Epoch: 17 | Train loss: 1.0987 | Train Acc: 0.5280 | Test loss: 1.1463 | Test Acc: 0.5045
Epoch: 18 | Train loss: 1.0703 | Train Acc: 0.5439 | Test loss: 1.1714 | Test Acc: 0.5179
Epoch: 19 | Train loss: 1.1060 | Train Acc: 0.5351 | Test loss: 1.2098 | Test Acc: 0.4911
Epoch: 20 | Train loss: 1.0435 | Train Acc: 0.5609 | Test loss: 1.1909 | Test Acc: 0.5089
56% training accuracy. 51% test accuracy.
For a 4-class problem, random guessing would give 25%. So I was doing better than random... barely.
Let's visualize the carnage:
import matplotlib.pyplot as plt
def plot_loss_curves(results):
"""Plot training and test loss/accuracy curves"""
epochs = range(len(results["train_loss"]))
plt.figure(figsize=(15, 5))
# Loss
plt.subplot(1, 2, 1)
plt.plot(epochs, results["train_loss"], label="Train Loss")
plt.plot(epochs, results["test_loss"], label="Test Loss")
plt.title("Loss")
plt.xlabel("Epochs")
plt.legend()
# Accuracy
plt.subplot(1, 2, 2)
plt.plot(epochs, results["train_accuracy"], label="Train Accuracy")
plt.plot(epochs, results["test_accuracy"], label="Test Accuracy")
plt.title("Accuracy")
plt.xlabel("Epochs")
plt.legend()
plt.tight_layout()
plt.show()
plot_loss_curves(results_0)
Look at those curves! The training and test lines are practically on top of each other, bouncing around aimlessly. This isn't overfitting - this is underfitting. The model is too simple to learn the patterns.
My Desperate Attempts to Fix It
I tried everything I could think of:
Attempt 1: More Augmentation
# Added rotation, color jitter, normalization
enhanced_transform = transforms.Compose([
transforms.Resize((64, 64)),
transforms.RandomHorizontalFlip(p=0.5),
transforms.RandomRotation(15),
transforms.ColorJitter(brightness=0.2, contrast=0.2),
transforms.ToTensor(),
transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
Result: Accuracy improved to ~55%. Still terrible.
Attempt 2: More Hidden Units
# Increased from 10 to 32 hidden units
model_1 = TinyVGG(input_shape=3, hidden_units=32, output_shape=4).to(device)
Result: Training accuracy improved to ~62%, test accuracy stuck at ~53%. Better, but not good enough.
Attempt 3: Adding Dropout
# Added Dropout(0.4) after each pooling layer and before final classifier
self.conv_block_1 = nn.Sequential(
nn.Conv2d(...),
nn.ReLU(),
nn.Conv2d(...),
nn.ReLU(),
nn.MaxPool2d(kernel_size=2),
nn.Dropout(p=0.4) # NEW
)
Result: Worse! Accuracy dropped to ~48%. Dropout was hurting because the model was already struggling.
Attempt 4: Bigger Images
# Increased from 64x64 to 128x128
transforms.Resize((128, 128))
Result: Training slowed to a crawl (4x longer per epoch), accuracy improved to ~58%. Still not worth it.
Attempt 5: Different Learning Rates
Tried 0.01, 0.0001, 0.00001... nothing made a meaningful difference.
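For the curious, the sweep looked roughly like this (a sketch, not my exact notebook code - fewer epochs per run to keep it quick):
# Rough shape of a learning-rate sweep: a fresh model and optimizer per rate
for lr in [0.01, 0.001, 0.0001, 0.00001]:
    torch.manual_seed(42)
    model_lr = TinyVGG(input_shape=3, hidden_units=10,
                       output_shape=len(class_names)).to(device)
    optimizer = torch.optim.Adam(model_lr.parameters(), lr=lr)
    print(f"--- lr = {lr} ---")
    train(model=model_lr,
          train_dataloader=train_dataloader,
          test_dataloader=test_dataloader,
          optimizer=optimizer,
          loss_fn=nn.CrossEntropyLoss(),
          epochs=5,
          device=device)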
The Moment of Clarity
After hours of frustration, I finally understood the problem:
💡 The Real Issue: I only had 600 training images split across 4 classes. That's just 150 images per class to learn from scratch. My tiny model couldn't extract meaningful features from such limited data.
The solution wasn't more augmentation or more layers. I needed either:
- Way more data (thousands of images per class)
- Transfer learning (use a model pre-trained on millions of images)
Gathering thousands more images? That would take weeks. Transfer learning? That could work today.
Part 6: The Solution - Transfer Learning
Coming in Part 2!
I know, I know, I left you on a cliffhanger. But trust me, the transfer learning solution is worth its own dedicated post.
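If you really can't wait, the general shape of transfer learning looks something like this - a sketch assuming torchvision 0.13+ and an EfficientNet-B0 backbone, not necessarily the exact setup Part 2 will use:
import torch
from torch import nn
from torchvision import models
# Grab a backbone pre-trained on millions of ImageNet images
weights = models.EfficientNet_B0_Weights.DEFAULT
pretrained_model = models.efficientnet_b0(weights=weights).to(device)
# Freeze the feature extractor - only the new classifier head will learn
for param in pretrained_model.features.parameters():
    param.requires_grad = False
# Swap the head for our 4 food classes
pretrained_model.classifier = nn.Sequential(
    nn.Dropout(p=0.2),
    nn.Linear(in_features=1280, out_features=len(class_names))
).to(device)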
Until then, try building your own classifier from scratch. Experience the pain. Appreciate the solution even more.
Quick Recap: What We Covered in Part 1
- ✅ Downloaded and prepared the Food-101 dataset
- ✅ Created a custom subset (4 classes, 800 images)
- ✅ Built a TinyVGG model from scratch
- ✅ Trained for 20 epochs and got... 56% accuracy
- ✅ Tried EVERYTHING to improve it (spoiler: nothing worked)
- ✅ Realized the fundamental problem: not enough data to train from scratch
Resources
Learning Resources:
- CNN Explainer - Interactive visualization
- PyTorch Transfer Learning Tutorial
- Fast.ai Practical Deep Learning
📢 Don't Miss Part 2!
Follow me here on dev.to to get notified when Part 2 drops!
In the meantime, check out my other ML projects on GitHub
Your Turn!
Have you dealt with small datasets? What worked for you? Have any transfer learning horror stories or success stories?
Drop your experiences in the comments! I'd love to hear about your ML journeys, especially the failures that taught you something. 👇
And remember: The best ML projects are often the ones that don't work perfectly on the first try - because that's when you actually learn something.
Thanks for reading! If you found this helpful, consider giving it a ❤️ and following for more ML adventures (and misadventures).
Tags: #machinelearning #pytorch #python #beginners #deeplearning #tutorial #computerVision #transferlearning
