TL;DR
I trained a 3-block CNN in PyTorch on the FER-2013 dataset to classify 7 emotions. This post explains the dataset challenges, preprocessing and augmentation, exact model architecture, training recipe, evaluation (confusion matrix + per-class F1), and next steps for deployment.
Introduction
Emotion recognition enables richer human–computer interactions. I chose FER-2013 because it’s realistic: low-resolution (48×48), grayscale, and class-imbalanced. The goal: produce a reproducible, deployment-ready CNN pipeline that balances accuracy and efficiency for real-time inference.
Problem statement
- Input: 48×48 grayscale faces.
- Task: 7-class classification — Angry, Disgust, Fear, Happy, Sad, Surprise, Neutral.
- Challenges: small images → limited features, class imbalance, noisy labels, and intra-class variation.
Dataset & preprocessing
- Source: FER-2013 (Kaggle). Use the train/val/test split from the original CSV (its Usage column marks each row as Training, PublicTest, or PrivateTest) or make your own split.
- Preprocessing pipeline (PyTorch transforms):
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.ToPILImage(),
    transforms.RandomHorizontalFlip(),  # sensible for faces
    transforms.RandomRotation(10),  # small rotations
    transforms.RandomResizedCrop(48, scale=(0.9, 1.0)),
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

val_transform = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((48, 48)),
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])
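Both pipelines start with ToPILImage(), so each sample has to arrive as a NumPy array (or tensor). Here is a minimal sketch of a dataset that supplies one, assuming the standard Kaggle CSV layout with emotion, pixels, and Usage columns. The class name FER2013Dataset, the helper parse_pixels, and the CSV path are hypothetical, not taken from my repo.

```python
import csv

import numpy as np
import torch
from torch.utils.data import Dataset


def parse_pixels(pixel_string, size=48):
    """Turn the space-separated pixels field into a (size, size) uint8 array."""
    values = np.array([int(v) for v in pixel_string.split()], dtype=np.uint8)
    return values.reshape(size, size)


class FER2013Dataset(Dataset):
    """Hypothetical wrapper; assumes CSV columns emotion, pixels, Usage."""

    def __init__(self, csv_path, usage="Training", transform=None):
        self.transform = transform
        self.samples = []
        with open(csv_path, newline="") as f:
            for row in csv.DictReader(f):
                if row["Usage"] == usage:
                    self.samples.append((int(row["emotion"]), row["pixels"]))

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        label, pixel_string = self.samples[idx]
        image = parse_pixels(pixel_string)  # (48, 48) uint8, accepted by ToPILImage()
        if self.transform is not None:
            image = self.transform(image)
        else:
            # Fallback without transforms: (1, 48, 48) float tensor in [0, 1]
            image = torch.from_numpy(image).unsqueeze(0).float() / 255.0
        return image, label
```

With the transforms above, usage would look like FER2013Dataset("fer2013.csv", usage="Training", transform=train_transform), where the file name is whatever your Kaggle download is called.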
Model architecture
- Input: 48 × 48 × 1 (grayscale)
- Block 1: Conv2d(1 → 64, 3×3, pad=1) → BatchNorm2d(64) → ReLU → MaxPool2d(2×2) → OUTPUT 24×24×64
- Block 2: Conv2d(64 → 128, 3×3, pad=1) → BatchNorm2d(128) → ReLU → MaxPool2d(2×2) → OUTPUT 12×12×128
- Block 3: Conv2d(128 → 256, 3×3, pad=1) → BatchNorm2d(256) → ReLU → MaxPool2d(2×2) → OUTPUT 6×6×256
- Dropout2d(p=0.25) → Flatten (9216) → FC(9216 → 512) → ReLU → Dropout(p=0.5) → FC(512 → 7) → Softmax (inference only; CrossEntropyLoss takes the raw logits during training)
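The layer list above maps almost line for line onto an nn.Module. This is a sketch of that mapping rather than the exact file from my repo: the class name EmotionCNN and the block helper are placeholders, and the forward pass returns raw logits so it can feed CrossEntropyLoss directly.

```python
import torch
import torch.nn as nn


class EmotionCNN(nn.Module):
    """Three conv blocks followed by a two-layer classifier head."""

    def __init__(self, num_classes=7):
        super().__init__()

        def block(in_ch, out_ch):
            # Conv -> BatchNorm -> ReLU -> 2x2 max-pool (halves height and width)
            return nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),
            )

        self.features = nn.Sequential(
            block(1, 64),     # 48x48 -> 24x24
            block(64, 128),   # 24x24 -> 12x12
            block(128, 256),  # 12x12 -> 6x6
            nn.Dropout2d(p=0.25),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                 # 256 * 6 * 6 = 9216 features
            nn.Linear(256 * 6 * 6, 512),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(512, num_classes),  # raw logits, no softmax here
        )

    def forward(self, x):
        return self.classifier(self.features(x))


# Shape check on a dummy batch of two 48x48 grayscale images
model = EmotionCNN().eval()
with torch.no_grad():
    logits = model(torch.randn(2, 1, 48, 48))
    probs = torch.softmax(logits, dim=1)  # softmax applied only at inference
print(tuple(logits.shape))  # (2, 7)
```

Keeping softmax out of forward() avoids applying it twice, since CrossEntropyLoss already includes a log-softmax.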
Training recipe
- Loss: CrossEntropyLoss()
- Optimizer: AdamW(lr=1e-3, weight_decay=1e-4)
- Scheduler: ReduceLROnPlateau or CosineAnnealingLR (I used ReduceLROnPlateau on val loss)
- Batch size: 64 (adjust by GPU memory)
- Epochs: 30–60 with early stopping (patience 7 on val loss)
- Checkpoint: save best_model.pt by val F1 (or loss)
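The early-stopping rule in the recipe (patience 7 on val loss) fits in a small helper. This is a sketch under assumptions: the EarlyStopping class and its min_delta parameter are illustrative names, and the loss values in the demo are made up purely to show the counter at work.

```python
class EarlyStopping:
    """Signal a stop once the loss has not improved for `patience` epochs in a row."""

    def __init__(self, patience=7, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta  # assumption: any decrease counts as an improvement
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Demo with made-up losses and a short patience so the stop is easy to follow
stopper = EarlyStopping(patience=3)
losses = [1.0, 0.8, 0.9, 0.85, 0.81, 0.7]
stopped_at = None
for epoch, loss in enumerate(losses, start=1):
    if stopper.step(loss):
        stopped_at = epoch
        break
print(stopped_at)  # 5: epochs 3, 4, and 5 all failed to beat 0.8
```

In a real run you would call stopper.step(val_loss) once per epoch and break out of the training loop when it returns True.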
Minimal training loop snippet
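```python
import torch
import torch.nn as nn

# model, train_loader, val_loader, train_one_epoch, and validate are defined elsewhere
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, 'min', patience=3)

epochs = 60    # upper end of the 30–60 range in the recipe
best_f1 = 0.0  # best validation macro-F1 seen so far

for epoch in range(1, epochs + 1):
    train_one_epoch(model, train_loader, optimizer, criterion)
    val_loss, val_metrics = validate(model, val_loader, criterion)
    scheduler.step(val_loss)  # lower the LR when val loss plateaus
    if val_metrics['f1_macro'] > best_f1:
        best_f1 = val_metrics['f1_macro']
        torch.save(model.state_dict(), 'best_model.pt')
```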
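The loop relies on a validate helper that returns the validation loss and a metrics dict with an f1_macro key. One way it could be written in plain PyTorch, with the confusion matrix and per-class F1 included, is sketched below. The bodies here are assumptions, not my exact implementation: they expect model and data on the same device, a criterion with the default mean reduction, and they average F1 over all 7 classes, counting a class with no samples and no predictions as 0.

```python
import torch


def confusion_matrix(y_true, y_pred, num_classes=7):
    """Rows are true classes, columns are predicted classes."""
    idx = y_true * num_classes + y_pred
    counts = torch.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def f1_per_class(cm):
    """Per-class F1 = 2TP / (2TP + FP + FN); 0 where the denominator is 0."""
    tp = cm.diag().float()
    fp = cm.sum(dim=0).float() - tp
    fn = cm.sum(dim=1).float() - tp
    denom = 2 * tp + fp + fn
    return torch.where(denom > 0, 2 * tp / denom.clamp(min=1), torch.zeros_like(tp))


@torch.no_grad()
def validate(model, loader, criterion, num_classes=7):
    """Return (mean loss, metrics dict) over one pass of the loader."""
    model.eval()
    total_loss, n = 0.0, 0
    all_true, all_pred = [], []
    for images, labels in loader:
        logits = model(images)
        # criterion returns the batch mean, so weight it by batch size
        total_loss += criterion(logits, labels).item() * labels.size(0)
        n += labels.size(0)
        all_true.append(labels)
        all_pred.append(logits.argmax(dim=1))
    cm = confusion_matrix(torch.cat(all_true), torch.cat(all_pred), num_classes)
    f1 = f1_per_class(cm)
    metrics = {
        "confusion_matrix": cm,
        "f1_per_class": f1,
        "f1_macro": f1.mean().item(),
    }
    return total_loss / n, metrics
```

Macro-F1 weights every emotion equally, which is why it is a more honest checkpoint criterion than accuracy on a class-imbalanced dataset like FER-2013.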
Reproducibility
import random, numpy as np, torch
seed = 42
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
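The seeds above cover the main process, but shuffling order and DataLoader worker processes have their own random state. The sketch below follows the pattern from the PyTorch reproducibility notes (a seeded torch.Generator plus a worker_init_fn). The make_loader helper is a hypothetical convenience wrapper, and the toy TensorDataset only stands in for the real data.

```python
import random

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset


def seed_worker(worker_id):
    """Give each DataLoader worker a deterministic NumPy / random seed."""
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)


def make_loader(dataset, batch_size=64, seed=42, num_workers=0):
    """Hypothetical helper: a shuffling loader whose order depends only on `seed`."""
    g = torch.Generator()
    g.manual_seed(seed)
    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,
        worker_init_fn=seed_worker,
        generator=g,
    )


# Two loaders built with the same seed visit the samples in the same order
dataset = TensorDataset(torch.arange(10))
first = [batch[0].tolist() for batch in make_loader(dataset, batch_size=5)]
second = [batch[0].tolist() for batch in make_loader(dataset, batch_size=5)]
print(first == second)  # True
```

Even with all of this in place, some CUDA kernels remain nondeterministic, so expect small run-to-run differences on GPU.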
Conclusion
The code is on GitHub. If you want this adapted for real-time webcam inference or a Django web deployment, contact me.