Sameer Shah
How I Built a Facial-Expression Recognition Model with PyTorch (FER-2013, 72% Val Acc)

TL;DR

I trained a 3-block CNN in PyTorch on the FER-2013 dataset to classify 7 emotions, reaching 72% validation accuracy. This post covers the dataset's challenges, preprocessing and augmentation, the exact model architecture, the training recipe, evaluation (confusion matrix + per-class F1), and next steps for deployment.

Introduction

Emotion recognition enables richer human–computer interactions. I chose FER-2013 because it’s realistic: low-resolution (48×48), grayscale, and class-imbalanced. The goal: produce a reproducible, deployment-ready CNN pipeline that balances accuracy and efficiency for real-time inference.

Problem statement

  • Input: 48×48 grayscale faces.
  • Task: 7-class classification — Angry, Disgust, Fear, Happy, Sad, Surprise, Neutral (label indices shown below).
  • Challenges: small images → limited features, class imbalance, noisy labels, and intra-class variation.
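
In the snippets below I'll assume the standard FER-2013 integer encoding, which follows the order the classes are listed above:

EMOTIONS = ['Angry', 'Disgust', 'Fear', 'Happy', 'Sad', 'Surprise', 'Neutral']
# the CSV's 'emotion' column holds an index i into this list (0–6)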

Dataset & preprocessing

  • Source: FER-2013 (Kaggle). Split into train/val/test as in the original CSV (or your split).
  • Preprocessing pipeline (PyTorch transforms):
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.ToPILImage(),
    transforms.RandomHorizontalFlip(),     # sensible for faces
    transforms.RandomRotation(10),         # small rotations
    transforms.RandomResizedCrop(48, scale=(0.9,1.0)),
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

val_transform = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((48,48)),
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

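To feed these transforms, here's a minimal Dataset sketch for the Kaggle CSV layout (an 'emotion' label column plus a 'pixels' column of 2,304 space-separated grayscale values). The class name and CSV path are placeholders, not the exact code from my repo:

import numpy as np
import pandas as pd
from torch.utils.data import Dataset

class FERDataset(Dataset):
    # hypothetical loader for fer2013.csv ('emotion' and 'pixels' columns)
    def __init__(self, csv_path, transform=None):
        df = pd.read_csv(csv_path)
        self.labels = df['emotion'].to_numpy()
        # each 'pixels' entry is 48*48 = 2304 space-separated values
        self.images = np.stack([
            np.array(p.split(), dtype=np.uint8).reshape(48, 48)
            for p in df['pixels']
        ])
        self.transform = transform

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        img = self.images[idx]          # 2-D uint8 array; ToPILImage handles it
        if self.transform:
            img = self.transform(img)
        return img, int(self.labels[idx])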

Model architecture

  • Input: 48 × 48 × 1 (grayscale)
  • Block 1: Conv2d(1 → 64, 3×3, pad=1) → BatchNorm2d(64) → ReLU → MaxPool2d(2×2) → OUTPUT 24×24×64
  • Block 2: Conv2d(64 → 128, 3×3, pad=1) → BatchNorm2d(128) → ReLU → MaxPool2d(2×2) → OUTPUT 12×12×128
  • Block 3: Conv2d(128 → 256, 3×3, pad=1) → BatchNorm2d(256) → ReLU → MaxPool2d(2×2) → OUTPUT 6×6×256
  • Dropout2d(p=0.25) → Flatten (9216) → FC(9216 → 512) → ReLU → Dropout(p=0.5) → FC(512 → 7) → Softmax (inference)
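
For reference, the blocks above translate directly into a PyTorch module; the class and helper names here are mine, but the shapes match the OUTPUT annotations:

import torch.nn as nn

class EmotionCNN(nn.Module):
    def __init__(self, num_classes=7):
        super().__init__()

        def block(cin, cout):
            # Conv -> BN -> ReLU -> 2x2 max-pool, halving spatial size
            return nn.Sequential(
                nn.Conv2d(cin, cout, kernel_size=3, padding=1),
                nn.BatchNorm2d(cout),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),
            )

        self.features = nn.Sequential(block(1, 64), block(64, 128), block(128, 256))
        self.classifier = nn.Sequential(
            nn.Dropout2d(0.25),
            nn.Flatten(),                 # 256 * 6 * 6 = 9216
            nn.Linear(9216, 512),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(512, num_classes),  # logits; apply softmax only at inference
        )

    def forward(self, x):
        return self.classifier(self.features(x))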

Training recipe

  • Loss: CrossEntropyLoss() — optionally class-weighted for the imbalance (see the sketch after this list)
  • Optimizer: AdamW(lr=1e-3, weight_decay=1e-4)
  • Scheduler: ReduceLROnPlateau or CosineAnnealingLR (I used ReduceLROnPlateau on val loss)
  • Batch size: 64 (adjust by GPU memory)
  • Epochs: 30–60 with early stopping (patience 7 on val loss)
  • Checkpoint: save best_model.pt by val F1 (or loss)
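
Because FER-2013 is imbalanced (Disgust has far fewer examples than Happy), one optional tweak to the loss is inverse-frequency class weighting. This sketch assumes a train_labels sequence of integer labels and isn't necessarily what produced the 72% run:

import torch
import torch.nn as nn

counts = torch.bincount(torch.as_tensor(train_labels), minlength=7).float()
weights = counts.sum() / (7 * counts)   # rarer classes get larger weights
criterion = nn.CrossEntropyLoss(weight=weights)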

Minimal training loop snippet

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, 'min', patience=3)

best_f1 = 0.0
for epoch in range(1, epochs + 1):
    train_one_epoch(model, train_loader, optimizer, criterion)
    val_loss, val_metrics = validate(model, val_loader, criterion)
    scheduler.step(val_loss)                      # reduce LR when val loss plateaus
    if val_metrics['f1_macro'] > best_f1:         # checkpoint on best macro F1
        best_f1 = val_metrics['f1_macro']
        torch.save(model.state_dict(), 'best_model.pt')
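
train_one_epoch and validate are left as helper stubs above; a minimal validate returning the val loss and a scikit-learn macro F1 (a sketch of my helper, assuming the loaders yield (images, labels) batches) could look like:

import torch
from sklearn.metrics import f1_score

@torch.no_grad()
def validate(model, loader, criterion, device='cpu'):
    model.eval()
    losses, preds, targets = [], [], []
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        logits = model(images)
        losses.append(criterion(logits, labels).item())
        preds += logits.argmax(dim=1).cpu().tolist()
        targets += labels.cpu().tolist()
    val_loss = sum(losses) / len(losses)
    return val_loss, {'f1_macro': f1_score(targets, preds, average='macro')}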

Reproducibility

import random, numpy as np, torch

seed = 42
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)                     # also seeds CUDA devices
torch.backends.cudnn.deterministic = True   # force deterministic cuDNN kernels
torch.backends.cudnn.benchmark = False      # disable the autotuner for repeatable runs

Conclusion

The full code is on GitHub. If you'd like this adapted for real-time webcam inference or a Django web deployment, get in touch.
