Dropout: Switch Off Neurons to Stop Overfitting

#ai #machinelearning #deeplearning #beginners

Dropout is almost absurdly simple — randomly switch off neurons during training — yet it was one of the biggest anti-overfitting wins in deep learning. Here's why it works, visualized.

🎲 Watch neurons drop (toggle the rate): https://dev48v.infy.uk/dl/day20-dropout.html

What it does

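On each training step, every hidden neuron is kept with probability 1−p and zeroed out with probability p (say p=0.5). Here p is the drop rate, the same convention PyTorch's nn.Dropout uses; the original dropout paper used p for the keep rate instead. A fresh random mask is drawn on every forward pass, so a different subset drops each step. The demo grays out a new random set of neurons each pass and cuts their edges.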
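A minimal sketch of the sampling step in plain Python (standard library only). The helper name `sample_mask`, the seed, and the layer size of 8 are illustrative assumptions, not code from the demo page. It uses the same keep test as the demo recipe (keep when a uniform draw exceeds p):

```python
import random

def sample_mask(n, p, rng):
    """Return a keep-mask of length n: True = kept, False = dropped.

    Each neuron is dropped independently with probability p:
    it is kept when a uniform draw in [0, 1) exceeds p.
    """
    return [rng.random() > p for _ in range(n)]

rng = random.Random(0)   # fixed seed so the run is reproducible
p = 0.5                  # drop probability
n_neurons = 8            # hypothetical layer size

# A fresh mask every training step: a different subset goes dark each time.
for step in range(3):
    mask = sample_mask(n_neurons, p, rng)
    print(step, "".join("1" if keep else "." for keep in mask))

# Over many steps, the fraction of kept neurons approaches 1 - p.
steps = 10_000
kept = sum(sum(sample_mask(n_neurons, p, rng)) for _ in range(steps))
print(f"kept fraction ~ {kept / (steps * n_neurons):.3f}")
```

Drawing each neuron's fate independently is what makes the dropped subset change from step to step; nothing ties one step's mask to the next.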

Why that helps

Neurons can't rely on any specific other neuron being present, so they can't co-adapt into a fragile, memorized solution; each must learn a feature that's useful on its own. With n droppable neurons there are 2^n possible thinned subnetworks, and because they all share one set of weights, training with dropout is like training a huge ensemble of them at once. The result is a smaller gap between training and validation accuracy (less overfitting), which the demo's two accuracy curves show.
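To make the "ensemble of subnetworks" idea concrete, here is a small sketch for a single linear unit with 4 inputs. The weights, inputs, and helper names are made up for illustration. It enumerates all 2^4 masks, weights each thinned subnetwork's output by how likely that mask is, and compares the average with the full (nothing dropped) unit, using the 1/(1−p) scaling described in the next section:

```python
from itertools import product

def thinned_output(w, x, mask, p):
    """Output of one linear unit when only inputs with mask=1 survive.

    Kept inputs are scaled by 1/(1-p) (inverted dropout).
    """
    return sum(wi * xi * mi for wi, xi, mi in zip(w, x, mask)) / (1 - p)

def mask_probability(mask, p):
    """Probability of drawing this exact mask when each input drops with prob p."""
    prob = 1.0
    for m in mask:
        prob *= (1 - p) if m else p
    return prob

w = [0.4, -1.2, 0.7, 2.0]    # hypothetical weights shared by every subnetwork
x = [1.0, 0.5, -2.0, 0.25]   # hypothetical input
p = 0.5

masks = list(product([0, 1], repeat=len(w)))   # every possible subnetwork
print(len(masks), "subnetworks")               # 2**4 = 16

ensemble_avg = sum(mask_probability(m, p) * thinned_output(w, x, m, p) for m in masks)
full = sum(wi * xi for wi, xi in zip(w, x))
print(ensemble_avg, full)   # equal up to floating-point rounding
```

The match is exact here only because the unit is linear. Once a nonlinearity sits between layers, running the full network at inference is an approximation of averaging the whole ensemble, though a cheap one that works well in practice.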

Train vs inference (the gotcha)

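You drop during training only; at inference, every neuron is on. That creates a scale mismatch: with a fraction p of its inputs zeroed, a neuron's summed input during training is on average only (1−p) times what it sees at inference. Inverted dropout fixes this by scaling the kept activations by 1/(1−p) during training, so each activation's expected value matches its inference value and inference needs no change. (The original formulation instead left training unscaled and multiplied the weights by the keep rate at test time.)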
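The expectation argument, written out for one activation a with keep-mask m (the symbols a, m, and ã are introduced here for illustration):

```latex
\begin{aligned}
m &\sim \mathrm{Bernoulli}(1-p), \qquad \tilde{a} = \frac{m}{1-p}\, a \\
\mathbb{E}[\tilde{a}] &= \frac{\mathbb{E}[m]}{1-p}\, a = \frac{1-p}{1-p}\, a = a \\
\text{without scaling:}\quad \mathbb{E}[m\, a] &= (1-p)\, a \qquad (= 0.5\, a \text{ for } p = 0.5)
\end{aligned}
```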

Modern note

With batch norm (Day 19) and large datasets, conv nets tend to need dropout less, but it is still a common default in Transformers, where it is applied to the attention weights and to the outputs of the attention and feed-forward sublayers. Like L2 (Day 17), it's a regularizer.

🔨 Built from scratch (mask = rand > p → scale by 1/(1−p) → off at eval) on the page: https://dev48v.infy.uk/dl/day20-dropout.html
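A from-scratch sketch of that recipe as a tiny layer, in plain Python. The class name `Dropout` and its `train()`/`eval()`/`forward()`/`backward()` methods are hypothetical names chosen to echo common framework style; this is not PyTorch's API and not necessarily the code on the demo page:

```python
import random

class Dropout:
    """Inverted dropout: mask = rand > p, scale by 1/(1-p), off at eval."""

    def __init__(self, p=0.5, seed=None):
        if not 0.0 <= p < 1.0:
            raise ValueError("p must be in [0, 1)")   # p = 1 would divide by zero
        self.p = p
        self.training = True            # switched by train() / eval()
        self.rng = random.Random(seed)
        self.mask = None                # last mask, reused by backward()

    def train(self):
        self.training = True
        return self

    def eval(self):
        self.training = False
        return self

    def forward(self, x):
        """x is a list of floats: one layer's activations."""
        if not self.training or self.p == 0.0:
            self.mask = [1.0] * len(x)
            return list(x)              # inference: every neuron on, no rescaling
        scale = 1.0 / (1.0 - self.p)
        # Keep where a uniform draw exceeds p; survivors get the 1/(1-p) boost.
        self.mask = [scale if self.rng.random() > self.p else 0.0 for _ in x]
        return [xi * mi for xi, mi in zip(x, self.mask)]

    def backward(self, grad_out):
        """Gradient flows only through kept neurons, with the same scale."""
        return [g * mi for g, mi in zip(grad_out, self.mask)]

layer = Dropout(p=0.5, seed=0)
acts = [1.0, 2.0, 3.0, 4.0]
print(layer.forward(acts))          # each value is either 0.0 or doubled
print(layer.eval().forward(acts))   # [1.0, 2.0, 3.0, 4.0]
```

Storing the mask on the layer lets the backward pass zero the same neurons that were dropped in the forward pass, which is what makes each step train one particular thinned subnetwork.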

Part of DeepLearningFromZero. 🌐 https://dev48v.infy.uk