What is Cutout?
Cutout might be one of the few augmentations that look wrong at first glance.
Instead of perturbing the image in subtle, semantically “safe” ways, like rotation or brightness change, Cutout just… blacks out a chunk of the image.
A square.
At a random spot.
Gone.
Introduced in 2017, Cutout asks a bold question:
“What if we deliberately hide a chunk of every training image and force the model to deal with it?”
No fancy tuning.
No mixed gradients.
Just occlusion.
Why does removing information help?
Deep networks love shortcuts.
If a dataset has localized discriminative features, say, dog eyes or airplane tails, the model will latch onto them with laser precision.
Cutout’s mask breaks those shortcuts.
By erasing a random portion of each training sample:
- Local features cannot be fully trusted
- The model must build distributed, global representations
- Robustness improves, and nothing is lost at test time, since inference almost always sees unmasked images
This “forced generalization” effect is surprisingly effective, especially on small/medium datasets.
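To make the mechanism concrete, here is a minimal NumPy sketch of the idea (the function name, the 16-pixel default, and the zero fill value are illustrative choices, not a reference implementation):

```python
import numpy as np

def cutout(image: np.ndarray, mask_size: int = 16, fill_value: float = 0.0) -> np.ndarray:
    """Black out one square region of an H x W x C image."""
    h, w = image.shape[:2]
    # Pick a random center; the square may be clipped at the borders,
    # so the effective occlusion varies slightly from sample to sample.
    cy, cx = np.random.randint(h), np.random.randint(w)
    half = mask_size // 2
    y1, y2 = max(0, cy - half), min(h, cy + half)
    x1, x2 = max(0, cx - half), min(w, cx + half)
    out = image.copy()
    out[y1:y2, x1:x2] = fill_value
    return out
```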
Why doesn’t Cutout ruin training?
Two reasons:
1. The masked region is small relative to the full image. Most of the image remains intact, so the model can still learn the correct label.
2. Occlusion is consistent with real-world noise. Objects get blocked, lighting changes, hands get in the way: Cutout mimics this variability.
It’s structured chaos.
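In practice this usually means wiring the mask into the input pipeline. A hedged sketch using PyTorch: torchvision's built-in RandomErasing is the closest off-the-shelf relative of Cutout, and the scale/ratio values below (roughly a 16×16 square on a 32×32 image) are one common CIFAR-scale choice, not a prescription.

```python
from torchvision import transforms

# RandomErasing blanks a random rectangle after ToTensor, which is
# close in spirit to Cutout (Cutout proper uses a fixed-size square
# whose center may fall near the border). Values here are illustrative.
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.RandomErasing(p=1.0, scale=(0.25, 0.25), ratio=(1.0, 1.0), value=0.0),
])
```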
When does Cutout struggle?
Cutout begins to underperform when:
- Fine-grained details matter
- The masked region wipes out key class-defining features
- The dataset already contains natural occlusion
- The mask size is poorly matched to the input resolution (see the quick arithmetic below)
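How much a given mask removes depends entirely on resolution. A quick back-of-the-envelope check (the 16-pixel patch and both image sizes are illustrative numbers):

```python
# Fraction of the image area removed by a square mask of side mask_size.
def coverage(mask_size: int, image_size: int) -> float:
    return (mask_size ** 2) / (image_size ** 2)

print(coverage(16, 32))   # 0.25   -> a quarter of a 32x32, CIFAR-scale image
print(coverage(16, 224))  # ~0.005 -> barely noticeable on a 224x224 image
```

The same patch that meaningfully challenges a model on small images is nearly invisible at higher resolutions, so the mask size has to grow with the input.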
Still, as a single-component augmentation, Cutout remains remarkably useful and computationally near-free.
Conclusion
Cutout is the simplest form of feature dropout:
block part of the image, force the model to think harder.
Next up: CutMix, the augmentation that asked:
“What if we don’t remove pixels… but replace them with pixels from another image?”