For those unaware, RandAugment is the successor to AutoAugment (AA). Introduced in 2020, it improves upon AA's approach while carrying some shortcomings of its own.
As this is a follow-up to a previous blog, I recommend checking out the earlier post linked here: AutoAugment.
To recap the problem with AA: it required a large amount of compute and data for classifier training. This was predominantly addressed by using pretrained models and their preset weights for the internal model; a deeper look into pretrained models can be found at: Pretrained models and their fallacies.
Contents of the blog:
- What is RandAugment (RA)?
- How does RA improve upon AA?
- Why doesn't RA scale as well to larger datasets?
What is RandAugment (RA)?
RandAugment (RA) is a direct improvement upon AutoAugment (AA).
It introduces several enhancements: improved accuracy, reduced computational cost, and a simpler implementation.
Crucially, RA enables effective augmentation in low-compute environments, making it one of the most lightweight and practical automated data-augmentation strategies.
How does RA improve upon AA?
AutoAugment (AA) operates by constructing a search space of augmentation policies.
Each policy consists of five sub-policies, and each sub-policy applies two transformations (e.g., rotate, shear, color adjust).
The method uses an RNN controller (trained with the PPO reinforcement-learning algorithm) to explore this space and learn which augmentations work best.
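To make that search space concrete, here is a rough sketch of what a single learned policy looks like as plain data. The operation names follow the kinds of transforms AA uses, but the specific pairings, probabilities, and magnitudes below are made up for illustration, not taken from the paper's learned policies.

```python
# Rough shape of one AutoAugment policy (values invented for
# illustration). A policy holds 5 sub-policies; each sub-policy is
# two operations, each paired with an application probability and a
# discretized magnitude. The controller searches over this space.
policy = [
    [("ShearX",     0.9, 4), ("Invert",       0.2, 3)],
    [("Rotate",     0.7, 2), ("Color",        0.9, 9)],
    [("Equalize",   0.6, 5), ("Solarize",     0.1, 3)],
    [("Posterize",  0.4, 7), ("AutoContrast", 0.5, 8)],
    [("TranslateY", 0.3, 9), ("Sharpness",    0.9, 1)],
]
# At training time, one sub-policy is sampled per image and its two
# operations are applied in order, each firing with its probability.
```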
However, this search process is computationally very expensive.
According to the original AutoAugment paper:
- The search for CIFAR-10 took roughly 5,000 GPU hours on NVIDIA V100s.
- The search for ImageNet required around 15,000 TPU hours.
Such massive training times make AA impractical for many real-world or large-scale applications.
RandAugment removes this search phase entirely. Instead of learning policies, RA:
- Discards the controller and search-space construction.
- Randomly selects transformations from a fixed set.
- Uses two simple, tunable hyperparameters:
  - N – the number of transformations to apply.
  - M – the magnitude (strength) of each transformation.
This drastically reduces computation while achieving comparable accuracy to AutoAugment on many datasets.
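Because that is the whole algorithm, a toy version fits in a few lines. The sketch below uses a small made-up transform set and magnitude scaling (the paper uses a fixed list of roughly 14 ops on a shared scale), so treat it as an illustration of the N/M idea rather than the paper's exact implementation:

```python
import random
from PIL import Image, ImageEnhance, ImageOps

# Toy transform set for illustration; each op maps (image, magnitude)
# to a new image. Magnitude m is an integer on a fixed 0-30 scale here.
OPS = {
    "identity":  lambda img, m: img,
    "rotate":    lambda img, m: img.rotate(m),  # degrees
    "color":     lambda img, m: ImageEnhance.Color(img).enhance(1 + m / 30),
    "contrast":  lambda img, m: ImageEnhance.Contrast(img).enhance(1 + m / 30),
    "posterize": lambda img, m: ImageOps.posterize(img, max(1, 8 - m // 4)),
    "solarize":  lambda img, m: ImageOps.solarize(img, 256 - m * 8),
}

def rand_augment(img: Image.Image, n: int = 2, m: int = 9) -> Image.Image:
    """Apply n ops, each at magnitude m, sampled uniformly at random
    from the fixed set -- no learned policy, no search phase."""
    for name in random.choices(list(OPS), k=n):
        img = OPS[name](img, m)
    return img
```

If you'd rather not hand-roll the ops, recent versions of torchvision ship a ready-made implementation, `torchvision.transforms.RandAugment(num_ops=2, magnitude=9)`, exposing the same two knobs.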
Why doesn’t RA scale as well to larger datasets?
While RA eliminates the heavy search cost, it still requires applying random transformations dynamically during training.
For very large datasets (e.g., ImageNet-scale or larger), this constant per-image randomization, especially when tuning N and M, can still impose noticeable computational overhead.
In other words:
- AA is expensive because of policy search.
- RA is much cheaper, but still adds cost due to on-the-fly random augmentation for each sample (sketched below).
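To make the on-the-fly point concrete, here is a hypothetical PyTorch-style dataset (the class name and arguments are mine, and it reuses the `rand_augment` sketch above) showing where that per-sample cost lives:

```python
from torch.utils.data import Dataset

class RandAugmentDataset(Dataset):
    """Hedged sketch: wraps raw images and applies RA per sample."""

    def __init__(self, images, labels, n=2, m=9):
        self.images, self.labels = images, labels
        self.n, self.m = n, m

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        # These random ops run for every image, every epoch: the cost
        # is small per sample but never amortizes at ImageNet scale.
        img = rand_augment(self.images[idx], self.n, self.m)
        return img, self.labels[idx]
```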
While RA is significantly more efficient than AA, it may still be suboptimal for extremely large-scale or resource-constrained training setups compared to even simpler methods like TrivialAugment or AugMix.
A continuation covering TrivialAugment will be available within 4 days.