Unknownerror-404

Posted on October 2025

Here's what I learnt after spending 6 months studying image augmentations.

My last post explored the most common pitfalls during early classifier training, specifically around transferring weights from pretrained models like MobileNet, EfficientNet, etc.

Since then, I've spent over 6 months working consistently with the base augmentations: AutoAugment, RandAugment, TrivialAugment, Cutout, CutMix, and Mixup. Though the learnings from those 6 months are also discussion-worthy, I'll leave those for some other blog...

The contents of this blog consist of:

  • Historical significance of AutoAugment.

  • Working behind AutoAugment.

  • Search Space in AutoAugment.

  • What is a Policy?

  • What are Probabilities in AutoAugment?

  • What is a magnitude in AutoAugment?

  • The RNN controller.

  • Proximal Policy Optimization.

  • Practical Experience.

Starting from this blog, I'll be breaking down augmentation techniques, how to implement them, and how they work at a basic level.
Now, a word-for-word implementation is obviously not possible here, so we'll stick to explaining what the implementation looks like from a theoretical perspective.

Auto Augment:

Before we get into the actual theory, let me emphasize why Auto Augment is of such importance.

Henceforth, AutoAugment will be referred to as AA.

AA was the first method to tackle data limitations by creating dataset-specific augmentations. So, for any given dataset, AA defines a curated list of policies discovered through automated search.

For this, AA utilizes a search space, which consists of three key properties:

  • Transformations/Operations:

AA expresses these augmentations in the form of policies. Each policy consists of five sub-policies, each with two different image transformations.

Image transformations are basic operations such as zooms, inversions, rotations, etc., which help improve classification by providing variations of the base image.

The search space consists of a varying number of transformations. If we consider the torchvision library in Python, it allows us to import AA with the policy learned on ImageNet as:

from torchvision.transforms import AutoAugment, AutoAugmentPolicy

autoaug = AutoAugment(policy=AutoAugmentPolicy.IMAGENET)

For this specific policy, the search space draws from 16 image transformations, and the ImageNet policy shipped with torchvision consists of 25 sub-policies.
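To see what this looks like in a real pipeline, here's a minimal sketch (assuming a recent torchvision version and a hypothetical image path) of AA being used like any other transform:

from PIL import Image
import torchvision.transforms as T
from torchvision.transforms import AutoAugment, AutoAugmentPolicy

# AutoAugment slots into a normal transform pipeline; it expects a PIL image
# (or a uint8 tensor) and applies one randomly chosen sub-policy per call.
train_transform = T.Compose([
    AutoAugment(policy=AutoAugmentPolicy.IMAGENET),
    T.ToTensor(),
])

img = Image.open("example.jpg").convert("RGB")   # hypothetical image path
augmented = train_transform(img)                 # tensor ready for the model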

  • Probability:

Probability specifies how likely a given transformation is to be applied to an image.

For the search space, each probability value is discretized uniformly over the range 0.0 to 1.0, providing 11 values.

  • Magnitude:

This defines how strongly the effect is applied.

Similar to probability, magnitude also consists of discretized values (10 levels in the original paper).

So, for a clockwise rotation with a maximum of 90°, a magnitude of 0.5 and a probability of 1.0 leads to a 45° clockwise rotation being applied every time.

These properties combine to form a policy to be applied to specific images.
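To make the (operation, probability, magnitude) triple concrete, here's a minimal sketch of how one sub-policy might be applied to an image. The specific operations, their maximum ranges, and the helper function are my own illustrative assumptions, not the paper's exact implementation:

import random
import torchvision.transforms.functional as F
from PIL import Image

# One hypothetical sub-policy: two (operation, probability, magnitude) triples.
# Magnitude is expressed as a fraction of an assumed per-operation maximum.
sub_policy = [
    ("rotate",   0.8, 0.5),   # 80% chance of rotating by 0.5 * 30 = 15 degrees
    ("contrast", 0.6, 0.7),   # 60% chance of scaling contrast by a factor of 1.7
]

def apply_sub_policy(img: Image.Image, sub_policy) -> Image.Image:
    for op, prob, mag in sub_policy:
        if random.random() > prob:
            continue                                  # skip with probability (1 - prob)
        if op == "rotate":
            img = F.rotate(img, angle=mag * 30.0)     # assumed max rotation: 30 degrees
        elif op == "contrast":
            img = F.adjust_contrast(img, 1.0 + mag)   # 1.0 means unchanged
    return img

augmented = apply_sub_policy(Image.open("example.jpg").convert("RGB"), sub_policy)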

Now that we have a bearing on policy formation, let's move on to answering the next query:
How are the correct policies applied to the correct image?

While covering policy formation, there was hardly any mention of how AA chooses which policy to apply to which image. To answer that, we need to analyse the architecture of AA.

The base paper for AutoAugment utilizes a controller implemented as an RNN. Everything discussed earlier, from search space formation to assigning policies, is handled by this controller.

When the controller generates a new augmentation policy, it trains a smaller “child” or “toy” model on a subset of the dataset to evaluate that policy’s effectiveness. The validation accuracy of this model serves as a reward signal. Using reinforcement learning, specifically Proximal Policy Optimization (PPO), the controller updates its parameters to favor policies that yield better performance.
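Here's a toy sketch of that search loop. The RNN controller and the PPO update are replaced by random sampling and a stub reward purely to show the shape of the loop; none of this is the paper's actual code:

import random

OPS = ["Rotate", "ShearX", "Equalize", "Solarize", "Posterize", "Contrast"]
PROBS = [i / 10 for i in range(11)]   # 11 discretized probability values
MAGS = list(range(10))                # 10 discretized magnitude levels

def sample_policy():
    # Sample 5 sub-policies, each made of 2 (operation, probability, magnitude)
    # triples. In AA these choices come from the RNN controller, not random.choice.
    return [[(random.choice(OPS), random.choice(PROBS), random.choice(MAGS))
             for _ in range(2)]
            for _ in range(5)]

def evaluate_child_model(policy):
    # Stand-in for training a small "child" model with this policy on a data
    # subset and returning its validation accuracy (the reward signal).
    return random.random()

best_policy, best_reward = None, -1.0
for step in range(100):                   # the real search runs far more trials
    policy = sample_policy()
    reward = evaluate_child_model(policy)
    # In AA this reward drives a PPO update of the controller's parameters;
    # here we simply keep track of the best policy found so far.
    if reward > best_reward:
        best_policy, best_reward = policy, reward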

  • Proximal Policy Optimization

Proximal Policy Optimization (PPO) is a reinforcement learning algorithm used for policy updates. PPO works by limiting the amount of change between the new policy and the old policy.

This ensures that the new policy doesn't deviate too much from the old one while also improving on the old one.
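Concretely, this is captured by PPO's standard clipped surrogate objective (from the original PPO paper, not anything AA-specific):

L^{\mathrm{CLIP}}(\theta) = \mathbb{E}_t\!\left[\min\!\left(r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\!\left(r_t(\theta),\,1-\epsilon,\,1+\epsilon\right)\hat{A}_t\right)\right],
\quad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}

Here r_t(θ) is the probability ratio between the new and old policies, Â_t is the advantage estimate, and ε is the clipping range; the clip term is what stops an update from straying too far from the previous policy.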

This clipping was necessary to reduce unstable jumps between updates and leads to a more stable training curve.

PPO builds upon its predecessor, Trust Region Policy Optimization (TRPO), significantly reducing the computational overhead and providing a smoother convergence curve.

This allows AA to improve accuracy on the complete dataset while only ever evaluating candidate policies on a smaller fraction of it, via the RNN controller.

What's the tradeoff?
Like in classical computer science, say data structures, for every improvement we lose a crucial property or characteristic. In the case of AA, that loss takes the form of massive computational overhead. For instance, searching over large quantities of images requires access to multiple GPUs. Furthermore, the search time also increases with the quantity of images being trained on.

These factors limited the usability of AA at large scale.

What I've gotten from my experience:
From my experience, when working with small to medium quantities (ranging from 400-600 images), the time taken to train classifiers with AA is comparable to that of RandAugment and TrivialAugment. The memory usage, however, fluctuated more, as it depended on whatever other applications were using memory at the time, so I was unable to obtain a clear trend among them.

Now, as we've mentioned RandAugment and TrivialAugment, the following blogs will explain them in similar detail and touch on how they improve on AA.

If you've had experience with them and want to add some information, feel free to comment below.
Until next time!
