DEV Community

Thuwarakesh Murallie


A Compact Python Library for Creating Massive Augmented Datasets

In most data science applications, collecting and labeling data is a costly and time-consuming process.

Yet without enough data, machine learning models fail to generalize well. They memorize the training examples instead, a situation called overfitting.

Data augmentation is a popular technique for overcoming this. We create copies of existing data points with slight variations, and the algorithm treats them as new data.
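The idea of "copies with slight variations" can be sketched with plain NumPy. This is only a minimal sketch. It assumes images are `H x W x C` `uint8` arrays. The helper name `make_variants` is hypothetical, and so are the chosen transforms (a horizontal flip, a brightness shift, and Gaussian noise) and their strengths. The code does not come from the library the article discusses.

```python
import numpy as np


def make_variants(image: np.ndarray, rng: np.random.Generator) -> list:
    """Return slightly varied copies of an H x W x C uint8 image.

    Hypothetical helper: the transforms and their strengths are illustrative.
    """
    variants = []
    # Mirror left-to-right; for most photos the label stays valid.
    variants.append(image[:, ::-1, :].copy())
    # Shift brightness by a small random offset, clipped to the 0-255 range.
    offset = int(rng.integers(-30, 31))
    variants.append(np.clip(image.astype(np.int16) + offset, 0, 255).astype(np.uint8))
    # Add mild Gaussian pixel noise (standard deviation of 8 intensity levels).
    noise = rng.normal(0.0, 8.0, size=image.shape)
    variants.append(np.clip(image + noise, 0, 255).astype(np.uint8))
    return variants


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
variants = make_variants(image, rng)
print(len(variants), variants[0].shape)  # 3 (32, 32, 3)
```

Every variant keeps the original shape and dtype, so the model can consume each one exactly as it would a real sample.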

To augment images, we can use any image processing tool. But dedicated libraries do this task more efficiently.

The tool we discuss in this article is a feature-rich Python library for data augmentation. With it, we can build an augmentation pipeline to feed our ML model.

This means we don't have to transform the training images and save the copies in advance. The pipeline applies the transformations on the fly, every time an image is used for training.
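To show what "on the fly" means, here is a minimal NumPy sketch of such a pipeline. It is not the API of the library discussed in the article. `Compose`, `random_flip`, `random_brightness`, and `augmented_batches` are all hypothetical names chosen for illustration. The sketch chains the transforms, draws fresh random parameters each time an image is requested, and never writes augmented copies to disk.

```python
import numpy as np


def random_flip(image, rng, p=0.5):
    # Hypothetical transform: mirror horizontally with probability p.
    return image[:, ::-1] if rng.random() < p else image


def random_brightness(image, rng, max_delta=30):
    # Hypothetical transform: add a random offset in [-max_delta, max_delta].
    delta = int(rng.integers(-max_delta, max_delta + 1))
    return np.clip(image.astype(np.int16) + delta, 0, 255).astype(np.uint8)


class Compose:
    """Apply a list of transforms in order (hypothetical helper, not a library API)."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, image, rng):
        for transform in self.transforms:
            image = transform(image, rng)
        return image


def augmented_batches(images, labels, pipeline, batch_size, rng):
    """Yield shuffled batches where every image is freshly augmented.

    The original arrays are never modified and nothing is saved to disk.
    """
    order = rng.permutation(len(images))
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        batch = np.stack([pipeline(images[i], rng) for i in idx])
        yield batch, labels[idx]


rng = np.random.default_rng(42)
images = rng.integers(0, 256, size=(10, 32, 32, 3), dtype=np.uint8)
labels = np.arange(10)
pipeline = Compose([random_flip, random_brightness])

for epoch in range(2):
    for batch, batch_labels in augmented_batches(images, labels, pipeline, 4, rng):
        # A real training loop would feed (batch, batch_labels) to the model here.
        print(epoch, batch.shape, batch_labels)
```

Because the random parameters are drawn inside the loop, each epoch shows the model a different variant of every image. The stored dataset stays the same size.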

This Little Python Package Creates Huge Augmented Datasets

If you find this post interesting, please leave some claps on Medium as well because it helps this article reach more people.
