DEV Community

Thuwarakesh Murallie


A Compact Python Library for Creating Massive Augmented Datasets

In most data science applications, collecting and labeling data is a costly and time-consuming process.

Yet machine learning models do not generalize well without enough data. With too few examples, a model tends to memorize the training set instead of learning patterns that carry over to unseen data. This is called overfitting.

Data augmentation is a popular technique to reduce overfitting. We create copies of existing data points with slight variations while keeping their labels unchanged. To the learning algorithm, each variation looks like a new data point.
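To make the idea concrete, here is a minimal sketch in plain Python. It is not the library discussed in this article; the tiny grayscale "image" is a nested list, and the function names (`horizontal_flip`, `shift_brightness`, `augment`) are hypothetical helpers chosen for illustration.

```python
import random


def horizontal_flip(image):
    """Mirror each row of a 2D grayscale image (a list of pixel rows)."""
    return [row[::-1] for row in image]


def shift_brightness(image, delta):
    """Add delta to every pixel, clamping the result to the 0-255 range."""
    return [[max(0, min(255, px + delta)) for px in row] for row in image]


def augment(image, rng):
    """Return one randomly varied copy; the label of the image stays the same."""
    out = image
    if rng.random() < 0.5:  # flip roughly half of the time
        out = horizontal_flip(out)
    return shift_brightness(out, rng.randint(-20, 20))


# A 2x3 toy image (assumed data, for illustration only).
image = [[0, 50, 100],
         [150, 200, 250]]

rng = random.Random(0)  # seeded so the run is repeatable
copies = [augment(image, rng) for _ in range(4)]
```

Each entry in `copies` has the same shape as the original but slightly different pixel values, which is what lets a model treat it as an additional training example.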

To augment images, we can use any image processing tool. But there are dedicated libraries that do this task more efficiently.

The tool we discuss in this article is a feature-rich Python library for data augmentation. With it, we can build an augmentation pipeline to feed our ML model.

This means we don't have to transform the training images ahead of time and save the copies to disk. The pipeline applies the transformations on the fly, every time an image is loaded for training.
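The on-the-fly behavior can be sketched roughly as follows. This is an illustration of the pattern only, not the API of the library discussed here; `Pipeline`, `AugmentedDataset`, `hflip`, and `vflip` are hypothetical names, and images are again nested lists.

```python
import random


class Pipeline:
    """Hypothetical pipeline: applies each transform with its own probability."""

    def __init__(self, steps, seed=None):
        self.steps = steps  # list of (transform_function, probability) pairs
        self.rng = random.Random(seed)

    def __call__(self, image):
        for transform, probability in self.steps:
            if self.rng.random() < probability:
                image = transform(image)
        return image


class AugmentedDataset:
    """Stores only the originals; augmentation happens at access time."""

    def __init__(self, images, labels, pipeline):
        self.images = images
        self.labels = labels
        self.pipeline = pipeline

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        # A fresh random variation is produced on every access.
        return self.pipeline(self.images[index]), self.labels[index]


def hflip(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def vflip(image):
    """Mirror the image top to bottom."""
    return image[::-1]


# Toy data (assumed, for illustration only).
images = [[[1, 2], [3, 4]]]
labels = ["cat"]
dataset = AugmentedDataset(
    images, labels, Pipeline([(hflip, 0.5), (vflip, 0.5)], seed=42)
)

for epoch in range(3):
    sample, label = dataset[0]  # may differ from epoch to epoch
```

Because the variations are generated when a sample is requested, the dataset on disk stays the same size no matter how many epochs we train for.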

This Little Python Package Creates Huge Augmented Datasets

If you find this post interesting, please leave some claps on Medium as well because it helps this article reach more people.
