A brief introduction to Image Augmentation

#datascience #machinelearning

Computer Vision is a modern technology used to unlock cell phones, guide self-driving cars, help diagnose cancerous cells, and much more. In simpler words, Computer Vision means giving eyesight to computers. The ultimate goal of engineers is to build systems that can interpret visual data from images, videos, or the real world and perform the visual tasks that humans can.

Computer Vision is an interdisciplinary scientific field that draws on image processing, machine learning, deep learning, neurobiology, and signal processing.

Deep learning has given Computer Vision wings: many of the state-of-the-art models for Computer Vision tasks are now deep learning models.

The major tasks associated with Computer Vision are object classification, semantic segmentation, instance segmentation, object detection, and pose estimation.

The Problem: Scarcity of Data

All these tasks can be performed with machine learning or deep learning only if a sufficient amount of data is available. Computer Vision is applied in many fields, and in some of them it is very difficult to get data for legal or practical reasons. Medical imaging is one such field. Deep neural networks require a lot of training data to obtain good results and avoid overfitting.

To make a training dataset, you need to obtain images and then label them. For an image classification task, you need to assign the correct class label to each image. For an object detection task, you need to draw bounding boxes around objects. For a semantic segmentation task, you need to assign the correct class to each pixel of the input image. This process requires manual labor, and labeling the training data can be very costly. For example, correctly labeling medical images requires expensive domain experts.

The Solution: Image Augmentation

One solution to the data scarcity problem is image augmentation. Image augmentation is the process of creating new training examples from existing ones. To make a new sample, you slightly change the original image. For instance, you could make the image a little brighter or change its contrast; you could crop a piece from the original image or resize it; you could mirror the original image or rotate it.

[Figure: a simple example demonstrating different types of image augmentation.]
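
The transformations described above can be sketched in a few lines of plain Python. This is only an illustration, assuming a tiny grayscale image stored as a list of rows; the helper names (`mirror`, `rotate_90`, `brighten`, `crop`) are hypothetical and do not come from any library.

```python
# A tiny grayscale "image": rows of pixel intensities in the range 0-255.
# (Illustrative data, not a real photo.)
image = [
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120],
]


def mirror(img):
    """Flip the image left to right."""
    return [row[::-1] for row in img]


def rotate_90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def brighten(img, delta):
    """Add delta to every pixel, clamping to the valid 0-255 range."""
    return [[max(0, min(255, p + delta)) for p in row] for row in img]


def crop(img, top, left, height, width):
    """Cut a height x width window whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in img[top:top + height]]


# One original image yields several new training samples.
augmented_samples = [
    mirror(image),
    rotate_90(image),
    brighten(image, 30),
    crop(image, 1, 1, 2, 2),
]
```

Each function returns a new image and leaves the original untouched, so a single labeled image can produce several training samples that share its class label.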

Image augmentation can substantially improve the performance of deep learning models, and it also helps reduce overfitting. Well-known deep learning models such as AlexNet, Inception, ResNet, and EfficientNet all use augmentation.

Albumentations is a library built for image augmentation. It implements a rich variety of image transform operations that are optimized for performance. Albumentations works well with data from varied domains: photos, medical images, satellite imagery, manufacturing and industrial applications, and Generative Adversarial Networks. Although the library is part of the PyTorch ecosystem, it can work with various deep learning frameworks such as PyTorch and Keras. Best of all, it is free and open source.
