Jakub Czakon

Posted on May 19, 2020

Image Segmentation: Tips and Tricks from 39 Kaggle Competitions

#python #machinelearning #datascience

This article was originally posted by Derrick Mwiti on the Neptune blog where you can find more in-depth articles for machine learning practitioners.

Imagine if you could get all the tips and tricks you need to hammer a Kaggle competition. I have gone over 39 Kaggle competitions including

Data Science Bowl 2017 – $1,000,000
Intel & MobileODT Cervical Cancer Screening – $100,000
2018 Data Science Bowl – $100,000
Airbus Ship Detection Challenge – $60,000
Planet: Understanding the Amazon from Space – $60,000
APTOS 2019 Blindness Detection – $50,000
Human Protein Atlas Image Classification – $37,000
SIIM-ACR Pneumothorax Segmentation – $30,000
Inclusive Images Challenge – $25,000

– and extracted that knowledge for you. Dig in.

External Data
Preprocessing
Data Augmentations
Modeling
Hardware Setups
Loss Functions
Training Tips
Evaluation and Cross-validation
Ensembling Methods
Post Processing

External Data

Use the LUng Nodule Analysis (LUNA) Grand Challenge data because it contains detailed annotations from radiologists
Use the LIDC-IDRI data because it has radiologist descriptions of each tumor that they found
Use Flickr CC, Wikimedia Commons datasets
Use Human Protein Atlas Dataset
Use IDRiD dataset

Data Exploration and Gaining insights

Clustering of 3D segmentation with the 0.5 threshold
Identify if there is a substantial difference in train/test label distributions
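
One rough way to check the train/test label distribution tip above is to compare class frequencies directly. The sketch below is only an illustration: the labels are made up, the test labels stand in for whatever estimate you have (for example, model predictions on the test set), and total variation distance is just one of several reasonable choices of metric.

```python
from collections import Counter


def label_distribution(labels):
    """Return {label: relative frequency} for a sequence of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def total_variation_distance(p, q):
    """Half the L1 distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Made-up labels for illustration only.
train_labels = ["ship"] * 20 + ["no_ship"] * 80
test_labels = ["ship"] * 50 + ["no_ship"] * 50  # hypothetical estimate of the test labels

train_dist = label_distribution(train_labels)
test_dist = label_distribution(test_labels)
shift = total_variation_distance(train_dist, test_dist)
print(f"total variation distance: {shift:.2f}")
```

A value close to 0 suggests similar distributions; a large value is a hint that sampling, re-weighting, or threshold tuning may be worth a look.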

Preprocessing

Perform blob detection using the Difference of Gaussian (DoG) method, with the implementation available in the skimage package
Use of patch-based inputs for training in order to reduce the time of training
Use cudf for loading data instead of Pandas because it has a faster reader
Ensure that all the images have the same orientation
Apply contrast limited adaptive histogram equalization
Use OpenCV for all general image preprocessing
Employ automatic active learning and adding manual annotations
Resize all images to the same resolution in order to apply the same model to scans of different thicknesses
Convert scan images into normalized 3D numpy arrays
Apply single Image Haze Removal using Dark Channel Prior
Convert all data to Hounsfield units
Find duplicate images using pair-wise correlation on RGBY
Make labels more balanced by developing a sampler
Apply pseudo labeling to test data in order to improve score
Scale down images/masks to 320×480
Histogram equalization (CLAHE) with kernel size 32×32
Convert DCM to PNG
Calculate the md5 hash for each image to detect duplicate images
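
The Hounsfield unit and normalization tips above can be sketched in a few lines of NumPy. This is a minimal illustration, not any competitor's actual pipeline: the slope and intercept defaults and the [-1000, 400] HU window are assumptions, and in a real workflow the rescale values would come from each scan's DICOM headers (RescaleSlope and RescaleIntercept).

```python
import numpy as np


def to_hounsfield(raw, slope=1.0, intercept=-1024.0):
    """Map stored pixel values to Hounsfield units: HU = raw * slope + intercept."""
    return raw.astype(np.float32) * slope + intercept


def normalize_hu(hu, hu_min=-1000.0, hu_max=400.0):
    """Clip to an (assumed) HU window and rescale to [0, 1]."""
    clipped = np.clip(hu, hu_min, hu_max)
    return (clipped - hu_min) / (hu_max - hu_min)


# Fake 2-slice scan with made-up values, shaped (slices, height, width).
raw_scan = np.array(
    [[[0, 1024], [2048, 24]], [[1424, 512], [3000, 0]]], dtype=np.int16
)
hu = to_hounsfield(raw_scan, slope=1.0, intercept=-1024.0)
volume = normalize_hu(hu)  # normalized 3D array ready to feed a model
```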

Data Augmentations

Use albumentations package for augmentations
Apply random rotation by 90 degrees
Use horizontal, vertical or both flips
Attempt heavy geometric transformations: Elastic Transform, PerspectiveTransform, Piecewise Affine transforms, pincushion distortion
Apply random HSV
Use of loss-less augmentation for generalization to prevent loss of useful image information
Apply channel shuffling
Do data augmentation based on class frequency
Apply gaussian noise
Use lossless permutations of 3D images for data augmentation
Rotate by a random angle from 0 to 45 degrees
Scale by a random factor from 0.8 to 1.2
Brightness changing
Randomly change hue, saturation and value
Apply D4 augmentations
Contrast limited adaptive histogram equalization
Use the AutoAugment augmentation strategy
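
To make the D4 and lossless augmentation tips above concrete, here is a small NumPy sketch that generates the eight symmetries of a square image (four rotations by 90 degrees, each with and without a flip) and applies the same transform to the mask. The function name and the toy data are illustrative; in practice a library such as albumentations would usually handle this.

```python
import numpy as np


def d4_transforms(image, mask):
    """Yield the 8 (image, mask) pairs of the D4 group: 4 rotations x optional flip."""
    for flip in (False, True):
        img = np.flip(image, axis=1) if flip else image
        msk = np.flip(mask, axis=1) if flip else mask
        for k in range(4):
            # Rotate in the (H, W) plane; any trailing channel axis is left alone.
            yield np.rot90(img, k, axes=(0, 1)), np.rot90(msk, k, axes=(0, 1))


# Toy 4x4 image and a mask derived from it.
image = np.arange(16, dtype=np.float32).reshape(4, 4)
mask = (image > 7).astype(np.uint8)
augmented = list(d4_transforms(image, mask))
```

Because these transforms only permute pixels, no interpolation is involved and no image information is lost. Note that the 90 degree rotations change the array shape for non-square inputs.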

Modeling

Architectures

Use of a U-net based architecture. Adopted the concepts and applied them to 3D input tensors
Employing automatic active learning and adding manual annotations
The inception-ResNet v2 architecture for training features with different receptive fields
Siamese networks with adversarial training
ResNet50, Xception, Inception ResNet v2 x 5 with Dense (FC) layer as the final layer
Use of a global max-pooling layer which returns a fixed-length output no matter the input size
Use of stacked dilated convolutions
VoxelNet
Replace plus sign in LinkNet skip connections with concat and conv1x1
Generalized mean pooling
Keras NASNetLarge to train the model from scratch using 224x224x3
Use of the 3D convnet to slide over the images
Imagenet-pre-trained ResNet152 as the feature extractor
Replace the final fully-connected layers of ResNet by 3 fully connected layers with dropout
Use ConvTranspose in the decoder
Applying the VGG baseline architecture
Implementing the C3D network with adjusted receptive fields and a 64 unit bottleneck layer on the end of the network
Use of UNet type architectures with pre-trained weights to improve convergence and performance of binary segmentation on 8-bit RGB input images
LinkNet since it’s fast and memory efficient
MASKRCNN
BN-Inception
Fast Point R-CNN
Seresnext
UNet and Deeplabv3
Faster RCNN
SENet154
ResNet152
NASNet-A-Large
EfficientNetB4
ResNet101
GAPNet
PNASNet-5-Large
Densenet121
AC-GAN
XceptionNet (96), XceptionNet (299), Inception v3 (139), InceptionResNet v2 (299), DenseNet121 (224)
AlbuNet (resnet34) from ternausnets
SpaceNet
Resnet50 from selim_sef SpaceNet 4
SCSEUnet (seresnext50) from selim_sef SpaceNet 4
A custom Unet and Linknet architecture
FPNetResNet50 (5 folds)
FPNetResNet101 (5 folds)
FPNetResNet101 (7 folds with different seeds)
PANetDilatedResNet34 (4 folds)
PANetResNet50 (4 folds)
EMANetResNet101 (2 folds)
RetinaNet
Deformable R-FCN
Deformable Relation Networks
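
Two of the ideas in the list above, global pooling that returns a fixed-length output regardless of input size and generalized mean (GeM) pooling, can be illustrated together. The NumPy sketch below assumes a (channels, height, width) feature map and a default exponent of p = 3; both are illustrative assumptions, and in a real network this would be a layer in your deep learning framework.

```python
import numpy as np


def gem_pool(features, p=3.0, eps=1e-6):
    """Generalized mean pooling over the spatial axes of a (C, H, W) feature map."""
    clipped = np.clip(features, eps, None)  # GeM assumes non-negative activations
    return np.mean(clipped ** p, axis=(1, 2)) ** (1.0 / p)


rng = np.random.default_rng(0)
small = rng.random((8, 5, 7))    # 8 channels, 5x7 spatial grid
large = rng.random((8, 16, 12))  # same channels, different spatial size

pooled_small = gem_pool(small)
pooled_large = gem_pool(large)
print(pooled_small.shape, pooled_large.shape)
```

With p = 1 this reduces to average pooling, and as p grows the result moves toward max pooling, so p controls how strongly the largest activations dominate.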

Hardware Setups

Loss Functions

Dice Coefficient because it works well with imbalanced data
Weighted boundary loss whose aim is to reduce the distance between the predicted segmentation and the ground truth
MultiLabelSoftMarginLoss that creates a criterion that optimizes a multi-label one-versus-all loss based on max-entropy, between input and target
Balanced cross entropy (BCE) with logit loss that involves weighting the positive and negative examples by a certain coefficient
Lovasz that performs direct optimization of the mean intersection-over-union loss in neural networks based on the convex Lovasz extension of sub-modular losses
FocalLoss + Lovasz obtained by summing the Focal and Lovasz losses
Arc margin loss that incorporates margin in order to maximise face class separability
Npairs loss that computes the npairs loss between y_true and y_pred.
A combination of BCE and Dice loss functions
LSEP – a pairwise ranking that is smooth everywhere and thus is easier to optimize
Center loss that simultaneously learns a center for deep features of each class and penalizes the distances between the deep features and their corresponding class centers
Ring Loss that augments standard loss functions such as Softmax
Hard triplet loss that trains a network to embed features of the same class close together while at the same time maximizing the embedding distance between different classes
1 + BCE – Dice that involves subtracting the Dice coefficient from the BCE loss and then adding 1
Binary cross-entropy – log(dice) that is the binary cross-entropy minus the log of the Dice coefficient
Combinations of BCE, dice and focal
Lovasz Loss that performs direct optimization of the mean intersection-over-union loss
BCE + DICE – the Dice loss is obtained by calculating a smooth Dice coefficient function
Focal loss with Gamma 2 that is an improvement to the standard cross-entropy criterion
BCE + DICE + Focal – this is basically a summation of the three loss functions
Active Contour Loss that incorporates the area and size information and integrates the information in a dense deep learning model
1024 * BCE(results, masks) + BCE(cls, cls_target)
Focal + kappa – Kappa is a loss function for multi-class classification of ordinal data in deep learning. In this case we sum it and the focal loss
ArcFaceLoss — Additive Angular Margin Loss for Deep Face Recognition
soft Dice trained on positives only – Soft Dice uses predicted probabilities
2.7 * BCE(pred_mask, gt_mask) + 0.9 * DICE(pred_mask, gt_mask) + 0.1 * BCE(pred_empty, gt_empty) which is a custom loss used by the Kaggler
nn.SmoothL1Loss() that creates a criterion that uses a squared term if the absolute element-wise error falls below 1 and an L1 term otherwise
Use of the Mean Squared Error objective function in scenarios where it seems to work better than binary-cross entropy objective function.
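
Several entries above combine BCE with Dice. The NumPy sketch below shows one common way to write that combination, assuming the Dice loss is defined as 1 minus a smoothed soft Dice coefficient; the weights and the smoothing constant are illustrative assumptions, and a real training loop would implement this in a framework with automatic differentiation rather than in NumPy.

```python
import numpy as np


def bce(probs, targets, eps=1e-7):
    """Mean binary cross-entropy on predicted probabilities."""
    probs = np.clip(probs, eps, 1.0 - eps)
    return float(-np.mean(targets * np.log(probs) + (1 - targets) * np.log(1 - probs)))


def soft_dice(probs, targets, smooth=1.0):
    """Soft Dice coefficient: uses probabilities instead of thresholded masks."""
    intersection = np.sum(probs * targets)
    return float((2.0 * intersection + smooth) / (np.sum(probs) + np.sum(targets) + smooth))


def bce_dice_loss(probs, targets, bce_weight=1.0, dice_weight=1.0):
    """Weighted sum of BCE and Dice loss, where Dice loss = 1 - soft Dice coefficient."""
    return bce_weight * bce(probs, targets) + dice_weight * (1.0 - soft_dice(probs, targets))


# Toy 2x2 mask and two made-up predictions.
target = np.array([[0, 1], [1, 1]], dtype=np.float64)
good = np.array([[0.1, 0.9], [0.8, 0.9]])
bad = 1.0 - good

print("good prediction:", round(bce_dice_loss(good, target), 4))
print("bad prediction: ", round(bce_dice_loss(bad, target), 4))
```

The `bce_weight` and `dice_weight` arguments are one way to express the weighted variants in the list, though the exact definitions used in each competition solution may differ.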

Training tips

Try different learning rates
Try different batch sizes
Use SGD with momentum with manual rate scheduling
Too much augmentation will reduce the accuracy
Train on image crops and predict on full images
Use of Keras’s ReduceLROnPlateau() to reduce the learning rate
Train without augmentation until plateau then apply soft and hard augmentation to some epochs
Freeze all layers except the last one and use 1000 images from Stage1 for tuning
Make labels more balanced by developing a sampler
Use of class aware sampling
Use dropout and augmentation while tuning the last layer
Pseudo Labeling to improve score
Use Adam reducing LR on plateau with patience 2–4
Use Cyclic LR with SGD
Reduce the learning rate by a factor of two if validation loss does not improve for two consecutive epochs
Repeat the worst batch out of 10 batches
Train with default UNET
Overlap tiles so that each edge pixel is covered twice
Hyperparameter tuning: learning rate on training, non-maximum suppression and score threshold on inference
Remove bounding boxes with low confidence scores
Train different convolutional neural networks then build an ensemble
Stop training when the F1 score is decreasing
Differential learning rate with gradual reducing
Train ANNs [in a stacking way using 5 folds](https://www.kaggle.com/c/statoil-iceberg-classifier-challenge/discussion/48207) and 30 repeats
Keep track of your experiments using Neptune.
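
The tip above about halving the learning rate when validation loss stalls for two consecutive epochs can be sketched as a tiny framework-free scheduler. The class name, the minimum learning rate, and the loss values are hypothetical; in practice a built-in callback such as Keras's ReduceLROnPlateau covers the same idea.

```python
class HalveOnPlateau:
    """Halve the learning rate when validation loss fails to improve for `patience` epochs in a row."""

    def __init__(self, lr, factor=0.5, patience=2, min_lr=1e-6):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_lr = min_lr
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss and return the learning rate to use next."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.bad_epochs = 0
        return self.lr


scheduler = HalveOnPlateau(lr=0.01)
val_losses = [0.90, 0.80, 0.81, 0.82, 0.70, 0.71, 0.72]  # made-up values
history = [scheduler.step(loss) for loss in val_losses]
print(history)
```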

Evaluation and cross-validation

Use a non-uniform split stratified by classes
Avoid overfitting by applying cross-validation while tuning the last layer
10-fold CV ensemble for classification
Combination of 5 10-fold CV ensembles for detection
Sklearn’s stratified K fold function
5 KFold Cross-Validation
Adversarial Validation & Weighting
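
To show what stratification buys you, here is a minimal NumPy sketch of stratified fold assignment: samples of each class are shuffled and dealt round-robin across folds, so every fold keeps roughly the overall class balance. This is an illustration of the idea rather than a replacement for scikit-learn's StratifiedKFold, and the function name and toy labels are assumptions.

```python
import numpy as np


def stratified_folds(labels, n_splits=5, seed=0):
    """Assign each sample a fold index so every fold keeps roughly the overall class balance."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    fold_of = np.empty(len(labels), dtype=int)
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        # Deal the shuffled samples of this class round-robin across folds.
        fold_of[idx] = np.arange(len(idx)) % n_splits
    return fold_of


labels = np.array([0] * 80 + [1] * 20)  # imbalanced toy labels
fold_of = stratified_folds(labels, n_splits=5)

for fold in range(5):
    val_idx = np.flatnonzero(fold_of == fold)
    train_idx = np.flatnonzero(fold_of != fold)
    print(fold, len(train_idx), len(val_idx), labels[val_idx].mean())
```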

Ensembling methods

Use simple majority voting for ensemble
XGBoost on the max malignancy at 3 zoom levels, the z-location and the amount of strange tissue
LightGBM for models with too many classes. This was done for raw data features only.
CatBoost for a second-layer model
Training with 7 features for the gradient boosting classifier
Use ‘curriculum learning’ to speed up model training. In this technique, models are first trained on simple samples then progressively moving to hard ones.
Ensemble with ResNet50, InceptionV3, and InceptionResNetV2
Ensemble method for object detection
An ensemble of Mask RCNN, YOLOv3, and Faster RCNN architectures with a classification network (DenseNet-121 architecture)
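
Simple majority voting, the first tip in the list above, is easy to sketch for segmentation outputs. The example assumes each model produces a per-pixel class map of the same shape; the function name and the toy predictions are illustrative, and ties are resolved in favor of the lowest class index, which is an arbitrary choice.

```python
import numpy as np


def majority_vote(predictions, n_classes):
    """Per-pixel majority vote over class maps of shape (n_models, H, W).

    Ties go to the lowest class index.
    """
    predictions = np.asarray(predictions)
    # votes[c] counts how many models predicted class c at each pixel.
    votes = np.stack([(predictions == c).sum(axis=0) for c in range(n_classes)])
    return votes.argmax(axis=0)


# Made-up class maps from three models on a 2x2 image.
preds = np.array([
    [[0, 1], [2, 2]],
    [[0, 1], [1, 2]],
    [[1, 1], [1, 0]],
])
ensemble = majority_vote(preds, n_classes=3)
print(ensemble)
```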

Post Processing

Apply test time augmentation: present an image to a model several times with different random transformations and average the predictions you get
Equalize test prediction probabilities instead of only using predicted classes
Apply geometric mean to the predictions
Overlap tiles during inference so that each edge pixel is covered at least thrice because UNET tends to have bad predictions around edge areas.
Non-maximum suppression and bounding box shrinkage
Watershed post processing to detach objects in instance segmentation problems.
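
Test time augmentation from the list above can be sketched with flips alone. The example below uses a fixed set of four flips instead of random transformations, and `toy_model` is a hypothetical stand-in for a real segmentation model; the key step is undoing each flip on the prediction before averaging so that all outputs line up with the original image.

```python
import numpy as np


def predict_with_tta(predict_fn, image):
    """Average predictions over identity, horizontal flip, vertical flip and both flips."""
    flips = [(), (1,), (0,), (0, 1)]
    outputs = []
    for axes in flips:
        augmented = np.flip(image, axis=axes) if axes else image
        pred = predict_fn(augmented)
        # Undo the flip so every prediction is aligned with the original image.
        outputs.append(np.flip(pred, axis=axes) if axes else pred)
    return np.mean(outputs, axis=0)


def toy_model(image):
    """Hypothetical stand-in for a segmentation model: biased toward the left columns."""
    weights = np.linspace(1.0, 0.0, image.shape[1])
    return image * weights


image = np.ones((2, 4))
plain = toy_model(image)
averaged = predict_with_tta(toy_model, image)
print(plain[0], averaged[0])
```

To use the geometric mean mentioned in the list instead, one option is to average the logarithms of strictly positive predictions and exponentiate the result.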

Final Thoughts

Hopefully, this article gave you some background on image segmentation tips and tricks, along with some tools and frameworks that you can use to start competing.

We’ve covered tips on:

architectures
training tricks
losses
pre-processing
post processing
ensembling
tools and frameworks

If you want to go deeper down the rabbit hole, simply follow the links and see how the best image segmentation models are built.

Happy segmenting!
