Steel Defect Detection

Steel

Steel is one of the most important building materials of modern times. Steel buildings are resistant to natural and man-made wear, which has made the material ubiquitous around the world. To help make steel production more efficient, this case study focuses on identifying defects.

The production process of flat sheet steel is especially delicate. From heating and rolling, to drying and cutting, several machines touch flat steel by the time it’s ready to ship.

Severstal

Severstal is leading the charge in efficient steel mining and production. The company recently created the country’s largest
industrial data lake, with petabytes of data that were previously discarded.

Severstal is looking to machine learning to improve automation, increase efficiency, and maintain high quality in their production. Severstal uses images from high frequency cameras to power a defect detection algorithm.

Business Problem

In manufacturing industries, one of the main problems is the detection of faulty production parts. Detected faulty parts are recycled, but when a defect goes undetected it can lead to dangerous situations for the customer and damage the company's reputation.

The goal of the project is to help engineers improve the algorithm by localizing and classifying surface defects on a steel sheet. If successful, it will help keep manufacturing standards for steel high.

Dataset

The dataset is provided by one of the leading steel manufacturers in the world, Severstal. The dataset contains 3 features - ImageId, ClassId and Encoded Pixels.

EDA

ClassId consists of four unique classes, which are the 4 types of defects; they may be present individually or simultaneously in an image.
There are 7095 observations in the train dataset. There are no missing values.
The segments for each defect class are encoded into a single row, even if there are several non-contiguous defect locations in an image. The segments are stored as encoded pixels.

Check for imbalance

First, we need to determine whether the given data is balanced or not. A simple plot of ClassId vs counts is shown below.

[Image: bar plot of ClassId vs counts]

Here we observe that defect type 3 is far more common than any other defect, while defect 2 is the least frequent. There is a clear class imbalance.
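
As a quick sketch (assuming the annotations have been loaded from the competition's train.csv into a pandas DataFrame called train_df), a plot like this can be produced with:

import pandas as pd
import matplotlib.pyplot as plt

# train.csv is assumed to contain the ImageId, ClassId and EncodedPixels columns
train_df = pd.read_csv('train.csv')

# Number of annotated defect segments per class
train_df['ClassId'].value_counts().sort_index().plot(kind='bar')
plt.xlabel('ClassId')
plt.ylabel('Count')
plt.show()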

Check for defect overlap

Now we will check whether the input image contains more than one defect simultaneously.

[Image: bar plot of number of defects per image]

We can see that most observations have only one type of defect. Some have two defects simultaneously. There are no observations with three or more defects simultaneously.
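
These per-image counts can be obtained with a groupby on the same train_df (again only a sketch under the assumptions above):

# Number of distinct defect classes annotated for each image
defects_per_image = train_df.groupby('ImageId')['ClassId'].nunique()

# How many images have 1, 2, ... simultaneous defects
print(defects_per_image.value_counts().sort_index())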

Encoded pixels to masks

Encoded pixels describe which pixels of an image contain a defect. The string is a run-length encoding: a sequence of pairs, each consisting of a start pixel index followed by a run length. A pair is interpreted as "the run-length pixels beginning at the start index are all defective". This pattern repeats until every defective pixel has been encoded.

The following function converts the encoded pixels to masks:

import numpy as np

def masks(encoded_pixels):
    # Decode a run-length-encoded string into a binary mask.
    # The string is a sequence of "start length" pairs; start indices in the
    # competition RLE are 1-based, hence the -1 below.
    mask = np.zeros(256 * 1600, dtype=np.uint8)
    rle_values = np.asarray([int(point) for point in encoded_pixels.split()])
    starts, lengths = rle_values[0::2] - 1, rle_values[1::2]
    for start, length in zip(starts, lengths):
        mask[start:start + length] = 1
    # Pixels are encoded column-wise (top to bottom, then left to right), so
    # reshape to (1600, 256) and transpose to get a (256, 1600) mask.
    return np.reshape(mask, (1600, 256)).T
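As a quick check (the RLE string below is a made-up example, not a value from the dataset), the helper can be used like this:

# Made-up EncodedPixels string: two runs of 5 and 3 defective pixels
example_rle = '10 5 100 3'
mask = masks(example_rle)
print(mask.shape, mask.sum())   # (256, 1600) 8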

Plotting a few datapoints

Here we will visualize what the input images and their target masks look like. I have given each type of defect a different colour so that it is easier to differentiate between defects.

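Overlays like the ones below can be produced along these lines (a sketch assuming the masks() helper above and matplotlib; the colour choices are my own, not the exact ones used in the post):

import cv2
import matplotlib.pyplot as plt

# Assumed colours for defect classes 1-4 (illustrative choice)
colours = {1: (255, 0, 0), 2: (0, 255, 0), 3: (0, 0, 255), 4: (255, 255, 0)}

def show_defects(image_path, rle_by_class):
    # rle_by_class maps ClassId -> EncodedPixels string for one image
    image = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2RGB)
    for class_id, encoded in rle_by_class.items():
        contours, _ = cv2.findContours(masks(encoded), cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        cv2.drawContours(image, contours, -1, colours[class_id], 2)
    plt.figure(figsize=(15, 3))
    plt.imshow(image)
    plt.axis('off')
    plt.show()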

Defect type 1

[Images: sample steel sheets with defect type 1 masks]

Defect type 2

[Images: sample steel sheets with defect type 2 masks]

Defect type 3

[Images: sample steel sheets with defect type 3 masks]

Defect type 4

[Images: sample steel sheets with defect type 4 masks]

Two defects simultaneously
[Images: samples with two defect types present simultaneously]

From the above visualizations we can observe that defect type 4 is very distinct from the other defects. Defect type 1 is difficult to see. Defect types 2 and 3 look similar; due to the imbalance, type 2 defects may be classified as type 3 by the model if the data is not balanced.

Feature Engineering

The classes are heavily imbalanced and the dataset is small. Therefore, I augmented the images in a way that compensates for the class imbalance and also increases the amount of training data.

Class 3 was not augmented because it already has enough data. Classes 1 and 2 were augmented so that their data increased 5 times. Since class 4 had the least amount of data available, I augmented it 8 times to compensate for the imbalance.
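
The exact augmentations are not listed here, so the snippet below is only a sketch of the idea: whatever transform is applied to an image must also be applied to its mask so that the defect annotations stay aligned.

import numpy as np

def augment_pair(image, mask):
    # Apply the same flips to image and mask; each call yields extra
    # (image, mask) pairs that can be saved alongside the originals.
    pairs = []
    for flip in (np.fliplr, np.flipud):
        pairs.append((flip(image), flip(mask)))
    return pairs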

In this step, I augmented both the images and their masks, and then saved the augmented images. The mask values were converted back into encoded pixels using the function below.

import numpy as np

def rle(img):
    # Encode a binary mask as "start length" pairs, reading pixels column-wise.
    pixels = img.T.flatten()
    pixels = np.concatenate([[0], pixels, [0]])
    # Value changes mark run boundaries; +1 gives the 1-based RLE indexing.
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 1
    runs[1::2] -= runs[::2]
    return ' '.join(str(x) for x in runs)

The total number of images was increased from 7000 to 13000.

Modelling

Performance Metric

I have chosen the dice coefficient as the performance metric, because the predicted masks need to have both good precision and good recall.
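
A minimal sketch of the metric, assuming a Keras/TensorFlow setup and a small smoothing constant of my own choosing:

from tensorflow.keras import backend as K

def dice_coef(y_true, y_pred, smooth=1.0):
    # Dice = 2 * |prediction ∩ ground truth| / (|prediction| + |ground truth|);
    # the smoothing term avoids division by zero on empty masks.
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (2.0 * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)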

Loss function

I have used Binary Crossentropy as the loss function. It is used here because the prediction for one class should not influence the decision for another class, since some images contain more than one defect class.
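
Under the same Keras assumption, the loss and the dice metric from above would be wired together roughly like this (the optimizer choice is mine, not stated in the post):

# model is the segmentation network defined below (ResUNet or UNet)
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=[dice_coef])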

Residual Unet

ResUNet is a semantic segmentation model inspired by deep residual learning and UNet, an architecture that takes advantage of both the Residual and UNet models.

This combination brings two benefits: 1) the residual unit eases training of the network; 2) the skip connections within a residual unit and between the low and high levels of the network facilitate information propagation without degradation, making it possible to design a neural network with far fewer parameters that achieves comparable or even better performance on semantic segmentation.

Paper: https://arxiv.org/pdf/1711.10684.pdf

Architecture
[Image: ResUNet architecture diagram]

The network comprises three parts: encoding, bridge and decoding. The first part encodes the input image into compact representations. The last part recovers the representations into a pixel-wise categorization, i.e. semantic segmentation. The middle part serves as a bridge connecting the encoding and decoding paths. All three parts are built with residual units, which consist of two 3 × 3 convolution blocks and an identity mapping. Each convolution block includes a BN layer, a ReLU activation layer and a convolutional layer. The identity mapping connects the input and output of the unit.
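
A rough Keras sketch of such a residual unit (the filter counts and the 1×1 projection on the shortcut are my own illustrative choices, not the exact configuration from the paper) could look like this:

from tensorflow.keras import layers

def residual_unit(x, filters, stride=1):
    # Shortcut: a 1x1 projection so the identity path matches the output shape
    shortcut = layers.Conv2D(filters, 1, strides=stride, padding='same')(x)
    # Two convolution blocks, each BN -> ReLU -> 3x3 convolution
    y = layers.BatchNormalization()(x)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, strides=stride, padding='same')(y)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding='same')(y)
    # The identity mapping connects the input and output of the unit
    return layers.Add()([shortcut, y])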

Result
  
The best loss was 0.0513, reached after 15 epochs, at which point training plateaued at its minimum.

Unet

The architecture contains two paths. The first path is the contraction path (also called the encoder), which is used to capture the context in the image. The encoder is just a traditional stack of convolutional and max pooling layers. The second path is the symmetric expanding path (also called the decoder), which enables precise localization using transposed convolutions.

[Image: UNet architecture diagram]

Thus it is an end-to-end fully convolutional network (FCN): it contains only convolutional layers and no dense layers, which is why it can accept images of any size.
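
A condensed Keras sketch of one contraction step and the matching expansion step (filter counts are illustrative assumptions, not the original configuration):

from tensorflow.keras import layers

def down_block(x, filters):
    # Contraction: two convolutions, then max pooling halves the resolution
    c = layers.Conv2D(filters, 3, activation='relu', padding='same')(x)
    c = layers.Conv2D(filters, 3, activation='relu', padding='same')(c)
    return c, layers.MaxPooling2D(2)(c)

def up_block(x, skip, filters):
    # Expansion: transposed convolution upsamples, then the encoder feature
    # map is concatenated back in for precise localization
    u = layers.Conv2DTranspose(filters, 2, strides=2, padding='same')(x)
    u = layers.Concatenate()([u, skip])
    u = layers.Conv2D(filters, 3, activation='relu', padding='same')(u)
    return layers.Conv2D(filters, 3, activation='relu', padding='same')(u)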

Result
The validation loss reached its minimum of 0.0125 after 23 epochs. The real vs predicted masks are shown below.

[Image: ground-truth masks vs predicted masks]

The prediction masks are more precise than the manually labelled masks.

Scope for Improvement

A newer model, Hierarchical Multi-Scale Attention for Semantic Segmentation, could be tried. It might give even better results than UNet.
Paper: https://arxiv.org/abs/2005.10821
