Steel
Steel is one of the most important building materials of modern times. Steel buildings resist natural and man-made wear, which has made the material ubiquitous around the world. To help make steel production more efficient, this case study identifies surface defects.
The production process of flat sheet steel is especially delicate. From heating and rolling, to drying and cutting, several machines touch flat steel by the time it’s ready to ship.
Severstal
Severstal is leading the charge in efficient steel mining and production. The company recently created the country’s largest
industrial data lake, with petabytes of data that were previously discarded.
Severstal is looking to machine learning to improve automation, increase efficiency, and maintain high quality in its production. Severstal uses images from high-frequency cameras to power a defect detection algorithm.
Business Problem
In manufacturing industries, one of the main problems is the detection of faulty parts. Detected faulty parts are recycled, but a defect that goes undetected can lead to dangerous situations for the customer and damage the company's reputation.
The goal of the project is to help engineers improve the algorithm by localizing and classifying surface defects on a steel sheet. If successful, it will help keep manufacturing standards for steel high.
Dataset
The dataset is provided by one of the leading steel manufacturers in the world, Severstal. It contains three columns: ImageId, ClassId and EncodedPixels.
EDA
ClassId consists of four unique classes, which are the four types of defects; they may be present individually or simultaneously in an image.
There are 7095 observations in the train dataset. There are no missing values.
The segments for each defect class are encoded into a single row, even if there are several non-contiguous defect locations on an image. The segments are stored as run-length encoded pixels.
Check for imbalance
First, we need to determine whether the given data is balanced or not. A simple plot of ClassId vs. counts is shown below.
Here we observe that defect type 3 is far more common than any other defect, while defect 2 is the least frequent. The classes are imbalanced.
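The count-per-class check can be sketched in pandas. The DataFrame below is a toy stand-in for the real train.csv (the column names follow the dataset; the counts are illustrative only):

```python
import pandas as pd

# toy stand-in for train.csv (real columns: ImageId, ClassId, EncodedPixels)
train = pd.DataFrame({
    'ImageId': ['a.jpg', 'b.jpg', 'c.jpg', 'd.jpg', 'e.jpg', 'e.jpg'],
    'ClassId': [3, 3, 3, 1, 2, 4],
})
# count how many defect segments each class has
class_counts = train['ClassId'].value_counts().sort_index()
print(class_counts.to_dict())
```

On the real data the same call shows class 3 dominating and class 2 as the rarest.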
Check for defect overlap
Now we will check whether the input image contains more than one defect simultaneously.
We can see that most observations have only one type of defect. Some have two defects simultaneously. There are no observations with three or more defects simultaneously.
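The overlap check amounts to counting distinct defect classes per image. A sketch on the same toy stand-in for train.csv (here 'e.jpg' carries two different defect classes):

```python
import pandas as pd

# toy stand-in for train.csv; 'e.jpg' has two different defect classes
train = pd.DataFrame({
    'ImageId': ['a.jpg', 'b.jpg', 'c.jpg', 'd.jpg', 'e.jpg', 'e.jpg'],
    'ClassId': [3, 3, 3, 1, 2, 4],
})
defects_per_image = train.groupby('ImageId')['ClassId'].nunique()
# maps {number of defect types present : number of images}
overlap = defects_per_image.value_counts().sort_index()
print(overlap.to_dict())
```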
Encoded pixels to masks
EncodedPixels describes which pixels of an image are defective. It is a sequence of (start index, run length) pairs: each pair marks the pixels from the start index up to start index + count − 1 as defective. The pairs repeat until all defective pixels are encoded.
The following function converts the encoded pixels to masks:
import numpy as np

def masks(encoded_pixels, shape=(256, 1600)):
    # Decode run-length encoded pixels into a binary mask.
    mask = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    pairs = np.asarray([int(point) for point in encoded_pixels.split()])
    starts, counts = pairs[0::2], pairs[1::2]
    for start, count in zip(starts, counts):
        # run starts are 1-indexed and traverse the image column by column
        mask[start - 1:start - 1 + count] = 1
    # column-major reshape matches the top-to-bottom, left-to-right pixel order
    return mask.reshape(shape, order='F')
Plotting a few datapoints
Here we will visualize what the input images and target masks look like. I have given each type of defect a different colour so that it is easier to differentiate between defects.
Defect type 1
Defect type 2
Defect type 3
Defect type 4
From the above visualizations we can observe that defect type 4 is much more distinct than the other defects, while defect type 1 is difficult to see. Defect types 2 and 3 look similar; due to the imbalance, type 2 defects may be classified as type 3 by the model if the data is not balanced.
Feature Engineering
The classes are heavily imbalanced, and the dataset is small. Therefore, I augmented the images to compensate for the class imbalance and to increase the amount of training data.
Class 3 was not augmented because it already has enough data. Classes 1 and 2 were augmented to five times their original size. Since class 4 had the least amount of data available, I augmented it eight times to compensate for the imbalance.
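The exact augmentations are not specified above; a minimal sketch using simple flips, applied identically to the image and its mask (the mask must follow the image), might look like this — `augment_pair` is a hypothetical helper name:

```python
import numpy as np

def augment_pair(image, mask, mode):
    # apply the same geometric transform to the image and its mask
    if mode == 'hflip':
        return image[:, ::-1], mask[:, ::-1]
    if mode == 'vflip':
        return image[::-1, :], mask[::-1, :]
    return image, mask

img = np.arange(12).reshape(3, 4)
msk = (img % 2 == 0).astype(np.uint8)
aug_img, aug_msk = augment_pair(img, msk, 'hflip')
```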
In this step, I augmented both the images and their masks, and then saved the images. The mask values were converted back into pixel encoding using the function below.
def rle(img):
    # run-length encode a binary mask: column-major flatten, 1-indexed run starts
    pixels = img.T.flatten()
    pixels = np.concatenate([[0], pixels, [0]])
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 1
    runs[1::2] -= runs[::2]
    return ' '.join(str(x) for x in runs)
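As a sanity check, encoding a mask and decoding it back should reproduce the original. The sketch below restates minimal encode/decode helpers (`rle_encode` and `rle_decode` are illustrative names) under the column-major, 1-indexed convention described above:

```python
import numpy as np

def rle_encode(mask):
    # column-major flatten, 1-indexed run starts
    pixels = np.concatenate([[0], mask.T.flatten(), [0]])
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 1
    runs[1::2] -= runs[::2]
    return ' '.join(str(x) for x in runs)

def rle_decode(encoded, shape):
    flat = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    vals = [int(v) for v in encoded.split()]
    for start, length in zip(vals[0::2], vals[1::2]):
        flat[start - 1:start - 1 + length] = 1
    return flat.reshape(shape, order='F')

mask = np.zeros((256, 1600), dtype=np.uint8)
mask[10:20, 100:130] = 1  # a small rectangular "defect"
assert np.array_equal(rle_decode(rle_encode(mask), mask.shape), mask)
```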
The total number of images increased from roughly 7,000 to 13,000.
Modelling
Performance Metric
I have chosen the Dice coefficient as the performance metric, because the resulting masks need both good precision and good recall.
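The Dice coefficient rewards precision and recall jointly; for binary masks it equals the F1 score. A NumPy sketch:

```python
import numpy as np

def dice_coefficient(y_true, y_pred, eps=1e-7):
    # Dice = 2|A ∩ B| / (|A| + |B|); equals the F1 score for binary masks
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    intersection = np.logical_and(y_true, y_pred).sum()
    return (2.0 * intersection + eps) / (y_true.sum() + y_pred.sum() + eps)

a = np.array([[1, 1, 0], [0, 0, 0]])
```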
Loss function
I have used binary cross-entropy as the loss function. It is appropriate here because some images contain more than one class, so the evidence that a pixel belongs to one class should not influence the decision for another class.
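For reference, per-pixel binary cross-entropy scores each class channel independently, which is why multi-label images are handled correctly. A NumPy sketch (deep learning frameworks provide this built in):

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # mean per-pixel BCE; each class channel is scored independently
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

loss = binary_crossentropy(np.array([1.0, 0.0]), np.array([0.9, 0.1]))
```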
Residual Unet
ResUNet is a semantic segmentation model inspired by deep residual learning and UNet: an architecture that takes advantage of both. This combination brings two benefits: 1) the residual units ease training of the network; 2) the skip connections within a residual unit and between the low and high levels of the network facilitate information propagation without degradation, making it possible to design a network with far fewer parameters that achieves comparable or even better performance on semantic segmentation.
Paper: https://arxiv.org/pdf/1711.10684.pdf
The network comprises three parts: encoding, bridge and decoding. The first part encodes the input image into compact representations. The last part recovers the representations into a pixel-wise categorization, i.e. semantic segmentation. The middle part serves as a bridge connecting the encoding and decoding paths. All three parts are built with residual units, which consist of two 3 × 3 convolution blocks and an identity mapping. Each convolution block includes a BN layer, a ReLU activation layer and a convolutional layer. The identity mapping connects the input and output of the unit.
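Under these definitions, a single residual unit could be sketched in Keras as follows. This is a sketch, not the exact architecture used; a 1×1 convolution on the shortcut is assumed so the channel counts match:

```python
from tensorflow.keras import layers, Model

def conv_block(x, filters, stride=1):
    # convolution block as described above: BN -> ReLU -> 3x3 convolution
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)
    return layers.Conv2D(filters, 3, strides=stride, padding='same')(x)

def residual_unit(x, filters, stride=1):
    # 1x1 projection so the identity mapping matches the output shape
    shortcut = layers.Conv2D(filters, 1, strides=stride, padding='same')(x)
    out = conv_block(x, filters, stride)
    out = conv_block(out, filters)
    return layers.Add()([out, shortcut])  # identity mapping joins input and output

inp = layers.Input((64, 64, 3))
model = Model(inp, residual_unit(inp, 16, stride=2))
```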
Result
The best loss was 0.0513, reached after 15 epochs, after which training plateaued at that minimum.
Unet
The architecture contains two paths. The first is the contraction path (also called the encoder), which captures the context in the image; it is a traditional stack of convolutional and max-pooling layers. The second is the symmetric expanding path (also called the decoder), which enables precise localization using transposed convolutions.
Thus it is an end-to-end fully convolutional network (FCN): it contains only convolutional layers and no dense layers, which is why it can accept images of any size.
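A minimal sketch of this encoder/decoder pattern in Keras, with one pooling stage, one transposed-convolution stage and a skip connection (the real model is much deeper):

```python
from tensorflow.keras import layers, Model

def tiny_unet(input_shape=(64, 64, 1)):
    inp = layers.Input(input_shape)
    c1 = layers.Conv2D(16, 3, activation='relu', padding='same')(inp)  # encoder
    p1 = layers.MaxPooling2D()(c1)
    c2 = layers.Conv2D(32, 3, activation='relu', padding='same')(p1)   # bottleneck
    u1 = layers.Conv2DTranspose(16, 2, strides=2, padding='same')(c2)  # decoder
    m1 = layers.concatenate([u1, c1])                                  # skip connection
    c3 = layers.Conv2D(16, 3, activation='relu', padding='same')(m1)
    out = layers.Conv2D(4, 1, activation='sigmoid')(c3)  # one channel per defect class
    return Model(inp, out)

model = tiny_unet((64, 64, 1))
```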
Result
The validation loss reached its minimum of 0.0125 after 23 epochs. The real vs. predicted masks are shown below.
The predicted masks are more precise than the manually labelled masks.
Scope for Improvement
A newer model, Hierarchical Multi-Scale Attention for Semantic Segmentation, could be tried next; it might give even better results than UNet.
Paper: https://arxiv.org/abs/2005.10821