DEV Community: nekot0

[ML] Data should be checked with your eyes

nekot0 — Thu, 06 Apr 2023 22:26:32 +0000

Continuing from my previous effort, I'm working through a finished Kaggle image competition and moving ahead with the implementation. After overcoming recurring errors, I started training the model for one epoch. Training was going well, but about 50% of the way through the epoch, the error below occurred.

ValueError: Caught ValueError in DataLoader worker process 1.
Original Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 58, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 58, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/tmp/ipykernel_28/3482151344.py", line 46, in __getitem__
    labels = labels
  File "/opt/conda/lib/python3.7/site-packages/albumentations/core/composition.py", line 202, in __call__
    p.preprocess(data)
  File "/opt/conda/lib/python3.7/site-packages/albumentations/core/utils.py", line 83, in preprocess
    data[data_name] = self.check_and_convert(data[data_name], rows, cols, direction="to")
  File "/opt/conda/lib/python3.7/site-packages/albumentations/core/utils.py", line 91, in check_and_convert
    return self.convert_to_albumentations(data, rows, cols)
  File "/opt/conda/lib/python3.7/site-packages/albumentations/core/bbox_utils.py", line 124, in convert_to_albumentations
    return convert_bboxes_to_albumentations(data, self.params.format, rows, cols, check_validity=True)
  File "/opt/conda/lib/python3.7/site-packages/albumentations/core/bbox_utils.py", line 390, in convert_bboxes_to_albumentations
    return [convert_bbox_to_albumentations(bbox, source_format, rows, cols, check_validity) for bbox in bboxes]
  File "/opt/conda/lib/python3.7/site-packages/albumentations/core/bbox_utils.py", line 390, in <listcomp>
    return [convert_bbox_to_albumentations(bbox, source_format, rows, cols, check_validity) for bbox in bboxes]
  File "/opt/conda/lib/python3.7/site-packages/albumentations/core/bbox_utils.py", line 334, in convert_bbox_to_albumentations
    check_bbox(bbox)
  File "/opt/conda/lib/python3.7/site-packages/albumentations/core/bbox_utils.py", line 417, in check_bbox
    raise ValueError(f"Expected {name} for bbox {bbox} to be in the range [0.0, 1.0], got {value}.")
ValueError: Expected x_min for bbox (-0.0009398496240601503, 0.46129587155963303, 0.32471804511278196, 0.9730504587155964, array([1])) to be in the range [0.0, 1.0], got -0.0009398496240601503.

The good thing was that I had already experienced and overcome a lot of errors, so this time I could guess which part of my code caused it. According to the message, an element of the bounding box is outside the range [0.0, 1.0]. The range [0.0, 1.0] is probably related to scaling: judging from the traceback, Albumentations converts bounding-box coordinates into values relative to the image size when the input image is resized. In other words, a bounding box ends up with an invalid relative coordinate when the input image is resized.

However, the relative x_min is negative. Why is it negative?

I wanted to check which sample caused this error, but the message didn't tell me. So, I tried the code below.

dataset = CTDataset(train_root_path, train_image_list)
for i in range(len(train_image_list)):
    print(i)
    image, target = dataset.__getitem__(i)

This only prints each index and then fetches the sample from the dataset. Because the index is printed before the sample is fetched, the last index printed when the error occurs is the one that causes it.
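A variation on this loop is to catch the exception and keep scanning, so that every failing index is collected in one pass. The sketch below is only an illustration: `ToyDataset` is a hypothetical stand-in for CTDataset that raises the same kind of ValueError, and `find_failing_indices` is a name made up for this note.

```python
class ToyDataset:
    """Hypothetical stand-in for CTDataset: raises on out-of-range x_min."""

    def __init__(self, boxes):
        self.boxes = boxes

    def __len__(self):
        return len(self.boxes)

    def __getitem__(self, i):
        x_min = self.boxes[i][0]
        if not 0.0 <= x_min <= 1.0:
            raise ValueError(f"Expected x_min to be in the range [0.0, 1.0], got {x_min}.")
        return self.boxes[i]


def find_failing_indices(dataset):
    """Fetch every sample once and record (index, message) for each failure."""
    failures = []
    for i in range(len(dataset)):
        try:
            dataset[i]
        except ValueError as err:
            failures.append((i, str(err)))
    return failures


# Illustrative values only; the second box has a slightly negative x_min
toy = ToyDataset([
    (0.10, 0.20, 0.50, 0.60),
    (-0.0009, 0.46, 0.32, 0.97),
    (0.30, 0.30, 0.40, 0.40),
])
failing = find_failing_indices(toy)
for index, message in failing:
    print(index, message)
```

With a real dataset this takes as long as one pass over the data, but it can be run without a GPU before training starts.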

I ran the code.

And finally I found it.

Then I checked the data.

I see. This bounding box extends outside the image. The resizing method I'm using (Albumentations) only accepts bounding boxes that lie within the image, and it raises the error if a box goes outside. In that case, we can avoid the error by replacing negative coordinates with 0.

x = max(0, int(round(bounding_box['x'])))
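The same idea can be extended to all four edges. The following is a minimal sketch, assuming the annotation stores a box as (x, y, width, height) in pixels; `clip_box` and the numbers in the example are hypothetical and not taken from the competition data.

```python
def clip_box(x, y, width, height, image_width, image_height):
    """Clip an (x, y, width, height) box to the image and return corner coordinates.

    The right and bottom edges are computed from the original x and y before
    clipping, so clipping the left edge does not shift the box to the right.
    """
    x0 = max(0, int(round(x)))
    y0 = max(0, int(round(y)))
    x1 = min(image_width, int(round(x + width)))
    y1 = min(image_height, int(round(y + height)))
    return x0, y0, x1, y1


# Hypothetical box that starts one pixel left of the image and runs past the bottom
clipped = clip_box(-1.0, 100.0, 346.0, 900.0, image_width=1064, image_height=872)
print(clipped)  # (0, 100, 345, 872)
```

If the width is kept unchanged after only the x coordinate is clipped, the right edge moves by the clipped amount, which is why the sketch works with corner coordinates.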

After the correction, the programme is working well. It has now passed 50% of epoch 1, so it should be fine.

The lesson today is that we should check the original data with our own eyes before running a time-consuming programme. We should imagine what values the original data can take and write code that either filters out potentially error-causing data or handles it separately. Data should be checked with our eyes.

Data Connection between Kaggle and Google Colab

nekot0 — Wed, 05 Apr 2023 16:01:11 +0000

While writing code, I sometimes meet errors that seem to occur inside a library I have imported. This is not always because the library is coded incorrectly; in most cases it is because the code I'm writing is incorrect. However, when looking for the root cause of such errors, it is useful to comment out lines or insert debugging statements inside the library code. For that purpose, a Kaggle notebook is not a convenient tool.

If we edit library code when debugging, a local machine is the easiest environment, but Google Colab is also useful because it offers an environment similar to Kaggle's, including GPU resources.

To do so, we need to move data from one to the other. Today I am making a note of the method.

Kaggle -> Google Colab

To access Kaggle data, we first need an API token, which we load on Colab. Then we can download the data from Kaggle to Colab from the command line.

First, we go to our Kaggle account page, click 'Create New API Token', and download the token file 'kaggle.json'.

Next, we upload the token to Google Drive. The example below assumes the token is saved directly under 'My Drive'.

We move to a Google Colab notebook and mount Google Drive on Colab. In the example below, Drive is mounted at '/content/drive', so My Drive appears at '/content/drive/MyDrive'.

from google.colab import drive
drive.mount('/content/drive')

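Then run the code below to set the credentials as environment variables.

import os
import json

# Read the token and expose it through the environment variables the kaggle package reads
with open("/content/drive/MyDrive/kaggle.json") as f:
    json_data = json.load(f)
os.environ['KAGGLE_USERNAME'] = json_data['username']
os.environ['KAGGLE_KEY'] = json_data['key']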

Then, we can access Kaggle after installing the kaggle package.

!pip install kaggle

The way to download data is as follows. We go to the competition's data page, where the API command is shown near the bottom of the page.

Return to the Colab notebook and paste the command.

!kaggle competitions download -c siim-covid19-detection

I stopped the download because the data is over 80 GB in size.

Competition datasets can be extremely large. Therefore, I often use the entire dataset on Kaggle, while on Colab I test my code with a small subset of the data or test modified packages made by other participants.

Google Colab -> Kaggle

Unfortunately, as far as I know, there is no way to move Colab data to Kaggle directly. Therefore, when I need data I made on Colab, I download it first and then upload it to Kaggle as a Dataset.

Kaggle recommends uploading data in compressed form, and uploaded files that are recognised as compressed archives are extracted automatically.

Errors in the implementation of model training with effdet

nekot0 — Wed, 05 Apr 2023 08:44:24 +0000

In the previous post, I succeeded in implementing effdet model training in a simple setting and came to understand the code to some extent. Now I am applying this understanding to the main problem.

My entire code still does not work, but I'd like to make a note of the errors I have met so far and their remedies.

Warning in transform from numpy.ndarray to torch.tensor

This is not critical, but a warning occurred when I tried to convert numpy.ndarray to torch.tensor. According to the warning message, the conversion is extremely slow. In my case, the warning appeared because I tried to convert a list of numpy.ndarray to torch.tensor. The better approach is to convert the entire list into a single numpy.ndarray first and then make a torch.tensor from it.

# warning
/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:47: 
UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. 
(Triggered internally at ../torch/csrc/utils/tensor_new.cpp:230.)
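As a small illustration of the remedy the warning suggests, the sketch below converts the list to one numpy.ndarray first. The box values and variable names are made up for the example.

```python
import numpy as np
import torch

# A list of per-box arrays, as a dataset might collect them (illustrative values)
boxes = [
    np.array([17.0, 58.0, 52.0, 78.0], dtype=np.float32),
    np.array([6.0, 22.0, 27.0, 41.0], dtype=np.float32),
]

# torch.tensor(boxes) would trigger the UserWarning above.
# Converting the list to a single ndarray first avoids it.
boxes_array = np.array(boxes)                 # shape (2, 4)
boxes_tensor = torch.from_numpy(boxes_array)  # shares memory with boxes_array

print(boxes_tensor.shape, boxes_tensor.dtype)
```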

Processes required when an input image has multiple bounding boxes

The dataset in this competition has images with multiple bounding boxes, and the number of bounding boxes varies from image to image.

If an image has one bounding box, it is stored in the form (A); if it has multiple bounding boxes, they are stored in the form (B).

(A)
    [x0, y0, x1, y1]
        # x0: x-coordinate of top left
        # y0: y-coordinate of top left
        # x1: x-coordinate of bottom right
        # y1: y-coordinate of bottom right

(B)
    [[x0, y0, x1, y1],   # coordinates of bounding box 1
     [x0, y0, x1, y1],   # coordinates of bounding box 2
     ...
     [x0, y0, x1, y1]]   # coordinates of bounding box n

If images have multiple bounding boxes and the number of boxes varies, the tensor sizes also differ from one image to another.

Therefore, the error below occurs when DataLoader receives tensors of different sizes.

# DataLoader instantiation
dataset = MyDataset(args)
dataloader = DataLoader(dataset, batch_size=4, num_workers=4)

# error message
RuntimeError: stack expects each tensor to be equal size, but got [2, 4] at entry 0 and [1, 4] at entry 1

This error can be avoided by defining 'collate_fn', which defines how a group of samples is combined into a batch. When an image has fewer bounding boxes than the others, I pad its boxes with 0 so that all the bounding-box tensors in a batch have the same size. The code below is an example.

import torch.nn.functional as F
from torch.utils.data import DataLoader, default_collate

def pad_collate_fn(batch):
    # Check the maximum number of bounding boxes in a batch
    shapes = [item[1]['bbox'].shape[0] for item in batch]
    max_shape = max(shapes)

    padded_batch = []
    for x, y in batch:

        # Remove the data with no bounding boxes
        if any(elem == 0 for elem in y['cls']):
            continue

        # Pad with 0 if the box size is smaller than the maximum
        pad_size = max_shape - y['bbox'].shape[0]
        bbox_padding = [0, 0, 0, pad_size]
        cls_padding = [0, 0, 0, pad_size]
        padded_y = {
            'bbox': F.pad(y['bbox'], bbox_padding, mode='constant', value=0),
            'cls': F.pad(y['cls'].reshape((y['cls'].shape[0],1)), cls_padding, mode='constant', value=0)
        }
        padded_batch.append((x, padded_y))

    # Apply the default batch process before return
    return default_collate(padded_batch)


# Instantiation of dataset and dataloader
dataset = MyDataset(args)
dataloader = DataLoader(
    dataset, batch_size=4, num_workers=4, 
    collate_fn=pad_collate_fn   # Pass the collate_fn as defined above
)
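The padding list passed to F.pad is easy to misread, because it starts from the last dimension: [0, 0, 0, pad_size] means no padding on the left or right of each row and pad_size extra rows at the bottom. A minimal sketch with made-up box values:

```python
import torch
import torch.nn.functional as F

# Two boxes for one image, shape (2, 4); values are illustrative
boxes = torch.tensor([[17.0, 58.0, 52.0, 78.0],
                      [6.0, 22.0, 27.0, 41.0]])

max_boxes = 3                          # largest number of boxes in the batch (assumed)
pad_rows = max_boxes - boxes.shape[0]  # rows to add for this image

# (left, right) for the last dimension, then (top, bottom) for the one before it
padded = F.pad(boxes, [0, 0, 0, pad_rows], mode='constant', value=0)

print(padded.shape)  # torch.Size([3, 4])
print(padded[2])     # the added row of zeros
```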

Channels and Data structure of input images

Colour images usually have three channels, while DICOM images, often used in the medical context, sometimes have only one channel. As effdet requires three channels, we need to extend the channel from one to three. I did it using opencv.

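import pydicom
import cv2

dcm = pydicom.dcmread(dcm_path)
image = dcm.pixel_array.astype("float32")
# COLOR_GRAY2RGB copies the single channel into three channels
# (COLOR_BGR2RGB expects an input that already has three channels)
image = cv2.cvtColor(image, cv2.COLOR_GRAY2RGB)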

We have to be careful about the image data structure. I first assumed that the channel-extended image from this method is stored in the form (Width, Height, Channel), whereas an image read with opencv is stored in the form (Height, Width, Channel). This matters, for example, when the image is transformed using Albumentations, because a mismatch between the two forms can cause unexpected errors.

(5 April edit)

The image obtained from the above method is in fact in the form (Height, Width, Channel). The training worked with this form and with input bounding boxes of the form (x0, y0, x1, y1).
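To make the layouts concrete, here is a small numpy-only sketch that repeats one channel three times and checks the resulting shapes. It does not use pydicom or opencv; the tiny array stands in for dcm.pixel_array and the variable names are illustrative.

```python
import numpy as np

# A tiny single-channel "image" with Height=2 and Width=3
gray = np.arange(6, dtype=np.float32).reshape(2, 3)

# Repeat the channel along a new last axis: (Height, Width, Channel)
hwc = np.stack([gray, gray, gray], axis=-1)

# Channel-first layout, as PyTorch models usually expect: (Channel, Height, Width)
chw = hwc.transpose(2, 0, 1)

print(gray.shape, hwc.shape, chw.shape)  # (2, 3) (2, 3, 3) (3, 2, 3)
```

Printing the shape at each step like this is a quick way to confirm which axis is which before passing the image to a transform or a model.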

anchors.py adjustment

After dealing with the errors above, I met another one when training the model. The error message says that the shape of a mask does not match the shape of the tensor being indexed, but at first I was not sure what it meant.

# error message
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/tmp/ipykernel_27/103307951.py in <module>
     28     for inputs, targets in t:
     29         optimizer.zero_grad()
---> 30         losses = bench(inputs, targets)
     31         loss = losses['loss']
     32         loss.backward()

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1188         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1189                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1190             return forward_call(*input, **kwargs)
   1191         # Do not call functions when jit is used
   1192         full_backward_hooks, non_full_backward_hooks = [], []

/kaggle/input/effdet-030-package-dataset/packages/effdet/bench.py in forward(self, x, target)
    140         else:
    141             cls_targets, box_targets, num_positives = self.anchor_labeler.batch_label_anchors(
--> 142                 target['bbox'], target['cls'])
    143
    144         loss, class_loss, box_loss = self.loss_fn(class_out, box_out, cls_targets, box_targets, num_positives)

/kaggle/input/effdet-030-package-dataset/packages/effdet/anchors.py in batch_label_anchors(self, gt_boxes, gt_classes, filter_valid)
    376             if filter_valid:
    377                 valid_idx = gt_classes[i] > -1  # filter gt targets w/ label <= -1
--> 378                 gt_box_list = BoxList(gt_boxes[i][valid_idx])
    379                 gt_class_i = gt_classes[i][valid_idx]
    380             else:

IndexError: The shape of the mask [2, 1] at index 1 does not match the shape of the indexed tensor [2, 4] at index 1

According to the message, something happened in 'anchors.py'. Looking for the cause by printing out the variables in the file, I found the process that removes bounding boxes with negative labels. The error came because the mask used in this process has the shape [2, 1] while the boxes have the shape [2, 4], so the mask cannot be used to index the boxes. So, I edited this process and the error was resolved.

# Line 378 in anchors.py
gt_box_list = BoxList(gt_boxes[i][valid_idx])

# gt_boxes is a bundle of bounding boxes in the batch
# For example,
# gt_boxes:
#    tensor([[[17.1509, 58.3014, 51.9944, 78.0274],
#             [ 6.3188, 22.3562, 27.2609, 40.9863],
#             [38.4542, 19.5068, 53.6192, 37.4795]],
#            [[ 0.0000, 11.3287, 32.6629, 27.3655],
#             [ 0.0000,  0.0000,  0.0000,  0.0000],
#             [ 0.0000,  0.0000,  0.0000,  0.0000]], 
#            [[25.7564, 29.2258, 51.8807, 42.6300],
#             [34.2192, 82.6232, 57.5839, 96.0275],
#             [ 0.0000,  0.0000,  0.0000,  0.0000]]])
# In this case, gt_boxes[2] has the bounding box data of the third image (index 2)
# gt_boxes[2]:
#    tensor([[25.7564, 29.2258, 51.8807, 42.6300],
#            [34.2192, 82.6232, 57.5839, 96.0275],
#            [ 0.0000,  0.0000,  0.0000,  0.0000]])
#
# valid_idx is a mask for each gt_boxes[i] that indicates which bounding boxes are valid.
# In the above example for gt_boxes[2], the first and second rows have class 1 and the third row has class 0, so the third row is removed from the bundle.
# valid_idx: tensor([[True], [True], [False]])

# Re-write the line: keep only the rows whose mask value is True
gt_boxes_output = []
for j in range(valid_idx.shape[0]):
    if valid_idx[j]:
        gt_boxes_output.append(np.array(gt_boxes[i][j]))
gt_box_list = torch.FloatTensor(np.array(gt_boxes_output))
gt_box_list = BoxList(gt_box_list)
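The mask shape [2, 1] suggests that the class tensor carries a trailing dimension, which the padding code earlier in this post adds with reshape. Under that assumption, a possible alternative to the loop is to flatten the mask before indexing. The sketch below reuses the example values from the comments above; `gt_boxes_i` and `flat_mask` are illustrative names, and this is not how effdet itself is written.

```python
import torch

# Boxes of one image, shape (3, 4), copied from the example above
gt_boxes_i = torch.tensor([[25.7564, 29.2258, 51.8807, 42.6300],
                           [34.2192, 82.6232, 57.5839, 96.0275],
                           [0.0, 0.0, 0.0, 0.0]])

# Mask with a trailing dimension, shape (3, 1); indexing with it directly fails
valid_idx = torch.tensor([[True], [True], [False]])

# Flatten the mask to shape (3,) so that it selects whole rows
flat_mask = valid_idx.reshape(-1)
kept = gt_boxes_i[flat_mask]

print(kept.shape)  # torch.Size([2, 4])
```

Keeping the class tensor one-dimensional in the collate function would probably remove the need for either workaround, but I have not verified that against the rest of the training code.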

My entire programme still does not work, and I will have to struggle with errors for a while...

EfficientDet Implementation for Object Detection

nekot0 — Mon, 03 Apr 2023 00:06:10 +0000

I have been interested in Machine Learning but left it untouched for years. I finally decided to start training myself so that I can gain insight into how data is used and become able to write the code by myself. I found an interesting image competition and started with it. The competition had already finished, but the data is still available and I can still submit my predictions and get a score.

The tasks are straightforward, including object detection and classification. I found that EfficientDet is a useful recent model that handles both of these tasks, and decided to develop a model with it. However, the implementation was extremely hard. Some errors required me to edit the imported packages, which I couldn't manage in a Kaggle notebook. Therefore, I restarted the implementation of EfficientDet with a simple dataset.

A useful example I found is a blog post written in Japanese in Oct 2021. The setting is to detect a red circle on a black square background. The source code in the post worked for the most part, but I met some errors when I tested it in April 2023. Below is a note of the errors and remedies, and the accuracy of the result.

Errors and remedies

view size is not compatible with input tensor's size and stride

We train the model with images and bounding boxes input like below:

# Training loop
for epoch in range(1, args.epoch+1):
  ...
  for (inputs, targets) in t:
    ...
    losses = bench(inputs, targets)
    ...

The error below occurred:

# Error message
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-7-9f3114f08672> in <cell line: 19>()
     33     targets['cls'] = targets['cls']
     34     optimizer.zero_grad()
---> 35     losses = bench(inputs, targets)
     36     loss = losses['loss']
     37     loss.backward()

/usr/local/lib/python3.9/dist-packages/effdet/anchors.py in batch_label_anchors(self, gt_boxes, gt_classes, filter_valid)
    396                     cls_targets[count:count + steps].view([feat_size[0], feat_size[1], -1]))
    397                 box_targets_out[level_idx].append(
--> 398                     box_targets[count:count + steps].view([feat_size[0], feat_size[1], -1]))
    399                 count += steps
    400                 if last_sample:

RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

The 'view' function is called to reshape the box targets when the model takes in the images and bounding boxes. The error message suggests using the 'reshape' function instead of 'view'. I edited 'anchors.py' in the effdet package as below:

> Line 398 Before correction
  #box_targets[count:count + steps].view([feat_size[0], feat_size[1], -1]))
> After correction
  box_targets[count:count + steps].reshape([feat_size[0], feat_size[1], -1]))
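The difference between the two functions can be reproduced on a small tensor. This is a generic sketch unrelated to effdet's data: 'view' needs the memory layout to be compatible with the new shape, while 'reshape' copies the data when it has to.

```python
import torch

x = torch.arange(12).reshape(3, 4)
xt = x.t()  # transposed: shape (4, 3), no longer contiguous in memory

view_failed = False
try:
    xt.view(2, 6)  # cannot be expressed as a view of the same memory
except RuntimeError as err:
    view_failed = True
    print("view failed:", err)

y = xt.reshape(2, 6)  # works; copies the data because a view is impossible
print(y)
```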

Labels vanish when the bounding box goes out of boundaries

The dataset applies augmentation before outputting the data. The first step is to crop the input image randomly, which sometimes deletes the bounding boxes and labels when the boxes fall outside the cropped area. The sample code handles this case, but it only defines a new bounding box and doesn't define a new label, which causes the error.

class CircleDataset(Dataset):
  ...

  def __getitem__(self, idx):
    ...

    if bboxes.shape[0] == 0:
      bboxes = torch.zeros([1, 4], dtype=bboxes.dtype)

    ...
    return x, y

  ...

# Error message
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-10-9f3114f08672> in <cell line: 19>()
     28   t = tqdm(loader, leave=False)
     29 
---> 30   for inputs, targets in t:
     31     inputs = inputs
     32     targets['bbox'] = targets['bbox']
/usr/local/lib/python3.9/dist-packages/torch/utils/data/_utils/collate.py in collate_tensor_fn(batch, collate_fn_map)
    161         storage = elem.storage()._new_shared(numel, device=elem.device)
    162         out = elem.new(storage).resize_(len(batch), *list(elem.size()))
--> 163     return torch.stack(batch, 0, out=out)
    164 
    165 
RuntimeError: stack expects each tensor to be equal size, but got [0] at entry 0 and [1] at entry 1

To avoid this, a new label needs to be defined together with the new bounding box when they vanish.

# After correction
if bboxes.shape[0] == 0:
    bboxes = torch.zeros([1, 4], dtype=bboxes.dtype)
    labels = torch.FloatTensor(np.array([0])) # Added

Accuracy

I obtained a prediction by taking one image from the training set and inputting it into the trained model.

Prediction uses DetBenchPredict from the effdet package. The original data size is (3, 512, 512), while DetBenchPredict takes a batch as its input. So, I added a batch dimension using the 'unsqueeze' function.

DetBenchPredict outputs (N, 6) tensor. N is the number of bounding boxes predicted, and the meaning of each of the six elements is:

x-coordinate of bounding box top left
y-coordinate of bounding box top left
x-coordinate of bounding box bottom right
y-coordinate of bounding box bottom right
confidence score of the detection (treated as a probability below)
predicted class label

The code is as below. Bounding boxes are drawn if the probability is over 50%.

import matplotlib.pyplot as pp
import torch
from effdet import DetBenchPredict
from matplotlib import patches

image, targets = dataset[0]
image = image.unsqueeze(0)   # (3, 512, 512) -> (1, 3, 512, 512)

bench = DetBenchPredict(model)
with torch.no_grad():
  output = bench(image)

# Draw the predictions with over 50% probability
fig, ax = pp.subplots()
# imshow expects (Height, Width, Channel), so move the channel axis to the end
ax.imshow(image[0].permute(1, 2, 0))

for i in range(output.shape[1]):
  if output[0, i, 4] > 0.5:
    x1 = int(output[0, i, 0])
    y1 = int(output[0, i, 1])
    width = int(output[0, i, 2] - output[0, i, 0])
    height = int(output[0, i, 3] - output[0, i, 1])
    rect = patches.Rectangle((x1, y1), width, height, edgecolor='r', facecolor='none')
    ax.add_patch(rect)
    print(output[0,i,:])

pp.show()
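The filtering step can also be separated from the drawing, which makes it easy to check without a model or matplotlib. The sketch below is plain Python; `to_rectangles` is a hypothetical helper, and the two rows are copied from the 1-epoch output printed in this post.

```python
def to_rectangles(detections, threshold=0.5):
    """Convert (x0, y0, x1, y1, score, label) rows to (x, y, width, height) tuples.

    Only rows whose score is above the threshold are kept.
    """
    rects = []
    for x0, y0, x1, y1, score, label in detections:
        if score > threshold:
            rects.append((int(x0), int(y0), int(x1 - x0), int(y1 - y0)))
    return rects


detections = [
    (14.0453, 114.7553, 26.5884, 158.7972, 0.6781, 1.0),
    (182.1268, 257.2897, 182.7585, 465.5251, 0.5004, 1.0),
]

# With a stricter threshold of 0.6, only the first row survives
rects = to_rectangles(detections, threshold=0.6)
print(rects)  # [(14, 114, 12, 44)]
```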

The accuracy after 1 epoch is like this:

(output[0, i, :])
tensor([ 14.0453, 114.7553,  26.5884, 158.7972,   0.6781,   1.0000])
tensor([144.7045, 129.4016, 182.4770, 259.8239,   0.6156,   1.0000])
tensor([ -0.6067, 162.9664,  68.7289, 175.3027,   0.5549,   1.0000])
tensor([ -4.6260,   7.1583, 156.3810, 120.1586,   0.5246,   1.0000])
tensor([ 29.6035,  88.9964,  99.8469, 168.4458,   0.5069,   1.0000])
tensor([182.1268, 257.2897, 182.7585, 465.5251,   0.5004,   1.0000])

The accuracy after 10 epochs is like this: