[ML] Data should be checked with your eyes

#machinelearning #python #tutorial

Following on from my previous effort, I'm still working through a finished Kaggle image competition and pushing the implementation forward. After getting past a series of recurring errors, I started training the model for one epoch. Training ran well at first, but about 50% of the way through the epoch, the following error occurred.

ValueError: Caught ValueError in DataLoader worker process 1.
Original Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 58, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 58, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/tmp/ipykernel_28/3482151344.py", line 46, in __getitem__
    labels = labels
  File "/opt/conda/lib/python3.7/site-packages/albumentations/core/composition.py", line 202, in __call__
    p.preprocess(data)
  File "/opt/conda/lib/python3.7/site-packages/albumentations/core/utils.py", line 83, in preprocess
    data[data_name] = self.check_and_convert(data[data_name], rows, cols, direction="to")
  File "/opt/conda/lib/python3.7/site-packages/albumentations/core/utils.py", line 91, in check_and_convert
    return self.convert_to_albumentations(data, rows, cols)
  File "/opt/conda/lib/python3.7/site-packages/albumentations/core/bbox_utils.py", line 124, in convert_to_albumentations
    return convert_bboxes_to_albumentations(data, self.params.format, rows, cols, check_validity=True)
  File "/opt/conda/lib/python3.7/site-packages/albumentations/core/bbox_utils.py", line 390, in convert_bboxes_to_albumentations
    return [convert_bbox_to_albumentations(bbox, source_format, rows, cols, check_validity) for bbox in bboxes]
  File "/opt/conda/lib/python3.7/site-packages/albumentations/core/bbox_utils.py", line 390, in <listcomp>
    return [convert_bbox_to_albumentations(bbox, source_format, rows, cols, check_validity) for bbox in bboxes]
  File "/opt/conda/lib/python3.7/site-packages/albumentations/core/bbox_utils.py", line 334, in convert_bbox_to_albumentations
    check_bbox(bbox)
  File "/opt/conda/lib/python3.7/site-packages/albumentations/core/bbox_utils.py", line 417, in check_bbox
    raise ValueError(f"Expected {name} for bbox {bbox} to be in the range [0.0, 1.0], got {value}.")
ValueError: Expected x_min for bbox (-0.0009398496240601503, 0.46129587155963303, 0.32471804511278196, 0.9730504587155964, array([1])) to be in the range [0.0, 1.0], got -0.0009398496240601503.

The good thing was that I had already run into and overcome a lot of errors, so this time I could guess which part of my code caused this one. According to the message, one element of the bounding box is outside the range [0.0, 1.0]. That range is probably related to scaling, which happens when input images are resized. In other words, my first guess was that a bounding box was being rescaled with an inappropriate ratio when the input image was resized.

However, the reported x_min value is negative. Why is it negative?
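As a rough illustration of where a value outside [0.0, 1.0] can come from: for pixel-based box formats, Albumentations converts coordinates to fractions of the image size by dividing by the width and height. The sketch below reproduces that arithmetic with a hypothetical `normalize_box` helper and made-up numbers; it is not the library's own code.

```python
def normalize_box(x_min, y_min, x_max, y_max, image_width, image_height):
    """Scale pixel corner coordinates to fractions of the image size."""
    return (
        x_min / image_width,
        y_min / image_height,
        x_max / image_width,
        y_max / image_height,
    )


# Hypothetical box that starts one pixel to the left of a 1000 x 800 image.
box = normalize_box(-1, 100, 300, 700, image_width=1000, image_height=800)
print(box)  # (-0.001, 0.125, 0.3, 0.875)

# Any value outside [0.0, 1.0] means the box extends past the image border.
out_of_range = [v for v in box if not 0.0 <= v <= 1.0]
print(out_of_range)  # [-0.001]
```

With this arithmetic, a normalized value can only be negative if the pixel coordinate itself is negative, which points at the annotation rather than at the resize ratio.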

I wanted to find out which sample caused the error, but the message didn't say. So I tried the code below.

dataset = CTDataset(train_root_path, train_image_list)
for i in range(len(train_image_list)):
    print(i)  # printed before loading, so the last index shown is the failing one
    image, target = dataset[i]  # same as dataset.__getitem__(i)

This only prints each index and fetches the corresponding sample from the dataset. Because the index is printed before the sample is fetched, the last index printed when the error occurs is the one that causes it.
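A possible variation on this loop, in case more than one sample is broken, is to catch the exception and keep going so that every failing index is collected in a single pass. The sketch below is only an illustration: `find_failing_indices` and `ToyDataset` are hypothetical names, and `ToyDataset` is a stand-in for `CTDataset` that raises the same kind of error for a negative x_min.

```python
def find_failing_indices(dataset):
    """Return (index, error message) pairs for samples that fail to load."""
    failures = []
    for i in range(len(dataset)):
        try:
            dataset[i]  # runs __getitem__, including any augmentation
        except ValueError as exc:
            failures.append((i, str(exc)))
    return failures


class ToyDataset:
    """Hypothetical stand-in for CTDataset: raises on a negative x_min."""

    def __init__(self, boxes):
        self.boxes = boxes

    def __len__(self):
        return len(self.boxes)

    def __getitem__(self, idx):
        x_min = self.boxes[idx][0]
        if x_min < 0:
            raise ValueError(f"x_min out of range: {x_min}")
        return self.boxes[idx]


toy = ToyDataset([(10, 20, 50, 60), (-1, 5, 40, 80), (0, 0, 30, 30)])
failures = find_failing_indices(toy)
print(failures)  # [(1, 'x_min out of range: -1')]
```

The trade-off is that a full pass over a large dataset takes time, so stopping at the first failure, as in the loop above, is often enough to start investigating.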

I ran the code.

And finally I found it.

Then I checked the data.

I see. This bounding box extends outside the image. The resizing pipeline I'm using (Albumentations) only accepts bounding boxes that lie within the image, and it raises an error if a box goes outside. In that case, we can avoid the error by replacing negative coordinates with 0.

x = max(0, int(round(bounding_box['x'])))
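The same idea can be extended to all four edges of the box. The sketch below assumes the annotation is a dict with `x`, `y`, `width`, and `height` keys in pixels (only `x` appears in my code above, so the other keys are an assumption) and uses a hypothetical `clip_box` helper that also drops boxes with nothing left inside the image.

```python
def clip_box(bounding_box, image_width, image_height):
    """Clip an x/y/width/height box to the image and return corner coordinates.

    Returns None when no part of the box remains inside the image.
    """
    x_min = max(0, int(round(bounding_box["x"])))
    y_min = max(0, int(round(bounding_box["y"])))
    x_max = min(image_width, int(round(bounding_box["x"] + bounding_box["width"])))
    y_max = min(image_height, int(round(bounding_box["y"] + bounding_box["height"])))
    if x_max <= x_min or y_max <= y_min:
        return None  # degenerate box: skip it instead of passing it on
    return x_min, y_min, x_max, y_max


# Box that sticks out one pixel to the left of a 100 x 80 image.
clipped = clip_box({"x": -1.0, "y": 10.0, "width": 50.0, "height": 40.0}, 100, 80)
print(clipped)  # (0, 10, 49, 50)

# Box that lies completely to the right of the image.
outside = clip_box({"x": 120.0, "y": 10.0, "width": 20.0, "height": 20.0}, 100, 80)
print(outside)  # None
```

Clipping the right and bottom edges as well guards against the mirror-image case, where a coordinate would normalize to a value above 1.0.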

After the correction, the programme is running well. It has now passed the 50% mark of epoch 1, so it should be fine.

Today's lesson is that we should check the original data with our own eyes before running a time-consuming programme. We should think about what values the original data might contain, and write code that either filters out data likely to cause errors or handles it in a separate branch. Data should be checked with our eyes.
