zkaria gamal

From Segmentation Masks to YOLO Labels: My Dataset Prep Pipeline

I just finished a small but useful pipeline for skin lesion dataset preparation and annotation validation.

š—§š—µš—² š—½š—æš—¼š—·š—²š—°š˜ š—µš—®š—»š—±š—¹š—²š˜€ š˜š˜„š—¼ š˜„š—¼š—æš—øš—³š—¹š—¼š˜„š˜€:
  • Converting binary segmentation masks into YOLO labels
  • Converting YOLO labels back into masks for validation and visualization
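As a rough illustration of the two directions, here is a minimal sketch rather than the code from the repo. It assumes the YOLO labels are bounding boxes (`class x_center y_center width height`, all normalized to 0..1) and that each mask contains a single lesion. The function names `mask_to_yolo_bbox` and `yolo_bbox_to_mask` are hypothetical, and plain nested lists stand in for the NumPy/OpenCV arrays a real pipeline would likely use.

```python
from typing import List, Optional

Mask = List[List[int]]  # rows of 0/1 pixels; a stand-in for an image array


def mask_to_yolo_bbox(mask: Mask, class_id: int) -> Optional[str]:
    """Return one YOLO box line for the mask, or None if the mask is empty."""
    height, width = len(mask), len(mask[0])
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(width) if any(row[c] for row in mask)]
    if not rows:
        return None
    # Pixel-edge coordinates: the box spans [x0, x1) by [y0, y1).
    x0, x1 = cols[0], cols[-1] + 1
    y0, y1 = rows[0], rows[-1] + 1
    xc = (x0 + x1) / 2 / width
    yc = (y0 + y1) / 2 / height
    w = (x1 - x0) / width
    h = (y1 - y0) / height
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"


def yolo_bbox_to_mask(label: str, width: int, height: int) -> Mask:
    """Rasterize one YOLO box line back into a filled-rectangle mask."""
    _, xc, yc, w, h = label.split()
    xc, yc, w, h = float(xc), float(yc), float(w), float(h)
    x0 = round((xc - w / 2) * width)
    x1 = round((xc + w / 2) * width)
    y0 = round((yc - h / 2) * height)
    y1 = round((yc + h / 2) * height)
    return [
        [1 if x0 <= c < x1 and y0 <= r < y1 else 0 for c in range(width)]
        for r in range(height)
    ]
```

Converting a mask to a label and back yields the lesion's bounding rectangle, so comparing that rectangle with the original mask is one cheap way to spot a label that does not match its image.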

It was built around ISIC-style skin lesion data with 7 classes:
AKIEC, BCC, BKL, DF, MEL, NV, and VASC.

š—Ŗš—µš—®š˜ š—œ š—¹š—²š—®š—æš—»š—²š—± š—³š—æš—¼š—ŗ š˜š—µš—¶š˜€ š—½š—æš—¼š—·š—²š—°š˜:
  • Clean annotation pipelines save a lot of debugging time
  • A quick visual validation step catches label issues early
  • Even simple format conversions can reveal bad labels or inconsistent data
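A sanity check on the label files themselves can complement the visual step. The sketch below is one possible version, not the project's actual validator: it assumes bounding-box style YOLO lines and class ids 0..6 assigned in the order the classes are listed above, which is an assumption. The helper name `check_yolo_line` is hypothetical.

```python
from typing import List

# Class-to-id order is an assumption; the real dataset config may differ.
CLASS_NAMES = ["AKIEC", "BCC", "BKL", "DF", "MEL", "NV", "VASC"]


def check_yolo_line(line: str, num_classes: int = len(CLASS_NAMES)) -> List[str]:
    """Return the problems found in one YOLO box line (empty list if none)."""
    parts = line.split()
    if len(parts) != 5:
        return [f"expected 5 fields, got {len(parts)}"]
    try:
        class_id = int(parts[0])
        xc, yc, w, h = (float(p) for p in parts[1:])
    except ValueError:
        return ["fields must be numeric, with an integer class id"]

    problems = []
    if not 0 <= class_id < num_classes:
        problems.append(f"class id {class_id} out of range")
    if w <= 0 or h <= 0:
        problems.append("box has zero or negative size")
    eps = 1e-6  # tolerance for rounding in the written label values
    if (xc - w / 2 < -eps or xc + w / 2 > 1 + eps
            or yc - h / 2 < -eps or yc + h / 2 > 1 + eps):
        problems.append("box extends outside the image")
    return problems
```

Running this over every line of every label file before training gives a quick list of files worth opening in the visual check.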

This project helped me better understand the full path from segmentation masks to training-ready YOLO annotations.

In the next phase, I plan to turn it into a reusable Python package with a cleaner structure, better error handling, and a more maintainable workflow, so it is easier to use and adapt for future datasets.

If you work with medical imaging or dataset preparation, I’d love to hear how you validate your labels before training.
project repo

#MachineLearning #ComputerVision #YOLO #DeepLearning #MedicalImaging #DataAnnotation #ISIC #Python #OpenCV
