Tushar Singh
Labellerr SDK & YOLOv8: Cars and Number Plate Detection — Practical, Step-by-Step

Project Overview

In this project, we will build a complete end-to-end pipeline for detecting cars and their number plates using a custom-trained object detection model. The workflow is structured and beginner-friendly: we will first label images in Labellerr, then export those annotations, convert them into the YOLOv8-compatible format, and finally train and evaluate the model to see real detection results.

What is Labellerr?

Labellerr is a smart data annotation platform designed specifically for computer vision workflows.
It helps teams create high-quality labeled datasets efficiently, without the usual hassle of managing annotation tasks manually.

Key advantages of using Labellerr:

  • Intuitive bounding box annotation UI — ideal for object detection tasks
  • Built-in quality review workflows — ensures consistency across annotations
  • Direct export support for YOLO formats — no manual formatting or conversion required

In short, Labellerr saves both time and effort while maintaining annotation quality, which is crucial for achieving good model performance.

For more details and examples, refer to the official documentation:
https://docs.labellerr.com/

What is YOLOv8?

YOLOv8 (by Ultralytics) is one of the most widely used state-of-the-art object detection models.
It is known for delivering high accuracy while remaining fast enough for real-time detection.

Why YOLOv8 works well here:

  • Fast inference speed even on modest hardware
  • Straightforward training workflow
  • Excellent accuracy for object localization

This makes it a strong choice for real-world applications such as vehicle surveillance, traffic monitoring, and automated license plate recognition systems.
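
As a quick illustration of that "straightforward training workflow", the entire train-and-predict loop fits in a few lines. This is a minimal sketch, not this project's final code (the file names are placeholders, and we generate the real dataset YAML in Step 15):

from ultralytics import YOLO

# Load a small pretrained checkpoint and fine-tune it on a custom dataset.
# "data.yaml" is a placeholder; the real file is generated in Step 15.
model = YOLO("yolov8n.pt")
model.train(data="data.yaml", epochs=50, imgsz=640)

# Run detection on one image; results carry boxes, classes, and confidences.
results = model.predict("car.jpg", conf=0.25)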

Why Use Labellerr and YOLOv8 Together?

Both tools complement each other perfectly:

| Tool | Role in the Pipeline | Benefit |
| --- | --- | --- |
| Labellerr | Label images clearly and consistently | High-quality dataset preparation |
| YOLOv8 | Train and evaluate the detection model | Strong real-world performance with minimal setup |

The workflow becomes:

Label → Export → Train → Detect,
without any messy intermediate steps or custom conversion scripts.

This results in a clean, reliable, and efficient pipeline from annotation to model deployment.


Project Setup

Step 1: Installing Dependencies

In this step, we will install all the required libraries, including the Labellerr SDK. (YOLOv8 itself ships with the ultralytics package, which we install later in Step 12.)

Install all dependencies in one go.

!python -m pip install --upgrade pip
!python -m pip install --quiet https://github.com/tensormatics/SDKPython/releases/download/prod/labellerr_sdk-1.0.0.tar.gz kaggle Pillow requests python-dotenv opencv-python numpy scikit-learn

Step 2: Setting Up Kaggle Credentials

Since we are downloading the dataset from Kaggle, we first need to configure our Kaggle API credentials inside Google Colab.
This allows us to download datasets directly using commands like kaggle datasets download.

Steps:

  • Go to your Kaggle account settings
  • Scroll to API section
  • Click on Create New API Token
  • A file named kaggle.json will be downloaded
  • We will programmatically place this file in the correct location

import os, json, shutil
from pathlib import Path
from getpass import getpass
from IPython.display import display, Markdown

# Get Kaggle credentials from user input
KAGGLE_USERNAME = input("Enter your Kaggle username: ")
KAGGLE_KEY = getpass("Enter your Kaggle API key: ")

# Create .kaggle directory in user's home folder
kaggle_dir = Path.home() / ".kaggle"
kaggle_dir.mkdir(exist_ok=True)

# Write credentials to kaggle.json file
with open(kaggle_dir / "kaggle.json", "w") as f:
    json.dump({"username": KAGGLE_USERNAME, "key": KAGGLE_KEY}, f)

# Set proper permissions (600 = read/write for owner only)
os.chmod(kaggle_dir / "kaggle.json", 0o600)

display(Markdown("Kaggle credentials configured at `~/.kaggle/kaggle.json`"))
display(Markdown("Credentials securely stored with proper file permissions"))

What This Code Does:

| Part / Line | Purpose |
| --- | --- |
| input() / getpass() | Securely capture username & API key (key not echoed) |
| ~/.kaggle/ directory | Required location for Kaggle CLI credentials |
| kaggle.json write | Stores {username, key} for authentication |
| os.chmod(..., 0o600) | Restricts file permissions to owner only |
| display(Markdown(...)) | Shows friendly confirmation messages |

Step 3: Downloading the Dataset from Kaggle

Now that our Kaggle credentials are configured, we can download the dataset directly into Google Colab using the Kaggle CLI.
This avoids manual uploads and keeps the workflow fast and clean.

Dataset Used:

andrewmvd/car-plate-detection

from pathlib import Path
from IPython.display import display, Markdown

# Download dataset via Kaggle CLI
DATASET = "andrewmvd/car-plate-detection"
DATA_DIR = Path("datasets/car")
DATA_DIR.mkdir(parents=True, exist_ok=True)

!kaggle datasets download -d {DATASET} -p {str(DATA_DIR)} --unzip

display(Markdown(f" Downloaded dataset `{DATASET}` to `{DATA_DIR}`"))


What This Code Does

| Line / Component | Purpose |
| --- | --- |
| DATASET | Specifies the dataset to download from Kaggle |
| Path("datasets/car") | Sets the directory where the dataset will be stored |
| mkdir(parents=True, exist_ok=True) | Creates the folder if it doesn’t already exist |
| kaggle datasets download | Fetches the dataset using the Kaggle CLI |
| --unzip | Automatically extracts the files after download |
| display(Markdown(...)) | Outputs a friendly completion message |

Step 4: Preparing Sample Images for Inspection

Before labeling and training, it's always a good idea to preview a few images from the dataset.
This helps us verify:

  • The dataset downloaded correctly
  • The images are relevant
  • Their quality is suitable for training

Here, we will randomly pick 10 sample images and store them in a separate folder for quick review.

from pathlib import Path
from IPython.display import display, Markdown
import random
import shutil

# Prepare 10 sample car images for quick visual inspection
SEARCH_DIRS = [
    DATA_DIR,
]
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".tiff"}

all_images = []
for base in SEARCH_DIRS:
    for p in base.rglob("*"):
        if p.is_file() and p.suffix.lower() in IMAGE_EXTS:
            all_images.append(p)

random.shuffle(all_images)

SAMPLE_DIR = Path("sample_images")
SAMPLE_DIR.mkdir(exist_ok=True)

selected = all_images[:10]
for i, src in enumerate(selected, start=1):
    dst = SAMPLE_DIR / f"car_{i:02d}{src.suffix.lower()}"
    shutil.copy2(src, dst)

display(Markdown(f" Prepared {len(selected)} images in `{SAMPLE_DIR}`"))

What This Code Does

| Part / Line | Purpose |
| --- | --- |
| DATA_DIR | Directory containing the downloaded dataset |
| IMAGE_EXTS | Defines valid image file extensions |
| rglob("*") | Recursively searches all folders for images |
| random.shuffle() | Randomizes image selection |
| SAMPLE_DIR.mkdir() | Creates a folder named sample_images |
| copy2() | Copies selected sample images to that folder |
| display(Markdown(...)) | Prints a neat confirmation message |

Step 5: Connecting to Labellerr (Authentication)

Now we'll connect to the Labellerr platform. This cell will prompt you for your Labellerr credentials to initialize the API client.

  • Client ID: Your workspace-specific ID.
  • Email: The email associated with your Labellerr account.
  • API Key & Secret: Found in your Labellerr account settings. These will be entered securely with hidden input.

**How to Get Your Client ID**
The method depends on your Labellerr plan:

  • If you're using a Pro or Enterprise plan, you can simply contact Labellerr Support and they will share your Client ID.
  • If you're on the Free plan, you can request your Client ID by sending a short message to: support@tensormatics.com (mention the email you used to sign up on Labellerr — that’s it.)

from getpass import getpass
from labellerr.client import LabellerrClient
from labellerr.exceptions import LabellerrError
from IPython.display import display, Markdown

# --- Interactive Input for Labellerr Credentials ---
print("Please enter your Labellerr API credentials.")
LABELLERR_CLIENT_ID = input("Labellerr Client ID: ")
LABELLERR_EMAIL = input("Labellerr Email: ")
LABELLERR_API_KEY = getpass("Labellerr API Key (input will be hidden): ")
LABELLERR_API_SECRET = getpass("Labellerr API Secret (input will be hidden): ")

# --- Initialize Labellerr Client ---
try:
    if not all([LABELLERR_API_KEY, LABELLERR_API_SECRET, LABELLERR_CLIENT_ID, LABELLERR_EMAIL]):
        raise ValueError("One or more required fields were left empty.")

    client = LabellerrClient(LABELLERR_API_KEY, LABELLERR_API_SECRET)

    display(Markdown(" Labellerr client initialized successfully!"))

except (LabellerrError, ValueError) as e:
    display(Markdown(f" **Client Initialization Failed:** {e}"))
    client = None


What This Code Does

| Line / Component | Purpose |
| --- | --- |
| getpass() | Safely hides API key input while typing |
| LabellerrClient() | Creates an authenticated Labellerr session |
| ValueError check | Ensures no fields were left empty |
| try / except | Catches invalid credentials and prevents crashes |
| display(Markdown(...)) | Prints clean success/failure messages |

Step 6: Creating Annotation Schema in Labellerr

Before labeling, we need to define what objects we want to annotate (Car & License Plate).
This step tells Labellerr which classes to display in the labeling UI.

ANNOTATION_QUESTIONS = [
    {
        "question_number": 1,
        "question": "Car",
        "question_id": "car-bbox-001",
        "option_type": "BoundingBox",
        "required": False,
        "options": [
            {"option_id": "opt-001", "option_name": "#FF0000"}
        ],
        "question_metadata": []
    },
    {
        "question_number": 2,
        "question": "License Plate",
        "question_id": "plate-bbox-002",
        "option_type": "BoundingBox",
        "required": False,
        "options": [
            {"option_id": "opt-002", "option_name": "#00FF00"}
        ],
        "question_metadata": []
    }
]




PROJECT_NAME = "Vehicle Object Detection new"
DATASET_NAME = "vehicle_dataset_sample_new"
DATASET_DESCRIPTION = "10 vehicle images including cars, plates"
DATA_TYPE = "image"

What This Code Defines

| Field / Section | Meaning / Purpose |
| --- | --- |
| ANNOTATION_QUESTIONS | List of labeling tasks the annotator will perform |
| question | Name shown in the labeling UI (Car, License Plate) |
| option_type = "BoundingBox" | Means the user will draw bounding boxes |
| option_name = "#FF0000" / "#00FF00" | Box color to visually differentiate classes |
| PROJECT_NAME | Name of the project inside Labellerr |
| DATASET_NAME | Dataset title where labeled images will be stored |
| DATASET_DESCRIPTION | Short description for dataset organization |
| DATA_TYPE = "image" | Specifies the dataset type — important for correct processing |

Step 7: Create Annotation Template in Labellerr

We’ll create an annotation guideline template (schema) in Labellerr using our questions (Car, License Plate).
Different SDK versions may return either a UUID string or a JSON dict, so we’ll handle both safely.

import json
from IPython.display import display, Markdown
from labellerr.exceptions import LabellerrError

template_id = None
try:
    res_str = client.create_annotation_guideline(
        client_id=LABELLERR_CLIENT_ID,
        questions=ANNOTATION_QUESTIONS,  
        template_name=f"{PROJECT_NAME} Template",
        data_type=DATA_TYPE,
    )

    #  Handle UUID-only or dict response
    if isinstance(res_str, str) and len(res_str) == 36 and res_str.count('-') == 4:
        template_id = res_str
        display(Markdown(f"Template created: `{template_id}`"))
    else:
        try:
            res = json.loads(res_str) if isinstance(res_str, str) else res_str
            template_id = res.get("response", {}).get("template_id")
            if template_id:
                display(Markdown(f"Template created: `{template_id}`"))
            else:
                display(Markdown(" Could not find template_id in response"))
                display(Markdown(f"Raw response: `{res_str}`"))
        except json.JSONDecodeError as e:
            display(Markdown(f" Response parsing issue: `{e}`"))
            display(Markdown(f"Raw response: `{res_str}`"))
            if isinstance(res_str, str) and len(res_str) > 20:
                template_id = res_str
                display(Markdown(f"Using response as template_id: `{template_id}`"))


except LabellerrError as e:
    display(Markdown(f"Template creation failed: `{e}`"))



What This Code Does

| Line / Component | Purpose |
| --- | --- |
| create_annotation_guideline(...) | Creates a labeling template in Labellerr |
| questions=ANNOTATION_QUESTIONS | Passes your schema (Car, License Plate) |
| template_name | Human-readable name for the template |
| data_type=DATA_TYPE | Ensures correct modality (e.g., "image") |
| UUID check (len==36 & 4 dashes) | Detects raw UUID responses from older SDKs |
| json.loads(...) | Parses JSON responses from newer SDKs |
| res["response"]["template_id"] | Extracts the template ID when returned inside a dict |
| except LabellerrError | Catches SDK errors and prints a clean message |
| Fallback to template_id = res_str | Uses the response as the template ID if no JSON is found |

Expected Success Output

Template created: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx


Step 8: Creating the Project in Labellerr & Uploading Images

Once our annotation template is ready, the next step is to create a project in Labellerr and link our dataset.
We will also upload the sample images folder so annotators can begin labeling.

import json
from IPython.display import display, Markdown

project_id = None
if template_id:
    try:
        print("Creating project and linking dataset...")

        payload = {
            "client_id": LABELLERR_CLIENT_ID,
            "dataset_name": DATASET_NAME,
            "dataset_description": DATASET_DESCRIPTION,
            "data_type": DATA_TYPE,
            "created_by": LABELLERR_EMAIL,
            "project_name": PROJECT_NAME,
            "annotation_template_id": template_id,
            "rotation_config": {
                "annotation_rotation_count": 1,
                "review_rotation_count": 1,
                "client_review_rotation_count": 1
            },
            "autolabel": False,
            "folder_to_upload": "/content/sample_images"
        }

        # Pass as a single payload dictionary
        res = client.initiate_create_project(payload)

        # Handle response
        if isinstance(res, str) and len(res) == 36 and res.count('-') == 4:
            project_id = res
            display(Markdown(f"Project created: `{project_id}`"))
        else:
            try:
                res_obj = json.loads(res) if isinstance(res, str) else res
                project_id = res_obj.get("response", {}).get("project_id")
                if project_id:
                    display(Markdown(f" Project created: `{project_id}`"))
                else:
                    display(Markdown(" Could not find project_id in response"))
                    display(Markdown(f"Raw response: `{res}`"))
            except Exception as e:
                display(Markdown(f" Response parsing issue: `{e}`"))
                display(Markdown(f"Raw response: `{res}`"))

    except LabellerrError as e:
        display(Markdown(f" Project creation failed: `{e}`"))


What This Code Does

| Line / Component | Purpose |
| --- | --- |
| payload = {...} | Contains all project settings and metadata |
| dataset_name / project_name | Organized names in the Labellerr dashboard |
| annotation_template_id | Attaches the schema we made earlier |
| folder_to_upload | Directory containing images to upload to Labellerr |
| initiate_create_project(payload) | Creates the project and uploads the dataset |
| UUID response check | Handles cases where the API returns a raw UUID |
| JSON parse fallback | Handles cases where the API returns a JSON object |
| except LabellerrError | Catches SDK errors cleanly |

Step 9: Linking an Existing Dataset to the Project in Labellerr

If your dataset is already uploaded to Labellerr, you do not need to upload images again.
Instead, you simply need to copy its Dataset ID from the Labellerr dashboard and pass it into the project creation function.

How to Get Dataset ID:

  • Open Labellerr Dashboard
  • Go to Datasets section
  • Select your dataset
  • Copy the Dataset ID shown in the details panel

This Dataset ID is what we will use in the code.

Code to Link Dataset With Project

rotation_config = {
    "annotation_rotation_count": 1,
    "review_rotation_count": 1,
    "client_review_rotation_count": 1
}

print("Creating project and linking existing dataset...")

res = client.create_project(
    project_name=PROJECT_NAME,
    data_type=DATA_TYPE,
    client_id=LABELLERR_CLIENT_ID,
    dataset_id="b7954c7f-a071-4eb7-b4e0-1980b3505e2b",   # ✅ Paste your Dataset ID here
    annotation_template_id=template_id,
    rotation_config=rotation_config
)

print("Project created successfully:")
print(res)


Explanation

| Field / Parameter | Meaning |
| --- | --- |
| dataset_id | The dataset you selected in Labellerr, copied from the dashboard |
| annotation_template_id | The annotation schema we created earlier (Car + License Plate) |
| rotation_config | How many people annotate → review → client-verify |
| create_project() | Creates a new project and links your dataset + labeling workflow |

Step 10: Labeling Workflow in Labellerr (Step-by-Step)

Once your dataset is linked to the project, you can start labeling right inside Labellerr.
Follow these clear steps to go from Label → Review → Accept.

Steps:

1. Go to Projects

Open Labellerr Dashboard → Projects tab.

2. Open Your Project

Select the project you created/linked (e.g., Vehicle Object Detection).

3. Go to the Label Section

Inside the project, click Label to open the annotation interface.

4. Start Labeling

- Choose the right tool (e.g., Bounding Box).
- Draw boxes around each Car and License Plate as per your schema.
- Assign the correct class from the sidebar.

5. Save / Submit

Save the annotation for each image (the exact save/submit button depends on your workspace UI).

6. Move to Review

Go to Review tab. The images you labeled will show up for verification.

7. Accept or Send Back

- If the annotation looks good → Accept
- If changes needed → Send back to labeling.


Step 11: Exporting Labeled Dataset from Labellerr

After labeling + review, we export annotations from Labellerr.
This script automates 3 steps: Create export → Poll status → Download & Validate.

# --- Car Dataset Export Script ---
from IPython.display import display, Markdown
import requests
import json
import time
from pathlib import Path
import logging
import traceback
import uuid
from labellerr.exceptions import LabellerrError

project_id = "magdaia_joyous_peafowl_21008"

# CONFIGURE LOGGER
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    datefmt='%Y-%m-%d %H:%M:%S',
    force=True
)
logger = logging.getLogger(__name__)

# --- GLOBAL VARS ---
downloaded_annotations = None

# Step 1: Create Export
def create_export(client, project_id, client_id, export_config):
    """Initiates an export job on the Labellerr platform."""
    logger.info("Step 1: Creating export for car dataset...")
    try:
        res = client.create_local_export(project_id, client_id, export_config)
        export_id = res["response"]["report_id"]
        logger.info(f"Export created successfully. Export ID: {export_id}")
        return export_id
    except LabellerrError as e:
        logger.error(f"Export creation failed: `{e}`")
        return None

# Step 2: Poll Export Status
def poll_export_status(client, api_key, api_secret, project_id, export_id, client_id, max_wait_time=300, wait_interval=10):
    """Polls Labellerr API for export completion status."""
    logger.info(f"Step 2: Polling for export completion (max {max_wait_time}s)...")
    elapsed_time = 0
    while elapsed_time < max_wait_time:
        raw_status = client.check_export_status(
            api_key=api_key,
            api_secret=api_secret,
            project_id=project_id,
            report_ids=[export_id],
            client_id=client_id
        )

        status_obj = None
        if isinstance(raw_status, dict):
            status_obj = raw_status
        elif isinstance(raw_status, str):
            try:
                status_obj = json.loads(raw_status)
            except json.JSONDecodeError:
                logger.warning(f"Could not parse status string: '{raw_status}'")

        if status_obj and status_obj.get('status') and len(status_obj['status']) > 0:
            export_status = status_obj['status'][0]
            is_completed = export_status.get('is_completed', False)
            export_status_text = export_status.get('export_status', 'Unknown')
            logger.info(f"Current status: '{export_status_text}' (Completed: {is_completed})")
            if is_completed:
                logger.info("Export completed! Proceeding to download.")
                return True
            elif export_status_text.lower() == "failed":
                logger.error("Export failed!")
                return False

        time.sleep(wait_interval)
        elapsed_time += wait_interval

    logger.warning(f"Export timeout after {max_wait_time}s.")
    return False

# Step 3: Download & Validate Export
def download_and_validate_export(client, api_key, api_secret, project_id, export_id, client_id):
    """Fetches download URL, downloads, and validates the exported data."""
    logger.info("Step 3: Fetching download URL and validating data...")
    try:
        download_uuid = str(uuid.uuid4())
        raw_download_result = client.fetch_download_url(
            api_key=api_key,
            api_secret=api_secret,
            project_id=project_id,
            uuid=download_uuid,
            export_id=export_id,
            client_id=client_id
        )

        download_obj = None
        if isinstance(raw_download_result, dict):
            download_obj = raw_download_result
        elif isinstance(raw_download_result, str):
            try:
                download_obj = json.loads(raw_download_result)
            except json.JSONDecodeError:
                logger.warning(f"Could not parse download URL string: '{raw_download_result}'")

        # Guard against a failed parse (download_obj may be None)
        download_url = None
        if download_obj:
            download_url = (
                download_obj.get('url')
                or download_obj.get('response', {}).get('download_url')
            )

        if download_url:
            logger.info("Download URL fetched successfully.")
            logger.info(f"DOWNLOAD LINK (expires in ~1 hour): {download_url}")

            exports_dir = Path("exports")
            exports_dir.mkdir(exist_ok=True)
            export_file = exports_dir / f"car_dataset_export_{export_id}.json"

            response = requests.get(download_url)
            if response.status_code == 200:
                with open(export_file, 'wb') as f:
                    f.write(response.content)
                logger.info(f"Export downloaded to {export_file}")

                # Validate JSON structure
                try:
                    json_data = json.loads(response.content)
                    annotated_count = sum(
                        1 for ann in json_data
                        if ann.get("latest_answer") and len(ann["latest_answer"]) > 0
                    )

                    logger.info(f"VALIDATION: {annotated_count}/{len(json_data)} images have annotations.")
                    if annotated_count == 0:
                        logger.error(" No annotations found! Please ensure images are labeled.")
                        return False
                    else:
                        logger.info("Annotation data validated successfully!")
                        return True

                except json.JSONDecodeError:
                    logger.error(" Invalid JSON format in downloaded export.")
                    return False
            else:
                logger.error(f"Failed to download export file. HTTP {response.status_code}")
                return False

        else:
            logger.error(f"No download URL found. Raw response: {raw_download_result}")
            return False

    except Exception as e:
        logger.error(f" Error during download: {e}\n{traceback.format_exc()}")
        return False


#  MAIN EXECUTION
if project_id:
    try:
        export_config = {
            "export_name": "CarDatasetExport",
            "export_description": "Export of all annotated car images",
            "export_format": "json",
            "statuses": ['review', 'r_assigned', 'client_review', 'cr_assigned', 'accepted'],
            "export_destination": "local",
            "question_ids": ["all"]
        }

        export_id = create_export(client, project_id, LABELLERR_CLIENT_ID, export_config)

        if export_id:
            if poll_export_status(client, LABELLERR_API_KEY, LABELLERR_API_SECRET, project_id, export_id, LABELLERR_CLIENT_ID):
                download_and_validate_export(client, LABELLERR_API_KEY, LABELLERR_API_SECRET, project_id, export_id, LABELLERR_CLIENT_ID)

    except Exception as e:
        logger.error(f" Unexpected error in main block: {e}\n{traceback.format_exc()}")


What Each Part Does:

| Block / Line | Purpose |
| --- | --- |
| logging.basicConfig(...) | Shows clean, timestamped logs in the console |
| create_local_export(...) | Starts the export job on Labellerr (returns report_id) |
| poll_export_status(...) | Re-checks export status until completed/failed |
| fetch_download_url(...) | Retrieves a temporary download link for the export file |
| requests.get(download_url) | Downloads the export into exports/*.json |
| Annotation validation | Confirms labeled data actually exists |
| statuses=[...] | Controls which workflow stages to include in the export |
| question_ids=["all"] | Ensures all annotation question categories are exported |
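
Before converting anything, it can help to peek at the exported JSON and confirm its structure matches what Step 13 expects. A small sketch (the glob pattern assumes the exports/ folder created above):

import json
from pathlib import Path

# Grab the export file downloaded in Step 11 (its name contains the export ID)
export_file = next(Path("exports").glob("car_dataset_export_*.json"))

with open(export_file) as f:
    records = json.load(f)

# Each record describes one image and its annotations
first = records[0]
print("Top-level keys:", list(first.keys()))
print("Annotations:", first.get("latest_answer"))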

Step 12: Installing YOLOv8 (Ultralytics)

Before we start training our object detection model, we need to install Ultralytics, which contains YOLOv8.

!python -m pip install ultralytics

Step 13: Convert Labellerr Export (JSON) → YOLOv8 Dataset + Train/Val Split

Labellerr exports annotations in JSON format, but YOLOv8 requires images and labels in a specific directory structure along with .txt annotation files in normalized YOLO format (class x_center y_center width height).

In this step, we will:

  • Load the exported JSON file
  • Split the dataset into train and val sets
  • Convert bounding boxes into YOLO normalized coordinates
  • Copy images to the correct folders
  • Create annotation .txt files

import json, shutil, random
from pathlib import Path
from sklearn.model_selection import train_test_split

# --- Paths ---
EXPORT_FILE = Path("/content/exports/car_dataset_export_NfKqUBdOqPEz2HPfnSw4.json")
IMAGE_SOURCE_DIR = Path("/content/sample_images")  # your 10 images here
YOLO_DATA_DIR = Path("/content/yolo_Car_dataset")

# --- Classes ---
CLASS_NAMES = ["Car", "License Plate"]
CLASS_MAP = {name: i for i, name in enumerate(CLASS_NAMES)}

# --- Reset dataset folder ---
if YOLO_DATA_DIR.exists():
    shutil.rmtree(YOLO_DATA_DIR)

for split in ["train", "val"]:
    (YOLO_DATA_DIR / "images" / split).mkdir(parents=True, exist_ok=True)
    (YOLO_DATA_DIR / "labels" / split).mkdir(parents=True, exist_ok=True)

# --- Load JSON ---
with open(EXPORT_FILE, "r") as f:
    data = json.load(f)

print(f"Loaded {len(data)} items")

# --- Split Train/Val ---
train_data, val_data = train_test_split(data, test_size=0.2, random_state=42)

def convert_and_save(items, split):
    for item in items:
        file_name = item["file_name"]
        width = item["file_metadata"]["image_width"]
        height = item["file_metadata"]["image_height"]

        # Copy image
        src = IMAGE_SOURCE_DIR / file_name
        dst = YOLO_DATA_DIR / "images" / split / file_name
        if src.exists():
            shutil.copy(src, dst)
        else:
            print(f" Missing image: {src}")
            continue

        # Prepare label file
        label_path = YOLO_DATA_DIR / "labels" / split / f"{Path(file_name).stem}.txt"
        lines = []

        for ans_group in item.get("latest_answer", []):
            for ann in ans_group.get("answer", []):
                label = ann.get("label")
                if label not in CLASS_MAP:
                    continue
                cls_id = CLASS_MAP[label]
                bbox = ann["answer"]

                # YOLO normalized format
                x_center = ((bbox["xmin"] + bbox["xmax"]) / 2) / width
                y_center = ((bbox["ymin"] + bbox["ymax"]) / 2) / height
                w = (bbox["xmax"] - bbox["xmin"]) / width
                h = (bbox["ymax"] - bbox["ymin"]) / height

                lines.append(f"{cls_id} {x_center:.6f} {y_center:.6f} {w:.6f} {h:.6f}")

        with open(label_path, "w") as f:
            f.write("\n".join(lines))

# --- Run conversion ---
convert_and_save(train_data, "train")
convert_and_save(val_data, "val")

print("\n YOLO dataset structure created successfully at:", YOLO_DATA_DIR)
print("Classes:", CLASS_MAP)


What This Script Does

| Step | Action | Explanation |
| --- | --- | --- |
| 1 | Load JSON export | Reads annotations and metadata from Labellerr |
| 2 | Define class mapping | Assigns each object category a numerical class ID |
| 3 | Create /images/ and /labels/ folders | Matches the expected YOLO dataset layout |
| 4 | Split data into train & val | Ensures a proper supervised learning workflow |
| 5 | Copy images to destination folders | Prepares the dataset structure for YOLO |
| 6 | Convert bounding boxes → YOLO format | Normalizes coordinates between 0 and 1 |
| 7 | Write .txt annotation files | YOLO uses one annotation text file per image |
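
As a worked example of step 6, take a hypothetical 640×480 image with a Car box (class 0) drawn from pixel (100, 120) to (500, 400):

# Hypothetical pixel coordinates and image size, for illustration only
xmin, ymin, xmax, ymax = 100, 120, 500, 400
width, height = 640, 480

x_center = ((xmin + xmax) / 2) / width    # 300 / 640 = 0.468750
y_center = ((ymin + ymax) / 2) / height   # 260 / 480 = 0.541667
w = (xmax - xmin) / width                 # 400 / 640 = 0.625000
h = (ymax - ymin) / height                # 280 / 480 = 0.583333

print(f"0 {x_center:.6f} {y_center:.6f} {w:.6f} {h:.6f}")
# -> 0 0.468750 0.541667 0.625000 0.583333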

Resulting Dataset Structure

yolo_Car_dataset/
 ├─ images/
 │   ├─ train/
 │   └─ val/
 └─ labels/
     ├─ train/
     └─ val/
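
To confirm the split worked, a quick sanity check (a small sketch, assuming the structure above) counts images and label files per split:

from pathlib import Path

YOLO_DATA_DIR = Path("/content/yolo_Car_dataset")

# Every image in images/<split>/ should have a matching .txt in labels/<split>/
for split in ["train", "val"]:
    images = list((YOLO_DATA_DIR / "images" / split).glob("*"))
    labels = list((YOLO_DATA_DIR / "labels" / split).glob("*.txt"))
    print(f"{split}: {len(images)} images, {len(labels)} label files")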

Step 14: Convert Labellerr Annotations → YOLOv8 Labels (Function)

This helper converts Labellerr’s JSON annotations into YOLO normalized TXT labels and copies images into the correct split folders.

def convert_labellerr_to_yolo(data, split):
    """Converts Labellerr bounding box annotations to YOLO format."""
    for ann in data:
        image_name = ann.get("file_name")
        image_width = ann.get("file_metadata", {}).get("image_width", 1)
        image_height = ann.get("file_metadata", {}).get("image_height", 1)

        # Copy image
        source_image_path = IMAGE_SOURCE_DIR / image_name
        dest_image_path = YOLO_DATA_DIR / "images" / split / image_name
        if source_image_path.exists():
            shutil.copy2(source_image_path, dest_image_path)
        else:
            print(f"Warning: Source image not found: {source_image_path}")
            continue

        # Label file path
        label_path = YOLO_DATA_DIR / "labels" / split / f"{Path(image_name).stem}.txt"

        with open(label_path, "w") as f:
            for qa in ann.get("latest_answer", []):  # iterate over each question (Car, License Plate, etc.)
                for ans in qa.get("answer", []):     # iterate over each bounding box inside that question
                    class_name = ans.get("label")
                    if class_name not in CLASS_MAP:
                        continue

                    class_id = CLASS_MAP[class_name]
                    box = ans.get("answer", {})

                    if not all(k in box for k in ["xmin", "ymin", "xmax", "ymax"]):
                        continue

                    xmin, ymin, xmax, ymax = (
                        float(box["xmin"]),
                        float(box["ymin"]),
                        float(box["xmax"]),
                        float(box["ymax"]),
                    )

                    # Convert to YOLO normalized format
                    x_center = ((xmin + xmax) / 2) / image_width
                    y_center = ((ymin + ymax) / 2) / image_height
                    width = (xmax - xmin) / image_width
                    height = (ymax - ymin) / image_height

                    f.write(f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}\n")

        # Check for empty file
        if label_path.stat().st_size == 0:
            print(f"Warning: Empty label file -> {label_path.name}")


What This Function Does

| Piece | Purpose |
| --- | --- |
| data | List of Labellerr JSON items (one per image) |
| split | Either "train" or "val"; controls which folder images & labels go into |
| Copy image → images/split/ | Places each image inside the YOLO dataset structure |
| Create TXT → labels/split/ | Creates YOLO annotation files matching each image |
| CLASS_MAP mapping | Converts class names like "Car" into numeric YOLO class IDs |
| Normalize bbox | Converts bounding boxes to YOLO normalized format (x_center y_center width height) |
| Empty label warning | Alerts when an image has no valid annotations (good for debugging) |
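
To apply the helper, call it once per split, reusing the train_data and val_data lists produced by the train/val split in Step 13:

# Reuses train_data / val_data from the train_test_split in Step 13
convert_labellerr_to_yolo(train_data, "train")
convert_labellerr_to_yolo(val_data, "val")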

Step 15: Creating dataset.yaml for YOLOv8

YOLOv8 needs a dataset.yaml file that tells it where the images & labels are located and which classes the model will learn.
The script below automatically generates it based on your dataset folder and class mapping.

import yaml
from pathlib import Path
from IPython.display import display, Markdown

# Define the dataset configuration
yaml_content = {
    'path': str(YOLO_DATA_DIR.resolve()), # The absolute path to the dataset directory
    'train': 'images/train',             # Path to training images (relative to 'path')
    'val': 'images/val',                 # Path to validation images (relative to 'path')
    'names': {v: k for k, v in CLASS_MAP.items()} # Class names map (e.g., {0: 'Car', 1: 'License Plate'})
}

# Write the configuration to a file
yaml_file = YOLO_DATA_DIR / "dataset.yaml"
with open(yaml_file, 'w') as f:
    yaml.dump(yaml_content, f, default_flow_style=False, sort_keys=False)

print(f"Created dataset configuration at '{yaml_file}'")
print("\n--- dataset.yaml content ---")
print(yaml.dump(yaml_content, sort_keys=False))

Understanding the dataset.yaml Fields

| Field | Meaning |
| --- | --- |
| path | Root directory of the YOLO dataset |
| train | Folder containing training images (relative to path) |
| val | Folder containing validation images (relative to path) |
| names | Class index → class name dictionary used by YOLO during training |

Example dataset.yaml (Generated Output)

Created dataset configuration at '/content/yolo_Car_dataset/dataset.yaml'
--- dataset.yaml content ---
path: /content/yolo_Car_dataset
train: images/train
val: images/val
names:
  0: Car
  1: License Plate


Step 16: Training YOLOv8 on Our Dataset

Now that our dataset is structured and the dataset.yaml file is ready, we can train a YOLOv8 model.
Here, we use YOLOv8m (medium variant) for better accuracy.

from ultralytics import YOLO
from pathlib import Path

# Load medium model for better accuracy
model = YOLO('yolov8m.pt')

results = model.train(
    data=str(yaml_file),   # Path to your dataset YAML
    epochs=100,            # Increase training epochs for better learning
    imgsz=640,
    batch=4,
    freeze=0,
    project='car_license_training',
    name='yolov8m_finetuned'
)

print("\n YOLOv8m Object Detection Training Complete!")

What This Code Does

| Parameter | Description |
| --- | --- |
| YOLO('yolov8m.pt') | Loads the YOLOv8 Medium model (balanced accuracy & speed) |
| data=str(yaml_file) | Uses the dataset YAML we generated earlier |
| epochs=100 | Number of training cycles (higher = better accuracy) |
| imgsz=640 | Input image resolution (default recommended for YOLOv8) |
| batch=4 | Number of images processed per training step |
| freeze=0 | Allows full fine-tuning of the entire model |
| project= | Folder where training logs & weights are stored |
| name= | Name of the trained model checkpoint subfolder |

**Tip**

To get higher accuracy, you can later increase the number of epochs, for example epochs=150 or epochs=200, as shown in the sketch below.
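
A longer run might look like this sketch (the values are illustrative; patience enables Ultralytics' built-in early stopping so the extra epochs don't overfit a small dataset):

results = model.train(
    data=str(yaml_file),
    epochs=200,        # longer schedule for potentially better accuracy
    imgsz=640,
    batch=4,
    patience=50,       # stop early if validation metrics stop improving
    project='car_license_training',
    name='yolov8m_finetuned_long'
)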


Step 17: Testing the Trained YOLOv8 Model on Validation Images

Now that training is complete, let’s load the best weights and run inference on a few validation images.
We’ll draw colored boxes: green for license plates, yellow for cars.

from ultralytics import YOLO
from pathlib import Path
from IPython.display import display, Markdown
from PIL import Image
import cv2
import numpy as np

# Load best weights
best_weights = Path("/content/car_license_training/yolov8m_finetuned/weights/best.pt")
if best_weights.exists():
    infer_model = YOLO(str(best_weights))
    display(Markdown(f"Using best weights: `{best_weights}`"))
else:
    raise FileNotFoundError(" best.pt not found!")

# Load validation images
val_dir = Path("/content/yolo_Car_dataset/images/val")
test_images = [p for p in val_dir.glob("*") if p.suffix.lower() in {".jpg", ".jpeg", ".png"}][:6]
if not test_images:
    raise FileNotFoundError("No validation images found.")

# Inference
display(Markdown("###  Running inference with lower confidence (0.10)..."))

for img_path in test_images:
    preds = infer_model.predict(source=str(img_path), imgsz=640, conf=0.10, save=False, verbose=False)

    result = preds[0]

    img = cv2.imread(str(img_path))
    if img is None:
        print(f" Could not read image: {img_path}")
        continue

    for box in result.boxes:
        x1, y1, x2, y2 = map(int, box.xyxy[0])
        cls = int(box.cls[0])
        conf = float(box.conf[0])
        class_name = infer_model.names[cls]
        label = f"{class_name} {conf:.2f}"

        #  Color code
        if class_name.lower() in ["license plate", "plate"]:
            color = (0, 255, 0)   # Green for license plate
        else:
            color = (0, 255, 255) # Yellow for car

        cv2.rectangle(img, (x1, y1), (x2, y2), color, 2)
        cv2.putText(img, label, (x1, y1 - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.6, color, 2)

    img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    display(Image.fromarray(img_rgb))

display(Markdown("Inference complete — check if license plates appear now!"))

What This Inference Code Does

| Part | Purpose |
| --- | --- |
| Load best.pt | Uses the best checkpoint from training for inference |
| Collect validation images | Selects sample images to test predictions |
| infer_model.predict(...) | Runs YOLOv8 inference on each image |
| Draw bounding boxes + confidence scores | Visualizes detection results clearly |
| Color coding | Green = License Plate, Yellow = Car |
| Display results inline | Shows final annotated images directly in the notebook output |

Expected Output

Using best weights: /content/car_license_training/yolov8m_finetuned/weights/best.pt
Running inference with lower confidence (0.10)...


Google Colab Notebook

Run the full workflow, experiment with the dataset, or retrain the model:

https://colab.research.google.com/drive/1KmontQoTchCJ9oqqGeBaZ0mxCD4eVZTU#scrollTo=2SdNLRuRVLDt


Thanks for reading! If you found this helpful, feel free to connect or leave feedback.
