Project Overview
In this project, we will build a complete end-to-end pipeline for detecting cars and their number plates using a custom-trained object detection model. The workflow is structured and beginner-friendly: we will first label images in Labellerr, then export those annotations, convert them into the YOLOv8 compatible format, and finally train and evaluate the model to see real detection results.
What is Labellerr?
Labellerr is a smart data annotation platform designed specifically for computer vision workflows.
It helps teams create high-quality labeled datasets efficiently, without the usual hassle of managing annotation tasks manually.
Key advantages of using Labellerr:
- Intuitive bounding box annotation UI — ideal for object detection tasks
- Built-in quality review workflows — ensures consistency across annotations
- Direct export support for YOLO formats — no manual formatting or conversion required
In short, Labellerr saves both time and effort while maintaining annotation quality, which is crucial for achieving good model performance.
For more details and examples, refer to the official documentation:
https://docs.labellerr.com/
What is YOLOv8?
YOLOv8 (by Ultralytics) is one of the most widely used state-of-the-art object detection models.
It is known for delivering high accuracy, while still being fast enough for real-time detection.
Why YOLOv8 works well here:
- Fast inference speed even on modest hardware
- Straightforward training workflow
- Excellent accuracy for object localization
This makes it a strong choice for real-world applications such as vehicle surveillance, traffic monitoring, and automated license plate recognition systems.
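To get a feel for how little code YOLOv8 needs, here is a minimal sketch that runs a pretrained checkpoint on a single image (the image path is a placeholder; this assumes the `ultralytics` package is installed, which we do in Step 12):

```python
from ultralytics import YOLO

# Load a small pretrained checkpoint (downloaded automatically on first use)
model = YOLO("yolov8n.pt")

# Run inference on a placeholder image path and print each detection
results = model.predict(source="example.jpg", conf=0.25)
for box in results[0].boxes:
    print(model.names[int(box.cls[0])], float(box.conf[0]))
```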
Why Use Labellerr and YOLOv8 Together?
Both tools complement each other perfectly:
| Tool | Role in the Pipeline | Benefit |
|---|---|---|
| Labellerr | Label images clearly and consistently | High-quality dataset preparation |
| YOLOv8 | Train and evaluate the detection model | Strong real-world performance with minimal setup |
The workflow becomes:
Label → Export → Train → Detect,
without any messy intermediate steps or custom conversion scripts.
This results in a clean, reliable, and efficient pipeline from annotation to model deployment.
Project Setup
Step 1: Installing Dependencies
In this step, we will install all the required libraries in one go, including the Labellerr SDK (YOLOv8 itself is installed later, in Step 12).
!python -m pip install --upgrade pip
!python -m pip install --quiet https://github.com/tensormatics/SDKPython/releases/download/prod/labellerr_sdk-1.0.0.tar.gz kaggle Pillow requests python-dotenv opencv-python numpy scikit-learn
Step 2: Setting Up Kaggle Credentials
Since we are downloading the dataset from Kaggle, we first need to configure our Kaggle API credentials inside Google Colab.
This allows us to download datasets directly using commands like kaggle datasets download.
Steps:
- Go to your Kaggle account settings
- Scroll to API section
- Click on Create New API Token
- A file named kaggle.json will be downloaded
- We will programmatically place this file in the correct location
import os, json, shutil
from pathlib import Path
from getpass import getpass
from IPython.display import display, Markdown
# Get Kaggle credentials from user input
KAGGLE_USERNAME = input("Enter your Kaggle username: ")
KAGGLE_KEY = getpass("Enter your Kaggle API key: ")
# Create .kaggle directory in user's home folder
kaggle_dir = Path.home() / ".kaggle"
kaggle_dir.mkdir(exist_ok=True)
# Write credentials to kaggle.json file
with open(kaggle_dir / "kaggle.json", "w") as f:
json.dump({"username": KAGGLE_USERNAME, "key": KAGGLE_KEY}, f)
# Set proper permissions (600 = read/write for owner only)
os.chmod(kaggle_dir / "kaggle.json", 0o600)
display(Markdown("Kaggle credentials configured at `~/.kaggle/kaggle.json`"))
display(Markdown("Credentials securely stored with proper file permissions"))
What This Code Does:
| Part / Line | Purpose |
|---|---|
| `input()` / `getpass()` | Securely capture username & API key (the key is not echoed) |
| `~/.kaggle/` directory | Required location for Kaggle CLI credentials |
| `kaggle.json` write | Stores `{username, key}` for authentication |
| `os.chmod(..., 0o600)` | Restricts file permissions to the owner only |
| `display(Markdown(...))` | Shows friendly confirmation messages |
Step 3: Downloading the Dataset from Kaggle
Now that our Kaggle credentials are configured, we can download the dataset directly into Google Colab using the Kaggle CLI.
This avoids manual uploads and keeps the workflow fast and clean.
Dataset Used:
andrewmvd/car-plate-detection
from pathlib import Path
from IPython.display import display, Markdown
# Download dataset via Kaggle CLI
DATASET = "andrewmvd/car-plate-detection"
DATA_DIR = Path("datasets/car")
DATA_DIR.mkdir(parents=True, exist_ok=True)
!kaggle datasets download -d {DATASET} -p {str(DATA_DIR)} --unzip
display(Markdown(f" Downloaded dataset `{DATASET}` to `{DATA_DIR}`"))
What This Code Does
| Line / Component | Purpose |
|---|---|
| `DATASET` | Specifies the dataset to download from Kaggle |
| `Path("datasets/car")` | Sets the directory where the dataset will be stored |
| `mkdir(parents=True, exist_ok=True)` | Creates the folder if it doesn't already exist |
| `kaggle datasets download` | Fetches the dataset using the Kaggle CLI |
| `--unzip` | Automatically extracts the files after download |
| `display(Markdown(...))` | Outputs a friendly completion message |
Step 4: Preparing Sample Images for Inspection
Before labeling and training, it's always a good idea to preview a few images from the dataset.
This helps us verify:
- The dataset downloaded correctly
- The images are relevant
- Their quality is suitable for training
Here, we will randomly pick 10 sample images and store them in a separate folder for quick review.
from pathlib import Path
from IPython.display import display, Markdown
import random
import shutil
# Prepare 10 sample images from the downloaded car dataset (any images found under DATA_DIR)
SEARCH_DIRS = [
DATA_DIR,
]
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".tiff"}
all_images = []
for base in SEARCH_DIRS:
for p in base.rglob("*"):
if p.is_file() and p.suffix.lower() in IMAGE_EXTS:
all_images.append(p)
random.shuffle(all_images)
SAMPLE_DIR = Path("sample_images")
SAMPLE_DIR.mkdir(exist_ok=True)
selected = all_images[:10]
for i, src in enumerate(selected, start=1):
dst = SAMPLE_DIR / f"car_{i:02d}{src.suffix.lower()}"
shutil.copy2(src, dst)
display(Markdown(f" Prepared {len(selected)} images in `{SAMPLE_DIR}`"))
What This Code Does
| Part / Line | Purpose |
|---|---|
| `DATA_DIR` | Directory containing the downloaded dataset |
| `IMAGE_EXTS` | Defines valid image file extensions |
| `rglob("*")` | Recursively searches all folders for images |
| `random.shuffle()` | Randomizes image selection |
| `SAMPLE_DIR.mkdir()` | Creates a folder named `sample_images` |
| `copy2()` | Copies selected sample images to that folder |
| `display(Markdown(...))` | Prints a neat confirmation message |
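If you want to eyeball a few of the copied samples right in the notebook, a small sketch like this works (it assumes the `sample_images` folder created above):

```python
from pathlib import Path
from IPython.display import Image as IPImage, display

# Preview the first three copied samples inline in the notebook
for p in sorted(Path("sample_images").glob("*"))[:3]:
    display(IPImage(filename=str(p), width=320))
```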
Step 5: Connecting to Labellerr (Authentication)
Now we'll connect to the Labellerr platform. This cell will prompt you for your Labellerr credentials to initialize the API client.
- Client ID: Your workspace-specific ID.
- Email: The email associated with your Labellerr account.
- API Key & Secret: Found in your Labellerr account settings. These will be entered securely with hidden input.
**How to Get Your Client ID**
The method depends on your Labellerr plan:
- If you're using a Pro or Enterprise plan, you can simply contact Labellerr Support and they will share your Client ID.
- If you're on the Free plan, you can request your Client ID by sending a short message to support@tensormatics.com (just mention the email you used to sign up on Labellerr).
from getpass import getpass
from labellerr.client import LabellerrClient
from labellerr.exceptions import LabellerrError
from IPython.display import display, Markdown
# --- Interactive Input for Labellerr Credentials ---
print("Please enter your Labellerr API credentials.")
LABELLERR_CLIENT_ID = input("Labellerr Client ID: ")
LABELLERR_EMAIL = input("Labellerr Email: ")
LABELLERR_API_KEY = getpass("Labellerr API Key (input will be hidden): ")
LABELLERR_API_SECRET = getpass("Labellerr API Secret (input will be hidden): ")
# --- Initialize Labellerr Client ---
try:
if not all([LABELLERR_API_KEY, LABELLERR_API_SECRET, LABELLERR_CLIENT_ID, LABELLERR_EMAIL]):
raise ValueError("One or more required fields were left empty.")
client = LabellerrClient(LABELLERR_API_KEY, LABELLERR_API_SECRET)
display(Markdown(" Labellerr client initialized successfully!"))
except (LabellerrError, ValueError) as e:
display(Markdown(f" **Client Initialization Failed:** {e}"))
client = None
What This Code Does
| Line / Component | Purpose |
|---|---|
| `getpass()` | Safely hides API key input while typing |
| `LabellerrClient()` | Creates an authenticated Labellerr session |
| `ValueError` check | Ensures no fields were left empty |
| `try / except` | Catches invalid credentials and prevents crashes |
| `display(Markdown(...))` | Prints clean success/failure messages |
Step 6: Creating Annotation Schema in Labellerr
Before labeling, we need to define what objects we want to annotate (Car & License Plate).
This step tells Labellerr which classes to display in the labeling UI.
ANNOTATION_QUESTIONS = [
{
"question_number": 1,
"question": "Car",
"question_id": "car-bbox-001",
"option_type": "BoundingBox",
"required": False,
"options": [
{"option_id": "opt-001", "option_name": "#FF0000"}
],
"question_metadata": []
},
{
"question_number": 2,
"question": "License Plate",
"question_id": "plate-bbox-002",
"option_type": "BoundingBox",
"required": False,
"options": [
{"option_id": "opt-002", "option_name": "#00FF00"}
],
"question_metadata": []
}
]
PROJECT_NAME = "Vehicle Object Detection new"
DATASET_NAME = "vehicle_dataset_sample_new"
DATASET_DESCRIPTION = "10 vehicle images including cars, plates"
DATA_TYPE = "image"
What This Code Defines
| Field / Section | Meaning / Purpose |
|---|---|
| `ANNOTATION_QUESTIONS` | List of labeling tasks the annotator will perform |
| `question` | Name shown in the labeling UI (Car, License Plate) |
| `option_type = "BoundingBox"` | Means the user will draw bounding boxes |
| `option_name = "#FF0000"` / `"#00FF00"` | Box color to visually differentiate classes |
| `PROJECT_NAME` | Name of the project inside Labellerr |
| `DATASET_NAME` | Dataset title where labeled images will be stored |
| `DATASET_DESCRIPTION` | Short description for dataset organization |
| `DATA_TYPE = "image"` | Specifies the dataset type — important for correct processing |
Step 7: Create Annotation Template in Labellerr
We’ll create an annotation guideline template (schema) in Labellerr using our questions (Car, License Plate).
Different SDK versions may return either a UUID string or a JSON dict, so we’ll handle both safely.
import json
from IPython.display import display, Markdown
from labellerr.exceptions import LabellerrError
template_id = None
try:
res_str = client.create_annotation_guideline(
client_id=LABELLERR_CLIENT_ID,
questions=ANNOTATION_QUESTIONS,
template_name=f"{PROJECT_NAME} Template",
data_type=DATA_TYPE,
)
# Handle UUID-only or dict response
if isinstance(res_str, str) and len(res_str) == 36 and res_str.count('-') == 4:
template_id = res_str
display(Markdown(f"Template created: `{template_id}`"))
else:
try:
res = json.loads(res_str) if isinstance(res_str, str) else res_str
template_id = res.get("response", {}).get("template_id")
if template_id:
display(Markdown(f"Template created: `{template_id}`"))
else:
display(Markdown(" Could not find template_id in response"))
display(Markdown(f"Raw response: `{res_str}`"))
except json.JSONDecodeError as e:
display(Markdown(f" Response parsing issue: `{e}`"))
display(Markdown(f"Raw response: `{res_str}`"))
if isinstance(res_str, str) and len(res_str) > 20:
template_id = res_str
display(Markdown(f"Using response as template_id: `{template_id}`"))
except LabellerrError as e:
display(Markdown(f"Template creation failed: `{e}`"))
What This Code Does
| Line / Component | Purpose |
|---|---|
| `create_annotation_guideline(...)` | Creates a labeling template in Labellerr |
| `questions=ANNOTATION_QUESTIONS` | Passes your schema (Car, License Plate) |
| `template_name` | Human-readable name for the template |
| `data_type=DATA_TYPE` | Ensures the correct modality (e.g., "image") |
| UUID check (len == 36 & 4 dashes) | Detects raw UUID responses from older SDKs |
| `json.loads(...)` | Parses JSON responses from newer SDKs |
| `res["response"]["template_id"]` | Extracts the template ID when returned inside a dict |
| `except LabellerrError` | Catches SDK errors and prints a clean message |
| Fallback to `template_id = res_str` | Uses the response as the template ID if no JSON was found |
Expected Success Output
Template created:
xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
Step 8: Creating the Project in Labellerr & Uploading Images
Once our annotation template is ready, the next step is to create a project in Labellerr and link our dataset.
We will also upload the sample images folder so annotators can begin labeling.
import json
from IPython.display import display, Markdown
project_id = None
if template_id:
try:
print("Creating project and linking dataset...")
payload = {
"client_id": LABELLERR_CLIENT_ID,
"dataset_name": DATASET_NAME,
"dataset_description": DATASET_DESCRIPTION,
"data_type": DATA_TYPE,
"created_by": LABELLERR_EMAIL,
"project_name": PROJECT_NAME,
"annotation_template_id": template_id,
"rotation_config": {
"annotation_rotation_count": 1,
"review_rotation_count": 1,
"client_review_rotation_count": 1
},
"autolabel": False,
"folder_to_upload": "/content/sample_images"
}
# Pass as a single payload dictionary
res = client.initiate_create_project(payload)
# Handle response
if isinstance(res, str) and len(res) == 36 and res.count('-') == 4:
project_id = res
display(Markdown(f"Project created: `{project_id}`"))
else:
try:
res_obj = json.loads(res) if isinstance(res, str) else res
project_id = res_obj.get("response", {}).get("project_id")
if project_id:
display(Markdown(f" Project created: `{project_id}`"))
else:
display(Markdown(" Could not find project_id in response"))
display(Markdown(f"Raw response: `{res}`"))
except Exception as e:
display(Markdown(f" Response parsing issue: `{e}`"))
display(Markdown(f"Raw response: `{res}`"))
except LabellerrError as e:
display(Markdown(f" Project creation failed: `{e}`"))
What This Code Does
| Line / Component | Purpose |
|---|---|
| `payload = {...}` | Contains all project settings and metadata |
| `dataset_name` / `project_name` | Organized names in the Labellerr dashboard |
| `annotation_template_id` | Attaches the schema we made earlier |
| `folder_to_upload` | Directory containing images to upload to Labellerr |
| `initiate_create_project(payload)` | Creates the project and uploads the dataset |
| UUID response check | Handles cases where the API returns a raw UUID |
| JSON parse fallback | Handles cases where the API returns a JSON object |
| `except LabellerrError` | Catches SDK errors cleanly |
Step 9: Linking an Existing Dataset to the Project in Labellerr
If your dataset is already uploaded to Labellerr, you do not need to upload images again.
Instead, you simply need to copy its Dataset ID from the Labellerr dashboard and pass it into the project creation function.
How to Get Dataset ID:
- Open Labellerr Dashboard
- Go to Datasets section
- Select your dataset
- Copy the Dataset ID shown in the details panel
This Dataset ID is what we will use in the code.
Code to Link Dataset With Project
rotation_config = {
"annotation_rotation_count": 1,
"review_rotation_count": 1,
"client_review_rotation_count": 1
}
print("Creating project and linking existing dataset...")
res = client.create_project(
project_name=PROJECT_NAME,
data_type=DATA_TYPE,
client_id=LABELLERR_CLIENT_ID,
dataset_id="b7954c7f-a071-4eb7-b4e0-1980b3505e2b", # ✅ Paste your Dataset ID here
annotation_template_id=template_id,
rotation_config=rotation_config
)
print("Project created successfully:")
print(res)
Explanation
| Field / Parameter | Meaning |
|---|---|
| `dataset_id` | The dataset you selected in Labellerr, copied from the dashboard |
| `annotation_template_id` | The annotation schema we created earlier (Car + License Plate) |
| `rotation_config` | How many people annotate → review → client-verify |
| `create_project()` | Creates a new project and links your dataset + labeling workflow |
Step 10: Labeling Workflow in Labellerr (Step-by-Step)
Once your dataset is linked to the project, you can start labeling right inside Labellerr.
Follow these clear steps to go from Label → Review → Accept.
Steps:
1. Go to Projects
Open Labellerr Dashboard → Projects tab.
2. Open Your Project
Select the project you created/linked (e.g., Vehicle Object Detection).
3. Go to the Label Section
Inside the project, click Label to open the annotation interface.
4. Start Labeling
- Choose the right tool (e.g., Bounding Box).
- Draw boxes around each Car and License Plate as per your schema.
- Assign the correct class from the sidebar.
5. Save / Submit
Save the annotation for each image (as per UI button in your workspace).
6. Move to Review
Go to Review tab. The images you labeled will show up for verification.
7. Accept or Send Back
- If the annotation looks good → Accept
- If changes needed → Send back to labeling.
Step 11: Exporting Labeled Dataset from Labellerr
After labeling + review, we export annotations from Labellerr.
This script automates 3 steps: Create export → Poll status → Download & Validate.
# --- Car Dataset Export Script ---
from IPython.display import display, Markdown
import requests
import json
import time
from pathlib import Path
import logging
import traceback
import uuid
from labellerr.exceptions import LabellerrError
project_id = "magdaia_joyous_peafowl_21008"
# CONFIGURE LOGGER
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(levelname)s - %(message)s',
datefmt='%Y-%m-%d %H:%M:%S',
force=True
)
logger = logging.getLogger(__name__)
# --- GLOBAL VARS ---
downloaded_annotations = None
# Step 1: Create Export
def create_export(client, project_id, client_id, export_config):
"""Initiates an export job on the Labellerr platform."""
logger.info("Step 1: Creating export for car dataset...")
try:
res = client.create_local_export(project_id, client_id, export_config)
export_id = res["response"]["report_id"]
logger.info(f"Export created successfully. Export ID: {export_id}")
return export_id
except LabellerrError as e:
logger.error(f"Export creation failed: `{e}`")
return None
# Step 2: Poll Export Status
def poll_export_status(client, api_key, api_secret, project_id, export_id, client_id, max_wait_time=300, wait_interval=10):
"""Polls Labellerr API for export completion status."""
logger.info(f"Step 2: Polling for export completion (max {max_wait_time}s)...")
elapsed_time = 0
while elapsed_time < max_wait_time:
raw_status = client.check_export_status(
api_key=api_key,
api_secret=api_secret,
project_id=project_id,
report_ids=[export_id],
client_id=client_id
)
status_obj = None
if isinstance(raw_status, dict):
status_obj = raw_status
elif isinstance(raw_status, str):
try:
status_obj = json.loads(raw_status)
except json.JSONDecodeError:
logger.warning(f"Could not parse status string: '{raw_status}'")
if status_obj and status_obj.get('status') and len(status_obj['status']) > 0:
export_status = status_obj['status'][0]
is_completed = export_status.get('is_completed', False)
export_status_text = export_status.get('export_status', 'Unknown')
logger.info(f"Current status: '{export_status_text}' (Completed: {is_completed})")
if is_completed:
logger.info("Export completed! Proceeding to download.")
return True
elif export_status_text.lower() == "failed":
logger.error("Export failed!")
return False
time.sleep(wait_interval)
elapsed_time += wait_interval
logger.warning(f"Export timeout after {max_wait_time}s.")
return False
# Step 3: Download & Validate Export
def download_and_validate_export(client, api_key, api_secret, project_id, export_id, client_id):
"""Fetches download URL, downloads, and validates the exported data."""
logger.info("Step 3: Fetching download URL and validating data...")
try:
download_uuid = str(uuid.uuid4())
raw_download_result = client.fetch_download_url(
api_key=api_key,
api_secret=api_secret,
project_id=project_id,
uuid=download_uuid,
export_id=export_id,
client_id=client_id
)
download_obj = None
if isinstance(raw_download_result, dict):
download_obj = raw_download_result
elif isinstance(raw_download_result, str):
try:
download_obj = json.loads(raw_download_result)
except json.JSONDecodeError:
logger.warning(f"Could not parse download URL string: '{raw_download_result}'")
download_url = (
download_obj.get('url')
or download_obj.get('response', {}).get('download_url')
)
if download_url:
logger.info("Download URL fetched successfully.")
logger.info(f"DOWNLOAD LINK (expires in ~1 hour): {download_url}")
exports_dir = Path("exports")
exports_dir.mkdir(exist_ok=True)
export_file = exports_dir / f"car_dataset_export_{export_id}.json"
response = requests.get(download_url)
if response.status_code == 200:
with open(export_file, 'wb') as f:
f.write(response.content)
logger.info(f"Export downloaded to {export_file}")
# Validate JSON structure
try:
json_data = json.loads(response.content)
annotated_count = sum(
1 for ann in json_data
if ann.get("latest_answer") and len(ann["latest_answer"]) > 0
)
logger.info(f"VALIDATION: {annotated_count}/{len(json_data)} images have annotations.")
if annotated_count == 0:
logger.error(" No annotations found! Please ensure images are labeled.")
return False
else:
logger.info("Annotation data validated successfully!")
return True
except json.JSONDecodeError:
logger.error(" Invalid JSON format in downloaded export.")
return False
else:
logger.error(f"Failed to download export file. HTTP {response.status_code}")
return False
else:
logger.error(f"No download URL found. Raw response: {raw_download_result}")
return False
except Exception as e:
logger.error(f" Error during download: {e}\n{traceback.format_exc()}")
return False
# MAIN EXECUTION
if project_id:
try:
export_config = {
"export_name": "CarDatasetExport",
"export_description": "Export of all annotated car images",
"export_format": "json",
"statuses": ['review', 'r_assigned', 'client_review', 'cr_assigned', 'accepted'],
"export_destination": "local",
"question_ids": ["all"]
}
export_id = create_export(client, project_id, LABELLERR_CLIENT_ID, export_config)
if export_id:
if poll_export_status(client, LABELLERR_API_KEY, LABELLERR_API_SECRET, project_id, export_id, LABELLERR_CLIENT_ID):
download_and_validate_export(client, LABELLERR_API_KEY, LABELLERR_API_SECRET, project_id, export_id, LABELLERR_CLIENT_ID)
except Exception as e:
logger.error(f" Unexpected error in main block: {e}\n{traceback.format_exc()}")
What Each Part Does:
| Block / Line | Purpose |
|---|---|
| `logging.basicConfig(...)` | Shows clean, timestamped logs in the console |
| `create_local_export(...)` | Starts the export job on Labellerr (returns a `report_id`) |
| `poll_export_status(...)` | Re-checks export status until completed/failed |
| `fetch_download_url(...)` | Retrieves a temporary download link for the export file |
| `requests.get(download_url)` | Downloads the export into `exports/*.json` |
| Annotation validation | Confirms labeled data actually exists |
| `statuses=[...]` | Controls which workflow stages to include in the export |
| `question_ids=["all"]` | Ensures all annotation question categories are exported |
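Before converting, it can help to peek at one exported record to see the structure the next step relies on (`file_name`, `file_metadata`, `latest_answer`). A quick sketch, assuming the `exports/` folder from the download step:

```python
import json
from pathlib import Path

# Grab whichever export file the download step saved (adjust if you have several)
export_file = next(Path("exports").glob("car_dataset_export_*.json"))
with open(export_file) as f:
    data = json.load(f)

print(len(data), "records")
print(json.dumps(data[0], indent=2)[:500])  # preview the first record's structure
```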
Step 12: Installing YOLOv8 (Ultralytics)
Before we start training our object detection model, we need to install Ultralytics, which contains YOLOv8.
!python -m pip install ultralytics
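If you want to confirm the install succeeded in Colab, `ultralytics` ships a small environment check you can run:

```python
import ultralytics

# Prints the installed version plus Python / torch / CUDA environment info
ultralytics.checks()
```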
Step 13: Convert Labellerr Export (JSON) → YOLOv8 Dataset + Train/Val Split
Labellerr exports annotations in JSON format, but YOLOv8 requires images and labels in a specific directory structure along with .txt annotation files in normalized YOLO format (`class x_center y_center width height`); a sample label file is shown after the list below.
In this step, we will:
- Load the exported JSON file
- Split the dataset into train and val sets
- Convert bounding boxes into YOLO normalized coordinates
- Copy images to the correct folders
- Create annotation .txt files
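For reference, each YOLO label file holds one line per box, with all coordinates normalized to 0–1. A hypothetical two-object file (values are illustrative only) might look like this, where class 0 is Car and class 1 is License Plate:

```
0 0.512430 0.601200 0.310500 0.220800
1 0.498100 0.702350 0.092400 0.034100
```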
import json, shutil, random
from pathlib import Path
from sklearn.model_selection import train_test_split
# --- Paths ---
EXPORT_FILE = Path("/content/exports/car_dataset_export_NfKqUBdOqPEz2HPfnSw4.json")
IMAGE_SOURCE_DIR = Path("/content/sample_images") # your 10 images here
YOLO_DATA_DIR = Path("/content/yolo_Car_dataset")
# --- Classes ---
CLASS_NAMES = ["Car", "License Plate"]
CLASS_MAP = {name: i for i, name in enumerate(CLASS_NAMES)}
# --- Reset dataset folder ---
if YOLO_DATA_DIR.exists():
shutil.rmtree(YOLO_DATA_DIR)
for split in ["train", "val"]:
(YOLO_DATA_DIR / "images" / split).mkdir(parents=True, exist_ok=True)
(YOLO_DATA_DIR / "labels" / split).mkdir(parents=True, exist_ok=True)
# --- Load JSON ---
with open(EXPORT_FILE, "r") as f:
data = json.load(f)
print(f"Loaded {len(data)} items")
# --- Split Train/Val ---
train_data, val_data = train_test_split(data, test_size=0.2, random_state=42)
def convert_and_save(items, split):
for item in items:
file_name = item["file_name"]
width = item["file_metadata"]["image_width"]
height = item["file_metadata"]["image_height"]
# Copy image
src = IMAGE_SOURCE_DIR / file_name
dst = YOLO_DATA_DIR / "images" / split / file_name
if src.exists():
shutil.copy(src, dst)
else:
print(f" Missing image: {src}")
continue
# Prepare label file
label_path = YOLO_DATA_DIR / "labels" / split / f"{Path(file_name).stem}.txt"
lines = []
for ans_group in item.get("latest_answer", []):
for ann in ans_group.get("answer", []):
label = ann.get("label")
if label not in CLASS_MAP:
continue
cls_id = CLASS_MAP[label]
bbox = ann["answer"]
# YOLO normalized format
x_center = ((bbox["xmin"] + bbox["xmax"]) / 2) / width
y_center = ((bbox["ymin"] + bbox["ymax"]) / 2) / height
w = (bbox["xmax"] - bbox["xmin"]) / width
h = (bbox["ymax"] - bbox["ymin"]) / height
lines.append(f"{cls_id} {x_center:.6f} {y_center:.6f} {w:.6f} {h:.6f}")
with open(label_path, "w") as f:
f.write("\n".join(lines))
# --- Run conversion ---
convert_and_save(train_data, "train")
convert_and_save(val_data, "val")
print("\n YOLO dataset structure created successfully at:", YOLO_DATA_DIR)
print("Classes:", CLASS_MAP)
What This Script Does
| Step | Action | Explanation |
|---|---|---|
| 1 | Load JSON export | Reads annotations and metadata from Labellerr |
| 2 | Define class mapping | Assigns each object category a numerical class ID |
| 3 | Create `/images/` and `/labels/` folders | Matches the expected YOLO dataset layout |
| 4 | Split data into train & val | Ensures a proper supervised learning workflow |
| 5 | Copy images to destination folders | Prepares the dataset structure for YOLO |
| 6 | Convert bounding boxes → YOLO format | Normalizes coordinates between 0–1 |
| 7 | Write `.txt` annotation files | YOLO uses one annotation text file per image |
Resulting Dataset Structure
yolo_Car_dataset/
├─ images/
│ ├─ train/
│ └─ val/
└─ labels/
├─ train/
└─ val/
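As a quick sanity check before training, a minimal sketch like this (assuming `YOLO_DATA_DIR` from the cell above) counts the images and label files in each split:

```python
from pathlib import Path

YOLO_DATA_DIR = Path("/content/yolo_Car_dataset")

# Count images and label files per split to confirm the layout above
for split in ["train", "val"]:
    n_imgs = len(list((YOLO_DATA_DIR / "images" / split).glob("*")))
    n_lbls = len(list((YOLO_DATA_DIR / "labels" / split).glob("*.txt")))
    print(f"{split}: {n_imgs} images, {n_lbls} label files")
```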
Step 14: Convert Labellerr Annotations → YOLOv8 Labels (Function)
This helper converts Labellerr’s JSON annotations into YOLO normalized TXT labels and copies images into the correct split folders.
def convert_labellerr_to_yolo(data, split):
"""Converts Labellerr bounding box annotations to YOLO format."""
for ann in data:
image_name = ann.get("file_name")
image_width = ann.get("file_metadata", {}).get("image_width", 1)
image_height = ann.get("file_metadata", {}).get("image_height", 1)
# Copy image
source_image_path = IMAGE_SOURCE_DIR / image_name
dest_image_path = YOLO_DATA_DIR / "images" / split / image_name
if source_image_path.exists():
shutil.copy2(source_image_path, dest_image_path)
else:
print(f"Warning: Source image not found: {source_image_path}")
continue
# Label file path
label_path = YOLO_DATA_DIR / "labels" / split / f"{Path(image_name).stem}.txt"
with open(label_path, "w") as f:
for qa in ann.get("latest_answer", []): # iterate over each question (Car, License Plate, etc.)
for ans in qa.get("answer", []): # iterate over each bounding box inside that question
class_name = ans.get("label")
if class_name not in CLASS_MAP:
continue
class_id = CLASS_MAP[class_name]
box = ans.get("answer", {})
if not all(k in box for k in ["xmin", "ymin", "xmax", "ymax"]):
continue
xmin, ymin, xmax, ymax = (
float(box["xmin"]),
float(box["ymin"]),
float(box["xmax"]),
float(box["ymax"]),
)
# Convert to YOLO normalized format
x_center = ((xmin + xmax) / 2) / image_width
y_center = ((ymin + ymax) / 2) / image_height
width = (xmax - xmin) / image_width
height = (ymax - ymin) / image_height
f.write(f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}\n")
# Check for empty file
if label_path.stat().st_size == 0:
print(f"Warning: Empty label file -> {label_path.name}")
What This Function Does
| Piece | Purpose |
|---|---|
| `data` | List of Labellerr JSON items (one per image) |
| `split` | Either `"train"` or `"val"`; controls which folder images & labels go into |
| Copy image → `images/<split>/` | Places each image inside the YOLO dataset structure |
| Create TXT → `labels/<split>/` | Creates a YOLO annotation file matching each image |
| `CLASS_MAP` mapping | Converts class names like `"Car"` into numeric YOLO class IDs |
| Normalize bbox | Converts each bounding box to YOLO normalized format (`x_center y_center width height`) |
| Empty label warning | Alerts when an image has no valid annotations (good for debugging) |
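The function above is a drop-in replacement for the inline loop in Step 13; a typical call, assuming the `train_data`/`val_data` split from that step, would be:

```python
# Reuse the train/val split from Step 13
convert_labellerr_to_yolo(train_data, "train")
convert_labellerr_to_yolo(val_data, "val")
```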
Step 15: Creating dataset.yaml for YOLOv8
YOLOv8 needs a dataset.yaml file that tells it where the images & labels are located and which classes the model will learn.
The script below automatically generates it based on your dataset folder and class mapping.
import yaml
from pathlib import Path # Added import
from IPython.display import display, Markdown # Added import
# Define the dataset configuration
yaml_content = {
'path': str(YOLO_DATA_DIR.resolve()), # The absolute path to the dataset directory
'train': 'images/train', # Path to training images (relative to 'path')
'val': 'images/val', # Path to validation images (relative to 'path')
'names': {v: k for k, v in CLASS_MAP.items()} # Class names map (e.g., {0: 'Car', 1: 'License Plate'})
}
# Write the configuration to a file
yaml_file = YOLO_DATA_DIR / "dataset.yaml"
with open(yaml_file, 'w') as f:
yaml.dump(yaml_content, f, default_flow_style=False, sort_keys=False)
print(f"Created dataset configuration at '{yaml_file}'")
print("\n--- dataset.yaml content ---")
print(yaml.dump(yaml_content, sort_keys=False))
Understanding the dataset.yaml Fields
| Field | Meaning |
|---|---|
| `path` | Root directory of the YOLO dataset |
| `train` | Folder containing training images (relative to `path`) |
| `val` | Folder containing validation images (relative to `path`) |
| `names` | Class index → class name dictionary used by YOLO during training |
Example dataset.yaml (Generated Output)
Created dataset configuration at '/content/yolo_Car_dataset/dataset.yaml'
--- dataset.yaml content ---
path: /content/yolo_Car_dataset
train: images/train
val: images/val
names:
0: Car
1: License Plate
Step 16: Training YOLOv8 on Our Dataset
Now that our dataset is structured and the dataset.yaml file is ready, we can train a YOLOv8 model.
Here, we use YOLOv8m (medium variant) for better accuracy.
from ultralytics import YOLO
from pathlib import Path
# Load medium model for better accuracy
model = YOLO('yolov8m.pt')
results = model.train(
data=str(yaml_file), # Path to your dataset YAML
epochs=100, # Increase training epochs for better learning
imgsz=640,
batch=4,
freeze=0,
project='car_license_training',
name='yolov8m_finetuned'
)
print("\n YOLOv8m Object Detection Training Complete!")
What This Code Does
| Parameter | Description |
|---|---|
| `YOLO('yolov8m.pt')` | Loads the YOLOv8 Medium model (balanced accuracy & speed) |
| `data=str(yaml_file)` | Uses the dataset YAML we generated earlier |
| `epochs=100` | Number of training cycles (more epochs generally improve accuracy, up to a point) |
| `imgsz=640` | Input image resolution (the recommended default for YOLOv8) |
| `batch=4` | Number of images processed per training step |
| `freeze=0` | Allows full fine-tuning of the entire model |
| `project=` | Folder where training logs & weights are stored |
| `name=` | Name of the trained model checkpoint subfolder |
**Tip**
To get higher accuracy, you can later increase `epochs` to 150 or 200.
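Once training finishes, you can also measure validation mAP on the best checkpoint. A minimal sketch (the weights path assumes the `project`/`name` values used above, and `yaml_file` from Step 15):

```python
from ultralytics import YOLO

# Evaluate the best checkpoint on the validation split from dataset.yaml
best = YOLO("car_license_training/yolov8m_finetuned/weights/best.pt")
metrics = best.val(data=str(yaml_file))
print(metrics.box.map50)  # mAP at IoU threshold 0.50
```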
Step 17: Testing the Trained YOLOv8 Model on Validation Images (Inference)
Now that training is complete, let’s load the best weights and run inference on a few validation images.
We’ll draw colored boxes: green for license plates, yellow for cars.
from ultralytics import YOLO
from pathlib import Path
from IPython.display import display, Markdown
from PIL import Image
import cv2
import numpy as np
# Load best weights
best_weights = Path("/content/car_license_training/yolov8m_finetuned/weights/best.pt")
if best_weights.exists():
infer_model = YOLO(str(best_weights))
display(Markdown(f"Using best weights: `{best_weights}`"))
else:
raise FileNotFoundError(" best.pt not found!")
# Load validation images
val_dir = Path("/content/yolo_Car_dataset/images/val")
test_images = [p for p in val_dir.glob("*") if p.suffix.lower() in {".jpg", ".jpeg", ".png"}][:6]
if not test_images:
raise FileNotFoundError("No validation images found.")
# Inference
display(Markdown("### Running inference with lower confidence (0.10)..."))
for img_path in test_images:
preds = infer_model.predict(source=str(img_path), imgsz=640, conf=0.10, save=False, verbose=False)
result = preds[0]
img = cv2.imread(str(img_path))
if img is None:
print(f" Could not read image: {img_path}")
continue
for box in result.boxes:
x1, y1, x2, y2 = map(int, box.xyxy[0])
cls = int(box.cls[0])
conf = float(box.conf[0])
class_name = infer_model.names[cls]
label = f"{class_name} {conf:.2f}"
# Color code
if class_name.lower() in ["license plate", "plate"]:
color = (0, 255, 0) # Green for license plate
else:
color = (0, 255, 255) # Yellow for car
cv2.rectangle(img, (x1, y1), (x2, y2), color, 2)
cv2.putText(img, label, (x1, y1 - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.6, color, 2)
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
display(Image.fromarray(img_rgb))
display(Markdown("Inference complete — check if license plates appear now!"))
What This Inference Code Does
| Part | Purpose |
|---|---|
| Load `best.pt` | Uses the best checkpoint from training for inference |
| Collect validation images | Selects sample images to test predictions |
| `infer_model.predict(...)` | Runs YOLOv8 inference on each image |
| Draw bounding boxes + confidence scores | Visualizes detection results clearly |
| Color coding | Green = License Plate, Yellow = Car |
| Display results inline | Shows final annotated images directly in the notebook output |
Expected Output
Using best weights: /content/car_license_training/yolov8m_finetuned/weights/best.pt
Running inference with lower confidence (0.10)...
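To try the fine-tuned model on a new image of your own, a minimal sketch (the input filename is a placeholder):

```python
from ultralytics import YOLO

model = YOLO("/content/car_license_training/yolov8m_finetuned/weights/best.pt")

# save=True writes the annotated result under runs/detect/predict*
model.predict(source="my_street_photo.jpg", conf=0.25, save=True)
```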
Google Colab Notebook
Run the full workflow, experiment with the dataset, or retrain the model:
https://colab.research.google.com/drive/1KmontQoTchCJ9oqqGeBaZ0mxCD4eVZTU#scrollTo=2SdNLRuRVLDt
Thanks for reading! If you found this helpful, feel free to connect or leave feedback.
