Project Overview
In this project, we will build a complete end-to-end pipeline for detecting cars and their number plates using a custom-trained object detection model. The workflow is structured and beginner-friendly: we will first label images in Labellerr, then export those annotations, convert them into the YOLOv8 compatible format, and finally train and evaluate the model to see real detection results.
What is Labellerr?
Labellerr is a smart data annotation platform designed specifically for computer vision workflows.
It helps teams create high-quality labeled datasets efficiently, without the usual hassle of managing annotation tasks manually.
Key advantages of using Labellerr:
- Intuitive bounding box annotation UI — ideal for object detection tasks
- Built-in quality review workflows — ensures consistency across annotations
- Direct export support for YOLO formats — no manual formatting or conversion required
In short, Labellerr saves both time and effort while maintaining annotation quality, which is crucial for achieving good model performance.
For more details and examples, refer to the official documentation:
https://docs.labellerr.com/
What is YOLOv8?
YOLOv8 (by Ultralytics) is one of the most widely used state-of-the-art object detection models.
It is known for delivering high accuracy, while still being fast enough for real-time detection.
Why YOLOv8 works well here:
- Fast inference speed even on modest hardware
- Straightforward training workflow
- Excellent accuracy for object localization
This makes it a strong choice for real-world applications such as vehicle surveillance, traffic monitoring, and automated license plate recognition systems.
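To get a feel for how little code YOLOv8 needs, here is a minimal sketch that runs a pretrained checkpoint on a single image (the image path is a placeholder; this assumes the `ultralytics` package is installed, which we do in Step 12):

```python
from ultralytics import YOLO

# Load a small pretrained checkpoint (downloaded automatically on first use)
model = YOLO("yolov8n.pt")

# Run inference on a placeholder image path and print each detection
results = model.predict(source="example.jpg", conf=0.25)
for box in results[0].boxes:
    print(model.names[int(box.cls[0])], float(box.conf[0]))
```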
Why Use Labellerr and YOLOv8 Together?
Both tools complement each other perfectly:
| Tool | Role in the Pipeline | Benefit |
|---|---|---|
| Labellerr | Label images clearly and consistently | High-quality dataset preparation |
| YOLOv8 | Train and evaluate the detection model | Strong real-world performance with minimal setup |
The workflow becomes:
Label → Export → Train → Detect,
without any messy intermediate steps or custom conversion scripts.
This results in a clean, reliable, and efficient pipeline from annotation to model deployment.
Project Setup
Step 1: Installing Dependencies
In this step, we will install all the required libraries in one go, including the Labellerr SDK (YOLOv8 itself is installed later, in Step 12).
!python -m pip install --upgrade pip
!python -m pip install --quiet https://github.com/tensormatics/SDKPython/releases/download/prod/labellerr_sdk-1.0.0.tar.gz kaggle Pillow requests python-dotenv opencv-python numpy scikit-learn
Step 2: Setting Up Kaggle Credentials
Since we are downloading the dataset from Kaggle, we first need to configure our Kaggle API credentials inside Google Colab.
This allows us to download datasets directly using commands like kaggle datasets download.
Steps:
- Go to your Kaggle account settings
- Scroll to API section
- Click on Create New API Token
- A file named kaggle.json will be downloaded
- We will programmatically place this file in the correct location
import os, json, shutil
from pathlib import Path
from getpass import getpass
from IPython.display import display, Markdown
# Get Kaggle credentials from user input
KAGGLE_USERNAME = input("Enter your Kaggle username: ")
KAGGLE_KEY = getpass("Enter your Kaggle API key: ")
# Create .kaggle directory in user's home folder
kaggle_dir = Path.home() / ".kaggle"
kaggle_dir.mkdir(exist_ok=True)
# Write credentials to kaggle.json file
with open(kaggle_dir / "kaggle.json", "w") as f:
json.dump({"username": KAGGLE_USERNAME, "key": KAGGLE_KEY}, f)
# Set proper permissions (600 = read/write for owner only)
os.chmod(kaggle_dir / "kaggle.json", 0o600)
display(Markdown("Kaggle credentials configured at `~/.kaggle/kaggle.json`"))
display(Markdown("Credentials securely stored with proper file permissions"))
What This Code Does:
| Part / Line | Purpose |
|---|---|
| `input()` / `getpass()` | Securely capture username & API key (the key is not echoed) |
| `~/.kaggle/` directory | Required location for Kaggle CLI credentials |
| `kaggle.json` write | Stores `{username, key}` for authentication |
| `os.chmod(..., 0o600)` | Restricts file permissions to the owner only |
| `display(Markdown(...))` | Shows friendly confirmation messages |
Step 3: Downloading the Dataset from Kaggle
Now that our Kaggle credentials are configured, we can download the dataset directly into Google Colab using the Kaggle CLI.
This avoids manual uploads and keeps the workflow fast and clean.
Dataset Used:
andrewmvd/car-plate-detection
from pathlib import Path
from IPython.display import display, Markdown
# Download dataset via Kaggle CLI
DATASET = "andrewmvd/car-plate-detection"
DATA_DIR = Path("datasets/car")
DATA_DIR.mkdir(parents=True, exist_ok=True)
!kaggle datasets download -d {DATASET} -p {str(DATA_DIR)} --unzip
display(Markdown(f" Downloaded dataset `{DATASET}` to `{DATA_DIR}`"))
What This Code Does
| Line / Component | Purpose |
|---|---|
| `DATASET` | Specifies the dataset to download from Kaggle |
| `Path("datasets/car")` | Sets the directory where the dataset will be stored |
| `mkdir(parents=True, exist_ok=True)` | Creates the folder if it doesn't already exist |
| `kaggle datasets download` | Fetches the dataset using the Kaggle CLI |
| `--unzip` | Automatically extracts the files after download |
| `display(Markdown(...))` | Outputs a friendly completion message |
Step 4: Preparing Sample Images for Inspection
Before labeling and training, it's always a good idea to preview a few images from the dataset.
This helps us verify:
- The dataset downloaded correctly
- The images are relevant
- Their quality is suitable for training
Here, we will randomly pick 10 sample images and store them in a separate folder for quick review.
from pathlib import Path
from IPython.display import display, Markdown
import random
import shutil
# Prepare 10 sample images from the downloaded car dataset (any images found under DATA_DIR)
SEARCH_DIRS = [
DATA_DIR,
]
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".tiff"}
all_images = []
for base in SEARCH_DIRS:
for p in base.rglob("*"):
if p.is_file() and p.suffix.lower() in IMAGE_EXTS:
all_images.append(p)
random.shuffle(all_images)
SAMPLE_DIR = Path("sample_images")
SAMPLE_DIR.mkdir(exist_ok=True)
selected = all_images[:10]
for i, src in enumerate(selected, start=1):
dst = SAMPLE_DIR / f"car_{i:02d}{src.suffix.lower()}"
shutil.copy2(src, dst)
display(Markdown(f" Prepared {len(selected)} images in `{SAMPLE_DIR}`"))
What This Code Does
| Part / Line | Purpose |
|---|---|
| `DATA_DIR` | Directory containing the downloaded dataset |
| `IMAGE_EXTS` | Defines valid image file extensions |
| `rglob("*")` | Recursively searches all folders for images |
| `random.shuffle()` | Randomizes image selection |
| `SAMPLE_DIR.mkdir()` | Creates a folder named `sample_images` |
| `copy2()` | Copies selected sample images to that folder |
| `display(Markdown(...))` | Prints a neat confirmation message |
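If you want to eyeball a few of the copied samples right in the notebook, a small sketch like this works (it assumes the `sample_images` folder created above):

```python
from pathlib import Path
from IPython.display import Image as IPImage, display

# Preview the first three copied samples inline in the notebook
for p in sorted(Path("sample_images").glob("*"))[:3]:
    display(IPImage(filename=str(p), width=320))
```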
Step 5: Connecting to Labellerr (Authentication)
Now we'll connect to the Labellerr platform. This cell will prompt you for your Labellerr credentials to initialize the API client.
- Client ID: Your workspace-specific ID.
- Email: The email associated with your Labellerr account.
- API Key & Secret: Found in your Labellerr account settings. These will be entered securely with hidden input.
**How to Get Your Client ID**
The method depends on your Labellerr plan:
- If you're using a Pro or Enterprise plan, you can simply contact Labellerr Support and they will share your Client ID.
- If you're on the Free plan, you can request your Client ID by sending a short message to support@tensormatics.com (just mention the email you used to sign up on Labellerr).
from getpass import getpass
from labellerr.client import LabellerrClient
from labellerr.exceptions import LabellerrError
from IPython.display import display, Markdown
# --- Interactive Input for Labellerr Credentials ---
print("Please enter your Labellerr API credentials.")
LABELLERR_CLIENT_ID = input("Labellerr Client ID: ")
LABELLERR_EMAIL = input("Labellerr Email: ")
LABELLERR_API_KEY = getpass("Labellerr API Key (input will be hidden): ")
LABELLERR_API_SECRET = getpass("Labellerr API Secret (input will be hidden): ")
# --- Initialize Labellerr Client ---
try:
if not all([LABELLERR_API_KEY, LABELLERR_API_SECRET, LABELLERR_CLIENT_ID, LABELLERR_EMAIL]):
raise ValueError("One or more required fields were left empty.")
client = LabellerrClient(LABELLERR_API_KEY, LABELLERR_API_SECRET)
display(Markdown(" Labellerr client initialized successfully!"))
except (LabellerrError, ValueError) as e:
display(Markdown(f" **Client Initialization Failed:** {e}"))
client = None
What This Code Does
| Line / Component | Purpose |
|---|---|
| `getpass()` | Safely hides API key input while typing |
| `LabellerrClient()` | Creates an authenticated Labellerr session |
| `ValueError` check | Ensures no fields were left empty |
| `try / except` | Catches invalid credentials and prevents crashes |
| `display(Markdown(...))` | Prints clean success/failure messages |
Step 6: Creating Annotation Schema in Labellerr
Before labeling, we need to define what objects we want to annotate (Car & License Plate).
This step tells Labellerr which classes to display in the labeling UI.
ANNOTATION_QUESTIONS = [
{
"question_number": 1,
"question": "Car",
"question_id": "car-bbox-001",
"option_type": "BoundingBox",
"required": False,
"options": [
{"option_id": "opt-001", "option_name": "#FF0000"}
],
"question_metadata": []
},
{
"question_number": 2,
"question": "License Plate",
"question_id": "plate-bbox-002",
"option_type": "BoundingBox",
"required": False,
"options": [
{"option_id": "opt-002", "option_name": "#00FF00"}
],
"question_metadata": []
}
]
PROJECT_NAME = "Vehicle Object Detection new"
DATASET_NAME = "vehicle_dataset_sample_new"
DATASET_DESCRIPTION = "10 vehicle images including cars, plates"
DATA_TYPE = "image"
What This Code Defines
| Field / Section | Meaning / Purpose |
|---|---|
| `ANNOTATION_QUESTIONS` | List of labeling tasks the annotator will perform |
| `question` | Name shown in the labeling UI (Car, License Plate) |
| `option_type = "BoundingBox"` | Means the user will draw bounding boxes |
| `option_name = "#FF0000"` / `"#00FF00"` | Box color to visually differentiate classes |
| `PROJECT_NAME` | Name of the project inside Labellerr |
| `DATASET_NAME` | Dataset title where labeled images will be stored |
| `DATASET_DESCRIPTION` | Short description for dataset organization |
| `DATA_TYPE = "image"` | Specifies the dataset type — important for correct processing |
Step 7: Create Annotation Template in Labellerr
We’ll create an annotation guideline template (schema) in Labellerr using our questions (Car, License Plate).
Different SDK versions may return either a UUID string or a JSON dict, so we’ll handle both safely.
import json
from IPython.display import display, Markdown
from labellerr.exceptions import LabellerrError
template_id = None
try:
res_str = client.create_annotation_guideline(
client_id=LABELLERR_CLIENT_ID,
questions=ANNOTATION_QUESTIONS,
template_name=f"{PROJECT_NAME} Template",
data_type=DATA_TYPE,
)
# Handle UUID-only or dict response
if isinstance(res_str, str) and len(res_str) == 36 and res_str.count('-') == 4:
template_id = res_str
display(Markdown(f"Template created: `{template_id}`"))
else:
try:
res = json.loads(res_str) if isinstance(res_str, str) else res_str
template_id = res.get("response", {}).get("template_id")
if template_id:
display(Markdown(f"Template created: `{template_id}`"))
else:
display(Markdown(" Could not find template_id in response"))
display(Markdown(f"Raw response: `{res_str}`"))
except json.JSONDecodeError as e:
display(Markdown(f" Response parsing issue: `{e}`"))
display(Markdown(f"Raw response: `{res_str}`"))
if isinstance(res_str, str) and len(res_str) > 20:
template_id = res_str
display(Markdown(f"Using response as template_id: `{template_id}`"))
except LabellerrError as e:
display(Markdown(f"Template creation failed: `{e}`"))
What This Code Does
| Line / Component | Purpose |
|---|---|
| `create_annotation_guideline(...)` | Creates a labeling template in Labellerr |
| `questions=ANNOTATION_QUESTIONS` | Passes your schema (Car, License Plate) |
| `template_name` | Human-readable name for the template |
| `data_type=DATA_TYPE` | Ensures the correct modality (e.g., "image") |
| UUID check (len == 36 & 4 dashes) | Detects raw UUID responses from older SDKs |
| `json.loads(...)` | Parses JSON responses from newer SDKs |
| `res["response"]["template_id"]` | Extracts the template ID when returned inside a dict |
| `except LabellerrError` | Catches SDK errors and prints a clean message |
| Fallback to `template_id = res_str` | Uses the response as the template ID if no JSON was found |
Expected Success Output
Template created:
xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
Step 8: Creating the Project in Labellerr & Uploading Images
Once our annotation template is ready, the next step is to create a project in Labellerr and link our dataset.
We will also upload the sample images folder so annotators can begin labeling.
import json
from IPython.display import display, Markdown
project_id = None
if template_id:
try:
print("Creating project and linking dataset...")
payload = {
"client_id": LABELLERR_CLIENT_ID,
"dataset_name": DATASET_NAME,
"dataset_description": DATASET_DESCRIPTION,
"data_type": DATA_TYPE,
"created_by": LABELLERR_EMAIL,
"project_name": PROJECT_NAME,
"annotation_template_id": template_id,
"rotation_config": {
"annotation_rotation_count": 1,
"review_rotation_count": 1,
"client_review_rotation_count": 1
},
"autolabel": False,
"folder_to_upload": "/content/sample_images"
}
# Pass as a single payload dictionary
res = client.initiate_create_project(payload)
# Handle response
if isinstance(res, str) and len(res) == 36 and res.count('-') == 4:
project_id = res
display(Markdown(f"Project created: `{project_id}`"))
else:
try:
res_obj = json.loads(res) if isinstance(res, str) else res
project_id = res_obj.get("response", {}).get("project_id")
if project_id:
display(Markdown(f" Project created: `{project_id}`"))
else:
display(Markdown(" Could not find project_id in response"))
display(Markdown(f"Raw response: `{res}`"))
except Exception as e:
display(Markdown(f" Response parsing issue: `{e}`"))
display(Markdown(f"Raw response: `{res}`"))
except LabellerrError as e:
display(Markdown(f" Project creation failed: `{e}`"))
What This Code Does
| Line / Component | Purpose |
|---|---|
| `payload = {...}` | Contains all project settings and metadata |
| `dataset_name` / `project_name` | Organized names in the Labellerr dashboard |
| `annotation_template_id` | Attaches the schema we made earlier |
| `folder_to_upload` | Directory containing images to upload to Labellerr |
| `initiate_create_project(payload)` | Creates the project and uploads the dataset |
| UUID response check | Handles cases where the API returns a raw UUID |
| JSON parse fallback | Handles cases where the API returns a JSON object |
| `except LabellerrError` | Catches SDK errors cleanly |
Step 9: Linking an Existing Dataset to the Project in Labellerr
If your dataset is already uploaded to Labellerr, you do not need to upload images again.
Instead, you simply need to copy its Dataset ID from the Labellerr dashboard and pass it into the project creation function.
How to Get Dataset ID:
- Open Labellerr Dashboard
- Go to Datasets section
- Select your dataset
- Copy the Dataset ID shown in the details panel
This Dataset ID is what we will use in the code.
Code to Link Dataset With Project
rotation_config = {
"annotation_rotation_count": 1,
"review_rotation_count": 1,
"client_review_rotation_count": 1
}
print("Creating project and linking existing dataset...")
res = client.create_project(
project_name=PROJECT_NAME,
data_type=DATA_TYPE,
client_id=LABELLERR_CLIENT_ID,
dataset_id="b7954c7f-a071-4eb7-b4e0-1980b3505e2b", # ✅ Paste your Dataset ID here
annotation_template_id=template_id,
rotation_config=rotation_config
)
print("Project created successfully:")
print(res)
Explanation
| Field / Parameter | Meaning |
|---|---|
| `dataset_id` | The dataset you selected in Labellerr, copied from the dashboard |
| `annotation_template_id` | The annotation schema we created earlier (Car + License Plate) |
| `rotation_config` | How many people annotate → review → client-verify |
| `create_project()` | Creates a new project and links your dataset + labeling workflow |
Step 10: Labeling Workflow in Labellerr (Step-by-Step)
Once your dataset is linked to the project, you can start labeling right inside Labellerr.
Follow these clear steps to go from Label → Review → Accept.
Steps:
1. Go to Projects
Open Labellerr Dashboard → Projects tab.
2. Open Your Project
Select the project you created/linked (e.g., Vehicle Object Detection).
3. Go to the Label Section
Inside the project, click Label to open the annotation interface.
4. Start Labeling
- Choose the right tool (e.g., Bounding Box).
- Draw boxes around each Car and License Plate as per your schema.
- Assign the correct class from the sidebar.
5. Save / Submit
Save the annotation for each image (as per UI button in your workspace).
6. Move to Review
Go to Review tab. The images you labeled will show up for verification.
7. Accept or Send Back
- If the annotation looks good → Accept
- If changes needed → Send back to labeling.
Step 11: Exporting Labeled Dataset from Labellerr
After labeling + review, we export annotations from Labellerr.
This script automates 3 steps: Create export → Poll status → Download & Validate.
# --- Car Dataset Export Script ---
from IPython.display import display, Markdown
import requests
import json
import time
from pathlib import Path
import logging
import traceback
import uuid
from labellerr.exceptions import LabellerrError
project_id = "magdaia_joyous_peafowl_21008"
# CONFIGURE LOGGER
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(levelname)s - %(message)s',
datefmt='%Y-%m-%d %H:%M:%S',
force=True
)
logger = logging.getLogger(__name__)
# --- GLOBAL VARS ---
downloaded_annotations = None
# Step 1: Create Export
def create_export(client, project_id, client_id, export_config):
"""Initiates an export job on the Labellerr platform."""
logger.info("Step 1: Creating export for car dataset...")
try:
res = client.create_local_export(project_id, client_id, export_config)
export_id = res["response"]["report_id"]
logger.info(f"Export created successfully. Export ID: {export_id}")
return export_id
except LabellerrError as e:
logger.error(f"Export creation failed: `{e}`")
return None
# Step 2: Poll Export Status
def poll_export_status(client, api_key, api_secret, project_id, export_id, client_id, max_wait_time=300, wait_interval=10):
"""Polls Labellerr API for export completion status."""
logger.info(f"Step 2: Polling for export completion (max {max_wait_time}s)...")
elapsed_time = 0
while elapsed_time < max_wait_time:
raw_status = client.check_export_status(
api_key=api_key,
api_secret=api_secret,
project_id=project_id,
report_ids=[export_id],
client_id=client_id
)
status_obj = None
if isinstance(raw_status, dict):
status_obj = raw_status
elif isinstance(raw_status, str):
try:
status_obj = json.loads(raw_status)
except json.JSONDecodeError:
logger.warning(f"Could not parse status string: '{raw_status}'")
if status_obj and status_obj.get('status') and len(status_obj['status']) > 0:
export_status = status_obj['status'][0]
is_completed = export_status.get('is_completed', False)
export_status_text = export_status.get('export_status', 'Unknown')
logger.info(f"Current status: '{export_status_text}' (Completed: {is_completed})")
if is_completed:
logger.info("Export completed! Proceeding to download.")
return True
elif export_status_text.lower() == "failed":
logger.error("Export failed!")
return False
time.sleep(wait_interval)
elapsed_time += wait_interval
logger.warning(f"Export timeout after {max_wait_time}s.")
return False
# Step 3: Download & Validate Export
def download_and_validate_export(client, api_key, api_secret, project_id, export_id, client_id):
"""Fetches download URL, downloads, and validates the exported data."""
logger.info("Step 3: Fetching download URL and validating data...")
try:
download_uuid = str(uuid.uuid4())
raw_download_result = client.fetch_download_url(
api_key=api_key,
api_secret=api_secret,
project_id=project_id,
uuid=download_uuid,
export_id=export_id,
client_id=client_id
)
download_obj = None
if isinstance(raw_download_result, dict):
download_obj = raw_download_result
elif isinstance(raw_download_result, str):
try:
download_obj = json.loads(raw_download_result)
except json.JSONDecodeError:
logger.warning(f"Could not parse download URL string: '{raw_download_result}'")
download_url = (
download_obj.get('url')
or download_obj.get('response', {}).get('download_url')
)
if download_url:
logger.info("Download URL fetched successfully.")
logger.info(f"DOWNLOAD LINK (expires in ~1 hour): {download_url}")
exports_dir = Path("exports")
exports_dir.mkdir(exist_ok=True)
export_file = exports_dir / f"car_dataset_export_{export_id}.json"
response = requests.get(download_url)
if response.status_code == 200:
with open(export_file, 'wb') as f:
f.write(response.content)
logger.info(f"Export downloaded to {export_file}")
# Validate JSON structure
try:
json_data = json.loads(response.content)
annotated_count = sum(
1 for ann in json_data
if ann.get("latest_answer") and len(ann["latest_answer"]) > 0
)
logger.info(f"VALIDATION: {annotated_count}/{len(json_data)} images have annotations.")
if annotated_count == 0:
logger.error(" No annotations found! Please ensure images are labeled.")
return False
else:
logger.info("Annotation data validated successfully!")
return True
except json.JSONDecodeError:
logger.error(" Invalid JSON format in downloaded export.")
return False
else:
logger.error(f"Failed to download export file. HTTP {response.status_code}")
return False
else:
logger.error(f"No download URL found. Raw response: {raw_download_result}")
return False
except Exception as e:
logger.error(f" Error during download: {e}\n{traceback.format_exc()}")
return False
# MAIN EXECUTION
if project_id:
try:
export_config = {
"export_name": "CarDatasetExport",
"export_description": "Export of all annotated car images",
"export_format": "json",
"statuses": ['review', 'r_assigned', 'client_review', 'cr_assigned', 'accepted'],
"export_destination": "local",
"question_ids": ["all"]
}
export_id = create_export(client, project_id, LABELLERR_CLIENT_ID, export_config)
if export_id:
if poll_export_status(client, LABELLERR_API_KEY, LABELLERR_API_SECRET, project_id, export_id, LABELLERR_CLIENT_ID):
download_and_validate_export(client, LABELLERR_API_KEY, LABELLERR_API_SECRET, project_id, export_id, LABELLERR_CLIENT_ID)
except Exception as e:
logger.error(f" Unexpected error in main block: {e}\n{traceback.format_exc()}")
What Each Part Does:
| Block / Line | Purpose |
|---|---|
| `logging.basicConfig(...)` | Shows clean, timestamped logs in the console |
| `create_local_export(...)` | Starts the export job on Labellerr (returns a `report_id`) |
| `poll_export_status(...)` | Re-checks export status until completed/failed |
| `fetch_download_url(...)` | Retrieves a temporary download link for the export file |
| `requests.get(download_url)` | Downloads the export into `exports/*.json` |
| Annotation validation | Confirms labeled data actually exists |
| `statuses=[...]` | Controls which workflow stages to include in the export |
| `question_ids=["all"]` | Ensures all annotation question categories are exported |
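Before converting, it can help to peek at one exported record to see the structure the next step relies on (`file_name`, `file_metadata`, `latest_answer`). A quick sketch, assuming the `exports/` folder from the download step:

```python
import json
from pathlib import Path

# Grab whichever export file the download step saved (adjust if you have several)
export_file = next(Path("exports").glob("car_dataset_export_*.json"))
with open(export_file) as f:
    data = json.load(f)

print(len(data), "records")
print(json.dumps(data[0], indent=2)[:500])  # preview the first record's structure
```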
Step 12: Installing YOLOv8 (Ultralytics)
Before we start training our object detection model, we need to install Ultralytics, which contains YOLOv8.
!python -m pip install ultralytics
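If you want to confirm the install succeeded in Colab, `ultralytics` ships a small environment check you can run:

```python
import ultralytics

# Prints the installed version plus Python / torch / CUDA environment info
ultralytics.checks()
```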
Step 13: Convert Labellerr Export (JSON) → YOLOv8 Dataset + Train/Val Split
Labellerr exports annotations in JSON format, but YOLOv8 requires images and labels in a specific directory structure along with .txt annotation files in normalized YOLO format (`class x_center y_center width height`); a sample label file is shown after the list below.
In this step, we will:
- Load the exported JSON file
- Split the dataset into train and val sets
- Convert bounding boxes into YOLO normalized coordinates
- Copy images to the correct folders
- Create annotation .txt files
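For reference, each YOLO label file holds one line per box, with all coordinates normalized to 0–1. A hypothetical two-object file (values are illustrative only) might look like this, where class 0 is Car and class 1 is License Plate:

```
0 0.512430 0.601200 0.310500 0.220800
1 0.498100 0.702350 0.092400 0.034100
```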
import json, shutil, random
from pathlib import Path
from sklearn.model_selection import train_test_split
# --- Paths ---
EXPORT_FILE = Path("/content/exports/car_dataset_export_NfKqUBdOqPEz2HPfnSw4.json")
IMAGE_SOURCE_DIR = Path("/content/sample_images") # your 10 images here
YOLO_DATA_DIR = Path("/content/yolo_Car_dataset")
# --- Classes ---
CLASS_NAMES = ["Car", "License Plate"]
CLASS_MAP = {name: i for i, name in enumerate(CLASS_NAMES)}
# --- Reset dataset folder ---
if YOLO_DATA_DIR.exists():
shutil.rmtree(YOLO_DATA_DIR)
for split in ["train", "val"]:
(YOLO_DATA_DIR / "images" / split).mkdir(parents=True, exist_ok=True)
(YOLO_DATA_DIR / "labels" / split).mkdir(parents=True, exist_ok=True)
# --- Load JSON ---
with open(EXPORT_FILE, "r") as f:
data = json.load(f)
print(f"Loaded {len(data)} items")
# --- Split Train/Val ---
train_data, val_data = train_test_split(data, test_size=0.2, random_state=42)
def convert_and_save(items, split):
for item in items:
file_name = item["file_name"]
width = item["file_metadata"]["image_width"]
height = item["file_metadata"]["image_height"]
# Copy image
src = IMAGE_SOURCE_DIR / file_name
dst = YOLO_DATA_DIR / "images" / split / file_name
if src.exists():
shutil.copy(src, dst)
else:
print(f" Missing image: {src}")
continue
# Prepare label file
label_path = YOLO_DATA_DIR / "labels" / split / f"{Path(file_name).stem}.txt"
lines = []
for ans_group in item.get("latest_answer", []):
for ann in ans_group.get("answer", []):
label = ann.get("label")
if label not in CLASS_MAP:
continue
cls_id = CLASS_MAP[label]
bbox = ann["answer"]
# YOLO normalized format
x_center = ((bbox["xmin"] + bbox["xmax"]) / 2) / width
y_center = ((bbox["ymin"] + bbox["ymax"]) / 2) / height
w = (bbox["xmax"] - bbox["xmin"]) / width
h = (bbox["ymax"] - bbox["ymin"]) / height
lines.append(f"{cls_id} {x_center:.6f} {y_center:.6f} {w:.6f} {h:.6f}")
with open(label_path, "w") as f:
f.write("\n".join(lines))
# --- Run conversion ---
convert_and_save(train_data, "train")
convert_and_save(val_data, "val")
print("\n YOLO dataset structure created successfully at:", YOLO_DATA_DIR)
print("Classes:", CLASS_MAP)
What This Script Does
| Step | Action | Explanation |
|---|---|---|
| 1 | Load JSON export | Reads annotations and metadata from Labellerr |
| 2 | Define class mapping | Assigns each object category a numerical class ID |
| 3 | Create `/images/` and `/labels/` folders | Matches the expected YOLO dataset layout |
| 4 | Split data into train & val | Ensures a proper supervised learning workflow |
| 5 | Copy images to destination folders | Prepares the dataset structure for YOLO |
| 6 | Convert bounding boxes → YOLO format | Normalizes coordinates between 0–1 |
| 7 | Write `.txt` annotation files | YOLO uses one annotation text file per image |
Resulting Dataset Structure
yolo_Car_dataset/
├─ images/
│ ├─ train/
│ └─ val/
└─ labels/
├─ train/
└─ val/
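As a quick sanity check before training, a minimal sketch like this (assuming `YOLO_DATA_DIR` from the cell above) counts the images and label files in each split:

```python
from pathlib import Path

YOLO_DATA_DIR = Path("/content/yolo_Car_dataset")

# Count images and label files per split to confirm the layout above
for split in ["train", "val"]:
    n_imgs = len(list((YOLO_DATA_DIR / "images" / split).glob("*")))
    n_lbls = len(list((YOLO_DATA_DIR / "labels" / split).glob("*.txt")))
    print(f"{split}: {n_imgs} images, {n_lbls} label files")
```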
Step 14: Convert Labellerr Annotations → YOLOv8 Labels (Function)
This helper converts Labellerr’s JSON annotations into YOLO normalized TXT labels and copies images into the correct split folders.
def convert_labellerr_to_yolo(data, split):
"""Converts Labellerr bounding box annotations to YOLO format."""
for ann in data:
image_name = ann.get("file_name")
image_width = ann.get("file_metadata", {}).get("image_width", 1)
image_height = ann.get("file_metadata", {}).get("image_height", 1)
# Copy image
source_image_path = IMAGE_SOURCE_DIR / image_name
dest_image_path = YOLO_DATA_DIR / "images" / split / image_name
if source_image_path.exists():
shutil.copy2(source_image_path, dest_image_path)
else:
print(f"Warning: Source image not found: {source_image_path}")
continue
# Label file path
label_path = YOLO_DATA_DIR / "labels" / split / f"{Path(image_name).stem}.txt"
with open(label_path, "w") as f:
for qa in ann.get("latest_answer", []): # iterate over each question (Car, License Plate, etc.)
for ans in qa.get("answer", []): # iterate over each bounding box inside that question
class_name = ans.get("label")
if class_name not in CLASS_MAP:
continue
class_id = CLASS_MAP[class_name]
box = ans.get("answer", {})
if not all(k in box for k in ["xmin", "ymin", "xmax", "ymax"]):
continue
xmin, ymin, xmax, ymax = (
float(box["xmin"]),
float(box["ymin"]),
float(box["xmax"]),
float(box["ymax"]),
)
# Convert to YOLO normalized format
x_center = ((xmin + xmax) / 2) / image_width
y_center = ((ymin + ymax) / 2) / image_height
width = (xmax - xmin) / image_width
height = (ymax - ymin) / image_height
f.write(f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}\n")
# Check for empty file
if label_path.stat().st_size == 0:
print(f"Warning: Empty label file -> {label_path.name}")
What This Function Does
| Piece | Purpose |
|---|---|
| `data` | List of Labellerr JSON items (one per image) |
| `split` | Either `"train"` or `"val"`; controls which folder images & labels go into |
| Copy image → `images/<split>/` | Places each image inside the YOLO dataset structure |
| Create TXT → `labels/<split>/` | Creates a YOLO annotation file matching each image |
| `CLASS_MAP` mapping | Converts class names like `"Car"` into numeric YOLO class IDs |
| Normalize bbox | Converts each bounding box to YOLO normalized format (`x_center y_center width height`) |
| Empty label warning | Alerts when an image has no valid annotations (good for debugging) |
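The function above is a drop-in replacement for the inline loop in Step 13; a typical call, assuming the `train_data`/`val_data` split from that step, would be:

```python
# Reuse the train/val split from Step 13
convert_labellerr_to_yolo(train_data, "train")
convert_labellerr_to_yolo(val_data, "val")
```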
Step 15: Creating dataset.yaml for YOLOv8
YOLOv8 needs a dataset.yaml file that tells it where the images & labels are located and which classes the model will learn.
The script below automatically generates it based on your dataset folder and class mapping.
import yaml
from pathlib import Path # Added import
from IPython.display import display, Markdown # Added import
# Define the dataset configuration
yaml_content = {
'path': str(YOLO_DATA_DIR.resolve()), # The absolute path to the dataset directory
'train': 'images/train', # Path to training images (relative to 'path')
'val': 'images/val', # Path to validation images (relative to 'path')
'names': {v: k for k, v in CLASS_MAP.items()} # Class names map (e.g., {0: 'Car', 1: 'License Plate'})
}
# Write the configuration to a file
yaml_file = YOLO_DATA_DIR / "dataset.yaml"
with open(yaml_file, 'w') as f:
yaml.dump(yaml_content, f, default_flow_style=False, sort_keys=False)
print(f"Created dataset configuration at '{yaml_file}'")
print("\n--- dataset.yaml content ---")
print(yaml.dump(yaml_content, sort_keys=False))
Understanding the dataset.yaml Fields
| Field | Meaning |
|---|---|
| `path` | Root directory of the YOLO dataset |
| `train` | Folder containing training images (relative to `path`) |
| `val` | Folder containing validation images (relative to `path`) |
| `names` | Class index → class name dictionary used by YOLO during training |
Example dataset.yaml (Generated Output)
Created dataset configuration at '/content/yolo_Car_dataset/dataset.yaml'
--- dataset.yaml content ---
path: /content/yolo_Car_dataset
train: images/train
val: images/val
names:
0: Car
1: License Plate
Step 16: Training YOLOv8 on Our Dataset
Now that our dataset is structured and the dataset.yaml file is ready, we can train a YOLOv8 model.
Here, we use YOLOv8m (medium variant) for better accuracy.
from ultralytics import YOLO
from pathlib import Path
# Load medium model for better accuracy
model = YOLO('yolov8m.pt')
results = model.train(
data=str(yaml_file), # Path to your dataset YAML
epochs=100, # Increase training epochs for better learning
imgsz=640,
batch=4,
freeze=0,
project='car_license_training',
name='yolov8m_finetuned'
)
print("\n YOLOv8m Object Detection Training Complete!")
What This Code Does
| Parameter | Description |
|---|---|
| `YOLO('yolov8m.pt')` | Loads the YOLOv8 Medium model (balanced accuracy & speed) |
| `data=str(yaml_file)` | Uses the dataset YAML we generated earlier |
| `epochs=100` | Number of training cycles (more epochs generally improve accuracy, up to a point) |
| `imgsz=640` | Input image resolution (the recommended default for YOLOv8) |
| `batch=4` | Number of images processed per training step |
| `freeze=0` | Allows full fine-tuning of the entire model |
| `project=` | Folder where training logs & weights are stored |
| `name=` | Name of the trained model checkpoint subfolder |
**Tip**
To get higher accuracy, you can later increase `epochs` to 150 or 200.
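Once training finishes, you can also measure validation mAP on the best checkpoint. A minimal sketch (the weights path assumes the `project`/`name` values used above, and `yaml_file` from Step 15):

```python
from ultralytics import YOLO

# Evaluate the best checkpoint on the validation split from dataset.yaml
best = YOLO("car_license_training/yolov8m_finetuned/weights/best.pt")
metrics = best.val(data=str(yaml_file))
print(metrics.box.map50)  # mAP at IoU threshold 0.50
```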
Step 17: Testing the Trained YOLOv8 Model on Validation Images (Inference)
Now that training is complete, let’s load the best weights and run inference on a few validation images.
We’ll draw colored boxes: green for license plates, yellow for cars.
from ultralytics import YOLO
from pathlib import Path
from IPython.display import display, Markdown
from PIL import Image
import cv2
import numpy as np
# Load best weights
best_weights = Path("/content/car_license_training/yolov8m_finetuned/weights/best.pt")
if best_weights.exists():
infer_model = YOLO(str(best_weights))
display(Markdown(f"Using best weights: `{best_weights}`"))
else:
raise FileNotFoundError(" best.pt not found!")
# Load validation images
val_dir = Path("/content/yolo_Car_dataset/images/val")
test_images = [p for p in val_dir.glob("*") if p.suffix.lower() in {".jpg", ".jpeg", ".png"}][:6]
if not test_images:
raise FileNotFoundError("No validation images found.")
# Inference
display(Markdown("### Running inference with lower confidence (0.10)..."))
for img_path in test_images:
preds = infer_model.predict(source=str(img_path), imgsz=640, conf=0.10, save=False, verbose=False)
result = preds[0]
img = cv2.imread(str(img_path))
if img is None:
print(f" Could not read image: {img_path}")
continue
for box in result.boxes:
x1, y1, x2, y2 = map(int, box.xyxy[0])
cls = int(box.cls[0])
conf = float(box.conf[0])
class_name = infer_model.names[cls]
label = f"{class_name} {conf:.2f}"
# Color code
if class_name.lower() in ["license plate", "plate"]:
color = (0, 255, 0) # Green for license plate
else:
color = (0, 255, 255) # Yellow for car
cv2.rectangle(img, (x1, y1), (x2, y2), color, 2)
cv2.putText(img, label, (x1, y1 - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.6, color, 2)
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
display(Image.fromarray(img_rgb))
display(Markdown("Inference complete — check if license plates appear now!"))
What This Inference Code Does
| Part | Purpose |
|---|---|
| Load `best.pt` | Uses the best checkpoint from training for inference |
| Collect validation images | Selects sample images to test predictions |
| `infer_model.predict(...)` | Runs YOLOv8 inference on each image |
| Draw bounding boxes + confidence scores | Visualizes detection results clearly |
| Color coding | Green = License Plate, Yellow = Car |
| Display results inline | Shows final annotated images directly in the notebook output |
Expected Output
Using best weights: /content/car_license_training/yolov8m_finetuned/weights/best.pt
Running inference with lower confidence (0.10)...
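To try the fine-tuned model on a new image of your own, a minimal sketch (the input filename is a placeholder):

```python
from ultralytics import YOLO

model = YOLO("/content/car_license_training/yolov8m_finetuned/weights/best.pt")

# save=True writes the annotated result under runs/detect/predict*
model.predict(source="my_street_photo.jpg", conf=0.25, save=True)
```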
Google Colab Notebook
Run the full workflow, experiment with the dataset, or retrain the model:
https://colab.research.google.com/drive/1KmontQoTchCJ9oqqGeBaZ0mxCD4eVZTU#scrollTo=2SdNLRuRVLDt
Thanks for reading! If you found this helpful, feel free to connect or leave feedback.
