SIDDHARTH PATIL

Posted on Oct 6

smart suspension- raj patil

#machinelearning #python #ai #tutorial

Pothole Detection Model - Complete Training & Implementation Guide

Dataset Acquisition
Dataset Preparation with Roboflow
Model Training Process
Understanding the .pt Weight File
YOLOv8 Implementation Code
Complete Workflow
Key Concepts Explained
Troubleshooting
Performance Metrics

1. Dataset Acquisition

1.1 Source: Kaggle

Platform: Kaggle (kaggle.com)

Dataset Selection Process:

Searched for "pothole detection dataset" on Kaggle
Selected dataset with diverse pothole images
Typical dataset characteristics:
- 500-2000+ images of roads with potholes
- Various lighting conditions
- Different road types and pothole sizes
- Mix of annotated and unannotated images

Download Process:

# Method 1: Manual Download
1. Navigate to Kaggle dataset page
2. Click "Download" button
3. Extract ZIP file to local directory

# Method 2: Kaggle API (requires an API token in ~/.kaggle/kaggle.json)
kaggle datasets download -d <owner>/<dataset-name>
unzip <dataset-name>.zip

1.2 Dataset Structure

pothole_dataset/
│
├── images/
│   ├── image001.jpg
│   ├── image002.jpg
│   └── ...
│
└── annotations/
    ├── image001.txt
    ├── image002.txt
    └── ...
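Before uploading, it can help to confirm that every image has a matching annotation file. The following is a minimal sketch that assumes the folder layout shown above; the function name `find_unpaired` and the list of image extensions are illustrative choices, not part of the original project.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match the dataset
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}

def find_unpaired(dataset_dir):
    """Return (image stems without a label file, label stems without an image)."""
    root = Path(dataset_dir)
    images = {p.stem for p in (root / "images").iterdir()
              if p.suffix.lower() in IMAGE_EXTS}
    labels = {p.stem for p in (root / "annotations").glob("*.txt")}
    return sorted(images - labels), sorted(labels - images)

if __name__ == "__main__" and Path("pothole_dataset").is_dir():
    missing_labels, orphan_labels = find_unpaired("pothole_dataset")
    print(f"Images without annotations: {len(missing_labels)}")
    print(f"Annotations without images: {len(orphan_labels)}")
```

Images listed in the first result are the ones that would need manual annotation in Roboflow.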

2. Dataset Preparation with Roboflow

2.1 Why Roboflow?

Roboflow is a computer vision platform that simplifies:

Dataset organization and management
Image annotation and labeling
Data augmentation
Format conversion (to YOLO format)
Train/Validation/Test splitting
Model training integration

2.2 Roboflow Workflow

Step 1: Create Project

Sign up at roboflow.com
Create new project: "Pothole Detection"
Select project type: Object Detection
Choose annotation group: Single Class (Pothole)

Step 2: Upload Dataset

1. Click "Upload" → Select images from Kaggle dataset
2. Roboflow automatically processes images
3. Wait for upload completion (shows progress bar)

Step 3: Annotation

If images are pre-annotated (e.g. YOLO TXT, COCO JSON, or Pascal VOC XML):
- Upload the annotation files together with the images so Roboflow can import them
- Review and verify bounding boxes
If manual annotation is needed:
- Use Roboflow's annotation tool
- Draw bounding boxes around each pothole
- Label as "pothole"
- Save annotations

Step 4: Dataset Augmentation

Applied augmentation techniques:

Preprocessing:
- Auto-Orient: Correct image orientation
- Resize: 640x640 pixels (the default YOLOv8 input size)

Augmentation:
- Rotation: ±15 degrees
- Brightness: ±25%
- Exposure: ±25%
- Blur: Up to 2px
- Flip: Horizontal

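These create multiple variations of each training image, expanding the training set 3-5x. Roboflow applies augmentations only to the training split, so validation and test images stay unmodified.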
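Geometric augmentations have to transform the annotations together with the pixels. Roboflow handles this automatically; the sketch below only illustrates the idea for a horizontal flip, assuming YOLO-style labels of the form (class, x_center, y_center, width, height) with coordinates normalized to the 0-1 range. The function name is hypothetical.

```python
def hflip_yolo_box(box):
    """Mirror a normalized YOLO box (cls, xc, yc, w, h) left-to-right.

    Only the x-center changes; width, height, and y-center are unaffected.
    """
    cls, xc, yc, w, h = box
    return (cls, 1.0 - xc, yc, w, h)

original = (0, 0.25, 0.60, 0.10, 0.20)
flipped = hflip_yolo_box(original)
print(flipped)  # (0, 0.75, 0.6, 0.1, 0.2)
```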

Step 5: Generate Dataset Version

1. Split data:
   - Train: 70%
   - Validation: 20%
   - Test: 10%

2. Export format: YOLOv8

3. Generate → Roboflow creates downloadable dataset
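Roboflow performs the split internally. For readers who want to see what a 70/20/10 split amounts to, here is a rough sketch in plain Python; the function name, the fixed seed, and the file names are assumptions made for illustration.

```python
import random

def split_dataset(items, train=0.7, val=0.2, seed=42):
    """Shuffle items and split them into train/val/test lists.

    The test split receives whatever remains after train and val.
    """
    items = list(items)
    random.Random(seed).shuffle(items)  # Seeded so the split is repeatable
    n_train = round(len(items) * train)
    n_val = round(len(items) * val)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

names = [f"image{i:03d}.jpg" for i in range(1, 101)]
train_set, val_set, test_set = split_dataset(names)
print(len(train_set), len(val_set), len(test_set))  # 70 20 10
```

Shuffling before splitting avoids putting consecutive video frames or a single road segment entirely into one split.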

Step 6: Download Training Code

Roboflow provides a ready-to-use code snippet:

from roboflow import Roboflow

rf = Roboflow(api_key="YOUR_API_KEY")
project = rf.workspace("workspace-name").project("pothole-detection")
dataset = project.version(1).download("yolov8")

3. Model Training Process

3.1 Training Script: `potholewrightfile.py`

Purpose: This Python script trains the YOLOv8 model on your custom pothole dataset.

Complete Training Code:

"""
potholewrightfile.py
Custom YOLOv8 Pothole Detection Model Training Script
"""

from ultralytics import YOLO
from roboflow import Roboflow
import os

# ====================
# 1. DATASET DOWNLOAD
# ====================
print("📥 Downloading dataset from Roboflow...")

rf = Roboflow(api_key="YOUR_ROBOFLOW_API_KEY")
project = rf.workspace("your-workspace").project("pothole-detection")
dataset = project.version(1).download("yolov8")

print(f"✅ Dataset downloaded to: {dataset.location}")

# ====================
# 2. MODEL INITIALIZATION
# ====================
print("\n🤖 Initializing YOLOv8 model...")

# Start with pre-trained YOLOv8 base model
model = YOLO('yolov8n.pt')  # Options: yolov8n/s/m/l/x (nano to extra-large)

print("✅ Model loaded successfully")

# ====================
# 3. TRAINING CONFIGURATION
# ====================
print("\n⚙️ Configuring training parameters...")

training_config = {
    'data': f'{dataset.location}/data.yaml',  # Path to dataset config
    'epochs': 100,                             # Number of full passes over the training set
    'imgsz': 640,                              # Image size (640x640)
    'batch': 16,                               # Batch size (adjust based on GPU)
    'name': 'pothole_detection',              # Experiment name
    'patience': 20,                            # Early stopping patience
    'save': True,                              # Save checkpoints
    'device': 0,                               # GPU device (0 = first GPU, 'cpu' for CPU)
    'workers': 8,                              # Number of data loader workers
    'project': 'runs/detect',                  # Project directory
    'exist_ok': True,                          # Reuse the run directory if it exists (overwrites earlier results)
    'pretrained': True,                        # Use pretrained weights
    'optimizer': 'auto',                       # Optimizer (auto/SGD/Adam/AdamW)
    'verbose': True,                           # Verbose output
    'seed': 42,                                # Random seed for reproducibility
    'deterministic': True,                     # Deterministic training
    'single_cls': False,                       # Treat as single class
    'rect': False,                             # Rectangular training
    'cos_lr': False,                           # Cosine learning rate scheduler
    'close_mosaic': 10,                        # Disable mosaic augmentation in last N epochs
    'resume': False,                           # Resume from checkpoint
    'amp': True,                               # Automatic Mixed Precision training
    'fraction': 1.0,                           # Fraction of dataset to train on
    'profile': False,                          # Profile ONNX and TensorRT speeds
    'lr0': 0.01,                               # Initial learning rate
    'lrf': 0.01,                               # Final learning rate (lr0 * lrf)
    'momentum': 0.937,                         # SGD momentum/Adam beta1
    'weight_decay': 0.0005,                    # Optimizer weight decay
    'warmup_epochs': 3.0,                      # Warmup epochs
    'warmup_momentum': 0.8,                    # Warmup initial momentum
    'warmup_bias_lr': 0.1,                     # Warmup initial bias learning rate
    'box': 7.5,                                # Box loss gain
    'cls': 0.5,                                # Class loss gain
    'dfl': 1.5,                                # DFL loss gain
    'pose': 12.0,                              # Pose loss gain (pose models only; unused for detection)
    'kobj': 2.0,                               # Keypoint objectness loss gain (pose models only)
    'label_smoothing': 0.0,                    # Label smoothing epsilon
    'nbs': 64,                                 # Nominal batch size
    'hsv_h': 0.015,                            # HSV-Hue augmentation
    'hsv_s': 0.7,                              # HSV-Saturation augmentation
    'hsv_v': 0.4,                              # HSV-Value augmentation
    'degrees': 0.0,                            # Rotation augmentation
    'translate': 0.1,                          # Translation augmentation
    'scale': 0.5,                              # Scaling augmentation
    'shear': 0.0,                              # Shear augmentation
    'perspective': 0.0,                        # Perspective augmentation
    'flipud': 0.0,                             # Flip up-down augmentation probability
    'fliplr': 0.5,                             # Flip left-right augmentation probability
    'mosaic': 1.0,                             # Mosaic augmentation probability
    'mixup': 0.0,                              # MixUp augmentation probability
    'copy_paste': 0.0,                         # Copy-paste augmentation probability
}

# ====================
# 4. START TRAINING
# ====================
print("\n🚀 Starting training process...\n")
print("="*60)

results = model.train(**training_config)

print("\n" + "="*60)
print("✅ Training completed successfully!")

# ====================
# 5. SAVE MODEL
# ====================
print("\n💾 Saving trained model...")

# The best weights are automatically saved as 'best.pt'
# Rename to 'pothole.pt' for clarity
best_model_path = 'runs/detect/pothole_detection/weights/best.pt'
output_path = 'pothole.pt'

if os.path.exists(best_model_path):
    import shutil
    shutil.copy(best_model_path, output_path)
    print(f"✅ Model saved as: {output_path}")
else:
    print("⚠️ Best model not found. Check training directory.")

# ====================
# 6. MODEL VALIDATION
# ====================
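print("\n🔍 Validating model on the validation split...")

# model.val() evaluates on the 'val' split by default;
# pass split='test' to evaluate on the held-out test split instead
validation_results = model.val()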

print("\n📊 Validation Metrics:")
print(f"mAP50: {validation_results.box.map50:.4f}")
print(f"mAP50-95: {validation_results.box.map:.4f}")
print(f"Precision: {validation_results.box.mp:.4f}")
print(f"Recall: {validation_results.box.mr:.4f}")

# ====================
# 7. EXPORT MODEL
# ====================
print("\n📦 Exporting model to different formats...")

# Export to ONNX (for deployment)
model.export(format='onnx')
print("✅ ONNX model exported")

# Export to TensorRT (for NVIDIA devices - optional)
# model.export(format='engine')

# Export to TensorFlow Lite (optional)
# model.export(format='tflite')

print("\n" + "="*60)
print("🎉 Training pipeline completed successfully!")
print("="*60)
print(f"\n📁 Trained model location: {output_path}")
print("📊 Training results: runs/detect/pothole_detection/")
print("📈 View curves in results.png, or with TensorBoard if logging was enabled: tensorboard --logdir runs/detect/pothole_detection/")

3.2 Running the Training Script

# Install required packages
pip install ultralytics roboflow opencv-python torch torchvision

# Run training
python potholewrightfile.py

3.3 Training Output

The script creates the following structure:

runs/detect/pothole_detection/
│
├── weights/
│   ├── best.pt          # Best performing model
│   └── last.pt          # Last epoch model
│
├── confusion_matrix.png  # Classification confusion matrix
├── results.csv           # Training metrics per epoch
├── results.png           # Training curves (loss, mAP, etc.)
├── F1_curve.png         # F1 score curve
├── P_curve.png          # Precision curve
├── R_curve.png          # Recall curve
├── PR_curve.png         # Precision-Recall curve
└── val_batch0_pred.jpg  # Validation predictions sample
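The per-epoch metrics in results.csv can be inspected without any plotting tools. The sketch below finds the epoch with the highest mAP50-95. It assumes the column is named `metrics/mAP50-95(B)` and that headers may be padded with spaces; check the header row of your own file, since names can differ between Ultralytics versions. The sample rows are made-up values used only to demonstrate the function.

```python
import csv
import io

def best_epoch(csv_text, metric="metrics/mAP50-95(B)"):
    """Return (epoch, value) for the row with the highest value of `metric`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Strip whitespace because some versions pad column names and values
    rows = [{k.strip(): v.strip() for k, v in row.items()} for row in reader]
    best = max(rows, key=lambda r: float(r[metric]))
    return int(float(best["epoch"])), float(best[metric])

# Illustrative data only, not real training results
sample = """epoch, metrics/mAP50(B), metrics/mAP50-95(B)
1, 0.41, 0.20
2, 0.63, 0.35
3, 0.61, 0.33
"""
print(best_epoch(sample))  # (2, 0.35)

# With a real run (path taken from the training script above):
# with open("runs/detect/pothole_detection/results.csv") as f:
#     print(best_epoch(f.read()))
```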

4. Understanding the .pt Weight File

4.1 What is a .pt File?

.pt = PyTorch file extension

Contains trained neural network weights (parameters)
Stores model architecture information
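Includes the training configuration; optimizer state is kept in intermediate checkpoints but is stripped from the final best.pt and last.pt to reduce file size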
Binary format (not human-readable)
Typical size: 6MB (nano) to 140MB (extra-large)

4.2 What's Inside pothole.pt?

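import torch

# Load the checkpoint. Ultralytics stores a pickled model object, so recent
# PyTorch versions need weights_only=False (only do this for files you trust).
# The ultralytics package must be installed for unpickling to work.
model_data = torch.load('pothole.pt', map_location='cpu', weights_only=False)

print(model_data.keys())

# Typical contents (exact keys vary by Ultralytics version):
# {
#     'model': <model object: architecture + trained weights>,
#     'optimizer': <optimizer state, None in the final stripped file>,
#     'train_args': <training configuration>,
#     'train_metrics': <final metrics such as mAP>,
#     'epoch': <epoch number, -1 once training has finished>,
#     'date': <checkpoint creation date>,
#     'version': <ultralytics package version>,
# }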

4.3 How YOLOv8 Uses pothole.pt

When you load the model:

model = YOLO('pothole.pt')

YOLOv8 does the following:

Reads file: Loads binary weights from disk
Reconstructs architecture: Builds neural network layers
Applies weights: Sets each neuron's learned parameters
Prepares for inference: Model ready to detect potholes

Think of it like a brain transplant:

yolov8n.pt = Generic brain (knows common objects)
pothole.pt = Specialized brain (expert at finding potholes)

5. YOLOv8 Implementation Code

5.1 Basic Detection Script

"""
pothole_detector.py
Real-time pothole detection using trained YOLOv8 model
"""

from ultralytics import YOLO
import cv2

# ====================
# 1. LOAD TRAINED MODEL
# ====================
print("🔄 Loading pothole detection model...")
model = YOLO('pothole.pt')  # Load your custom trained weights
print("✅ Model loaded successfully\n")

# ====================
# 2. CONFIGURATION
# ====================
CONFIDENCE_THRESHOLD = 0.5  # Minimum confidence for detection
VIDEO_SOURCE = 'road_video.mp4'  # Video file or 0 for webcam
OUTPUT_VIDEO = 'pothole_detected.mp4'
SHOW_CONFIDENCE = True
SAVE_VIDEO = True

# ====================
# 3. VIDEO CAPTURE
# ====================
cap = cv2.VideoCapture(VIDEO_SOURCE)

if not cap.isOpened():
    # Exit with a non-zero status so callers can detect the failure
    raise SystemExit("❌ Error: Cannot open video source")

# Get video properties
frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
fps = int(cap.get(cv2.CAP_PROP_FPS)) or 30  # Some sources (e.g. webcams) report 0; fall back to 30

print(f"📹 Video Info:")
print(f"   Resolution: {frame_width}x{frame_height}")
print(f"   FPS: {fps}\n")

# ====================
# 4. VIDEO WRITER (Optional)
# ====================
if SAVE_VIDEO:
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')
    out = cv2.VideoWriter(OUTPUT_VIDEO, fourcc, fps, (frame_width, frame_height))

# ====================
# 5. DETECTION LOOP
# ====================
frame_count = 0
total_potholes_detected = 0

print("🚀 Starting detection...\n")
print("Press 'q' to quit, 'p' to pause\n")

while cap.isOpened():
    ret, frame = cap.read()

    if not ret:
        print("\n📹 End of video or error reading frame")
        break

    frame_count += 1

    # ---------------------
    # Run YOLOv8 Detection
    # ---------------------
    results = model(frame, conf=CONFIDENCE_THRESHOLD)

    # Extract detection information
    detections = results[0].boxes
    num_potholes = len(detections)
    total_potholes_detected += num_potholes

    # ---------------------
    # Draw Bounding Boxes
    # ---------------------
    annotated_frame = frame.copy()

    for detection in detections:
        # Get bounding box coordinates
        x1, y1, x2, y2 = map(int, detection.xyxy[0])

        # Get confidence score
        confidence = float(detection.conf[0])

        # Get class (should be 'pothole')
        class_id = int(detection.cls[0])
        class_name = model.names[class_id]

        # Draw bounding box
        cv2.rectangle(annotated_frame, (x1, y1), (x2, y2), (0, 0, 255), 2)

        # Create label
        if SHOW_CONFIDENCE:
            label = f"{class_name}: {confidence:.2f}"
        else:
            label = class_name

        # Draw label background
        label_size, _ = cv2.getTextSize(label, cv2.FONT_HERSHEY_SIMPLEX, 0.6, 2)
        cv2.rectangle(annotated_frame, 
                     (x1, y1 - label_size[1] - 10), 
                     (x1 + label_size[0], y1), 
                     (0, 0, 255), -1)

        # Draw label text
        cv2.putText(annotated_frame, label, (x1, y1 - 5),
                   cv2.FONT_HERSHEY_SIMPLEX, 0.6, (255, 255, 255), 2)

    # ---------------------
    # Add Info Overlay
    # ---------------------
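    # Note: the total counts detections per frame, so a pothole visible in
    # several frames is counted once per frame; FPS is the source video rate
    info_text = [
        f"Frame: {frame_count}",
        f"Potholes in frame: {num_potholes}",
        f"Total detections: {total_potholes_detected}",
        f"Source FPS: {fps}"
    ]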

    y_offset = 30
    for text in info_text:
        cv2.putText(annotated_frame, text, (10, y_offset),
                   cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), 2)
        y_offset += 30

    # ---------------------
    # Display Frame
    # ---------------------
    cv2.imshow('Pothole Detection', annotated_frame)

    # Save frame to output video
    if SAVE_VIDEO:
        out.write(annotated_frame)

    # ---------------------
    # Keyboard Controls
    # ---------------------
    key = cv2.waitKey(1) & 0xFF

    if key == ord('q'):
        print("\n⏹️ Stopping detection...")
        break
    elif key == ord('p'):
        print("\n⏸️ Paused. Press any key to continue...")
        cv2.waitKey(0)

    # Print progress every 30 frames
    if frame_count % 30 == 0:
        print(f"Processed {frame_count} frames | Potholes: {total_potholes_detected}")

# ====================
# 6. CLEANUP
# ====================
cap.release()
if SAVE_VIDEO:
    out.release()
cv2.destroyAllWindows()

# ====================
# 7. SUMMARY
# ====================
print("\n" + "="*60)
print("📊 DETECTION SUMMARY")
print("="*60)
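print(f"Total frames processed: {frame_count}")
# Detections are summed over frames, not unique potholes
print(f"Total detections (summed over frames): {total_potholes_detected}")
if frame_count:  # Avoid division by zero when no frame could be read
    print(f"Average detections per frame: {total_potholes_detected/frame_count:.2f}")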
if SAVE_VIDEO:
    print(f"Output saved to: {OUTPUT_VIDEO}")
print("="*60)

5.2 Image Detection Script

"""
detect_image.py
Detect potholes in a single image
"""

from ultralytics import YOLO
import cv2

# Load model
model = YOLO('pothole.pt')

# Load image
image_path = 'road_image.jpg'
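image = cv2.imread(image_path)
if image is None:  # cv2.imread returns None instead of raising an error
    raise SystemExit(f"❌ Error: Cannot read image: {image_path}")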

# Run detection
results = model(image, conf=0.5)

# Get annotated image
annotated_image = results[0].plot()

# Display
cv2.imshow('Pothole Detection', annotated_image)
cv2.waitKey(0)
cv2.destroyAllWindows()

# Save result
cv2.imwrite('detected_potholes.jpg', annotated_image)
print("✅ Detection complete. Result saved as 'detected_potholes.jpg'")

5.3 Batch Processing Script

"""
batch_detect.py
Process multiple images from a folder
"""

from ultralytics import YOLO
import cv2
import os
from pathlib import Path

# Configuration
INPUT_FOLDER = 'input_images/'
OUTPUT_FOLDER = 'output_images/'
model = YOLO('pothole.pt')

# Create output folder
Path(OUTPUT_FOLDER).mkdir(parents=True, exist_ok=True)

# Get all images
image_files = [f for f in os.listdir(INPUT_FOLDER) 
               if f.lower().endswith(('.jpg', '.jpeg', '.png'))]

print(f"📁 Found {len(image_files)} images to process\n")

# Process each image
for idx, filename in enumerate(image_files, 1):
    print(f"Processing {idx}/{len(image_files)}: {filename}")

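    # Read image (cv2.imread returns None for unreadable files)
    image_path = os.path.join(INPUT_FOLDER, filename)
    image = cv2.imread(image_path)
    if image is None:
        print("   ⚠️ Skipped: could not read image\n")
        continue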

    # Detect
    results = model(image, conf=0.5)
    annotated = results[0].plot()

    # Save
    output_path = os.path.join(OUTPUT_FOLDER, f"detected_{filename}")
    cv2.imwrite(output_path, annotated)

    # Count detections
    num_potholes = len(results[0].boxes)
    print(f"   ✅ Detected {num_potholes} pothole(s)\n")

print("🎉 Batch processing complete!")

6. Complete Workflow

Step-by-Step Process:

1. DATASET ACQUISITION (Kaggle)
   ├── Search for pothole dataset
   ├── Download dataset (images + annotations)
   └── Extract to local folder
          ↓
2. DATASET PREPARATION (Roboflow)
   ├── Create project
   ├── Upload images
   ├── Annotate/verify annotations
   ├── Apply augmentations
   ├── Split train/val/test
   └── Export as YOLOv8 format
          ↓
3. MODEL TRAINING (potholewrightfile.py)
   ├── Download dataset from Roboflow
   ├── Initialize YOLOv8 base model
   ├── Configure training parameters
   ├── Train for 100 epochs
   ├── Validate performance
   └── Save best weights as pothole.pt
          ↓
4. MODEL DEPLOYMENT (YOLOv8 + OpenCV)
   ├── Load pothole.pt weights
   ├── Initialize video capture
   ├── Process frames in loop:
   │   ├── Read frame
   │   ├── Run YOLOv8 inference
   │   ├── Extract bounding boxes
   │   ├── Draw annotations
   │   └── Display/save results
   └── Generate detection summary

Technical Flow Diagram:

Kaggle Dataset → Roboflow Processing → YOLOv8 Training → pothole.pt
                                                              ↓
                                                         Inference
                                                              ↓
                  Video/Image Input → OpenCV → YOLOv8 → Detections

7. Key Concepts Explained

7.1 Transfer Learning

Start with yolov8n.pt (pre-trained on COCO dataset)
Fine-tune on pothole-specific images
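Model learns pothole features while reusing the general visual features (edges, textures, shapes) learned during pre-training; after fine-tuning on a single-class dataset it predicts only the pothole class, not the original COCO classes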

7.2 Data Augmentation

Creates artificial variations of training images
Reduces overfitting
Improves model generalization
Examples: rotation, brightness, flipping

7.3 Epochs

One complete pass through entire training dataset
100 epochs = model sees all training images 100 times
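More epochs = better learning (up to a point); beyond that the model starts to overfit, which is why the training config uses patience=20 to stop early when validation results stop improving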
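The interaction between epochs and early stopping can be pictured with a simplified sketch. This is not the Ultralytics implementation; it only illustrates the general idea of a patience counter, and the function name and fitness values are invented for the example.

```python
def early_stop_epoch(fitness_per_epoch, patience=20):
    """Return the 1-based epoch at which training would stop early.

    Training stops once `patience` epochs have passed without a new best
    fitness value. Returns None if that never happens.
    """
    best = float("-inf")
    best_epoch = 0
    for epoch, fitness in enumerate(fitness_per_epoch, start=1):
        if fitness > best:
            best, best_epoch = fitness, epoch
        elif epoch - best_epoch >= patience:
            return epoch
    return None

# Made-up validation scores that peak at epoch 3
scores = [0.30, 0.45, 0.50, 0.49, 0.48, 0.50, 0.47]
print(early_stop_epoch(scores, patience=3))   # 6
print(early_stop_epoch(scores, patience=20))  # None
```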

7.4 Confidence Threshold

Minimum score for detection to be considered valid
0.5 = 50% confidence
Higher threshold = fewer false positives, more missed detections
Lower threshold = more detections, more false positives
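The trade-off is easy to see on a small example. The sketch below filters a list of detections at different thresholds; the detection dictionaries and their scores are made up for illustration and do not mirror the Ultralytics result objects.

```python
def filter_by_confidence(detections, threshold=0.5):
    """Keep only detections whose confidence is at least `threshold`."""
    return [d for d in detections if d["confidence"] >= threshold]

# Hypothetical model outputs for one frame
detections = [
    {"label": "pothole", "confidence": 0.91},
    {"label": "pothole", "confidence": 0.62},
    {"label": "pothole", "confidence": 0.34},
]

for t in (0.3, 0.5, 0.7):
    kept = filter_by_confidence(detections, t)
    print(f"threshold={t}: {len(kept)} detection(s) kept")
```

Raising the threshold from 0.3 to 0.7 reduces the kept detections from three to one, which is the fewer-false-positives, more-misses behavior described above.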

7.5 Bounding Box

Rectangle drawn around detected pothole
Defined by coordinates: (x1, y1, x2, y2)
(x1, y1) = top-left corner
(x2, y2) = bottom-right corner
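The detection scripts work with pixel corners (x1, y1, x2, y2), while YOLO label files typically store a normalized center and size. A minimal sketch of the conversion in both directions, with hypothetical function names:

```python
def xyxy_to_yolo(x1, y1, x2, y2, img_w, img_h):
    """Pixel corners -> normalized (x_center, y_center, width, height)."""
    return ((x1 + x2) / 2 / img_w,
            (y1 + y2) / 2 / img_h,
            (x2 - x1) / img_w,
            (y2 - y1) / img_h)

def yolo_to_xyxy(xc, yc, w, h, img_w, img_h):
    """Normalized (x_center, y_center, width, height) -> pixel corners."""
    return ((xc - w / 2) * img_w,
            (yc - h / 2) * img_h,
            (xc + w / 2) * img_w,
            (yc + h / 2) * img_h)

# A box covering the central quarter of a 640x480 frame
print(xyxy_to_yolo(160, 120, 480, 360, 640, 480))  # (0.5, 0.5, 0.5, 0.5)
print(yolo_to_xyxy(0.5, 0.5, 0.5, 0.5, 640, 480))  # (160.0, 120.0, 480.0, 360.0)
```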

8. Troubleshooting

Common Issues:

Issue: "CUDA out of memory" error
Solution: Reduce batch size in training config (e.g., batch=8)

Issue: Low detection accuracy
Solution:

Increase training epochs
Add more diverse training images
Adjust confidence threshold
Check annotation quality

Issue: Model detects non-potholes
Solution:

Add hard negative examples to training set
Increase training epochs
Adjust confidence threshold higher

Issue: Slow inference speed
Solution:

Use smaller model (yolov8n instead of yolov8x)
Reduce input image size
Use GPU instead of CPU
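To judge whether a change actually helps, measure processing speed directly rather than relying on the source video's frame rate. The helper below is a generic timing sketch; `measure_fps` is a hypothetical name, and the commented usage assumes the `model` object from the detection script.

```python
import time

def measure_fps(process_frame, frames):
    """Run `process_frame` on each frame and return frames processed per second."""
    start = time.perf_counter()
    count = 0
    for frame in frames:
        process_frame(frame)
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")

# Stand-in workload so the sketch runs without a model
fps = measure_fps(lambda frame: sum(range(1000)), range(200))
print(f"Processed about {fps:.0f} frames per second")

# With the detector (frames would be a list of images read with OpenCV):
# fps = measure_fps(lambda f: model(f, conf=0.5, verbose=False), frames)
```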

9. Performance Metrics

Training Metrics Explained:

mAP50: Mean Average Precision at 50% IoU threshold (higher is better)
mAP50-95: mAP averaged over IoU thresholds from 50% to 95% in 5% steps (stricter than mAP50)
Precision: Percentage of correct detections out of all detections
Recall: Percentage of actual potholes that were detected
F1 Score: Harmonic mean of precision and recall
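These definitions can be made concrete with a short sketch. IoU measures the overlap between a predicted box and a ground-truth box, and precision, recall, and F1 follow from the counts of true positives, false positives, and false negatives. The function names and the example counts are illustrative, not outputs of the trained model.

```python
def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Two 10x10 boxes that overlap by half: IoU is 50 / 150
print(f"IoU: {iou((0, 0, 10, 10), (5, 0, 15, 10)):.3f}")  # IoU: 0.333

# Example counts: 8 correct detections, 2 false alarms, 4 missed potholes
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
print(f"Precision: {p:.3f}, Recall: {r:.3f}, F1: {f1:.3f}")
# Precision: 0.800, Recall: 0.667, F1: 0.727
```

At the mAP50 setting, a prediction counts as a true positive only if its IoU with a ground-truth box is at least 0.5, so the half-overlapping boxes in the example would not qualify.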

Target Performance:

mAP50: > 0.70 (Good)
Precision: > 0.75 (Good)
Recall: > 0.70 (Good)