SIDDHARTH PATIL

smart suspension- raj patil

Pothole Detection Model - Complete Training & Implementation Guide

Table of Contents

  1. Dataset Acquisition
  2. Dataset Preparation with Roboflow
  3. Model Training Process
  4. Understanding the .pt Weight File
  5. YOLOv8 Implementation Code
  6. Complete Workflow
  7. Key Concepts Explained
  8. Troubleshooting
  9. Performance Metrics

1. Dataset Acquisition

1.1 Source: Kaggle

Platform: Kaggle (kaggle.com)

Dataset Selection Process:

  • Searched for "pothole detection dataset" on Kaggle
  • Selected dataset with diverse pothole images
  • Typical dataset characteristics:
    • 500-2000+ images of roads with potholes
    • Various lighting conditions
    • Different road types and pothole sizes
    • Mix of annotated and unannotated images

Download Process:

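# Method 1: Manual download
1. Navigate to the Kaggle dataset page
2. Click the "Download" button
3. Extract the ZIP file to a local directory

# Method 2: Kaggle API (requires `pip install kaggle` and an API token in ~/.kaggle/kaggle.json)
kaggle datasets download -d <owner>/<dataset-name>
unzip <dataset-name>.zip -d pothole_dataset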

1.2 Dataset Structure

pothole_dataset/
│
├── images/
│   ├── image001.jpg
│   ├── image002.jpg
│   └── ...
│
└── annotations/
    ├── image001.txt
    ├── image002.txt
    └── ...
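
A quick sanity check for this layout is to confirm that every image has a matching annotation file. The sketch below is illustrative only: the function name find_unlabelled_images is hypothetical, and it assumes each annotation shares its image's file stem, as in the tree above. It builds a throwaway directory so it runs without the real dataset.

```python
import tempfile
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def find_unlabelled_images(dataset_root):
    """Return sorted names of images that have no matching .txt annotation."""
    root = Path(dataset_root)
    # Annotation stems, e.g. "image001" for annotations/image001.txt
    labelled = {p.stem for p in (root / "annotations").glob("*.txt")}
    return sorted(
        p.name
        for p in (root / "images").iterdir()
        if p.suffix.lower() in IMAGE_EXTENSIONS and p.stem not in labelled
    )


# Demo on a temporary directory that mimics the layout above.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "images").mkdir()
    (root / "annotations").mkdir()
    for name in ("image001.jpg", "image002.jpg"):
        (root / "images" / name).touch()
    (root / "annotations" / "image001.txt").touch()  # only one image is labelled
    missing = find_unlabelled_images(root)

print(missing)  # ['image002.jpg']
```

Pointing the function at the extracted Kaggle folder instead of the temporary directory lists the images that still need labelling.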

2. Dataset Preparation with Roboflow

2.1 Why Roboflow?

Roboflow is a computer vision platform that simplifies:

  • Dataset organization and management
  • Image annotation and labeling
  • Data augmentation
  • Format conversion (to YOLO format)
  • Train/Validation/Test splitting
  • Model training integration

2.2 Roboflow Workflow

Step 1: Create Project

  1. Sign up at roboflow.com
  2. Create new project: "Pothole Detection"
  3. Select project type: Object Detection
  4. Choose annotation group: Single Class (Pothole)

Step 2: Upload Dataset

1. Click "Upload" → Select images from Kaggle dataset
2. Roboflow automatically processes images
3. Wait for upload completion (shows progress bar)

Step 3: Annotation

  • If images are pre-annotated (YOLO, COCO, or Pascal VOC format):

    • Roboflow auto-imports annotations
    • Review and verify bounding boxes
  • If manual annotation needed:

    • Use Roboflow's annotation tool
    • Draw bounding boxes around each pothole
    • Label as "pothole"
    • Save annotations
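
Whatever format the source annotations use, the YOLOv8 export stores each box as a class id followed by a normalized center, width, and height. Roboflow performs this conversion on export; the sketch below only illustrates the arithmetic for a Pascal VOC style box given as pixel corners. The function names and the example numbers are made up for illustration.

```python
def voc_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert pixel corner coordinates to normalized YOLO (xc, yc, w, h)."""
    x_center = (xmin + xmax) / 2 / img_w
    y_center = (ymin + ymax) / 2 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return x_center, y_center, width, height


def yolo_label_line(class_id, box):
    """Format one line of a YOLO .txt label file."""
    return f"{class_id} " + " ".join(f"{value:.6f}" for value in box)


# A 200x200 px pothole box in a 640x480 image; class 0 = pothole
box = voc_to_yolo(100, 200, 300, 400, img_w=640, img_h=480)
line = yolo_label_line(0, box)
print(line)  # 0 0.312500 0.625000 0.312500 0.416667
```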

Step 4: Dataset Augmentation

Applied augmentation techniques:

Preprocessing:
- Auto-Orient: Correct image orientation
- Resize: 640x640 pixels (YOLOv8's default input size)

Augmentation:
- Rotation: ±15 degrees
- Brightness: ±25%
- Exposure: ±25%
- Blur: Up to 2px
- Flip: Horizontal

These augmentations create multiple variations of each training image, expanding the training set roughly 3-5x.
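
Geometric augmentations have to transform the bounding boxes together with the pixels. Roboflow handles this internally; as a minimal illustration (not Roboflow's implementation), a horizontal flip of a normalized YOLO box only mirrors the x-center. The helper name hflip_yolo_box and the sample box are hypothetical.

```python
def hflip_yolo_box(box):
    """Mirror a normalized YOLO box (class_id, xc, yc, w, h) left to right."""
    class_id, x_center, y_center, width, height = box
    # Only the x-center changes; width, height, and y-center are unaffected.
    return (class_id, 1.0 - x_center, y_center, width, height)


original = (0, 0.25, 0.5, 0.2, 0.3)
flipped = hflip_yolo_box(original)
print(flipped)  # (0, 0.75, 0.5, 0.2, 0.3)
```

Rotation and similar transforms need more care, because the rotated box has to be re-fitted to an axis-aligned rectangle.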

Step 5: Generate Dataset Version

1. Split data:
   - Train: 70%
   - Validation: 20%
   - Test: 10%

2. Export format: YOLOv8

3. Generate → Roboflow creates a downloadable dataset version
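
Roboflow performs the split in its UI. For readers who want to see what a 70/20/10 split amounts to, a rough equivalent in plain Python, assuming a simple seeded shuffle, might look like the following. split_dataset and the file names are hypothetical.

```python
import random


def split_dataset(items, train=0.7, val=0.2, seed=42):
    """Shuffle deterministically and cut into train/val/test lists."""
    items = sorted(items)                 # stable starting order
    random.Random(seed).shuffle(items)    # seeded, so the split is reproducible
    n_train = round(len(items) * train)
    n_val = round(len(items) * val)
    return (
        items[:n_train],
        items[n_train:n_train + n_val],
        items[n_train + n_val:],          # the remainder becomes the test set
    )


names = [f"image{i:03d}.jpg" for i in range(1, 101)]
train_set, val_set, test_set = split_dataset(names)
print(len(train_set), len(val_set), len(test_set))  # 70 20 10
```

Splitting before augmentation matters: augmented copies of one image should not end up in both the training and validation sets.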

Step 6: Download Training Code

Roboflow provides ready-to-use code snippet:

from roboflow import Roboflow

rf = Roboflow(api_key="YOUR_API_KEY")
project = rf.workspace("workspace-name").project("pothole-detection")
dataset = project.version(1).download("yolov8")

3. Model Training Process

3.1 Training Script: potholewrightfile.py

Purpose: This Python script trains the YOLOv8 model on your custom pothole dataset.

Complete Training Code:

"""
potholewrightfile.py
Custom YOLOv8 Pothole Detection Model Training Script
"""

from ultralytics import YOLO
from roboflow import Roboflow
import os

# ====================
# 1. DATASET DOWNLOAD
# ====================
print("📥 Downloading dataset from Roboflow...")

rf = Roboflow(api_key="YOUR_ROBOFLOW_API_KEY")
project = rf.workspace("your-workspace").project("pothole-detection")
dataset = project.version(1).download("yolov8")

print(f"✅ Dataset downloaded to: {dataset.location}")

# ====================
# 2. MODEL INITIALIZATION
# ====================
print("\n🤖 Initializing YOLOv8 model...")

# Start with pre-trained YOLOv8 base model
model = YOLO('yolov8n.pt')  # Options: yolov8n/s/m/l/x (nano to extra-large)

print("✅ Model loaded successfully")

# ====================
# 3. TRAINING CONFIGURATION
# ====================
print("\n⚙️ Configuring training parameters...")

training_config = {
    'data': f'{dataset.location}/data.yaml',  # Path to dataset config
    'epochs': 100,                             # Number of full passes over the training set
    'imgsz': 640,                              # Image size (640x640)
    'batch': 16,                               # Batch size (adjust based on GPU)
    'name': 'pothole_detection',              # Experiment name
    'patience': 20,                            # Early stopping patience
    'save': True,                              # Save checkpoints
    'device': 0,                               # GPU device (0 = first GPU, 'cpu' for CPU)
    'workers': 8,                              # Number of data loader workers
    'project': 'runs/detect',                  # Project directory
    'exist_ok': True,                          # Overwrite existing project
    'pretrained': True,                        # Use pretrained weights
    'optimizer': 'auto',                       # Optimizer (auto/SGD/Adam/AdamW)
    'verbose': True,                           # Verbose output
    'seed': 42,                                # Random seed for reproducibility
    'deterministic': True,                     # Deterministic training
    'single_cls': False,                       # Treat as single class
    'rect': False,                             # Rectangular training
    'cos_lr': False,                           # Cosine learning rate scheduler
    'close_mosaic': 10,                        # Disable mosaic augmentation in last N epochs
    'resume': False,                           # Resume from checkpoint
    'amp': True,                               # Automatic Mixed Precision training
    'fraction': 1.0,                           # Fraction of dataset to train on
    'profile': False,                          # Profile ONNX and TensorRT speeds
    'lr0': 0.01,                               # Initial learning rate
    'lrf': 0.01,                               # Final learning rate (lr0 * lrf)
    'momentum': 0.937,                         # SGD momentum/Adam beta1
    'weight_decay': 0.0005,                    # Optimizer weight decay
    'warmup_epochs': 3.0,                      # Warmup epochs
    'warmup_momentum': 0.8,                    # Warmup initial momentum
    'warmup_bias_lr': 0.1,                     # Warmup initial bias learning rate
    'box': 7.5,                                # Box loss gain
    'cls': 0.5,                                # Class loss gain
    'dfl': 1.5,                                # DFL loss gain
    'pose': 12.0,                              # Pose loss gain
    'kobj': 2.0,                               # Keypoint object loss gain
    'label_smoothing': 0.0,                    # Label smoothing epsilon
    'nbs': 64,                                 # Nominal batch size
    'hsv_h': 0.015,                            # HSV-Hue augmentation
    'hsv_s': 0.7,                              # HSV-Saturation augmentation
    'hsv_v': 0.4,                              # HSV-Value augmentation
    'degrees': 0.0,                            # Rotation augmentation
    'translate': 0.1,                          # Translation augmentation
    'scale': 0.5,                              # Scaling augmentation
    'shear': 0.0,                              # Shear augmentation
    'perspective': 0.0,                        # Perspective augmentation
    'flipud': 0.0,                             # Flip up-down augmentation probability
    'fliplr': 0.5,                             # Flip left-right augmentation probability
    'mosaic': 1.0,                             # Mosaic augmentation probability
    'mixup': 0.0,                              # MixUp augmentation probability
    'copy_paste': 0.0,                         # Copy-paste augmentation probability
}

# ====================
# 4. START TRAINING
# ====================
print("\n🚀 Starting training process...\n")
print("="*60)

results = model.train(**training_config)

print("\n" + "="*60)
print("✅ Training completed successfully!")

# ====================
# 5. SAVE MODEL
# ====================
print("\n💾 Saving trained model...")

# The best weights are automatically saved as 'best.pt'
# Rename to 'pothole.pt' for clarity
best_model_path = 'runs/detect/pothole_detection/weights/best.pt'
output_path = 'pothole.pt'

if os.path.exists(best_model_path):
    import shutil
    shutil.copy(best_model_path, output_path)
    print(f"✅ Model saved as: {output_path}")
else:
    print("⚠️ Best model not found. Check training directory.")

# ====================
# 6. MODEL VALIDATION
# ====================
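print("\n🔍 Validating model on the validation split...")

# model.val() evaluates the 'val' split by default; pass split='test' to score the held-out test split
validation_results = model.val()

print("\n📊 Validation Metrics:")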
print(f"mAP50: {validation_results.box.map50:.4f}")
print(f"mAP50-95: {validation_results.box.map:.4f}")
print(f"Precision: {validation_results.box.mp:.4f}")
print(f"Recall: {validation_results.box.mr:.4f}")

# ====================
# 7. EXPORT MODEL
# ====================
print("\n📦 Exporting model to different formats...")

# Export to ONNX (for deployment)
model.export(format='onnx')
print("✅ ONNX model exported")

# Export to TensorRT (for NVIDIA devices - optional)
# model.export(format='engine')

# Export to TensorFlow Lite (optional)
# model.export(format='tflite')

print("\n" + "="*60)
print("🎉 Training pipeline completed successfully!")
print("="*60)
print(f"\n📁 Trained model location: {output_path}")
print("📊 Training results: runs/detect/pothole_detection/")
print("📈 View results using: tensorboard --logdir runs/detect/pothole_detection/")
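
The lr0 and lrf settings in the config are easier to read with numbers. The sketch below assumes the linear decay that Ultralytics appears to apply when cos_lr is False, and it ignores the warmup epochs. linear_lr is a hypothetical helper, not part of the library.

```python
def linear_lr(epoch, epochs=100, lr0=0.01, lrf=0.01):
    """Learning rate at `epoch`, decaying linearly from lr0 to lr0 * lrf."""
    factor = max(1 - epoch / epochs, 0) * (1.0 - lrf) + lrf
    return lr0 * factor


for epoch in (0, 50, 100):
    print(epoch, f"{linear_lr(epoch):.6f}")
# 0 0.010000
# 50 0.005050
# 100 0.000100
```

With the values above, training starts at 0.01 and ends at 0.01 * 0.01 = 0.0001.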

3.2 Running the Training Script

# Install required packages
pip install ultralytics roboflow opencv-python torch torchvision

# Run training
python potholewrightfile.py

3.3 Training Output

The script creates the following structure:

runs/detect/pothole_detection/
│
├── weights/
│   ├── best.pt          # Best performing model
│   └── last.pt          # Last epoch model
│
├── confusion_matrix.png  # Classification confusion matrix
├── results.csv           # Training metrics per epoch
├── results.png           # Training curves (loss, mAP, etc.)
├── F1_curve.png         # F1 score curve
├── P_curve.png          # Precision curve
├── R_curve.png          # Recall curve
├── PR_curve.png         # Precision-Recall curve
└── val_batch0_pred.jpg  # Validation predictions sample

4. Understanding the .pt Weight File

4.1 What is a .pt File?

.pt = PyTorch file extension

  • Contains trained neural network weights (parameters)
  • Stores model architecture information
  • Can include optimizer state and training configuration (Ultralytics strips the optimizer state from the final best.pt and last.pt to reduce file size)
  • Binary format (not human-readable)
  • Typical size: 6MB (nano) to 140MB (extra-large)

4.2 What's Inside pothole.pt?

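import torch

# Load the raw checkpoint dictionary.
# weights_only=False is needed on recent PyTorch versions because the checkpoint
# pickles the full model object; only use it on files you trust.
# The ultralytics package must be installed so the model class can be unpickled.
ckpt = torch.load('pothole.pt', map_location='cpu', weights_only=False)

# Typical keys (the exact set depends on the Ultralytics version):
#   'model'      -> trained network (architecture + weights)
#   'optimizer'  -> optimizer state (usually None in the final, stripped best.pt)
#   'train_args' -> training configuration
#   'epoch'      -> last trained epoch number
#   'date'       -> training completion date
#   'version'    -> Ultralytics package version
print(list(ckpt.keys()))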

4.3 How YOLOv8 Uses pothole.pt

When you load the model:

model = YOLO('pothole.pt')

YOLOv8 does the following:

  1. Reads file: Loads binary weights from disk
  2. Reconstructs architecture: Builds neural network layers
  3. Applies weights: Loads the learned parameters into each layer
  4. Prepares for inference: Model ready to detect potholes

Think of it like a brain transplant:

  • yolov8n.pt = Generic brain (knows common objects)
  • pothole.pt = Specialized brain (expert at finding potholes)

5. YOLOv8 Implementation Code

5.1 Basic Detection Script

"""
pothole_detector.py
Real-time pothole detection using trained YOLOv8 model
"""

from ultralytics import YOLO
import cv2
import numpy as np
from datetime import datetime

# ====================
# 1. LOAD TRAINED MODEL
# ====================
print("🔄 Loading pothole detection model...")
model = YOLO('pothole.pt')  # Load your custom trained weights
print("✅ Model loaded successfully\n")

# ====================
# 2. CONFIGURATION
# ====================
CONFIDENCE_THRESHOLD = 0.5  # Minimum confidence for detection
VIDEO_SOURCE = 'road_video.mp4'  # Video file or 0 for webcam
OUTPUT_VIDEO = 'pothole_detected.mp4'
SHOW_CONFIDENCE = True
SAVE_VIDEO = True

# ====================
# 3. VIDEO CAPTURE
# ====================
cap = cv2.VideoCapture(VIDEO_SOURCE)

if not cap.isOpened():
    # SystemExit prints the message and exits with a non-zero status
    raise SystemExit("❌ Error: Cannot open video source")

# Get video properties
frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
fps = int(cap.get(cv2.CAP_PROP_FPS)) or 30  # Some sources (e.g. webcams) report 0; fall back to 30

print("📹 Video Info:")
print(f"   Resolution: {frame_width}x{frame_height}")
print(f"   FPS: {fps}\n")

# ====================
# 4. VIDEO WRITER (Optional)
# ====================
if SAVE_VIDEO:
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')
    out = cv2.VideoWriter(OUTPUT_VIDEO, fourcc, fps, (frame_width, frame_height))

# ====================
# 5. DETECTION LOOP
# ====================
frame_count = 0
total_potholes_detected = 0

print("🚀 Starting detection...\n")
print("Press 'q' to quit, 'p' to pause\n")

while cap.isOpened():
    ret, frame = cap.read()

    if not ret:
        print("\n📹 End of video or error reading frame")
        break

    frame_count += 1

    # ---------------------
    # Run YOLOv8 Detection
    # ---------------------
    results = model(frame, conf=CONFIDENCE_THRESHOLD)

    # Extract detection information
    detections = results[0].boxes
    num_potholes = len(detections)
    total_potholes_detected += num_potholes

    # ---------------------
    # Draw Bounding Boxes
    # ---------------------
    annotated_frame = frame.copy()

    for detection in detections:
        # Get bounding box coordinates
        x1, y1, x2, y2 = map(int, detection.xyxy[0])

        # Get confidence score
        confidence = float(detection.conf[0])

        # Get class (should be 'pothole')
        class_id = int(detection.cls[0])
        class_name = model.names[class_id]

        # Draw bounding box
        cv2.rectangle(annotated_frame, (x1, y1), (x2, y2), (0, 0, 255), 2)

        # Create label
        if SHOW_CONFIDENCE:
            label = f"{class_name}: {confidence:.2f}"
        else:
            label = class_name

        # Draw label background
        label_size, _ = cv2.getTextSize(label, cv2.FONT_HERSHEY_SIMPLEX, 0.6, 2)
        cv2.rectangle(annotated_frame, 
                     (x1, y1 - label_size[1] - 10), 
                     (x1 + label_size[0], y1), 
                     (0, 0, 255), -1)

        # Draw label text
        cv2.putText(annotated_frame, label, (x1, y1 - 5),
                   cv2.FONT_HERSHEY_SIMPLEX, 0.6, (255, 255, 255), 2)

    # ---------------------
    # Add Info Overlay
    # ---------------------
    info_text = [
        f"Frame: {frame_count}",
        f"Potholes in frame: {num_potholes}",
        f"Total detected: {total_potholes_detected}",
        f"Source FPS: {fps}"  # Frame rate of the input video, not the processing speed
    ]

    y_offset = 30
    for text in info_text:
        cv2.putText(annotated_frame, text, (10, y_offset),
                   cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), 2)
        y_offset += 30

    # ---------------------
    # Display Frame
    # ---------------------
    cv2.imshow('Pothole Detection', annotated_frame)

    # Save frame to output video
    if SAVE_VIDEO:
        out.write(annotated_frame)

    # ---------------------
    # Keyboard Controls
    # ---------------------
    key = cv2.waitKey(1) & 0xFF

    if key == ord('q'):
        print("\n⏹️ Stopping detection...")
        break
    elif key == ord('p'):
        print("\n⏸️ Paused. Press any key to continue...")
        cv2.waitKey(0)

    # Print progress every 30 frames
    if frame_count % 30 == 0:
        print(f"Processed {frame_count} frames | Potholes: {total_potholes_detected}")

# ====================
# 6. CLEANUP
# ====================
cap.release()
if SAVE_VIDEO:
    out.release()
cv2.destroyAllWindows()

# ====================
# 7. SUMMARY
# ====================
print("\n" + "="*60)
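print("📊 DETECTION SUMMARY")
print("="*60)
print(f"Total frames processed: {frame_count}")
# Detections are summed per frame, so a pothole visible in several frames is counted several times
print(f"Total potholes detected: {total_potholes_detected}")
# max(..., 1) avoids a ZeroDivisionError when no frames were read
print(f"Average potholes per frame: {total_potholes_detected / max(frame_count, 1):.2f}")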
if SAVE_VIDEO:
    print(f"Output saved to: {OUTPUT_VIDEO}")
print("="*60)

5.2 Image Detection Script

"""
detect_image.py
Detect potholes in a single image
"""

from ultralytics import YOLO
import cv2

# Load model
model = YOLO('pothole.pt')

# Load image
image_path = 'road_image.jpg'
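image = cv2.imread(image_path)
if image is None:  # cv2.imread returns None instead of raising on a bad path
    raise SystemExit(f"Cannot read image: {image_path}")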

# Run detection
results = model(image, conf=0.5)

# Get annotated image
annotated_image = results[0].plot()

# Display
cv2.imshow('Pothole Detection', annotated_image)
cv2.waitKey(0)
cv2.destroyAllWindows()

# Save result
cv2.imwrite('detected_potholes.jpg', annotated_image)
print("✅ Detection complete. Result saved as 'detected_potholes.jpg'")

5.3 Batch Processing Script

"""
batch_detect.py
Process multiple images from a folder
"""

from ultralytics import YOLO
import cv2
import os
from pathlib import Path

# Configuration
INPUT_FOLDER = 'input_images/'
OUTPUT_FOLDER = 'output_images/'
model = YOLO('pothole.pt')

# Create output folder
Path(OUTPUT_FOLDER).mkdir(parents=True, exist_ok=True)

# Get all images
image_files = [f for f in os.listdir(INPUT_FOLDER) 
               if f.lower().endswith(('.jpg', '.jpeg', '.png'))]

print(f"📁 Found {len(image_files)} images to process\n")

# Process each image
for idx, filename in enumerate(image_files, 1):
    print(f"Processing {idx}/{len(image_files)}: {filename}")

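    # Read image (cv2.imread returns None for unreadable files)
    image_path = os.path.join(INPUT_FOLDER, filename)
    image = cv2.imread(image_path)
    if image is None:
        print("   ⚠️ Could not read image, skipping\n")
        continue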

    # Detect
    results = model(image, conf=0.5)
    annotated = results[0].plot()

    # Save
    output_path = os.path.join(OUTPUT_FOLDER, f"detected_{filename}")
    cv2.imwrite(output_path, annotated)

    # Count detections
    num_potholes = len(results[0].boxes)
    print(f"   ✅ Detected {num_potholes} pothole(s)\n")

print("🎉 Batch processing complete!")

6. Complete Workflow

Step-by-Step Process:

1. DATASET ACQUISITION (Kaggle)
   ├── Search for pothole dataset
   ├── Download dataset (images + annotations)
   └── Extract to local folder
          ↓
2. DATASET PREPARATION (Roboflow)
   ├── Create project
   ├── Upload images
   ├── Annotate/verify annotations
   ├── Apply augmentations
   ├── Split train/val/test
   └── Export as YOLOv8 format
          ↓
3. MODEL TRAINING (potholewrightfile.py)
   ├── Download dataset from Roboflow
   ├── Initialize YOLOv8 base model
   ├── Configure training parameters
   ├── Train for 100 epochs
   ├── Validate performance
   └── Save best weights as pothole.pt
          ↓
4. MODEL DEPLOYMENT (YOLOv8 + OpenCV)
   ├── Load pothole.pt weights
   ├── Initialize video capture
   ├── Process frames in loop:
   │   ├── Read frame
   │   ├── Run YOLOv8 inference
   │   ├── Extract bounding boxes
   │   ├── Draw annotations
   │   └── Display/save results
   └── Generate detection summary

Technical Flow Diagram:

Kaggle Dataset → Roboflow Processing → YOLOv8 Training → pothole.pt
                                                              ↓
                                                         Inference
                                                              ↓
                  Video/Image Input → OpenCV → YOLOv8 → Detections

7. Key Concepts Explained

7.1 Transfer Learning

  • Start with yolov8n.pt (pre-trained on COCO dataset)
  • Fine-tune on pothole-specific images
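  • The model reuses the general visual features learned on COCO, but after fine-tuning it predicts only the classes in the new dataset (here, pothole), not the original COCO classes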

7.2 Data Augmentation

  • Creates artificial variations of training images
  • Reduces overfitting
  • Improves model generalization
  • Examples: rotation, brightness, flipping

7.3 Epochs

  • One complete pass through entire training dataset
  • 100 epochs = model sees all training images 100 times
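  • More epochs generally improve learning up to a point; beyond it the model starts to overfit, which is why the training config sets patience=20 for early stopping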
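
The patience setting from the training config stops training once the validation score has not improved for that many epochs. The following is a simplified sketch of that idea, not the Ultralytics implementation; early_stop_epoch and the scores are hypothetical.

```python
def early_stop_epoch(fitness_per_epoch, patience=20):
    """Return the epoch index at which training would stop, or None."""
    best_fitness = float("-inf")
    best_epoch = 0
    for epoch, fitness in enumerate(fitness_per_epoch):
        if fitness > best_fitness:
            best_fitness, best_epoch = fitness, epoch
        # Stop once `patience` epochs have passed without a new best score.
        if epoch - best_epoch >= patience:
            return epoch
    return None


scores = [0.30, 0.45, 0.52, 0.51, 0.50, 0.52, 0.49]
stop = early_stop_epoch(scores, patience=3)
print(stop)  # 5 (best score was at epoch 2, and 3 epochs passed without improvement)
```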

7.4 Confidence Threshold

  • Minimum score for detection to be considered valid
  • 0.5 = 50% confidence
  • Higher threshold = fewer false positives, more missed detections
  • Lower threshold = more detections, more false positives
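
The trade-off is easy to see on a handful of detections. In this minimal sketch the boxes and scores are invented, and filter_by_confidence is a hypothetical helper that mimics what the conf argument does at inference time.

```python
def filter_by_confidence(detections, threshold):
    """Keep only detections whose confidence is at least `threshold`."""
    return [d for d in detections if d["conf"] >= threshold]


detections = [
    {"box": (34, 50, 120, 110), "conf": 0.91},
    {"box": (200, 80, 260, 130), "conf": 0.62},
    {"box": (310, 40, 350, 70), "conf": 0.48},
    {"box": (15, 200, 60, 240), "conf": 0.30},
]

counts = {t: len(filter_by_confidence(detections, t)) for t in (0.25, 0.5, 0.8)}
print(counts)  # {0.25: 4, 0.5: 2, 0.8: 1}
```

Raising the threshold from 0.25 to 0.8 drops three of the four boxes; whether those were false positives or real potholes depends on the data, which is why the threshold is worth tuning on validation images.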

7.5 Bounding Box

  • Rectangle drawn around detected pothole
  • Defined by coordinates: (x1, y1, x2, y2)
  • (x1, y1) = top-left corner
  • (x2, y2) = bottom-right corner
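
Boxes in this corner format are compared using Intersection over Union (IoU), the overlap measure behind the mAP metrics. A minimal sketch, where the function name iou and the sample boxes are illustrative:

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle (0 if the boxes do not overlap)
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


overlap = iou((0, 0, 10, 10), (5, 5, 15, 15))
print(f"{overlap:.4f}")  # 0.1429, i.e. 25 / 175
```

A prediction with IoU of at least 0.5 against a ground-truth box counts as a match when computing mAP50.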

8. Troubleshooting

Common Issues:

Issue: "CUDA out of memory" error
Solution: Reduce batch size in training config (e.g., batch=8)

Issue: Low detection accuracy
Solution:

  • Increase training epochs
  • Add more diverse training images
  • Adjust confidence threshold
  • Check annotation quality

Issue: Model detects non-potholes
Solution:

  • Add hard negative examples to training set
  • Increase training epochs
  • Adjust confidence threshold higher

Issue: Slow inference speed
Solution:

  • Use smaller model (yolov8n instead of yolov8x)
  • Reduce input image size
  • Use GPU instead of CPU
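
To check whether any of these changes helps, measure throughput before and after. A minimal timing sketch, where measure_fps is a hypothetical helper and fake_inference is a stand-in for a call such as model(frame):

```python
import time


def measure_fps(process_frame, frames):
    """Return frames processed per second for the given callable."""
    start = time.perf_counter()
    count = 0
    for frame in frames:
        process_frame(frame)
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")


def fake_inference(frame):
    # Hypothetical stand-in for real inference; sleeps about 1 ms per frame
    time.sleep(0.001)


fps = measure_fps(fake_inference, range(20))
print(f"{fps:.1f} frames/s")
```

For real measurements, run a few warm-up frames first, since the first inference calls are typically slower than the rest.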

9. Performance Metrics

Training Metrics Explained:

  • mAP50: Mean Average Precision at 50% IoU threshold (higher is better)
  • mAP50-95: mAP averaged over IoU thresholds from 0.50 to 0.95 in steps of 0.05 (stricter)
  • Precision: Percentage of correct detections out of all detections
  • Recall: Percentage of actual potholes that were detected
  • F1 Score: Harmonic mean of precision and recall
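
Precision, recall, and F1 follow directly from counts of true positives (TP), false positives (FP), and false negatives (FN). A short worked example with made-up counts; detection_metrics is a hypothetical helper:

```python
def detection_metrics(tp, fp, fn):
    """Return (precision, recall, f1) from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


# 8 potholes found correctly, 2 false alarms, 4 potholes missed
precision, recall, f1 = detection_metrics(tp=8, fp=2, fn=4)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
# precision=0.800 recall=0.667 f1=0.727
```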

Target Performance:

  • mAP50: > 0.70 (Good)
  • Precision: > 0.75 (Good)
  • Recall: > 0.70 (Good)
