<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Nandita Bhattacharya</title>
    <description>The latest articles on DEV Community by Nandita Bhattacharya (@nanditab35).</description>
    <link>https://dev.to/nanditab35</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3872068%2F5cf53db9-55b0-493a-977f-bdf7056392b4.jpeg</url>
      <title>DEV Community: Nandita Bhattacharya</title>
      <link>https://dev.to/nanditab35</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/nanditab35"/>
    <language>en</language>
    <item>
      <title>YOLO Evolution: Comparing YOLOv5, v11, v12, and v26 on the Cars Detection Dataset</title>
      <dc:creator>Nandita Bhattacharya</dc:creator>
      <pubDate>Sun, 12 Apr 2026 07:56:42 +0000</pubDate>
      <link>https://dev.to/nanditab35/yolo-evolution-comparing-yolov5-v11-v12-and-v26-on-the-cars-dataset-1ckj</link>
      <guid>https://dev.to/nanditab35/yolo-evolution-comparing-yolov5-v11-v12-and-v26-on-the-cars-dataset-1ckj</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhykbxi77qtwwogowa8ri.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhykbxi77qtwwogowa8ri.jpg" alt="YOLOv12 inference on Cars Detection Dataset" width="800" height="800"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Fig 1. YOLOv12 performing real-time vehicle detection on the Cars Detection Dataset.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Introduction: The Need for Speed and Precision
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;u&gt;Object Detection task:&lt;/u&gt;&lt;/strong&gt; Object Detection is a foundational challenge in Computer Vision that goes beyond simple image classification. While classification identifies what is in an image (e.g., "this is a car"), Object Detection simultaneously answers the questions of "what" and "where." It involves identifying multiple objects within a single frame, classifying each into a specific category, and pinpointing their exact locations using bounding boxes. For tasks like traffic monitoring or autonomous driving—the focus of our Cars Detection Dataset—the model must not only recognise a vehicle but also distinguish between an ambulance, a bus, or a motorcycle in a cluttered, real-time environment.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;u&gt;The "YOLO Revolution":&lt;/u&gt;&lt;/strong&gt; Before the arrival of &lt;strong&gt;YOLO (You Only Look Once)&lt;/strong&gt;, the industry standard relied on &lt;strong&gt;"two-stage" detectors&lt;/strong&gt; like &lt;strong&gt;Faster R-CNN&lt;/strong&gt;. These architectures first proposed regions of interest using a &lt;strong&gt;Region-Proposal Network (RPN)&lt;/strong&gt; and then &lt;strong&gt;classified&lt;/strong&gt; those regions in a &lt;strong&gt;second pass&lt;/strong&gt;—a process that was accurate but computationally &lt;strong&gt;expensive and slow&lt;/strong&gt;. The "YOLO Revolution" fundamentally changed this by treating object detection as a single regression problem. By passing the &lt;strong&gt;entire image&lt;/strong&gt; through a neural network &lt;strong&gt;once&lt;/strong&gt; to predict both &lt;strong&gt;bounding boxes&lt;/strong&gt; and &lt;strong&gt;class probabilities&lt;/strong&gt; simultaneously, YOLO achieved unprecedented &lt;strong&gt;inference speeds&lt;/strong&gt;. This shift from multi-stage processing to a streamlined, one-stage architecture made &lt;strong&gt;real-time AI&lt;/strong&gt; applications possible, paving the way for the &lt;strong&gt;high-speed, high-accuracy&lt;/strong&gt; versions we see today, from the &lt;strong&gt;classic YOLOv5 to the cutting-edge YOLOv26&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. The YOLO Lineage: From v5 to v26
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;u&gt;YOLOv5:&lt;/u&gt;&lt;/strong&gt; The industry's "Old Reliable." Released in 2020, YOLOv5 remains the benchmark for reliability in the computer vision community, with an architecture centred around the &lt;strong&gt;CSP-Darknet53 backbone&lt;/strong&gt;. While its out-of-the-box performance on the Cars Detection Dataset was modest (0.2574 mAP), it &lt;strong&gt;improved&lt;/strong&gt; substantially with &lt;strong&gt;fine-tuning&lt;/strong&gt; (0.6966 mAP). One of its most striking attributes in this experiment was its speed stability; whether &lt;strong&gt;original or fine-tuned&lt;/strong&gt;, it maintained a highly consistent inference latency of roughly &lt;strong&gt;4.8 to 4.9 ms&lt;/strong&gt;, proving why it remains a favourite for production environments where predictable performance is key.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;u&gt;YOLOv11 &amp;amp; YOLOv12:&lt;/u&gt;&lt;/strong&gt; YOLOv11 and YOLOv12 represent the modern frontier, moving &lt;strong&gt;beyond simple convolutional&lt;/strong&gt; stacks toward sophisticated feature aggregation. &lt;strong&gt;YOLOv11&lt;/strong&gt; proved to be the efficiency champion of this study, achieving the fastest fine-tuned &lt;strong&gt;inference speed&lt;/strong&gt; at just &lt;strong&gt;4.47 ms&lt;/strong&gt;. Meanwhile, &lt;strong&gt;YOLOv12&lt;/strong&gt; introduced a more complex architecture utilising &lt;strong&gt;Area Attention&lt;/strong&gt; and &lt;strong&gt;R-ELAN (Residual Efficient Layer Aggregation Network)&lt;/strong&gt;. These features allow the model to "focus" better on &lt;strong&gt;overlapping vehicles&lt;/strong&gt; in crowded traffic scenes. This architectural investment paid off in accuracy: YOLOv12 emerged as the overall leader in this experiment with a &lt;strong&gt;top&lt;/strong&gt; score of &lt;strong&gt;0.7402 mAP (after fine-tuning)&lt;/strong&gt;, though its sophisticated attention layers resulted in a &lt;strong&gt;slightly higher latency of 7.16 ms&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;u&gt;YOLOv26:&lt;/u&gt;&lt;/strong&gt; YOLOv26 represents a revolutionary shift in how object detection &lt;strong&gt;handles post-processing&lt;/strong&gt;. Traditionally, models require &lt;strong&gt;Non-Maximum Suppression (NMS)&lt;/strong&gt;—a separate computation step to filter out overlapping duplicate boxes—which can create bottlenecks on edge hardware. YOLOv26 is designed as an &lt;strong&gt;end-to-end, NMS-free detector&lt;/strong&gt;, aiming to produce final results directly from the network. In my results, this architecture showed a &lt;strong&gt;unique "Latency Win"&lt;/strong&gt; from &lt;strong&gt;fine-tuning&lt;/strong&gt;: while the original model clocked in at 8.20 ms, the fine-tuned version dropped significantly to 5.07 ms. Combined with a strong &lt;strong&gt;0.7104 mAP (after fine-tuning)&lt;/strong&gt;, YOLOv26 proves that removing the NMS bottleneck is a viable strategy for &lt;strong&gt;high-performance custom detection&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
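&lt;p&gt;&lt;em&gt;To make the NMS point above concrete, here is a minimal pure-Python sketch of the greedy Non-Maximum Suppression step that pre-v26 models rely on and that YOLOv26's end-to-end design removes. This is an illustration only, not Ultralytics's actual implementation:&lt;/em&gt;&lt;/p&gt;

```python
import operator

def iou(a, b):
    # Intersection-over-Union for two boxes given as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, thr=0.5):
    # Greedy NMS: repeatedly keep the highest-scoring box and discard any
    # remaining box whose overlap with it exceeds the threshold.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # keep a candidate only while its IoU with the chosen box stays at or below thr
        order = [i for i in order if operator.le(iou(boxes[best], boxes[i]), thr)]
    return keep
```

&lt;p&gt;&lt;em&gt;Every extra duplicate box makes this post-processing loop more expensive, which is exactly the per-frame cost an NMS-free head avoids.&lt;/em&gt;&lt;/p&gt;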

&lt;h3&gt;
  
  
  3. Dataset &amp;amp; Technical Setup
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;u&gt;Overview of the Cars Detection Dataset:&lt;/u&gt;&lt;/strong&gt; For this experiment, I utilised the Cars Detection Dataset, a specialised collection of imagery designed for multi-class vehicle detection. The vehicle objects to be detected are categorised into these classes: &lt;strong&gt;Ambulance, Bus, Car, Motorcycle, Truck &amp;amp; Background&lt;/strong&gt;. This dataset includes various perspectives that challenge a model's ability to generalise across different &lt;strong&gt;angles&lt;/strong&gt; and &lt;strong&gt;lighting conditions&lt;/strong&gt;. This variety is particularly useful for testing whether a model can distinguish between similar &lt;strong&gt;large vehicles&lt;/strong&gt; (e.g. Bus, Truck) or identify smaller, higher-speed targets (e.g. Motorcycles), making it an ideal playground for comparing the latest YOLO versions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;u&gt;Slight Modification to Original Dataset's data.yaml file to avoid Path Error:&lt;/u&gt;&lt;/strong&gt; A common hurdle when working with community-contributed datasets is environment-specific or package-specific configuration. 

&lt;ul&gt;
&lt;li&gt;In the original version of this dataset, the data.yaml file contained a &lt;strong&gt;path&lt;/strong&gt; variable storing the relative path of the parent directory of the split image folders (train, valid). Because of this, Ultralytics's &lt;strong&gt;model.train()&lt;/strong&gt; function raised a &lt;strong&gt;Path Error&lt;/strong&gt; while fine-tuning YOLOv5/v11/v12/v26: in the Kaggle environment, model.train() automatically treats the directory containing data.yaml as the base directory. So the path variable in data.yaml had to be &lt;strong&gt;commented out&lt;/strong&gt; for fine-tuning models in the Kaggle environment using the Ultralytics package. &lt;/li&gt;
&lt;li&gt;Additionally, there was &lt;strong&gt;no test variable&lt;/strong&gt; pointing to the test split image folder in the data.yaml of the original dataset. So, in the modified version, a &lt;strong&gt;test&lt;/strong&gt; variable is added, which enables the use of Ultralytics's &lt;strong&gt;model.val()&lt;/strong&gt; function for running inference &amp;amp; &lt;strong&gt;automatic inference metric calculation&lt;/strong&gt; on the held-out test split data.&lt;/li&gt;
&lt;li&gt;Dataset Links: &lt;a href="https://www.kaggle.com/datasets/abdallahwagih/cars-detection" rel="noopener noreferrer"&gt;Original Cars Detection Dataset&lt;/a&gt;, &lt;a href="https://www.kaggle.com/datasets/nanditab35/cars-detection?select=cars_detection" rel="noopener noreferrer"&gt;Modified Cars Detection Dataset&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;&lt;u&gt;Dataset License:&lt;/u&gt;&lt;/strong&gt; The &lt;strong&gt;original&lt;/strong&gt; Cars Detection Dataset is published under the &lt;strong&gt;Apache 2.0&lt;/strong&gt; License, a &lt;strong&gt;highly permissive license&lt;/strong&gt; that allows for modification and redistribution. In alignment with these terms, my &lt;strong&gt;modified version&lt;/strong&gt; of the dataset is also released under the &lt;strong&gt;Apache 2.0&lt;/strong&gt; License. I have maintained &lt;strong&gt;full attribution&lt;/strong&gt; to the original author, &lt;strong&gt;Abdallah Wagih&lt;/strong&gt;, and included a &lt;strong&gt;clear log&lt;/strong&gt; of the technical changes made.&lt;/li&gt;

&lt;/ul&gt;
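&lt;p&gt;&lt;em&gt;After the two changes described above, the modified data.yaml looks roughly like this. This is a sketch: the parent-directory value and folder layout are placeholders, and the class entries are unchanged from the original dataset, so check the modified dataset itself for exact values:&lt;/em&gt;&lt;/p&gt;

```yaml
# path: ./cars_detection     # commented out: this variable caused the Path Error,
#                            # since on Kaggle Ultralytics resolves the splits
#                            # relative to data.yaml's own directory automatically
train: train/images
val: valid/images
test: test/images            # newly added: enables model.val(split='test') for
                             # automatic metrics on the held-out test split

# nc / names entries unchanged from the original dataset
```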

&lt;h3&gt;
  
  
  4. The Experiment: Comparative Methodology
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;u&gt;Power of the Ultralytics Framework:&lt;/u&gt;&lt;/strong&gt; To ensure a &lt;strong&gt;consistent&lt;/strong&gt; and &lt;strong&gt;high-performance&lt;/strong&gt; benchmarking environment, all experiments were conducted using the &lt;strong&gt;Ultralytics Python package&lt;/strong&gt;. This framework has become the &lt;strong&gt;industry standard&lt;/strong&gt; for YOLO-based tasks because it provides a unified API for managing multiple model versions—from the &lt;strong&gt;legacy YOLOv5&lt;/strong&gt; to the &lt;strong&gt;State-Of-The-Art YOLOv11&lt;/strong&gt; and beyond. Using a single engine for training, validation, and inference ensured that the results were not skewed by different pre-processing or post-processing implementations. This &lt;strong&gt;streamlined approach&lt;/strong&gt; allowed fair comparison of the underlying architectures rather than the software wrappers around them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;u&gt;Benchmarking Pre-trained (Out-of-the-box) vs. Fine-tuned models:&lt;/u&gt;&lt;/strong&gt; The core of this study focuses on the performance gap between generic knowledge and specialised expertise. I first evaluated the &lt;strong&gt;Pre-trained&lt;/strong&gt; (Out-of-the-box) models, which were &lt;strong&gt;originally trained&lt;/strong&gt; on the &lt;strong&gt;COCO dataset&lt;/strong&gt;. These models were tasked with detecting vehicles in the Cars Detection Dataset. Following this baseline, I performed Fine-tuning on each version. To ensure a &lt;strong&gt;scientifically fair&lt;/strong&gt; comparison, I &lt;strong&gt;optimised hyper-parameters&lt;/strong&gt; for &lt;strong&gt;YOLOv5&lt;/strong&gt; and then kept these settings &lt;strong&gt;fixed across all&lt;/strong&gt; other YOLO versions. By holding these variables constant, the experiment isolates the &lt;strong&gt;impact of the architectural evolutions&lt;/strong&gt;—such as YOLOv12’s attention mechanisms or YOLOv26’s NMS-free design—revealing how each engine inherently handles domain adaptation under identical training conditions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;u&gt;Hardware Configuration and Latency Context:&lt;/u&gt;&lt;/strong&gt; In object detection, &lt;strong&gt;"accuracy"&lt;/strong&gt; is only &lt;strong&gt;half of the story&lt;/strong&gt;; &lt;strong&gt;"latency"&lt;/strong&gt; is &lt;strong&gt;equally critical&lt;/strong&gt; for real-world deployment. To provide a standardised context for the speed results, all tests were performed in a &lt;strong&gt;Kaggle Notebook&lt;/strong&gt; environment utilising an &lt;strong&gt;NVIDIA Tesla T4 x2 GPU&lt;/strong&gt; setup. The T4 is a widely used accelerator in cloud production environments, making these latency figures (measured in &lt;strong&gt;milliseconds per image&lt;/strong&gt;) a realistic representation of what a developer can expect in a real-world application. By keeping the hardware constant across all 8 test runs, I was able to isolate the architectural efficiency of each YOLO version.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;u&gt;Kaggle Notebooks for all Experiments:&lt;/u&gt;&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/code/nanditab35/yolov5-on-cars-dataset" rel="noopener noreferrer"&gt;Kaggle Notebook: YOLO v5 on Cars Detection Dataset&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/code/nanditab35/yolov11-on-cars-dataset" rel="noopener noreferrer"&gt;Kaggle Notebook: YOLO v11 on Cars Detection Dataset&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/code/nanditab35/yolov12-on-cars-dataset" rel="noopener noreferrer"&gt;Kaggle Notebook: YOLO v12 on Cars Detection Dataset&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/code/nanditab35/yolov26-on-cars-dataset" rel="noopener noreferrer"&gt;Kaggle Notebook: YOLO v26 on Cars Detection Dataset&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
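&lt;p&gt;&lt;em&gt;The methodology above (one engine, identical hyper-parameters, baseline evaluation followed by fine-tuning) can be sketched with the Ultralytics API as follows. The hyper-parameter values shown here are placeholders, not the tuned settings from the actual notebooks, and the weight filename in the usage note is just one of the standard Ultralytics checkpoints:&lt;/em&gt;&lt;/p&gt;

```python
# Sketch of the per-version benchmarking loop. The values below are
# illustrative placeholders, not the settings tuned on YOLOv5 for the study.
HYPERPARAMS = dict(epochs=50, imgsz=640, batch=16, patience=10)

def benchmark(weights, data_yaml):
    """Evaluate a COCO-pretrained model out of the box, then fine-tune
    with the shared hyper-parameters and evaluate again on the test split."""
    from ultralytics import YOLO  # imported lazily so the sketch stays importable
    model = YOLO(weights)
    baseline = model.val(data=data_yaml, split="test")  # out-of-the-box metrics
    model.train(data=data_yaml, **HYPERPARAMS)          # fine-tune on Cars data
    tuned = model.val(data=data_yaml, split="test")     # fine-tuned metrics
    return baseline.box.map50, tuned.box.map50          # mAP@50 before / after
```

&lt;p&gt;&lt;em&gt;Calling, say, benchmark("yolo11n.pt", "cars_detection/data.yaml") for each of the four weight files would produce the before/after mAP pairs of the kind reported in Section 5, given the same data and settings.&lt;/em&gt;&lt;/p&gt;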

&lt;h3&gt;
  
  
  5. Results &amp;amp; Visual Analysis
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;u&gt;The Master Table: Benchmarking Results:&lt;/u&gt;&lt;/strong&gt; 

&lt;ul&gt;
&lt;li&gt;For all performance evaluations, the &lt;strong&gt;last.pt&lt;/strong&gt; weights were utilised rather than best.pt. Despite keeping training &lt;strong&gt;hyper-parameters (including epochs and patience)&lt;/strong&gt; constant across all &lt;strong&gt;versions&lt;/strong&gt;, the final weights provided more &lt;strong&gt;consistent&lt;/strong&gt; and &lt;strong&gt;sensible&lt;/strong&gt; detections, suggesting that the models reached a more &lt;strong&gt;stable convergence&lt;/strong&gt; point at the end of the training cycle for this specific dataset. &lt;/li&gt;
&lt;li&gt;The quantitative results of the experiment highlight a &lt;strong&gt;massive performance leap&lt;/strong&gt; across all architectures upon &lt;strong&gt;fine-tuning&lt;/strong&gt;. Notably, &lt;strong&gt;YOLOv12&lt;/strong&gt; emerged as the &lt;strong&gt;leader in accuracy&lt;/strong&gt;, while &lt;strong&gt;YOLOv11&lt;/strong&gt; delivered the &lt;strong&gt;fastest inference speed&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model Version&lt;/th&gt;
&lt;th&gt;mAP@50 (Original)&lt;/th&gt;
&lt;th&gt;mAP@50 (Fine-Tuned)&lt;/th&gt;
&lt;th&gt;mAP@50-95 (Fine-Tuned)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;YOLOv5&lt;/td&gt;
&lt;td&gt;0.2574&lt;/td&gt;
&lt;td&gt;0.6966&lt;/td&gt;
&lt;td&gt;0.5240&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;YOLOv11&lt;/td&gt;
&lt;td&gt;0.2651&lt;/td&gt;
&lt;td&gt;0.6945&lt;/td&gt;
&lt;td&gt;0.5296&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;YOLOv12&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.2847&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.7402&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.5572&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;YOLOv26&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.2852&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.7104&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.5193&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model Version&lt;/th&gt;
&lt;th&gt;Inference Time (ms/image, Original)&lt;/th&gt;
&lt;th&gt;Inference Time (ms/image, Fine-Tuned)&lt;/th&gt;
&lt;th&gt;Speed Change (ms/image)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;YOLOv5&lt;/td&gt;
&lt;td&gt;4.91&lt;/td&gt;
&lt;td&gt;4.82&lt;/td&gt;
&lt;td&gt;-0.09 (Stable)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;YOLOv11&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;5.67&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;4.47&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;-1.20 (Faster)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;YOLOv12&lt;/td&gt;
&lt;td&gt;7.18&lt;/td&gt;
&lt;td&gt;7.16&lt;/td&gt;
&lt;td&gt;-0.02 (Stable)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;YOLOv26&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;8.20&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;5.07&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;-3.13 (Major Gain)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
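&lt;p&gt;&lt;em&gt;As a quick sanity check, the "Speed Change" column above can be recomputed directly from the two latency columns:&lt;/em&gt;&lt;/p&gt;

```python
# Recompute the "Speed Change" column from the latency table (ms/image).
latency = {
    "YOLOv5":  (4.91, 4.82),
    "YOLOv11": (5.67, 4.47),
    "YOLOv12": (7.18, 7.16),
    "YOLOv26": (8.20, 5.07),
}
# negative values mean the fine-tuned model is faster than the original
change = {m: round(fine - orig, 2) for m, (orig, fine) in latency.items()}
print(change["YOLOv26"])  # -3.13, the "Major Gain" row
```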

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;u&gt;Visual Proof: The Impact of Fine-Tuning:&lt;/u&gt;&lt;/strong&gt; To visualise the qualitative improvement, we can &lt;strong&gt;compare Confusion Matrices&lt;/strong&gt; of the &lt;strong&gt;original vs fine-tuned&lt;/strong&gt; models. In the "Out-of-the-box" matrix, there is a high concentration of &lt;strong&gt;misclassification errors&lt;/strong&gt; (many FP and also some FN), where the model either missed vehicles entirely or misidentified them as background noise. After &lt;strong&gt;fine-tuning&lt;/strong&gt;, these errors were &lt;strong&gt;significantly reduced&lt;/strong&gt;. The &lt;strong&gt;diagonal&lt;/strong&gt; of the confusion matrix (for the model fine-tuned on the Cars Detection Dataset)—representing correct predictions—became much &lt;strong&gt;more prominent&lt;/strong&gt;. [&lt;em&gt;&lt;u&gt;Note:&lt;/u&gt; The Confusion Matrix for the original model inference contained all the COCO Dataset classes. It was filtered to show the overlapping classes, but some extra COCO classes could not be removed even after filtering.&lt;/em&gt;]

&lt;ul&gt;
&lt;li&gt;For all the &lt;strong&gt;YOLO Original&lt;/strong&gt; versions (v5, v11, v12, v26) inference &lt;strong&gt;output bounding boxes&lt;/strong&gt; and &lt;strong&gt;inference metrics&lt;/strong&gt;, go to this location under output tab of the Kaggle Notebook: &lt;em&gt;&lt;u&gt;/runs/detect/Inference_Study/Original_Inference/&lt;/u&gt;&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;For all the &lt;strong&gt;YOLO Fine-tuned&lt;/strong&gt; versions (v5, v11, v12, v26) inference &lt;strong&gt;output bounding boxes&lt;/strong&gt; and &lt;strong&gt;inference metrics&lt;/strong&gt;, go to this location under output tab of the Kaggle Notebook: &lt;em&gt;&lt;u&gt;/runs/detect/Inference_Study/FineTuned_Inference/&lt;/u&gt;&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
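&lt;p&gt;&lt;em&gt;The class filtering mentioned in the note above can be illustrated with a small pure-Python helper. The exact filtering code used in the notebooks is not reproduced here; the class names below are a short excerpt, not the full 80-class COCO list:&lt;/em&gt;&lt;/p&gt;

```python
# Illustrative confusion-matrix filter: keep only the rows/columns for
# classes that overlap with this dataset (class names are an excerpt).
COCO_EXCERPT = ["person", "bicycle", "car", "motorcycle", "bus", "truck"]
SHARED = ["car", "motorcycle", "bus", "truck"]

def filter_confusion_matrix(cm, class_names, keep):
    # select the row/column indices of the classes we want to keep
    idx = [class_names.index(name) for name in keep]
    return [[cm[r][c] for c in idx] for r in idx]

# dummy 6x6 matrix standing in for a real confusion matrix
cm = [[r * 6 + c for c in range(6)] for r in range(6)]
sub = filter_confusion_matrix(cm, COCO_EXCERPT, SHARED)
print(len(sub), len(sub[0]))  # 4 4
```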

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx2jpqzef1g6c1j2alh33.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx2jpqzef1g6c1j2alh33.png" alt="Filtered Original YOLO12 CM" width="800" height="800"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Fig 2. Filtered Confusion Matrix for the Original YOLOv12 Model Inference on Cars Detection Dataset's test split.&lt;/em&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz4hobu9u5p00d9fyikbb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz4hobu9u5p00d9fyikbb.png" alt="Fine-tuned YOLO12 CM" width="800" height="600"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Fig 3. Confusion Matrix for the Fine-tuned YOLOv12 Model Inference on Cars Detection Dataset's test split.&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;u&gt;Per-Class Performance: Strengths and Challenges:&lt;/u&gt;&lt;/strong&gt; Across all YOLO versions, the "&lt;strong&gt;Car&lt;/strong&gt;" class emerged as the &lt;strong&gt;most recognisable&lt;/strong&gt; (while still incurring significant &lt;strong&gt;FP&lt;/strong&gt; and &lt;strong&gt;FN&lt;/strong&gt; errors), benefiting from a high volume of training samples. Even so, the models often struggled to distinguish a car from the background or from other large vehicles, a gap that was only bridged through the fine-tuning process. On the other hand, "&lt;strong&gt;Motorcycles&lt;/strong&gt;" proved to be the &lt;strong&gt;toughest&lt;/strong&gt; class for all the architectures (even the attention-heavy YOLOv12). This may be due to their &lt;strong&gt;smaller spatial footprint&lt;/strong&gt; and the &lt;strong&gt;higher variance&lt;/strong&gt; in their &lt;strong&gt;appearance&lt;/strong&gt; compared to cars or buses (which are more box-like). The "&lt;strong&gt;Ambulance&lt;/strong&gt;" class showed &lt;strong&gt;comparatively better&lt;/strong&gt; performance with YOLOv12 (Fine-tuned) even though it is &lt;strong&gt;not present&lt;/strong&gt; in the original COCO Dataset.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  6. Deep Dive Analysis
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;u&gt;The Fine-Tuning Jump: Bridging the Domain Gap:&lt;/u&gt;&lt;/strong&gt; The dramatic surge in performance—with mAP jumping from a baseline of &lt;strong&gt;~0.28&lt;/strong&gt; to a peak of &lt;strong&gt;~0.74&lt;/strong&gt;—highlights the &lt;strong&gt;limitations&lt;/strong&gt; of &lt;strong&gt;general-purpose pre-training&lt;/strong&gt;. While the original models were trained on the COCO dataset, which contains &lt;strong&gt;80 broad categories&lt;/strong&gt;, they &lt;strong&gt;lacked the specialisation&lt;/strong&gt; required for the nuances of the Cars Detection Dataset. 

&lt;ul&gt;
&lt;li&gt;In a general context, a "&lt;strong&gt;vehicle&lt;/strong&gt;" is often just a &lt;strong&gt;large boxy object&lt;/strong&gt;; however, for a &lt;strong&gt;traffic-specific&lt;/strong&gt; application, the model must distinguish at a more &lt;strong&gt;fine-grained level&lt;/strong&gt; (e.g., between an Ambulance and a standard Truck). &lt;strong&gt;Fine-tuning&lt;/strong&gt; allows the network to repurpose its learned features to focus on these &lt;strong&gt;critical distinctions&lt;/strong&gt;, such as medical decals or specific chassis shapes, effectively transforming a "generalist" into a "specialist" for vehicle identification.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;&lt;u&gt;The Latency Paradox: Why Fine-Tuned Models Ran Faster:&lt;/u&gt;&lt;/strong&gt;
An unexpected but fascinating result of the experiment was the "Latency Paradox," where the &lt;strong&gt;fine-tuned&lt;/strong&gt; versions of &lt;strong&gt;YOLOv11, YOLOv26&lt;/strong&gt; actually recorded &lt;strong&gt;lower latency&lt;/strong&gt; than their original counterparts. 

&lt;ul&gt;
&lt;li&gt;
&lt;u&gt;Insight&lt;/u&gt;: This speed gain is primarily due to the &lt;strong&gt;reduction&lt;/strong&gt; in the number of &lt;strong&gt;classification outputs&lt;/strong&gt; (#classes) in the detection head. The original COCO-trained models predict &lt;strong&gt;80&lt;/strong&gt; different classes, whereas the fine-tuned models predict only &lt;strong&gt;6&lt;/strong&gt;, hence less computational load on the final layers.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;&lt;u&gt;YOLOv12 vs. YOLOv26: Accuracy vs. Architectural Trade-offs&lt;/u&gt;&lt;/strong&gt;: The comparison between YOLOv12 and YOLOv26 reveals a fundamental trade-off in modern object detection design. &lt;strong&gt;YOLOv12&lt;/strong&gt; emerged as the &lt;strong&gt;accuracy&lt;/strong&gt; champion, largely due to its &lt;strong&gt;Area Attention&lt;/strong&gt; mechanism which excels at capturing global dependencies and refined spatial details. On the other hand, &lt;strong&gt;YOLOv26&lt;/strong&gt; represents a shift toward &lt;strong&gt;structural optimisation&lt;/strong&gt;. By utilising an &lt;strong&gt;NMS-free&lt;/strong&gt; design, YOLOv26 aims to eliminate the post-processing bottleneck entirely, while maintaining a decently high accuracy.&lt;/li&gt;

&lt;/ul&gt;
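&lt;p&gt;&lt;em&gt;The classification-head insight in the latency discussion above can be quantified with a back-of-envelope calculation. This sketch assumes a simplified head with 4 box-regression channels plus one score channel per class; real YOLO heads differ in detail (DFL bins, multi-scale outputs), so the number is only indicative:&lt;/em&gt;&lt;/p&gt;

```python
# Back-of-envelope: per-location output channels of a simplified YOLO head
# (4 box coordinates plus one score per class; real heads differ in detail).
def head_channels(num_classes, box_channels=4):
    return num_classes + box_channels

coco_head = head_channels(80)  # COCO-pretrained: 84 channels per location
cars_head = head_channels(6)   # fine-tuned on 6 classes: 10 channels
reduction = 1 - cars_head / coco_head  # fraction of final-layer outputs removed
print(round(reduction, 2))  # 0.88
```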

&lt;h3&gt;
  
  
  7. Conclusion &amp;amp; Future Scope
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;u&gt;Optimal Selection: Choosing the Best Model for Traffic Applications:&lt;/u&gt;&lt;/strong&gt; When selecting a model for a real-world traffic monitoring system, the choice depends on the specific priorities of the deployment environment. If the goal is maximum precision—such as identifying specific vehicle types in dense urban congestion—&lt;strong&gt;YOLOv12&lt;/strong&gt; is the clear leader, achieving a &lt;strong&gt;superior 0.7402 mAP&lt;/strong&gt; in our tests. However, for &lt;strong&gt;edge devices&lt;/strong&gt; with limited computational power where every millisecond counts, &lt;strong&gt;YOLOv11&lt;/strong&gt; offers the best balance, delivering the fastest fine-tuned inference speed of &lt;strong&gt;4.47 ms&lt;/strong&gt; while maintaining &lt;strong&gt;high accuracy&lt;/strong&gt;. While the legacy &lt;strong&gt;YOLOv5&lt;/strong&gt; remains remarkably &lt;strong&gt;stable&lt;/strong&gt;, the architectural advancements in the newer versions provide clear performance improvements that are well worth the upgrade for modern AI applications.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;u&gt;Future Scope: Enhancing Temporal Consistency with ByteTrack:&lt;/u&gt;&lt;/strong&gt; While this study focused on single-frame object detection (Image Dataset), real-world traffic monitoring is inherently video-based. A logical next step for this project is the integration of a &lt;strong&gt;multi-object tracking (MOT)&lt;/strong&gt; algorithm like &lt;strong&gt;ByteTrack&lt;/strong&gt;. By adding a &lt;strong&gt;tracking layer&lt;/strong&gt;, the system can maintain the &lt;strong&gt;identity of vehicles&lt;/strong&gt; across &lt;strong&gt;consecutive frames&lt;/strong&gt;, even during brief &lt;strong&gt;occlusions&lt;/strong&gt;. Integrating ByteTrack with the &lt;strong&gt;high-precision bounding boxes&lt;/strong&gt; of YOLOv12 would transform this detector into a comprehensive solution capable of analysing &lt;strong&gt;vehicle trajectories&lt;/strong&gt;, counting traffic flow, and detecting complex road events with much higher temporal consistency.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;u&gt;Future Scope: Testing Robustness on Aerial Datasets:&lt;/u&gt;&lt;/strong&gt; To further validate the attention-based advantages of models like YOLOv12, future research should involve &lt;strong&gt;testing on&lt;/strong&gt; an &lt;strong&gt;Aerial Dataset&lt;/strong&gt;. Aerial imagery presents a unique challenge because objects appear significantly &lt;strong&gt;smaller&lt;/strong&gt; and can be &lt;strong&gt;oriented in any direction&lt;/strong&gt;. Since &lt;strong&gt;YOLOv12&lt;/strong&gt; utilises &lt;strong&gt;Area Attention&lt;/strong&gt; to capture global context, it is hypothesised that it will hold its accuracy better than traditional CNN-based models when detecting &lt;strong&gt;tiny objects&lt;/strong&gt; from a &lt;strong&gt;top-down&lt;/strong&gt; perspective. Transitioning from road-level views to drone-based surveillance will provide a rigorous test of how these architectures scale across different spatial resolutions and altitudes.&lt;/li&gt;
&lt;/ul&gt;
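&lt;p&gt;&lt;em&gt;The ByteTrack idea sketched above maps directly onto the Ultralytics tracking mode. The weight filename and video path below are placeholders; the "bytetrack.yaml" tracker config ships with the Ultralytics package:&lt;/em&gt;&lt;/p&gt;

```python
# Future-scope sketch: feed YOLOv12 detections through ByteTrack on video.
# Weight and source names are placeholders, not the study's actual files.
def track_traffic(weights="yolo12n.pt", source="traffic.mp4"):
    from ultralytics import YOLO  # lazy import keeps the sketch importable
    model = YOLO(weights)
    # persist=True keeps track IDs stable across consecutive frames,
    # which is what gives vehicle trajectories their temporal consistency
    return model.track(source=source, tracker="bytetrack.yaml", persist=True)
```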

&lt;h3&gt;
  
  
  8. References and Acknowledgements
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;u&gt;Cars Detection Dataset (Modified):&lt;/u&gt;&lt;/strong&gt; Based on the &lt;a href="https://www.kaggle.com/datasets/abdallahwagih/cars-detection" rel="noopener noreferrer"&gt;original Cars Detection Dataset&lt;/a&gt; by &lt;a href="https://www.kaggle.com/abdallahwagih" rel="noopener noreferrer"&gt;Abdallah Wagih&lt;/a&gt; via Kaggle. Modified and redistributed under the Apache 2.0 License.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.ultralytics.com/" rel="noopener noreferrer"&gt;Ultralytics YOLO Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.ultralytics.com/models/yolo12/" rel="noopener noreferrer"&gt;Ultralytics YOLOv12 Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.ultralytics.com/models/yolo26/" rel="noopener noreferrer"&gt;Ultralytics YOLOv26 Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/pdf/2407.20892v1" rel="noopener noreferrer"&gt;An Overview of YOLOv5&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/pdf/2502.12524" rel="noopener noreferrer"&gt;An Overview of YOLOv12&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/pdf/2602.14582" rel="noopener noreferrer"&gt;An Overview of YOLOv26&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/@ultralytics" rel="noopener noreferrer"&gt;Ultralytics Official YT Channels&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Which one do you prioritise for your projects to be deployed on edge devices: the raw accuracy of YOLOv12 or the NMS-free YOLOv26? Let me know in the comments.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>computervision</category>
      <category>deeplearning</category>
      <category>objectdetection</category>
      <category>imagedata</category>
    </item>
  </channel>
</rss>
