DEV Community

Harris Bashir

Building a Production-Ready Traffic Violation Detection System with Computer Vision

Traffic monitoring and violation detection is a classic computer vision problem that looks deceptively simple but becomes complex very quickly in real-world conditions. Variations in lighting, camera angles, occlusions, vehicle density, and inconsistent road markings make rule-based approaches unreliable at scale.

In this article, I’ll walk through how I designed and implemented an end-to-end traffic violation detection system using modern computer vision techniques. The goal was not just to detect vehicles, but to track them across frames, understand movement patterns, and identify violations in a way that could realistically work in production.

Problem Overview

The core problem was to automatically detect and analyse vehicle behaviour from video streams to identify traffic violations such as illegal turns, lane violations, or restricted-area movement.

Key challenges included:

  • Detecting vehicles accurately in crowded scenes
  • Maintaining consistent tracking across frames
  • Handling partial occlusion and fast movement
  • Designing a pipeline that could scale beyond a single video

This required more than just object detection; it needed tracking, spatial reasoning, and system-level thinking.

System Architecture

At a high level, the system consists of five stages:

  1. Video ingestion
  2. Vehicle detection
  3. Multi-object tracking
  4. Violation logic & analytics
  5. Visualisation & reporting

Each stage was designed to be modular so components could be improved independently.
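To illustrate the modular layout, the stages can be treated as interchangeable functions applied in order. The stage bodies below are stand-ins, not the project's actual code; real ones would wrap YOLO, DeepSORT, and the rule logic:

```python
from dataclasses import dataclass, field

@dataclass
class FrameResult:
    frame_id: int
    detections: list = field(default_factory=list)
    tracks: list = field(default_factory=list)
    violations: list = field(default_factory=list)

def run_pipeline(frames, stages):
    """Run each frame through the stages in order; any stage can be
    swapped out (e.g. a different detector) without touching the rest."""
    results = []
    for i, frame in enumerate(frames):
        result = FrameResult(frame_id=i)
        for stage in stages:
            result = stage(frame, result)
        results.append(result)
    return results

# Stand-in stages for illustration only.
def detect_stage(frame, r):
    r.detections = list(frame)  # pretend the frame already carries boxes
    return r

def track_stage(frame, r):
    r.tracks = [{"id": j, "box": b} for j, b in enumerate(r.detections)]
    return r

def violation_stage(frame, r):
    # toy rule: flag any track whose box starts right of x = 100
    r.violations = [t["id"] for t in r.tracks if t["box"][0] > 100]
    return r

results = run_pipeline([[(10, 5, 40, 30)], [(120, 5, 160, 30)]],
                       [detect_stage, track_stage, violation_stage])
```

Because each stage only reads and writes the shared `FrameResult`, swapping the detector or tracker never touches the violation logic.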

Vehicle Detection

For detection, I used YOLO-based models due to their balance of speed and accuracy in real-time scenarios.

Key decisions:

  • Fine-tuned YOLO models for vehicle classes
  • Used SAHI (Slicing Aided Hyper Inference) to improve detection accuracy on high-resolution frames
  • Balanced inference speed with recall to avoid missing fast-moving vehicles

YOLO provided reliable bounding boxes even in moderately dense traffic, while SAHI helped with smaller vehicles at distance.
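To show what slicing actually does, here is the tile geometry in plain Python: the frame is cut into overlapping tiles, each tile is run through the detector at full resolution, and tile-local boxes are shifted back into full-frame coordinates. In practice the `sahi` package handles this (including merging duplicate detections in the overlaps); this is a simplified sketch of the idea only:

```python
def slice_windows(width, height, slice_size=640, overlap=0.2):
    """Return (x0, y0) origins of overlapping tiles covering the frame."""
    step = int(slice_size * (1 - overlap))
    xs = list(range(0, max(width - slice_size, 0) + 1, step)) or [0]
    ys = list(range(0, max(height - slice_size, 0) + 1, step)) or [0]
    # make sure the last tiles reach the image borders
    if xs[-1] + slice_size < width:
        xs.append(width - slice_size)
    if ys[-1] + slice_size < height:
        ys.append(height - slice_size)
    return [(x, y) for y in ys for x in xs]

def to_full_frame(box, origin):
    """Shift a tile-local box (x1, y1, x2, y2) back into full-frame coords."""
    x0, y0 = origin
    x1, y1, x2, y2 = box
    return (x1 + x0, y1 + y0, x2 + x0, y2 + y0)

# A 1920x1080 frame with 640px tiles and 20% overlap yields 8 tiles.
windows = slice_windows(1920, 1080)
```

Because each 640px tile is inferred at native resolution, a distant vehicle that would shrink to a few pixels in a downscaled full frame stays large enough for the detector to find.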

Multi-Object Tracking

Detection alone isn’t enough; violations require understanding movement over time.

For tracking, I used DeepSORT, which combines:

  • Kalman filtering for motion prediction
  • Appearance embeddings for identity consistency

This allowed the system to:

  • Assign unique IDs to vehicles
  • Track them across frames
  • Handle temporary occlusions reasonably well

Tracking stability was critical, as even small ID switches can invalidate violation logic.
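To make the ID-assignment step concrete, here is a deliberately simplified stand-in: a greedy IoU matcher that keeps IDs alive across short gaps. Real DeepSORT layers Kalman motion prediction, appearance embeddings, and optimal assignment on top of this basic idea; this sketch is for illustration, not the project's tracker:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

class SimpleTracker:
    """Greedy IoU tracker: a stripped-down stand-in for DeepSORT
    (no Kalman prediction, no appearance embeddings)."""
    def __init__(self, iou_thresh=0.3, max_age=5):
        self.tracks = {}  # id -> (last_box, missed_frames)
        self.next_id = 0
        self.iou_thresh = iou_thresh
        self.max_age = max_age

    def update(self, boxes):
        assigned, unmatched = {}, list(boxes)
        for tid, (tbox, missed) in list(self.tracks.items()):
            best = max(unmatched, key=lambda b: iou(tbox, b), default=None)
            if best is not None and iou(tbox, best) >= self.iou_thresh:
                self.tracks[tid] = (best, 0)       # matched: refresh track
                assigned[tid] = best
                unmatched.remove(best)
            elif missed + 1 > self.max_age:
                del self.tracks[tid]               # stale: drop the ID
            else:
                self.tracks[tid] = (tbox, missed + 1)  # occluded: keep ID alive
        for box in unmatched:                      # unmatched boxes: new IDs
            self.tracks[self.next_id] = (box, 0)
            assigned[self.next_id] = box
            self.next_id += 1
        return assigned

tracker = SimpleTracker()
first = tracker.update([(0, 0, 10, 10)])
second = tracker.update([(2, 0, 12, 10)])  # small shift keeps the same ID
```

The `max_age` grace period is what lets a briefly occluded vehicle keep its ID instead of triggering a switch that would corrupt the violation logic downstream.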

Violation Detection Logic

Once vehicles were reliably tracked, the next challenge was defining violation rules.

Rather than hardcoding pixel-based rules, I implemented:

  • Region-based logic (entry/exit zones)
  • Directional flow analysis
  • Temporal thresholds to reduce false positives

For example:

  • Vehicles entering restricted zones were flagged only after consistent tracking
  • Sudden detection spikes were ignored unless sustained across frames

This approach made the system more robust to noise and camera artefacts.
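The zone-plus-temporal-threshold idea fits in a few lines: a track is flagged only after its centre stays inside the restricted polygon for `min_frames` consecutive frames. The polygon coordinates and threshold below are illustrative, not the deployed configuration:

```python
def in_zone(point, polygon):
    """Ray-casting point-in-polygon test for a restricted zone."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal ray
            if x < (x2 - x1) * (y - y1) / (y2 - y1) + x1:
                inside = not inside
    return inside

class ZoneViolationDetector:
    """Flag a track only after `min_frames` consecutive in-zone frames,
    the temporal threshold that filters detection noise."""
    def __init__(self, zone, min_frames=10):
        self.zone = zone
        self.min_frames = min_frames
        self.counters = {}

    def update(self, track_centers):  # {track_id: (cx, cy)}
        flagged = []
        for tid, center in track_centers.items():
            if in_zone(center, self.zone):
                self.counters[tid] = self.counters.get(tid, 0) + 1
                if self.counters[tid] == self.min_frames:
                    flagged.append(tid)
            else:
                self.counters[tid] = 0  # leaving the zone resets the count
        return flagged

zone = [(0, 0), (100, 0), (100, 100), (0, 100)]
detector = ZoneViolationDetector(zone, min_frames=3)
hits = [detector.update({7: (50, 50)}) for _ in range(3)]
```

A one-frame false detection inside the zone never reaches the threshold, which is exactly the "sustained across frames" behaviour described above.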

Data Pipeline & Performance Considerations

To keep the system production-ready:

  • Frames were processed in batches where possible
  • Inference and tracking were decoupled
  • Intermediate metadata (IDs, coordinates, timestamps) was stored for later analysis

Performance trade-offs were carefully managed:

  • Higher resolution improved detection but increased latency
  • Tracking stability was prioritised over raw FPS for accuracy

These decisions are often overlooked but matter significantly in real deployments.
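One straightforward way to decouple inference from tracking is a bounded queue between a detector thread and a tracker thread, so a slow tracking step never stalls the GPU. This sketch uses stand-in detections rather than a real model:

```python
import queue
import threading

def detector_worker(frames, out_q):
    """Stand-in for model inference; pushes per-frame detections."""
    for i, _frame in enumerate(frames):
        detections = [(i, "vehicle")]  # placeholder for YOLO output
        out_q.put((i, detections))
    out_q.put(None)  # sentinel: no more frames

def tracker_worker(in_q, results):
    """Consumes detections at its own pace, independent of inference."""
    while True:
        item = in_q.get()
        if item is None:
            break
        frame_id, detections = item
        results.append((frame_id, len(detections)))

q = queue.Queue(maxsize=8)  # bounded: applies back-pressure if tracking lags
results = []
producer = threading.Thread(target=detector_worker, args=(range(5), q))
consumer = threading.Thread(target=tracker_worker, args=(q, results))
producer.start(); consumer.start()
producer.join(); consumer.join()
```

The bounded `maxsize` is the important detail: if tracking falls behind, inference blocks instead of buffering unbounded frames in memory.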

Visualisation & Outputs

To make the system usable:

  • Violations were overlaid directly on video output
  • Vehicle IDs and paths were visualised
  • Structured outputs were generated for downstream analytics

This made it easier to validate system behaviour and explain results to non-technical stakeholders.
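For the structured outputs, a flat per-violation record serialises cleanly to JSON for downstream analytics. The schema below is illustrative, not the repository's exact format:

```python
import json
from datetime import datetime, timezone

def violation_event(track_id, violation_type, box, frame_id, fps=30):
    """Serialise one violation as a structured record for analytics."""
    return {
        "track_id": track_id,
        "type": violation_type,
        "box": list(box),                            # (x1, y1, x2, y2)
        "frame": frame_id,
        "video_time_s": round(frame_id / fps, 2),    # position in the clip
        "logged_at": datetime.now(timezone.utc).isoformat(),
    }

event = violation_event(17, "restricted_zone", (420, 310, 510, 380), 900)
print(json.dumps(event))
```

Keeping records flat like this means they drop straight into a CSV, a database table, or a dashboard without further transformation.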

Key Challenges & Learnings

Some of the most important lessons from this project:

  • Detection accuracy is only half the problem; tracking quality matters just as much
  • Real-world video is noisy, and systems must tolerate imperfect data
  • Simple rules often outperform complex models when designed carefully
  • Modular design makes iteration significantly easier

These learnings influenced how I approach applied AI systems beyond computer vision.

Final Thoughts

This project reinforced the importance of engineering judgement when building applied AI systems. Models alone don’t solve problems; thoughtful system design, realistic assumptions, and careful trade-offs are what make solutions viable in production.

The full implementation, including model setup and pipeline code, is available on GitHub:

👉 GitHub Repository: CLICK HERE

Top comments (2)

Jazib Raja

Excellent write-up 👏
This is a great example of applied computer vision done right. I really appreciated how you went beyond model selection and focused on system-level design — especially the emphasis on tracking stability, region-based violation logic, and production trade-offs.
The discussion around decoupling detection and tracking, handling noisy real-world video, and prioritising robustness over raw FPS reflects real deployment experience, which is often missing in similar articles. The use of SAHI with YOLO and DeepSORT, combined with pragmatic rule design, makes this feel genuinely production-ready rather than a demo.
Overall, a very well-structured and insightful breakdown of an end-to-end traffic violation detection system. Thanks for sharing both the technical decisions and the lessons learned — this will be valuable for anyone building real-world CV pipelines.

Harris Bashir

Thank you, I really appreciate the thoughtful feedback.

I’m glad the system-level focus came through; that was a deliberate choice. In my experience, most real-world issues don’t come from model selection alone but from how detection, tracking, and rule logic interact under noisy conditions.

Decoupling detection and tracking and prioritising stability over raw FPS made a noticeable difference in reliability, especially when dealing with occlusions and inconsistent video quality. Using SAHI alongside YOLO helped with recall at distance, but it was the pragmatic rule design and temporal filtering that ultimately reduced false positives.

Thanks again for taking the time to engage with the article. I’m happy it resonated, and I hope it proves useful for others building production CV pipelines.