🚀 Simplifying FastRTC Detector with Ultralytics Best Practices

“Simplicity is the ultimate sophistication.” – Steve Jobs

When you build an ML pipeline, complexity creeps in fast.
That’s what happened to our FastRTC detector, a YOLO-based module that combines detection, tracking, and OCR.

It worked. But it was over-engineered.

🔥 The Problem

The original detector had:

Tracking spaghetti 🍝
Custom GPU/CPU optimizations nobody could maintain
Complicated caching and cleanup logic
Hard-to-debug annotation code

Result? More time fixing than time shipping.

✨ The Fix: Trust YOLO

Instead of reinventing everything, we simplified using Ultralytics’ own best practices.

✅ Tracking

if self.enable_tracking:
    results = self.model.track(
        frame,
        conf=self.confidence_threshold,
        persist=True,
        tracker=f"{self.tracker_type}.yaml",
        verbose=False
    )
else:
    results = self.model(frame, conf=self.confidence_threshold, verbose=False)
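The f-string in the tracking call assumes `self.tracker_type` names a tracker config that Ultralytics can find. A minimal sketch of a guard, assuming only the two bundled configs (`bytetrack.yaml` and `botsort.yaml`) are in use. The helper name `tracker_config` is hypothetical and not part of the original detector:

```python
# Hypothetical helper: validate the tracker name before it reaches model.track().
BUILTIN_TRACKERS = {"bytetrack", "botsort"}  # configs bundled with Ultralytics


def tracker_config(tracker_type: str) -> str:
    """Return the YAML config name for a built-in tracker, or fail early."""
    name = tracker_type.strip().lower()
    if name not in BUILTIN_TRACKERS:
        raise ValueError(
            f"Unknown tracker {tracker_type!r}; expected one of {sorted(BUILTIN_TRACKERS)}"
        )
    return f"{name}.yaml"


print(tracker_config("ByteTrack"))  # bytetrack.yaml
```

Failing at configuration time tends to be easier to debug than a missing-file error on the first frame. If you pass a path to a custom YAML file instead, this check would need to be relaxed.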

✅ Results Extraction

result = results[0]
boxes = result.boxes
masks = getattr(result, 'masks', None)

track_ids = None
# boxes.id always exists as an attribute, but it is None when no track IDs are assigned
if boxes is not None and boxes.id is not None:
    track_ids = boxes.id.cpu().numpy().astype(int)
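In Ultralytics, `boxes.id` is `None` whenever no track IDs are assigned (for example, on a plain prediction call), so ID extraction has to handle that case. A small sketch of a helper that does so. The name `extract_track_ids` is hypothetical, and the `SimpleNamespace` objects are stand-ins for `result.boxes` so the snippet runs without a model:

```python
from array import array
from types import SimpleNamespace
from typing import List, Optional


def extract_track_ids(boxes) -> Optional[List[int]]:
    """Return track IDs as plain ints, or None when nothing is tracked."""
    ids = getattr(boxes, "id", None) if boxes is not None else None
    if ids is None:
        return None
    # .tolist() is available on torch tensors and NumPy arrays alike
    return [int(t) for t in ids.tolist()]


# Stand-ins for result.boxes (assumption: only the .id attribute matters here).
untracked = SimpleNamespace(id=None)
tracked = SimpleNamespace(id=array("f", [1.0, 2.0, 5.0]))

print(extract_track_ids(untracked))  # None
print(extract_track_ids(tracked))    # [1, 2, 5]
```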

✅ Annotation

annotated_frame = result.plot(line_width=2, conf=True, labels=True)

if self.enable_tracking and track_ids is not None:
    for i, track_id in enumerate(track_ids):
        cv2.putText(annotated_frame, f"ID:{track_id}", ...)

✅ OCR Caching

if track_id is not None and track_id in self.tracked_objects:
    detection["ocr"] = self.tracked_objects[track_id]["ocr_result"]
else:
    detection["ocr"] = self.ocr_pipeline.extract_text_from_region(frame, bbox, mask)
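The excerpt shows only the read path of the cache. A self-contained sketch of the full read-and-write cycle, assuming the OCR result is stored the first time a track ID is seen. The class `TrackOcrCache`, its method, and the `run_ocr` callable are hypothetical names used for illustration:

```python
from typing import Any, Callable, Dict, Optional


class TrackOcrCache:
    """Runs OCR once per track ID and reuses the result on later frames."""

    def __init__(self) -> None:
        self.tracked_objects: Dict[int, Dict[str, Any]] = {}

    def get_or_compute(
        self, track_id: Optional[int], frame_idx: int, run_ocr: Callable[[], Any]
    ) -> Any:
        # Cache hit: this track already has an OCR result.
        if track_id is not None and track_id in self.tracked_objects:
            return self.tracked_objects[track_id]["ocr_result"]
        result = run_ocr()
        # Untracked detections (track_id is None) are never cached.
        if track_id is not None:
            self.tracked_objects[track_id] = {
                "ocr_result": result,
                "first_seen_frame": frame_idx,
            }
        return result


calls = []

def fake_ocr() -> str:
    # Stand-in for the real OCR pipeline; records how often it runs.
    calls.append(1)
    return "ABC123"

cache = TrackOcrCache()
cache.get_or_compute(7, 0, fake_ocr)
cache.get_or_compute(7, 1, fake_ocr)
print(len(calls))  # 1
```

One trade-off of this design: a poor OCR read on the first frame sticks for the lifetime of the track, so a confidence threshold before caching may be worth considering.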

✅ Track Management

def _simple_track_cleanup(self):
    max_tracks = 30
    if len(self.tracked_objects) > max_tracks:
        # Sort oldest-first, then keep only the newest max_tracks entries
        by_age = sorted(self.tracked_objects.items(), key=lambda x: x[1].get("first_seen_frame", 0))
        self.tracked_objects = dict(by_age[-max_tracks:])
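To see what the cleanup keeps, here is the same logic as a standalone function run on toy data. The function name `prune_tracks` and the sample values are hypothetical; the sorting key mirrors the method above:

```python
from typing import Any, Dict


def prune_tracks(
    tracked_objects: Dict[int, Dict[str, Any]], max_tracks: int = 30
) -> Dict[int, Dict[str, Any]]:
    """Keep only the max_tracks most recently first-seen tracks."""
    if max_tracks <= 0:
        # Guard: a slice of [-0:] would keep everything instead of nothing.
        return {}
    if len(tracked_objects) <= max_tracks:
        return tracked_objects
    by_age = sorted(
        tracked_objects.items(), key=lambda item: item[1].get("first_seen_frame", 0)
    )
    return dict(by_age[-max_tracks:])


# Five tracks first seen at frames 10, 20, 30, 40, 50.
tracks = {tid: {"first_seen_frame": tid * 10} for tid in range(1, 6)}
kept = prune_tracks(tracks, max_tracks=3)
print(sorted(kept))  # [3, 4, 5]
```

Note that this evicts by first appearance, not by last appearance, so a long-lived object that is still on screen can be dropped and will be re-OCRed on its next frame.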