DEV Community

Syahrul Al-Rasyid


🚀 Simplifying FastRTC Detector with Ultralytics Best Practices

“Simplicity is the ultimate sophistication.” – Steve Jobs

When you build an ML pipeline, complexity creeps in fast.
That’s what happened to our FastRTC detector, a YOLO-based module for detection + tracking + OCR.

It worked. But it was over-engineered.


🔥 The Problem

The original detector had:

  • Tracking spaghetti 🍝
  • Custom GPU/CPU optimizations nobody could maintain
  • Complicated caching and cleanup logic
  • Hard-to-debug annotation code

Result? More time fixing than time shipping.


✨ The Fix: Trust YOLO

Instead of reinventing everything, we simplified using Ultralytics’ own best practices.

✅ Tracking

if self.enable_tracking:
    results = self.model.track(
        frame,
        conf=self.confidence_threshold,
        persist=True,
        tracker=f"{self.tracker_type}.yaml",
        verbose=False
    )
else:
    results = self.model(frame, conf=self.confidence_threshold, verbose=False)
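The `tracker` argument above is built from a short name plus `.yaml`. As a minimal sketch, assuming `tracker_type` is meant to be one of the two tracker configs that ship with Ultralytics (`bytetrack` and `botsort`), a small guard can catch typos before they reach `model.track`. `tracker_config_name` and `SUPPORTED_TRACKERS` are hypothetical names for illustration, not part of the original detector or of Ultralytics.

```python
# Hypothetical helper: turn a short tracker name into the YAML file name
# passed as model.track(tracker=...). Not part of Ultralytics itself.
SUPPORTED_TRACKERS = ("bytetrack", "botsort")  # built-in Ultralytics tracker configs


def tracker_config_name(tracker_type: str) -> str:
    """Return '<name>.yaml' for a supported tracker, or raise ValueError."""
    name = tracker_type.strip().lower()
    # Accept names that already carry the extension.
    if name.endswith(".yaml"):
        name = name[: -len(".yaml")]
    if name not in SUPPORTED_TRACKERS:
        raise ValueError(
            f"unknown tracker {tracker_type!r}; expected one of {SUPPORTED_TRACKERS}"
        )
    return f"{name}.yaml"


print(tracker_config_name("ByteTrack"))  # bytetrack.yaml
```

This sketch is deliberately strict. If you use a custom tracker YAML file, you would pass its path through unchanged instead.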

✅ Results Extraction

result = results[0]
boxes = result.boxes
masks = getattr(result, 'masks', None)

track_ids = None
# Boxes always exposes `id`; it is None when tracking is off or no track is assigned yet.
if boxes is not None and boxes.id is not None:
    track_ids = boxes.id.cpu().numpy().astype(int)
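The extraction step has to cope with frames where no track IDs exist. Here is a minimal sketch of that guard as a standalone function, assuming the IDs object offers `.tolist()` (torch tensors and NumPy arrays both do). `extract_track_ids`, `FakeIds`, and `FakeBoxes` are hypothetical stand-ins so the example runs without Ultralytics installed.

```python
from typing import List, Optional


def extract_track_ids(boxes) -> Optional[List[int]]:
    """Return track IDs as plain ints, or None when no IDs are available."""
    # `boxes` may be None, and `boxes.id` is None when tracking is off
    # or no track has been assigned yet.
    ids = getattr(boxes, "id", None)
    if ids is None:
        return None
    # .tolist() exists on both torch tensors and NumPy arrays.
    return [int(i) for i in ids.tolist()]


# Stand-in objects for illustration only (hypothetical, not Ultralytics classes).
class FakeIds:
    def __init__(self, values):
        self._values = values

    def tolist(self):
        return list(self._values)


class FakeBoxes:
    def __init__(self, ids=None):
        self.id = ids


print(extract_track_ids(None))                            # None
print(extract_track_ids(FakeBoxes()))                     # None
print(extract_track_ids(FakeBoxes(FakeIds([1.0, 2.0]))))  # [1, 2]
```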

✅ Annotation

annotated_frame = result.plot(line_width=2, conf=True, labels=True)

if self.enable_tracking and track_ids is not None:
    for i, track_id in enumerate(track_ids):
        cv2.putText(annotated_frame, f"ID:{track_id}", ...)

✅ OCR Caching

# Compare against None explicitly so a valid ID is never skipped by truthiness.
if track_id is not None and track_id in self.tracked_objects:
    detection["ocr"] = self.tracked_objects[track_id]["ocr_result"]
else:
    detection["ocr"] = self.ocr_pipeline.extract_text_from_region(frame, bbox, mask)
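The snippet above reads from the cache but does not show where results get written. As a rough sketch, assuming the result is stored on a cache miss together with the `first_seen_frame` value that the cleanup step sorts on, the idea looks like this. `TrackOcrCache`, `get_or_compute`, and `fake_ocr` are hypothetical names; the real detector calls `self.ocr_pipeline.extract_text_from_region` instead of the stub used here.

```python
from typing import Any, Callable, Dict, Optional


class TrackOcrCache:
    """Run OCR once per track ID, then reuse the stored result."""

    def __init__(self) -> None:
        self.tracked_objects: Dict[int, Dict[str, Any]] = {}

    def get_or_compute(
        self, track_id: Optional[int], frame_index: int, run_ocr: Callable[[], Any]
    ) -> Any:
        # Cache hit: this track was already read, so skip the OCR call.
        if track_id is not None and track_id in self.tracked_objects:
            return self.tracked_objects[track_id]["ocr_result"]
        # Cache miss (or untracked detection): run OCR now.
        result = run_ocr()
        # Only tracked detections can be cached, since the ID is the key.
        if track_id is not None:
            self.tracked_objects[track_id] = {
                "ocr_result": result,
                "first_seen_frame": frame_index,
            }
        return result


calls = []


def fake_ocr() -> str:
    """Stub standing in for the real OCR pipeline call."""
    calls.append(1)
    return "ABC123"


cache = TrackOcrCache()
cache.get_or_compute(7, 0, fake_ocr)     # miss: OCR runs
cache.get_or_compute(7, 1, fake_ocr)     # hit: cached result reused
cache.get_or_compute(None, 2, fake_ocr)  # untracked: OCR runs, nothing stored
print(len(calls))  # 2
```

Passing the OCR call in as a callable keeps the cache independent of the pipeline, so it is easy to test on its own.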

✅ Track Management

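def _simple_track_cleanup(self):
    """Cap the track cache, keeping the most recently first-seen tracks."""
    max_tracks = 30
    if len(self.tracked_objects) > max_tracks:
        # Sort oldest-first by the frame in which each track first appeared.
        by_age = sorted(self.tracked_objects.items(), key=lambda x: x[1].get("first_seen_frame", 0))
        # Keep the newest max_tracks entries and drop the rest.
        self.tracked_objects = dict(by_age[-max_tracks:])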
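To see what the cleanup keeps and what it drops, here is the same idea as a standalone function. It is a sketch under assumptions: `prune_tracks` is a hypothetical name, the cap is a parameter rather than the fixed 30 used above, and a guard for a non-positive cap is added.

```python
from typing import Any, Dict


def prune_tracks(
    tracked: Dict[int, Dict[str, Any]], max_tracks: int = 30
) -> Dict[int, Dict[str, Any]]:
    """Keep at most max_tracks entries, preferring the most recently first-seen."""
    if max_tracks <= 0:
        # Guard: a slice like by_age[-0:] would keep everything.
        return {}
    if len(tracked) <= max_tracks:
        return tracked
    # Sort oldest-first, then keep the tail (the newest tracks).
    by_age = sorted(tracked.items(), key=lambda item: item[1].get("first_seen_frame", 0))
    return dict(by_age[-max_tracks:])


tracks = {tid: {"first_seen_frame": tid * 10} for tid in range(1, 6)}
kept = prune_tracks(tracks, max_tracks=3)
print(sorted(kept))  # [3, 4, 5]
```

One trade-off to be aware of: ordering by first-seen frame can evict a long-lived track that is still on screen, in which case its OCR result would be recomputed. Ordering by a last-seen frame would avoid that, but that field is an assumption beyond what the original code stores.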

🎯 The Results

  • 40% fewer lines of code
  • Cleaner, linear flow
  • Predictable performance (let YOLO optimize)
  • Fully functional: detection, tracking, OCR, annotations
  • Much easier to debug + extend

🚀 Takeaway

If you’re building on YOLO:
👉 Don’t fight it.
👉 Don’t over-engineer it.
👉 Trust the framework: Ultralytics already optimized it for you.

Your code will be:

  • Shorter
  • Faster
  • Easier to maintain

💡 Next time your ML pipeline feels messy, ask yourself:
“Am I making this harder than it needs to be?”
