YOLO vs Detectron2 vs MMDetection: Training Speed Test

#yolo #detectron2 #mmdetection #objectdetection

MMDetection has the steepest learning curve I've encountered among object detection frameworks. But it's also the only one I'd trust for a 50-class custom dataset.

Most tutorials will tell you to start with YOLO because it's "easy." They're not wrong: I had a YOLOv8 model running on a custom hardhat detection dataset in under 30 minutes. But three weeks later, when I needed to swap in Cascade R-CNN because the single-stage detector was missing small, distant objects, I was stuck rewriting my pipeline from scratch. The YOLO ecosystem is optimized for speed and convenience, not flexibility.

I spent a month training the same construction-site safety detection task (5 classes: hardhat, no-hardhat, vest, machinery, person) in all three frameworks. Same dataset, same train/val split, same Mosaic augmentation scheme. The differences weren't limited to mAP. They also showed up in training time, debugging pain, and how easily I could swap architectures when the first attempt failed.
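The post doesn't say how the shared split was produced. One way to make sure all three frameworks see exactly the same train and val images is to compute the split once, deterministically, before converting annotations into each framework's format. Below is a minimal sketch under that assumption. `assign_split` and `split_images` are hypothetical helpers, and the 80/20 ratio is an illustrative default, not the split used in these experiments.

```python
import hashlib

# The five classes from the construction-site task, kept in one fixed order so
# that label index i refers to the same class in every framework's config.
CLASSES = ("hardhat", "no-hardhat", "vest", "machinery", "person")


def assign_split(image_name: str, val_fraction: float = 0.2) -> str:
    """Hypothetical helper: put an image in 'train' or 'val' by hashing its name.

    The result depends only on the file name, not on file order or a random
    seed, so re-running it always reproduces the same split.
    """
    digest = hashlib.sha256(image_name.encode("utf-8")).hexdigest()
    # Map the first 32 bits of the hash to a number in [0, 1].
    bucket = int(digest[:8], 16) / 0xFFFFFFFF
    return "val" if bucket < val_fraction else "train"


def split_images(image_names, val_fraction: float = 0.2) -> dict:
    """Hypothetical helper: partition image names into sorted train/val lists."""
    splits = {"train": [], "val": []}
    for name in sorted(image_names):
        splits[assign_split(name, val_fraction)].append(name)
    return splits
```

Writing `splits["train"]` and `splits["val"]` to plain text files once, then deriving each framework's annotation files from those lists, keeps the comparison controlled. Hashing instead of `random.shuffle` has a side benefit: adding new images later never moves existing images between train and val.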

Here's what actually matters when you're choosing a framework for custom detection work.