The Real Surprise: No NMS, No Anchors, Same Accuracy
Faster R-CNN and its descendants have dominated object detection since 2015. Anchors, region proposals, and non-maximum suppression (NMS) became so standard that few people questioned them. Then Facebook AI dropped DETR in 2020 and hit 42 AP on COCO, on par with a well-tuned Faster R-CNN baseline, with none of that machinery.
The full paper is Carion et al., "End-to-End Object Detection with Transformers" (ECCV 2020).
The key insight isn't just "Transformers work for detection." It's that the entire detection pipeline, from feature extraction to final bounding boxes, can be reformulated as a direct set prediction problem. One forward pass, 100 learned object queries, a bipartite matching loss. Done.
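Here's the bipartite matching step in miniature. DETR's set loss first finds the one-to-one assignment between query predictions and ground-truth objects that minimizes a matching cost, then supervises each matched pair, while unmatched queries are trained to predict a "no object" class. This is a simplified sketch, not the paper's code: the cost uses only a class-probability term and an L1 box term (DETR's real cost also has a generalized-IoU term, omitted here), the weights `w_class` and `w_l1` are illustrative assumptions, and the optimal assignment is found by brute force rather than the Hungarian algorithm.

```python
from itertools import permutations

def l1_box_cost(pred_box, gt_box):
    """L1 distance between two boxes given as (cx, cy, w, h) in [0, 1]."""
    return sum(abs(p - g) for p, g in zip(pred_box, gt_box))

def match(pred_probs, pred_boxes, gt_labels, gt_boxes, w_class=1.0, w_l1=5.0):
    """Return the minimum-cost one-to-one (query_index, gt_index) pairs.

    pred_probs[q][c] is query q's predicted probability for class c.
    Brute force over injective assignments: only usable for toy sizes.
    Weights are illustrative assumptions, and no GIoU term is included.
    """
    num_queries, num_gt = len(pred_boxes), len(gt_boxes)
    assert num_queries >= num_gt, "need at least as many queries as objects"
    best_cost, best_queries = float("inf"), ()
    # Each permutation assigns a distinct query to every ground-truth object.
    for queries in permutations(range(num_queries), num_gt):
        cost = 0.0
        for g, q in enumerate(queries):
            cost -= w_class * pred_probs[q][gt_labels[g]]  # reward the correct class
            cost += w_l1 * l1_box_cost(pred_boxes[q], gt_boxes[g])  # penalize box mismatch
        if cost < best_cost:
            best_cost, best_queries = cost, queries
    return [(q, g) for g, q in enumerate(best_queries)]

# Toy example: 4 queries, classes 0 = cat, 1 = dog, 2 = "no object".
pred_probs = [
    [0.1, 0.1, 0.8],
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
]
pred_boxes = [
    (0.5, 0.5, 0.2, 0.2),
    (0.3, 0.3, 0.2, 0.2),
    (0.7, 0.7, 0.3, 0.3),
    (0.5, 0.5, 0.5, 0.5),
]
gt_labels = [1, 0]  # a dog, then a cat
gt_boxes = [(0.7, 0.7, 0.3, 0.3), (0.3, 0.3, 0.2, 0.2)]

print(match(pred_probs, pred_boxes, gt_labels, gt_boxes))  # [(2, 0), (1, 1)]
```

Query 2 takes the dog and query 1 takes the cat; queries 0 and 3 stay unmatched and would be pushed toward "no object". Because every object gets exactly one query, duplicate detections are penalized by the loss itself, which is why no NMS is needed afterwards. For realistic sizes (100 queries) you'd replace the brute-force loop with a Hungarian-algorithm solver such as `scipy.optimize.linear_sum_assignment` on the full cost matrix.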
Why Faster R-CNN's Pipeline Got So Complicated
Before diving into DETR's elegance, let's appreciate what it replaced. Faster R-CNN (Ren et al., NeurIPS 2015) needs:
- Anchor generation: ~20K anchors per image (about 60 × 40 feature-map positions × 9 anchors for a 1000 × 600 image), covering 3 scales and 3 aspect ratios
- Region Proposal Network (RPN): a first-stage network that scores and refines every anchor, then uses NMS to cut them down to ~2000 proposals
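To make that anchor count concrete, here's a minimal sketch of anchor generation under the original Faster R-CNN settings: a stride-16 feature map, box areas of 128², 256², and 512² pixels, and aspect ratios of 1:2, 1:1, and 2:1. The function name, the (x1, y1, x2, y2) output layout, and the "center of each cell" convention are my own simplifications, not the reference implementation.

```python
from itertools import product

def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return Faster R-CNN style anchors as (x1, y1, x2, y2) image-pixel boxes.

    ratios are height / width; every anchor keeps area scale**2.
    Cell-center placement is a simplifying assumption.
    """
    anchors = []
    for y, x in product(range(feat_h), range(feat_w)):
        # Center of this feature-map cell, mapped back to image pixels.
        cx, cy = (x + 0.5) * stride, (y + 0.5) * stride
        for scale, ratio in product(scales, ratios):
            w = scale / ratio ** 0.5  # wider box when ratio < 1
            h = scale * ratio ** 0.5  # w * h == scale ** 2
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

# A ~1000 x 600 image at stride 16 gives roughly a 60 x 40 feature map.
anchors = generate_anchors(feat_h=40, feat_w=60)
print(len(anchors))  # 21600
```

That's 60 × 40 × 9 = 21,600 candidate boxes before the RPN even starts scoring them. The count includes anchors that cross the image boundary, which the original paper ignores during training.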