
TildAlice

Posted on • Originally published at tildalice.io

SAM vs Grounded-SAM: Zero-Shot Segmentation mIoU Gap

The Zero-Shot Segmentation Promise Falls Apart on Custom Data

Meta's Segment Anything Model (SAM) hit the scene with a bold claim: segment any object in any image with zero training. The original paper (Kirillov et al., 2023) showed stunning qualitative results. Then Grounded-SAM came along, combining SAM with open-vocabulary detection to enable text-prompted segmentation. Everyone assumed this was the solution for production pipelines that needed to segment arbitrary objects without annotation.

I ran both through a battery of tests on COCO val2017 and a custom industrial dataset. The results weren't what the papers promised.

SAM achieves 46.5 mIoU on COCO's 80 categories when you feed it ground-truth bounding boxes as prompts. That's decent. But the moment you swap COCO for custom objects—say, defects on PCB boards or specific machine parts—that number craters to around 28-32 mIoU depending on the object type. Grounded-SAM, which runs Grounding DINO (Liu et al., 2023) as an open-vocabulary detector and feeds its boxes to SAM, brings text-based prompting but introduces its own failure modes: now a missed or misaligned detection means a missed or misaligned mask.
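For context on how numbers like these are computed: mIoU averages the intersection-over-union between predicted and ground-truth masks. Here's a minimal sketch of that metric with NumPy—in a real evaluation the predicted masks would come from SAM (e.g. prompted with ground-truth boxes), but the scoring logic itself is just this:

```python
import numpy as np

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """IoU between two binary masks of the same shape."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    inter = np.logical_and(pred, gt).sum()
    return float(inter) / float(union)

def mean_iou(pairs) -> float:
    """Average IoU over (pred, gt) mask pairs.

    Note: this averages per instance; COCO-style reporting typically
    averages per category instead, which weights rare classes equally.
    """
    ious = [mask_iou(p, g) for p, g in pairs]
    return sum(ious) / len(ious)
```

The per-instance vs. per-category averaging choice matters when comparing numbers across papers—an imbalanced dataset can shift the score by several points depending on which convention is used.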


[Cover image: Aerial shot of vibrant green agricultural fields segmented by paths.]


Continue reading the full article on TildAlice
