FastSAM is 15x faster than SAM on my RTX 3090, but it misses 22% of fine details
I spent the weekend running 500 images through SAM, SAM 2, and FastSAM to settle the speed-vs-quality debate once and for all. The headline number everyone quotes—"FastSAM is 50x faster!"—turns out to be misleading. On real-world images with mixed object sizes, the gap shrinks dramatically, and the quality trade-off is harder than the papers suggest.
Here's what actually happened when I benchmarked all three models on identical hardware with consistent preprocessing. The results surprised me, especially around memory usage.
The models: architecture differences that explain the speed gap
SAM (Segment Anything Model, Kirillov et al., 2023) uses a massive ViT-H image encoder with 632M parameters. The encoder runs once per image, producing a 256×64×64 embedding. Then for each prompt (point, box, or mask), a lightweight decoder generates the segmentation in ~50ms. Total first-frame time? Around 3-4 seconds on GPU.
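To make that encode-once, decode-per-prompt split concrete, here's a minimal sketch using Meta's `segment_anything` package (the checkpoint filename, image path, and point prompt are illustrative placeholders, not values from my benchmark):

```python
# Minimal sketch of SAM's encode-once, prompt-many workflow.
import numpy as np
import cv2
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
sam.to(device="cuda")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)  # runs the 632M-parameter ViT-H encoder once: the slow step

# Every prompt after that only touches the lightweight decoder.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),  # one foreground click at (x, y)
    point_labels=np.array([1]),           # 1 = foreground, 0 = background
    multimask_output=True,                # return candidate masks with quality scores
)
```

The `set_image` call is where nearly all of that 3-4 second first-frame time goes; each `predict` call afterward reuses the cached embedding and only pays the ~50ms decoder cost.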