The Surprising Result: ConvNeXt Wins on Equal Compute
ConvNeXt-T hits 82.1% ImageNet top-1 accuracy at 4.5G FLOPs. Swin-T gets 81.3%. ViT-S/16? 79.9%.
That's a 2.2-point gap between the worst and best performers at roughly the same computational budget. When I first saw these numbers from Meta's ConvNeXt paper (Liu et al., CVPR 2022), I assumed they'd cherry-picked the comparison points. But after running my own benchmarks on a mix of model variants, I found the pattern holds across multiple FLOPs tiers.
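The first step in any comparison like this is verifying that the models really do sit at the same compute tier, rather than trusting the paper tables. Here's a minimal sketch of that check, assuming the timm model names convnext_tiny, swin_tiny_patch4_window7_224, and vit_small_patch16_224 and fvcore's FlopCountAnalysis as the counter; it's an illustration of the idea, not the exact harness I used.

```python
import torch
import timm
from fvcore.nn import FlopCountAnalysis

# Hypothetical comparison set (timm model names are assumptions, not the paper's code)
MODELS = [
    "convnext_tiny",                 # ConvNeXt-T
    "swin_tiny_patch4_window7_224",  # Swin-T
    "vit_small_patch16_224",         # ViT-S/16
]

dummy = torch.randn(1, 3, 224, 224)  # one 224x224 image, batch size 1

for name in MODELS:
    model = timm.create_model(name, pretrained=False).eval()
    with torch.no_grad():
        # fvcore counts a fused multiply-add as one op, which roughly matches
        # how these papers report "FLOPs"
        flops = FlopCountAnalysis(model, dummy).total()
    params = sum(p.numel() for p in model.parameters())
    print(f"{name:32s}  {flops / 1e9:5.2f} GFLOPs  {params / 1e6:6.1f}M params")
```

This only checks the compute side of the equation; the accuracy numbers above still come from evaluating the trained models on the ImageNet-1k validation set.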
Why does this matter? Because FLOPs-matched comparisons strip away the marketing noise. A model that needs 3x the compute to match another isn't better; it's just bigger. And if you're deploying to production, where inference cost scales with every request, that 2.2-point accuracy gain at identical compute is worth real money.
What "Equal FLOPs" Actually Means