The Benchmark That Made Me Question Everything
TFLite was supposed to be the gold standard for Android inference. That's what the tutorials say, anyway. So when I ran a MobileNetV3 classification model through both runtimes on a Pixel 7, I expected maybe a 10-20% difference. Instead, ONNX Runtime crushed TFLite by 3.2x on the same model. That wasn't just surprising; it broke my mental model of how mobile inference works.
The numbers: 4.7ms average on ONNX Runtime, 15.1ms on TFLite. Same model architecture, same input resolution, same device. What's going on?
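To make latency numbers like these meaningful, the measurement loop matters: a few warmup runs (so delegate initialization and caches don't pollute the timings) followed by many timed iterations. Here's a minimal, runtime-agnostic sketch of that harness; `run_inference` is a hypothetical stand-in for one forward pass (it would wrap an ONNX Runtime `session.run()` or a TFLite `interpreter.invoke()` in a real benchmark):

```python
import time
import statistics

def bench(run_inference, warmup=10, iters=100):
    """Warm up, then time `iters` calls and return (mean_ms, median_ms)."""
    for _ in range(warmup):
        run_inference()  # untimed: lets delegates/caches initialize
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        run_inference()
        samples.append((time.perf_counter() - t0) * 1000.0)  # milliseconds
    return statistics.mean(samples), statistics.median(samples)

# Dummy workload so the sketch runs without either runtime installed.
mean_ms, median_ms = bench(lambda: sum(i * i for i in range(10_000)))
```

Reporting the median alongside the mean is worth the extra line: mobile devices throttle, and a handful of slow outliers can skew an average.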
Why TFLite Should Have Won (But Didn't)
TFLite has home-field advantage on Android. Google built it specifically for mobile, the NNAPI delegate talks directly to hardware accelerators, and there are years of Android-specific optimizations baked in. ONNX Runtime, by contrast, started as a cross-platform inference engine that happened to add mobile support later.
So I assumed my benchmark was broken.