The Benchmark That Made Me Question Everything
TFLite was supposed to be the gold standard for Android inference. That's what the tutorials say, anyway. So when I ran a MobileNetV3 classification model through both runtimes on a Pixel 7, I expected maybe a 10-20% difference. Instead, ONNX Runtime crushed TFLite by 3.2x on the same model. That wasn't just surprising; it broke my mental model of how mobile inference works.
The numbers: 4.7ms average on ONNX Runtime, 15.1ms on TFLite. Same model architecture, same input resolution, same device. What's going on?
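To make latency numbers like these meaningful, the measurement loop matters: a few warmup runs (so delegate initialization and caches don't pollute the timings) followed by many timed iterations. Here's a minimal, runtime-agnostic sketch of that harness; `run_inference` is a hypothetical stand-in for one forward pass (it would wrap an ONNX Runtime `session.run()` or a TFLite `interpreter.invoke()` in a real benchmark):

```python
import time
import statistics

def bench(run_inference, warmup=10, iters=100):
    """Warm up, then time `iters` calls and return (mean_ms, median_ms)."""
    for _ in range(warmup):
        run_inference()  # untimed: lets delegates/caches initialize
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        run_inference()
        samples.append((time.perf_counter() - t0) * 1000.0)  # milliseconds
    return statistics.mean(samples), statistics.median(samples)

# Dummy workload so the sketch runs without either runtime installed.
mean_ms, median_ms = bench(lambda: sum(i * i for i in range(10_000)))
```

Reporting the median alongside the mean is worth the extra line: mobile devices throttle, and a handful of slow outliers can skew an average.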
Why TFLite Should Have Won (But Didn't)
TFLite has home-field advantage on Android. Google built it specifically for mobile, the NNAPI delegate talks directly to hardware accelerators, and there are years of Android-specific optimizations baked in. ONNX Runtime, by contrast, started as a cross-platform inference engine that happened to add mobile support later.
So I assumed my benchmark was broken.