The 12ms Gap Nobody Talks About
Most ARM inference benchmarks test one device, call it a day, and declare a winner. That's fine until you ship to real users with Galaxy A52s, iPhone SEs, and Raspberry Pis in the wild, and suddenly your "optimized" model runs 3x slower than expected.
I wanted to know: does TensorFlow Lite actually beat ONNX Runtime Mobile across the ARM zoo, or is that just folklore from 2019? So I grabbed five devices spanning three years of ARM evolution — Raspberry Pi 4, Jetson Nano, Galaxy S21, iPhone 13 Pro, and a Cortex-A53 dev board — and ran the same MobileNetV2 model through both runtimes. Same weights, same INT8 quantization, same input resolution.
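Numbers like a "12ms average gap" depend entirely on how you time each call. As a minimal sketch of the kind of harness involved (warmup iterations to let caches and DVFS governors settle, then mean/median/p95 over many runs) — note the `run_inference` callable here is a placeholder standing in for a TFLite `interpreter.invoke()` or an ONNX Runtime `session.run()`, not the exact harness used in the article:

```python
import statistics
import time

def measure_latency_ms(run_inference, warmup=10, iters=100):
    """Time a zero-argument inference callable: warm up first, then
    collect per-call latencies in milliseconds and summarize."""
    for _ in range(warmup):  # warmup: caches, JIT, CPU frequency ramp-up
        run_inference()
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        run_inference()
        samples.append((time.perf_counter() - t0) * 1000.0)
    samples.sort()
    return {
        "mean_ms": statistics.fmean(samples),
        "median_ms": statistics.median(samples),
        "p95_ms": samples[int(0.95 * len(samples)) - 1],
    }

# Dummy workload in place of a real runtime call; swap in the
# TFLite or ONNX Runtime invocation when running on-device.
stats = measure_latency_ms(lambda: sum(i * i for i in range(10_000)))
print(sorted(stats))
```

Reporting median and p95 alongside the mean matters on mobile SoCs, where thermal throttling can skew a plain average partway through a run.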
The gap? 12ms average in TFLite's favor on Android. But on iOS, ONNX Runtime actually won by 9ms. And the Raspberry Pi results made me question everything.
What Actually Ships in Production Edge AI