The Default Config Is Lying to You
MobileNetV2 running at 180ms on a Raspberry Pi 4. That's what I got out of the box with TFLite's default interpreter. Switch on the XNNPACK delegate, set four threads, and suddenly it's 35ms. Same model, same hardware, same framework: 5x faster just from configuration.
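Here's roughly what that configuration looks like with TFLite's Python API. This is a minimal sketch: "model.tflite" is a placeholder path, and note that recent TensorFlow and tflite_runtime builds apply the XNNPACK delegate by default for float32 models, so `num_threads` is often the only knob you actually have to turn.

```python
import numpy as np
import tensorflow as tf  # on a Pi, the lighter option is: from tflite_runtime.interpreter import Interpreter

# num_threads=4 gives XNNPACK one thread per core on a Pi 4.
interpreter = tf.lite.Interpreter(
    model_path="model.tflite",  # placeholder: your converted model
    num_threads=4,
)
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Dummy input matching the model's expected shape and dtype.
dummy = np.random.rand(*input_details[0]["shape"]).astype(input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], dummy)
interpreter.invoke()
result = interpreter.get_tensor(output_details[0]["index"])
```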
Most mobile ML tutorials skip this part. They benchmark frameworks, declare winners, and leave you with code that's criminally slow. The truth? Both TFLite and ONNX Runtime Mobile can hit near-identical latency on most ARM devices when properly configured. The 2x speed gap everyone argues about? It's usually a setup mistake, not a framework limitation.
I've spent weeks profiling inference on everything from Pi Zeros to Snapdragon 888s, and the pattern is consistent: default configurations leave 40-80% of your hardware's capability untouched. This isn't about which framework is "better." It's about not shooting yourself in the foot.
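If you want to reproduce numbers like these on your own device, a bare-bones timing loop is enough. A sketch, assuming the interpreter from the snippet above: discard warmup runs (XNNPACK packs weights lazily and CPU governors need a moment to spin up) and report the median, since mobile SoCs throttle.

```python
import time
import numpy as np

def benchmark(interpreter, runs=50, warmup=10):
    """Median single-inference latency in milliseconds."""
    inp = interpreter.get_input_details()[0]
    dummy = np.random.rand(*inp["shape"]).astype(inp["dtype"])

    # Warmup runs are not timed.
    for _ in range(warmup):
        interpreter.set_tensor(inp["index"], dummy)
        interpreter.invoke()

    times = []
    for _ in range(runs):
        interpreter.set_tensor(inp["index"], dummy)
        start = time.perf_counter()
        interpreter.invoke()
        times.append((time.perf_counter() - start) * 1000.0)
    return float(np.median(times))
```

Run it once against the default single-threaded interpreter and once with `num_threads=4`, and the gap shows up immediately.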
TFLite Delegates: The 3x Speedup You're Missing