DEV Community

TildAlice

Posted on • Originally published at tildalice.io

ONNX Runtime Mobile: 8ms Inference on iPhone 13

The 200ms Problem

Your edge AI model works great on a dev server. 30ms inference, low memory, rock solid. Then you ship it to a phone and suddenly you're hitting 200ms per frame.

This happened on a real-time object detection prototype I was testing last month. The model was a MobileNetV3 backbone with a custom detection head — 4.2MB TFLite file, quantized to INT8, supposedly optimized for mobile. On my MacBook it ran at 28ms. On an iPhone 13 it crawled at 180-220ms, making the app unusable.

Turns out TFLite isn't the only game in town anymore. ONNX Runtime Mobile (ORT Mobile) has quietly become a serious alternative, and in some cases it's significantly faster.
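Before comparing runtimes, it's worth measuring latency the same way on both: warm-up runs first, then percentiles rather than a single mean, since the first few inferences are often several times slower while delegates and caches initialize. Here's a minimal, runtime-agnostic sketch in Python (the `infer` callable and the stand-in workload are placeholders, not the actual model from this article):

```python
import statistics
import time

def benchmark(infer, warmup=10, runs=100):
    """Time a single-frame inference callable and report latency percentiles.

    Warm-up iterations let caches, JIT compilation, and delegate
    initialization settle before measurement begins.
    """
    for _ in range(warmup):
        infer()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        infer()
        samples.append((time.perf_counter() - start) * 1000.0)  # milliseconds
    samples.sort()
    return {
        "p50_ms": statistics.median(samples),
        "p95_ms": samples[int(0.95 * (runs - 1))],
        "mean_ms": statistics.fmean(samples),
    }

# Stand-in workload; swap in your session.run(...) or interpreter.invoke()
stats = benchmark(lambda: sum(i * i for i in range(50_000)))
print(stats)
```

Reporting p95 alongside p50 matters on phones: thermal throttling and background tasks widen the tail, which is exactly where a "28ms on my MacBook" number falls apart.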


Why I Switched from TFLite to ONNX Runtime

TFLite is the default choice for mobile AI — Google's docs are good, the tooling is mature, and it's built into the TensorFlow ecosystem. But it has limitations.


Continue reading the full article on TildAlice
