On-Device AI: What Nobody Tells You About the Tradeoffs

#ai #machinelearning #mobile #engineering

Everyone's building cloud AI. I've been building AI that runs with no internet, on a phone, in real-world conditions.

Here's what I've learned.

Model size vs accuracy in the wild

In the lab, your model hits 94% accuracy. In production, it's handling variable lighting, partial occlusion, camera shake, and phones that haven't been updated since 2021. Expect that 94% to drop, and expect to learn by how much only from field data.

The instinct is to make the model bigger and more accurate. The problem: bigger models are slower, and on-device speed matters a lot when a person is standing in a room waiting for a result.
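One way to make the size/speed tradeoff explicit is to fix a latency budget first and then take the most accurate model that fits it on the device you actually care about. A minimal sketch, assuming you have already benchmarked each candidate on a target (ideally low-end) phone; `Candidate`, `pick_model`, and the numbers in the usage lines are hypothetical and made up for illustration:

```python
import math
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Candidate:
    name: str
    accuracy: float            # offline eval accuracy, 0..1
    latencies_ms: List[float]  # per-inference latency measured on the target device

def p95(samples: List[float]) -> float:
    """Nearest-rank 95th percentile: 95% of runs finish at or under this latency."""
    ordered = sorted(samples)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]

def pick_model(candidates: List[Candidate], budget_ms: float) -> Optional[Candidate]:
    """Most accurate candidate whose p95 latency fits the budget, or None if none fit."""
    fitting = [c for c in candidates if p95(c.latencies_ms) <= budget_ms]
    return max(fitting, key=lambda c: c.accuracy, default=None)

# Made-up numbers for illustration only.
candidates = [
    Candidate("small", 0.88, [40, 45, 50, 55, 60]),
    Candidate("large", 0.94, [180, 200, 220, 250, 400]),
]
print(pick_model(candidates, budget_ms=150).name)  # small
```

Using p95 rather than the mean is deliberate: the person waiting in the room experiences the slow tail, not the average.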

The real answer is usually: accept a lower accuracy threshold and design your UX to handle uncertainty gracefully. A fast answer with a confidence score and a "tap to confirm" prompt beats a slow, high-confidence answer that times out.
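In practice, "handle uncertainty gracefully" tends to mean mapping confidence bands to different UI states. A minimal sketch; the two thresholds and the state names are hypothetical and would need tuning against your own field data:

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    label: str
    confidence: float  # 0..1, e.g. the top softmax score

AUTO_ACCEPT = 0.85  # hypothetical: above this, just show the result
SUGGEST = 0.50      # hypothetical: below this, don't guess at all

def route(pred: Prediction) -> str:
    """Decide which UI state to show for a single prediction."""
    if pred.confidence >= AUTO_ACCEPT:
        return "accept"   # show the label directly
    if pred.confidence >= SUGGEST:
        return "confirm"  # show the label with a "tap to confirm" prompt
    return "manual"       # ask the user to choose

print(route(Prediction("drinks", 0.62)))  # confirm
```

The middle band is where the "tap to confirm" lives, and every tap there is a free label for the feedback loop.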

Category-level beats object-level at scale

If you're doing object detection across thousands of SKUs, training a model to identify every individual product is a losing strategy. Too many classes, too many edge cases, constant retraining as products change.

Category-level detection — "this is a drinks product, this is a snack, this is a cleaning product" — is dramatically simpler and more stable. You can add object-level identification on top for high-value cases.
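The "category first, object-level on top" idea can be sketched as a two-stage pipeline where the second stage only runs for categories that have a dedicated identifier. This assumes models are plain callables; `classify`, the stub lambdas, and the label strings are hypothetical stand-ins for real on-device inference:

```python
from typing import Callable, Dict, Optional

CategoryModel = Callable[[bytes], str]
SkuModel = Callable[[bytes], Optional[str]]

def classify(image: bytes, category_model: CategoryModel,
             sku_models: Dict[str, SkuModel]) -> dict:
    category = category_model(image)     # stage 1: always runs, small and stable
    identify = sku_models.get(category)  # stage 2: only for high-value categories
    sku = identify(image) if identify is not None else None
    return {"category": category, "sku": sku}

# Stub models standing in for real inference.
sku_models = {"drinks": lambda img: "cola-330ml"}
print(classify(b"...", lambda img: "drinks", sku_models))  # {'category': 'drinks', 'sku': 'cola-330ml'}
print(classify(b"...", lambda img: "snacks", sku_models))  # {'category': 'snacks', 'sku': None}
```

When the product range changes, only the affected stage-2 model needs retraining; the category model stays put.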

The feedback loop problem

On-device models don't automatically improve. You need a pipeline:

1. The user makes a correction (overrides the model's output).
2. The correction is logged with context (lighting, device, conditions).
3. It is flagged for review.
4. It feeds the next training cycle.

Without this, your model is frozen the moment you ship it. With it, field conditions become training data.
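The four steps of that pipeline can be sketched as a single correction record that carries its context and a review status, plus a gate that lets only approved corrections become training data. Every field name here (`model_version`, `image_ref`, `review_status`, and so on) is a hypothetical choice, not a prescribed schema:

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass, field
from typing import Optional

@dataclass
class CorrectionEvent:
    predicted_label: str                  # what the model said
    corrected_label: str                  # what the user overrode it with
    confidence: float                     # model confidence at the time
    model_version: str
    device_model: str
    context: dict = field(default_factory=dict)  # e.g. lighting, motion; keys are app-defined
    image_ref: Optional[str] = None       # key to a locally stored frame, if kept
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    created_at: float = field(default_factory=time.time)
    review_status: str = "pending"        # pending -> approved / rejected

def to_training_example(event: CorrectionEvent) -> Optional[dict]:
    """Only reviewed, approved corrections become labeled training data."""
    if event.review_status != "approved":
        return None
    return {"image_ref": event.image_ref, "label": event.corrected_label,
            "model_version": event.model_version, "context": event.context}

event = CorrectionEvent("snacks", "drinks", 0.71, "detector-v3", "Pixel 4a", {"lighting": "low"})
payload = json.dumps(asdict(event))  # what gets queued for upload
```

Logging `model_version` alongside each correction lets you tell whether a class of errors was already fixed by a later release.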

The data pipeline is harder than the model

Getting inference results off the device and into your backend — with context, without data loss, without requiring constant connectivity — is the actual hard problem. The model is the easy part.
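A common pattern for "without data loss, without requiring constant connectivity" is a durable outbox: write every event to local storage first, then upload opportunistically and delete only after the backend acknowledges it. A minimal sketch using Python's built-in `sqlite3`; the `Outbox` class and the `send(event_id, payload) -> bool` callback are hypothetical, and a real app would use its platform's local database:

```python
import json
import sqlite3

class Outbox:
    """Local queue: events survive restarts and stay here until the server acks them."""

    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute("CREATE TABLE IF NOT EXISTS outbox ("
                        " id TEXT PRIMARY KEY, payload TEXT NOT NULL,"
                        " attempts INTEGER NOT NULL DEFAULT 0)")
        self.db.commit()

    def enqueue(self, event_id: str, payload: dict) -> None:
        # INSERT OR IGNORE: enqueueing the same event twice is a no-op.
        self.db.execute("INSERT OR IGNORE INTO outbox (id, payload) VALUES (?, ?)",
                        (event_id, json.dumps(payload)))
        self.db.commit()

    def flush(self, send, batch_size: int = 50) -> int:
        """Upload pending events oldest-first; return how many were acknowledged."""
        rows = self.db.execute("SELECT id, payload FROM outbox ORDER BY rowid LIMIT ?",
                               (batch_size,)).fetchall()
        sent = 0
        for event_id, payload in rows:
            try:
                ok = send(event_id, json.loads(payload))
            except OSError:
                ok = False  # network error: keep the row for the next flush
            if not ok:
                self.db.execute("UPDATE outbox SET attempts = attempts + 1 WHERE id = ?",
                                (event_id,))
                break  # connectivity is probably gone; stop this round
            self.db.execute("DELETE FROM outbox WHERE id = ?", (event_id,))
            sent += 1
        self.db.commit()
        return sent

    def pending(self) -> int:
        return self.db.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]
```

Because each event keeps a stable `id`, a retry after a lost acknowledgement can be deduplicated on the server, so "at least once" upload does not turn into duplicate training rows.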

Offline-first sync, conflict resolution, context preservation across sessions. That's where the real engineering lives.
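Conflict resolution needs a rule that gives the same answer no matter which device syncs first. One simple policy, swapped in here for illustration rather than taken from a specific system: a human label outranks a model label, and within the same source the later write wins, with the device ID and label as deterministic tie-breakers. `SOURCE_RANK`, `LabelVersion`, and `resolve` are hypothetical names:

```python
from dataclasses import dataclass

# Hypothetical precedence: human input beats model output.
SOURCE_RANK = {"model": 0, "user": 1, "reviewer": 2}

@dataclass(frozen=True)
class LabelVersion:
    item_id: str
    label: str
    source: str        # "model", "user", or "reviewer"
    updated_at: float  # device clock; can be skewed between phones
    device_id: str

def resolve(a: LabelVersion, b: LabelVersion) -> LabelVersion:
    """Pick the winning version of one item's label; the result is order-independent."""
    if a.item_id != b.item_id:
        raise ValueError("can only resolve versions of the same item")
    key = lambda v: (SOURCE_RANK[v.source], v.updated_at, v.device_id, v.label)
    return max(a, b, key=key)
```

Device clocks drift, so plain timestamps are the weakest part of this sketch; treat it as a starting point rather than a sync protocol.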

On-device AI is genuinely exciting, but it's a different discipline from cloud inference. The constraints are real, and they change the design of everything.