Trying to Run AI on Low-End Devices? Here’s What We Learned

We recently built something that frankly we weren’t sure was possible when we started: a lightweight AI engine that runs document detection and face matching directly on the edge — on phones, desktops and even browsers, completely offline. And we didn’t use TensorFlow, ONNX or any of the usual ML frameworks. The idea was to make it run anywhere, without internet or GPU dependency, and actually perform well.

We ran into all kinds of issues — bloated model sizes, slow browser load times, memory limits on mobile, and the usual cross-platform headaches. So instead of patching together existing tools, we ended up writing our own inference engine in C++, keeping it minimal, and wrapping it for Android (NDK), iOS, Flutter, React Native, WebAssembly, and desktop. We trained the models to stay under 15MB and optimized them to be fast and resource-efficient. One of those models, KIMORA, is our document understanding model — it can detect corners, fix skew and understand layout even in tricky conditions like poor lighting.
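To make the "one C++ core, many thin wrappers" idea concrete, here's a rough sketch of what the WebAssembly-facing wrapper can look like. Everything in it — `EngineExports`, `detect_document_corners`, the alloc/free exports — is a hypothetical illustration of the pattern, not our actual API:

```typescript
// Hypothetical sketch of a thin TypeScript wrapper over a C++ core
// compiled to WebAssembly (e.g. via Emscripten). All names are illustrative.

interface EngineExports {
  memory: WebAssembly.Memory;
  alloc(size: number): number; // assumed malloc-style allocator exported by the core
  free(ptr: number): void;
  detect_document_corners(ptr: number, len: number, outPtr: number): number;
}

export class EdgeEngine {
  private constructor(private exports: EngineExports) {}

  // Instantiate the WASM binary once and reuse the instance.
  static async load(wasmBytes: ArrayBuffer): Promise<EdgeEngine> {
    const { instance } = await WebAssembly.instantiate(wasmBytes, {});
    return new EdgeEngine(instance.exports as unknown as EngineExports);
  }

  // Copy the image into WASM memory, run detection, read back 4 corners (x, y).
  detectCorners(image: Uint8Array): Float32Array {
    const { alloc, free, memory, detect_document_corners } = this.exports;
    const inPtr = alloc(image.length);
    const outPtr = alloc(4 * 2 * 4); // 4 corners * 2 floats * 4 bytes = 32 bytes
    new Uint8Array(memory.buffer, inPtr, image.length).set(image);
    const status = detect_document_corners(inPtr, image.length, outPtr);
    const corners = new Float32Array(memory.buffer.slice(outPtr, outPtr + 32));
    free(inPtr);
    free(outPtr);
    if (status !== 0) throw new Error(`detection failed with code ${status}`);
    return corners;
  }
}
```

The design point is that the wrapper stays thin: all of the actual detection logic lives in the shared C++ core, so the Android, iOS and desktop wrappers can be similarly small.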

The toughest part was getting it to run smoothly in browsers. We chunked the model and binary files, downloaded the chunks in parallel, and cached them for instant reuse — this alone brought our in-browser load times down. Looking back, building everything from scratch gave us more control, but it also taught us how hard real edge AI is when you actually try to make it work everywhere.
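For the browser loading path, the pattern we just described (split the files into chunks, fetch them in parallel, cache for reuse) looks roughly like the sketch below. The chunk naming scheme and `CACHE_NAME` are made up for illustration; the real details depend on how you split the files at build time:

```typescript
// Hypothetical sketch: fetch a model split into N chunks in parallel,
// serve from the Cache API on repeat visits, and reassemble the bytes.
const CACHE_NAME = "edge-ai-assets-v1"; // illustrative cache name

async function fetchChunk(cache: Cache, url: string): Promise<ArrayBuffer> {
  const cached = await cache.match(url);
  if (cached) return cached.arrayBuffer(); // instant reuse on later loads

  const response = await fetch(url);
  if (!response.ok) throw new Error(`failed to fetch ${url}: ${response.status}`);
  await cache.put(url, response.clone()); // store a copy for next time
  return response.arrayBuffer();
}

// The baseUrl + index naming is an assumption, e.g. model.bin.000, model.bin.001, ...
export async function loadChunkedAsset(baseUrl: string, chunkCount: number): Promise<Uint8Array> {
  const cache = await caches.open(CACHE_NAME);
  const urls = Array.from({ length: chunkCount }, (_, i) =>
    `${baseUrl}.${String(i).padStart(3, "0")}`);

  // Download all chunks in parallel instead of one large sequential file.
  const chunks = await Promise.all(urls.map((u) => fetchChunk(cache, u)));

  // Stitch the chunks back into a single contiguous buffer.
  const total = chunks.reduce((n, c) => n + c.byteLength, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const c of chunks) {
    out.set(new Uint8Array(c), offset);
    offset += c.byteLength;
  }
  return out;
}
```

On a first visit you pay for the parallel downloads; on repeat visits everything comes straight out of the Cache API, which is what makes the reuse feel instant.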

If you’re working on lightweight AI for constrained environments, or just tired of relying on bloated ML stacks, I hope this gives you a few useful ideas.

If you’re curious to know more, the full write-up is here: https://blog.extrieve.com/general/from-idea-to-execution-building-a-cross-platform-ai-engine-for-the-edge/

Would love to hear your thoughts and happy to answer any questions if you’re working on something similar!
