dhiraj himani

Posted on Jun 10

What if your AI app didn't need the internet?

#kotlin #ai #offline #aifirst

A story about building Narra — an offline AI e-reader — and what it taught me about what software can actually do without a cloud.

Summary: I built an e-reader that translates any book into a bilingual reading experience entirely on your device — no internet, no subscription, no data leaving your phone. This is what I learned about on-device AI, architectural discipline, and why "offline-first" is a feature, not a constraint. Android on Google Play · GitHub

The moment the idea became real

A few months ago I was reading a novel in German. I'm not fluent. Every two or three sentences I'd hit a word I didn't know, open Google Translate on my phone, switch back to the book, and try to remember where I was.

I stopped counting how many times I broke flow. It was somewhere north of forty in a single chapter.

I thought: this is a solved problem. Surely some app does this elegantly. I looked. The good ones needed an internet connection. The offline ones were clunky. None of them combined translation, context-sensitive vocabulary, and something that actually helped me remember what happened.

So I started building one.

The constraint that changed everything

Early on I made a decision that felt limiting at the time: no cloud AI calls. Ever.

No OpenAI. No DeepL API. No Google Translate endpoint. Nothing that requires a network connection or a subscription.

At first this felt like a technical handicap. Then I realized it was an architectural gift.

When you remove the cloud from the equation, you're forced to think about where computation actually belongs. And the answer, increasingly, is: closer to the user than most apps assume.

The phones and laptops people carry today are astonishingly capable machines. A modern iPhone processes neural network operations faster than the server I learned to program on ten years ago. A mid-range Android phone has about as much RAM as many developer laptops did in 2015.

We've been offloading to the cloud out of habit, not necessity.

What Narra actually does

Here's the simple version:

You pick any book. You tell Narra which language you want to learn. The app processes the text, one paragraph at a time, using an AI model running on your phone's own processor. Every original paragraph gets its translation placed immediately beneath it, on the same page.
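A minimal sketch of that paragraph-by-paragraph flow, assuming paragraphs are separated by blank lines. The names `BilingualParagraph`, `toBilingual`, and `render` are hypothetical, and `translate` stands in for whatever on-device model call the app actually makes; this is not Narra's code.

```kotlin
// Hypothetical names throughout; an illustration, not Narra's implementation.

/** One original paragraph paired with its translation. */
data class BilingualParagraph(val original: String, val translation: String)

/**
 * Splits a chapter on blank lines and translates it one paragraph at a time.
 * `translate` is a placeholder for the on-device model call.
 */
fun toBilingual(chapter: String, translate: (String) -> String): List<BilingualParagraph> =
    chapter
        .split(Regex("""\n\s*\n"""))   // assumption: a blank line separates paragraphs
        .map { it.trim() }
        .filter { it.isNotEmpty() }
        .map { BilingualParagraph(original = it, translation = translate(it)) }

/** Renders each original paragraph with its translation directly beneath it. */
fun render(paragraphs: List<BilingualParagraph>): String =
    paragraphs.joinToString("\n\n") { "${it.original}\n${it.translation}" }
```

Because the translator is passed in as a function, the same pipeline can be exercised with a fake translator in tests and with a real model on the device.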

Tap a word. Get a definition and a usage example. With the on-device AI model enabled, you also get grammar notes and deeper context. All local. All instant.

Export the whole bilingual book as a standard file you can open on any e-reader.

Nothing is sent anywhere. Nothing is stored on a server. Your reading habits, your language choices, your highlighted vocabulary — none of it leaves your device.

The architecture question nobody asks early enough

Most developers starting an AI app ask: "Which model API should I use?"

The better question is: "What does my app actually need the AI to do, and where does that computation belong?"

These are different questions. The first one defaults you to the cloud. The second one makes you think.

When I started designing Narra, I mapped out every AI task the app needed:

Translate a paragraph in the source language to the target language
Extract vocabulary terms and their definitions
Identify grammar patterns worth noting (requires the on-device AI model)

None of these require real-time internet data. None of them need to know what any other user is doing. None of them benefit from being centralized.

They benefit from being fast, private, and available without connectivity.

That's what on-device AI is good for.

What on-device AI looks like in practice

Small Language Models — the kind that run on a phone — are genuinely good at structured, well-scoped tasks. Ask one to translate a single paragraph from English to French with consistent terminology: it does this well. Ask it to generate a structured JSON description of a chapter's mood and key events: it does this reliably if you constrain it properly.

The word "properly" is doing a lot of work there.

Getting an AI model to produce reliable, structured output is one of the most underappreciated engineering problems in AI engineering right now. Prompts alone are not enough. If you want output your code can always parse without errors, you need to work at a lower level: constraining which tokens the model is allowed to emit at each step of generation, not just the instructions you give it.

Once you understand this, the whole reliability question shifts. You're no longer hoping the model formats things correctly. You're ensuring it structurally cannot do otherwise.
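To make the idea concrete, here is a toy sketch of constrained decoding. Everything in it is invented for illustration: the tiny `vocabulary`, the fixed `grammar` that accepts only `{"mood":"<value>"}`, and the `scores` function that stands in for a real model's per-token scores. The post does not say which mechanism Narra uses, so treat this as one possible shape of the technique rather than the app's implementation.

```kotlin
// Toy illustration of constrained decoding. The "model", vocabulary, and
// grammar are all made up for this sketch.

val vocabulary = listOf("{", "}", "\"mood\"", ":", "\"calm\"", "\"tense\"", "\"joyful\"", "Sure!", "\n")

/** Tokens allowed at each step of the only shape we accept: {"mood":"<value>"} */
val grammar: List<Set<String>> = listOf(
    setOf("{"),
    setOf("\"mood\""),
    setOf(":"),
    setOf("\"calm\"", "\"tense\"", "\"joyful\""),
    setOf("}"),
)

/**
 * Greedy decoding with a mask: at every step, tokens the grammar forbids are
 * filtered out before the best-scoring token is chosen.
 * `scores(prefix)` returns one score per vocabulary entry, like model logits.
 */
fun constrainedDecode(scores: (prefix: List<String>) -> List<Double>): String {
    val output = mutableListOf<String>()
    for (allowed in grammar) {
        val logits = scores(output)
        val best = vocabulary.indices
            .filter { vocabulary[it] in allowed }   // the mask
            .maxByOrNull { logits[it] }
            ?: error("grammar step has no token in the vocabulary")
        output += vocabulary[best]
    }
    return output.joinToString("")
}
```

Even a "model" that scores the chatty token `Sure!` highest at every step can only produce a string of the accepted shape, because that token is never among the candidates. A real system would replace the fixed step list with a proper grammar or JSON-schema state machine.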

This is the kind of thinking that separates an AI feature from an AI-first product.

The thermal reality of running AI on a phone

Here's something the demos never show you: if you just run a language model as fast as possible on a phone, the device gets hot. Fast. Within a few minutes, the CPU throttles, inference slows to a crawl, and the user's phone becomes uncomfortable to hold.

This isn't a model problem. It's a scheduling problem.

Real on-device AI requires thermal awareness. You need to monitor the device's heat state, pace the model, give the processor breathing room between heavy computation bursts, and back off gracefully when the device signals it's working too hard.
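A rough sketch of such a pacing loop follows. The `ThermalState` levels, the pause lengths, and the function names are illustrative assumptions, not values from Narra. On Android the reading could plausibly come from `PowerManager.getCurrentThermalStatus()` (API 29+) and on iOS from `ProcessInfo.thermalState`; mapping those platform values onto the enum is left out here.

```kotlin
// Hypothetical sketch; the states and pause lengths are illustrative guesses.

enum class ThermalState { NOMINAL, WARM, HOT, CRITICAL }

/** Pause to insert after a paragraph, or null to stop until the device cools. */
fun pauseAfterParagraphMs(state: ThermalState): Long? = when (state) {
    ThermalState.NOMINAL -> 0L
    ThermalState.WARM -> 500L
    ThermalState.HOT -> 3_000L
    ThermalState.CRITICAL -> null
}

/**
 * Processes paragraphs one at a time, checking the heat state before each
 * burst of work. Returns how many paragraphs finished before backing off.
 * The callbacks are injected so the loop stays platform-neutral and testable;
 * the default `sleep` uses Thread.sleep and is therefore JVM-only.
 */
fun runPaced(
    paragraphs: List<String>,
    readThermalState: () -> ThermalState,
    process: (String) -> Unit,
    sleep: (Long) -> Unit = { Thread.sleep(it) },
): Int {
    var done = 0
    for (p in paragraphs) {
        // A null pause means the device is too hot: stop and let the caller resume later.
        val pause = pauseAfterParagraphMs(readThermalState()) ?: return done
        process(p)
        done++
        if (pause > 0) sleep(pause)   // breathing room between heavy bursts
    }
    return done
}
```

Returning the count of completed paragraphs lets the caller resume from that index once the device has cooled, instead of restarting the chapter.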

Building this properly — invisibly to the user — taught me more about the gap between "AI demo" and "AI product" than anything else in the project.

A demo runs the model flat-out for thirty seconds.

A product runs the model sustainably for thirty minutes.

The architecture that handles this gracefully is the actual moat.

What surprised me most about building this

Cross-platform is genuinely achievable now. I built Narra with a single codebase that compiles to Android, iOS, and Desktop. The same business logic, the same AI processing pipeline, the same data format — running on three completely different platforms. A few years ago this was a fantasy. Today it's a realistic choice for serious apps.

Privacy is a feature, not a constraint. Every decision to keep data on-device made the app simpler, not harder. No authentication. No sync. No server bills. And because the app collects no personal data, a much smaller GDPR compliance surface. The "100% offline" badge isn't a compromise; it's a selling point.

The translation quality is genuinely impressive. In my experience, the on-device AI models available today produce translations that rival cloud services for the focused task of paragraph-by-paragraph literary translation. The tradeoff is real: these models are large and demand capable hardware. They run well on modern flagship devices but struggle on older or mid-range phones. For users who want a lighter experience, Narra also supports smaller on-device ML models that work like a smart dictionary: fast and broad, with lower hardware requirements. On a device that can handle it, the full model's deeper grammar tips are worth the extra cost.
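One way to express that two-tier choice in code is a small capability check at startup. The thresholds, the `DeviceProfile` fields, and the tier names below are all assumptions made for the sketch; the post gives no concrete hardware numbers.

```kotlin
// Hypothetical tiers and thresholds; the source gives no concrete numbers.

enum class ModelTier { FULL_LANGUAGE_MODEL, LIGHT_DICTIONARY_MODEL }

data class DeviceProfile(val totalRamMb: Int, val freeStorageMb: Int)

// Assumed requirements of the large model, in megabytes.
val fullModelMinRamMb = 8_000
val fullModelDownloadMb = 2_500

/** Picks the full model only when the device clears both assumed thresholds. */
fun chooseTier(device: DeviceProfile): ModelTier =
    if (device.totalRamMb >= fullModelMinRamMb && device.freeStorageMb >= fullModelDownloadMb)
        ModelTier.FULL_LANGUAGE_MODEL
    else
        ModelTier.LIGHT_DICTIONARY_MODEL
```

In practice the result would more likely be a default that the user can override, since raw RAM and storage figures only approximate how well a model will run.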

What this means for builders

If you're a developer thinking about adding AI to an app — or building an AI-first product from scratch — here's what I'd want you to take from this:

1. Question the cloud assumption.

Ask whether each AI task your app needs actually requires a server. Many don't. On-device models are surprisingly capable for focused, well-scoped tasks.

2. Design for reliability over capability.

A model that reliably produces parseable, useful output for a specific task is more valuable than a more powerful model that sometimes fails. Narrow the task. Constrain the output. Ship with confidence.

3. Build the architecture before the features.

The thing that makes Narra work isn't the translation quality — it's the layered architecture that keeps the AI completely isolated from the UI. Adding a new language, swapping the model, changing the output format: none of these touch the screens or the navigation. That's intentional design, and it only happens if you build the structure first.
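A small sketch of what that isolation can look like, assuming a single interface sits between the screens and the model. `TranslationEngine`, `EchoEngine`, and `ReaderViewModel` are hypothetical names chosen for the example, not Narra's actual classes.

```kotlin
// Hypothetical interfaces; names are illustrative, not Narra's actual API.

/** The only contract the UI layer knows about. */
interface TranslationEngine {
    fun translate(paragraph: String, targetLanguage: String): String
}

/** Stand-in implementation; a real one would wrap an on-device model. */
class EchoEngine(private val tag: String) : TranslationEngine {
    override fun translate(paragraph: String, targetLanguage: String) =
        "[$tag:$targetLanguage] $paragraph"
}

/** UI-facing state holder. It depends on the interface, never on a model. */
class ReaderViewModel(private val engine: TranslationEngine) {
    fun pageFor(paragraphs: List<String>, targetLanguage: String): List<Pair<String, String>> =
        paragraphs.map { it to engine.translate(it, targetLanguage) }
}
```

Swapping the model then means passing a different `TranslationEngine` into `ReaderViewModel`; nothing in the screen code has to change.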

4. Privacy is a moat.

In 2026, "your data never leaves your device" is not a minor footnote. It's a reason people choose one app over another. Build offline-first when you can. The users who care about privacy are often the users who pay.

Where it goes from here

Narra is live on Android. The core experience — import, translate, read, export — is running and available today.

The part I'm most looking forward to: hearing from people who actually use it to read books in languages they're learning, and finding out what breaks, what surprises them, and what they need that I haven't built yet.

That's the part no architecture document can prepare you for.

If you found this useful, I'll be writing more about building on-device AI apps with Kotlin Multiplatform — the decisions that worked, the ones that didn't, and what the ecosystem looks like from the inside.

Follow along if that sounds interesting.
