
FreeSay
Shipping FreeSay to 2GB-RAM Phones: What We Cut, What We Kept

I built FreeSay — an AI-powered speaking tutor — and a surprising amount of engineering time went into making it run well on cheap phones. This post is about what got cut and what survived, targeting devices with 2GB of RAM and flaky connections.

Why this matters

Most of FreeSay's audience lives in markets where the median phone is not flagship-class. If the app chokes on a Samsung A03 or a Redmi 9A, we lose the exact users we were built for. So "runs on 2GB" was a hard constraint from day one, not a post-launch optimization.

What we kept

  • Real-time LLM-backed conversation in 15 target languages. Every turn goes to the cloud for quality; we gave up on fully offline speech-to-text because the accuracy gap was too large for beginners.
  • Cloud TTS for the tutor voice. We tested Piper on-device for Android — the voice-quality drop was too jarring relative to the APK bloat.
  • Aggressive per-turn caching on the server. Common corrections, translations, and vocabulary lookups are memoized so repeat learners pay close-to-zero latency on overlap.
  • A bare-metal server in Korea instead of serverless. Regional subscription pricing cannot survive Lambda bills once conversation volume grows.
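The per-turn caching above can be sketched as a small TTL-bounded in-memory map, keyed by the kind of lookup plus a normalized form of the learner's input. This is an illustrative sketch, not FreeSay's production code; the class and method names are mine.

```typescript
type CacheEntry<T> = { value: T; expiresAt: number };

// TTL-bounded memo cache for corrections / translations / vocab lookups.
class TurnCache<T> {
  private store = new Map<string, CacheEntry<T>>();
  constructor(private ttlMs: number, private maxEntries = 10_000) {}

  // Normalize so "The Apple!" and " the  apple " hit the same entry.
  private key(kind: string, input: string): string {
    return `${kind}:${input.trim().toLowerCase().replace(/\s+/g, " ")}`;
  }

  get(kind: string, input: string): T | undefined {
    const hit = this.store.get(this.key(kind, input));
    if (hit === undefined || hit.expiresAt <= Date.now()) return undefined;
    return hit.value;
  }

  set(kind: string, input: string, value: T): void {
    if (this.store.size >= this.maxEntries) {
      // Crude eviction: drop the oldest insertion. A real server might use LRU.
      const oldest = this.store.keys().next().value;
      if (oldest !== undefined) this.store.delete(oldest);
    }
    this.store.set(this.key(kind, input), {
      value,
      expiresAt: Date.now() + this.ttlMs,
    });
  }
}

// Usage: check the cache before paying for the LLM / translation call.
const cache = new TurnCache<string>(60_000);
if (cache.get("translate:es", "The apple") === undefined) {
  cache.set("translate:es", "The apple", "la manzana"); // result of the expensive call
}
```

The normalization step matters more than the eviction policy: beginner input repeats heavily across learners, so folding whitespace and case is where most of the overlap comes from.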

What we cut

  • Heavy onboarding animations. Replaced with a single static screen and a play button; every animation frame we skip is work a low-end GPU never has to render.
  • Rich-text chat bubbles. We tried markdown rendering in the chat log, then fell back to plain text with a handful of explicit highlight types — correction, new word, translation.
  • Pre-downloaded lesson content. Everything is fetched on demand; the APK ships small, and the first launch only downloads what the user actually opens.
  • Optional in-app video demos. For slow connections, a 30-second video was the difference between "intrigued" and "gave up." We replaced them with text + a single still frame.
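The "plain text plus explicit highlight types" replacement for markdown can be modeled as a discriminated union: each chat bubble is a list of typed segments, and each type maps to one fixed, cheap style. A minimal sketch, with illustrative type names (the renderer here just flattens to an annotated string to show the shape of the data):

```typescript
// One segment per styled run; no markdown parsing anywhere in the client.
type Segment =
  | { kind: "text"; text: string }
  | { kind: "correction"; wrong: string; right: string }
  | { kind: "newWord"; word: string; translation: string }
  | { kind: "translation"; text: string };

function renderPlain(segments: Segment[]): string {
  return segments
    .map((s) => {
      switch (s.kind) {
        case "text":
          return s.text;
        case "correction":
          return `${s.wrong} → ${s.right}`;
        case "newWord":
          return `${s.word} (${s.translation})`;
        case "translation":
          return `"${s.text}"`;
      }
    })
    .join(" ");
}
```

In a React Native chat log, each `kind` would map to a fixed `<Text>` style instead of a string, which keeps the bubble renderer a flat list of components rather than a markdown AST.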

The stack, briefly

React Native for a single codebase across iOS and Android. On-demand correction via LLM calls. Cloud TTS. A Puppeteer pipeline for localized Play Store / App Store screenshots.

What I would do differently

If I were starting again, I would benchmark the APK size and cold-start time on a 2GB device before writing a single feature. Almost every hard call we made later — TTS source, animation budget, bundle splitting — came back to that one number: how long until the learner can speak their first sentence?
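For the cold-start half of that number, `adb shell am start -W <package>/<activity>` prints a `TotalTime:` line in milliseconds. A hypothetical CI helper (the budget value and function names are my assumptions, not FreeSay's pipeline) could parse it and fail the build when a 2GB-class device exceeds the budget:

```typescript
// Parse the TotalTime line from `adb shell am start -W` output.
function parseColdStartMs(adbOutput: string): number | null {
  const m = adbOutput.match(/^TotalTime:\s*(\d+)/m);
  return m ? Number(m[1]) : null;
}

const BUDGET_MS = 2000; // assumed cold-start budget for a 2GB-class device

function withinBudget(adbOutput: string): boolean {
  const ms = parseColdStartMs(adbOutput);
  return ms !== null && ms <= BUDGET_MS;
}
```

Running this against a real low-end device on every merge, rather than an emulator on a developer laptop, is the part that would have changed our early decisions.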

Try it

Landing page: https://fasterwork.net/freesay/

Feedback from anyone who has shipped consumer apps to low-end Android is welcome, especially on the APK-size versus feature-richness tradeoff.
