This is a submission for the Gemma 4 Challenge: Build with Gemma 4
What I Built
NeuralPocket — a private multimodal AI assistant that runs entirely on your device. Available as both an Android app and a web app. No cloud, no subscription, no data leaving your hands.
Honest About My Motivation
I've participated in Google hackathons several times. Each time I built something real, put in the work — and each time walked away with just a participation badge 😄 This time I want to actually place, though I know there are plenty of strong projects out there!
So NeuralPocket is not a demo and not a proof-of-concept. It's a full-featured app with real architecture that solves a real problem.
The problem: modern AI assistants are brilliant — until you lose Wi-Fi. On a plane, in the mountains, roaming abroad, they become useless icons. And every message you type, every photo you send, flies off to someone else's servers.
Google gave me an extra push: the AI Edge Gallery app simply refused to install on my Android 9 phone, even though it runs a 64-bit OS (which matters, since LiteRT-LM only runs on 64-bit). Instead of giving up, I figured it out myself. That became the starting point for NeuralPocket.
I wanted an assistant that:
- works fully offline — always, everywhere
- never sends your data anywhere
- understands text, photos, and audio — in one chat
- runs on both Android and in the browser
What NeuralPocket Can Do
- 📷 Photo analysis — snap a menu in Japan → translation and context; photograph a broken part → repair advice; photograph a document → ask questions about it
- 🎤 Voice input — record up to 30 seconds, converted to WAV, processed on-device
- 💬 Multiple independent chats with different system prompts — "Translator", "Tech Assistant", "Personal Journal"
- ⚙️ Configurable context memory — 0–5 conversation pairs to balance coherence and context window
- 🎨 Markdown rendering — model responses display with full formatting: code, lists, emphasis
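The context memory setting can be pictured as a simple trim over the chat history. Here is a minimal sketch, assuming a hypothetical `Message` type and `buildContext` helper (neither name comes from the NeuralPocket source) and a history that strictly alternates user and assistant turns:

```typescript
// Hypothetical message shape; NeuralPocket's real types may differ.
type Role = "user" | "assistant";

interface Message {
  role: Role;
  text: string;
}

// Keep the chat's system prompt plus only the last `pairs` exchanges.
// Assumes history alternates user, assistant, user, assistant, ...
function buildContext(
  systemPrompt: string,
  history: Message[],
  pairs: number,
): { systemPrompt: string; messages: Message[] } {
  // The setting described above ranges from 0 to 5 pairs.
  const clamped = Math.max(0, Math.min(5, Math.floor(pairs)));
  // Each pair is two messages; slice(-0) would return everything,
  // so zero pairs is handled explicitly.
  const messages = clamped === 0 ? [] : history.slice(-clamped * 2);
  return { systemPrompt, messages };
}

const history: Message[] = [
  { role: "user", text: "Translate 'menu' to Japanese" },
  { role: "assistant", text: "メニュー" },
  { role: "user", text: "And 'water'?" },
  { role: "assistant", text: "水" },
];

// With one pair of memory, only the latest exchange is sent to the model.
const context = buildContext("You are a translator.", history, 1);
console.log(context.messages.length); // 2
```

Fewer pairs leave more of the context window for the current question; more pairs keep the conversation coherent across turns.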
Demo
🎬 Android Demo Video (will come later...)
Code
Both projects are fully open source:
- 🤖 Android (Kotlin + LiteRT-LM) → github.com/premananda108/NeuralPocket · download APK
- 🌐 Web (React 19 + TypeScript + WebGPU) → github.com/premananda108/NeuralPocketWeb
How I Used Gemma 4
Choosing the Model
I chose Gemma 4 E2B IT (2B parameters, ~2.6 GB) as the primary model for three reasons:
- Native multimodal input — text, image, and audio in a single request, no workarounds needed
- Compact size — fits on a mid-range Android phone with 4+ GB RAM
- One model, two platforms: `.litertlm` for Android LiteRT-LM, `.web.task` for WebGPU in the browser
For devices with 6+ GB RAM, the app offers Gemma 4 E4B (~3.7 GB) as a more capable option. The 31B Dense model is overkill for on-device use cases for now.
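The RAM thresholds above translate into a small lookup. This is only a sketch: the `ModelPreset` shape and `pickModel` function are illustrative names, and picking automatically is an assumption (the app itself offers E4B as an option rather than forcing it).

```typescript
// Sizes and RAM thresholds are the ones quoted in this post.
interface ModelPreset {
  name: string;
  sizeGb: number;
  minRamGb: number;
}

// Ordered from most to least capable.
const PRESETS: ModelPreset[] = [
  { name: "Gemma 4 E4B", sizeGb: 3.7, minRamGb: 6 },
  { name: "Gemma 4 E2B", sizeGb: 2.6, minRamGb: 4 },
];

// Return the most capable preset the device can hold, or null if none fits.
function pickModel(ramGb: number): ModelPreset | null {
  return PRESETS.find((preset) => ramGb >= preset.minRamGb) ?? null;
}

console.log(pickModel(8)?.name); // "Gemma 4 E4B"
console.log(pickModel(4)?.name); // "Gemma 4 E2B"
console.log(pickModel(3));       // null
```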
Architecture: Two Platforms, One Model
┌─────────────────────────────────────────────────┐
│ NeuralPocket │
├──────────────────────┬──────────────────────────┤
│ Android App │ Web App │
│ Kotlin │ React 19 + TypeScript │
├──────────────────────┼──────────────────────────┤
│ LiteRT-LM SDK │ MediaPipe Tasks GenAI │
│ (native runtime) │ Web Worker + WebGPU │
├──────────────────────┴──────────────────────────┤
│ Gemma 4 E2B IT / E4B IT │
│ (running locally on device) │
└─────────────────────────────────────────────────┘
Android: LiteRT-LM
Stack: Kotlin + Google AI Edge LiteRT-LM + CameraX + MVVM
The engine automatically selects the best available backend — GPU via Vulkan or OpenCL, falling back to CPU via XNNPack. Concurrent inference calls are serialized through a Mutex to prevent race conditions.
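The Android app does this in Kotlin, but the serialization idea is language-independent. As a rough illustration in TypeScript (the `Mutex` class and `fakeGenerate` stand-in are hypothetical and are not the LiteRT-LM API), each caller waits for the previous one to finish:

```typescript
// A tiny promise-based mutex: callers are queued in arrival order.
class Mutex {
  private tail: Promise<void> = Promise.resolve();

  async runExclusive<T>(task: () => Promise<T>): Promise<T> {
    const previous = this.tail;
    let release!: () => void;
    // The next caller will wait on this promise.
    this.tail = new Promise<void>((resolve) => {
      release = resolve;
    });
    await previous;
    try {
      return await task();
    } finally {
      release(); // let the next caller in, even if the task threw
    }
  }
}

// Stand-in for an inference call (hypothetical, not a real engine).
async function fakeGenerate(prompt: string, log: string[]): Promise<string> {
  log.push(`start ${prompt}`);
  await Promise.resolve(); // yield, as a real async call would
  log.push(`end ${prompt}`);
  return `reply to ${prompt}`;
}

// Two "simultaneous" requests end up running one after the other.
async function demo(): Promise<string[]> {
  const lock = new Mutex();
  const log: string[] = [];
  await Promise.all([
    lock.runExclusive(() => fakeGenerate("A", log)),
    lock.runExclusive(() => fakeGenerate("B", log)),
  ]);
  return log; // ["start A", "end A", "start B", "end B"]
}
```

Without the lock, the two calls would interleave (`start A`, `start B`, ...), which is exactly the situation a single shared engine should avoid.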
Key architectural decisions:
- A single `StateFlow<ChatUiState>` as the source of truth: the UI only observes, never mutates directly
- Chat history is written atomically via a temp file, so no data is lost on a crash
- The vision encoder loads only when an image is present — saves RAM
- Preflight check on first launch: RAM, ABI, free storage — the app warns if the device doesn't meet the minimum requirements
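The single-source-of-truth decision is easiest to see in miniature. The sketch below mirrors the idea in TypeScript rather than reproducing the Kotlin code; the fields of `ChatUiState` and the `ChatStore` class are assumptions made for illustration.

```typescript
// Hypothetical UI state; the Android app's ChatUiState will have other fields.
interface ChatUiState {
  readonly messages: readonly string[];
  readonly isGenerating: boolean;
}

type Listener = (state: ChatUiState) => void;

// Minimal observable store: state is replaced, never edited in place,
// and observers only ever receive read-only snapshots.
class ChatStore {
  private state: ChatUiState = { messages: [], isGenerating: false };
  private listeners = new Set<Listener>();

  getState(): ChatUiState {
    return this.state;
  }

  subscribe(listener: Listener): () => void {
    this.listeners.add(listener);
    listener(this.state); // like StateFlow, deliver the current value right away
    return () => {
      this.listeners.delete(listener);
    };
  }

  // The only way to change state: produce a new snapshot from the old one.
  update(reducer: (current: ChatUiState) => ChatUiState): void {
    this.state = reducer(this.state);
    this.listeners.forEach((listener) => listener(this.state));
  }
}

const store = new ChatStore();
const seen: number[] = [];
const unsubscribe = store.subscribe((state) => {
  seen.push(state.messages.length);
});
store.update((s) => ({ ...s, messages: [...s.messages, "Hello"] }));
unsubscribe();
console.log(seen); // [0, 1]
```

Because every change goes through `update`, there is exactly one place to look when the UI shows something unexpected.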
Performance:
- GPU (Vulkan/OpenCL): ~15–30 tokens/sec
- CPU-only (XNNPack): ~5–10 tokens/sec
- Requirements: Android 8+, arm64, 4+ GB RAM
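To put those speeds in perspective, here is the arithmetic for a reply of 300 tokens (the reply length is an assumed example, not a measurement):

```typescript
// Seconds needed to stream `tokens` tokens at a speed range in tokens/sec.
// Returns [best case, worst case].
function replySeconds(tokens: number, minTps: number, maxTps: number): [number, number] {
  return [tokens / maxTps, tokens / minTps];
}

const gpu = replySeconds(300, 15, 30); // [10, 20] seconds
const cpu = replySeconds(300, 5, 10);  // [30, 60] seconds
console.log(gpu, cpu);
```

So the GPU path keeps a medium-length answer under about 20 seconds, while CPU-only can take up to a minute.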
All three screenshots were taken in airplane mode — no network, everything running locally:
Web: WebGPU Right in the Browser
Stack: React 19 + TypeScript + Vite + Tailwind CSS v4 + MediaPipe Tasks GenAI
All inference runs inside a Web Worker — generation never blocks the UI, keeping the interface responsive during streaming. Models are cached in OPFS (Origin Private File System): first launch downloads ~2.6 GB, every subsequent launch starts instantly without a network connection.
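The caching step can be sketched as "read from OPFS if present, otherwise download once and store". This is an illustration under assumptions, not the app's actual code: `loadModel` and the `*Like` interfaces are hypothetical names, and the interfaces cover only the parts of the OPFS handles used here so the function can also run against an in-memory fake.

```typescript
// Structural subsets of the OPFS handle types (assumed minimal shapes).
interface FileLike {
  arrayBuffer(): Promise<ArrayBuffer>;
}
interface WritableLike {
  write(data: ArrayBuffer): Promise<void>;
  close(): Promise<void>;
}
interface FileHandleLike {
  getFile(): Promise<FileLike>;
  createWritable(): Promise<WritableLike>;
}
interface DirectoryLike {
  getFileHandle(name: string, options?: { create?: boolean }): Promise<FileHandleLike>;
}

// Return cached bytes if the file exists; otherwise download and store them.
async function loadModel(
  dir: DirectoryLike,
  fileName: string,
  download: () => Promise<ArrayBuffer>,
): Promise<ArrayBuffer> {
  try {
    // Without { create: true } this rejects when the file is missing.
    const handle = await dir.getFileHandle(fileName);
    const file = await handle.getFile();
    return await file.arrayBuffer();
  } catch {
    // Simplification: any read failure is treated as a cache miss.
    const bytes = await download();
    const handle = await dir.getFileHandle(fileName, { create: true });
    const writable = await handle.createWritable();
    await writable.write(bytes);
    await writable.close();
    return bytes;
  }
}
```

In the browser, `dir` would come from `await navigator.storage.getDirectory()` and `download` would wrap a `fetch` call. For a file of roughly 2.6 GB, a real implementation would more likely stream the response into the writable than buffer it all in memory as this sketch does.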
Three model presets are supported: Gemma 4 E2B, Gemma 4 E4B, and Gemma 3 Multimodal. You can also provide a custom model URL.
The web app is built as a PWA (Progressive Web App) — you can install it on your computer as a standalone app with one click from the browser, just like YouTube or other web services. Once installed, it appears in your app menu and opens in its own window without an address bar.
Web version in action (all computation happens locally in the browser via WebGPU):
Honest caveat about offline: after the first launch the app works without a network. But it's not fully autonomous out of the box: the MediaPipe runtime loads from jsDelivr, and fonts load from Google Fonts. For fully offline use you'd need to self-host those dependencies.
Honest caveat about multimodal in the web: at the time of development I couldn't find web-optimized multimodal models for Gemma 4; the available versions only support text. However, I found a fully multimodal model from the previous generation, gemma-3n-E2B-it-int4-Web.litertlm, which handles text, image, and audio input directly in the browser. That became the third preset in the web version.
A note on how fast things move. While building NeuralPocket, Google released Gemini 3.5 Flash — and first impressions suggest it's a notable step up from 3.1. It handles complex multi-step tasks confidently: for example, it wrote a full test suite for the web version of NeuralPocket on the first try, something that used to take several iterations. It's remarkable how fast this space evolves — the world changes while you're still writing the article.
At this pace, in a year you might just need to download the latest Gemma and ask it to build the whole app itself. Probably. Maybe. 😄
Privacy as Architecture, Not Marketing
NeuralPocket sends nothing anywhere — not messages, not photos, not chat history, not analytics. This isn't a setting you toggle. It's a consequence of the architecture: there's no server that could receive anything. Works in airplane mode. No account, no subscription.
Summary: Android vs Web
Two apps, one idea — but different trade-offs:
| | 🤖 Android | 🌐 Web |
|---|---|---|
| Installation | APK (~36 MB) | None — just open in browser |
| Install as app | ✅ native | ✅ PWA |
| Model | Gemma 4 E2B / E4B | Gemma 4 E2B / E4B, Gemma 3n |
| Text chat | ✅ | ✅ |
| Photo input | ✅ | ⚠️ Gemma 3n only |
| Audio input | ✅ | ⚠️ Gemma 3n only |
| Offline | ✅ after downloading models | ⚠️ after first launch and downloading models |
| Performance | ~15–30 tok/s (GPU) | depends on browser WebGPU |
| Requirements | Android 8+, arm64 | Chrome / Edge with WebGPU |
| Multiple chats | ✅ | ✅ |
| Custom model | ❌ | ✅ by URL |
Need maximum multimodality and full offline? Go Android. Want to try it right now without installing anything? Go Web.
📦 Models: gemma-4-E2B-it-litert-lm · gemma-4-E4B-it-litert-lm on HuggingFace
Built with ❤️ on Gemma 4 + Google AI Edge LiteRT-LM






