DEV Community: Hassan Imam

AccessLens — A persistent on-device visual interpreter for the blind

Hassan Imam — Sun, 24 May 2026 06:46:43 +0000

This is a submission for the Gemma 4 Challenge: Build with Gemma 4

What I Built

AccessLens is an Android app that turns a Pixel 8 worn on a lanyard into a persistent visual interpreter for blind and low-vision users. Rear camera forward, bone-conduction headphones in, the phone describes the world — and remembers.

The problem with existing visual-assist apps (Be My Eyes, Seeing AI, Envision) is that they are screen-bound, stateless, and cloud-bound. A blind person navigates by sound; an app that needs you to hold up a phone, tap a screen, and wait on a datacenter interrupts that signal stream. AccessLens is different on three axes:

Worn, not held: Two physical buttons drive everything. Volume Up → read text in front of me, verbatim. Volume Down → describe this room with memory from earlier today and recent days. A gyroscope-based SettleTrigger also fires a description automatically when the user stops walking.
Persistent memory across days/weeks: Every gesture writes a SessionEvent to a SQLCipher database. A nightly Gemma 4 worker compresses each day into a DailySummary; Sundays roll into a WeeklyMemory. LONG-press prompts splice that history into the Gemma call, so the model has a world model of this specific apartment, this specific day.
100% on-device: No image, audio, embedding, or location leaves the phone. SQLCipher + Android KeyStore (AES-256-GCM wrapping a SecureRandom DB key) protect everything at rest. A SelfTest on first launch opens a probe DB with the wrong key and asserts the read fails before the app reports encryption healthy.
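The wrong-key check described above can be illustrated one layer down, at the key-wrapping step. This is a minimal sketch on the plain JVM, not the app's actual SelfTest: `WrappedKey`, `newWrappingKey`, `wrap`, `unwrap`, and `selfTestPasses` are hypothetical names, and the wrapping key here is an ordinary in-memory AES key standing in for one held in the Android KeyStore. It checks the wrapped key rather than a SQLCipher probe database, but the property is the same: the right key must round-trip and the wrong key must fail.

```kotlin
import java.security.SecureRandom
import javax.crypto.AEADBadTagException
import javax.crypto.Cipher
import javax.crypto.KeyGenerator
import javax.crypto.SecretKey
import javax.crypto.spec.GCMParameterSpec

// Hypothetical container for a wrapped database key: GCM nonce plus ciphertext and tag.
class WrappedKey(val iv: ByteArray, val ciphertext: ByteArray)

// Assumption: a plain in-memory AES-256 key stands in for the KeyStore-held wrapping key.
fun newWrappingKey(): SecretKey =
    KeyGenerator.getInstance("AES").apply { init(256) }.generateKey()

// Encrypts the random database key under the wrapping key with AES-256-GCM.
fun wrap(dbKey: ByteArray, wrappingKey: SecretKey): WrappedKey {
    val iv = ByteArray(12).also { SecureRandom().nextBytes(it) }
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.ENCRYPT_MODE, wrappingKey, GCMParameterSpec(128, iv))
    return WrappedKey(iv, cipher.doFinal(dbKey))
}

// Decrypts the database key; GCM authentication fails if the wrapping key is wrong.
fun unwrap(wrapped: WrappedKey, wrappingKey: SecretKey): ByteArray {
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.DECRYPT_MODE, wrappingKey, GCMParameterSpec(128, wrapped.iv))
    return cipher.doFinal(wrapped.ciphertext)
}

// Self-test: report healthy only if the right key works AND a wrong key is rejected.
fun selfTestPasses(): Boolean {
    val dbKey = ByteArray(32).also { SecureRandom().nextBytes(it) }
    val rightKey = newWrappingKey()
    val wrapped = wrap(dbKey, rightKey)
    val roundTrips = unwrap(wrapped, rightKey).contentEquals(dbKey)
    val wrongKeyRejected = try {
        unwrap(wrapped, newWrappingKey())
        false // decrypting with a different key must not succeed
    } catch (e: AEADBadTagException) {
        true
    }
    return roundTrips && wrongKeyRejected
}
```

The point of testing the failure path is that a misconfigured cipher can read back fine with any key, so a success-only test would report a false sense of health.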

Face recognition uses MediaPipe FaceLandmarker to produce a 192-dimensional, L2-normalized landmark vector for each enrolled person. At identification time, the live vector is compared against the enrolled ones by cosine similarity, and only the names of the matches are injected into the Gemma prompt. Gemma never sees a face crop or an embedding; I verified this by code review.
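As a rough illustration of that matching step, here is a minimal sketch in plain Kotlin. The names `EnrolledFace`, `matchNames`, and `promptWithNames`, the 0.9 threshold, and the prompt wording are all assumptions for illustration, not taken from the AccessLens source. The only thing it is meant to show is that names, and nothing else, cross into the prompt.

```kotlin
import kotlin.math.sqrt

// Hypothetical record for an enrolled person; the vector is assumed to be L2-normalized.
class EnrolledFace(val name: String, val vector: FloatArray)

// Scales a vector to unit length; a zero vector is returned unchanged.
fun l2Normalize(v: FloatArray): FloatArray {
    var sumSq = 0.0
    for (x in v) sumSq += x.toDouble() * x
    val norm = sqrt(sumSq).toFloat()
    return if (norm == 0f) v.copyOf() else FloatArray(v.size) { v[it] / norm }
}

// For unit-length vectors, cosine similarity reduces to the dot product.
fun cosine(a: FloatArray, b: FloatArray): Float {
    require(a.size == b.size) { "dimension mismatch" }
    var dot = 0f
    for (i in a.indices) dot += a[i] * b[i]
    return dot
}

// Returns only the names whose enrolled vector is close enough to the query.
// The 0.9 threshold is an assumed value, not the one AccessLens uses.
fun matchNames(
    query: FloatArray,
    enrolled: List<EnrolledFace>,
    threshold: Float = 0.9f
): List<String> =
    enrolled.filter { cosine(query, it.vector) >= threshold }.map { it.name }

// Only strings reach the prompt; no image crop or vector is ever appended.
fun promptWithNames(basePrompt: String, names: List<String>): String =
    if (names.isEmpty()) basePrompt
    else "$basePrompt\nPeople known to be in view: ${names.joinToString(", ")}."
```

Keeping the prompt builder's signature limited to `String` and `List<String>` makes the privacy property easy to audit: the function has no way to receive an embedding.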

Three gestures, three target latencies (Pixel 8, Tensor G3): SINGLE targets ≤14 s end-to-end, DOUBLE scales with the amount of text, and LONG adds memory retrieval on top. Voice fillers ("I'm looking…", "Still looking…") cover the prefill gap, so the user hears audible progress instead of dead air. Once the model has been pushed to the phone, everything runs with airplane mode on.

Demo

Code

hassaninnovate / AccessLens

A blind person's lanyard, powered by Gemma 4 E2B on a Pixel 8. 100% on-device visual assistant with persistent memory and face recognition.

AccessLens

An always-on, on-device visual interpreter for blind and low-vision users — built for the DEV.to "Build with Gemma 4" challenge.

Pitch

A phone worn on a lanyard becomes the user's "eyes." The rear camera is always on; the gyroscope watches for motion. When the user stops walking, AccessLens describes what's in front of them. When a friend whose face has been enrolled walks into frame, the phone says their name. When the user wants to read what's in front of them, they press Volume Up; for a richer description of the room, Volume Down. Bluetooth bone-conduction headphones carry the audio — the user's ears stay free for the world.
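The "stops walking" trigger can be sketched as a small state machine over gyroscope samples. This is an illustration under stated assumptions, not the project's SettleTrigger: `GyroSample` and `SettleDetector` are hypothetical names, and the 0.15 rad/s threshold and 1.5 s quiet window are made-up values. In an Android app the samples would come from a sensor listener; here they are passed in directly so the logic stays testable.

```kotlin
import kotlin.math.sqrt

// Hypothetical gyroscope reading: timestamp in ms, angular velocity in rad/s per axis.
class GyroSample(val timeMs: Long, val x: Float, val y: Float, val z: Float)

class SettleDetector(
    private val quietThreshold: Float = 0.15f, // rad/s, assumed value
    private val quietWindowMs: Long = 1500L    // assumed value
) {
    private var quietSinceMs: Long? = null
    private var firedForThisStop = false

    // Returns true exactly once per stop, after the device has stayed
    // below the threshold for the whole quiet window.
    fun onSample(s: GyroSample): Boolean {
        val magnitude = sqrt(s.x * s.x + s.y * s.y + s.z * s.z)
        if (magnitude > quietThreshold) {
            // Movement resets the window and re-arms the trigger.
            quietSinceMs = null
            firedForThisStop = false
            return false
        }
        val since = quietSinceMs ?: s.timeMs.also { quietSinceMs = it }
        if (!firedForThisStop && s.timeMs - since >= quietWindowMs) {
            firedForThisStop = true
            return true
        }
        return false
    }
}
```

Firing once per stop, rather than on every quiet sample, keeps the phone from repeating a description while the user stands still.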

What separates AccessLens from existing apps like Be My Eyes, Seeing AI, and Envision is persistent on-device memory + 100% on-device inference. Existing tools are stateless and cloud-bound. AccessLens runs Gemma 4 E2B locally via LiteRT-LM, encrypts…


Licensed under Apache 2.0. The repo includes the full Kotlin/Compose source, the encryption self-test, the nightly compression WorkManager job, and a README documenting which file enforces each of the six privacy invariants.

Reference implementation that taught me the LiteRT-LM API: google-ai-edge/gallery — adapted patterns are cited inline in inference/LiteRtLmRuntime.kt.

How I Used Gemma 4

Model Chosen: Gemma 4 E2B (litert-community/gemma-4-E2B-it-litert-lm, ~2.59 GB int4), loaded once at service start via LiteRT-LM 0.12.0 with Backend.GPU() for the vision adapter.

This model was the perfect fit for AccessLens for three core reasons:

Multimodal in one model, on-device: Image input goes in as Content.ImageBytes, text as Content.Text, in that order (per the Gallery's "for accurate last token" comment), all through one Engine.generate call. No separate vision encoder + decoder to stitch, no second model to keep resident. That fits the latency budget and the memory budget on Pixel-class 8 GB RAM.
E2B is the smallest competent multimodal Gemma 4: It fits in RAM alongside MediaPipe FaceLandmarker, a CameraX pipeline, and the Compose UI without OOM-ing on a Pixel 8. I prototyped against E4B (the brief's "quality path") and measured the extra latency on one-sentence scene descriptions. The gain was not worth doubling the prefill cost when the user is waiting in real time, wearing the phone on a lanyard, with no screen feedback. The architecture is parametric on the model path (InferenceRuntime.load(modelPath, Modality)), so a future LONG-press branch could swap to E4B in one line. I documented the tradeoff in the README and shipped E2B for all three gestures.
Gemma is the only practical way to do nightly memory compression on-device: The 03:00 CompressionWorker calls Gemma in JSON mode to compress the day's SessionEvent rows into a single DailySummary, and on Sundays into a WeeklyMemory. That's a real LLM task — extracting persistent facts, deduplicating recurring observations, distinguishing "the blue mug is mine" from "I saw a blue mug today" — and it has to happen without a network. E2B handles it in under a minute per day on Tensor G3 while the phone is on the charger.
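The shape of the nightly roll-up can be sketched without any Android or LiteRT-LM code. Everything named here is hypothetical: the real SessionEvent, DailySummary, and WeeklyMemory schemas are not shown in this post, `Summarizer` is a stand-in for the on-device Gemma call, and the prompt wording is invented. JSON parsing of the model output and the WorkManager scheduling are left out.

```kotlin
import java.time.DayOfWeek
import java.time.LocalDate

// Hypothetical shapes; the actual schemas are not shown in the post.
data class SessionEvent(val day: LocalDate, val gesture: String, val description: String)
data class DailySummary(val day: LocalDate, val facts: String)
data class WeeklyMemory(val weekEnding: LocalDate, val facts: String)

// Stand-in for the on-device model call: prompt in, model text out.
fun interface Summarizer {
    fun summarize(prompt: String): String
}

// Compresses one day's events into a single summary; returns null on an empty day.
fun compressDay(day: LocalDate, events: List<SessionEvent>, model: Summarizer): DailySummary? {
    val todays = events.filter { it.day == day }
    if (todays.isEmpty()) return null
    val prompt = buildString {
        appendLine("Compress these observations into persistent facts. Merge duplicates.")
        todays.forEach { appendLine("- [${it.gesture}] ${it.description}") }
    }
    return DailySummary(day, model.summarize(prompt))
}

// On Sundays only, merges the last seven daily summaries into one weekly memory.
fun rollUpWeek(today: LocalDate, dailies: List<DailySummary>, model: Summarizer): WeeklyMemory? {
    if (today.dayOfWeek != DayOfWeek.SUNDAY) return null
    val week = dailies.filter { !it.day.isAfter(today) && it.day.isAfter(today.minusDays(7)) }
    if (week.isEmpty()) return null
    val prompt = week.joinToString(
        separator = "\n",
        prefix = "Merge these daily summaries into weekly facts:\n"
    ) { "${it.day}: ${it.facts}" }
    return WeeklyMemory(today, model.summarize(prompt))
}
```

Passing the model in as an interface means the roll-up logic can be exercised with a fake summarizer, which is useful when a real inference pass takes close to a minute.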

Production fixes discovered during implementation:

The LiteRT-LM Android artifact must be 0.12.0 or later — 0.11.0 fails vision init inside vision_litert_compiled_model_executor.cc:273 on Tensor G3.
AndroidManifest needs <uses-native-library> declarations for libOpenCL.so, libOpenCL-car.so, libOpenCL-pixel.so (all android:required="false"). Without them, Android 12+ silently denies GPU OpenCL access and the vision backend fails to initialize. Documented at ai.google.dev/edge/litert-lm/android.
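For reference, the manifest declarations described above would look roughly like the fragment below. It is a sketch built from the three library names listed in this post; the placement inside the application element follows the standard Android manifest layout, and the rest of the manifest is omitted.

```xml
<application>
    <!-- Optional OpenCL libraries for the GPU backend.
         required="false" keeps the app installable on devices that lack them. -->
    <uses-native-library android:name="libOpenCL.so" android:required="false" />
    <uses-native-library android:name="libOpenCL-car.so" android:required="false" />
    <uses-native-library android:name="libOpenCL-pixel.so" android:required="false" />
</application>
```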

The thing I'm proudest of: when you uninstall AccessLens, the KeyStore wrapping key is destroyed with it. The encrypted DB on disk becomes cryptographically unrecoverable. The user can throw the phone away and their memories — kitchen layout, friends' faces, places they've been — go with it. That's what on-device privacy is supposed to mean, and Gemma 4 + LiteRT-LM made it possible without compromising the assistant on quality.
