<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Hassan Shah</title>
    <description>The latest articles on DEV Community by Hassan Shah (@hassan_shah_733ea1eb37c88).</description>
    <link>https://dev.to/hassan_shah_733ea1eb37c88</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3919991%2F2abd43d0-6aa6-47be-a38a-ca0e563dc2d5.jpg</url>
      <title>DEV Community: Hassan Shah</title>
      <link>https://dev.to/hassan_shah_733ea1eb37c88</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/hassan_shah_733ea1eb37c88"/>
    <language>en</language>
    <item>
      <title>AccessLens — A persistent on-device visual interpreter for the blind</title>
      <dc:creator>Hassan Shah</dc:creator>
      <pubDate>Sun, 24 May 2026 06:46:43 +0000</pubDate>
      <link>https://dev.to/hassan_shah_733ea1eb37c88/accesslens-a-persistent-on-device-visual-interpreter-for-the-blind-8h6</link>
      <guid>https://dev.to/hassan_shah_733ea1eb37c88/accesslens-a-persistent-on-device-visual-interpreter-for-the-blind-8h6</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-gemma-2026-05-06"&gt;Gemma 4 Challenge: Build with Gemma 4&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;What I Built&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;AccessLens&lt;/strong&gt; is an Android app that turns a Pixel 8 worn on a lanyard into a persistent visual interpreter for blind and low-vision users. Rear camera forward, bone-conduction headphones in, the phone describes the world — and &lt;em&gt;remembers&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The problem with existing visual-assist apps (Be My Eyes, Seeing AI, Envision) is that they are screen-bound, stateless, and cloud-bound. A blind person navigates by sound; an app that needs you to hold up a phone, tap a screen, and wait on a datacenter interrupts that signal stream. AccessLens is different on three axes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Worn, not held:&lt;/strong&gt; Two physical buttons drive everything. Volume Up → read text in front of me, verbatim. Volume Down → describe this room with memory from earlier today and recent days. A gyroscope-based &lt;code&gt;SettleTrigger&lt;/code&gt; also fires a description &lt;em&gt;automatically&lt;/em&gt; when the user stops walking.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Persistent memory across days/weeks:&lt;/strong&gt; Every gesture writes a &lt;code&gt;SessionEvent&lt;/code&gt; to a SQLCipher database. A nightly Gemma 4 worker compresses each day into a &lt;code&gt;DailySummary&lt;/code&gt;; Sundays roll into a &lt;code&gt;WeeklyMemory&lt;/code&gt;. LONG-press prompts splice that history into the Gemma call, so the model has a world model of &lt;em&gt;this specific apartment, this specific day&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;100% on-device:&lt;/strong&gt; No image, audio, embedding, or location leaves the phone. SQLCipher + Android KeyStore (AES-256-GCM wrapping a &lt;code&gt;SecureRandom&lt;/code&gt; DB key) protect everything at rest. A &lt;code&gt;SelfTest&lt;/code&gt; on first launch opens a probe DB with the wrong key and asserts the read fails before the app reports encryption healthy.&lt;/li&gt;
&lt;/ul&gt;
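
&lt;p&gt;A minimal sketch of how a gyroscope settle trigger like this could work, written in Python for brevity rather than the app's Kotlin. The class name, window size, and threshold are illustrative assumptions, not the actual &lt;code&gt;SettleTrigger&lt;/code&gt; implementation. It fires once when angular speed stays low for a full window after motion has been seen.&lt;/p&gt;

```python
from collections import deque
import math


class SettleDetector:
    """Fires once each time the wearer goes from moving to still.

    Hypothetical stand-in for the app's SettleTrigger; the threshold and
    window values below are assumptions chosen for illustration.
    """

    def __init__(self, threshold_rad_s=0.15, window=5):
        self.threshold = threshold_rad_s
        self.recent = deque(maxlen=window)  # last N angular speeds
        self.was_moving = False

    def on_gyro_sample(self, x, y, z):
        """Feed one gyroscope reading (rad/s); return True when settled."""
        speed = math.sqrt(x * x + y * y + z * z)
        self.recent.append(speed)
        if speed > self.threshold:
            self.was_moving = True
            return False
        window_full = len(self.recent) == self.recent.maxlen
        all_still = self.threshold >= max(self.recent)
        if self.was_moving and window_full and all_still:
            self.was_moving = False  # re-arm only after the next movement
            return True
        return False
```

&lt;p&gt;Requiring prior motion keeps the detector from firing repeatedly while the user is simply standing still.&lt;/p&gt;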

&lt;p&gt;Face recognition uses MediaPipe FaceLandmarker to produce a 192-dim L2-normalized landmark vector per enrolled person. At identify time, a cosine-similarity search finds matching enrollments and injects &lt;strong&gt;only their names&lt;/strong&gt; into the Gemma prompt. Gemma never sees a face crop or an embedding (verified by code review).&lt;/p&gt;
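
&lt;p&gt;The names-only matching step can be sketched as follows, in Python for brevity. The function names, the 0.9 threshold, and the prompt wording are all hypothetical, not taken from the AccessLens source. Because the stored vectors are L2-normalized, a plain dot product gives the cosine similarity.&lt;/p&gt;

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [v / norm for v in vec]


def matching_names(probe, enrolled, threshold=0.9):
    """Return names whose enrolled vector is cosine-similar to the probe.

    `enrolled` maps name -> unit-length vector. Only names are returned,
    so nothing downstream (such as a prompt builder) sees the vectors.
    The 0.9 threshold is an assumed value, not taken from the app.
    """
    probe = l2_normalize(probe)
    scored = []
    for name, vec in enrolled.items():
        score = sum(p * e for p, e in zip(probe, vec))
        if score >= threshold:
            scored.append((score, name))
    # Best match first.
    return [name for _, name in sorted(scored, reverse=True)]


def build_prompt(names):
    """Hypothetical prompt fragment: names only, never pixels or vectors."""
    if not names:
        return "Describe the scene."
    return "Describe the scene. People present: " + ", ".join(names) + "."
```

&lt;p&gt;The same code works unchanged for 192-dim vectors; the point of the design is that &lt;code&gt;build_prompt&lt;/code&gt; only ever receives strings.&lt;/p&gt;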

&lt;p&gt;Three gestures, three target latencies (Pixel 8, Tensor G3): SINGLE ≤14 s end-to-end, DOUBLE scales with text length, LONG adds memory retrieval. Voice fillers ("I'm looking…", "Still looking…") cover the prefill gap so the user hears acoustic progress, not dead air. Everything runs with airplane mode on after the model is pushed once.&lt;/p&gt;

&lt;h2&gt;Demo&lt;/h2&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/4ZlaVqXlAc4"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;Code&lt;/h2&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/hassaninnovate" rel="noopener noreferrer"&gt;
        hassaninnovate
      &lt;/a&gt; / &lt;a href="https://github.com/hassaninnovate/AccessLens" rel="noopener noreferrer"&gt;
        AccessLens
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      A blind person's lanyard, powered by Gemma 4 E2B on a Pixel 8. 100% on-device visual assistant with persistent memory and face recognition.
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;AccessLens&lt;/h1&gt;
&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;An always-on, on-device visual interpreter for blind and low-vision users — built for the DEV.to "Build with Gemma 4" challenge.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/hassaninnovate/AccessLens/LICENSE" rel="noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/5b60841bea9e11d9d0b0950d690c9bc554e06385634056a7d5d62a15d1a4eabe/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f4c6963656e73652d4170616368655f322e302d626c75652e737667" alt="License: Apache 2.0"&gt;&lt;/a&gt;
&lt;a href="https://github.com/hassaninnovate/AccessLens#" rel="noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/89ea7be8eadb64987fcd6f39c332b8850d3f1b411750f72dcfc6b907ec73aeac/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f706c6174666f726d2d416e64726f696425323031332532422d677265656e2e737667" alt="Platform"&gt;&lt;/a&gt;
&lt;a href="https://github.com/hassaninnovate/AccessLens#" rel="noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/aad418e159887ad6a1a78f63f4092bfe5749f2d8ebcefa4e3187d02db161396f/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f6d6f64656c2d47656d6d615f345f4532422d6f72616e67652e737667" alt="Model"&gt;&lt;/a&gt;
&lt;a href="https://github.com/hassaninnovate/AccessLens#privacy-invariants" rel="noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/0a8d8b5a8c05c0d263eae2efe64ce37bb841d10d59e9ec1d73cf64b8dce0ac4a/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f696e666572656e63652d3130302532355f6f6e2d2d6465766963652d627269676874677265656e2e737667" alt="Privacy"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Pitch&lt;/h2&gt;
&lt;/div&gt;
&lt;p&gt;A phone worn on a lanyard becomes the user's "eyes." The rear camera is always on; the gyroscope watches for motion. &lt;strong&gt;When the user stops walking, AccessLens describes what's in front of them.&lt;/strong&gt; When a friend whose face has been enrolled walks into frame, the phone says their name. When the user wants to read what's in front of them, they press &lt;em&gt;Volume Up&lt;/em&gt;; for a richer description of the room, &lt;em&gt;Volume Down&lt;/em&gt;. Bluetooth bone-conduction headphones carry the audio — the user's ears stay free for the world.&lt;/p&gt;
&lt;p&gt;What separates AccessLens from existing apps like Be My Eyes, Seeing AI, and Envision is &lt;strong&gt;persistent on-device memory + 100% on-device inference&lt;/strong&gt;. Existing tools are stateless and cloud-bound. AccessLens runs Gemma 4 E2B locally via LiteRT-LM, encrypts…&lt;/p&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/hassaninnovate/AccessLens" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;The project is licensed under Apache 2.0. The repo includes the full Kotlin/Compose source, the encryption self-test, the nightly compression WorkManager job, and a README documenting which file enforces each of the six privacy invariants.&lt;/p&gt;

&lt;p&gt;Reference implementation that taught me the LiteRT-LM API: &lt;a href="https://github.com/google-ai-edge/gallery" rel="noopener noreferrer"&gt;google-ai-edge/gallery&lt;/a&gt; — adapted patterns are cited inline in &lt;code&gt;inference/LiteRtLmRuntime.kt&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;How I Used Gemma 4&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Model Chosen:&lt;/strong&gt; &lt;strong&gt;Gemma 4 E2B&lt;/strong&gt; (&lt;code&gt;litert-community/gemma-4-E2B-it-litert-lm&lt;/code&gt;, ~2.59 GB int4), loaded once at service start via LiteRT-LM 0.12.0 with &lt;code&gt;Backend.GPU()&lt;/code&gt; for the vision adapter. &lt;/p&gt;

&lt;p&gt;E2B was the right fit for AccessLens for three reasons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Multimodal in one model, on-device:&lt;/strong&gt; Image input goes in as &lt;code&gt;Content.ImageBytes&lt;/code&gt; and text as &lt;code&gt;Content.Text&lt;/code&gt;, in that order (per the Gallery's "for accurate last token" comment), both through one &lt;code&gt;Engine.generate&lt;/code&gt; call. There is no separate vision encoder and decoder to stitch together and no second model to keep resident. That fits the latency budget &lt;em&gt;and&lt;/em&gt; the memory budget of a Pixel-class phone with 8 GB of RAM.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;E2B is the smallest competent multimodal Gemma 4:&lt;/strong&gt; It fits in RAM alongside MediaPipe FaceLandmarker, a CameraX pipeline, and the Compose UI without OOM-ing on a Pixel 8. I prototyped against E4B (the brief's "quality path") and measured the latency increase on one-sentence scene descriptions. It was not worth doubling the prefill cost for a use case where the user is waiting in real time, lanyard-mounted, with no screen feedback. The architecture is &lt;em&gt;parameterized&lt;/em&gt; on the model path (&lt;code&gt;InferenceRuntime.load(modelPath, Modality)&lt;/code&gt;), so a future LONG-press branch could swap to E4B in one line. I documented the tradeoff in the README and shipped E2B for all three gestures.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemma is the only practical way to do nightly memory compression on-device:&lt;/strong&gt; The 03:00 &lt;code&gt;CompressionWorker&lt;/code&gt; calls Gemma in JSON mode to compress the day's &lt;code&gt;SessionEvent&lt;/code&gt; rows into a single &lt;code&gt;DailySummary&lt;/code&gt;, and on Sundays into a &lt;code&gt;WeeklyMemory&lt;/code&gt;. That's a real LLM task — extracting persistent facts, deduplicating recurring observations, distinguishing "the blue mug is mine" from "I saw a blue mug today" — and it has to happen without a network. E2B handles it in under a minute per day on Tensor G3 while the phone is on the charger.&lt;/li&gt;
&lt;/ol&gt;
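
&lt;p&gt;The scheduling side of the nightly job can be sketched as follows, in Python for brevity. This is an illustration under stated assumptions, not the app's &lt;code&gt;CompressionWorker&lt;/code&gt;: the function names are hypothetical, and it assumes the weekly rollup happens in the run that falls on a Sunday.&lt;/p&gt;

```python
from datetime import datetime, timedelta


def delay_until_next_run(now, hour=3):
    """Time to wait from `now` until the next occurrence of `hour`:00."""
    target = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if now >= target:
        target += timedelta(days=1)  # today's slot already passed
    return target - now


def rollups_for(run_time):
    """Which summaries a run produces: daily always, weekly on Sundays."""
    kinds = ["DailySummary"]
    if run_time.weekday() == 6:  # Monday is 0, Sunday is 6
        kinds.append("WeeklyMemory")
    return kinds
```

&lt;p&gt;On Android, a delay computed this way would typically be handed to WorkManager as the initial delay of a periodic request, with a charging constraint so the job only runs on the charger.&lt;/p&gt;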

&lt;h3&gt;Production fixes discovered during implementation:&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  The LiteRT-LM Android artifact must be &lt;strong&gt;0.12.0 or later&lt;/strong&gt; — 0.11.0 fails vision init inside &lt;code&gt;vision_litert_compiled_model_executor.cc:273&lt;/code&gt; on Tensor G3.&lt;/li&gt;
&lt;li&gt;  AndroidManifest needs &lt;code&gt;&amp;lt;uses-native-library&amp;gt;&lt;/code&gt; declarations for &lt;code&gt;libOpenCL.so&lt;/code&gt;, &lt;code&gt;libOpenCL-car.so&lt;/code&gt;, &lt;code&gt;libOpenCL-pixel.so&lt;/code&gt; (all &lt;code&gt;android:required="false"&lt;/code&gt;). Without them, Android 12+ silently denies GPU OpenCL access and the vision backend fails to initialize. Documented at &lt;a href="https://ai.google.dev/edge/litert-lm/android" rel="noopener noreferrer"&gt;ai.google.dev/edge/litert-lm/android&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The thing I'm proudest of:&lt;/strong&gt; when you uninstall AccessLens, the KeyStore wrapping key is destroyed with it. The encrypted DB on disk becomes cryptographically unrecoverable. The user can throw the phone away and their memories — kitchen layout, friends' faces, places they've been — go with it. That's what on-device privacy is supposed to mean, and Gemma 4 + LiteRT-LM made it possible without compromising the assistant on quality.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>gemmachallenge</category>
      <category>gemma</category>
    </item>
    <item>
      <title>AccessLens — a blind person's lanyard, powered by Gemma 4 on-device</title>
      <dc:creator>Hassan Shah</dc:creator>
      <pubDate>Sat, 23 May 2026 17:27:35 +0000</pubDate>
      <link>https://dev.to/hassan_shah_733ea1eb37c88/accesslens-a-blind-persons-lanyard-powered-by-gemma-4-on-device-3l8b</link>
      <guid>https://dev.to/hassan_shah_733ea1eb37c88/accesslens-a-blind-persons-lanyard-powered-by-gemma-4-on-device-3l8b</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-gemma-2026-05-06"&gt;Gemma 4 Challenge: Build with Gemma 4&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;What I Built&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;AccessLens&lt;/strong&gt; is an Android app that turns a Pixel 8 worn on a lanyard into a persistent visual interpreter for blind and low-vision users. Rear camera forward, bone-conduction headphones in, the phone describes the world — and &lt;em&gt;remembers&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The problem with existing visual-assist apps (Be My Eyes, Seeing AI, Envision) is that they are screen-bound, stateless, and cloud-bound. A blind person navigates by sound; an app that needs you to hold up a phone, tap a screen, and wait on a datacenter interrupts that signal stream. AccessLens is different on three axes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Worn, not held.&lt;/strong&gt; Two physical buttons drive everything. Volume Up → read text in front of me, verbatim. Volume Down → describe this room with memory from earlier today and recent days. A gyroscope-based &lt;code&gt;SettleTrigger&lt;/code&gt; also fires a description &lt;em&gt;automatically&lt;/em&gt; when the user stops walking.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Persistent memory across days/weeks.&lt;/strong&gt; Every gesture writes a &lt;code&gt;SessionEvent&lt;/code&gt; to a SQLCipher database. A nightly Gemma 4 worker compresses each day into a &lt;code&gt;DailySummary&lt;/code&gt;; Sundays roll into a &lt;code&gt;WeeklyMemory&lt;/code&gt;. LONG-press prompts splice that history into the Gemma call, so the model has a world model of &lt;em&gt;this specific apartment, this specific day&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;100% on-device.&lt;/strong&gt; No image, audio, embedding, or location leaves the phone. SQLCipher + Android KeyStore (AES-256-GCM wrapping a &lt;code&gt;SecureRandom&lt;/code&gt; DB key) protect everything at rest. A &lt;code&gt;SelfTest&lt;/code&gt; on first launch opens a probe DB with the wrong key and asserts the read fails before the app reports encryption healthy.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Face recognition uses MediaPipe FaceLandmarker to produce a 192-dim L2-normalized landmark vector per enrolled person. At identify time, a cosine-similarity search finds matching enrollments and injects &lt;strong&gt;only their names&lt;/strong&gt; into the Gemma prompt. Gemma never sees a face crop or an embedding (verified by code review).&lt;/p&gt;

&lt;p&gt;Three gestures, three target latencies (Pixel 8, Tensor G3): SINGLE ≤14 s end-to-end, DOUBLE scales with text length, LONG adds memory retrieval. Voice fillers ("I'm looking…", "Still looking…") cover the prefill gap so the user hears acoustic progress, not dead air. Everything runs with airplane mode on after the model is pushed once.&lt;/p&gt;

&lt;h2&gt;Demo&lt;/h2&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/4ZlaVqXlAc4"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;Code&lt;/h2&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/hassaninnovate" rel="noopener noreferrer"&gt;
        hassaninnovate
      &lt;/a&gt; / &lt;a href="https://github.com/hassaninnovate/AccessLens" rel="noopener noreferrer"&gt;
        AccessLens
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      A blind person's lanyard, powered by Gemma 4 E2B on a Pixel 8. 100% on-device visual assistant with persistent memory and face recognition.
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;AccessLens&lt;/h1&gt;
&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;An always-on, on-device visual interpreter for blind and low-vision users — built for the DEV.to "Build with Gemma 4" challenge.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/hassaninnovate/AccessLens/LICENSE" rel="noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/5b60841bea9e11d9d0b0950d690c9bc554e06385634056a7d5d62a15d1a4eabe/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f4c6963656e73652d4170616368655f322e302d626c75652e737667" alt="License: Apache 2.0"&gt;&lt;/a&gt;
&lt;a href="https://github.com/hassaninnovate/AccessLens#" rel="noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/89ea7be8eadb64987fcd6f39c332b8850d3f1b411750f72dcfc6b907ec73aeac/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f706c6174666f726d2d416e64726f696425323031332532422d677265656e2e737667" alt="Platform"&gt;&lt;/a&gt;
&lt;a href="https://github.com/hassaninnovate/AccessLens#" rel="noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/aad418e159887ad6a1a78f63f4092bfe5749f2d8ebcefa4e3187d02db161396f/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f6d6f64656c2d47656d6d615f345f4532422d6f72616e67652e737667" alt="Model"&gt;&lt;/a&gt;
&lt;a href="https://github.com/hassaninnovate/AccessLens#privacy-invariants" rel="noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/0a8d8b5a8c05c0d263eae2efe64ce37bb841d10d59e9ec1d73cf64b8dce0ac4a/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f696e666572656e63652d3130302532355f6f6e2d2d6465766963652d627269676874677265656e2e737667" alt="Privacy"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Pitch&lt;/h2&gt;
&lt;/div&gt;
&lt;p&gt;A phone worn on a lanyard becomes the user's "eyes." The rear camera is always on; the gyroscope watches for motion. &lt;strong&gt;When the user stops walking, AccessLens describes what's in front of them.&lt;/strong&gt; When a friend whose face has been enrolled walks into frame, the phone says their name. When the user wants to read what's in front of them, they press &lt;em&gt;Volume Up&lt;/em&gt;; for a richer description of the room, &lt;em&gt;Volume Down&lt;/em&gt;. Bluetooth bone-conduction headphones carry the audio — the user's ears stay free for the world.&lt;/p&gt;
&lt;p&gt;What separates AccessLens from existing apps like Be My Eyes, Seeing AI, and Envision is &lt;strong&gt;persistent on-device memory + 100% on-device inference&lt;/strong&gt;. Existing tools are stateless and cloud-bound. AccessLens runs Gemma 4 E2B locally via LiteRT-LM, encrypts…&lt;/p&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/hassaninnovate/AccessLens" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;The project is licensed under Apache 2.0. The repo includes the full Kotlin/Compose source, the encryption self-test, the nightly compression WorkManager job, and a README documenting which file enforces each of the six privacy invariants.&lt;/p&gt;

&lt;p&gt;Reference implementation that taught me the LiteRT-LM API: &lt;a href="https://github.com/google-ai-edge/gallery" rel="noopener noreferrer"&gt;google-ai-edge/gallery&lt;/a&gt; — adapted patterns are cited inline in &lt;code&gt;inference/LiteRtLmRuntime.kt&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;How I Used Gemma 4&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Model: Gemma 4 E2B&lt;/strong&gt; (&lt;code&gt;litert-community/gemma-4-E2B-it-litert-lm&lt;/code&gt;, ~2.59 GB int4), loaded once at service start via LiteRT-LM 0.12.0 with &lt;code&gt;Backend.GPU()&lt;/code&gt; for the vision adapter. Three reasons E2B was the right fit:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Multimodal in one model, on-device.&lt;/strong&gt; Image input goes in as &lt;code&gt;Content.ImageBytes&lt;/code&gt; and text as &lt;code&gt;Content.Text&lt;/code&gt;, in that order (per the Gallery's "for accurate last token" comment), both through one &lt;code&gt;Engine.generate&lt;/code&gt; call. There is no separate vision encoder and decoder to stitch together and no second model to keep resident. That fits the latency budget &lt;em&gt;and&lt;/em&gt; the memory budget of a Pixel-class phone with 8 GB of RAM.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;E2B is the smallest competent multimodal Gemma 4.&lt;/strong&gt; It fits in RAM alongside MediaPipe FaceLandmarker, a CameraX pipeline, and the Compose UI without OOM-ing on a Pixel 8. I prototyped against E4B (the brief's "quality path") and measured the latency increase on one-sentence scene descriptions. It was not worth doubling the prefill cost for a use case where the user is waiting in real time, lanyard-mounted, with no screen feedback. The architecture is &lt;em&gt;parameterized&lt;/em&gt; on the model path (&lt;code&gt;InferenceRuntime.load(modelPath, Modality)&lt;/code&gt;), so a future LONG-press branch could swap to E4B in one line. I documented the tradeoff in the README and shipped E2B for all three gestures.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Gemma is the only practical way to do nightly memory compression on-device.&lt;/strong&gt; The 03:00 &lt;code&gt;CompressionWorker&lt;/code&gt; calls Gemma in JSON mode to compress the day's &lt;code&gt;SessionEvent&lt;/code&gt; rows into a single &lt;code&gt;DailySummary&lt;/code&gt;, and on Sundays into a &lt;code&gt;WeeklyMemory&lt;/code&gt;. That's a real LLM task — extracting persistent facts, deduplicating recurring observations, distinguishing "the blue mug is mine" from "I saw a blue mug today" — and it has to happen without a network. E2B handles it in under a minute per day on Tensor G3 while the phone is on the charger.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Two production fixes the brief didn't cover, in case they help someone else:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The LiteRT-LM Android artifact must be &lt;strong&gt;0.12.0 or later&lt;/strong&gt; — 0.11.0 fails vision init inside &lt;code&gt;vision_litert_compiled_model_executor.cc:273&lt;/code&gt; on Tensor G3.&lt;/li&gt;
&lt;li&gt;AndroidManifest needs &lt;code&gt;&amp;lt;uses-native-library&amp;gt;&lt;/code&gt; declarations for &lt;code&gt;libOpenCL.so&lt;/code&gt;, &lt;code&gt;libOpenCL-car.so&lt;/code&gt;, &lt;code&gt;libOpenCL-pixel.so&lt;/code&gt; (all &lt;code&gt;android:required="false"&lt;/code&gt;). Without them, Android 12+ silently denies GPU OpenCL access and the vision backend fails to initialize. Documented at &lt;a href="https://ai.google.dev/edge/litert-lm/android" rel="noopener noreferrer"&gt;ai.google.dev/edge/litert-lm/android&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The thing I'm proudest of:&lt;/strong&gt; when you uninstall AccessLens, the KeyStore wrapping key is destroyed with it. The encrypted DB on disk becomes cryptographically unrecoverable. The user can throw the phone away and their memories — kitchen layout, friends' faces, places they've been — go with it. That's what on-device privacy is supposed to mean, and Gemma 4 + LiteRT-LM made it possible without compromising the assistant on quality.&lt;/p&gt;

</description>
      <category>gemmachallenge</category>
      <category>gemma</category>
      <category>android</category>
      <category>kotlin</category>
    </item>
  </channel>
</rss>
