<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Dimitar Hadzhiradev</title>
    <description>The latest articles on DEV Community by Dimitar Hadzhiradev (@dih78).</description>
    <link>https://dev.to/dih78</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3923222%2F7465c026-3fd8-409f-8cc8-39902a094ae6.png</url>
      <title>DEV Community: Dimitar Hadzhiradev</title>
      <link>https://dev.to/dih78</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/dih78"/>
    <language>en</language>
    <item>
      <title>Bringing Gemma 4 E2B to the Edge: Building a Privacy-First Dream Analyzer with Flutter &amp; LiteRT</title>
      <dc:creator>Dimitar Hadzhiradev</dc:creator>
      <pubDate>Sat, 23 May 2026 17:42:19 +0000</pubDate>
      <link>https://dev.to/dih78/bringing-gemma-4-e2b-to-the-edge-building-a-privacy-first-dream-analyzer-with-flutter-litert-50e9</link>
      <guid>https://dev.to/dih78/bringing-gemma-4-e2b-to-the-edge-building-a-privacy-first-dream-analyzer-with-flutter-litert-50e9</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-gemma-2026-05-06"&gt;Gemma 4 Challenge: Write About Gemma 4&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Every AI app says it respects your privacy.&lt;/p&gt;

&lt;p&gt;Then it uploads your most personal data to the cloud.&lt;/p&gt;

&lt;p&gt;When we started building &lt;strong&gt;Remora&lt;/strong&gt; — a dream journaling and psychological interpretation app — we faced a difficult question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;How do you analyze deeply personal subconscious experiences without sending them to a remote server?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We wanted users to wake up, record a dream, and receive rich AI-powered analysis directly on their phone.&lt;/p&gt;

&lt;p&gt;No cloud inference.&lt;br&gt;
No persistent uploads.&lt;br&gt;
No centralized storage of emotional or psychological data.&lt;/p&gt;

&lt;p&gt;That requirement immediately ruled out most modern AI architectures.&lt;/p&gt;

&lt;p&gt;Then we discovered Gemma 4.&lt;/p&gt;

&lt;p&gt;Its compact E2B footprint, multimodal support, and mobile-first optimization made it uniquely suited for true on-device inference.&lt;/p&gt;

&lt;p&gt;But integrating cutting-edge local AI into a production Flutter app turned out to be far more challenging than expected.&lt;/p&gt;

&lt;p&gt;This is the engineering story behind making it work.&lt;/p&gt;


&lt;h1&gt;
  
  
  Why Gemma 4 Changed the Architecture
&lt;/h1&gt;

&lt;p&gt;Most mobile AI today still relies on a thin-client model:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Capture user data&lt;/li&gt;
&lt;li&gt;Upload to cloud APIs&lt;/li&gt;
&lt;li&gt;Run inference remotely&lt;/li&gt;
&lt;li&gt;Return results&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That approach breaks down completely for sensitive psychological analysis.&lt;/p&gt;

&lt;p&gt;Dream journals often contain:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;trauma,&lt;/li&gt;
&lt;li&gt;fears,&lt;/li&gt;
&lt;li&gt;relationships,&lt;/li&gt;
&lt;li&gt;emotional states,&lt;/li&gt;
&lt;li&gt;deeply personal memories.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We needed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;offline capability,&lt;/li&gt;
&lt;li&gt;low latency,&lt;/li&gt;
&lt;li&gt;multimodal understanding,&lt;/li&gt;
&lt;li&gt;and strict data locality.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Gemma 4 E2B gave us a realistic path toward all four.&lt;/p&gt;

&lt;p&gt;Running directly on-device also unlocked:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;instant responses,&lt;/li&gt;
&lt;li&gt;airplane-mode support,&lt;/li&gt;
&lt;li&gt;reduced infrastructure cost,&lt;/li&gt;
&lt;li&gt;and dramatically improved user trust.&lt;/li&gt;
&lt;/ul&gt;


&lt;h1&gt;
  
  
  Challenge 1: The Model Format Wars (GGUF vs LiteRT)
&lt;/h1&gt;

&lt;p&gt;Our first instinct was straightforward:&lt;/p&gt;

&lt;p&gt;Download a &lt;code&gt;.gguf&lt;/code&gt; quantization from Hugging Face and wire it into Flutter.&lt;/p&gt;

&lt;p&gt;That assumption lasted about five minutes.&lt;/p&gt;

&lt;p&gt;The moment the engine initialized on Android, the app crashed with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;IllegalArgumentException:
Unsupported model format: .gguf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What We Learned
&lt;/h2&gt;

&lt;p&gt;The open-source ecosystem heavily favors &lt;code&gt;.gguf&lt;/code&gt; because of tools like &lt;code&gt;llama.cpp&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;But Android hardware acceleration operates in a very different ecosystem.&lt;/p&gt;

&lt;p&gt;Google’s mobile AI stack relies on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;MediaPipe,&lt;/li&gt;
&lt;li&gt;LiteRT,&lt;/li&gt;
&lt;li&gt;LiteRT-LM delegates,&lt;/li&gt;
&lt;li&gt;and NPU-optimized tensor layouts.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That means models must be packaged as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;.task&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;.bin&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;or &lt;code&gt;.litertlm&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;—not GGUF.&lt;/p&gt;
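
&lt;p&gt;The packaging constraint above can be checked before a file ever reaches the engine. The following is a minimal sketch, not part of any plugin API: &lt;code&gt;isSupportedModelFile&lt;/code&gt; and &lt;code&gt;supportedExtensions&lt;/code&gt; are hypothetical names, and the extension list simply mirrors the three formats named above.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight dart"&gt;&lt;code&gt;// Assumption: the set mirrors the formats listed in this article.
const supportedExtensions = {'.task', '.bin', '.litertlm'};

// Hypothetical helper: true when the file extension is one the
// LiteRT-based stack is expected to load.
bool isSupportedModelFile(String path) {
  final dot = path.lastIndexOf('.');
  if (dot == -1) return false;
  return supportedExtensions.contains(path.substring(dot).toLowerCase());
}

void main() {
  print(isSupportedModelFile('gemma-4-E2B-it.litertlm')); // true
  print(isSupportedModelFile('gemma-4-E2B-it.Q4_K_M.gguf')); // false
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A guard like this turns a native crash into an error the UI can explain before a multi-gigabyte download starts.&lt;/p&gt;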

&lt;p&gt;Once we switched to the official LiteRT package, memory usage dropped significantly and inference stabilized immediately.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight dart"&gt;&lt;code&gt;&lt;span class="n"&gt;FlutterGemma&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;installModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nl"&gt;modelType:&lt;/span&gt; &lt;span class="n"&gt;ModelType&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;gemma4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nl"&gt;fileType:&lt;/span&gt; &lt;span class="n"&gt;ModelFileType&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;litertlm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;fromNetwork&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="s"&gt;'https://huggingface.co/litert-community/gemma-4-E2B-it-litert-lm/resolve/main/gemma-4-E2B-it.litertlm'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;withProgress&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;progress&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="n"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'Downloading 1.5GB Edge Model: &lt;/span&gt;&lt;span class="si"&gt;${progress}&lt;/span&gt;&lt;span class="s"&gt;%'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This was our first major realization:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Edge AI is not just “smaller cloud AI.”&lt;br&gt;
It is an entirely different deployment architecture.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h1&gt;
  
  
  Challenge 2: The “Code 13” Audio Crash
&lt;/h1&gt;

&lt;p&gt;One of the most exciting features of Gemma 4 is native multimodal capability.&lt;/p&gt;

&lt;p&gt;Our goal was simple:&lt;/p&gt;

&lt;p&gt;Users should be able to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;wake up,&lt;/li&gt;
&lt;li&gt;tap record,&lt;/li&gt;
&lt;li&gt;describe their dream verbally,&lt;/li&gt;
&lt;li&gt;and receive private on-device analysis.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We recorded audio and passed it into the local model.&lt;/p&gt;

&lt;p&gt;Immediate crash.&lt;/p&gt;

&lt;p&gt;We switched encoders:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;.m4a&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;PCM16 WAV&lt;/li&gt;
&lt;li&gt;16kHz mono&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Crash again.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Failed to start streaming (code: 13)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
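
&lt;p&gt;For reference, the PCM16, 16 kHz mono WAV layout we tried corresponds to the standard 44-byte RIFF header. Below is a minimal sketch of that header in Dart, assuming the raw PCM samples are already in memory. &lt;code&gt;buildWavHeader&lt;/code&gt; is a hypothetical helper, not a plugin API, and as described above, a correctly formatted file alone did not resolve the crash.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight dart"&gt;&lt;code&gt;import 'dart:typed_data';

// Hypothetical helper: builds a 44-byte RIFF/WAVE header for raw PCM data.
// Defaults match the format tried in this article: 16 kHz, mono, 16-bit.
Uint8List buildWavHeader(
  int pcmByteLength, {
  int sampleRate = 16000,
  int channels = 1,
  int bitsPerSample = 16,
}) {
  final blockAlign = channels * bitsPerSample ~/ 8;
  final byteRate = sampleRate * blockAlign;
  final header = ByteData(44);

  // Writes a four-character ASCII tag at the given offset.
  void writeTag(int offset, String tag) {
    for (var i = 0; i &amp;lt; tag.length; i++) {
      header.setUint8(offset + i, tag.codeUnitAt(i));
    }
  }

  writeTag(0, 'RIFF');
  header.setUint32(4, 36 + pcmByteLength, Endian.little); // RIFF chunk size
  writeTag(8, 'WAVE');
  writeTag(12, 'fmt ');
  header.setUint32(16, 16, Endian.little); // fmt chunk size for PCM
  header.setUint16(20, 1, Endian.little); // audio format 1 = linear PCM
  header.setUint16(22, channels, Endian.little);
  header.setUint32(24, sampleRate, Endian.little);
  header.setUint32(28, byteRate, Endian.little);
  header.setUint16(32, blockAlign, Endian.little);
  header.setUint16(34, bitsPerSample, Endian.little);
  writeTag(36, 'data');
  header.setUint32(40, pcmByteLength, Endian.little); // data chunk size
  return header.buffer.asUint8List();
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;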



&lt;h2&gt;
  
  
  The Root Cause
&lt;/h2&gt;

&lt;p&gt;After digging through Google’s AI Edge Gallery implementation, we discovered:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Current community LiteRT weights do not yet expose fully fused audio subgraphs&lt;/li&gt;
&lt;li&gt;Qualcomm QNN delegates require certain audio operators to run on CPU&lt;/li&gt;
&lt;li&gt;Current Flutter bindings don’t yet support backend splitting between CPU and NPU execution&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In practice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;text generation worked perfectly,&lt;/li&gt;
&lt;li&gt;audio tensor routing did not.&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  The Solution: A Secure Hybrid Pipeline
&lt;/h1&gt;

&lt;p&gt;Instead of abandoning voice support, we built a privacy-preserving fallback architecture.&lt;/p&gt;

&lt;p&gt;If local audio inference fails:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;audio is sent to a transient speech-to-text endpoint,&lt;/li&gt;
&lt;li&gt;no audio is persisted,&lt;/li&gt;
&lt;li&gt;only transcription text is returned,&lt;/li&gt;
&lt;li&gt;all psychological interpretation still happens locally via Gemma 4.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That preserved the most sensitive part of the workflow entirely on-device.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight dart"&gt;&lt;code&gt;&lt;span class="n"&gt;Future&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;DreamAnalysisResult&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;analyzeAudio&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;String&lt;/span&gt; &lt;span class="n"&gt;filePath&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kd"&gt;async&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_localEngine&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;isReady&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;_localEngine&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;analyzeAudio&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filePath&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="n"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'Local audio inference failed: &lt;/span&gt;&lt;span class="si"&gt;$e&lt;/span&gt;&lt;span class="s"&gt;. Engaging secure fallback.'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="n"&gt;bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;File&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filePath&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;readAsBytes&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;dio&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;'/dreams/transcribe'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nl"&gt;data:&lt;/span&gt; &lt;span class="n"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="kt"&gt;String&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'transcription'&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;

  &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;_localEngine&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;analyzeDream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;DreamAnalysisResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nl"&gt;title:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nl"&gt;interpretation:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;interpretation&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nl"&gt;tags:&lt;/span&gt; &lt;span class="p"&gt;[..&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'#voice_log'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="nl"&gt;transcribedText:&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This ended up becoming one of the most important architectural decisions in the app.&lt;/p&gt;

&lt;p&gt;Not because it was perfect —&lt;br&gt;
but because it degraded gracefully while preserving privacy guarantees.&lt;/p&gt;


&lt;h1&gt;
  
  
  Challenge 3: Emulators Lie
&lt;/h1&gt;

&lt;p&gt;During development we tested inference using the Android emulator.&lt;/p&gt;

&lt;p&gt;Everything failed instantly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Connection closed before full header was received
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At first we suspected:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;networking,&lt;/li&gt;
&lt;li&gt;Flutter isolates,&lt;/li&gt;
&lt;li&gt;or broken FFI bindings.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of those were the problem.&lt;/p&gt;

&lt;p&gt;The real issue was architecture mismatch.&lt;/p&gt;

&lt;p&gt;LiteRT-LM delegates are optimized specifically for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;arm64-v8a&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;mobile NPUs&lt;/li&gt;
&lt;li&gt;physical AI acceleration hardware&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The x86 emulator environment simply could not execute the delegate stack correctly.&lt;/p&gt;
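
&lt;p&gt;One way to fail fast on an unsupported target is to check the ABI at startup. The sketch below uses &lt;code&gt;Abi.current()&lt;/code&gt; from &lt;code&gt;dart:ffi&lt;/code&gt;. The function name is hypothetical, and treating &lt;code&gt;arm64-v8a&lt;/code&gt; as the only accelerated target is an assumption based on our experience above, not a documented rule.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight dart"&gt;&lt;code&gt;import 'dart:ffi' show Abi;

// Hypothetical helper: true only on 64-bit ARM Android builds, which is
// where the delegate stack worked for us. x86 emulator images report
// Abi.androidX64 or Abi.androidIA32 and return false here.
bool isLikelyAcceleratedTarget() {
  return Abi.current() == Abi.androidArm64;
}

void main() {
  if (!isLikelyAcceleratedTarget()) {
    print('Unsupported ABI for on-device inference: ${Abi.current()}');
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Surfacing this as an explicit message is easier to debug than a misleading transport error.&lt;/p&gt;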

&lt;p&gt;Once we moved testing onto a physical Pixel device:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;binaries mapped correctly,&lt;/li&gt;
&lt;li&gt;NPU acceleration activated,&lt;/li&gt;
&lt;li&gt;inference latency dropped dramatically.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That moment changed how we approached mobile AI QA entirely.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Edge AI development without real hardware is basically guesswork.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h1&gt;
  
  
  Looking Forward: Android AI Core &amp;amp; Gemini Nano
&lt;/h1&gt;

&lt;p&gt;Downloading a 1.5GB local model works —&lt;br&gt;
but it is not the ideal long-term UX.&lt;/p&gt;

&lt;p&gt;Large bundled models create:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;storage pressure,&lt;/li&gt;
&lt;li&gt;installation friction,&lt;/li&gt;
&lt;li&gt;and slower onboarding.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To future-proof the architecture, we integrated Android AI Core support.&lt;/p&gt;

&lt;p&gt;Before downloading Gemma 4 locally, Remora now checks whether Gemini Nano, or another system-level model, is already available through Android’s native AI layer.&lt;/p&gt;

&lt;p&gt;If available:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;inference can start right away,&lt;/li&gt;
&lt;li&gt;no model download is required,&lt;/li&gt;
&lt;li&gt;and privacy remains intact.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This creates a hybrid architecture where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OS-native models are preferred,&lt;/li&gt;
&lt;li&gt;Gemma 4 acts as the portable fallback,&lt;/li&gt;
&lt;li&gt;and all inference still remains local-first.&lt;/li&gt;
&lt;/ul&gt;
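
&lt;p&gt;The preference order above can be sketched as a simple first-available selection. Everything here is illustrative: &lt;code&gt;InferenceBackend&lt;/code&gt;, &lt;code&gt;selectBackend&lt;/code&gt;, and &lt;code&gt;FakeBackend&lt;/code&gt; are hypothetical names, and a real availability check would call into the platform layer instead of returning a fixed value.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight dart"&gt;&lt;code&gt;// Hypothetical abstraction over an on-device inference backend.
abstract class InferenceBackend {
  String get name;
  Future&amp;lt;bool&amp;gt; isAvailable();
}

// Returns the first available backend in order of preference,
// or null when none can run on this device.
Future&amp;lt;InferenceBackend?&amp;gt; selectBackend(
  List&amp;lt;InferenceBackend&amp;gt; candidates,
) async {
  for (final backend in candidates) {
    if (await backend.isAvailable()) return backend;
  }
  return null;
}

// Stand-in used only to demonstrate the selection logic.
class FakeBackend implements InferenceBackend {
  FakeBackend(this.name, this._available);

  @override
  final String name;
  final bool _available;

  @override
  Future&amp;lt;bool&amp;gt; isAvailable() async =&amp;gt; _available;
}

Future&amp;lt;void&amp;gt; main() async {
  final chosen = await selectBackend([
    FakeBackend('system-model', false), // preferred, but missing here
    FakeBackend('gemma-4-e2b', true), // portable fallback
  ]);
  print(chosen?.name); // gemma-4-e2b
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Keeping the order in a plain list makes it easy to add or reorder backends without touching the calling code.&lt;/p&gt;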




&lt;h1&gt;
  
  
  What Building with Gemma 4 Taught Us
&lt;/h1&gt;

&lt;p&gt;Working with Gemma 4 fundamentally changed how we think about mobile apps.&lt;/p&gt;

&lt;p&gt;For years, mobile AI has largely meant:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Call an API and wait.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But local multimodal models enable something very different:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;applications that function offline,&lt;/li&gt;
&lt;li&gt;preserve privacy by default,&lt;/li&gt;
&lt;li&gt;reduce infrastructure cost,&lt;/li&gt;
&lt;li&gt;and feel dramatically more responsive.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The tooling ecosystem is still early.&lt;br&gt;
The documentation is fragmented.&lt;br&gt;
The hardware constraints are real.&lt;/p&gt;

&lt;p&gt;But the direction is obvious.&lt;/p&gt;

&lt;p&gt;Edge AI is becoming a first-class application platform.&lt;/p&gt;

&lt;p&gt;And Gemma 4 is one of the first models that genuinely makes that future practical for mobile developers.&lt;/p&gt;




&lt;h1&gt;
  
  
  Final Thoughts
&lt;/h1&gt;

&lt;p&gt;Remora started as an experiment:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Could we build a psychologically meaningful AI experience without compromising user privacy?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Thanks to Gemma 4, LiteRT, and Android’s emerging edge AI ecosystem, the answer is increasingly yes.&lt;/p&gt;

&lt;p&gt;We still have challenges ahead:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;audio graph support,&lt;/li&gt;
&lt;li&gt;smaller quantizations,&lt;/li&gt;
&lt;li&gt;memory optimization,&lt;/li&gt;
&lt;li&gt;and broader device compatibility.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But for the first time, building truly private multimodal AI apps on smartphones feels achievable.&lt;/p&gt;

&lt;p&gt;And that changes everything. What challenges you the most in your Edge AI journey?&lt;/p&gt;




&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/devteam/join-the-gemma-4-challenge-3000-prize-pool-for-ten-winners-23in"&gt;Gemma 4 Challenge Announcement&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://ai.google.dev/gemma" rel="noopener noreferrer"&gt;Gemma by Google&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://ai.google.dev/edge/litert" rel="noopener noreferrer"&gt;LiteRT Documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>devchallenge</category>
      <category>gemmachallenge</category>
      <category>gemma</category>
      <category>flutter</category>
    </item>
    <item>
      <title>The Subconscious Powered by Edge AI</title>
      <dc:creator>Dimitar Hadzhiradev</dc:creator>
      <pubDate>Sat, 23 May 2026 17:23:28 +0000</pubDate>
      <link>https://dev.to/dih78/remoraai-the-subconscious-social-network-powered-by-edge-ai-oe9</link>
      <guid>https://dev.to/dih78/remoraai-the-subconscious-social-network-powered-by-edge-ai-oe9</guid>
      <description>&lt;h2&gt;
  
  
  &lt;strong&gt;RemoraAI: The Subconscious Social Network Powered by Edge AI&lt;/strong&gt;
&lt;/h2&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;Dreams are our most private thoughts.&lt;/p&gt;

&lt;p&gt;Yet most AI-powered journaling apps require users to upload deeply personal emotions, fears, and subconscious experiences directly to the cloud.&lt;/p&gt;

&lt;p&gt;Remora was built to challenge that assumption.&lt;/p&gt;

&lt;p&gt;Remora is a &lt;strong&gt;privacy-first “Subconscious Social Network”&lt;/strong&gt; powered by Gemma 4 running directly on-device using LiteRT-LM and Flutter.&lt;/p&gt;

&lt;p&gt;The app allows users to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;record dreams via voice,&lt;/li&gt;
&lt;li&gt;receive AI-powered psychological interpretation,&lt;/li&gt;
&lt;li&gt;detect recurring subconscious patterns over time,&lt;/li&gt;
&lt;li&gt;generate surreal dream visuals,&lt;/li&gt;
&lt;li&gt;and optionally publish anonymized dreams to a public community feed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key innovation is that the sensitive psychological analysis happens &lt;strong&gt;entirely on-device&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;No raw dream data needs to leave the smartphone.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Core Problem
&lt;/h2&gt;

&lt;p&gt;Dream journaling has historically remained a private, offline activity because users are understandably uncomfortable uploading vulnerable psychological content to centralized servers.&lt;/p&gt;

&lt;p&gt;We wanted to answer a difficult question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Can modern multimodal AI deliver meaningful emotional analysis while preserving user privacy?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Remora demonstrates that the answer is yes.&lt;/p&gt;




&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Core Flow
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;User records a dream using voice input&lt;/li&gt;
&lt;li&gt;Gemma 4 processes the narrative locally&lt;/li&gt;
&lt;li&gt;The app generates:
&lt;ul&gt;
&lt;li&gt;a dream title,&lt;/li&gt;
&lt;li&gt;emotional interpretation,&lt;/li&gt;
&lt;li&gt;thematic tags,&lt;/li&gt;
&lt;li&gt;and subconscious motif detection&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;User optionally generates AI dream artwork&lt;/li&gt;
&lt;li&gt;User may privately store or anonymously publish the dream&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Demo Content
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Offline “Privacy Mode”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fptf5oum7k01m4dyijqcq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fptf5oum7k01m4dyijqcq.png" alt="Offline “Privacy Mode”" width="800" height="1778"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI-generated dream art&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fglw1y74wdi899yxy2l4z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fglw1y74wdi899yxy2l4z.png" alt="AI-generated dream art" width="800" height="1778"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Community feed scrolling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk2kbhl7jk9bnwct88z0w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk2kbhl7jk9bnwct88z0w.png" alt="Community feed scrolling" width="800" height="1778"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Tech Stack
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Flutter&lt;/li&gt;
&lt;li&gt;LiteRT-LM&lt;/li&gt;
&lt;li&gt;MediaPipe&lt;/li&gt;
&lt;li&gt;Flutter FFI&lt;/li&gt;
&lt;li&gt;FastAPI&lt;/li&gt;
&lt;li&gt;Android AI Core&lt;/li&gt;
&lt;li&gt;Gemini Nano&lt;/li&gt;
&lt;li&gt;Imagen 4&lt;/li&gt;
&lt;li&gt;Vector Embeddings + RAG&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Architecture Highlights
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Local AI Layer
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Gemma 4 E2B via LiteRT-LM&lt;/li&gt;
&lt;li&gt;On-device inference&lt;/li&gt;
&lt;li&gt;NPU acceleration&lt;/li&gt;
&lt;li&gt;Offline-capable “Privacy Mode”&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Cloud Layer
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Optional dream image generation&lt;/li&gt;
&lt;li&gt;Anonymous community feed&lt;/li&gt;
&lt;li&gt;Secure transient speech-to-text fallback&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Memory Layer
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Vector embeddings for recurring dream motifs&lt;/li&gt;
&lt;li&gt;Retrieval-Augmented Generation (RAG)&lt;/li&gt;
&lt;li&gt;Long-term subconscious pattern analysis&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  How I Used Gemma 4
&lt;/h2&gt;

&lt;p&gt;We selected the &lt;strong&gt;Gemma 4 E2B&lt;/strong&gt; model because it sits at the ideal intersection of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;mobile performance,&lt;/li&gt;
&lt;li&gt;low memory footprint,&lt;/li&gt;
&lt;li&gt;multimodal capability,&lt;/li&gt;
&lt;li&gt;and meaningful reasoning quality.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Previous local models were either:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;too large for mobile deployment,&lt;/li&gt;
&lt;li&gt;too slow for real-time inference,&lt;/li&gt;
&lt;li&gt;or incapable of nuanced psychological interpretation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Gemma 4 E2B solved all three.&lt;/p&gt;

&lt;p&gt;Using LiteRT-LM, the model runs directly on-device with NPU acceleration. Where Android AI Core already provides a system model such as Gemini Nano, the app can use that instead of downloading Gemma 4.&lt;/p&gt;

&lt;p&gt;This enables:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;fully offline dream analysis,&lt;/li&gt;
&lt;li&gt;dramatically reduced latency,&lt;/li&gt;
&lt;li&gt;improved privacy,&lt;/li&gt;
&lt;li&gt;and lower infrastructure cost.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Local Inference Pipeline
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight dart"&gt;&lt;code&gt;&lt;span class="n"&gt;FlutterGemma&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;installModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nl"&gt;modelType:&lt;/span&gt; &lt;span class="n"&gt;ModelType&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;gemma4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nl"&gt;fileType:&lt;/span&gt; &lt;span class="n"&gt;ModelFileType&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;litertlm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;fromNetwork&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="s"&gt;'https://huggingface.co/litert-community/gemma-4-E2B-it-litert-lm/resolve/main/gemma-4-E2B-it.litertlm'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The Hardest Engineering Problem
&lt;/h2&gt;

&lt;p&gt;One of the biggest challenges was multimodal audio processing.&lt;/p&gt;

&lt;p&gt;Although Gemma 4 supports audio understanding conceptually, current LiteRT community weights lack fully fused audio execution graphs for mobile delegates.&lt;/p&gt;

&lt;p&gt;Attempting native audio inference produced a &lt;code&gt;Failed to start streaming (code: 13)&lt;/code&gt; error.&lt;/p&gt;

&lt;p&gt;After investigating Google’s AI Edge Gallery implementation, we discovered:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;unsupported audio tensor routing,&lt;/li&gt;
&lt;li&gt;delegate backend limitations,&lt;/li&gt;
&lt;li&gt;and missing Flutter bindings for CPU/NPU graph splitting.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of abandoning voice dreams entirely, we engineered a &lt;strong&gt;Secure Hybrid Loop&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Audio is transiently transcribed&lt;/li&gt;
&lt;li&gt;No raw data is persisted&lt;/li&gt;
&lt;li&gt;Transcription text returns immediately&lt;/li&gt;
&lt;li&gt;Gemma 4 performs all psychological interpretation locally&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This preserved the most sensitive part of the experience entirely on-device.&lt;/p&gt;




&lt;h2&gt;
  
  
  Subconscious RAG
&lt;/h2&gt;

&lt;p&gt;Remora is not just a dream diary.&lt;/p&gt;

&lt;p&gt;Over time, it becomes a semantic memory system for the user’s subconscious.&lt;/p&gt;

&lt;p&gt;Dream entities are vectorized using embeddings:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;characters,&lt;/li&gt;
&lt;li&gt;emotions,&lt;/li&gt;
&lt;li&gt;locations,&lt;/li&gt;
&lt;li&gt;recurring symbols,&lt;/li&gt;
&lt;li&gt;and narrative structures.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If a user repeatedly dreams about:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“A woman in a red coat”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;…the system detects the recurring motif and surfaces psychological pattern insights over months or years.&lt;/p&gt;

&lt;p&gt;This transforms dream logging from passive journaling into longitudinal subconscious analysis.&lt;/p&gt;
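
&lt;p&gt;Recurring-motif detection of this kind is commonly built on cosine similarity between embedding vectors. The sketch below shows that idea only and is not Remora's actual implementation: &lt;code&gt;isRecurringMotif&lt;/code&gt;, the &lt;code&gt;0.85&lt;/code&gt; threshold, and the minimum match count are all assumptions chosen for illustration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight dart"&gt;&lt;code&gt;import 'dart:math' as math;

// Cosine similarity between two equal-length embedding vectors.
// Returns 0.0 when either vector has zero magnitude.
double cosineSimilarity(List&amp;lt;double&amp;gt; a, List&amp;lt;double&amp;gt; b) {
  if (a.length != b.length) {
    throw ArgumentError('Vectors must have the same length');
  }
  var dot = 0.0;
  var normA = 0.0;
  var normB = 0.0;
  for (var i = 0; i &amp;lt; a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA == 0 || normB == 0) return 0.0;
  return dot / (math.sqrt(normA) * math.sqrt(normB));
}

// Hypothetical rule: a motif counts as recurring when at least
// minMatches past embeddings are close to the new one.
bool isRecurringMotif(
  List&amp;lt;double&amp;gt; candidate,
  List&amp;lt;List&amp;lt;double&amp;gt;&amp;gt; history, {
  double threshold = 0.85, // assumed value, tune per embedding model
  int minMatches = 2,
}) {
  final matches = history
      .where((past) =&amp;gt; cosineSimilarity(candidate, past) &amp;gt;= threshold)
      .length;
  return matches &amp;gt;= minMatches;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A linear scan like this is fine for a personal journal with a few thousand entries. A larger history would call for an approximate nearest-neighbor index.&lt;/p&gt;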




&lt;h2&gt;
  
  
  Dream Visualization
&lt;/h2&gt;

&lt;p&gt;After local interpretation is complete, users can optionally generate dream artwork using Imagen 4.&lt;/p&gt;

&lt;p&gt;The backend converts the interpreted dream into a surreal cinematic visual prompt and generates high-resolution dream imagery.&lt;/p&gt;

&lt;p&gt;This creates a hybrid architecture:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Location&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Psychological analysis&lt;/td&gt;
&lt;td&gt;On-device&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dream embeddings&lt;/td&gt;
&lt;td&gt;On-device&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sensitive interpretation&lt;/td&gt;
&lt;td&gt;On-device&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Visual generation&lt;/td&gt;
&lt;td&gt;Cloud&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Community publishing&lt;/td&gt;
&lt;td&gt;Optional&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Community Layer
&lt;/h2&gt;

&lt;p&gt;By default, every dream remains private.&lt;/p&gt;

&lt;p&gt;Users may optionally anonymize and publish dreams to the Remora community feed, creating a surreal stream of humanity’s collective subconscious.&lt;/p&gt;

&lt;p&gt;Other users can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;upvote bizarre dreams,&lt;/li&gt;
&lt;li&gt;react to recurring themes,&lt;/li&gt;
&lt;li&gt;or share dreams with therapists or friends.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This transforms deeply personal subconscious experiences into optional social storytelling.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Gemma 4 Matters
&lt;/h2&gt;

&lt;p&gt;Before Gemma 4, building an app like Remora was largely impractical.&lt;/p&gt;

&lt;p&gt;The model needed to be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;lightweight enough for smartphones,&lt;/li&gt;
&lt;li&gt;capable of emotional nuance,&lt;/li&gt;
&lt;li&gt;fast enough for real-time interaction,&lt;/li&gt;
&lt;li&gt;and deployable through modern mobile inference stacks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Gemma 4 E2B made that architecture possible.&lt;/p&gt;

&lt;p&gt;It allowed us to move psychological AI away from centralized cloud systems and directly into the user’s pocket.&lt;/p&gt;

&lt;p&gt;That shift fundamentally changes what privacy-first AI applications can become.&lt;/p&gt;




&lt;h2&gt;
  
  
  Future Work
&lt;/h2&gt;

&lt;p&gt;We plan to expand Remora with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;native multimodal audio execution,&lt;/li&gt;
&lt;li&gt;local image generation,&lt;/li&gt;
&lt;li&gt;lucid dream detection,&lt;/li&gt;
&lt;li&gt;and cross-dream narrative mapping.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As edge AI tooling matures, applications like Remora will increasingly blur the line between local software and personal AI companions.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Building Remora with Gemma 4 demonstrated something important:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Edge AI is no longer experimental.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For the first time, mobile devices are capable of delivering meaningful multimodal AI experiences while preserving user privacy by default.&lt;/p&gt;

&lt;p&gt;That opens the door to an entirely new generation of personal AI applications.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>gemmachallenge</category>
      <category>gemma</category>
      <category>flutter</category>
    </item>
  </channel>
</rss>
