<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Umer</title>
    <description>The latest articles on DEV Community by Umer (@umarpazir11).</description>
    <link>https://dev.to/umarpazir11</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3920290%2F9e624fb5-3718-4c46-9f51-1ce924b59cbf.jpeg</url>
      <title>DEV Community: Umer</title>
      <link>https://dev.to/umarpazir11</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/umarpazir11"/>
    <language>en</language>
    <item>
      <title>Built a fully offline RAG app for Android using Gemma 4 + LiteRT-LM. Your PDFs never leave the device — GDPR compliance collapses to one bullet point. Here's how. 🔒📱</title>
      <dc:creator>Umer</dc:creator>
      <pubDate>Fri, 08 May 2026 22:02:07 +0000</pubDate>
      <link>https://dev.to/umarpazir11/built-a-fully-offline-rag-app-for-android-using-gemma-4-litert-lm-your-pdfs-never-leave-the-2kch</link>
      <guid>https://dev.to/umarpazir11/built-a-fully-offline-rag-app-for-android-using-gemma-4-litert-lm-your-pdfs-never-leave-the-2kch</guid>
      <description>&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/umarpazir11/your-pdfs-never-leave-your-pocket-building-a-100-offline-rag-app-with-gemma-4-litert-lm-340" class="crayons-story__hidden-navigation-link"&gt;Your PDFs Never Leave Your Pocket: Building a 100% Offline RAG App with Gemma 4 + LiteRT-LM&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
      &lt;a href="https://dev.to/umarpazir11/your-pdfs-never-leave-your-pocket-building-a-100-offline-rag-app-with-gemma-4-litert-lm-340" class="crayons-article__context-note crayons-article__context-note__feed"&gt;&lt;p&gt;Gemma 4 Challenge: Write about Gemma 4 Submission&lt;/p&gt;

&lt;/a&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/umarpazir11" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3920290%2F9e624fb5-3718-4c46-9f51-1ce924b59cbf.jpeg" alt="umarpazir11 profile" class="crayons-avatar__image" width="460" height="460"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/umarpazir11" class="crayons-story__secondary fw-medium m:hidden"&gt;
              Umer
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                Umer
                
              
              &lt;div id="story-author-preview-content-3634603" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/umarpazir11" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3920290%2F9e624fb5-3718-4c46-9f51-1ce924b59cbf.jpeg" class="crayons-avatar__image" alt="" width="460" height="460"&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;Umer&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/umarpazir11/your-pdfs-never-leave-your-pocket-building-a-100-offline-rag-app-with-gemma-4-litert-lm-340" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;May 8&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/umarpazir11/your-pdfs-never-leave-your-pocket-building-a-100-offline-rag-app-with-gemma-4-litert-lm-340" id="article-link-3634603"&gt;
          Your PDFs Never Leave Your Pocket: Building a 100% Offline RAG App with Gemma 4 + LiteRT-LM
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/devchallenge"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;devchallenge&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/gemmachallenge"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;gemmachallenge&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/gemma"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;gemma&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
            &lt;a href="https://dev.to/umarpazir11/your-pdfs-never-leave-your-pocket-building-a-100-offline-rag-app-with-gemma-4-litert-lm-340#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              Comments


              &lt;span class="hidden s:inline"&gt;Add Comment&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            11 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;


</description>
      <category>ai</category>
      <category>android</category>
      <category>rag</category>
      <category>showdev</category>
    </item>
    <item>
      <title>Your PDFs Never Leave Your Pocket: Building a 100% Offline RAG App with Gemma 4 + LiteRT-LM</title>
      <dc:creator>Umer</dc:creator>
      <pubDate>Fri, 08 May 2026 21:53:24 +0000</pubDate>
      <link>https://dev.to/umarpazir11/your-pdfs-never-leave-your-pocket-building-a-100-offline-rag-app-with-gemma-4-litert-lm-340</link>
      <guid>https://dev.to/umarpazir11/your-pdfs-never-leave-your-pocket-building-a-100-offline-rag-app-with-gemma-4-litert-lm-340</guid>
      <description>&lt;h1&gt;
  
  
  Your PDFs Never Leave Your Pocket: Building a 100% Offline RAG App with Gemma 4 + LiteRT-LM 🔒📱
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"We'd love to use AI on our internal documents… but legal said no."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;If you've ever worked with a German Mittelstand company — or honestly, any healthcare provider, law firm, or financial services team anywhere in the EU — you've heard a version of this sentence. And legal isn't being paranoid. They're being correct.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The moment an employee pastes a contract, a payslip, or a patient record into ChatGPT, that document becomes someone else's processing activity. Under &lt;strong&gt;GDPR Article 28&lt;/strong&gt;, the cloud AI provider becomes a data processor. You stay the controller. If those servers sit in the US, you've also tripped Chapter V transfer rules and the ghost of &lt;em&gt;Schrems II&lt;/em&gt;. Fines top out at &lt;strong&gt;€20 million or 4% of global turnover&lt;/strong&gt;, and the regulators are warming up.&lt;/p&gt;

&lt;p&gt;So here's the dilemma every European SME is sitting in right now: the productivity gains from "chat with your documents" are real and obvious, but the compliance surface is a minefield. Most teams resolve it the same way — they don't use AI on their sensitive stuff at all. The data just sits there, unsearchable.&lt;/p&gt;

&lt;p&gt;I wanted to fix that. Not by writing another DPA template, but by removing the cloud from the equation entirely.&lt;/p&gt;

&lt;p&gt;This is &lt;strong&gt;PocketSage&lt;/strong&gt; — a fully offline, on-device RAG assistant for Android. You import a PDF, ask it questions, and get streaming answers from &lt;strong&gt;Gemma 4 E2B running natively on your phone&lt;/strong&gt;. No network calls. No API keys. No "your data may be used to improve our services." The model weights live in your app sandbox; the embeddings live in a Room database; airplane mode works perfectly.&lt;/p&gt;

&lt;p&gt;Let me walk you through how it's built.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa5sooqz0rp8ptbqtyokb.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa5sooqz0rp8ptbqtyokb.gif" alt="PocketSage demo" width="600" height="1337"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;🔗 &lt;strong&gt;Repo:&lt;/strong&gt; &lt;a href="https://github.com/umarpazir11/pocketsage" rel="noopener noreferrer"&gt;github.com/umarpazir11/pocketsage&lt;/a&gt;&lt;br&gt;
📦 &lt;strong&gt;Pre-built APK:&lt;/strong&gt; Grab it from the &lt;a href="https://github.com/umarpazir11/pocketsage/releases" rel="noopener noreferrer"&gt;Releases page&lt;/a&gt; — no build setup required, just side-load the LLM and go.&lt;br&gt;
⭐ Star it if you find it useful — it genuinely helps others discover the project.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Privacy Argument, Stated Plainly 🇪🇺
&lt;/h2&gt;

&lt;p&gt;I want to spend one more paragraph here because this is the &lt;em&gt;whole point&lt;/em&gt; of the project, not a footnote.&lt;/p&gt;

&lt;p&gt;When you build a cloud RAG pipeline for a German enterprise, here's what your compliance checklist actually looks like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Sign a &lt;strong&gt;Data Processing Agreement&lt;/strong&gt; with your LLM provider (Article 28)&lt;/li&gt;
&lt;li&gt;✅ Conduct a &lt;strong&gt;Data Protection Impact Assessment&lt;/strong&gt; — DPIA — for high-risk processing (Article 35)&lt;/li&gt;
&lt;li&gt;✅ Document &lt;strong&gt;legal basis&lt;/strong&gt; under Article 6 for every category of data&lt;/li&gt;
&lt;li&gt;✅ Update your &lt;strong&gt;Record of Processing Activities&lt;/strong&gt; (Article 30)&lt;/li&gt;
&lt;li&gt;✅ Set up &lt;strong&gt;Standard Contractual Clauses&lt;/strong&gt; for any non-EU sub-processors&lt;/li&gt;
&lt;li&gt;✅ Implement &lt;strong&gt;PII redaction&lt;/strong&gt; before vectorization (because the prompt-and-document data hits a third-party server)&lt;/li&gt;
&lt;li&gt;✅ Build a "Right to be Forgotten" mechanism that can purge specific vectors from your store&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's a six-month project before you write a line of feature code.&lt;/p&gt;

&lt;p&gt;Now here's PocketSage's compliance checklist:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ The data doesn't leave the device.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's it. There is &lt;strong&gt;no processor&lt;/strong&gt; because there is no processing happening anywhere except on hardware the user already owns. Article 28 doesn't apply. Chapter V transfers don't apply. There's no DPA to sign because there's no third party. This is &lt;strong&gt;privacy by design&lt;/strong&gt; in the most literal sense the regulation could possibly mean — the architecture itself makes the violation impossible.&lt;/p&gt;

&lt;p&gt;For a German SME evaluating "chat with your contracts" tools, this is the difference between a six-month legal review and a one-week pilot.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Tech Stack 🛠️
&lt;/h2&gt;

&lt;p&gt;PocketSage is a textbook &lt;strong&gt;Modern Android Development (MAD Skills)&lt;/strong&gt; app applied to a non-trivial ML problem. Three layers, clean separation, zero &lt;code&gt;android.*&lt;/code&gt; imports in the domain layer.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Choice&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;UI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Jetpack Compose + Material 3&lt;/td&gt;
&lt;td&gt;Single Activity, dynamic color, recruiter-recognisable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Architecture&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;MVVM, Hilt DI, Navigation Compose&lt;/td&gt;
&lt;td&gt;Standard, testable, no surprises&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Concurrency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Kotlin Coroutines + &lt;code&gt;Flow&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Streaming tokens map cleanly onto &lt;code&gt;callbackFlow&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Persistence&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Room (SQLite)&lt;/td&gt;
&lt;td&gt;384-dim embeddings stored as &lt;code&gt;BLOB&lt;/code&gt;, cosine in Kotlin&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Embeddings&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;LiteRT (&lt;code&gt;all-MiniLM-L6-v2&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;22 MB, well-benchmarked, runs anywhere&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;PDF parsing&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;pdfbox-android&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Mature port, handles most consumer PDFs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LLM Inference&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;LiteRT-LM&lt;/strong&gt; + &lt;code&gt;gemma-4-E2B-it-litert-lm&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Google's official on-device GenAI orchestration layer&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
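The "embeddings stored as BLOB" row is worth one concrete sketch: Room has no native vector column type, so the 384 floats are simply serialized to bytes. A minimal, framework-free version of that conversion (names are illustrative; PocketSage's actual converter may differ):

```kotlin
import java.nio.ByteBuffer

// A 384-dim float embedding is 384 * 4 = 1536 bytes per chunk,
// small enough that a plain SQLite BLOB column is a fine vector store.
fun floatArrayToBlob(vec: FloatArray): ByteArray {
    val buf = ByteBuffer.allocate(vec.size * Float.SIZE_BYTES)
    vec.forEach { buf.putFloat(it) }
    return buf.array()
}

fun blobToFloatArray(blob: ByteArray): FloatArray {
    val buf = ByteBuffer.wrap(blob)
    return FloatArray(blob.size / Float.SIZE_BYTES) { buf.getFloat() }
}
```

In the app this pair would typically sit behind a Room `@TypeConverter`, so the DAOs work with `FloatArray` directly.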

&lt;p&gt;The whole RAG pipeline is roughly 500 lines of Kotlin once you strip the boilerplate. Honestly, the hard part wasn't the code — it was choosing the right model file format. (More on that nightmare in a moment.)&lt;/p&gt;




&lt;h2&gt;
  
  
  How RAG Works Here, in Three Paragraphs 📚
&lt;/h2&gt;

&lt;p&gt;When you import a PDF, PocketSage extracts the text with PDFBox, splits it into ~800-character overlapping chunks, embeds each chunk with MiniLM (a tiny BERT-family model), and stores the resulting 384-dimensional vectors as raw bytes in a Room table. One-time per document, runs in the background, progress bar in the UI. Standard stuff.&lt;/p&gt;
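That chunking step is small enough to show in full. A minimal sketch in plain Kotlin, assuming ~800-character chunks with a 200-character overlap (the overlap value is my illustration; the text above only pins down the chunk size):

```kotlin
// Split text into overlapping windows so that a sentence cut at a
// chunk boundary still appears intact in the neighbouring chunk.
fun chunkText(text: String, chunkSize: Int = 800, overlap: Int = 200): List<String> {
    require(overlap < chunkSize) { "overlap must be smaller than chunkSize" }
    val chunks = mutableListOf<String>()
    var start = 0
    while (start < text.length) {
        val end = minOf(start + chunkSize, text.length)
        chunks += text.substring(start, end)
        if (end == text.length) break
        start = end - overlap // step back so consecutive chunks share `overlap` chars
    }
    return chunks
}
```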

&lt;p&gt;When you ask a question, the &lt;em&gt;same&lt;/em&gt; embedding model converts your question into a vector. The app computes cosine similarity between the question vector and every stored chunk, takes the top four matches, and stitches them into a prompt template that explicitly tells the LLM: &lt;em&gt;answer only from the supplied context&lt;/em&gt;.&lt;/p&gt;
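Because a handful of PDFs only produces a few thousand chunks, brute-force search is fine here; no ANN index needed on-device. A minimal sketch of the cosine-similarity retrieval described above (function names are mine, not necessarily PocketSage's):

```kotlin
import kotlin.math.sqrt

// Cosine similarity between two embedding vectors of equal length.
fun cosine(a: FloatArray, b: FloatArray): Float {
    var dot = 0f; var na = 0f; var nb = 0f
    for (i in a.indices) {
        dot += a[i] * b[i]
        na += a[i] * a[i]
        nb += b[i] * b[i]
    }
    return dot / (sqrt(na) * sqrt(nb))
}

// Brute-force top-k: score every stored chunk against the query
// vector and keep the k most similar chunk texts.
fun topK(query: FloatArray, chunks: List<Pair<String, FloatArray>>, k: Int = 4): List<String> =
    chunks.sortedByDescending { (_, emb) -> cosine(query, emb) }
        .take(k)
        .map { it.first }
```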

&lt;p&gt;The prompt is fed to Gemma 4 E2B running in LiteRT-LM's &lt;code&gt;Engine&lt;/code&gt; runtime, which streams tokens back through a callback. Each token is appended to a &lt;code&gt;StateFlow&amp;lt;String&amp;gt;&lt;/code&gt; that the chat screen renders in real time, with the retrieved chunks shown beneath each answer so you can verify the model isn't hallucinating. End-to-end, on a Pixel 8, you get first token in roughly 2-3 seconds and a full answer in 10-15. Not GPT-4, but very usable.&lt;/p&gt;
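The "answer only from the supplied context" template is plain string assembly. A hypothetical version (the exact wording PocketSage ships may differ):

```kotlin
// Stitch retrieved chunks and the user's question into a single
// grounded prompt that forbids answering from outside the context.
fun buildRagPrompt(question: String, contextChunks: List<String>): String = buildString {
    appendLine("You are a helpful assistant. Answer ONLY from the context below.")
    appendLine("If the answer is not in the context, say you don't know.")
    appendLine()
    contextChunks.forEachIndexed { i, chunk ->
        appendLine("[Context ${i + 1}]")
        appendLine(chunk.trim())
        appendLine()
    }
    appendLine("Question: $question")
    append("Answer:")
}
```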




&lt;h2&gt;
  
  
  Under the Hood: The &lt;code&gt;LiteRtLmRunner&lt;/code&gt; 🔧
&lt;/h2&gt;

&lt;p&gt;This is the piece I'm proudest of, and it's also the piece that took the longest to get right. LiteRT-LM is Google's new orchestration layer that sits on top of LiteRT (formerly TensorFlow Lite). It handles KV-cache management, prompt templating, and the streaming token API — all the GenAI-specific plumbing that you used to have to write yourself with raw TFLite.&lt;/p&gt;

&lt;p&gt;Here's the core of how PocketSage talks to Gemma 4:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Singleton&lt;/span&gt;
&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;LiteRtLmRunner&lt;/span&gt; &lt;span class="nd"&gt;@Inject&lt;/span&gt; &lt;span class="k"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nd"&gt;@ApplicationContext&lt;/span&gt; &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;modelRepo&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;ModelRepository&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;LlmRunner&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Engine&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="nf"&gt;lazy&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;modelPath&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;modelRepo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getModelPath&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;absolutePath&lt;/span&gt;
        &lt;span class="nc"&gt;Log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;i&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;TAG&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Initializing engine — model: $modelPath"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nc"&gt;Engine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="nc"&gt;EngineConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;modelPath&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;modelPath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;cacheDir&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cacheDir&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;absolutePath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="c1"&gt;// maxNumTokens left at default — overriding it triggers a&lt;/span&gt;
                &lt;span class="c1"&gt;// DYNAMIC_UPDATE_SLICE tensor shape error. Painful lesson.&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;also&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;it&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;initialize&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;sessionConfig&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SessionConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nc"&gt;SamplerConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;topK&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BuildConfig&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;LLM_TOP_K&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;topP&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;temperature&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BuildConfig&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;LLM_TEMPERATURE&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toDouble&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="n"&gt;seed&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="py"&gt;activeSession&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Session&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;

    &lt;span class="k"&gt;override&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nc"&gt;Flow&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;callbackFlow&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nf"&gt;check&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;modelRepo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isModelReady&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s"&gt;"Model not ready"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="c1"&gt;// Cancel any in-flight generation — the user has asked something new&lt;/span&gt;
        &lt;span class="n"&gt;activeSession&lt;/span&gt;&lt;span class="o"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;cancelProcess&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;activeSession&lt;/span&gt;&lt;span class="o"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;session&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createSession&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sessionConfig&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;activeSession&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;

        &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generateContentStream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="nf"&gt;listOf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;InputData&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
            &lt;span class="kd"&gt;object&lt;/span&gt; &lt;span class="err"&gt;: &lt;/span&gt;&lt;span class="nc"&gt;ResponseCallback&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;override&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;onNext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nf"&gt;trySend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="k"&gt;override&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;onDone&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="k"&gt;override&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;onError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;throwable&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Throwable&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;throwable&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="nf"&gt;awaitClose&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;activeSession&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;
            &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}.&lt;/span&gt;&lt;span class="nf"&gt;flowOn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Dispatchers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;IO&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are three things worth pausing on here:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. The &lt;code&gt;Engine&lt;/code&gt; is lazy and singleton-scoped.&lt;/strong&gt; Initializing the engine means loading 1.5 GB of model weights into memory and warming the KV cache. You absolutely do not want to do that on every query. Hilt's &lt;code&gt;@Singleton&lt;/code&gt; plus Kotlin's &lt;code&gt;by lazy&lt;/code&gt; give you a clean "load once, on first use" pattern that Just Works.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. &lt;code&gt;callbackFlow&lt;/code&gt; is the bridge between LiteRT-LM's callback API and Kotlin's &lt;code&gt;Flow&lt;/code&gt;.&lt;/strong&gt; This is honestly one of the most elegant pieces of the coroutines library. The &lt;code&gt;ResponseCallback&lt;/code&gt; from LiteRT-LM gives you &lt;code&gt;onNext&lt;/code&gt; / &lt;code&gt;onDone&lt;/code&gt; / &lt;code&gt;onError&lt;/code&gt;, which maps perfectly onto &lt;code&gt;trySend&lt;/code&gt; / &lt;code&gt;close()&lt;/code&gt; / &lt;code&gt;close(throwable)&lt;/code&gt;. The &lt;code&gt;awaitClose&lt;/code&gt; block runs when the collector cancels — which means if the user navigates away from the chat screen mid-generation, the session gets cleaned up properly. No leaked native memory, no zombie inference threads.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. &lt;code&gt;flowOn(Dispatchers.IO)&lt;/code&gt; keeps the UI thread sacred.&lt;/strong&gt; Token generation is CPU-heavy (XNNPACK is using 4 threads under the hood, hammering your big cores). If any of this leaked onto the main thread, scrolling would jank instantly. The dispatcher switch is one line and saves the entire UX.&lt;/p&gt;

&lt;p&gt;The cancellation logic — &lt;code&gt;activeSession?.cancelProcess()&lt;/code&gt; before starting a new one — is the kind of detail you only learn by shipping. Without it, if a user types question A and then question B before A finishes, two inference jobs race for the same engine: the first session wins the engine lock, and the second one's tokens come out interleaved like a poorly shuffled deck of cards. It took me an evening of "is the model broken?" debugging to spot it.&lt;/p&gt;




&lt;h2&gt;
  
  
  A War Story: The &lt;code&gt;.task&lt;/code&gt; vs &lt;code&gt;.litertlm&lt;/code&gt; Saga 😤
&lt;/h2&gt;

&lt;p&gt;I'll be honest about the part of this project that ate the most time. The first version of PocketSage used &lt;strong&gt;MediaPipe&lt;/strong&gt; + Gemma 2B with a &lt;code&gt;.task&lt;/code&gt; file. That stack works, but it's the older path. With Gemma 4, Google bifurcated the recommendation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;For experimentation:&lt;/strong&gt; Google AI Edge Gallery (you're a user, not a dev)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;For production apps:&lt;/strong&gt; &lt;strong&gt;LiteRT-LM&lt;/strong&gt; with &lt;code&gt;.litertlm&lt;/code&gt; files&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;LiteRT-LM is the more modern API. It supports the new Gemma 4 features properly (per-layer embeddings, the mixed-precision quant scheme, longer context windows up to 32k via the Android runtime, with the model itself supporting 128k natively). But — and this caught me hard — &lt;strong&gt;LiteRT-LM only loads &lt;code&gt;.litertlm&lt;/code&gt; files&lt;/strong&gt;. If you grab the &lt;code&gt;.task&lt;/code&gt; file off Hugging Face and try to feed it to LiteRT-LM's &lt;code&gt;Engine&lt;/code&gt;, you'll get a cryptic init failure that does not mention the word "format" anywhere.&lt;/p&gt;

&lt;p&gt;The fix is finding the right artifact. For PocketSage that's &lt;a href="https://huggingface.co/litert-community/gemma-4-E2B-it-litert-lm" rel="noopener noreferrer"&gt;&lt;code&gt;litert-community/gemma-4-E2B-it-litert-lm&lt;/code&gt;&lt;/a&gt; on Hugging Face. It uses Gemma's mixed quantization scheme — a blend of 2-bit, 4-bit, and 8-bit weights — which is what gets the model down to ~1.5 GB of active RAM while keeping output quality solid.&lt;/p&gt;

&lt;p&gt;Lesson: when you're working with a brand-new runtime, &lt;strong&gt;always check that the model artifact and the runtime versions match before you write a single line of integration code.&lt;/strong&gt; Saved me a week of confused debugging the second time.&lt;/p&gt;
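One cheap way to apply that lesson in code is to fail fast on the wrong artifact before the Engine ever sees it. A hypothetical guard (LiteRT-LM itself doesn't expose a format check like this, as far as I know):

```kotlin
import java.io.File

// Fail with a readable message instead of LiteRT-LM's cryptic init
// error when someone downloads the MediaPipe `.task` artifact instead.
fun requireLitertLmArtifact(model: File): File {
    require(model.exists()) { "Model file not found: ${model.path}" }
    require(model.extension == "litertlm") {
        "Expected a .litertlm file for LiteRT-LM, got .${model.extension} " +
            "(a .task file is for the MediaPipe LLM Inference API, not LiteRT-LM)"
    }
    return model
}
```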




&lt;h2&gt;
  
  
  Performance &amp;amp; Hardware Constraints 📊
&lt;/h2&gt;

&lt;p&gt;Let me be the engineer who actually tells you the trade-offs instead of pretending it's magic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Memory:&lt;/strong&gt; ~1.5 GB of active RAM for the engine + KV cache. This rules out budget devices with 4 GB total RAM (Android itself wants a chunk, and you'll get killed by the OOM killer the moment you background the app). Realistic minimum is &lt;strong&gt;6 GB RAM&lt;/strong&gt;, comfortable is 8 GB+. A Pixel 7/8, Samsung Galaxy S23+, or any flagship from the last two years is the sweet spot.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Latency:&lt;/strong&gt; On a Pixel 8 with the CPU backend (XNNPACK, 4 threads), I see roughly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Time to first token:&lt;/strong&gt; 2-3 seconds (after the engine is warm)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Decode speed:&lt;/strong&gt; ~12-18 tokens/second&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cold start:&lt;/strong&gt; 5-10 seconds to load the engine on first query&lt;/li&gt;
&lt;/ul&gt;
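&lt;p&gt;Those throughput numbers are easy to reproduce yourself: timestamp the first and last streamed token and divide. The function below is just the bare arithmetic (the engine callback wiring is app-specific and omitted):&lt;/p&gt;

```kotlin
// Tokens/second from a streamed generation: count tokens after the first
// one arrives, divide by elapsed wall time over that window.
fun decodeTokensPerSecond(tokenCount: Int, firstTokenAtMs: Long, lastTokenAtMs: Long): Double {
    val decodeWindowMs = lastTokenAtMs - firstTokenAtMs
    require(decodeWindowMs > 0) { "need at least two token timestamps" }
    // Subtract 1: the first token belongs to prefill (time-to-first-token),
    // not to decode throughput.
    return (tokenCount - 1) * 1000.0 / decodeWindowMs
}
```

&lt;p&gt;For example, 100 tokens with the first at 2.5 s and the last at 9.1 s works out to 15 tokens/second, squarely in the range above.&lt;/p&gt;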

&lt;p&gt;GPU acceleration via the Mali/Adreno backends or NPU acceleration via Qualcomm QNN gets you another 2-3x on supported chipsets. PocketSage v0.1 ships CPU-only for maximum compatibility; GPU is on the roadmap.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why E2B and not E4B?&lt;/strong&gt; This is the question I get most often. The Gemma 4 family ships in &lt;a href="https://blog.google/innovation-and-ai/technology/developers-tools/gemma-4/" rel="noopener noreferrer"&gt;E2B (effective ~2B params), E4B, 26B MoE, and 31B Dense&lt;/a&gt; sizes. E4B is meaningfully smarter — better long-context reasoning, better at multi-step questions. But it needs ~3 GB of active RAM and pushes time-to-first-token past 5 seconds on most phones. For an interactive chat experience, &lt;strong&gt;E2B is the sweet spot&lt;/strong&gt;: it fits comfortably on mid-range hardware, generates fast enough that streaming feels alive, and for the &lt;em&gt;extractive&lt;/em&gt; QA task at the heart of RAG ("what does this contract say about termination?"), the marginal accuracy gain from E4B isn't worth the latency tax.&lt;/p&gt;

&lt;p&gt;If you're targeting tablets or high-end flagships only and you need stronger reasoning, swap to E4B — the runner code is identical, just point &lt;code&gt;EngineConfig.modelPath&lt;/code&gt; at the bigger file.&lt;/p&gt;
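&lt;p&gt;To make that swap concrete, here's a hedged sketch. &lt;code&gt;EngineConfig.modelPath&lt;/code&gt; is the knob named above, but the &lt;code&gt;EngineConfig&lt;/code&gt; class and &lt;code&gt;configFor&lt;/code&gt; helper below are illustrative stand-ins, not the real LiteRT-LM signatures:&lt;/p&gt;

```kotlin
// The point: E2B vs E4B is a config change, not a code change.
// This data class is a stand-in for the real LiteRT-LM config type.
data class EngineConfig(val modelPath: String, val maxTokens: Int = 1024)

fun configFor(highEndDevice: Boolean): EngineConfig {
    val model = if (highEndDevice) "gemma-4-E4B-it.litertlm" // ~3 GB RAM, stronger reasoning
                else "gemma-4-E2B-it.litertlm"               // ~1.5 GB RAM, faster streaming
    return EngineConfig(modelPath = model)
}
```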




&lt;h2&gt;
  
  
  What's on the Roadmap 🗺️
&lt;/h2&gt;

&lt;p&gt;v0.1 ships a working end-to-end RAG loop. The goal from here is to keep the codebase &lt;strong&gt;legible&lt;/strong&gt; rather than dense. Help wanted on any of these:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;good first issue&lt;/code&gt;&lt;/strong&gt; — Settings screen with sliders for chunk size, overlap, and top-K&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;good first issue&lt;/code&gt;&lt;/strong&gt; — Empty-state and error-state polish across both screens&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;help wanted&lt;/code&gt;&lt;/strong&gt; — ANN index (Hnswlib JNI or ObjectBox vector search) once the chunk count exceeds a few thousand&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;help wanted&lt;/code&gt;&lt;/strong&gt; — Cross-encoder re-ranker over the top 20 → top 4&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;help wanted&lt;/code&gt;&lt;/strong&gt; — Multi-turn chat with rolling summarization&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;help wanted&lt;/code&gt;&lt;/strong&gt; — OCR for scanned PDFs (ML Kit Text Recognition)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;research&lt;/code&gt;&lt;/strong&gt; — Multimodal: Gemma 4 natively handles images and audio. Imagine asking "what does this scanned invoice say?" without an OCR step.&lt;/li&gt;
&lt;/ul&gt;
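&lt;p&gt;For context on the ANN item: a v0.1-style retrieval pass is a brute-force cosine scan over every chunk embedding — fine at hundreds of chunks, painful at tens of thousands. A generic sketch of that baseline (not PocketSage's actual code; embeddings are laid out row-major in one array to keep it dependency-free):&lt;/p&gt;

```kotlin
import kotlin.math.sqrt

// Brute-force top-K cosine retrieval: score every chunk, sort, take K.
// O(n * dim) per query — exactly the cost an ANN index would cut down.
// chunks holds numChunks rows of dim floats each, row-major.
fun topK(query: FloatArray, chunks: FloatArray, dim: Int, k: Int): IntArray {
    val n = chunks.size / dim
    val scores = FloatArray(n)
    for (c in 0 until n) {
        var dot = 0f; var qNorm = 0f; var cNorm = 0f
        for (i in 0 until dim) {
            val q = query[i]; val x = chunks[c * dim + i]
            dot += q * x; qNorm += q * q; cNorm += x * x
        }
        scores[c] = dot / (sqrt(qNorm) * sqrt(cNorm))
    }
    return (0 until n).sortedByDescending { scores[it] }.take(k).toIntArray()
}
```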




&lt;h2&gt;
  
  
  Try It Yourself 🚀
&lt;/h2&gt;

&lt;p&gt;There are two paths depending on whether you want to build from source or just kick the tires.&lt;/p&gt;

&lt;h3&gt;
  
  
  Path A: Just want to test it (5 minutes) 📱
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Download the latest APK from &lt;a href="https://github.com/umarpazir11/pocketsage/releases" rel="noopener noreferrer"&gt;GitHub Releases&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Download &lt;code&gt;gemma-4-E2B-it.litertlm&lt;/code&gt; (~2.58 GB on disk, ~1.5 GB active RAM at runtime) from &lt;a href="https://huggingface.co/litert-community/gemma-4-E2B-it-litert-lm" rel="noopener noreferrer"&gt;Hugging Face&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Side-load the APK to your Android device (you'll need to allow "Install unknown apps").&lt;/li&gt;
&lt;li&gt;Launch the app, pick the &lt;code&gt;.litertlm&lt;/code&gt; file when prompted, and you're off.&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ The APK is signed with a personal developer key, not a Play Store key — Android will warn you about the unknown source. The app itself doesn't transmit data anywhere (that's the whole point), but if you're deploying this in a real enterprise context, audit the source yourself or build it from scratch. Trust your own build, not mine.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Path B: Build from source 🛠️
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/umarpazir11/pocketsage.git
&lt;span class="nb"&gt;cd &lt;/span&gt;pocketsage
./gradlew installDebug
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The MiniLM embedding model and vocab are already in the repo — you only need to side-load the Gemma 4 LLM (~2.58 GB) from &lt;a href="https://huggingface.co/litert-community/gemma-4-E2B-it-litert-lm" rel="noopener noreferrer"&gt;Hugging Face&lt;/a&gt;. The app's first-run screen walks you through picking it.&lt;/p&gt;

&lt;p&gt;Once it's running, open the app, tap &lt;strong&gt;+&lt;/strong&gt; to add a PDF, ask away. Toggle airplane mode mid-conversation just to feel the magic. ✈️&lt;/p&gt;




&lt;h2&gt;
  
  
  The Bigger Picture 🌍
&lt;/h2&gt;

&lt;p&gt;I think we're at an inflection point with on-device AI that mirrors where mobile photography was around 2014. For years, "real" photography meant a DSLR, and the cloud was just where you stored the photos. Then computational photography on the SoC got good enough that the phone &lt;em&gt;was&lt;/em&gt; the camera, and the cloud became optional.&lt;/p&gt;

&lt;p&gt;LLMs are following the same arc. For two years, "real" AI has meant a frontier model in someone else's data center. But Gemma 4 E2B running on a Pixel 8 is genuinely useful for a huge class of tasks — document QA, summarization, code explanation, light reasoning. And it's running on hardware your users already paid for, on data that never leaves their device, under regulatory regimes that suddenly become trivial.&lt;/p&gt;

&lt;p&gt;For German SMEs sitting on filing cabinets full of contracts, HR records, and ISO 27001 audit logs they'd love to make searchable but can't legally upload anywhere — this changes the economics. The compliance surface collapses to "the user owns the device." The tooling is open-source, the models are Apache 2.0, the runtime is free.&lt;/p&gt;

&lt;p&gt;If you build for this market, the next twelve months are going to be wild.&lt;/p&gt;




&lt;h2&gt;
  
  
  Get In Touch 💬
&lt;/h2&gt;

&lt;p&gt;PocketSage is &lt;strong&gt;MIT licensed&lt;/strong&gt;, contributions are genuinely welcome, and I read every issue. Whether you're an Android dev curious about on-device ML, an ML engineer who wants to learn Compose, or a documentation hawk — there's something in the roadmap for you.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🐙 &lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/umarpazir11/pocketsage" rel="noopener noreferrer"&gt;github.com/umarpazir11/pocketsage&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;💼 &lt;strong&gt;LinkedIn:&lt;/strong&gt; &lt;a href="https://www.linkedin.com/in/umardilpazir/" rel="noopener noreferrer"&gt;linkedin.com/in/umardilpazir&lt;/a&gt; — I'm especially interested in talking with German SMEs and Mittelstand companies thinking about privacy-first AI&lt;/li&gt;
&lt;li&gt;📬 &lt;strong&gt;Issues / PRs:&lt;/strong&gt; Open one; I respond fast&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If PocketSage is useful to you, &lt;strong&gt;a star on the repo helps others find it.&lt;/strong&gt; ⭐&lt;/p&gt;

&lt;p&gt;And if you're building privacy-first AI for European enterprises, get in touch on LinkedIn.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built with Kotlin, Compose, and an unreasonable amount of respect for Article 28. 🇪🇺&lt;/em&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  #Gemma4 #AndroidDev #LiteRT #OnDeviceAI #GDPR #Privacy #Kotlin #JetpackCompose #RAG #OpenSource
&lt;/h1&gt;

</description>
      <category>devchallenge</category>
      <category>gemmachallenge</category>
      <category>gemma</category>
    </item>
  </channel>
</rss>
