<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Mark Barnett</title>
    <description>The latest articles on DEV Community by Mark Barnett (@mark_barnett_a50bc71e7433).</description>
    <link>https://dev.to/mark_barnett_a50bc71e7433</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3909285%2Fa1fd569d-1525-45d1-9d38-199278519925.png</url>
      <title>DEV Community: Mark Barnett</title>
      <link>https://dev.to/mark_barnett_a50bc71e7433</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mark_barnett_a50bc71e7433"/>
    <language>en</language>
    <item>
      <title>Bidet AI — on-device Gemma 4 turns a messy brain-dump into clean writing</title>
      <dc:creator>Mark Barnett</dc:creator>
      <pubDate>Mon, 18 May 2026 01:41:05 +0000</pubDate>
      <link>https://dev.to/mark_barnett_a50bc71e7433/bidet-ai-on-device-gemma-4-turns-a-messy-brain-dump-into-clean-writing-4256</link>
      <guid>https://dev.to/mark_barnett_a50bc71e7433/bidet-ai-on-device-gemma-4-turns-a-messy-brain-dump-into-clean-writing-4256</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-gemma-2026-05-06"&gt;Gemma 4 Challenge: Build with Gemma 4&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;What I Built&lt;/h2&gt;

&lt;p&gt;I'm Mark. I'm a middle-school teacher, and I'm not a coder. A few times a year there's a piece of writing that wrecks me: honest comments about real students (the most personal, highest-stakes writing I do). It always went the same way: two in the morning, a blank page, and everything in my head refusing to line up. My brain runs a mile a minute and goes everywhere, faster than I can type or talk. I have ADD. Getting what's actually in my head onto the page has always been the hard part.&lt;/p&gt;

&lt;p&gt;So I built &lt;strong&gt;Bidet AI&lt;/strong&gt;: an Android app that turns a spoken brain-dump into clean writing, &lt;strong&gt;100% on-device and fully offline, on a three-year-old phone&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;You hit record and just talk: ramble, stutter, repeat yourself, go off on tangents. It transcribes you as you speak, then reshapes the mess into clear writing. It doesn't &lt;em&gt;summarize&lt;/em&gt; you. It organizes what you actually said and fills in the context other people need, so it reads like you on a good day, finally saying it the way you meant. You get two versions: one cleaned up for you, and one cleaned up for other people to read.&lt;/p&gt;

&lt;p&gt;Running entirely on-device isn't a tech flex; it's the whole point. The comments I write are about real students: specific, candid, sometimes hard. There is no version of me that uploads that to someone's server to get it cleaned up. With Bidet AI, nothing is sent anywhere on its own; the only thing that ever leaves the phone is what &lt;em&gt;I&lt;/em&gt; choose to share. Privacy here isn't a policy I have to trust; it's a matter of &lt;em&gt;where the computer is&lt;/em&gt;. And the hardware floor is a phone someone already owns, not a subscription and a card on file. A cloud tool serves people who can afford the cloud; one that runs on an old phone serves everyone else.&lt;/p&gt;

&lt;h2&gt;Demo&lt;/h2&gt;

&lt;p&gt;🎥 &lt;strong&gt;2:43 walkthrough (my own story, with a real on-device demo in airplane mode):&lt;/strong&gt; &lt;a href="https://youtu.be/EAJe4rpJAF0" rel="noopener noreferrer"&gt;https://youtu.be/EAJe4rpJAF0&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;🌐 &lt;strong&gt;Project page:&lt;/strong&gt; &lt;a href="https://bidetai.app" rel="noopener noreferrer"&gt;https://bidetai.app&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the video, the on-device demo is shot with &lt;strong&gt;airplane mode visibly switched on&lt;/strong&gt; (no Wi-Fi, no cellular), and both the speech model and Gemma 4 keep running. The cleaned, organized output appears while the device is fully offline. Gemma 4 E2B really does take a couple of minutes to cold-load on a three-year-old phone, so that stretch is openly time-compressed (shown as proof → cut → payoff) and never claimed as instant.&lt;/p&gt;

&lt;h2&gt;Code&lt;/h2&gt;

&lt;p&gt;📦 &lt;strong&gt;Public source (Apache 2.0):&lt;/strong&gt; &lt;a href="https://github.com/MrB-Ed/bidet-ai" rel="noopener noreferrer"&gt;https://github.com/MrB-Ed/bidet-ai&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I built on, and what's original, stated plainly:&lt;/strong&gt; the Android project &lt;em&gt;forks Google's AI Edge Gallery&lt;/em&gt; (Apache 2.0; the exact upstream commit is pinned in &lt;code&gt;UPSTREAM_GALLERY_SHA.md&lt;/code&gt;, with attribution in &lt;code&gt;LICENSE&lt;/code&gt; and &lt;code&gt;NOTICE&lt;/code&gt;). I used the fork as the shell so I didn't have to reinvent the model-download and lifecycle plumbing. The public repo is a curated extract that intentionally drops the inherited UI, storage, download, and branding code so the Gemma 4 work is easy to read. The original engineering on top of the fork is the capture-and-restructure pipeline:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A foreground capture service (&lt;code&gt;CleanGenerationService.kt&lt;/code&gt;) that records 16 kHz audio in overlapping windows with a rolling backbuffer. A brain-dump can therefore run as long as it needs to instead of being capped at one short window. Once recording completes, the service runs the on-device Gemma cleanup pass.&lt;/li&gt;
&lt;li&gt;An on-device transcription path: a bundled ~27M-parameter &lt;strong&gt;Moonshine-Tiny v2&lt;/strong&gt; model (English, MIT license) run through the &lt;strong&gt;sherpa-onnx&lt;/strong&gt; runtime, with deterministic fuzzy de-duplication that stitches the overlapping chunks into one clean transcript. This replaced an earlier whisper.cpp prototype. Moonshine is smaller and faster, and it is more accurate for its size.&lt;/li&gt;
&lt;li&gt;A single shared &lt;strong&gt;LiteRT-LM&lt;/strong&gt; engine provider (&lt;code&gt;BidetSharedLiteRtEngineProvider.kt&lt;/code&gt;). It provides one Engine per process, an NPU→CPU backend fallback on Tensor G3, and a mutex-guarded single-load state machine. It is tuned so a 2B-class language model and a small ASR model can share memory on an old phone without an out-of-memory (OOM) crash.&lt;/li&gt;
&lt;li&gt;A first-run consent screen that requires the user to accept the Gemma Terms of Use. The only network call the app ever makes is a one-time, optional model download. No telemetry, no analytics, no phoning home.&lt;/li&gt;
&lt;/ul&gt;
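
&lt;p&gt;The overlapping-window capture and the fuzzy stitching described above can be sketched briefly. This is a hypothetical, simplified Python illustration, not code from the repo. The 8-second window, 1.5-second overlap, 12-word search limit, and 0.8 similarity threshold are assumptions made for this sketch. The post also doesn't say which matching method the real de-duplication uses; here it is &lt;code&gt;difflib.SequenceMatcher&lt;/code&gt;, which is deterministic. &lt;code&gt;window_bounds&lt;/code&gt; computes window boundaries over a finished recording, standing in for a streaming backbuffer. &lt;code&gt;stitch&lt;/code&gt; drops any words at the start of each new chunk that fuzzily repeat the end of the transcript so far.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
# Hypothetical sketch (not the Bidet AI source): overlapping 16 kHz windows,
# then stitching per-window transcripts by dropping words the overlap caused
# to be transcribed twice. Window, overlap, and threshold values are assumptions.
from difflib import SequenceMatcher

SAMPLE_RATE = 16_000  # 16 kHz mono, as stated in the post
WINDOW_S = 8.0        # assumed window length (seconds)
OVERLAP_S = 1.5       # assumed overlap between consecutive windows (seconds)


def window_bounds(total_samples, rate=SAMPLE_RATE, window_s=WINDOW_S, overlap_s=OVERLAP_S):
    """Yield (start, end) sample indices of overlapping windows covering the audio."""
    size = int(window_s * rate)
    step = max(size - int(overlap_s * rate), 0)  # fresh samples per window
    if step == 0:
        raise ValueError("overlap must be shorter than the window")
    start = 0
    while True:
        end = min(start + size, total_samples)
        yield start, end
        if end == total_samples:  # final (possibly shorter) window
            return
        start += step


def _similar(a, b, threshold):
    """Fuzzy match that tolerates small ASR differences such as 'mom' vs 'mum'."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def stitch(chunks, max_overlap_words=12, threshold=0.8):
    """Join chunk transcripts, removing the longest fuzzy-duplicated overlap."""
    words = []
    for chunk in chunks:
        new = chunk.split()
        best = 0
        # Try the longest candidate overlap first so a short accidental match
        # (e.g. a repeated "the") does not win over the real seam.
        for k in range(min(max_overlap_words, len(words), len(new)), 0, -1):
            if _similar(" ".join(words[-k:]), " ".join(new[:k]), threshold):
                best = k
                break
        words.extend(new[best:])  # keep the earlier chunk's wording at the seam
    return " ".join(words)
```
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;For example, &lt;code&gt;stitch(["we should call his mom", "call his mum about it"])&lt;/code&gt; returns &lt;code&gt;"we should call his mom about it"&lt;/code&gt;. The seam is recognized even though the two chunks heard "mom" and "mum" differently.&lt;/p&gt;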
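
&lt;p&gt;The shared engine provider stops the language model from being loaded twice. That matters when a 2B-class model has to share memory with the ASR model. Below is a minimal, hypothetical Python sketch of a mutex-guarded single-load state machine with an NPU→CPU fallback. It is not the contents of &lt;code&gt;BidetSharedLiteRtEngineProvider.kt&lt;/code&gt;. &lt;code&gt;load_engine(backend)&lt;/code&gt; is a placeholder rather than a real LiteRT-LM call, and the state names and backend strings are illustrative.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
# Hypothetical sketch (not the Bidet AI source or the LiteRT-LM API): one engine
# per process, created by a lock-guarded single-load state machine that tries
# the NPU first and falls back to CPU. load_engine(backend) is a placeholder.
import threading
from enum import Enum


class State(Enum):
    UNLOADED = "unloaded"
    LOADING = "loading"
    READY = "ready"
    FAILED = "failed"


class SharedEngineProvider:
    def __init__(self, load_engine, backends=("npu", "cpu")):
        self._load_engine = load_engine
        self._backends = backends
        self._lock = threading.Lock()
        self._engine = None
        self.state = State.UNLOADED
        self.backend = None

    def get(self):
        """Return the shared engine, loading it at most once per process."""
        # Holding the lock for the whole load means a second caller waits for
        # the first load instead of starting a parallel one (which would put a
        # second copy of the weights in memory).
        with self._lock:
            if self.state is State.READY:
                return self._engine
            self.state = State.LOADING  # readable by other threads, e.g. for UI
            errors = []
            for backend in self._backends:  # NPU first, then CPU
                try:
                    engine = self._load_engine(backend)
                except Exception as exc:  # backend unsupported: try the next one
                    errors.append(f"{backend}: {exc}")
                    continue
                self._engine, self.backend = engine, backend
                self.state = State.READY
                return engine
            # Every backend failed; a later get() call retries from scratch.
            self.state = State.FAILED
            raise RuntimeError("no backend could load the model: " + "; ".join(errors))
```
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;On Android, the same shape would presumably be a Kotlin singleton guarded by a coroutine &lt;code&gt;Mutex&lt;/code&gt; or a &lt;code&gt;synchronized&lt;/code&gt; block. The Python version only illustrates the state transitions and the fallback order.&lt;/p&gt;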

&lt;h2&gt;How I Used Gemma 4&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The model choice &lt;em&gt;is&lt;/em&gt; the project, and picking &lt;code&gt;E2B&lt;/code&gt; was deliberate, not a default.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The constraint came first: the language model has to run on a &lt;strong&gt;three-year-old phone, in airplane mode&lt;/strong&gt;, sharing memory with an ASR model that's already resident. The equity argument ("works for people the cloud leaves behind") is only &lt;em&gt;true&lt;/em&gt; if the app runs on hardware people already own. So I let the constraint pick the model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I tested the larger &lt;strong&gt;E4B&lt;/strong&gt; first. It blew the memory budget and would not co-reside with the speech model on the target device.&lt;/li&gt;
&lt;li&gt;The 31B Dense and 26B MoE variants are non-starters for offline mobile; they're server- and desktop-class models.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;E2B&lt;/strong&gt; is the smallest Gemma 4 variant, explicitly built for edge and ultra-mobile deployment. It is the &lt;em&gt;only&lt;/em&gt; variant that fits the constraint &lt;strong&gt;and&lt;/strong&gt; still restructures messy, disfluent speech genuinely well. Picking E2B is what turns the offline-on-old-hardware claim from an aspiration into something real.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Division of labor by design:&lt;/strong&gt; the ~27M Moonshine model only &lt;em&gt;listens&lt;/em&gt; (speech → text). Everything that makes the output &lt;em&gt;good&lt;/em&gt; is &lt;strong&gt;Gemma 4 E2B running locally via LiteRT-LM. No cloud. No cloud fallback.&lt;/strong&gt; That includes cleaning it up, organizing tangents into a structure, filling in context, and producing both a for-me and a for-others version. Disfluent, out-of-order, ADD-shaped speech is exactly the kind of input a strong, small, instruction-tuned model handles well. That model is the reason the app is useful rather than just private.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;On personalization (honest status):&lt;/strong&gt; because the model is on-device and mine, I ran a small &lt;strong&gt;Unsloth&lt;/strong&gt; LoRA experiment on ~1,300 paired examples from my own brain-dumps. The goal was to see whether the cleaned output could sound more like &lt;em&gt;me&lt;/em&gt;. This is an &lt;strong&gt;in-progress experiment. It is not in the shipped build, and no fine-tuned model is claimed as working here&lt;/strong&gt;. The app ships and runs on the base Gemma 4 E2B weights, and the repo's README says the same. I'm including it because it's part of the honest story of what on-device open weights make &lt;em&gt;possible&lt;/em&gt;, not because it's a finished result.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this is a Gemma 4 build worth your time:&lt;/strong&gt; a non-coder solving a real, personal, high-stakes problem on a phone he already owned. Choosing &lt;em&gt;which&lt;/em&gt; Gemma 4 model to run (E2B) is the single thing that makes it work. Private by architecture, offline by construction, and useful for anyone whose brain moves faster than their hands.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Take a brain dump. Bidet AI cleans up your mess.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>gemmachallenge</category>
      <category>gemma</category>
    </item>
  </channel>
</rss>
