Bidet AI — on-device Gemma 4 turns a messy brain-dump into clean writing

#devchallenge #gemmachallenge #gemma


This is a submission for the Gemma 4 Challenge: Build with Gemma 4.

What I Built

I'm Mark. I'm a middle-school teacher, and I'm not a coder. A few times a year there's a piece of writing that wrecks me: honest comments about real students — the most personal, highest-stakes writing I do. It always came out the same way: two in the morning, blank page, everything in my head refusing to line up. My brain runs a mile a minute and goes everywhere, faster than I can type or talk. I have ADD. Getting what's actually in my head onto the page has always been the hard part.

So I built Bidet AI: an Android app that turns a spoken brain-dump into clean writing, entirely on a three-year-old phone and fully offline.

You hit record and just talk — ramble, stutter, repeat, go off on tangents. It transcribes you as you speak, then reshapes the mess into clear writing. It doesn't summarize you. It organizes what you actually said and fills in the context other people need, so it reads like you on a good day, finally saying it the way you meant. There's a version cleaned for you, and a version cleaned for other people to read.

Running entirely on-device isn't a tech flex — it's the whole point. The comments I write are about real students: specific, candid, sometimes hard. There is no version of me that uploads that to someone's server to get cleaned up. With Bidet AI nothing is sent anywhere on its own; the only thing that ever leaves the phone is what I choose to share. Private here isn't a policy I'm trusting — it's where the computer is. And the hardware floor is a phone someone already owns, not a subscription and a card on file: a cloud tool serves people who can afford the cloud; one that runs on an old phone serves everyone else.

Demo

🎥 2:43 walkthrough (my own story, with a real on-device demo in airplane mode): https://youtu.be/EAJe4rpJAF0

🌐 Project page: https://bidetai.app

In the video, the on-device demo is shot with airplane mode visibly switched on — no Wi-Fi, no cellular — and the speech model and Gemma 4 both keep running. The cleaned, organized output appears with the device fully offline. Gemma 4 E2B genuinely takes a couple of minutes to cold-load on a three-year-old phone, so that stretch is honestly time-compressed (shown as proof → cut → payoff) — never claimed as instant.

Code

📦 Public source (Apache 2.0): https://github.com/MrB-Ed/bidet-ai

What I built on, and what's original — stated plainly: the Android project forks Google's AI Edge Gallery (Apache 2.0; the exact upstream commit is pinned in UPSTREAM_GALLERY_SHA.md, with attribution in LICENSE and NOTICE). I used the fork as the shell so I didn't reinvent model download and lifecycle plumbing. The public repo is a curated extract that intentionally drops the inherited UI, storage, download and branding code so the Gemma 4 work is easy to read. The original engineering on top of the fork is the capture-and-restructure pipeline:

A foreground capture service (CleanGenerationService.kt): it records 16 kHz audio in overlapping windows with a rolling backbuffer, so a brain-dump can run as long as it needs to instead of being capped at one short window. Once recording completes, it runs the on-device Gemma cleanup pass.
An on-device transcription path: a bundled ~27M-parameter Moonshine-Tiny v2 model (English, MIT license) run through the sherpa-onnx runtime, with deterministic fuzzy de-duplication to stitch the overlapping chunks into one clean transcript. This replaced an earlier whisper.cpp prototype; Moonshine is smaller, faster, and more accurate at this size.
A single shared LiteRT-LM engine provider (BidetSharedLiteRtEngineProvider.kt): one Engine per process with an NPU→CPU backend fallback on Tensor G3 and a mutex-guarded single-load state machine, tuned so a 2B-class language model and a small ASR model can co-reside in memory on an old phone without OOM.
A first-run consent screen that enforces the Gemma Terms of Use. The only network call the app ever makes is a one-time, optional model download. No telemetry, no analytics, no phone-home.
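
To make the stitching idea concrete, here is a minimal sketch of one way to merge transcripts from overlapping audio windows. It is not the code in the repo. Every name in it (normalize, overlapLength, stitch) and both thresholds (maxOverlap, minMatchRatio) are assumptions made up for illustration. The idea: compare the last few words already collected with the first few words of the next chunk, accept the longest alignment where most words agree, and drop that repeated lead-in.

```kotlin
// Illustrative sketch only; names and thresholds are assumptions, not the app's code.

/** Lowercase a word and drop punctuation so "Forever," matches "forever". */
fun normalize(word: String): String =
    word.lowercase().filter { it.isLetterOrDigit() }

/**
 * Number of leading words of [next] that repeat the tail of [previous].
 * An overlap of n words is accepted when at least [minMatchRatio] of the
 * n aligned word pairs agree after normalization (the "fuzzy" part).
 * Overlaps shorter than two words are ignored to avoid false matches.
 */
fun overlapLength(
    previous: List<String>,
    next: List<String>,
    maxOverlap: Int = 12,
    minMatchRatio: Double = 0.8,
): Int {
    val limit = minOf(maxOverlap, previous.size, next.size)
    // Try the longest candidate first so the largest repeat wins.
    for (n in limit downTo 2) {
        val tail = previous.subList(previous.size - n, previous.size)
        val matches = (0 until n).count { normalize(tail[it]) == normalize(next[it]) }
        if (matches.toDouble() / n >= minMatchRatio) return n
    }
    return 0
}

/** Stitch chunk transcripts in order, dropping each chunk's repeated lead-in. */
fun stitch(chunks: List<String>): String {
    val words = mutableListOf<String>()
    for (chunk in chunks) {
        val next = chunk.trim().split(Regex("\\s+")).filter { it.isNotEmpty() }
        words += next.drop(overlapLength(words, next))
    }
    return words.joinToString(" ")
}
```

With the two chunks "so the thing about report cards is" and "report cards is that they take forever", stitch returns "so the thing about report cards is that they take forever". The same input always gives the same output, which is what "deterministic" means here. The trade-off in this sketch is that a one-word overlap is left duplicated rather than risk deleting a word that was really said twice.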

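The single-load engine with a backend fallback can be sketched with nothing but the Kotlin standard library. This is a simplified illustration, not BidetSharedLiteRtEngineProvider.kt, and it does not use or mirror the LiteRT-LM API: Backend, LoadedEngine, SharedEngineProvider and the load callback are all hypothetical stand-ins. It shows the two properties described above: only one engine is ever created per process, and the NPU is tried before falling back to the CPU.

```kotlin
import java.util.concurrent.locks.ReentrantLock
import kotlin.concurrent.withLock

// Hypothetical names throughout; this is not the LiteRT-LM API.
enum class Backend { NPU, CPU }

/** Stand-in for a loaded model; the real engine type is not shown here. */
class LoadedEngine(val backend: Backend)

/**
 * Holds at most one engine per process. The lock makes concurrent callers
 * wait for the first load instead of starting a second one.
 */
class SharedEngineProvider(
    private val preference: List<Backend> = listOf(Backend.NPU, Backend.CPU),
    private val load: (Backend) -> LoadedEngine, // assumed to throw if a backend is unavailable
) {
    private val lock = ReentrantLock()
    private var engine: LoadedEngine? = null

    /** Returns the cached engine, loading it on first use. */
    fun get(): LoadedEngine = lock.withLock {
        engine ?: loadFirstAvailable().also { engine = it }
    }

    /** Tries each backend in preference order and returns the first that loads. */
    private fun loadFirstAvailable(): LoadedEngine {
        var lastError: Exception? = null
        for (backend in preference) {
            try {
                return load(backend)
            } catch (e: Exception) {
                lastError = e // remember the failure and try the next backend
            }
        }
        throw IllegalStateException("No backend could load the model", lastError)
    }
}
```

Holding the lock for the whole load is the simplest way to guarantee a single load. A fuller state machine would also track loading and failed states so callers can show progress or avoid retrying a backend that already failed.
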
How I Used Gemma 4

The model choice is the project — and E2B was deliberate, not default.

The constraint came first: the language model has to run on a three-year-old phone, in airplane mode, sharing memory with an ASR model that's already resident. The equity argument — "works for people the cloud leaves behind" — is only true if it runs on hardware people already own. So I let the constraint pick the model:

I tested the larger E4B first. It blew the memory budget and would not co-reside with the speech model on the target device.
The 31B Dense and 26B MoE variants are non-starters for offline mobile — they're server/desktop-grade.
E2B is the smallest Gemma 4 variant, explicitly built for edge/ultra-mobile deployment. It is the only flavor that fits the constraint and still does genuinely good restructuring of messy, disfluent speech. Picking E2B is what turns the offline-on-old-hardware claim from aspirational into real.
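
The "let the constraint pick the model" reasoning boils down to a small check, sketched below. Candidate and pickModel are hypothetical names, and the sketch deliberately contains no real memory figures, because none are given in this post. You would have to measure peak memory for each model on the target device yourself.

```kotlin
// Illustrative sketch; names are assumptions and no real measurements are included.

/** A candidate model and its measured peak memory use, in megabytes. */
data class Candidate(val name: String, val peakMemoryMb: Int)

/**
 * Returns the largest candidate that still fits next to what is already
 * resident (for example the speech model), or null if none fits.
 */
fun pickModel(candidates: List<Candidate>, budgetMb: Int, residentMb: Int): Candidate? =
    candidates
        .filter { it.peakMemoryMb + residentMb <= budgetMb }
        .maxByOrNull { it.peakMemoryMb }
```

The important input is residentMb: a model that fits on its own can still fail once the speech model's memory is counted against the same budget.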

Division of labor by design: the ~27M Moonshine model only listens (speech → text). Everything that makes the output good — cleanup, organizing tangents into a structure, filling in context, producing both a for-me and a for-others version — is Gemma 4 E2B running locally via LiteRT-LM. No cloud. No fallback. Disfluent, out-of-order, ADD-shaped speech is exactly the input a strong small instruction-tuned model is good at, and it's the reason the app is useful rather than just private.
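
The division of labor can be written down as two small interfaces and a pipeline that connects them. This is a sketch of the shape, not the app's code: Transcriber, TextModel, BrainDumpPipeline and buildPrompt are hypothetical names, and the prompt wording is a placeholder invented for the example, not the prompts the app uses.

```kotlin
// Illustrative sketch; all names and the prompt text are assumptions.

/** Speech-to-text stage: it only listens. */
fun interface Transcriber { fun transcribe(audio: ShortArray): String }

/** Text-generation stage: cleanup, structure and context all happen here. */
fun interface TextModel { fun generate(prompt: String): String }

enum class Audience { ME, OTHERS }

/** The two outputs described above: one for the speaker, one for other readers. */
data class CleanedDump(val forMe: String, val forOthers: String)

// Placeholder prompt wording, made up for this sketch.
fun buildPrompt(audience: Audience, transcript: String): String {
    val goal = when (audience) {
        Audience.ME -> "Organize this for the speaker. Keep every point."
        Audience.OTHERS -> "Organize this for other readers. Keep every point and add missing context."
    }
    return "$goal\n\nTranscript:\n$transcript"
}

class BrainDumpPipeline(private val asr: Transcriber, private val llm: TextModel) {
    fun run(audio: ShortArray): CleanedDump {
        val transcript = asr.transcribe(audio) // speech -> text, nothing more
        return CleanedDump(
            forMe = llm.generate(buildPrompt(Audience.ME, transcript)),
            forOthers = llm.generate(buildPrompt(Audience.OTHERS, transcript)),
        )
    }
}
```

Keeping the two stages behind separate interfaces means either one can be replaced without touching the other, which is how a speech model can be swapped while the restructuring stage stays the same.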

On personalization — honest status: because the model is on-device and mine, I ran a small Unsloth LoRA experiment on ~1,300 paired examples from my own brain-dumps, to see if the cleaned output could sound more like me. This is an in-progress experiment. It is not in the shipped build and no fine-tuned model is claimed as working here — the app ships and runs on the base Gemma 4 E2B weights. The repo's README says the same thing. I'm including it because it's part of the honest story of what on-device open weights make possible, not because it's a finished result.

Why this is a Gemma 4 build worth your time: a non-coder solving a real, personal, high-stakes problem on a phone he already owned — where the choice of which Gemma 4 model (E2B) is the single thing that makes it work. Private by architecture, offline by construction, and useful for anyone whose brain moves faster than their hands.

Take a brain dump. Bidet AI cleans up your mess.

Top comments (2)

thehwang • May 19

Local-first Mac tools with Gemma 4 represent — built a meeting-transcription variant with similar constraints. Curious if you're hitting the 16GB headroom ceiling when running alongside other apps.

Mark Barnett • May 20

Yeah, Gemma is about the only thing that can be running for me.