chintanonweb

Posted on May 21

PocketCFO: a private personal-finance brain that runs entirely in your browser

#devchallenge #gemmachallenge #gemma #ai

Gemma 4 Challenge: Build With Gemma 4 Submission

Snap a paper receipt, drop a bank statement, or just ask a question. Gemma 4 does the rest — without a single byte ever leaving your tab.

Live demo: https://gemma-challenge.vercel.app/
Code: https://github.com/chintanonweb/gemma-challenge

The problem

Personal-finance apps are a usability disaster for privacy-minded people.

To get useful insights from your bank statement — categorizing spend, spotting forgotten subscriptions, asking "how much did I spend on coffee?" — you have to upload your full transaction history to a third party. Often more: bank credentials via Plaid, receipts via your phone gallery, voice memos via some cloud transcription API.

Most people don't. Most people shouldn't. So the insight never gets generated, and the forgotten subscription keeps charging.

I wanted to know if there was now a way to fix this — a personal finance tool that does real work but where every byte stays on the user's machine. Until 2026 the answer was "almost, but not quite." With Gemma 4 E2B the answer is finally yes, in a browser tab.

What I built

PocketCFO is a single-page web app where you:

Pick your model — Gemma 4 E2B (~1.5 GB on-device, fast), E4B (~2.5 GB on-device, smarter), or 31B (cloud via OpenRouter, no download). The trade-off is explicit: the two local options keep your data on your device; the cloud option sends it to OpenRouter for inference.
Drop a CSV bank statement — transactions are parsed, deduped, and categorized by Gemma 4.
Snap a paper receipt — Gemma 4's vision encoder extracts merchant, amount, and date, and adds it as a transaction.
Read your AI Insights — three specific observations Gemma 4 surfaces about your data ("Your coffee spending grew 40% over three months", "Cancelling NY Times saves you $204/yr").
See the recurring-charges panel — the "$87/month in subscriptions you forgot" moment that justifies the whole tool.
Ask anything in natural language — "which restaurant did I spend the most at?", "how much on coffee?", "what was my biggest single expense?" — answers stream from your chosen model.

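With either on-device model, everything runs in the browser, and the only thing the server does is host static files. Open your devtools Network tab during use: after the initial model download (~1.5 GB for E2B, cached afterwards), nothing leaves your machine. The 31B cloud option is the one exception, since it sends your prompts to OpenRouter.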

Why a three-way model picker — and why E2B is the default

Instead of hardcoding a single model, PocketCFO ships a picker covering three deployment tiers of the same model family. Intentional model selection, one of the judging criteria, isn't just something I justify in this post; it is a user-facing feature that exposes the actual trade-off:

Option	Where it runs	First-load download	Quality	Privacy
Gemma 4 E2B (default)	Your browser via WebGPU	~1.5 GB once	Good	On-device
Gemma 4 E4B	Your browser via WebGPU	~2.5 GB once	Better	On-device
Gemma 4 31B	OpenRouter cloud (free tier)	0	Best	Sent to OpenRouter

The default is E2B because most users meet PocketCFO for the first time on a normal connection and don't want to wait through a 2.5 GB download before seeing anything. The big win is that every option uses the same Gemma 4 family with the same 128K context, so the product behaves the same; what changes is the user's chosen balance between privacy, latency, and quality.

PocketCFO has four non-negotiable constraints, and E2B is the smallest Gemma 4 variant that meets all four:

Runs in a browser tab — must fit WebGPU memory.
Sees private financial data — must run client-side.
Reads transaction text and receipt photos — must be multimodal.
Reasons about a year of activity in one pass — needs the 128K context window.

The 31B Dense and 26B MoE Gemma 4 variants are too large for browser inference today. The E4B variant is more capable, but its ~2.5 GB download is painfully slow for a first-time user trying the demo. E2B hits the sweet spot: same multimodality, same 128K context, and roughly 40% less to download (~1.5 GB versus ~2.5 GB). Respecting the user's bandwidth turned out to matter more for product feel than the marginal reasoning gain of the bigger model.

Crucially, the multimodality is what makes the model choice non-trivial. Without the vision encoder, the receipt-snap flow doesn't work and the project collapses to a text-only tool that Gemma 3 could have handled. With it, every receipt scan and statement question runs through the same ~1.5 GB of weights: downloaded once, never uploaded.
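As a rough illustration of the default-selection rule described above, here is a minimal sketch. The ModelOption shape, the MODEL_OPTIONS array, and the pickDefault function are hypothetical names for this post, not the names used in the repo; the sizes are copied from the table above.

```typescript
// Hypothetical shape for one picker entry; field names are illustrative.
interface ModelOption {
  name: string;
  runsOnDevice: boolean; // false means prompts are sent to a cloud provider
  downloadGB: number; // approximate one-time download, 0 for the cloud option
}

// Values copied from the comparison table in this post.
const MODEL_OPTIONS: ModelOption[] = [
  { name: "Gemma 4 E2B", runsOnDevice: true, downloadGB: 1.5 },
  { name: "Gemma 4 E4B", runsOnDevice: true, downloadGB: 2.5 },
  { name: "Gemma 4 31B", runsOnDevice: false, downloadGB: 0 },
];

// Default = the on-device option with the smallest download.
function pickDefault(options: ModelOption[]): ModelOption {
  const local = options.filter((o) => o.runsOnDevice);
  if (local.length === 0) throw new Error("no on-device model available");
  return local.reduce((best, o) => (o.downloadGB < best.downloadGB ? o : best));
}

console.log(pickDefault(MODEL_OPTIONS).name); // "Gemma 4 E2B"
```

Filtering on runsOnDevice first keeps the cloud tier from ever becoming the default just because it has no download.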

Architecture: the boring rule that prevents the demo from lying

The single most important architectural decision in PocketCFO is this:

The LLM categorizes and reasons. A boring analytics/ module does all the math.

When a finance tool says "you spent $487 on subscriptions this year," that number had better be right. LLMs hallucinate sums constantly — even good ones, even with explicit chain-of-thought — and they do it most often in exactly the situations where you'd put one in front of a user (long contexts, lots of numbers, "summarize this for me"). I would not ship a demo that adds $14 + $9.99 and shows $24.

So the split is:

engine/ (Gemma 4) outputs labels: a category word, a merchant name, a free-form answer.
analytics/ (pure functions, 100% test coverage) outputs numbers: totals, percentages, recurring-payment detection, month-over-month deltas.
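To make the split concrete, here is a minimal sketch of what the numeric side could look like. The Txn shape, the integer-cents representation, and the function names are assumptions for illustration, not code from the repo; the only idea taken from the post is that the model supplies the label and a pure function supplies the number.

```typescript
// Hypothetical transaction shape; amounts are integer cents to avoid float drift.
interface Txn {
  description: string;
  amountCents: number;
  category?: string; // a label supplied by the model, never a number
}

// Pure function: sums amounts per category label. No model involved.
function totalsByCategory(txns: Txn[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const t of txns) {
    const key = t.category ?? "Uncategorized";
    totals.set(key, (totals.get(key) ?? 0) + t.amountCents);
  }
  return totals;
}

// Formats integer cents as dollars, for display only.
function formatUSD(cents: number): string {
  return `$${(cents / 100).toFixed(2)}`;
}

const totals = totalsByCategory([
  { description: "Streaming plan", amountCents: 1400, category: "Subscriptions" },
  { description: "News site", amountCents: 999, category: "Subscriptions" },
]);
console.log(formatUSD(totals.get("Subscriptions") ?? 0)); // "$23.99"
```

If the model mislabels a transaction, the total lands in the wrong bucket, but the arithmetic itself stays exact: $14 + $9.99 is always $23.99.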

The recurring-charge detection in particular is purely deterministic: group by merchant, compute gap distribution, snap to weekly/monthly/quarterly/yearly. Three unit tests cover the cadence math, two cover the edge cases (single-month input, income exclusion). The LLM never enters that code path. The number on the dashboard is correct by construction.
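The group, gap, and snap steps above can be sketched as follows. This is not the repo's implementation: the Charge shape, the tolerance values, and the three-occurrence minimum are all assumptions chosen for the example, and income exclusion is left out.

```typescript
type Cadence = "weekly" | "monthly" | "quarterly" | "yearly";

// Hypothetical input shape; date is an ISO date such as "2025-01-15".
interface Charge {
  merchant: string;
  date: string;
}

// Nominal gap in days for each cadence, plus an assumed tolerance.
const CADENCES: { name: Cadence; days: number; tolerance: number }[] = [
  { name: "weekly", days: 7, tolerance: 1 },
  { name: "monthly", days: 30, tolerance: 3 },
  { name: "quarterly", days: 91, tolerance: 7 },
  { name: "yearly", days: 365, tolerance: 10 },
];

const DAY_MS = 24 * 60 * 60 * 1000;

// Snap a list of gaps to a cadence only if every gap fits its tolerance.
function snapToCadence(gaps: number[]): Cadence | null {
  const hit = CADENCES.find((c) =>
    gaps.every((g) => Math.abs(g - c.days) <= c.tolerance),
  );
  return hit ? hit.name : null;
}

function detectRecurring(charges: Charge[]): Map<string, Cadence> {
  // Step 1: group charge timestamps by merchant.
  const byMerchant = new Map<string, number[]>();
  for (const c of charges) {
    const times = byMerchant.get(c.merchant) ?? [];
    times.push(Date.parse(c.date));
    byMerchant.set(c.merchant, times);
  }

  const result = new Map<string, Cadence>();
  byMerchant.forEach((times, merchant) => {
    if (times.length < 3) return; // assumed minimum: at least two gaps
    times.sort((a, b) => a - b);
    // Step 2: gaps in whole days between consecutive charges.
    const gaps = times.slice(1).map((t, i) => Math.round((t - times[i]) / DAY_MS));
    // Step 3: snap the gaps to a named cadence, if one fits.
    const cadence = snapToCadence(gaps);
    if (cadence) result.set(merchant, cadence);
  });
  return result;
}
```

Requiring every gap to fit is deliberately strict: one skipped month drops a merchant from the results. A more forgiving version might snap on the median gap instead, at the cost of more false positives.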

Three things I learned shipping Gemma 4 to the browser in three days

1. Transformers.js needs to be on v4.0.1+ for Gemma 4. I started on v3.5 and got "Unsupported model type: gemma4" the first time the model tried to load. Gemma 4 support didn't land in @huggingface/transformers until v4.0.1. Easy fix once you know, but a reminder that ^ semver ranges on bleeding-edge libraries can silently leave you behind: a caret range pinned to 3.x never crosses into 4.x, so a routine install will not pick up the new major version. The TypeScript types for pipeline() are still too complex to resolve cleanly, so I wrapped the call in a narrow cast; runtime behavior is fine.

2. Cross-Origin-Isolation is a Vercel-deploy footgun. Multi-threaded WebAssembly inside Transformers.js needs SharedArrayBuffer, which needs Cross-Origin-Opener-Policy: same-origin and Cross-Origin-Embedder-Policy: require-corp headers. These are easy to add in next.config.ts but easy to forget: the build will pass, the deploy will succeed, and the demo will silently fall back to slow single-threaded WASM. Test in incognito after deploying.

3. Streaming Q&A makes the demo feel real; categorization doesn't need per-token streaming. The Q&A panel uses TextStreamer so the answer types out as it is generated, which feels alive. For categorization (60 transactions × short outputs), sequential non-streamed calls plus UI pills lighting up one at a time also feel alive. Both feel like the model is working; neither needs the same engineering. Pick the streaming hill you actually want to die on.
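For point 2, the two headers can be set through the headers() option in next.config.ts, roughly as sketched below. The variable names and the catch-all source pattern are assumptions rather than the repo's actual config, and the small isCrossOriginIsolated helper is a hypothetical wrapper around the browser's crossOriginIsolated flag.

```typescript
// Sketch of a next.config.ts; names other than the header keys are illustrative.
type Header = { key: string; value: string };

// Both headers are required before browsers expose SharedArrayBuffer.
const crossOriginIsolationHeaders: Header[] = [
  { key: "Cross-Origin-Opener-Policy", value: "same-origin" },
  { key: "Cross-Origin-Embedder-Policy", value: "require-corp" },
];

const nextConfig = {
  // Next.js reads headers() from the config; "/(.*)" applies the rule to every route.
  async headers() {
    return [{ source: "/(.*)", headers: crossOriginIsolationHeaders }];
  },
};

// In a real next.config.ts this object would be the default export:
// export default nextConfig;

// Hypothetical helper: true only when the page is actually cross-origin isolated.
function isCrossOriginIsolated(scope: { crossOriginIsolated?: boolean }): boolean {
  return scope.crossOriginIsolated === true;
}
```

After deploying, evaluating self.crossOriginIsolated in the browser console (the value the helper wraps) should return true; if it returns false, the headers are not reaching the page and the multi-threaded path is unavailable.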

Try it

Live demo: https://gemma-challenge.vercel.app/ (needs Chrome 121+, Edge, or Arc with WebGPU; first load downloads ~1.5 GB)
Code: https://github.com/chintanonweb/gemma-challenge
Sample statement: click "Try a sample statement" on the landing page if you don't want to drop your own.

If you build something on Gemma 4 too, I'd love to see it.

— Chintan

Top comments (1)

Mark Barnett • May 25

The LLM-for-labels plus pure-functions-for-math split is the right call. You can't have the model deciding what 14 + 9.99 equals on a finance dashboard - that number has to be correct by construction. The recurring charge detection being purely deterministic is particularly solid.

One thing I've been thinking about while building in a similar space (Money Me, money-me.com, a personal finance planner) is how categorizing past spend and planning forward are genuinely different problems. Categorizing is where the labeling work lives. Planning forward - will I hit my savings target next month, which months look tight - is all deterministic math, no model needed. The instinct you've described here applies directly.