This is a submission for the Gemma 4 Challenge: Write About Gemma 4
# I Embedded Gemma 4 Into a Desktop App — Here's What I Learned
I didn't set out to write about Gemma 4. I set out to build something.
I've been working on Sowser — a spatial canvas browser for Windows where
every website is a draggable live card on an infinite canvas instead of a
hidden tab. Think of it like a whiteboard where your browser tabs are actual
windows you can move around, group, and connect visually.
The problem I kept running into: users open 20 tabs for a research session
and the canvas gets chaotic fast. I needed something to automatically
understand what all those pages are about and organise them into groups.
That's where Gemma 4 came in. And what happened next genuinely surprised me.
## Why I Chose Gemma 4 Over Everything Else
My requirements were specific:
- Must run locally — a browser tool that phones home with your tab history is a privacy disaster
- Must be fast enough to feel like a UI feature, not a loading screen
- Must follow strict JSON instructions reliably — my app can't babysit malformed output
- Must run without a GPU — I want regular Windows users to use this
I tested three options before landing on Gemma 4 E4B via Ollama.
- **GPT-4o API** — perfect output, but cloud-only. Every tab URL leaves the user's machine. Hard no.
- **Gemma 2B local** — fast, but grouping quality was noisy. It kept merging unrelated topics and sometimes ignored the JSON format requirement entirely.
- **Gemma 4 E4B local** — this is the one. Clean JSON every time, genuinely smart semantic grouping, runs in 1-3 seconds after the first load, no GPU needed.
The jump in instruction-following quality from 2B to E4B is not small. It
feels like a completely different category of model.
## How the Integration Actually Works
The feature is called AI Smart Organize. Here is exactly what happens
when a user clicks it:
### Step 1 — Collect

The app grabs every open browser card's title and URL and builds a JSON array:

```json
[
  { "title": "Neural Networks - Wikipedia", "url": "wikipedia.org/wiki/Neural_network" },
  { "title": "React Docs", "url": "react.dev/learn" },
  { "title": "Nike Running Shoes", "url": "nike.com/search?q=running+shoes" }
]
```
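In code, the collect step amounts to very little. Here is a minimal Python sketch — the `Card` structure and function name are illustrative, not Sowser's actual code:

```python
import json
from dataclasses import dataclass


@dataclass
class Card:
    """A single browser card on the canvas (illustrative structure)."""
    title: str
    url: str


def collect_tabs(cards: list[Card]) -> str:
    """Serialise every open card into the JSON array sent to the model."""
    return json.dumps([{"title": c.title, "url": c.url} for c in cards])
```

The resulting string is what gets embedded into the user message for the model.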
### Step 2 — Prompt

This exact system prompt goes to Gemma 4:

```
You are a browser tab organiser. You will receive a JSON list of open browser
tabs with their titles and URLs. Group them into 2-6 meaningful clusters based
on topic or purpose. Respond ONLY with a valid JSON array. No explanation, no
markdown, no code fences. Each element must have: groupName (string),
color (hex string like #FF6B6B), urls (array of url strings from the input).
```
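Sending that prompt to a locally running model can be sketched against Ollama's HTTP API (`POST /api/chat` on its default port 11434 is Ollama's real endpoint; the model tag and wiring here are my assumptions, not Sowser's source):

```python
import json
import urllib.request

# Stand-in for the full system prompt from Step 2 above.
SYSTEM_PROMPT = "You are a browser tab organiser. ..."


def build_request(tabs_json: str, model: str = "gemma3:4b") -> urllib.request.Request:
    """Build the POST /api/chat request for a local Ollama server."""
    body = json.dumps({
        "model": model,
        "stream": False,  # one complete response, not a token stream
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": tabs_json},
        ],
    }).encode()
    return urllib.request.Request(
        "http://localhost:11434/api/chat",
        data=body,
        headers={"Content-Type": "application/json"},
    )


def organise(tabs_json: str) -> str:
    """Send the tabs to the model and return its raw reply text.

    Requires a running Ollama server on localhost.
    """
    with urllib.request.urlopen(build_request(tabs_json)) as resp:
        return json.loads(resp.read())["message"]["content"]
```

Setting `"stream": False` matters for this use case: the app wants one parseable blob, not incremental tokens.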
### Step 3 — Parse

Gemma 4 returns something like:

```json
[
  { "groupName": "Research", "color": "#3b82f6", "urls": ["wikipedia.org/..."] },
  { "groupName": "Development", "color": "#8b5cf6", "urls": ["react.dev/..."] },
  { "groupName": "Shopping", "color": "#10b981", "urls": ["nike.com/..."] }
]
```
### Step 4 — Organise
The app repositions cards into colour-coded vertical columns on the canvas,
applies group colours, and shows a toast: "Organised into 3 groups!"
The whole thing takes under 3 seconds. It looks like magic.
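The column layout itself is plain geometry. A minimal sketch, with made-up canvas dimensions and spacing (Sowser's real layout constants are not in this post):

```python
def column_layout(groups: list[dict],
                  col_width: float = 420.0,
                  row_height: float = 260.0,
                  gap: float = 40.0) -> dict[str, tuple[float, float]]:
    """Assign each URL an (x, y) canvas position: one vertical column
    per group, cards stacked top to bottom within each column."""
    positions: dict[str, tuple[float, float]] = {}
    for col, group in enumerate(groups):
        x = col * (col_width + gap)  # each group gets its own column
        for row, url in enumerate(group["urls"]):
            positions[url] = (x, row * (row_height + gap))
    return positions
```

The app then animates each card to its assigned position and tints it with the group's colour.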
## The Thing That Actually Impressed Me
I expected the JSON compliance to be a struggle. Every time I've used smaller
local models for structured output tasks, I've had to write defensive parsers,
retry logic, and fallback handlers because the model would leak prose or
forget to close a bracket.
Gemma 4 E4B returned valid, parseable JSON on the first call, every time,
across dozens of test runs with wildly different tab combinations. That
reliability is what makes it suitable for embedding into a real desktop
application where the user has no tolerance for errors.
I still kept the defensive parser (strip markdown fences, catch exceptions,
show toast on failure) — but I never actually needed it during testing.
## Choosing the Right Gemma 4 Model
The Gemma 4 family has three distinct options and the right choice depends
entirely on your use case:
| Model | Parameters | Best For |
|---|---|---|
| E2B | 2B effective | Ultra-mobile, Raspberry Pi, browser-based, offline-first |
| E4B | 4B effective | Desktop apps, local tools, interactive features needing speed + quality |
| 31B Dense | 31B | Server deployments, complex reasoning, batch processing |
For an interactive desktop feature where the user is waiting for a
response, E4B is the answer. It is fast enough to feel snappy and smart
enough to produce production-quality output.
If I were building a background batch processor that organised workspaces
overnight, I would use 31B Dense. If I were building something that ran
in a browser extension or on a phone, I would use E2B.
The model selection is not a detail — it is the architecture decision.
## What Running AI Locally Actually Means
Here is the thing nobody talks about enough: when your AI runs locally,
the product relationship with your user changes completely.
With a cloud API, every feature powered by AI has a privacy asterisk.
Users have to trust that their data is handled responsibly. For a browser
tool specifically — where the AI sees every URL you have open — that trust
ask is enormous.
With Gemma 4 running via Ollama on the user's own machine, there is no
asterisk. The model runs on their hardware. The data never leaves. The
feature works offline. There is no API cost to the developer per user.
This is what models like Gemma 4 at the E4B size actually unlock — not
just "local AI" as a technical curiosity, but a genuinely different product
category where privacy is a first-class feature, not a footnote.
## Getting Started Yourself
If you want to try Gemma 4 locally right now, this is all you need:
1. Install Ollama from ollama.com (Windows, Mac, Linux).
2. Pull Gemma 4 E4B and start the server:

   ```shell
   ollama pull gemma3:4b
   ollama serve
   ```

3. Test it instantly:

   ```shell
   ollama run gemma3:4b "Group these tabs into categories and return only JSON:
   [{title: 'React Docs', url: 'react.dev'}, {title: 'Nike Shoes', url: 'nike.com'}]"
   ```
You will have a locally running, privacy-preserving, genuinely capable AI
model in under 5 minutes. No API key. No credit card. No data leaving
your machine.
That is the Gemma 4 story that matters to me as a developer.
## Final Thought
I built Sowser to make browsing spatial. Gemma 4 made it intelligent.
The combination of a capable open model, local execution via Ollama, and
strict instruction following opened up a product feature that simply was
not possible before — not without compromising user privacy or requiring
server infrastructure.
If you are building any kind of desktop tool that touches user data,
Gemma 4 E4B is worth a serious look. The barrier to entry is one terminal
command. The upside is a completely different class of product.
🔗 See Sowser on GitHub: https://github.com/noisyboy08/TREE-TABS