Submission for the Gemma 4 DEV Challenge, Build track. Companion to my Write-track post on the five libs behind it.
What it is
A tool-using research agent that runs locally on Gemma 4 e2b via Ollama, in around 200 lines of Node.
You give it a question. It picks between two tools, reads a Wikipedia page, then returns a structured JSON answer with sources. No API key. No rate limit. Two GB of RAM and an Ollama instance is the whole stack.
ollama pull gemma4:e2b
git clone https://github.com/MukundaKatta/gemma4-safe-agent
cd gemma4-safe-agent && npm install
npm run demo -- "What is RLHF?"
{
  "final": "RLHF is a technique that uses human preferences as a reward signal to fine-tune language models.",
  "sources": ["https://en.wikipedia.org/wiki/Reinforcement_learning_from_human_feedback"],
  "steps": 2
}
Repo: github.com/MukundaKatta/gemma4-safe-agent
Why Gemma 4 e2b specifically
Gemma 4 ships in four sizes: e2b and e4b for edge and mobile, a 26B Mixture-of-Experts model, and a 31B dense model for servers. I picked e2b on purpose.
Reasons:
- Runs anywhere. Two GB of RAM, no API key, and no network needed for inference. The agent works on a CI runner, a Raspberry Pi, or an old MacBook. The bigger sizes do not.
- Hardest reliability case. A 2B-class model makes more parse mistakes and more arg mistakes than a 26B. If the scaffolding holds at the 2B level, the bigger ones are a drop-in via GEMMA_MODEL=gemma4:e4b.
- Real product surface. Cheap, fast, local agents are where on-device AI is going. e2b is the right target for the kind of agent you'd actually ship in a desktop app, a mobile shell, or a browser extension.
The same agent runs against any of the four Gemma 4 variants with one env var change.
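A minimal sketch of how that one-variable swap could look, assuming the agent talks to Ollama's local HTTP chat endpoint (/api/chat on port 11434). The helper names and the OLLAMA_URL variable are hypothetical and chosen for illustration; only GEMMA_MODEL and the model tags come from this post.

```javascript
// Sketch only: helper names are illustrative, not the repo's actual code.
const DEFAULT_MODEL = 'gemma4:e2b';

// Pick the model tag from the environment, falling back to e2b.
function resolveModel(env = process.env) {
  return env.GEMMA_MODEL || DEFAULT_MODEL;
}

// Build the JSON body for a non-streaming Ollama chat request.
function buildChatRequest(messages, env = process.env) {
  return { model: resolveModel(env), messages, stream: false };
}

// Send the conversation to a local Ollama server and return the reply text.
// OLLAMA_URL is a hypothetical override for the default local address.
async function ollamaChat(messages, env = process.env) {
  const base = env.OLLAMA_URL || 'http://localhost:11434';
  const res = await fetch(`${base}/api/chat`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildChatRequest(messages, env)),
  });
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  const data = await res.json();
  return data.message.content;
}
```

Because the model tag is resolved in one place, nothing else in the loop has to know which Gemma 4 size is behind the call.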
How it works
The whole agent is a small loop:
for (let step = 0; step < MAX_STEPS; step++) {
  // Trim the conversation to the context budget before each model call.
  const fitted = fit(messages, { maxTokens: 4096, preserveSystem: true, preserveLastN: 2 });
  const raw = await ollamaChat(fitted.messages);
  const action = parseAction(raw);
  if (action.kind === 'tool') {
    // Run the chosen tool and feed its result back to the model.
    const result = await TOOLS[action.tool].fn(action.args);
    messages.push({ role: 'assistant', content: raw });
    messages.push({ role: 'user', content: `tool_result: ${result}` });
    continue;
  }
  // No tool requested: coerce the final answer into validated JSON.
  return cast({ llm, validate, prompt: 'Restate as JSON: ...' });
}
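The loop leans on parseAction to turn raw model text into either a tool call or a final answer. Here is one way that step might be written, assuming the model is prompted to reply with an object shaped like {"tool": "...", "args": {...}} when it wants a tool. That wire format, the extractJsonObject helper, and the knownTools default are assumptions for illustration (the post only names fetch_url), not the repo's code.

```javascript
// Sketch only: the {"tool", "args"} wire format is an assumption.
// Extract the first balanced {...} span, ignoring braces inside strings.
function extractJsonObject(text) {
  const start = text.indexOf('{');
  if (start === -1) return null;
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === '\\') escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === '{') depth++;
    else if (ch === '}' && --depth === 0) return text.slice(start, i + 1);
  }
  return null; // braces never balanced
}

// Classify a model reply as a tool call or a final answer.
function parseAction(raw, knownTools = ['fetch_url']) {
  const span = extractJsonObject(raw);
  if (span) {
    try {
      const obj = JSON.parse(span);
      if (typeof obj.tool === 'string' && knownTools.includes(obj.tool)) {
        return { kind: 'tool', tool: obj.tool, args: obj.args ?? {} };
      }
    } catch {
      // Malformed JSON: fall through and treat the reply as a final answer.
    }
  }
  return { kind: 'final', text: raw.trim() };
}
```

Treating anything that is not a recognized tool call as a final answer keeps the loop from ever dispatching to a tool name the model made up.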
The whole run is wrapped in an agentguard.firewall block. Each tool is wrapped with agentvet.vet and agentsnap.traceTool. That gives me:
- Context budget management so Gemma 4 e2b never blows its small window
- Network egress allowlist so a prompt injection cannot redirect the agent to fetch an attacker URL
- Tool-arg validation so a hallucinated fetch_url({ url: 12345 }) never runs
- Trace snapshots so swapping models or tweaking prompts shows up as a CI diff, not a production surprise
- Final-answer JSON enforcement with a validate-and-retry loop, which is the load-bearing piece for getting clean JSON out of a 2B model
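To make two of those guards concrete, here is a sketch of tool-arg validation and an egress allowlist in plain JavaScript. It is a hand-rolled stand-in, not the agentvet or agentguard APIs, which this post does not show. The allowlist contents, the https-only rule, and the function names are assumptions.

```javascript
// Sketch only: plain-JS stand-ins, not the agentvet or agentguard APIs.
const ALLOWED_HOSTS = new Set(['en.wikipedia.org']); // assumed allowlist

// Reject tool args that are not a well-formed https URL string.
function validateFetchArgs(args) {
  if (!args || typeof args.url !== 'string') {
    return { ok: false, error: 'url must be a string' };
  }
  let parsed;
  try {
    parsed = new URL(args.url);
  } catch {
    return { ok: false, error: 'url is not a valid URL' };
  }
  if (parsed.protocol !== 'https:') {
    return { ok: false, error: 'only https URLs are allowed' };
  }
  return { ok: true, url: parsed };
}

// Check the host against the egress allowlist before any request goes out.
function isEgressAllowed(url) {
  return ALLOWED_HOSTS.has(url.hostname);
}

// Wrap a tool so invalid or off-allowlist calls never reach the tool body.
function guardFetchTool(fn) {
  return async (args) => {
    const checked = validateFetchArgs(args);
    if (!checked.ok) return `tool_error: ${checked.error}`;
    if (!isEgressAllowed(checked.url)) {
      return `tool_error: host ${checked.url.hostname} is not on the allowlist`;
    }
    return fn({ url: checked.url.href });
  };
}
```

Returning the rejection as a string, rather than throwing, is one possible design choice: the message can go back to the model as a tool result so it has a chance to correct the call on the next step.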
I wrote about the scaffolding in detail in the Write-track companion post. Here the focus is the agent and the demo.
What you can run
The repo ships three entry points:
- npm run demo -- "...": real run against your local Gemma 4 e2b
- npm run demo:mock: same agent, with fetch_url returning canned pages (no internet needed)
- AGENT_MOCK=1 node examples/run-stub.js: deterministic stub LLM in place of Gemma 4, so the whole pipeline runs in CI without any model at all
The third one is the one I use for snapshot regression tests. It proves the agent's tool-use behavior is stable even with the LLM swapped out.
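For a sense of what a deterministic stub can look like, here is a sketch that replays scripted replies in order. It is not the actual contents of examples/run-stub.js. The makeStubLLM and pickLLM names and the scripted replies are hypothetical; only the AGENT_MOCK=1 switch comes from this post.

```javascript
// Sketch only: not the actual contents of examples/run-stub.js.
// Build a fake chat function that replays scripted replies in order.
function makeStubLLM(script) {
  let turn = 0;
  const calls = []; // roles the agent sent on each turn, for snapshot diffs
  async function chat(messages) {
    calls.push(messages.map((m) => m.role));
    if (turn >= script.length) throw new Error('stub script exhausted');
    return script[turn++];
  }
  return { chat, calls };
}

// Choose the stub when AGENT_MOCK=1, otherwise the real model client.
function pickLLM(realChat, env = process.env) {
  if (env.AGENT_MOCK !== '1') return realChat;
  return makeStubLLM([
    '{"tool":"fetch_url","args":{"url":"https://en.wikipedia.org/wiki/Reinforcement_learning_from_human_feedback"}}',
    '{"final":"RLHF fine-tunes models on human preferences.","sources":[]}',
  ]).chat;
}
```

Since the stub's replies never change, any difference in the recorded calls between two runs points at the scaffolding or the prompts rather than at model randomness.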
What surprised me
Two things.
Gemma 4 e2b picks the right tool more often than I expected. The model is small but the tool-selection task is well-bounded ("you have these two tools, here's the schema, return one JSON"). When the surrounding scaffolding catches arg mistakes and JSON glitches, the model's reasoning is the part that doesn't need help.
The final-answer step is where the model really needs the cast loop. Asking for "JSON only, no prose" still produced "Sure here you go: {...}" enough of the time that I would not trust the agent without agentcast wrapping that step. With it, the post-condition becomes a guarantee.
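The validate-and-retry pattern behind that step can be sketched in a few lines. This is a hand-rolled illustration, not the agentcast API. The castJson and validateAnswer names, the retry limit, and the correction message are assumptions. The argument shape mirrors the cast({ llm, validate, prompt }) call in the loop, and the checked fields follow the demo output.

```javascript
// Sketch only: a hand-rolled stand-in, not the agentcast API.
// Return null if the answer has the expected shape, else an error message.
function validateAnswer(obj) {
  if (typeof obj !== 'object' || obj === null) return 'answer must be an object';
  if (typeof obj.final !== 'string' || obj.final.length === 0) {
    return 'final must be a non-empty string';
  }
  if (!Array.isArray(obj.sources) || !obj.sources.every((s) => typeof s === 'string')) {
    return 'sources must be an array of strings';
  }
  return null;
}

// Ask the model, validate the reply, and feed errors back until it passes.
async function castJson({ llm, prompt, validate, maxAttempts = 3 }) {
  const messages = [{ role: 'user', content: prompt }];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await llm(messages);
    let error;
    try {
      const parsed = JSON.parse(raw); // strict: prose around the JSON fails
      error = validate(parsed);
      if (error === null) return parsed;
    } catch (e) {
      error = `not valid JSON: ${e.message}`;
    }
    // Show the model its own reply and the reason it was rejected.
    messages.push({ role: 'assistant', content: raw });
    messages.push({ role: 'user', content: `Invalid (${error}). Reply with JSON only.` });
  }
  throw new Error(`no valid JSON after ${maxAttempts} attempts`);
}
```

The caller either gets an object that passed validation or an exception, so a reply with a chatty prefix can never leak into the agent's output.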
Try it
Repo: github.com/MukundaKatta/gemma4-safe-agent (MIT)
Issues and PRs welcome. The five scaffolding libs are all on npm under @mukundakatta/* and are zero-dep, so you can pull them into your own Gemma 4 projects one at a time.
If you build something on top of this, drop me a link.
Have fun with Gemma 4.