Local, private AI development for the Gemma 4 Challenge—no cloud dependency, no telemetry, pure control.
The Gemma 4 Challenge on Dev.to is live: build innovative projects or write about Google's latest open models and compete for $3,000 across two tracks. The catch? Integrating a state-of-the-art model without sacrificing privacy, speed, or control.
Safe Agent solves that. It's a local-first AI coding assistant built for exactly this.
What's Safe Agent?
Safe Agent runs entirely on your machine—no API calls, no data leaving your laptop. It supports Ollama and LM Studio out of the box, making Gemma 4 a first-class citizen. You get a CLI, a VS Code extension (available on the VS Code Marketplace), and a local HTTP service—all with zero telemetry.
Get Running in 3 Steps
Install:
# macOS
brew tap divyabairavarasu/tap && brew install safeagent
# Linux / macOS (curl)
curl -fsSL https://raw.githubusercontent.com/divyabairavarasu/homebrew-tap/master/install.sh | sh
# Windows (PowerShell)
irm https://raw.githubusercontent.com/divyabairavarasu/homebrew-tap/master/install.ps1 | iex
# VS Code: search "Safe Agent" in the Extensions panel, or install from the command line
code --install-extension divyabairavarasu.safe-agent
Pull Gemma 4:
ollama pull gemma2:27b # or gemma2:7b for lighter workloads
Start coding:
safeagent # interactive shell
safeagent --model ollama/gemma2:27b # explicit model
safeagent --model lmstudio/gemma-4-instruct
Built-in Git Workflow
Safe Agent understands your repo. Generate code, review diffs, and commit—without breaking focus:
# Ask Safe Agent to implement a feature
safeagent "add a --output-format json flag to the report command"
# Review, stage, and commit
git diff # see what changed
git add -p # stage selectively
git commit -m "feat: add --output-format json flag"
# Let Safe Agent draft the commit message
safeagent "write a conventional commit message for my staged changes"
Why It Works for the Challenge
- No infrastructure: Gemma 4 runs on your laptop. No cloud GPUs, no quotas, no network round trips.
- Swap models freely: Switch between 7B and 27B mid-project by changing the --model value. No code changes.
- Privacy by default: Every inference stays local. Ideal for healthcare, finance, or any sensitive data use case.
- Reproducible: The same local setup can be recreated on another machine, so judges can verify your stack.
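To make the "swap models freely" and "stays local" points concrete, here is a minimal sketch of how a `provider/model` string could be turned into a request against a local Ollama server. It is not Safe Agent's actual code. The `split_model_spec` and `build_request` helpers are hypothetical, and the mapping from the `ollama/` prefix to Ollama's model field is an assumption. The endpoint and JSON fields are Ollama's documented `/api/generate` API on its default port.

```python
import json
import urllib.request
from typing import Tuple

# Ollama's default local endpoint; nothing here talks to a remote host.
OLLAMA_URL = "http://localhost:11434/api/generate"


def split_model_spec(spec: str) -> Tuple[str, str]:
    """Split 'ollama/gemma2:27b' into ('ollama', 'gemma2:27b') (assumed convention)."""
    provider, _, model = spec.partition("/")
    if not model:
        raise ValueError(f"expected 'provider/model', got {spec!r}")
    return provider, model


def build_request(spec: str, prompt: str) -> urllib.request.Request:
    """Build, but do not send, a non-streaming generate request for local Ollama."""
    provider, model = split_model_spec(spec)
    if provider != "ollama":
        raise ValueError("this sketch only covers the Ollama backend")
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(spec: str, prompt: str) -> str:
    """Send the request to localhost and return the generated text."""
    with urllib.request.urlopen(build_request(spec, prompt)) as resp:
        return json.loads(resp.read())["response"]
```

Switching models in this sketch means passing a different `spec` string to `generate`; the calling code stays the same.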
Submission Ideas
Build track: Local code analyzer, privacy-first chat interface, AI-assisted developer CLI tool.
Write track: "Deploying Gemma 4 locally for production", "Privacy-first AI development", a case study with Safe Agent + Gemma 4 metrics.
Safe Agent is open source and available on the VS Code Marketplace and via Homebrew. Deadline: May 24, 2026. Build something great. 🚀
Top comments (2)
Good timing with the Gemma 4 Challenge going on. The local-first angle is solid; too many AI tools assume you're fine sending everything to the cloud.
One question: how does it handle context window limits with larger repos?
When you're pointing it at a real codebase and asking it to add a flag to the report command, does it index the repo first or just hope the relevant files fit in context? That's usually where local coding assistants hit a wall vs cloud ones with bigger context windows.
Also, any benchmarks on Gemma 2 27B vs 7B for code tasks? Curious if the quality drop is noticeable or if 7B is good enough for most dev workflows.
Nice project. The zero-telemetry pitch is underrated.
👍 👍👍
It doesn't do embedding-based indexing. Three mechanisms layer together, and none of them is semantic search: a structural summary of the repo, 5 grep windows, and, on the VS Code side, a manual context list of files/folders the user explicitly pins.
So for "add a flag to the report command" on a real codebase: it grabs the structural summary + 5 grep windows + whatever the user pinned. The wall it hits isn't context size but retrieval recall. A cloud assistant with a 200K window doesn't help you if the right file never makes it into the prompt; this one just fails earlier and more visibly. Worth being honest about that tradeoff in the pitch.
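A rough sketch of what keyword-based retrieval with a fixed window budget can look like, to show where recall breaks down. This is not the tool's implementation. Only the cap of 5 windows comes from the description above; the `.py` file filter, the `CONTEXT_LINES` value, the substring matching, and both function names are assumptions made for illustration.

```python
from pathlib import Path
from typing import List

MAX_WINDOWS = 5    # the reply mentions 5 grep windows; the rest is assumed
CONTEXT_LINES = 2  # hypothetical: lines kept on each side of a match


def grep_windows(root: Path, keywords: List[str]) -> List[str]:
    """Return up to MAX_WINDOWS snippets around lines containing any keyword."""
    windows: List[str] = []
    needles = [k.lower() for k in keywords]
    for path in sorted(root.rglob("*.py")):
        lines = path.read_text(encoding="utf-8", errors="ignore").splitlines()
        for i, line in enumerate(lines):
            if any(n in line.lower() for n in needles):
                lo = max(0, i - CONTEXT_LINES)
                hi = min(len(lines), i + CONTEXT_LINES + 1)
                windows.append(f"# {path.name}:{i + 1}\n" + "\n".join(lines[lo:hi]))
                if len(windows) == MAX_WINDOWS:
                    return windows  # budget spent; later matches are dropped
    return windows


def build_context(root: Path, keywords: List[str], pinned: List[Path]) -> str:
    """Concatenate user-pinned files and grep windows into one prompt context."""
    parts = [f"# pinned: {p.name}\n{p.read_text(encoding='utf-8')}" for p in pinned]
    parts.extend(grep_windows(root, keywords))
    return "\n\n".join(parts)
```

If the file that needs editing never contains one of the keywords, or its matches come after the fifth window, it is absent from the context regardless of how large the model's window is. That is the recall failure described above.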
Gemma 7B vs 27B benchmarks
No benchmarks in the repo — just an unsourced heuristic ("7B for dev, larger for review"). What does show up is empirical scar tissue: small Gemmas (the 4B class) repeatedly broke the structured edit format the agent needs, which had to be patched with few-shot examples in the prompt contract.
For an actual 7B-vs-27B comparison you'd want to run EvalPlus / HumanEval+ / SWE-bench Lite yourself. Public numbers favor 27B by a meaningful margin on multi-step edits; the gap narrows on single-file completion. But the more practical question for an agent loop isn't raw code quality — it's format adherence, and that's where smaller models disproportionately fail. If the model can't reliably emit the edit envelope, the quality of the code inside it doesn't matter.
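Format adherence can be measured separately from code quality. The sketch below assumes a made-up edit envelope, a JSON object with `path`, `search`, and `replace` string fields, since the real format isn't shown here. `parse_edit` and `adherence_rate` are hypothetical helpers, not part of Safe Agent.

```python
import json
from typing import Dict, List, Optional

REQUIRED_KEYS = {"path", "search", "replace"}  # hypothetical envelope fields


def parse_edit(raw: str) -> Optional[Dict[str, str]]:
    """Return the edit if raw is a well-formed envelope, else None."""
    try:
        edit = json.loads(raw)
    except json.JSONDecodeError:
        return None  # chatty preambles or truncated JSON land here
    if not isinstance(edit, dict) or set(edit) != REQUIRED_KEYS:
        return None
    if not all(isinstance(edit[k], str) for k in REQUIRED_KEYS):
        return None
    return edit


def adherence_rate(outputs: List[str]) -> float:
    """Fraction of model outputs that parse as a valid envelope."""
    if not outputs:
        return 0.0
    return sum(parse_edit(o) is not None for o in outputs) / len(outputs)
```

Running the same prompts through each model size and comparing `adherence_rate` gives a quick signal on whether the smaller model is usable in an agent loop at all, before any code-quality benchmark is worth running.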