Divya Bairavarasu

Posted on May 10

Build AI-Powered Projects with Safe Agent

#ai #local #llm #privacy

Local, private AI development for the Gemma 4 Challenge—no cloud dependency, no telemetry, pure control.

The Gemma 4 Challenge on Dev.to is live: build innovative projects or write about Google's latest open models and compete for $3,000 across two tracks. The catch? Integrating a state-of-the-art model without sacrificing privacy, speed, or control.

Safe Agent solves that. It's a local-first AI coding assistant built for exactly this.

What's Safe Agent?

Safe Agent runs entirely on your machine—no API calls, no data leaving your laptop. It supports Ollama and LM Studio out of the box, making Gemma 4 a first-class citizen. You get a CLI, a VS Code extension (available on the VS Code Marketplace), and a local HTTP service—all with zero telemetry.

Get Running in 3 Steps

Install:

# macOS
brew tap divyabairavarasu/tap && brew install safeagent

# Linux / macOS (curl)
curl -fsSL https://raw.githubusercontent.com/divyabairavarasu/homebrew-tap/master/install.sh | sh

# Windows (PowerShell)
irm https://raw.githubusercontent.com/divyabairavarasu/homebrew-tap/master/install.ps1 | iex

# VS Code: search "Safe Agent" in the Extensions panel, or install from the terminal
code --install-extension divyabairavarasu.safe-agent

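Pull a Gemma model (the tags below are Gemma 2 tags; substitute the Gemma 4 tag from the Ollama library if it differs):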

ollama pull gemma2:27b   # or gemma2:7b for lighter workloads

Start coding:

safeagent                              # interactive shell
safeagent --model ollama/gemma2:27b   # explicit model
safeagent --model lmstudio/gemma-4-instruct

Built-in Git Workflow

Safe Agent understands your repo. Generate code, review diffs, and commit—without breaking focus:

# Ask Safe Agent to implement a feature
safeagent "add a --output-format json flag to the report command"

# Review and stage
git diff                  # see what changed
git add -p                # stage selectively

# Optionally let Safe Agent draft the commit message for the staged changes
safeagent "write a conventional commit message for my staged changes"

# Commit
git commit -m "feat: add --output-format json flag"

Why It Works for the Challenge

No infrastructure: Gemma 4 runs on your laptop. No cloud GPUs, no quotas, no network round-trips.
Swap models freely: Switch between 7B and 27B mid-project. No code changes.
Privacy by default: Every inference stays local. Ideal for healthcare, finance, or any sensitive data use case.
Reproducible: The same setup runs anywhere, so judges can verify your stack.

Submission Ideas

Build track: Local code analyzer, privacy-first chat interface, AI-assisted developer CLI tool.

Write track: "Deploying Gemma 4 locally for production", "Privacy-first AI development", a case study with Safe Agent + Gemma 4 metrics.

Safe Agent is open source and available on the VS Code Marketplace and via Homebrew. Deadline: May 24, 2026. Build something great. 🚀

Top comments (2)

Mamoor Ahmad • May 12

Good timing with the Gemma 4 Challenge going on. The local-first angle is solid; too many AI tools assume you're fine sending everything to the cloud.

One question: how does it handle context window limits with larger repos? When you're pointing it at a real codebase and asking it to add a flag to the report command, does it index the repo first or just hope the relevant files fit in context? That's usually where local coding assistants hit a wall vs cloud ones with bigger context windows.

Also, any benchmarks on Gemma 2 27B vs 7B for code tasks? Curious if the quality drop is noticeable or if 7B is good enough for most dev workflows.

Nice project. The zero-telemetry pitch is underrated.
👍 👍👍

Divya Bairavarasu • May 12

It doesn't do embedding-based indexing. Three mechanisms layer together, and none of them is semantic search:

One-time structural scan — walks the repo (capped at ~5,000 files, depth 10) and persists a tiered knowledge tree: language fingerprint, directory purposes, import graph + hub files, and extracted public signatures. This is metadata, not chunked contents. It re-scans incrementally when the HEAD commit changes.
Grep-based retrieval at query time — runs grep -rnFI against the workspace and returns the top 5 hits as ±20-line windows. No embeddings, no semantic fallback. If your query terms don't literally appear in the relevant file, it won't surface.
Conversation summarizer — assumes a 32K-token model context, triggers summarization at 85% fill, keeps the last 4 turns. Token count is estimated at ~4 chars/token.
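To make the first two mechanisms concrete, here is a rough shell sketch of a capped file walk and a literal grep lookup that returns fixed-size windows. It is not Safe Agent's actual code: scan_files and top_windows are hypothetical helper names, and treating "depth 10" as find's -maxdepth is an assumption. The defaults (5,000 files, depth 10, 5 hits, ±20 lines) come from the description above.

```shell
# Illustrative sketch only; helper names are hypothetical.

# scan_files ROOT [MAX_FILES] [MAX_DEPTH]
# Lists regular files under ROOT, skipping .git, capped in count and depth.
scan_files() {
  find "${1:-.}" -maxdepth "${3:-10}" -type f ! -path '*/.git/*' |
    head -n "${2:-5000}"
}

# top_windows QUERY [ROOT] [MAX_HITS] [RADIUS]
# Prints the first MAX_HITS literal matches as +/-RADIUS line windows.
# Limitation: file names containing ':' are not handled.
top_windows() {
  query=$1
  root=${2:-.}
  max_hits=${3:-5}
  radius=${4:-20}

  # -r recurse, -n line numbers, -F literal string, -I skip binary files
  grep -rnFI -- "$query" "$root" | head -n "$max_hits" |
  while IFS=: read -r file line rest; do
    start=$((line - radius))
    [ "$start" -lt 1 ] && start=1
    end=$((line + radius))
    printf '=== %s:%s-%s\n' "$file" "$start" "$end"
    sed -n "${start},${end}p" "$file"
  done
}
```

Running top_windows "output-format" . from a repo root prints up to five windows. Because the match is literal, a query like "json flag" finds nothing unless that exact string appears in a file, which is the recall limit described above.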

The VS Code side adds a manual context list: files and folders the user explicitly pins.

So for "add a flag to the report command" on a real codebase: it grabs the structural summary + 5 grep windows + whatever the user pinned. The wall it hits isn't context size; it's retrieval recall. A cloud assistant with a 200K window doesn't help you if the right file never makes it into the prompt; this one just fails earlier and more visibly. Worth being honest about that tradeoff in the pitch.
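A quick back-of-the-envelope check shows why context size is not the bottleneck. The sketch below uses the numbers mentioned earlier (32K context, 85% trigger, ~4 chars/token, 5 windows of ±20 lines). Reading "32K" as 32,768 and assuming an 80-character average line are guesses for illustration, and the function names are hypothetical.

```shell
# Illustrative arithmetic only; names and the 80-char line are assumptions.
CONTEXT_TOKENS=32768   # assumption: "32K" read as 32 * 1024
TRIGGER_PERCENT=85
CHARS_PER_TOKEN=4

# estimate_tokens TEXT: byte count divided by 4, rounded up.
estimate_tokens() {
  chars=$(( $(printf '%s' "$1" | wc -c) ))
  echo $(( (chars + CHARS_PER_TOKEN - 1) / CHARS_PER_TOKEN ))
}

# should_summarize USED_TOKENS: exit 0 once usage reaches 85% of the context.
should_summarize() {
  [ $(( $1 * 100 )) -ge $(( CONTEXT_TOKENS * TRIGGER_PERCENT )) ]
}

# window_tokens HITS RADIUS AVG_LINE_CHARS: rough token cost of the grep windows.
window_tokens() {
  echo $(( $1 * (2 * $2 + 1) * $3 / CHARS_PER_TOKEN ))
}
```

Under these assumptions, window_tokens 5 20 80 comes to 4100 tokens, a small slice of the roughly 27,850 tokens available before summarization kicks in. The prompt has room to spare; what matters is whether the right file is among the five hits.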

Gemma 7B vs 27B benchmarks

No benchmarks in the repo — just an unsourced heuristic ("7B for dev, larger for review"). What does show up is empirical scar tissue: small Gemmas (the 4B class) repeatedly broke the structured edit format the agent needs, which had to be patched with few-shot examples in the prompt contract.

For an actual 7B-vs-27B comparison you'd want to run EvalPlus / HumanEval+ / SWE-bench Lite yourself. Public numbers favor 27B by a meaningful margin on multi-step edits; the gap narrows on single-file completion. But the more practical question for an agent loop isn't raw code quality — it's format adherence, and that's where smaller models disproportionately fail. If the model can't reliably emit the edit envelope, the quality of the code inside it doesn't matter.
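If you want a number for format adherence before running a full benchmark, one cheap approach is to save raw model responses to files and count how many contain a well-formed edit. The sketch below assumes a made-up envelope (a "BEGIN_EDIT <path>" line, later closed by an "END_EDIT" line); Safe Agent's real edit format isn't described in this thread, so swap in the actual markers. Both function names are hypothetical.

```shell
# Illustrative sketch; the BEGIN_EDIT/END_EDIT envelope is invented for this example.

# is_well_formed FILE: exit 0 if FILE has exactly one BEGIN_EDIT line
# and exactly one END_EDIT line, with BEGIN_EDIT appearing first.
is_well_formed() {
  begins=$(grep -c '^BEGIN_EDIT ' "$1" || true)
  ends=$(grep -c '^END_EDIT$' "$1" || true)
  [ "$begins" -eq 1 ] && [ "$ends" -eq 1 ] || return 1
  begin_line=$(grep -n '^BEGIN_EDIT ' "$1" | cut -d: -f1)
  end_line=$(grep -n '^END_EDIT$' "$1" | cut -d: -f1)
  [ "$begin_line" -lt "$end_line" ]
}

# adherence_rate DIR: prints "passed/total" over DIR/*.txt.
adherence_rate() {
  passed=0
  total=0
  for f in "$1"/*.txt; do
    [ -e "$f" ] || continue
    total=$((total + 1))
    if is_well_formed "$f"; then
      passed=$((passed + 1))
    fi
  done
  echo "$passed/$total"
}
```

Collecting responses to the same prompts from each model size and comparing the two ratios gives a first signal on whether the smaller model is usable in the agent loop at all, before code quality enters the picture.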