Gemma 4 Challenge: Build with Gemma 4

#devchallenge #gemmachallenge #gemma

Gemma 4 Challenge: Build With Gemma 4 Submission

This is a submission for the Gemma 4 Challenge: Build with Gemma 4

What I Built

IncidentLens — a self-hosted incident response copilot for SRE and platform teams.

When a P1 alert fires at 2 AM, the on-call engineer's first 30 minutes are mostly tab-switching: Grafana → Kibana → Jaeger → GitHub → Slack → runbook wiki. The actual reasoning is maybe 20% of that wall-clock time. The other 80% is swivel-chair work — copy-pasting timestamps, remembering which query syntax goes where, asking on Slack who deployed last.

IncidentLens collapses that loop. Point it at your observability backends (Loki/Elastic, Prometheus, Jaeger/Tempo, GitHub, Slack) via a YAML config. When an incident opens, a Gemma 4–powered agent fans out across all of them in parallel, builds a unified timeline, and returns:

A ranked root-cause hypothesis with the full evidence trail behind it
A remediation command ready to execute, with estimated recovery time
A postmortem draft that writes itself as a byproduct of the diagnosis

Crucially, the whole thing runs inside your VPC. No production logs leave the perimeter. That's the unlock for regulated industries — finance, healthcare, government — where existing AIOps SaaS isn't an option because data residency rules forbid shipping prod telemetry to a third-party cloud.

The engineer's job collapses from investigate to verify and approve. Every tool call the agent made, every signal it weighted, every alternative hypothesis it discarded is auditable — because nobody is going to trust an AI rollback they can't inspect.

Demo

🎥 Video walkthrough —

🔗 Live demo —

Code

📦 Repository — (github.com/ashwithpoojary98/incident-lens)

Stack:

Spring Boot 3 orchestration service (alert ingestion, tool routing, audit logging)
Python agent sidecar serving Gemma 4 via vLLM with native function calling
Kafka between them for streaming tool results into the timeline as they arrive
PostgreSQL for incident state, Redis for active-incident working memory

How I Used Gemma 4

Gemma 4 is the agent. The Spring Boot service is plumbing — every actual diagnostic decision flows through a Gemma 4 function-calling loop.

Current setup (local dev): Gemma 4 E4B

I'm building and iterating on a single laptop, so I started with E4B. Two reasons it was the right pick for the dev phase:

It validates the architecture without GPU infra. If the tool-use loop, prompt scaffolding, and timeline reconstruction work on E4B, they'll work better on larger models. Iterating on a 4B model means seconds per cycle, not minutes.
128K context is already enough to prototype long-context behavior. I can fit 1–2 hours of logs plus a distributed trace plus recent commits and exercise the full end-to-end plumbing without simulating it.

Production target: Gemma 4 26B A4B (MoE)

The plan is to swap to 26B MoE before any real deployment. The reasons are specific, not generic:

MoE economics. 3.8B active params means I can serve it on a single L40S or RTX 6000 and still fan out 5–10 parallel tool calls without the model becoming the bottleneck. Dense 31B would cost meaningfully more per incident.
256K context. Full incident timelines (logs + traces + commits + Slack thread) fit without RAG fragmentation that loses temporal ordering — and temporal ordering is the signal in incident response.
Configurable thinking modes. Fast/shallow on the first signal so the engineer sees a hypothesis in seconds; deep reasoning when they follow up with "but why did the retry storm start?"
Native function calling. Non-negotiable. The agent lives or dies on tool-use accuracy across log/metric/trace backends. Pre-Gemma-4 this had to be bolted on with prompt scaffolding; Gemma 4 ships it.
Apache 2.0 + self-hostable. The whole product premise is "your prod logs never leave your network." Hosted APIs literally cannot solve this market.

I deliberately did not pick 31B Dense. Overkill for tool-using reasoning, harder to serve cheaply, and the marginal quality gain doesn't justify the inference cost when the bottleneck is tool-call accuracy, not raw reasoning ceiling.

Why this E4B → 26B MoE path works

Gemma 4 keeps a consistent function-calling API across the family, so the upgrade is essentially a config line swap. I get to validate the entire product loop locally on commodity hardware, then move to the production-grade model without rewriting the agent. That portability — same prompt format, same tool schemas, same thinking-mode controls from 4B all the way up — is honestly the most underrated thing about the Gemma 4 release, and it shaped my whole development approach.