This is a submission for the Gemma 4 Challenge: Build with Gemma 4
What I Built
IncidentLens — a self-hosted incident response copilot for SRE and platform teams.
When a P1 alert fires at 2 AM, the on-call engineer's first 30 minutes are mostly tab-switching: Grafana → Kibana → Jaeger → GitHub → Slack → runbook wiki. The actual reasoning is maybe 20% of that wall-clock time. The other 80% is swivel-chair work — copy-pasting timestamps, remembering which query syntax goes where, asking on Slack who deployed last.
IncidentLens collapses that loop. Point it at your observability backends (Loki/Elastic, Prometheus, Jaeger/Tempo, GitHub, Slack) via a YAML config. When an incident opens, a Gemma 4–powered agent fans out across all of them in parallel, builds a unified timeline, and returns:
- A ranked root-cause hypothesis with the full evidence trail behind it
- A remediation command ready to execute, with estimated recovery time
- A postmortem draft that writes itself as a byproduct of the diagnosis
Crucially, the whole thing runs inside your VPC. No production logs leave the perimeter. That's the unlock for regulated industries — finance, healthcare, government — where existing AIOps SaaS isn't an option because data residency rules forbid shipping prod telemetry to a third-party cloud.
The engineer's job collapses from investigate to verify and approve. Every tool call the agent made, every signal it weighted, every alternative hypothesis it discarded is auditable — because nobody is going to trust an AI rollback they can't inspect.
Demo
🎥 Video walkthrough —
🔗 Live demo —
Code
📦 Repository — (github.com/ashwithpoojary98/incident-lens)
Stack:
- Spring Boot 3 orchestration service (alert ingestion, tool routing, audit logging)
- Python agent sidecar serving Gemma 4 via vLLM with native function calling
- Kafka between them for streaming tool results into the timeline as they arrive
- PostgreSQL for incident state, Redis for active-incident working memory
How I Used Gemma 4
Gemma 4 is the agent. The Spring Boot service is plumbing — every actual diagnostic decision flows through a Gemma 4 function-calling loop.
Current setup (local dev): Gemma 4 E4B
I'm building and iterating on a single laptop, so I started with E4B. Two reasons it was the right pick for the dev phase:
- It validates the architecture without GPU infra. If the tool-use loop, prompt scaffolding, and timeline reconstruction work on E4B, they'll work better on larger models. Iterating on a 4B model means seconds per cycle, not minutes.
- 128K context is already enough to prototype long-context behavior. I can fit 1–2 hours of logs plus a distributed trace plus recent commits and exercise the full end-to-end plumbing without simulating it.
Production target: Gemma 4 26B A4B (MoE)
The plan is to swap to 26B MoE before any real deployment. The reasons are specific, not generic:
- MoE economics. 3.8B active params means I can serve it on a single L40S or RTX 6000 and still fan out 5–10 parallel tool calls without the model becoming the bottleneck. Dense 31B would cost meaningfully more per incident.
- 256K context. Full incident timelines (logs + traces + commits + Slack thread) fit without RAG fragmentation that loses temporal ordering — and temporal ordering is the signal in incident response.
- Configurable thinking modes. Fast/shallow on the first signal so the engineer sees a hypothesis in seconds; deep reasoning when they follow up with "but why did the retry storm start?"
- Native function calling. Non-negotiable. The agent lives or dies on tool-use accuracy across log/metric/trace backends. Pre-Gemma-4 this had to be bolted on with prompt scaffolding; Gemma 4 ships it.
- Apache 2.0 + self-hostable. The whole product premise is "your prod logs never leave your network." Hosted APIs literally cannot solve this market.
I deliberately did not pick 31B Dense. Overkill for tool-using reasoning, harder to serve cheaply, and the marginal quality gain doesn't justify the inference cost when the bottleneck is tool-call accuracy, not raw reasoning ceiling.
Why this E4B → 26B MoE path works
Gemma 4 keeps a consistent function-calling API across the family, so the upgrade is essentially a config line swap. I get to validate the entire product loop locally on commodity hardware, then move to the production-grade model without rewriting the agent. That portability — same prompt format, same tool schemas, same thinking-mode controls from 4B all the way up — is honestly the most underrated thing about the Gemma 4 release, and it shaped my whole development approach.
Top comments (0)