DEV Community

ashwithpoojary98
ashwithpoojary98

Posted on

Gemma 4 Challenge: Build with Gemma 4

Gemma 4 Challenge: Build With Gemma 4 Submission

This is a submission for the Gemma 4 Challenge: Build with Gemma 4

What I Built

IncidentLens — a self-hosted incident response copilot for SRE and platform teams.

When a P1 alert fires at 2 AM, the on-call engineer's first 30 minutes are mostly tab-switching: Grafana → Kibana → Jaeger → GitHub → Slack → runbook wiki. The actual reasoning is maybe 20% of that wall-clock time. The other 80% is swivel-chair work — copy-pasting timestamps, remembering which query syntax goes where, asking on Slack who deployed last.

IncidentLens collapses that loop. Point it at your observability backends (Loki/Elastic, Prometheus, Jaeger/Tempo, GitHub, Slack) via a YAML config. When an incident opens, a Gemma 4–powered agent fans out across all of them in parallel, builds a unified timeline, and returns:

  • A ranked root-cause hypothesis with the full evidence trail behind it
  • A remediation command ready to execute, with estimated recovery time
  • A postmortem draft that writes itself as a byproduct of the diagnosis

Crucially, the whole thing runs inside your VPC. No production logs leave the perimeter. That's the unlock for regulated industries — finance, healthcare, government — where existing AIOps SaaS isn't an option because data residency rules forbid shipping prod telemetry to a third-party cloud.

The engineer's job collapses from investigate to verify and approve. Every tool call the agent made, every signal it weighted, every alternative hypothesis it discarded is auditable — because nobody is going to trust an AI rollback they can't inspect.

Demo

🎥 Video walkthrough —

🔗 Live demo —

Code

📦 Repository — (github.com/ashwithpoojary98/incident-lens)

Stack:

  • Spring Boot 3 orchestration service (alert ingestion, tool routing, audit logging)
  • Python agent sidecar serving Gemma 4 via vLLM with native function calling
  • Kafka between them for streaming tool results into the timeline as they arrive
  • PostgreSQL for incident state, Redis for active-incident working memory

How I Used Gemma 4

Gemma 4 is the agent. The Spring Boot service is plumbing — every actual diagnostic decision flows through a Gemma 4 function-calling loop.

Current setup (local dev): Gemma 4 E4B

I'm building and iterating on a single laptop, so I started with E4B. Two reasons it was the right pick for the dev phase:

  • It validates the architecture without GPU infra. If the tool-use loop, prompt scaffolding, and timeline reconstruction work on E4B, they'll work better on larger models. Iterating on a 4B model means seconds per cycle, not minutes.
  • 128K context is already enough to prototype long-context behavior. I can fit 1–2 hours of logs plus a distributed trace plus recent commits and exercise the full end-to-end plumbing without simulating it.

Production target: Gemma 4 26B A4B (MoE)

The plan is to swap to 26B MoE before any real deployment. The reasons are specific, not generic:

  • MoE economics. 3.8B active params means I can serve it on a single L40S or RTX 6000 and still fan out 5–10 parallel tool calls without the model becoming the bottleneck. Dense 31B would cost meaningfully more per incident.
  • 256K context. Full incident timelines (logs + traces + commits + Slack thread) fit without RAG fragmentation that loses temporal ordering — and temporal ordering is the signal in incident response.
  • Configurable thinking modes. Fast/shallow on the first signal so the engineer sees a hypothesis in seconds; deep reasoning when they follow up with "but why did the retry storm start?"
  • Native function calling. Non-negotiable. The agent lives or dies on tool-use accuracy across log/metric/trace backends. Pre-Gemma-4 this had to be bolted on with prompt scaffolding; Gemma 4 ships it.
  • Apache 2.0 + self-hostable. The whole product premise is "your prod logs never leave your network." Hosted APIs literally cannot solve this market.

I deliberately did not pick 31B Dense. Overkill for tool-using reasoning, harder to serve cheaply, and the marginal quality gain doesn't justify the inference cost when the bottleneck is tool-call accuracy, not raw reasoning ceiling.

Why this E4B → 26B MoE path works

Gemma 4 keeps a consistent function-calling API across the family, so the upgrade is essentially a config line swap. I get to validate the entire product loop locally on commodity hardware, then move to the production-grade model without rewriting the agent. That portability — same prompt format, same tool schemas, same thinking-mode controls from 4B all the way up — is honestly the most underrated thing about the Gemma 4 release, and it shaped my whole development approach.

Top comments (0)