
Nidhi Singh


I gave one Gemini agent two observability tools. The correlation it found surprised me.


There is a category of production bug in AI systems that I find genuinely fascinating, because the difficulty has almost nothing to do with the bug itself. The bug is often simple. What makes it nearly undebuggable is the way we've chosen to organize our tools. I want to walk through it carefully, because once the shape of the problem is clear, the solution becomes almost forced, and that solution turned into a project I'll show you at the end.

Two worlds that never meet

When you put a model into production, you quickly find yourself watching two different things.

The first is the infrastructure. This is the world of memory, CPU, pods, network, latency — the machinery the model runs on. We have excellent tools for this; Dynatrace is the one I used.

The second is the model's own behavior. Is it hallucinating? Are its answers relevant? How is the eval score trending? How many tokens is it consuming? This is a genuinely different kind of observability, and again we have good tools for it; I used Arize Phoenix.

Here is the important part, and it's so ordinary that it's easy to miss: these two worlds are monitored by two different products, and those products do not know about each other. Worse, they're usually watched by two different teams. The infrastructure has its on-call rotation; the model has its own. Each group is fluent in its own dashboard and effectively blind to the other.

The failure that lives in the seam

Now consider a specific incident. A memory leak begins on one of your pods. Under memory pressure, the system does something reasonable in isolation: it trims the buffer that assembles prompts before they go to the model, to reclaim space. The consequence is that the model starts receiving prompts with part of their context silently removed. And a model running on half its context does the only thing it can: it fills the missing pieces by guessing. The hallucination rate climbs.
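
To make that failure mode concrete, here is a minimal sketch of a prompt assembler that drops the oldest context chunks when its budget shrinks. The function name, the character-based budget, and the sample documents are all assumptions made for illustration; the real buffer in an incident like this could look quite different.

```python
def assemble_prompt(context_chunks, question, budget_chars):
    """Join context chunks and the question, dropping the oldest chunks if over budget.

    Hypothetical sketch: nothing here raises or logs, which is the point.
    The caller gets a shorter prompt and no error.
    """
    kept = list(context_chunks)
    while kept and sum(len(c) for c in kept) + len(question) > budget_chars:
        kept.pop(0)  # oldest chunk is dropped silently
    dropped = len(context_chunks) - len(kept)
    return "\n".join(kept + [question]), dropped


chunks = [
    "doc A: refund window is 30 days",
    "doc B: refunds need a receipt",
    "doc C: store credit only after 30 days",
]
question = "Can I get a refund after 45 days?"

# Healthy pod: generous budget, nothing is dropped.
normal_prompt, dropped_normal = assemble_prompt(chunks, question, budget_chars=10_000)

# Pod under memory pressure: the budget has been cut, context disappears.
squeezed_prompt, dropped_squeezed = assemble_prompt(chunks, question, budget_chars=80)

print(dropped_normal, dropped_squeezed)
```

From the model's side, the second prompt looks perfectly valid. It is just missing the documents that would have let it answer without guessing.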

Watch what each observer sees. The infrastructure engineer sees memory utilization spike. That's a familiar, almost boring signal: restart the pod, reclaim the memory, move on. The ML engineer sees the model's answer quality fall off a cliff and begins the long investigation into prompts, retrieval, weights. Each of them is looking at exactly one link of a single causal chain, and nothing in their tool gives them any reason to suspect that the other link exists, let alone that it belongs to the same story.

This is the insight I kept coming back to: the bug is not technical, it's organizational. Every piece of information required to solve it is already being collected. The failure is purely that the two halves of the chain never arrive in the same place, in the same mind, at the same time.
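
As a rough illustration of what "arriving in the same place" buys you, the sketch below puts a memory timeline and a hallucination-rate timeline side by side and checks whether the first breach in one is followed shortly by the first breach in the other. The thresholds, the 15-minute lag window, and the sample values are invented for the example. Temporal ordering like this is only a hint worth investigating, not proof of cause.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    minute: int    # minutes since the start of the observation window
    value: float   # metric value at that minute


def first_breach(samples, threshold):
    """Return the minute of the first sample above threshold, or None."""
    for s in samples:
        if s.value > threshold:
            return s.minute
    return None


def correlate(memory, hallucination, mem_limit=0.85, halluc_limit=0.10, max_lag=15):
    """Return the lag in minutes if a memory breach precedes a quality breach
    by at most max_lag minutes, otherwise None. All limits are assumptions."""
    mem_at = first_breach(memory, mem_limit)
    hal_at = first_breach(hallucination, halluc_limit)
    if mem_at is None or hal_at is None:
        return None
    lag = hal_at - mem_at
    return lag if 0 <= lag <= max_lag else None


# Invented data: memory crosses its limit at minute 10,
# hallucination rate crosses its limit at minute 15.
memory = [Sample(0, 0.55), Sample(5, 0.70), Sample(10, 0.91), Sample(15, 0.96)]
hallucination = [Sample(0, 0.03), Sample(5, 0.03), Sample(10, 0.04), Sample(15, 0.18)]

lag_minutes = correlate(memory, hallucination)
print(lag_minutes)
```

Neither timeline is remarkable on its own. The interesting fact only exists once both are in the same function call.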

The forced solution

Once you frame it that way, the fix is almost not a choice. If the problem is that no single observer sees both layers, then you create an observer that does. You put one agent in front of both dashboards.

That's ARIA. It connects to both Dynatrace and Arize Phoenix through their MCP servers, pulls the relevant signals from each, and hands the combined picture to Gemini — orchestrated with Google's Agent Development Kit as a planner → reasoner → executor pipeline — to reason over as one problem instead of two.

The one decision that actually mattered

I'll share the mistake, because it's the most useful part. My first design was two agents: one that understood Dynatrace, one that understood Arize, talking to each other. It felt natural — mirror the org structure in software. It does not work. All it does is faithfully reproduce the two-teams blind spot inside your code. Each agent is still an expert in one half and a stranger to the other.

The correlation only emerges when a single agent holds both toolsets inside one reasoning context. When one mind can call a Dynatrace tool and an Arize tool in the same turn, and keep both results in view at once, it can finally see the chain end to end. That's the whole product compressed into a sentence: one mind, both halves.
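
The difference between the two designs can be sketched in a few lines of plain Python. This is not ARIA's code and it does not use the Agent Development Kit or MCP: the two tool functions are hypothetical stand-ins that return fixed values, `diagnose` is a fixed rule standing in for the model's reasoning, and the sketch leaves out the message passing between the two agents.

```python
# Hypothetical stand-ins for tools the two MCP servers might expose.
def dynatrace_memory(service):
    return {"memory_utilization": 0.94}


def phoenix_hallucination(service):
    return {"hallucination_rate": 0.21}


def diagnose(context):
    """Stand-in for the model: a fixed rule over whatever is in its context."""
    mem_high = context.get("memory_utilization", 0.0) > 0.85
    hal_high = context.get("hallucination_rate", 0.0) > 0.10
    if mem_high and hal_high:
        return "memory pressure may be truncating prompts"
    if mem_high:
        return "restart the pod"
    if hal_high:
        return "investigate prompts, retrieval, weights"
    return "nothing to do"


def two_agents(service):
    # Each agent reasons over its own tool result only.
    infra_view = diagnose(dynatrace_memory(service))
    model_view = diagnose(phoenix_hallucination(service))
    return [infra_view, model_view]


def one_agent(service):
    # Both tool results land in the same context before any reasoning happens.
    context = {**dynatrace_memory(service), **phoenix_hallucination(service)}
    return [diagnose(context)]


split = two_agents("chat-api")
joined = one_agent("chat-api")
print(split)
print(joined)
```

The two-agent version returns the same two local answers the two teams would have given. Only the single-context version has the chance to name the link between them.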

Try it
