Telemetry Is the Context Your Coding Agent Is Missing

#platformengineering #observability #aiengineering #developerproductivity

The question being asked in r/kubernetes this week is whether to feed observability data to coding agents. Traces, metrics, logs, the whole stream the system already emits — should the agent that writes and fixes your code get to read it? The answer is yes. The agent that can see what production actually does will fix the right thing more often than the agent staring at a static repo. But the answer comes with a condition that the question usually skips: the telemetry has to be shaped by someone who runs the system before it is worth anything to the agent.

Raw telemetry is not context. It is access-rent input. A trace tells you a request took 800 milliseconds and touched nine services. It does not tell you whether 800 milliseconds is normal for that path or a five-alarm regression. A spike in a metric is a number going up. Whether the number going up matters depends on what the baseline is, what the team has agreed to care about, and what the system was doing last Tuesday when the same shape showed up and turned out to be a batch job. None of that lives in the data. It lives in the heads of the people on call.

The Agent Reads the Stream, Not the Meaning

The direction of travel here is clear. AI development environments are turning into observability-first platforms that wire real-time telemetry, prompt traces, and evaluation feedback into the developer workflow, connected to the editor over MCP. The plumbing is getting solved. You can hand an agent a live feed of what your system is doing, and the agent can read it. That is a real capability and it is arriving fast.

The plumbing is not the hard part. The hard part is that the agent reads the stream and not the meaning. A coding agent given raw traces will treat every anomaly as equally interesting because it has no way to know which ones the team has already triaged, which ones are load-bearing, and which ones are noise the on-call rotation learned to ignore eight months ago. Feed it everything and you get an agent that is confidently wrong about what is broken.

This is the same failure mode the observability vendors are already warning about at the protocol layer. Honeycomb's engineering team points out that piping raw context to an agent backfires: each MCP server the user configures clogs up the context with instructions and tool definitions, whether the agent needs them for this conversation or not. More telemetry in the context window is not more signal. Past a point it is less, because the agent spends its attention budget on data nobody decided was relevant.

A Healthy Baseline Is a Decision, Not a Reading

So which signals should the agent read? That is not a question the platform answers. It is a question an operator answers.

Consider what it takes to say "this is healthy." Specifically: the p99 latency on the checkout path should sit under 300 milliseconds — except during the nightly reindex, when 900 is fine and expected. Error rate on the auth service above 0.5% is a page; the same rate on the recommendations service is a shrug because it fails open. The queue depth metric that the dashboard renders in red is actually the metric the team stopped trusting after the last migration, and the real signal moved to a different counter that the dashboard does not even show. Every one of those is a judgment call. Every one of them was made by someone who got paged, traced the incident, and decided what the line should be.

That is why the telemetry standard an agent reads against has to be a praxis output. It comes from the people who run the system, written down from the actual experience of operating it — not handed down by a platform team that provisions the dashboards and never gets paged. The platform team can give you the stream. They cannot tell you that a spike in the third service is meaningless and a flutter in the first one means drop everything, because they have never been on the wrong end of either at 2 a.m.

The Shaping Role Is Underbuilt

The infrastructure to do this is further along than the discipline to use it well. Nearly 90 percent of enterprises now run internal platforms, with observability sitting inside the dozen platform capabilities that are now table stakes, according to the field's 2025 year-end review drawing on the State of Platform Engineering and the 2025 DORA report. The same review found that AI adoption and platform maturity now drive each other, which means more agents reading more telemetry, soon.

But only about a quarter of those organizations have a dedicated platform product manager. The platform exists. The role that decides what the platform's signals mean — the person who turns raw telemetry into a standard an agent can act on — is mostly not staffed. We are wiring agents into observability streams faster than we are deciding who owns the question of which signals matter. That gap is where the bad agent behavior is going to come from.

The Standard Is Being Written by Practitioners

There is a healthier version of this already taking shape, and it is being built by the people who operate the systems, not specified from above. OpenTelemetry's AI-agent observability work, driven by its GenAI special interest group, is an effort to standardize the shape of the telemetry that agent apps emit. The point is explicit in their own framing: for a non-deterministic agent, telemetry is a feedback loop used to evaluate and improve the agent's behavior, and the community needs to set standards around that telemetry's shape to avoid lock-in.

That is the model. The standard for what an agent reads gets authored by practitioners — maintainers, on-call engineers, the people who feel the consequences — and it gets written down where the agent can use it. The research backs this up: the documented patterns for feeding telemetry to agents already include local prompt iteration, CI-based optimization, and autonomous agents that adapt using telemetry, built on named frameworks like DSPy and PromptWizard. This is not speculation about a future capability. The architecture exists. What is still missing on most teams is the human decision about what the agent should pay attention to.

What to Actually Do

If you are deciding whether to give your coding agent observability data, give it. The agent that sees production is better than the agent that does not. But do not hand it the raw firehose and call it context.

Write down the baselines. Which paths are hot, what normal looks like on each, which alerts are real and which are theater. Name the anomalies that matter and the ones the team learned to ignore. Make that document the thing the agent reads against, not the unfiltered metric stream. And put it in the hands of someone who actually runs the system, because the value of telemetry to your agent is exactly equal to the quality of the judgment that shaped it. The data is access-rent. The standard is the asset.