Debugging Locally with Execution-Aware AI (Using Runtime Traces)
Who is it for?
This post is for developers using AI coding agents (Claude, Cursor, etc.) who debug real applications locally and want faster, more concrete answers than static code analysis can provide.
AI coding agents are very good at reading code. Point one at a repository and it can explain control flow, trace dependencies, spot patterns, and suggest changes almost instantly.
Then you try to debug something locally and everything slows down:
- "It's fast in tests but slow when I click around."
- "This request hangs once every ~20 runs."
- "It works until I restart the server."
- "Nothing changed, but now it times out."
These aren't problems of understanding code. They're problems of how the code actually executes.
Right now, agents mostly operate on static inputs. They know what the code says, but they don't directly see how it behaves when it runs. So you end up doing the unglamorous part: reproducing the issue, checking traces or logs, figuring out what's slow or failing, and then translating that back into text.
That translation step is where most of the time goes.
This post shows how to remove that step for local debugging, using OpenTelemetry traces and a small tool I built called otel-mcp.
The current AI debugging workflow (and why it's slow)
If you've used an AI agent to debug anything beyond syntax errors, this loop probably feels familiar.
Before:
You: The /users endpoint is slow locally.
Agent: I see two DB queries and an external call. Which part is slow?
You: (spends 10 minutes checking logs / traces / console output)
You: The external API averages ~600ms.
Agent: Add caching. Here's how.
The agent's suggestion might even be correct.
But notice where the time went:
- Not writing code
- Not understanding logic
- But finding evidence and explaining it
You're acting as a human adapter between the running system and the model.
The problem isn't reasoning.
The problem is visibility.
Reading code doesn't tell you how it behaves
Here's a completely normal endpoint:
```js
app.get('/api/users/:id', async (req, res) => {
  const user = await db.query('SELECT * FROM users WHERE id = ?', [req.params.id]);
  const posts = await db.query('SELECT * FROM posts WHERE user_id = ?', [user.id]);
  const enriched = await enrichUserData(user, posts);
  res.json(enriched);
});
```
From the source alone, an agent can say sensible things:
- There are two database queries.
- `enrichUserData` probably does some I/O.
- The awaits are sequential.
- A slow dependency or missing index is possible.
All of that is true. It's also not enough to debug.
When you're actually fixing a local issue, you care about things like:
- Which step is slow right now, on this machine?
- Is it consistently slow or only sometimes?
- Are retries or timeouts involved?
- Is work queueing behind a pool or a lock?
- Is an error happening and getting swallowed?
You can't read your way to answers like:
"This query takes ~430ms because it returns ~50 rows without an index."
or
"The HTTP call is fine, but connection reuse stalls after restart."
Those answers only show up when the code runs.
Traces: execution data an AI can actually use
Distributed tracing gives you a structured record of how a request executes.
A trace is a tree of spans. Each span represents a timed operation — a database query, an HTTP call, a cache access — with attributes like duration, status, and optional metadata.
With OpenTelemetry auto-instrumentation, you usually get spans for exactly the things that dominate local latency.
That same endpoint might produce a trace like this:
```
Trace: GET /api/users/123 (total: 892ms)
├── db.query  users_by_id              6ms
├── db.query  posts_by_user_id       438ms   rows=52
└── enrichUserData                   440ms
    ├── http.client GET /user-metadata  411ms   status=200
    └── cache.set                         9ms
```
This is concrete, timed, hierarchical data. It tells you what ran, in what order, and how long each step took.
This is exactly the information you look for when debugging manually.
So the obvious next question is:
What if the agent could inspect this directly?
otel-mcp: traces for AI agents
MCP (Model Context Protocol) is a way for AI agents to call external tools during a session.
otel-mcp is a small MCP server that makes runtime traces available to AI agents during local development.
It does three simple things:
- Runs a local OTLP HTTP collector (localhost only).
- Stores recent traces in memory (ephemeral, dev-focused).
- Exposes a small set of query tools that agents can call via MCP.
It's not an APM. There's no UI, no dashboards, no retention policies.
It's designed specifically for local debugging with AI agents.
Which agents does this work with?
otel-mcp works with any MCP-compatible AI coding assistant, including:
- Claude Code
- Cursor
- and others that support MCP tools
If your agent can call MCP tools, it can query traces through otel-mcp.
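For example, wiring it into a Claude Code or Cursor MCP config file looks roughly like this. (The command and package name below are assumptions for illustration; check the otel-mcp README for the exact invocation.)

```json
{
  "mcpServers": {
    "otel-mcp": {
      "command": "npx",
      "args": ["otel-mcp"]
    }
  }
}
```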
What changes when the agent can see traces
Let's replay the same debugging scenario.
After (with otel-mcp):
You: The /users endpoint is slow locally.
Agent: (lists slow traces from the last few minutes)
Agent: I found 23 slow requests. The posts query averages ~430ms and
returns ~50 rows without an index. The external API averages ~70ms.
Adding an index on posts(user_id) should fix this.
No "can you check the traces and tell me what you see?"
No screenshots.
No paraphrasing what you saw in another tool.
The agent is inspecting the same execution data you would — directly.
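And if that diagnosis holds, the fix itself is a one-liner (assuming a SQL database; the table and column come from the endpoint shown earlier):

```sql
CREATE INDEX idx_posts_user_id ON posts (user_id);
```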
What tools the agent gets
otel-mcp exposes a deliberately small and predictable set of tools:
- `get_summary()`: What services are emitting traces? How many? Any recent errors?
- `list_traces(filters)`: Show slow traces, error traces, or traces for a specific service.
- `get_trace(trace_id)`: Retrieve the full span tree for one request.
- `query_spans(where)`: Find spans matching conditions like `duration > 100 AND status = error` or `http.status_code >= 500`.
Example:
Agent calls:
query_spans({ where: "duration > 100 AND http.status_code >= 500" })
Result:
12 spans from the last 5 minutes showing slow, failed API calls
- `clear_traces()`: Reset state between test runs.
This mirrors how people actually debug: start broad, narrow down, inspect details.
What you need
Before setting this up, make sure you have:
- An OpenTelemetry-instrumented application (or willingness to add it)
- An MCP-compatible AI agent (Claude Code, Cursor, etc.)
- A runtime with OpenTelemetry support (Node.js, Java, Go, Python, etc.)
You don't need production observability or a hosted backend — this is all local.
Setting it up
If you already use OpenTelemetry, setup is usually trivial — just point your exporter at the local collector.
Here's a typical Node.js configuration:
```js
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';

const sdk = new NodeSDK({
  serviceName: 'my-app',
  traceExporter: new OTLPTraceExporter({
    url: 'http://localhost:4318/v1/traces',
  }),
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();
```
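One gotcha: the SDK needs to initialize before the modules it instruments are imported. A common pattern is to put the snippet above in its own file and preload it (file names here are placeholders):

```bash
# ESM (recent Node versions); for CommonJS, use: node -r ./tracing.js server.js
node --import ./tracing.js server.js
```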
Auto-instrumentation usually captures most of what matters locally:
- HTTP handlers
- Database queries
- Outbound HTTP calls
- Caches
When it doesn't, adding a manual span around a critical section is usually enough.
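For reference, a manual span with the OpenTelemetry API is only a few lines. This is a sketch: the span name and attributes are illustrative, and `fetchMetadata` is a hypothetical stand-in for whatever your code actually does.

```js
import { trace, SpanStatusCode } from '@opentelemetry/api';

const tracer = trace.getTracer('my-app');

async function enrichUserData(user, posts) {
  // startActiveSpan makes this the active span, so any auto-instrumented
  // HTTP/DB spans created inside it nest under it in the trace tree.
  return tracer.startActiveSpan('enrichUserData', async (span) => {
    try {
      span.setAttribute('user.id', user.id);
      span.setAttribute('posts.count', posts.length);
      return await fetchMetadata(user, posts); // hypothetical helper
    } catch (err) {
      // Record the failure so it shows up in error-filtered span queries.
      span.recordException(err);
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw err;
    } finally {
      span.end();
    }
  });
}
```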
Why this works well for local debugging
A few design choices make this effective in practice:
- **In-memory storage.** Fast, simple, and good enough for local iteration.
- **Constrained queries.** The agent asks targeted questions instead of pulling everything into context.
- **Local-only by default.** No risk of accidentally exposing production data.
- **Shared collector.** If you run multiple agent clients, they all see the same traces.
The goal isn't to build a full observability platform. It's to remove friction from debugging.
Common issues
- **Traces not showing up?** Check that your app is exporting to `http://localhost:4318/v1/traces`.
- **Agent not seeing tools?** Make sure `otel-mcp` is included in your MCP configuration.
- **Using multiple agents?** The first one starts the collector; others connect automatically.
- **Nothing interesting in traces?** You may need to add a manual span around the code you're investigating.
Limitations (and why they're fine here)
- **Not production APM.** No dashboards, alerting, or long-term storage.
- **Auto-instrumentation isn't perfect.** Some libraries and async edges need manual spans.
- **Traces may include sensitive data.** Keep the collector on localhost and redact attributes if needed.
- **There is overhead.** Usually negligible locally; measure if you're doing something very timing-sensitive.
For local debugging, these trade-offs are usually acceptable.
Try it
If you're debugging locally with an AI agent and find yourself constantly explaining what you saw in logs or traces, this is worth trying.
👉 https://github.com/moondef/otel-mcp
- Keep using OpenTelemetry.
- Point your exporter at localhost during dev.
- Let the agent query traces instead of asking you to summarize them.
Once the agent can see how the code actually runs, the conversation changes – and debugging gets noticeably faster.