ekb
Distributed Tracing for LLM Agents: When MCP Makes Tool Calls Observable

How application observability extends to stochastic agent loops — and why the tool boundary matters.


Production failures in LLM systems are often misattributed to the model. In practice, many incidents live in the action layer: a downstream API that times out, a tool that returns a business error inside a successful RPC, a subprocess the host spawned but never joined to the same trace. Standard logs capture completions; they rarely preserve the causal chain decision → tool invocation → observation → next decision.

This article is about that gap. It compares classic APM to agent telemetry, explains how the Model Context Protocol (MCP) gives observability a stable integration point, and points to a minimal reference stack (OpenTelemetry, optional Logfire, Jaeger) where host and tool server share one trace_id.

Reference implementation: github.com/ekb-dev-ai/mcp-trace-demo


LLM telemetry vs classic APM — and what MCP transfers

Classic APM assumes a largely deterministic call graph: a request enters a service, fans out to databases and queues, each hop becomes a span with stable identity, latency, and error semantics. The unit of analysis is the request boundary. OpenTelemetry succeeded because that graph is finite and repeatable across deployments.

Agent systems change the shape of work. Execution is a loop, not a handler: the runtime may call a language model, parse a structured action, invoke an external capability, append the result to context, and repeat. The expensive and risky steps are inference (variable latency, token cost) and side effects (tools, APIs, subprocesses). A trace that only wraps “the agent” or only logs final text cannot answer operational questions such as: which tool ran, with what arguments, how long it took, and whether the failure was transport-level or semantic.

Two partial fixes are common and both are incomplete:

  1. Completion logging — auditable, but detached from tool causality.
  2. A single root span around the agent — one box in Jaeger, no visibility into the tool subprocess or remote server.

What is needed is the same abstraction microservices already use: distributed tracing across process boundaries, with LLM-specific spans nested inside.

MCP does not replace OpenTelemetry; it defines where the tool boundary sits. The protocol specifies discovery, typed tool schemas, and invocation (tools/call). SEP-414 allows W3C trace context in params._meta, so propagation can cross stdio pipes or HTTP the way it crosses service meshes. For observability purposes, an MCP server (a local subprocess or a remote host) is a peer service: it has its own service.name and its own spans, and it joins the host's trace via trace_id. The host records orchestration and model rounds; the server records tool execution; the trace backend merges them into one waterfall.
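To make the propagation step concrete, here is a minimal standard-library sketch of a tools/call request that carries a W3C traceparent value in params._meta, plus the matching extraction on the server side. The helper names, the tool name lookup_order, and the exact _meta key ("traceparent") are assumptions for illustration; in the reference stack an instrumentation library does this work for you.

```python
import json
import re
import secrets

# W3C Trace Context value: version-traceid-parentid-flags, lowercase hex.
TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def make_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    """Format a version-00 traceparent value."""
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


def build_tool_call(request_id: int, tool: str, arguments: dict, traceparent: str) -> str:
    """Host side: serialize a tools/call request with trace context in params._meta."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": tool,
            "arguments": arguments,
            "_meta": {"traceparent": traceparent},  # assumed key name
        },
    }
    return json.dumps(message)


def extract_trace_context(raw: str):
    """Server side: return (trace_id, parent_span_id), or None if absent or malformed."""
    params = json.loads(raw).get("params", {})
    value = params.get("_meta", {}).get("traceparent", "")
    match = TRACEPARENT_RE.match(value)
    if match is None:
        return None
    _version, trace_id, parent_span_id, _flags = match.groups()
    return trace_id, parent_span_id


host_trace_id = secrets.token_hex(16)  # 32 hex characters
host_span_id = secrets.token_hex(8)    # 16 hex characters
wire = build_tool_call(
    1, "lookup_order", {"order_id": "A-1"},  # hypothetical tool and arguments
    make_traceparent(host_trace_id, host_span_id),
)
print(extract_trace_context(wire) == (host_trace_id, host_span_id))
```

The server starts its own span with the extracted trace_id and uses the extracted span id as the parent, which is what lets a backend draw both processes in one waterfall.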

In short: HTTP gave APM a stable wire format for service-to-service calls; MCP gives APM a stable wire format for model-to-tool calls. The agent loop stays stochastic; the act step becomes inspectable.


What a useful agent trace contains

Without tying this to any particular domain, a minimal useful trace for one agent run typically includes:

| Layer | Span types (examples) | Questions answered |
|---|---|---|
| Orchestration | workflow, task, agent | What program ran, in what order? |
| Inference | LLM / completion spans | How many model rounds, how slow? |
| MCP client | tools/call on the host | Which tool was selected, when? |
| MCP server | handle request, tool-internal spans | What did the tool actually do, for how long? |

The operational payoff is attribution: distinguish “bad reasoning” from “slow dependency” from “tool returned an error payload the model misread.”
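The nesting of those layers can be illustrated with a toy in-memory span recorder. This is not OpenTelemetry and not the reference repo's code: the span helper, the layer labels, and the span names are all hypothetical, and in a real deployment the server span lives in a different process and is joined by trace_id rather than by a shared stack.

```python
import time
from contextlib import contextmanager

spans = []   # finished spans, in completion order
_stack = []  # names of currently open spans


@contextmanager
def span(name: str, layer: str):
    """Record a span with its parent, layer, and duration (illustrative only)."""
    parent = _stack[-1] if _stack else None
    _stack.append(name)
    start = time.perf_counter()
    try:
        yield
    finally:
        _stack.pop()
        spans.append({
            "name": name,
            "layer": layer,
            "parent": parent,
            "duration_s": time.perf_counter() - start,
        })


with span("agent_run", "orchestration"):
    with span("llm_round_1", "inference"):
        pass  # the model selects a tool
    with span("tools/call lookup_order", "mcp_client"):
        with span("handle tools/call", "mcp_server"):
            time.sleep(0.01)  # stand-in for the tool's real work
    with span("llm_round_2", "inference"):
        pass  # the model reads the observation

for s in spans:
    print(f"{s['layer']:<14}{s['name']:<26}parent={s['parent']}")
```

Reading the recorded parents answers the attribution question directly: if the time sits under the mcp_server span, the dependency was slow; if it sits under an inference span, the model round was.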


Architecture pattern (host + MCP server)

A common deployment pattern:

  • Host process — agent framework, model client, MCP client. Exports spans as e.g. agent-host.
  • Tool process — MCP server exposing one or more tools. Exports spans as e.g. mcp-tool-server.
  • Transport — often stdio (subprocess with JSON-RPC on stdin/stdout) or HTTP for remote servers.
  • Backend — OTLP to Jaeger, Grafana Tempo, Logfire, etc.
┌──────────────┐     MCP (stdio or HTTP)      ┌─────────────────┐
│  agent-host  │ ─── traceparent in _meta ──► │ mcp-tool-server │
│  LLM + client│                              │  tool handlers  │
└──────┬───────┘                              └────────┬────────┘
       │ OTLP                                          │ OTLP
       └────────────────────► trace backend ◄───────────┘

Propagation requirement: both sides instrument MCP, and each process configures exactly one TracerProvider that exports to the same backend (or a compatible OTLP pipeline). On the host, framework and model instrumentors must attach to that provider rather than install a second global one; otherwise spans fragment across providers.


Video walkthrough

The video implements the pattern above end-to-end and walks through one trace in the UI.


Implementation notes (from the reference repo)

Stdio: stdout is the protocol

When MCP uses stdio, stdout must carry only JSON-RPC. Diagnostic libraries that print to stdout in the child process corrupt the stream (Failed to parse JSONRPC message). The fix is process-scoped: disable console export in the server process while keeping OTLP export enabled.

configure_telemetry(service_name="mcp-tool-server", stdio_safe=True)
# → logfire.configure(..., console=False)  # no stdout noise; Jaeger unchanged
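The same discipline applies to ordinary logging in the server process. A minimal sketch, using only the standard library and a hypothetical logger name and write_frame helper: diagnostics go to stderr, and stdout receives one JSON-RPC message per line and nothing else. The final lines simulate a response and confirm that everything on stdout parses as JSON.

```python
import contextlib
import io
import json
import logging
import sys


def configure_stdio_safe_logging() -> logging.Logger:
    """Send diagnostics to stderr so stdout stays reserved for JSON-RPC frames."""
    logger = logging.getLogger("mcp-tool-server")  # hypothetical logger name
    logger.handlers.clear()
    handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep root handlers from echoing elsewhere
    return logger


def write_frame(message: dict, out=None) -> None:
    """Write exactly one JSON-RPC message per line to the protocol stream."""
    out = out if out is not None else sys.stdout
    out.write(json.dumps(message) + "\n")
    out.flush()


# Simulate one response and check that every stdout line is valid JSON.
stdout_capture, stderr_capture = io.StringIO(), io.StringIO()
with contextlib.redirect_stdout(stdout_capture), contextlib.redirect_stderr(stderr_capture):
    log = configure_stdio_safe_logging()
    log.info("handling tools/call")  # lands on stderr
    write_frame({"jsonrpc": "2.0", "id": 1, "result": {"content": []}})

frames = [json.loads(line) for line in stdout_capture.getvalue().splitlines()]
```

A single stray print() in a tool handler is enough to break this invariant, so it is worth checking third-party libraries in the server process as well.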

Shared setup in both processes

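import logfire
from opentelemetry.instrumentation.crewai import CrewAIInstrumentor  # assumed package: opentelemetry-instrumentation-crewai

def configure_telemetry(*, service_name: str, instrument_crewai: bool = False, stdio_safe: bool = False):
    # console=False keeps Logfire output off stdout when the server speaks stdio
    logfire.configure(service_name=service_name, console=False if stdio_safe else None)
    logfire.instrument_mcp()  # SEP-414 propagation + MCP span semantics

    if instrument_crewai:
        # Reuse Logfire's provider so CrewAI spans join the same trace
        tp = logfire.DEFAULT_LOGFIRE_INSTANCE.config.get_tracer_provider()
        CrewAIInstrumentor().instrument(tracer_provider=tp)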

Local Jaeger (no cloud token): default OTLP endpoint http://localhost:4318/v1/traces.

Spans inside tool handlers

Tool code can add business spans beneath the MCP handle span—latency, domain errors, structured attributes—without changing the protocol:

with logfire.span("tool_operation", key=input_key):
    result = do_work()
    if result.is_business_failure:
        logfire.error("domain failure", detail=result.detail)

Distinguish a successful RPC that carries a business error in its JSON payload from a transport failure; the two appear differently in trace backends and should be counted separately in SLO design.
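One way to make that distinction explicit on the host is to classify each tools/call outcome before recording it. The sketch below is an assumption-laden illustration, not code from the reference repo: the category names and classify_tool_outcome are hypothetical, and it relies on the JSON-RPC error object and the isError flag of an MCP tool result. A tool that reports failures through its own payload fields would need an extra check.

```python
import json


def classify_tool_outcome(raw_response):
    """Classify one tools/call outcome as seen by the host.

    raw_response is the JSON-RPC response text, or None if nothing arrived
    (timeout, closed pipe, dropped connection).
    """
    if raw_response is None:
        return "transport_failure"
    try:
        message = json.loads(raw_response)
    except json.JSONDecodeError:
        return "transport_failure"  # e.g. stdout pollution on stdio
    if not isinstance(message, dict):
        return "transport_failure"
    if "error" in message:
        return "protocol_error"     # JSON-RPC error object: the call itself failed
    if message.get("result", {}).get("isError", False):
        return "tool_error"         # RPC succeeded, the tool reported a failure
    return "ok"


examples = {
    "no response": None,
    "garbled": "INFO starting server",
    "rpc error": '{"jsonrpc": "2.0", "id": 1, "error": {"code": -32602, "message": "Invalid params"}}',
    "tool error": '{"jsonrpc": "2.0", "id": 2, "result": {"content": [], "isError": true}}',
    "success": '{"jsonrpc": "2.0", "id": 3, "result": {"content": []}}',
}
for label, raw in examples.items():
    print(f"{label:<12} -> {classify_tool_outcome(raw)}")
```

Attaching the resulting category as a span attribute lets an availability SLO count only transport and protocol failures, while tool errors feed a separate correctness metric.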


Reading a trace (generic)

  1. Select the host service; open a recent trace.
  2. Descend through orchestration → LLM spans (inference rounds).
  3. Find MCP client tools/call spans between rounds.
  4. Filter the same trace_id on the tool server service; confirm server-side handle + nested tool spans align in time.

If client and server traces do not share a trace_id, propagation is broken. Common causes are stdout pollution on stdio, a second TracerProvider, or a missing instrument_mcp() call.
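The same check can be automated against exported span data. The sketch below assumes spans have been flattened into dicts with trace_id, service, and name keys, and that client spans are named with a tools/call prefix; both are hypothetical conventions, so adapt them to whatever your backend's export actually returns.

```python
from collections import defaultdict


def find_broken_propagation(spans, host="agent-host", server="mcp-tool-server"):
    """Return trace_ids where host and server spans failed to join."""
    by_trace = defaultdict(list)
    for s in spans:
        by_trace[s["trace_id"]].append(s)

    broken = []
    for trace_id, group in by_trace.items():
        services = {s["service"] for s in group}
        called_tool = any(
            s["service"] == host and s["name"].startswith("tools/call")
            for s in group
        )
        orphan_server = server in services and host not in services
        missing_server = called_tool and server not in services
        if orphan_server or missing_server:
            broken.append(trace_id)
    return sorted(broken)


exported = [
    {"trace_id": "t1", "service": "agent-host", "name": "agent_run"},
    {"trace_id": "t1", "service": "agent-host", "name": "tools/call lookup_order"},
    {"trace_id": "t1", "service": "mcp-tool-server", "name": "handle tools/call"},
    {"trace_id": "t2", "service": "agent-host", "name": "tools/call lookup_order"},
    {"trace_id": "t3", "service": "mcp-tool-server", "name": "handle tools/call"},
    {"trace_id": "t4", "service": "agent-host", "name": "agent_run"},
]
print(find_broken_propagation(exported))  # ['t2', 't3']
```

Trace t4 is not flagged because the host never called a tool in that run; a host-only trace is a problem only when it contains a client tools/call span.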


Reference demo (optional)

The linked repository is a toy vertical slice, not a product architecture: CrewAI + Ollama on the host, FastMCP on stdio, three tools, and an artificial slow path that makes latency visible in Jaeger. It exists to make the abstract pattern reproducible in one command:

docker compose up -d && poetry install && ./scripts/demo.sh
# UI: http://localhost:16686

Use it to validate your exporter and propagation; replace frameworks and tools with your own stack—the observability model stays the same.


Limits

  • Tracing describes what ran, not whether outputs were correct (evals still required).
  • High-cardinality prompt content may need redaction (TRACELOOP_TRACE_CONTENT=false or equivalent).
  • HTTP MCP adds auth, TLS, and tenancy concerns stdio demos elide.
  • Span cardinality from framework + model + MCP + manual spans can be heavy; sampling may be necessary at scale.
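On the sampling point, one property matters for this architecture: host and tool server must reach the same keep-or-drop decision for a given trace, or the waterfall ends up with holes. A minimal sketch of a decision keyed on the trace_id, so that any process can compute it independently. The function name and the low-64-bit rule are illustrative assumptions rather than any particular SDK's exact algorithm; OpenTelemetry SDKs ship ratio-based and parent-based samplers that serve this purpose.

```python
def should_sample(trace_id_hex: str, ratio: float) -> bool:
    """Keep a trace if the low 64 bits of its id fall below ratio * 2**64."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    low_64_bits = int(trace_id_hex, 16) & ((1 << 64) - 1)
    return low_64_bits < int(ratio * (1 << 64))


trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"
host_decision = should_sample(trace_id, 0.25)
server_decision = should_sample(trace_id, 0.25)
print(host_decision == server_decision)  # True: same input, same decision
```

Because the decision depends only on the trace_id and the ratio, it holds across the stdio or HTTP boundary as long as both processes are configured with the same ratio.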

Summary

Application observability matured around request-scoped, deterministic graphs. LLM agents introduce stochastic loops with external actions. MCP standardizes those actions as protocol-level calls across services, and SEP-414 carries trace context across that boundary so existing OpenTelemetry pipelines apply. The engineering work is mostly wiring: one telemetry setup per process, MCP instrumentation, correct stdio discipline, and a backend that can display host and server spans under one id.


Code and video: mcp-trace-demo. Comments on production patterns for MCP tracing welcome.
