ekb

Posted on May 24

Distributed Tracing for LLM Agents: When MCP Makes Tool Calls Observable

#mcp #promptengineering #ai #llm

How application observability extends to stochastic agent loops — and why the tool boundary matters.

Production failures in LLM systems are often misattributed to the model. In practice, many incidents live in the action layer: a downstream API that times out, a tool that returns a business error inside a successful RPC, a subprocess the host spawned but never joined to the same trace. Standard logs capture completions; they rarely preserve the causal chain decision → tool invocation → observation → next decision.

This article is about that gap. It compares classic APM to agent telemetry, explains how the Model Context Protocol (MCP) gives observability a stable integration point, and points to a minimal reference stack (OpenTelemetry, optional Logfire, Jaeger) where host and tool server share one trace_id.

Reference implementation: github.com/ekb-dev-ai/mcp-trace-demo

LLM telemetry vs classic APM — and what MCP transfers

Classic APM assumes a largely deterministic call graph: a request enters a service, fans out to databases and queues, each hop becomes a span with stable identity, latency, and error semantics. The unit of analysis is the request boundary. OpenTelemetry succeeded because that graph is finite and repeatable across deployments.

Agent systems change the shape of work. Execution is a loop, not a handler: the runtime may call a language model, parse a structured action, invoke an external capability, append the result to context, and repeat. The expensive and risky steps are inference (variable latency, token cost) and side effects (tools, APIs, subprocesses). A trace that only wraps “the agent” or only logs final text cannot answer operational questions such as: which tool ran, with what arguments, how long it took, and whether the failure was transport-level or semantic.
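The loop just described can be sketched in a few lines. This is a minimal illustration that assumes a hypothetical `model` callable. It returns either a tool action or a final answer, and tools are a plain dict of functions. `SpanRecord` is an illustrative in-memory stand-in, not an OpenTelemetry span. It only shows where inference spans and tool spans naturally fall in the loop.

```python
import time
from dataclasses import dataclass, field


@dataclass
class SpanRecord:
    """Illustrative in-memory span; a real system would emit OpenTelemetry spans."""
    name: str
    start: float
    end: float = 0.0
    attrs: dict = field(default_factory=dict)


def run_agent(model, tools, question, max_rounds=5):
    """Toy agent loop: infer, parse action, call tool, append observation, repeat.

    `model` (hypothetical) takes the context list and returns either
    {"type": "final", "text": ...} or {"type": "tool", "tool": name, "args": {...}}.
    """
    context = [question]
    spans = []
    for round_no in range(max_rounds):
        # Inference step: variable latency and token cost live here.
        llm = SpanRecord("llm.round", time.perf_counter(), attrs={"round": round_no})
        action = model(context)
        llm.end = time.perf_counter()
        spans.append(llm)
        if action["type"] == "final":
            return action["text"], spans
        # Action step: the side effect whose identity, arguments, and timing we want.
        call = SpanRecord(f"tools/call {action['tool']}", time.perf_counter(),
                          attrs={"arguments": action["args"]})
        try:
            observation = tools[action["tool"]](**action["args"])
        finally:
            call.end = time.perf_counter()  # record timing even if the tool raises
            spans.append(call)
        context.append(observation)
    return None, spans  # round budget exhausted without a final answer


if __name__ == "__main__":
    def scripted_model(context):
        # Round 0 requests a tool; round 1 answers from the observation.
        if len(context) == 1:
            return {"type": "tool", "tool": "add", "args": {"a": 2, "b": 3}}
        return {"type": "final", "text": f"answer={context[-1]}"}

    answer, spans = run_agent(scripted_model, {"add": lambda a, b: a + b}, "2+3?")
    print(answer, [s.name for s in spans])
```

Even in this toy, a single span around `run_agent` would hide which tool ran and how long it took. The per-step records are what the rest of this article moves onto real spans.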

Two fixes are common, and each covers only part of the problem:

Completion logging — auditable, but detached from tool causality.
A single root span around the agent — one box in Jaeger, no visibility into the tool subprocess or remote server.

What is needed is the same abstraction microservices already use: distributed tracing across process boundaries, with LLM-specific spans nested inside.

MCP does not replace OpenTelemetry; it defines where the tool boundary sits. The protocol specifies discovery, typed tool schemas, and invocation (tools/call). SEP-414 allows W3C trace context in params._meta, so propagation can cross stdio pipes or HTTP the same way it crosses a service mesh. An MCP server, whether a local subprocess or a remote host, is a peer service from an observability standpoint: it has its own service.name and its own spans, and it joins the host's trace via trace_id. The host records orchestration and model rounds; the server records tool execution; the trace backend merges both into one waterfall.

In short: HTTP gave APM a stable wire format for service-to-service calls; MCP gives APM a stable wire format for model-to-tool calls. The agent loop stays stochastic; the act step becomes inspectable.
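The following is a stdlib-only sketch of what SEP-414 propagation amounts to on the wire. It assumes the `_meta` key is named `traceparent` after the W3C header; check the SEP and your SDK for the exact keys. The helper names are hypothetical. In practice, `logfire.instrument_mcp()` or an OpenTelemetry propagator does this for you. The server continues the caller's trace with the same trace id and a fresh span id, and it sets the parent to the client's `tools/call` span.

```python
import json
import re
import secrets

# W3C traceparent: version-trace_id-parent_id-flags. Only version 00 is handled;
# all-zero ids, which the spec treats as invalid, are not rejected in this sketch.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def make_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


def client_tools_call(request_id, tool, arguments, trace_id, client_span_id) -> dict:
    """JSON-RPC tools/call request carrying the caller's span context in params._meta."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": tool,
            "arguments": arguments,
            # Assumption: the _meta key mirrors the W3C header name.
            "_meta": {"traceparent": make_traceparent(trace_id, client_span_id)},
        },
    }


def server_start_span(request: dict) -> dict:
    """Continue the caller's trace if context is present, else start a new one."""
    meta = request.get("params", {}).get("_meta", {})
    match = TRACEPARENT_RE.match(meta.get("traceparent", ""))
    if match is None:
        # No usable context: a fresh trace id, i.e. the fragmented-trace symptom.
        return {"trace_id": secrets.token_hex(16), "span_id": secrets.token_hex(8),
                "parent_id": None}
    trace_id, parent_id, _flags = match.groups()
    return {"trace_id": trace_id, "span_id": secrets.token_hex(8), "parent_id": parent_id}


if __name__ == "__main__":
    trace_id, client_span = secrets.token_hex(16), secrets.token_hex(8)
    wire = json.dumps(client_tools_call(1, "lookup", {"key": "a"}, trace_id, client_span))
    server_span = server_start_span(json.loads(wire))  # as if read from stdin or HTTP
    print(server_span["trace_id"] == trace_id, server_span["parent_id"] == client_span)
```

Because the context travels inside the JSON-RPC payload rather than in a transport header, the same mechanism works unchanged over a stdio pipe or an HTTP connection.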

What a useful agent trace contains

Without tying this to any particular domain, a minimal useful trace for one agent run typically includes:

Layer	Span types (examples)	Questions answered
Orchestration	workflow, task, agent	What program ran, in what order?
Inference	LLM / completion spans	How many model rounds, how slow?
MCP client	`tools/call` on the host	Which tool was selected, when?
MCP server	handle request, tool-internal spans	What did the tool actually do, for how long?

The operational payoff is attribution: distinguish “bad reasoning” from “slow dependency” from “tool returned an error payload the model misread.”
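Here is a rough illustration of that attribution. It assumes spans from one run have already been exported as dicts with hypothetical `layer`, `duration_ms`, and `is_error` fields. Two small queries then separate "slow dependency" from "tool reported an error." Judging bad reasoning still needs evals, not arithmetic over spans.

```python
from collections import defaultdict


def time_by_layer(spans):
    """Total duration (ms) per layer for one sequential agent run.

    Assumes spans of different layers do not overlap, which holds for a
    one-tool-at-a-time loop but not for parallel tool calls.
    """
    totals = defaultdict(float)
    for span in spans:
        totals[span["layer"]] += span["duration_ms"]
    return dict(totals)


def error_payloads(spans):
    """Tool spans whose RPC succeeded but whose result flagged a tool-level error."""
    return [s["name"] for s in spans if s["layer"] == "tool" and s.get("is_error")]


if __name__ == "__main__":
    run = [
        {"name": "llm.round", "layer": "inference", "duration_ms": 800},
        {"name": "tools/call slow_lookup", "layer": "tool", "duration_ms": 2500},
        {"name": "llm.round", "layer": "inference", "duration_ms": 600},
        {"name": "tools/call validate", "layer": "tool", "duration_ms": 40, "is_error": True},
    ]
    print(time_by_layer(run))   # tool time dominates: a slow-dependency signal
    print(error_payloads(run))  # a candidate for "model misread the error payload"
```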

Architecture pattern (host + MCP server)

A common deployment pattern:

Host process — agent framework, model client, MCP client. Exports spans as e.g. agent-host.
Tool process — MCP server exposing one or more tools. Exports spans as e.g. mcp-tool-server.
Transport — often stdio (subprocess with JSON-RPC on stdin/stdout) or HTTP for remote servers.
Backend — OTLP to Jaeger, Grafana Tempo, Logfire, etc.

┌──────────────┐     MCP (stdio or HTTP)      ┌─────────────────┐
│  agent-host  │ ─── traceparent in _meta ──► │ mcp-tool-server │
│  LLM + client│                              │  tool handlers  │
└──────┬───────┘                              └────────┬────────┘
       │ OTLP                                          │ OTLP
       └────────────────────► trace backend ◄───────────┘
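To make "one waterfall" concrete: a backend groups spans by trace id and nests them by parent span id, regardless of which service exported them. The following is a minimal sketch over plain dicts. The field names are illustrative, not the OTLP schema.

```python
def build_waterfall(trace_id, *service_exports):
    """Merge spans exported by several services into one indented tree.

    Each span is a dict with (illustrative) keys: trace_id, span_id, parent_id,
    start, service, name. Spans whose parent never arrived are unreachable from
    a root and are skipped here; real backends usually show them separately.
    """
    spans = [s for export in service_exports for s in export if s["trace_id"] == trace_id]
    children = {}
    for span in spans:
        children.setdefault(span["parent_id"], []).append(span)
    for siblings in children.values():
        siblings.sort(key=lambda s: s["start"])  # waterfall order = start time

    lines = []

    def walk(parent_id, depth):
        for span in children.get(parent_id, []):
            lines.append(f"{'  ' * depth}{span['service']}: {span['name']}")
            walk(span["span_id"], depth + 1)

    walk(None, 0)
    return lines


if __name__ == "__main__":
    host = [
        {"trace_id": "t1", "span_id": "a", "parent_id": None, "start": 0,
         "service": "agent-host", "name": "agent run"},
        {"trace_id": "t1", "span_id": "c", "parent_id": "a", "start": 2,
         "service": "agent-host", "name": "tools/call lookup"},
    ]
    server = [
        {"trace_id": "t1", "span_id": "d", "parent_id": "c", "start": 3,
         "service": "mcp-tool-server", "name": "handle tools/call"},
    ]
    print("\n".join(build_waterfall("t1", host, server)))
```

The server's handle span only lands under the host's `tools/call` span because its `parent_id` arrived through propagation. Without that, it would become a disconnected root.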

Propagation requirement: both sides instrument MCP, and each process exports through exactly one TracerProvider (or a compatible OTLP pipeline). On the host, framework and model instrumentors must attach to that provider rather than install a second global one, or spans fragment.

Video walkthrough

The video implements the pattern above end-to-end and walks through one trace in the UI.

Implementation notes (from the reference repo)

Stdio: stdout is the protocol

When MCP uses stdio, stdout must carry only JSON-RPC. Diagnostic libraries that print to stdout in the child process corrupt the stream, and the client fails with errors such as "Failed to parse JSONRPC message". The fix is process-scoped: disable console export in the server process while keeping OTLP export enabled.

configure_telemetry(service_name="mcp-tool-server", stdio_safe=True)
# → logfire.configure(..., console=False)  # no stdout noise; Jaeger unchanged
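The following is a complementary guard, not taken from the reference repo. It routes stray `print()` calls to stderr while a handler runs and writes protocol messages to an explicitly chosen stream. The sketch relies only on `contextlib.redirect_stdout`, so it does not catch C extensions or child processes that write to file descriptor 1 directly. `console=False` is therefore still needed. The function names are hypothetical.

```python
import contextlib
import io
import json
import sys


def write_jsonrpc(stream, message: dict) -> None:
    """MCP stdio framing: one JSON-RPC message per line, no embedded newlines."""
    stream.write(json.dumps(message) + "\n")  # json.dumps without indent stays single-line
    stream.flush()


def handle_tools_call(request_id, protocol_out, diag_out=sys.stderr) -> None:
    # While the handler runs, anything written via sys.stdout (print, chatty
    # libraries) goes to diag_out instead of the protocol stream.
    with contextlib.redirect_stdout(diag_out):
        print("debug: running tool")  # would corrupt the stream without the guard
        result = {"content": [{"type": "text", "text": "ok"}], "isError": False}
    write_jsonrpc(protocol_out, {"jsonrpc": "2.0", "id": request_id, "result": result})


if __name__ == "__main__":
    protocol_out, diag_out = io.StringIO(), io.StringIO()
    handle_tools_call(1, protocol_out, diag_out)
    print("protocol:", protocol_out.getvalue().strip())
    print("diagnostics:", diag_out.getvalue().strip())
```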

Shared setup in both processes

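import logfire

def configure_telemetry(*, service_name: str, instrument_crewai: bool = False, stdio_safe: bool = False) -> None:
    # One setup per process; console=False keeps stdout free for JSON-RPC on stdio.
    logfire.configure(service_name=service_name, console=False if stdio_safe else None)
    logfire.instrument_mcp()  # SEP-414 propagation + MCP span semantics

    if instrument_crewai:
        # Lazy import so the tool server needs no CrewAI (assumed: OpenLLMetry's package).
        from opentelemetry.instrumentation.crewai import CrewAIInstrumentor
        # Attach to Logfire's provider instead of installing a second global one.
        tp = logfire.DEFAULT_LOGFIRE_INSTANCE.config.get_tracer_provider()
        CrewAIInstrumentor().instrument(tracer_provider=tp)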

For local Jaeger (no Logfire cloud token), spans go to the default OTLP/HTTP endpoint, http://localhost:4318/v1/traces.
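When host and tool server must land in the same backend, it helps to know how an OTLP/HTTP exporter picks its URL. The following sketch follows the resolution rules in the OpenTelemetry exporter specification:

- A signal-specific variable is used as-is.
- Otherwise, `/v1/traces` is appended to the base endpoint.
- Otherwise, the default `http://localhost:4318` is used.

Whether a given SDK, Logfire included, applies exactly these rules is worth confirming in its docs.

One related pitfall: the MCP Python SDK's stdio client may start the server with only a small allowlist of inherited environment variables unless you pass `env` explicitly. `OTEL_*` settings may therefore need to be forwarded by hand.

```python
import os

DEFAULT_OTLP_HTTP_ENDPOINT = "http://localhost:4318"


def resolve_traces_endpoint(env=None) -> str:
    """Pick the OTLP/HTTP traces URL from OpenTelemetry environment variables."""
    env = os.environ if env is None else env
    signal_url = env.get("OTEL_EXPORTER_OTLP_TRACES_ENDPOINT")
    if signal_url:
        return signal_url  # signal-specific URL: used as-is
    base = env.get("OTEL_EXPORTER_OTLP_ENDPOINT") or DEFAULT_OTLP_HTTP_ENDPOINT
    return base.rstrip("/") + "/v1/traces"  # base URL: signal path appended


if __name__ == "__main__":
    print(resolve_traces_endpoint())
```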

Spans inside tool handlers

Tool code can add business spans beneath the MCP handle span—latency, domain errors, structured attributes—without changing the protocol:

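import logfire

def run_tool(input_key: str, do_work):  # do_work: hypothetical stand-in for the tool's logic
    with logfire.span("tool_operation", key=input_key):  # nests under the MCP handle span
        result = do_work()
        if result.is_business_failure:  # hypothetical result fields
            logfire.error("domain failure", detail=result.detail)
        return result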

Distinguish an RPC that succeeded but carries a business error in its JSON result from a transport failure; the two show up differently in backends and belong in different buckets in SLO design.
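The following sketches that split for MCP responses. It assumes the response shapes in the MCP specification:

- Protocol errors come back as a JSON-RPC `error` object.
- Tool-level failures come back as a normal `result` with `isError: true`.
- Transport failures surface as an exception before any response exists.

The function names, the `send` callable, and the exception set are illustrative.

```python
def classify_tool_response(response: dict) -> str:
    """Classify a JSON-RPC response to tools/call.

    "protocol_error": JSON-RPC error object (unknown tool, invalid params, ...).
    "business_error": the RPC succeeded but the tool set result.isError.
    "ok":             the RPC succeeded and the tool reported success.
    """
    if "error" in response:
        return "protocol_error"
    if response.get("result", {}).get("isError", False):
        return "business_error"
    return "ok"


def call_and_classify(send, request: dict) -> str:
    """`send` is a hypothetical transport function; failures raise before any response."""
    try:
        response = send(request)
    except (OSError, EOFError):  # broken pipe, reset connection, closed stdio, timeouts
        return "transport_error"
    return classify_tool_response(response)
```

One reasonable mapping, not the only one: mark the span status as error for protocol and transport failures, and record business errors as an attribute or event so they can feed a separate SLO.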

Reading a trace (generic)

Select the host service; open a recent trace.
Descend through orchestration → LLM spans (inference rounds).
Find MCP client tools/call spans between rounds.
Filter the same trace_id on the tool server service; confirm server-side handle + nested tool spans align in time.

If client and server traces do not share trace_id, propagation is broken—often stdout pollution on stdio, a second TracerProvider, or missing instrument_mcp().
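The same check can be scripted against exported spans. This sketch assumes client and server spans have been fetched as dicts with hypothetical `trace_id`, `span_id`, `parent_id`, and `name` fields, for example by querying the backend's API.

```python
def propagation_report(client_spans, server_spans):
    """Return (ok, problems) for one host run and the spans its MCP server exported."""
    client_traces = {s["trace_id"] for s in client_spans}
    known_ids = {s["span_id"] for s in client_spans} | {s["span_id"] for s in server_spans}
    problems = []
    for span in server_spans:
        if span["trace_id"] not in client_traces:
            # Typical of lost context: stdout pollution, a second provider,
            # or MCP instrumentation missing on one side.
            problems.append(f"{span['name']}: trace_id {span['trace_id']} never seen on the host")
        elif span["parent_id"] not in known_ids:
            problems.append(f"{span['name']}: parent {span['parent_id']} missing (dropped or unsampled?)")
    return (not problems, problems)


if __name__ == "__main__":
    client = [{"trace_id": "t1", "span_id": "c1", "parent_id": None, "name": "tools/call lookup"}]
    server = [{"trace_id": "t9", "span_id": "s1", "parent_id": None, "name": "handle tools/call"}]
    print(propagation_report(client, server))
```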

Reference demo (optional)

The linked repository is a toy vertical slice, not a product architecture: CrewAI + Ollama on the host, FastMCP on stdio, three tools, and an artificial slow path that makes latency visible in Jaeger. It exists to make the abstract pattern reproducible with one command:

docker compose up -d && poetry install && ./scripts/demo.sh
# UI: http://localhost:16686

Use it to validate your exporter and propagation; replace frameworks and tools with your own stack—the observability model stays the same.

Limits

Tracing describes what ran, not whether outputs were correct (evals still required).
Prompt and completion content can be large and sensitive, so it may need redaction or suppression (TRACELOOP_TRACE_CONTENT=false or equivalent).
HTTP MCP adds auth, TLS, and tenancy concerns that stdio demos elide.
Span cardinality from framework + model + MCP + manual spans can be heavy; sampling may be necessary at scale.
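On sampling: host and tool server must make the same keep/drop decision, or traces arrive half-complete. The usual approach is parent-based sampling, where the server honors the sampled flag in `traceparent`. An alternative is a decision that depends only on the trace id, so each process can compute it independently. The sketch below is modeled loosely on OpenTelemetry's `TraceIdRatioBased` sampler; it is not that sampler's code.

```python
import secrets


def should_sample(trace_id_hex: str, ratio: float) -> bool:
    """Keep/drop decision that depends only on the trace id.

    Compares the low 64 bits of the 128-bit trace id with ratio * 2**64, in the
    spirit of OpenTelemetry's TraceIdRatioBased sampler. Any process that sees
    the same trace id and ratio reaches the same answer.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    low_64_bits = int(trace_id_hex, 16) & ((1 << 64) - 1)
    return low_64_bits < round(ratio * (1 << 64))


if __name__ == "__main__":
    trace_id = secrets.token_hex(16)
    host_keeps = should_sample(trace_id, 0.1)
    server_keeps = should_sample(trace_id, 0.1)  # computed independently, same inputs
    print(host_keeps == server_keeps)
```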

Summary

Application observability matured around request-scoped, deterministic graphs. LLM agents introduce stochastic loops with external actions. MCP standardizes those actions as protocol-level calls across services, and SEP-414 carries trace context across that boundary so existing OpenTelemetry pipelines apply. The engineering work is mostly wiring: one telemetry setup per process, MCP instrumentation, correct stdio discipline, and a backend that can display host and server spans under one id.

Code and video: mcp-trace-demo. Comments on production patterns for MCP tracing welcome.
