<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: ekb</title>
    <description>The latest articles on DEV Community by ekb (@ekb-dev).</description>
    <link>https://dev.to/ekb-dev</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3949194%2F76000d17-889c-41b6-ad3a-18461f192990.png</url>
      <title>DEV Community: ekb</title>
      <link>https://dev.to/ekb-dev</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ekb-dev"/>
    <language>en</language>
    <item>
      <title>Distributed Tracing for LLM Agents: When MCP Makes Tool Calls Observable</title>
      <dc:creator>ekb</dc:creator>
      <pubDate>Sun, 24 May 2026 17:50:49 +0000</pubDate>
      <link>https://dev.to/ekb-dev/distributed-tracing-for-llm-agents-when-mcp-makes-tool-calls-observable-54gb</link>
      <guid>https://dev.to/ekb-dev/distributed-tracing-for-llm-agents-when-mcp-makes-tool-calls-observable-54gb</guid>
      <description>&lt;p&gt;&lt;em&gt;How application observability extends to stochastic agent loops — and why the tool boundary matters.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Production failures in LLM systems are often misattributed to the model. In practice, many incidents live in the &lt;strong&gt;action layer&lt;/strong&gt;: a downstream API that times out, a tool that returns a business error inside a successful RPC, a subprocess the host spawned but never joined to the same trace. Standard logs capture completions; they rarely preserve the causal chain &lt;strong&gt;decision → tool invocation → observation → next decision&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This article is about that gap. It compares &lt;strong&gt;classic APM&lt;/strong&gt; to &lt;strong&gt;agent telemetry&lt;/strong&gt;, explains how the &lt;strong&gt;Model Context Protocol (MCP)&lt;/strong&gt; gives observability a stable integration point, and points to a minimal reference stack (OpenTelemetry, optional Logfire, Jaeger) where host and tool server share one &lt;code&gt;trace_id&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference implementation:&lt;/strong&gt; &lt;a href="https://github.com/ekb-dev-ai/mcp-trace-demo" rel="noopener noreferrer"&gt;github.com/ekb-dev-ai/mcp-trace-demo&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  LLM telemetry vs classic APM — and what MCP transfers
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Classic APM&lt;/strong&gt; assumes a largely &lt;strong&gt;deterministic call graph&lt;/strong&gt;: a request enters a service, fans out to databases and queues, and each hop becomes a span with stable identity, latency, and error semantics. The unit of analysis is the &lt;strong&gt;request boundary&lt;/strong&gt;. OpenTelemetry fits this model well because the graph is finite and repeatable across deployments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent systems&lt;/strong&gt; change the shape of work. Execution is a &lt;strong&gt;loop&lt;/strong&gt;, not a handler: the runtime may call a language model, parse a structured action, invoke an external capability, append the result to context, and repeat. The expensive and risky steps are &lt;strong&gt;inference&lt;/strong&gt; (variable latency, token cost) and &lt;strong&gt;side effects&lt;/strong&gt; (tools, APIs, subprocesses). A trace that only wraps “the agent” or only logs final text cannot answer operational questions such as: which tool ran, with what arguments, how long it took, and whether the failure was transport-level or semantic.&lt;/p&gt;

&lt;p&gt;Two partial fixes are common and both are incomplete:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Completion logging&lt;/strong&gt; — auditable, but detached from tool causality.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A single root span around the agent&lt;/strong&gt; — one box in Jaeger, no visibility into the tool subprocess or remote server.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;What is needed is the same abstraction microservices already use: &lt;strong&gt;distributed tracing&lt;/strong&gt; across process boundaries, with LLM-specific spans nested inside.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MCP does not replace OpenTelemetry; it defines where the tool boundary sits.&lt;/strong&gt; The protocol specifies discovery, typed tool schemas, and invocation (&lt;code&gt;tools/call&lt;/code&gt;). &lt;a href="https://modelcontextprotocol.io/seps/414-request-meta" rel="noopener noreferrer"&gt;SEP-414&lt;/a&gt; allows &lt;strong&gt;W3C trace context&lt;/strong&gt; in &lt;code&gt;params._meta&lt;/code&gt;, so propagation can cross stdio pipes or HTTP the way it crosses service meshes. An MCP server—whether a local subprocess or a remote host—is observability-wise a &lt;strong&gt;peer service&lt;/strong&gt;: its own &lt;code&gt;service.name&lt;/code&gt;, its own spans, joinable to the host via &lt;code&gt;trace_id&lt;/code&gt;. The host records orchestration and model rounds; the server records tool execution; exporters merge them into one waterfall.&lt;/p&gt;

&lt;p&gt;In short: HTTP gave APM a stable wire format for service-to-service calls; &lt;strong&gt;MCP gives APM a stable wire format for model-to-tool calls&lt;/strong&gt;. The agent loop stays stochastic; the &lt;strong&gt;act&lt;/strong&gt; step becomes inspectable.&lt;/p&gt;
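
&lt;p&gt;As a rough illustration of what this propagation looks like on the wire, the sketch below builds a &lt;code&gt;tools/call&lt;/code&gt; request by hand and places a W3C &lt;code&gt;traceparent&lt;/code&gt; value under &lt;code&gt;params._meta&lt;/code&gt;. The helper names (&lt;code&gt;new_traceparent&lt;/code&gt;, &lt;code&gt;build_tool_call&lt;/code&gt;) and the tool name &lt;code&gt;lookup_order&lt;/code&gt; are hypothetical, and the exact &lt;code&gt;_meta&lt;/code&gt; key is assumed from the SEP-414 description above; in practice an instrumentation library would normally generate and inject these values.&lt;/p&gt;

```python
import json
import secrets


def new_traceparent():
    """Return a W3C traceparent string: version-traceid-parentid-flags."""
    trace_id = secrets.token_hex(16)  # 32 lowercase hex characters
    span_id = secrets.token_hex(8)    # 16 lowercase hex characters
    return f"00-{trace_id}-{span_id}-01"  # version 00, flags 01 = sampled


def build_tool_call(request_id, tool_name, arguments, traceparent):
    """Build a JSON-RPC tools/call request carrying trace context in _meta."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": tool_name,
            "arguments": arguments,
            # Assumed key name; the trace context rides alongside the call.
            "_meta": {"traceparent": traceparent},
        },
    }


# Hypothetical tool name and arguments, for illustration only.
request = build_tool_call(1, "lookup_order", {"order_id": "A-17"}, new_traceparent())
print(json.dumps(request, indent=2))
```

&lt;p&gt;The trace context travels as metadata next to the tool arguments, so the tool schema itself does not need to change to support tracing.&lt;/p&gt;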




&lt;h2&gt;
  
  
  What a useful agent trace contains
&lt;/h2&gt;

&lt;p&gt;Without tying this to any particular domain, a minimal useful trace for one agent run typically includes:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Span types (examples)&lt;/th&gt;
&lt;th&gt;Questions answered&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Orchestration&lt;/td&gt;
&lt;td&gt;workflow, task, agent&lt;/td&gt;
&lt;td&gt;What program ran, in what order?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Inference&lt;/td&gt;
&lt;td&gt;LLM / completion spans&lt;/td&gt;
&lt;td&gt;How many model rounds, how slow?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MCP client&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;tools/call&lt;/code&gt; on the host&lt;/td&gt;
&lt;td&gt;Which tool was selected, when?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MCP server&lt;/td&gt;
&lt;td&gt;handle request, tool-internal spans&lt;/td&gt;
&lt;td&gt;What did the tool actually do, for how long?&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The operational payoff is &lt;strong&gt;attribution&lt;/strong&gt;: distinguish “bad reasoning” from “slow dependency” from “tool returned an error payload the model misread.”&lt;/p&gt;




&lt;h2&gt;
  
  
  Architecture pattern (host + MCP server)
&lt;/h2&gt;

&lt;p&gt;A common deployment pattern:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Host process&lt;/strong&gt; — agent framework, model client, MCP client. Exports spans as e.g. &lt;code&gt;agent-host&lt;/code&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool process&lt;/strong&gt; — MCP server exposing one or more tools. Exports spans as e.g. &lt;code&gt;mcp-tool-server&lt;/code&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transport&lt;/strong&gt; — often &lt;strong&gt;stdio&lt;/strong&gt; (subprocess with JSON-RPC on stdin/stdout) or HTTP for remote servers.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backend&lt;/strong&gt; — OTLP to Jaeger, Grafana Tempo, Logfire, etc.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌──────────────┐     MCP (stdio or HTTP)      ┌─────────────────┐
│  agent-host  │ ─── traceparent in _meta ──► │ mcp-tool-server │
│  LLM + client│                              │  tool handlers  │
└──────┬───────┘                              └────────┬────────┘
       │ OTLP                                          │ OTLP
       └────────────────────► trace backend ◄───────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Propagation requirement: both sides instrument MCP, and each process uses a &lt;strong&gt;single TracerProvider&lt;/strong&gt; that exports to the same (or a compatible) OTLP pipeline. On the host, framework and model instrumentors must attach to that provider rather than install a second global one; otherwise spans fragment across providers.&lt;/p&gt;
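
&lt;p&gt;To make the server half of propagation concrete, here is a minimal sketch of what an instrumented MCP server does conceptually: read &lt;code&gt;traceparent&lt;/code&gt; from the request's &lt;code&gt;_meta&lt;/code&gt;, validate it, and start a span that reuses the caller's &lt;code&gt;trace_id&lt;/code&gt;. The functions &lt;code&gt;parse_traceparent&lt;/code&gt; and &lt;code&gt;start_server_span&lt;/code&gt; and the dictionary span shape are hypothetical stand-ins; a real setup would rely on OpenTelemetry propagators instead of hand parsing.&lt;/p&gt;

```python
import secrets
import string

HEX = set(string.hexdigits.lower())  # W3C trace context uses lowercase hex


def parse_traceparent(value):
    """Split a W3C traceparent into its fields, or return None if malformed."""
    parts = value.split("-")
    if len(parts) != 4:
        return None
    version, trace_id, parent_id, flags = parts
    expected = ((version, 2), (trace_id, 32), (parent_id, 16), (flags, 2))
    for field, length in expected:
        if len(field) != length or not set(field).issubset(HEX):
            return None
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None  # all-zero ids are invalid in the W3C format
    return {"trace_id": trace_id, "parent_id": parent_id, "flags": flags}


def start_server_span(meta, name):
    """Create a span record that joins the caller's trace when context is present."""
    parent = parse_traceparent(meta.get("traceparent", ""))
    return {
        "name": name,
        "span_id": secrets.token_hex(8),
        # Reuse the caller's trace_id; otherwise this becomes a disconnected trace.
        "trace_id": parent["trace_id"] if parent else secrets.token_hex(16),
        "parent_id": parent["parent_id"] if parent else None,
    }


meta = {"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
span = start_server_span(meta, "tools/call lookup_order")
print(span["trace_id"], span["parent_id"])
```

&lt;p&gt;When the context is missing or malformed, the sketch falls back to a fresh &lt;code&gt;trace_id&lt;/code&gt;, which is exactly the symptom that shows up in a backend as two unrelated traces.&lt;/p&gt;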




&lt;h2&gt;
  
  
  Video walkthrough
&lt;/h2&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/qCHK4QlPXh8"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;The video implements the pattern above end-to-end and walks through one trace in the UI.&lt;/p&gt;




&lt;h2&gt;
  
  
  Implementation notes (from the reference repo)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Stdio: stdout is the protocol
&lt;/h3&gt;

&lt;p&gt;When MCP uses stdio, &lt;strong&gt;stdout must carry only JSON-RPC&lt;/strong&gt;. Diagnostic libraries that print to stdout in the child process corrupt the stream (&lt;code&gt;Failed to parse JSONRPC message&lt;/code&gt;). The fix is process-scoped: disable console export in the &lt;strong&gt;server&lt;/strong&gt; process while keeping OTLP export enabled.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;configure_telemetry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;service_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mcp-tool-server&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;stdio_safe&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# → logfire.configure(..., console=False)  # no stdout noise; Jaeger unchanged
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
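
&lt;p&gt;The same discipline applies to ordinary logging in the server process. The sketch below is one possible way to keep diagnostics off stdout using only the standard library; &lt;code&gt;make_stdio_safe_logger&lt;/code&gt; is a hypothetical helper and is not part of the reference repo.&lt;/p&gt;

```python
import logging
import sys


def make_stdio_safe_logger(name):
    """Return a logger whose only handler writes to stderr, never stdout."""
    logger = logging.getLogger(name)
    logger.handlers.clear()
    handler = logging.StreamHandler(sys.stderr)  # stdout stays reserved for JSON-RPC
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # skip root handlers that might print to stdout
    return logger


log = make_stdio_safe_logger("mcp-tool-server")
log.info("server starting")  # written to stderr; the protocol stream is untouched
```

&lt;p&gt;Stray &lt;code&gt;print()&lt;/code&gt; calls in tool code have the same effect as a console exporter, so they are worth routing through a logger like this one as well.&lt;/p&gt;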



&lt;h3&gt;
  
  
  Shared setup in both processes
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;configure_telemetry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;service_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;instrument_crewai&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;stdio_safe&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;logfire&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;configure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;service_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;service_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;console&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;stdio_safe&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;logfire&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;instrument_mcp&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# SEP-414 propagation + MCP span semantics
&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;instrument_crewai&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;tp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;logfire&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DEFAULT_LOGFIRE_INSTANCE&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_tracer_provider&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="nc"&gt;CrewAIInstrumentor&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;instrument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tracer_provider&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For a local Jaeger instance (no cloud token required), the default OTLP endpoint is &lt;code&gt;http://localhost:4318/v1/traces&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Spans inside tool handlers
&lt;/h3&gt;

&lt;p&gt;Tool code can add business spans beneath the MCP handle span—latency, domain errors, structured attributes—without changing the protocol:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;logfire&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;span&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_operation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;input_key&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;do_work&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;is_business_failure&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;logfire&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;domain failure&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;detail&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;detail&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Distinguish &lt;strong&gt;RPC success + business error in JSON&lt;/strong&gt; from &lt;strong&gt;transport failure&lt;/strong&gt;; they show up differently in backends and in SLO design.&lt;/p&gt;
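
&lt;p&gt;One way to keep these cases apart on the host is to classify every &lt;code&gt;tools/call&lt;/code&gt; outcome before recording it. The sketch below assumes the response is already parsed into a dictionary, that a missing response is represented as &lt;code&gt;None&lt;/code&gt;, and that tool-level failures set &lt;code&gt;isError&lt;/code&gt; in the result; the function name and the category labels are hypothetical.&lt;/p&gt;

```python
def classify_tool_outcome(response):
    """Classify a parsed tools/call response; None means no response arrived."""
    if response is None:
        return "transport_failure"  # timeout, broken pipe, dead subprocess
    if "error" in response:
        return "protocol_error"     # JSON-RPC error object, e.g. unknown tool
    if response.get("result", {}).get("isError"):
        return "business_error"     # RPC succeeded; the tool reported a failure
    return "ok"


# Illustrative responses, not captured from a real server.
samples = {
    "no reply": None,
    "rpc error": {"jsonrpc": "2.0", "id": 1, "error": {"code": -32602, "message": "Unknown tool"}},
    "tool failure": {"jsonrpc": "2.0", "id": 2, "result": {"isError": True, "content": []}},
    "success": {"jsonrpc": "2.0", "id": 3, "result": {"content": []}},
}
for label, response in samples.items():
    print(label, classify_tool_outcome(response))
```

&lt;p&gt;Recording the category as a span attribute lets an SLO count transport and protocol failures separately from business errors that the model is expected to handle.&lt;/p&gt;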




&lt;h2&gt;
  
  
  Reading a trace (generic)
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Select the &lt;strong&gt;host&lt;/strong&gt; service; open a recent trace.
&lt;/li&gt;
&lt;li&gt;Descend through orchestration → &lt;strong&gt;LLM spans&lt;/strong&gt; (inference rounds).
&lt;/li&gt;
&lt;li&gt;Find &lt;strong&gt;MCP client&lt;/strong&gt; &lt;code&gt;tools/call&lt;/code&gt; spans between rounds.
&lt;/li&gt;
&lt;li&gt;Filter the &lt;strong&gt;same &lt;code&gt;trace_id&lt;/code&gt;&lt;/strong&gt; on the &lt;strong&gt;tool server&lt;/strong&gt; service; confirm server-side handle + nested tool spans align in time.
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If client and server traces do not share &lt;code&gt;trace_id&lt;/code&gt;, propagation is broken—often stdout pollution on stdio, a second TracerProvider, or missing &lt;code&gt;instrument_mcp()&lt;/code&gt;.&lt;/p&gt;
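
&lt;p&gt;The same check can be run over exported span data instead of by eye. The sketch below assumes spans have been flattened into dictionaries with &lt;code&gt;trace_id&lt;/code&gt;, &lt;code&gt;service&lt;/code&gt;, and &lt;code&gt;name&lt;/code&gt; keys (a hypothetical shape, not a backend's export format) and reports traces where the host issued a &lt;code&gt;tools/call&lt;/code&gt; but the tool server never recorded a span.&lt;/p&gt;

```python
def find_unjoined_traces(spans, host="agent-host", server="mcp-tool-server"):
    """Return trace_ids where the host called a tool but the server never joined."""
    called = set()  # traces where the host issued at least one tools/call
    joined = set()  # traces where the tool server recorded any span
    for span in spans:
        if span["service"] == host and span["name"].startswith("tools/call"):
            called.add(span["trace_id"])
        elif span["service"] == server:
            joined.add(span["trace_id"])
    return sorted(called - joined)


# Illustrative spans: trace "aaa" is joined, trace "bbb" lost its context.
spans = [
    {"trace_id": "aaa", "service": "agent-host", "name": "tools/call lookup_order"},
    {"trace_id": "aaa", "service": "mcp-tool-server", "name": "handle tools/call"},
    {"trace_id": "bbb", "service": "agent-host", "name": "tools/call lookup_order"},
    {"trace_id": "ccc", "service": "mcp-tool-server", "name": "handle tools/call"},
]
print(find_unjoined_traces(spans))
```

&lt;p&gt;In this sample, &lt;code&gt;bbb&lt;/code&gt; is the host side of a broken call, and &lt;code&gt;ccc&lt;/code&gt; is most likely its orphaned server half running under a freshly generated id.&lt;/p&gt;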




&lt;h2&gt;
  
  
  Reference demo (optional)
&lt;/h2&gt;

&lt;p&gt;The linked repository is a &lt;strong&gt;toy vertical slice&lt;/strong&gt;, not a product architecture: CrewAI + Ollama on the host, FastMCP on stdio, three tools, and an artificial slow path that makes latency visible in Jaeger. It exists to make the abstract pattern reproducible in one command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; poetry &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; ./scripts/demo.sh
&lt;span class="c"&gt;# UI: http://localhost:16686&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use it to validate your exporter and propagation; replace frameworks and tools with your own stack—the observability model stays the same.&lt;/p&gt;




&lt;h2&gt;
  
  
  Limits
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Tracing describes &lt;strong&gt;what ran&lt;/strong&gt;, not whether outputs were &lt;strong&gt;correct&lt;/strong&gt; (evals still required).
&lt;/li&gt;
&lt;li&gt;High-cardinality prompt content may need redaction (&lt;code&gt;TRACELOOP_TRACE_CONTENT=false&lt;/code&gt; or equivalent).
&lt;/li&gt;
&lt;li&gt;HTTP MCP adds auth, TLS, and tenancy concerns that stdio demos elide.
&lt;/li&gt;
&lt;li&gt;Span cardinality from framework + model + MCP + manual spans can be heavy; sampling may be necessary at scale.&lt;/li&gt;
&lt;/ul&gt;
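
&lt;p&gt;Where an instrumentation library offers no content switch, redaction can also be applied to span attributes before they are recorded. This is a minimal sketch under stated assumptions: the attribute keys in &lt;code&gt;SENSITIVE_KEYS&lt;/code&gt; and the function &lt;code&gt;redact_attributes&lt;/code&gt; are hypothetical, and real attribute names depend on the instrumentor in use.&lt;/p&gt;

```python
SENSITIVE_KEYS = {"prompt", "completion", "tool.arguments"}  # hypothetical names


def redact_attributes(attributes, sensitive=SENSITIVE_KEYS):
    """Return a copy with sensitive values replaced by a length-only placeholder."""
    redacted = {}
    for key, value in attributes.items():
        if key in sensitive:
            # Keep the size as a debugging signal while dropping the content.
            redacted[key] = f"[redacted {len(str(value))} chars]"
        else:
            redacted[key] = value
    return redacted


attrs = {"prompt": "hello", "tool.name": "lookup_order", "duration_ms": 120}
print(redact_attributes(attrs))
```

&lt;p&gt;Keeping the character count preserves a rough signal about payload size without storing the text itself.&lt;/p&gt;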




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Application observability matured around &lt;strong&gt;request-scoped, deterministic graphs&lt;/strong&gt;. LLM agents introduce &lt;strong&gt;stochastic loops with external actions&lt;/strong&gt;. MCP standardizes those actions as &lt;strong&gt;protocol-level calls across services&lt;/strong&gt;, and SEP-414 carries &lt;strong&gt;trace context&lt;/strong&gt; across that boundary so existing OpenTelemetry pipelines apply. The engineering work is mostly wiring: one telemetry setup per process, MCP instrumentation, correct stdio discipline, and a backend that can display host and server spans under one id.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Code and video: &lt;a href="https://github.com/ekb-dev-ai/mcp-trace-demo" rel="noopener noreferrer"&gt;mcp-trace-demo&lt;/a&gt;. Comments on production patterns for MCP tracing welcome.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>promptengineering</category>
      <category>mcp</category>
      <category>llm</category>
    </item>
  </channel>
</rss>
