Atlas Whoff

OpenTelemetry for AI Agents: Stop Guessing What Your Agent Did

AI agents fail in ways that logs don't capture. The agent called the right function, got a valid response, then produced the wrong output. By the time you notice, the trace is gone.

OpenTelemetry fixes this. Here's the full setup for a Claude-based agent.


The Problem With Console.log Debugging

A typical agent debugging session:

  1. User reports wrong output
  2. You add console.log at suspected failure points
  3. Reproduce the failure (if you can)
  4. Find the log line, add more logs around it
  5. Repeat

This works for synchronous code. For agents that run multi-step workflows, call tools in parallel, or execute asynchronously — it breaks down. You can't correlate log lines across steps without request IDs threaded through every call.

OpenTelemetry gives you distributed tracing: every step of agent execution is a span, spans are linked into a trace, and you can visualize the full execution tree.
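The span/trace model is simple enough to sketch without the SDK. Here is a toy tracer (plain TypeScript, no OTEL dependencies, all names made up) showing how `startActiveSpan`-style nesting produces a parent-linked tree; the real SDK does the same bookkeeping across async boundaries via context propagation, which this synchronous sketch omits:

```typescript
// Toy span model: each span records its parent, so nested calls form a tree.
// Illustration only — not the OTEL SDK.
interface ToySpan {
  name: string
  parent?: string
  children: ToySpan[]
}

class ToyTracer {
  private stack: ToySpan[] = []
  roots: ToySpan[] = []

  // Mirrors the shape of tracer.startActiveSpan(name, fn) for sync callbacks
  startActiveSpan<T>(name: string, fn: (span: ToySpan) => T): T {
    const parent = this.stack[this.stack.length - 1]
    const span: ToySpan = { name, parent: parent?.name, children: [] }
    if (parent) parent.children.push(span)
    else this.roots.push(span)
    this.stack.push(span)
    try {
      return fn(span)
    } finally {
      this.stack.pop()
    }
  }
}

const tracer = new ToyTracer()
tracer.startActiveSpan('agent.run', () => {
  tracer.startActiveSpan('agent.llm_call', () => {})
  tracer.startActiveSpan('agent.tool.search', () => {})
})
```

After this runs, `tracer.roots[0]` is the `agent.run` span with two children — the same tree shape you'll see rendered in Jaeger later.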


Setup: Jaeger + OTEL SDK

Run Jaeger locally:

docker run -d --name jaeger \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 16686:16686 \
  -p 4318:4318 \
  jaegertracing/all-in-one:latest

Install OTEL packages:

npm install @opentelemetry/sdk-node @opentelemetry/auto-instrumentations-node \
  @opentelemetry/exporter-trace-otlp-http @opentelemetry/api

Create the tracer setup. It has to load before any application code so the auto-instrumentations can patch modules like `http` before they're imported:

// instrumentation.ts
import { NodeSDK } from '@opentelemetry/sdk-node'
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http'
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node'
import { Resource } from '@opentelemetry/resources'
import { SEMRESATTRS_SERVICE_NAME } from '@opentelemetry/semantic-conventions'

const sdk = new NodeSDK({
  resource: new Resource({
    [SEMRESATTRS_SERVICE_NAME]: 'claude-agent',
  }),
  traceExporter: new OTLPTraceExporter({
    // This url is the full traces path, so the matching env var is the
    // TRACES-specific one (the generic OTEL_EXPORTER_OTLP_ENDPOINT expects a base URL)
    url: process.env.OTEL_EXPORTER_OTLP_TRACES_ENDPOINT ?? 'http://localhost:4318/v1/traces',
  }),
  instrumentations: [
    getNodeAutoInstrumentations({
      '@opentelemetry/instrumentation-http': { enabled: true },
      // Node's global fetch is backed by undici; the fetch instrumentation is browser-only
      '@opentelemetry/instrumentation-undici': { enabled: true },
    }),
  ],
})

sdk.start()
process.on('SIGTERM', () => sdk.shutdown())
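One way to guarantee the load order (file and directory names here are assumptions — adjust to your build setup):

```shell
# Compile, then preload the tracer so it patches http/undici
# before any application module is evaluated.
npx tsc
node --require ./dist/instrumentation.js ./dist/main.js
```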

Instrumenting the Agent

// lib/agent/traced-agent.ts
import Anthropic from '@anthropic-ai/sdk'
import { trace, SpanStatusCode } from '@opentelemetry/api'

const tracer = trace.getTracer('claude-agent', '1.0.0')
const client = new Anthropic()

interface Tool {
  name: string
  description: string
  input_schema: object
  execute: (input: unknown) => Promise<unknown>
}

export async function runAgent(userMessage: string, tools: Tool[], sessionId: string) {
  return tracer.startActiveSpan('agent.run', async (rootSpan) => {
    rootSpan.setAttributes({
      'agent.session_id': sessionId,
      'agent.user_message': userMessage.slice(0, 200),
    })

    try {
      const messages: Anthropic.MessageParam[] = [{ role: 'user', content: userMessage }]
      let iteration = 0

      while (iteration < 10) {
        const response = await tracer.startActiveSpan('agent.llm_call', async (llmSpan) => {
          llmSpan.setAttributes({
            'llm.model': 'claude-sonnet-4-6',
            'llm.iteration': iteration,
            'llm.message_count': messages.length,
          })

          try {
            const result = await client.messages.create({
              model: 'claude-sonnet-4-6',
              max_tokens: 4096,
              tools: tools.map(t => ({
                name: t.name,
                description: t.description,
                input_schema: t.input_schema as Anthropic.Tool['input_schema'],
              })),
              messages,
            })

            llmSpan.setAttributes({
              'llm.input_tokens': result.usage.input_tokens,
              'llm.output_tokens': result.usage.output_tokens,
              'llm.stop_reason': result.stop_reason ?? '',
            })
            return result
          } catch (err) {
            // Without this, a failed API call leaks an unended span
            llmSpan.setStatus({ code: SpanStatusCode.ERROR, message: String(err) })
            llmSpan.recordException(err as Error)
            throw err
          } finally {
            llmSpan.end()
          }
        })

        if (response.stop_reason === 'end_turn') {
          const output = response.content
            .filter(b => b.type === 'text')
            .map(b => (b as Anthropic.TextBlock).text)
            .join('')
          rootSpan.setAttribute('agent.output', output.slice(0, 500))
          rootSpan.setStatus({ code: SpanStatusCode.OK })
          rootSpan.end()
          return output
        }

        const toolUses = response.content.filter(b => b.type === 'tool_use')
        messages.push({ role: 'assistant', content: response.content })

        const toolResults = await Promise.all(
          toolUses.map(async (block) => {
            const toolBlock = block as Anthropic.ToolUseBlock
            const tool = tools.find(t => t.name === toolBlock.name)

            return tracer.startActiveSpan(`agent.tool.${toolBlock.name}`, async (toolSpan) => {
              toolSpan.setAttributes({
                'tool.name': toolBlock.name,
                'tool.input': JSON.stringify(toolBlock.input).slice(0, 500),
              })

              try {
                // Guard instead of a non-null assertion: an unknown tool name
                // becomes a visible error span, not an opaque TypeError
                if (!tool) throw new Error(`Unknown tool: ${toolBlock.name}`)
                const result = await tool.execute(toolBlock.input)
                toolSpan.setStatus({ code: SpanStatusCode.OK })
                toolSpan.end()
                return {
                  type: 'tool_result' as const,
                  tool_use_id: toolBlock.id,
                  content: JSON.stringify(result),
                }
              } catch (err) {
                toolSpan.setStatus({ code: SpanStatusCode.ERROR, message: String(err) })
                toolSpan.recordException(err as Error)
                toolSpan.end()
                return {
                  type: 'tool_result' as const,
                  tool_use_id: toolBlock.id,
                  content: `Error: ${String(err)}`,
                  is_error: true,
                }
              }
            })
          })
        )

        messages.push({ role: 'user', content: toolResults })
        iteration++
      }

      throw new Error('Max iterations reached')
    } catch (err) {
      rootSpan.setStatus({ code: SpanStatusCode.ERROR, message: String(err) })
      rootSpan.recordException(err as Error)
      rootSpan.end()
      throw err
    }
  })
}
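Anything that satisfies the `Tool` interface plugs into this loop. A minimal sketch (the customer data is a made-up stand-in for a real lookup):

```typescript
// Minimal Tool implementation matching the interface used by runAgent.
interface Tool {
  name: string
  description: string
  input_schema: object
  execute: (input: unknown) => Promise<unknown>
}

// Stand-in data store — replace with a real query in practice
const customers: Record<string, { name: string; plan: string }> = {
  cust_abc: { name: 'Acme Co', plan: 'pro' },
}

const getCustomer: Tool = {
  name: 'get_customer',
  description: 'Look up a customer record by ID',
  input_schema: {
    type: 'object',
    properties: { customer_id: { type: 'string' } },
    required: ['customer_id'],
  },
  async execute(input) {
    const { customer_id } = input as { customer_id: string }
    const record = customers[customer_id]
    if (!record) throw new Error(`No customer: ${customer_id}`)
    return record
  },
}
```

Calling `runAgent(message, [getCustomer], sessionId)` would then emit an `agent.tool.get_customer` span for each invocation, with the thrown error surfacing as a red error span when the ID is unknown.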

What You See in Jaeger

After running a few agent calls, open http://localhost:16686. Select the claude-agent service and pick any trace. You'll see:

agent.run (340ms)
├── agent.llm_call [iteration=0] (210ms)
│   input_tokens=847, output_tokens=312
├── agent.tool.search_documents (45ms)
│   query="invoice #1234"
├── agent.tool.get_customer (23ms)
│   customer_id="cust_abc"
├── agent.llm_call [iteration=1] (180ms)
│   input_tokens=1204, output_tokens=89
└── [end_turn]

When a tool fails, the span turns red. When the LLM loops unexpectedly, you see the iteration count climb. Token costs per session are visible without any extra instrumentation.


Production Considerations

  1. Sample aggressively — trace 10% of traffic, 100% of errors
  2. Redact PII — the example above records raw user messages and tool inputs, which is fine locally; in production, hash or drop user content before it reaches span attributes
  3. Set span limits — truncate large attributes (the 500-char slices above) to prevent attribute size errors at the exporter
  4. Use baggage for session ID — propagate session_id through async boundaries with context.with()
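For item 2, one approach is to hash identifying values before they become attributes. A sketch (the helper name is mine, using Node's built-in crypto):

```typescript
import { createHash } from 'node:crypto'

// Replace raw user content with a stable, non-reversible token before
// attaching it to a span. Same input -> same token, so traces stay
// correlatable across sessions without storing PII.
function redact(value: string): string {
  return 'sha256:' + createHash('sha256').update(value).digest('hex').slice(0, 16)
}

// e.g. rootSpan.setAttribute('agent.user_id', redact(email))
```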

Full Observability Stack

OpenTelemetry traces + structured logs + Stripe event webhooks give you the complete picture of every agent session. This pattern is built into the Workflow Automator MCP — it adds tracing to any Claude agent running in the IDE.

  • Workflow Automator MCP — $15/mo — pre-built OTEL instrumentation for Claude agent loops
  • AI SaaS Starter Kit — $99 one-time — full production agent stack with tracing, auth, and billing

whoffagents.com
