AI agents are distributed systems. They fan out across LLM calls, tool invocations, memory lookups, and multi-step reasoning loops — often asynchronously. But until recently, the observability tooling hadn't caught up. You'd get logs, maybe a dashboard, but no trace of what actually happened across a full agent run.
That's the gap Jaeger v2 is positioned to close — and it's not a stretch.
What actually changed in Jaeger v2
Jaeger v2, released in late 2024, didn't just add features: it rebuilt its internal architecture on the OpenTelemetry Collector framework.
What that means in practice:
- Native OTLP ingestion. No more translation layer from OTLP → Jaeger internal format. Telemetry flows in as-is, with no data loss from conversion.
- Single binary, OTel-native config. The old `jaeger-agent` / `jaeger-collector` / `jaeger-ingester` / `jaeger-query` split is gone. One binary, configured via the same YAML model as the OTel Collector (see the config sketch after this list).
- Access to the full OTel Collector ecosystem. Tail-based sampling, span-to-metric connectors, PII filtering processors, Kafka pipelines: all available without Jaeger maintaining separate implementations.
- Tail-based sampling, previously hard to retrofit, is now first-class via the upstream OTel contrib processor.
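To make the single-binary model concrete, here's a minimal config sketch: one Jaeger v2 process receiving OTLP, tail-sampling for errors, and writing to in-memory storage. The key names follow the sample configs in the Jaeger v2 docs at time of writing and may drift between releases, and `some_storage` is an arbitrary label, so treat this as a shape rather than a copy-paste deployment.

```yaml
# Minimal single-binary Jaeger v2 config: OTLP in, tail-sampled, stored in memory.
# "some_storage" is a placeholder label; swap the memory backend for a real one.
service:
  extensions: [jaeger_storage, jaeger_query]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [tail_sampling, batch]
      exporters: [jaeger_storage_exporter]

extensions:
  jaeger_storage:
    backends:
      some_storage:
        memory:
          max_traces: 100000
  jaeger_query:
    storage:
      traces: some_storage

receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  batch:
  tail_sampling:        # the upstream OTel contrib processor, no Jaeger fork
    policies:
      - name: keep-errors
        type: status_code
        status_code:
          status_codes: [ERROR]

exporters:
  jaeger_storage_exporter:
    trace_storage: some_storage
```

Note that the sampling policy lives in the same YAML as storage and query: that's the OTel Collector pipeline model doing the work, not Jaeger-specific machinery.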
The architecture shift means Jaeger v2 inherits everything OTel ships — including the new GenAI semantic conventions.
The GenAI conventions: tracing AI agents properly
OpenTelemetry is now actively developing semantic conventions specifically for AI workloads. These define how to represent:
- Model spans — individual LLM inference calls (token counts, model name, latency)
- Agent spans — the higher-level reasoning loops and orchestration steps
- Events — prompt inputs, completions, tool call results
- Metrics — token usage, latency distributions, error rates
And coverage is already provider-specific: OpenAI, Anthropic, AWS Bedrock, and Azure AI Inference all have dedicated conventions. There's even a draft for Model Context Protocol (MCP) — so tool calls via MCP-compatible servers can be traced as first-class spans.
These conventions are still in Development status, but the instrumentation is shipping now. Libraries like LangChain, LlamaIndex, and OpenAI's own SDKs are beginning to emit OTel-compatible telemetry. Jaeger v2 — being natively OTLP — can receive all of it.
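Even without a framework, the conventions are straightforward to emit by hand. Here's a minimal Python sketch that sends one GenAI-convention LLM span to a local Jaeger v2 over OTLP. The attribute names come from the Development-status semconv (so they may still change), and the endpoint, tracer name, model, and token counts are placeholders:

```python
# Minimal sketch: emit one GenAI-convention LLM span over OTLP.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Point the exporter at Jaeger v2's OTLP gRPC receiver (default port 4317).
provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317"))
)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("demo-agent")

# Span name follows the semconv's "{operation} {model}" pattern.
with tracer.start_as_current_span("chat gpt-4o") as span:
    span.set_attribute("gen_ai.operation.name", "chat")
    span.set_attribute("gen_ai.system", "openai")
    span.set_attribute("gen_ai.request.model", "gpt-4o")
    # ... call the model here, then record usage from the response:
    span.set_attribute("gen_ai.usage.input_tokens", 152)
    span.set_attribute("gen_ai.usage.output_tokens", 89)
```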
Why this matters for teams building agents
The classic distributed tracing use case is: trace a request across microservices, find the slow hop, fix it. The AI agent version is: trace a user prompt → agent planning span → LLM call → tool invocation → second LLM call → final response. Across potentially different services, with retries, branching, and non-determinism.
Without proper trace context propagation, this is a black box. With OTel GenAI conventions + Jaeger v2, you get the full picture — latency per LLM call, token consumption, which tool calls fired and how long they took, where the reasoning went sideways.
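In code, that full picture is just span nesting. Continuing with the tracer from the earlier sketch, here's a hypothetical agent run; the `invoke_agent` and `execute_tool` operation names follow the agent-span part of the semconv (still in Development), while the agent and tool names are made up:

```python
# Hypothetical agent run: one root agent span with nested LLM and tool spans.
# Nesting via start_as_current_span is what gives Jaeger the full tree.
with tracer.start_as_current_span("invoke_agent research-agent") as agent:
    agent.set_attribute("gen_ai.operation.name", "invoke_agent")

    with tracer.start_as_current_span("chat gpt-4o"):
        pass  # planning call: model decides which tool to use

    with tracer.start_as_current_span("execute_tool web_search") as tool:
        tool.set_attribute("gen_ai.tool.name", "web_search")
        # tool runs here; retries would appear as sibling spans

    with tracer.start_as_current_span("chat gpt-4o"):
        pass  # synthesis call: model turns tool output into the final answer
```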
That's debugging capability that didn't exist in a standardised form until now.
What to do
- Already on Jaeger v1? Check the v1→v2 migration guide. The architecture shift is real, but the storage backends are backward-compatible.
- Building AI agents? Start instrumenting with OTel GenAI semconv now, even in Development status. You'll be ahead of the curve when it stabilises, and Jaeger v2 will ingest it today.
- Using LangChain/LlamaIndex/OpenAI SDKs? Check their OTel instrumentation status; several already support it or ship experimental packages (see the OpenAI sketch after this list).
- Not on Jaeger? The GenAI conventions are backend-agnostic. Any OTLP-compatible backend (Grafana Tempo, Honeycomb, etc.) can receive this telemetry.
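For the OpenAI SDK specifically, the OTel Python contrib project ships an experimental instrumentation package. Assuming `opentelemetry-instrumentation-openai-v2` is installed (package and import names may change while the semconv is in Development), enabling it is two lines:

```python
# Experimental: auto-instrument the OpenAI client so chat/completion calls
# emit GenAI-convention spans through whatever tracer provider is configured.
from opentelemetry.instrumentation.openai_v2 import OpenAIInstrumentor

OpenAIInstrumentor().instrument()
```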
Sources: The New Stack · Jaeger v2 release post · OTel GenAI semantic conventions
✏️ Drafted with KewBot (AI), edited and approved by Drew.