Deep Dive into OpenTelemetry 1.20's New AI Observability Features and How to Use Them With Vector 0.40
AI workloads, from large language models (LLMs) to computer vision pipelines, introduce unique observability challenges: tracking token usage, model latency, prompt/response quality, and inference errors requires specialized telemetry. OpenTelemetry (OTel) 1.20 addresses these gaps with experimental AI/ML semantic conventions, while Vector 0.40 adds native support for these new signals, enabling end-to-end observability for AI systems. This guide walks through the new features and a step-by-step integration.
What's New in OpenTelemetry 1.20 for AI Observability
OTel 1.20's standout addition for AI workloads is the initial release of generative AI semantic conventions, part of the experimental gen_ai namespace. These conventions standardize how AI-related telemetry is captured across languages and frameworks, so the same instrumentation works with any OTel-compatible backend rather than locking you into one vendor's schema. Key additions include:
- Trace attributes for LLM requests/responses: Attributes like gen_ai.system (e.g., "openai", "anthropic"), gen_ai.request.model (e.g., "gpt-4"), gen_ai.request.input_tokens, gen_ai.response.output_tokens, and gen_ai.response.error let you track per-request AI resource usage and failures (a manual-instrumentation sketch follows this list).
- Metrics for AI workloads: New metric instruments such as gen_ai.server.requests (count of inference requests), gen_ai.server.latency (request duration), and gen_ai.server.tokens.total (aggregate input + output tokens) provide out-of-the-box visibility into AI system performance.
- Updated SDK support: All core OTel SDKs (Java, Python, Go, JavaScript) now include experimental helpers to populate gen_ai attributes automatically when instrumenting popular AI libraries like LangChain, OpenAI's Python SDK, and Hugging Face Transformers.
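If a library you use isn't covered by an auto-instrumentation helper yet, the same attributes can be set by hand. Here's a minimal Python sketch using the attribute names above; the span name and token counts are placeholders, and since set_attribute accepts arbitrary keys, this works even while the conventions are experimental:
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

# Manually attach gen_ai.* attributes to a span wrapping an LLM call
with tracer.start_as_current_span("llm_request") as span:
    span.set_attribute("gen_ai.system", "openai")
    span.set_attribute("gen_ai.request.model", "gpt-4")
    # ... invoke the model here ...
    span.set_attribute("gen_ai.request.input_tokens", 42)      # placeholder count
    span.set_attribute("gen_ai.response.output_tokens", 128)   # placeholder count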
Vector 0.40: Enhanced OTel and AI Signal Support
Vector 0.40 builds on its existing OpenTelemetry integration with two critical updates for AI observability:
- Native OTLP 1.20 compatibility: Vector's OTLP source and sink now support the latest OTLP spec version used in OTel 1.20, including full parsing of gen_ai semantic convention attributes.
- New AI-specific transforms: The remap transform in Vector 0.40 includes prebuilt functions to extract, filter, and enrich gen_ai telemetry, such as calculating cost per request using token counts and model pricing, or redacting sensitive prompt data (a redaction sketch follows this list).
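As a flavor of what that looks like, here is a minimal remap sketch for redaction. The .gen_ai.prompt and .gen_ai.completion field names are hypothetical stand-ins for wherever your pipeline stores prompt content; exists() is standard VRL:
transforms:
  redact_prompts:
    type: remap
    inputs: [otel]
    source: |
      # Replace raw prompt/response text while keeping token counts and model metadata.
      # NOTE: .gen_ai.prompt / .gen_ai.completion are hypothetical field names;
      # adjust them to match how your spans are decoded into Vector events.
      if exists(.gen_ai.prompt) { .gen_ai.prompt = "[REDACTED]" }
      if exists(.gen_ai.completion) { .gen_ai.completion = "[REDACTED]" }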
Step-by-Step Integration Guide
We'll use a simple Python LLM app instrumented with OTel 1.20, an OTel Collector to batch and export telemetry, and Vector 0.40 to process and route signals to backends. Prerequisites include:
- Python 3.9+ installed
- Docker (for running OTel Collector and Vector)
- An OpenAI API key (or local LLM endpoint) for the sample app
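The sample app reads the key from the OPENAI_API_KEY environment variable, so set it in your shell before running anything:
export OPENAI_API_KEY="sk-..."   # replace with your actual key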
1. Instrument the AI App with OTel 1.20
First, install the required Python packages:
pip install opentelemetry-sdk opentelemetry-exporter-otlp opentelemetry-instrumentation-openai
Next, create a sample app that calls an LLM and populates gen_ai attributes:
import os
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.openai import OpenAIInstrumentor
from openai import OpenAI
# Initialize OTel tracer
provider = TracerProvider()
processor = BatchSpanProcessor(OTLPSpanExporter(endpoint="localhost:4317", insecure=True))
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)
# Instrument OpenAI SDK (auto-populates gen_ai attributes)
OpenAIInstrumentor().instrument()
# Sample LLM call
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("llm_chat_request"):
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Explain OpenTelemetry in 2 sentences."}],
    )
    print(response.choices[0].message.content)
The OpenAIInstrumentor in OTel 1.20 automatically adds gen_ai attributes like gen_ai.request.model and gen_ai.response.output_tokens to the trace span.
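To confirm locally that these attributes are attached before involving the Collector, you can temporarily add a console exporter alongside the OTLP one:
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Debugging aid only: print each finished span, including its gen_ai.* attributes,
# to stdout. Remove it once the pipeline is verified.
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))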
2. Configure OpenTelemetry Collector
Create an OTel Collector config (otel-collector-config.yaml) to receive telemetry from the app and export it to Vector via OTLP:
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
http:
endpoint: 0.0.0.0:4318
exporters:
  otlp/vector:
    endpoint: vector:4317
    tls:
      insecure: true
processors:
  batch:
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/vector]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/vector]
Create a shared Docker network (so the Collector can reach Vector and Jaeger by hostname), then run the Collector:
docker network create otel-demo
docker run -d --name otel-collector --network otel-demo -p 4317:4317 -p 4318:4318 -v $(pwd)/otel-collector-config.yaml:/etc/otel-collector-config.yaml otel/opentelemetry-collector:1.20.0 --config /etc/otel-collector-config.yaml
3. Configure Vector 0.40
Create a Vector config (vector-config.yaml) to receive OTLP telemetry from the Collector, process gen_ai attributes, and export to a Jaeger instance for traces and Prometheus for metrics:
sources:
otel:
type: otlp
address: 0.0.0.0:4317
protocol: grpc
transforms:
enrich_ai_telemetry:
type: remap
inputs: [otel]
source: |
      # Add a cost estimate for OpenAI gpt-3.5-turbo requests:
      # $0.0015 per 1K input tokens and $0.002 per 1K output tokens.
      if .gen_ai.system == "openai" && .gen_ai.request.model == "gpt-3.5-turbo" {
        input_tokens = to_int(.gen_ai.request.input_tokens) ?? 0
        output_tokens = to_int(.gen_ai.response.output_tokens) ?? 0
        .gen_ai.estimated_cost_usd = (input_tokens * 0.0000015) + (output_tokens * 0.000002)
      }
sinks:
jaeger:
type: jaeger
inputs: [enrich_ai_telemetry]
endpoint: jaeger:14250
protocol: grpc
prometheus:
type: prometheus
inputs: [enrich_ai_telemetry]
address: 0.0.0.0:9090
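Before launching the container, you can sanity-check the file with Vector's built-in validator:
vector validate vector-config.yaml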
Run Vector 0.40 via Docker on the same network (no host mapping for 4317 is needed, since only the Collector talks to Vector, and mapping it would clash with the Collector's published port):
docker run -d --name vector --network otel-demo -p 9090:9090 -v $(pwd)/vector-config.yaml:/etc/vector/vector.yaml timberio/vector:0.40.0-debian --config /etc/vector/vector.yaml
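The sink above points traces at a host named jaeger; if you don't already have one running, a Jaeger all-in-one container on the same network works for testing (image name and ports are the standard Jaeger defaults):
docker run -d --name jaeger --network otel-demo -p 16686:16686 jaegertracing/all-in-one:latest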
4. Validate the Pipeline
Run the sample Python app, then check Jaeger (http://localhost:16686) for traces with gen_ai attributes, and Prometheus (http://localhost:9090) for metrics like gen_ai_server_requests_total. You should see the custom gen_ai_estimated_cost_usd attribute in Vector-enriched spans.
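If the Prometheus sink exposes a scrape endpoint on the mapped port as configured above, you can also confirm metrics are flowing with a direct curl:
curl -s http://localhost:9090/metrics | grep gen_ai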
Best Practices for AI Observability with OTel and Vector
- Sample high-volume AI telemetry: LLM requests can generate high trace volume; use OTel's probabilistic sampling or Vector's sample transform to reduce costs (see the sketch after this list).
- Redact sensitive data: Use Vector's remap transform to strip sensitive prompt/response content from spans before exporting to backends.
- Align metrics with business KPIs: Use Vector to aggregate gen_ai token metrics into cost per user, per model, or per feature to tie technical telemetry to business outcomes.
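A minimal sampling sketch for Vector (the transform keeps roughly 1 in every rate events; the transform name and rate are illustrative):
transforms:
  sample_ai_traces:
    type: sample
    inputs: [otel]
    rate: 10   # forward roughly 1 in 10 events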
Conclusion
OpenTelemetry 1.20's new AI semantic conventions and Vector 0.40's enhanced processing capabilities remove the guesswork from AI observability. By standardizing telemetry capture and enabling flexible pipeline configuration, teams can monitor LLM performance, control costs, and troubleshoot issues across their entire AI stack. As the gen_ai conventions move from experimental to stable, this integration will become a core part of any production AI system's observability strategy.