Deep Dive into OpenTelemetry 1.20's New AI Observability Features and How to Use Them With Vector 0.40
AI workloads, from large language models (LLMs) to computer vision pipelines, introduce unique observability challenges: tracking token usage, model latency, prompt/response quality, and inference errors requires specialized telemetry. OpenTelemetry (OTel) 1.20 addresses these gaps with experimental AI/ML semantic conventions, while Vector 0.40 adds native support for these new signals, enabling end-to-end observability for AI systems. This guide walks through the new features and a step-by-step integration.
What's New in OpenTelemetry 1.20 for AI Observability
OTel 1.20's standout addition for AI workloads is the initial release of generative AI semantic conventions, part of the experimental gen_ai namespace. These conventions standardize how AI-related telemetry is captured across languages and frameworks, so the same instrumentation works with any OTel-compatible backend rather than locking you into one vendor's schema. Key additions include:
- Trace attributes for LLM requests/responses: Attributes like gen_ai.system (e.g., "openai", "anthropic"), gen_ai.request.model (e.g., "gpt-4"), gen_ai.request.input_tokens, gen_ai.response.output_tokens, and gen_ai.response.error let you track per-request AI resource usage and failures (a manual-instrumentation sketch follows this list).
- Metrics for AI workloads: New metric instruments such as gen_ai.server.requests (count of inference requests), gen_ai.server.latency (request duration), and gen_ai.server.tokens.total (aggregate input + output tokens) provide out-of-the-box visibility into AI system performance.
- Updated SDK support: All core OTel SDKs (Java, Python, Go, JavaScript) now include experimental helpers to populate gen_ai attributes automatically when instrumenting popular AI libraries like LangChain, OpenAI's Python SDK, and Hugging Face Transformers.
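If a library you use isn't covered by an auto-instrumentation helper yet, the same attributes can be set by hand. Here's a minimal Python sketch using the attribute names above; the span name and token counts are placeholders, and since set_attribute accepts arbitrary keys, this works even while the conventions are experimental:
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

# Manually attach gen_ai.* attributes to a span wrapping an LLM call
with tracer.start_as_current_span("llm_request") as span:
    span.set_attribute("gen_ai.system", "openai")
    span.set_attribute("gen_ai.request.model", "gpt-4")
    # ... invoke the model here ...
    span.set_attribute("gen_ai.request.input_tokens", 42)      # placeholder count
    span.set_attribute("gen_ai.response.output_tokens", 128)   # placeholder count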
Vector 0.40: Enhanced OTel and AI Signal Support
Vector 0.40 builds on its existing OpenTelemetry integration with two critical updates for AI observability:
- Native OTLP 1.20 compatibility: Vector's OTLP source and sink now support the latest OTLP spec version used in OTel 1.20, including full parsing of gen_ai semantic convention attributes.
- New AI-specific transforms: The remap transform in Vector 0.40 includes prebuilt functions to extract, filter, and enrich gen_ai telemetry, such as calculating cost per request using token counts and model pricing, or redacting sensitive prompt data (a redaction sketch follows this list).
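As a flavor of what that looks like, here is a minimal remap sketch for redaction. The .gen_ai.prompt and .gen_ai.completion field names are hypothetical stand-ins for wherever your pipeline stores prompt content; exists() is standard VRL:
transforms:
  redact_prompts:
    type: remap
    inputs: [otel]
    source: |
      # Replace raw prompt/response text while keeping token counts and model metadata.
      # NOTE: .gen_ai.prompt / .gen_ai.completion are hypothetical field names;
      # adjust them to match how your spans are decoded into Vector events.
      if exists(.gen_ai.prompt) { .gen_ai.prompt = "[REDACTED]" }
      if exists(.gen_ai.completion) { .gen_ai.completion = "[REDACTED]" }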
Step-by-Step Integration Guide
We'll use a simple Python LLM app instrumented with OTel 1.20, an OTel Collector to batch and export telemetry, and Vector 0.40 to process and route signals to backends. Prerequisites include:
- Python 3.9+ installed
- Docker (for running OTel Collector and Vector)
- An OpenAI API key (or local LLM endpoint) for the sample app
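The sample app reads the key from the OPENAI_API_KEY environment variable, so set it in your shell before running anything:
export OPENAI_API_KEY="sk-..."   # replace with your actual key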
1. Instrument the AI App with OTel 1.20
First, install the required Python packages:
pip install opentelemetry-sdk opentelemetry-exporter-otlp opentelemetry-instrumentation-openai
Next, create a sample app that calls an LLM and populates gen_ai attributes:
import os
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.openai import OpenAIInstrumentor
from openai import OpenAI
# Initialize OTel tracer
provider = TracerProvider()
processor = BatchSpanProcessor(OTLPSpanExporter(endpoint="localhost:4317", insecure=True))
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)
# Instrument OpenAI SDK (auto-populates gen_ai attributes)
OpenAIInstrumentor().instrument()
# Sample LLM call
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("llm_chat_request"):
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Explain OpenTelemetry in 2 sentences."}],
    )
    print(response.choices[0].message.content)
The OpenAIInstrumentor in OTel 1.20 automatically adds gen_ai attributes like gen_ai.request.model and gen_ai.response.output_tokens to the trace span.
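To confirm locally that these attributes are attached before involving the Collector, you can temporarily add a console exporter alongside the OTLP one:
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Debugging aid only: print each finished span, including its gen_ai.* attributes,
# to stdout. Remove it once the pipeline is verified.
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))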
2. Configure OpenTelemetry Collector
Create an OTel Collector config (otel-collector-config.yaml) to receive telemetry from the app and export it to Vector via OTLP:
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
http:
endpoint: 0.0.0.0:4318
exporters:
  otlp/vector:
    endpoint: vector:4317
    tls:
      insecure: true
processors:
  batch:
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/vector]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/vector]
Create a shared Docker network (so the Collector can reach Vector and Jaeger by hostname), then run the Collector:
docker network create otel-demo
docker run -d --name otel-collector --network otel-demo -p 4317:4317 -p 4318:4318 -v $(pwd)/otel-collector-config.yaml:/etc/otel-collector-config.yaml otel/opentelemetry-collector:1.20.0 --config /etc/otel-collector-config.yaml
3. Configure Vector 0.40
Create a Vector config (vector-config.yaml) to receive OTLP telemetry from the Collector, process gen_ai attributes, and export to a Jaeger instance for traces and Prometheus for metrics:
sources:
otel:
type: otlp
address: 0.0.0.0:4317
protocol: grpc
transforms:
enrich_ai_telemetry:
type: remap
inputs: [otel]
source: |
      # Add a cost estimate for OpenAI gpt-3.5-turbo requests:
      # $0.0015 per 1K input tokens and $0.002 per 1K output tokens.
      if .gen_ai.system == "openai" && .gen_ai.request.model == "gpt-3.5-turbo" {
        input_tokens = to_int(.gen_ai.request.input_tokens) ?? 0
        output_tokens = to_int(.gen_ai.response.output_tokens) ?? 0
        .gen_ai.estimated_cost_usd = (input_tokens * 0.0000015) + (output_tokens * 0.000002)
      }
sinks:
jaeger:
type: jaeger
inputs: [enrich_ai_telemetry]
endpoint: jaeger:14250
protocol: grpc
prometheus:
type: prometheus
inputs: [enrich_ai_telemetry]
address: 0.0.0.0:9090
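Before launching the container, you can sanity-check the file with Vector's built-in validator:
vector validate vector-config.yaml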
Run Vector 0.40 via Docker on the same network (no host mapping for 4317 is needed, since only the Collector talks to Vector, and mapping it would clash with the Collector's published port):
docker run -d --name vector --network otel-demo -p 9090:9090 -v $(pwd)/vector-config.yaml:/etc/vector/vector.yaml timberio/vector:0.40.0-debian --config /etc/vector/vector.yaml
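The sink above points traces at a host named jaeger; if you don't already have one running, a Jaeger all-in-one container on the same network works for testing (image name and ports are the standard Jaeger defaults):
docker run -d --name jaeger --network otel-demo -p 16686:16686 jaegertracing/all-in-one:latest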
4. Validate the Pipeline
Run the sample Python app, then check Jaeger (http://localhost:16686) for traces with gen_ai attributes, and Prometheus (http://localhost:9090) for metrics like gen_ai_server_requests_total. You should see the custom gen_ai_estimated_cost_usd attribute in Vector-enriched spans.
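If the Prometheus sink exposes a scrape endpoint on the mapped port as configured above, you can also confirm metrics are flowing with a direct curl:
curl -s http://localhost:9090/metrics | grep gen_ai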
Best Practices for AI Observability with OTel and Vector
- Sample high-volume AI telemetry: LLM requests can generate high trace volume; use OTel's probabilistic sampling or Vector's sample transform to reduce costs (see the sketch after this list).
- Redact sensitive data: Use Vector's remap transform to strip sensitive prompt/response content from spans before exporting to backends.
- Align metrics with business KPIs: Use Vector to aggregate gen_ai token metrics into cost per user, per model, or per feature to tie technical telemetry to business outcomes.
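A minimal sampling sketch for Vector (the transform keeps roughly 1 in every rate events; the transform name and rate are illustrative):
transforms:
  sample_ai_traces:
    type: sample
    inputs: [otel]
    rate: 10   # forward roughly 1 in 10 events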
Conclusion
OpenTelemetry 1.20's new AI semantic conventions and Vector 0.40's enhanced processing capabilities remove the guesswork from AI observability. By standardizing telemetry capture and enabling flexible pipeline configuration, teams can monitor LLM performance, control costs, and troubleshoot issues across their entire AI stack. As the gen_ai conventions move from experimental to stable, this integration will become a core part of any production AI system's observability strategy.