AI is powerful. But why does observing it matter?
AI agents are increasingly common in production. However, they often behave like black boxes: we send a prompt, we get a response… but what happens in between?
In this post, I’ll show why instrumenting AI agents matters more than ever, using open source tools like OpenTelemetry and the Grafana LGTM stack — that is, Loki, Grafana, Tempo, and Mimir (and yes, also “Looks Good To Me”! 😉).
The challenge
LLMs and AI agents are 'black box' systems:
- Complex, non-deterministic behavior
- Internal 'reasoning' is invisible at runtime
- Hard to explain, trust, or debug outputs
🛡️ “You can’t govern or secure what you can’t observe.” (paraphrased)
An open standards-based approach
The OWASP Agent Observability Standard (AOS)
OWASP AOS aims to bring standardized observability to AI systems:
Mission: Transform AI agents from black boxes into trustworthy systems through standardized observability.
- Instrumentation – via OpenTelemetry
- Traceability – end-to-end execution flow
- Inspectability – insights on inner workings
OpenTelemetry
OpenTelemetry is a CNCF project that provides a standardized framework for telemetry:
- Logs (timestamped events)
- Metrics (numerical KPIs)
- Traces (end-to-end execution paths)
It supports manual and auto instrumentation, and works with many backends — including the Grafana LGTM stack.
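As a minimal sketch of what the Python API looks like for one of these signals (the meter and counter names below are illustrative, not from any standard):

from opentelemetry import metrics

# Get a meter from the globally configured MeterProvider
meter = metrics.get_meter("demo.meter")

# Count LLM calls as a numerical KPI (illustrative instrument name)
llm_calls = meter.create_counter("llm.calls", description="Number of LLM calls")
llm_calls.add(1, {"model": "gpt-4o-mini"})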
💻 Manual instrumentation (code-based)
from openai import OpenAI
from opentelemetry.instrumentation.openai_v2 import OpenAIInstrumentor

# Patch the OpenAI client library so every call emits spans
OpenAIInstrumentor().instrument()

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "user",
            "content": "Write a short poem on OpenTelemetry.",
        }
    ],
)
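The instrumentor emits spans through the global tracer provider, so one must be configured with an exporter before calling instrument(). A minimal sketch, assuming the Collector from the sections below is reachable at otel-collector:4317 (the endpoint and service name are assumptions):

from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Run this before OpenAIInstrumentor().instrument(); endpoint is an assumption
provider = TracerProvider(resource=Resource.create({"service.name": "openai-demo"}))
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://otel-collector:4317"))
)
trace.set_tracer_provider(provider)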
⚡ Auto instrumentation (zero code)
Alternatively, you can instrument a Python app without touching its code, for example from a Dockerfile:
FROM python:3.13-alpine
ENV OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
RUN pip install --no-cache-dir \
    fastapi \
    uvicorn \
    opentelemetry-distro \
    opentelemetry-instrumentation
# Install the instrumentation packages matching the installed libraries
RUN opentelemetry-bootstrap -a install
COPY main.py .
CMD ["opentelemetry-instrument", "uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
📦 The OpenTelemetry Collector
The Collector is the key component for receiving, processing, and exporting telemetry data. A typical configuration:
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  hostmetrics: # requires the otel-collector-contrib distribution
    root_path: "/hostfs"
    collection_interval: 10s

processors:
  batch:
  resourcedetection:
    detectors: ["env", "system"]

exporters:
  otlp/traces:
    endpoint: tempo:4317
    tls:
      insecure: true # Tempo's OTLP gRPC receiver here is plaintext
  otlphttp/metrics:
    endpoint: http://mimir:9009/otlp
  otlphttp/logs:
    endpoint: http://loki:3100/otlp

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch, resourcedetection]
      exporters: [otlp/traces]
    metrics:
      receivers: [otlp, hostmetrics]
      processors: [batch, resourcedetection]
      exporters: [otlphttp/metrics]
    logs:
      receivers: [otlp]
      processors: [batch, resourcedetection]
      exporters: [otlphttp/logs]
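To run it, a docker-compose service along these lines works (a sketch: the image tag, file names, and mounts are assumptions, not from the original post):

services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest  # contrib build, needed for hostmetrics
    command: ["--config=/etc/otelcol/config.yaml"]
    volumes:
      - ./otel-collector-config.yaml:/etc/otelcol/config.yaml
      - /:/hostfs:ro  # host filesystem mount used by the hostmetrics receiver's root_path
    ports:
      - "4317:4317"  # OTLP gRPC
      - "4318:4318"  # OTLP HTTP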
Observability with OpenTelemetry and Grafana LGTM
Once data flows through the collector, the Grafana LGTM stack lets you explore:
- Logs in Loki
- Metrics in Mimir
- Traces in Tempo
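For example, assuming the instrumentation follows OpenTelemetry's (still experimental) GenAI semantic conventions, token usage ends up in Mimir under a name like gen_ai_client_token_usage, and a Grafana panel could chart it with a query along these lines:

# PromQL sketch — metric and label names are assumptions derived from the
# gen_ai.client.token.usage semantic convention
sum by (gen_ai_token_type) (rate(gen_ai_client_token_usage_sum[5m]))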
PydanticAI agent instrumentation
PydanticAI is a Pythonic AI agent framework with native OpenTelemetry support:
from pydantic_ai import Agent

agent = Agent(
    'google-gla:gemini-2.0-flash',
    system_prompt='Be concise, reply with one sentence.',
    instrument=True,  # emit spans via the globally configured OpenTelemetry providers
)

result = agent.run_sync('Where does "hello world" come from?')
print(result.output)
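Instrumentation can also be enabled globally instead of per agent; spans then flow through the same globally configured OpenTelemetry providers as in the manual example above:

from pydantic_ai import Agent

# Enable OpenTelemetry instrumentation for every agent in the process
Agent.instrument_all()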
📈 Inspect telemetry data
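In Tempo, for example, a TraceQL query along these lines could surface the agent's LLM calls (a sketch: the attribute name follows the GenAI semantic conventions and is an assumption):

{ .gen_ai.operation.name = "chat" }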
🤔 Final thoughts & takeaways
Even the simplest “hello world” AI agent can be observed via metrics and traces, revealing how complex and resource-hungry LLM calls are, and how much even trivial agents benefit from structured observability for debugging and trust building.
- No business, no logic – The AI agent is kept intentionally simple, a basic “hello world”, because the primary goal is to validate the power of observability in the context of agentic AI, not to showcase advanced intelligence.
- Analytical and empirical evaluation – With OpenTelemetry and Grafana, it's possible to analyze the agent both quantitatively (through metrics and traces) and qualitatively (by observing its flow), demonstrating the real value of observability in monitoring and understanding AI systems.
- Conscious use of AI – Sometimes “a cannon is too much for a fly”: even a minimal AI agent benefits greatly from structured observability — it’s essential for understanding, debugging, and building trust. At the same time, it reveals a key insight: generating trivial output can come with a high computational cost, especially when relying on LLMs.