AI is powerful. But why does observing it matter?
AI agents are increasingly common in production. However, they often behave like black boxes: we send a prompt, we get a response… but what happens in between?
In this post, I’ll show why instrumenting AI agents matters more than ever, using open source tools like OpenTelemetry and the Grafana LGTM stack — that is, Loki, Grafana, Tempo, and Mimir (and yes, also “Looks Good To Me”! 😉).
The challenge
LLMs and AI agents are 'black box' systems:
- Complex, non-deterministic behavior
- Internal 'reasoning' is invisible at runtime
- Hard to explain, trust, or debug outputs
🛡️ “You can’t govern or secure what you can’t observe.” (paraphrased)
An open standards-based approach
The OWASP Agent Observability Standard (AOS)
OWASP AOS aims to bring standardized observability to AI systems:
Mission: Transform AI agents from black boxes into trustworthy systems through standardized observability.
- Instrumentation – via OpenTelemetry
- Traceability – end-to-end execution flow
- Inspectability – insights on inner workings
OpenTelemetry
OpenTelemetry is a CNCF project that provides a standardized framework for telemetry:
- Logs (timestamped events)
- Metrics (numerical KPIs)
- Traces (end-to-end execution paths)
It supports manual and auto instrumentation, and works with many backends — including the Grafana LGTM stack.
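As a minimal sketch of what the Python API looks like for one of these signals (the meter and counter names below are illustrative, not from any standard):

from opentelemetry import metrics

# Get a meter from the globally configured MeterProvider
meter = metrics.get_meter("demo.meter")

# Count LLM calls as a numerical KPI (illustrative instrument name)
llm_calls = meter.create_counter("llm.calls", description="Number of LLM calls")
llm_calls.add(1, {"model": "gpt-4o-mini"})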
💻 Manual instrumentation (code-based)
from openai import OpenAI
from opentelemetry.instrumentation.openai_v2 import OpenAIInstrumentor

# Patch the OpenAI client library so every call emits spans
OpenAIInstrumentor().instrument()

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "user",
            "content": "Write a short poem on OpenTelemetry.",
        }
    ],
)
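The instrumentor emits spans through the global tracer provider, so one must be configured with an exporter before calling instrument(). A minimal sketch, assuming the Collector from the sections below is reachable at otel-collector:4317 (the endpoint and service name are assumptions):

from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Run this before OpenAIInstrumentor().instrument(); endpoint is an assumption
provider = TracerProvider(resource=Resource.create({"service.name": "openai-demo"}))
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://otel-collector:4317"))
)
trace.set_tracer_provider(provider)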
⚡ Auto instrumentation (zero code)
Alternatively, you can instrument a Python app without touching its code, for example from a Dockerfile:
FROM python:3.13-alpine
ENV OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
RUN pip install --no-cache-dir \
    fastapi \
    uvicorn \
    opentelemetry-distro \
    opentelemetry-instrumentation
# Install the instrumentation packages matching the installed libraries
RUN opentelemetry-bootstrap -a install
COPY main.py .
CMD ["opentelemetry-instrument", "uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
📦 The OpenTelemetry Collector
The Collector is the key component for receiving, processing, and exporting telemetry data. A typical configuration:
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  hostmetrics: # requires the otel-collector-contrib distribution
    root_path: "/hostfs"
    collection_interval: 10s

processors:
  batch:
  resourcedetection:
    detectors: ["env", "system"]

exporters:
  otlp/traces:
    endpoint: tempo:4317
    tls:
      insecure: true # Tempo's OTLP gRPC receiver here is plaintext
  otlphttp/metrics:
    endpoint: http://mimir:9009/otlp
  otlphttp/logs:
    endpoint: http://loki:3100/otlp

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch, resourcedetection]
      exporters: [otlp/traces]
    metrics:
      receivers: [otlp, hostmetrics]
      processors: [batch, resourcedetection]
      exporters: [otlphttp/metrics]
    logs:
      receivers: [otlp]
      processors: [batch, resourcedetection]
      exporters: [otlphttp/logs]
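To run it, a docker-compose service along these lines works (a sketch: the image tag, file names, and mounts are assumptions, not from the original post):

services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest  # contrib build, needed for hostmetrics
    command: ["--config=/etc/otelcol/config.yaml"]
    volumes:
      - ./otel-collector-config.yaml:/etc/otelcol/config.yaml
      - /:/hostfs:ro  # host filesystem mount used by the hostmetrics receiver's root_path
    ports:
      - "4317:4317"  # OTLP gRPC
      - "4318:4318"  # OTLP HTTP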
Observability with OpenTelemetry and Grafana LGTM
Once data flows through the collector, the Grafana LGTM stack lets you explore:
- Logs in Loki
- Metrics in Mimir
- Traces in Tempo
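For example, assuming the instrumentation follows OpenTelemetry's (still experimental) GenAI semantic conventions, token usage ends up in Mimir under a name like gen_ai_client_token_usage, and a Grafana panel could chart it with a query along these lines:

# PromQL sketch — metric and label names are assumptions derived from the
# gen_ai.client.token.usage semantic convention
sum by (gen_ai_token_type) (rate(gen_ai_client_token_usage_sum[5m]))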
PydanticAI agent instrumentation
PydanticAI is a Pythonic AI agent framework with native OpenTelemetry support:
from pydantic_ai import Agent

agent = Agent(
    'google-gla:gemini-2.0-flash',
    system_prompt='Be concise, reply with one sentence.',
    instrument=True,  # emit spans via the globally configured OpenTelemetry providers
)

result = agent.run_sync('Where does "hello world" come from?')
print(result.output)
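Instrumentation can also be enabled globally instead of per agent; spans then flow through the same globally configured OpenTelemetry providers as in the manual example above:

from pydantic_ai import Agent

# Enable OpenTelemetry instrumentation for every agent in the process
Agent.instrument_all()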
📈 Inspect telemetry data
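In Tempo, for example, a TraceQL query along these lines could surface the agent's LLM calls (a sketch: the attribute name follows the GenAI semantic conventions and is an assumption):

{ .gen_ai.operation.name = "chat" }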
🤔 Final thoughts & takeaways
Even the simplest “hello world” AI agent can be observed via metrics and traces, revealing how complex and resource-hungry LLM calls are, and how much even trivial agents benefit from structured observability for debugging and trust building.
- No business, no logic – The AI agent is kept intentionally simple, a basic “hello world”, because the primary goal is to validate the power of observability in the context of agentic AI, not to showcase advanced intelligence.
- Analytical and empirical evaluation – With OpenTelemetry and Grafana, it's possible to analyze the agent both quantitatively (through metrics and traces) and qualitatively (by observing its flow), demonstrating the real value of observability in monitoring and understanding AI systems.
- Conscious use of AI – Sometimes “a cannon is too much for a fly”: even a minimal AI agent benefits greatly from structured observability — it’s essential for understanding, debugging, and building trust. At the same time, it reveals a key insight: generating trivial output can come with a high computational cost, especially when relying on LLMs.