
Jason Shotwell


I Built a Flight Recorder for AI Agents — Here's What It Catches

AIR Blackbox is an open-source observability layer that records, replays, and enforces policies on every LLM call your AI agents make. Try the live demo.

Your AI agent just sent an email, called an API, or moved money. Someone asks: "Show me exactly what it saw and why it made that decision."

Can you answer that today?

If you're running AI agents in production — LangChain chains, CrewAI crews, OpenAI function calling, AutoGen teams — you've probably hit this wall. The agent did something. Logs show fragments. Token spend spiked. But there's no complete record of the full decision chain.

We built AIR Blackbox to fix that. It's the flight recorder for autonomous AI agents — like the black box on an airplane, but for your LLM calls.

Try it right now (no install needed): Live Interactive Demo


What Does the Demo Show?

The hosted demo lets you run four scenarios and watch what happens inside the AIR Blackbox pipeline in real time:

Scenario 1: Normal Request

A standard agent request flows through the system. You'll see:

  • The Gateway intercepts the call and assigns a trace ID
  • The Policy Engine evaluates it (rate limit, budget cap, tool restrictions)
  • The LLM responds
  • The OTel Collector captures cost, latency, and PII scan results
  • The Episode Store records the full interaction as a replayable episode

Everything green. This is the happy path.

Scenario 2: Runaway Loop 🔴

This is the one that saves you money. An agent gets stuck making the same request over and over — "Check order status #4521" five times in a row.

The OTel Collector detects the repeated pattern at request 3. By request 4, it triggers the kill switch. Request 5 gets blocked before it ever reaches the LLM.

Estimated savings in the demo: $47. In production, we've seen runaway agents burn through hundreds of dollars in minutes.
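
The escalation in this scenario can be sketched in a few lines. This is only an illustration, not AIR Blackbox's actual processor: the `LoopDetector` class, its thresholds (flag on the 3rd identical request in a row, trip on the 4th), and the exact-match hashing are assumptions chosen to mirror the numbers in the demo.

```python
import hashlib


class LoopDetector:
    """Tracks consecutive identical prompts per agent (hypothetical sketch)."""

    def __init__(self, warn_at: int = 3, trip_at: int = 4):
        self.warn_at = warn_at
        self.trip_at = trip_at
        self.last = {}       # agent_id -> (prompt digest, current streak length)
        self.killed = set()  # agents whose kill switch has tripped

    def observe(self, agent_id: str, prompt: str) -> str:
        # Once the kill switch has tripped, nothing else gets through.
        if agent_id in self.killed:
            return "blocked"
        digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        prev_digest, streak = self.last.get(agent_id, (None, 0))
        streak = streak + 1 if digest == prev_digest else 1
        self.last[agent_id] = (digest, streak)
        if streak >= self.trip_at:
            self.killed.add(agent_id)
            return "kill_switch"
        if streak >= self.warn_at:
            return "loop_detected"
        return "ok"


detector = LoopDetector()
decisions = [
    detector.observe("agent-1", "Check order status #4521") for _ in range(5)
]
print(decisions)
# ['ok', 'ok', 'loop_detected', 'kill_switch', 'blocked']
```

Exact-match hashing only catches byte-identical repeats. A production detector would likely need fuzzier matching for prompts that differ by a timestamp or a retry counter.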

Scenario 3: PII in Prompt

An agent sends a prompt containing an email address, a Social Security number, and a credit card number. This happens more often than you'd think — agents pulling data from databases or CRMs and stuffing it into prompts.

The OTel Collector detects all three PII fields and redacts them before the trace reaches your observability backend. The redacted fields are hashed so you can still correlate across traces without storing raw sensitive data.
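
To make the redact-and-hash idea concrete, here is a minimal sketch. The regular expressions, the `redact` function, the token format, and the placeholder key are all assumptions for illustration; the real collector processor may detect and hash PII differently. A keyed HMAC is used rather than a bare hash because values like Social Security numbers have too little entropy to survive a brute-force guess against an unkeyed digest.

```python
import hashlib
import hmac
import re

# Deliberately simple patterns; real PII detection needs broader coverage.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "credit_card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text: str, key: bytes = b"replace-with-a-secret-key"):
    """Replace each PII match with a keyed-hash token; return (text, labels)."""
    found = []

    def make_replacer(label):
        def replacer(match):
            found.append(label)
            digest = hmac.new(
                key, match.group(0).encode("utf-8"), hashlib.sha256
            ).hexdigest()
            # The same raw value always maps to the same token, so traces
            # can still be correlated without storing the value itself.
            return f"[{label}:{digest[:12]}]"
        return replacer

    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(make_replacer(label), text)
    return text, found


prompt = "Contact jane@example.com, SSN 123-45-6789, card 4111 1111 1111 1111."
clean, kinds = redact(prompt)
print(clean)
print(kinds)
```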

Scenario 4: Dangerous Tool Call

An agent tries to execute rm -rf /var/data/* via a tool call. The Policy Engine blocks it instantly — the tool is on the restricted list and it's a destructive filesystem operation. The request never reaches the LLM. Cost: $0.00.
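
A check like this one might look roughly as follows. The tool names, the pattern list, and the `evaluate_tool_call` function are hypothetical; they are not taken from the AIR Blackbox Policy Engine, which loads its rules from your own configuration.

```python
import re
from dataclasses import dataclass, field

# Hypothetical policy data for illustration only.
RESTRICTED_TOOLS = {"shell", "filesystem_delete"}
DESTRUCTIVE_PATTERNS = [
    re.compile(r"\brm\s+-\w*r"),  # recursive rm, e.g. "rm -rf"
    re.compile(r"\bmkfs\b"),      # formatting a filesystem
    re.compile(r"\bdd\s+if="),    # raw disk writes
]


@dataclass
class Decision:
    allowed: bool
    reasons: list = field(default_factory=list)


def evaluate_tool_call(tool: str, command: str) -> Decision:
    """Deny the call if the tool is restricted or the command looks destructive."""
    reasons = []
    if tool in RESTRICTED_TOOLS:
        reasons.append("restricted_tool")
    if any(pattern.search(command) for pattern in DESTRUCTIVE_PATTERNS):
        reasons.append("destructive_operation")
    return Decision(allowed=not reasons, reasons=reasons)


blocked = evaluate_tool_call("shell", "rm -rf /var/data/*")
allowed = evaluate_tool_call("order_lookup", "status 4521")
print(blocked)
print(allowed)
```

Pattern denylists are easy to evade (aliases, encoded commands), so a sketch like this works best as a backstop behind an explicit allowlist of tools.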


How It Works

AIR Blackbox sits between your AI agent and the LLM provider as an OpenAI-compatible proxy:

Your Agent → Gateway → Policy Engine → LLM Provider
                ↓              ↓
          OTel Collector   Episode Store

The Gateway (Go) intercepts every LLM call and produces structured OpenTelemetry traces. It's OpenAI-compatible, so your agents don't need code changes — just point them at a different base URL.
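
Here is what "point them at a different base URL" means at the HTTP level, using only the standard library. The gateway address `http://localhost:8080/v1` is an assumption (check your own deployment for the real host and port), and `build_chat_request` is a helper invented for this example. The request is built but never sent.

```python
import json
import urllib.request

OPENAI_BASE_URL = "https://api.openai.com/v1"
# Assumption: where your gateway listens. Adjust to match your deployment.
GATEWAY_BASE_URL = "http://localhost:8080/v1"


def build_chat_request(base_url: str, api_key: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request without sending it."""
    body = {
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


direct = build_chat_request(OPENAI_BASE_URL, "sk-placeholder", "Summarize Q4 revenue")
proxied = build_chat_request(GATEWAY_BASE_URL, "sk-placeholder", "Summarize Q4 revenue")

# Same payload and headers; only the destination changes.
print(direct.full_url)
print(proxied.full_url)
```

With the official `openai` Python client, the equivalent change is passing `base_url=` when constructing `OpenAI(...)`.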

The Policy Engine (Python/FastAPI) evaluates every request against your rules in real time. Rate limits, budget caps, tool restrictions, content matching — all configurable.
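
As a rough sketch of how a rate limit and a budget cap could be evaluated per request: the `Policy` and `AgentState` classes, their field names, and the sliding 60-second window below are assumptions for illustration, not the engine's actual rule schema.

```python
import time
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Policy:
    max_requests_per_minute: int = 60
    budget_cap_usd: float = 10.0


@dataclass
class AgentState:
    spent_usd: float = 0.0
    recent: deque = field(default_factory=deque)  # timestamps of allowed requests


def check(policy: Policy, state: AgentState, estimated_cost_usd: float, now=None):
    """Return (allowed, reason) and update state only when the request is allowed."""
    now = time.monotonic() if now is None else now
    # Drop timestamps that have aged out of the 60-second window.
    while state.recent and now - state.recent[0] >= 60:
        state.recent.popleft()
    if len(state.recent) >= policy.max_requests_per_minute:
        return False, "rate_limit"
    if state.spent_usd + estimated_cost_usd > policy.budget_cap_usd:
        return False, "budget_cap"
    state.recent.append(now)
    state.spent_usd += estimated_cost_usd
    return True, "ok"


policy = Policy(max_requests_per_minute=2, budget_cap_usd=1.0)
state = AgentState()
results = [
    check(policy, state, 0.4, now=0),   # allowed
    check(policy, state, 0.4, now=1),   # allowed, $0.80 spent so far
    check(policy, state, 0.1, now=2),   # third request inside the window
    check(policy, state, 0.4, now=61),  # window is clear, but over budget
]
print(results)
```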

The OTel Collector runs custom processors for:

  • PII redaction — scrub sensitive data before it hits your trace backend
  • Semantic normalization — consistent attribute names across providers
  • Cost tracking — per-request and cumulative spend
  • Loop detection — catch runaway agents before they drain your budget
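
Two of these processors, semantic normalization and cost tracking, can be illustrated together. The sketch below is written in Python for readability and is not the collector's code. The normalized key names, the `CostTracker` class, and the prices are hypothetical. Only the provider-side field names (`prompt_tokens`/`completion_tokens` from OpenAI, `input_tokens`/`output_tokens` from Anthropic) come from those providers' response formats.

```python
# Map provider-specific usage fields onto one pair of names.
USAGE_ALIASES = {
    "prompt_tokens": "input_tokens",       # OpenAI-style
    "completion_tokens": "output_tokens",  # OpenAI-style
    "input_tokens": "input_tokens",        # Anthropic-style
    "output_tokens": "output_tokens",      # Anthropic-style
}

# Hypothetical prices in USD per 1M tokens; substitute your provider's rates.
PRICES = {"example-model": {"input": 1.00, "output": 2.00}}


def normalize_usage(raw: dict) -> dict:
    """Return usage with consistent key names, ignoring unknown fields."""
    usage = {"input_tokens": 0, "output_tokens": 0}
    for key, value in raw.items():
        if key in USAGE_ALIASES:
            usage[USAGE_ALIASES[key]] = value
    return usage


class CostTracker:
    """Tracks per-request and cumulative spend."""

    def __init__(self):
        self.total_usd = 0.0

    def record(self, model: str, raw_usage: dict) -> float:
        usage = normalize_usage(raw_usage)
        price = PRICES[model]
        cost = (
            usage["input_tokens"] * price["input"]
            + usage["output_tokens"] * price["output"]
        ) / 1_000_000
        self.total_usd += cost
        return cost


tracker = CostTracker()
first = tracker.record(
    "example-model",
    {"prompt_tokens": 1000, "completion_tokens": 500, "total_tokens": 1500},
)
second = tracker.record("example-model", {"input_tokens": 1000, "output_tokens": 500})
print(first, second, tracker.total_usd)
```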

The Episode Store (Python/FastAPI) groups raw traces into task-level episodes. Think of it like a DVR for your agent — you can rewind and replay exactly what happened during an incident.
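
The grouping step can be pictured as follows. The `Span` fields, the `episode_id` key, and both functions are assumptions made for this sketch; the Episode Store's real schema and API are not shown here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    episode_id: str    # hypothetical key tying a span to one agent task
    started_at: float  # seconds since some reference point
    name: str


def group_into_episodes(spans):
    """Bucket spans by episode and order each bucket chronologically."""
    episodes = defaultdict(list)
    for span in spans:
        episodes[span.episode_id].append(span)
    return {
        episode_id: sorted(items, key=lambda s: s.started_at)
        for episode_id, items in episodes.items()
    }


def replay(episode):
    """Return the step names in the order they happened."""
    return [span.name for span in episode]


# Spans often arrive out of order and interleaved across tasks.
spans = [
    Span("task-A", 2.0, "llm.call"),
    Span("task-B", 1.5, "policy.check"),
    Span("task-A", 1.0, "policy.check"),
    Span("task-A", 3.0, "tool.call"),
]
episodes = group_into_episodes(spans)
print(replay(episodes["task-A"]))
# ['policy.check', 'llm.call', 'tool.call']
```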


Get Running in 5 Minutes

git clone https://github.com/airblackbox/gateway.git
cd gateway
cp .env.example .env   # add your OPENAI_API_KEY
docker compose up --build

Then install the Python SDK:

pip install air-blackbox-sdk

And wrap your OpenAI client:

from openai import OpenAI
import air

client = air.air_wrap(OpenAI())

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize Q4 revenue"}],
)
# Every call is now recorded with a full audit trail

That's it. One wrapper function. Your agent code stays the same.


Framework Integrations

AIR Blackbox works with whatever you're already using:

# LangChain
from air.integrations.langchain import air_langchain_llm
llm = air_langchain_llm("gpt-4o-mini")

# CrewAI
from air.integrations.crewai import air_crewai_llm
llm = air_crewai_llm("gpt-4o-mini")

# OpenAI Agents SDK
from air.integrations.openai_agents import air_openai_agents_provider
provider = air_openai_agents_provider()

# AutoGen
from air.integrations.autogen import air_autogen_config
config = air_autogen_config("gpt-4o-mini")

What It's Not

AIR Blackbox is not an agent framework. It doesn't build agents, orchestrate tasks, or manage prompts. It's infrastructure — the observability and governance layer for teams that already have agents running and need to answer:

  • What did the agent do?
  • Why did it make that decision?
  • Did it leak any sensitive data?
  • How much did it cost?
  • Can I replay the incident?

The Stack

| Component      | Language | What It Does                                 |
|----------------|----------|----------------------------------------------|
| Gateway        | Go       | OpenAI-compatible proxy, OTel trace emission |
| Policy Engine  | Python   | Real-time policy evaluation, kill switches   |
| Episode Store  | Python   | Trace → episode grouping, replay             |
| OTel Collector | Go       | PII redaction, cost metrics, loop detection  |
| Python SDK     | Python   | air_wrap() + framework integrations          |
| Platform       | Docker   | One-command full stack deployment            |

22 repos. 700+ tests. CI on every push. Apache-2.0.


Try the Demo

Launch the interactive demo →

Run all four scenarios. Watch the traces light up. Then clone the repo and try it with your own agents.

If you have questions about the architecture, the OTel pipeline, or how to write custom policies, drop a comment or open a discussion on GitHub.


AIR Blackbox is open-source under Apache 2.0. Star us on GitHub if this is useful to you.
