Your AI agent is running in production. It's calling APIs, making decisions, spending money. And right now, if it goes sideways — stuck in a loop, hallucinating tool calls, burning through your API budget — your only option is to manually kill the process and hope nothing broke.
This tutorial adds a real kill switch. Five minutes, and a single-line change to your agent.
What we're building
A reverse proxy that sits between your agent and the LLM provider. Every request flows through it. You define policies in YAML — and when a policy triggers, the request gets blocked before it ever reaches the model.
By the end of this tutorial, your agent will automatically shut down if it:
- Exceeds a token budget
- Makes too many requests in a time window (loop detection)
- Tries to call a restricted tool
- Hits a risk threshold you define
Prerequisites
- Docker and Docker Compose installed
- An OpenAI API key (or any OpenAI-compatible provider)
- An AI agent that uses the OpenAI API format
Step 1: Clone and start the stack
```bash
git clone https://github.com/airblackbox/air-platform.git
cd air-platform
cp .env.example .env
```
Open .env and add your API key:
```bash
OPENAI_API_KEY=sk-your-key-here
```
Start everything:
```bash
make up
```
Six services start in about 8 seconds. The important one right now is the Gateway running on localhost:8080.
Step 2: Point your agent at the Gateway
This is the only change you make to your agent code. Instead of calling OpenAI directly, point your base_url at the Gateway.
Python (OpenAI SDK):
```python
from openai import OpenAI

# Before — calls OpenAI directly
# client = OpenAI()

# After — calls through AIR Blackbox Gateway
# (the API key is still read from the OPENAI_API_KEY environment variable)
client = OpenAI(base_url="http://localhost:8080/v1")
```
LangChain:
```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    openai_api_base="http://localhost:8080/v1",
    model="gpt-4o",
)
```
CrewAI:
```python
import os

os.environ["OPENAI_API_BASE"] = "http://localhost:8080/v1"
# CrewAI picks it up automatically
```
curl:
```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```
That's it. Your agent works exactly the same — but now every call flows through the Gateway.
Step 3: Define your kill switch policies
Open config/policies.yaml. This is where you define the rules. Here's a starter policy that covers the most common failure modes:
```yaml
policies:
  # Kill switch: stop runaway loops
  - name: loop-detector
    description: "Kill agent if it makes more than 50 requests in 60 seconds"
    trigger:
      type: rate-limit
      max_requests: 50
      window_seconds: 60
    action: block
    alert: true

  # Kill switch: budget cap
  - name: budget-cap
    description: "Kill agent if it exceeds $5 in token spend"
    trigger:
      type: cost-limit
      max_cost_usd: 5.00
    action: block
    alert: true

  # Kill switch: restrict dangerous tools
  - name: tool-restriction
    description: "Block agent from executing shell commands"
    trigger:
      type: tool-call
      blocked_tools:
        - "execute_command"
        - "run_shell"
        - "delete_file"
    action: block
    alert: true

  # Risk tiers: require human approval for high-risk actions
  - name: high-risk-gate
    description: "Flag requests that involve payments or external APIs"
    trigger:
      type: content-match
      patterns:
        - "payment"
        - "transfer"
        - "external_api"
    action: flag
    risk_tier: critical
```
Save the file. The Policy Engine picks up changes automatically — no restart needed.
Step 4: Test the kill switch
Let's trigger the loop detector intentionally. Run this script:
```python
from openai import OpenAI
import time

client = OpenAI(base_url="http://localhost:8080/v1")

# Simulate a runaway agent — rapid repeated calls
for i in range(60):
    try:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": f"Request {i}"}]
        )
        print(f"Request {i}: OK")
    except Exception as e:
        print(f"Request {i}: BLOCKED — {e}")
        break
    # Keep the pause short so 50 calls land inside the 60-second window
    time.sleep(0.1)
```
You should see requests go through normally, then get blocked once the rate limit triggers. The agent is stopped. Your budget is safe.
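When a policy fires, your agent sees an API error. The one thing it must not do is treat that error as transient and retry, which would defeat the kill switch. A minimal guard, assuming the Gateway signals blocks with 403 or 429 status codes (check your deployment's actual responses before relying on this):

```python
import sys

# Status codes treated as kill-switch triggers. These are assumptions:
# verify which codes your Gateway returns for policy blocks and rate
# limits, and adjust the set accordingly.
KILL_CODES = {403, 429}


def guard(status_code: int, detail: str = "") -> None:
    """Exit the agent process when the Gateway has blocked a request."""
    if status_code in KILL_CODES:
        print(f"Kill switch tripped ({status_code}): {detail}", file=sys.stderr)
        raise SystemExit(1)
```

Call `guard(e.status_code, str(e))` in the `except` branch of your request loop so a blocked request ends the run instead of being retried forever.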
Step 5: See what happened
Open Jaeger to see the full trace of every request your agent made:
http://localhost:16686
Open Prometheus for cost and request metrics:
http://localhost:9091
Open the Episode Store API to replay the full sequence:
http://localhost:8081/episodes
Every request is recorded with the full context — what the agent sent, what it received, how long it took, how much it cost. If something went wrong, you can replay the entire episode step by step.
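You can also pull episodes programmatically. Here's a small sketch using only the standard library; the `/episodes` path comes from the step above, but the response fields (`cost_usd` below) are assumptions — adapt them to whatever your Episode Store actually returns:

```python
import json
from urllib.request import urlopen


def fetch_episodes(base_url: str = "http://localhost:8081"):
    """Fetch recorded episodes from the Episode Store API."""
    with urlopen(f"{base_url}/episodes") as resp:
        return json.load(resp)


def total_cost(episodes) -> float:
    """Sum per-episode spend. The 'cost_usd' field name is an assumption."""
    return sum(e.get("cost_usd", 0.0) for e in episodes)
```

This makes post-incident review scriptable: fetch the episodes from a run, sum the spend, and diff the tool calls against what you expected.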
What you just built
In about 5 minutes, you added:
- Loop detection — runaway agents get killed automatically
- Budget caps — no more surprise API bills
- Tool restrictions — agents can't call dangerous functions
- Risk tiers — high-risk actions get flagged for human review
- Full audit trail — every decision recorded and replayable
And you didn't change a single line of your agent's core logic. The kill switch lives in the infrastructure layer, where it belongs.
Going further
Custom policies: The Policy Engine supports YAML-based rules for any pattern you need. You can block specific models, restrict token counts per request, require human approval for specific tool calls, and more.
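For instance, a per-request token cap could follow the same shape as the starter policies above. The trigger type and field names here are guesses modeled on that schema — verify the exact keys against the Policy Engine's documentation before using this:

```yaml
policies:
  - name: token-cap
    description: "Block any single request over 8000 tokens"
    trigger:
      type: token-limit            # assumed trigger type; check the schema
      max_tokens_per_request: 8000 # assumed field name
    action: block
    alert: true
```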
Framework plugins: If you want deeper integration, there are trust plugins for CrewAI, LangChain, AutoGen, and OpenAI Agents SDK that add trust scoring and policy enforcement at the framework level.
MCP Security: If you're using Model Context Protocol, the MCP Security Scanner audits your MCP server configs and the MCP Policy Gateway adds policy enforcement to MCP tool calls.
The full platform is open source under Apache 2.0: github.com/airblackbox
AIR Blackbox is a flight recorder for AI agents — record every decision, replay every incident, enforce every policy. If your agents are making decisions in production, they need a black box.