Jason Shotwell

How to Add a Kill Switch to Your AI Agent in 5 Minutes

Your AI agent is running in production. It's calling APIs, making decisions, spending money. And right now, if it goes sideways — stuck in a loop, hallucinating tool calls, burning through your API budget — your only option is to manually kill the process and hope nothing broke.

This tutorial adds a real kill switch. Five minutes, and the only change to your agent's code is one line: its base URL.

What we're building

A reverse proxy that sits between your agent and the LLM provider. Every request flows through it. You define policies in YAML — and when a policy triggers, the request gets blocked before it ever reaches the model.

By the end of this tutorial, your agent will automatically shut down if it:

  • Exceeds a token budget
  • Makes too many requests in a time window (loop detection)
  • Tries to call a restricted tool
  • Hits a risk threshold you define

Prerequisites

  • Docker and Docker Compose installed
  • An OpenAI API key (or any OpenAI-compatible provider)
  • An AI agent that uses the OpenAI API format

Step 1: Clone and start the stack

```bash
git clone https://github.com/airblackbox/air-platform.git
cd air-platform
cp .env.example .env
```

Open .env and add your API key:

```
OPENAI_API_KEY=sk-your-key-here
```

Start everything:

```bash
make up
```

Six services start in about 8 seconds. The important one right now is the Gateway running on localhost:8080.

Step 2: Point your agent at the Gateway

This is the only change you make to your agent code. Instead of calling OpenAI directly, point your base_url at the Gateway.

Python (OpenAI SDK):

```python
from openai import OpenAI

# Before — calls OpenAI directly
# client = OpenAI()

# After — calls through AIR Blackbox Gateway
client = OpenAI(base_url="http://localhost:8080/v1")
```

LangChain:

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    openai_api_base="http://localhost:8080/v1",
    model="gpt-4o"
)
```

CrewAI:

```python
import os
os.environ["OPENAI_API_BASE"] = "http://localhost:8080/v1"
# CrewAI picks it up automatically
```

curl:

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```

That's it. Your agent works exactly the same — but now every call flows through the Gateway.
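One convenient pattern on top of this: read the base URL from an environment variable so you can flip the Gateway on and off without editing code. This is my own sketch, not part of the platform — `AIR_GATEWAY_URL` is a name I made up for illustration:

```python
import os

# Provider default used when the hypothetical AIR_GATEWAY_URL var is unset.
DEFAULT_BASE_URL = "https://api.openai.com/v1"

def resolve_base_url() -> str:
    """Return the Gateway URL if configured, otherwise the provider default."""
    return os.environ.get("AIR_GATEWAY_URL", DEFAULT_BASE_URL)

# Then construct the client once:
# client = OpenAI(base_url=resolve_base_url())
```

Set `AIR_GATEWAY_URL=http://localhost:8080/v1` in your agent's environment to route through the Gateway; unset it to talk to the provider directly.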

Step 3: Define your kill switch policies

Open config/policies.yaml. This is where you define the rules. Here's a starter policy that covers the most common failure modes:

```yaml
policies:
  # Kill switch: stop runaway loops
  - name: loop-detector
    description: "Kill agent if it makes more than 50 requests in 60 seconds"
    trigger:
      type: rate-limit
      max_requests: 50
      window_seconds: 60
    action: block
    alert: true

  # Kill switch: budget cap
  - name: budget-cap
    description: "Kill agent if it exceeds $5 in token spend"
    trigger:
      type: cost-limit
      max_cost_usd: 5.00
    action: block
    alert: true

  # Kill switch: restrict dangerous tools
  - name: tool-restriction
    description: "Block agent from executing shell commands"
    trigger:
      type: tool-call
      blocked_tools:
        - "execute_command"
        - "run_shell"
        - "delete_file"
    action: block
    alert: true

  # Risk tiers: require human approval for high-risk actions
  - name: high-risk-gate
    description: "Flag requests that involve payments or external APIs"
    trigger:
      type: content-match
      patterns:
        - "payment"
        - "transfer"
        - "external_api"
    action: flag
    risk_tier: critical
```

Save the file. The Policy Engine picks up changes automatically — no restart needed.
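Since a typo in this file silently changes what gets enforced, it can be worth sanity-checking an edit before you save. This helper is my own sketch, not a platform feature — it only assumes each policy entry needs `name`, `trigger` (with a `type`), and `action`, the shape used in the starter file above:

```python
REQUIRED_KEYS = {"name", "trigger", "action"}

def validate_policy(policy: dict) -> list[str]:
    """Return a list of problems found in one policy entry (empty means it looks OK)."""
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS - policy.keys()]
    trigger = policy.get("trigger")
    # Every trigger in the starter file carries a type field.
    if isinstance(trigger, dict) and "type" not in trigger:
        problems.append("trigger has no type")
    return problems
```

Run it over the parsed YAML before deploying; the real Policy Engine presumably does its own validation, but failing fast locally is cheaper.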

Step 4: Test the kill switch

Let's trigger the loop detector intentionally. Run this script:

```python
from openai import OpenAI
import time

client = OpenAI(base_url="http://localhost:8080/v1")

# Simulate a runaway agent — rapid repeated calls
for i in range(60):
    try:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": f"Request {i}"}]
        )
        print(f"Request {i}: OK")
    except Exception as e:
        print(f"Request {i}: BLOCKED — {e}")
        break
    time.sleep(0.5)
```

You should see requests go through normally, then get blocked once the rate limit triggers. The agent is stopped. Your budget is safe.
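In a real agent you'll want to decide up front what happens when that exception lands. A minimal pattern is to wrap each LLM call so a blocked request ends the loop cleanly instead of crashing mid-task. This is a generic sketch — how a blocked request surfaces (status code, error message) depends on the Gateway, so check the actual error before keying logic off it:

```python
def run_step_with_kill_switch(make_request):
    """Run one agent step; return (should_continue, result).

    make_request is any zero-argument callable that performs the LLM call.
    """
    try:
        return True, make_request()
    except Exception as e:
        # Assumption: a policy block surfaces as an API error on the client.
        # Log it, stop the loop, and let a human inspect the trace.
        print(f"Kill switch engaged, stopping agent: {e}")
        return False, None

# Usage sketch:
# running = True
# while running:
#     running, result = run_step_with_kill_switch(
#         lambda: client.chat.completions.create(...)
#     )
```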

Step 5: See what happened

Open Jaeger to see the full trace of every request your agent made:

http://localhost:16686

Open Prometheus for cost and request metrics:

http://localhost:9091

Open the Episode Store API to replay the full sequence:

http://localhost:8081/episodes

Every request is recorded with the full context — what the agent sent, what it received, how long it took, how much it cost. If something went wrong, you can replay the entire episode step by step.
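To get a feel for what you can do with that data, here's a sketch that rolls an episode up into totals a human can eyeball. The record layout below is invented for illustration — inspect the real `/episodes` response for the actual field names:

```python
# Hypothetical episode records; field names are illustrative only.
episode = [
    {"step": 1, "model": "gpt-4o", "latency_ms": 420, "cost_usd": 0.012},
    {"step": 2, "model": "gpt-4o", "latency_ms": 380, "cost_usd": 0.009},
    {"step": 3, "model": "gpt-4o", "latency_ms": 510, "cost_usd": 0.014},
]

def summarize(steps: list) -> dict:
    """Aggregate an episode's per-step records into headline totals."""
    return {
        "steps": len(steps),
        "total_cost_usd": round(sum(s["cost_usd"] for s in steps), 4),
        "total_latency_ms": sum(s["latency_ms"] for s in steps),
    }

print(summarize(episode))
```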

What you just built

In about 5 minutes, you added:

  • Loop detection — runaway agents get killed automatically
  • Budget caps — no more surprise API bills
  • Tool restrictions — agents can't call dangerous functions
  • Risk tiers — high-risk actions get flagged for human review
  • Full audit trail — every decision recorded and replayable

And you didn't change a single line of your agent's core logic. The kill switch lives in the infrastructure layer, where it belongs.

Going further

Custom policies: The Policy Engine supports YAML-based rules for any pattern you need. You can block specific models, restrict token counts per request, require human approval for specific tool calls, and more.
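For instance, a per-request token ceiling might look like the sketch below. The trigger type and field names follow the shape of the starter file but are my assumptions — check the Policy Engine docs for the exact schema:

```yaml
policies:
  - name: token-ceiling
    description: "Block any single request over 8k prompt tokens"
    trigger:
      type: token-limit          # assumed trigger type; verify against the schema
      max_tokens_per_request: 8000
    action: block
    alert: true
```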

Framework plugins: If you want deeper integration, there are trust plugins for CrewAI, LangChain, AutoGen, and OpenAI Agents SDK that add trust scoring and policy enforcement at the framework level.

MCP Security: If you're using Model Context Protocol, the MCP Security Scanner audits your MCP server configs and the MCP Policy Gateway adds policy enforcement to MCP tool calls.

The full platform is open source under Apache 2.0: github.com/airblackbox


AIR Blackbox is a flight recorder for AI agents — record every decision, replay every incident, enforce every policy. If your agents are making decisions in production, they need a black box.
