The first serious control we want in any production AI agent is not a prettier trace.
It is the ability to stop the agent.
Not eventually. Not after someone opens a dashboard, reads twenty logs, squints at a span waterfall, and asks whether the behavior is “expected.”
We mean stop it now.
Because agent failures are weird. They rarely look like clean infrastructure failures. The server can be healthy. The model can be responsive. The queue can be draining. The logs can be boring.
Meanwhile, the agent is looping tool calls, burning API budget, denying every legitimate request, escalating harmless workflows, or preparing to send a beautifully formatted email to exactly the wrong person.
That is why we built theaios-agent-monitor: governance-first observability for AI agents. It records agent events, computes rolling metrics, tracks baselines, detects anomalies, triggers alerts, supports compliance export, and gives operators scoped kill switches for agents, sessions, and global emergencies.
The kill switch is the hook. But the kill switch is not a button floating in space.
A real kill switch needs three things beneath it:
- Monitoring — structured events that describe what the agent is doing.
- Anomaly detection — metrics and baselines that tell us when behavior has drifted.
- Control — scoped policies that stop unsafe behavior before a human has to manually intervene.
This is the pattern:
Agent action
-> record event
-> compute rolling metrics
-> update baseline
-> detect anomaly
-> trigger alert or kill policy
-> block future work if killed
A dashboard watches.
A kill switch governs.
That distinction matters.
The problem: most agents are observable but not controllable
A lot of agent stacks have tracing now. That is good.
We want traces. We want logs. We want spans. We want cost reports. We want dashboards that tell the story after something goes wrong.
But observability alone does not stop anything.
If an agent starts spending too much, the dashboard will show the spend rising. If a prompt injection causes guardrails to fire repeatedly, the logs may record the denials. If a tool loop begins, the trace may become very interesting.
Interesting is not safe.
The production question is sharper:
Can the system stop the agent before the blast radius grows?
That is the job of the kill switch.
The implementation should be boring, explicit, and close to runtime. The monitor records every meaningful agent event. Metrics roll up over a window. Baselines learn normal behavior per agent and metric. Anomaly rules detect weird behavior. Kill policies enforce hard stops when thresholds are crossed.
This is how we move from “we can inspect what happened” to “we can contain what is happening.”
What we learned: stop treating agent behavior like logs
The first mistake is treating agent activity as incidental logging.
Logs are prose. Events are contracts.
A log says something happened. An event says what happened, which agent did it, when it happened, what it cost, how long it took, which session it belonged to, and what metadata we need for investigation.
That matters because the kill switch cannot reason over vibes. It needs metrics. Metrics need events.
So the architecture starts with a simple rule:
Every meaningful agent operation becomes an event.
LLM call? Event.
Tool call? Event.
Guardrail denial? Event.
Approval request? Event.
Error? Event.
This is not paperwork. This is the raw material for control.
Once the events exist, we can compute rolling metrics like cost_per_minute, denial_rate, event_count, error_count, and avg_latency_ms. Once metrics exist, we can establish baselines. Once baselines exist, we can detect anomalies. Once anomalies and thresholds exist, we can stop the agent.
The kill switch is only as good as the signal feeding it.
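To make that chain concrete, here is a rough sketch of what a rolling metric means here, computed from a plain list of events. This is illustrative only: the library maintains these metrics itself, and SimpleEvent, rolling_metrics, and the denial-rate definition below are stand-ins for the idea, not its API.

import time
from dataclasses import dataclass


@dataclass
class SimpleEvent:
    timestamp: float
    event_type: str      # "action", "denial", "error", ...
    cost_usd: float = 0.0


def rolling_metrics(events: list[SimpleEvent], window_seconds: float = 300.0) -> dict:
    # Keep only events inside the rolling window, then summarize them.
    now = time.time()
    recent = [e for e in events if now - e.timestamp <= window_seconds]
    denials = sum(1 for e in recent if e.event_type == "denial")
    return {
        "event_count": len(recent),
        "cost_per_minute": sum(e.cost_usd for e in recent) / (window_seconds / 60.0),
        "denial_rate": denials / len(recent) if recent else 0.0,
        "error_count": sum(1 for e in recent if e.event_type == "error"),
    }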
The architecture: monitor, detect, stop
The core architecture is deliberately simple:
Agent runtime
-> AgentEvent
-> Monitor.record(...)
-> rolling metrics
-> baselines
-> anomaly detection
-> kill switch policies
-> alerts and audit evidence
Each layer has one job.
The event layer records behavior.
The metrics layer summarizes live behavior over a rolling window.
The baseline layer learns normal behavior per agent and metric.
The anomaly layer detects statistical drift.
The kill switch layer enforces hard containment.
The alert layer tells operators what happened.
The compliance layer turns behavior into evidence.
None of this needs to be exotic. In fact, it should not be. The control plane for agents should be easy to understand when everyone is tired and the incident channel is moving too fast.
Now let’s build the pattern.
Step 1: Install the monitor
Start with the package:
pip install theaios-agent-monitor
Then import the runtime pieces:
import time
from theaios.agent_monitor import AgentEvent, Monitor, load_config
The three imports matter:
Monitor is the runtime control plane.
load_config loads the YAML policy.
AgentEvent is the structured event envelope.
We do not want arbitrary log strings to become our governance interface. We want typed operational facts that the monitor can measure.
Step 2: Write the config with the kill switch already present
Do not bolt the kill switch on later.
If the agent can call tools, spend money, mutate state, message users, touch private data, or trigger external workflows, the kill switch belongs in the first production config.
Here is a minimal production-ready starting point:
# monitor.yaml
version: "1.0"
metadata:
  name: production-agent-monitor
  description: Production agent monitoring
metrics:
  default_window_seconds: 300
  max_window_seconds: 3600
baselines:
  enabled: true
  min_samples: 30
  metrics:
    - denial_rate
    - error_count
    - cost_per_minute
    - avg_latency_ms
  storage_path: .agent_monitor/baselines.json
anomaly_detection:
  enabled: true
  rules:
    - name: cost-spike
      metric: cost_per_minute
      z_threshold: 2.5
      severity: critical
      cooldown_seconds: 600
    - name: denial-surge
      metric: denial_rate
      z_threshold: 3.0
      severity: high
      cooldown_seconds: 300
kill_switch:
  enabled: true
  state_path: .agent_monitor/kill_state.json
  policies:
    - name: auto-kill-on-high-cost
      metric: cost_per_minute
      operator: ">"
      threshold: 5.0
      action: kill_agent
      severity: critical
      message: "Agent exceeded cost-per-minute limit"
alerts:
  channels:
    - type: console
There are a few production choices embedded here.
We use a 300-second rolling metrics window because five minutes is responsive without being twitchy. We enable baselines so the system can learn what normal looks like for each agent. We define anomaly detection for statistical weirdness. Then we define a hard kill policy for unacceptable cost velocity.
Anomaly detection is “this is weird.”
Kill policy is “this is no longer allowed.”
Both are useful. They do different jobs.
Validate the config before it goes anywhere near production:
agent-monitor -c monitor.yaml validate
This is not glamour work. This is the work that prevents 2 a.m. YAML archaeology.
Step 3: Initialize the monitor once
Create the monitor at application startup and reuse it.
from theaios.agent_monitor import Monitor, load_config
monitor = Monitor(load_config("monitor.yaml"))
monitor.kill_switch_engine.load()
The load() call matters when we are using a persisted kill state file. If an agent was killed before the process restarted, we want the application to restore that state on startup.
Otherwise, we risk accidentally reviving an agent that operators intentionally stopped.
That is not resilience.
That is a haunted deployment.
The monitor should be created once. Do not create a new monitor per request. Do not load config per request. Keep the control plane close to the agent runtime and cheap enough to run in the hot path.
If governance is slow, teams route around it. If governance is local, explicit, and boring, teams keep it in the path.
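One way to keep the control plane close and cheap is a small module that creates the monitor once and is imported everywhere else. The module name monitoring.py is just an example layout, not part of the library.

# monitoring.py: one monitor per process, imported by the rest of the app
from theaios.agent_monitor import Monitor, load_config

monitor = Monitor(load_config("monitor.yaml"))
monitor.kill_switch_engine.load()

Everywhere else in the application, from monitoring import monitor is enough.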
Step 4: Record real agent events
Now we wire the monitor into the agent loop.
A basic action event looks like this:
monitor.record(
    AgentEvent(
        timestamp=time.time(),
        event_type="action",
        agent="sales-agent",
        cost_usd=0.007,
        latency_ms=350.0,
        data={"model": "gpt-4"},
    )
)
That event gives the monitor enough information to update live metrics.
The important fields are straightforward:
timestamp tells us when it happened.
event_type tells us what kind of behavior occurred.
agent tells us which operational unit owns the behavior.
cost_usd and latency_ms feed cost and latency metrics.
data gives us structured context without turning the whole event model into a kitchen sink.
The event types we care about most in production are:
action
guardrail_trigger
denial
approval_request
approval_response
cost
error
session_start
session_end
We usually start with action, denial, and error. Then we add approval events and guardrail events as the agent gets access to higher-risk workflows.
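Session events are worth adding early too, because they give per-session metrics and session-level kills clean boundaries. A sketch, using the session_id field shown later in this article; the outcome value in data is just an example:

# Open the session before the first step...
monitor.record(
    AgentEvent(
        timestamp=time.time(),
        event_type="session_start",
        agent="sales-agent",
        session_id="sess-abc-123",
    )
)

# ... run the agent's bounded steps ...

# ...and close it when the workflow finishes.
monitor.record(
    AgentEvent(
        timestamp=time.time(),
        event_type="session_end",
        agent="sales-agent",
        session_id="sess-abc-123",
        data={"outcome": "completed"},
    )
)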
Step 5: Treat denials as first-class signals
A guardrail denial should never disappear into an application log.
It is one of the most important signals in an agentic system.
If denial rate rises, one of two things is happening:
The world is attacking the agent.
Or we broke the policy.
Both are worth knowing.
Record denials as events:
monitor.record(
    AgentEvent(
        timestamp=time.time(),
        event_type="denial",
        agent="sales-agent",
        data={
            "rule": "block-injection",
            "severity": "critical",
        },
    )
)
Now denials feed denial_rate instead of becoming scattered prose in logs.
This gives anomaly detection a signal worth using. A denial surge can alert the team. A cost spike can kill the agent. A tool loop can trip flood protection. A latency anomaly can warn before users feel it.
This is how agent behavior becomes operationally legible.
Step 6: Capture errors without breaking the control loop
Errors should also become events.
monitor.record(
    AgentEvent(
        timestamp=time.time(),
        event_type="error",
        agent="sales-agent",
        data={
            "error_type": "TimeoutError",
            "message": "LLM call timed out",
        },
    )
)
An error event is not a replacement for exception handling. It is the monitoring record that lets the rest of the control plane see failure patterns.
A single timeout is noise.
A rising error_count over five minutes may be a provider outage, a bad tool integration, a broken prompt path, or a downstream service failing. We do not need the monitor to know which one immediately. We need it to make the failure visible and enforce policy when the pattern crosses a line.
Step 7: Read live metrics
During development, inspect the snapshot directly.
snap = monitor.get_metrics("sales-agent")
print(f"Events: {snap.event_count}")
print(f"Cost/min: ${snap.cost_per_minute:.4f}")
print(f"Denial rate: {snap.denial_rate:.1%}")
These are the agent-native vital signs.
event_count tells us whether the agent is suddenly too active.
cost_per_minute tells us whether the agent is burning budget too quickly.
denial_rate tells us whether guardrails are being triggered unusually often.
For production agents, these metrics are more useful than generic infrastructure metrics alone. CPU can be calm while the agent is expensive. Memory can be fine while the agent is unsafe. The model can respond quickly while the workflow is wrong.
Infrastructure health is necessary.
Behavioral health is the missing layer.
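A small helper can put those vital signs next to the usual infrastructure health checks. This is a sketch built only from the snapshot fields shown above plus the kill-state check covered in Step 11:

def behavioral_health(monitor: Monitor, agent: str) -> dict:
    # Agent-native vital signs from the current rolling window.
    snap = monitor.get_metrics(agent)
    return {
        "agent": agent,
        "event_count": snap.event_count,
        "cost_per_minute": snap.cost_per_minute,
        "denial_rate": snap.denial_rate,
        "killed": monitor.is_killed(agent),
    }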
Step 8: Let baselines learn what normal means
Static thresholds are useful. We still use them.
They are excellent for hard limits: cost, event floods, repeated errors, or anything with a clean operational boundary.
But agents do not all behave the same way.
A research agent may have long latency and high event volume. A support agent may have frequent guardrail events. A finance agent may be low-volume but high-risk. A coding agent may call tools constantly.
So we also want baselines.
baselines:
  enabled: true
  min_samples: 30
  metrics:
    - denial_rate
    - error_count
    - cost_per_minute
    - avg_latency_ms
  storage_path: .agent_monitor/baselines.json
The min_samples setting matters. We do not want the first three events in a new environment to define reality. The baseline needs enough observations before anomaly detection becomes meaningful.
Persist the baseline.
If the process restarts and forgets everything, anomaly detection has to relearn normal from zero. That is fine in a demo. In production, it is amnesia with an incident channel.
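To see why min_samples and persistence both matter, here is an illustrative baseline tracker, not the library's implementation: it refuses to define "normal" until it has enough samples, and it survives a restart by writing its history to disk.

import json
import math
from pathlib import Path


class SimpleBaseline:
    """Illustrative per-metric baseline: mean and std with a min_samples gate."""

    def __init__(self, path: str, min_samples: int = 30):
        self.path = Path(path)
        self.min_samples = min_samples
        self.samples: list[float] = []
        if self.path.exists():
            self.samples = json.loads(self.path.read_text())

    def observe(self, value: float) -> None:
        # Record the sample and persist, so a restart does not erase "normal".
        self.samples.append(value)
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(self.samples))

    def stats(self) -> tuple[float, float] | None:
        if len(self.samples) < self.min_samples:
            return None  # not enough history yet; anomaly checks should stay quiet
        mean = sum(self.samples) / len(self.samples)
        variance = sum((x - mean) ** 2 for x in self.samples) / len(self.samples)
        return mean, math.sqrt(variance)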
Step 9: Add anomaly detection rules
Once the monitor has metrics and baselines, anomaly rules become simple.
anomaly_detection:
  enabled: true
  rules:
    - name: cost-spike
      metric: cost_per_minute
      z_threshold: 2.5
      severity: critical
      cooldown_seconds: 600
    - name: denial-surge
      metric: denial_rate
      z_threshold: 3.0
      severity: high
      cooldown_seconds: 300
    - name: latency-anomaly
      metric: avg_latency_ms
      z_threshold: 3.0
      severity: medium
      cooldown_seconds: 120
The important key is z_threshold.
It sets how many standard deviations the current metric is allowed to drift from the learned baseline before the monitor treats the behavior as anomalous.
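In z-score terms, the check itself is small. A sketch of the comparison the rule expresses; the library evaluates this for us when events are recorded:

def is_anomalous(current: float, baseline_mean: float, baseline_std: float,
                 z_threshold: float) -> bool:
    # z-score: how many standard deviations the live metric sits from its baseline.
    if baseline_std == 0:
        return False  # no learned variation yet; nothing meaningful to compare
    z = (current - baseline_mean) / baseline_std
    return abs(z) > z_threshold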
We need discipline here.
Not every anomaly should kill the agent.
A latency anomaly may be worth an alert. A denial surge may mean the guardrails are doing their job. A cost spike may deserve immediate containment. An event flood may indicate a runaway loop.
The job is to separate investigate from stop.
A mature setup uses both:
- anomaly rules for weirdness
- kill policies for unacceptable risk
Step 10: Configure kill policies for hard limits
Now we get to the control layer.
kill_switch:
  enabled: true
  state_path: .agent_monitor/kill_state.json
  policies:
    - name: auto-kill-on-high-cost
      metric: cost_per_minute
      operator: ">"
      threshold: 5.0
      action: kill_agent
      severity: critical
      message: "Agent exceeded cost-per-minute limit"
This policy says: if the agent’s cost_per_minute exceeds 5.0, kill that agent.
Not the fleet.
Not the whole platform.
Not the customer support agent quietly doing its job in the corner.
Just the agent that crossed the line.
That scoping matters.
The kill switch supports three patterns:
kill_agent -> stop one agent
kill_session -> stop one session
kill_global -> stop everything
Most incidents should start with the smallest useful scope.
Global kill is an emergency brake. We want it. We test it. We respect it. But we do not reach for it every time one agent gets weird.
Good kill switches reduce blast radius.
Bad kill switches create outages with better branding.
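Conceptually, a kill policy is a plain comparison between a live metric and a hard limit, scoped to the agent that produced it. A sketch of that evaluation; the operators beyond ">" are assumptions added for illustration:

OPERATORS = {
    ">": lambda value, limit: value > limit,
    ">=": lambda value, limit: value >= limit,
    "<": lambda value, limit: value < limit,
}


def policy_breached(metric_value: float, operator: str, threshold: float) -> bool:
    # Example: policy_breached(snap.cost_per_minute, ">", 5.0) means kill_agent fires.
    return OPERATORS[operator](metric_value, threshold)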
Step 11: Check kill state before expensive or irreversible work
This is where the pattern becomes real.
Before the agent makes an LLM call, calls a tool, sends an email, writes to a database, opens a ticket, updates a CRM, or touches anything with consequence, check kill state.
if monitor.is_killed("sales-agent"):
    raise RuntimeError("Agent sales-agent is currently suspended")
For session-aware agents, pass the session ID:
if monitor.is_killed("sales-agent", session_id="sess-abc-123"):
    raise RuntimeError("Agent sales-agent is suspended for this session")
This is not defensive programming theater.
It is the circuit breaker.
The monitor can reject work for killed agents. The application should still check before meaningful work begins. We do not want to discover the agent was killed after it already sent the message.
Step 12: Use an adapter pattern around agent steps
Here is the article-safe wrapper pattern we recommend adapting in application code.
It is intentionally small. It does not pretend to be a universal agent framework. It shows where the control checks and event recording belong.
import time
from dataclasses import dataclass
from typing import Callable

from theaios.agent_monitor import AgentEvent, Monitor


@dataclass
class AgentStepResult:
    text: str
    cost_usd: float
    latency_ms: float
    model: str


class AgentSuspended(RuntimeError):
    pass


def run_agent_step(
    *,
    monitor: Monitor,
    agent: str,
    session_id: str,
    step: Callable[[], AgentStepResult],
) -> AgentStepResult:
    if monitor.is_killed(agent, session_id=session_id):
        raise AgentSuspended(f"{agent} is suspended")

    start = time.time()
    try:
        result = step()
        monitor.record(
            AgentEvent(
                timestamp=time.time(),
                event_type="action",
                agent=agent,
                session_id=session_id,
                cost_usd=result.cost_usd,
                latency_ms=result.latency_ms,
                data={"model": result.model},
            )
        )
        if monitor.is_killed(agent, session_id=session_id):
            raise AgentSuspended(f"{agent} was suspended by policy")
        return result
    except Exception as exc:
        monitor.record(
            AgentEvent(
                timestamp=time.time(),
                event_type="error",
                agent=agent,
                session_id=session_id,
                latency_ms=(time.time() - start) * 1000,
                data={
                    "error_type": type(exc).__name__,
                    "message": str(exc),
                },
            )
        )
        raise
Notice the two kill checks.
First, we check before the step runs. That blocks agents that are already suspended.
Then we record the event. Recording can update metrics, update baselines, run anomaly detection, and trigger kill policies.
Then we check again.
That second check is the difference between telemetry and control. If the event that just occurred pushed the agent over a hard threshold, we do not let the next step proceed.
The loop is:
check -> act -> record -> evaluate -> stop if needed
That is the kill switch pattern.
Step 13: Add boundary protection for API-based agents
If the agent is exposed through an API, put the kill switch at the request boundary too.
import time

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse
from theaios.agent_monitor import AgentEvent, Monitor, load_config

app = FastAPI()
monitor = Monitor(load_config("monitor.yaml"))
monitor.kill_switch_engine.load()


@app.middleware("http")
async def monitor_middleware(request: Request, call_next):
    agent = request.headers.get("X-Agent-ID", "default")
    session_id = request.headers.get("X-Session-ID", "unknown")

    # An HTTPException raised inside middleware is not converted into a response,
    # so return the 503 directly.
    if monitor.is_killed(agent, session_id=session_id):
        return JSONResponse(
            status_code=503,
            content={"detail": f"Agent {agent} is currently suspended"},
        )

    start = time.time()
    try:
        response = await call_next(request)
        elapsed_ms = (time.time() - start) * 1000
        monitor.record(
            AgentEvent(
                timestamp=time.time(),
                event_type="action",
                agent=agent,
                session_id=session_id,
                latency_ms=elapsed_ms,
                data={
                    "method": request.method,
                    "path": request.url.path,
                    "status_code": response.status_code,
                },
            )
        )
        return response
    except Exception as exc:
        monitor.record(
            AgentEvent(
                timestamp=time.time(),
                event_type="error",
                agent=agent,
                session_id=session_id,
                latency_ms=(time.time() - start) * 1000,
                data={
                    "error_type": type(exc).__name__,
                    "message": str(exc),
                    "method": request.method,
                    "path": request.url.path,
                },
            )
        )
        raise
This gives us two layers:
- runtime checks inside the agent loop
- boundary checks at the API layer
That is the right kind of redundancy.
Not duplicate logic everywhere. Duplicate control at the places where failure matters.
Step 14: Support manual kill and revive
Automatic policies are necessary, but operators still need manual controls.
During an incident, we want the team to stop one agent immediately:
monitor.kill_agent("sales-agent", reason="Cost spike detected")
monitor.kill_switch_engine.save()
For a suspicious session:
monitor.kill_session("sess-abc-123", reason="Suspicious workflow")
monitor.kill_switch_engine.save()
For the emergency brake:
monitor.kill_global(reason="System-wide anomaly")
monitor.kill_switch_engine.save()
Revival should be explicit:
monitor.revive(agent="sales-agent")
monitor.kill_switch_engine.save()
For sessions and global controls:
monitor.revive(session_id="sess-abc-123")
monitor.kill_switch_engine.save()
monitor.revive_global()
monitor.kill_switch_engine.save()
One practical rule: a kill without a reason is an incident smell.
The reason becomes operational memory. It helps the next operator understand what happened. It helps compliance reporting. It helps future us avoid inventing folklore around production events.
Step 15: Give operators CLI controls
Incident response often happens outside the application runtime. The CLI path matters.
Validate the config:
agent-monitor -c monitor.yaml validate
Inspect the monitor:
agent-monitor -c monitor.yaml inspect
Check agent status:
agent-monitor -c monitor.yaml status --agent sales-agent
View action events:
agent-monitor -c monitor.yaml events --agent sales-agent --type action
Kill and revive an agent:
agent-monitor -c monitor.yaml kill sales-agent --reason "Cost spike"
agent-monitor -c monitor.yaml revive sales-agent
Kill and revive a session:
agent-monitor -c monitor.yaml kill sess-abc-123 --session --reason "Suspicious workflow"
agent-monitor -c monitor.yaml revive sess-abc-123 --session
Use the global emergency brake:
agent-monitor -c monitor.yaml kill ALL --global-kill --reason "Emergency shutdown"
agent-monitor -c monitor.yaml revive ALL --global-revive
Export audit evidence:
agent-monitor -c monitor.yaml export --format soc2
The CLI should be part of the runbook, not trivia in the README.
Operators should know how to inspect, kill, revive, and export evidence before the incident starts.
How the layers work together
The architecture works because each layer reinforces the next.
Events create the behavioral record.
Metrics summarize what is happening now.
Baselines define what normal looks like.
Anomaly detection identifies drift.
Kill policies stop unacceptable behavior.
Alerts coordinate humans.
Compliance export preserves evidence.
The production loop is not complicated:
1. Agent receives work.
2. Application checks kill state.
3. Agent performs one bounded step.
4. Application records an AgentEvent.
5. Monitor updates metrics and baselines.
6. Monitor evaluates anomaly and kill rules.
7. Application checks kill state again.
8. Agent either continues or stops.
The important word is bounded.
Agents should not run indefinitely between checks. The kill switch has to live at the seams: before tool calls, after tool calls, before irreversible actions, after expensive calls, and between workflow steps.
A kill switch that only checks once at session start is not a kill switch. It is a lobby sign.
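Putting the loop together with the wrapper from Step 12, a bounded workflow might look like this sketch, where plan_steps and each step callable stand in for the application's own logic:

def run_workflow(monitor: Monitor, agent: str, session_id: str, plan_steps) -> None:
    # plan_steps: an iterable of zero-argument callables, one bounded step each.
    for step in plan_steps:
        try:
            result = run_agent_step(
                monitor=monitor,
                agent=agent,
                session_id=session_id,
                step=step,
            )
        except AgentSuspended as exc:
            # Stop at the seam: never start the next step for a killed agent.
            print(f"Workflow halted: {exc}")
            return
        print(f"Step completed: {result.text[:80]}")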
Practical implementation advice
Start with a small event envelope.
Do not model the universe on day one. Capture action, denial, error, cost, latency, agent, session, and enough metadata to investigate.
Separate agent identity from user identity.
Agent IDs should represent operational units: sales-agent, finance-agent, research-agent, support-triage-agent. User or tenant information can live in metadata when appropriate.
Treat cost as a safety metric.
Teams often think of cost as a finance problem. For agents, cost velocity is a failure signal. A sudden jump in cost per minute can indicate loops, tool misuse, prompt injection, bad routing, or a model fallback behaving badly.
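If cost velocity is a safety metric, event velocity usually is too. As a hedged example, an extra kill policy for event floods could look like this, assuming event_count can be referenced in policies the same way cost_per_minute is:

# Assumption: event_count is available to kill policies like cost_per_minute.
- name: auto-kill-on-event-flood
  metric: event_count
  operator: ">"
  threshold: 500            # events per rolling window; tune per agent
  action: kill_agent
  severity: critical
  message: "Agent exceeded event volume limit (possible runaway loop)"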
Make denial rate visible.
A rising denial rate may mean the guardrails are working because the system is under attack. It may also mean the guardrails are misconfigured and blocking legitimate work. Either way, it is one of the most agent-native signals we have.
Prefer scoped containment.
Agent-level kill beats global kill. Session-level kill is even better when the problem is isolated to one conversation or workflow. Global kill is for platform-wide danger, not ordinary weirdness.
Persist the boring things.
Persist baselines. Persist kill state. Persist events. Production systems restart. Containers move. Nodes die. If the control plane forgets what it knew every time the process restarts, we have built a goldfish with tool access.
Practice revival.
Revival is part of incident response. Operators should know how to inspect kill state, understand the reason, verify the fix, and revive the agent. A killed agent that cannot be safely revived is still an incident.
Build the rest of the platform
The kill switch is one control surface. It becomes much more powerful when it sits inside a complete enterprise agentic platform.
We open-sourced the stack because the same deployment problem kept showing up again and again: teams did not just need agents. They needed reliability certification, policy enforcement, context orchestration, runtime monitoring, and agent-specific authorization working together instead of scattered across five disconnected tools.
Start with the full GitHub organization: https://github.com/Cohorte-ai
The broader ecosystem includes Agent Monitor plus the other five libraries:
- TrustGate — reliability certification for AI endpoints using self-consistency sampling and conformal calibration. https://github.com/Cohorte-ai/trustgate
- Guardrails — declarative YAML policy enforcement, approval tiers, audit logs, and framework adapters for AI agents. https://github.com/Cohorte-ai/guardrails
- Context Router — intelligent context routing across sources, agents, and retrieval paths. https://github.com/Cohorte-ai/context-router
- Context Kubernetes — declarative orchestration of enterprise knowledge for agentic AI systems. https://github.com/Cohorte-ai/context-kubernetes
- Agent Auth — agent-specific identity, authorization, sessions, delegation, and A2A access control. https://github.com/Cohorte-ai/agent-auth
- Agent Monitor — governance-first observability, anomaly detection, kill switches, alerts, and compliance export. https://github.com/Cohorte-ai/agent-monitor
The research layer is here:
- Exploitation Surface — how agentic systems expand the attack and failure surface. https://arxiv.org/abs/2604.04561
- MoE Routing — routing architecture for specialized expert systems. https://arxiv.org/abs/2604.04230
- TrustGate / Reliability — reliability certification for AI systems. https://arxiv.org/abs/2602.21368
And the architecture layer is here:
- The Enterprise Agentic Platform — the book-length operating model for building governed, reliable, enterprise-grade agent systems. https://www.cohorte.co/playbooks/the-enterprise-agentic-platform
FAQ
What is an AI agent kill switch?
An AI agent kill switch is a control mechanism that stops an agent from continuing execution. A useful kill switch supports scoped containment: stopping a single agent, a single session, or the entire system in an emergency.
Why is observability alone not enough for AI agents?
Observability tells us what happened or what is happening. Production agents also need control. If an agent is spending too much, looping through tools, violating policy, or behaving anomalously, the monitoring layer should be able to trigger containment.
What metrics matter most for AI agent monitoring?
The strongest starting metrics are event_count, denial_rate, error_count, cost_per_minute, and avg_latency_ms. These map directly to behavior, safety, reliability, and cost risk.
Should every production agent have a kill switch?
Yes. The scope depends on risk. Read-only internal agents may need simple manual kill controls. Agents with write access, external communication, financial authority, or sensitive data access need stronger automatic policies and audit trails.
Is a global kill switch enough?
No. Global kill is useful for emergencies, but agent-level and session-level controls are safer defaults. Scoped containment reduces blast radius and avoids turning one misbehaving agent into a platform-wide outage.
Final takeaway
A production AI agent should not be trusted because it sounds confident.
It should be trusted because it is observable, measurable, governable, and stoppable.
The kill switch is not a sign that we distrust agents. It is a sign that we understand production.
Every serious system has a way to stop unsafe behavior. Databases have circuit breakers. Networks have rate limits. Payments have fraud holds. Deployment systems have rollbacks. Industrial systems have emergency stops.
Agents need the same operational dignity.
The dashboard tells us what happened.
The anomaly detector tells us what changed.
The kill switch makes sure the agent does not keep making it worse.
That is the difference between watching an AI system and operating one.
— Cohorte Team