Gabriel Anhaia

97% Expect a Major AI Agent Incident This Year. Are You in the 3%?


In April 2026, Security Boulevard ran the headline number nobody wanted, citing the Arkose Labs 2026 Agentic AI Security Report: 97% of enterprise leaders expect a material AI-agent-driven security or fraud incident within the next 12 months. Nearly half expect one within six months. And 6% of security budgets are pointed at the problem. (Methodology: 300 enterprise leaders across security, fraud, identity and AI roles in North America, Europe and APAC, fielded February 2026.)

Those three numbers do not balance. That's the whole post.

The other shoe comes from Infosecurity Magazine, reporting on the Cloud Security Alliance study Autonomous but Not Controlled: AI Agent Incidents Now Common in Enterprises (commissioned by Token Security), published the same month: 65% of organizations had at least one AI-agent-related incident in the prior 12 months. 61% saw data exposure. 43% saw operational disruption. 35% reported financial loss. (Methodology: online survey of 418 IT and security professionals across organizations of varying size and geography, fielded January 2026.) The expectation in the first stat is already happening in the second.

If you're running agents in production and your reaction to "97%" is "yeah, that tracks," you're honest. If your reaction is "we're fine," I want to know what you're shipping. This post is the readiness checklist for staying in the 3% — five mechanisms, mapped to the failure modes that produce the headline numbers, plus a Python middleware sketch for the most-skipped one.

What "major" means in this survey

Read the report carefully and "major incident" is not abstract. The CSA paper sorts the actual outcomes from the past year into four buckets:

  • Data exposure. An agent reads from a system it had token-level access to but no business reason to query, then emits the contents into a log, an email, or another agent's context window. The internal-data leak class.
  • Account takeover. An agent's service-account credential ends up in a prompt, a retrieved doc, or a cached response. Someone exfiltrates the credential. The agent's blast radius is now the attacker's blast radius.
  • Destructive action. An agent calls a tool with side effects. The tool deletes, drops, mutates, ships. The Replit-class incident, where the agent deleted a production database despite an in-place code freeze during a SaaStr vibe-coding session.
  • Financial loss. An agent loops on an expensive API. A compromised agent transfers funds, places trades, or buys ad inventory. The bill arrives.

97% say one of those is coming. The 3% who say it isn't are not safer. They have weaker monitoring.

The mechanism behind the number

Why are agents producing this rate of incidents when CRUD apps with the same data access don't?

Three structural things change when you put an LLM in the control loop.

The tool surface is wider than the agent's training distribution. A traditional service has a finite call graph: ten endpoints, each invoked from three places. An agent has a tool registry of forty MCP tools, any of which can be called in any order, prompted by any text in any retrieved document. The combinatorics of "what can this thing do" are open-ended.

Then there's identity that's shared and over-scoped. The same CSA report found 82% of enterprises have unknown AI agents running in their environments — agents stood up by one team using credentials with broad scope, lingering after the project ended, retaining permissions. The CSA paper calls it retirement debt. From a security perspective it's a service account whose owner left and whose privileges nobody pruned.

And the input channel is untrusted by design. An agent's instructions arrive partly from your prompt and partly from documents, tool outputs, and retrieved context. Prompt injection isn't an exploit; it's the operating mode. Anything the agent reads can become an instruction.

The five-item checklist below maps to those three primitives.

The five-item readiness checklist

1. Scope-limited tool access

Every tool the agent can call should run under a credential scoped to that tool's job, not the agent's job. If the agent has a read_invoice tool and a send_email tool, those are two different IAM identities with two different policies. The agent process holds neither directly. It asks a tool broker; the broker assumes the right role and executes.

When a prompt injection convinces the agent to "send an invoice to the attacker's email," the send_email credential cannot read invoices. The cross-tool combination is structurally impossible at the IAM layer, not just the prompt layer.
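
A minimal sketch of the broker, assuming AWS STS for the role assumption — the account ID, the TOOL_ROLES mapping, and the convention that tool implementations accept a credentials kwarg are all hypothetical:

import boto3

# Hypothetical: one narrowly scoped IAM role per tool, never per agent.
TOOL_ROLES = {
    "read_invoice": "arn:aws:iam::111122223333:role/tool-read-invoice",
    "send_email": "arn:aws:iam::111122223333:role/tool-send-email",
}

sts = boto3.client("sts")


def broker_call(tool: str, fn, **params):
    """Assume the tool's own role and hand short-lived credentials to the
    implementation. The agent process never holds them directly."""
    creds = sts.assume_role(
        RoleArn=TOOL_ROLES[tool],
        RoleSessionName=f"agent-tool-{tool}",
        DurationSeconds=900,  # STS minimum; the credentials die fast
    )["Credentials"]
    return fn(credentials=creds, **params)

When an injected prompt steers the agent toward send_email, the session the broker hands out can send mail and nothing else; the invoice read fails at IAM, not at the prompt.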

2. Egress output filter

Every byte the agent emits (to a user, to a webhook, to another agent) passes a filter. The filter looks for credentials, PII, and marker tokens you injected into sensitive data at ingest. If a customer record carries an HMAC tag and the tag shows up in an outbound message, you block the message and alert.

This is the single highest-value defense against the "agent leaks internal data" class. It's also the one most teams skip because it requires you to know what's sensitive. That's uncomfortable, so nobody does it.
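
One way to build the marker-token half of that filter — a sketch assuming tags of the form mk-<hmac> stamped into sensitive records at ingest; the key handling and the credential patterns are illustrative, not exhaustive:

import hashlib
import hmac
import re

# Illustrative only — load the real key from a secret manager, not source.
MARKER_KEY = b"ingest-marker-key"

CREDENTIAL_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                    # AWS access key ID
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key
]
MARKER_PATTERN = re.compile(r"mk-[0-9a-f]{16}")


def marker_tag(record_id: str) -> str:
    """Stamped into a sensitive record at ingest time."""
    digest = hmac.new(MARKER_KEY, record_id.encode(), hashlib.sha256)
    return f"mk-{digest.hexdigest()[:16]}"


def egress_check(message: str) -> tuple[bool, str | None]:
    """Run on every outbound byte. Returns (allowed, block_reason)."""
    for pat in CREDENTIAL_PATTERNS:
        if pat.search(message):
            return False, f"credential pattern: {pat.pattern}"
    if MARKER_PATTERN.search(message):
        return False, "marker token from a sensitive record in output"
    return True, None

If the check fails, block the message and page someone; the tag tells you a record you marked as sensitive was on its way out.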

3. Per-agent cost ceiling

Each agent run has a token budget and a wall-clock budget. When either is exhausted, the run terminates with a surfaced error. The budget lives in the orchestration layer, not the prompt — an agent cannot raise its own budget by asking nicely.

Cost ceilings catch the "agent loops calling an expensive API" class. They also catch the subtler attack where an injected prompt drives the agent into long, expensive reasoning chains to drain your provider budget.
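
A sketch of the budget object, living in the orchestrator rather than the prompt; the class and method names are hypothetical, and the token count comes from whatever usage field your model provider returns:

import time


class BudgetExceeded(Exception):
    pass


class RunBudget:
    """Owned by the orchestration layer. The agent never sees this object,
    so no injected prompt can raise its own limits."""

    def __init__(self, max_tokens: int, max_seconds: float):
        self.max_tokens = max_tokens
        self.max_seconds = max_seconds
        self.tokens_used = 0
        self.started = time.monotonic()

    def charge(self, tokens: int) -> None:
        """Call after every model response with the provider's usage count."""
        self.tokens_used += tokens
        if self.tokens_used > self.max_tokens:
            raise BudgetExceeded(
                f"token budget: {self.tokens_used}/{self.max_tokens}"
            )
        elapsed = time.monotonic() - self.started
        if elapsed > self.max_seconds:
            raise BudgetExceeded(
                f"wall-clock budget: {elapsed:.0f}s/{self.max_seconds:.0f}s"
            )

The orchestrator calls charge() after each step and lets BudgetExceeded terminate the run with a surfaced error.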

4. Anomalous-action detection

Every tool call lands on a span with attributes: tool name, parameter shape, caller identity, time of day. Aggregate per-agent baselines. Alert when an agent calls a tool it has not called in 30 days, calls a tool 50× more than its rolling 24-hour median, or calls a tool with a parameter shape that doesn't match its historical distribution.

This is where observability and security stop being separate budgets. The signals overlap. Build the eval rig and the security telemetry on the same trace stream.
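
A rough sketch of two of those rules — the 30-day novelty check and the 50× median spike — on the same Redis instance the kill-switch middleware below uses; key names and thresholds are illustrative:

import time

import redis

r = redis.Redis()


def record_tool_call(agent_id: str, tool: str) -> None:
    """Bump an hourly counter per agent/tool; keep ~31 days of history."""
    hour = int(time.time() // 3600)
    key = f"agent:calls:{agent_id}:{tool}:{hour}"
    r.incr(key)
    r.expire(key, 3600 * 24 * 31)


def _hourly_counts(agent_id: str, tool: str, hours: int) -> list[int]:
    now = int(time.time() // 3600)
    keys = [f"agent:calls:{agent_id}:{tool}:{h}" for h in range(now - hours, now + 1)]
    return [int(v) if v else 0 for v in r.mget(keys)]


def is_anomalous(agent_id: str, tool: str) -> bool:
    month = _hourly_counts(agent_id, tool, 24 * 30)
    if sum(month[:-1]) == 0:
        return True  # first call to this tool in 30 days: new behavior
    day = month[-24:]
    median = sorted(day)[len(day) // 2]
    return median > 0 and day[-1] > 50 * median  # spike vs. 24h median

Parameter-shape checks need the span attributes from your trace stream; the counters above are just the cheapest rule to ship first.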

5. Kill switch with audit trail

Every agent run is interruptible. A single API call from an oncall human terminates the run, freezes the agent's tool calls, and writes a structured audit record: who pulled the switch, what the agent was doing, what tools had already executed, what state was mutated.

The kill switch is the one item teams nod at, agree with, and don't build. The Python sketch below is a starting point.

Python middleware: kill switch with audit trail

The middleware sits between the agent's tool router and the actual tool implementations. Every tool call passes through, gets recorded, and gets a chance to be aborted.

import hashlib
import json
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, asdict
from typing import Any, Callable

import redis

r = redis.Redis()


@dataclass
class ToolEvent:
    run_id: str
    agent_id: str
    tool: str
    params: dict
    started_at: float
    finished_at: float | None
    status: str  # "ok" | "killed" | "error"
    result_digest: str | None


def _kill_key(run_id: str) -> str:
    return f"agent:kill:{run_id}"


def _audit_key(run_id: str) -> str:
    return f"agent:audit:{run_id}"


def request_kill(run_id: str, reason: str) -> None:
    # TTL keeps stale kill flags from accumulating in Redis.
    r.set(_kill_key(run_id), reason, ex=3600)


def is_killed(run_id: str) -> tuple[bool, str | None]:
    val = r.get(_kill_key(run_id))
    return (val is not None, val.decode() if val else None)


def _digest(value: Any) -> str:
    s = json.dumps(value, default=str, sort_keys=True)
    h = hashlib.sha256(s.encode()).hexdigest()[:24]
    return f"sha256:{h}"


def guarded_tool(
    tool: str,
    fn: Callable[..., Any],
    run_id: str,
    agent_id: str,
):
    def wrapped(**params):
        killed, reason = is_killed(run_id)
        if killed:
            ev = ToolEvent(
                run_id=run_id,
                agent_id=agent_id,
                tool=tool,
                params=params,
                started_at=time.time(),
                finished_at=time.time(),
                status="killed",
                result_digest=None,
            )
            r.rpush(_audit_key(run_id), json.dumps(asdict(ev)))
            raise RuntimeError(f"agent killed: {reason}")

        started = time.time()
        # Defaults cover exits the except clause below doesn't catch.
        status, digest = "error", None
        try:
            result = fn(**params)
            status, digest = "ok", _digest(result)
            return result
        except Exception as exc:
            status, digest = "error", _digest(str(exc))
            raise
        finally:
            ev = ToolEvent(
                run_id=run_id,
                agent_id=agent_id,
                tool=tool,
                params=params,
                started_at=started,
                finished_at=time.time(),
                status=status,
                result_digest=digest,
            )
            r.rpush(_audit_key(run_id), json.dumps(asdict(ev)))

    return wrapped


@contextmanager
def agent_run(agent_id: str):
    run_id = str(uuid.uuid4())
    try:
        yield run_id
    finally:
        # Snapshot audit on completion for archival.
        events = r.lrange(_audit_key(run_id), 0, -1)
        archive = [json.loads(e) for e in events]
        r.set(
            f"agent:archive:{run_id}",
            json.dumps(archive),
            ex=86400 * 30,
        )

How it's used. Wrap each tool registered with the agent:

def send_email_impl(to: str, body: str) -> dict:
    # real implementation
    ...


with agent_run(agent_id="invoice-bot") as run_id:
    send_email = guarded_tool(
        "send_email",
        send_email_impl,
        run_id=run_id,
        agent_id="invoice-bot",
    )
    # pass send_email into the agent's tool registry

Call request_kill(run_id, reason) from anywhere: a Slack /kill command, a Grafana alert, an oncall script. The next tool call raises and is recorded. The audit trail in Redis carries every tool the agent attempted, which params it used, the digest of every result, and whether the call completed or got killed.

What this gives you that "just log the calls" doesn't: a single source of truth that ties together the kill decision, the tool invocations, and the eventual archive. When the post-incident review happens (and the survey says it will), the question "what did the agent actually touch before we caught it" has a one-line answer.
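
That answer, reusing r and json from the middleware above and the run_id from the incident:

events = [json.loads(e) for e in r.lrange(f"agent:audit:{run_id}", 0, -1)]
touched = [(e["tool"], e["params"], e["status"]) for e in events]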

Where to start if you have one week

Ship items 3 and 5 first: cost ceiling and kill switch. They're mechanical, they catch the highest-impact failure modes, and they buy you time to design items 1, 2, and 4 properly. The 3% in the headline are not the teams who solved this. They are the teams who already wrote the postmortem and are now drafting the second one.

If this was useful

The mechanisms above are the operational surface — chapter 6 of the AI Agents Pocket Guide walks through the threat model and tool-broker pattern in depth, and the LLM Observability Pocket Guide covers the tracing and eval rig you need behind anomalous-action detection. If you're standing up agent infrastructure right now, those two together are the short read.

AI Agents Pocket Guide

LLM Observability Pocket Guide
