razashariff

Posted on Jun 9

AI Agents Have Hands. Here Is the Off Switch.

#security #ai #llm #devops

We gave AI agents the ability to act. They delete records, move money, deploy, push to main, read secrets, call tools. That is the whole point of an agent. It is also the whole problem.

In July 2025, Replit's AI agent ran destructive commands during a code freeze and deleted a production database holding records on more than 1,200 executives and roughly 1,200 companies. It later admitted it had violated explicit instructions not to act without human approval. The AI Incident Database records this as Incident 1152. The lesson is not "that one tool is bad." The lesson is that "the agent was told to ask first, and it did not" is now a class of outage.

The OWASP Top 10 for LLM Applications (2025) names the risk: LLM06, Excessive Agency. AISVS C14 Human Oversight says a human must be able to step in. So does Article 14 of the EU AI Act. Everyone agrees a human should be in the loop for the actions that can end you.

The open question is the one nobody likes: how do you actually enforce that?

The approval dialog is theatre

The common answer is a software approval dialog. The agent is about to do something risky, a box pops up, a human clicks approve.

The trouble is that the box runs on the same machine as the agent. If that host is compromised, the dialog can be clicked for you, or shown with one number and run with another. An approval the attacker can answer is not oversight. It is theatre. It is the same reason we moved second factors off the browser years ago: consent has to live somewhere the attacker does not control.

AgentBee: move the decision onto hardware

AgentBee is a small hardware key. When an agent tries a critical action, the exact action shows on the key's own screen, and a human presses to approve.

No press, no action.
Unplug it, or drop the link, and the action is blocked. Never silently allowed.
Fail-closed by default.

The decision is no longer made on the box the agent controls. It is made on a separate device, in your hand.

How it works

Every gated call runs through three steps:

agent wants to run a tool
        |
   classify()  ->  risk tier  (or "not gated")
        |
   show the exact action on the bee, wait for a human
        |
   approved -> allow + signed receipt   |   denied/timeout -> block

classify -- sort the tool call into a tier (reads pass straight through; writes, deletes, deploys, payments and secret access get gated).
ask the bee -- the action renders on the device screen and waits for a physical gesture.
enforce -- approved calls proceed with a signed receipt; anything not approved is rejected and the agent cannot continue.
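The three steps above can be sketched in Python. This is a minimal illustration, not AgentBee's actual code: the tool names, the `TIERS` mapping (including which tier each tool gets) and the `ask_device` callable are hypothetical stand-ins for the real classifier and device link.

```python
from typing import Callable, Optional

# Hypothetical tool-to-tier mapping; tools not listed here are not gated.
TIERS = {
    "write_file": "L2",
    "read_secret": "L3",
    "deploy": "L4",
    "delete_records": "L4",
    "send_payment": "L4",
}


def classify(tool: str) -> Optional[str]:
    """Step 1: return the risk tier, or None when the call is not gated."""
    return TIERS.get(tool)


def gate(tool: str, detail: str, ask_device: Callable[[str, str], bool]) -> bool:
    """Return True only if the tool call may proceed."""
    tier = classify(tool)
    if tier is None:
        return True  # ungated calls (for example plain reads) pass straight through
    try:
        # Step 2: the device shows the action and waits for a gesture.
        approved = ask_device(f"{tool}: {detail}", tier)
    except Exception:
        # Unplugged device, dropped link, timeout: fail closed.
        return False
    # Step 3: enforce. Anything other than an explicit True blocks the call.
    return approved is True
```

The design choice worth noticing is the last line: the gate does not check for a denial, it checks for an approval, so every unexpected outcome lands on the blocked side.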

Wiring it up

The fast path: just tell your agent

Setup is a plain-English instruction, not a config file. With the AgentBee skill installed, in Claude, OpenAI, or any MCP client, you say:

"Gate my critical actions with AgentBee."

and the skill wires it up. No SDK needed for the common case.

MCP-native

claude mcp add agentbee -- npx -y @agentbee/mcp

It works with any MCP-compatible client.

Hooks for everything that does not speak MCP

A git pre-push hook that makes pushing to main require a deliberate hold:

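#!/bin/sh
# save as .git/hooks/pre-push   then:  chmod +x .git/hooks/pre-push
# Git passes one line per pushed ref on stdin:
#   <local ref> <local sha> <remote ref> <remote sha>
# Classify by the remote ref, so "git push origin feature:main" is still caught.
tier=L2                                    # quick TAP
target=
while read -r _local_ref _local_sha remote_ref _remote_sha; do
  branch=${remote_ref#refs/heads/}
  target=${target:+$target,}$branch
  case "$branch" in
    main|master|prod|release) tier=L4 ;;   # deliberate HOLD
  esac
done

[ -n "$target" ] || exit 0                 # no refs to push, nothing to gate
repo=$(basename "$(git rev-parse --show-toplevel)")
exec python3 .agentbee/agentbee-gate.py "Push -> $target" "$repo" "$tier"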

The same gate wraps deploy steps, CI jobs, command runners, or a tool-call hook. It exits 0 only when a human approved on the device.
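As a rough sketch of that wrapping, a deploy step or command runner could call the gate script and proceed only on exit code 0. The script path and its three arguments (action, scope, tier) come from the hook above; the function names and the 120-second timeout are assumptions for illustration.

```python
import subprocess
import sys
from typing import List

GATE = ".agentbee/agentbee-gate.py"  # path as used in the pre-push hook


def approved(action: str, scope: str, tier: str,
             gate: str = GATE, timeout_s: float = 120.0) -> bool:
    """Run the gate script; True only when it exits with code 0."""
    try:
        result = subprocess.run(
            [sys.executable, gate, action, scope, tier],
            timeout=timeout_s,  # assumed limit; no answer in time means blocked
        )
    except (OSError, subprocess.TimeoutExpired):
        return False  # could not run the gate, or it timed out: fail closed
    return result.returncode == 0


def run_gated(cmd: List[str], action: str, scope: str, tier: str,
              gate: str = GATE) -> int:
    """Run cmd only after approval; return its exit code, or 1 if blocked."""
    if not approved(action, scope, tier, gate=gate):
        print(f"blocked: {action}", file=sys.stderr)
        return 1
    return subprocess.run(cmd).returncode
```

A missing gate script also blocks the command, because the interpreter exits non-zero when it cannot open the file.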

Graduated trust, so you are not buried in prompts

Approval fatigue is real. So the gesture is graduated to the risk:

Action risk                                           To approve          To deny
L0-L2 (reads, routine writes)                         a quick TAP         hold, or ignore
L3-L4 (delete prod, payments, deploys, prod pushes)   a deliberate HOLD   tap, or ignore

A stray tap can never approve a destructive action: approving the dangerous stuff always needs a conscious hold. You decide what gets gated, and you approve the one percent that matters, not five hundred prompts a day.
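The table can be read as a small decision function. The sketch below is illustrative only: the gesture strings and the `decide` function are hypothetical names, and treating an unknown tier as denied is an assumption that follows the fail-closed rule.

```python
from typing import Optional

TAP_TIERS = {"L0", "L1", "L2"}   # routine: a tap approves
HOLD_TIERS = {"L3", "L4"}        # destructive: only a hold approves


def decide(tier: str, gesture: Optional[str]) -> str:
    """Map a gesture ("tap", "hold", or None for no input) to a decision."""
    if tier in HOLD_TIERS:
        return "approved" if gesture == "hold" else "denied"
    if tier in TAP_TIERS:
        return "approved" if gesture == "tap" else "denied"
    return "denied"  # unknown tier: fail closed (assumption)
```

Because the approving gesture differs between the two bands, the reflex that approves a routine write is the same gesture that denies a production delete.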

Every approval is evidence

This is the part auditors care about. Every approval is signed on the device with a key that never leaves it, and written to a tamper-evident, hash-chained ledger. A real receipt, exactly as stored:

{
  "action": "Delete GH prod DB",
  "scope": "prod-main",
  "trust": "L4",
  "result": "approved",
  "ts": "2026-06-09T08:39:01Z",
  "sig": "15050fde...c91dfb1",
  "device": "3166993cc2f863e3"
}

Ask for the record later, in Claude or on the CLI:

agentbee receipts this-week

AgentBee receipt ledger        chain: intact

 when (UTC)              tier  result    verify  action
 2026-06-09T08:39:01Z    L4    approved   ok     Delete GH prod DB   | prod-main
 2026-06-09T11:21:08Z    L4    approved   ok     Commit -> main      | myapp
 2026-06-09T11:04:21Z    L4    denied     --     Payment             | stripe_payment

Each "ok" is re-verified right now against your device's public key. Forge an action and the signature stops verifying. Remove a record and the hash chain breaks. You can prove what a human approved, on which device, and when, checkable offline, with no server in the loop.
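To make the hash-chain half of that claim concrete, here is a minimal sketch of a tamper-evident ledger. It is not AgentBee's ledger format: the `prev` field, the all-zero genesis value, SHA-256 and the canonical JSON encoding are assumptions, and signature verification is left out because the article does not say which signature scheme the device uses.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed "prev" value for the first record


def record_hash(record: dict) -> str:
    """Hash a record in a canonical JSON form (sorted keys, no spaces)."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(ledger: list, receipt: dict) -> None:
    """Append a receipt, linking it to the hash of the previous record."""
    prev = record_hash(ledger[-1]) if ledger else GENESIS
    ledger.append({**receipt, "prev": prev})


def chain_intact(ledger: list) -> bool:
    """Recompute every link; an edited or removed record breaks the
    link stored in the record that follows it."""
    prev = GENESIS
    for record in ledger:
        if record.get("prev") != prev:
            return False
        prev = record_hash(record)
    return True
```

A chain on its own cannot protect the newest record, since nothing follows it yet. That gap is covered by the per-record signature, or by keeping the latest hash somewhere separate.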

Under the EU AI Act, Regulation (EU) 2024/1689, that supports Article 14 (human oversight) and Article 12 (record-keeping). It is a supporting control, not a compliance certificate, and I will only ever call it that.

Let me be precise about what it is and is not

It is authorization, not authentication. A YubiKey answers "who are you" at login. AgentBee answers "do you approve this action" at the moment it runs. It sits next to your YubiKey. It does not replace it. This is the core of our AgentPass thesis: identifying an agent is not enough; controlling what it can do is mandatory.
The key never leaves the device. It is generated on-chip. On locked units it is non-extractable (Flash Encryption + Secure Boot v2), the same protection class as a Trezor One. It is not a secure element, and it is not unhackable. Nothing is. Pulling a key would take physical possession, lab gear, time and money, for one device, not a remote attack at scale.
Tamper-evident, not tamper-proof. You cannot stop someone trying to edit a log. You can make any edit detectable. That is the honest claim.

The point

Agents are getting more autonomy, not less. The answer is not to trust the host the agent lives on. It is to put the human decision somewhere the host cannot reach. An off switch you hold in your hand.

I am finalising the builds now. If you want to test one, join the waiting list at agentbee.co.uk.

Raza Sharif, CEO/Founder, CyberSecAI and AgentPass. Author of Breach 20/20. FBCS, CISSP, CSSLP.
