The missing primitive: out-of-band human approval for AI agents

mightbesaad — Sat, 06 Jun 2026 19:34:01 +0000

In April 2026, a Cursor agent running Claude Opus 4.6 deleted PocketOS's production database, along
with its volume-level backups, in nine seconds. The founder had written the rules in caps: never guess,
never run destructive commands unprompted. Pressed afterward, the agent admitted it had "violated every
principle I was given." A few months earlier, an agent asked only to tidy a desktop deleted roughly 15
years of family photos, files it was never asked to touch; iCloud later clawed most of them back, but in
the moment they were gone.

Two patterns show up in these incidents, but only one shows up in both, and it turns out to be the one
that matters. PocketOS shows the first: the operator had already told the agent not to do the dangerous
thing, in caps, and it went ahead anyway. Both show the second: there was no moment where a human
could say "wait, what?" before it was done.

The lesson most people drew was "agents need better guardrails." But PocketOS shows the ceiling of
prompt-level guardrails: the rules were right there, and the agent stepped over them. Instructions are
advice. They don't bind.

The thing that binds is an approval step on irreversible actions — and specifically an out-of-band
one.

## Why out-of-band

Most human-in-the-loop tooling assumes the human is right there — or that you'll adopt its framework to
reach them otherwise. LangGraph's interrupt can pause a run and resume it later, async — but only once
you've built on LangGraph. MCP's elicitation asks the user in-session. The cloud platforms (AWS Bedrock
AgentCore) gate tool calls — once you've migrated onto their platform.

But the whole point of an autonomous agent is that nobody is watching the terminal. It runs on a
schedule, in CI, or while you're asleep. An approval step only helps if it can reach a human who isn't
in the loop — on their phone, minutes or hours later — and make the agent wait for them.

That's a specific shape:

- the agent calls an "ask a human" tool before the irreversible action;
- the account's configured approver gets a link (not necessarily whoever kicked off the run);
- they approve or deny from anywhere;
- the agent blocks until they answer (or it times out);
- and the whole exchange survives the session ending.
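
To make that shape concrete, here is a minimal sketch of the state it implies: a request that starts as pending, can be decided once, expires on a timeout, and lives outside the agent's process so it survives the session ending. Everything here is hypothetical and for illustration only: `ApprovalStore`, its method names, and the JSON file are stand-ins, not how gvnr is implemented.

```python
import json
import time
import uuid
from pathlib import Path


class ApprovalStore:
    """Hypothetical file-backed store; a real service would use a database."""

    def __init__(self, path):
        self.path = Path(path)

    def _load(self):
        return json.loads(self.path.read_text()) if self.path.exists() else {}

    def _save(self, records):
        self.path.write_text(json.dumps(records))

    def request(self, action, ttl_seconds, now=None):
        """Record a pending request and return its id (a link would embed it)."""
        now = time.time() if now is None else now
        records = self._load()
        request_id = uuid.uuid4().hex
        records[request_id] = {
            "action": action,
            "status": "pending",
            "expires_at": now + ttl_seconds,
        }
        self._save(records)
        return request_id

    def decide(self, request_id, approved, now=None):
        """Called when the approver answers; only a live pending request changes."""
        now = time.time() if now is None else now
        records = self._load()
        record = records[request_id]
        if record["status"] == "pending" and now < record["expires_at"]:
            record["status"] = "approved" if approved else "denied"
            self._save(records)
        return self.status(request_id, now)

    def status(self, request_id, now=None):
        """Return "pending", "approved", "denied", or "expired"."""
        now = time.time() if now is None else now
        record = self._load()[request_id]
        if record["status"] == "pending" and now >= record["expires_at"]:
            return "expired"
        return record["status"]
```

Because the record is read from disk on every call, a fresh `ApprovalStore` pointed at the same file sees the same request, which is the property the last bullet asks for: the agent process can die and the decision is still there when something comes back to check.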

## What it is, and isn't

This isn't a smarter model or a safety net that catches everything. And the cheap way to wire it —
telling the agent "call request_approval before anything destructive" — just rebuilds the problem:
that's one more instruction it can step over, the exact softness PocketOS exposed. The binding has to
live a layer down. You put the capability behind the gate — the deploy, the delete, the send can't fire
without a human-issued token — so it isn't the agent choosing to ask permission, it's the execution
layer refusing to act until a human decides. That's the line between a prompt rule (the model's judgment)
and a checkpoint (deterministic). What the primitive gives you is a place to stand: a hard,
human-decided checkpoint a prompt rule fundamentally can't be.
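
One way to picture "the capability behind the gate" is an execution layer that checks a signed, expiring token before it runs anything destructive. The sketch below is an assumption about how such a layer could look, not a description of gvnr: `issue_token`, `verify_token`, `gated`, the token format, and `SIGNING_KEY` are all hypothetical names chosen for the example.

```python
import hashlib
import hmac
import time

# Assumption: this key lives only in the approval service and the execution
# layer, never in the agent's context or environment.
SIGNING_KEY = b"replace-with-a-real-secret"


def _sign(action, expires_at, key):
    message = f"{action}|{expires_at}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def issue_token(action, expires_at, key=SIGNING_KEY):
    """Run by the approval service once a human approves `action`."""
    expires_at = int(expires_at)
    return f"{expires_at}:{_sign(action, expires_at, key)}"


def verify_token(action, token, now=None, key=SIGNING_KEY):
    """True only for an unexpired token signed for exactly this action."""
    now = time.time() if now is None else now
    try:
        expires_text, signature = token.split(":", 1)
        expires_at = int(expires_text)
    except (AttributeError, ValueError):
        return False  # missing or malformed token
    expected = _sign(action, expires_at, key)
    return now < expires_at and hmac.compare_digest(
        signature.encode(), expected.encode()
    )


def gated(action_name):
    """Decorator: the wrapped function refuses to run without a valid token."""
    def wrap(func):
        def guarded(*args, approval_token=None, **kwargs):
            if not verify_token(action_name, approval_token):
                raise PermissionError(f"{action_name} requires human approval")
            return func(*args, **kwargs)
        return guarded
    return wrap


@gated("delete_volume")
def delete_volume(volume_id):
    return f"deleted {volume_id}"
```

The agent can call `delete_volume` as often as it likes; without a token minted after a human decision, the call raises `PermissionError`. Note that this sketch does not make tokens single-use, so a real gate would also need to record and reject tokens that have already been spent.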

## The drop-in

I built this as a plain MCP tool. request_approval returns a mobile link; check_approval polls for
the decision. It runs in any MCP host — no platform to adopt, no wallet, no SDK. It's deliberately small,
and I'll be honest about the edges: today it's email-delivered and single-approver (no multi-sig, no SMS
yet). That's enough to answer the only question worth answering first:

Does an out-of-band approval gate, as a drop-in tool, solve a problem you actually have?

If you run agents that send, post, deploy, move money, or touch files you can't un-touch, I'd genuinely
like to know whether this is the shape you'd reach for — or what's missing from it.
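
For a sense of what the request-then-poll flow could look like on the calling side, here is a rough sketch. It assumes a `check` callable that wraps the check_approval tool and returns "pending", "approved", or "denied"; that return shape, and the helper names `wait_for_decision` and `run_if_approved`, are guesses made for the example, so check the docs for the real response format.

```python
import time


def wait_for_decision(check, request_id, timeout_seconds, poll_seconds=5.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll `check(request_id)` until a final status arrives or time runs out.

    `check` is a stand-in for whatever calls the check_approval tool; it is
    assumed to return "pending", "approved", or "denied".
    """
    deadline = clock() + timeout_seconds
    while True:
        status = check(request_id)
        if status in ("approved", "denied"):
            return status
        if clock() >= deadline:
            return "timeout"  # callers should treat this like a denial
        sleep(poll_seconds)


def run_if_approved(check, request_id, action, timeout_seconds):
    """Fail closed: only an explicit approval lets `action` run."""
    status = wait_for_decision(check, request_id, timeout_seconds)
    if status != "approved":
        raise PermissionError(f"not approved: {status}")
    return action()
```

The design choice worth noting is the default: a timeout or any unexpected status blocks the action rather than letting it through, so silence from the approver never counts as a yes.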

Try it / read the docs: gvnr.dev — the two tools are request_approval and check_approval
Tell me I'm wrong: @mightbesaad / saad@gvnr.dev

Howdy. I built budget controls for AI agents, does this solve a problem you actually have?

mightbesaad — Fri, 05 Jun 2026 21:21:35 +0000

I've been building AI agent infrastructure for the past few months. The two things that kept biting me,
and kept coming up when I talked to other devs building agents, were runaway costs and agents doing
irreversible things without asking first.

So I built gvnr: an open-source MCP server that gives agents per-agent spend caps (hard-stop before a
call if the budget's gone) and a human approval gate (agent asks, you get a mobile link, you approve or
deny, agent waits). Both work as plain REST calls or MCP tools — no platform to adopt, no SDK.
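
To illustrate what a hard stop before the call means, here is a small sketch of a per-agent cap that is checked before money is spent rather than tallied afterward. `SpendCaps`, `BudgetExceeded`, and the idea of passing an estimated cost are assumptions made for the example; gvnr's actual interface may look quite different.

```python
from decimal import Decimal


class BudgetExceeded(Exception):
    """Raised before a call would push an agent past its cap."""


class SpendCaps:
    def __init__(self, caps):
        # caps maps agent id -> maximum total spend; Decimal avoids float drift
        self.caps = {agent: Decimal(str(cap)) for agent, cap in caps.items()}
        self.spent = {agent: Decimal("0") for agent in caps}

    def charge(self, agent, estimated_cost):
        """Reserve `estimated_cost` or raise; call this before the paid call."""
        cost = Decimal(str(estimated_cost))
        if agent not in self.caps:
            raise BudgetExceeded(f"no budget configured for {agent}")
        if self.spent[agent] + cost > self.caps[agent]:
            raise BudgetExceeded(f"{agent} would exceed {self.caps[agent]}")
        self.spent[agent] += cost
        return self.caps[agent] - self.spent[agent]  # remaining budget
```

An agent with no configured budget is refused outright here, on the theory that an unknown spender should not default to unlimited.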

It's live. You can get an API key in one curl command and try the approval gate for free (it doesn't burn
the trial ops). Source is at github.com/mightbesaad/gvnr.

Here's what I genuinely want to know from devs building in this space:

- Does the spend-cap shape match how you think about cost control, or do you manage that somewhere else entirely?
- Is the approval gate useful if it's email-only and single-approver, or does that make it a toy?
- What flag would stop you from wiring this into an agent you actually run?

Not fishing for encouragement — if this is solving the wrong problem, or solving it the wrong way, I'd
rather know now.

DEV Community: mightbesaad
