Human-in-the-Loop Approval Gates: Where to Pause an Autonomous Agent

#ai #python #reliability #agents

Book: AI Agents Pocket Guide: Patterns for Building Autonomous Systems with LLMs
Me: xgabriel.com | GitHub

You give the agent a refund tool. The reasoning looks fine in testing. Then a real ticket comes in, the model reads an angry customer who claims a $4,000 charge was wrong, and the agent refunds it on its own authority. No human saw the decision. The customer was lying. The money is gone, and your audit log says an autonomous process approved it.

That is the failure shape that approval gates exist to prevent. Most tools an agent calls are safe to run unattended: a search, a read, a calculation. A small set are not. Anything that moves money, deletes data, or sends a message to a real person should stop and wait for a human to say yes. The trick is doing that without turning your agent into a synchronous chat window that blocks a worker thread for an hour while someone is at lunch.

This is a how-to. By the end you will have a gate decorator, an approval payload worth showing a human, a resume token that survives a process restart, and timeout defaults that fail in the safe direction.

Which tools get a gate

Sort your tools into three buckets before you write any code.

Auto-run. Read-only or trivially reversible. Search, fetch, list, summarize. The agent calls these as often as it wants. A gate here only adds latency and trains your reviewers to click approve without reading.

Gated. Side effects that cost money or touch a person. Issue a refund, delete a record, send an email or a Slack message, place an order, write to a production database. These pause for a human.

Forbidden. Things the agent should never do regardless of approval. Drop a table, rotate a credential, push to main. Do not gate these. Remove them from the tool list entirely.

The bucket a tool lands in is a property of the tool, not the prompt. Decide it once, at registration time, and the agent never gets a vote.
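One way to make the bucket a registration-time property is a small registry that refuses forbidden tools outright. This is a minimal sketch, not a prescribed design: Bucket, ToolRegistry, and their method names are hypothetical and do not appear elsewhere in this article.

```python
from enum import Enum
from typing import Callable, Dict, List, Tuple


class Bucket(Enum):
    AUTO_RUN = "auto_run"
    GATED = "gated"
    FORBIDDEN = "forbidden"


class ToolRegistry:
    """Hypothetical registry: a tool's bucket is fixed when it is registered."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tuple[Callable, Bucket]] = {}

    def register(self, name: str, fn: Callable, bucket: Bucket) -> None:
        if bucket is Bucket.FORBIDDEN:
            # Forbidden tools are never stored, so the agent cannot reach them.
            raise ValueError(f"{name} is forbidden and cannot be registered")
        if name in self._tools:
            # Re-registration would let a bucket change after the fact.
            raise ValueError(f"{name} is already registered")
        self._tools[name] = (fn, bucket)

    def bucket_of(self, name: str) -> Bucket:
        return self._tools[name][1]

    def exposed_tool_names(self) -> List[str]:
        # Only these names would ever be offered to the model.
        return sorted(self._tools)
```

The model only ever sees exposed_tool_names(), so a forbidden tool is not something it can ask for, and nothing in a prompt can move a tool between buckets.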

The gate decorator

The cleanest place to attach the rule is the tool definition itself. A decorator marks a function as gated and carries the metadata the gate needs.

import functools


def gated(risk: str):
    """Mark a tool as requiring human approval."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            return fn(*args, **kwargs)

        # Attach the metadata to the wrapper the dispatcher
        # actually sees, not to the undecorated function.
        wrapper.requires_approval = True
        wrapper.risk = risk
        return wrapper
    return decorator


@gated(risk="high")
def issue_refund(order_id: str, amount_cents: int):
    # real side effect lives here
    return payments.refund(order_id, amount_cents)


def search_orders(query: str):
    # no decorator: auto-run
    return db.search(query)

The dispatcher reads requires_approval before it runs anything. A gated tool does not execute on the first pass. It produces an approval request and stops.

def dispatch(tool_fn, args):
    if getattr(tool_fn, "requires_approval", False):
        return request_approval(tool_fn, args)
    return tool_fn(**args)

That is the whole control point. One attribute, checked in one place. As long as every tool call goes through dispatch, no tool can sidestep it, because the dispatcher owns the decision, not the model.

The approval payload

A human cannot approve what they cannot see. The payload you show a reviewer is the most important part of the system, and it is the part people skip.

A bad payload says "Agent wants to call issue_refund. Approve?" A reviewer who sees that twenty times an hour clicks yes on muscle memory. The payload has to make the reviewer understand the blast radius without reading the transcript.

import time
import uuid


def request_approval(tool_fn, args):
    request_id = str(uuid.uuid4())
    payload = {
        "request_id": request_id,
        "tool": tool_fn.__name__,
        "risk": tool_fn.risk,
        "args": args,
        "summary": describe(tool_fn.__name__, args),
        "reversible": is_reversible(tool_fn.__name__),
        "requested_at": time.time(),
    }
    approvals.save(payload, status="pending")
    notify_reviewer(payload)
    return Paused(request_id)


def describe(tool, args):
    if tool == "issue_refund":
        amount = args["amount_cents"] / 100
        return (
            f"Refund ${amount:.2f} to order "
            f"{args['order_id']}. Money leaves the account."
        )
    return f"Run {tool} with {args}"

The summary field is plain English written by you, not by the model. It states the action and the consequence in one line. The reversible field tells the reviewer whether a mistake can be undone. Those two fields do more for approval quality than any UI work.

Show the reviewer the args, the summary, and enough of the agent's recent reasoning to judge intent. Do not show the raw token stream. They are approving an action, not auditing a model.
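The payload code leans on two names it never defines, is_reversible and Paused, and on some way to turn the payload into what the reviewer reads. Here is one possible shape for all three. The lookup table, the tool name in it, and render_for_reviewer are assumptions made for illustration.

```python
from dataclasses import dataclass

# Assumed allowlist of tools whose effects can be undone.
# "add_internal_note" is a hypothetical tool name.
REVERSIBLE_TOOLS = {"add_internal_note"}


def is_reversible(tool_name: str) -> bool:
    # Unknown tools are treated as irreversible, the safer default.
    return tool_name in REVERSIBLE_TOOLS


@dataclass(frozen=True)
class Paused:
    """Sentinel the dispatcher returns instead of a tool result."""
    request_id: str


def render_for_reviewer(payload: dict) -> str:
    # Lead with risk and consequence so they are read before the args.
    undo = "can be undone" if payload["reversible"] else "CANNOT be undone"
    lines = [
        f"[{payload['risk'].upper()}] {payload['summary']}",
        f"Tool: {payload['tool']}  Args: {payload['args']}",
        f"This action {undo}.",
    ]
    return "\n".join(lines)
```

Putting the consequence on the first line and the raw args on the second keeps the part a tired reviewer actually reads in front.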

Resume tokens that survive a restart

Here is where naive designs break. The obvious version blocks the agent thread on a queue.get() until the reviewer clicks. That holds a worker for as long as the human takes, and the whole pending state lives in memory. Deploy a new version, or have the box restart, and every in-flight approval is gone.

Treat a pause like a durable checkpoint. The agent run stops, writes its state to storage keyed by a resume token, and returns. The reviewer's click is a separate event that loads the state back and continues the run.

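def pause_run(session_id, request_id, messages):
    token = f"{session_id}:{request_id}"
    store.save(token, {
        "session_id": session_id,
        "messages": messages,
        "pending_request": request_id,
        "status": "awaiting_approval",
    })
    # Record the token on the approval so a reviewer click or
    # the expiry job can find the run to resume.
    approvals.update(request_id, resume_token=token)
    return token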


def resume_run(token, decision):
    state = store.load(token)
    if state["status"] != "awaiting_approval":
        raise ValueError("not awaiting approval")

    request_id = state["pending_request"]
    if decision == "approve":
        result = execute_approved(request_id)
        tool_result = {"approved": True, "result": result}
    else:
        tool_result = {
            "approved": False,
            "result": "Human denied this action.",
        }

    state["messages"].append({
        "role": "tool",
        "content": tool_result,
    })
    state["status"] = "running"
    store.save(token, state)
    return continue_agent(state)

The agent never holds a thread while it waits. The pending approval lives in the same store you already trust for session state. A denied action feeds a normal tool result back to the model, so the agent can adapt and try a different path instead of crashing.
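The store object above is left abstract. To make "survives a restart" concrete, here is a minimal file-backed sketch with the same save and load shape. FileStore is a hypothetical name, and a real deployment would more likely use the database or key-value store that already holds session state.

```python
import hashlib
import json
import os
import tempfile


class FileStore:
    """Hypothetical durable store: one JSON file per resume token."""

    def __init__(self, directory: str) -> None:
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, token: str) -> str:
        # Hash the token so characters like ":" never reach the filesystem.
        name = hashlib.sha256(token.encode("utf-8")).hexdigest()
        return os.path.join(self.directory, name + ".json")

    def save(self, token: str, state: dict) -> None:
        # Write to a temp file, then rename, so a crash mid-write
        # cannot leave a half-written state file behind.
        fd, tmp_path = tempfile.mkstemp(dir=self.directory, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(state, f)
        os.replace(tmp_path, self._path(token))

    def load(self, token: str) -> dict:
        with open(self._path(token), encoding="utf-8") as f:
            return json.load(f)
```

A fresh FileStore pointed at the same directory reads back whatever an earlier process saved, which is the property the resume token depends on.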

One rule for execute_approved: re-validate the args at execution time against the saved payload. The reviewer approved a specific refund amount on a specific order. Run that exact call. Never let the model rewrite the arguments between approval and execution, or the gate approves one thing and runs another.
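One way to enforce that rule is to fingerprint the tool name and args when the request is created, then check the fingerprint again right before running. The sketch below is illustrative only: seal, run_saved_call, and ArgsMismatch are hypothetical names, and the signature differs from the execute_approved(request_id) call used above.

```python
import hashlib
import json


class ArgsMismatch(Exception):
    """Raised when saved args no longer match what was approved."""


def fingerprint(tool: str, args: dict) -> str:
    # Canonical JSON so key order does not change the hash.
    canonical = json.dumps({"tool": tool, "args": args}, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def seal(payload: dict) -> dict:
    # Call this when the approval request is created.
    sealed = dict(payload)
    sealed["fingerprint"] = fingerprint(payload["tool"], payload["args"])
    return sealed


def run_saved_call(sealed: dict, tools: dict):
    # Recompute from the stored record; refuse to run if it drifted.
    actual = fingerprint(sealed["tool"], sealed["args"])
    if actual != sealed["fingerprint"]:
        raise ArgsMismatch(f"args changed after approval for {sealed['tool']}")
    # Run with the saved args only. Nothing from the model is read here.
    return tools[sealed["tool"]](**sealed["args"])
```

The important property is in the last line: the executor takes its arguments from the stored record, so a model that changes its mind after approval has nothing to write to.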

Timeout defaults that fail safe

A pending approval cannot wait forever. The reviewer goes home. The Slack message scrolls off. You need a default that fires when nobody answers, and the default has to be deny.

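import time


def expire_stale_approvals(max_wait_seconds=3600):
    now = time.time()
    for req in approvals.pending():
        age = now - req["requested_at"]
        if age > max_wait_seconds:
            approvals.update(
                req["request_id"], status="expired"
            )
            # Assumes the pause step stored the resume token on
            # the approval record; the payload built in
            # request_approval does not include it by default.
            token = req["resume_token"]
            resume_run(token, decision="deny")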

When an approval expires, you resume the run with a denial, not a silent drop. The agent gets told the action was not approved and reports back to the user. Nobody is left wondering whether the refund happened.

Pick the timeout by risk and reversibility. A reversible internal action can wait a day. A wire transfer should expire in minutes, because a stale high-stakes approval is more dangerous the longer it sits. The wrong default is no timeout at all, which leaves money-moving actions in limbo until someone notices.
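A per-request timeout can replace the single max_wait_seconds default. The sketch below assumes two risk levels and uses placeholder durations. The numbers are illustrative, not recommendations, and the function names are hypothetical.

```python
# Illustrative durations in seconds, keyed by (risk, reversible).
TIMEOUTS = {
    ("high", False): 10 * 60,        # e.g. money leaving the account
    ("high", True): 60 * 60,
    ("low", False): 4 * 60 * 60,
    ("low", True): 24 * 60 * 60,     # reversible internal action
}
SHORTEST = min(TIMEOUTS.values())


def approval_timeout_seconds(risk: str, reversible: bool) -> int:
    # An unrecognized risk label gets the shortest window, so a typo
    # in a tool definition fails toward expiring sooner.
    return TIMEOUTS.get((risk, bool(reversible)), SHORTEST)


def is_expired(requested_at: float, now: float, risk: str, reversible: bool) -> bool:
    return now - requested_at > approval_timeout_seconds(risk, reversible)
```

Because the payload already carries risk and reversible, the expiry job can call is_expired per request instead of comparing every age to one global limit.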

A second guard worth adding: bind the approval to the exact request. If the agent retries and generates a new request_id, the old approval does not carry over. One human yes maps to one action, once.
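The one-yes-one-action rule can be sketched as a ledger that lets each approved request_id be consumed exactly once. ApprovalLedger is a hypothetical in-memory stand-in. In production the same check would need to be an atomic update in your durable store.

```python
class ApprovalLedger:
    """Hypothetical in-memory ledger: one approval, one execution."""

    def __init__(self) -> None:
        # request_id -> "pending" | "approved" | "consumed"
        self._status = {}

    def open(self, request_id: str) -> None:
        self._status[request_id] = "pending"

    def approve(self, request_id: str) -> None:
        if self._status.get(request_id) != "pending":
            raise ValueError(f"{request_id} is not pending")
        self._status[request_id] = "approved"

    def consume(self, request_id: str) -> bool:
        """Return True exactly once for an approved request."""
        if self._status.get(request_id) == "approved":
            self._status[request_id] = "consumed"
            return True
        # Unknown, pending, or already consumed: do not execute.
        return False
```

A retry that generates a new request_id starts from nothing here. It has no entry in the ledger, so consume returns False until a human approves that specific request.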

What to log

Every gate event is an audit record, and for money-moving tools it may be a compliance one. Log on a durable trace:

request_id, session_id, tool name, full args
the summary you showed the reviewer
decision (approve, deny, expired), reviewer identity, decision latency
execution result, including failures after approval

The question that matters in the postmortem is "who approved this, and what did they see?" If your log has the args but not the summary the human read, you cannot answer it. Store both.
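The fields listed above fit in one record per gate event. Here is a minimal sketch that serializes such a record as a JSON line. The class name, field names, and the choice of JSON lines are assumptions; use whatever your trace backend expects.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class GateAuditRecord:
    request_id: str
    session_id: str
    tool: str
    args: dict
    summary_shown: str              # the exact text the reviewer read
    decision: str                   # "approve", "deny", or "expired"
    reviewer: Optional[str]         # None when the request expired
    decision_latency_s: float
    execution_result: Any = None
    execution_error: Optional[str] = None  # failure after approval


def to_log_line(record: GateAuditRecord) -> str:
    # default=str keeps odd values (dates, decimals) from breaking the log.
    return json.dumps(asdict(record), sort_keys=True, default=str)
```

Keeping summary_shown next to args in the same record is what lets you answer the postmortem question from a single line.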

Where this stops

Approval gates make a single dangerous action safe. They do not size the queue. Route every gated call to one reviewer and you have built a bottleneck that approves on autopilot the moment volume climbs, which is the same risk you started with. Batch low-risk gated actions, reserve a human for the high-risk ones, and watch your approve-without-reading rate the way you watch error rate.
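If you log decision latency, one rough proxy for the approve-without-reading rate is the share of approvals made faster than a person could plausibly read the payload. The threshold and the function name below are assumptions, and latency is only a proxy. A fast yes is not always a careless one.

```python
def rubber_stamp_rate(decisions, min_read_seconds=3.0):
    """Share of approvals decided faster than min_read_seconds.

    `decisions` is a list of dicts with "decision" and
    "decision_latency_s" keys, matching the audit fields above.
    """
    approvals = [d for d in decisions if d["decision"] == "approve"]
    if not approvals:
        return 0.0
    fast = sum(
        1 for d in approvals if d["decision_latency_s"] < min_read_seconds
    )
    return fast / len(approvals)
```

Tracked per reviewer and per tool, a rising value is a signal that the queue is too long or the payload too repetitive.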

They also do not fix a tool that should have been forbidden. If a tool is dangerous enough that you would never approve it under load, it does not belong in the agent's hands behind a gate. Cut it from the list.

Start with the smallest move. Take your one tool that moves money or deletes data. Add the decorator. Write the one-line summary a reviewer will read. Ship the gate before that tool sees a real user, because the first uncaught autonomous refund costs more than the gate ever will.

The AI Agents Pocket Guide covers the patterns around this one: durable pause-and-resume, the human-in-the-loop control point, and how approval fits next to budgets and circuit breakers in a loop you can actually run in production. The chapter on autonomy boundaries pairs directly with the gate above.