A Capability Token for Agent Tool Calls: One Signed Object That Is Both the Gate and the Audit

#ai #security #agents #python

Disclaimer: This article was drafted with AI assistance and reviewed and edited by the author. The technical design and opinions are my own.

When an LLM agent decides to call a tool, something has to say "yes." In most codebases that "yes" is one of two things: a boolean returned from a policy check, or a row appended to an event log after the fact. Both are weak. A boolean carries no evidence — it's gone the instant the branch is taken. An event log carries no authority — it's written after the executor already committed to running the tool, so it can't gate anything, and it can be edited later without anyone noticing.

This piece is the sequel to my earlier one, Stop trusting the agent: bind tool-call approvals to the exact call, where I argued you should bind the approval to the exact call's arguments so an approval for one payload can't be reused on another. That fixed one attack. It left three others standing. Here I want to define the full object — a capability token — and show that the same object is simultaneously (a) the thing the executor checks before running the tool and (b) the audit record after. Enforcement and evidence collapse into one signed value. The earlier article bound one approval to one call; this one specifies the whole token and checks it, unchanged, across three agent frameworks.

The token

CapabilityToken = {
    "tool": "transfer",
    "args_hash": "sha256(canonical(args))",
    "caller_context_hash": "sha256(agent_id | session_id | user_id)",
    "approved_for": {"step_index": 7, "attempt": 0},
    "policy_version": "pol-2026-07-03:9f21c...",   # or content-hash of the rule set
    "prev_entry_hash": "sha256(previous ledger entry)",
    "exp": 1751560000,                             # wall-clock, still present
    "sig": "ed25519(all of the above)"
}

Each field exists to kill a specific failure that a boolean or a plain event log cannot catch.

1. args_hash — bind to the exact arguments. This is the predecessor's whole point, so briefly: an approval for transfer(amount=10) must not be replayable onto transfer(amount=10000). Hash the canonicalized args into the signed token; the executor recomputes the hash from the actual call and rejects on mismatch. Done. Move on.

The next three are the failure classes that per-call args-binding alone cannot catch. This is where the article advances past the last one.

2. caller_context_hash — bind to who is calling. Args-binding stops payload swaps but says nothing about context. A token minted for agent A in session S is still a perfectly valid signature over transfer(amount=10). Lift it into agent B's run, or a different user's session, and the args still match. Bind a hash of the caller identity (agent id, session id, user id) into the token and the executor rejects any call whose live context doesn't reproduce the hash. A token becomes non-transferable across contexts.

3. approved_for {step_index, attempt} — two clocks, not one. Wall-clock exp is necessary but not sufficient. Consider a retry queue: an approval is minted, the attempt fails, the payload sits in the queue, and a later retry picks it up and executes "fresh" — still inside its wall-clock window, args still matching, context still matching. Time-based freshness passed and the wrong thing happened. The fix is a second clock: bind the approval to a point in the execution sequence — approved for step N, attempt M. An approval is for a specific attempt, not for every retry that happens to reuse its payload. The executor checks both: not expired and this is the attempt it was minted for.

4. policy_version — reconstructable authority, not a dangling pointer. Suppose you log "rule R42 fired." R42 lives in a mutable policy store. Six weeks later, during an incident review, you look up R42 — and it now says something different, because someone edited the policy. Your log told you which rule fired but not what it said at decision time. Bind the policy version (or, better, a content-hash of the exact rule set) into the token. Now the entry reconstructs the authority under which the call ran — the decision is reproducible, not merely telemetry pointing at a moving target.

5. prev_entry_hash + a periodic external anchor — chain integrity. Here's the failure that individually-perfect tokens still miss: absence. Every entry can be well-formed, correctly signed, args-bound, context-bound — and the tail can be silently gone. A crash mid-write, an aggressive log rotation, or a deliberate tamper drops the last N entries, and a dropped tail is indistinguishable from "those calls never happened." You cannot tell missing from removed. So hash-chain the entries — each token carries the hash of the previous one — and periodically publish a checkpoint hash outside the ledger's own trust domain (a transparency log, a second account, anything the ledger's writer doesn't control). Now a broken chain is visible, and a truncation past the last anchor is detectable. Absence of an entry becomes distinguishable from removal of an entry.

6. sig — the audit half. Sign with an asymmetric key so the token is non-repudiable and verifiable by parties who can't mint tokens. This is what lets the same object be evidence: anyone with the public key can check it, later, offline.

The same token, three frameworks

The token is the invariant. Frameworks only differ in where you check it.

Google ADK (adk-python). BaseTool gives you before_tool_callback and after_tool_callback. The before-callback is the gate; the after-callback writes the evidence — the same object.

def before_tool_callback(tool, args, ctx):
    tok = ctx.state["pending_token"]
    if not verify(tok, tool.name, args, ctx):   # args_hash, caller_context, approved_for, policy, sig
        return {"error": "capability check failed"}   # block the call

def after_tool_callback(tool, args, ctx, result):
    append_to_ledger(sign(finalize(ctx.state["pending_token"], result)))  # chain + anchor

Microsoft Semantic Kernel. The natural hook is an auto-function-invocation filter. DENY = don't call next() / return a refusal. REDACT = mutate context.arguments before next(). The genuinely missing primitive is REQUIRE_APPROVAL, and SK's shape forces its meaning: the auto-invoke loop is synchronous, so "approval" cannot be an in-loop await — that would hold the chat-completion connection open while a human decides. It has to mean terminate-and-resume:

async def on_auto_invoke(context, next):
    tok = context.arguments.get("_cap_token")
    verdict = verify(tok, context.function, context.arguments)
    if verdict == DENY:
        return                                   # skip next(); refuse
    elif verdict == REQUIRE_APPROVAL:
        context.terminate = True                 # exit the loop; resume later
        # ...resume by re-invoking with a fresh, argument-bound token for this exact call
    elif verdict == REDACT:
        context.arguments = redact(context.arguments)
        await next()
    else:
        await next()

pydantic-ai. Check the token in the tool wrapper / RunContext before the body runs.

def guarded(fn):
    def wrapper(ctx: RunContext, **kwargs):
        if not verify(ctx.deps.token, fn.__name__, kwargs, ctx):
            raise ToolDenied(fn.__name__)
        return fn(ctx, **kwargs)
    return wrapper

Three call sites, one object. The freshness bug (field 3), the context-lift bug (field 2), and the moving-policy bug (field 4) are caught identically in all three, because they live in the token, not the framework. These are the same problems being worked through right now in upstream discussions I've taken part in — the ADK decision-ledger issue, the Semantic Kernel auto-function-invocation approval gap, and a pydantic-ai proposal to replace the plain tool_call_approved bool with an HMAC-bound approval token carrying (run_id, tool_call_id, expiry) — where the recurring question is always where the check belongs, once you accept that the invariant is a single bound object.

Evidence, not telemetry

That's the whole payoff. A plain event log is telemetry: it tells you a story about the past that you have to trust the storyteller for. A decision ledger of capability tokens is evidence — each entry is tamper-evident (signed + chained + anchored) and still meaningful when replayed later (args, caller, sequence position, and the exact policy text are all bound in). You can hand it to someone who wasn't there, who can't mint tokens, weeks after the fact, and they can check it. A boolean can't do that. A log line can't do that. One signed object does both jobs.

The token itself is maybe a hundred lines of your own code. The discipline is deciding it's a first-class object in your agent, not an afterthought bolted on when something has already gone wrong.

DEV Community

A Capability Token for Agent Tool Calls: One Signed Object That Is Both the Gate and the Audit

The token

The same token, three frameworks

Evidence, not telemetry

Top comments (0)