Alexey Vidanov for AWS Community Builders

Posted on May 8

Agents that pay: why agent payments without governance is the next incident

#aws #ai #agentcore #payments

The preview supports Coinbase CDP wallets and Stripe Privy wallets as payment connections, using the x402 protocol for HTTP-native stablecoin micropayments. Available in US East (N. Virginia), US West (Oregon), Europe (Frankfurt), and Asia Pacific (Sydney).

End users fund wallets through stablecoin or fiat via debit card, and must explicitly authorize agent wallet access before the agent can transact at all.

That's initial authorization, not per-action governance. The agent still decides what to do with that access at runtime.

That's the plumbing. It works. Here's what it doesn't cover.

Four gaps in agent payment governance

Gap 1: When is the agent allowed to pay?

AgentCore enforces per-session spending limits. But a spending limit is a ceiling, not a policy. There's no lifecycle enforcement that prevents an agent from paying during exploration, before it's decided what to do with the data.

The scenario: An agent exploring data sources pays $0.02 each to five different paid endpoints during its research phase. It doesn't yet know which source it needs. Three of those calls turn out to be irrelevant. The agent paid $0.06 for data it never used, and it hadn't even formed a plan yet. Nothing in the spending-limit model distinguishes "exploring options with someone else's money" from "executing a committed decision."

Even if AgentCore handles retry and rate limiting at the transport layer, a governance gap lives above transport: the agent chose to spend before it decided what to build. That's not a retry problem. That's a phase problem.

What's needed: phases. The agent can't call payment tools until it's finished reading and has committed to a plan. Not "shouldn't." Cannot. An exception fires.

EXPLORE ──→ DECIDE ──→ COMMIT
(read only)  (propose)  (pay + act)
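
A minimal sketch of that hard gate, under stated assumptions: the names `Phase`, `PhaseGate`, and the `PAYMENT_TOOLS` set are hypothetical and invented for illustration (only `PhaseError` and `pay_for_data` appear in this article). It is not AgentCore's API, and it is not necessarily how any particular library implements phases.

```python
from enum import Enum


class Phase(Enum):
    EXPLORE = 1  # read only
    DECIDE = 2   # propose a plan
    COMMIT = 3   # pay + act


class PhaseError(Exception):
    """Raised when a tool is called in a phase that does not permit it."""


class PhaseGate:
    # Hypothetical tool names; a real agent would register its own.
    PAYMENT_TOOLS = {"pay_for_data"}

    def __init__(self):
        self.phase = Phase.EXPLORE

    def advance(self):
        """Move one step forward; phases never go backwards in this sketch."""
        if self.phase is not Phase.COMMIT:
            self.phase = Phase(self.phase.value + 1)

    def call(self, tool_name, tool_fn, *args, **kwargs):
        """Run tool_fn only if the current phase permits tool_name."""
        if tool_name in self.PAYMENT_TOOLS and self.phase is not Phase.COMMIT:
            raise PhaseError(
                f"{tool_name} requires COMMIT, current phase is {self.phase.name}"
            )
        return tool_fn(*args, **kwargs)
```

The point of routing every tool call through `call` is that the check sits outside the model's reasoning: a payment attempted during EXPLORE raises before the tool function runs, so no money moves.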

Gap 2: What happens when a multi-step workflow fails after money moved?

Payments are irreversible. If an agent pays for data in step 1, then step 2 (analysis) fails, the user paid for nothing. The report never arrives. No compensation mechanism exists at the orchestration layer.

The scenario: Pay for market data, analyze it, send report. Model timeout on step 2. Payment already executed. Report never generated. User charged $0.05 for zero value.

What's needed: transactions with compensation. If step 2 fails, step 1's compensation fires (refund, credit, or at minimum a structured record that the payment delivered no value). Temporal and Inngest solve durable execution for workflows, but they're not integrated into the agent tool-calling loop where payment decisions happen.

# Pseudocode: transactional agent workflow
with agent.commit() as tx:
    data = tx.call("pay_for_data", cost=0.05, endpoint="market-feed")
    result = tx.call("analyze", cost=0.01, data=data)
    tx.call("send_report", cost=0.10, to=user_email)
    # if analyze fails → pay_for_data compensation fires

Databases solved this in 1978. Durable execution engines solved it for workflows. The agent tool-calling loop is the layer still missing it.
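
To make the pseudocode above concrete, here is a small runnable sketch of the compensation idea. The `Transaction` class and its `compensate` parameter are assumptions made for illustration, not the interface of any real library. Because the payment itself is irreversible, the compensation here is whatever the caller supplies: a refund request, a credit, or just a record that the payment delivered no value.

```python
class Transaction:
    """Runs steps in order; on failure, runs compensations in reverse order."""

    def __init__(self):
        self._compensations = []  # (name, callable) for steps that succeeded
        self.log = []             # ordered record of what happened

    def call(self, name, action, compensate=None):
        """Run one step and remember how to compensate for it afterwards."""
        result = action()  # if this raises, nothing is registered for the step
        self.log.append(("done", name))
        if compensate is not None:
            self._compensations.append((name, compensate))
        return result

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            # Undo in reverse order, as a saga would.
            for name, compensate in reversed(self._compensations):
                compensate()
                self.log.append(("compensated", name))
        return False  # re-raise so the caller still sees the failure
```

If a later step raises inside the `with` block, the compensations registered by earlier steps run and the original exception still propagates. This sketch does not handle a compensation that itself fails, which a production version would need to.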

Gap 3: Who decides the threshold for approval?

A flat session limit doesn't distinguish between "50 calls at $0.01" and "1 call at $2.40." Both are under a $5 budget. One might need human approval.

The scenario: An agent discovers a premium data source mid-execution. Single call: $2.40. Session limit is $10. Within bounds. But nobody approved spending $2.40 on a single API call for a task that was expected to cost $0.30 total.

What's needed: graduated budget gates that change agent behavior at thresholds, not just stop execution at a ceiling. At 50%, the agent reduces scope and picks cheaper sources. At 75%, new payment commits are blocked and the agent re-evaluates. Above 90%, full stop. Plus per-call approval rules: any single payment above $0.50 requires explicit authorization. The budget gate is behavioral, not binary.
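
The thresholds above can be sketched as a small gate object. This is an illustrative sketch only: `BudgetGate`, the mode names, and the returned decision strings are hypothetical, and the exact boundary handling (for example, treating "at 75%" as 75% or more) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class BudgetGate:
    limit: float                      # session ceiling in dollars
    approval_threshold: float = 0.50  # single payments above this need approval
    spent: float = 0.0

    def mode(self):
        """Map the fraction of budget spent to a behavioral mode."""
        used = self.spent / self.limit
        if used > 0.90:
            return "STOP"
        if used >= 0.75:
            return "BLOCK_NEW_COMMITS"
        if used >= 0.50:
            return "REDUCE_SCOPE"
        return "NORMAL"

    def check(self, cost, approved=False):
        """Return a decision string for a proposed payment of `cost` dollars."""
        if self.mode() in ("STOP", "BLOCK_NEW_COMMITS"):
            return "DENIED"
        if self.spent + cost > self.limit:
            return "DENIED"
        if cost > self.approval_threshold and not approved:
            return "NEEDS_APPROVAL"
        return "ALLOWED"

    def record(self, cost):
        """Account for a payment that was actually executed."""
        self.spent += cost
```

With `BudgetGate(limit=10.0)`, the $2.40 call from the scenario comes back as `NEEDS_APPROVAL` even though it is well inside the session limit, which is the distinction a flat ceiling cannot make. The agent is expected to read `mode()` and change its plan when it returns `REDUCE_SCOPE`; the gate only reports the mode.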

Gap 4: Why was this payment permitted?

AgentCore provides observability: logs, metrics, traces showing what happened. But "what happened" isn't the same as "why was it allowed." When a payment goes wrong, you need the decision chain: which rules were evaluated, what phase the agent was in, whether approval was required.

What's needed: proof traces. A structured record for every payment decision.

Here's what a blocked payment looks like (this is where the value is visible):

Decision: DENIED
Tool: pay_for_data
✗ Phase is EXPLORE (payment tools require COMMIT)
  Agent must transition to DECIDE → COMMIT before paying
  Action: PhaseError raised, tool call rejected

And a permitted one with conditions:

Decision: ALLOWED (with approval)
Tool: pay_for_data
✓ Phase is COMMIT
✓ Transaction T1 is open
✓ Budget: 12% spent, below all thresholds
⚠ Cost $0.50 exceeds $0.25 threshold → approval required
✓ Approval granted by callback
Executed in 0.003s

When something goes wrong, you know whether the system allowed it or failed to prevent it. That's the difference between a bug and a governance gap.
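
One way to produce records like the two above is to have every rule report its result into a trace object, whether it passed or not. The sketch below assumes hypothetical names (`ProofTrace`, `decide_payment`) and checks only two rules; the real trace format of any given tool may differ.

```python
from dataclasses import dataclass, field


@dataclass
class ProofTrace:
    tool: str
    checks: list = field(default_factory=list)  # (passed, message) pairs
    decision: str = "PENDING"

    def check(self, passed, message):
        """Record one rule evaluation and hand the result back to the caller."""
        self.checks.append((passed, message))
        return passed

    def render(self):
        """Format the trace as human-readable lines."""
        lines = [f"Decision: {self.decision}", f"Tool: {self.tool}"]
        for passed, message in self.checks:
            lines.append(f"{'✓' if passed else '✗'} {message}")
        return "\n".join(lines)


def decide_payment(tool, phase, budget_used):
    """Evaluate every rule and keep the full list, not just the first failure."""
    trace = ProofTrace(tool)
    results = [
        trace.check(phase == "COMMIT",
                    f"Phase is {phase} (payment tools require COMMIT)"),
        trace.check(budget_used <= 0.90,
                    f"Budget: {budget_used:.0%} spent"),
    ]
    trace.decision = "ALLOWED" if all(results) else "DENIED"
    return trace
```

Evaluating all rules before deciding, rather than stopping at the first failure, is a deliberate choice in this sketch: the trace then shows everything that was checked, which is what an incident review needs.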

Why hasn't AWS built this?

Fair question. Three possible reasons:

1. It's coming in GA. The preview focuses on payment execution. Governance features (approval workflows, phase enforcement) may ship later. AWS tends to launch primitives first, then layer policy on top.
2. They expect frameworks to own it. LangGraph, CrewAI, Strands Agents, and others are building orchestration. AWS may see governance as the framework's job, not the infrastructure's.
3. The market signal isn't there yet. Few agents transact in production today. The governance pain hasn't been felt widely enough to drive demand.

All three are plausible. But if you're building a paying agent today, you can't wait for option 1 or 2 to materialize. The gap exists now.

A governance pattern for paying agents

The four pieces work together:

Phases prevent premature payments (gap 1)
Transactions protect multi-step workflows (gap 2)
Budget gates enforce graduated spending policy (gap 3)
Proof traces record why every payment was permitted or denied (gap 4)

The rules that govern these should be readable by the people responsible for spending policy:

BLOCK pay_for_data WHEN phase IS NOT commit
BLOCK * WHEN budget ABOVE 90%
REQUIRE APPROVAL FOR * WHEN cost ABOVE 0.50
FLAG * WHEN time OUTSIDE 09:00-17:00

This isn't natural language. An engineer still needs to write it. But a product manager can read it and confirm it matches the policy they intended.
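
To show that rules in this shape are mechanically checkable, here is a minimal sketch of a parser and evaluator for exactly the four lines above. It is an assumption-laden illustration, not the reference implementation's parser: the function names are hypothetical, `budget` is assumed to be a fraction between 0 and 1, `cost` a dollar amount, and `time` a `datetime.time`.

```python
import re
from datetime import time

# One rule per line: ACTION TOOL WHEN FIELD OPERATOR VALUE
RULE = re.compile(
    r"^(BLOCK|REQUIRE APPROVAL FOR|FLAG) (\S+) WHEN (\w+) "
    r"(IS NOT|ABOVE|OUTSIDE) (\S+)$"
)


def parse(line):
    """Split one rule line into (action, tool, field, operator, value)."""
    match = RULE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized rule: {line!r}")
    return match.groups()


def condition_holds(field_value, operator, raw):
    """Check one condition against the current value of its field."""
    if operator == "IS NOT":
        return str(field_value).lower() != raw.lower()
    if operator == "ABOVE":
        # "90%" is read as the fraction 0.90; "0.50" as a plain number.
        limit = float(raw[:-1]) / 100 if raw.endswith("%") else float(raw)
        return field_value > limit
    # OUTSIDE: raw looks like "09:00-17:00", field_value is a datetime.time
    start, end = (time.fromisoformat(part) for part in raw.split("-"))
    return not (start <= field_value <= end)


def evaluate(rules, tool, context):
    """Return the actions of every rule that matches this tool call."""
    fired = []
    for line in rules:
        action, target, field, operator, raw = parse(line)
        if target in ("*", tool) and condition_holds(context[field], operator, raw):
            fired.append(action)
    return fired
```

A caller would pass the rule lines, the tool name, and a context dictionary with `phase`, `budget`, `cost`, and `time` keys, then act on the returned list of actions. Anything the grammar does not recognize raises `ValueError` rather than being silently ignored, which seems the safer default for spending policy.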

Reference implementation

I built a single-file Python library that implements this pattern: phases, transactions, budget gates, proof traces, and the rule DSL above. Zero dependencies. MIT licensed.

Shape on GitHub: github.com/vidanov/shape

It wraps any tool-calling agent (LangGraph, CrewAI, Strands, raw Python) with external governance. It's not a framework and it's not competing with AgentCore. It fills the gap between "the agent can pay" and "the agent should be allowed to pay right now." Whether you build that yourself, use Shape, or wait for AWS to ship it, the pattern is the same.

AWS built the payment rails. The governance layer is still your problem.

Top comments (6)

QBitFlow • May 8

Governance is the missing piece. We built our subscription rails around a spending-cap authorization rather than a pull model — the user pre-authorizes a ceiling on-chain, and nothing above it is reachable, ever. Same primitive maps cleanly to agent payments (x402-style PAYG): the agent operates inside a bounded, revocable cap, never the full wallet. Guards before rails.

NOVAInetwork • May 9

The four gaps you describe are exactly what happens
when governance lives above the agent instead of below
it. Your BLOCK/REQUIRE/FLAG DSL is readable and clean,
but it runs in the agent's runtime. A compromised or
buggy agent can skip the check.

I'm building a blockchain where these constraints are
protocol-level. On NOVAI, every AI entity has a
capability bitfield checked by the dispatcher before
any transaction is routed. The entity literally cannot
call operations it wasn't registered for. No runtime
bypass possible.

Your Gap 1 (phase enforcement) maps to our autonomy
modes. A Gated entity must go through an approval gate
for Tier 1 actions. The gate type (Multisig, Threshold,
TimelockOnly) is stored in the entity record and
enforced by the chain, not by the agent.

Gap 3 (graduated budgets) maps to our staking system.
Entities stake collateral. If they overspend or
misbehave, an oracle slashes the stake. The economic
consequence is protocol-enforced, not a soft limit the
agent can ignore.

Gap 4 (proof traces) is where on-chain has a natural
advantage. Every signal, every purchase, every
reputation update is in a committed block. The audit
trail is the chain itself. No separate trace system
needed.

The gap between "can pay" and "should be allowed to
pay" is real. Your approach solves it at the framework
layer. Ours solves it at the protocol layer. Different
trust models, same problem.

github.com/0x-devc/NOVAI-node

Alexey Vidanov AWS Community Builders • May 10

You may have missed the Shape part, which addresses exactly those points.

NOVAInetwork • May 11

I read the Shape section again. It addresses the
policy layer but the enforcement still lives in the
application. The gap I keep seeing is between "here
are the rules" and "the runtime actually blocks the
call if the rules aren't met."

We just shipped entity delegation this weekend. An
entity can grant a subset of its capabilities to
another entity for a bounded duration. The grant is
a 42-byte memory object. Revocation is one DELETE
transaction. The capability resolver checks every
admission gate, not just the ones the agent chooses
to respect.

The difference from Shape is that the enforcement
is below the agent, not beside it.

Alexey Vidanov AWS Community Builders • May 12

Check the repository github.com/vidanov/shape

NOVAInetwork • May 12

I will take a look at the repo. Appreciate you
sharing it.

The question I keep coming back to is where the
enforcement boundary sits. If Shape defines the
policy and the agent's runtime enforces it, the
agent is still the one deciding whether to comply.
If the protocol enforces it before the agent's
code runs, compliance is structural.

Yesterday we shipped a real Groth16 ZK verifier
for NOVAI. An entity can now submit a
cryptographic proof that it ran specific code on
specific inputs. The chain verifies the pairing
equation on BN254 before accepting the claim.
That is the kind of enforcement I mean: the chain
does not trust the agent's assertion, it checks
the math.

Happy to compare notes on the design tradeoffs
if you are interested.