When your agent needs to spend more than you told it to

#agentops #ai #stripe #devops

You deploy a research agent with a $100 budget. The task is competitive landscape analysis: a few market reports, some API calls, a handful of web lookups. $100 is plenty.

Three weeks later, the scope changes. You need licensing data across 40 markets instead of 5. Same agent, same task, fundamentally different cost. Doing it properly runs $800.

What happens next depends entirely on how you set the budget. If you hardcoded a ceiling, the agent stops midway and you get a partial result. If you didn't, it proceeds and you find out when you check your Stripe dashboard. Neither outcome is what you wanted. What you actually wanted was for the agent to ask you first.

That gap (between "agent has a spending limit" and "agent can request authorization before exceeding it") is the problem Stripe's Machine Payments Protocol (MPP) exposes but doesn't solve.

What MPP actually does

MPP is an open standard for agent-to-service payments. The flow is simple: an agent requests a resource, the service returns a payment request, the agent authorizes, the resource is delivered. No accounts, no UI flows, no human involved. Stripe users can accept payments over MPP in a few lines of code.

The examples are real. Browserbase lets agents pay per browser session. PostalForm lets agents print and mail physical letters. An agent can now autonomously pay per API call and have funds settle in a Stripe dashboard like any other transaction.

This is not theoretical. It shipped this week.

But look at step three in that flow: "the agent authorizes." That step is doing a lot of quiet work. It assumes the agent knows whether it should pay. Right now, that knowledge comes from static configuration you set before the agent ran.
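To make that concrete, here is a toy model of the four-step flow in Python. It is a sketch only: `ToyService`, `PaymentRequest`, `PaymentAuthorization`, and `run_flow` are hypothetical names invented for illustration, and nothing here reflects MPP's actual wire format or Stripe's SDK. The point is where `should_pay` sits.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Union


@dataclass(frozen=True)
class PaymentRequest:
    """Hypothetical: what the service asks for before delivering."""
    resource: str
    amount_usd: float


@dataclass(frozen=True)
class PaymentAuthorization:
    """Hypothetical: the agent's agreement to pay a specific request."""
    request: PaymentRequest


class ToyService:
    """In-memory stand-in for a paid service. Not a real MPP endpoint."""

    def __init__(self, prices: dict[str, float]):
        self.prices = prices

    def fetch(
        self, resource: str, authorization: Optional[PaymentAuthorization] = None
    ) -> Union[PaymentRequest, str]:
        expected = PaymentRequest(resource, self.prices[resource])
        if authorization is None or authorization.request != expected:
            return expected  # step 2: service returns a payment request
        return f"contents of {resource}"  # step 4: resource is delivered


def run_flow(
    service: ToyService,
    resource: str,
    should_pay: Callable[[PaymentRequest], bool],
) -> Optional[str]:
    response = service.fetch(resource)  # step 1: agent requests a resource
    if isinstance(response, str):
        return response  # nothing to pay for
    if not should_pay(response):  # step 3: the whole decision lives here
        return None
    return service.fetch(resource, PaymentAuthorization(response))
```

Everything interesting is in `should_pay`. Today it is usually a constant comparison such as `lambda req: req.amount_usd < 10`, decided before the agent ever ran.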

The static config model

Today, agent spending limits live in environment variables. AGENT_BUDGET=500. Or a config file. Or hardcoded logic: if the cost is under $10, proceed; otherwise, stop.
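A minimal sketch of that model, assuming the budget arrives through an `AGENT_BUDGET` environment variable. `StaticBudget` and `BudgetExceeded` are hypothetical names, not part of any library.

```python
import os


class BudgetExceeded(Exception):
    """Raised when a charge would push spending past the static cap."""


class StaticBudget:
    """A hard ceiling fixed at deploy time. It can stop the agent, not ask."""

    def __init__(self, cap_usd: float):
        self.cap_usd = cap_usd
        self.spent_usd = 0.0

    def charge(self, amount_usd: float) -> None:
        if self.spent_usd + amount_usd > self.cap_usd:
            # The only available move is to fail. There is no "ask first" path.
            raise BudgetExceeded(
                f"charge of ${amount_usd:.2f} exceeds cap of ${self.cap_usd:.2f}"
            )
        self.spent_usd += amount_usd


# Read once at startup; the agent cannot renegotiate it mid-task.
budget = StaticBudget(float(os.environ.get("AGENT_BUDGET", "500")))
```

With a $100 cap and 40 markets at $20 each, the sixth call to `charge` raises and the agent stops at market five.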

This model works for predictable, bounded tasks. An agent paying $0.01 per browser session, with a hard cap you set at deploy time. Fine. The cost profile is known in advance.

It breaks the moment context changes.

The $100 budget you set isn't wrong. It was right for the original scope. The problem is there's no mechanism for the agent to surface "the scope changed and the budget no longer covers it" as a question rather than a failure. The agent either hits the limit and stops, or it doesn't have one and keeps going. The two failure modes are "agent gives up" and "surprise charges."

What's missing is the third option: the agent pauses, routes the question to the person responsible for it, and waits for a response before continuing.

Why that's harder than it sounds

The obvious solution is a Slack webhook. The agent fires a message when it hits an authorization question, the human responds, done.

Except webhooks are one-way. The agent fires and forgets. It can't block on a webhook response. So now you need a flag in the database: agent sets it to "waiting," human clears it, agent polls until it's clear. This works. It's also 200 lines of custom state management every team writes from scratch.
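A stripped-down version of the flag-and-poll pattern looks roughly like this. It uses an in-memory SQLite database so it runs on its own; the table layout and the function names (`open_db`, `ask`, `answer`, `poll`) are assumptions made for the sketch, and a real deployment would have the human writing to a shared database from another process.

```python
import sqlite3
import time
from typing import Optional


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE approvals ("
        "id INTEGER PRIMARY KEY, question TEXT, status TEXT, answer TEXT)"
    )
    return conn


def ask(conn: sqlite3.Connection, question: str) -> int:
    """Agent side: record the question and mark it as waiting."""
    cur = conn.execute(
        "INSERT INTO approvals (question, status) VALUES (?, 'waiting')",
        (question,),
    )
    conn.commit()
    return cur.lastrowid


def answer(conn: sqlite3.Connection, approval_id: int, text: str) -> None:
    """Human side: clear the flag and attach a reply."""
    conn.execute(
        "UPDATE approvals SET status = 'answered', answer = ? WHERE id = ?",
        (text, approval_id),
    )
    conn.commit()


def poll(
    conn: sqlite3.Connection,
    approval_id: int,
    interval_s: float = 1.0,
    timeout_s: float = 60.0,
) -> Optional[str]:
    """Agent side: re-read the row until it is answered or the timeout passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        row = conn.execute(
            "SELECT status, answer FROM approvals WHERE id = ?", (approval_id,)
        ).fetchone()
        if row is not None and row[0] == "answered":
            return row[1]
        if time.monotonic() >= deadline:
            return None  # still waiting; the caller decides what that means
        time.sleep(interval_s)
```

Even this short version leaves open what a timeout means, who cleans up stale rows, and how the agent resumes after a restart. That is where the 200 lines come from.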

Then you have the response-in-context problem. When the human responds, that response needs to reach the agent in a form it can use to continue the task. Not just a "yes" that clears a flag, but "yes, here's the updated budget and the two markets you can skip." Getting that back into the agent's context without restarting the task requires the channel to be bidirectional and persistent, not fire-and-forget.

And then you have the multiple-agent problem. One agent asking one human is manageable. Five agents, three humans, overlapping tasks: you need routing. Which agent's question goes to which human? How does the human know what context the question came from? How does the agent know its question is pending vs. lost?
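Those three questions map onto a small amount of state. The sketch below is one way to lay it out, assuming questions are routed by task owner with a fallback human. `Router`, `Question`, and `Status` are hypothetical names, and everything lives in memory.

```python
import itertools
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    PENDING = "pending"
    ANSWERED = "answered"


@dataclass
class Question:
    question_id: int
    agent_id: str
    task_id: str
    context: str  # what the human needs to see to answer sensibly
    text: str
    assignee: str
    status: Status = Status.PENDING
    answer: Optional[str] = None


class Router:
    def __init__(self, task_owners: dict[str, str], fallback: str):
        self._task_owners = task_owners
        self._fallback = fallback
        self._questions: dict[int, Question] = {}
        self._ids = itertools.count(1)

    def submit(self, agent_id: str, task_id: str, context: str, text: str) -> Question:
        # Which human? The task's owner, or the fallback if nobody owns it.
        assignee = self._task_owners.get(task_id, self._fallback)
        q = Question(next(self._ids), agent_id, task_id, context, text, assignee)
        self._questions[q.question_id] = q
        return q

    def inbox(self, human: str) -> list[Question]:
        # Each human sees only their own pending questions, with context attached.
        return [
            q for q in self._questions.values()
            if q.assignee == human and q.status is Status.PENDING
        ]

    def answer(self, question_id: int, text: str) -> None:
        q = self._questions[question_id]
        q.status = Status.ANSWERED
        q.answer = text

    def status_of(self, question_id: int) -> Optional[Status]:
        # None means the router has no record: lost, not pending.
        q = self._questions.get(question_id)
        return q.status if q else None
```

The `status_of` method is the part teams tend to skip. Without it, an agent cannot tell a slow human from a dropped message.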

Every team building human-in-the-loop (HITL) approval at any scale hits this progression. Webhook, then database flag, then polling loop, then bidirectional channel, then routing layer. The final result works. It's also a custom communication infrastructure that has nothing to do with the actual task the agent was deployed to do.

What the authorization layer actually needs

Three things, in order of how often they're missing:

A persistent bidirectional channel. Not a webhook, not a polling loop against a database flag. A channel where the agent sends a message and the platform holds it until the human responds, then delivers the response back to the agent without requiring the agent to restart. Blocking semantics, not fire-and-forget.
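From the agent's side, blocking semantics could look like the sketch below: `ask` does not return until a human has replied, and the reply is structured rather than a bare yes. This is an in-process stand-in built on `threading.Event`, so it is not persistent; a real platform would need to store pending questions durably. `ApprovalChannel` and `Reply` are hypothetical names.

```python
import threading
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Reply:
    approved: bool
    new_budget_usd: Optional[float] = None
    notes: str = ""  # e.g. "skip the two smallest markets"


@dataclass
class _Pending:
    question: str
    done: threading.Event = field(default_factory=threading.Event)
    reply: Optional[Reply] = None


class ApprovalChannel:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._pending: dict[str, _Pending] = {}

    def ask(self, question: str, timeout_s: Optional[float] = None) -> Optional[Reply]:
        """Agent side: block until a human replies. Returns None on timeout."""
        request_id = uuid.uuid4().hex
        entry = _Pending(question)
        with self._lock:
            self._pending[request_id] = entry
        answered = entry.done.wait(timeout_s)  # the agent's task is paused here
        with self._lock:
            self._pending.pop(request_id, None)
        return entry.reply if answered else None

    def open_questions(self) -> dict[str, str]:
        """Human side: request IDs mapped to the questions awaiting a reply."""
        with self._lock:
            return {rid: p.question for rid, p in self._pending.items()}

    def respond(self, request_id: str, reply: Reply) -> bool:
        """Human side: deliver a reply. Returns False if the request is unknown."""
        with self._lock:
            entry = self._pending.get(request_id)
        if entry is None:
            return False
        entry.reply = reply
        entry.done.set()  # wakes the agent exactly where it stopped
        return True
```

The agent's code reads as a single line, `reply = channel.ask("Scope grew to 40 markets. Raise budget to $800?")`, and the task continues with `reply.new_budget_usd` and `reply.notes` already in hand.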

Policy primitives with escalation tiers. The binary model ("spend under $X, stop above it") doesn't handle context. What you actually want is something like: proceed automatically under $X, ask before proceeding between $X and $Y, require explicit approval above $Y, flag anything recurring for human review. That's not three environment variables. That's a policy layer that understands cost tiers and routes accordingly.
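As a rough sketch, the tiers described above fit in a few lines. `SpendPolicy`, `Action`, and `Decision` are hypothetical names, and the boundary handling (amounts exactly at $X or $Y fall into the "ask" tier) is an assumption, since the tiers are only described informally.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    PROCEED = "proceed"                    # under $X
    ASK = "ask"                            # between $X and $Y
    REQUIRE_APPROVAL = "require_approval"  # above $Y


@dataclass(frozen=True)
class Decision:
    action: Action
    flag_for_review: bool  # recurring charges are flagged whatever the tier


@dataclass(frozen=True)
class SpendPolicy:
    auto_below_usd: float      # $X
    approval_above_usd: float  # $Y

    def evaluate(self, amount_usd: float, recurring: bool = False) -> Decision:
        if amount_usd < self.auto_below_usd:
            action = Action.PROCEED
        elif amount_usd <= self.approval_above_usd:
            action = Action.ASK
        else:
            action = Action.REQUIRE_APPROVAL
        return Decision(action, flag_for_review=recurring)


policy = SpendPolicy(auto_below_usd=10.0, approval_above_usd=100.0)
```

With these example thresholds, a $0.01 browser session proceeds on its own, while the $800 licensing run returns `REQUIRE_APPROVAL` instead of raising an exception or silently going through.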

An audit trail that links transactions to authorization decisions. Stripe gives you transaction history. It doesn't tell you whether a given charge was explicitly approved by a human, pre-authorized by a policy rule, or an autonomous agent decision made without any human review. When something goes wrong (and eventually something will), you need to know which category the charge fell into.
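One possible shape for such a record is sketched below. The three `Basis` values mirror the three categories above; the field names, the JSON-lines output, and the placeholder transaction ID are all assumptions, not anything Stripe provides.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class Basis(Enum):
    HUMAN_APPROVED = "human_approved"  # a person said yes to this charge
    POLICY_RULE = "policy_rule"        # pre-authorized by a named rule
    AUTONOMOUS = "autonomous"          # the agent decided with no review


@dataclass(frozen=True)
class AuditRecord:
    transaction_id: str  # whatever ID the payment provider assigned
    amount_usd: float
    basis: Basis
    approver: Optional[str] = None
    policy_rule: Optional[str] = None
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def __post_init__(self) -> None:
        # Refuse records that claim a basis without naming who or what gave it.
        if self.basis is Basis.HUMAN_APPROVED and not self.approver:
            raise ValueError("human_approved records need an approver")
        if self.basis is Basis.POLICY_RULE and not self.policy_rule:
            raise ValueError("policy_rule records need the rule name")


def to_json_line(record: AuditRecord) -> str:
    """Serialize one record as a single JSON line for an append-only log."""
    data = asdict(record)
    data["basis"] = record.basis.value
    return json.dumps(data, sort_keys=True)
```

Joining this log to the transaction history on `transaction_id` answers the question the dashboard alone cannot: who or what authorized each charge.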

Why this matters now

Before MPP, this problem was mostly theoretical. Few agents could easily spend money, so the authorization question rarely came up in practice.

Stripe just changed that. A few lines of code and your agent has a payment credential. Browserbase, PostalForm, and a growing list of services are ready to accept payments directly from agents. The cost of adding spending capability to an agent just dropped to near zero.

That means the gap between "agent has spending capability" and "agent has authorization infrastructure" just became real infrastructure work. The static config model will hold for simple, predictable deployments. It won't hold for agents that handle variable scope, multi-step tasks, or small per-call charges that compound across 200 sessions.

MPP made the payment rail. The authorization layer is the next piece. Right now every team is building it from scratch, badly, in the same ways. That's usually the signal that it should be infrastructure.

Naomi Kynes builds agent infrastructure. GitHub: github.com/naomi-kynes