My agents needed spending limits. Every option wanted the wallet first

Taha Baş — Thu, 25 Jun 2026 07:10:09 +0000

I've been building agents that call paid APIs and settle real payments. The moment you let an agent spend money on its own, two questions show up and never leave:

How do I stop it before it overspends, instead of finding out on next month's invoice?
Who holds the funds while it does that?

Most of what I found answered the first question by quietly taking over the second. To cap an agent's spend, you hand a service your wallet (keys or a prefunded balance), and it meters from there. That works. But now a third party sits between your agent and its money, and for anything touching real settlement that trade-off felt backwards: you get control over spend only by surrendering custody of the funds.

So I built it the other way around.

Enforcement without taking the wallet

The agent keeps its own API key and its own wallet. Nothing is handed over. A drop-in proxy sits in front of the agent's outbound calls and enforces the budget in the request path, so the call is checked before it leaves, not logged after it returns.

import { createPaymentClient } from "@gatewards/agent-sdk";

// the agent's own key; nothing is handed to us to hold
const client = createPaymentClient({ apiKey: process.env.AGENT_KEY, proxy: true });

const res = await client.get("https://api.example.com/data");
// over the cap → throws before the upstream is ever called

When an agent blows its budget, concretely:

per-call or daily limit exceeded → 429, the request never reaches the upstream
a runaway loop hammering the same resource → the pipeline auto-pauses with 423

No funds moved through us to make that happen. The gateway never generates, stores, or sees a private key. Onboarding an agent returns walletMode: "external", and that's the whole custody story.
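On the caller's side, the two refusals call for different reactions. A minimal sketch of how an agent might branch on them, assuming it can see the HTTP status; the `nextStep` helper and its return values are hypothetical names for illustration, not part of the SDK:

```typescript
// Hypothetical caller-side helper; not SDK API.
// The two status codes and their meanings are the ones described above.
type NextStep = "handle-normally" | "stop-spending" | "wait-for-unpause";

function nextStep(status: number): NextStep {
  if (status === 429) {
    // Per-call or daily limit exceeded: the upstream was never called,
    // so retrying right away just produces another refusal.
    return "stop-spending";
  }
  if (status === 423) {
    // The pipeline was auto-paused after a runaway loop.
    return "wait-for-unpause";
  }
  // Anything else is not a budget refusal; treat it like any other response.
  return "handle-normally";
}
```

The point of keeping the two cases apart is that a 429 is about the budget and a 423 is about the agent's behavior, so blindly retrying either one is the loop the pause exists to stop.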

The part I didn't expect to matter: dedup

Once every outbound call goes through one place, you notice how often agents in the same fleet make the identical call. When two agents ask the same API the same question inside the cache window, the second one shouldn't pay again.

So calls are deduplicated across the fleet. First call is a miss and pays; an identical call within the window is a hit and returns free. In practice that's the gap between a 1284ms paid round-trip and a 49ms cached one. It isn't the headline, but it's the piece that quietly pays for itself as the fleet grows, and it only works because everything already flows through one gate.
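The miss-then-hit behavior can be sketched as a small time-windowed cache. This is an illustration under assumptions, not the gateway's code: `DedupWindow`, `getDeduped`, and the synchronous `payAndFetch` callback are hypothetical names, and the clock is passed in so the window is easy to reason about.

```typescript
// Hypothetical sketch of fleet-wide dedup inside a time window.
interface Entry {
  body: string;
  expiresAt: number; // ms timestamp after which the entry no longer counts
}

class DedupWindow {
  private entries = new Map<string, Entry>();
  private windowMs: number;

  constructor(windowMs: number) {
    this.windowMs = windowMs;
  }

  // Returns the cached body if the key was stored within the window.
  lookup(key: string, nowMs: number): string | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= nowMs) {
      this.entries.delete(key); // window elapsed: next call pays again
      return undefined;
    }
    return entry.body;
  }

  store(key: string, body: string, nowMs: number): void {
    this.entries.set(key, { body, expiresAt: nowMs + this.windowMs });
  }
}

// First identical call is a miss and pays; later ones in the window are hits.
function getDeduped(
  cache: DedupWindow,
  key: string,
  nowMs: number,
  payAndFetch: () => string, // assumed: performs the paid upstream call
): { body: string; hit: boolean } {
  const cached = cache.lookup(key, nowMs);
  if (cached !== undefined) return { body: cached, hit: true };
  const body = payAndFetch();
  cache.store(key, body, nowMs);
  return { body, hit: false };
}
```

The sketch leaves out one case a real gate has to decide on: two identical calls that are both in flight before either has been stored.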

When it does pay

The agent pays directly: x402 on Base, USDC, from its own wallet. We're the rail and the governance layer, not the bank. This is on testnet today; mainnet is the next step, and I'd rather say that plainly than imply otherwise.

Where this honestly is

It runs. The governance and dedup above are live and tested. But it's early: pre-traction, still finding the operators it's actually for. I'm writing this partly to find them.

If you're running a fleet that spends real money on outside APIs and you've run into that same wall, I'd like to compare notes: what does your spend enforcement look like right now, and where does it break?

Observability told me exactly how much money my agents wasted. I wanted something that says no.

Taha Baş — Mon, 22 Jun 2026 07:49:25 +0000

Most AI cost tooling is an autopsy. It tells you, in detail, what you already spent — token counts, per-call traces, a
dashboard that turns red after the bill is locked in. None of it does the one thing I kept wanting: refuse the call before
it goes out.

I ran into this building agent tooling. Once I had more than a couple of agents hitting paid APIs on a schedule, two
problems showed up that nothing off the shelf solved cleanly.

Problem 1: observability is not control

Watching spend and stopping spend are different systems, and every tool I tried lived on the watching side. I could
reconstruct, after the fact, that agent 4 had a bad night. What I couldn't do was tell agent 4 "you're done for today"
without a hard limit that fires before the request leaves.

The closest thing providers offer is per-key budgeting. That sounds right until you run more than one agent. Keys get
shared, and the moment three agents share an API key a per-key cap can't tell them apart — you've lost the unit that
actually matters, which is the agent.

So the cap I wanted was specific:

per agent, not per key
enforced in the request path — over budget means the call is refused before it goes out, not logged after it returns
two dimensions: calls/day and a max per single call
a kill-switch on call-rate spikes, because the runaway-loop case is the one that hurts at 3am
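Those four requirements fit in one check that runs before the request leaves. A minimal sketch, with every name hypothetical (`BudgetGate`, `AgentPolicy`, `spikeLimit`, and so on). It also assumes "max per single call" means a ceiling on the price quoted for one call, and it borrows 429 and 423 as the refusal codes; none of this is the actual implementation.

```typescript
// Hypothetical per-agent gate: calls/day, max per call, and a spike kill-switch.
interface AgentPolicy {
  maxCallsPerDay: number;
  maxPerCall: number;    // assumed: price ceiling for a single call
  spikeLimit: number;    // calls tolerated inside one spike window
  spikeWindowMs: number;
}

interface AgentState {
  day: string;           // UTC date the daily counter belongs to
  callsToday: number;
  recent: number[];      // timestamps of recent attempts
  paused: boolean;
}

type Decision =
  | { status: 200 }
  | { status: 429; reason: string }
  | { status: 423; reason: string };

class BudgetGate {
  private policies: Map<string, AgentPolicy>;
  private states = new Map<string, AgentState>();

  constructor(policies: Map<string, AgentPolicy>) {
    this.policies = policies;
  }

  // Keyed by agent identity, not by API key.
  check(agentId: string, price: number, nowMs: number): Decision {
    const policy = this.policies.get(agentId);
    if (!policy) return { status: 429, reason: "no policy for agent" };

    const day = new Date(nowMs).toISOString().slice(0, 10);
    let state = this.states.get(agentId);
    if (!state) {
      state = { day, callsToday: 0, recent: [], paused: false };
      this.states.set(agentId, state);
    }
    if (state.day !== day) {
      state.day = day;
      state.callsToday = 0; // new day, fresh daily counter
    }

    if (state.paused) return { status: 423, reason: "paused" };

    // Kill-switch: too many attempts inside the window pauses the agent.
    state.recent = state.recent.filter((t) => nowMs - t < policy.spikeWindowMs);
    state.recent.push(nowMs);
    if (state.recent.length > policy.spikeLimit) {
      state.paused = true;
      return { status: 423, reason: "call-rate spike" };
    }

    if (price > policy.maxPerCall) {
      return { status: 429, reason: "over per-call max" };
    }
    if (state.callsToday >= policy.maxCallsPerDay) {
      return { status: 429, reason: "daily limit reached" };
    }

    state.callsToday += 1;
    return { status: 200 };
  }
}
```

In this sketch a paused agent stays paused until something outside the gate clears the flag, which is a design choice rather than a given: an automatic unpause would let the 3am loop resume on its own.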

Problem 2: I didn't want to hand over my keys

Plenty of "AI gateway" products will do governance for you — by becoming the thing that holds your API keys and signs
requests on your behalf. For a fleet that touches real money, handing custody of credentials to a third party is a hard no.
I wanted enforcement without custody: keep my own keys, let something in front of the fleet enforce the rules.

What I ended up building

Couldn't find a drop-in that did per-agent, request-path enforcement without taking custody, so I built one. It's a proxy
you point agents at. They keep their own keys. No rewrite, no framework lock-in — LangChain, CrewAI, or a raw script all
talk to the same proxy.

The integration is boring on purpose:

import { createPaymentClient } from "@gatewards/agent-sdk";

const client = createPaymentClient({
  apiKey: process.env.GATEWARDS_AGENT_KEY, // identifies THIS agent
  proxy: true,
});

// your agent's calls go through the proxy unchanged
const res = await client.get("https://api.example.com/data");

You set the cap per agent (calls/day + max per call). When an agent goes over, the proxy returns a refusal in the request
path — your call gets a 429, not a silent overage you discover tomorrow. When an agent's rate spikes into loop territory,
the pipeline auto-pauses instead of grinding through your budget.

Because every call is already tagged by agent identity, attribution stops being a grep session. You get "which agent spent
what" for free, as a side effect of the thing that enforces the caps.
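To make the attribution point concrete, here is a small sketch of the rollup that becomes possible once each call carries an agent identity. The `CallRecord` shape and `usageByAgent` are hypothetical; the real log format isn't described here.

```typescript
// Hypothetical record of one call that passed through the proxy.
interface CallRecord {
  agentId: string;
  cached: boolean; // true if served from the fleet cache, so nothing was paid
}

interface AgentUsage {
  paidCalls: number;
  cachedCalls: number;
}

// Rolls a flat call log up into per-agent counts.
function usageByAgent(records: CallRecord[]): Map<string, AgentUsage> {
  const usage = new Map<string, AgentUsage>();
  for (const record of records) {
    const entry = usage.get(record.agentId) ?? { paidCalls: 0, cachedCalls: 0 };
    if (record.cached) {
      entry.cachedCalls += 1;
    } else {
      entry.paidCalls += 1;
    }
    usage.set(record.agentId, entry);
  }
  return usage;
}
```

It counts calls rather than dollars on purpose, matching the units the caps themselves use.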

The one that surprised me: cross-agent dedup

This one I didn't plan for. Several agents poll the same endpoints — same GET, same params, different agents. The proxy
caches identical GET responses across the whole fleet, so five agents making the same call pay for one. On a polling-heavy
fleet that turned out to be a bigger line-item win than the caps.
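"Identical" has to be defined somewhere, and the natural place is the cache key. A sketch of one way to build it, assuming the key is the method plus a normalized URL and deliberately leaves the agent's identity out; `dedupKey` is a hypothetical helper, and the proxy's real key may differ.

```typescript
// Hypothetical cache key: same method + same URL + same params => same key,
// regardless of which agent asked or the order the params were written in.
function dedupKey(method: string, rawUrl: string): string {
  const url = new URL(rawUrl);
  url.searchParams.sort(); // ?b=2&a=1 and ?a=1&b=2 become the same key
  url.hash = "";           // fragments never reach the server
  return `${method.toUpperCase()} ${url.toString()}`;
}
```

Leaving the agent out of the key is what makes the dedup cross-agent. It also means any upstream whose response depends on who is asking should not be cached this way.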

What it deliberately doesn't do

Honesty matters more than a clean pitch, so the limits up front:

It doesn't estimate dollar caps. Caps are calls/day and max-per-call, not "$5/day". Estimating real-time per-call cost across arbitrary upstream APIs is a guess, and I'd rather give you a primitive that's exact than a dollar figure that's wrong. If you genuinely need a $ cap, I want to hear it — that's an open design question for me.
Dedup is GET-only by default. POST caching is opt-in per pipeline, because deduping a non-idempotent call is how you ship a bug.
It's a proxy in your request path. That's a dependency. It's built to fail open on its own errors rather than take your fleet down, but you should know it's there.
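Failing open is easy to get subtly wrong, so here is a minimal sketch of the distinction it depends on. The names (`Verdict`, `checkFailOpen`, `onGateError`) are hypothetical and the check is synchronous for brevity, so this is an illustration rather than the proxy's code.

```typescript
// Hypothetical verdict from the budget check.
type Verdict =
  | { allowed: true }
  | { allowed: false; status: 429 | 423 };

// A refusal is a normal return value and is always honored.
// Only an error thrown by the gate itself is treated as "let the call through".
function checkFailOpen(
  check: () => Verdict,
  onGateError: (err: unknown) => void,
): Verdict {
  try {
    return check();
  } catch (err) {
    onGateError(err); // surface the failure; do not block the fleet on it
    return { allowed: true };
  }
}
```

The trade-off is explicit: while the gate is broken, caps are not enforced. That is why the sketch reports the error through `onGateError` instead of swallowing it.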

Where it is

It's live at gatewards.com, and the SDK is open source (Apache-2.0): npm i @gatewards/agent-sdk