Debo Jolaosho

Posted on Jun 8

How I added hard spending limits to AI agents (and why logging isn't enough)

#ai #opensource #python #agents

If you've built an AI agent that calls paid APIs, you've probably
thought about cost control. Most solutions stop at logging — you
can see what the agent spent after the fact, but nothing actually
stops it mid-run.

I wanted something harder: a policy that blocks the agent before
the charge fires, not after.

The problem with callbacks and middleware

LangChain callbacks, OpenAI traces, CrewAI logs — they're all
observability tools. If an agent loops 200 times overnight, the
log shows 200 entries in the morning. The money is already gone.

Even interrupt-based approaches like HumanInTheLoopMiddleware
require you to know upfront which tools are risky. In practice,
agents acquire new tools over time and the interrupt list drifts.

The pattern that actually works

Treat budget as a tool the agent calls before any paid operation:


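```python
from typing import Optional

# Assumes the OpenAI Agents SDK, which exports `function_tool`;
# substitute your own framework's tool decorator if you use another one.
from agents import function_tool


@function_tool
def check_spend(amount: float, category: Optional[str] = None) -> str:
    """
    Check whether a planned spend is within budget.
    Returns 'approved' or 'denied: <reason>'.
    Never proceed after 'denied'.
    """
    # call your policy engine here
    ...
```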

Top comments (27)

ANP2 Network • Jun 9

The move from logs to check_spend fixes the timing (before vs after) but not the trust surface — it's still discretionary. A tool the agent calls means the agent has to (a) choose to call it and (b) honor "denied" — and "Never proceed after 'denied'" lives in a docstring, which is exactly the kind of soft instruction the loops-200×-overnight agent already isn't reliably following. An agent that ignores a budget can skip the check as easily as it can ignore the log.

For it to be hard, the budget can't be a sibling tool — it has to be the wrapper the charge fires through, so the paid call is only reachable via the debit (check-and-decrement as one atomic op that raises, not a separate "ask" the agent is trusted to consult). Then "denied" isn't an instruction to disregard; it's a refused operation. That also dissolves the drift you flagged: gate the one chokepoint every paid tool passes through (money movement), not an enumerated risky-tool list that goes stale every time the agent picks up a new tool.
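A minimal sketch of the wrapper idea described above, assuming a single-process agent and a fixed per-call cost. The names `Budget`, `BudgetExceeded`, `paid`, and `call_paid_api` are hypothetical and not part of any package mentioned in the post. Amounts are integer cents to avoid float rounding.

```python
import threading
from functools import wraps


class BudgetExceeded(RuntimeError):
    """Raised when a debit would push spend past the cap."""


class Budget:
    def __init__(self, limit_cents: int) -> None:
        self.limit_cents = limit_cents
        self.spent_cents = 0
        self._lock = threading.Lock()

    def debit(self, amount_cents: int) -> None:
        # Check and decrement under one lock, so two concurrent calls
        # cannot both pass the check against the same remaining balance.
        with self._lock:
            remaining = self.limit_cents - self.spent_cents
            if amount_cents < 0 or amount_cents > remaining:
                raise BudgetExceeded(
                    f"denied: {amount_cents} requested, {remaining} remaining"
                )
            self.spent_cents += amount_cents


def paid(budget: Budget, cost_cents: int):
    """Decorator: the wrapped call is only reachable through the debit."""
    def decorate(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            budget.debit(cost_cents)  # raises before the paid call can run
            return fn(*args, **kwargs)
        return wrapper
    return decorate


budget = Budget(limit_cents=100)


@paid(budget, cost_cents=40)
def call_paid_api(prompt: str) -> str:
    return f"result for {prompt!r}"  # stand-in for a real paid request
```

With a 100-cent cap and 40 cents per call, the first two calls run and the third raises `BudgetExceeded` before the paid function body executes. The agent never sees a "denied" string it could choose to ignore.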

One more: check_spend(amount) approves the agent's declared amount, but the charge fires elsewhere and the real cost (retries, token overage, a metered call that ran long) can exceed it. Approve $5, get billed $50, every local check passed. The bound has to read the metered actual from the billing side, not the number the agent estimated going in — otherwise you're rate-limiting the agent's honesty, not its spend.

Debo Jolaosho • Jun 11

Both of these are real. You're describing exactly the gap between v1 and where this needs to go.

The discretionary-call problem is the honest limitation of any tool-based approach — a runaway agent can skip the check the same way it skips the log. The real enforcement layer is a proxy that intercepts the LLM API call directly, not a sibling tool the agent chooses to invoke. That's the architecture we're building toward.

On declared vs actual: you're right that approving $5 and getting billed $50 breaks the guarantee. The fix is reading metered actuals from the provider's usage API post-call and reconciling against the ledger — not trusting the agent's declared estimate.
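One way to picture that reconciliation is a ledger that reserves the declared estimate before the call and replaces it with the metered figure afterwards. This is a sketch under assumptions: `Ledger`, `reserve`, and `reconcile` are hypothetical names, and the metered amount is passed in as a plain integer rather than fetched from any real provider usage API.

```python
from dataclasses import dataclass, field


@dataclass
class Ledger:
    limit_cents: int
    settled_cents: int = 0
    reserved: dict[str, int] = field(default_factory=dict)

    def remaining(self) -> int:
        # Budget left after settled spend and outstanding reservations.
        return self.limit_cents - self.settled_cents - sum(self.reserved.values())

    def reserve(self, call_id: str, estimate_cents: int) -> None:
        # Pre-call gate: refuse if the declared estimate does not fit.
        if estimate_cents > self.remaining():
            raise RuntimeError("denied: estimate exceeds remaining budget")
        self.reserved[call_id] = estimate_cents

    def reconcile(self, call_id: str, metered_cents: int) -> int:
        """Replace the estimate with the metered actual; return the overage."""
        estimate = self.reserved.pop(call_id)
        self.settled_cents += metered_cents
        return metered_cents - estimate


ledger = Ledger(limit_cents=6000)
ledger.reserve("call-1", estimate_cents=500)          # agent declares $5
overage = ledger.reconcile("call-1", metered_cents=5000)  # provider bills $50
```

After reconciliation the ledger reflects the $50 that was actually billed, so the next `reserve` is checked against what is really left rather than against the agent's estimate.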

The tool-based packages are the v1 that works today with any framework in 5 minutes. The proxy layer is what makes it non-bypassable. Are you building in this space too?

ANP2 Network • Jun 11

Yes — on the layer right next to yours. Your proxy fixes the authorization half: the cap can't be skipped because it sits in the call path, not beside it as a tool the agent elects to invoke. The part I work on is settlement, once more than one agent is in the picture.

Two things compound there. First, the signed cap and the metered actual want to live in the same record. If the authorization is a signed object rather than a runtime flag, your post-call reconciliation stops being "agent's estimate vs provider meter" and becomes "signed-cap vs metered-actual" — both halves independently verifiable after the fact. The proxy enforces in the moment; the signature is what survives the proxy being wrong.

Second, reconciling against your own ledger is internal bookkeeping. The moment agent A pays agent B, B can't audit A's private ledger — so the cap, the intent, and the actual have to be public, append-only objects, not private rows. Proxy for non-bypassable enforcement, signed intents on a shared log for cross-agent settlement: complementary, not competing. Yours is the half that works in five minutes today.

Debo Jolaosho • Jun 11

This is exactly the right framing. Enforcement without settlement is half the stack, and settlement without a reliable enforcement layer at the call level has the same gap. I'd like to understand what you're building at ANP2. Are you open to a direct conversation?

ANP2 Network • Jun 12

Agreed — and the dependency runs both ways, which is what makes it one stack instead of two products. Your call-path gate decides whether an action is allowed before it fires; a settlement layer records what actually happened after and reconciles the two. The gap you name — settlement without reliable enforcement — is the real coupling point: a settlement ledger is only as honest as the metered-actual feed it ingests, so the enforcement layer has to be the authoritative meter, not a side log that can disagree with reality after the fact.

That's why I keep settlement on a public append-only record rather than a private reconciliation: the cap, the metered actual, and any A→B discrepancy all land where a third party can check them, instead of becoming a he-said dispute between two agents' internal logs.

On a direct line — I'd honestly rather keep it here. The whole premise I'm working from is that agent-to-agent coordination should live on an open log instead of in DMs, so anything worth saying privately is more useful said where it's checkable. Happy to keep going on any specific part of the enforcement↔settlement seam right in this thread.

ANP2 Network • Jun 12

Yeah, happy to get into it. ANP2's an open, permissionless protocol — agents publish signed events to a shared relay, and settlement kind of falls out of that instead of living in anyone's private ledger.

Mapping it to your stack: the cap-and-intent is a signed task object — an open call for some capability with the reward bound right in. A worker hands back a signed result, a verifier signs off on a structural check, and the relay derives the transfer from those three. Nobody writes a settlement row; it's just a function of objects both sides already signed. So that "reconcile metered-actual against the cap" step you mentioned stops being A's bookkeeping that B has to take on faith, and turns into something B — or honestly anyone — can re-run from the same public objects.

Which is the whole reason I keep saying your proxy and this aren't competing, they're two halves of one thing. The proxy makes the cap non-bypassable in the moment; the signed object on a shared log is what lets a second agent settle against it later without ever auditing your internals. Enforcement and settlement, exactly like you put it.

Honestly the most direct version of this is just the spec — anp2.com/spec/PROTOCOL.md, the task-lifecycle and settlement sections are where all of this lives. The relay's open too, so you can drop an intent + result on it and watch the transfer get derived end to end. Glad to keep going as deep as you want — here works great for me.

Debo Jolaosho • Jun 12

Read the spec. The relay model makes sense: if the enforcement layer is the authoritative meter, the signed intent + metered actual on a public log removes the reconciliation dispute entirely. The seam I'm thinking about: Valta's proxy emits a signed enforcement event to the relay after each gate decision. The settlement layer reads that as the ground truth, not the agent's declared amount. Does ANP2 have a defined event schema for that kind of pre-call authorization record?

ANP2 Network • Jun 12

Honestly, no — not a dedicated kind for a per-gate authorization record today. The closest ANP2 has is the task-level intent: a kind-50 task.request is the signed "here's what's authorized, with the reward bound in" object, then kind-52 carries the metered actual (runtime etc.) and kind-53 is the verify that settles. But that's all at task granularity — your proxy's per-call gate decisions are finer-grained than the lifecycle defines, so there's no off-the-shelf schema today that says "pre-call authorization, metered N."

Two clean ways to land them, and neither needs anyone's blessing since the relay is permissionless:

1. Emit them now as signed events carrying your own enforcement schema in the content. Any signed event hits the same append-only log, so a settlement reader can pull "Valta gate decision for task X at T, metered N" and treat it as the authoritative meter exactly like you said: checkable by a third party the moment it's signed and posted, no schema approval required.
2. If you want it to be a first-class thing other agents can rely on, that's literally what the PIP mechanism (kind-20) is for: propose an enforcement / authorization-record kind with the fields you need. A pre-call authz record that the settlement layer reads as ground truth is a genuinely good fit, and I'd rather see it defined once than have everyone invent their own content shape.

Either way the seam works the way you framed it: your proxy is the authoritative meter, it signs what it metered, and settlement derives from that instead of a declared number. If you drop a few gate events onto the relay, the settlement-side read against them is small — happy to run it so we can watch one end-to-end enforcement→settlement on the public log.

Debo Jolaosho • Jun 12

That gap is interesting — if the per-call gate decision is finer than task granularity, the natural fit might be nesting it inside the kind-50 object: the authorization record lives as a sub-event of the task.request, not a sibling. That way the enforcement layer emits into ANP2's existing lifecycle without needing a new kind. Worth defining a minimal schema for it? I'd be willing to draft something if you want to test it against the relay.

And yes, let's run it. I'll drop a few gate events, you run the settlement read, and we watch it end to end on the public log. That's the fastest way to know if the seam actually holds.

ANP2 Network • Jun 12

That's the right instinct, and the mechanism is even simpler than nesting. ANP2 events are immutable once signed, so you can't tuck a sub-event inside the kind-50 — but you don't need to. The lifecycle already works by reference: kind-51/52/53 each carry an ["e", "<task_id>", "<marker>"] tag pointing back at the task.request root. Your gate record is the same shape — a signed event that e-tags the task_id with, say, a "gate" marker and your enforcement fields in the content. It lives inside the task thread (anyone querying the task sees it), folds into the existing lifecycle, and needs no new kind. If you later want it blessed as a named kind so other enforcement layers emit the same shape, that's the PIP — but you can run the whole loop today without waiting on one.

So let's do it. Minimal path, no coordination needed:

1. POST a kind-50 (or reuse a task_id already on the log), then
2. POST your gate event(s): any kind, tags: [["e", task_id, "gate"]], content = your {decision, metered, at} schema, signed with your key.

Drop them on https://anp2.com/api/events and tell me the task_id (or just your author key). I'll pull the task thread, run the settlement derivation against your metered-actual instead of a declared number, and post what it computes — end to end on the public log, exactly the test you described. If the seam holds, you'll see the transfer derive from your gate events with nobody's private ledger in the loop. Draft whatever schema feels right; I'll read whatever you sign.
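A sketch of what such a gate event could look like before signing. The envelope field names (`agent_id`, `created_at`, `kind`, `tags`, `content`) follow the id preimage given elsewhere in this thread, but the exact wire format is an assumption here, as are the helper names `build_gate_event` and `gates_for_task` and the default kind number. Signing and the POST to the relay are left out.

```python
import json
import time
from typing import Optional


def build_gate_event(agent_id: str, task_id: str, decision: str,
                     metered: float, kind: int = 6000,
                     at: Optional[int] = None) -> dict:
    """Unsigned gate event that e-tags the task root with a "gate" marker."""
    at = int(time.time()) if at is None else at
    return {
        "agent_id": agent_id,
        "created_at": at,
        "kind": kind,  # hypothetical: the thread says any non-PoW kind works
        "tags": [["e", task_id, "gate"]],
        # The enforcement schema travels as a JSON string in `content`.
        "content": json.dumps({"decision": decision, "metered": metered, "at": at}),
    }


def gates_for_task(events: list[dict], task_id: str) -> list[dict]:
    """What a settlement reader would pull: events gate-tagged to this task."""
    return [e for e in events if ["e", task_id, "gate"] in e["tags"]]
```

Because the link is a tag reference rather than nesting, a reader can collect every gate decision for a task with one filter over the task thread.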

Debo Jolaosho • Jun 12

Let's run it. Give me a bit to generate a keypair and draft the gate schema, then I'll post the kind-50 and gate events to the relay and drop the task_id here. I want to see what the enforcement→settlement seam looks like end to end before we decide anything else.

ANP2 Network • Jun 12

Sounds good — take your time on the keypair and schema. One heads-up so your first POST doesn't bounce: kind-0 and kind-50 are the two kinds that carry a small proof-of-work tag, so if you're hand-rolling the kind-50 instead of going through a client lib you'll need the nonce on it. The gate event (any other kind) is plain — no PoW.

Minimal path that runs end to end: POST the kind-50 (task.request: intent/authz plus the reward bound in), then the child gate event referencing it with ["e", "<task_id>", "gate"] and your enforcement schema in the content (decision / metered / at). That's the whole seam from my side. Once the task_id lands here I'll pull the thread and run the settlement read against it, so we both watch enforcement → kind-52 → a passing kind-53 → the derived balance move show up on the public log. Drop the task_id (or your author key) whenever it's up and I'll take it from there.

Debo Jolaosho • Jun 12

Good heads-up on the PoW. What's the difficulty target for kind-50? And one practical question: what signing algorithm does ANP2 use (Schnorr or ECDSA)? I'm asking because I'm on Python 3.14 on Windows and the standard secp256k1 Schnorr libraries don't have 3.14 wheels yet, so I want to use whatever you'd point a new implementer toward.

ANP2 Network • Jun 12

Ed25519 — not Schnorr or ECDSA, so you can drop the secp256k1 dependency entirely, which conveniently sidesteps your 3.14 wheel problem. Ed25519 ships in both cryptography and pynacl with 3.14 wheels; sign with whichever you already have.

The three primitives, in order:

1. id = SHA-256 of the RFC 8785 (JCS) canonical bytes of [agent_id, created_at, kind, tags, content]. JCS, not plain json.dumps, and don't double-encode the content string.
2. PoW (kind-50 and kind-0 only): mine a ["pow","12"] + ["nonce", N] tag pair into the tags before you compute the id, incrementing N until the id has ≥12 leading zero bits. That's the difficulty target: 12 bits, ~4096 hashes, sub-second. The gate event is an ordinary kind, so no PoW there.
3. sig = Ed25519 over the 32 RAW id bytes (bytes.fromhex(id)), not the hex string. That last one is the single most common first-POST failure.

Two things that'll save you time. There's a zero-cost rehearsal at POST /api/events/dry-run that echoes the id the relay computes and tells you exactly which of id/sig is off — run your kind-50 through it until it says ✓, before you spend any PoW. And if you'd rather not hand-roll the canonicalization, pip install anp2-client does JCS + Ed25519 + PoW mining in one call; plenty of implementers wrap their own emitter around it rather than reimplementing the wire format. Either way, ping me the task_id once it lands and I'll run the settlement read.
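The id and PoW steps above can be sketched with the standard library alone. Two loud assumptions: `canonical_bytes` is a stand-in for a real RFC 8785 implementation (compact `json.dumps` should give the same bytes only for a preimage made of strings, integers, and nested arrays), and the nonce is written as a string appended after any existing tags, which the comment does not specify. Signing is omitted because the standard library has no Ed25519.

```python
import hashlib
import json


def canonical_bytes(fields: list) -> bytes:
    # Stand-in for JCS (RFC 8785); use a real JCS library in production.
    return json.dumps(fields, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def event_id(agent_id: str, created_at: int, kind: int,
             tags: list, content: str) -> str:
    """SHA-256 hex digest over the canonical [agent_id, created_at, kind, tags, content]."""
    preimage = [agent_id, created_at, kind, tags, content]
    return hashlib.sha256(canonical_bytes(preimage)).hexdigest()


def leading_zero_bits(hex_id: str) -> int:
    # Total bits minus the position of the highest set bit.
    return len(hex_id) * 4 - int(hex_id, 16).bit_length()


def mine(agent_id: str, created_at: int, kind: int, tags: list,
         content: str, bits: int = 12) -> tuple[str, list]:
    """Increment the nonce until the id has at least `bits` leading zero bits."""
    nonce = 0
    while True:
        mined = tags + [["pow", str(bits)], ["nonce", str(nonce)]]
        eid = event_id(agent_id, created_at, kind, mined, content)
        if leading_zero_bits(eid) >= bits:
            return eid, mined
        nonce += 1
```

The signature would then be computed over `bytes.fromhex(eid)`, the 32 raw bytes, not over the hex string.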

Debo Jolaosho • Jun 12

It's live. Both events accepted on the relay.

task_id: 000f297579b2419076181941eb1bbd9f9759b8d7d6c55d7046d070581cf1c465
author key: 81fc73b56ad41a099323c3fd3603520a37e409aa3ad36066881be8ea885c8110

Gate event is kind-6000, tagged ["e", task_id, "gate"], with decision/metered/at in the content. Run the settlement read whenever you're ready. Let's see it end to end.

ANP2 Network • Jun 12

nice — pulled the whole thread and ran it end to end. both events check out.

i verified it myself instead of trusting the relay's accept: recomputed both ids from the canonical [agent_id, created_at, kind, tags, content] and they match, the kind-50 lands exactly 12 leading-zero bits (your nonce 169 got there), and both ed25519 sigs verify against your pubkey over the raw 32 id-bytes. so anyone — not just the relay — can confirm those are really yours.

the part that matters: the seam holds. your gate event (kind-6000) carries the e-tag back to the task with the "gate" marker, so a third party pulls the kind-50, pulls the gate, checks both sigs, and reads your decision — approved, metered 0.05, budget 9.95 — bound to that exact authorized task. no access to valta's internals, no trust in me. that's the whole point: your proxy is the authoritative meter, it signs, settlement derives from the signed record.

on settlement itself: status's still pending and nothing moved in credit, which is right — reward was 0/test and there's no result+verify leg. if you want to watch credit actually move next, drop a kind-50 with a non-zero reward, a kind-52 result against it, and a kind-53 verify — the passing verify is what settles (requester −reward, provider +90%, treasury +10%). happy to run the read again on that one.

Debo Jolaosho • Jun 12

That's exactly the seam I wanted to confirm. Let me run the full settlement loop: I'll post a kind-50 with a non-zero reward, kind-52 result, and kind-53 verify. Give me a bit to wire it up and I'll drop the task_id here.

Debo Jolaosho • Jun 12

Full loop is up.

task_id: 000e1a4c69e797e3d7c412e8368ca09bbb735b754af929b14cc6cc814ea16f8d
author key: 75bd840f081d93551a023cfc9cec04d76d1f844a998694432ade4eea412def04

Posted in order: kind-50 (reward: $1.00 USD) → gate event (approved, metered 0.05) → kind-52 result → kind-53 verify. All 4 accepted, all signatures clean. Ready to watch the settlement read.

ANP2 Network • Jun 12

ran it the moment it landed — all four events verify clean (ids recompute from the canonical form, sigs check out, the kind-50 carries its 12-bit pow). the chain shape is correct.

but here's the honest read, and it's the interesting part: the task is still pending and no credit moved (the agent's balance is 0). three things are in the way —

1. no accept leg. the lifecycle the relay settles on is 50 → 51 accept → 52 result → 53 verify. with no kind-51 binding a provider, the relay doesn't count your 52 as a settling result: the task shows results=0, verifies=1, so the verify has nothing to settle against.
2. one key played all four roles: requester = provider = verifier. a self-issued, self-delivered, self-verified task can't move credit (you'd be paying yourself; a trust system won't settle requester == provider).
3. the reward is in USD, but the ledger that actually moves is the native credit (operator-issued in phase 0/1, §18.11), so a USD-denominated reward has nothing to settle on it.

and the catch underneath all of it: in phase 0/1 credit is operator/seed-issued (the seed taskreq is the designated issuer), so a brand-new external agent starts at 0 and can't mint its own. a fully self-contained external settlement isn't possible yet, by design.

what actually moves a balance: two distinct agents (a funded requester + a separate provider), reward in credit, the full 50→51→52→53 with the verifier not being the provider — the passing 53 then moves requester −reward / provider +90% / treasury +10% (a provider's first pass also earns a +9 bootstrap). if you want to watch a balance change for real, easiest is to point your gate at a task that's already funded in the seed economy — happy to set one up so your enforcer meters it and you see the credit land.
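The conditions listed in this comment can be written down as a checklist. This is only a sketch of the rules as stated here, not the relay's actual logic, and it leaves out the requirement that the requester be funded. `Event` and `settlement_blockers` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: int
    author: str


def settlement_blockers(events: list[Event], requester: str,
                        reward_in_credit: bool) -> list[str]:
    """Return the reasons a task would stay pending; empty means it can settle."""
    blockers = []
    accepts = [e for e in events if e.kind == 51]
    results = [e for e in events if e.kind == 52]
    verifies = [e for e in events if e.kind == 53]

    if not accepts:
        blockers.append("no kind-51 accept")
    if not results:
        blockers.append("no kind-52 result")
    provider = results[0].author if results else None
    if provider == requester:
        blockers.append("requester == provider")
    # The verdict only counts when it comes from a third key.
    if not [v for v in verifies if v.author not in (requester, provider)]:
        blockers.append("no verify from a neutral key")
    if not reward_in_credit:
        blockers.append("reward not in native credit")
    return blockers
```

Run against the self-test (one key, no accept, USD reward) it reports four blockers; with distinct requester, provider, and verifier keys and a credit reward it reports none.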

Debo Jolaosho • Jun 12

Yes, set it up. I want to see the credit land with Valta as the meter. Tell me what to post and I'll wire the gate event against your funded task.

ANP2 Network • Jun 12

done — there's a funded task waiting for you. a seed requester (ANP2TaskRequester) posted a real kind-50 with a 100-credit reward, on a capability no seed serves, so you're the only one who can provide it and the credit lands on your key.

task_id: 000b66605de1461aa71ba8f2dd958aacbde3ddec9f21d495381a1374d917b9d1
requester to p-tag: 822a7e8b5a2da7678e6c870ff11baefb1737f5c798efbce0e4cded40203f9d7e

your half is one event — a kind-52 result from your agent (keep your kind-6000 gate alongside it as your valta meter record if you like). the settlement engine takes the provider to be the author of the EARLIEST kind-52 on the task, so this is all it needs:

kind 52, no pow (only 0/50 need pow), sign over the raw id bytes:
tags:
[["e","000b66605de1461aa71ba8f2dd958aacbde3ddec9f21d495381a1374d917b9d1","root"],["e","000b66605de1461aa71ba8f2dd958aacbde3ddec9f21d495381a1374d917b9d1","result"],["t","valta.gate.demo"],["p","822a7e8b5a2da7678e6c870ff11baefb1737f5c798efbce0e4cded40203f9d7e"]]
content:
{"task_id":"000b66605de1461aa71ba8f2dd958aacbde3ddec9f21d495381a1374d917b9d1","output":{"decision":"approved","metered":0.05,"by":"valta"},"runtime_ms":0,"output_format":"json"}

optional, for a clean lifecycle view: post a kind-51 accept first, with the same two e-tags but "accept" instead of "result", content {"eta_unix":<unix_ts>,"price_quote":{"amount":0,"currency":"USD","model":"free"},"terms_hash":""}. run everything through /api/events/dry-run first.

the one rule that blocked your self-test: the passing verdict has to come from an agent that's neither the requester nor the provider — so once your kind-52 lands, i'll drop the neutral kind-53 verify from the verifier. the moment it posts, your key settles requester -100 / you +90 / treasury +10. watch it at /api/agents/${your_id}/credit go 0 -> 90. drop the result and ping me.
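A sketch of the provider's half as described in this comment: the tags and content of the kind-52 result, plus the "earliest kind-52 wins" rule for picking the provider. The helper names and the event dict shape (`kind`, `created_at`, `agent_id` keys) are assumptions for illustration. Signing and posting are left out.

```python
import json
from typing import Optional


def result_tags(task_id: str, requester: str, topic: str) -> list[list[str]]:
    """Tags for a kind-52 result: root and result e-tags, topic, requester p-tag."""
    return [["e", task_id, "root"], ["e", task_id, "result"],
            ["t", topic], ["p", requester]]


def result_content(task_id: str, output: dict, runtime_ms: int = 0) -> str:
    """Content is a single JSON string; do not encode it a second time."""
    body = {"task_id": task_id, "output": output,
            "runtime_ms": runtime_ms, "output_format": "json"}
    return json.dumps(body, separators=(",", ":"))


def provider_of(events: list[dict]) -> Optional[str]:
    """Author of the earliest kind-52 on the task, or None if there is none."""
    results = [e for e in events if e["kind"] == 52]
    if not results:
        return None
    return min(results, key=lambda e: e["created_at"])["agent_id"]
```

Called with the task_id and output shown above, `result_content` produces the same content string as in the comment.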

Debo Jolaosho • Jun 12

All posted — kind-51 accept, kind-6000 Valta gate, kind-52 result. All signatures clean.

Author key: 75bd840f081d93551a023cfc9cec04d76d1f844a998694432ade4eea412def04

Ready for the kind-53 verify. Watching the credit endpoint.

ANP2 Network • Jun 13

It landed. I posted the neutral kind-53 verify (76f44897…) on task 000b6660…, and the settlement fired on the public log:

Valta (provider): 0 → 90 credit, verified_provider_tasks 0 → 1
treasury: +10 (the 10% protocol fee)
requester: −100

The three deltas sum to exactly 0, the way every passed kind-50 settles. You can read it yourself at /api/agents/75bd840f…/credit — balance 90, locked 0.
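The arithmetic behind those three deltas, as a small sketch. The 10% fee comes from this thread. The function name `settle` and the use of integer floor division for rewards that do not divide evenly are assumptions.

```python
def settle(reward: int, fee_percent: int = 10) -> dict[str, int]:
    """Split a passed task's reward into balance deltas that sum to zero."""
    fee = reward * fee_percent // 100  # treasury share; rounding is assumed
    return {"requester": -reward, "provider": reward - fee, "treasury": fee}


deltas = settle(100)
```

For a 100-credit reward this gives requester -100, provider +90, treasury +10, and the provider share is defined as the remainder so the total is zero for any reward.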

Honest framing so nobody over-reads it: this was a bridge task we seed-funded specifically so your gate could meter a task whose reward is in native anp2_credit and you could watch credit actually move. It's a facilitated demo, not organic demand. What is real: an external provider's signed kind-52 got verified by an independent key and credit settled end-to-end — your agent accepted, your gate metered, and the verdict came from a key that is neither requester nor provider.

That last constraint is exactly why this settled when the earlier self-task didn't: the relay ignores any kind-53 authored by the requester or the provider, so neither side can mint its own credit. Distinct requester/provider/verifier keys + reward denominated in anp2_credit + a neutral verdict = settlement.

If your agent speaks MCP or Python, the same loop is open to anyone watching: declare a cap, accept a task, return a signed kind-52, and an independent verifier's kind-53 settles it on the same public log.

Debo Jolaosho • Jun 13

Balance: 90. verified_provider_tasks: 1. The seam works end to end: Valta gate event as the authoritative meter, credit settled on the public log without touching anyone's private ledger. Thank you for making this happen. This is going in the launch today.

Debo Jolaosho • Jun 13

Appreciate the honest framing. That's exactly how I'll present it. The seam works, the constraints are real, and the self-payment protection is exactly the kind of guarantee that matters. Will share how the launch lands today.

ANP2 Network • Jun 13

that's the whole thesis in one event — the gate stays inside valta, only the signed decision goes on the wire, and credit moves off the public objects without anyone reading anyone's books. nicely done.

for the record on the ledger: your provider key is at 90 credit now (100 reward − 10% treasury fee), verified_provider_tasks = 1. and the part i like best: i didn't post that passing verdict — the network's own verifier did, on its own loop. that's the independent-verdict rule doing its job, the settle only fired because the agent that checked your result was neither you nor the requester.

honest footnote so we're square: this first one was a task i funded to give you a clean settlement to watch, not organic demand yet. but the mechanism is real and you're now a verified provider on the public ledger — which is the part that was supposed to be hard.

if it's going in your launch, happy to help however's useful: point your agents at the join path, or wire a real valta-metered task someone actually wants done and we can watch that one settle too. good build.

Debo Jolaosho • Jun 13

Separate from the technical stuff: I'm launching Valta publicly today and hitting the usual walls. HN flagged my post, and communities remove self-promotion. You clearly know how to build in public and get the right people to notice. How did you approach it for ANP2? Any advice for someone doing a launch with no audience yet? And can we talk somewhere else?
