Private Agent Rate Limits
Draft status: unpublished DEV API draft for rendered QA only. This is not public publication approval, queue approval, or permission to publish.
Disclosure: AI tools helped with source collection, outline pressure-testing, and editorial review, but the article text and publication decision remain under human control.
Crypto disclaimer: this article discusses privacy, rate limiting, and API metering as technical infrastructure. It is not investment advice, a token recommendation, a trading signal, or a claim that any protocol here is ready for production use.
First request
The privacy question starts the moment a model request reaches an API without a durable account label attached. Private agent rate limits try to let that request prove it sits inside an allowed quota class, without carrying the usual account, card, wallet, or login identifier on every call. The current IETF Privacy Pass draft Anonymous Rate-Limited Credentials Cryptography, which defines ARC, gives the shape worth borrowing: a credential can be presented a fixed number of times, and those presentations are meant to stay unlinkable from issuance and from each other.
None of that makes the caller an anonymous AI user. Unless the product reworks those surfaces too, the provider still sees the work it was asked to do, when the call landed, how large the output was, how refunds behave, and what policy evidence got attached. One account edge comes off the request. The rest of the service stays exactly where it was.
Draft status matters here. The Privacy Pass working-group document list shows ARC as active draft work, and the cryptography document is an Internet-Draft, not a finished standard. You can learn from the boundary it draws without pretending the draft is settled infrastructure.
Issuer
Before a request can prove anything, an issuer has to hand the client a credential to use later. In ARC the server issues a credential tied to client secrets and public application information, and the client later returns with proofs derived from it. The server checks a presentation without learning which issuance flow it came from.
The trap is treating unlinkability as a blank check. ARC fixes a maximum number of presentations per credential, and going past that agreed limit breaks the guarantee. Privacy here rides on presentation limits, server state, and application configuration. It does not ride on the word credential.
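To make the presentation limit concrete, here is a toy model of the counting only. It assumes, as a simplification, that each presentation carries a nonce in `range(presentation_limit)` and a tag that depends only on the credential secret, a context label, and that nonce, so presenting the same nonce twice reproduces the same tag. HMAC-SHA256 stands in for the real ARC cryptography, and `ToyCredential`, `ToyOrigin`, `present`, and `verify` are hypothetical names. The toy sends the nonce in the clear for its range check and skips the proof that the tag came from a validly issued credential, so it models neither validity nor unlinkability; do not read it as how the draft works.

```python
import hashlib
import hmac
import secrets


class ToyCredential:
    """Client-side stand-in for an issued credential. No real ARC cryptography."""

    def __init__(self, presentation_limit: int) -> None:
        self.secret = secrets.token_bytes(32)  # hypothetical client secret
        self.presentation_limit = presentation_limit

    def present(self, nonce: int, context: bytes) -> tuple[int, bytes]:
        if not 0 <= nonce < self.presentation_limit:
            raise ValueError("nonce outside the agreed presentation limit")
        # The tag depends only on (secret, context, nonce), so presenting the
        # same nonce twice in one context yields the same tag. Real ARC proves
        # this relation in zero knowledge; this toy simply trusts the client.
        msg = context + b"|" + nonce.to_bytes(4, "big")
        tag = hmac.new(self.secret, msg, hashlib.sha256).digest()
        return nonce, tag


class ToyOrigin:
    """Server-side counting only: a nonce range check plus a seen-tag set."""

    def __init__(self, presentation_limit: int) -> None:
        self.presentation_limit = presentation_limit
        self.seen_tags: set[bytes] = set()

    def verify(self, nonce: int, tag: bytes) -> bool:
        # Toy simplification: the nonce arrives in the clear for this check.
        if not 0 <= nonce < self.presentation_limit:
            return False
        if tag in self.seen_tags:
            return False  # repeated tag: a presentation beyond the limit
        self.seen_tags.add(tag)
        return True


cred = ToyCredential(presentation_limit=3)
origin = ToyOrigin(presentation_limit=3)
context = b"example-origin"  # hypothetical presentation context
first_three = [origin.verify(*cred.present(n, context)) for n in range(3)]
replayed = origin.verify(*cred.present(0, context))
print(first_three, replayed)  # [True, True, True] False
```

The fourth call is rejected because it can only reuse a nonce, which is exactly the over-limit presentation the draft warns breaks the guarantee.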
Cloudflare's engineering write-up on rate-limiting bots and agents with anonymous credentials is worth reading for deployment pressure, since it gets into state, binding, revocation, origin tradeoffs, and the bot-or-agent rate-limiting case. Keep in mind whose surface that is. One vendor describing its own engineering does not, on its own, make any single credential family the ecosystem default.
Meter
ZK API Usage Credits push the same problem into funded API metering. The Ethereum Research proposal ZK API Usage Credits: LLMs and Beyond leans on LLM inference as its motivating case: deposit once, then make many API calls while trying not to tie each one back to the deposit or to the other calls. The credit there proves the work is covered. It is not civil identity, and the proposal is not an adopted Ethereum protocol.
The proposal's structure makes that boundary easier to audit. The initial deposit, ticket index, refund tickets, proof, nullifier, and request payload are each a separate piece of the flow. Membership, refund accounting, solvency, and RLN-style nullifier data can all carry the metering. None of those fields determines whether prompt content, timing, output sizes, refund values, or provider logs stay private.
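The metering arithmetic can be modeled without any cryptography. The sketch below is a plain accounting model under stated assumptions: each call reserves a fixed maximum cost, the provider later refunds the unused part, and the client may use ticket index `i` only if `(i + 1) * max_cost` fits inside the deposit plus refunds received so far. That solvency condition is a simplification written for illustration; the proposal enforces solvency inside a zero-knowledge proof, and `CreditLedger`, `spend`, and `refund` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class CreditLedger:
    """Toy client-side view of a funded credit, in integer credit units."""

    deposit: int
    max_cost: int  # hypothetical per-call ceiling reserved up front
    refunds: list[int] = field(default_factory=list)
    next_index: int = 0

    def covered(self, index: int) -> bool:
        # Assumed solvency check: every call up to and including `index`,
        # each reserved at max_cost, must fit inside deposit + refunds.
        return (index + 1) * self.max_cost <= self.deposit + sum(self.refunds)

    def spend(self) -> int:
        index = self.next_index
        if not self.covered(index):
            raise RuntimeError("credit exhausted: no covered ticket index left")
        self.next_index += 1
        return index

    def refund(self, actual_cost: int) -> None:
        # The provider returns the unused part of one reservation. This value
        # is itself a possible side channel.
        if not 0 <= actual_cost <= self.max_cost:
            raise ValueError("actual cost must be within the reserved ceiling")
        self.refunds.append(self.max_cost - actual_cost)


ledger = CreditLedger(deposit=1000, max_cost=400)
ledger.spend()  # ticket index 0
ledger.spend()  # ticket index 1
print(ledger.covered(2))  # False: 3 * 400 > 1000
ledger.refund(actual_cost=100)  # call 0 used 100, so 300 comes back
print(ledger.spend())  # 2: now 3 * 400 <= 1000 + 300
```

Nothing in this ledger touches the request payload, which is the point of the paragraph above: solvency and prompt privacy are separate questions.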
Payment privacy and prompt privacy split right there. The metering proof can hide the link back to a depositor while the prompt sits in plain view of the inference provider. Put workplace details, location clues, private intent, or wallet context into that prompt, and no amount of quota evidence scrubs the leak.
Reuse
Abuse control needs a handle for repeat use, but that handle does not have to be the user's name. The RLN documentation describes Rate-Limiting Nullifier as a zero-knowledge protocol for spam prevention in anonymous environments. The claim that buys you is narrow: a nullifier can cap repeated use without ever becoming a real-world identity.
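The RLN idea can be shown with the arithmetic alone. In the commonly described construction, a member with secret `a0` derives a per-epoch slope `a1 = H(a0, epoch)`, publishes one point `(x, y)` on the line `y = a0 + a1 * x` for each message, and attaches the nullifier `H(a1)`. One message per epoch reveals one point and nothing useful about `a0`; two different messages in the same epoch share a nullifier and reveal two points, which is enough to solve for `a0`. This sketch drops the zero-knowledge membership proof entirely, uses length-prefixed SHA-256 reduced modulo a prime as a stand-in for the hash real circuits use, and its function names are hypothetical.

```python
import hashlib
from typing import Optional

P = 2**127 - 1  # Mersenne prime used as a toy field modulus


def h(*parts: bytes) -> int:
    """SHA-256 reduced mod P; a stand-in for a circuit-friendly hash."""
    hasher = hashlib.sha256()
    for part in parts:
        hasher.update(len(part).to_bytes(4, "big") + part)  # length-prefixed
    return int.from_bytes(hasher.digest(), "big") % P


def signal(a0: int, epoch: bytes, message: bytes) -> tuple[int, int, int]:
    """Return (x, y, nullifier) for one message in one epoch."""
    a1 = h(a0.to_bytes(16, "big"), epoch)  # per-epoch slope
    x = h(message)                          # message-dependent point
    y = (a0 + a1 * x) % P                   # one share of the secret a0
    nullifier = h(a1.to_bytes(16, "big"))   # identical for every message this epoch
    return x, y, nullifier


def recover_secret(p1: tuple[int, int], p2: tuple[int, int]) -> int:
    """Two points on y = a0 + a1*x determine the intercept a0."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    return (y1 - slope * x1) % P


def observe(seen: dict, x: int, y: int, nullifier: int) -> Optional[int]:
    """Server view: returns the recovered secret on a second signal per epoch."""
    first = seen.get(nullifier)
    if first is None:
        seen[nullifier] = (x, y)
        return None
    if first[0] == x:
        return None  # same message resent; no new point
    return recover_secret(first, (x, y))


seen: dict = {}
secret = 123456789  # hypothetical member secret, must be below P
epoch = b"epoch-42"
print(observe(seen, *signal(secret, epoch, b"hello")))  # None
print(observe(seen, *signal(secret, epoch, b"again")))  # 123456789
```

The nullifier caps use per epoch without naming anyone; what over-use exposes is the protocol secret, not a real-world identity.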
ARC and RLN should stay apart in your head. ARC is an anonymous credential family built around fixed-limit presentations. RLN is a nullifier-based spam-prevention primitive, a different gadget entirely. Fuse them into one invented protocol and you both overstate the evidence and make the design harder to audit.
The failure event is narrow too. In the ZK API usage proposal, server-side checks around ticket indices and nullifiers can catch double-spend patterns. What that catches is a protocol event around reused capacity. It does not read intent, handle moderation, or settle legal liability.
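The server-side check is deliberately small, and its output can be kept just as small. A minimal sketch, assuming the nullifier arrives as an opaque byte string that an omitted proof has already bound to a ticket index: the verifier records spent nullifiers and returns a `ReuseEvent` that states only the protocol fact. `Outcome`, `ReuseEvent`, and `SpentSet` are hypothetical names; the interesting part is what the record leaves out.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    ACCEPTED = "accepted"
    REUSED = "reused"  # capacity already spent: a protocol event, nothing more


@dataclass(frozen=True)
class ReuseEvent:
    nullifier: bytes
    outcome: Outcome
    # Deliberately absent: user id, prompt text, inferred intent, moderation
    # verdict. Those belong to other layers, not to the reuse check.


class SpentSet:
    """Hypothetical server-side store of nullifiers that have been used."""

    def __init__(self) -> None:
        self._spent: set[bytes] = set()

    def check(self, nullifier: bytes) -> ReuseEvent:
        if nullifier in self._spent:
            return ReuseEvent(nullifier, Outcome.REUSED)
        self._spent.add(nullifier)
        return ReuseEvent(nullifier, Outcome.ACCEPTED)


spent = SpentSet()
print(spent.check(b"nullifier-0").outcome.value)  # accepted
print(spent.check(b"nullifier-0").outcome.value)  # reused
```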
Side channel
The account link can vanish while the behavioral trail stays put. Discussion around the ZK API usage proposal flags refund values, output token counts, time to first token, latency, and cache behavior as surfaces that can relink a caller. So the safe claim stays modest: a credit proof takes off one identity edge, not every statistical one.
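One way to see the relinking risk is to group requests by what the provider can observe and look at the smallest group. The sketch below assumes a hypothetical log of `(output_tokens, refund)` pairs, one per request; any pair that appears only once marks a request the provider can single out, whatever the credential proved. Real analysis would also need timing, cache behavior, and cross-session correlation, which this toy ignores, and the function names are hypothetical.

```python
from collections import Counter


def smallest_anonymity_set(observations: list[tuple[int, int]]) -> int:
    """Size of the smallest group of requests that share one observable tuple."""
    if not observations:
        return 0
    return min(Counter(observations).values())


def singled_out(observations: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Observable tuples that occur exactly once, i.e. requests that stand alone."""
    counts = Counter(observations)
    return [obs for obs in observations if counts[obs] == 1]


# Hypothetical (output_tokens, refund) pairs seen by a provider.
raw = [(412, 37), (398, 51), (412, 37), (1203, 4)]
print(smallest_anonymity_set(raw))  # 1
print(singled_out(raw))  # [(398, 51), (1203, 4)]
```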
Provider telemetry is the larger product problem. A service that watches prompts, response sizes, timing, policy flags, abuse evidence, and refund values can still cluster behavior while the cryptographic proof is perfectly sound. The credential does its job. The system around it keeps fingerprinting the caller anyway.
Pricing and logging choices either respect the proof or quietly undo it. Fixed input and output classes, padding, trimmed refund signals, bounded policy evidence, all of that can pull leakage down. None of it lives inside a bare credential, which is why a nullifier never stands in for privacy architecture.
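Fixed classes and trimmed refund signals are easy to state and easy to get subtly wrong. A minimal sketch, assuming the provider bills output in the next size class up and only refunds in coarse fixed steps; `OUTPUT_CLASSES`, `REFUND_STEP`, and both functions are hypothetical, and the values are illustrative, not taken from any of the cited documents.

```python
import bisect

OUTPUT_CLASSES = [256, 1024, 4096]  # hypothetical billed-size classes, in tokens
REFUND_STEP = 50                     # hypothetical refund granularity, in credit units


def output_class(tokens: int) -> int:
    """Round an output length up to the smallest class that holds it."""
    i = bisect.bisect_left(OUTPUT_CLASSES, tokens)
    if i == len(OUTPUT_CLASSES):
        raise ValueError("output larger than the largest class")
    return OUTPUT_CLASSES[i]


def trimmed_refund(raw_refund: int) -> int:
    """Round a refund down to a coarse step so small differences do not leak."""
    return (raw_refund // REFUND_STEP) * REFUND_STEP


# Hypothetical (output_tokens, refund) pairs before shaping.
raw = [(412, 37), (398, 51), (412, 37), (1203, 4)]
shaped = [(output_class(t), trimmed_refund(r)) for t, r in raw]
print(shaped)  # [(1024, 0), (1024, 50), (1024, 0), (4096, 0)]
```

Two shaped pairs are still unique in this tiny sample, which is the honest lesson: buckets only help when enough traffic lands in each one, and rounding refunds down shifts cost onto the client.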
Ledger
The artifact worth keeping is not a receipt table. For private agent rate limits, a leakage ledger sorts the flow by who owns each piece: the proof, the service, or policy logging.
PROOF-OWNED
credential presentation -> quota class, not civil identity
ticket index or nullifier -> reuse event, not user intent
membership proof -> allowed set, not prompt safety
SERVICE-OWNED
payload -> visible model input unless another privacy layer hides it
timing and output size -> correlation surface
refund signal -> possible behavior link
POLICY-OWNED
abuse note -> provider decision evidence
moderation record -> legal and operational surface
public claim allowed -> account link removed from this request
public claim refused -> anonymous AI user
The ledger swaps the question. Not "is the user anonymous?" but "which component owns this piece of evidence?" That swap is the whole difference between a quota proof and a privacy story. The proof can vouch that a request has capacity. The service and policy layers can still leak enough to relink the behavior behind it.
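The ledger is small enough to keep as data next to the code that writes logs or product copy. A minimal sketch, assuming the three owners and the evidence entries listed in the ledger above; `Owner`, `LEDGER`, `owner_of`, and `residual_leaks` are hypothetical names, and a real team would extend the entries rather than treat this list as complete.

```python
from enum import Enum


class Owner(Enum):
    PROOF = "proof"
    SERVICE = "service"
    POLICY = "policy"


# evidence -> (owner, what it actually shows), mirroring the leakage ledger
LEDGER = {
    "credential presentation": (Owner.PROOF, "quota class, not civil identity"),
    "ticket index or nullifier": (Owner.PROOF, "reuse event, not user intent"),
    "membership proof": (Owner.PROOF, "allowed set, not prompt safety"),
    "payload": (Owner.SERVICE, "visible model input unless another layer hides it"),
    "timing and output size": (Owner.SERVICE, "correlation surface"),
    "refund signal": (Owner.SERVICE, "possible behavior link"),
    "abuse note": (Owner.POLICY, "provider decision evidence"),
    "moderation record": (Owner.POLICY, "legal and operational surface"),
}


def owner_of(evidence: str) -> Owner:
    """Answer the ledger's question: which component owns this evidence?"""
    try:
        return LEDGER[evidence][0]
    except KeyError:
        raise KeyError(f"unclassified evidence: {evidence!r}") from None


def residual_leaks() -> list[str]:
    """Evidence the proof does not own, i.e. what a credential cannot hide."""
    return [name for name, (owner, _) in LEDGER.items() if owner is not Owner.PROOF]


print(owner_of("refund signal").value)  # service
print(residual_leaks())
```

Unclassified evidence raises instead of defaulting to the proof, so every new log field has to be placed deliberately.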
Agent
Agent traffic is what makes the boundary worth the trouble: the calls come fast, run themselves, and sometimes cost real money. The Anonymous Bot Authentication draft lays out a web-agent problem where services want to sort wanted, unwanted, and rate-limited automated traffic without pinning every request to a specific sender. That problem sits close to the API boundary discussed here. The draft is still an individual Internet-Draft, not an endorsed standard.
A credential offers a softer answer than a permanent login. The client can prove membership in an allowed class, remaining quota, or funded capacity, all without showing a durable account identifier on every call. Read that as privacy-preserving abuse control, not as a hall pass around provider rules.
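For an agent calling an API in a loop, the presentation limit from the issuer section turns into a client-side budget. A minimal sketch, assuming the agent holds one credential with a fixed number of presentations and must stop, or obtain a fresh credential, rather than reuse a nonce; `AgentQuota`, `next_nonce`, and the deferral policy are hypothetical, and no real client library is implied.

```python
from typing import Optional


class AgentQuota:
    """Hands out each presentation nonce once and refuses to exceed the limit."""

    def __init__(self, presentation_limit: int) -> None:
        self.presentation_limit = presentation_limit
        self._next = 0  # sequential order is a choice made for this sketch

    def remaining(self) -> int:
        return self.presentation_limit - self._next

    def next_nonce(self) -> Optional[int]:
        # Returning None instead of wrapping around matters: reusing a nonce
        # would be a presentation beyond the agreed limit.
        if self._next >= self.presentation_limit:
            return None
        nonce = self._next
        self._next += 1
        return nonce


quota = AgentQuota(presentation_limit=2)
calls = []
for task in ["summarize", "classify", "translate"]:
    nonce = quota.next_nonce()
    if nonce is None:
        # Hypothetical policy: defer the task until a new credential is issued.
        calls.append((task, "deferred: request a new credential"))
        continue
    calls.append((task, nonce))
print(calls)
```

Stopping at the limit is the abuse-control behavior the credential expects; it is not a way around provider rules.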
The hype version is easy to throw out. Autonomous agents do not automatically need a token rail, and not every AI request belongs on a blockchain. The narrower design earns its keep: a credit, credential, or nullifier can stand as evidence of capacity without ever turning into identity.
Refusal
The strongest product language for private agent rate limits is a refusal. Refuse to call the agent anonymous when all you removed was the account link. And refuse to claim the provider cannot relink requests while prompts, timing, output sizes, refunds, logs, and policy evidence are all still sitting there in view.
Refuse, too, to dress up draft-level or proposal-level work as production infrastructure. ARC is active draft-level work, Anonymous Bot Authentication is an individual Internet-Draft, and ZK API Usage Credits is a proposal and discussion. Their value is that they make the boundary inspectable, not that they settle deployment reality.
The final claim stays deliberately small. Private agent rate limits can let a request show quota or funded capacity without carrying a stable account label. Everything past that belongs to the rest of the architecture: prompt privacy, traffic shaping, refund design, logging discipline, policy evidence, and legal duties.






