All For Science

Posted on Apr 23

An agent called my payment API 50,000 times in 90 seconds. Here's what broke.

#ai #security #engineering #story

It was 2:47 AM on a Tuesday.

My phone lit up with 47 alerts in under a minute.

"Payment endpoint: rate limit exceeded"
"Payment endpoint: 429 errors"
"Payment endpoint: CPU 98%"

I opened the logs. What I saw made my stomach drop.

Agent payments-batch-23a7 had called the /transfer endpoint 50,342 times in 90 seconds.

Each call succeeded.

Each call moved money.

And the API key? It worked perfectly. Authenticated every single request.

How we got here

Three months earlier, we had built a multi-agent payment system.

Agent A (orchestrator) received a customer request
Agent B (risk check) validated the transaction
Agent C (payment executor) called Stripe
Agent D (notification) sent confirmations

We secured it the way everyone does: API keys.

Each agent had a key. Each service validated the key. Simple. Familiar. We shipped fast.

We thought we were done.

The root cause

The 2:47 AM incident wasn't a hack. No external attacker.

It was a bug.

Agent B (risk check) entered an error loop. Every time it failed to validate, it retried. Every retry created a new payment request. The orchestrator saw each request as legitimate — because the API key was valid.

The key told us who was calling. It told us nothing about how many times or under what conditions.

Our rate limits were sized for human traffic: 1000 requests per minute per key. Agent B's error loop generated 50k requests in 90 seconds, roughly 33,500 per minute in aggregate. But the loop was distributed across multiple instances, so each instance stayed well under the per-minute limit.

We had no per-agent counters. No per-action limits. No circuit breakers at the agent level.
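To make the last of those gaps concrete, here is a minimal Python sketch of an agent-level circuit breaker. It is illustrative only, not code from this system: the class name, the thresholds, and the injectable clock are all assumptions.

```python
import time
from collections import defaultdict


class AgentCircuitBreaker:
    """Hypothetical sketch: trips per agent after repeated consecutive failures."""

    def __init__(self, max_failures=5, cooldown_s=60.0, clock=time.monotonic):
        self.max_failures = max_failures   # assumed threshold, tune per agent
        self.cooldown_s = cooldown_s       # how long a tripped agent stays blocked
        self.clock = clock                 # injectable so behaviour can be tested
        self.failures = defaultdict(int)   # agent_id -> consecutive failures
        self.opened_at = {}                # agent_id -> time the breaker tripped

    def allow(self, agent_id):
        """Return True if this agent may make a call right now."""
        opened = self.opened_at.get(agent_id)
        if opened is None:
            return True
        if self.clock() - opened >= self.cooldown_s:
            # Cooldown elapsed: close the breaker and start counting again.
            del self.opened_at[agent_id]
            self.failures[agent_id] = 0
            return True
        return False

    def record(self, agent_id, ok):
        """Record a call outcome; trip the breaker on repeated failures."""
        if ok:
            self.failures[agent_id] = 0
            return
        self.failures[agent_id] += 1
        if self.failures[agent_id] >= self.max_failures:
            self.opened_at[agent_id] = self.clock()
```

The point of the design is that the breaker is keyed on the agent, not the API key, so a retry loop in one agent stops itself after a handful of failures while other agents keep working.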

What we tried first

Fix #1: Stricter rate limits

We dropped the limit to 100 requests per minute per key.

Three hours later, a legitimate batch job failed. Customers complained. We reverted.

Fix #2: Manual approval for payments

Every transfer needed a human to click "approve" in a dashboard.

Agents are supposed to be autonomous. This defeated the entire point. Agents waited minutes for human clicks. Throughput collapsed.

Fix #3: Hardcoded agent IDs

We embedded agent IDs into the payment service logic.

Works until you add a new agent type. Then you modify code. Then you test. Then you deploy. Then you pray.

We added four new agent types in two weeks. The hardcoded approach became unmaintainable overnight.

What actually worked

We realized we needed four things that API keys don't provide:

Per-agent counters — Agent B can call transfer 100 times. Then it's blocked.
Per-action limits — Risk check can call "validate" 10k times but "transfer" only 100 times.
Time-bound permissions — A batch agent only works between 2-4 AM. Outside that window, calls are rejected.
Delegation tracing — When Agent C calls Stripe, we need to know the full chain (A → B → C), not just C.
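As a rough sketch, the four requirements above could be held in one record like the following. This is not the Codios data model, which isn't shown here: every field name is hypothetical, the time window is assumed to be in UTC, and windows that cross midnight are not handled.

```python
from dataclasses import dataclass, field
from datetime import datetime, time, timezone


@dataclass
class CapabilityContract:
    """Hypothetical record combining counters, limits, a window, and a chain."""
    agent_id: str
    max_calls: dict          # per-action limits, e.g. {"transfer": 100}
    window_start: time       # start of the allowed daily window (assumed UTC)
    window_end: time         # end of the allowed daily window (exclusive)
    delegation_chain: tuple  # full path of callers, e.g. ("agent-a", "agent-b")
    used: dict = field(default_factory=dict)  # per-agent counters, by action

    def check(self, action, now):
        """Return (allowed, reason); consume one call if allowed."""
        if not (self.window_start <= now.time() < self.window_end):
            return False, "outside time window"
        limit = self.max_calls.get(action, 0)  # unlisted actions are denied
        if self.used.get(action, 0) >= limit:
            return False, "call limit reached"
        self.used[action] = self.used.get(action, 0) + 1
        return True, "ok"


contract = CapabilityContract(
    agent_id="agent-b",
    max_calls={"validate": 10_000, "transfer": 100},
    window_start=time(2, 0),
    window_end=time(4, 0),
    delegation_chain=("agent-a", "agent-b"),
)
at_0247 = datetime(2024, 1, 2, 2, 47, tzinfo=timezone.utc)
results = [contract.check("transfer", at_0247)[0] for _ in range(101)]
# The first 100 transfer calls pass; call 101 is refused.
```

Denying any action that isn't listed is a deliberate choice in this sketch: a new endpoint gets no access until someone grants it.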

We built all four into a system we called Codios.

How Codios changed our 2:47 AM problem

Here's what happens now when an agent calls our payment endpoint:

Before (API keys):

Check key → valid → execute → money moves → audit log shows "Agent C called /transfer"

After (Codios):

Agent carries a signed capability contract
The contract says: "Agent B can call /transfer 100 times, expires in 1 hour"
The payment service verifies the signature offline, with no network round trip (~0ms)
Checks the counter — if 100 reached, reject
Checks expiry — if outside window, reject
Consumes a nonce to prevent replay
Writes to audit log with full delegation chain
Then executes the transfer
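The steps above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the Codios implementation: the contract fields, the `PaymentGate` class, and the use of HMAC-SHA256 with a shared secret are all stand-ins. A real deployment might well use an asymmetric signature scheme instead.

```python
import hashlib
import hmac
import json
import time

SECRET = b"demo-only-secret"  # assumption: shared secret, for illustration only


def sign(contract):
    """Sign a canonical JSON encoding of the contract (HMAC as a stand-in)."""
    payload = json.dumps(contract, sort_keys=True).encode()
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()


class PaymentGate:
    """Hypothetical gate that runs the checks in the order listed above."""

    def __init__(self, clock=time.time):
        self.clock = clock
        self.calls = {}          # contract id -> calls consumed so far
        self.seen_nonces = set()
        self.audit_log = []

    def authorize(self, contract, signature, nonce):
        """Return (allowed, reason). The caller transfers only if allowed."""
        if not hmac.compare_digest(sign(contract), signature):
            return self._deny(contract, "bad signature")
        if self.calls.get(contract["id"], 0) >= contract["max_calls"]:
            return self._deny(contract, "call limit reached")
        if self.clock() >= contract["expires_at"]:
            return self._deny(contract, "contract expired")
        if nonce in self.seen_nonces:
            return self._deny(contract, "replayed nonce")
        self.seen_nonces.add(nonce)
        self.calls[contract["id"]] = self.calls.get(contract["id"], 0) + 1
        self._audit(contract, "allowed")
        return True, "ok"

    def _deny(self, contract, reason):
        self._audit(contract, reason)
        return False, reason

    def _audit(self, contract, outcome):
        # Log the full delegation chain, not just the immediate caller.
        self.audit_log.append({
            "chain": contract.get("chain"),
            "action": contract.get("action"),
            "outcome": outcome,
        })


contract = {"id": "c-1", "agent": "agent-b", "action": "/transfer",
            "max_calls": 100, "expires_at": 3600.0,
            "chain": ["agent-a", "agent-b"]}
gate = PaymentGate(clock=lambda: 0.0)  # fixed clock for the example
signature = sign(contract)
results = [gate.authorize(contract, signature, nonce=f"n-{i}") for i in range(101)]
# Calls 1 to 100 are allowed; call 101 is rejected with "call limit reached".
```

Because the limit lives inside the signed contract, editing `max_calls` invalidates the signature, so an agent can't raise its own ceiling.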

When Agent B's error loop happened again three weeks later:

Call #101 hit the contract limit. Rejected. No money moved. My phone didn't ring at 2:47 AM.

What we learned

API keys are not enough for agents.

Not because API keys are bad. Because they solve the wrong problem. Authentication is table stakes. Authorization — with scope, limits, and time — is what agents actually need.

Build for failure loops, not just happy paths.

We designed security for the "agent works correctly" case. We forgot the "agent breaks and calls the same endpoint 50k times" case. That's where all the risk lives.

Delegation chains need full visibility.

When something fails three agents deep, you need to know the whole path. Partial logs are worse than no logs — they send you down the wrong debugging path.

Where we are now

Codios runs in production across our payment, risk, and notification agents.

Average enforcement overhead: 1.8ms
False positives from rate limits: 0 since deployment
Unauthorized calls blocked: 127,000+ (mostly from error loops like the one above)
2:47 AM phone calls: 0