snapsynapse
I built an open spec because every bad 429 was costing me twice

I was building an AI agent readiness scanner called Siteline when I noticed something embarrassing: my own rate limiting was making things worse.

An agent would hit a 429 Too Many Requests. It would get back Retry-After: 60. So it would wait 60 seconds and try again. Reasonable. But it had no idea whether a cached result already existed for that domain. It had no idea what the actual limit was before it hit it. It had no way to know why the limit existed -- was this a temporary cooldown, or was it burning through a daily quota?

Every vague refusal generated follow-up traffic. The rate limit meant to protect the service was creating load on the service.

The pattern that kept showing up

I started looking at how other APIs handle this, and the same gap appeared everywhere. Rate limits exist. Communication about rate limits usually doesn't. And when it does it's just kinda...mean? Like there's a lot of "Stop, don't do this!" but no "Hey, here's the right way to do this."

A 429 with Retry-After: 60 tells a retry loop what to do. It doesn't tell an autonomous agent whether to retry, use a cached result, try a different endpoint, or inform the human. It doesn't tell a developer what the limits are before they hit them. It doesn't tell anyone why the limit exists.

When the caller is a person, they shrug and wait. When the caller is an agent, it retries faster, probes more systematically, and lacks the judgment to know when to stop. The waste compounds.

So I wrote a spec.

Graceful Boundaries

GitHub: snapsynapse / graceful-boundaries

A specification for how services communicate operational limits to humans and autonomous agents.

Graceful Boundaries addresses three gaps that existing standards each cover separately but no single standard combines:

🔍 Proactive discovery -- limits are machine-readable before they are hit

📋 Structured refusal -- when a limit is exceeded, the response explains what happened, which limit applies, when to retry, and why

🧭 Constructive guidance -- the refusal includes a useful next step, not just a block

The spec defines four conformance levels, from "you added five fields to your 429s" (Level 1) to "agents can discover your limits, understand your refusals, follow constructive alternatives, and self-throttle on success responses" (Level 4).

What a bad refusal looks like

{
  "error": "Too Many Requests"
}

The caller learns nothing. It retries. You get more traffic.

What a Graceful Boundaries refusal looks like

{
  "error": "rate_limit_exceeded",
  "detail": "You can run up to 10 scans per hour. Try again in 2400 seconds.",
  "limit": "10 scans per IP per hour",
  "retryAfterSeconds": 2400,
  "why": "Siteline is a free service. Rate limits keep it available for everyone and prevent abuse.",
  "alternativeEndpoint": "/api/result?id=example.com"
}

The caller knows the limit. Knows when to retry. Knows why the limit exists (a security signal, not a courtesy). And knows there's a cached result endpoint it can try right now instead of waiting.

Zero follow-up requests generated from that refusal.
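For an agent, those fields map directly to a decision. Here's a minimal client-side sketch; the field names follow the refusal above, but the priority order (cached result first, then wait, then report) is my assumption, not something the spec mandates:

```javascript
// Decide what to do with a Graceful Boundaries refusal body.
// Field names (alternativeEndpoint, retryAfterSeconds, detail) come from
// the example refusal above; the policy itself is this sketch's assumption.
function planNextStep(status, body) {
  if (status === 429) {
    if (body.alternativeEndpoint) {
      // A cached result exists: fetch it now instead of waiting.
      return { action: "fetch", url: body.alternativeEndpoint };
    }
    // No alternative offered: honor the structured retry delay.
    return { action: "wait", seconds: body.retryAfterSeconds ?? 60 };
  }
  // Anything else: surface the human-readable detail to the caller.
  return { action: "report", detail: body.detail ?? body.error };
}
```

With the example body above, the agent skips the 2400-second wait entirely and goes straight to the cached-result endpoint.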

Proactive discovery: let agents plan before they fail

Level 2 adds a discovery endpoint. Any agent can hit /api/limits and get back every enforced limit as structured JSON:

curl -s https://siteline.snapsynapse.com/api/limits | jq '{service, limits: .limits.scan}'
{
  "service": "Siteline",
  "limits": {
    "scan": {
      "endpoint": "/api/scan",
      "method": "GET",
      "limits": [
        {
          "type": "ip-rate",
          "maxRequests": 10,
          "windowSeconds": 3600,
          "description": "10 scans per IP per hour."
        }
      ]
    }
  }
}

An agent that reads this before making any requests can budget its calls. No discovery-through-failure.
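One way to budget (a sketch; the even-spacing policy is my assumption, not part of the spec) is to convert each discovered limit into a minimum interval between requests:

```javascript
// Turn one discovery entry (like limits.scan above) into a pacing interval:
// 10 requests per 3600-second window -> at most one request every 360 s.
// Taking the max across limits respects the strictest one.
function paceSeconds(entry) {
  return entry.limits.reduce(
    (worst, l) => Math.max(worst, l.windowSeconds / l.maxRequests),
    0
  );
}
```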

Self-throttling on success

Level 4 adds proactive headers to successful responses:

RateLimit: limit=10, remaining=9, reset=3540
RateLimit-Policy: 10;w=3600

A caller seeing remaining=1 self-throttles before the next request. A caller seeing remaining=9 knows it has budget and won't add artificial delays. This is the highest-leverage traffic-reduction mechanism in the spec, and it aligns with the IETF draft-ietf-httpapi-ratelimit-headers specification.
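A client-side sketch of that self-throttling. The parser is deliberately simplified and assumes only the three parameters shown above; the IETF draft allows additional parameters:

```javascript
// Parse "limit=10, remaining=9, reset=3540" into numbers.
// Simplified: assumes a flat comma-separated key=value list.
function parseRateLimit(header) {
  const fields = {};
  for (const part of header.split(",")) {
    const [key, value] = part.trim().split("=");
    fields[key] = Number(value);
  }
  return fields;
}

// With budget left, add no artificial delay; on the last request of the
// window, wait out the remainder of the window before calling again.
function throttleSeconds({ remaining, reset }) {
  return remaining > 1 ? 0 : reset;
}
```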

It applies beyond rate limits

One thing that surprised me during development: the pattern works for every class of HTTP response, not just 429s.

A 400 with "why": "Blocks requests to non-public addresses to prevent server-side request forgery" tells the caller the security model. A 404 with "scanAvailable": true and a scanUrl tells the caller it can create the resource instead of giving up. A 503 with retryAfterSeconds and a statusUrl tells the caller when to come back and where to check status.

The spec covers five response classes (Limit, Input, Access, Not Found, Availability) with specific required and optional fields for each.
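For instance, a Not Found response under this pattern might look like the following. This is a hypothetical body: scanAvailable and a scan URL come from the description above, while the other fields are my assumption, following the refusal shape:

```json
{
  "error": "not_found",
  "detail": "No scan result exists for example.com yet.",
  "scanAvailable": true,
  "scanUrl": "/api/scan?domain=example.com"
}
```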

The security model

Transparency and security are in tension. The spec handles this with a simple principle: be transparent about rules, not mechanisms.

"10 requests per hour" is a rule. Safe to disclose. "We use Redis with a sliding window" is an implementation. Not safe. The spec includes eight security considerations (SC-1 through SC-8) covering limit calibration attacks, validation oracles, URL origin restrictions, and more. There's a full security audit in the repo.

Adopting it

Level 1 takes about 20 minutes. Add five fields to your existing 429 responses: error, detail, limit, retryAfterSeconds, and why. The why field is the one that matters most -- it must explain the purpose of the limit, not restate the error.
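As a rough sketch of what Level 1 costs, here's a refusal builder producing those five fields, using Siteline-style values; the helper and its exact wording are mine, not from the spec:

```javascript
// Build a Level 1 conformant 429 body: the five required fields.
// The limit values mirror Siteline's 10-scans-per-hour example.
function rateLimitRefusal(retryAfterSeconds) {
  return {
    error: "rate_limit_exceeded",
    detail: `You can run up to 10 scans per hour. Try again in ${retryAfterSeconds} seconds.`,
    limit: "10 scans per IP per hour",
    retryAfterSeconds,
    // The field that matters most: the purpose, not a restated error.
    why: "Rate limits keep this free service available for everyone."
  };
}
```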

Level 2 adds a discovery endpoint. Level 3 adds constructive guidance to refusals. Level 4 adds proactive headers to successes.

The conformance checker validates any public URL:

node evals/check.js https://your-service.com

104 tests, zero dependencies.

Try it

The full spec is at spec.md. The reference implementation is Siteline, a Level 4 conformant AI agent readiness scanner you can verify yourself.

Licensed CC-BY-4.0. Use it, adapt it, build on it.

How do your APIs currently communicate limits? Is it the structured kind, or the "good luck figuring out what just happened" kind? Drop it in the comments 👇
