I was building an AI agent readiness scanner called Siteline when I noticed something embarrassing: my own rate limiting was making things worse.
An agent would hit a 429 Too Many Requests. It would get back Retry-After: 60. So it would wait 60 seconds and try again. Reasonable. But it had no idea whether a cached result already existed for that domain. It had no idea what the actual limit was before it hit it. It had no way to know why the limit existed -- was this a temporary cooldown, or was it burning through a daily quota?
Every vague refusal generated follow-up traffic. The rate limit meant to protect the service was creating load on the service.
The pattern that kept showing up
I started looking at how other APIs handle this, and the same gap appeared everywhere. Rate limits exist. Communication about rate limits usually doesn't. And when it does it's just kinda...mean? Like there's a lot of "Stop, don't do this!" but no "Hey, here's the right way to do this."
A 429 with Retry-After: 60 tells a retry loop what to do. It doesn't tell an autonomous agent whether to retry, use a cached result, try a different endpoint, or inform the human. It doesn't tell a developer what the limits are before they hit them. It doesn't tell anyone why the limit exists.
When the caller is a person, they shrug and wait. When the caller is an agent, it retries faster, probes more systematically, and lacks the judgment to know when to stop. The waste compounds.
So I wrote a spec.
Graceful Boundaries
A specification for how services communicate operational limits to humans and autonomous agents.
The problem
Every unclear response generates follow-up traffic. A vague 429 causes blind retries. A vague 403 causes re-attempts with different credentials. A generic 500 causes indefinite retries. When autonomous agents are the caller, the waste compounds: agents retry faster, probe more systematically, and lack the human judgment to know when to stop.
Most services enforce rate limits but communicate them poorly. A 429 Too Many Requests with Retry-After: 60 tells a retry loop what to do. It doesn't tell an autonomous agent whether to retry, use a cached result, try a different endpoint, or inform the human. It doesn't tell a developer what the limits are before they hit them. It doesn't tell anyone why the limit exists.
What Graceful Boundaries does
Graceful Boundaries addresses three gaps that existing standards cover separately but nothing combines:
🔍 Proactive discovery -- limits are machine-readable before they are hit
📋 Structured refusal -- when a limit is exceeded, the response explains what happened, which limit applies, when to retry, and why
🧭 Constructive guidance -- the refusal includes a useful next step, not just a block
The spec defines four conformance levels, from "you added five fields to your 429s" (Level 1) to "agents can discover your limits, understand your refusals, follow constructive alternatives, and self-throttle on success responses" (Level 4).
What a bad refusal looks like
{
"error": "Too Many Requests"
}
The caller learns nothing. It retries. You get more traffic.
What a Graceful Boundaries refusal looks like
{
"error": "rate_limit_exceeded",
"detail": "You can run up to 10 scans per hour. Try again in 2400 seconds.",
"limit": "10 scans per IP per hour",
"retryAfterSeconds": 2400,
"why": "Siteline is a free service. Rate limits keep it available for everyone and prevent abuse.",
"alternativeEndpoint": "/api/result?id=example.com"
}
The caller knows the limit. Knows when to retry. Knows why the limit exists (a security signal, not a courtesy). And knows there's a cached result endpoint it can try right now instead of waiting.
Zero follow-up requests generated from that refusal.
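To see why, here is a minimal sketch of how an agent could turn that refusal body into a decision. The field names follow the example above; the action names (`fetch-cached`, `retry-later`, and so on) are hypothetical labels, not part of the spec:

```javascript
// Sketch: map a Graceful Boundaries refusal body to a next action.
// Field names follow the refusal example above; action labels are invented here.
function nextAction(status, body) {
  if (status !== 429) return { kind: "escalate" }; // not a rate limit: inform the human
  if (body.alternativeEndpoint) {
    // Prefer the constructive alternative: a cached result costs zero waiting.
    return { kind: "fetch-cached", path: body.alternativeEndpoint };
  }
  if (typeof body.retryAfterSeconds === "number") {
    // Schedule exactly one retry at the time the service suggested.
    return { kind: "retry-later", afterMs: body.retryAfterSeconds * 1000 };
  }
  return { kind: "give-up" }; // vague refusal: stop rather than probe
}
```

With the refusal shown above, the agent goes straight to `/api/result?id=example.com` instead of sleeping for 40 minutes.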
Proactive discovery: let agents plan before they fail
Level 2 adds a discovery endpoint. Any agent can hit /api/limits and get back every enforced limit as structured JSON:
curl -s https://siteline.snapsynapse.com/api/limits | jq '{service, limits: .limits.scan}'
{
"service": "Siteline",
"limits": {
"scan": {
"endpoint": "/api/scan",
"method": "GET",
"limits": [
{
"type": "ip-rate",
"maxRequests": 10,
"windowSeconds": 3600,
"description": "10 scans per IP per hour."
}
]
}
}
}
An agent that reads this before making any requests can budget its calls. No discovery-through-failure.
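As a sketch of what "budgeting" means in practice: given the JSON shape above, an agent can compute a request interval that never exceeds the tightest window. The helper name is mine, not part of any client library:

```javascript
// Sketch: derive a safe request interval from a /api/limits document.
// Shape matches the example response above; this is not an official client.
function minIntervalMs(limitsDoc, endpointKey) {
  const entries = limitsDoc.limits[endpointKey]?.limits ?? [];
  // Space requests evenly across each window; the largest interval satisfies them all.
  const intervals = entries.map((l) => (l.windowSeconds * 1000) / l.maxRequests);
  return intervals.length ? Math.max(...intervals) : 0;
}
```

For Siteline's `10 scans per 3600 seconds`, that works out to one request every 360 seconds, decided before the first call is ever made.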
Self-throttling on success
Level 4 adds proactive headers to successful responses:
RateLimit: limit=10, remaining=9, reset=3540
RateLimit-Policy: 10;w=3600
A caller seeing remaining=1 self-throttles before the next request. A caller seeing remaining=9 knows it has budget and won't add artificial delays. This is the highest-leverage traffic reduction mechanism in the spec, and it's aligned with the IETF draft-ietf-httpapi-ratelimit-headers specification.
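A caller-side sketch of that self-throttling, assuming the simple `key=value` form of the header shown above (the parsing here is deliberately naive, not a full structured-field parser):

```javascript
// Sketch: decide a delay from a RateLimit header on a *successful* response.
// Assumes the "limit=10, remaining=9, reset=3540" form shown above.
function throttleDelayMs(rateLimitHeader) {
  const fields = Object.fromEntries(
    rateLimitHeader.split(",").map((p) => p.trim().split("=").map((s) => s.trim()))
  );
  const remaining = Number(fields.remaining);
  const resetSeconds = Number(fields.reset);
  if (remaining > 1) return 0;      // budget left: no artificial delay
  return resetSeconds * 1000;       // nearly exhausted: wait for the window to reset
}
```

The point is that the throttling decision happens on a 200, before any 429 exists.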
It applies beyond rate limits
One thing that surprised me during development: the pattern works for every class of HTTP response, not just 429s.
A 400 with "why": "Blocks requests to non-public addresses to prevent server-side request forgery" tells the caller the security model. A 404 with "scanAvailable": true and a scanUrl tells the caller it can create the resource instead of giving up. A 503 with retryAfterSeconds and a statusUrl tells the caller when to come back and where to check status.
The spec covers five response classes (Limit, Input, Access, Not Found, Availability) with specific required and optional fields for each.
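For illustration, a hypothetical Not Found response assembled from the fields mentioned above might look like this (the `detail` and `why` text is invented, not copied from the spec):

```json
{
  "error": "not_found",
  "detail": "No scan result exists for example.com yet.",
  "scanAvailable": true,
  "scanUrl": "/api/scan?url=example.com",
  "why": "Results are only stored after a scan has been run."
}
```

Instead of giving up, the caller learns it can create the resource it was looking for.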
The security model
Transparency and security are in tension. The spec handles this with a simple principle: be transparent about rules, not mechanisms.
"10 requests per hour" is a rule. Safe to disclose. "We use Redis with a sliding window" is an implementation. Not safe. The spec includes eight security considerations (SC-1 through SC-8) covering limit calibration attacks, validation oracles, URL origin restrictions, and more. There's a full security audit in the repo.
Adopting it
Level 1 takes about 20 minutes. Add five fields to your existing 429 responses: error, detail, limit, retryAfterSeconds, and why. The why field is the one that matters most -- it must explain the purpose of the limit, not restate the error.
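A minimal Level 1 sketch, using Siteline's limit values from the example above (substitute your own limits and wording; the function name is mine):

```javascript
// Sketch: build the five Level 1 fields for a 429 body.
// Limit values and wording are Siteline's from the example above.
function rateLimitBody(retryAfterSeconds) {
  return {
    error: "rate_limit_exceeded",
    detail: `You can run up to 10 scans per hour. Try again in ${retryAfterSeconds} seconds.`,
    limit: "10 scans per IP per hour",
    retryAfterSeconds,
    // "why" explains the purpose of the limit; it must not restate the error.
    why: "Siteline is a free service. Rate limits keep it available for everyone and prevent abuse.",
  };
}
// e.g. in Express: res.status(429).set("Retry-After", "2400").json(rateLimitBody(2400))
```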
Level 2 adds a discovery endpoint. Level 3 adds constructive guidance to refusals. Level 4 adds proactive headers to successes.
The conformance checker validates any public URL:
node evals/check.js https://your-service.com
104 tests, zero dependencies.
Try it
The full spec is at spec.md. The reference implementation is Siteline, a Level 4 conformant AI agent readiness scanner you can verify yourself.
Licensed CC-BY-4.0. Use it, adapt it, build on it.
How do your APIs currently communicate limits? Is it the structured kind, or the "good luck figuring out what just happened" kind? Drop it in the comments 👇