Field notes from running Claude in a production workload where 529 `overloaded_error` became a weekly incident. I don't work for Anthropic. Everything below is based on behaviour observed against the public API.
I'm going to skip the pitch and start with the thing that will save you an hour if you landed here from a PagerDuty alert at 3 a.m.
TL;DR
- HTTP 529 with `overloaded_error` is not your rate limit. It's Anthropic saying the fleet is out of capacity for the model you asked for.
- Retrying the same model, same region, same request, same second is exactly what made the wall bigger. Your retry storm is what 529 is trying to avoid.
- Respect `retry-after`. If it's missing, jitter 2–8 seconds and fall back to a smaller model (Haiku) on non-critical paths.
- When your own retry policy makes it worse, look at a third-party signal (status dashboards, community reports, latency probes) before escalating inside your team.
The 529 is not the 429
Most client libraries I've seen wrap retries around 429 "Too Many Requests" and leave it there. That's fine for your per-key quota. But Claude will also return HTTP 529 with:
{
"type": "error",
"error": {
"type": "overloaded_error",
"message": "Overloaded"
}
}
A 429 means you are sending too much. A 529 means the model's capacity pool is too small right now. The same request, replayed in the next second, gets the same answer. Two failure modes, two very different retry policies.
Read the headers first
Before touching any retry code, check if the response carried these headers:
anthropic-ratelimit-requests-limit: 4000
anthropic-ratelimit-requests-remaining: 3987
anthropic-ratelimit-requests-reset: 2025-11-03T12:34:56Z
anthropic-ratelimit-tokens-limit: 400000
anthropic-ratelimit-tokens-remaining: 398420
anthropic-ratelimit-tokens-reset: 2025-11-03T12:34:00Z
retry-after: 12
If `requests-remaining` is non-zero and you're still getting 529, the server-side capacity wall is the problem, not your key. Stop blaming your gateway.
If `retry-after` is present, use it literally. The easiest mistake is a library that ignores the header and retries after 500 ms.
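A small helper makes "read the headers first" mechanical. This is my own sketch, not part of any SDK; the header names come from the example above, and the shape of the returned object is an assumption:

```typescript
// Hypothetical helper: pull the Anthropic rate-limit headers off a fetch
// Response so the retry logic can branch on them instead of guessing.
type RateLimitInfo = {
  requestsRemaining?: number;
  tokensRemaining?: number;
  retryAfterSec?: number;
};

function readRateLimitHeaders(headers: Headers): RateLimitInfo {
  // Absent headers come back as undefined rather than NaN.
  const num = (name: string): number | undefined => {
    const v = headers.get(name);
    return v === null ? undefined : Number(v);
  };
  return {
    requestsRemaining: num("anthropic-ratelimit-requests-remaining"),
    tokensRemaining: num("anthropic-ratelimit-tokens-remaining"),
    retryAfterSec: num("retry-after"),
  };
}
```

With this in hand, "is it my key or the fleet?" is a one-line check on `requestsRemaining` before you touch backoff code.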
A retry policy that actually respects 529
TypeScript, no SDK-specific code so it's portable:
type ClaudeRequest = { model: string; max_tokens: number; messages: unknown[] };

type RetryInput = { attempt: number; retryAfterSec?: number };
function backoff({ attempt, retryAfterSec }: RetryInput): number {
if (retryAfterSec) return retryAfterSec * 1000;
// 1s, 2s, 4s, 8s, 16s — with 50% jitter
const base = Math.min(16_000, 1_000 * 2 ** attempt);
return Math.floor(base * (0.5 + Math.random() * 0.5));
}
async function callClaude(req: ClaudeRequest, maxAttempts = 5) {
for (let attempt = 0; attempt < maxAttempts; attempt++) {
const res = await fetch("https://api.anthropic.com/v1/messages", {
method: "POST",
headers: {
"x-api-key": process.env.ANTHROPIC_API_KEY!,
"anthropic-version": "2023-06-01",
"content-type": "application/json",
},
body: JSON.stringify(req),
});
if (res.ok) return res.json();
if (res.status === 529 || res.status === 429) {
const retryAfter = Number(res.headers.get("retry-after")) || undefined;
if (attempt === maxAttempts - 1) throw new Error(`claude ${res.status} after ${maxAttempts}`);
await new Promise(r => setTimeout(r, backoff({ attempt, retryAfterSec: retryAfter })));
continue;
}
throw new Error(`claude ${res.status}: ${await res.text()}`);
}
throw new Error("unreachable");
}
Two things matter:
- Jitter. If ten workers retry on a deterministic schedule, they stampede the model the moment its queue opens. Random jitter spreads the load.
- Capped max. Don't let attempt 7 sleep 128 seconds. If 16s didn't work, the capacity issue is big enough that you should be on a fallback path, not holding a user request open.
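The jitter point is easier to see with a toy simulation (my own illustration, reusing the backoff shape from the snippet above): ten workers computing their attempt-3 delay with and without jitter.

```typescript
// Toy illustration: ten workers computing their attempt-3 sleep.
// Deterministic backoff wakes all ten at the same instant; jittered
// backoff spreads them across a 4-8 second window.
function deterministicDelay(attempt: number): number {
  return Math.min(16_000, 1_000 * 2 ** attempt);
}

function jitteredDelay(attempt: number): number {
  const base = deterministicDelay(attempt);
  return Math.floor(base * (0.5 + Math.random() * 0.5));
}

const fixed = Array.from({ length: 10 }, () => deterministicDelay(3));
const jittered = Array.from({ length: 10 }, () => jitteredDelay(3));

// All ten deterministic workers hit the API at exactly t = 8000 ms:
// that's the stampede. The jittered ones land anywhere in [4000, 8000) ms.
console.log(new Set(fixed).size); // 1 distinct wake-up time
console.log(jittered);
```

One distinct wake-up time across ten workers is precisely the thundering-herd pattern the 529 is trying to shed.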
Model fallback as a first-class path
For anything that isn't user-facing reasoning, fall back to Haiku after the second 529:
async function callWithFallback(req: ClaudeRequest) {
try {
    return await callClaude(req, 2); // fast fail on primary
  } catch {
    // Note: this falls back on any failure, not just 529. Narrow it if you
    // need 4xx errors (bad request, auth) to surface instead of downgrading.
const downgraded = { ...req, model: "claude-haiku-4-5-20251001" };
return callClaude(downgraded, 3);
}
}
Haiku is a different capacity pool. When Opus or Sonnet are overloaded, Haiku is often fine. You trade quality for availability — almost always the right call on a transient spike.
Don't trust your own retries blindly
Here's the real trap: once you've got a retry policy, every 529 becomes invisible. You quietly burn latency budget, your p99 creeps up, your error budget gets eaten without a single page firing. Dashboard green, users angry anyway.
Two signals I rely on to see the "hidden 529 drip":
- Per-model error counters. Track `anthropic_http_5xx_total{model="opus",code="529"}` as its own metric, not rolled up under a generic `api_errors_total`. You want 529 as visible as a bill.
- External latency probes. Before you decide your code has a bug, check if Claude is slow everywhere, not just from your region.
For the second one I eventually stumbled on a community-built dashboard at claudestatus.com — free, unofficial, not affiliated with Anthropic, and I have nothing to do with whoever built it. It polls the Anthropic Statuspage API, runs HTTP latency checks from 17 countries, and surfaces community-submitted reports in one place. When production errors spike, I cross-check against its 30-day history before escalating. Nine times out of ten it's a regional wave I can wait out instead of waking up a teammate.
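The first signal (per-model counters) can start as something this small. A minimal in-process sketch, assuming you don't already run a metrics client (prom-client or similar would replace this in production); the series key just mirrors the Prometheus-style metric named above:

```typescript
// Minimal per-model, per-status error counter keyed like a Prometheus
// series. Shows the shape only; a real metrics client replaces this.
const errorCounts = new Map<string, number>();

function recordApiError(model: string, code: number): void {
  const key = `anthropic_http_5xx_total{model="${model}",code="${code}"}`;
  errorCounts.set(key, (errorCounts.get(key) ?? 0) + 1);
}

recordApiError("opus", 529);
recordApiError("opus", 529);
recordApiError("haiku", 529);
// Each model/code pair stays its own series instead of one rolled-up
// total, so the "hidden 529 drip" shows up per model on a dashboard.
```

The point is the keying, not the storage: once 529 on Opus is a distinct series, you can alert on its rate directly.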
Checklist for the next 529 page
When the alert fires, in order:
- Check headers: is `requests-remaining` zero? That's your key, not the fleet. Different fix.
- Check `retry-after` and use it literally.
- Check `tokens-remaining`: a long prompt can exhaust the token-minute bucket before the request-minute bucket.
- Check external signal: is it just you, your region, or global? A community dashboard or the official Anthropic statuspage answers this in 10 seconds.
- Fall back or degrade: drop to Haiku, cache the last good response, tell the user it'll be a minute.
- Only then look at your own code.
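The "cache the last good response" step in that checklist can be as simple as this sketch. The cache key, the wrapper shape, and the no-TTL policy are all my own assumptions, not anything Anthropic provides:

```typescript
// Last-known-good cache: serve a stale answer when the fleet is
// overloaded, rather than holding the user request open across retries.
const lastGood = new Map<string, { body: string; at: number }>();

async function callOrServeStale(
  key: string,
  call: () => Promise<string>,
): Promise<{ body: string; stale: boolean }> {
  try {
    const body = await call();
    lastGood.set(key, { body, at: Date.now() });
    return { body, stale: false };
  } catch (err) {
    const cached = lastGood.get(key);
    if (cached) return { body: cached.body, stale: true }; // degrade, don't fail
    throw err; // nothing cached yet: surface the 529 to the caller
  }
}
```

Flagging `stale: true` lets the UI say "it'll be a minute" honestly instead of silently serving old content.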
Nothing profound here. The mistake I kept making was assuming 529 = 429 and burning retries against a wall.
Dashboard referenced above: claudestatus.com. Free, community-built, unofficial, not affiliated with Anthropic — I just keep it pinned because it saves me on-call headaches. If you've hit an interesting 529 pattern, I'd love to hear about it in the comments.