Kevin
Claude Status: Why Your Claude API Keeps Returning 529 `overloaded_error` — A Production Debugging Playbook

Field notes from running Claude in a production workload where 529 overloaded_error became a weekly incident. I don't work for Anthropic. Everything below is based on behaviour observed against the public API.

I'm going to skip the pitch and start with the thing that will save you an hour if you landed here from a PagerDuty alert at 3 a.m.

TL;DR

  • HTTP 529 with overloaded_error is not your rate limit. It's Anthropic saying the fleet is out of capacity for the model you asked for.
  • Retrying the same model, same region, same request, same second is exactly what made the wall bigger. Your retry storm is what 529 is trying to avoid.
  • Respect retry-after. If it's missing, jitter 2–8 seconds and fall back to a smaller model (Haiku) on non-critical paths.
  • When your own retry policy makes it worse, look at a third-party signal (status dashboards, community reports, latency probes) before escalating inside your team.

The 529 is not the 429

Most client libraries I've seen wrap retries around 429 "Too Many Requests" and leave it there. That's fine for your per-key quota. But Claude will also return HTTP 529 with:

{
  "type": "error",
  "error": {
    "type": "overloaded_error",
    "message": "Overloaded"
  }
}

A 429 means you are sending too much. A 529 means the model's capacity pool is too small right now. The same request, replayed in the next second, gets the same answer. Two failure modes, two very different retry policies.
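The split between the two failure modes can be made explicit in code. A minimal sketch (the `Verdict` type and function name are mine, not from any SDK):

```typescript
type Verdict = "client_quota" | "server_capacity" | "other";

// Map a status code to the failure mode it signals, so the retry
// policy can branch on the *cause* rather than on the raw code.
function classifyStatus(status: number): Verdict {
  if (status === 429) return "client_quota";    // you sent too much: throttle your key
  if (status === 529) return "server_capacity"; // fleet is full: back off, consider fallback
  return "other";
}
```

A 429 branch can safely retry fast once your quota window resets; a 529 branch should back off longer and think about a model fallback.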

Read the headers first

Before touching any retry code, check if the response carried these headers:

anthropic-ratelimit-requests-limit: 4000
anthropic-ratelimit-requests-remaining: 3987
anthropic-ratelimit-requests-reset: 2025-11-03T12:34:56Z
anthropic-ratelimit-tokens-limit: 400000
anthropic-ratelimit-tokens-remaining: 398420
anthropic-ratelimit-tokens-reset: 2025-11-03T12:34:00Z
retry-after: 12

If requests-remaining is non-zero and you're still getting 529, the server-side capacity wall is the problem, not your key. Stop blaming your gateway.

If retry-after is present, use it literally. The easiest mistake is a library that ignores the header and retries in 500ms.
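Pulling those headers into a typed value keeps the rest of the code honest. A sketch, assuming a standard fetch `Headers` object; the header names match the example above, but the `RateInfo` shape is my own:

```typescript
interface RateInfo {
  requestsRemaining: number | null;
  tokensRemaining: number | null;
  retryAfterMs: number | null; // null = no header: fall back to jittered backoff
}

// Read the Anthropic rate-limit headers off a fetch Response's Headers.
function readRateHeaders(h: Headers): RateInfo {
  const num = (name: string): number | null => {
    const v = h.get(name);
    return v === null ? null : Number(v);
  };
  const retryAfterSec = num("retry-after");
  return {
    requestsRemaining: num("anthropic-ratelimit-requests-remaining"),
    tokensRemaining: num("anthropic-ratelimit-tokens-remaining"),
    retryAfterMs: retryAfterSec === null ? null : retryAfterSec * 1000,
  };
}
```

Returning `null` instead of `0` for a missing header matters: `0` remaining is a real signal (your key is exhausted), while `null` just means the server didn't tell you.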

A retry policy that actually respects 529

TypeScript, no SDK-specific code so it's portable:

// Minimal request shape so the snippet compiles standalone.
type ClaudeRequest = { model: string; max_tokens: number; messages: unknown[] };

type RetryInput = { attempt: number; retryAfterSec?: number };

function backoff({ attempt, retryAfterSec }: RetryInput): number {
  if (retryAfterSec) return retryAfterSec * 1000;
  // 1s, 2s, 4s, 8s, 16s — with 50% jitter
  const base = Math.min(16_000, 1_000 * 2 ** attempt);
  return Math.floor(base * (0.5 + Math.random() * 0.5));
}

async function callClaude(req: ClaudeRequest, maxAttempts = 5) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetch("https://api.anthropic.com/v1/messages", {
      method: "POST",
      headers: {
        "x-api-key": process.env.ANTHROPIC_API_KEY!,
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
      },
      body: JSON.stringify(req),
    });

    if (res.ok) return res.json();

    if (res.status === 529 || res.status === 429) {
      const retryAfter = Number(res.headers.get("retry-after")) || undefined;
      if (attempt === maxAttempts - 1) throw new Error(`claude ${res.status} after ${maxAttempts} attempts`);
      await new Promise(r => setTimeout(r, backoff({ attempt, retryAfterSec: retryAfter })));
      continue;
    }
    throw new Error(`claude ${res.status}: ${await res.text()}`);
  }
  throw new Error("unreachable");
}

Two things matter:

  1. Jitter. If ten workers retry on a deterministic schedule, they stampede the model the moment its queue opens. Random jitter spreads the load.
  2. Capped max. Don't let attempt 7 sleep 128 seconds. If 16s didn't work, the capacity issue is big enough that you should be on a fallback path, not holding a user request open.

Model fallback as a first-class path

For anything that isn't user-facing reasoning, fall back to Haiku after the second 529:

async function callWithFallback(req: ClaudeRequest) {
  try {
    return await callClaude(req, 2); // fast fail on primary
  } catch {
    const downgraded = { ...req, model: "claude-haiku-4-5-20251001" };
    return callClaude(downgraded, 3);
  }
}

Haiku is a different capacity pool. When Opus or Sonnet are overloaded, Haiku is often fine. You trade quality for availability — almost always the right call on a transient spike.

Don't trust your own retries blindly

Here's the real trap: once you've got a retry policy, every 529 becomes invisible. You quietly burn latency budget, your p99 creeps up, your error budget gets eaten without a single page firing. Dashboard green, users angry anyway.

Two signals I rely on to see the "hidden 529 drip":

  1. Per-model error counters. Track anthropic_http_5xx_total{model="opus",code="529"} as its own metric, not rolled up under a generic api_errors_total. You want 529 as visible as a bill.
  2. External latency probes. Before you decide your code has a bug, check if Claude is slow everywhere, not just from your region.
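The first signal doesn't need a metrics library to prototype. A minimal in-memory sketch of a per-model, per-code counter — in production you'd export this through whatever metrics client you already run, with labels mirroring the `anthropic_http_5xx_total` example above:

```typescript
// Keyed by "model:code" so 529s on opus and haiku stay separate series.
const errorCounts = new Map<string, number>();

function countError(model: string, code: number): void {
  const key = `${model}:${code}`;
  errorCounts.set(key, (errorCounts.get(key) ?? 0) + 1);
}

function errorCount(model: string, code: number): number {
  return errorCounts.get(`${model}:${code}`) ?? 0;
}
```

Call `countError(req.model, res.status)` inside the retry loop, before sleeping — that's the point where a "successful" request quietly cost you a 529.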

For the second one I eventually stumbled on a community-built dashboard at claudestatus.com — free, unofficial, not affiliated with Anthropic, and I have nothing to do with whoever built it. It polls the Anthropic Statuspage API, runs HTTP latency checks from 17 countries, and surfaces community-submitted reports in one place. When production errors spike, I cross-check against its 30-day history before escalating. Nine times out of ten it's a regional wave I can wait out instead of waking up a teammate.

Checklist for the next 529 page

When the alert fires, in order:

  1. Check headers — is requests-remaining zero? That's your key, not the fleet. Different fix.
  2. Check retry-after — use it literally.
  3. Check tokens-remaining — a long prompt can exhaust the token-minute bucket before the request-minute bucket.
  4. Check external signal — is it just you, your region, or global? A community dashboard or the official Anthropic statuspage answers this in 10 seconds.
  5. Fall back or degrade — drop to Haiku, cache the last good response, tell the user it'll be a minute.
  6. Only then look at your own code.
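The first three steps of that checklist are mechanical enough to encode. A hedged sketch (the `Triage` labels are mine):

```typescript
type Triage = "key_exhausted" | "token_bucket" | "capacity_wall";

// Decide what a 529-adjacent failure actually means from the two
// "remaining" counters read off the response headers.
function triage529(requestsRemaining: number, tokensRemaining: number): Triage {
  if (requestsRemaining === 0) return "key_exhausted"; // step 1: your key, not the fleet
  if (tokensRemaining === 0) return "token_bucket";    // step 3: long prompts drained the token-minute bucket
  return "capacity_wall";                              // genuine fleet overload: back off or degrade
}
```

Only the `capacity_wall` case justifies the fallback-to-Haiku path; the other two are quota problems you fix on your side.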

Nothing profound here. The mistake I kept making was assuming 529 = 429 and burning retries against a wall.


Dashboard referenced above: claudestatus.com. Free, community-built, unofficial, not affiliated with Anthropic — I just keep it pinned because it saves me on-call headaches. If you've hit an interesting 529 pattern, I'd love to hear about it in the comments.
