If you ship customer-facing AI features on OpenAI or Anthropic, you'll hit failures that look similar on the surface but need completely different debugging paths. Here are the six error classes that consistently waste the most engineering time.
Class 1: The Three Different 429s
A 429 from OpenAI or Anthropic always means "rate limited" — but it can mean three entirely different things:
Request-rate 429 (RPM limit): Too many requests per minute.
- Check:
x-ratelimit-limit-requestsandx-ratelimit-remaining-requestsheaders - Fix: exponential backoff with jitter, or request a tier upgrade
Token-rate 429 (TPM limit): Token throughput is too high.
- Check:
x-ratelimit-limit-tokensandx-ratelimit-remaining-tokensheaders - Fix: reduce
max_tokens, implement token budget tracking, compress prompts
Concurrent request 429: OpenAI Tier 1–2 accounts have a parallel-request cap.
- Check: RPM and TPM headers look fine but you still get 429? It's this.
- Fix: add a semaphore/queue in front of API calls. Backoff won't help — this is a concurrency cap, not a time-based limit.
The error message rarely distinguishes these. The response headers do. Most teams spend 45+ minutes tuning backoff on a concurrent-429, because they're debugging the wrong constraint.
Class 2: Anthropic 529 ≠ Rate Limit
Anthropic returns 529 for "Overloaded" — and it behaves completely differently from a 429:
- 529 means server load, not quota exhaustion
- Your quota reset timer is irrelevant
- Retrying immediately makes it worse (you're adding load to an already-stressed system)
- First move: check status.anthropic.com
If it's a platform incident, no request tuning helps. If it's not on the status page, try reducing parallelism and adding jitter. Teams consistently burn 30–60 minutes changing their payload during a 529 because they treat it like a 429. They're different failure modes.
Class 3: Provider 500/503 — Me or Them?
Both OpenAI and Anthropic return vague 500s:
{"error": {"message": "The server had an error processing your request.", "type": "server_error"}}
Fastest disambiguation:
- Check the status page first (status.openai.com or status.anthropic.com)
- Minimal repro: strip to minimum — no system prompt, simple user message, same model
- Minimal repro succeeds → your full payload has an edge case (context length, content policy, tool schema)
- Minimal repro also 500s → provider infrastructure issue; wait
The skip-to-step-2 mistake is common: teams tune their payload for 90 minutes during a provider outage.
Class 4: Which Timeout Is It?
Four different timeouts exist in the typical LLM request path:
| Timeout | Symptom | Fix |
|---|---|---|
| Client HTTP timeout | Your axios/fetch gives up | Increase timeout or use streaming |
| Model inference timeout | LLM still generating at OpenAI's 600s max | Reduce prompt complexity; use streaming |
| Gateway timeout (504) | Proxy intermediary | Transient; retry |
| Your own downstream timeout | Your API → customer connection drops | Make the architecture async; don't block customer response on full completion |
The fix for a client timeout (raise the timeout) is wrong for a downstream timeout (go async). Getting the diagnosis right requires knowing which link in the chain is timing out.
Class 5: Silent Invalid Requests
Some malformed requests return a clear 400. Others silently succeed (200) but produce wrong behavior.
Loud failures (400 with a clear message):
- Missing
rolein messages array - Invalid model name
-
max_tokensbelow minimum
Silent failures (200, but wrong):
- Unknown fields in
tools[].function.parameters— silently dropped - Deprecated
functionssyntax on models that expecttools— silently converted or ignored - Mixing content formats (string vs. array) on multimodal models
Silent failures show up as "the model is behaving weird" rather than "we have an API error," which makes them the hardest class to find.
Class 6: Model Version Drift
Both providers update model versions:
- OpenAI aliases (
gpt-4,gpt-4-turbo) point to whatever they decide is current - Anthropic uses explicit version strings but ships new versions you may not notice
If you pin to an alias, behavior can change invisibly. Two symptoms: sudden latency changes, subtle output behavior shifts.
Fix: pin to an exact version string. Add model deprecation dates to your maintenance calendar.
What's been your worst incident?
These six classes cover the majority of LLM API incident time I've seen. Concurrent-429 confusion and 529-vs-429 are the top two time sinks.
What failure class took your team the longest to pin down? If your stack is TypeScript or Python — did SDK error messages actually help, or did you end up parsing raw HTTP responses?
Top comments (1)
If you have shipped AI features on OpenAI or Anthropic and hit any of these in production, I would genuinely like to hear which one tripped you up first and what the initial symptom was. My current observation is that the concurrent-429 is the one that surprises engineers the most because the RPM and TPM headers look fine. Would be useful to hear whether that matches your experience, or if another error class has been burning more incident time.