DEV Community

Void Stitch

The 6 AI API error classes that destroy incident response time — and how to tell them apart

If you ship customer-facing AI features on OpenAI or Anthropic, you'll hit failures that look similar on the surface but need completely different debugging paths. Here are the six error classes that consistently waste the most engineering time.

Class 1: The Three Different 429s

A 429 from OpenAI or Anthropic always reads as "rate limited," but three different limits can trigger it:

Request-rate 429 (RPM limit): Too many requests per minute.

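  • Check: x-ratelimit-limit-requests and x-ratelimit-remaining-requests headers (OpenAI names; Anthropic uses anthropic-ratelimit-requests-limit and anthropic-ratelimit-requests-remaining)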
  • Fix: exponential backoff with jitter, or request a tier upgrade
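
A minimal sketch of exponential backoff with jitter, assuming the "full jitter" variant (sleep a random amount between zero and an exponentially growing ceiling). The function name, the defaults, and the injectable rng parameter are illustrative choices, not anything either provider prescribes.

```python
import random


def backoff_delays(max_retries=5, base=1.0, cap=30.0, rng=random.random):
    """Yield one sleep duration in seconds per retry attempt (full jitter).

    `base`, `cap`, and `max_retries` are assumed defaults; tune them to your tier.
    `rng` must return a float in [0, 1); it is injectable to make tests deterministic.
    """
    for attempt in range(max_retries):
        # Exponential ceiling: base, 2*base, 4*base, ... never above `cap`.
        ceiling = min(cap, base * (2 ** attempt))
        # Full jitter: pick a point below the ceiling so clients do not retry in lockstep.
        yield rng() * ceiling


if __name__ == "__main__":
    for attempt, delay in enumerate(backoff_delays(), start=1):
        print(f"retry {attempt}: sleep {delay:.2f}s")
```

If the 429 response carries a retry-after header, waiting at least that long is usually a better floor than anything computed locally.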

Token-rate 429 (TPM limit): Token throughput is too high.

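  • Check: x-ratelimit-limit-tokens and x-ratelimit-remaining-tokens headers (OpenAI names; Anthropic uses anthropic-ratelimit-tokens-limit and anthropic-ratelimit-tokens-remaining)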
  • Fix: reduce max_tokens, implement token budget tracking, compress prompts

Concurrent request 429: OpenAI Tier 1–2 accounts have a parallel-request cap.

  • Check: if the RPM and TPM headers show headroom but you still get a 429, suspect this.
  • Fix: add a semaphore/queue in front of API calls. Backoff won't help — this is a concurrency cap, not a time-based limit.
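
A semaphore in front of API calls can be sketched like this with asyncio. ConcurrencyGate, fake_api_call, and the limit of 3 are hypothetical stand-ins; the real cap for your account is something you would have to measure or get from the provider.

```python
import asyncio


class ConcurrencyGate:
    """Caps how many calls are in flight at once (hypothetical helper)."""

    def __init__(self, max_in_flight):
        self._sem = asyncio.Semaphore(max_in_flight)
        self.in_flight = 0
        self.peak = 0  # highest concurrency observed, useful for dashboards

    async def run(self, make_call):
        # Callers beyond the cap wait here instead of hitting the API.
        async with self._sem:
            self.in_flight += 1
            self.peak = max(self.peak, self.in_flight)
            try:
                return await make_call()
            finally:
                self.in_flight -= 1


async def fake_api_call():
    # Stand-in for a real SDK call; sleeps briefly to simulate latency.
    await asyncio.sleep(0.01)
    return "ok"


async def demo(n_calls=20, max_in_flight=3):
    gate = ConcurrencyGate(max_in_flight)
    results = await asyncio.gather(*(gate.run(fake_api_call) for _ in range(n_calls)))
    return results, gate.peak


if __name__ == "__main__":
    results, peak = asyncio.run(demo())
    print(len(results), "calls completed, peak concurrency", peak)
```

The gate queues excess work locally, so the provider never sees more than max_in_flight parallel requests no matter how bursty your traffic is.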

The error message rarely distinguishes these. The response headers do. Most teams spend 45+ minutes tuning backoff on a concurrent-429, because they're debugging the wrong constraint.
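
As a rough sketch, the header check can be automated. This assumes the OpenAI-style header names listed above and treats "both counters show headroom" as a hint toward the concurrency cap, not proof. classify_429 and its tokens_needed parameter are hypothetical.

```python
def classify_429(headers, tokens_needed=0):
    """Guess which limit produced a 429 from rate-limit response headers.

    `headers` is any mapping of header name to value.
    `tokens_needed` is your own estimate of the request's token cost (assumption).
    """
    h = {k.lower(): v for k, v in headers.items()}

    def remaining(name):
        try:
            return int(h.get(name))
        except (TypeError, ValueError):
            return None  # header missing or not a plain integer

    req = remaining("x-ratelimit-remaining-requests")
    tok = remaining("x-ratelimit-remaining-tokens")

    if req is not None and req <= 0:
        return "request-rate"
    # A token-rate 429 can fire while some tokens remain, if the request needs more.
    if tok is not None and tok < max(tokens_needed, 1):
        return "token-rate"
    if req is None or tok is None:
        return "unknown"
    # Both counters show headroom: backoff tuning is probably the wrong fix.
    return "concurrency-suspected"
```

Logging this label next to every 429 gives whoever is on call a starting point before anyone touches backoff settings.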

Class 2: Anthropic 529 ≠ Rate Limit

Anthropic returns 529 for "Overloaded" — and it behaves completely differently from a 429:

  • 529 means server load, not quota exhaustion
  • Your quota reset timer is irrelevant
  • Retrying immediately makes it worse (you're adding load to an already-stressed system)
  • First move: check status.anthropic.com

If it's a platform incident, no request tuning helps. If it's not on the status page, try reducing parallelism and adding jitter. Teams consistently burn 30–60 minutes changing their payload during a 529 because they treat it like a 429. They're different failure modes.
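
One way to keep the two failure modes apart in code is to map each status to a separate plan, so a 529 never enters the quota-debugging path. The sketch below encodes only the guidance from this section; RetryPlan and plan_for are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryPlan:
    cause: str
    check_status_page_first: bool
    reduce_parallelism: bool
    payload_tuning_may_help: bool


def plan_for(status_code):
    """Return the first-response plan for a provider status code (illustrative)."""
    if status_code == 529:
        # Server load, not quota: quota timers and payload changes are irrelevant.
        return RetryPlan("provider overloaded", True, True, False)
    if status_code == 429:
        # Quota or rate limit: headers and request shape are worth inspecting.
        return RetryPlan("rate limited", False, False, True)
    return RetryPlan("other", False, False, False)
```

For example, plan_for(529).payload_tuning_may_help is False, which is the reminder that rewriting prompts during an overload is wasted effort.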

Class 3: Provider 500/503 — Me or Them?

Both OpenAI and Anthropic return vague 500s:

{"error": {"message": "The server had an error processing your request.", "type": "server_error"}}

Fastest disambiguation:

  1. Check the status page first (status.openai.com or status.anthropic.com)
  2. Minimal repro: strip to minimum — no system prompt, simple user message, same model
  3. Minimal repro succeeds → your full payload has an edge case (context length, content policy, tool schema)
  4. Minimal repro also 500s → provider infrastructure issue; wait

The skip-to-step-2 mistake is common: teams tune their payload for 90 minutes during a provider outage.
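
The ordering matters enough that it can be worth encoding in a runbook script. A minimal sketch, assuming you supply the status-page result yourself and pass a callable that sends the stripped-down request and returns True on success (both inputs are hypothetical):

```python
def triage_500(status_page_has_incident, minimal_repro):
    """Walk the 500/503 checklist in order and return a diagnosis string.

    `minimal_repro` is only called when the status page is clean, which
    prevents payload debugging during a known provider outage.
    """
    if status_page_has_incident:
        return "provider incident: wait, do not tune the payload"
    if minimal_repro():
        return "payload edge case: check context length, content policy, tool schema"
    return "provider infrastructure issue: wait"


if __name__ == "__main__":
    # Stubbed inputs: clean status page, minimal request succeeds.
    print(triage_500(False, lambda: True))
```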

Class 4: Which Timeout Is It?

Four different timeouts exist in the typical LLM request path:

| Timeout | Symptom | Fix |
| --- | --- | --- |
| Client HTTP timeout | Your axios/fetch gives up | Increase timeout or use streaming |
| Model inference timeout | LLM still generating at OpenAI's 600s max | Reduce prompt complexity; use streaming |
| Gateway timeout (504) | Proxy intermediary | Transient; retry |
| Your own downstream timeout | Your API → customer connection drops | Make the architecture async; don't block customer response on full completion |

The fix for a client timeout (raise the timeout) is wrong for a downstream timeout (go async). Getting the diagnosis right requires knowing which link in the chain is timing out.
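
A rough heuristic for labeling which link timed out, based only on signals your own service can observe. The signal names, the check order, and the 600-second default (taken from the table above) are assumptions; treat the output as a starting hypothesis.

```python
def classify_timeout(*, upstream_status=None, client_timed_out=False,
                     customer_disconnected=False, elapsed_s=0.0,
                     provider_max_s=600.0):
    """Return a best-guess label for where a timeout happened (illustrative)."""
    if customer_disconnected:
        # Your API -> customer link dropped; raising client timeouts will not help.
        return "downstream"
    if upstream_status == 504:
        # A proxy or gateway between you and the model gave up.
        return "gateway"
    if client_timed_out and elapsed_s >= provider_max_s:
        # The request ran all the way to the provider-side ceiling.
        return "model-inference"
    if client_timed_out:
        # Your HTTP client gave up before the provider did.
        return "client-http"
    return "unknown"
```

Emitting this label in the error log turns "it timed out" into something actionable, for example "client-http" pointing at the timeout setting and "downstream" pointing at the architecture.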

Class 5: Silent Invalid Requests

Some malformed requests return a clear 400. Others silently succeed (200) but produce wrong behavior.

Loud failures (400 with a clear message):

  • Missing role in messages array
  • Invalid model name
  • max_tokens below minimum

Silent failures (200, but wrong):

  • Unknown fields in tools[].function.parameters — silently dropped
  • Deprecated functions syntax on models that expect tools — silently converted or ignored
  • Mixing content formats (string vs. array) on multimodal models

Silent failures show up as "the model is behaving weird" rather than "we have an API error," which makes them the hardest class to find.
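
Because the API will not flag these, a pre-flight lint on the request payload can. The sketch below checks the three silent cases listed above. KNOWN_SCHEMA_KEYS is an assumed allowlist for illustration, not the provider's authoritative set, and lint_chat_request is a hypothetical helper.

```python
# Assumed allowlist of top-level keys in a tool's `parameters` schema.
KNOWN_SCHEMA_KEYS = {"type", "properties", "required", "description",
                     "additionalProperties"}


def lint_chat_request(payload):
    """Return warnings for request shapes that may succeed with a 200 but misbehave."""
    warnings = []

    if "functions" in payload:
        warnings.append("deprecated 'functions' field present; use 'tools'")

    # Detect messages that mix plain-string content with array content.
    kinds = set()
    for msg in payload.get("messages", []):
        content = msg.get("content")
        if isinstance(content, str):
            kinds.add("string")
        elif isinstance(content, list):
            kinds.add("array")
    if len(kinds) > 1:
        warnings.append("messages mix string and array content formats")

    # Flag schema keys that may be silently dropped.
    for tool in payload.get("tools", []):
        fn = tool.get("function", {})
        unknown = sorted(set(fn.get("parameters", {})) - KNOWN_SCHEMA_KEYS)
        if unknown:
            warnings.append(f"tool {fn.get('name')!r} has unknown parameter keys: {unknown}")

    return warnings
```

Running this in CI or as a debug-mode assertion converts "the model is behaving weird" into a concrete warning before the request leaves your service.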

Class 6: Model Version Drift

Both providers update model versions:

  • OpenAI aliases (gpt-4, gpt-4-turbo) point to whatever they decide is current
  • Anthropic uses explicit version strings but ships new versions you may not notice

If you pin to an alias, behavior can change invisibly. Two symptoms: sudden latency changes, subtle output behavior shifts.

Fix: pin to an exact version string. Add model deprecation dates to your maintenance calendar.
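
A quick config check can catch unpinned aliases before they ship. This is a naming heuristic only: it assumes pinned identifiers carry a date-like segment (MMDD, YYYYMMDD, or YYYY-MM-DD), which matches names such as gpt-4-0613 but is not a rule either provider guarantees. is_pinned is a hypothetical helper.

```python
import re

# A date-like segment delimited by hyphens or string boundaries (assumed convention).
_DATE_SEGMENT = re.compile(r"(?:^|-)(\d{4}-\d{2}-\d{2}|\d{8}|\d{4})(?:-|$)")


def is_pinned(model_name):
    """Heuristic: True if the model name contains a date-like version segment."""
    return _DATE_SEGMENT.search(model_name) is not None


if __name__ == "__main__":
    for name in ["gpt-4", "gpt-4-turbo", "gpt-4-0613", "claude-3-5-sonnet-20241022"]:
        print(name, "pinned" if is_pinned(name) else "alias (may drift)")
```

Failing a deploy when is_pinned returns False for a production model setting is a cheap guard against invisible behavior changes.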


What's been your worst incident?

These six classes cover the majority of LLM API incident time I've seen. Concurrent-429 confusion and 529-vs-429 are the top two time sinks.

What failure class took your team the longest to pin down? If your stack is TypeScript or Python — did SDK error messages actually help, or did you end up parsing raw HTTP responses?

Top comments (1)

Void Stitch

If you have shipped AI features on OpenAI or Anthropic and hit any of these in production, I would genuinely like to hear which one tripped you up first and what the initial symptom was. My current observation is that the concurrent-429 is the one that surprises engineers the most because the RPM and TPM headers look fine. Would be useful to hear whether that matches your experience, or if another error class has been burning more incident time.