TokenPAPA

Posted on • Originally published at doc.tokenpapa.ai

LLM API Error Handling & Debugging Guide (2026): Common Errors & Fixes


Published: June 30, 2026 · 14 min read


Introduction

Every LLM API call will eventually fail. Authentication expires, rate limits hit, models overload, and networks degrade. The difference between a robust application and a brittle one is how gracefully it handles failure.

In 2026, with four major providers (OpenAI, DeepSeek, Anthropic, and Google) and dozens more reachable through API gateways, the error surface area is larger than ever. Each provider has its own error codes, retry semantics, and failure modes.

This guide catalogs every common LLM API error — what it means, why it happens, and exactly how to fix it. Whether you're debugging a production incident or building error handling from scratch, this is your reference.

New to LLM APIs? Start with our Best LLM APIs 2026 for model selection, and LLM API Pricing Comparison 2026 for cost data.


Error Reference by Status Code

Status 400: Bad Request

Meaning: The request payload is malformed or contains invalid parameters.

| Symptom | Likely Cause | Fix |
|---|---|---|
| "model" field required | Missing model parameter | Add model: "gpt-5" or "deepseek-v4" |
| "messages" must be an array | Messages field not a list | Wrap in [] |
| "role" must be one of system/user/assistant | Invalid role string | Use exactly "system", "user", or "assistant" |
| max_tokens exceeds limit | Token cap exceeded | Reduce max_tokens (GPT-5: 128K, DeepSeek V4: 128K) |
| Invalid JSON in request body | Malformed JSON | Validate with jq . before sending |

Example fix:

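import requests

# url is assumed to hold your provider's chat completions endpoint

# Wrong: missing model
resp = requests.post(url, json={"messages": [...]})  # 400

# Correct: "model" and "messages" are both present
resp = requests.post(url, json={
    "model": "gpt-5",
    "messages": [{"role": "user", "content": "Hello"}]
})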
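Most of the 400 causes listed in this section can be caught locally, before the request leaves your process. The following is a minimal sketch, assuming an OpenAI-style chat payload. The function name `validate_chat_payload` and the `max_tokens_limit` parameter are hypothetical, and the set of roles follows the table in this section; extend it if your provider accepts others (for example a tool role).

```python
import json

# Roles taken from the 400 table in this section; extend for your provider if needed.
VALID_ROLES = {"system", "user", "assistant"}

def validate_chat_payload(payload, max_tokens_limit=None):
    """Return a list of problems likely to trigger a 400; an empty list means OK."""
    problems = []
    if not payload.get("model"):
        problems.append('"model" field required')
    messages = payload.get("messages")
    if not isinstance(messages, list):
        problems.append('"messages" must be an array')
    else:
        for i, msg in enumerate(messages):
            role = msg.get("role") if isinstance(msg, dict) else None
            if role not in VALID_ROLES:
                problems.append(f"messages[{i}]: invalid role {role!r}")
    max_tokens = payload.get("max_tokens")
    # The limit is supplied by the caller; look it up in your provider's docs.
    if max_tokens_limit is not None and max_tokens is not None and max_tokens > max_tokens_limit:
        problems.append("max_tokens exceeds limit")
    try:
        json.dumps(payload)
    except (TypeError, ValueError):
        problems.append("payload is not JSON-serializable")
    return problems
```

Calling this right before `requests.post` and refusing to send when the list is non-empty turns a remote 400 into a local, readable error.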

Status 401: Unauthorized

Meaning: API key is missing, invalid, or expired.

Provider-specific messages:

| Provider | Error Body | Common Cause |
|---|---|---|
| OpenAI | "Incorrect API key provided" | Wrong key or revoked |
| DeepSeek | "Authentication Fails" | Key expired or region blocked |
| Anthropic | "x-api-key header is required" | Missing header |
| Gemini | "API_KEY_INVALID" | Key not activated for model |

Debugging checklist:

  1. Check export | grep API_KEY — is the environment variable set?
  2. Verify the key format (OpenAI keys start with sk-proj-..., DeepSeek keys with sk-...)
  3. Check billing status — expired payment causes immediate deactivation
  4. Test with curl: curl -H "Authorization: Bearer $OPENAI_API_KEY" https://api.openai.com/v1/models
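The first two checklist items can be automated so they run at application startup. This is a rough sketch, not a provider-supplied check: the environment variable names `OPENAI_API_KEY` and `DEEPSEEK_API_KEY` and the helper `check_api_key` are assumptions for illustration, and the prefixes are the ones quoted in the checklist.

```python
import os

# Assumed variable names; prefixes come from the checklist in this section.
EXPECTED_PREFIXES = {"OPENAI_API_KEY": "sk-", "DEEPSEEK_API_KEY": "sk-"}

def check_api_key(var_name, environ=None):
    """Return a human-readable problem with the key stored in var_name, or None."""
    environ = os.environ if environ is None else environ
    key = environ.get(var_name)
    if not key:
        return f"{var_name} is not set"
    if key != key.strip():
        # A trailing newline from a copy-paste is a common cause of 401s.
        return f"{var_name} has leading or trailing whitespace"
    prefix = EXPECTED_PREFIXES.get(var_name)
    if prefix and not key.startswith(prefix):
        return f"{var_name} does not start with {prefix!r}"
    return None
```

A passing check only proves the key is well-formed locally. Billing status and revocation still have to be confirmed against the provider, for example with the curl call in step 4.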

Pro tip: Rotate keys regularly. Use tokenpapa's API gateway to manage multiple provider keys from a single endpoint with automatic failover.

Status 403: Forbidden

Meaning: Key is valid but lacks permission for the requested resource.

Common scenarios:

  • Free-tier key trying to access gpt-5 (requires paid tier)
  • Organization-level restrictions (OpenAI org limits)
  • Country/region blocks (some providers restrict by IP geolocation)
  • Model access not granted (Claude 4 custom models)

Fix: Upgrade your account tier, or use a proxy/gateway that handles region routing.

Status 429: Too Many Requests

Meaning: Rate limit exceeded. See our dedicated LLM API Rate Limiting & Retry Strategies Guide for deep coverage.

Quick fix:

import time
time.sleep(float(resp.headers.get("Retry-After", 5)))
# Then retry
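One caveat with the quick fix: the HTTP specification allows Retry-After to carry either a number of seconds or an HTTP date, and `float()` raises on the date form. Here is a small sketch that accepts both, using only the standard library. The function name `retry_after_seconds` and its `default` value are illustrative choices.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def retry_after_seconds(header_value, default=5.0, now=None):
    """Convert a Retry-After header (seconds or HTTP-date) into a delay in seconds."""
    if not header_value:
        return default
    try:
        return max(0.0, float(header_value))
    except ValueError:
        pass  # not a plain number, try the HTTP-date form next
    try:
        when = parsedate_to_datetime(header_value)
    except (TypeError, ValueError):
        return default  # unparseable header: fall back to the default delay
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

Usage is a drop-in replacement for the one-liner: `time.sleep(retry_after_seconds(resp.headers.get("Retry-After")))`.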

Status 500: Internal Server Error

Meaning: The provider's server encountered an error. Usually transient.

Providers that return 500:

| Provider | Frequency | Best Response |
|---|---|---|
| OpenAI | Rare (under 0.1%) | Retry after 1-2s |
| DeepSeek | Occasional (cache miss storms) | Retry after 3-5s |
| Anthropic | Rare | Retry after 1s |
| Gemini | Very rare (under 0.01%) | Retry after 1s |

Important: Do NOT retry 500 errors more than 3 times. If persistent, switch to a fallback provider or model.
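The "three retries, then fall back" rule can be sketched as a small wrapper. This assumes `primary` and `fallback` are zero-argument callables you supply (hypothetical names) that each return an object with a `status_code` attribute, such as a `requests.Response`.

```python
import time

def call_with_fallback(primary, fallback, max_retries=3, delay_s=1.0, sleep=time.sleep):
    """Call primary(); on a 500 retry up to max_retries times, then call fallback()."""
    resp = primary()
    retries = 0
    while resp.status_code == 500 and retries < max_retries:
        sleep(delay_s)
        resp = primary()
        retries += 1
    if resp.status_code == 500:
        # Still failing after the retry cap: hand off to the other provider or model.
        return fallback()
    return resp
```

The `sleep` parameter exists only so tests can skip the real delay. In production the default `time.sleep` is used.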

Status 503: Service Unavailable

Meaning: The service is temporarily overloaded or under maintenance.

Provider behavior:

  • OpenAI: Usually resolves within 30-60 seconds. Check status.openai.com
  • DeepSeek: Can lag during peak China hours (9-11 PM China Standard Time, UTC+8). Use tokenpapa's load-balanced endpoint
  • Anthropic: Typically maintenance windows (announced via status page)
  • Gemini: Very rare — auto-resolves

Status 529: Overloaded (Anthropic-specific)

Meaning: Claude-specific overload error. Anthropic returns 529 (overloaded_error) when its API is temporarily overloaded; rate-limit violations still return 429.

This is unique to Anthropic — your generic HTTP client must handle it:

retryable_codes = {429, 500, 503, 529}  # Note: 529 included!

Anthropic's 529 response identifies itself through the overloaded_error type in the body. The body carries no retry delay, so compute the backoff on the client:

{
  "error": {
    "type": "overloaded_error",
    "message": "Overloaded, resubmit your request"
  }
}

Fix: Exponential backoff. If 529 persists for more than 30 seconds, consider routing to Claude 4 Sonnet instead of Opus.
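One way to express this fix is a generator of jittered exponential delays that stops after a time budget, plus a wrapper that reroutes once the budget is spent. Treat this as a sketch under stated assumptions: `send` and `reroute` are hypothetical callables returning an object with `status_code`, and the base delay, growth factor, and cap are illustrative defaults rather than Anthropic recommendations.

```python
import random
import time

def backoff_delays(base_s=1.0, factor=2.0, max_delay_s=16.0, budget_s=30.0, rng=random.random):
    """Yield jittered exponential delays until budget_s seconds have been handed out."""
    spent = 0.0
    ceiling = base_s
    while spent < budget_s:
        # "Equal jitter": between half the ceiling and the full ceiling.
        delay = min(ceiling * (0.5 + 0.5 * rng()), budget_s - spent)
        yield delay
        spent += delay
        ceiling = min(ceiling * factor, max_delay_s)

def call_with_529_backoff(send, reroute, sleep=time.sleep):
    """Retry send() on 529 with backoff; call reroute() if the budget runs out."""
    resp = send()
    for delay in backoff_delays():
        if resp.status_code != 529:
            return resp
        sleep(delay)
        resp = send()
    return resp if resp.status_code != 529 else reroute()
```

Here `reroute` would be the call that targets the alternative model, for example the Sonnet route mentioned in the fix. The jitter keeps many clients from retrying in lockstep after the same overload event.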


Debugging Toolkit

Step 1: Log All Requests and Responses

import logging, json

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("llm_client")

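SENSITIVE_HEADERS = {"authorization", "x-api-key", "x-goog-api-key"}

def log_request(method, url, headers, body):
    # Drop credential headers so API keys never reach the logs
    safe_headers = {k: v for k, v in headers.items() if k.lower() not in SENSITIVE_HEADERS}
    logger.info(f"Request {method} {url}")
    logger.info(f"  Headers: {safe_headers}")
    logger.info(f"  Body: {json.dumps(body)[:500]}")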

def log_response(resp):
    logger.info(f"Response {resp.status_code} ({len(resp.content)} bytes)")
    if resp.status_code >= 400:
        logger.error(f"  Error: {resp.text[:500]}")

Step 2: Structured Error Logging

Use structured logs for production monitoring:

import structlog

log = structlog.get_logger()

def on_error(provider, model, status_code, error_body, latency_ms):
    log.error("llm_api_error",
        provider=provider,
        model=model,
        status_code=status_code,
        error=error_body.get("error", {}).get("message", "unknown"),
        latency_ms=latency_ms
    )
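The `on_error` handler expects `error_body` to be a dict with a nested `error` object. Gateways and load balancers sometimes return plain text or HTML on 5xx responses, so it can help to normalize the body first. Below is a best-effort sketch; `extract_error_message` is a hypothetical helper, and the shapes it handles are assumptions about what you may encounter, not a list taken from any provider's documentation.

```python
import json

def extract_error_message(raw_text, limit=500):
    """Best-effort error message from a response body that may not be JSON."""
    try:
        body = json.loads(raw_text)
    except (TypeError, ValueError):
        # Not JSON (HTML error page, empty body, None): return the raw text, truncated.
        return (raw_text or "unknown")[:limit]
    if isinstance(body, dict):
        err = body.get("error")
        if isinstance(err, dict) and err.get("message"):
            return str(err["message"])[:limit]
        if isinstance(err, str) and err:
            return err[:limit]
        if body.get("message"):
            return str(body["message"])[:limit]
    return "unknown"
```

With this in place the logging call can pass `error=extract_error_message(resp.text)` and never raise while handling an error.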

Step 3: Health Check Endpoint

Probe each provider before routing traffic:

curl -s -o /dev/null -w "%{http_code}" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  https://api.openai.com/v1/models

Step 4: Request Tracing

Add a unique request_id to every outgoing request for correlation:

import uuid

request_id = str(uuid.uuid4())
headers = {
    "Authorization": f"Bearer {api_key}",
    # OpenAI documents X-Client-Request-Id for client-supplied IDs; other providers may ignore it
    "X-Client-Request-Id": request_id
}

Common Error Patterns and Solutions

Pattern 1: Intermittent 429s Under Load

Symptom: Works fine at low volume, starts getting 429s at higher concurrency.

Root cause: You are exceeding RPM or TPM limits.

Solution: Use a token bucket limiter (see our rate limiting guide) and reduce max_concurrent by 50%.
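For readers who have not seen one, a token bucket fits in a few lines. This is a minimal single-threaded sketch with illustrative names (`TokenBucket`, `try_acquire`), not the implementation from the rate limiting guide. Set `rate` and `capacity` from your own RPM or TPM quota.

```python
import time

class TokenBucket:
    """Allow `rate` tokens per second on average, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start full so the first burst is allowed
        self.updated = clock()

    def try_acquire(self, cost=1.0):
        """Take `cost` tokens if available; return True on success, False otherwise."""
        now = self.clock()
        # Refill in proportion to the time elapsed since the last call.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

For an RPM limit, use `cost=1` per request. For a TPM limit, pass the estimated token count of the request as `cost`. The class is not thread-safe as written, so guard `try_acquire` with a lock if several workers share one bucket.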

Pattern 2: 401 After Key Rotation

Symptom: Previously working code suddenly returns 401.

Root cause: Environment variable not updated after key rotation, or multiple services using cached keys.

Solution:

grep -r "sk-" /etc/environment /home/*/.env /etc/profile.d/ 2>/dev/null
# Update all occurrences

Pattern 3: Timeout on Long Contexts

Symptom: Requests with large contexts (50K+ tokens) time out.

Root cause: Timeout value is too low for long generations.

Solution:

# timeout is a (connect, read) tuple in seconds: 10 s to connect, 300 s to wait for data
resp = requests.post(url, json=payload, timeout=(10, 300))

Pattern 4: DeepSeek V4 Returns Empty Response

Symptom: DeepSeek V4 returns HTTP 200 with empty choices array.

Root cause: Common during cache miss storms; the stream starts but produces zero tokens.

Fix:

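# Runs inside an async handler; fallback_to_deepseek_v4_direct() is defined by your application
choices = resp.json().get("choices") or []
if not choices or not (choices[0].get("message") or {}).get("content"):
    return await fallback_to_deepseek_v4_direct()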

Production Error Response Strategy

| Error Type | Action | Time Threshold | Escalation |
|---|---|---|---|
| 401/403 | Stop and alert | Immediate | Developer on-call |
| 429 | Retry with backoff | 30 seconds | Switch provider |
| 500 | Retry 3x | 10 seconds | Switch model |
| 503 | Wait and retry | 60 seconds | Check provider status |
| 529 | Backoff | 30 seconds | Route to Sonnet |
| Timeout | Retry with longer timeout | 60 seconds | Reduce context size |
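The strategy table translates directly into a lookup that a client can consult after each failure. In this sketch the names `Policy`, `POLICIES`, and `policy_for` are illustrative, and the default for status codes the table does not cover (stop and alert) is an assumption on my part rather than something the table specifies.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Policy:
    action: str
    threshold_s: int  # 0 means act immediately
    escalation: str

# One entry per row of the strategy table in this section.
POLICIES = {
    401: Policy("stop_and_alert", 0, "developer on-call"),
    403: Policy("stop_and_alert", 0, "developer on-call"),
    429: Policy("retry_with_backoff", 30, "switch provider"),
    500: Policy("retry_3x", 10, "switch model"),
    503: Policy("wait_and_retry", 60, "check provider status"),
    529: Policy("backoff", 30, "route to Sonnet"),
    "timeout": Policy("retry_with_longer_timeout", 60, "reduce context size"),
}

def policy_for(status):
    """Look up the policy for an HTTP status code or the string "timeout"."""
    try:
        return POLICIES[status]
    except KeyError:
        # Assumed default: surface unknown errors instead of retrying blindly.
        return Policy("stop_and_alert", 0, "developer on-call")
```

Keeping the policy in data rather than scattered `if` statements makes it easy to review against the table and to adjust thresholds per environment.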

For production systems, using tokenpapa.ai as your API gateway gives you built-in error normalization, automatic fallback across providers, and unified logging.


Conclusion

LLM API errors are inevitable, but they don't have to cause downtime:

  • Every status code has a specific cause and fix: 400 (payload), 401 (auth), 403 (permissions), 429 (rate), 500 (server), 503 (overload), 529 (Anthropic)
  • Structured logging and tracing turn errors into actionable data
  • Provider-specific quirks (Anthropic 529, DeepSeek empty responses) need custom handling
  • Fallback chains protect against single-provider outages

Build confidently. Sign up at tokenpapa.ai for unified API access across all major providers with built-in error handling and $5 free credits to start.


Originally published at https://doc.tokenpapa.ai/en/docs/blog/llm-api-error-handling-debugging.
