TokenPAPA

Posted on Jun 30 • Originally published at doc.tokenpapa.ai

LLM API Error Handling & Debugging Guide (2026): Common Errors & Fixes

#llm #api #tutorial #debugging


Published: June 30, 2026 · 14 min read

Introduction

Every LLM API call will eventually fail. Authentication expires, rate limits hit, models overload, and networks degrade. The difference between a robust application and a brittle one is how gracefully it handles failure.

In 2026, with four major providers (OpenAI, DeepSeek, Anthropic, and Google) and dozens more reachable through API gateways, the error surface is larger than ever. Each provider has its own error codes, retry semantics, and failure modes.

This guide catalogs every common LLM API error — what it means, why it happens, and exactly how to fix it. Whether you're debugging a production incident or building error handling from scratch, this is your reference.

New to LLM APIs? Start with our Best LLM APIs 2026 for model selection, and LLM API Pricing Comparison 2026 for cost data.

Error Reference by Status Code

Status 400: Bad Request

Meaning: The request payload is malformed or contains invalid parameters.

Symptom	Likely Cause	Fix
`"model" field required`	Missing model parameter	Add `model: "gpt-5"` or `"deepseek-v4"`
`"messages" must be an array`	Messages field not a list	Wrap in `[]`
`"role" must be one of system/user/assistant`	Invalid role string	Use exactly `"system"`, `"user"`, or `"assistant"`
`max_tokens exceeds limit`	Token cap exceeded	Reduce `max_tokens` (GPT-5: 128K, DeepSeek V4: 128K)
`Invalid JSON in request body`	Malformed JSON	Validate with `jq .` before sending

Example fix:

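import os
import requests

url = "https://api.openai.com/v1/chat/completions"
headers = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}

# Wrong: missing "model" field, so the API returns 400
resp = requests.post(url, headers=headers, json={"messages": [{"role": "user", "content": "Hello"}]})

# Correct: send both "model" and "messages"
resp = requests.post(url, headers=headers, json={
    "model": "gpt-5",
    "messages": [{"role": "user", "content": "Hello"}]
})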
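Many 400s can be caught before the request leaves your process. Below is a minimal pre-flight check, a sketch that covers only the rules in the table above; `validate_chat_payload` and `VALID_ROLES` are hypothetical names, and providers may accept more roles or enforce more rules than this:

```python
# Roles listed in the 400 table; extend if your provider accepts others
VALID_ROLES = {"system", "user", "assistant"}

def validate_chat_payload(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload passed these checks."""
    problems = []
    if not payload.get("model"):
        problems.append('"model" field required')
    messages = payload.get("messages")
    if not isinstance(messages, list):
        problems.append('"messages" must be an array')
        return problems  # role checks below need a list
    for i, msg in enumerate(messages):
        role = msg.get("role") if isinstance(msg, dict) else None
        if role not in VALID_ROLES:
            problems.append(f"messages[{i}].role must be one of system/user/assistant")
    return problems
```

Calling it just before `requests.post` and refusing to send when the list is non-empty turns a round trip to the provider into a local, descriptive error.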

Status 401: Unauthorized

Meaning: API key is missing, invalid, or expired.

Provider-specific messages:

Provider	Error Body	Common Cause
OpenAI	`"Incorrect API key provided"`	Wrong key or revoked
DeepSeek	`"Authentication Fails"`	Key expired or region blocked
Anthropic	`"x-api-key header is required"`	Missing header
Gemini	`"API_KEY_INVALID"`	Key not activated for model

Debugging checklist:

Check export | grep API_KEY: is the environment variable set?
Verify the key format and prefix (OpenAI: sk-proj-..., DeepSeek: sk-...)
Check billing status: an expired payment method can deactivate the key immediately
Test with curl: curl -H "Authorization: Bearer $OPENAI_API_KEY" https://api.openai.com/v1/models
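The first two checklist items can be scripted. This sketch assumes the keys live in `OPENAI_API_KEY` and `DEEPSEEK_API_KEY` (the second name is an assumption) and only checks that a key is set, has no stray whitespace, and starts with `sk-`. Passing it does not prove the key is valid, only that it is not obviously malformed; `check_api_key` is a hypothetical helper:

```python
import os
from typing import Mapping

# Both providers issue keys starting with "sk-" (OpenAI project keys start with "sk-proj-").
# DEEPSEEK_API_KEY is an assumed variable name; use whatever your deployment sets.
EXPECTED_PREFIX = {"OPENAI_API_KEY": "sk-", "DEEPSEEK_API_KEY": "sk-"}

def check_api_key(env_var: str, environ: Mapping[str, str] = os.environ) -> list[str]:
    """Return obvious problems with the key in env_var; an empty list means none were found."""
    key = environ.get(env_var)
    if key is None:
        return [f"{env_var} is not set"]
    problems = []
    if key != key.strip():
        problems.append(f"{env_var} has leading/trailing whitespace")  # common copy-paste error
    prefix = EXPECTED_PREFIX.get(env_var)
    if prefix and not key.strip().startswith(prefix):
        problems.append(f"{env_var} does not start with {prefix!r}")
    return problems
```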

Pro tip: Rotate keys regularly. Use tokenpapa's API gateway to manage multiple provider keys from a single endpoint with automatic failover.

Status 403: Forbidden

Meaning: Key is valid but lacks permission for the requested resource.

Common scenarios:

Free-tier key trying to access gpt-5 (requires paid tier)
Organization-level restrictions (OpenAI org limits)
Country/region blocks (some providers restrict by IP geolocation)
Model access not granted (Claude 4 custom models)

Fix: Upgrade your account tier, or use a proxy/gateway that handles region routing.

Status 429: Too Many Requests

Meaning: Rate limit exceeded. See our dedicated LLM API Rate Limiting & Retry Strategies Guide for deep coverage.

Quick fix:

import time
time.sleep(float(resp.headers.get("Retry-After", 5)))
# Then retry
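The quick fix above assumes `Retry-After` holds a number of seconds. HTTP also allows the header to carry a date (RFC 9110), and `float()` raises `ValueError` on that form. A slightly more defensive parser, as a sketch; `retry_after_seconds` is a hypothetical name, and the 5-second default and 60-second cap are assumptions, not provider guidance:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def retry_after_seconds(value, default=5.0, cap=60.0, now=None):
    """Parse a Retry-After header (delay-seconds or HTTP-date) into seconds to wait."""
    if value is None:
        return default
    try:
        delay = float(value)  # delay-seconds form, e.g. "3"
    except ValueError:
        try:
            when = parsedate_to_datetime(value)  # HTTP-date form
        except (TypeError, ValueError):
            return default  # unparseable header: fall back to the default
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)  # HTTP dates are GMT
        now = now or datetime.now(timezone.utc)
        delay = (when - now).total_seconds()
    return min(max(delay, 0.0), cap)  # never negative, never above the cap
```

Used in place of the quick fix: `time.sleep(retry_after_seconds(resp.headers.get("Retry-After")))`.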

Status 500: Internal Server Error

Meaning: The provider's server encountered an error. Usually transient.

Providers that return 500:

Provider	Frequency	Best Response
OpenAI	Rare (under 0.1%)	Retry after 1-2s
DeepSeek	Occasional (cache miss storms)	Retry after 3-5s
Anthropic	Rare	Retry after 1s
Gemini	Very rare (under 0.01%)	Retry after 1s

Important: Do NOT retry 500 errors more than 3 times. If persistent, switch to a fallback provider or model.
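One way to encode the "at most 3 retries, then fall back" rule, sketched with hypothetical names: each provider is a zero-argument callable that either returns a result or raises `ProviderError` carrying the HTTP status. Three retries means at most four calls per provider before moving to the next one:

```python
import random
import time

# Status codes worth retrying; everything else is raised immediately
RETRYABLE = {429, 500, 503, 529}

class ProviderError(Exception):
    """Hypothetical error type carrying the HTTP status of a failed call."""
    def __init__(self, status: int):
        super().__init__(f"HTTP {status}")
        self.status = status

def call_with_fallback(providers, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Try each provider callable in order, retrying retryable errors at most max_retries times."""
    if not providers:
        raise ValueError("at least one provider is required")
    last_error = None
    for call in providers:
        for attempt in range(max_retries + 1):  # 1 initial try + max_retries retries
            try:
                return call()
            except ProviderError as err:
                if err.status not in RETRYABLE:
                    raise  # auth and payload errors will not fix themselves
                last_error = err
                if attempt < max_retries:
                    # exponential backoff with jitter: up to 1s, 2s, 4s by default
                    sleep(base_delay * 2 ** attempt * random.uniform(0.5, 1.0))
    raise last_error  # every provider exhausted its retries
```

Non-retryable errors such as 400 or 401 are re-raised immediately rather than retried, since retrying will not fix a bad payload or key.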

Status 503: Service Unavailable

Meaning: The service is temporarily overloaded or under maintenance.

Provider behavior:

OpenAI: Usually resolves within 30-60 seconds. Check status.openai.com
DeepSeek: Can lag during peak China hours (9-11 PM China Standard Time, UTC+8). Use tokenpapa's load-balanced endpoint
Anthropic: Typically maintenance windows (announced via status page)
Gemini: Very rare — auto-resolves
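Because 503s usually clear on their own, a bounded wait is often enough. A sketch that polls a hypothetical `probe()` callable (for example, a wrapper around the health check in Debugging Toolkit Step 3) until it succeeds or a time budget runs out; the 60-second default matches the 503 row of the production strategy table, and the 5-second interval is an assumption:

```python
import time

def wait_for_recovery(probe, budget_s=60.0, interval_s=5.0,
                      clock=time.monotonic, sleep=time.sleep):
    """Call probe() until it returns True or budget_s elapses; True means the service recovered."""
    deadline = clock() + budget_s
    while True:
        if probe():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False  # budget spent: escalate (check status page, switch provider)
        sleep(min(interval_s, remaining))  # never sleep past the deadline
```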

Status 529: Overloaded (Anthropic-specific)

Meaning: Claude-specific overload error. Anthropic returns 529 (overloaded_error) when its API is temporarily overloaded; ordinary rate limits still return 429.

This is unique to Anthropic — your generic HTTP client must handle it:

retryable_codes = {429, 500, 503, 529}  # Note: 529 included!

Anthropic's 529 response body identifies the error type:

{
  "error": {
    "type": "overloaded_error",
    "message": "Overloaded, resubmit your request"
  }
}

Fix: Exponential backoff. If 529 persists for more than 30 seconds, consider routing to Claude 4 Sonnet instead of Opus.
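A client that only checks status codes can miss the error type Anthropic reports in the body. This sketch also reads `error.type` from a body like the one shown above; the function name and the three action labels are hypothetical, and the mapping is a starting point rather than Anthropic's official guidance:

```python
import json

def classify_anthropic_error(status: int, body_text: str) -> str:
    """Map an Anthropic error response to 'backoff', 'retry', or 'fail' (hypothetical labels)."""
    try:
        err_type = json.loads(body_text).get("error", {}).get("type")
    except (TypeError, ValueError, AttributeError):
        err_type = None  # body was not the expected JSON shape
    if status == 529 or err_type == "overloaded_error":
        return "backoff"  # overloaded: exponential backoff, then consider a lighter model
    if status in (429, 500, 503):
        return "retry"    # rate limit or transient server error
    return "fail"         # other 4xx such as 400/401/403: fix the request or key instead
```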

Debugging Toolkit

Step 1: Log All Requests and Responses

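import json
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("llm_client")

# Credential headers: OpenAI/DeepSeek use Authorization, Anthropic x-api-key, Gemini x-goog-api-key
SENSITIVE_HEADERS = {"authorization", "x-api-key", "x-goog-api-key"}

def log_request(method, url, headers, body):
    # Redact credentials instead of dropping them, so you can see the header was sent
    safe_headers = {k: ("<redacted>" if k.lower() in SENSITIVE_HEADERS else v)
                    for k, v in headers.items()}
    logger.info("Request %s %s", method, url)
    logger.info("  Headers: %s", safe_headers)
    logger.info("  Body: %s", json.dumps(body)[:500])  # truncate large prompts

def log_response(resp):
    logger.info("Response %s (%d bytes)", resp.status_code, len(resp.content))
    if resp.status_code >= 400:
        logger.error("  Error: %s", resp.text[:500])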

Step 2: Structured Error Logging

Use structured logs for production monitoring:

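import structlog

log = structlog.get_logger()

def on_error(provider, model, status_code, error_body, latency_ms):
    # Most providers nest the message under "error"; tolerate a missing body or a plain string
    error = (error_body or {}).get("error", {})
    message = error.get("message", "unknown") if isinstance(error, dict) else str(error)
    log.error("llm_api_error",
        provider=provider,
        model=model,
        status_code=status_code,
        error=message,
        latency_ms=latency_ms
    )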

Step 3: Health Check Endpoint

Probe each provider before routing traffic:

curl -s -o /dev/null -w "%{http_code}" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  https://api.openai.com/v1/models
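The same probe in Python, using only the standard library so it can run in a minimal health-check container. `probe` is a hypothetical helper; it returns the status code, or `None` when the host cannot be reached at all, which lets you tell a network failure apart from an error the provider actually returned:

```python
import urllib.error
import urllib.request

def probe(url: str, headers: dict, timeout: float = 5.0):
    """Return the HTTP status code for a GET to url, or None if the host is unreachable."""
    req = urllib.request.Request(url, headers=headers, method="GET")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with 4xx/5xx
    except OSError:
        return None      # DNS failure, refused connection, or timeout
```

For example, `probe("https://api.openai.com/v1/models", {"Authorization": f"Bearer {key}"}) == 200` can gate whether a provider stays in the routing pool.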

Step 4: Request Tracing

Add a unique request_id to every outgoing request for correlation:

import os
import uuid

api_key = os.environ["OPENAI_API_KEY"]
request_id = str(uuid.uuid4())
headers = {
    "Authorization": f"Bearer {api_key}",
    # Your own correlation ID; log it next to the provider's ID from the response headers
    "X-Request-Id": request_id
}
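Your own ID is most useful when it is logged next to the provider's. OpenAI returns an `x-request-id` response header and Anthropic returns a `request-id` header; this sketch pulls whichever is present and logs both IDs on one line. The helper names are hypothetical:

```python
import logging
from typing import Mapping, Optional

logger = logging.getLogger("llm_client")

# Response headers carrying the provider's own request ID:
# OpenAI sends x-request-id, Anthropic sends request-id
PROVIDER_ID_HEADERS = ("x-request-id", "request-id")

def provider_request_id(response_headers: Mapping[str, str]) -> Optional[str]:
    """Return the provider's request ID from response headers (case-insensitive), if any."""
    lowered = {k.lower(): v for k, v in response_headers.items()}
    for name in PROVIDER_ID_HEADERS:
        if name in lowered:
            return lowered[name]
    return None

def log_trace(client_request_id: str, status: int, response_headers: Mapping[str, str]) -> None:
    # One line that ties your ID to the provider's ID for support tickets and log searches
    logger.info("llm_call client_id=%s provider_id=%s status=%s",
                client_request_id, provider_request_id(response_headers), status)
```

With requests, call it as `log_trace(request_id, resp.status_code, resp.headers)`.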

Common Error Patterns and Solutions

Pattern 1: Intermittent 429s Under Load

Symptom: Works fine at low volume, starts getting 429s at higher concurrency.

Root cause: You are exceeding your requests-per-minute (RPM) or tokens-per-minute (TPM) limits.

Solution: Use a token bucket limiter (see our rate limiting guide) and reduce max_concurrent by 50%.
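The rate limiting guide is not reproduced here, so this is a generic token bucket sketch rather than that guide's implementation. An RPM limit maps to `rate=RPM / 60` with `amount=1` per request; a TPM limit can use a second bucket where `amount` is the request's estimated token count. The class name and defaults are hypothetical:

```python
import threading
import time

class TokenBucket:
    """Allow `rate` units per second with bursts up to `capacity` (units = requests or tokens)."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so an initial burst is allowed
        self.clock = clock
        self.updated = clock()
        self.lock = threading.Lock()

    def try_acquire(self, amount: float = 1.0) -> bool:
        with self.lock:
            now = self.clock()
            # refill in proportion to elapsed time, never above capacity
            self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
            self.updated = now
            if self.tokens >= amount:
                self.tokens -= amount
                return True
            return False

    def acquire(self, amount: float = 1.0, sleep=time.sleep) -> None:
        while not self.try_acquire(amount):
            sleep(amount / self.rate)  # rough wait for enough tokens to refill
```

`acquire` never returns if `amount` exceeds `capacity`, so size the TPM bucket to at least your largest single request.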

Pattern 2: 401 After Key Rotation

Symptom: Previously working code suddenly returns 401.

Root cause: Environment variable not updated after key rotation, or multiple services using cached keys.

Solution:

grep -r "sk-" /etc/environment /home/*/.env /etc/profile.d/ 2>/dev/null
# Update all occurrences
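Updating the files is only half the fix: a running process keeps the environment it started with, so services must be restarted to pick up the new key. To confirm which key each service actually holds without printing secrets, compare short hashes; `key_fingerprint` is a hypothetical helper:

```python
import hashlib
import os

def key_fingerprint(key: str) -> str:
    """Short SHA-256 prefix of an API key, safe to print and compare across hosts."""
    # No strip(): a stray newline should produce a different fingerprint and expose the bug
    return hashlib.sha256(key.encode()).hexdigest()[:12]

# Run on each host/service and compare against the fingerprint of the newly issued key
current = os.environ.get("OPENAI_API_KEY")
print("OPENAI_API_KEY fingerprint:", key_fingerprint(current) if current else "not set")
```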

Pattern 3: Timeout on Long Contexts

Symptom: Requests with large contexts (50K+ tokens) time out.

Root cause: Timeout value is too low for long generations.

Solution:

resp = requests.post(url, json=payload, timeout=(10, 300))
#                  connect timeout, read timeout
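With requests, the read timeout is the maximum wait between bytes from the server, not the total request time. A non-streaming call sends nothing until generation finishes, so the read timeout has to cover the whole generation. The sketch below scales it with request size; `read_timeout_for` is a hypothetical helper, and the base, per-1K-token, and cap values are illustrative assumptions to tune against your own latency data, not provider figures:

```python
def read_timeout_for(prompt_tokens: int, max_tokens: int,
                     base_s: float = 30.0, per_1k_s: float = 2.0, cap_s: float = 600.0) -> float:
    """Hypothetical heuristic: grow the read timeout with total tokens, up to a cap."""
    total_k = (prompt_tokens + max_tokens) / 1000
    return min(cap_s, base_s + per_1k_s * total_k)

# e.g. requests.post(url, json=payload, timeout=(10, read_timeout_for(50_000, 4_000)))
```

Streaming (`"stream": true` in the payload plus `stream=True` on `requests.post`) sidesteps much of this, since tokens arrive continuously and reset the read timer.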

Pattern 4: DeepSeek V4 Returns Empty Response

Symptom: DeepSeek V4 returns HTTP 200 with empty choices array.

Root cause: Most common during cache miss storms, when the stream starts but produces zero tokens.

Fix:

choices = resp.json().get("choices") or []  # parse the body once
if not choices or not choices[0].get("message", {}).get("content"):
    return await fallback_to_deepseek_v4_direct()  # an empty 200 counts as a failure

Production Error Response Strategy

Error Type	Action	Time Threshold	Escalation
401/403	Stop and alert	Immediate	Developer on-call
429	Retry with backoff	30 seconds	Switch provider
500	Retry 3x	10 seconds	Switch model
503	Wait and retry	60 seconds	Check provider status
529	Backoff	30 seconds	Route to Sonnet
Timeout	Retry with longer timeout	60 seconds	Reduce context size
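The table can live in code so that every service applies the same policy. A sketch with hypothetical names; treating unlisted errors such as 400 as "stop and alert" is an assumption added here, not part of the table:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Policy:
    action: str
    threshold_s: float  # how long the error may persist before escalating
    escalation: str

# Mirrors the strategy table; "timeout" is a string key because it has no status code
POLICIES = {
    401: Policy("stop_and_alert", 0, "page developer on-call"),
    403: Policy("stop_and_alert", 0, "page developer on-call"),
    429: Policy("retry_with_backoff", 30, "switch provider"),
    500: Policy("retry_3x", 10, "switch model"),
    503: Policy("wait_and_retry", 60, "check provider status"),
    529: Policy("backoff", 30, "route to Sonnet"),
    "timeout": Policy("retry_longer_timeout", 60, "reduce context size"),
}

def policy_for(error) -> Policy:
    # Assumption: unlisted errors (e.g. 400) are request bugs, so stop and alert
    return POLICIES.get(error, Policy("stop_and_alert", 0, "fix request payload"))

def should_escalate(error, elapsed_s: float) -> bool:
    """True once an error has persisted past its threshold (immediately for 401/403)."""
    return elapsed_s >= policy_for(error).threshold_s
```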

For production systems, using tokenpapa.ai as your API gateway gives you built-in error normalization, automatic fallback across providers, and unified logging.

Conclusion

LLM API errors are inevitable, but they don't have to cause downtime:

Every status code has a specific cause and fix: 400 (payload), 401 (auth), 403 (permissions), 429 (rate), 500 (server), 503 (overload), 529 (Anthropic overload)
Structured logging and tracing turn errors into actionable data
Provider-specific quirks (Anthropic 529, DeepSeek empty responses) need custom handling
Fallback chains protect against single-provider outages

Build confidently. Sign up at tokenpapa.ai for unified API access across all major providers with built-in error handling and $5 free credits to start.

Originally published at https://doc.tokenpapa.ai/en/docs/blog/llm-api-error-handling-debugging.
