Helen

Posted on Jun 11

AI API Errors: A Practical Debugging Guide for Developers

#api #apigateway #coding

API failures in AI work differently. Here's how to debug them properly.

A 200 status code doesn't always mean your AI generation succeeded. A null content field isn't necessarily an error. And a prompt that worked perfectly yesterday might fail today — because a provider quietly updated their content policy.

This guide walks you through reading AI API errors, understanding what each failure mode actually means, and building error handling that tells you what broke — not just that something broke.

Note: Model names like gpt-5.4 and gpt-5.4-mini used here are CometAPI platform identifiers. They work through https://api.cometapi.com/v1 only — not directly through OpenAI or Anthropic APIs.

Why AI API Debugging Is Different
With a standard REST API, 200 means success and 4xx means you made a mistake. AI APIs introduce a third category: soft failures — responses that return 200 but contain nothing usable.

AI failures fall into three types:

| Failure Type | What Happens | Example |
| --- | --- | --- |
| Hard failure | HTTP error (4xx, 5xx). Request didn't complete. | 401 Unauthorized |
| Soft failure | HTTP 200, but finish_reason is content_filter or length | Blocked prompt |
| Silent failure | HTTP 200 and everything looks fine, but the output is wrong | Wrong classification |

Most error handling only covers the first type. The second and third types are where production bugs hide.
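
To make the three categories concrete, here is a minimal sketch that sorts a single call into one of them. The function name `classify_failure` and its `passed_validation` argument are hypothetical and not part of any SDK; the argument stands for whatever application-level check you run on the output.

```python
def classify_failure(status_code, finish_reason=None, passed_validation=None):
    """Return 'hard', 'soft', 'silent', or 'ok' for one API call.

    passed_validation is the result of your own application-level check
    (True/False), or None if no check was run.
    """
    # Hard failure: the HTTP layer already reports an error.
    if status_code >= 400:
        return "hard"
    # Soft failure: HTTP 200, but the generation was blocked or truncated.
    if finish_reason in ("content_filter", "length"):
        return "soft"
    # Silent failure: the response looks fine, but the output failed your check.
    if passed_validation is False:
        return "silent"
    return "ok"
```

Only the first branch is visible to ordinary HTTP error handling. The other two need the response body and your own validation.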

Understanding Error Responses
The text completions endpoint returns a consistent error structure:

json
{
  "error": {
    "message": "Human-readable description (includes request ID)",
    "type": "comet_api_error",
    "param": "the_problematic_parameter",
    "code": "error_code"
  }
}

What to log: Always log message and param. The message tells you what went wrong. The param tells you which parameter caused it.

Image & video endpoints return different error formats — always parse the raw response body.
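
One way to cope with both cases is a small parser that reads the documented text-endpoint structure when it is present and otherwise keeps the raw body for logging. This is a sketch: `extract_error` is a hypothetical helper, and the fallback makes no assumption about what image or video error bodies look like.

```python
import json

def extract_error(raw_body):
    """Pull message/param/code out of an error body, falling back to raw text."""
    try:
        parsed = json.loads(raw_body)
    except (TypeError, ValueError):
        parsed = None  # Not JSON at all (e.g. an HTML error page)

    error = parsed.get("error") if isinstance(parsed, dict) else None
    if isinstance(error, dict):
        return {
            "message": error.get("message"),
            "param": error.get("param"),
            "code": error.get("code"),
            "raw": None,
        }
    # Unknown shape: keep the raw body so it still ends up in the logs.
    return {"message": None, "param": None, "code": None, "raw": raw_body}
```

Logging the returned dict gives you message and param when they exist, and the untouched body when they don't.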

HTTP Status Codes: What They Mean
| Status | Meaning | Common Cause | Fix |
| --- | --- | --- | --- |
| 400 | Bad request | Missing model or wrong parameter | Check error.param |
| 401 | Unauthorized | Invalid or missing API key | Verify Bearer format |
| 429 | Rate limited | Too many requests | Exponential backoff |
| 500 | Server error | Provider-side issue | Retry with backoff |
| 504 | Gateway timeout | Provider took too long | Retry or use faster model |

Rule of thumb: Retry on 429, 500, and 504. Don't retry on 400 or 401, because the same request will fail again.
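
The rule of thumb can be encoded as a small lookup so that retry decisions live in one place. The sketch below mirrors the status table; `STATUS_ACTIONS` and `triage` are hypothetical names, and treating unlisted statuses as non-retryable is an assumption, not documented behavior.

```python
# (should_retry, hint) per status code, copied from the table above.
STATUS_ACTIONS = {
    400: (False, "Check error.param"),
    401: (False, "Verify Bearer format"),
    429: (True, "Exponential backoff"),
    500: (True, "Retry with backoff"),
    504: (True, "Retry or use faster model"),
}

def triage(status_code):
    """Return (should_retry, hint) for an HTTP status code."""
    if status_code in STATUS_ACTIONS:
        return STATUS_ACTIONS[status_code]
    # Assumption: statuses not in the table are not retried automatically.
    return (False, "Unlisted status; inspect the raw response body")
```

For example, `triage(429)` returns `(True, "Exponential backoff")`, while `triage(401)` tells the caller to stop and fix the key.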

The Most Overlooked Field: finish_reason
A 200 response with finish_reason: "content_filter" means your generation was blocked. The content field will be null or empty. If you don't check this, your app will silently return nothing.

| finish_reason | Meaning | Action |
| --- | --- | --- |
| stop | Normal completion | Success |
| length | Hit token limit | Increase max_tokens or shorten prompt |
| content_filter | Blocked by safety policy | Rephrase the prompt |
| tool_calls | Model called a tool | Handle the tool call (content will be null) |

A Robust Text Completion Example (Python)
Here's a function that handles the first two failure types, hard and soft. Silent failures need the application-level validation covered in the next section:

python
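import os
import logging
from openai import OpenAI, APIStatusError, APIConnectionError

client = OpenAI(
    base_url="https://api.cometapi.com/v1",
    api_key=os.environ.get("COMETAPI_KEY"),
)

def safe_complete(messages, model="gpt-5.4-mini", **kwargs):
    try:
        response = client.chat.completions.create(
            model=model, messages=messages, **kwargs
        )
    except APIStatusError as e:
        # Hard failure: the API answered with a 4xx/5xx status
        try:
            error_body = e.response.json().get("error", {})
        except (ValueError, AttributeError):
            error_body = {}  # Body was not the expected JSON object
        logging.error(
            f"API error {e.status_code}: {error_body.get('message') or e}"
            f" (param={error_body.get('param')})"
        )
        raise
    except APIConnectionError as e:
        # Hard failure: the request never reached the API
        logging.error(f"Connection error: {e}")
        raise

    choice = response.choices[0]
    finish_reason = choice.finish_reason

    # Soft failure: HTTP 200, but the generation was blocked
    if finish_reason == "content_filter":
        raise ValueError(f"Generation blocked on model {model}. Rephrase prompt.")

    # Soft failure: HTTP 200, but the output was cut off
    if finish_reason == "length":
        logging.warning("Output truncated at token limit.")

    return {
        "content": choice.message.content or "",  # null when the model calls a tool
        "finish_reason": finish_reason,
        "tool_calls": choice.message.tool_calls,
    }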

Key takeaway: Always check finish_reason. Don't assume 200 means success.

Detecting Silent Failures
Silent failures are the hardest to catch. The API returns 200, finish_reason is stop, but the output is semantically wrong. You can only catch these at the application level.

Example: Validation for classification tasks

python
import json
import logging

def validate_completion(result, task):
    content = result["content"].strip()

    # Empty output check (empty content is expected when the model called a tool)
    if not content and result["finish_reason"] != "tool_calls":
        raise ValueError(f"Empty output for task '{task}'")

    # Task-specific validation
    if task == "classify":
        valid_labels = {"positive", "negative", "neutral"}
        if content.lower() not in valid_labels:
            logging.warning(f"Unexpected output: '{content}'")
            # May need to re-prompt with stricter instructions

    if task == "json_extract":
        try:
            json.loads(content)
        except json.JSONDecodeError:
            raise ValueError("Expected JSON but output did not parse")

    return content

Common causes of silent failures:

Ambiguous prompts

Model ignored format instructions

Input was too short or too long for the task
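
When the model ignores format instructions, one common response is to re-prompt with stricter wording. The sketch below shows that loop for the sentiment labels used above. `classify_with_reprompt` is a hypothetical helper, the prompt wording is only an illustration, and `complete` is any callable you supply that takes a prompt string and returns the model's text.

```python
VALID_LABELS = {"positive", "negative", "neutral"}

def classify_with_reprompt(complete, text, max_attempts=2):
    """Call complete(prompt) -> str; tighten the prompt if the label is invalid."""
    prompt = f"Classify the sentiment of this text: {text}"
    for _ in range(max_attempts):
        label = complete(prompt).strip().lower()
        if label in VALID_LABELS:
            return label
        # The output was not a bare label, so the next attempt is stricter.
        prompt = (
            "Reply with exactly one word: positive, negative, or neutral. "
            f"No punctuation, no explanation.\nText: {text}"
        )
    raise ValueError(f"No valid label after {max_attempts} attempts")
```

Passing the completion function in as an argument keeps the validation logic testable without a network call. In production it could wrap a call to the text completion function shown earlier.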

Exponential Backoff for Rate Limits
Rate limit errors (429) are temporary. Use exponential backoff with jitter:

python
import time
import random
import logging
from openai import APIStatusError

def complete_with_retry(messages, model="gpt-5.4-mini", max_retries=3):
    for attempt in range(max_retries):
        try:
            return safe_complete(messages, model=model)
        except APIStatusError as e:
            # RateLimitError (429) is a subclass of APIStatusError,
            # so the status code has to be checked here
            if e.status_code != 429 and e.status_code < 500:
                raise  # Don't retry other 4xx errors

        # Reached only for 429 and 5xx: wait, then try again
        if attempt < max_retries - 1:
            wait = (2 ** attempt) + random.random()
            logging.warning(f"Retry in {wait:.1f}s")
            time.sleep(wait)

    raise RuntimeError(f"Failed after {max_retries} attempts")

Why jitter matters: Random delay prevents multiple clients from retrying in sync (thundering herd problem).
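
The wait calculation can be pulled out into its own function, which makes the schedule easy to inspect and test. This is a sketch rather than a fixed recipe: `backoff_delay` is a hypothetical helper, and the 30-second cap is an assumed value, not a platform requirement.

```python
import random

def backoff_delay(attempt, base=1.0, cap=30.0, rng=random.random):
    """Seconds to wait before retrying after failed attempt number `attempt` (0-based)."""
    # Exponential part: base, 2*base, 4*base, ... limited by the cap.
    exponential = min(cap, base * (2 ** attempt))
    # Jitter: up to one extra second, so clients don't retry in lockstep.
    return exponential + rng()

# Two clients failing at the same moment get different delays for attempt 0,
# for example 1.37s and 1.82s, instead of both retrying at exactly 1.00s.
```

Injecting `rng` is a design choice for testing: passing `lambda: 0.0` removes the randomness and exposes the underlying 1s, 2s, 4s progression.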

Image Generation Errors
Image generation has its own failure patterns:

| Symptom | Cause | Fix |
| --- | --- | --- |
| Empty data array | Prompt filtered | Check revised_prompt; rephrase |
| response_format error | Wrong parameter for GPT Image 2 | Use output_format instead |
| n > 1 error | Qwen Image doesn't support multiple images | Loop single requests |
| URL returns 403 later | URL expired | Download immediately |

Simplified image generation check:

python
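import os
import requests

api_key = os.environ.get("COMETAPI_KEY")

def generate_image_safe(prompt, model="dall-e-3"):
    response = requests.post(
        "https://api.cometapi.com/v1/images/generations",
        json={"model": model, "prompt": prompt},
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=120,
    )
    if not response.ok:
        # Image endpoints use their own error formats, so surface the raw body
        raise RuntimeError(f"Image API error {response.status_code}: {response.text}")

    data = response.json().get("data", [])
    if not data:
        return {"blocked": True}  # Empty data array: content filter likely triggered

    return {"url": data[0].get("url"), "blocked": False}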
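
Because result URLs expire, it helps to save the file as soon as the URL comes back. The following is a minimal standard-library sketch: `download_immediately` is a hypothetical helper, and the `opener` parameter exists only so the function can be exercised without a network connection.

```python
import shutil
import urllib.request
from pathlib import Path

def download_immediately(url, dest_dir, filename,
                         opener=urllib.request.urlopen, timeout=30):
    """Stream the file at `url` to dest_dir/filename and return the saved path."""
    dest = Path(dest_dir) / filename
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Copy in chunks so large files are not held in memory all at once.
    with opener(url, timeout=timeout) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest
```

Calling this right after a successful generation, and storing the local path instead of the URL, avoids the delayed 403 described in the table.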

Video Generation Errors
Video generation is asynchronous. Key patterns to watch:

| Symptom | Cause | Fix |
| --- | --- | --- |
| Stuck in queued 10+ min | Server load | Try a different model |
| failed with no detail | Prompt filtered | Rephrase prompt |
| URL returns 403 | URL expired | Download immediately |
| task_not_exist on first poll | Task still initializing | Wait 5s and retry |
| Kling returns "succeed" | Non-standard status | Handle both "succeed" and "succeeded" |

Minimal polling pattern:

python
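import os
import time
import requests

def poll_video(task_id, max_wait=600):
    headers = {"Authorization": f"Bearer {os.environ.get('COMETAPI_KEY')}"}
    elapsed = 0
    while elapsed < max_wait:
        result = requests.get(
            f"https://api.cometapi.com/v1/videos/{task_id}",
            headers=headers,
            timeout=30,
        ).json()
        status = result.get("status")

        # Kling reports "succeed"; other models report "succeeded"
        if status in ("succeed", "succeeded"):
            return result["output"][0]
        if status in ("failed", "cancelled"):
            raise RuntimeError(f"Video failed: {result.get('error')}")

        time.sleep(10)
        elapsed += 10

    raise TimeoutError("Video generation timed out")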
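
The status quirks in the table can be isolated in one pure function, which keeps the polling loop simple and makes the rules testable. This is a sketch under stated assumptions: `interpret_poll` is a hypothetical helper, and it assumes that `task_not_exist` appears as either the `status` or the `code` field of the poll response, which the table does not specify.

```python
SUCCESS_STATUSES = {"succeed", "succeeded"}   # Kling uses "succeed"
FAILURE_STATUSES = {"failed", "cancelled"}

def interpret_poll(result, first_poll=False):
    """Map a raw poll response (dict) to 'done', 'failed', or 'pending'."""
    status = str(result.get("status") or "").lower()
    code = str(result.get("code") or "").lower()

    if status in SUCCESS_STATUSES:
        return "done"
    if status in FAILURE_STATUSES:
        return "failed"
    # Assumption: task_not_exist is reported via status or code.
    if "task_not_exist" in (status, code):
        # On the first poll the task may simply still be initializing.
        return "pending" if first_poll else "failed"
    # queued, running, or anything unrecognized: keep polling.
    return "pending"
```

A polling loop can then branch on three values instead of on every provider-specific string.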

Debugging Checklist
For text generation:

API key is sent in the Authorization header as a Bearer token (the word Bearer, a space, then the key)

finish_reason is stop (not content_filter or length)

content is not null (or null is expected due to tool_calls)

Error is classified correctly: 400/401 (fix the request) vs. 429/5xx (retry with backoff)

Output passes application-layer validation (no silent failure)

For image generation:

data array is not empty (content filter not triggered)

Correct parameters used (output_format for GPT Image 2, not response_format)

Downloaded image before URL expired

For video generation:

Task progresses beyond queued within reasonable time

Error field checked in failed task response

Video downloaded before URL expired

Handles both "succeed" (Kling) and "succeeded" (others)

FAQ
Q: My request returns 200 but no content. What happened?
Check finish_reason. content_filter means the generation was blocked. tool_calls means the model wants to call a tool (content is null by design). If finish_reason is stop but content is still empty, that's a silent failure — log the full response and check your prompt.

Q: How do I know if my prompt was filtered?
Text: finish_reason is content_filter. Images: the data array is empty. Video: the task reaches failed status quickly with no error detail. Fix: rephrase the prompt to be more neutral.

Q: When should I retry a failed request?
Retry on 429 and 5xx with exponential backoff. Don't retry on other 4xx errors such as 400 or 401: a bad request won't fix itself.

Q: What's exponential backoff?
Instead of retrying immediately, wait progressively longer: 1s, 2s, 4s. Add random jitter to prevent multiple clients from retrying in sync. This is standard practice for any rate-limited API.

Q: How do I catch silent failures?
Silent failures require application-layer validation. The API won't tell you the output is semantically wrong. Check that the output matches the expected format (valid JSON, expected label, minimum length). Log the full output when validation fails.
