Gabriel Anhaia
The 7 Anthropic API Errors That Mean Different Things in Production


A team I talked to last month had one exception handler around every Anthropic call. The handler caught Exception, logged the message, retried three times with a fixed two-second sleep, then surfaced a generic 500 to the user. That code shipped to production for nine months. Nobody noticed it was wrong until a coworker rotated an API key and the entire support-ticket classifier started retrying the same authentication_error three times per request before failing. In that scenario, every retry on a permanent error stretches the latency tail of the rollout window — exactly the kind of self-inflicted brownout a richer taxonomy avoids.

That is the cost of treating every Anthropic API failure as a binary: it worked or it didn't. The response taxonomy carries real information. Different error types want different reactions: some you retry, others you don't; a few want aggressive backoff, a few want immediate failure. The rest belong in your input-validation layer and should never have reached the network.

The seven error shapes below each map to a different reaction. The taxonomy and status codes come from the Anthropic errors docs. The exception classes are from the official Python SDK.

The shape of an error response

Every error from the API arrives as JSON with a top-level error object that has a type and a message, plus a request_id you should log on every failure:

```json
{
  "type": "error",
  "error": {
    "type": "not_found_error",
    "message": "The requested resource could not be found."
  },
  "request_id": "req_011CSHoEeqs5C35K2UUqR7Fy"
}
```

The Python SDK wraps these into typed exceptions, with the HTTP status driving which class gets raised. Mapping is direct:

| Status | SDK class | API `error.type` |
| --- | --- | --- |
| 400 | `BadRequestError` | `invalid_request_error` |
| 401 | `AuthenticationError` | `authentication_error` |
| 403 | `PermissionDeniedError` | `permission_error` |
| 404 | `NotFoundError` | `not_found_error` |
| 413 | `APIStatusError` | `request_too_large` |
| 429 | `RateLimitError` | `rate_limit_error` |
| 500 | `InternalServerError` | `api_error` |
| 529 | `InternalServerError` / `APIStatusError` | `overloaded_error` |

The 529 is the one most teams miss in their handler. The SDK may surface it as InternalServerError (the base for 5xx) or as APIStatusError; either way it wants its own retry policy, distinct from a 500.
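The table folds naturally into a small status-to-policy lookup. The policy names below are this sketch's own labels, not SDK concepts, and unknown statuses default conservatively (retry 5xx once, never retry 4xx):

```python
# Status -> retry policy, mirroring the table above.
# The policy strings are labels for this sketch, not SDK concepts.
RETRY_POLICY = {
    400: "never",                     # invalid_request_error: fix the input
    401: "never",                     # authentication_error: fix the key
    403: "never",                     # permission_error: fix org access
    404: "never",                     # not_found_error: fix the model name
    413: "never",                     # request_too_large: shrink the payload
    429: "backoff_jitter",            # rate_limit_error: exponential backoff + jitter
    500: "retry_once",                # api_error: one retry, then escalate
    529: "long_backoff_or_failover",  # overloaded_error: longer curve or fallback
}

def policy_for(status_code: int) -> str:
    """Look up the retry policy; default unknown 5xx to one retry, 4xx to none."""
    if status_code in RETRY_POLICY:
        return RETRY_POLICY[status_code]
    return "retry_once" if status_code >= 500 else "never"
```

A lookup like this keeps the policy in one place instead of scattered across except branches, which makes it auditable in code review.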

1. rate_limit_error (429): retry with jitter, not a fixed sleep

This means your account hit a rate limit on requests-per-minute, tokens-per-minute, or both. It is your problem, not Anthropic's. The fix is backoff.

The SDK already retries 429s a couple of times by default with a short exponential backoff. If your traffic is bursty enough that the defaults don't cover it, configure retries explicitly and add jitter so a thundering herd of failed clients does not synchronise on the same retry slot:

```python
import random, time
from anthropic import Anthropic, RateLimitError

client = Anthropic(max_retries=4)

def call_with_jitter(prompt: str, attempts: int = 5):
    for i in range(attempts):
        try:
            return client.messages.create(
                model="claude-sonnet-4-5",
                max_tokens=1024,
                messages=[{"role": "user", "content": prompt}],
            )
        except RateLimitError:
            if i == attempts - 1:
                raise
            # Exponential base plus jitter so clients desynchronise.
            time.sleep((2 ** i) + random.random())
```

Observability metric to track: anthropic_rate_limit_hits_total{tier=...}. Alert on rate-of-change, not absolute count. A few 429s a minute is fine. A spike from 0 to 200 in 30 seconds means your traffic profile changed and your provisioned capacity has not.
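Rate-of-change alerting is easy to prototype without a metrics backend. A minimal sliding-window sketch, where the 30-second window and the spike threshold mirror the numbers above (tune both for your traffic):

```python
import time
from collections import deque

class SpikeDetector:
    """Flag a jump in event rate, not a high absolute count."""

    def __init__(self, window_seconds: float = 30.0, threshold: int = 200):
        self.window = window_seconds
        self.threshold = threshold
        self.events = deque()  # monotonic timestamps of recent 429s

    def record(self, now=None) -> bool:
        """Record one 429; return True when the windowed count crosses the threshold."""
        now = time.monotonic() if now is None else now
        self.events.append(now)
        # Drop events that have aged out of the window.
        while self.events and self.events[0] < now - self.window:
            self.events.popleft()
        return len(self.events) >= self.threshold
```

In production you would put this behind the same counter you export, but the windowed shape is the point: a steady trickle never trips it, a burst does.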

2. overloaded_error (529): back off longer, fail over if you have a fallback

Status 529 means Anthropic is temporarily overloaded across all users. This is capacity, not your account. Anthropic's docs note that 529s happen when their APIs experience high traffic across all users. Retrying immediately will hit the same wall. Retrying with the same backoff curve as a 429 is wrong: you're not the one who needs to slow down, you're waiting for someone else to finish.

```python
from anthropic import APIStatusError

def is_overloaded(e: APIStatusError) -> bool:
    return getattr(e, "status_code", None) == 529

def call_with_failover(prompt: str):
    # primary_call / fallback_call are your own wrappers around
    # client.messages.create for the primary and fallback paths.
    try:
        return primary_call(prompt)
    except APIStatusError as e:
        if is_overloaded(e):
            return fallback_call(prompt)
        raise
```

Two reactions are reasonable. Either back off with a much longer base (5, 15, 45 seconds) or fail over: a different model, a different region, a cached response, or a degraded path that skips the model entirely. Whichever you pick, don't silently retry on the same client at the same rate. You'll turn a brownout into your own outage.
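The longer-base option can be a small wrapper. This sketch assumes only that the callable raises exceptions carrying a `status_code` attribute, as the SDK exceptions do; the 5/15/45-second schedule is the one suggested above:

```python
import time

# Much longer base than a 429 curve: we are waiting out platform load.
OVERLOAD_SCHEDULE = (5, 15, 45)  # seconds

def call_with_long_backoff(call, schedule=OVERLOAD_SCHEDULE, sleep=time.sleep):
    """Retry `call` on 529s with a long backoff schedule; re-raise anything else."""
    for delay in schedule:
        try:
            return call()
        except Exception as e:
            if getattr(e, "status_code", None) != 529:
                raise  # only overloaded_error gets this curve
            sleep(delay)
    return call()  # final attempt; let any error propagate
```

The injectable `sleep` is there so you can unit-test the schedule without actually waiting 65 seconds.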

3. invalid_request_error (400): never retry, fix the input

A BadRequestError is your fault. The shape of the request is wrong. Common causes: max_tokens set absurdly low for the kind of response you asked for, message content too long for the context window, role-alternation violations in the messages array, content blocks of an unsupported type, or unsupported parameter combinations. Some models reject prefilled assistant messages with a 400 in specific configurations — verify against your model's docs before relying on prefill.

```python
from anthropic import BadRequestError

try:
    msg = client.messages.create(...)
except BadRequestError as e:
    # e.body is not guaranteed to be a dict; guard before digging into it.
    body = e.body if isinstance(e.body, dict) else {}
    log.error(
        "anthropic.bad_request",
        extra={
            "request_id": getattr(e, "request_id", None),
            "error_type": body.get("error", {}).get("type"),
            "message": str(e),
        },
    )
    raise InputValidationFailed(str(e))
```

Retry budget here is zero. The same input will fail the same way every time. The actionable signal is the error.message. Read it, surface it to whoever owns the prompt template, and add a unit test that catches it next time. Catching BadRequestError to retry it means there is a bug upstream in your prompt-construction layer, and the retry is hiding it.
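The pre-flight check this policy implies can live next to your prompt builder. Below is a hypothetical validator for the role-alternation rule mentioned above; verify the exact rule against the Messages API docs for your model before relying on it:

```python
def validate_roles(messages: list) -> None:
    """Reject message lists the article says the API will 400 on:
    conversations that don't start with a user turn, or that repeat a role."""
    if not messages or messages[0].get("role") != "user":
        raise ValueError("conversation must start with a user message")
    for prev, cur in zip(messages, messages[1:]):
        if prev.get("role") == cur.get("role"):
            raise ValueError(f"consecutive {cur.get('role')!r} messages")
```

Running this in a unit test over every prompt template catches the 400 at build time instead of in production.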

4. authentication_error (401): never retry, page someone

A 401 means the key is missing, malformed, revoked, or rotated and your service has stale config. None of those get better with another HTTP request. The correct reaction is to stop calling the API, surface the error to your config / secrets layer, and page the on-call.

```python
from anthropic import AuthenticationError

try:
    msg = client.messages.create(...)
except AuthenticationError:
    metrics.incr("anthropic_auth_failure_total")
    raise AuthConfigError("Anthropic auth failed; rotate key")
```

Note: the snippet above raises a domain exception rather than calling SystemExit. In a long-lived API server, SystemExit tears down the worker on a single auth blip, which is rarely what you want. In a CLI or job runner, a hard exit is fine. Pick the shape that matches your runtime.

During a key rotation, a generic retry handler turns a 30-second blip into a 90-second outage with 3× the failed-request volume. If your handler catches AuthenticationError and retries, kill that branch.

5. permission_error (403): the key works, your model access does not

A 403 means the API key is valid but it does not have permission for what you asked. The most common shape: asking for a model your organisation has not been granted access to — a beta model, a region-locked SKU, or a Bedrock-only deployment hit through the public endpoint. Anthropic also returns request_too_large (413) when payloads exceed the per-request size limit; that one is an input-shape problem you fix upstream, not an auth issue.

```python
from anthropic import PermissionDeniedError

model_name = "claude-sonnet-4-5"

try:
    msg = client.messages.create(model=model_name, ...)
except PermissionDeniedError as e:
    log.error("anthropic.permission_denied", extra={
        "model": model_name,
        "request_id": getattr(e, "request_id", None),
    })
    raise ConfigError(
        "Model not enabled for this org; check console"
    )
```

Retry budget is zero, same as auth. The fix is in the Anthropic console, not in your retry loop. Track this on a separate counter — key rotations and model-access misconfigs need different runbooks.

6. not_found_error (404): your model name is wrong

This one is sneaky because it sounds like a transient resource lookup, and your handler probably treats it as one. From the Messages API, a 404 almost always means you spelled the model name wrong, asked for a model that has been retired, or used a family alias when the API expected a dated snapshot from the model list.

```python
from anthropic import NotFoundError

try:
    msg = client.messages.create(model=model_name, ...)
except NotFoundError:
    log.error("anthropic.unknown_model", extra={
        "model": model_name,
    })
    raise ConfigError(f"Unknown Anthropic model: {model_name}")
```

The reason this matters separately from BadRequestError is the diagnostic. A 400 says the request is shaped wrong; a 404 says the resource you named does not exist. Telling them apart in your logs saves a round-trip when a model deprecation lands and a few of your fleet are still pinned to an old name.

7. api_error (500): retry once, then escalate

A 500 from Anthropic's side is rare and non-deterministic. The right reaction is one retry, then escalate as a real failure. The SDK retries on 500 by default; in many production loops, one retry is enough to ride through a transient hiccup without amplifying a real incident.

```python
import time

from anthropic import InternalServerError

def call_once_retry(prompt: str):
    for attempt in range(2):
        try:
            return client.messages.create(
                model="claude-sonnet-4-5",
                max_tokens=1024,
                messages=[{"role": "user", "content": prompt}],
            )
        except InternalServerError as e:
            if attempt == 0:
                metrics.incr("anthropic_api_error_retry_total")
                time.sleep(1)
                continue
            metrics.incr("anthropic_api_error_unrecoverable_total")
            raise
```

If 500s climb above noise, that is an Anthropic incident, and your dashboard should send you to status.anthropic.com instead of into a tighter retry loop. The retry-once-then-fail policy keeps your service from amplifying a backend brownout while still riding through one-off blips.

Putting it together

The handler at the top of every Anthropic call ends up looking like this:

```python
import anthropic

def safe_call(prompt: str):
    try:
        return client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}],
        )
    except anthropic.AuthenticationError:
        raise AuthConfigError("rotate key")
    except anthropic.PermissionDeniedError:
        raise ConfigError("model not enabled")
    except anthropic.NotFoundError:
        raise ConfigError("unknown model name")
    except anthropic.BadRequestError as e:
        raise InputValidationFailed(str(e))
    except anthropic.RateLimitError:
        return call_with_jitter(prompt)
    except anthropic.InternalServerError as e:
        # Subclass of APIStatusError, so this branch must come first.
        if getattr(e, "status_code", None) == 529:
            return call_with_failover(prompt)
        return call_once_retry(prompt)
    except anthropic.APIStatusError as e:
        if getattr(e, "status_code", None) == 529:
            return call_with_failover(prompt)
        raise
```

A note on the helpers: call_with_jitter, call_with_failover, and call_once_retry each issue a fresh request to the API, so any error they raise is their problem to handle internally. If you want every branch to log a request_id exactly once, wrap the helpers in their own try/except using the same patterns above, or have them return structured results instead of raising. Don't assume a retry inside a helper inherits the outer handler's coverage.
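Returning structured results instead of raising can look like the sketch below. `CallResult` and `as_result` are hypothetical names for this article, not SDK types:

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class CallResult:
    """One outcome per call, so the caller logs request_id exactly once."""
    ok: bool
    value: Any = None
    error_type: Optional[str] = None
    request_id: Optional[str] = None

def as_result(fn) -> CallResult:
    """Run a zero-arg callable; fold any SDK-shaped exception into a CallResult."""
    try:
        return CallResult(ok=True, value=fn())
    except Exception as e:
        return CallResult(
            ok=False,
            error_type=type(e).__name__,
            request_id=getattr(e, "request_id", None),
        )
```

The caller then branches on `result.error_type` and logs in one place, instead of every helper owning its own logging and retry coverage.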

Seven branches, seven reactions, one log line per branch with the request_id attached. The next time something fails, your dashboard tells you whether it is your config, your traffic, your input shape, or Anthropic's day, without anyone reading raw exception messages by hand.


If this was useful

Production agents fail in more shapes than one. Error handling is the boring half of the work that keeps the interesting half running. The AI Agents Pocket Guide covers the patterns for building agents that survive contact with real traffic: bounded loops, retry policies, tool-call hygiene, and the failure modes that bite teams in week three.

AI Agents Pocket Guide — Patterns for Building Autonomous Systems with LLMs
