Mukunda Rao Katta

Posted on May 25

My agent retried a 401 Unauthorized nine times. The fix was two lines.

#hermeschallenge #ai #python #agents

The bug was obvious in retrospect.

My agent called a third-party API. The access token had expired. The API returned 401 Unauthorized. My retry wrapper caught the exception, checked "is this an exception?", said yes, and tried again. Nine more times.

Nine round-trips to an endpoint that was never going to succeed until I rotated the token. The model kept reasoning "I got an error, I should retry" because my retry logic had given it no other vocabulary. The loop had one question: did the call raise? It had no idea that some exceptions are retryable and some are not.

tool-error-classify is the small Python library I wrote to fix this. It gives every tool exception a stable ErrorKind code that the agent loop can branch on. It's on PyPI as tool-error-classify, with 51 tests and zero dependencies.

The shape of the fix

Before, the agent loop looked like this:

try:
    result = call_tool(args)
except Exception as e:
    # Retry? Abort? We have no idea.
    retry_or_give_up(e)

After, it looks like this:

from tool_error_classify import classify, ErrorKind

try:
    result = call_tool(args)
except Exception as e:
    kind = classify(e)

    # retry_with_backoff, back_off_and_retry, abort, tell_model and
    # log_and_abort are placeholders for your own loop's helpers.
    if kind == ErrorKind.TRANSIENT:
        retry_with_backoff(e)
    elif kind == ErrorKind.QUOTA:
        back_off_and_retry(e, hint=kind.retry_after)
    elif kind == ErrorKind.AUTH:
        abort("Credentials expired. Rotate token and restart.")
    elif kind == ErrorKind.NOT_FOUND:
        tell_model("Resource not found. Check the id you passed.")
    elif kind == ErrorKind.VALIDATION:
        tell_model("Bad arg shape: " + str(e))
    else:
        # TIMEOUT, SERVER_ERROR and UNKNOWN all end up here in this example.
        log_and_abort(e)

The auth case never retries. The quota case backs off. The transient case retries. Each branch is a decision you make once in the loop, not scattered across every tool.
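The sketch below makes the opening bug concrete and runs without the library. `ErrorKind`, `ApiError`, and `toy_classify` are hypothetical stand-ins defined inline, and backoff sleeps are omitted. It compares a wrapper that retries on any exception with one that checks the kind first.

```python
from enum import Enum, auto


class ErrorKind(Enum):
    # Hypothetical stand-in for the library's enum (only the members used here).
    TRANSIENT = auto()
    AUTH = auto()
    UNKNOWN = auto()


class ApiError(Exception):
    """Hypothetical client exception that carries an HTTP status code."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def toy_classify(exc: BaseException) -> ErrorKind:
    """Toy rules: 401/403 -> AUTH, ConnectionError -> TRANSIENT."""
    if getattr(exc, "status_code", None) in (401, 403):
        return ErrorKind.AUTH
    if isinstance(exc, ConnectionError):
        return ErrorKind.TRANSIENT
    return ErrorKind.UNKNOWN


RETRYABLE = {ErrorKind.TRANSIENT}


def naive_retry(call, attempts: int = 10):
    """The original bug: retry on any exception at all."""
    for i in range(attempts):
        try:
            return call()
        except Exception:
            if i == attempts - 1:
                raise
            # (backoff sleep omitted for brevity)


def kind_aware_retry(call, attempts: int = 10):
    """Retry only when the classified kind is retryable; re-raise otherwise."""
    for i in range(attempts):
        try:
            return call()
        except Exception as e:
            if toy_classify(e) not in RETRYABLE or i == attempts - 1:
                raise
            # (backoff sleep omitted for brevity)


def count_calls(retry_fn) -> int:
    """Run retry_fn against a call that always fails with a 401."""
    calls = 0

    def expired_token_call():
        nonlocal calls
        calls += 1
        raise ApiError(401)

    try:
        retry_fn(expired_token_call)
    except ApiError:
        pass
    return calls


print(count_calls(naive_retry))       # 10
print(count_calls(kind_aware_retry))  # 1
```

With an expired token, the naive wrapper makes one call plus nine retries. The kind-aware one stops after the first 401, and it still retries a ConnectionError.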

The ErrorKind enum

Eight values, each with a clear meaning:

from enum import Enum, auto

class ErrorKind(Enum):
    # Values shown as auto(); the library's actual member values are not listed here.
    TRANSIENT = auto()     # Temporary blip, safe to retry
    QUOTA = auto()         # Rate-limited (429), back off before retrying
    AUTH = auto()          # Credentials expired or missing, do not retry
    NOT_FOUND = auto()     # Tool arg pointed to a missing resource
    VALIDATION = auto()    # Bad arg shape, model should fix the call
    TIMEOUT = auto()       # Request timed out, may be safe to retry
    SERVER_ERROR = auto()  # Remote 5xx, retry with caution
    UNKNOWN = auto()       # Could not classify

The classifier checks HTTP status codes first, then Python exception class names, then walks the exception chain (__cause__ and __context__). For QUOTA errors, it also parses the Retry-After header and exposes it as kind.retry_after (a datetime or a delay in seconds).
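For reference, HTTP allows two forms of Retry-After: a delay in seconds (`Retry-After: 120`) or an HTTP date. Below is a minimal sketch that normalizes both with the standard library. `parse_retry_after` is a hypothetical helper, not the library's internals, and it returns a `timedelta` for the seconds form.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Optional, Union


def parse_retry_after(value: Optional[str]) -> Union[datetime, timedelta, None]:
    """Normalize a Retry-After header value.

    Returns a timedelta for the delay-seconds form ("120"), an aware UTC
    datetime for the HTTP-date form ("Wed, 21 Oct 2015 07:28:00 GMT"),
    or None if the header is missing or unparseable.
    """
    if value is None:
        return None
    value = value.strip()
    if value.isascii() and value.isdigit():
        return timedelta(seconds=int(value))
    try:
        parsed = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Depending on the Python version, a bad date raises ValueError or TypeError.
        return None
    if parsed.tzinfo is None:
        # HTTP dates are always GMT, so treat a naive result as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


print(parse_retry_after("120"))                            # 0:02:00
print(parse_retry_after("Wed, 21 Oct 2015 07:28:00 GMT"))  # 2015-10-21 07:28:00+00:00
```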

from tool_error_classify import classify, ErrorKind

kind = classify(my_exception)  # my_exception: an exception you already caught
print(kind)                    # ErrorKind.QUOTA
print(kind.retry_after)        # e.g. 2026-05-24 14:32:10, or None

What it does NOT do

It does not catch exceptions. You raise, you catch, then you classify. The library never wraps your tool call or touches control flow.
It does not retry anything. Classification and retry are separate concerns. Pair with llm-retry-py to handle the backoff logic itself.
It does not know your tool's business logic. NOT_FOUND means the classifier saw a 404 or a not-found exception such as FileNotFoundError somewhere in the chain. Whether to retry with a different resource id is your call.
It does not replace structured error types in your own tools. If you already raise MyAuthError from an underlying ExpiredTokenError, the classifier walks the chain and finds it. You keep your exception hierarchy; the library just labels it.
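One caveat on that last point: a chain walk only sees what Python records. `raise ... from err` sets `__cause__`. Raising inside an `except` block sets `__context__` implicitly. An original error stored only in a custom attribute of the wrapper is not part of either chain. Here is a minimal sketch that uses `from`. `ExpiredTokenError`, `MyAuthError`, `refresh_session`, and `explicit_causes` are hypothetical names.

```python
class ExpiredTokenError(Exception):
    """Hypothetical low-level error raised by an HTTP client wrapper."""


class MyAuthError(Exception):
    """Hypothetical domain error from your own exception hierarchy."""


def refresh_session() -> None:
    try:
        raise ExpiredTokenError("token expired")
    except ExpiredTokenError as err:
        # `from err` stores the original as __cause__, an explicit link that
        # chain walks follow. A custom attribute like self.original is not.
        raise MyAuthError("could not refresh session") from err


def explicit_causes(exc: BaseException) -> list:
    """Class names reachable through __cause__, outermost first."""
    names = []
    current = exc.__cause__
    while current is not None:
        names.append(type(current).__name__)
        current = current.__cause__
    return names


try:
    refresh_session()
except MyAuthError as e:
    print(explicit_causes(e))  # ['ExpiredTokenError']
```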

Inside the lib: classifier, not catcher

This was the main design decision.

Every error-handling library I looked at mixed classification and control flow. Decorators that caught and re-raised. Context managers that retried. Wrappers that swallowed exceptions.

I wanted none of that. Once you mix "what kind of error" with "what do I do about it", you end up with a library that is opinionated about retry counts, backoff curves, fallback providers. Those are real problems, but they are not this library's problem.

The classifier's only job is to answer one question: what kind of error is this?

def classify(exc: BaseException) -> ErrorKind:
    ...

It takes an already-raised exception and returns a code. It does not catch. It does not retry. It does not log. It classifies.

This means you can use it anywhere in an existing try/except chain without restructuring your code. Add two lines and your loop gains a vocabulary.

try:
    result = call_tool(args)
except Exception as e:
    kind = classify(e)   # line 1
    if kind == ErrorKind.AUTH:  # line 2
        raise  # or abort, or log, your call

How classification works

Three passes, in priority order.

HTTP status codes. If the exception (or any cause in its chain) carries a status_code, status, response.status_code, or similar attribute, the code drives classification. 401/403 becomes AUTH. 404 becomes NOT_FOUND. 422 becomes VALIDATION. 429 becomes QUOTA. 5xx becomes SERVER_ERROR or TRANSIENT depending on the code.

Exception class names. If no status code is found, the classifier checks the class name hierarchy. TimeoutError, asyncio.TimeoutError, and requests.exceptions.Timeout all produce TIMEOUT. FileNotFoundError, a KeyError with a resource-shaped message, and NotFoundError subclasses produce NOT_FOUND. Permission- and auth-related exception names produce AUTH.

Chain walk. If the top-level exception is a wrapper, for example one raised with raise ToolCallError(...) from real_error, the classifier walks the __cause__ and __context__ chains to find the innermost exception with a classifiable attribute or name. You get the real error kind, not "wrapped exception, unknown".

If all three passes fail, the result is UNKNOWN. The loop can decide what that means.
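The three passes fit in a few dozen lines. This sketch is not the library's source. `ErrorKind` is a local stand-in. Three details are my own assumptions from the description above:

- the status table, including treating 502/503/504 as TRANSIENT and other 5xx as SERVER_ERROR
- the substring rules for class names
- the innermost-first order

The KeyError message heuristic is left out. The chain walk is folded into the first two passes, so a status code anywhere in the chain still outranks a class name.

```python
from enum import Enum, auto
from typing import Iterator, List, Optional


class ErrorKind(Enum):
    # Local stand-in for the library's enum.
    TRANSIENT = auto()
    QUOTA = auto()
    AUTH = auto()
    NOT_FOUND = auto()
    VALIDATION = auto()
    TIMEOUT = auto()
    SERVER_ERROR = auto()
    UNKNOWN = auto()


# Assumed status table; the 502/503/504 -> TRANSIENT split is illustrative.
_BY_STATUS = {
    401: ErrorKind.AUTH, 403: ErrorKind.AUTH,
    404: ErrorKind.NOT_FOUND, 422: ErrorKind.VALIDATION,
    429: ErrorKind.QUOTA,
    502: ErrorKind.TRANSIENT, 503: ErrorKind.TRANSIENT, 504: ErrorKind.TRANSIENT,
}


def kind_from_status(exc: BaseException) -> Optional[ErrorKind]:
    """Pass 1: read status_code/status from the exception or its .response."""
    for holder in (exc, getattr(exc, "response", None)):
        for attr in ("status_code", "status"):
            status = getattr(holder, attr, None)
            if isinstance(status, int):
                if status in _BY_STATUS:
                    return _BY_STATUS[status]
                if 500 <= status < 600:
                    return ErrorKind.SERVER_ERROR
                return None  # a status this sketch does not map
    return None


def kind_from_name(exc: BaseException) -> Optional[ErrorKind]:
    """Pass 2: match class names along the MRO, most specific class first."""
    for cls in type(exc).__mro__:
        name = cls.__name__.lower()
        if "timeout" in name:
            return ErrorKind.TIMEOUT
        if "notfound" in name:
            return ErrorKind.NOT_FOUND
        if "permission" in name or "auth" in name:
            return ErrorKind.AUTH
    return None


def iter_chain(exc: BaseException) -> Iterator[BaseException]:
    """Yield exc and its __cause__/__context__ links, outermost first."""
    seen = set()
    current: Optional[BaseException] = exc
    while current is not None and id(current) not in seen:
        seen.add(id(current))
        yield current
        if current.__cause__ is not None:
            current = current.__cause__  # explicit: raise ... from err
        elif not current.__suppress_context__:
            current = current.__context__  # implicit: raised inside except
        else:
            current = None  # raise ... from None hides the context


def classify_sketch(exc: BaseException) -> ErrorKind:
    """Run each pass over the chain, innermost exception first."""
    links: List[BaseException] = list(iter_chain(exc))[::-1]
    for rule in (kind_from_status, kind_from_name):
        for link in links:
            kind = rule(link)
            if kind is not None:
                return kind
    return ErrorKind.UNKNOWN
```

This sketch matches on names rather than isinstance checks, which keeps optional packages like requests out of the import path. The cost is occasional false positives from substring matches.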

When this is useful

You have an agent loop that calls external tools (APIs, databases, file operations) and you want the retry decision to be explicit, not implicit.
You are debugging a loop that burns tokens on doomed retries and you want visibility into why each call failed.
You are composing llm-retry-py for backoff and llm-circuit-breaker-py for persistent failure detection, and you want a shared vocabulary between them.
You are writing a tool wrapper and you want to test that a given exception produces the right kind code, without having to simulate the full error path.

When this is NOT what you want

If your tools never fail, or you only care about "failed vs. succeeded". Unless you branch on the result, the classification step adds nothing.
If your tool exceptions already carry structured metadata that your loop reads directly. The classifier is for teams that have raw Python exceptions and want a stable code without migrating their entire exception hierarchy.

Install

pip install tool-error-classify

Repo: https://github.com/MukundaKatta/tool-error-classify

Sibling libraries

Lib	Boundary	Repo
tool-error-classify	Classify error kind after a tool raises	this repo
llm-retry-py	Retry backoff with per-provider retryable-code presets	https://github.com/MukundaKatta/llm-retry-py
llm-circuit-breaker-py	Open/half-open circuit on repeated errors	https://github.com/MukundaKatta/llm-circuit-breaker-py
tool-call-budgets	Per-tool call-count cap to stop runaway loops	https://github.com/MukundaKatta/tool-call-budgets
agentvet	Arg validation before the call, prevents VALIDATION errors upstream	https://github.com/MukundaKatta/AgentVetPy

What's next

A register_classifier(fn) hook so teams can add project-specific exception types without forking the library. If your codebase raises MyCompanyAuthError and you want that to produce ErrorKind.AUTH, you should be able to register a one-liner rule rather than wrapping every call site.
