The bug was obvious in retrospect.
My agent called a third-party API. The access token had expired. The API returned 401 Unauthorized. My retry wrapper caught the exception, checked "is this an exception?", said yes, and tried again. Nine more times.
Nine round-trips to an endpoint that was never going to succeed until I rotated the token. The model kept reasoning "I got an error, I should retry" because my retry logic had given it no other vocabulary. The loop had one question: did the call raise? It had no idea that some exceptions are retryable and some are not.
tool-error-classify is the small Python library I wrote to fix this. It gives every tool exception a stable ErrorKind code the agent loop can branch on. It is on PyPI as tool-error-classify, with 51 tests and zero dependencies.
The shape of the fix
Before, the agent loop looked like this:
try:
    result = call_tool(args)
except Exception as e:
    # Retry? Abort? We have no idea.
    retry_or_give_up(e)
After, it looks like this:
from tool_error_classify import classify, ErrorKind

try:
    result = call_tool(args)
except Exception as e:
    kind = classify(e)
    if kind == ErrorKind.TRANSIENT:
        retry_with_backoff(e)
    elif kind == ErrorKind.QUOTA:
        back_off_and_retry(e, hint=kind.retry_after)
    elif kind == ErrorKind.AUTH:
        abort("Credentials expired. Rotate token and restart.")
    elif kind == ErrorKind.NOT_FOUND:
        tell_model("Resource not found. Check the id you passed.")
    elif kind == ErrorKind.VALIDATION:
        tell_model("Bad arg shape: " + str(e))
    else:
        log_and_abort(e)
The auth case never retries. The quota case backs off. The transient case retries. Each branch is a decision you make once in the loop, not scattered across every tool.
The ErrorKind enum
Eight values, each with a clear meaning:
from enum import Enum, auto

class ErrorKind(Enum):
    # Member values are placeholders here; only the names matter for branching.
    TRANSIENT = auto()     # Temporary blip, safe to retry
    QUOTA = auto()         # Rate-limited (429), back off before retrying
    AUTH = auto()          # Credentials expired or missing, do not retry
    NOT_FOUND = auto()     # Tool arg pointed to a missing resource
    VALIDATION = auto()    # Bad arg shape, model should fix the call
    TIMEOUT = auto()       # Request timed out, may be safe to retry
    SERVER_ERROR = auto()  # Remote 5xx, retry with caution
    UNKNOWN = auto()       # Could not classify
The classifier checks HTTP status codes first, then Python exception class names, then walks the native exception chain. For QUOTA errors, it also parses the Retry-After header and exposes it as kind.retry_after (a datetime or a seconds delta).
from tool_error_classify import classify, ErrorKind
err = classify(my_exception)
print(err) # ErrorKind.QUOTA
print(err.retry_after) # datetime(2026, 5, 24, 14, 32, 10) or None
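The Retry-After header can arrive in two forms: a number of seconds, or an HTTP-date. Here is a minimal sketch of normalizing both into an absolute UTC datetime. `parse_retry_after` is a hypothetical helper written for this post, not the library's internal function, and returning None for unparseable values is an assumption.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(value: Optional[str],
                      now: Optional[datetime] = None) -> Optional[datetime]:
    """Hypothetical helper: turn a Retry-After value into a UTC datetime."""
    if not value:
        return None
    now = now or datetime.now(timezone.utc)
    value = value.strip()
    if value.isdigit():
        # Delta-seconds form, e.g. "120".
        return now + timedelta(seconds=int(value))
    try:
        # HTTP-date form, e.g. "Sun, 24 May 2026 14:32:10 GMT".
        parsed = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Assumption: an unparseable header is treated as "no hint".
        return None
    if parsed.tzinfo is None:
        # Treat a naive result as UTC so callers always get an aware datetime.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed
```

Passing `now` explicitly keeps the delta-seconds branch deterministic in tests.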
What it does NOT do
- It does not catch exceptions. You raise, you catch, then you classify. The library never wraps your tool call or touches control flow.
- It does not retry anything. Classification and retry are separate concerns. Pair with llm-retry-py to handle the backoff logic itself.
- It does not know your tool's business logic. NOT_FOUND means an HTTP 404 or a FileNotFoundError in the chain. Whether that should retry with a different resource id is your call.
- It does not replace structured error types in your own tools. If you already raise MyAuthError(ExpiredTokenError), the classifier will walk the chain and find it. You keep your exception hierarchy, the library just labels it.
Inside the lib: classifier, not catcher
This was the main design decision.
Every error-handling library I looked at mixed classification and control flow. Decorators that caught and re-raised. Context managers that retried. Wrappers that swallowed exceptions.
I wanted none of that. Once you mix "what kind of error" with "what do I do about it", you end up with a library that is opinionated about retry counts, backoff curves, fallback providers. Those are real problems, but they are not this library's problem.
The classifier's only job is to answer one question: what kind of error is this?
def classify(exc: BaseException) -> ErrorKind:
    ...
It takes an already-raised exception and returns a code. It does not catch. It does not retry. It does not log. It classifies.
This means you can use it anywhere in an existing try/except chain without restructuring your code. Add two lines and your loop gains a vocabulary.
except Exception as e:
    kind = classify(e)            # line 1
    if kind == ErrorKind.AUTH:    # line 2
        raise                     # or abort, or log, your call
How classification works
Three passes, in priority order.
HTTP status codes. If the exception (or any cause in its chain) carries a status_code, status, response.status_code, or similar attribute, the code drives classification. 401/403 becomes AUTH. 404 becomes NOT_FOUND. 422 becomes VALIDATION. 429 becomes QUOTA. 5xx becomes SERVER_ERROR or TRANSIENT depending on the code.
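To make the status-code pass concrete, here is a minimal sketch of the mapping described above. It is not the library's code: `Kind` is a trimmed stand-in for ErrorKind, `find_status` and `kind_from_status` are hypothetical names, and the split of 5xx codes (502/503/504 as TRANSIENT, the rest as SERVER_ERROR) is my assumption for illustration.

```python
from enum import Enum, auto
from typing import Optional


class Kind(Enum):
    """Stand-in for ErrorKind, trimmed to what this sketch needs."""
    TRANSIENT = auto()
    QUOTA = auto()
    AUTH = auto()
    NOT_FOUND = auto()
    VALIDATION = auto()
    SERVER_ERROR = auto()


def find_status(exc: BaseException) -> Optional[int]:
    """Look for an integer status on the exception or on its .response."""
    for holder in (exc, getattr(exc, "response", None)):
        for name in ("status_code", "status"):
            value = getattr(holder, name, None)
            if isinstance(value, int):
                return value
    return None


def kind_from_status(code: int) -> Optional[Kind]:
    """Map an HTTP status code to a kind, or None if it is not an error code we know."""
    if code in (401, 403):
        return Kind.AUTH
    if code == 404:
        return Kind.NOT_FOUND
    if code == 422:
        return Kind.VALIDATION
    if code == 429:
        return Kind.QUOTA
    if code in (502, 503, 504):
        # Assumption: gateway and availability errors count as transient.
        return Kind.TRANSIENT
    if 500 <= code <= 599:
        return Kind.SERVER_ERROR
    return None
```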
Exception class names. If no status code is found, the classifier checks the class name hierarchy. TimeoutError, asyncio.TimeoutError, requests.exceptions.Timeout all produce TIMEOUT. FileNotFoundError, KeyError with a resource-shaped message, NotFoundError subclasses produce NOT_FOUND. Permission and auth exception names produce AUTH.
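One way to do name-based matching is to scan the class names along the exception's method resolution order, so a custom subclass still matches through its parent. The sketch below assumes a small table of name fragments that I made up for illustration. The library's real rules are not shown in this post, and plain string labels stand in for ErrorKind members.

```python
from typing import Optional

# Assumption: illustrative name fragments, checked in order.
NAME_RULES = (
    ("timeout", "TIMEOUT"),
    ("notfound", "NOT_FOUND"),
    ("permission", "AUTH"),
    ("auth", "AUTH"),
)


def kind_name_from_class(exc: BaseException) -> Optional[str]:
    """Return a kind label by scanning class names along the exception's MRO."""
    for cls in type(exc).__mro__:
        lowered = cls.__name__.lower()
        for fragment, label in NAME_RULES:
            if fragment in lowered:
                return label
    return None
```

Substring matching is deliberately crude here. A stricter version would compare against exact class names to avoid accidental hits.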
Chain walk. If the top-level exception is a wrapper like ToolCallError(cause=real_error), the classifier walks __cause__ and __context__ chains to find the innermost exception that has a classifiable attribute or name. You get the real error kind, not "wrapped exception, unknown".
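Python links exceptions through `__cause__` (set by `raise ... from ...`) and `__context__` (set implicitly when raising inside an `except` block). A minimal sketch of walking that chain follows. `iter_chain`, `innermost_with_status`, `UpstreamError`, and `build_wrapped_error` are hypothetical names for this example. The sketch only looks for a `status_code` attribute, whereas the paragraph above describes matching on names as well.

```python
from typing import Iterator, Optional


def iter_chain(exc: BaseException) -> Iterator[BaseException]:
    """Yield exc, then each linked exception, preferring __cause__ over __context__."""
    seen = set()
    current: Optional[BaseException] = exc
    while current is not None and id(current) not in seen:
        seen.add(id(current))  # Guard against cyclic chains.
        yield current
        nxt = current.__cause__
        current = nxt if nxt is not None else current.__context__


def innermost_with_status(exc: BaseException) -> Optional[BaseException]:
    """Return the deepest exception in the chain carrying an int status_code."""
    found = None
    for link in iter_chain(exc):
        if isinstance(getattr(link, "status_code", None), int):
            found = link
    return found


class UpstreamError(Exception):
    """Hypothetical low-level error that carries an HTTP status."""
    def __init__(self, status_code: int):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


class ToolCallError(Exception):
    """Wrapper raised by the tool layer."""


def build_wrapped_error() -> BaseException:
    """Build a ToolCallError whose __cause__ is an UpstreamError(401)."""
    try:
        try:
            raise UpstreamError(401)
        except UpstreamError as inner:
            raise ToolCallError("tool failed") from inner
    except ToolCallError as outer:
        return outer
```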
If all three passes fail, the result is UNKNOWN. The loop can decide what that means.
When this is useful
- You have an agent loop that calls external tools (APIs, databases, file operations) and you want the retry decision to be explicit, not implicit.
- You are debugging a loop that burns tokens on doomed retries and you want visibility into why each call failed.
- You are composing llm-retry-py for backoff and llm-circuit-breaker-py for persistent failure detection, and you want a shared vocabulary between them.
- You are writing a tool wrapper and you want to test that a given exception produces the right kind code, without having to simulate the full error path.
When this is NOT what you want
- If your tools never fail, or you only care about "failed vs. succeeded", the library adds a step that has no benefit without branching on the result.
- If your tool exceptions already carry structured metadata that your loop reads directly, you do not need it. The classifier is for teams that have raw Python exceptions and want a stable code without migrating their entire exception hierarchy.
Install
pip install tool-error-classify
Repo: https://github.com/MukundaKatta/tool-error-classify
Sibling libraries
| Lib | Boundary | Repo |
|---|---|---|
| tool-error-classify | Classify error kind after a tool raises | this repo |
| llm-retry-py | Retry backoff with per-provider retryable-code presets | https://github.com/MukundaKatta/llm-retry-py |
| llm-circuit-breaker-py | Open/half-open circuit on repeated errors | https://github.com/MukundaKatta/llm-circuit-breaker-py |
| tool-call-budgets | Per-tool call-count cap to stop runaway loops | https://github.com/MukundaKatta/tool-call-budgets |
| agentvet | Arg validation before the call, prevents VALIDATION errors upstream | https://github.com/MukundaKatta/AgentVetPy |
What's next
A register_classifier(fn) hook so teams can add project-specific exception types without forking the library. If your codebase raises MyCompanyAuthError and you want that to produce ErrorKind.AUTH, you should be able to register a one-liner rule rather than wrapping every call site.
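Since this hook does not exist yet, the following is only a sketch of one possible shape, not a committed API. The rule signature, the `classify_with_rules` function, and the use of string labels in place of ErrorKind members are all assumptions for illustration.

```python
from typing import Callable, List, Optional

# A rule inspects an exception and returns a kind label, or None to pass.
Rule = Callable[[BaseException], Optional[str]]
_rules: List[Rule] = []


def register_classifier(fn: Rule) -> Rule:
    """Hypothetical hook: add a project-specific rule. Returns fn so it also works as a decorator."""
    _rules.append(fn)
    return fn


def classify_with_rules(exc: BaseException, default: str = "UNKNOWN") -> str:
    """Ask each registered rule in order; the first non-None answer wins."""
    for rule in _rules:
        label = rule(exc)
        if label is not None:
            return label
    return default


class MyCompanyAuthError(Exception):
    """Example project-specific exception."""


# The one-liner rule: map MyCompanyAuthError to AUTH.
register_classifier(lambda e: "AUTH" if isinstance(e, MyCompanyAuthError) else None)
```

In a real version, custom rules would presumably run before the built-in passes so a project can override the defaults.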