DEV Community

Arson
Arson

Posted on

Your API Is Making AI Agents Dumber (Fix It With Error Classification)

Most APIs return flat errors that mix fundamentally different failure modes. An agent cannot route to the right recovery strategy without parsing natural language error messages. Here is a fix.

The Problem

AI agents call your API with wrong parameters. They retry the same bad request three times, burn credits, and leave. From their logs it looks like your API failed. From yours it looks like the agent never questioned its own inputs.

The default API contract is: send request, get success or failure. The agent learns nothing about WHY it failed.

The Fix: correction_class

Every error response should include a machine-readable correction_class field. One enum value that tells the agent what kind of recovery to attempt:

1. local_repair

Parameter is malformed. Wrong format, typo, missing field. The agent can retry immediately with corrected input.

{
  "error": "invalid_captcha_type",
  "correction_class": "local_repair",
  "correction_reason": "Type 'recaptchav2' is not valid. Use: cloudflare-turnstile, recaptcha-v2"
}
Enter fullscreen mode Exit fullscreen mode

2. upstream_refetch

Parameter was valid when generated but the source data changed. The siteKey extracted from the page is stale because the site rotated it.

{
  "error": "turnstile_not_found",
  "correction_class": "upstream_refetch",
  "correction_reason": "Page content may have changed. Re-crawl the URL and extract fresh parameters."
}
Enter fullscreen mode Exit fullscreen mode

3. plan_invalidation

The context changed enough that the current plan is invalid. The URL no longer has a CAPTCHA, or the site switched types. Do not retry — re-plan.

{
  "error": "site_blocks_automated_solving",
  "correction_class": "plan_invalidation",
  "correction_reason": "This site actively blocks automated solving. Consider an alternative approach."
}
Enter fullscreen mode Exit fullscreen mode

4. capacity_exceeded

The service is the bottleneck. Rate limit, quota, or temporary outage. Wait and retry unchanged.

{
  "error": "rate_limited",
  "correction_class": "capacity_exceeded",
  "correction_reason": "Rate limit exceeded. Retry after 30 seconds."
}
Enter fullscreen mode Exit fullscreen mode

Why This Matters

Most APIs today return all four failure modes as HTTP 400 or 500 with a message string. An agent that receives local_repair knows to fix the parameter and retry. An agent that receives plan_invalidation knows to stop retrying and re-plan entirely. Without this signal, the agent retries everything the same way.

The API that teaches is worth more than the API that just executes.

Implementation

We shipped this on GateSolve today. Every failed CAPTCHA solve now includes correction_class and correction_reason. Turnstile uses solve time to distinguish: fast failure (under 3s) means stale parameters (upstream_refetch), timeout means the site blocks automated solving (plan_invalidation).

The taxonomy is simple enough to adopt in any API. One enum field. Four values. The agent reads it and routes to the right recovery path.

Would love to see other agent-facing APIs adopt the same pattern. The more consistent the taxonomy, the smarter agents get at recovery across tools.


If you build agent-facing APIs, try adding correction_class to your error responses. Your agent users will thank you (silently, by not churning).

Top comments (0)