AI agents are your API's newest power users. They don't read your docs the way humans do. They parse your OpenAPI spec, call endpoints in loops, retry on every failure, and chain 20 requests without pausing to ask "are you sure?"
Most APIs were designed for frontend developers clicking buttons. That design breaks when an autonomous agent starts calling your endpoints at 3 AM with no human in the loop.
Here are five architecture patterns that help your API survive the agent era.
1. Machine-Readable Descriptions in Your OpenAPI Spec
Human developers read your API docs and infer intent. AI agents read your OpenAPI specification and take descriptions literally. Vague descriptions produce wrong API calls.
The difference between an agent choosing the right endpoint and the wrong one is often a single sentence in your spec.
```python
from fastapi import FastAPI
from pydantic import BaseModel, Field

app = FastAPI(
    title="Order Management API",
    description="Manages customer orders. All monetary values are in USD cents.",
)

class OrderCreate(BaseModel):
    """Create a new order. Does NOT charge the customer.
    Use POST /orders/{order_id}/confirm to finalize and charge."""

    customer_id: str = Field(
        description="Unique customer identifier. Format: cust_xxxxxxxxxxxx"
    )
    items: list[str] = Field(
        description="List of SKU strings. Each SKU must exist in the product catalog."
    )
    amount_cents: int = Field(
        description="Total order amount in USD cents. Must match sum of item prices.",
        ge=1,
    )

@app.post(
    "/orders",
    summary="Create a draft order (does not charge customer)",
    response_description="Returns the created order with status 'draft'",
)
async def create_order(order: OrderCreate):
    """Create a draft order. The order is NOT finalized until
    POST /orders/{order_id}/confirm is called. Draft orders
    expire after 30 minutes."""
    return {"order_id": "ord_123", "status": "draft"}
```
Three things agents need that human developers infer:

- Explicit side effects. "Does NOT charge the customer" prevents an agent from assuming a POST creates and charges in one step.
- Format specifications in Field descriptions. `Format: cust_xxxxxxxxxxxx` eliminates guessing about ID formats.
- Units in the schema. "USD cents" prevents an agent from sending `19.99` when you expect `1999`.
Expose your spec at a standard endpoint like /openapi.json. Agents discover APIs through specs, not landing pages.
2. Idempotency Keys for Safe Retries
Agents retry. Every agent framework includes retry logic. LangChain retries on exceptions. OpenAI's function calling retries on malformed responses. Your own agent loop retries on timeouts.
Without idempotency keys, each retry creates a duplicate. One "create order" intent becomes three orders.
```python
from fastapi import FastAPI, Header, HTTPException, Response
from typing import Optional
import hashlib
import time

app = FastAPI()

# In production: use Redis or a database
idempotency_store: dict[str, dict] = {}
IDEMPOTENCY_TTL_SECONDS = 86400  # 24 hours

@app.post("/payments")
async def create_payment(
    amount_cents: int,
    customer_id: str,
    response: Response,
    idempotency_key: Optional[str] = Header(None, alias="Idempotency-Key"),
):
    if idempotency_key is None:
        raise HTTPException(
            status_code=400,
            detail={
                "error": "idempotency_key_required",
                "message": "POST /payments requires an Idempotency-Key header.",
                "docs": "/openapi.json#/paths/~1payments/post",
            },
        )

    # Check if this key was already processed and hasn't expired
    cached = idempotency_store.get(idempotency_key)
    if cached is not None:
        if time.time() - cached["created_at"] < IDEMPOTENCY_TTL_SECONDS:
            response.headers["X-Idempotent-Replayed"] = "true"
            return cached["response"]
        del idempotency_store[idempotency_key]  # Expired: process fresh

    # Process the payment (your real logic here)
    result = {
        "payment_id": f"pay_{hashlib.sha256(idempotency_key.encode()).hexdigest()[:12]}",
        "amount_cents": amount_cents,
        "customer_id": customer_id,
        "status": "completed",
    }

    # Cache the response
    idempotency_store[idempotency_key] = {
        "response": result,
        "created_at": time.time(),
    }
    return result
```
The `X-Idempotent-Replayed: true` header tells the calling agent that this response is cached, not fresh. Agents that track state transitions need to know the difference.

Document idempotency in your OpenAPI spec. Add the `Idempotency-Key` header as a required parameter on every mutating endpoint. Agents read headers from specs: if the header is documented, the agent sends it.
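On the calling side, the key only deduplicates retries if every retry of the same intent sends the same key. One way to guarantee that is to derive the key from the logical operation itself. This is a client-side sketch; `idempotency_key` is a hypothetical helper name, not part of any framework.

```python
import hashlib
import json

# Client-side sketch (hypothetical helper): derive the Idempotency-Key
# from the logical intent, so every retry of the same operation sends
# the same key and the server can deduplicate it.
def idempotency_key(operation: str, payload: dict) -> str:
    # Canonical JSON (sorted keys) so dict ordering doesn't change the key
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(f"{operation}:{canonical}".encode()).hexdigest()
    return f"idem_{digest[:32]}"

payload = {"amount_cents": 1999, "customer_id": "cust_abc123"}

# A retry of the same intent produces the same key, even if the dict
# is rebuilt with keys in a different order
assert idempotency_key("create_payment", payload) == idempotency_key(
    "create_payment", {"customer_id": "cust_abc123", "amount_cents": 1999}
)
```

The trade-off: a content-derived key treats two genuinely separate but identical orders as one. If that can happen, mix a caller-generated request ID into the hash instead.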
3. Structured Error Responses
Human developers read "Something went wrong" and check the logs. Agents read it and have no idea what to do next.
Every error your API returns becomes input to an LLM deciding what to do next. Structured errors give agents a recovery path. Unstructured errors produce hallucinated retries.
```python
from fastapi import FastAPI, HTTPException, Request
from fastapi.responses import JSONResponse
from pydantic import BaseModel

app = FastAPI()

class APIError(BaseModel):
    error_code: str
    message: str
    retryable: bool
    retry_after_seconds: int | None = None
    suggestion: str | None = None

ERROR_CATALOG = {
    "insufficient_funds": APIError(
        error_code="insufficient_funds",
        message="Customer balance is below the requested amount.",
        retryable=False,
        suggestion="Reduce the amount or add funds via POST /customers/{id}/balance",
    ),
    "rate_limited": APIError(
        error_code="rate_limited",
        message="Too many requests. Slow down.",
        retryable=True,
        retry_after_seconds=30,
        suggestion="Wait for retry_after_seconds, then retry the same request.",
    ),
    "resource_locked": APIError(
        error_code="resource_locked",
        message="This resource is being modified by another request.",
        retryable=True,
        retry_after_seconds=2,
        suggestion="Retry with the same idempotency key after the wait period.",
    ),
}

@app.exception_handler(HTTPException)
async def structured_error_handler(request: Request, exc: HTTPException):
    if isinstance(exc.detail, dict) and "error_code" in exc.detail:
        return JSONResponse(
            status_code=exc.status_code,
            content=exc.detail,
        )
    # Fallback for unstructured errors
    return JSONResponse(
        status_code=exc.status_code,
        content={
            "error_code": "unknown_error",
            "message": str(exc.detail),
            "retryable": False,
            "suggestion": "Check the request parameters and try again.",
        },
    )
```
Three fields that change agent behavior:

- `retryable`: Boolean. Agents check this before deciding to retry. Without it, agents retry everything, including errors that will never succeed.
- `retry_after_seconds`: Integer. Tells the agent exactly how long to wait. Without it, agents use exponential backoff or guess.
- `suggestion`: String. The LLM reads this and uses it to decide the next action. "Reduce the amount or add funds via POST /customers/{id}/balance" gives the agent a concrete recovery path.
This turns a dead-end error into a decision tree the agent can navigate.
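The consuming side of that decision tree can be sketched in a few lines. This is a hypothetical agent-side helper (`next_action` is not from any framework); the field names match the error envelope above.

```python
# Agent-side sketch of consuming the structured error envelope.
# next_action() is a hypothetical name; the field names match the
# APIError model from the server code above.
def next_action(error: dict) -> str:
    if not error.get("retryable", False):
        # Permanent failure: surface the suggestion instead of retrying
        return f"abort: {error.get('suggestion', 'no recovery path')}"
    wait = error.get("retry_after_seconds") or 1
    return f"retry after {wait}s"

assert next_action(
    {"error_code": "rate_limited", "retryable": True, "retry_after_seconds": 30}
) == "retry after 30s"
assert next_action(
    {"error_code": "insufficient_funds", "retryable": False,
     "suggestion": "Reduce the amount"}
).startswith("abort")
```

Without the `retryable` field, the first branch cannot exist, and the agent's only options are retrying blindly or giving up.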
4. Confirmation Endpoints for Destructive Actions
Agents chain actions. A prompt like "clean up old data" can trigger an agent to call DELETE /users/{id} in a loop without asking anyone. One bad filter and you lose production data.
Destructive operations need a two-step process: prepare, then confirm.
```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import uuid
import time

app = FastAPI()

pending_deletions: dict[str, dict] = {}
CONFIRMATION_TTL_SECONDS = 300  # 5 minutes

class DeletionPreview(BaseModel):
    confirmation_token: str
    action: str
    affected_resources: list[str]
    affected_count: int
    expires_at: float
    warning: str

@app.post(
    "/users/bulk-delete/preview",
    summary="Preview a bulk deletion. Returns affected resources. Does NOT delete anything.",
)
async def preview_bulk_delete(
    filter_status: str,
    older_than_days: int,
) -> DeletionPreview:
    # Query what WOULD be deleted (read-only operation)
    affected = [f"user_{i}" for i in range(3)]  # Replace with real query
    token = str(uuid.uuid4())
    preview = DeletionPreview(
        confirmation_token=token,
        action="bulk_delete_users",
        affected_resources=affected,
        affected_count=len(affected),
        expires_at=time.time() + CONFIRMATION_TTL_SECONDS,
        warning=f"This will permanently delete {len(affected)} users. "
        f"Confirm within 5 minutes or the token expires.",
    )
    pending_deletions[token] = {
        "filter_status": filter_status,
        "older_than_days": older_than_days,
        "preview": preview.model_dump(),
        "created_at": time.time(),
    }
    return preview

@app.post(
    "/users/bulk-delete/confirm",
    summary="Execute a previewed bulk deletion. Requires a valid confirmation token.",
)
async def confirm_bulk_delete(confirmation_token: str):
    if confirmation_token not in pending_deletions:
        raise HTTPException(
            status_code=404,
            detail={
                "error_code": "invalid_token",
                "message": "Confirmation token not found or expired.",
                "retryable": False,
                "suggestion": "Call POST /users/bulk-delete/preview to get a new token.",
            },
        )
    pending = pending_deletions[confirmation_token]
    if time.time() > pending["preview"]["expires_at"]:
        del pending_deletions[confirmation_token]
        raise HTTPException(
            status_code=410,
            detail={
                "error_code": "token_expired",
                "message": "Confirmation token has expired.",
                "retryable": False,
                "suggestion": "Call POST /users/bulk-delete/preview to generate a new token.",
            },
        )
    # Execute the actual deletion
    del pending_deletions[confirmation_token]
    return {"status": "deleted", "count": pending["preview"]["affected_count"]}
```
The preview endpoint is a read-only operation. It returns what would happen without doing it. The confirmation endpoint executes — but only with a valid, unexpired token.
This gives the agent (or the human reviewing agent actions) a checkpoint. The token expiration prevents stale confirmations from executing hours later when the data has changed.
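The client side of this checkpoint can also enforce its own blast-radius limit before confirming. A hedged sketch: `run_bulk_delete`, `MAX_SAFE_DELETIONS`, and the `call` transport are all hypothetical names; `fake_call` stands in for a real HTTP client hitting the endpoints above.

```python
# Client-side sketch of the two-step flow (hypothetical helper names;
# `call` stands in for a real HTTP client hitting the endpoints above).
MAX_SAFE_DELETIONS = 10

def run_bulk_delete(call, filter_status: str, older_than_days: int) -> dict:
    preview = call("POST", "/users/bulk-delete/preview",
                   {"filter_status": filter_status, "older_than_days": older_than_days})
    # Checkpoint: refuse to confirm if the blast radius is too large
    if preview["affected_count"] > MAX_SAFE_DELETIONS:
        return {"status": "aborted", "reason": preview["warning"]}
    return call("POST", "/users/bulk-delete/confirm",
                {"confirmation_token": preview["confirmation_token"]})

# Fake transport to exercise the flow without a server
def fake_call(method, path, body):
    if path.endswith("/preview"):
        return {"confirmation_token": "tok_1", "affected_count": 3,
                "warning": "This will permanently delete 3 users."}
    return {"status": "deleted", "count": 3}

print(run_bulk_delete(fake_call, "inactive", 90))
```

The server-side token gate and the client-side count check are independent: either one alone stops the "one bad filter" scenario, and together they cover both a buggy agent and a buggy query.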
5. Rate Limiting With Retry-After Headers
Agents hit rate limits more than humans do. A human clicks a button once. An agent calls your endpoint in a loop until a condition is met.
Most rate limiters return 429 Too Many Requests with no guidance. The agent retries immediately, gets another 429, retries again. This creates a thundering herd.
```python
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse
import time

app = FastAPI()

# Simple per-client rate limiter (use Redis in production)
rate_limit_store: dict[str, list[float]] = {}
REQUESTS_PER_MINUTE = 60

@app.middleware("http")
async def rate_limit_middleware(request: Request, call_next):
    # Fall back to the client IP when no API key is sent
    fallback = request.client.host if request.client else "anonymous"
    client_id = request.headers.get("X-API-Key", fallback)
    now = time.time()
    window_start = now - 60

    # Clean old entries and count recent requests
    requests = rate_limit_store.get(client_id, [])
    requests = [t for t in requests if t > window_start]

    if len(requests) >= REQUESTS_PER_MINUTE:
        oldest_in_window = min(requests)
        retry_after = int(oldest_in_window + 60 - now) + 1
        return JSONResponse(
            status_code=429,
            headers={
                "Retry-After": str(retry_after),
                "X-RateLimit-Limit": str(REQUESTS_PER_MINUTE),
                "X-RateLimit-Remaining": "0",
                "X-RateLimit-Reset": str(int(oldest_in_window + 60)),
            },
            content={
                "error_code": "rate_limited",
                "message": f"Rate limit exceeded. {REQUESTS_PER_MINUTE} requests per minute.",
                "retryable": True,
                "retry_after_seconds": retry_after,
                "suggestion": f"Wait {retry_after} seconds before retrying.",
            },
        )

    requests.append(now)
    rate_limit_store[client_id] = requests

    response = await call_next(request)
    response.headers["X-RateLimit-Limit"] = str(REQUESTS_PER_MINUTE)
    response.headers["X-RateLimit-Remaining"] = str(
        REQUESTS_PER_MINUTE - len(requests)
    )
    return response
```
Four headers that agents need on every response:

| Header | Purpose |
|---|---|
| `Retry-After` | Seconds until the agent should retry. Agents that respect this stop hammering your server. |
| `X-RateLimit-Limit` | Total requests allowed per window. Agents use this to pace themselves. |
| `X-RateLimit-Remaining` | Requests left in the current window. Agents can preemptively slow down before hitting zero. |
| `X-RateLimit-Reset` | Unix timestamp when the window resets. Agents can schedule their next batch. |
The Retry-After header is an HTTP standard (RFC 9110, Section 10.2.3). Most HTTP client libraries and agent frameworks parse it automatically. If you only add one header, make it this one.
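An agent that reads all four headers can pace itself instead of slamming into the limit. A hedged sketch of that client-side logic; `pacing_delay` is a hypothetical helper, and the even-spreading policy is one reasonable choice, not a standard.

```python
# Agent-side pacing sketch (hypothetical helper): read the rate-limit
# headers from the last response and compute how long to wait before
# the next call, slowing down before the remaining budget hits zero.
def pacing_delay(headers: dict, now: float) -> float:
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining <= 0:
        # Hard stop: honor Retry-After exactly
        return float(headers.get("Retry-After", "1"))
    reset = float(headers.get("X-RateLimit-Reset", now))
    # Spread the remaining budget evenly over the rest of the window
    return max((reset - now) / remaining, 0.0)

now = 1_000.0
# 10 requests left, window resets in 30 seconds -> pace at 3s per call
assert pacing_delay(
    {"X-RateLimit-Remaining": "10", "X-RateLimit-Reset": "1030"}, now
) == 3.0
# Budget exhausted -> wait the full Retry-After
assert pacing_delay({"X-RateLimit-Remaining": "0", "Retry-After": "30"}, now) == 30.0
```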
The Pattern Behind the Patterns
All five patterns share one principle: make the implicit explicit.
Human developers infer intent from context, read between the lines of error messages, and know not to retry a payment twice. AI agents do none of that. They operate on what your API explicitly tells them.
- Explicit descriptions in your spec (Pattern 1)
- Explicit idempotency guarantees (Pattern 2)
- Explicit error recovery paths (Pattern 3)
- Explicit confirmation gates (Pattern 4)
- Explicit rate limit boundaries (Pattern 5)
The good news: every pattern here makes your API better for human consumers too. Structured errors, idempotency keys, and rate limit headers are things your human developers have been asking for. Agents just forced the issue.
Start with Pattern 3 (structured errors) and Pattern 5 (rate limiting headers). These require the least refactoring and prevent the most agent-caused incidents. Then add idempotency keys to your payment and mutation endpoints. The confirmation pattern is for high-risk operations where the cost of a wrong action is permanent.
Your API was designed for humans. The agents are already calling it. Make the implicit explicit before they learn the hard way.
Follow @klement_gunndu for more AI architecture content. We're building in public.
Top comments (4)
Solid post. We're building a self-hosted AI OS where the agent consumes our API via skill files, so the context is a bit different — but two things are going straight into our backlog:
- Adding `retryable: boolean` to our error envelope. We have structured errors already but never tell the agent explicitly whether to retry.
- Idempotency keys on `POST /emails/send`, the one endpoint where a duplicate would actually hurt.

"Make the implicit explicit" is a great framing.
The retryable boolean addition is exactly the kind of thing that saves agents from burning tokens on retry loops that will never succeed. Most error envelopes stop at the status code and message — telling the agent whether the failure is transient or permanent eliminates an entire class of wasted calls.
Idempotency keys on POST /emails/send is the right call too. That is the pattern where the cost of a duplicate is asymmetric — a duplicate database write is annoying, a duplicate email erodes user trust. Good instinct to prioritize it there first.
If we optimize APIs specifically for AI agents (shorter payloads, descriptive errors), aren't we creating a 'shadow API layer' alongside our main REST/GraphQL services? How do we maintain both without doubling the engineering effort?
Good question. In practice it doesn't have to be a separate API layer. The changes I described — structured error envelopes, idempotency keys, pagination tokens — benefit human consumers too. The difference is making implicit contracts explicit. A retryable field helps frontend retry logic just as much as it helps an agent. Where it diverges is response verbosity: agents often want flatter, denser payloads while UIs want nested display-ready structures. Content negotiation (Accept headers or query params like ?format=agent) on the same endpoints handles that without maintaining two separate services.