Idempotent Tool Calls: The Retry Safety Net Agents Forget


Book: AI Agents Pocket Guide: Patterns for Building Autonomous Systems with LLMs
Me: xgabriel.com | GitHub

Your agent calls a charge_card tool. The provider takes the request, processes the payment, and starts writing the 200 OK back. Halfway through, the connection drops. Your HTTP client raises a timeout. Your retry logic does what you told it to: it fires the same request again. The card gets charged twice.

The customer was charged once for a thing they bought, and once for a network hiccup neither of you saw. Nobody decided this. The retry that protects you from transient failures is the same retry that double-charges when the failure happened after the side effect, not before.

This is the oldest distributed-systems trap, and agents walk straight into it. An agent loop retries more than a normal client does. The model retries on tool errors. Your tool wrapper retries on transient HTTP failures. Your orchestrator retries the whole turn when it times out. Three layers of retry, each reasonable on its own, stacked on top of a tool that was never made safe to call twice.
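To see how the layers compound, here is a back-of-the-envelope sketch. The per-layer attempt limits are illustrative assumptions, not figures from any real system. The point is only that nested retries multiply rather than add.

```python
from math import prod

# Hypothetical attempt limits for each retry layer (assumed for illustration).
attempts_per_layer = {
    "model_retries_tool": 3,
    "wrapper_retries_http": 3,
    "orchestrator_retries_turn": 3,
}


def worst_case_calls(limits):
    """Upper bound on downstream calls when every layer exhausts its attempts."""
    return prod(limits.values())


print(worst_case_calls(attempts_per_layer))  # 27
```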

The fix is idempotency keys. They are boring, they are well understood by every payments API you already use, and almost no agent codebase wires them through the tool layer.

Why a timeout is not a failure

The mistake hiding under double-charges is treating a timeout as "it didn't happen." A timeout means you stopped waiting. It says nothing about what the server did.

There are three outcomes when a tool call times out:

The request never reached the server. Safe to retry.
The request reached the server, did the work, and the response got lost on the way back. Not safe to retry.
The request is still running on the server right now. Retrying races your own in-flight call.

From the caller's side these are indistinguishable. You got a timeout in all three. So you cannot decide whether to retry based on the error. You have to make the retry safe regardless of which one happened.

That is what idempotency gives you. The server recognizes a repeated request and returns the result of the first one instead of doing the work again.
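The second outcome is easy to reproduce in a toy setting. The sketch below uses a made-up in-memory FlakyPaymentServer (a hypothetical stand-in, not a real provider) that records the charge and then loses its first response, paired with a naive retry loop.

```python
class FlakyPaymentServer:
    """Toy server: performs the side effect, then loses the first response."""

    def __init__(self):
        self.charges = []
        self._drop_next_response = True

    def charge(self, amount):
        self.charges.append(amount)  # the side effect lands first
        if self._drop_next_response:
            self._drop_next_response = False
            raise TimeoutError("response lost after the charge was recorded")
        return {"status": "ok", "amount": amount}


def charge_with_naive_retry(server, amount, max_attempts=3):
    # Retries on timeout with no way to tell whether the work already happened.
    for _ in range(max_attempts):
        try:
            return server.charge(amount)
        except TimeoutError:
            continue
    raise TimeoutError("gave up after retries")


server = FlakyPaymentServer()
result = charge_with_naive_retry(server, 500)
print(result["status"], len(server.charges))  # ok 2
```

The caller sees one clean success. The server holds two charges.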

What idempotency keys actually do

An idempotency key is a unique token the caller generates and attaches to a request. The server stores the key alongside the result of the first request that used it. When a request arrives with a key it has already seen, the server skips the work and replays the stored result.
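As a minimal sketch of that server-side behavior, assuming a single process and an in-memory store (the class and field names here are illustrative, and a real provider would persist the keys and guard against concurrent requests):

```python
class IdempotentPaymentServer:
    """Toy server that replays the stored result for a key it has seen."""

    def __init__(self):
        self.charges = []
        self._results = {}  # idempotency key -> result of the first request

    def charge(self, key, amount):
        if key in self._results:
            # Seen before: skip the work and replay the first result.
            return self._results[key]
        self.charges.append(amount)
        result = {"charge_id": len(self.charges), "amount": amount}
        self._results[key] = result
        return result


server = IdempotentPaymentServer()
first = server.charge("key-1", 500)
retry = server.charge("key-1", 500)
print(first == retry, len(server.charges))  # True 1
```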

Stripe has done this for years. You send Idempotency-Key: <uuid> on a POST /charges, and if you retry with the same key, you get the original charge back, not a second one. The key is the caller's promise: "these two requests are the same intent, do the work once."

The hard part for agents is not the server side. Most serious APIs already support this. The hard part is generating the key in the right place and threading it through the layers of retry so all of them carry the same one.

Generate the key once, per intent

The key has to be stable across every retry of the same logical action, and different across distinct actions. That means you generate it at the point where the intent is formed, not at the point where the request is sent.

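In an agent loop, the intent is formed when the model emits a tool call. Each tool-call block the model produces has its own ID. That ID is the natural anchor for the idempotency key, because it is stable: when your orchestrator retries the turn by replaying the persisted tool-call block, the ID comes with it. The stability holds only for replays. If a retry re-samples the model, the new tool-call block gets a new ID, so persist the block before executing it and retry from that record.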

import hashlib
import json


def idempotency_key(session_id, tool_call):
    # Stable across retries of the same tool-call block.
    # tool_call.id is assigned by the model per call.
    raw = f"{session_id}:{tool_call.id}"
    return hashlib.sha256(raw.encode()).hexdigest()

Folding session_id in keeps two different sessions from colliding if a model reuses a call ID. Hashing keeps the key a fixed length and hides the internal IDs from the downstream provider.

What you do not want is to generate a fresh UUID inside the HTTP retry loop. That is the bug. A new UUID per attempt means every retry looks like a new intent to the server, which is exactly the double-charge you were trying to prevent.
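The difference shows up as soon as you compare the keys three attempts would send. This sketch repeats the idempotency_key helper so it runs on its own, and uses a SimpleNamespace as a hypothetical stand-in for a tool-call block. The IDs are made up.

```python
import hashlib
import uuid
from types import SimpleNamespace


def idempotency_key(session_id, tool_call):
    raw = f"{session_id}:{tool_call.id}"
    return hashlib.sha256(raw.encode()).hexdigest()


# Hypothetical tool-call block; only the .id attribute matters here.
tool_call = SimpleNamespace(id="toolu_01", input={"amount": 500})

# The bug: a fresh key minted inside the retry loop, one per attempt.
per_attempt_keys = [str(uuid.uuid4()) for _ in range(3)]

# The fix: a key derived from the intent, identical on every attempt.
per_intent_keys = [idempotency_key("session-42", tool_call) for _ in range(3)]

print(len(set(per_attempt_keys)))  # 3
print(len(set(per_intent_keys)))   # 1
```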

Threading the key through the tool call

Here is a tool runner that derives the key from the tool call and passes it to the underlying API. The key travels with the request, so every retry of that request carries the same one.

import httpx


def run_charge_tool(session_id, tool_call, http):
    key = idempotency_key(session_id, tool_call)
    args = tool_call.input

    resp = http.post(
        "https://api.example-pay.com/v1/charges",
        headers={"Idempotency-Key": key},
        json={
            "amount": args["amount"],
            "currency": args["currency"],
            "customer": args["customer_id"],
        },
    )
    resp.raise_for_status()
    return resp.json()

If this call times out and your client retries, the retry sends the same Idempotency-Key. The payments provider recognizes it and returns the first charge instead of making a second. As long as the provider still remembers the key (providers typically expire stored keys after a retention window), the double-charge cannot happen, even though the timeout looked identical to a real failure.

The key insight: the safety lives in the key, not in the retry logic. You do not have to make the retry smarter. You make the call safe to repeat, then retry as aggressively as you like.
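A retry helper built on that idea can stay simple. The sketch below is one possible shape, not a definitive implementation: call_with_retries, its parameters, and the backoff values are all assumptions made for illustration. The one property that matters is that the key is an argument to the loop, never something created inside it.

```python
import time


def call_with_retries(send, key, max_attempts=4, base_delay=0.1,
                      retry_on=(TimeoutError,), sleep=time.sleep):
    """Call send(key) until it succeeds, reusing the same key every time."""
    last_error = None
    for attempt in range(max_attempts):
        try:
            return send(key)
        except retry_on as exc:
            last_error = exc
            if attempt < max_attempts - 1:
                # Exponential backoff between attempts.
                sleep(base_delay * 2 ** attempt)
    raise last_error
```

With httpx, one option is to pass retry_on=(httpx.TimeoutException,) and a send function that posts with the key in the Idempotency-Key header.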

When the tool is yours, you own the dedup table

External APIs that support idempotency keys hand you deduplication for free. Internal tools (the ones your team wrote, the database writes, the "create ticket" and "send email" actions) usually do not have it. You have to add it.

The pattern is a dedup table keyed on the idempotency key, written in the same transaction as the side effect.

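import json


def create_ticket(key, payload, db):
    # One transaction covers the dedup lookup, the side effect, and the
    # dedup record, so all of it commits or rolls back together.
    with db.transaction() as tx:
        existing = tx.execute(
            "SELECT result FROM tool_dedup WHERE key = %s",
            (key,),
        ).fetchone()
        if existing:
            # Replay the stored result in the same shape the first call
            # returned. A text column hands back the JSON string; a
            # JSON-typed column may already be decoded by the driver.
            stored = existing["result"]
            if isinstance(stored, str):
                return json.loads(stored)
            return stored

        ticket_id = tx.execute(
            "INSERT INTO tickets (title, body) "
            "VALUES (%s, %s) RETURNING id",
            (payload["title"], payload["body"]),
        ).fetchone()["id"]

        result = {"ticket_id": ticket_id}
        tx.execute(
            "INSERT INTO tool_dedup (key, result) "
            "VALUES (%s, %s)",
            (key, json.dumps(result)),
        )
        return result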

Both writes happen in one transaction. If the connection drops after the commit, the result is already stored under the key, so the retry reads it back instead of creating a second ticket. If the connection drops before the commit, nothing happened, and the retry does the work cleanly. There is no window where the side effect lands but the dedup record does not.

A unique constraint on the key column is your backstop. If two retries race and both pass the SELECT, the second INSERT fails on the constraint, and you catch it and read the winning row. The database is the arbiter, not your application code.
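Here is a runnable sketch of that backstop. It swaps in Python's built-in sqlite3 for the database layer used above so it can run anywhere, and the helper names (make_db, lookup, write_ticket) are illustrative. The primary key on tool_dedup.key is the unique constraint, and a failed dedup insert rolls back the ticket insert with it.

```python
import json
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tickets (id INTEGER PRIMARY KEY, title TEXT, body TEXT)"
    )
    # PRIMARY KEY on key is the unique constraint that arbitrates races.
    conn.execute(
        "CREATE TABLE tool_dedup (key TEXT PRIMARY KEY, result TEXT NOT NULL)"
    )
    return conn


def lookup(conn, key):
    row = conn.execute(
        "SELECT result FROM tool_dedup WHERE key = ?", (key,)
    ).fetchone()
    return json.loads(row[0]) if row else None


def write_ticket(conn, key, payload):
    # "with conn" commits on success and rolls back on any exception,
    # so a rejected dedup insert also undoes the ticket insert.
    with conn:
        cur = conn.execute(
            "INSERT INTO tickets (title, body) VALUES (?, ?)",
            (payload["title"], payload["body"]),
        )
        result = {"ticket_id": cur.lastrowid}
        conn.execute(
            "INSERT INTO tool_dedup (key, result) VALUES (?, ?)",
            (key, json.dumps(result)),
        )
    return result


def create_ticket(conn, key, payload):
    stored = lookup(conn, key)
    if stored is not None:
        return stored
    try:
        return write_ticket(conn, key, payload)
    except sqlite3.IntegrityError:
        # Another attempt won the race after our SELECT; return its row.
        return lookup(conn, key)
```

Calling write_ticket twice with the same key stands in for the race loser: the second call raises sqlite3.IntegrityError and leaves exactly one ticket behind.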

Classify your tools before you trust them

Not every tool needs this. Spending the effort everywhere is wasted. The split is by side effect.

Read-only tools (search, lookup, "get weather", "fetch the user's orders") are naturally idempotent. Calling them twice changes nothing, even if the answer itself shifts between calls. Retry them freely, no key needed.

Write tools that mutate state — charge a card, send an email, create a record, post a message — are the ones that bite. Every one of these needs an idempotency key threaded through every retry layer, or a dedup table if it is internal.

The dangerous middle is the write tool that looks safe. "Append a row to a log." "Increment a counter." "Add an item to a cart." Run those twice and you get duplicate log lines, an off-by-one counter, two items in the cart. They do not page anyone, so they rot quietly until someone notices the numbers drift.
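The counter case fits in a few lines. This is a toy in-memory sketch with made-up names. A real counter would keep the applied keys in the same durable store as the value.

```python
class Counter:
    def __init__(self):
        self.value = 0
        self._applied = set()  # idempotency keys already counted

    def increment(self):
        # Looks harmless, but every retry adds one more.
        self.value += 1
        return self.value

    def increment_once(self, key):
        # Counts each key a single time, however often it is retried.
        if key not in self._applied:
            self._applied.add(key)
            self.value += 1
        return self.value


naive = Counter()
naive.increment()
naive.increment()  # the retry
print(naive.value)  # 2

safe = Counter()
safe.increment_once("call-1")
safe.increment_once("call-1")  # the retry
print(safe.value)  # 1
```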

Make the classification explicit in your tool definitions. Put a flag on each tool that says whether it mutates state, and have the runner check it before deciding whether a key is required. A write tool with no idempotency strategy should fail loudly in code review, not silently in production.
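One way to encode that flag, sketched with hypothetical names (ToolSpec, check_registry, and the strategy labels are assumptions, not part of any framework). Running the check at import time or in CI is what makes the failure loud.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToolSpec:
    name: str
    mutates_state: bool
    # How repeats are made safe, e.g. "provider_key" or "dedup_table".
    idempotency: Optional[str] = None


def check_registry(tools):
    """Raise if any write tool lacks an idempotency strategy."""
    missing = [
        t.name for t in tools if t.mutates_state and t.idempotency is None
    ]
    if missing:
        raise ValueError(
            f"write tools without an idempotency strategy: {missing}"
        )


def requires_key(tool):
    # The runner derives and attaches a key only for write tools.
    return tool.mutates_state


TOOLS = [
    ToolSpec("search_orders", mutates_state=False),
    ToolSpec("charge_card", mutates_state=True, idempotency="provider_key"),
    ToolSpec("create_ticket", mutates_state=True, idempotency="dedup_table"),
]

check_registry(TOOLS)  # passes; add a bare write tool and it raises
```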

Where this fits in the loop

Idempotency keys sit alongside the budgets and circuit breakers that already guard a serious agent loop. The breaker stops a tool that fails repeatedly. The budget stops a loop that runs too long or costs too much. Idempotency keys stop the quieter damage: the side effect that fires twice because a retry could not tell a lost response from a failed request.

You will not see it in your traces the way you see a runaway loop. A double-charge looks like two successful tool calls. The trace is green. The customer's statement is not. That gap (green dashboard, wrong outcome) is exactly where this class of bug lives, and it is why the fix has to be structural rather than something you watch for.

Wire the key from the tool-call ID, thread it through every retry layer, and give your internal write tools a dedup table. Then your retries can be as aggressive as your reliability story needs, because repeating a call is no longer the same as repeating the work.

If you want the wider pattern set this sits inside — bounded loops, structured stop reasons, side-effect-safe tools, and the recovery paths between them — the AI Agents Pocket Guide works through them with the same production framing. The chapter on tool design covers idempotency and dedup as a default, not an afterthought.