Mukunda Rao Katta

Posted on May 25

Make Your Agent's API Calls Idempotent Before You Need To

#hermeschallenge #ai #python #agents

Here is a scenario that happens more often than people admit. Your agent calls a payment API. The call succeeds on the server side. The response gets lost in transit. Your retry logic fires. The payment goes through twice.

The retry is not the bug. The retry is correct behavior. The problem is that the underlying call was not idempotent.

This post is about two separate tools for two separate problems. agentidemp-py generates and manages idempotency keys for external API calls. tool-result-cache caches tool return values so the LLM agent does not repeat the same call within a session while the cached entry is still fresh. You often need both, but you need them for different reasons.

The Double-Send Problem

LLM agents retry for two reasons. Either the model decides to retry because the tool returned an error, or llm-retry-py fires because the HTTP call timed out or got rate-limited.

In the second case, the tool might have run fine. The timeout happened after the remote processed the request but before the response arrived. Without idempotency, you get a duplicate side effect.
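To see the failure in isolation, here is a minimal simulation using only the standard library. `FlakyPaymentServer` and `call_with_retry` are hypothetical names invented for this sketch; they stand in for a real payment API and for a retry wrapper such as llm-retry-py, and do not reflect either one's actual interface.

```python
class FlakyPaymentServer:
    """Toy server: performs every charge, but loses the first response."""

    def __init__(self):
        self.charges = []  # side effects that actually happened
        self._drop_next_response = True

    def charge(self, customer_id: str, amount_cents: int) -> dict:
        # The side effect happens before the response is sent.
        self.charges.append((customer_id, amount_cents))
        if self._drop_next_response:
            self._drop_next_response = False
            raise TimeoutError("response lost in transit")
        return {"status": "succeeded", "charge_number": len(self.charges)}


def call_with_retry(fn, max_attempts: int = 3):
    """Retry on timeout. This is correct behavior on its own."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TimeoutError:
            if attempt == max_attempts:
                raise


server = FlakyPaymentServer()
result = call_with_retry(lambda: server.charge("cus_123", 5000))

print(result["status"])     # succeeded
print(len(server.charges))  # 2: the customer was charged twice
```

The caller sees one clean success. Only the server's record shows the duplicate, which is why this class of bug tends to surface in billing reports rather than in logs.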

This matters for:

Payment and billing APIs
Email sends
Webhook deliveries
Database writes without upsert semantics
Anthropic message batches (charged per request submitted)

It does not matter for:

Read-only queries
Idempotent-by-nature APIs (most GET requests, S3 puts with the same key)
Operations where a duplicate is harmless

Two Approaches, Different Jobs

Idempotency keys work when the remote API supports them. You attach a stable key to the request. If the server sees the same key twice, it returns the original result instead of running the operation again. Stripe, Anthropic batches, and some payment processors support this.
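The server-side half of that contract can be sketched in a few lines. `IdempotentPaymentServer` is a hypothetical toy, not how any real provider implements it; real services also expire stored keys and validate that a replayed key carries the same parameters.

```python
class IdempotentPaymentServer:
    """Toy server that remembers the result for each idempotency key."""

    def __init__(self):
        self.charges = []          # side effects that actually happened
        self._results_by_key = {}  # idempotency key -> original result

    def charge(self, customer_id: str, amount_cents: int, idempotency_key: str) -> dict:
        # Seen this key before: replay the stored result, run nothing.
        if idempotency_key in self._results_by_key:
            return self._results_by_key[idempotency_key]
        self.charges.append((customer_id, amount_cents))
        result = {"status": "succeeded", "charge_number": len(self.charges)}
        self._results_by_key[idempotency_key] = result
        return result


server = IdempotentPaymentServer()
first = server.charge("cus_123", 5000, idempotency_key="charge:sess-1:abc")
replay = server.charge("cus_123", 5000, idempotency_key="charge:sess-1:abc")
other = server.charge("cus_123", 5000, idempotency_key="charge:sess-2:abc")

print(first == replay)      # True: same key, original result returned
print(len(server.charges))  # 2: one charge per distinct key
```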

Result caching works regardless of what the remote supports. You cache the return value locally keyed on the call arguments. If the same tool is called with the same arguments, you return the cached result. The remote never sees the duplicate request.

# pip install agentidemp-py tool-result-cache

# Pattern 1: Idempotency keys with agentidemp-py
import stripe  # assumes stripe.api_key is configured elsewhere
from agentidemp import IdempKey

# Scope the key to a session and tool name
# Same session + same tool + same args = same key, every time
def charge_customer(customer_id: str, amount_cents: int, session_id: str) -> dict:
    key = IdempKey.for_args(
        scope=f"charge:{session_id}",
        args={"customer_id": customer_id, "amount_cents": amount_cents},
    )
    response = stripe.PaymentIntent.create(
        amount=amount_cents,
        currency="usd",
        customer=customer_id,
        idempotency_key=str(key),
    )
    return {"payment_intent_id": response.id, "status": response.status}


# Pattern 2: Result caching with tool-result-cache
from tool_result_cache import cache_tool

@cache_tool(ttl_seconds=300, max_size=512)
def lookup_customer(customer_id: str) -> dict:
    # Within a 5-minute window, the same customer_id
    # returns the cached result without hitting the DB.
    return db.customers.get(customer_id)


# Pattern 3: Combine both for write operations
from agentidemp import IdempKey
from tool_result_cache import cache_tool

@cache_tool(ttl_seconds=60)  # cache first, avoid even generating the key
def send_confirmation_email(order_id: str, email: str, session_id: str) -> dict:
    key = IdempKey.for_args(
        scope=f"email:{session_id}",
        args={"order_id": order_id, "email": email},
    )
    result = email_service.send(
        to=email,
        template="order_confirmation",
        context={"order_id": order_id},
        idempotency_key=str(key),
    )
    return {"sent": True, "message_id": result.id}

When to Use Keys

Use idempotency keys when:

The remote API explicitly supports them (Stripe, some payment processors, Anthropic batch API)
The operation has money, email, or message delivery attached
You are using llm-retry-py and the retry policy covers timeouts and 5xx errors

The key scope matters. Scope too broadly (global key per tool name) and two different sessions collide. Scope too narrowly (random UUID per call) and you get no idempotency benefit. The right scope is usually: session ID + tool name + argument fingerprint.

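agentidemp-py generates the fingerprint for you from a dict of arguments. It hashes them with SHA-256, and two calls with the same arguments get the same key regardless of dict ordering. The hash alone does not give you that property: the arguments have to be serialized in a canonical form (for example, with sorted keys) before hashing.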
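For intuition, a scoped, order-independent key can be built with the standard library alone. This is a sketch of the general technique, not agentidemp-py's actual implementation; `make_idempotency_key` is a hypothetical helper, and it assumes the arguments are JSON-serializable.

```python
import hashlib
import json


def make_idempotency_key(scope: str, args: dict) -> str:
    """Build a stable key from a scope and an argument fingerprint."""
    # sort_keys gives a canonical form, so dict ordering cannot change the hash.
    canonical = json.dumps(args, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(f"{scope}|{canonical}".encode("utf-8")).hexdigest()
    return f"{scope}:{digest[:32]}"


k1 = make_idempotency_key("charge:sess-1", {"customer_id": "cus_123", "amount_cents": 5000})
k2 = make_idempotency_key("charge:sess-1", {"amount_cents": 5000, "customer_id": "cus_123"})
k3 = make_idempotency_key("charge:sess-2", {"customer_id": "cus_123", "amount_cents": 5000})
k4 = make_idempotency_key("charge:sess-1", {"customer_id": "cus_123", "amount_cents": 5001})

print(k1 == k2)  # True: same scope and arguments, different dict order
print(k1 == k3)  # False: different session scope
print(k1 == k4)  # False: different arguments
```

Including the scope in the hashed payload, and not only in the prefix, means a truncated digest still differs between sessions.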

When to Use Cache

Use tool-result-cache when:

The tool is read-only or the remote is not idempotency-aware
You want to avoid any duplicate network call, not just duplicate side effects
The result is stable within a session window (customer lookup, config fetch, catalog query)

The cache is in-process and in-memory by default. It does not survive process restarts. For multi-process agents, you need an external store (Redis, memcached). The library supports custom backends via the store= parameter.

from tool_result_cache import cache_tool
import redis

store = redis.Redis(host="localhost", port=6379)

@cache_tool(ttl_seconds=120, store=store)
def get_product_price(sku: str) -> dict:
    return catalog_api.price(sku)
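What a TTL cache decorator does internally can be sketched as follows. `ttl_cache` is a hypothetical stand-in written for illustration, not tool-result-cache's code: it has no LRU eviction and no external store, and it assumes all arguments are hashable. The `clock` parameter is an addition of this sketch so that expiry can be demonstrated without sleeping.

```python
import functools
import time


def ttl_cache(ttl_seconds: float, clock=time.monotonic):
    """Cache return values per argument tuple for ttl_seconds."""

    def decorator(fn):
        entries = {}  # argument key -> (expires_at, value)

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            key = (args, tuple(sorted(kwargs.items())))
            now = clock()
            hit = entries.get(key)
            if hit is not None and hit[0] > now:
                return hit[1]
            # Exceptions propagate here, so failed calls are never cached.
            value = fn(*args, **kwargs)
            entries[key] = (now + ttl_seconds, value)
            return value

        return wrapper

    return decorator


fake_now = [0.0]  # controllable clock for the demo
calls = []


@ttl_cache(ttl_seconds=120, clock=lambda: fake_now[0])
def get_product_price(sku: str) -> dict:
    calls.append(sku)
    return {"sku": sku, "price_cents": 1999}


get_product_price("sku-1")  # miss: calls the function
get_product_price("sku-1")  # hit: served from the cache
fake_now[0] = 121.0
get_product_price("sku-1")  # expired: calls the function again

print(len(calls))  # 2
```

Not caching exceptions matters for the retry case: a timed-out call leaves the cache cold, so the retry still reaches the remote.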

The Combination

Cache first, idempotency key as backup. This is the right default for write tools with retry.

The cache catches the case where the agent tries to re-run the same tool call within the same session, which happens when the model loops or when a tool returns a partial result. The cache returns instantly without generating a key.

If the cache is cold (first call, or TTL expired), the call goes out with its idempotency key attached. If the remote processed the request and the response got lost, the retry carries the same key, so the remote returns the original result instead of running the operation again. The agent gets the right answer, with no duplicate side effect.
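The whole sequence can be traced end to end with a toy setup. Everything here is hypothetical and standard-library only: `ToyEmailServer` is a stand-in for a provider that honors idempotency keys, the dict `_cache` replaces tool-result-cache (with no TTL), and `with_retry` is a simplified local wrapper rather than llm-retry-py's.

```python
import hashlib
import json


class ToyEmailServer:
    """Honors idempotency keys, but loses the first response."""

    def __init__(self):
        self.emails_sent = []
        self.requests_seen = 0
        self._by_key = {}
        self._drop_first_response = True

    def send(self, to: str, order_id: str, idempotency_key: str) -> dict:
        self.requests_seen += 1
        if idempotency_key not in self._by_key:
            self.emails_sent.append((to, order_id))  # the real side effect
            self._by_key[idempotency_key] = {"message_id": f"msg_{len(self.emails_sent)}"}
        if self._drop_first_response:
            self._drop_first_response = False
            raise TimeoutError("response lost in transit")
        return self._by_key[idempotency_key]


server = ToyEmailServer()
_cache = {}  # (order_id, email, session_id) -> result


def send_confirmation_email(order_id: str, email: str, session_id: str) -> dict:
    cache_key = (order_id, email, session_id)
    if cache_key in _cache:  # cache first: no key, no network
        return _cache[cache_key]
    payload = json.dumps({"email": email, "order_id": order_id}, sort_keys=True)
    key = f"email:{session_id}:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()
    result = server.send(email, order_id, idempotency_key=key)
    _cache[cache_key] = result  # only successful calls are cached
    return result


def with_retry(fn, max_attempts: int = 3):
    def wrapper(*args, **kwargs):
        for attempt in range(1, max_attempts + 1):
            try:
                return fn(*args, **kwargs)
            except TimeoutError:
                if attempt == max_attempts:
                    raise
    return wrapper


send = with_retry(send_confirmation_email)
first = send("ord_1", "a@example.com", "sess-1")  # timeout, then keyed retry
again = send("ord_1", "a@example.com", "sess-1")  # model loops: cache hit

print(len(server.emails_sent))  # 1: one email despite the lost response
print(server.requests_seen)     # 2: the looped call never reached the server
```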

What Does Not Work

Idempotency does not help for truly stateful side effects that the server does not track by key. If your email provider does not support idempotency keys, sending twice means two emails. You need application-level deduplication: check a sent-log before sending.
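A sent-log check might look like the sketch below. `EmailProvider`, `sent_log`, and `send_once` are hypothetical names, and the log is an in-memory dict only to keep the example runnable; in practice it would need to live in durable, shared storage.

```python
class EmailProvider:
    """Toy provider with no idempotency support: every call sends."""

    def __init__(self):
        self.delivered = []

    def send(self, to: str, order_id: str) -> dict:
        self.delivered.append((to, order_id))
        return {"message_id": f"msg_{len(self.delivered)}"}


provider = EmailProvider()
sent_log = {}  # (order_id, email) -> message_id of the first send


def send_once(order_id: str, email: str) -> dict:
    """Send only if the log has no record of this order/recipient pair."""
    log_key = (order_id, email)
    if log_key in sent_log:
        return {"sent": False, "deduplicated": True, "message_id": sent_log[log_key]}
    result = provider.send(email, order_id)
    sent_log[log_key] = result["message_id"]
    return {"sent": True, "deduplicated": False, "message_id": result["message_id"]}


first = send_once("ord_1", "a@example.com")
second = send_once("ord_1", "a@example.com")

print(first["sent"], second["sent"])  # True False
print(len(provider.delivered))        # 1
```

This narrows the window but does not close it. If the provider sends the email and the response is lost before the log is written, a retry will still send again. A stricter variant records a pending entry before sending and treats unresolved entries as needing manual review.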

Caching does not help for operations where the result changes between calls. A live stock price cached for 5 minutes is a business risk, not a performance optimization.

Neither pattern covers concurrent agents. If two agents run at the same time with the same session ID and both try to charge a customer, you need distributed locking or database-level constraints. That is outside what either library provides.
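As one illustration of a database-level constraint, the sketch below uses a primary key on the idempotency key so that only one caller can claim a given operation. The `charges` table and `claim_charge` function are hypothetical, and the in-memory SQLite database is only for demonstration; concurrent agents would need a database they all share.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE charges ("
    " idempotency_key TEXT PRIMARY KEY,"
    " customer_id TEXT NOT NULL,"
    " amount_cents INTEGER NOT NULL)"
)


def claim_charge(conn: sqlite3.Connection, key: str, customer_id: str, amount_cents: int) -> bool:
    """Return True only for the caller that inserts the key first."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO charges (idempotency_key, customer_id, amount_cents)"
                " VALUES (?, ?, ?)",
                (key, customer_id, amount_cents),
            )
        return True
    except sqlite3.IntegrityError:
        # Another caller already claimed this key: do not charge.
        return False


agent_a = claim_charge(conn, "charge:sess-1:abc", "cus_123", 5000)
agent_b = claim_charge(conn, "charge:sess-1:abc", "cus_123", 5000)

print(agent_a, agent_b)  # True False
```

Only the caller that gets `True` goes on to call the payment API. The database arbitrates the race, so no in-process lock is needed.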

Quick-Start Snippet

pip install agentidemp-py tool-result-cache llm-retry-py

# Example: wrap an existing tool function
from agentidemp import IdempKey
from tool_result_cache import cache_tool
from llm_retry import with_retry, RetryConfig

config = RetryConfig(max_attempts=3, base_delay=1.0, jitter=True)

# Read tools: cache only
@cache_tool(ttl_seconds=300)
def fetch_order_status(order_id: str) -> dict:
    return orders_api.get(order_id)

# Write tools: cache + idempotency key
@cache_tool(ttl_seconds=30)
def refund_order(order_id: str, session_id: str) -> dict:
    key = IdempKey.for_args(
        scope=f"refund:{session_id}",
        args={"order_id": order_id},
    )
    return payments_api.refund(order_id, idempotency_key=str(key))

# Combine with retry
refund_with_retry = with_retry(refund_order, config=config)

Related Libraries

Library	What It Does
agentidemp-py	Scoped idempotency key generation and management
tool-result-cache	LRU+TTL in-process cache for tool call results
llm-retry-py	Exponential backoff retry with jitter
agentvet	Static agent checks before deploy
tool-side-effects-tag	Tag tools as READ/WRITE/IDEMPOTENT/DESTRUCTIVE

What's Next

Once your tools are safe to retry, the next failure mode is tool loops. The model decides a tool failed and calls it again, even though the cache would have returned the right result. tool-loop-guard catches that pattern: it monitors call counts per tool per session and raises before the loop burns through your budget.
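The counting idea can be sketched in a few lines. `LoopGuard` and `ToolLoopError` are hypothetical names for illustration and are not tool-loop-guard's actual API.

```python
from collections import Counter


class ToolLoopError(RuntimeError):
    """Raised when one tool is called too often within one session."""


class LoopGuard:
    def __init__(self, max_calls_per_tool: int = 5):
        self.max_calls = max_calls_per_tool
        self._counts = Counter()  # (session_id, tool_name) -> call count

    def record(self, session_id: str, tool_name: str) -> int:
        """Count one call and raise once the per-session limit is exceeded."""
        self._counts[(session_id, tool_name)] += 1
        count = self._counts[(session_id, tool_name)]
        if count > self.max_calls:
            raise ToolLoopError(
                f"{tool_name} called {count} times in session {session_id}"
            )
        return count


guard = LoopGuard(max_calls_per_tool=2)
guard.record("sess-1", "refund_order")
guard.record("sess-1", "refund_order")
try:
    guard.record("sess-1", "refund_order")
    tripped = False
except ToolLoopError:
    tripped = True

print(tripped)  # True: the third call in the same session exceeds the limit
```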

The tool-side-effects-tag library gives you a place to declare intent: mark a function as IDEMPOTENT and your agent framework can apply different retry policies automatically based on the tag. Combining tags with agentidemp-py and llm-retry-py gives you a complete retry-safe tool layer without manual per-tool configuration.
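A tag-driven retry policy could look roughly like this. The `SideEffect` enum, the `side_effect` decorator, and the attempt counts are all assumptions made for this sketch. Only the four tag names come from the library description above; nothing else reflects tool-side-effects-tag's real interface.

```python
from enum import Enum


class SideEffect(Enum):
    READ = "read"
    WRITE = "write"
    IDEMPOTENT = "idempotent"
    DESTRUCTIVE = "destructive"


def side_effect(tag: SideEffect):
    """Attach a side-effect tag to a tool function."""
    def decorator(fn):
        fn.side_effect = tag
        return fn
    return decorator


# Illustrative policy: only retry what is safe to repeat.
MAX_ATTEMPTS = {
    SideEffect.READ: 5,
    SideEffect.IDEMPOTENT: 3,
    SideEffect.WRITE: 1,
    SideEffect.DESTRUCTIVE: 1,
}


def max_attempts_for(fn) -> int:
    # Untagged tools get the most conservative policy.
    tag = getattr(fn, "side_effect", SideEffect.DESTRUCTIVE)
    return MAX_ATTEMPTS[tag]


@side_effect(SideEffect.IDEMPOTENT)
def refund_order(order_id: str) -> dict:
    return {"order_id": order_id, "refunded": True}


def delete_account(account_id: str) -> None:
    pass  # untagged on purpose


print(max_attempts_for(refund_order))    # 3
print(max_attempts_for(delete_account))  # 1
```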
