DEV Community

Mukunda Rao Katta
Mukunda Rao Katta

Posted on

Make Your Agent Idempotent: Run It Twice, Get the Same Result

The user submitted the same form twice. Both requests hit your agent. The agent created two orders, sent two confirmation emails, and charged the customer twice.

This is the idempotency problem. It is not new to LLM agents — it is a fundamental distributed systems challenge. But LLM agents have two aggravating factors: they take longer to run (more time for duplicates to arrive) and their tool calls often have real-world side effects (emails, payments, records).

agentidemp-py gives your agent idempotency keys.


The Shape of the Fix

from agentidemp_py import IdempotencyStore, AlreadyProcessed

store = IdempotencyStore(path="./idempotency-keys.jsonl")

def handle_request(request_id: str, payload: dict) -> dict:
    # Check if we've seen this request before
    if store.is_processed(request_id):
        raise AlreadyProcessed(request_id)

    # Mark as in-progress before doing work
    store.mark_in_progress(request_id)

    try:
        result = run_agent(payload)
        store.mark_complete(request_id, result=result)
        return result
    except Exception as e:
        store.mark_failed(request_id, error=str(e))
        raise
Enter fullscreen mode Exit fullscreen mode

First request: processes normally, marks complete with result. Second request: is_processed() returns True, raises AlreadyProcessed. No duplicate processing.


What It Does NOT Do

agentidemp-py does not make your tools idempotent. If your tools send emails or charge payments, those still need to be idempotent at the tool level. The store prevents duplicate agent runs, not duplicate external API calls within the same run.

It does not handle concurrent duplicate requests racing each other. The mark_in_progressmark_complete sequence is not atomic across processes. For concurrent deduplication, you need a distributed lock (Redis SET NX, database row lock).

It does not expire keys automatically. Old idempotency keys accumulate. Add a cleanup job that removes keys older than your retention policy.


Inside the Library

The JSONL store records request state:

{"id": "req-001", "state": "complete", "result": {...}, "ts": 1748107200}
{"id": "req-002", "state": "in_progress", "ts": 1748107201}
{"id": "req-003", "state": "failed", "error": "timeout", "ts": 1748107202}
Enter fullscreen mode Exit fullscreen mode
class IdempotencyStore:
    def __init__(self, path: str):
        self._path = Path(path)
        self._cache: dict[str, dict] = {}
        self._load()

    def _load(self) -> None:
        if self._path.exists():
            for line in self._path.read_text().splitlines():
                if line.strip():
                    record = json.loads(line)
                    self._cache[record["id"]] = record

    def is_processed(self, request_id: str) -> bool:
        record = self._cache.get(request_id)
        return record is not None and record["state"] == "complete"

    def mark_in_progress(self, request_id: str) -> None:
        self._write({"id": request_id, "state": "in_progress", "ts": time.time()})

    def mark_complete(self, request_id: str, result=None) -> None:
        self._write({"id": request_id, "state": "complete", "result": result, "ts": time.time()})
Enter fullscreen mode Exit fullscreen mode

The cache avoids re-reading the file on every check. The cache is loaded once on init and updated with every write.

States: in_progress requests can be retried (the agent might have crashed mid-run). complete requests return the cached result. failed requests can be retried with a new request_id or explicitly cleared.


When to Use It

Use it whenever your agent handles requests that may be submitted more than once. User-facing web APIs with retry buttons. Event-driven systems where the same event can be delivered multiple times (at-least-once delivery). Webhook handlers.

The request ID should come from the caller, not be generated by your agent. If the caller submits the same logical request twice, they should use the same request ID. Your agent uses the ID to detect duplicates.

The result caching is a bonus: if the caller asks "what happened to request req-001?", return the cached result without re-running.


Install

pip install git+https://github.com/MukundaKatta/agentidemp-py

# Also available on PyPI
pip install agentidemp-py
Enter fullscreen mode Exit fullscreen mode
from agentidemp_py import IdempotencyStore, AlreadyProcessed
from fastapi import FastAPI, HTTPException

app = FastAPI()
store = IdempotencyStore(path="./idempotency-keys.jsonl")

@app.post("/agent/run")
async def run_agent_endpoint(
    request_id: str,
    payload: dict,
) -> dict:
    if store.is_processed(request_id):
        # Return cached result
        cached = store.get_result(request_id)
        return {"status": "already_processed", "result": cached}

    store.mark_in_progress(request_id)

    try:
        result = await run_agent(payload)
        store.mark_complete(request_id, result=result)
        return {"status": "complete", "result": result}
    except Exception as e:
        store.mark_failed(request_id, error=str(e))
        raise HTTPException(status_code=500, detail=str(e))
Enter fullscreen mode Exit fullscreen mode

Sibling Libraries

Library What it solves
tool-call-dedup Session-scoped exact-duplicate tool call detection
agent-resume Checkpoint/resume long-running batch jobs
agent-shadow-mode Record-not-execute for pre-deployment validation
llm-batch-coalesce Coalesce concurrent duplicate LLM calls
agent-run-id Run IDs for correlating agent runs

The duplicate-prevention stack: agentidemp-py for request-level deduplication, llm-batch-coalesce for inflight LLM call deduplication, tool-call-dedup for within-session tool call deduplication.


What's Next

TTL-based expiry: IdempotencyStore(path=..., ttl_days=7) that excludes records older than the TTL from is_processed() checks. After 7 days, an old request ID can be reused.

Distributed backend: RedisIdempotencyStore(redis_url=..., ttl_seconds=86400) that uses SET NX for atomic "first write wins" semantics. This handles concurrent duplicate requests that the JSONL store cannot.

Result compression: for large results, the JSONL file can grow quickly. Optional gzip compression on the result field would reduce storage by 60-80% for typical JSON results.


Built as part of the agent-stack family: composable Python primitives for production LLM agents.

Top comments (0)