DEV Community

Mukunda Rao Katta

Stop Generating Fresh UUIDs as Idempotency Keys

The double-charge that should not have happened

The error budget alert fired at 2am. A rate-limit spike. The on-call engineer woke up, checked the logs, and saw a cascade of 429s from the model API.

The retry logic kicked in. That was expected. Exponential backoff, jitter, the usual pattern. The requests eventually went through.

By morning, the processing looked fine. But the cost report for that hour was wrong. Almost double what it should have been.

The investigation took two hours. The culprit was a single line written three months earlier.

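import uuid

# client: your API client, created elsewhere
def call_with_retry(prompt):
    for attempt in range(3):
        key = str(uuid.uuid4())  # BUG: new key every attempt
        try:
            return client.messages.create(
                ...,
                idempotency_key=key
            )
        except Exception:  # e.g. a 429 or a dropped connection
            if attempt == 2:
                raise  # out of attempts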

The key was regenerated on every retry attempt, so every retry was treated as a new request. The 429 had interrupted some calls mid-submission, so the API had received the request but not fully acknowledged it. On retry, a fresh key meant a fresh submission, and some prompts ran twice.

The fix is obvious in hindsight: derive the key from the request content, not from a random source. The same input should always produce the same key. Retries are safe because the key is already known to the server.

That is what agentidemp-py does.


The shape of the fix

import json

from agentidemp import sha256_hex

# client and prompt come from your own code

# Build your request once
request = {
    "model": "claude-sonnet-4-6",
    "messages": [{"role": "user", "content": prompt}],
}

# Derive the key from the content
key = sha256_hex(json.dumps(request, sort_keys=True))
# Returns a key of the form ik_4a3b2c1d...

# Now retry safely
for attempt in range(3):
    try:
        response = client.messages.create(
            **request,
            idempotency_key=key  # same key every attempt
        )
        break  # success: stop retrying
    except Exception:  # narrow this to your client's retryable errors
        if attempt == 2:
            raise  # out of attempts

The key is the same on every retry because the input is the same. The API server can recognize the duplicate and return the original result without running the model again.
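To make the server side of that claim concrete, here is a minimal sketch of a toy in-memory server that replays stored results for keys it has already seen. Everything in it is an assumption for illustration: `ToyServer` is hypothetical, and `content_key` is a stand-in that mimics the `ik_` plus SHA-256 hex shape described in this post, not the library's actual implementation.

```python
import hashlib
import json
import uuid


def content_key(text: str) -> str:
    """Stand-in for a content-derived key: 'ik_' + SHA-256 hex (assumed format)."""
    return "ik_" + hashlib.sha256(text.encode("utf-8")).hexdigest()


class ToyServer:
    """Hypothetical server that remembers results by idempotency key."""

    def __init__(self):
        self.results = {}
        self.model_runs = 0

    def create(self, request: dict, idempotency_key: str) -> str:
        if idempotency_key in self.results:
            # Duplicate key: replay the stored result, no model run.
            return self.results[idempotency_key]
        self.model_runs += 1  # only unseen keys cost a model run
        result = f"response #{self.model_runs}"
        self.results[idempotency_key] = result
        return result


request = {"model": "example-model", "messages": [{"role": "user", "content": "hi"}]}

# Fresh UUID per attempt: the server sees three unrelated requests.
random_server = ToyServer()
for _ in range(3):
    random_server.create(request, str(uuid.uuid4()))

# Content-derived key: the server sees one request, submitted three times.
stable_server = ToyServer()
stable_key = content_key(json.dumps(request, sort_keys=True))
replies = [stable_server.create(request, stable_key) for _ in range(3)]

print(random_server.model_runs)  # 3
print(stable_server.model_runs)  # 1
```

A real provider's deduplication window and storage will differ; the sketch only shows why the key has to be identical across attempts for any of it to work.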

Three helpers are available depending on what you need.

SHA-256 hex key from any string:

from agentidemp import sha256_hex

key = sha256_hex("some content here")
# Returns a key of the form ik_4a3b2c1d... (illustrative value)

Deterministic UUIDv5 from a namespace and name:

from agentidemp import uuidv5
import uuid

key = uuidv5(uuid.NAMESPACE_URL, "https://example.com/job/42")
# Returns a UUID string, same value every time for the same inputs
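The determinism property is easy to check with the standard library's `uuid.uuid5`. Whether the library's `uuidv5` wraps this function or returns exactly the same string is an assumption this post does not confirm, so treat the snippet as a demonstration of the UUIDv5 behavior only.

```python
import uuid

name = "https://example.com/job/42"

# Same namespace and name always hash to the same UUID.
first = uuid.uuid5(uuid.NAMESPACE_URL, name)
second = uuid.uuid5(uuid.NAMESPACE_URL, name)

# A different name gives a different UUID.
other = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/job/43")

print(first == second)  # True
print(first == other)   # False
print(first.version)    # 5
```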

Scoped key that ties a key to a logical boundary:

import json

from agentidemp import scoped

# request is the dict built in the first example
key = scoped("catalog-refresh-2026-05-24", json.dumps(request, sort_keys=True))
# catalog-refresh-2026-05-24:ik_4a3b...

The scoped helper is useful when the same content might be processed in different contexts and you want the keys to be distinct. A product description job in May and the same job in June should have different keys even if the input text is the same.
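The May and June case can be sketched with a stand-in. `scoped_key` below is hypothetical: it copies the `<scope>:ik_<hash>` shape from the example comment above, and the real `scoped` helper may format or hash its input differently.

```python
import hashlib


def scoped_key(scope: str, content: str) -> str:
    """Stand-in: '<scope>:ik_<sha256 hex>' (format assumed, not the library's code)."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    return f"{scope}:ik_{digest}"


description = "Hand-thrown ceramic mug, 350 ml."

may_key = scoped_key("catalog-refresh-2026-05", description)
june_key = scoped_key("catalog-refresh-2026-06", description)

# Different scope, so the full keys differ.
print(may_key != june_key)  # True

# The content part after the scope is identical in both.
print(may_key.split(":", 1)[1] == june_key.split(":", 1)[1])  # True
```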


What it does NOT do

Before going further, here is what this library is not.

  • It does not make the HTTP call. You pass the key to your own client. The library only generates the key.
  • It does not store or track keys. There is no database, no file, no in-memory cache. If you want to check whether a key has been used before, that is your infrastructure.
  • It does not enforce idempotency on the API side. Whether the downstream API honors the key is up to that API. Anthropic's Messages API does. Other APIs vary.
  • It does not replace request deduplication on your side. If you are building a distributed system where multiple workers might submit the same job, you still need a coordination layer. This library just ensures every worker generates the same key for the same input.

Inside the library: the mandatory ik_ prefix

Every key returned by sha256_hex and scoped starts with ik_.

This is intentional and not configurable.

The reason is operability. In a real system, logs and databases contain many kinds of hex strings and UUIDs. Request IDs, trace IDs, session tokens, database row identifiers, content hashes. They often look the same.

When you see ik_4a3b2c1d... in a log line, the prefix tells you three things immediately:

  1. This is an idempotency key.
  2. It was generated by this library (or its Rust sibling, which uses the same prefix and produces byte-identical output for the same input).
  3. You can search for this exact string in your retry logs, cost reports, and API provider dashboards.

Without the prefix, a SHA-256 hex string looks like any other SHA-256 hex string. You have to look at context to understand what it represents. With the prefix, the string is self-describing.
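As a rough illustration of the search step, the sketch below pulls idempotency keys out of mixed log lines with a regular expression. The log lines and field names are made up, and the pattern assumes keys are `ik_` followed by lowercase hex characters.

```python
import re

# Hypothetical log lines; field names are invented for illustration.
log_lines = [
    "trace=9f8e7d6c5b4a request=4a3b2c1d9e8f status=429",
    "trace=1a2b3c4d5e6f key=ik_4a3b2c1d9e8f7a6b status=200",
    "session=ik4a3b2c1d content_hash=4a3b2c1d9e8f7a6b",
]

# Assumed key shape: the ik_ prefix followed by lowercase hex.
IK_PATTERN = re.compile(r"\bik_[0-9a-f]+\b")

found = [m.group(0) for line in log_lines for m in IK_PATTERN.finditer(line)]
print(found)  # ['ik_4a3b2c1d9e8f7a6b']
```

The bare hex strings in the first and third lines are ignored, which is the point of the prefix: the pattern does not need to know anything about the surrounding fields.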

The Rust port (agentidemp-rs) uses the same ik_ prefix and the same SHA-256 algorithm. A key generated in Python and a key generated in Rust for the same input are character-for-character identical. That matters in mixed-language systems where Python services and Rust services might need to agree on keys without sharing state.


When this is useful

Retry loops with exponential backoff. Generate the key before the first attempt. Pass the same key on every retry. Any partial submission on a failed attempt will be recognized as a duplicate on the next.
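A minimal sketch of that loop, under stated assumptions: `FlakyApi` is a hypothetical client whose replies are lost twice after the request has been received, and the key is built inline as `ik_` plus a SHA-256 hex digest to stand in for the library call.

```python
import hashlib
import random
import time


class FlakyApi:
    """Hypothetical API: records each request, then drops the first two replies."""

    def __init__(self):
        self.seen_keys = []
        self.calls = 0

    def submit(self, payload: str, idempotency_key: str) -> str:
        self.calls += 1
        self.seen_keys.append(idempotency_key)
        if self.calls < 3:
            raise ConnectionError("reply lost after the request was received")
        return "ok"


def call_with_retry(api, payload: str, max_attempts: int = 4, base_delay: float = 0.01) -> str:
    # Derive the key once, before the first attempt.
    key = "ik_" + hashlib.sha256(payload.encode("utf-8")).hexdigest()
    for attempt in range(max_attempts):
        try:
            return api.submit(payload, idempotency_key=key)
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts
            # Exponential backoff with full jitter.
            time.sleep(random.uniform(0, base_delay * 2 ** attempt))


api = FlakyApi()
result = call_with_retry(api, "summarize this document")
print(result, api.calls, len(set(api.seen_keys)))  # ok 3 1
```

Three submissions reach the API, but they all carry one key, so a server that honors idempotency keys has what it needs to treat them as a single request.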

Distributed job queues. Multiple workers pull the same job from a queue. Each worker generates the key from the job content. All workers produce the same key. The API sees the first submission and deduplicates the rest.

Batch custom IDs. The Anthropic Batches API takes a custom_id field per request. That field functions as an idempotency key. Use sha256_hex on the request content to generate a stable custom_id. If you resubmit a batch that was interrupted, the same requests get the same IDs.

Cost reporting and deduplication. When you log idempotency keys alongside costs, you can detect duplicates after the fact by looking for repeated keys in the log. This is easier when the keys are derived from content rather than random, because you can regenerate the expected key and verify it.
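One way to run that after-the-fact check is to count keys in the cost log. The log format and the dollar amounts below are invented for illustration; only the idea of grouping billed calls by key comes from the text.

```python
from collections import Counter

# Hypothetical cost log: (idempotency key, cost in USD) per billed call.
cost_log = [
    ("ik_aaa111", 0.012),
    ("ik_bbb222", 0.030),
    ("ik_aaa111", 0.012),  # same key billed a second time
    ("ik_ccc333", 0.007),
]

# Keys that appear on more than one billed call.
counts = Counter(key for key, _ in cost_log)
duplicates = {key: n for key, n in counts.items() if n > 1}

# Sum the cost of every billed call beyond the first for each key.
first_seen = set()
wasted = 0.0
for key, cost in cost_log:
    if key in first_seen:
        wasted += cost
    first_seen.add(key)

print(duplicates)        # {'ik_aaa111': 2}
print(round(wasted, 3))  # 0.012
```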

Cross-service consistency. Two services that need to agree on an idempotency key for the same operation can both derive the key from the shared input without a coordination round-trip.


When NOT to use this

When you want each submission to be treated as unique regardless of content. Some workflows intentionally submit the same prompt multiple times, for example to sample multiple responses for evaluation. In that case, a content-derived key is wrong because it would cause the API to return the same response every time. Use a random key or include a run ID in the key input.
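Including a run ID in the key input might look like the sketch below. `content_key` is a stand-in for the library's hashing (assumed `ik_` plus SHA-256 hex), and the `sample` field is a hypothetical name for the run ID.

```python
import hashlib
import json


def content_key(text: str) -> str:
    """Stand-in for a content-derived key (assumed format, not the library's code)."""
    return "ik_" + hashlib.sha256(text.encode("utf-8")).hexdigest()


request = {"model": "example-model", "messages": [{"role": "user", "content": "Name a color."}]}


def sample_key(request: dict, sample: int) -> str:
    # The sample index is hashed together with the request content.
    return content_key(json.dumps({"request": request, "sample": sample}, sort_keys=True))


sample_keys = [sample_key(request, i) for i in range(3)]

# Each sample gets its own key, so none is deduplicated against another.
print(len(set(sample_keys)))  # 3

# A retry of sample 1 still reuses sample 1's key.
print(sample_keys[1] == sample_key(request, 1))  # True
```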

When the request content is not stable before submission. If your request object is assembled lazily or includes timestamps, counters, or other volatile fields, the key will be different each time. Either stabilize the input first or exclude the volatile fields before hashing.
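Excluding volatile fields before hashing can be as simple as filtering the dict. The field names in `VOLATILE_FIELDS` are hypothetical, the key format is the assumed `ik_` plus SHA-256 hex, and this sketch only looks at top-level fields.

```python
import hashlib
import json

# Hypothetical names for fields that change between attempts.
VOLATILE_FIELDS = {"timestamp", "attempt", "request_id"}


def stable_key(request: dict) -> str:
    # Drop top-level volatile fields, then hash a sorted-key serialization.
    stable = {k: v for k, v in request.items() if k not in VOLATILE_FIELDS}
    canonical = json.dumps(stable, sort_keys=True)
    return "ik_" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


first = {"model": "example-model", "prompt": "hello", "timestamp": 1716500000, "attempt": 1}
retry = {"model": "example-model", "prompt": "hello", "timestamp": 1716500042, "attempt": 2}
changed = {"model": "example-model", "prompt": "goodbye", "timestamp": 1716500042, "attempt": 2}

print(stable_key(first) == stable_key(retry))    # True: only volatile fields differ
print(stable_key(first) == stable_key(changed))  # False: the prompt changed
```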

As a replacement for real idempotency infrastructure. In a high-volume distributed system, you need a coordination layer that tracks which keys have been committed. This library helps you generate consistent keys. It does not replace the tracking layer.


Install

pip install agentidemp-py

Zero dependencies. Python 3.8+.

from agentidemp import sha256_hex, uuidv5, scoped

The full source is at MukundaKatta/agentidemp-py. 33 tests cover the key format, cross-language compatibility with the Rust port, and edge cases around empty strings, Unicode, and large inputs.


Siblings

These libraries compose well with agentidemp-py in an agent stack.

| Lib | Boundary | Repo |
| --- | --- | --- |
| llm-message-hash-py | Canonical hash of the full LLM request including model, tools, and system. Use this when you want a hash that covers the entire request structure, not just the content. | MukundaKatta/llm-message-hash-py |
| llm-retry-py | Where the idempotency key gets used. Pass the key to the API call inside the retry loop. | MukundaKatta/llm-retry-py |
| cachebench | Cache miss-aware retry. Uses idempotency keys to track which misses triggered a real API call versus a cached response. | MukundaKatta/cachebench |
| anthropic-batch-kit | The custom_id field in each batch request is an idempotency key. Use sha256_hex to generate stable IDs before submitting a batch. | MukundaKatta/anthropic-batch-kit |

What is next

The library is stable. The main thing missing is a helper for building a canonical string from an Anthropic request dict before hashing it. Right now you do json.dumps(request, sort_keys=True) manually, which works but is easy to get wrong if you forget sort_keys=True or include non-deterministic fields.
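The sort_keys pitfall is easy to reproduce with the standard library alone. The two dicts below are equal as Python objects but were built with their keys in a different order, which is enough to change the default serialization and therefore any hash computed from it.

```python
import json

a = {"model": "example-model", "messages": [{"role": "user", "content": "hi"}]}
b = {"messages": [{"content": "hi", "role": "user"}], "model": "example-model"}

# Equal as dicts.
print(a == b)  # True

# Default serialization follows insertion order, so the strings differ.
print(json.dumps(a) == json.dumps(b))  # False

# sort_keys=True sorts keys at every nesting level, so the strings match.
print(json.dumps(a, sort_keys=True) == json.dumps(b, sort_keys=True))  # True
```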

A canonical_request_string(request_dict) helper that handles field exclusion and stable serialization would make the common case more obvious. That might land in a follow-up release or as part of llm-message-hash-py, which already owns the canonicalization logic for full request hashing.

For now, the pattern to remember is simple: derive the key before the first attempt, pass the same key on every retry, and put ik_ in your grep patterns.
