DEV Community

Mukunda Rao Katta


Rust: Stop Retries From Double-Submitting LLM Calls With Content-Derived Idempotency Keys

The retry logic was correct. The bug was in the key generation.

When the agent hit a 429, the retry handler waited, then resent the request. Each attempt generated a fresh UUID as the idempotency key. That UUID went into the X-Idempotency-Key header. Because the key changed on each retry, the downstream system had no way to know the requests were duplicates. It processed each one. The LLM call went through twice. The downstream action happened twice.

The fix was to derive the idempotency key from the content of the request, not from Uuid::new_v4() at call time. Same content, same key. Same key, same deduplication behavior at the server.
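Here is a std-only toy that makes the failure mode concrete. It is not code from the crate. ToyServer, content_key and side_effects_after_retry are hypothetical names. The server replays a stored response when it sees a repeated key, a counter stands in for Uuid::new_v4(), and DefaultHasher stands in for SHA-256 only because it ships with std. Its output is not guaranteed stable across Rust releases, so it is unsuitable for real keys.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Toy server that honors idempotency keys: the side effect runs once per key,
/// and a repeated key gets the stored response back.
struct ToyServer {
    responses: HashMap<String, String>,
    side_effects: u32,
}

impl ToyServer {
    fn new() -> Self {
        ToyServer { responses: HashMap::new(), side_effects: 0 }
    }

    fn handle(&mut self, key: &str, body: &str) -> String {
        if let Some(cached) = self.responses.get(key) {
            return cached.clone(); // duplicate: replay, do not re-run
        }
        self.side_effects += 1; // stands in for the LLM call / downstream action
        let response = format!("done: {body}");
        self.responses.insert(key.to_string(), response.clone());
        response
    }
}

/// Stand-in content hash (DefaultHasher instead of SHA-256, for a std-only demo).
fn content_key(body: &str) -> String {
    let mut hasher = DefaultHasher::new();
    body.hash(&mut hasher);
    format!("ik_{:016x}", hasher.finish())
}

/// Sends the same body twice, as if the first response were lost and the
/// client retried. Returns how many times the server ran the side effect.
fn side_effects_after_retry<F: FnMut(&str) -> String>(mut derive_key: F) -> u32 {
    let mut server = ToyServer::new();
    let body = "summarize ticket 17";
    for _attempt in 0..2 {
        let key = derive_key(body); // key derived per attempt, as in the bug
        server.handle(&key, body);
    }
    server.side_effects
}

fn main() {
    // Buggy: a fresh key per attempt (a counter stands in for Uuid::new_v4()).
    let mut n = 0;
    let fresh = side_effects_after_retry(|_| {
        n += 1;
        format!("ik_random_{n}")
    });
    // Fixed: the key is derived from the request content.
    let derived = side_effects_after_retry(content_key);
    println!("fresh keys: {fresh}, content keys: {derived}");
}
```

When run, the fresh-key path triggers the side effect twice and the content-derived path triggers it once. The retry loop is identical in both cases. Only the key derivation differs.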

The shape of the fix

Before:

use uuid::Uuid;

let idempotency_key = Uuid::new_v4().to_string(); // new key on every attempt

After:

use agentidemp::sha256_hex;

let payload = serde_json::json!({
    "model": "claude-sonnet-4-6",
    "messages": messages,
});

let key = sha256_hex(payload.to_string().as_bytes());
// ik_4a3b8f1c2e9d... (same for same payload, every time)

Now every retry of the same request produces the same key. A server that supports idempotency keys recognizes the duplicate and returns the stored response instead of processing the request again.

For scoped keys, where you want to namespace by user or session:

use agentidemp::scoped;

let key = scoped("user:42", payload.to_string().as_bytes());
// ik_e71d... (sha256 of "user:42:" + payload bytes)
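Hashing is deterministic, so two scoped keys are equal exactly when their pre-hash bytes are equal. That means the construction can be reasoned about without computing a hash at all. The sketch below assumes the comment above describes the layout exactly: the scope, then ':', then the payload. scoped_input and length_prefixed_input are hypothetical helpers, not crate functions. The sketch shows two things. Different scopes separate identical payloads. A plain ':' join can also be ambiguous when the scope itself contains ':', as "user:42" does.

```rust
/// Pre-hash bytes for a scoped key, assuming the layout in the comment above:
/// scope bytes, then b':', then the payload bytes.
fn scoped_input(scope: &str, payload: &[u8]) -> Vec<u8> {
    let mut bytes = Vec::with_capacity(scope.len() + 1 + payload.len());
    bytes.extend_from_slice(scope.as_bytes());
    bytes.push(b':');
    bytes.extend_from_slice(payload);
    bytes
}

/// Hypothetical unambiguous variant: "<scope byte length>:<scope>:<payload>".
/// The decimal length contains no ':', so the split point is always recoverable.
fn length_prefixed_input(scope: &str, payload: &[u8]) -> Vec<u8> {
    let mut bytes = format!("{}:{}:", scope.len(), scope).into_bytes();
    bytes.extend_from_slice(payload);
    bytes
}

fn main() {
    let payload: &[u8] = br#"{"messages":"hi"}"#;
    // Same payload, different users: different bytes, so different keys.
    println!("{}", scoped_input("user:42", payload) != scoped_input("user:43", payload));

    // Plain join: ("user:42", "x") and ("user", "42:x") produce identical bytes.
    let a = scoped_input("user:42", b"x");
    let b = scoped_input("user", b"42:x");
    println!("{} vs {}: equal = {}", String::from_utf8_lossy(&a), String::from_utf8_lossy(&b), a == b);

    // Length prefix: "7:user:42:x" vs "4:user:42:x" no longer collide.
    println!("{}", length_prefixed_input("user:42", b"x") != length_prefixed_input("user", b"42:x"));
}
```

JSON payloads start with '{', so a collision would require one payload to begin with the tail of another scope. The risk is small for JSON, but it is not zero for arbitrary bytes. A length prefix removes it entirely.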

For UUID-shaped keys, which some APIs expect:

use agentidemp::uuidv5;

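// The RFC 4122 DNS namespace (also uuid::Uuid::NAMESPACE_DNS). Every producer must use the same one.
let namespace = uuid::Uuid::parse_str("6ba7b810-9dad-11d1-80b4-00c04fd430c8").unwrap();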
let key = uuidv5(&namespace, payload.to_string().as_bytes());
// ik_xxxxxxxx-xxxx-5xxx-yxxx-xxxxxxxxxxxx (UUID v5 with ik_ prefix; version digit 5, y is 8, 9, a or b)

What it does NOT do

  • It does not handle the retry loop itself. It generates keys. Wire those keys into whatever retry mechanism you already have.
  • It does not know which fields to include or exclude from the payload before hashing. If your payload has a timestamp field that changes on every request, you need to strip it before passing to the key function, or the key will change.
  • It does not store keys or check for duplicates. Deduplication is the server's job, or your own cache's job. This crate generates the keys.
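  • It does not pad or truncate keys to fit a length limit. The length is fixed by the format: ik_ plus 64 hex characters for sha256_hex (67 in total), or ik_ plus a 36-character UUID for uuidv5 (39 in total).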
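Wiring a key into an existing retry loop comes down to one rule. Compute the key once, before the first attempt, and send that same value on every attempt. The std-only sketch below assumes a hypothetical send closure that stands in for the HTTP call. In real code, that closure would set X-Idempotency-Key. call_with_retries and Attempt are illustrative names, not the API of this crate or of llm-retry-rs.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Outcome of one attempt in this sketch.
enum Attempt {
    RateLimited,  // e.g. HTTP 429: worth retrying
    Done(String), // final response body
}

/// Retries `send` up to `max_attempts` times with exponential backoff.
/// The key is an input, computed once by the caller, so every attempt
/// carries the same value.
fn call_with_retries<F>(key: &str, max_attempts: u32, mut send: F) -> Option<String>
where
    F: FnMut(&str, u32) -> Attempt,
{
    for attempt in 1..=max_attempts {
        match send(key, attempt) {
            Attempt::Done(body) => return Some(body),
            Attempt::RateLimited if attempt < max_attempts => {
                // 10 ms, 20 ms, 40 ms, ... (kept tiny for the demo)
                sleep(Duration::from_millis(10 * 2u64.pow(attempt - 1)));
            }
            Attempt::RateLimited => {} // last attempt: no point sleeping
        }
    }
    None
}

fn main() {
    // Placeholder; in real code this would come from e.g. sha256_hex(...).
    let key = "ik_example";
    let mut sent_keys = Vec::new();
    let result = call_with_retries(key, 3, |k, attempt| {
        sent_keys.push(k.to_string()); // record the key each attempt carried
        if attempt < 3 { Attempt::RateLimited } else { Attempt::Done("ok".to_string()) }
    });
    println!("{result:?}; keys sent: {sent_keys:?}");
}
```

The important detail is that the key is computed outside the loop. If the key were derived inside the loop from the same content, it would still match on every attempt. A random key inside the loop reproduces the original bug.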

Inside the lib

The ik_ prefix is not optional.

That is the core design choice. Every function in the crate returns a string that starts with ik_. You cannot disable it. When you see ik_4a3b... in a log line, you know it came from this library. When you see req_8f2c... or a bare UUID, you know it came from somewhere else.

This matters when you are reading logs from a service that mixes random request IDs, trace IDs, and idempotency keys in the same fields. The prefix makes keys identifiable without reading the surrounding context.

request_id: req_7f3a9b2c
trace_id:   0197a3b4-8c2d-4e1f-9a5b-3c7d8e0f1a2b
idem_key:   ik_4a3b8f1c2e9d6a8b3f7c9e1d4a2b5c8e...
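The prefix is fixed, so a log pipeline can flag idempotency keys by shape. Below is a hypothetical checker. It assumes only the two formats this post describes: ik_ plus 64 lowercase hex characters, or ik_ plus a lowercase hyphenated UUID. It assumes nothing about the crate's internals.

```rust
/// True if `c` is a lowercase hexadecimal digit.
fn is_lower_hex(c: char) -> bool {
    matches!(c, '0'..='9' | 'a'..='f')
}

/// Hypothetical check: does `s` have one of the two documented key shapes?
fn looks_like_ik_key(s: &str) -> bool {
    let Some(rest) = s.strip_prefix("ik_") else {
        return false;
    };
    // sha256_hex / scoped: 64 lowercase hex characters.
    if rest.len() == 64 && rest.chars().all(is_lower_hex) {
        return true;
    }
    // uuidv5: 8-4-4-4-12 hex groups separated by hyphens.
    let groups: Vec<&str> = rest.split('-').collect();
    groups.len() == 5
        && groups
            .iter()
            .zip([8, 4, 4, 4, 12])
            .all(|(g, n)| g.len() == n && g.chars().all(is_lower_hex))
}

fn main() {
    let full_hex = format!("ik_{}", "4a3b".repeat(16)); // 64 hex chars
    for value in [
        "req_7f3a9b2c",
        "0197a3b4-8c2d-4e1f-9a5b-3c7d8e0f1a2b",
        full_hex.as_str(),
        "ik_886313e1-3b8a-5372-9b90-0c9aee199e5d",
    ] {
        println!("{value}: {}", looks_like_ik_key(value));
    }
}
```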

The sha256_hex function hashes the input bytes and returns ik_ + 64 lowercase hex chars. It is a thin wrapper around a pure-Rust SHA-256 implementation. No C bindings, no OpenSSL dependency.
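The sketch below shows byte for byte what "ik_ + 64 lowercase hex chars" means. It is a from-scratch, std-only SHA-256 (FIPS 180-4) with that formatting on top. It is an illustration, not the crate's code, and sha256_hex_sketch is a hypothetical name. In production you would use a vetted pure-Rust crate such as sha2 rather than hand-rolled crypto.

```rust
/// SHA-256 round constants (FIPS 180-4, section 4.2.2).
const K: [u32; 64] = [
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
];

/// Initial hash values (FIPS 180-4, section 5.3.3).
const H0: [u32; 8] = [
    0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19,
];

/// Plain SHA-256 over `data`, for illustration only.
fn sha256(data: &[u8]) -> [u8; 32] {
    // Pad: one 0x80 byte, zeros up to 56 mod 64, then the bit length as big-endian u64.
    let mut msg = data.to_vec();
    msg.push(0x80);
    while msg.len() % 64 != 56 {
        msg.push(0);
    }
    msg.extend_from_slice(&((data.len() as u64) * 8).to_be_bytes());

    let mut h = H0;
    for block in msg.chunks_exact(64) {
        // Message schedule.
        let mut w = [0u32; 64];
        for (i, word) in block.chunks_exact(4).enumerate() {
            w[i] = u32::from_be_bytes([word[0], word[1], word[2], word[3]]);
        }
        for i in 16..64 {
            let s0 = w[i - 15].rotate_right(7) ^ w[i - 15].rotate_right(18) ^ (w[i - 15] >> 3);
            let s1 = w[i - 2].rotate_right(17) ^ w[i - 2].rotate_right(19) ^ (w[i - 2] >> 10);
            w[i] = w[i - 16].wrapping_add(s0).wrapping_add(w[i - 7]).wrapping_add(s1);
        }
        // Compression: 64 rounds over working variables a..h.
        let [mut a, mut b, mut c, mut d, mut e, mut f, mut g, mut hh] = h;
        for i in 0..64 {
            let big_s1 = e.rotate_right(6) ^ e.rotate_right(11) ^ e.rotate_right(25);
            let ch = (e & f) ^ (!e & g);
            let t1 = hh.wrapping_add(big_s1).wrapping_add(ch).wrapping_add(K[i]).wrapping_add(w[i]);
            let big_s0 = a.rotate_right(2) ^ a.rotate_right(13) ^ a.rotate_right(22);
            let maj = (a & b) ^ (a & c) ^ (b & c);
            let t2 = big_s0.wrapping_add(maj);
            hh = g;
            g = f;
            f = e;
            e = d.wrapping_add(t1);
            d = c;
            c = b;
            b = a;
            a = t1.wrapping_add(t2);
        }
        for (slot, v) in h.iter_mut().zip([a, b, c, d, e, f, g, hh]) {
            *slot = slot.wrapping_add(v);
        }
    }

    let mut out = [0u8; 32];
    for (chunk, word) in out.chunks_exact_mut(4).zip(h) {
        chunk.copy_from_slice(&word.to_be_bytes());
    }
    out
}

/// Hypothetical stand-in for sha256_hex: "ik_" + 64 lowercase hex characters.
fn sha256_hex_sketch(input: &[u8]) -> String {
    let mut key = String::with_capacity(3 + 64);
    key.push_str("ik_");
    for byte in sha256(input) {
        key.push_str(&format!("{byte:02x}"));
    }
    key
}

fn main() {
    println!("{}", sha256_hex_sketch(b"abc"));
}
```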

The uuidv5 function produces a UUID v5 from the given namespace UUID and input bytes, then prefixes with ik_. UUID v5 is SHA-1-based and deterministic given the same namespace and name.
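UUID v5 (RFC 4122, section 4.3) works in four steps. It hashes the 16 namespace bytes followed by the name bytes with SHA-1. It keeps the first 16 bytes of the digest. It then overwrites the version nibble with 5 and the top variant bits with 10. The std-only sketch below spells this out. sha1 and uuid_v5 are hypothetical helpers, and real code should call the uuid crate's Uuid::new_v5 (behind its v5 feature) instead. Prepending ik_ to the result gives the shape the post describes.

```rust
/// The RFC 4122 DNS namespace, 6ba7b810-9dad-11d1-80b4-00c04fd430c8, as bytes.
const NAMESPACE_DNS: [u8; 16] = [
    0x6b, 0xa7, 0xb8, 0x10, 0x9d, 0xad, 0x11, 0xd1, 0x80, 0xb4, 0x00, 0xc0, 0x4f, 0xd4, 0x30, 0xc8,
];

/// Plain SHA-1 (FIPS 180-4), for illustration only.
fn sha1(data: &[u8]) -> [u8; 20] {
    let mut msg = data.to_vec();
    msg.push(0x80);
    while msg.len() % 64 != 56 {
        msg.push(0);
    }
    msg.extend_from_slice(&((data.len() as u64) * 8).to_be_bytes());

    let mut h: [u32; 5] = [0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476, 0xc3d2e1f0];
    for block in msg.chunks_exact(64) {
        let mut w = [0u32; 80];
        for (i, word) in block.chunks_exact(4).enumerate() {
            w[i] = u32::from_be_bytes([word[0], word[1], word[2], word[3]]);
        }
        for i in 16..80 {
            w[i] = (w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16]).rotate_left(1);
        }
        let [mut a, mut b, mut c, mut d, mut e] = h;
        for (i, &wi) in w.iter().enumerate() {
            let (f, k): (u32, u32) = match i {
                0..=19 => ((b & c) | (!b & d), 0x5a827999),
                20..=39 => (b ^ c ^ d, 0x6ed9eba1),
                40..=59 => ((b & c) | (b & d) | (c & d), 0x8f1bbcdc),
                _ => (b ^ c ^ d, 0xca62c1d6),
            };
            let temp = a.rotate_left(5).wrapping_add(f).wrapping_add(e).wrapping_add(k).wrapping_add(wi);
            e = d;
            d = c;
            c = b.rotate_left(30);
            b = a;
            a = temp;
        }
        for (slot, v) in h.iter_mut().zip([a, b, c, d, e]) {
            *slot = slot.wrapping_add(v);
        }
    }

    let mut out = [0u8; 20];
    for (chunk, word) in out.chunks_exact_mut(4).zip(h) {
        chunk.copy_from_slice(&word.to_be_bytes());
    }
    out
}

/// Name-based UUID v5: SHA-1(namespace || name) truncated to 16 bytes,
/// then version and variant bits set. Returns the lowercase hyphenated form.
fn uuid_v5(namespace: &[u8; 16], name: &[u8]) -> String {
    let mut input = namespace.to_vec();
    input.extend_from_slice(name);
    let digest = sha1(&input);

    let mut b = [0u8; 16];
    b.copy_from_slice(&digest[..16]);
    b[6] = (b[6] & 0x0f) | 0x50; // version 5 in the high nibble of byte 6
    b[8] = (b[8] & 0x3f) | 0x80; // RFC 4122 variant: top bits 10

    let hex: String = b.iter().map(|x| format!("{x:02x}")).collect();
    format!("{}-{}-{}-{}-{}", &hex[0..8], &hex[8..12], &hex[12..16], &hex[16..20], &hex[20..])
}

fn main() {
    let key = format!("ik_{}", uuid_v5(&NAMESPACE_DNS, b"python.org"));
    println!("{key}");
}
```

Python's uuid documentation shows uuid.uuid5(uuid.NAMESPACE_DNS, 'python.org') as 886313e1-3b8a-5372-9b90-0c9aee199e5d, which makes a convenient cross-check for any implementation.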

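Cross-language consistency: the Python port (agentidemp-py) uses the same prefix and the same hashing algorithm. Given the same input bytes, the Rust and Python versions produce the same ik_ key. That matters when a Rust producer and a Python consumer need to agree on the key for a shared deduplication store. The catch is the word "bytes". serde_json emits compact JSON, while Python's json.dumps puts a space after every , and : by default and escapes non-ASCII characters. The same logical payload therefore hashes differently unless both sides serialize identically. One option is json.dumps(payload, separators=(",", ":"), sort_keys=True, ensure_ascii=False) on the Python side, with matching key order in Rust.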

When useful

Any retry loop that calls an LLM or downstream API. The 429 case is the obvious one. Network timeouts are another: if the request succeeded but the response was lost in transit, the retry should be deduplicated at the server, not reprocessed.

Batch submission with idempotency support. You submit a batch, it fails partway through, and you resubmit. When keys are derived from each item's content, a server that honors them skips the items it already processed.
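The batch case can be simulated in a few lines. This is a toy, not the crate. submit_batch and item_key are hypothetical names, a HashSet plays the server's record of seen keys, and DefaultHasher again stands in for SHA-256 because it ships with std.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Stand-in content key (DefaultHasher instead of SHA-256, for a std-only demo).
fn item_key(item: &str) -> String {
    let mut h = DefaultHasher::new();
    item.hash(&mut h);
    format!("ik_{:016x}", h.finish())
}

/// Toy batch endpoint that honors per-item keys. `fail_after` simulates a
/// crash after that many newly processed items. Returns what it processed.
fn submit_batch(seen: &mut HashSet<String>, items: &[&str], fail_after: Option<usize>) -> Vec<String> {
    let mut processed = Vec::new();
    for item in items {
        if fail_after == Some(processed.len()) {
            break; // simulated failure partway through
        }
        if seen.insert(item_key(item)) {
            processed.push(item.to_string()); // new key: do the work
        } // existing key: already done, skip it
    }
    processed
}

fn main() {
    let batch = ["doc-1", "doc-2", "doc-3", "doc-4"];
    let mut seen = HashSet::new();
    let first = submit_batch(&mut seen, &batch, Some(2)); // crashes after two items
    let resubmit = submit_batch(&mut seen, &batch, None); // whole batch sent again
    println!("first run: {first:?}, resubmission: {resubmit:?}");
    // first run: ["doc-1", "doc-2"], resubmission: ["doc-3", "doc-4"]
}
```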

Audit logging. If you store the idempotency key alongside the result, you can later correlate any duplicate submissions and see which ones were deduplicated versus processed.
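Correlating duplicates is a group-by over the stored keys. This minimal sketch assumes a hypothetical log of (key, outcome) pairs, where the outcome records whether the server processed or deduplicated the request. Any key with more than one Processed entry is exactly the double submission this post set out to prevent.

```rust
use std::collections::BTreeMap;

/// What the server did with one request, as a hypothetical audit record.
#[derive(Clone, Copy)]
enum Outcome {
    Processed,
    Deduplicated,
}

/// Groups records by key and keeps only keys seen more than once,
/// mapped to (processed, deduplicated) counts.
fn duplicate_report(log: &[(&str, Outcome)]) -> BTreeMap<String, (u32, u32)> {
    let mut counts: BTreeMap<String, (u32, u32)> = BTreeMap::new();
    for &(key, outcome) in log {
        let entry = counts.entry(key.to_string()).or_insert((0, 0));
        match outcome {
            Outcome::Processed => entry.0 += 1,
            Outcome::Deduplicated => entry.1 += 1,
        }
    }
    counts.retain(|_, (processed, deduped)| *processed + *deduped > 1);
    counts
}

fn main() {
    let log = [
        ("ik_4a3b", Outcome::Processed),
        ("ik_4a3b", Outcome::Deduplicated), // retry absorbed by the server
        ("ik_e71d", Outcome::Processed),
    ];
    for (key, (processed, deduped)) in duplicate_report(&log) {
        // processed > 1 would mean the same logical request ran twice
        println!("{key}: processed {processed}, deduplicated {deduped}");
    }
}
```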

When NOT useful

If the API you are calling does not support idempotency keys, this crate does not help. Many APIs silently ignore headers they do not recognize, so the key is sent but nothing deduplicates on it.

If your payload legitimately changes on every retry, for example because it includes a retry_count field that increments, you need to decide what to hash. Hashing the full payload including that field defeats the purpose. Hash only the fields that represent the logical identity of the request.

If you are working in Python, use the agentidemp-py sibling instead.

Install

[dependencies]
agentidemp = "0.1"

No C or OpenSSL dependencies; the hashing is pure Rust.

Siblings

  • agentidemp-py: same concept, Python API (MukundaKatta/agentidemp-py)
  • llm-retry-rs: exponential backoff retry loop (MukundaKatta/llm-retry)
  • llm-message-hash-rs: full-request canonical hash (MukundaKatta/llm-message-hash)
  • tool-result-cache-rs: LRU memoization for tool calls (MukundaKatta/tool-result-cache-rs)
  • cachebench: cache hit ratio measurement (MukundaKatta/cachebench)

What is next

A helper that extracts a canonical "identity subset" from a full request payload would make the pre-hashing step easier. The common case is: hash everything except timestamp, request_id, and trace_id. A field exclusion list would cover that without requiring the caller to build a stripped copy manually.
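The field-exclusion idea can be sketched in a few lines. This sketch is hypothetical. identity_bytes and json_string are illustrative names, and fields are assumed to be flat string pairs; real message arrays would need a full canonical JSON encoder. The exclusion list is the one named above, and the same approach covers a retry_count field. Keys come out sorted because BTreeMap iterates in key order, and the output is compact JSON. As a result, a changing timestamp or request_id no longer changes the bytes that get hashed.

```rust
use std::collections::BTreeMap;

/// Minimal JSON string escaping for this sketch: quotes, backslashes, control chars.
fn json_string(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Hypothetical helper: canonical bytes for a request's logical identity.
/// Excluded fields are dropped; the rest are emitted as compact JSON in
/// sorted key order, so insertion order and whitespace never matter.
fn identity_bytes(fields: &BTreeMap<&str, &str>, exclude: &[&str]) -> Vec<u8> {
    let members: Vec<String> = fields
        .iter()
        .filter(|&(name, _)| !exclude.contains(name))
        .map(|(name, value)| format!("{}:{}", json_string(name), json_string(value)))
        .collect();
    format!("{{{}}}", members.join(",")).into_bytes()
}

fn main() {
    let exclude = ["timestamp", "request_id", "trace_id"];
    let mut first = BTreeMap::new();
    first.insert("model", "claude-sonnet-4-6");
    first.insert("prompt", "Summarize ticket 17");
    first.insert("timestamp", "2025-01-01T00:00:00Z");

    let mut retry = first.clone();
    retry.insert("timestamp", "2025-01-01T00:00:05Z"); // changes on every attempt
    retry.insert("request_id", "req_7f3a9b2c");

    assert_eq!(identity_bytes(&first, &exclude), identity_bytes(&retry, &exclude));
    println!("{}", String::from_utf8(identity_bytes(&first, &exclude)).unwrap());
    // {"model":"claude-sonnet-4-6","prompt":"Summarize ticket 17"}
}
```

The resulting bytes are what you would pass to sha256_hex or scoped.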

An LRU-backed local deduplication layer would let small agents skip the round trip entirely for recently seen keys, before falling back to server-side deduplication. That would combine naturally with tool-result-cache-rs.
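Here is a rough sketch of that local layer. The names (LocalDedup, call_with_local_dedup) are hypothetical, and it makes no claim about how tool-result-cache-rs works. It uses a fixed-capacity map from key to response, with a queue tracking recency, and consults the map before making the real call. Recency updates are O(n) to keep the sketch short; a real version would use an LRU crate.

```rust
use std::collections::{HashMap, VecDeque};

/// Remembers responses for the `capacity` most recently used keys.
struct LocalDedup {
    capacity: usize,
    responses: HashMap<String, String>,
    order: VecDeque<String>, // front = least recently used
}

impl LocalDedup {
    fn new(capacity: usize) -> Self {
        LocalDedup { capacity, responses: HashMap::new(), order: VecDeque::new() }
    }

    /// Returns the cached response for `key`, marking it most recently used.
    fn get(&mut self, key: &str) -> Option<String> {
        let response = self.responses.get(key)?.clone();
        self.touch(key);
        Some(response)
    }

    /// Stores a response, evicting the least recently used key when full.
    fn insert(&mut self, key: &str, response: String) {
        if self.responses.insert(key.to_string(), response).is_some() {
            self.touch(key); // existing key: just refresh its recency
            return;
        }
        self.order.push_back(key.to_string());
        if self.order.len() > self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.responses.remove(&oldest);
            }
        }
    }

    /// Moves `key` to the most-recently-used end of the queue.
    fn touch(&mut self, key: &str) {
        if let Some(pos) = self.order.iter().position(|k| k == key) {
            if let Some(k) = self.order.remove(pos) {
                self.order.push_back(k);
            }
        }
    }
}

/// Checks the local layer first; only on a miss does it make the real call.
fn call_with_local_dedup<F: FnOnce() -> String>(cache: &mut LocalDedup, key: &str, call: F) -> String {
    if let Some(hit) = cache.get(key) {
        return hit;
    }
    let response = call();
    cache.insert(key, response.clone());
    response
}

fn main() {
    let mut cache = LocalDedup::new(2);
    let mut calls = 0;
    for key in ["ik_a", "ik_a", "ik_b", "ik_c", "ik_a"] {
        call_with_local_dedup(&mut cache, key, || {
            calls += 1;
            format!("response for {key}")
        });
    }
    println!("{calls} real calls"); // 4: ik_a, ik_b, ik_c, then ik_a again after eviction
}
```

A local hit only saves a round trip. The server-side key still matters for anything the local layer has already evicted, or never saw.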

Source: MukundaKatta/agentidemp-rs


Part of the Hermes Agent Challenge sprint.
