DEV Community

Mukunda Rao Katta

Stable Cache Keys for LLM Requests: Canonical Hashing in Rust

The cache that always missed

The setup seemed simple. Submit a prompt, hash the request JSON, use the hash as a cache key. If the same prompt comes in again, return the cached response. Skip the API call, save tokens, save latency.

It did not work. The cache miss rate was 100%.

The reason: the request JSON included a timestamp field. Every request, same prompt, different timestamp. Different JSON. Different hash. Cache always missed. A week of caching infrastructure, completely defeated by one noise field that the provider adds to every request.

After removing the timestamp field manually, it worked for Anthropic. Then we added OpenAI. OpenAI includes a request_id field. Add that to the strip list. Then Bedrock. Bedrock's cross-region inference profiles add region headers. The manual strip list grew into its own small project with no tests and no structure.

That is what llm-message-hash wraps up properly.

The shape of the fix

use llm_message_hash::{hash_request, HashOpts, Provider};

let request_json = serde_json::json!({
    "model": "claude-sonnet-4-6",
    "messages": [{"role": "user", "content": "What is 2+2?"}],
    "max_tokens": 100,
    "anthropic-version": "2023-06-01",
    "x-request-id": "req_01XYZ",  // noise field
});

let opts = HashOpts::for_provider(Provider::Anthropic);
let key = hash_request(&request_json, &opts)?;
// key is stable across identical prompts regardless of x-request-id or anthropic-version

// Use as cache key:
if let Some(cached) = cache.get(&key) {
    return Ok(cached);
}
let response = client.send(&request_json).await?;
cache.insert(key, response.clone());
Ok(response)

Same prompt, two minutes apart, same model. The hash is identical. The cache hits.

What this is NOT

This is not a semantic similarity check. Two prompts that mean the same thing but are worded differently will hash differently. That is intentional. Semantic caching is a different problem, usually solved at the embedding layer. This crate handles exact-match caching: requests with the same content, once noise fields are dropped and keys are sorted, always produce the same hash.

This is not a request signing tool. The SHA-256 output is for cache keys and idempotency, not for authentication or tamper detection. If you need signed requests, that is a different surface.

This is not provider-agnostic by default. The whole point is that the presets are provider-aware. You can also pass custom HashOpts if you have a non-standard provider or want to drop additional fields.

Inside the lib

Two steps: canonicalize, then hash.

Canonicalization means recursive key-sorted JSON. Take any JSON object, sort its keys lexicographically at every level of nesting, and serialize to a canonical string. Array order is left alone, because the order of messages is part of the content. This makes {"b": 1, "a": 2} and {"a": 2, "b": 1} produce the same canonical form and therefore the same hash. Without this step, field ordering differences in serialization would produce different hashes for identical content.

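use serde_json::Value;
use std::collections::BTreeMap;

// Recursively rebuild `value` so that every object's keys come out in sorted order.
fn canonicalize(value: &Value) -> Value {
    match value {
        Value::Object(map) => {
            // Collecting into a BTreeMap sorts the keys; nested values are canonicalized on the way in.
            let sorted: BTreeMap<_, _> = map
                .iter()
                .map(|(k, v)| (k.clone(), canonicalize(v)))
                .collect();
            Value::Object(sorted.into_iter().collect())
        }
        // Arrays keep their order; only their elements are canonicalized.
        Value::Array(arr) => {
            Value::Array(arr.iter().map(canonicalize).collect())
        }
        // Scalars (null, bool, number, string) are already canonical.
        other => other.clone(),
    }
}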

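BTreeMap gives sorted iteration by default. serde_json's own Map is also BTreeMap-backed unless the preserve_order feature is enabled; with that feature on, the map keeps insertion order, so inserting in sorted order still yields sorted output. The canonicalized value is then serialized to a compact JSON string and SHA-256 hashed.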
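To see the two steps end to end without pulling in any crates, here is a minimal std-only sketch. Everything in it is an assumption made for illustration, not the crate's API: the `Json` enum stands in for `serde_json::Value`, and FNV-1a stands in for SHA-256 purely so the example compiles on its own.

```rust
use std::fmt::Write;

/// Hypothetical stand-in for `serde_json::Value` (integers only, to stay short).
#[allow(dead_code)]
#[derive(Clone, Debug)]
enum Json {
    Null,
    Bool(bool),
    Num(i64),
    Str(String),
    Arr(Vec<Json>),
    /// Pairs keep insertion order, so two objects can differ only in key order.
    Obj(Vec<(String, Json)>),
}

/// Append `s` as a JSON string literal, escaping quotes, backslashes, and control characters.
fn write_str(out: &mut String, s: &str) {
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => write!(out, "\\u{:04x}", c as u32).unwrap(),
            c => out.push(c),
        }
    }
    out.push('"');
}

/// Append `v` in compact form with object keys sorted at every level.
/// Array order is preserved.
fn write_canonical(out: &mut String, v: &Json) {
    match v {
        Json::Null => out.push_str("null"),
        Json::Bool(b) => out.push_str(if *b { "true" } else { "false" }),
        Json::Num(n) => write!(out, "{}", n).unwrap(),
        Json::Str(s) => write_str(out, s),
        Json::Arr(items) => {
            out.push('[');
            for (i, item) in items.iter().enumerate() {
                if i > 0 {
                    out.push(',');
                }
                write_canonical(out, item);
            }
            out.push(']');
        }
        Json::Obj(pairs) => {
            // Sort references to the pairs by key; the input itself is not modified.
            let mut sorted: Vec<&(String, Json)> = pairs.iter().collect();
            sorted.sort_by(|a, b| a.0.cmp(&b.0));
            out.push('{');
            for (i, pair) in sorted.iter().enumerate() {
                if i > 0 {
                    out.push(',');
                }
                write_str(out, &pair.0);
                out.push(':');
                write_canonical(out, &pair.1);
            }
            out.push('}');
        }
    }
}

/// Canonical compact string for `v`.
fn canonical_string(v: &Json) -> String {
    let mut out = String::new();
    write_canonical(&mut out, v);
    out
}

/// FNV-1a 64-bit: a tiny non-cryptographic stand-in for SHA-256.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for b in bytes {
        h ^= *b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// Cache key: hash of the canonical string, rendered as 16 hex digits.
fn cache_key(v: &Json) -> String {
    format!("{:016x}", fnv1a64(canonical_string(v).as_bytes()))
}
```

Calling `cache_key` on `{"b": 1, "a": 2}` and on `{"a": 2, "b": 1}` yields the same key, because both serialize to `{"a":2,"b":1}` before hashing. In real code you would keep SHA-256; FNV-1a is here only to avoid a dependency.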

HashOpts controls which top-level keys are stripped before canonicalization:

pub struct HashOpts {
    pub drop_keys: Vec<String>,
}

impl HashOpts {
    pub fn for_provider(p: Provider) -> Self {
        match p {
            Provider::Anthropic => Self {
                drop_keys: vec![
                    "anthropic-version".into(),
                    "x-request-id".into(),
                    "created_at".into(),
                ],
            },
            Provider::OpenAI => Self {
                drop_keys: vec!["request_id".into(), "user".into()],
            },
            Provider::Bedrock => Self {
                drop_keys: vec!["x-amzn-requestid".into(), "x-amz-date".into()],
            },
        }
    }
}

The user field in OpenAI requests is an optional caller-supplied tag identifying the end user. It does not change the prompt, so two requests that differ only in user are treated as the same cache entry and the field goes in the drop list. anthropic-version is a protocol version that Anthropic normally expects as an HTTP header; it is not part of the prompt content, so it is dropped as well.
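For a provider without a preset, the public drop_keys field suggests you can build the options by hand. The sketch below is a guess at how that might look, not the crate's code: `HashOpts` is a local mirror of the struct shown above, `strip_top_level` and `gateway_opts` are hypothetical helpers, the field names `trace_id` and `sent_at` are invented, and a flat string map stands in for the request's top-level JSON object.

```rust
use std::collections::BTreeMap;

/// Local mirror of the `HashOpts` struct from the article, so the sketch compiles alone.
pub struct HashOpts {
    pub drop_keys: Vec<String>,
}

/// Hypothetical helper: copy `request` without the top-level keys listed in `opts.drop_keys`.
/// A flat string map stands in for the request's top-level JSON object.
fn strip_top_level(
    request: &BTreeMap<String, String>,
    opts: &HashOpts,
) -> BTreeMap<String, String> {
    request
        .iter()
        .filter(|(k, _)| !opts.drop_keys.contains(*k))
        .map(|(k, v)| (k.clone(), v.clone()))
        .collect()
}

/// Custom options for an imaginary in-house gateway that stamps
/// `trace_id` and `sent_at` onto every request.
fn gateway_opts() -> HashOpts {
    HashOpts {
        drop_keys: vec!["trace_id".into(), "sent_at".into()],
    }
}
```

Two requests that differ only in `trace_id` and `sent_at` come out of `strip_top_level` as equal maps, so they would canonicalize and hash identically. Because stripping is top-level only, a field with the same name nested deeper in the request would still affect the hash.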

When this is useful

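Prompt caching at the application layer. If you are running the same system prompt plus varying user messages, you can narrow the hash scope to the user message content by dropping system before hashing. This is only safe while the system prompt is fixed for that cache; if it changes, responses cached under the old prompt would still match.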
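One way to keep that scoping safe is to fold an identifier for the system prompt into the key, so a prompt change starts a fresh set of keys. A minimal sketch, assuming you maintain such an identifier yourself: `scope_bytes`, `scoped_key`, and the `prompt_id` label are hypothetical, and FNV-1a again stands in for SHA-256 so the example needs no crates.

```rust
/// FNV-1a 64-bit: non-cryptographic stand-in for SHA-256 so the sketch needs no crates.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for b in bytes {
        h ^= *b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// Hypothetical helper: the bytes that get hashed for a scoped key.
/// `prompt_id` names the fixed system prompt (for example "support-bot-v3").
fn scope_bytes(prompt_id: &str, user_messages: &[&str]) -> Vec<u8> {
    let mut buf = Vec::new();
    // Length-prefix every part so ["ab", "c"] and ["a", "bc"] encode differently.
    for part in std::iter::once(&prompt_id).chain(user_messages.iter()) {
        buf.extend_from_slice(&(part.len() as u64).to_le_bytes());
        buf.extend_from_slice(part.as_bytes());
    }
    buf
}

/// Key over the prompt identifier plus the user messages, as 16 hex digits.
fn scoped_key(prompt_id: &str, user_messages: &[&str]) -> String {
    format!("{:016x}", fnv1a64(&scope_bytes(prompt_id, user_messages)))
}
```

The system prompt text itself never enters the hash, only its identifier, so the key stays cheap to compute while a bump from "support-bot-v3" to "support-bot-v4" changes the hashed bytes.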

Idempotency keys. Before submitting an expensive batch job, hash the request. If a job with that key is already running or completed, skip the submission. The principle is the same as for cache keys: a stable identity for requests whose content is identical once noise fields are dropped.
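The check-before-submit step can be sketched with an in-memory registry. `JobRegistry` and `JobState` are hypothetical names, and the key is assumed to be the string produced by hashing the request; a real deployment would more likely keep this state in a shared store.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum JobState {
    Running,
    Completed,
}

/// Hypothetical in-memory job registry keyed by request hash.
#[derive(Default)]
struct JobRegistry {
    jobs: HashMap<String, JobState>,
}

impl JobRegistry {
    /// Returns true if the caller should submit the job,
    /// false if a job with this key is already running or completed.
    fn try_claim(&mut self, key: &str) -> bool {
        if self.jobs.contains_key(key) {
            return false;
        }
        self.jobs.insert(key.to_string(), JobState::Running);
        true
    }

    /// Mark a previously claimed job as finished; unknown keys are ignored.
    fn mark_completed(&mut self, key: &str) {
        if let Some(state) = self.jobs.get_mut(key) {
            *state = JobState::Completed;
        }
    }
}
```

The first `try_claim` for a key returns true and records the job as running; every later call with the same key returns false, whether the job is still running or already completed.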

Deduplication in logging. If you log every LLM request and want to collapse duplicates, the hash gives you a compact identity without storing the full request JSON.
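As a rough illustration of collapsing duplicates, the log can store one counter per hash instead of one full request per call. `DedupLog` is a hypothetical type, and the keys are assumed to be request hashes computed elsewhere.

```rust
use std::collections::HashMap;

/// Hypothetical log aggregator: one entry per request hash, with an occurrence count.
#[derive(Default)]
struct DedupLog {
    counts: HashMap<String, u64>,
}

impl DedupLog {
    /// Record one occurrence of `key`; returns true the first time the key is seen.
    fn record(&mut self, key: &str) -> bool {
        let count = self.counts.entry(key.to_string()).or_insert(0);
        *count += 1;
        *count == 1
    }
}
```

A caller could write the full request JSON to the log only when `record` returns true and emit just the hash on repeats.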

A/B testing and replay. If you want to replay an exact request against a different model or provider, the hash lets you verify you are replaying the same content.

When NOT to use this

When your requests are never repeated. If every prompt is genuinely unique, caching adds overhead with no benefit.

When you need semantic matching. If "What is the capital of France?" and "Name the capital city of France." should be cache hits for each other, you need embedding similarity, not canonical hashing.

When field ordering is meaningful in your provider. The key-sort canonicalization assumes field ordering does not change semantics. For all mainstream LLM providers this is true. If you are using a custom or internal model that is sensitive to field ordering, test before relying on this.

Install

[dependencies]
llm-message-hash = "0.1"

GitHub: MukundaKatta/llm-message-hash

crates.io: llm-message-hash

Siblings

Crate                  What it does
agentidemp-rs          Scoped idempotency keys using SHA-256 / UUIDv5
tool-result-cache-rs   LRU + TTL memoization for tool calls
cachebench             Prompt cache hit-ratio observability across providers
llm-circuit-breaker    Circuit breaker to stop hammering degraded endpoints

What is next

The obvious gap is a Python port with matching drop-set presets. The Rust version is the reference implementation. A Python port would make this usable in the much larger existing Python agent ecosystem without rewriting the logic.

Streaming request hashing is also worth adding. Right now the hash is computed on the full request object. For streamed requests, you sometimes want to hash the message content as it is assembled, before the stream is complete. That requires a different entry point that accumulates chunks and hashes the assembled result.
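Such an entry point does not exist in the crate today, so the following is only a sketch of the shape it might take. `ChunkHasher` is a hypothetical type, and FNV-1a stands in for SHA-256 to keep the example std-only; the `sha2` crate's `Sha256` exposes a similar update-then-finalize flow.

```rust
/// Hypothetical incremental hasher; FNV-1a stands in for SHA-256 so the sketch is std-only.
struct ChunkHasher {
    state: u64,
}

impl ChunkHasher {
    /// Start from the FNV-1a 64-bit offset basis.
    fn new() -> Self {
        Self { state: 0xcbf29ce484222325 }
    }

    /// Feed the next chunk of message content as it arrives.
    fn update(&mut self, chunk: &[u8]) {
        for b in chunk {
            self.state ^= *b as u64;
            self.state = self.state.wrapping_mul(0x100000001b3);
        }
    }

    /// Consume the hasher and return the key for everything fed so far, as 16 hex digits.
    fn finish(self) -> String {
        format!("{:016x}", self.state)
    }
}
```

Because the hash state advances one byte at a time, feeding "What ", "is ", "2+2?" in three calls gives the same key as feeding "What is 2+2?" in one, so chunk boundaries do not leak into the key. This only works for raw content bytes; key-sorted canonicalization still needs the whole object before it can serialize.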

The core approach is stable and ships today. Add it to any place you are building a cache key by hand from request JSON, and stop losing cache hits to timestamp fields.
