I needed to cache LLM responses to avoid paying twice for the same prompt. My first cache key was json.dumps(request_dict). That seemed fine until I checked the hit rate: it was near zero. I dug in and found the problem. json.dumps does not produce a canonical form for a Python dict. It writes keys in insertion order, which depends on how the dict was built. Two requests with the same logical content but constructed by different code paths had different key order and therefore different cache keys. Nearly every call was a miss.
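A minimal reproduction of that failure, using only the standard library. The request contents and the naive_key helper are illustrative, not taken from my original code:

```python
import hashlib
import json

# Two dicts with identical content, built in a different key order.
a = {"model": "claude-sonnet-4-6", "max_tokens": 100}
b = {"max_tokens": 100, "model": "claude-sonnet-4-6"}


def naive_key(request: dict) -> str:
    # json.dumps emits keys in insertion order, so the serialized bytes differ.
    return hashlib.sha256(json.dumps(request).encode("utf-8")).hexdigest()


print(a == b)                        # the dicts compare equal
print(naive_key(a) == naive_key(b))  # the cache keys do not
```

Dict equality ignores key order, which is exactly why this bug is easy to miss in tests that compare requests with ==.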
My second attempt was json.dumps(request_dict, sort_keys=True). Hit rate jumped. I thought I had it. Then I noticed another problem: the Anthropic SDK was adding a request_id field and a stream field to the payload dict before serialization. Those fields varied between calls. Even with sorted keys, two logically identical calls still produced different cache keys because of these injected fields. The output was now stable for a given dict, but not across calls that differed only in noise.
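The second failure can be reproduced the same way. This is a sketch with made-up request contents; sorted_key is a hypothetical helper standing in for my second attempt:

```python
import hashlib
import json


def sorted_key(request: dict) -> str:
    # sort_keys=True orders keys at every nesting level of the structure.
    payload = json.dumps(request, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


base = {
    "model": "claude-sonnet-4-6",
    "messages": [{"role": "user", "content": "Hello"}],
}
reordered = {
    "messages": [{"content": "Hello", "role": "user"}],
    "model": "claude-sonnet-4-6",
}

# Key order no longer matters...
order_ok = sorted_key(base) == sorted_key(reordered)

# ...but a per-call field still changes the key.
call_1 = {**base, "request_id": "abc123"}
call_2 = {**base, "request_id": "xyz456"}
noise_breaks = sorted_key(call_1) != sorted_key(call_2)

print(order_ok, noise_breaks)
```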
The actual fix needs two things. First, recursive key sorting so the entire nested structure is canonical regardless of how it was built. Second, a way to strip provider-injected noise fields before computing the hash. llm-message-hash-py does both.
Shape of the fix
```python
from llm_message_hash_py import message_hash, HashOpts, AnthropicOpts, OpenAIOpts

request = {
    "model": "claude-sonnet-4-6",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 100,
    "stream": True,          # noise field, dropped by AnthropicOpts
    "request_id": "abc123",  # noise field, dropped by AnthropicOpts
}
h = message_hash(request, opts=AnthropicOpts)
# Same hash whether stream=True or False
# Same hash whether request_id is "abc123" or "xyz456"

# OpenAI variant: different noise fields
openai_request = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": False,
    "user": "req-session-123",  # user tag is noise, dropped by OpenAIOpts
    "n": 1,                     # n=1 is the default, dropped by OpenAIOpts
}
h2 = message_hash(openai_request, opts=OpenAIOpts)

# Custom opts: specify exactly which top-level fields to drop
# (drop_fields is declared as frozenset[str], so pass a frozenset)
custom_opts = HashOpts(
    drop_fields=frozenset({"stream", "request_id", "x_trace_id", "x_request_uuid"})
)
h3 = message_hash(request, opts=custom_opts)

# No opts: hash the full request with zero field dropping, just canonical key order
h4 = message_hash(request)
```
The return value is a 64-character lowercase hex string: the SHA-256 digest of the canonical JSON bytes. The length and format are the same regardless of input size.
What it does NOT do
This library produces a hash. It does not store or look up cached values: wiring the hash into a cache, a key-value store, or a database is your code's job. It also does not understand semantic equivalence: two prompts with the same meaning but different wording produce different hashes. It does not handle binary content or file uploads, where the payload is not JSON-serializable. Finally, it does not hash part of a request for you: the hash covers the entire dict you pass in, minus the dropped fields. If you want to hash only the messages array and ignore model and parameters, call message_hash({"messages": request["messages"]}) directly.
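Since the caching layer is left to you, here is one way the wiring might look. This is a sketch under assumptions: ResponseCache and fake_llm are hypothetical names, and stand_in_hash is a stdlib substitute so the example runs without the library installed. In real code you would pass message_hash (optionally wrapped with your opts) as hash_fn.

```python
import hashlib
import json
from typing import Callable, Dict


def stand_in_hash(request: dict) -> str:
    # Stand-in for message_hash; NOT the library's implementation.
    payload = json.dumps(request, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ResponseCache:
    """Hypothetical in-memory cache keyed by a request hash."""

    def __init__(self, hash_fn: Callable[[dict], str]) -> None:
        self._hash_fn = hash_fn
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def get_or_call(self, request: dict, call_llm: Callable[[dict], str]) -> str:
        key = self._hash_fn(request)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        response = call_llm(request)
        self._store[key] = response
        return response


def fake_llm(request: dict) -> str:
    # Placeholder for a real API call.
    return "Hi there"


cache = ResponseCache(stand_in_hash)
cache.get_or_call(
    {"model": "m", "messages": [{"role": "user", "content": "Hello"}]}, fake_llm
)
# Same logical request, different construction order: served from the cache.
cache.get_or_call(
    {"messages": [{"content": "Hello", "role": "user"}], "model": "m"}, fake_llm
)
print(cache.hits, cache.misses)
```

Swapping the dict for Redis or a database table only changes the body of get_or_call; the key stays the same string.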
For Anthropic prompt cache scope hashing specifically (system prompt plus tools, clipped to the cache breakpoint), use prompt-cache-key instead. The two libraries are complementary: llm-message-hash-py hashes a full request structure, prompt-cache-key hashes the Anthropic-specific cache scope.
Inside the lib
The canonicalization step is a recursive dict walk. For every dict found anywhere in the structure, keys are sorted alphabetically before serialization. Lists are left in their original order because list order is semantically meaningful for LLM message arrays: message at index 0 comes before message at index 1, and swapping them would change the conversation. Dicts inside lists are also key-sorted. Only dict keys are sorted, never list elements.
After canonicalization the structure is serialized to JSON with no extra whitespace and encoded as UTF-8 bytes. Those bytes go through hashlib.sha256 and the hex digest is returned. No third-party libraries are used in this path.
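The two steps described above can be sketched in a few lines of standard-library Python. This is my own illustration of the approach, not the library's source: canonicalize and sketch_hash are hypothetical names, and details such as how non-ASCII characters are escaped follow json.dumps defaults here, which may differ from what the library does.

```python
import hashlib
import json
from typing import Any


def canonicalize(value: Any) -> Any:
    """Return a copy with dict keys sorted at every level; list order is kept."""
    if isinstance(value, dict):
        # JSON object keys are strings, so sorted() orders them alphabetically.
        return {key: canonicalize(value[key]) for key in sorted(value)}
    if isinstance(value, list):
        # Recurse into elements but never reorder them.
        return [canonicalize(item) for item in value]
    return value


def sketch_hash(request: dict) -> str:
    canonical = canonicalize(request)
    # separators=(",", ":") removes the default spaces after commas and colons.
    payload = json.dumps(canonical, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


user = {"role": "user", "content": "Hi"}
assistant = {"content": "Hello", "role": "assistant"}

a = {"messages": [user, assistant], "model": "m"}
b = {"model": "m", "messages": [dict(reversed(list(user.items()))), assistant]}
swapped = {"model": "m", "messages": [assistant, user]}

print(sketch_hash(a) == sketch_hash(b))        # key order is ignored
print(sketch_hash(a) == sketch_hash(swapped))  # message order is not
```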
HashOpts is a dataclass with one field: drop_fields: frozenset[str]. Before canonicalization, any top-level key in drop_fields is removed from the dict copy. Only top-level keys are dropped: nested noise fields would need a separate pass or a custom HashOpts built for that structure. AnthropicOpts is a pre-built HashOpts that drops stream, request_id, and a short list of other fields the Anthropic SDK injects. OpenAIOpts drops stream, user, and n when present.
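To show what top-level-only dropping means in practice, here is a rough mirror of the options object. SketchOpts and hash_with_opts are hypothetical names for illustration, and the field values are made up; the real HashOpts may differ in details such as whether the dataclass is frozen.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchOpts:
    # Hypothetical mirror of HashOpts: names of top-level keys to ignore.
    drop_fields: frozenset = frozenset()


def hash_with_opts(request: dict, opts: SketchOpts = SketchOpts()) -> str:
    # Build a new dict without the dropped top-level keys; nested keys are untouched.
    kept = {k: v for k, v in request.items() if k not in opts.drop_fields}
    payload = json.dumps(kept, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


opts = SketchOpts(drop_fields=frozenset({"stream", "request_id"}))

top_1 = {"model": "m", "request_id": "abc123"}
top_2 = {"model": "m", "request_id": "xyz456"}
nested_1 = {"model": "m", "metadata": {"request_id": "abc123"}}
nested_2 = {"model": "m", "metadata": {"request_id": "xyz456"}}

print(hash_with_opts(top_1, opts) == hash_with_opts(top_2, opts))        # top-level noise removed
print(hash_with_opts(nested_1, opts) == hash_with_opts(nested_2, opts))  # nested noise still counts
```

If your noise lives in a nested object, the simplest workaround is usually to drop the whole parent key, provided nothing else inside it matters for caching.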
One important detail: the library operates on a shallow copy of the top-level dict before dropping noise fields. Your original dict is never mutated. If you are passing the same request dict to both the hash function and the LLM call, there is no need to make a defensive copy before calling message_hash.
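A small sketch of why a shallow copy is enough for that guarantee. drop_top_level is a hypothetical helper, not a function the library exports:

```python
import copy


def drop_top_level(request: dict, drop_fields: frozenset) -> dict:
    # Shallow copy: a new top-level dict that shares the nested values.
    working = copy.copy(request)
    for field in drop_fields:
        working.pop(field, None)  # only the copy loses the key
    return working


request = {
    "model": "m",
    "stream": True,
    "messages": [{"role": "user", "content": "Hello"}],
}
cleaned = drop_top_level(request, frozenset({"stream"}))

print("stream" in request)   # the caller's dict still has the field
print("stream" in cleaned)   # the working copy does not
```

The nested messages list is shared between the two dicts rather than duplicated, which is safe here because hashing only reads it.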
26 tests cover: key ordering stability across insertion orders, list ordering preservation, noise field dropping, custom opts construction, empty messages array, large nested tool definitions, hash stability across repeated calls with the same input, and the no-mutation guarantee on the input dict.
When useful
- Cache keys for LLM response caches where the same logical request should always hit the same cache entry regardless of dict construction order
- Idempotency keys: pass the hash as a database row key or API header to detect and short-circuit duplicate calls within a session or batch
- Deduplication in batch jobs where the same prompt may appear multiple times from different input sources
- Change detection: store the hash of the current system prompt and tools on deploy, recompute on the next deploy, compare to know if the cache will be cold and you need to warm it
- Content-addressed storage for request archives where you want stable filenames tied to request content
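As one illustration of the use cases above, batch deduplication might look like this. The sketch uses a stdlib stand-in for message_hash so it runs on its own; dedupe and the batch contents are hypothetical.

```python
import hashlib
import json
from typing import Dict, List


def stand_in_hash(request: dict) -> str:
    # Stand-in for message_hash; NOT the library's implementation.
    payload = json.dumps(request, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def dedupe(requests: List[dict]) -> List[dict]:
    """Keep the first request seen for each distinct hash, preserving order."""
    first_seen: Dict[str, dict] = {}
    for request in requests:
        first_seen.setdefault(stand_in_hash(request), request)
    return list(first_seen.values())


batch = [
    {"model": "m", "messages": [{"role": "user", "content": "Summarize A"}]},
    {"messages": [{"content": "Summarize A", "role": "user"}], "model": "m"},
    {"model": "m", "messages": [{"role": "user", "content": "Summarize B"}]},
]
unique = dedupe(batch)
print(len(batch), len(unique))
```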
When not useful
- Situations requiring semantic equivalence rather than structural equivalence
- Binary payloads or file upload arguments where the content is not JSON-serializable
- Security-sensitive deduplication where hash collisions are a threat model concern (sha256 is strong for caching, but if you are using this as a cryptographic commitment, use a dedicated signing library)
- Very large request payloads where the recursive sort step is a performance concern (benchmark at your payload size before committing)
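For the last point, a quick timing harness is usually enough to decide. This sketch times a stdlib stand-in rather than the library itself, and make_payload builds a synthetic request whose shape and size are assumptions; substitute message_hash and a payload captured from your own traffic.

```python
import hashlib
import json
import timeit


def stand_in_hash(request: dict) -> str:
    # Stand-in for message_hash; NOT the library's implementation.
    payload = json.dumps(request, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def make_payload(n_tools: int) -> dict:
    # Synthetic request with many nested tool definitions.
    tools = [
        {
            "name": f"tool_{i}",
            "description": "x" * 200,
            "input_schema": {
                "type": "object",
                "properties": {"q": {"type": "string"}},
            },
        }
        for i in range(n_tools)
    ]
    return {
        "model": "m",
        "tools": tools,
        "messages": [{"role": "user", "content": "Hello"}],
    }


payload = make_payload(200)
runs = 50
seconds = timeit.timeit(lambda: stand_in_hash(payload), number=runs)
per_call_ms = seconds / runs * 1000
print(f"{per_call_ms:.3f} ms per hash over {runs} runs")
```

Compare the result against the latency of the LLM call you are trying to avoid; the hash only needs to be cheap relative to that.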
Install
```shell
pip install llm-message-hash-py
```
Zero dependencies. Python 3.9+.
Siblings
| Library | Language | What it does |
|---|---|---|
| llm-message-hash | Rust | Original Rust implementation with the same canonicalization approach |
| prompt-cache-key | Python | Hashes the Anthropic cache scope specifically, clipped to the breakpoint |
| agentidemp-py | Python | Idempotency keys using a similar hashing approach |
| tool-result-cache | Rust | LRU and TTL cache for tool results, a natural consumer of this hash |
| cachebench | Python | Prompt-cache observability and hit-rate measurement |
What's next
The most requested feature so far is a hash_messages_only helper that ignores everything in the request except the messages array. That covers the common case where you want to cache based on conversation content but not on model choice or parameter tuning. A hash_tools_only helper for detecting tool list changes separately is also on the list.
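Until those helpers exist, the same effect is available in user code by hashing a narrowed dict. The wrappers below are a sketch: messages_only_hash and tools_only_hash are hypothetical names, not the planned API, and stand_in_hash substitutes for message_hash so the example is self-contained.

```python
import hashlib
import json
from typing import Callable


def stand_in_hash(request: dict) -> str:
    # Stand-in for message_hash; NOT the library's implementation.
    payload = json.dumps(request, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def messages_only_hash(
    request: dict, hash_fn: Callable[[dict], str] = stand_in_hash
) -> str:
    # Ignore model and parameters; hash only the conversation.
    return hash_fn({"messages": request["messages"]})


def tools_only_hash(
    request: dict, hash_fn: Callable[[dict], str] = stand_in_hash
) -> str:
    # Treat a missing tools key the same as an empty tool list.
    return hash_fn({"tools": request.get("tools", [])})


messages = [{"role": "user", "content": "Hello"}]
req_a = {
    "model": "model-a",
    "temperature": 0.0,
    "messages": messages,
    "tools": [{"name": "search"}],
}
req_b = {
    "model": "model-b",
    "temperature": 0.7,
    "messages": messages,
    "tools": [{"name": "search"}, {"name": "fetch"}],
}

print(messages_only_hash(req_a) == messages_only_hash(req_b))  # same conversation
print(tools_only_hash(req_a) == tools_only_hash(req_b))        # tool list changed
```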
Part of the Hermes Agent Challenge sprint. Source at github.com/MukundaKatta/llm-message-hash-py. PyPI: pip install llm-message-hash-py.