I started caching LLM responses last month. The cache key was the obvious thing. Hash the request body. Look up. Hit or miss.
It missed every time.
Three hours later I had the answer. Python dicts preserve insertion order, and json.dumps writes keys in that order unless you pass sort_keys=True, so the key order in the bytes I was hashing was whatever the upstream library happened to build. Two requests with identical content produced two different byte strings because the messages array had role before content once and content before role the other time. Two different hashes. Two different cache entries. Zero hits.
This is the kind of bug that does not show up in any test you write before shipping.
llm-message-hash is the small Rust crate I wrote so I never have to debug it again. It takes a serde_json::Value (or anything that serializes to one), canonicalizes it according to a configurable HashOpts, and returns a stable sha256-hex. The same logical request always produces the same hash. The crate is on crates.io as llm-message-hash.
The problem
LLM SDK request bodies are JSON objects with a fixed shape. model, messages, tools, max_tokens, temperature, sometimes system. The shape is stable. What changes is everything around the edges.
- Per-call IDs (request_id, idempotency_key)
- Auth headers and metadata (user, session_id)
- Provider-specific telemetry fields that show up in some SDK versions and not others
- Object key order, which is technically not part of JSON's contract
- Top-level wrappers some SDKs add (raw_response, event, data)
If you hash the raw request body, all of that noise shows up in your cache key. Hit ratios drop to near zero in production, and you cannot tell why without printing both byte strings side by side.
The shape of the fix
You need a hash that says "same logical request" regardless of the noise. That means three things.
- Walk the JSON recursively. Sort every object's keys alphabetically. Lists keep their order because element order is semantically meaningful (a messages array has turns in order).
- Drop a configurable set of noise fields before hashing.
- Run sha256 over the canonicalized bytes.
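To make the first two steps concrete, here is a minimal sketch of the canonicalization walk. It is not the crate's implementation. It assumes a tiny hand-rolled Json enum as a stand-in for serde_json::Value so that it runs with only the standard library, it drops noise keys at every depth (a simplification), and it uses Rust's Debug escaping in place of real JSON string escaping. The names Json, canonicalize, canonical_string, obj, and shuffled_requests_match are all hypothetical. The final sha256 step is left out because the standard library has no SHA-256; a crate such as sha2 would hash the returned bytes.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-in for serde_json::Value so the sketch needs only std.
#[allow(dead_code)]
#[derive(Clone, Debug)]
enum Json {
    Null,
    Bool(bool),
    Num(f64),
    Str(String),
    Arr(Vec<Json>),
    // Pairs are kept in insertion order, like a freshly built request body.
    Obj(Vec<(String, Json)>),
}

// Append the canonical form of `v` to `out`: object keys sorted, keys listed
// in `noise` skipped (at every depth, a simplification), array order kept.
fn canonicalize(v: &Json, noise: &[&str], out: &mut String) {
    match v {
        Json::Null => out.push_str("null"),
        Json::Bool(b) => out.push_str(if *b { "true" } else { "false" }),
        Json::Num(n) => out.push_str(&n.to_string()),
        // Debug escaping stands in for JSON string escaping here.
        Json::Str(s) => out.push_str(&format!("{:?}", s)),
        Json::Arr(items) => {
            out.push('[');
            for (i, item) in items.iter().enumerate() {
                if i > 0 {
                    out.push(',');
                }
                canonicalize(item, noise, out);
            }
            out.push(']');
        }
        Json::Obj(pairs) => {
            // A BTreeMap iterates in sorted key order.
            let sorted: BTreeMap<&str, &Json> = pairs
                .iter()
                .filter(|(k, _)| !noise.contains(&k.as_str()))
                .map(|(k, val)| (k.as_str(), val))
                .collect();
            out.push('{');
            for (i, (k, val)) in sorted.iter().enumerate() {
                if i > 0 {
                    out.push(',');
                }
                out.push_str(&format!("{:?}:", k));
                canonicalize(val, noise, out);
            }
            out.push('}');
        }
    }
}

// Convenience wrapper returning the canonical string; a real implementation
// would feed these bytes to sha256.
fn canonical_string(v: &Json, noise: &[&str]) -> String {
    let mut out = String::new();
    canonicalize(v, noise, &mut out);
    out
}

// Helper for building objects from string-literal keys.
fn obj(pairs: Vec<(&str, Json)>) -> Json {
    Json::Obj(pairs.into_iter().map(|(k, v)| (k.to_string(), v)).collect())
}

// Two requests with shuffled keys and different request IDs should
// canonicalize to the same string once request_id is treated as noise.
fn shuffled_requests_match() -> bool {
    let a = obj(vec![
        ("model", Json::Str("claude-sonnet-4-7".into())),
        ("messages", Json::Arr(vec![obj(vec![
            ("role", Json::Str("user".into())),
            ("content", Json::Str("Hello".into())),
        ])])),
        ("request_id", Json::Str("req-abc".into())),
        ("max_tokens", Json::Num(1024.0)),
    ]);
    let b = obj(vec![
        ("max_tokens", Json::Num(1024.0)),
        ("request_id", Json::Str("req-xyz".into())),
        ("messages", Json::Arr(vec![obj(vec![
            ("content", Json::Str("Hello".into())),
            ("role", Json::Str("user".into())),
        ])])),
        ("model", Json::Str("claude-sonnet-4-7".into())),
    ]);
    canonical_string(&a, &["request_id"]) == canonical_string(&b, &["request_id"])
}
```

Sorting through a BTreeMap keeps the ordering rule in one place, and arrays are walked in their original order so the turn sequence still affects the result.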
The crate's full surface is small.
use llm_message_hash::{HashOpts, hash};
use serde_json::json;

fn main() {
    let req = json!({
        "model": "claude-sonnet-4-7",
        "messages": [
            {"role": "user", "content": "Hello"},
        ],
        "metadata": {"request_id": "req-abc"},
        "max_tokens": 1024,
    });

    // Same request, key order shuffled and a different request_id.
    let req_shuffled = json!({
        "max_tokens": 1024,
        "messages": [
            {"content": "Hello", "role": "user"},
        ],
        "metadata": {"request_id": "req-xyz"},
        "model": "claude-sonnet-4-7",
    });

    // The Anthropic preset drops metadata, so both requests hash the same.
    let opts = HashOpts::anthropic();
    assert_eq!(hash(&req, &opts).unwrap(), hash(&req_shuffled, &opts).unwrap());
}
HashOpts::anthropic() is a preset that drops the noise fields Anthropic adds that should not affect cache identity (metadata, top-level request_id, certain headers). The crate ships presets for OpenAI, Bedrock, and Gemini, plus a HashOpts::default() that drops nothing.
What it does NOT do
- It does not hash response bodies. Responses are not cache keys.
- It does not validate that your request is a valid LLM request. If you hand it garbage, you get a stable hash of garbage.
- It does not handle streaming. If you are caching streamed responses, materialize them first.
- It does not deal with prompt-cache breakpoints. That is a separate concern (see cachebench).
Inside the lib: one design choice worth showing
The hard part of canonicalizing JSON is not the recursion. It is deciding what counts as "noise."
Some fields are clearly noise. A request ID is freshly minted on every call, so dropping it is obvious. Some fields are clearly content. A user message is the whole point.
The hard cases are in between. temperature looks like content because it influences the model. But if a caller sets temperature: 0.7 once and temperature: 0.70000001 the next time because of floating point round-tripping, you want those to hash the same.
The crate's answer is HashOpts::number_precision: Option<u32>. If set, numbers are normalized to N significant digits before hashing. If not set, numbers are byte-compared. Default is unset, because in practice the float-roundtrip case is rare and silent equality is worse than a cache miss.
let opts = HashOpts::anthropic().with_number_precision(6);
Six significant digits is far more than a sampling parameter such as temperature needs. Beyond that you are caching round-off, which probably is not what you want.
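As an illustration of what "normalized to N significant digits" can mean, here is one possible approach using only the standard library: format the number in scientific notation with N-1 decimals, then parse it back. The function name round_significant is hypothetical, and this is a sketch of the idea, not necessarily how the crate does it.

```rust
// Round `x` to `digits` significant digits by formatting it in scientific
// notation and parsing the result back. Zero, infinities, and NaN are
// returned unchanged. A `digits` of 0 is treated like 1.
fn round_significant(x: f64, digits: u32) -> f64 {
    if !x.is_finite() || x == 0.0 {
        return x;
    }
    // Scientific notation has one digit before the point, so N significant
    // digits means N-1 digits after it.
    let decimals = digits.saturating_sub(1) as usize;
    format!("{:.*e}", decimals, x).parse().unwrap_or(x)
}
```

With six digits, 0.7 and 0.70000001 both normalize to the same value, while 0.7 and 0.71 stay distinct, which is the behavior the temperature example calls for.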
When this is useful
- Building a prompt-cache layer over Anthropic, OpenAI, Bedrock, or Gemini that needs stable keys.
- Building an idempotency layer where two identical requests should not double-charge.
- Writing CI tests that assert a prompt body did not drift across a refactor.
- Wiring cachebench or any cache-observability lib that needs per-callsite stable identifiers.
When this is NOT what you want
- If you want to hash an entire transcript including responses, this is the wrong shape. Hash request and response separately and combine the digests.
- If you want a similarity match (two prompts that mean the same thing in different words), you want an embedder, not a hash.
Install
[dependencies]
llm-message-hash = "0.1"
Repo: https://github.com/MukundaKatta/llm-message-hash
Sibling libraries
| Lib | Boundary | Repo |
|---|---|---|
| llm-message-hash | Canonical hash of LLM request | this repo |
| cachebench | Cache hit-ratio observability | https://github.com/MukundaKatta/cachebench |
| agentidemp-rs | Idempotency keys via UUIDv5/sha256 | https://github.com/MukundaKatta/agentidemp-rs |
| agentsnap | Snapshot tests for whole agent runs | https://github.com/MukundaKatta/AgentSnapPy |
What's next
A HashOpts::with_extra_drops(&["my_internal_field"]) builder so teams can drop their own noise fields without forking. Already prototyped, will land in v0.2.
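A rough sketch of how such a chained builder could behave, using a hypothetical stand-in struct named Opts with a single drop_fields list. The real HashOpts fields are not shown in this post, so everything here apart from the with_extra_drops name is an assumption.

```rust
// Hypothetical stand-in for the options type; the field name is an assumption.
#[derive(Clone, Debug, Default)]
struct Opts {
    drop_fields: Vec<String>,
}

impl Opts {
    // Consume the options, append caller-supplied noise fields (skipping
    // duplicates), and return the result so the call can be chained.
    fn with_extra_drops(mut self, fields: &[&str]) -> Self {
        for f in fields {
            if !self.drop_fields.iter().any(|d| d.as_str() == *f) {
                self.drop_fields.push(f.to_string());
            }
        }
        self
    }
}
```

Taking self by value lets the call sit at the end of a preset chain without a separate mutable binding.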