Mukunda Rao Katta

Posted on May 25

I needed a stable cache key for LLM requests. The hard part was object key order.

#hermeschallenge #ai #llm #rust

I started caching LLM responses last month. The cache key was the obvious thing. Hash the request body. Look up. Hit or miss.

It missed every time.

Three hours later I had the answer. Python dicts preserve insertion order, and json.dumps writes keys in that order unless you pass sort_keys=True. The JSON I was hashing therefore carried whatever key order the upstream library had used when it built the dict. Two requests with identical content produced two different byte strings because the objects in the messages array had role before content once and content before role the other time. Two different hashes. Two different cache entries. Zero hits.

This is the kind of bug that does not show up in any test you write before shipping.

llm-message-hash is the small Rust crate I wrote so I never have to debug it again. It takes a serde_json::Value (or anything that serializes to one), canonicalizes it according to a configurable HashOpts, and returns a stable sha256-hex. The same logical request always produces the same hash. The crate is on crates.io as llm-message-hash.

The problem

LLM SDK request bodies are JSON objects with a fixed shape. model, messages, tools, max_tokens, temperature, sometimes system. The shape is stable. What changes is everything around the edges.

Per-call IDs (request_id, idempotency_key)
Auth headers and metadata (user, session_id)
Provider-specific telemetry fields that show up in some SDK versions and not others
Object key order, which JSON does not treat as significant but which changes the serialized bytes
Top-level wrappers some SDKs add (raw_response, event, data)

If you hash the raw request body, all of that noise shows up in your cache key. Hit ratios drop to near zero in production, and you cannot tell why without printing both byte strings side by side.
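Printing both byte strings side by side gets easier if you first find where they diverge. The following is a minimal std-only sketch of that debugging step. The helper names first_diff and window are hypothetical and are not part of llm-message-hash.

```rust
/// Byte offset of the first difference between two serialized bodies,
/// or None when they are byte-identical.
fn first_diff(a: &str, b: &str) -> Option<usize> {
    let (a, b) = (a.as_bytes(), b.as_bytes());
    a.iter()
        .zip(b.iter())
        .position(|(x, y)| x != y)
        // The shared prefix matches, so they differ only if one is longer.
        .or_else(|| (a.len() != b.len()).then(|| a.len().min(b.len())))
}

/// A short window of `body` starting a few bytes before `offset`.
/// Works on bytes so an offset inside a multi-byte character cannot panic.
fn window(body: &str, offset: usize) -> String {
    let bytes = body.as_bytes();
    let start = offset.saturating_sub(8).min(bytes.len());
    let end = (offset + 24).min(bytes.len());
    String::from_utf8_lossy(&bytes[start..end]).into_owned()
}
```

Call first_diff on the two bodies that were supposed to hit the same cache entry, then print window for each at the returned offset. A key-order mismatch usually shows up within the first few bytes of an object.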

The shape of the fix

You need a hash that says "same logical request" regardless of the noise. That means three things.

Walk the JSON recursively. Sort every object's keys alphabetically. Lists keep their order because element order is semantically meaningful (a messages array has turns in order).
Drop a configurable set of noise fields before hashing.
Run sha256 over the canonicalized bytes.
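The first two steps can be sketched without any dependencies. This is an illustration under stated assumptions, not the crate's code: the Json enum is a hypothetical stand-in for serde_json::Value, noise keys are dropped at every depth for simplicity, and the final SHA-256 step is left out because the standard library has no SHA-256. You would feed the returned string's bytes to whatever SHA-256 implementation you use.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a JSON value. Object pairs keep insertion
/// order so that key-order differences between inputs stay visible.
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Null,
    Bool(bool),
    Num(f64),
    Str(String),
    Arr(Vec<Json>),
    Obj(Vec<(String, Json)>),
}

/// Writes a JSON string literal, escaping quotes, backslashes and controls.
fn write_string(s: &str, out: &mut String) {
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
}

/// Appends the canonical form of `v`: sorted keys, noise keys dropped,
/// no whitespace. Arrays keep their element order.
fn canonicalize(v: &Json, noise: &[&str], out: &mut String) {
    match v {
        Json::Null => out.push_str("null"),
        Json::Bool(b) => out.push_str(if *b { "true" } else { "false" }),
        Json::Num(n) => out.push_str(&n.to_string()),
        Json::Str(s) => write_string(s, out),
        Json::Arr(items) => {
            out.push('[');
            for (i, item) in items.iter().enumerate() {
                if i > 0 {
                    out.push(',');
                }
                canonicalize(item, noise, out);
            }
            out.push(']');
        }
        Json::Obj(pairs) => {
            // Filter out noise keys, then let BTreeMap sort the rest by key.
            let sorted: BTreeMap<&str, &Json> = pairs
                .iter()
                .filter(|(k, _)| !noise.contains(&k.as_str()))
                .map(|(k, val)| (k.as_str(), val))
                .collect();
            out.push('{');
            for (i, (k, val)) in sorted.into_iter().enumerate() {
                if i > 0 {
                    out.push(',');
                }
                write_string(k, out);
                out.push(':');
                canonicalize(val, noise, out);
            }
            out.push('}');
        }
    }
}

/// Convenience wrapper returning the canonical text to be hashed.
fn canonical(v: &Json, noise: &[&str]) -> String {
    let mut out = String::new();
    canonicalize(v, noise, &mut out);
    out
}
```

Two requests that differ only in key order and in a dropped field produce the same canonical string, so any hash over those bytes matches too.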

The crate's full surface is small.

use llm_message_hash::{HashOpts, hash};
use serde_json::json;

let req = json!({
    "model": "claude-sonnet-4-7",
    "messages": [
        {"role": "user", "content": "Hello"},
    ],
    "metadata": {"request_id": "req-abc"},
    "max_tokens": 1024,
});

// Same request, key order shuffled and a different request_id.
let req_shuffled = json!({
    "max_tokens": 1024,
    "messages": [
        {"content": "Hello", "role": "user"},
    ],
    "metadata": {"request_id": "req-xyz"},
    "model": "claude-sonnet-4-7",
});

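// The Anthropic preset drops metadata, so the differing request_id is ignored,
// and sorted keys make both bodies canonicalize to the same bytes.
let opts = HashOpts::anthropic();
assert_eq!(hash(&req, &opts).unwrap(), hash(&req_shuffled, &opts).unwrap());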

HashOpts::anthropic() is a preset that drops the noise fields Anthropic adds that should not affect cache identity (metadata, top-level request_id, certain headers). The crate ships presets for OpenAI, Bedrock, and Gemini, plus a HashOpts::default() that drops nothing.

What it does NOT do

It does not hash response bodies. Responses are not cache keys.
It does not validate that your request is a valid LLM request. If you hand it garbage, you get a stable hash of garbage.
It does not handle streaming. If you are caching streamed responses, materialize them first.
It does not deal with prompt-cache breakpoints. That is a separate concern (see cachebench).

Inside the lib: one design choice worth showing

The hard part of canonicalizing JSON is not the recursion. It is deciding what counts as "noise."

Some fields are clearly noise. A request ID is freshly minted on every call, so dropping it is obvious. Some fields are clearly content. A user message is the whole point.

The hard cases are in between. temperature looks like content because it influences the model. But if a caller sets temperature: 0.7 once and temperature: 0.70000001 the next time because of floating point round-tripping, you want those to hash the same.

The crate's answer is HashOpts::number_precision: Option<u32>. If set, numbers are normalized to N significant digits before hashing. If not set, numbers are hashed exactly as they are serialized. Default is unset, because in practice the float-roundtrip case is rare and a silent false hit is worse than a cache miss.

let opts = HashOpts::anthropic().with_number_precision(6);

Six significant digits is far more precision than anyone sets a parameter like temperature to by hand. Beyond that you are keying the cache on round-off, which probably is not what you want.
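To make "N significant digits" concrete, here is one way such a normalization could work, using only Rust's standard formatting. It is a sketch, not necessarily how the crate does it internally, and the function name normalize_number is hypothetical.

```rust
/// Formats `x` with `digits` significant digits in scientific notation,
/// so values that agree to that precision produce identical text.
fn normalize_number(x: f64, digits: u32) -> String {
    // Scientific notation puts one digit before the decimal point,
    // so the remaining significant digits go after it.
    let decimals = digits.saturating_sub(1) as usize;
    format!("{:.*e}", decimals, x)
}
```

With six digits, 0.7 and 0.70000001 both become 7.00000e-1, while 0.7 and 0.71 stay distinct. Scientific notation keeps the digit count the same regardless of magnitude, which a fixed number of decimal places would not.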

When this is useful

Building a prompt-cache layer over Anthropic, OpenAI, Bedrock, or Gemini that needs stable keys.
Building an idempotency layer where two identical requests should not double-charge.
Writing CI tests that assert a prompt body did not drift across a refactor.
Wiring cachebench or any cache-observability lib that needs per-callsite stable identifiers.
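For the first two cases, the hash becomes the lookup key of whatever store you already have. A minimal in-memory sketch, assuming the key has already been computed from the request; ResponseCache and get_or_fetch are hypothetical names, not part of any of the libraries mentioned here.

```rust
use std::collections::HashMap;

/// In-memory response cache keyed by a precomputed request hash.
struct ResponseCache {
    entries: HashMap<String, String>,
    hits: u64,
    misses: u64,
}

impl ResponseCache {
    fn new() -> Self {
        ResponseCache { entries: HashMap::new(), hits: 0, misses: 0 }
    }

    /// Returns the cached response for `key`, calling `fetch` only on a miss.
    fn get_or_fetch<F: FnOnce() -> String>(&mut self, key: &str, fetch: F) -> &str {
        if self.entries.contains_key(key) {
            self.hits += 1;
        } else {
            self.misses += 1;
            // Only a miss pays for the upstream call.
            self.entries.insert(key.to_string(), fetch());
        }
        &self.entries[key]
    }
}
```

The hit and miss counters are the numbers to watch. If misses climbs while the requests look identical, the key is still picking up noise.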

When this is NOT what you want

If you want to hash an entire transcript including responses, this is the wrong shape. Hash request and response separately and combine the digests.
If you want a similarity match (two prompts that mean the same thing in different words), you want an embedder, not a hash.

Install

[dependencies]
llm-message-hash = "0.1"

Repo: https://github.com/MukundaKatta/llm-message-hash

Sibling libraries

Lib	Boundary	Repo
llm-message-hash	Canonical hash of LLM request	this repo
cachebench	Cache hit-ratio observability	https://github.com/MukundaKatta/cachebench
agentidemp-rs	Idempotency keys via UUIDv5/sha256	https://github.com/MukundaKatta/agentidemp-rs
agentsnap	Snapshot tests for whole agent runs	https://github.com/MukundaKatta/AgentSnapPy

What's next

A HashOpts::with_extra_drops(&["my_internal_field"]) builder so teams can drop their own noise fields without forking. Already prototyped, will land in v0.2.
