Mukunda Rao Katta

Posted on May 25

prompt-cache-key: Stable Hashes for Anthropic Prompt Cache Scopes

#hermeschallenge #ai #python #agents

Anthropic's prompt caching is a straightforward deal: if the prefix of your request matches a previous request byte-for-byte up to a cache_control breakpoint, you get a cache hit. For a static agent setup, the scope that has to match covers the system prompt, your cached tool definitions, and the model. If any of those bytes change, the cache goes cold and you pay to write it again on the next call.

I was running a pre-warm before every agent batch job to make sure the system prompt and tools were already cached before the workers started. It worked, but it was running unconditionally. Every batch start called the warm endpoint, even when nothing had changed between runs. That added latency to every job start and burned cache-write credits on a redundant write. What I wanted was to skip the warm when the scope had not changed since the last warm.

The missing piece was a way to compute, locally and without an API call, whether the cache scope was still the same. If I could hash the system prompt, tools, and model in a way that matched what Anthropic's cache considers a match, I could compare hashes before deciding whether to warm. That is prompt-cache-key.

Shape of the fix

from prompt_cache_key import cache_scope_key, ScopeOptions

SYSTEM_PROMPT = """You are a specialized research assistant.
Your job is to analyze technical documents and extract key findings.
..."""

TOOLS = [
    {"name": "search_papers", "description": "Search academic papers", ...},
    {"name": "fetch_document", "description": "Retrieve a document by URL", ...},
]

opts = ScopeOptions(model="claude-sonnet-4-6")

# Compute the current scope hash
key = cache_scope_key(
    system=SYSTEM_PROMPT,
    tools=TOOLS,
    opts=opts,
)

# Load the last known hash from your store
last_key = load_last_key()  # your code: disk, Redis, env var, etc.

if key != last_key:
    # Scope changed since last warm, so warm it.
    # `warmer` is whatever client you use to warm the cache; its setup is not shown here.
    warmer.warm(system=SYSTEM_PROMPT, tools=TOOLS, model=opts.model)
    save_last_key(key)
    print(f"Cache warmed (scope changed). New key: {key[:8]}...")
else:
    print("Cache scope unchanged, skipping warm")

# System-only variant (no tools)
key_no_tools = cache_scope_key(
    system=SYSTEM_PROMPT,
    opts=ScopeOptions(model="claude-sonnet-4-6"),
)

# Custom breakpoint: if you know the exact token count where cache_control is set
opts_custom = ScopeOptions(
    model="claude-sonnet-4-6",
    cache_breakpoint_tokens=1024,
)
key_custom = cache_scope_key(system=SYSTEM_PROMPT, tools=TOOLS, opts=opts_custom)

The returned key is a 64-character lowercase hex SHA-256 digest. It is deterministic: the same system prompt, tools, and model always produce the same key on any machine and any supported Python version, as long as the library's major version is the same.

What it does NOT do

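This library only computes a hash. It does not call the Anthropic API. It does not warm the cache. It does not know whether Anthropic's actual cache is currently warm or cold; it only tells you whether your local scope has changed since the last time you recorded the key. If your process restarts and you lose the stored key, you will warm on the next run even if the Anthropic cache is still valid from the previous run. The reverse also holds: an unchanged key does not prove the cache is still warm, because cache entries expire after a short TTL (5 minutes by default, refreshed on each hit), so a long gap between batches can leave the cache cold while the key stays the same.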
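The load_last_key and save_last_key helpers in the example above are left to the caller, and losing the stored key is what triggers the redundant warm described here. A minimal sketch of a disk-backed version, assuming a single writer; the file location is hypothetical and not something the library prescribes:

```python
import os
import tempfile
from pathlib import Path
from typing import Optional

# Hypothetical location; any durable path (or Redis, a DB row, etc.) would do.
KEY_FILE = Path(tempfile.gettempdir()) / "prompt_cache_scope.key"


def load_last_key(path: Path = KEY_FILE) -> Optional[str]:
    """Return the stored scope key, or None if nothing was recorded yet."""
    try:
        return path.read_text(encoding="utf-8").strip() or None
    except FileNotFoundError:
        return None


def save_last_key(key: str, path: Path = KEY_FILE) -> None:
    """Write the key via a temp file so a crash never leaves a partial key."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(key, encoding="utf-8")
    os.replace(tmp, path)  # atomic rename when both paths share a filesystem
```

Because load_last_key returns None on a fresh machine, the `key != last_key` comparison falls through to a warm, which is the safe default.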

It also does not handle multi-turn message caching: if you are caching mid-conversation turns rather than the static system prefix, the hash does not cover your conversation history. For hashing a full request including messages, use llm-message-hash-py. The two libraries are designed to be used together: prompt-cache-key for the static scope, llm-message-hash-py for the full request deduplication.

The token clip is a heuristic, not exact. It uses a 4-characters-per-token approximation. If your system prompt is close to the breakpoint boundary and precision matters, pass cache_breakpoint_tokens explicitly from an exact token count you got from a previous API response.

Inside the lib

The system prompt clip is the first operation. The cache_control breakpoint defines how many tokens of the prefix Anthropic caches. The library multiplies cache_breakpoint_tokens by 4 (the character-per-token heuristic) and clips the system string at that character position. The clip is done before anything else so the hash covers only the cached portion of the system prompt, not the full string.
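The clip arithmetic is simple enough to sketch. This is an illustration of the heuristic as described, not the library's actual source; the function and constant names are assumptions:

```python
CHARS_PER_TOKEN = 4  # the character-per-token heuristic described above


def clip_system(system: str, cache_breakpoint_tokens: int) -> str:
    """Keep only the characters assumed to fall before the breakpoint."""
    max_chars = cache_breakpoint_tokens * CHARS_PER_TOKEN
    return system[:max_chars]


prompt = "x" * 10_000
clipped = clip_system(prompt, 1024)
print(len(clipped))  # 4096
```

One consequence worth keeping in mind: under this scheme, an edit that lands after the clip position does not change the clipped string, so it would not change the key either.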

After clipping, tools are canonicalized using the same recursive key-sort as llm-message-hash-py: every dict at any depth in the tools list has its keys sorted alphabetically. This makes the tools hash stable regardless of how the tool dicts were constructed. Lists inside tool definitions are left in order because list order in tool definitions (for example, enum values) is semantically meaningful.
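A minimal sketch of that kind of canonicalization, assuming JSON-compatible tool definitions with string keys. The function names and the compact JSON separators are assumptions made for illustration, not the library's confirmed internals:

```python
import json
from typing import Any


def canonicalize(value: Any) -> Any:
    """Recursively sort dict keys; leave list order untouched."""
    if isinstance(value, dict):
        return {k: canonicalize(value[k]) for k in sorted(value)}
    if isinstance(value, list):
        return [canonicalize(item) for item in value]
    return value


def canonical_json(tools: list) -> bytes:
    """Serialize the canonical form to compact UTF-8 JSON bytes."""
    text = json.dumps(canonicalize(tools), separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


a = [{"name": "t", "input_schema": {"type": "object", "mode": {"enum": ["fast", "slow"]}}}]
b = [{"input_schema": {"mode": {"enum": ["fast", "slow"]}, "type": "object"}, "name": "t"}]
c = [{"name": "t", "input_schema": {"type": "object", "mode": {"enum": ["slow", "fast"]}}}]

print(canonical_json(a) == canonical_json(b))  # True: only key order differs
print(canonical_json(a) == canonical_json(c))  # False: list order is preserved
```

Passing sort_keys=True to json.dumps gives the same key ordering; the explicit recursion is shown only to make the dict-versus-list treatment visible.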

The clipped system prompt bytes, the canonical tools JSON bytes, the model string, and separator bytes are concatenated and fed into hashlib.sha256. The separator bytes ensure that system="AB" + model="C" produces a different hash than system="A" + model="BC". The hex digest is returned.
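To make the separator argument concrete, here is a sketch of the concatenation step. The NUL separator and the part order are assumptions; the post does not say which bytes or which order the library actually uses:

```python
import hashlib

SEP = b"\x00"  # assumed separator byte, chosen for illustration


def scope_digest(system_bytes: bytes, tools_json: bytes, model: str) -> str:
    """Hash the three parts with a separator after each one."""
    h = hashlib.sha256()
    for part in (system_bytes, tools_json, model.encode("utf-8")):
        h.update(part)
        h.update(SEP)
    return h.hexdigest()


# Without separators both inputs would concatenate to b"ABC".
d1 = scope_digest(b"AB", b"", "C")
d2 = scope_digest(b"A", b"", "BC")
print(d1 != d2)  # True
print(len(d1))   # 64
```

A fixed separator is only unambiguous if the parts cannot contain it themselves; length-prefixing each part is a stricter alternative when that cannot be guaranteed.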

ScopeOptions is a simple dataclass with two fields: model: str (required) and cache_breakpoint_tokens: int (optional, defaults to 2048, which covers most agent system prompts without clipping). You can subclass it if you need to incorporate additional scope fields.
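The shape described above would look roughly like the following. This is a stand-in reconstruction from the description, not the library's code, and the tenant_id field is purely hypothetical:

```python
from dataclasses import dataclass


@dataclass
class ScopeOptions:
    """Stand-in for the two-field dataclass described in the post."""
    model: str
    cache_breakpoint_tokens: int = 2048


@dataclass
class TenantScopeOptions(ScopeOptions):
    """Hypothetical subclass carrying one extra scope field."""
    # Must have a default because the parent's last field has one.
    tenant_id: str = ""


opts = TenantScopeOptions(model="claude-sonnet-4-6", tenant_id="acme")
print(opts.cache_breakpoint_tokens)  # 2048
```

Whether the library folds extra subclass fields into the hash automatically is not stated in the post, so check the source before relying on a field like tenant_id to change the key.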

The 20 tests cover system-only hashing, system plus tools, model string changes, token clip boundary at different positions, empty tools list, tools with nested structures, hash stability across multiple calls with the same input, and separator-byte collision prevention.

When useful

Pre-warm guard: skip the cache warm call when the scope has not changed, reducing latency and cache-write credits at job start
Deploy gates: compare the scope key before and after a deploy to know in advance whether the cache will be cold when workers start
Multi-node batch jobs: share the key through a common store so only one node warms the cache while the others check the hash and skip
Monitoring and alerting: log the scope key on each job run and alert when it changes unexpectedly, which may indicate a system prompt drift or unintended tool change
Testing: pin the scope key in a snapshot test to catch accidental system prompt changes during development
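For the multi-node case in the list above, the important detail is that "compare, then record" happens atomically, otherwise two nodes can both decide to warm. A minimal sketch using an in-memory stand-in for the shared store; the class and method names are hypothetical, and a real deployment would use whatever atomic compare-and-set the actual store provides:

```python
import threading


class SharedKeyStore:
    """In-memory stand-in for a shared store (Redis, a DB row, ...)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._key = None

    def claim_if_changed(self, key: str) -> bool:
        """Atomically record `key`. Returns True only for the first caller
        that sees a different stored value, i.e. the node that should warm."""
        with self._lock:
            if self._key == key:
                return False
            self._key = key
            return True


store = SharedKeyStore()
warmed = []


def node(name: str, key: str) -> None:
    if store.claim_if_changed(key):
        warmed.append(name)  # this is where the warm call would go


threads = [threading.Thread(target=node, args=(f"node-{i}", "k1")) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(warmed))  # 1
```

Note that this sketch records the key before the warm has finished; if the warm call can fail, the claiming node should clear or roll back the stored key so another node can retry.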

When not useful

Non-Anthropic providers where prompt caching works differently or is not a feature
Situations where you need byte-perfect clip at the exact token boundary rather than a character approximation
Full-request deduplication that must include the conversation messages (use llm-message-hash-py for that)
Real-time cache hit rate monitoring across many calls (use cachebench for that, which instruments at the API call level)

Install

pip install prompt-cache-key

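Zero dependencies. Python 3.9+. The PyPI publish is still pending (the registry is returning a 429 new-project cooldown), so the command above will only work once the package is live. Until then, the source is on GitHub.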

Siblings

Library	Language	What it does
prompt-cache-warmer	Python	Warms the Anthropic prompt cache by making a minimal API call
prompt-cache-warmer-rs	Rust	Rust port of the cache warmer, crates.io v0.1.0
llm-message-hash-py	Python	Hashes the full LLM request structure including messages
cachebench	Python	Prompt-cache observability: hit rate, write credits, token savings
agentidemp-py	Python	Idempotency keys for agent API calls using a similar hash approach

What's next

I want to add a tools_hash helper that hashes just the tools list, independently of the system prompt. When a cache miss happens, knowing whether it was the system prompt or the tools that changed helps you trace the root cause faster. I also want a CLI wrapper, along the lines of prompt-cache-key check --system-file system.txt --tools-file tools.json, so you can run this comparison from a shell script or a deploy pipeline without writing Python.
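Since tools_hash does not exist yet, the following is only a guess at how such a helper could be used for root-cause tracing. Every name here is hypothetical, and the hashing deliberately ignores the token clip and model for brevity:

```python
import hashlib
import json


def tools_hash(tools: list) -> str:
    """Hash the tools list alone, with dict keys sorted at every depth."""
    canonical = json.dumps(tools, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def system_hash(system: str) -> str:
    """Hash the system prompt alone."""
    return hashlib.sha256(system.encode("utf-8")).hexdigest()


def what_changed(old_system: str, old_tools: list, new_system: str, new_tools: list) -> list:
    """Name the scope components whose hashes differ between two runs."""
    changed = []
    if system_hash(old_system) != system_hash(new_system):
        changed.append("system")
    if tools_hash(old_tools) != tools_hash(new_tools):
        changed.append("tools")
    return changed


print(what_changed("prompt v1", [{"name": "a"}], "prompt v2", [{"name": "a"}]))  # ['system']
```

In practice you would store the component hashes next to the scope key on each run, so the comparison does not need the previous prompt and tools in full.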

Part of the Hermes Agent Challenge sprint. Source at github.com/MukundaKatta/prompt-cache-key. PyPI: pip install prompt-cache-key (publish pending).
