Caching LLM responses is just content addressing

#go #llm #performance #cli

An LLM review costs money and a few seconds of latency. Reviewing the same diff twice should cost neither. CommitBrief caches every review, but the interesting part isn't that it caches — it's that the cache is content-addressed, so a hit is provably the same review, and there is no such thing as a stale one. Editing a single line of your rules file invalidates exactly the entries it should, and not one more, with zero invalidation logic anywhere in the code.

TL;DR

The cache key is a SHA-256 of everything that determines the answer: the diff, the full system prompt, the provider, the model, the language, and a schema version, plus context and mode markers when those are set.
A hit is a disk read — no tokens, no cost, and the cost preflight is skipped entirely.
Invalidation is emergent. Change an input, the key changes, the old entry is never looked up again. Nobody writes "clear cache on rules edit."
One file per response, written atomically, bounded by size with oldest-first eviction.

The key is the whole design

Everything good about this cache falls out of one function. Here it is, complete:

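import (
    "crypto/sha256"
    "encoding/hex"
    "strconv"
)

// SchemaVersion is the cache format version; bumping it orphans every entry.
const SchemaVersion = 1

// ComputeArgs carries every input that can change a review's output.
type ComputeArgs struct {
    Diff, SystemPrompt    string
    Provider, Model, Lang string
    WithContext           bool
    Mode                  string
}

// Compute returns the hex-encoded SHA-256 cache key for one review.
func Compute(args ComputeArgs) string {
    h := sha256.New()
    h.Write([]byte(args.Diff))
    h.Write([]byte("::"))
    h.Write([]byte(args.SystemPrompt))
    h.Write([]byte("::"))
    h.Write([]byte(args.Provider))
    h.Write([]byte(":"))
    h.Write([]byte(args.Model))
    h.Write([]byte(":"))
    h.Write([]byte(args.Lang))
    h.Write([]byte(":"))
    h.Write([]byte(strconv.Itoa(SchemaVersion)))
    // Optional markers are appended only when set, so plain keys never change.
    if args.WithContext {
        h.Write([]byte(":ctx"))
    }
    if args.Mode != "" {
        h.Write([]byte(":mode:" + args.Mode))
    }
    return hex.EncodeToString(h.Sum(nil))
}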

Each input is in the key because each one can change the output:

Diff — the obvious one. A different change is a different review.
SystemPrompt — the fully assembled prompt: your COMMITBRIEF.md rules, the severity rubric, the response-format contract, and any architecture constraints. This is the load-bearing one for invalidation, below.
Provider, Model, Lang — Claude and a local qwen don't return the same findings, and a Turkish review isn't an English one.
SchemaVersion — a constant 1. Bump it and every existing entry stops matching at once, without a migration or touching disk.

The two trailing markers are a lesson in not breaking your own cache. :ctx and :mode: are appended only when set. A plain review writes neither, so its key is byte-identical to what the same review produced three versions ago — adding the --with-context feature and the commit mode didn't invalidate anybody's existing cache. New behavior gets new key-space; unchanged behavior keeps its old keys. That discipline is why upgrades don't silently nuke everyone's cache on the first run.
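The compatibility argument can be checked directly. In this sketch, `legacyCompute` is a hypothetical reconstruction of the key function before `--with-context` and commit mode existed (the function above minus its two optional branches), and the sample inputs are made up. Because SHA-256 hashes a byte stream, writing the fields in one concatenated call yields the same digest as the separate `Write` calls in `Compute`.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"hash"
	"strconv"
)

const SchemaVersion = 1

type ComputeArgs struct {
	Diff, SystemPrompt    string
	Provider, Model, Lang string
	WithContext           bool
	Mode                  string
}

// writeBase hashes the fields every version of the key has always included,
// with the same separators as Compute.
func writeBase(h hash.Hash, a ComputeArgs) {
	h.Write([]byte(a.Diff + "::" + a.SystemPrompt + "::" + a.Provider + ":" +
		a.Model + ":" + a.Lang + ":" + strconv.Itoa(SchemaVersion)))
}

// legacyCompute is a hypothetical key function from before the optional markers.
func legacyCompute(a ComputeArgs) string {
	h := sha256.New()
	writeBase(h, a)
	return hex.EncodeToString(h.Sum(nil))
}

// Compute mirrors the article's function: markers are added only when set.
func Compute(a ComputeArgs) string {
	h := sha256.New()
	writeBase(h, a)
	if a.WithContext {
		h.Write([]byte(":ctx"))
	}
	if a.Mode != "" {
		h.Write([]byte(":mode:" + a.Mode))
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	plain := ComputeArgs{Diff: "+fix typo", SystemPrompt: "rules", Provider: "ollama", Model: "qwen", Lang: "en"}
	fmt.Println(Compute(plain) == legacyCompute(plain)) // true: plain reviews keep their old keys

	withCtx := plain
	withCtx.WithContext = true
	fmt.Println(Compute(withCtx) == legacyCompute(plain)) // false: new behavior gets new key-space
}
```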

A hit is a disk read

Lookup is a file read, an unmarshal, and two guards:

func (c *Cache) Get(key string) (Entry, bool) {
    path := c.entryPath(key)
    data, err := os.ReadFile(path)
    if err != nil {
        return Entry{}, false
    }
    var e Entry
    if err := json.Unmarshal(data, &e); err != nil {
        _ = os.Remove(path) // corrupt entry: drop it, next write replaces
        return Entry{}, false
    }
    if e.Version != SchemaVersion {
        return Entry{}, false
    }
    if e.ExpiredAt(c.now()) {
        return Entry{}, false
    }
    return e, true
}

No network, no tokens. And because the lookup happens before the cost preflight in the pipeline, a hit skips the cost estimate altogether — there's nothing to estimate when you're not calling anyone. On an unchanged diff, a re-run is effectively instant and free.

Invalidation you never write

This is the payoff. There is no invalidateCacheAfterEditingRules() anywhere in the codebase, because it would be dead code. The system prompt is in the key, and your rules are in the system prompt. So the moment you change one line of COMMITBRIEF.md, the assembled prompt's bytes change, its SHA-256 changes, and the old entry's key is one nothing will compute again unless the rules return to those exact bytes (and then the old entry is a correct hit). The stale review isn't deleted; it's unreachable, and the next review writes a fresh entry under the new key.

Content addressing means a cache hit is, by construction, a review produced from byte-identical inputs. There's no heuristic deciding whether a cached answer is "still valid," because validity isn't a question you can ask of a content-addressed store — the inputs either hash to the same key or they don't.
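Here is a toy version of that story. `assemblePrompt` and `key` are hypothetical stand-ins (the real key also covers provider, model, language, and schema version), and the rules text is invented. Editing one rule changes the prompt bytes and therefore the key; restoring the original text computes the original key again, which is a legitimate hit because the inputs are once more byte-identical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// assemblePrompt is a hypothetical stand-in for prompt assembly:
// the user's rules embedded among fixed sections.
func assemblePrompt(rules string) string {
	return strings.Join([]string{
		"You are a code reviewer.",
		"## Project rules\n" + rules,
		"## Severity rubric\n(fixed rubric text)",
	}, "\n\n")
}

// key hashes only diff and prompt; the real key covers more fields.
func key(diff, prompt string) string {
	sum := sha256.Sum256([]byte(diff + "::" + prompt))
	return hex.EncodeToString(sum[:])
}

func main() {
	diff := "+ return nil"
	rules := "Prefer early returns.\nNo panics in library code."
	before := key(diff, assemblePrompt(rules))

	edited := strings.Replace(rules, "No panics", "No panics or log.Fatal", 1)
	after := key(diff, assemblePrompt(edited))

	fmt.Println(before == after)                             // false: the old entry is unreachable
	fmt.Println(before == key(diff, assemblePrompt(rules))) // true: reverting recomputes the old key
}
```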

The entry, written so a crash can't corrupt it

A cache entry is one JSON file per response:

type Entry struct {
    Version   int       `json:"version"`
    CreatedAt time.Time `json:"created_at"`
    TTL       int64     `json:"ttl"`
    Key       KeyMeta   `json:"key"`
    Result    Result    `json:"result"`
}

Result carries a Format marker (json, markdown-fallback, or plain-text), so a degraded review (post 3) or a CLI provider's pre-formatted output replays through the same renderer path it took the first time, and no warning is re-emitted on a cache hit. Writes are atomic: serialize to a temp file, then rename into place.

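import "os"

// writeEntry (name illustrative) stores data at path: temp file, then rename.
func writeEntry(path string, data []byte) error {
    tmp := path + ".tmp" // sibling path, so the rename stays on one filesystem
    if err := os.WriteFile(tmp, data, 0o600); err != nil {
        return err
    }
    return os.Rename(tmp, path)
}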

os.Rename maps to rename(2), which on POSIX systems atomically replaces the destination as long as both paths are on the same filesystem, and the sibling .tmp path guarantees that. So a crash mid-write leaves at worst an orphaned .tmp file, never a half-written entry that would later unmarshal into garbage. Mode 0600 keeps the cached review readable only by you. And the first successful write appends .commitbrief/ to the repo's .gitignore, so your cache never lands in a commit.

Bounded, and prunable

Left alone, the cache grows. Two automatic mechanisms keep it in check. If cache.max_size_mb is set, an eviction sweep runs after each write, removing entries oldest-first by CreatedAt (file mtime as fallback) until the total fits; the just-written entry is always protected, so a single review larger than the budget still survives the write that created it. Entries also carry a TTL, defaulting to seven days, after which Get treats them as misses. And you can prune by hand:

commitbrief cache stats                          # count, size, age, per-provider
commitbrief cache prune --keep-last 500 --older-than 7d
commitbrief cache inspect <key> --show-content   # one entry's metadata + body

prune keeps an entry only if it's inside both windows — among the newest 500 and younger than seven days.

Where it pays off

When CommitBrief does call a provider, a cost preflight runs first: it estimates input tokens at roughly one per four characters, guesses output tokens conservatively (floored at 200, capped at 1500, since a structured review rarely runs longer), prices both against the model's entry in a price table, and prompts you only if the estimate clears your threshold (cost.warn_threshold_usd, default $0.50). A cache hit skips all of that. On a paid provider, the second review of an unchanged diff costs nothing; on a local Ollama model (post 5) it was already free, but the cache still saves you the inference time.

What it is not

A cache hit replays the first answer verbatim — including its mistakes. The cache makes a re-run free; it does not make it better. If the model missed something the first time, the cached entry will keep missing it until the inputs change or you force a fresh call with --no-cache. And the store is deliberately repo-local: .commitbrief/cache/ on your machine, never a shared team server, because there isn't one — the same local-first stance that runs through everything else. The cache saves you tokens and time; it doesn't pretend to be a source of truth.

Repo: github.com/CommitBrief/commitbrief.

Part 6 of Building CommitBrief. Next: exposing the whole review pipeline as a Model Context Protocol tool, speaking JSON-RPC over stdio in standard-library Go with zero new dependencies.