"Your Repo Is the Memory. Your Model Is the Worker."

#claudecode #ai #productivity #webdev

There's a framing I keep coming back to. The model doesn't hold state — the repo does. The model is the worker. The repo is the memory. Once that clicks, a bunch of decisions about how to structure agentic work become obvious.

I've been running Claude Code nearly every day. My token breakdown: cacheRead 72%, input 0.3%, output the rest. That 72% number isn't an accident. It's what happens when you treat the repo as the ground truth and stop asking the model to reconstruct context from scratch.

The basic principle

The model forgets. Not sometimes — always, between sessions, and often within them when context gets long. If your workflow relies on the model remembering what it decided two hours ago, you're building on sand.

The repo doesn't forget. A spec file, a requirements doc, a TODO list with completed items marked — these are durable. The model can read them fresh every time and pick up exactly where things are.

So the workflow becomes: write everything down in the repo, have the model read before it acts, have the model write its decisions back.

What "frozen spec" means in practice

The spec file is the source of truth. It's not a suggestion. When I start a work session I don't say "continue where we left off." I say "read the spec, read the current state of the repo, tell me what's left."

The model reads. The model acts on what it reads. The model marks items done only after I've verified they work — not when it thinks it's done, not when it says it's done. After I've confirmed.

This sounds slow. It's the opposite. I've burned hours debugging half-finished features because a previous session "completed" something that wasn't complete. The [●] only goes in after verification. Every time I've skipped that, I've paid for it.

Why cacheRead 72% matters

When the model reads the same files repeatedly across a session — the spec, the main source files, the test results — those reads get served from cache. The cost difference between input tokens and cached reads is significant. More importantly, the latency drops. Reads that would have taken seconds are near-instant.

The implication: structure your repo so the model reads the same files often. One clean spec file is better than scattered notes. One consolidated state file is better than comments spread across six modules. Predictable read patterns = high cache hit rate = faster, cheaper sessions.

My 0.3% input figure is low because I'm not pasting long context into prompts. The repo provides the context. The prompt is just the directive.

What breaks this pattern

Two things reliably break it.

First: specs that change without the model knowing. If I update the requirements doc mid-session without telling the model to re-read it, the model is now operating on stale information it thinks is current. The fix is trivial — tell it to read the spec again — but easy to forget.

Second: verification that isn't real verification. "Does this look right?" is not verification. Running the tests and reading the output is verification. Clicking through the UI and confirming behavior is verification. The [●] gate only holds if the criterion is external and objective.

The reviewer pattern

One extension of this I've been using: a second model pass that's read-only. After the main agent makes changes, a reviewer agent reads the diff and the spec together, with no ability to edit files. It's surprising how often the reviewer catches things the main agent waved through — not because the main agent is bad, but because the same agent that wrote the code has already mentally "completed" the task and reads its own output charitably.

The reviewer has no such prior. It reads what's there.

What this changes about how I write specs

Short and precise beats long and thorough. A 200-line spec that's been carefully maintained is more useful than a 2,000-line doc that's 70% stale. The model will read the whole thing every time; every line of noise is a tax.

I also stopped writing "what" specs and started writing "what + acceptance criterion" specs. Not "implement pagination" but "implement pagination — verified when /api/posts?page=2 returns the correct slice and the total count header is accurate." The acceptance criterion is what tells me when to mark [●].

The limit

This framing doesn't fix bad judgment. If the spec is wrong, the model faithfully executes the wrong thing. If the acceptance criterion is too easy to satisfy, you get implementations that pass the test and miss the intent.

The model is a good worker. It does what it's told, reads what you give it, marks things done when you say so. Whether the work was the right work — that part is still yours.

Written by **greymoth. I build developer tools and write about where software quietly breaks — Japanese/CJK edge cases, i18n, the boring infra nobody checks. → *glovrex.com** · github.com/greymoth-jp*