If you use Claude Code seriously, you've felt this: the model is brilliant, but the session fights you. It forgets what it did last week. It edits the wrong file. It declares a broken build "done" and moves on. You end up re-explaining your conventions every single morning.
Here's what took me a while to accept: the model was never the bottleneck. The harness around it was. A great model with a sloppy harness is a junior dev with amnesia.
So I stopped tweaking prompts and started building a system. This is the actual setup I run every session — not theory, the real thing.
1. CLAUDE.md is a contract, not a wish list
Most people's CLAUDE.md is a pile of polite suggestions the agent ignores under pressure. Mine reads like a contract with non-negotiables. A few of the rules that earn their place:
## Response style
- Answer first, explain only if asked. No filler.
## Decision forks
- At a real choice point: stop, present options, wait. Don't guess and proceed.
## Done means done
- Never say a task is complete without verifying it. Run it, test it, or
show the output. If you can't verify, say so explicitly.
That last one — "done means done" — is the highest-leverage line in the whole file. But a rule in a doc is only a suggestion until something enforces it. That's where hooks come in.
2. The one hook that changed everything: verify-before-done
Claude Code lets you run a hook when the agent tries to end its turn (a "Stop" hook). I use one that does a simple, brutal check: if the agent modified code and claims it works, but ran no test/build/command that turn, block the stop and make it actually verify.
The logic is basically:
on Stop:
if changed_code_this_turn and claimed_done and not ran_verification:
block("You said it works but verified nothing. Run the feature,
test it, or state plainly what you couldn't verify.")
It's a few dozen lines, but it killed almost all of my "it said it worked and it didn't" pain in one move. The agent can no longer hand me a green checkmark it didn't earn.
3. Model routing: cheap work to cheap models
Subagents inherit the main model unless you say otherwise — which means a top-tier model can quietly end up doing grep sweeps and log-summarizing at premium prices. I route by task: recall/inventory/summarization → a cheap fast model, implementation/tests → a mid model, deep review/diagnosis → the strong one. A small pre-spawn hook auto-pins the tier so I don't have to remember. Same quality on the work that matters, a fraction of the spend on the work that doesn't.
4. Persistent memory + a knowledge graph
The agent writes durable facts to a small file-based memory (one fact per file, with a one-line index it loads each session) and I keep a knowledge graph over my notes. The payoff: it stops re-researching my own past work. Before it builds or researches something, a recall step checks "have we already done this?" — and usually we have.
5. Pre-build recall: don't rebuild what exists
Tied to the memory above: a check on each build/research request that surfaces prior work first. The number of times this has caught me about to rebuild something I finished three weeks ago is genuinely embarrassing.
The point
None of these are clever individually. Together they turn the agent from an amnesiac genius into something that behaves like a senior engineer who remembers the project and refuses to lie about whether the build is green.
If you want a running start, I packaged the two highest-leverage pieces — the CLAUDE.md contract and the verify-before-done guardrail — into a small free starter kit. It's drop-in: copy the files, restart, done in about ten minutes. It's pay-what-you-want with a $0 option, no email wall to read it:
→ Free Claude Code Starter Kit: https://expressive446.gumroad.com/l/qlypgs
Built and run by someone who ships with this stack daily. Happy to answer setup questions in the comments.
Claude is a trademark of Anthropic PBC. This is an independent setup guide and is not affiliated with, endorsed by, or sponsored by Anthropic.
Top comments (0)