Two files, one discipline, and a measured 10-13% of my Claude Code budget.
A while back, mid-session with Claude Code, I typed a pushback in the kind of broken English you only produce past midnight:
"are we using full netflix level doc uodsyed as ws go here ?"
What I meant: are we updating documentation at full Netflix-documentary depth as we go, or are we doing the lazy version that just records what changed without why? Claude correctly inferred the Netflix version. From that point forward the documentation standard for every one of my projects was set.
That session became the basis for what I now call paper-trail: a portable ruleset that makes Claude Code (and any other AI coding agent that respects CLAUDE.md) write documentation at documentary depth instead of git-log depth.
This post is about why that matters more when an AI is doing most of the typing, and what the discipline actually looks like in practice.
The reasoning path is what AI loses
Most documentation captures the final state. The README says what the system does. The CHANGELOG says what version shipped. The commit message says what file changed.
What disappears at session end:
- The three alternatives you considered before picking option C
- The operator pushback that killed your original design
- The verification log that convinced you the fix worked
- The false start at 11pm that explains the weird workaround at line 240
- The dependency you didn't realize existed until something broke
When you're writing code yourself, this knowledge lives in your head, badly, for about a week. After that it's gone.
When an AI agent is doing most of the typing, the gap gets worse. The agent has zero memory of the rejected alternatives. Six months later it confidently suggests a fix you already turned down. There is no record of why you turned it down.
The reasoning path is what makes future debugging possible. AI makes it more valuable and more fragile at the same time.
Two files, one discipline
The structure is simple.
CHANGES.md at the repo root. Chronological log, newest entry on top. Updated as work happens, not after. Each entry covers what changed, why, what was decided, what was rejected, how it was verified, and what's still outstanding.
docs/narrative/<YYYY-MM-DD>-<topic>.md for the bigger arcs. Migrations, incidents, rewrites, source onboarding. Starting state, trigger, decisions, rejected alternatives, phases, verification, final state, what's unblocked.
CHANGES.md is the index. docs/narrative is the story.
Both are plain markdown. Both get committed. Both are designed to be grep-able by your tools and your future agent.
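The "newest entry on top" rule for CHANGES.md can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper-trail repo: the `ChangeEntry` class, its field names, and `prepend_entry` are hypothetical, chosen to mirror the six things an entry is supposed to cover.

```python
from dataclasses import dataclass
from datetime import date
from pathlib import Path


@dataclass
class ChangeEntry:
    """One CHANGES.md entry. Field names are illustrative assumptions."""
    day: date
    title: str
    what_and_why: str
    decided: str = ""
    rejected: str = ""
    verified: str = ""
    outstanding: str = ""

    def render(self) -> str:
        # Header line, then the narrative paragraph, then only the
        # labeled lines that were actually filled in.
        lines = [f"{self.day.isoformat()}: {self.title}", self.what_and_why]
        for label, value in (
            ("Decided", self.decided),
            ("Rejected", self.rejected),
            ("Verified", self.verified),
            ("Outstanding", self.outstanding),
        ):
            if value:
                lines.append(f"{label}: {value}")
        return "\n".join(lines) + "\n"


def prepend_entry(changes_path: Path, entry: ChangeEntry) -> None:
    """Write the new entry above existing content so the newest stays on top."""
    existing = changes_path.read_text(encoding="utf-8") if changes_path.exists() else ""
    separator = "\n" if existing else ""
    changes_path.write_text(entry.render() + separator + existing, encoding="utf-8")
```

Calling `prepend_entry(Path("CHANGES.md"), ChangeEntry(...))` as work happens keeps the file chronological without ever rewriting older entries.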
A real CHANGES.md entry
Here's the entry from a Music sync resurrection a few weeks ago (anonymized identifiers, real structure):
2026-05-16: Music sync row restored in iOS Settings view
After the backend consolidation on 2026-05-04, the iOS Settings view lost the row that exposed Music sync to users. The sync pipeline itself was intact in the backend; only the toggle had been removed during the cleanup.
Restored via 6 lines in App/Views/SettingsView.swift, adding the row back under "Data Sources." TestFlight build 47 ships the restored row. Verified end-to-end by pulling a fresh sync from the device and confirming the delivery UUID 8b4f2a9c-7d15-4e83-9bcd-12fa8e5c61d4 landed in the backend.
Decided: restore the row as-is rather than redesign the Settings view (the consolidation rationale doesn't apply to this row).
Rejected: moving Music to a dedicated "Media" section. Too much surface area to redesign for one source.
Outstanding: wire the new Qwen commit a3f2c8e91 for next week's audio path.
commit e74b2c1
One paragraph plus a six-line code block plus four metadata lines. Names the dormant pipeline, the build that shipped, the cross-repo dependency, the rejected alternative.
That's the index entry. The narrative doc tells the story.
The same event as a narrative doc
Title: "The Settings Row That Brought Music Back" at docs/narrative/2026-05-16-music-resurrection.md.
Sections:
- The Trigger: what made us notice Music sync was dark (a test query returned zero rows from a source that should have been daily)
- The Diff: what the original consolidation actually removed, with the line numbers
- What Almost Happened: the redesign-the-whole-view path I considered before realizing six lines of Swift was the answer
- Verification: the delivery UUID that proved the path was wired back end-to-end
- What's Unblocked: the audio path work that depended on Music being live
It reads like a documentary episode. Tradeoffs, false starts, operator decisions, verification numbers. Anyone (including a future me, including a future agent) can reconstruct the reasoning path from the doc alone.
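For the file itself, here is a hedged sketch of generating an empty narrative doc at the docs/narrative/<YYYY-MM-DD>-<topic>.md path. The section list comes from the description earlier in this post; the function names and the "TODO" placeholder are assumptions for illustration, and the skeleton shipped in the repo may look different.

```python
from datetime import date
from pathlib import Path

# Section names taken from the post's description of a narrative doc.
NARRATIVE_SECTIONS = [
    "Starting state", "Trigger", "Decisions", "Rejected alternatives",
    "Phases", "Verification", "Final state", "What's unblocked",
]


def narrative_path(repo_root: Path, day: date, topic: str) -> Path:
    """Build docs/narrative/<YYYY-MM-DD>-<topic>.md with a hyphenated topic."""
    slug = "-".join(topic.lower().split())
    return repo_root / "docs" / "narrative" / f"{day.isoformat()}-{slug}.md"


def narrative_skeleton(title: str) -> str:
    """Return a markdown skeleton with one placeholder per section."""
    parts = [f"# {title}", ""]
    for section in NARRATIVE_SECTIONS:
        parts += [f"## {section}", "", "TODO", ""]
    return "\n".join(parts)


def create_narrative(path: Path, title: str) -> bool:
    """Create the doc only if it is missing; return True when a file was written."""
    if path.exists():
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(narrative_skeleton(title), encoding="utf-8")
    return True
```

Refusing to overwrite an existing file is a deliberate choice in this sketch: a narrative doc is a record, so regenerating the skeleton should never clobber one.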
Day-one install
The whole thing is at github.com/niclydon/paper-trail. MIT-licensed, drop-in.
- Copy DOCUMENTARY_STYLE_DOCUMENTATION.md into your project's root (e.g. ~/projects/).
- In your top-level CLAUDE.md, add @DOCUMENTARY_STYLE_DOCUMENTATION.md.
- In each project's CLAUDE.md, paste the per-project boilerplate from templates/per-project-boilerplate.md.
- Create an empty CHANGES.md at each project root.
- For the first non-trivial migration or incident, create a narrative doc using the skeleton.
Claude will start appending CHANGES.md entries on its next session in that tree.
There's also a -LITE variant of the ruleset (~75% smaller, same discipline) for sub-agents or context-tight sessions.
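The "create an empty CHANGES.md at each project root" step could be scripted rather than done by hand. The sketch below assumes a project is any immediate subdirectory that already contains a CLAUDE.md; that convention and the function name are assumptions for illustration, not something the repo prescribes.

```python
from pathlib import Path


def ensure_changes_files(projects_root: Path) -> list[Path]:
    """Create an empty CHANGES.md next to every */CLAUDE.md that lacks one.

    Returns the list of files that were created. Existing CHANGES.md
    files are left untouched.
    """
    created = []
    for claude_md in sorted(projects_root.glob("*/CLAUDE.md")):
        changes = claude_md.parent / "CHANGES.md"
        if not changes.exists():
            changes.touch()
            created.append(changes)
    return created
```

Running `ensure_changes_files(Path.home() / "projects")` a second time returns an empty list, so it is safe to rerun after adding a new project.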
What it costs
The honest answer requires two measurements.
The first: when I check /status in Claude Code, my /narrative-docs-update slash command shows up at about 9% of my weekly Claude Pro Max plan usage. That's the cleanly attributable cost. Every time I deliberately invoke the skill to write or update a narrative doc, it adds to that bucket.
The second is harder to measure. CHANGES.md appends happen inline during regular sessions, not as a separate skill invocation. They blend into general usage and don't show up as a line item in /status. The only way to measure them is to look at the content itself.
So I ran the math. Across 158,000 Claude Code messages from 1,737 sessions over the last 30 days, I summed the character count of all assistant output that referenced CHANGES.md, docs/narrative/, or docs/migrations/ paths. The result: 1.16 million characters out of 9.2 million total. 12.6% of Claude Code's written output over 30 days went to documentation work.
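A measurement like this can be reproduced with a short script. The version below is a sketch under assumptions: it takes assistant messages as plain strings already extracted from wherever the transcripts live, and it counts the full length of any message that mentions one of the three paths. The function name and input shape are hypothetical.

```python
# Paths whose presence marks a message as documentation work.
DOC_MARKERS = ("CHANGES.md", "docs/narrative/", "docs/migrations/")


def doc_share(messages):
    """Return (doc_chars, total_chars, share) for an iterable of message strings.

    A message counts toward doc_chars in full if it mentions any marker.
    """
    doc_chars = 0
    total_chars = 0
    for text in messages:
        size = len(text)
        total_chars += size
        if any(marker in text for marker in DOC_MARKERS):
            doc_chars += size
    share = doc_chars / total_chars if total_chars else 0.0
    return doc_chars, total_chars, share


# Sanity check of the headline figure: 1.16M of 9.2M characters.
headline_percent = round(1_160_000 / 9_200_000 * 100, 1)
```

Because whole messages are counted, this style of measurement is an upper-ish estimate: a long message that mentions CHANGES.md once still contributes all of its characters.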
The two measurements converge. 9% is the floor, cleanly counted from a dedicated skill. 12.6% is the broader signal that catches inline doc work too. Call it 10-13% of Claude Code output.
The part that surprised me: the discipline isn't applied uniformly. Only 11.5% of my sessions involved documentation work at all. The other 88.5% never touched a CHANGES.md or narrative doc. They're quick queries, exploration, one-offs.
Where documentation work shows up is in the substantial sessions. The ones where I actually built or migrated or debugged something worth recording. Doc-meaningful sessions average about 50,000 characters of assistant output. No-docs sessions average about 850. Documentation effort scales with work effort, which is what you want.
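The per-session split can be sketched the same way. This assumes sessions are available as a mapping from a session id to that session's assistant messages; the input shape, the function name, and the result keys are all hypothetical.

```python
from statistics import mean

# Paths whose presence marks a session as having done documentation work.
DOC_MARKERS = ("CHANGES.md", "docs/narrative/", "docs/migrations/")


def session_split(sessions):
    """Split sessions into doc and no-doc groups and average their output size.

    sessions: dict mapping session id -> list of assistant message strings.
    """
    doc_sizes, other_sizes = [], []
    for texts in sessions.values():
        size = sum(len(text) for text in texts)
        touches_docs = any(marker in text for text in texts for marker in DOC_MARKERS)
        (doc_sizes if touches_docs else other_sizes).append(size)
    total = len(doc_sizes) + len(other_sizes)
    return {
        "doc_session_share": len(doc_sizes) / total if total else 0.0,
        "avg_doc_chars": mean(doc_sizes) if doc_sizes else 0.0,
        "avg_other_chars": mean(other_sizes) if other_sizes else 0.0,
    }
```

Comparing `avg_doc_chars` against `avg_other_chars` is what shows whether documentation effort tracks work effort or is spread evenly across trivial sessions.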
Three things make 10-13% the easiest spend in my Claude Code plan:
The output is durable. The other ~88% of Claude's output is ephemeral chat that disappears when the session ends. That ~12% is markdown files that persist, get committed, and become referenceable.
The agent doesn't remember what it built. Without these docs, the next session has no idea what was rejected, why, or with what verification. Reconstructing reasoning later costs more than recording it now.
A recent debug story made the case concrete. A few weeks ago an iOS pipeline went dark after a backend consolidation. The CHANGES.md entry from the original consolidation told me exactly which row had been removed from the Settings view and why. Without that record I'd have spent an hour trace-debugging. With it: six lines of Swift to restore.
The cost of the discipline is small. The cost of skipping it shows up when you need the record and it isn't there.
The payoff
Two files, one discipline, ten-to-thirteen percent of my Claude Code budget. In exchange: a searchable record of every non-trivial decision, the rejection rationale for the alternatives, and verification numbers that survive every session reset.
AI doesn't remove the need for documentation. It makes the reasoning path both more valuable (because the agent does more of the typing) and more fragile (because the agent forgets everything when the session ends).
If you're going to let an AI write most of your code, give it (and yourself) a paper trail.
Drop-in repo: github.com/niclydon/paper-trail. MIT-licensed.
