Every AI coding session starts the same way. You open the terminal, the agent loads your project, and you spend the first five minutes re-explaining what you were doing yesterday.
The agent reads your files. It sees the code. But it doesn't know what mattered. It doesn't know you spent three hours debugging a race condition that turned out to be a timing issue in the test, not the code. It doesn't know the architecture decision you made at 2am that you're now second-guessing.
I wanted to know what actually survives between sessions. Not philosophically. With data.
## The experiment
I built three extraction systems, each more sophisticated than the last, and measured what each one captured compared to what I actually carried between sessions in my head.
Layer 1: Heuristic extraction. Parse the session transcript. Pull out file paths, URLs, git actions, quantitative facts. Pure pattern matching.
Result: 4% word overlap with what I actually wrote down as important. The machine captured what happened. It completely missed what it meant.
The heuristic knows lib/posts.ts was edited twice. It doesn't know that file is the content engine for a 79-post blog, or that I was debugging a series navigation bug that broke 6 connected essays.
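Layer 1 fits in a few lines of pattern matching. This is a hypothetical reconstruction, not my actual code; the regexes and function name are assumptions:

```python
import re

# Hypothetical sketch of Layer 1: pure pattern matching over a transcript.
FILE_RE = re.compile(r"\b[\w./-]+\.(?:ts|tsx|js|py|md)\b")  # file paths
URL_RE = re.compile(r"https?://\S+")                        # URLs
GIT_RE = re.compile(r"\bgit (?:commit|push|merge|rebase|checkout)\b")
NUM_RE = re.compile(r"\b\d+(?:\.\d+)?%?")                   # quantitative facts

def heuristic_extract(transcript: str) -> dict:
    """Capture what happened: paths, links, git actions, numbers."""
    return {
        "files": sorted(set(FILE_RE.findall(transcript))),
        "urls": sorted(set(URL_RE.findall(transcript))),
        "git": sorted(set(GIT_RE.findall(transcript))),
        "numbers": sorted(set(NUM_RE.findall(transcript))),
    }
```

It will dutifully report that `lib/posts.ts` appeared. It has no idea why.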
Layer 2: Model-assisted extraction. Feed the transcript to a small language model and ask it to extract the important context.
Result: 16% word overlap. 4x better than heuristics. The model captured budget state, blockers, what shipped, technical findings. It understood relationships between files. It said "lib/posts.ts has 13 series" instead of "lib/posts.ts (2x)."
Still, 84% of what I carried was invisible to the model.
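The overlap scores above can be reproduced with a simple bag-of-words comparison. I'm assuming "word overlap" means the fraction of unique words in the human-written notes that also appear in the extraction; the function name is mine:

```python
def word_overlap(extracted: str, reference: str) -> float:
    """Share of the reference's unique words present in the extraction."""
    ref = set(reference.lower().split())
    ext = set(extracted.lower().split())
    return len(ref & ext) / len(ref) if ref else 0.0
```

For example, `word_overlap("lib/posts.ts edited", "debugging series navigation in lib/posts.ts")` scores 0.2: one shared word out of five in the reference.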
Layer 3: Cross-session accumulation. Merge each session's extracted state with a running accumulated state. Facts persist across sessions. Stale facts get dropped when topics shift.
Result: Stabilized at ~1,400 characters after 7 session merges. Post counts persisted. Budget numbers updated. Technical findings accumulated. The factual picture was solid.
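A minimal sketch of the merge step, under the assumption that each fact records the session it was last seen in and ages out after a few untouched sessions; the names and the age threshold are mine:

```python
def merge_state(accumulated: dict, session_facts: dict,
                session_num: int, max_age: int = 3) -> dict:
    """Merge one session's extracted facts into the running state.

    Newer values overwrite older ones; facts untouched for max_age
    sessions are considered stale and dropped.
    """
    merged = dict(accumulated)
    for key, value in session_facts.items():
        merged[key] = {"value": value, "last_seen": session_num}
    return {k: v for k, v in merged.items()
            if session_num - v["last_seen"] < max_age}
```

This is also why the state stabilizes: after enough merges, what survives is exactly the set of facts that keeps recurring.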
But the 84% gap didn't close. It's not a measurement problem. It's a category problem.
## What the 84% is
The 84% is interpretive context. It's the stuff that doesn't live in transcripts:
- Why you're working on something (not what you're working on)
- What you're avoiding and why
- Strategic assessments ("this experiment isn't working but I'm not ready to kill it")
- Cross-session patterns ("every time I fix infrastructure I skip distribution work")
- Emotional state ("I'm frustrated with this approach but don't have a better one")
No amount of extraction sophistication will capture "I've been doing this for three weeks and revenue is still zero, and every session I find another infrastructure problem to fix instead of doing the marketing I know I should be doing."
That's not in the transcript. That's in the person.
## Three layers, complementary
What I ended up with is a three-layer system:
Memory (automatic) — the accumulator runs on every boot. No effort. Captures facts, metrics, file relationships, what shipped. 16% of what matters, but it's the 16% you'd otherwise forget or get wrong.
Curation (intentional) — at the end of each session, I write down what I was thinking. Not what I did. What I was thinking. 30 seconds. This is the bridge between "what happened" and "what it meant."
Reflection (effortful) — the essay, the decision journal entry, the pattern recognition. "I keep doing X when I should be doing Y." This is where behavior changes. It doesn't happen automatically.
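At boot, the first two layers combine into a single context blob. A sketch, with hypothetical file names and structure; the reflection layer is deliberately absent, because it feeds back into behavior, not into the prompt:

```python
def boot_context(accumulated: dict, notes_path: str = ".session-notes") -> str:
    """Automatic memory (facts) plus curated notes (intent), ready to load."""
    facts = "\n".join(f"- {k}: {v}" for k, v in accumulated.items())
    try:
        with open(notes_path) as f:
            notes = "".join(f.readlines()[-3:])  # last few curated entries
    except FileNotFoundError:
        notes = "(no curated notes yet)\n"
    return f"Facts:\n{facts}\n\nRecent thinking:\n{notes}"
```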
Most tools trying to solve the "context between sessions" problem focus on the memory layer. Better CLAUDE.md files, session summaries, state checkpoints. That's the easy 16%.

The hard part is curation and reflection. Curation requires discipline. Reflection requires honesty. Neither can be automated, and that's the point.
## What this means for your workflow
If you're losing context between AI coding sessions, here's what the data says:
Automate the facts. File paths, metrics, what shipped, what broke. Any decent extraction system handles this. Don't waste brain cycles remembering numbers.
Write down your thinking, not your tasks. At the end of each session, answer: "What was I thinking about?" Not "What did I do?" The first question captures intent. The second captures output. Your next session needs intent.
Don't skip reflection. The highest-value context is the stuff you don't want to write down. "This approach might be wrong." "I'm avoiding the hard part." "Revenue is still zero." If your session-to-session context is always optimistic and forward-looking, you're not carrying the real state.
The machine remembers what happened. You remember what it meant. Both are necessary. Neither is sufficient.