Quazi
Managing Persistent State Across Claude Sessions: Lessons from 1000+ Commits

I'm an infrastructure engineer. For the past four months, I've been using Claude and Gemini as daily co-pilots for heavy document work, managing eight parallel projects with their own terminology, quality standards, and work-in-progress state.

It broke almost immediately.

Not the LLMs themselves. They're remarkably capable. What broke was the continuity. Every new conversation starts from zero. The terminology conventions we agreed on yesterday? Gone. The three sections I revised on Wednesday night? The model has no idea they exist. The file naming scheme we settled on after two rounds of discussion? I had to explain it again. And again.

When you have one project, you can work around this by pasting context at the top of each session. With eight active projects, each having independent glossaries, writing standards, and progress states, manual context management collapses. The overhead grows non-linearly: it's not 8× the work, it's more like 8² once you factor in cross-project dependencies and the cost of getting context wrong.

So I built a file-system-based state layer that sits between me and the LLM. I call it AnchorWorkspace. This post covers what I tried first, two patterns that made the biggest difference, and what doesn't work.


The state layer sits between you and the LLM, managing what each session needs to know.

What I tried first

Before building anything custom, I spent weeks trying to make existing tools work.

Claude Projects with system prompts were my first attempt. I wrote detailed instructions (glossaries, file conventions, workflow rules) and pinned them as project instructions. This works for static context, but projects evolve. Halfway through a document revision cycle, you've agreed on new conventions, made decisions that change the workflow, accumulated progress that the system prompt doesn't reflect. I found myself constantly editing the system prompt to keep up, and it kept growing until it consumed a significant chunk of my context window just to load.

Manually pasting context at the start of each session was the obvious fallback. "Here's where we left off: sections 1-3 are done, section 4 needs the terminology from our glossary, and please use the commit message format we agreed on." This is error-prone in all the ways you'd expect. I'd forget to mention a convention, paste an outdated version of the progress notes, or burn 2,000 tokens on context that the model might not even need for the current task.

RAG and knowledge-base retrieval can surface relevant documents, but retrieval answers the question "what documents are related to my query?", not "what is the current state of my work?" When I resume a half-finished document revision after three days, I don't just need related files. I need to know which sections are done, which reviewer comments are still unresolved, and which terminology decisions were made since I last touched it. That's a progress snapshot, not a search result.

Agent frameworks like LangChain and AutoGen are designed for decomposing a task into steps and executing them. That's a different problem. My work isn't a task that finishes. It's an ongoing, multi-month orchestration across projects that each have their own lifecycle. No framework I found offered multi-project state management, cross-session task recovery, or output quality enforcement.

What I actually needed was something that manages workflow state, not just document retrieval.

Two patterns that made the biggest difference

Over four months of iteration (and a lot of trial and error), I identified several recurring design problems and their solutions. Two of them had the most immediate, measurable impact.

Pattern 1: WAL for AI Sessions

Write-Ahead Logging is a classic database mechanism. You write the intent to a log before executing the operation, so you can recover from crashes by replaying the log. I adapted the same idea for LLM session management.

The problem: On Wednesday night I'm revising a document. I've rewritten three paragraphs in the introduction and there are two more items on the to-do list. I close the session. Thursday morning I open a new one. Where was I?

Without any state mechanism, the answer is "start over." Re-read the document, figure out what changed, reconstruct the to-do list from memory. This cost me 5-10 minutes per session just in ramp-up time, and sometimes I'd miss things.

The solution: Every project has a simple Markdown file that acts as a WAL. When I commit work, a script automatically updates this file with the current task, the last operation performed, and the next step. It looks like this:

```markdown
**Status: In Progress**

| Field       | Value                                        |
|-------------|----------------------------------------------|
| Task        | Quarterly report revision, formatting pass   |
| Last action | Rewrote sections 2.1-2.3, updated glossary   |
| Last commit | a3f7e2b                                      |
| Next step   | Verify all cross-references in section 3     |
| Note        | Style guide updated, see docs/standards/     |
```

When I start a new session, the first thing the LLM does is read this file. Within about 30 seconds, it knows where we left off: which task, what was last done, what comes next. No manual briefing required.
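The post doesn't show the script that maintains this file, but the write side can be sketched in a few lines. This is a hypothetical `update_wal` helper (name and signature are mine); the field names mirror the table above:

```python
from pathlib import Path

def update_wal(path, task, last_action, last_commit, next_step, note=""):
    """Overwrite the project's WAL file with the current recovery state.

    Hypothetical helper, not the post's actual script. The format is
    deliberately terse and tabular for reliable machine parsing.
    """
    rows = [
        ("Task", task),
        ("Last action", last_action),
        ("Last commit", last_commit),
        ("Next step", next_step),
        ("Note", note),
    ]
    lines = ["**Status: In Progress**", "", "| Field | Value |", "|---|---|"]
    lines += [f"| {k} | {v} |" for k, v in rows]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")

update_wal(
    "WAL.md",
    task="Quarterly report revision, formatting pass",
    last_action="Rewrote sections 2.1-2.3, updated glossary",
    last_commit="a3f7e2b",
    next_step="Verify all cross-references in section 3",
)
```

Because the file is overwritten on every commit, it always reflects the latest state rather than an append-only history; git itself preserves the older entries.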

The key insight: This file isn't a memo for me to read. It's a recovery instruction for the LLM, structured for machine consumption rather than human narrative. The format is deliberately terse and tabular because that's what language models parse most reliably.

Pattern 2: Behavioral Demotion

Can you get an LLM to reliably remember to run a linter before every commit, just by telling it to in the prompt? In my experience, no. Prompts are suggestions. The model follows them most of the time, but "most of the time" isn't good enough when the failure mode is a broken commit.

What kept going wrong: I had Claude managing git commits across multiple repositories. The prompt said: "Always check status before committing. Always include the WAL update. Always use the specified message format." For the first few sessions, compliance was near-perfect. But by the 15th session, or the 30th, or when the context window was getting full, steps would get skipped. A commit would go through without the status check. The WAL wouldn't get updated. The error rate was roughly 15%, and the errors were silent. I wouldn't notice until later.

What fixed it: I stopped asking the LLM to remember the steps and instead wrote a Python script (safe_commit.py) that wraps the entire commit workflow into a single atomic operation. The LLM calls one function. Inside that function, the code enforces every step:

```shell
# The LLM calls this. One command, no way to skip steps.
python safe_commit.py --repo project_a --message "fix: update cross-refs" --next "review section 4"

# Internally, the script enforces:
# [1/4] Update WAL file with current step + next step
# [2/4] git add . → git commit (with status verification)
# [3/4] Sync cross-project bootstrap state
# [4/4] Commit infrastructure metadata
```

The LLM can't skip the WAL update because it doesn't run git commit directly. It only has access to safe_commit.py, which handles everything atomically. Error rate dropped from roughly 15% to 0%. Not because the LLM got smarter, but because the path to making mistakes was removed by code.
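The post doesn't include the wrapper's source, but the core idea fits in one function. A minimal sketch (hypothetical implementation; `run` is injectable so the git steps can be exercised without a real repository):

```python
import subprocess

def safe_commit(repo, message, next_step, run=subprocess.run):
    """Sketch of an atomic commit wrapper in the spirit of safe_commit.py.

    Every step runs in a fixed order inside code, not inside a prompt,
    so none of them can be skipped.
    """
    steps = []
    # [1/4] Write-ahead: record the intent and next step BEFORE touching
    # git, so a crash mid-commit still leaves a recoverable state.
    with open(f"{repo}/WAL.md", "w", encoding="utf-8") as f:
        f.write(f"| Task | {message} |\n| Next step | {next_step} |\n")
    steps.append("wal")
    # [2/4] Status check, stage, commit: check=True aborts on any failure.
    run(["git", "-C", repo, "status", "--porcelain"], check=True)
    run(["git", "-C", repo, "add", "."], check=True)
    run(["git", "-C", repo, "commit", "-m", message], check=True)
    steps.append("commit")
    # [3/4] and [4/4] (cross-project sync, metadata commit) omitted here.
    return steps
```

The ordering is the point: the WAL update happens first and unconditionally, so even a failed commit leaves a readable record of what was attempted.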

The takeaway: If a behavior is important enough that you can't tolerate failure, it shouldn't live in the prompt. Demote it from prompt-level guidance to code-level enforcement. I think of this as a reliability spectrum. At one end, you have suggestions in natural language (unreliable but flexible); at the other, you have hard-coded logic (reliable but rigid). The art is knowing where each behavior belongs on that spectrum.


The reliability spectrum. Most LLM workflows leave too many critical behaviors at the prompt end.

The rest of the system (briefly)

The two patterns above were the highest-impact, but the full system includes three more:

Filesystem as Live Index. Instead of maintaining a static registry of available tools and templates, the LLM scans the directory structure at session start to discover what's available. When I add a new template or tool, it's automatically found next session.
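The scan itself is trivial, which is the appeal. A sketch under an assumed layout (`tools/` and `templates/` directories are my illustration, not the post's actual structure):

```python
from pathlib import Path

def discover_workspace(root):
    """Build a live index by scanning the directory tree at session start.

    No static registry to maintain: whatever is on disk is what exists.
    The tools/ and templates/ layout is an assumed example.
    """
    root = Path(root)
    return {
        "tools": sorted(p.name for p in root.glob("tools/*.py")),
        "templates": sorted(p.name for p in root.glob("templates/*.md")),
    }
```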

Token Budget Management. The system treats the context window as a finite resource with active monitoring. It tracks how many commits have been made in a session and warns when the context is getting full, prompting a natural breakpoint rather than degraded output.
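The check can be as simple as a threshold function. A sketch with illustrative numbers (the window size, warn ratio, and commit cap below are assumptions, not the post's actual values):

```python
def budget_check(commits_this_session, est_context_tokens,
                 window=200_000, warn_ratio=0.8, max_commits=12):
    """Flag a natural breakpoint before output quality degrades.

    All thresholds are illustrative placeholders.
    """
    if est_context_tokens >= warn_ratio * window:
        return "warn: context nearly full, commit and start a fresh session"
    if commits_this_session >= max_commits:
        return "warn: long session, consider a breakpoint"
    return "ok"
```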

Context Capsule. A minimal state object that enables handoff between Claude and Gemini. When I want a second opinion from a different model, a structured summary of the current work state gets passed across, rather than the entire conversation history.
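As a data structure, the capsule might look like the sketch below. The field names and JSON serialization are my assumptions; the post doesn't specify the schema:

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class ContextCapsule:
    """Minimal state object for handing work between models.

    Field names are illustrative; the actual schema isn't shown in the post.
    """
    project: str
    task: str
    done: list = field(default_factory=list)
    next_step: str = ""
    glossary_ref: str = ""

    def to_prompt(self) -> str:
        # Compact JSON: the receiving model gets state, not chat history.
        return json.dumps(asdict(self), ensure_ascii=False)

capsule = ContextCapsule(
    project="quarterly-report",
    task="formatting pass",
    done=["sections 2.1-2.3"],
    next_step="verify cross-references in section 3",
)
```

The point of the structure is symmetry: either model can produce a capsule at the end of its turn and consume one at the start, so the handoff costs a few hundred tokens instead of a full transcript.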

I won't go deeper on these here. Each one deserves its own write-up if there's interest.

Honest results and limitations

Here's what I can measure from the system's own git history and operational logs:

  • 1,000+ git commits over 4 months of daily use
  • 8 projects managed concurrently (documents, infrastructure configs, creative writing)
  • Manual commit errors: ~15% → 0% (via Behavioral Demotion)
  • Infrastructure token overhead: 15-20% → 5-8% of context window (via Token Budget Management)
  • Session recovery time: 5-10 minutes → ~30 seconds (via WAL)

Now for the part that matters more: the limitations.

This is an N=1 system. I'm the only user. Every design decision is tuned to my specific workflow, my file organization habits, my project structure. I have no evidence that any of this generalizes to other people's workflows without significant adaptation.

It's coupled to the local filesystem and git. The entire architecture assumes you're working with files on disk, versioned with git. If your workflow is cloud-native or doesn't involve version control, this approach doesn't directly apply.

The LLM still has to cooperate. The startup protocol ("read these settings files, then read the WAL, then proceed") depends on the model correctly following instructions. Most of the time it works. Sometimes the model skips a step or misinterprets a WAL entry, especially when the context window is crowded. The system is more reliable than raw prompting, but it's not bulletproof.

Maintenance is ongoing. Configuration rules, tool scripts, and directory conventions all need upkeep. This is not a set-and-forget solution. It's closer to maintaining a small piece of infrastructure, which is exactly what it is.

This is engineering, not magic. It works because I invested months tuning it to my specific needs.

What's next

I've extracted the two core components from this post into an open-source Python library called anchor-core. It includes the WAL module and the atomic commit wrapper as standalone tools you can pip install and use in your own projects. The rest of the system (settings inheritance, token budget, context capsule) is still tightly coupled to my workflow and not ready for extraction yet.

If you're also juggling multi-session work with LLMs and have built your own solutions for state continuity, I'd like to hear how you're handling it. The problem of stateless sessions versus stateful work isn't going away, and I'm sure there are approaches I haven't considered.

The stateless-session model isn't a bug. It's a design choice with good reasons: privacy, simplicity, predictability. But for sustained knowledge work, we need infrastructure patterns on top of it. AnchorWorkspace is what I've got so far.


anchor-core

Chinese version

Persistent state management for LLM sessions. Built from the patterns described in Managing Persistent State Across Claude Sessions.

The problem

LLM sessions are stateless. Your work isn't. Every new conversation starts from zero, but your projects have terminology, progress, and conventions that accumulate over weeks. When you manage multiple projects with LLMs, the cost of re-establishing context becomes the bottleneck.

anchor-core provides two tools to fix this:

  1. WAL (Write-Ahead Log) for AI sessions — a task recovery file that lets new sessions pick up exactly where the last one left off, in about 30 seconds.
  2. Atomic commit wrapper — a single command that bundles git operations with WAL updates, so the LLM can't skip steps or leave state inconsistent.

Install

```shell
pip install anchor-core
```

Or install from source:

```shell
git clone https://github.com/zonzideka/anchor-core.git
cd anchor-core
pip install -e .
```

Quick start

WAL: cross-session task recovery

```python
from anchor_core import WAL
```



How do you handle context continuity across LLM sessions? Drop a comment, I'd like to hear different approaches.

Top comments (0)