yureki_lab

Posted on Jul 5

How I Gave Claude Code a Long-Term Memory: 5 Lessons from State Persistence

#ai #claudecode #productivity #softwareengineering

TL;DR

I run a fully autonomous implementation system built on Claude Code that works on my projects around the clock. The single biggest upgrade I ever made to it wasn't a smarter prompt or a better model — it was giving it a file-based long-term memory. This post covers the state persistence design that turned my agent from a goldfish into a coworker, and the 5 lessons I learned getting there. 💡

The Problem

Every Claude Code session starts with total amnesia.

That's fine when you are the memory. You sit there, you remember that yesterday you decided to use SQLite instead of Postgres, you remember that the flaky test is a known issue, and you steer accordingly.

It falls apart the moment you take the human out of the loop. My system launches Claude Code sessions on a schedule, with nobody watching. Early on, the transcript of any given night looked like this:

Night 1: Agent investigates a failing build, figures out the root cause is a version mismatch, patches it. Great.
Night 2: New session, zero context. Agent sees a related warning, re-investigates the same dependency tree from scratch, and "fixes" it a second time — differently. Now I have two conflicting fixes.
Night 3: Agent reverts night 2's change because it looks wrong. Which it is. But so was doing the work three times.

The model wasn't dumb. It was stateless. Every session re-derived the world from the repo, and anything that wasn't in the repo — decisions, dead ends, "we tried this and it broke" — evaporated at session end.

The constraint that made this interesting: I wanted persistence without adding infrastructure. No vector database, no memory service, no external API. The whole system runs on a single Mac mini (macOS 15, Claude Code, launchd for scheduling), and I wanted memory to survive anything short of disk failure.

How I Solved It

The answer turned out to be embarrassingly boring: Markdown files, read at session start, written at decision time.

The three-file core

My persistence layer is built around three files per project, each with exactly one job:

project/
├── CLAUDE.md        # HOW to work: rules, conventions, scope (rarely changes)
├── state/
│   ├── memo.md      # WHERE we are: current status, next action (changes every session)
│   └── tasks.md     # WHAT remains: task list with a status per item
└── decisions.md     # WHY we chose things: append-only decision log

The separation matters more than the format:

The project spec file (CLAUDE.md) holds durable rules — coding conventions, what the agent must never touch, how to report results. Claude Code loads it automatically at session start, which makes it the natural home for anything that should apply to every session forever.
The state memo is the handoff note. It answers one question: if a new session starts right now with zero context, what does it need to know to continue? Current phase, last completed step, next step, any active weirdness.
The task file is a checklist with statuses (todo / doing / done / blocked). Sessions pick up the first non-done item instead of inventing work.
The decision log is append-only. Every entry gets an ID (D-001, D-002, …), a date, the decision, and — critically — the reason. You never edit old entries; you supersede them with new ones.

The write rules

Files alone do nothing. The behavior comes from two rules baked into the agent's instructions:

Rule 1: Read before work. Every session starts by reading the spec file, the state memo, and the task file, in that order, before touching any code. Non-negotiable. A session that skips this is a session that repeats last week's mistakes.

Rule 2: Write at decision time, not session end. My first version said "update the state memo before you finish." Terrible idea. Sessions die — context fills up, the process gets killed, the machine reboots. Anything buffered for "the end" is exactly what gets lost. Now the rule is: the moment something is decided or discovered, it gets written. The state memo is updated mid-session, every time reality changes.

flowchart LR
    A[Session start] --> B[Read spec + memo + tasks]
    B --> C[Do work]
    C --> D{Decision or discovery?}
    D -- yes --> E[Write memo / decision log immediately]
    E --> C
    D -- no --> C
    C --> F[Session ends - even abruptly]
    F --> G[Next session starts with full context]

The conflict rule

Once you have multiple files, they will contradict each other. An old strategy note says "we're doing X", the state memo says "we pivoted to Y". An agent without a tie-breaker will pick whichever it read last — or worse, try to satisfy both.

So there's an explicit priority order written into the spec file:

state memo  >  strategy doc  >  project spec  >  decision log  >  everything else

The state memo — the most recently updated, most operational file — always wins. The decision log is deliberately near the bottom: it's history, and history explains the present but doesn't override it. When the agent changes course, it overwrites the current-state file and appends a new decision entry, so the old reasoning survives without ever being mistaken for the current plan.

What it looks like in practice

A real (lightly genericized) state memo from my system:

# State memo — updated 2026-07-04 23:10

## Current phase
Trial run of the nightly build-repair loop (day 12 of 14).

## Last completed
Migrated scheduled jobs to the new machine; all paths rewritten.
Verified one full unattended cycle end-to-end.

## Next action
Watch tonight's run. If the headless timeout recurs, apply D-029
(switch to the local integration instead of the cloud connector).

## Active issues
- Cloud connector times out under headless runs (~30% of nights).
  Hypothesis and fix candidates recorded in D-028 / D-029.

A fresh session reading this needs zero archaeology. It knows the phase, the last verified state, the exact next action, and the known failure mode with a pointer to the reasoning. That last line is the goldfish-to-coworker moment: the agent doesn't re-diagnose the timeout from scratch — it recognizes it.

Lessons Learned

1. Session memory is the bottleneck, not model intelligence. I spent weeks tuning prompts to make sessions "smarter" when the actual failure was that each session was brilliant and blind. Claude Code (I'm on the 2026 releases) is easily capable of multi-week projects — if and only if context survives between sessions. Fix memory before you touch the prompt.

2. Write-on-decision beats write-on-exit, every time. Sessions end abruptly more often than you think: context limits, crashes, machine restarts. Across six months of 24/7 operation, mid-session persistence has saved me from replaying lost work dozens of times. If your agent buffers its learnings for a final summary step, you have a memory system that fails exactly when you need it most.

3. Separate "how", "where", and "why" into different files. My first attempt was one giant NOTES.md. It became a swamp — 700 lines where stale plans sat next to current status, and the agent couldn't tell which was which. Splitting by rate of change (rules change rarely, status changes hourly, decisions only append) fixed it. Each file gets one job and one update pattern.

4. An explicit priority order is not optional. Contradictions between files aren't an edge case; they're the steady state of any project that pivots. The single line "state memo always wins" has resolved more agent confusion than anything else in my setup. Without it, your agent averages your old plan and your new plan into something that is neither. ⚠️

5. Persist decisions and dead ends — not transcripts. My instinct was to keep everything. Wrong. Full session logs are write-only garbage: no future session will ever read 40k tokens of history. What's worth its weight in gold is tiny: what we decided, why, and what we tried that didn't work. "The obvious refactor breaks the release script — see D-014" is one line and saves an entire wasted night. Compression is the feature, not a compromise.

What's Next

The current design is deliberately low-tech, and I'm keeping it that way — but two upgrades are on my list:

Memory review passes: a periodic session whose only job is pruning the state files — merging duplicates, deleting stale entries, flagging contradictions before they bite.
Cross-project memory: my agents work across several codebases, and lessons learned in one ("this library's v3 API silently changed behavior") currently don't transfer. A shared, read-mostly lessons file is the next experiment.

I'll write both up once they've survived a few weeks of unattended runs — I only publish patterns after they've been through real nights, not demos.

Wrap-up

If your Claude Code sessions feel like hiring a genius with amnesia every morning, the fix probably isn't a better prompt. It's three Markdown files and two write rules.

If this was useful: follow me here on Dev.to — I'm documenting the whole journey of running a fully autonomous implementation system, war stories included. And if you've built agent memory a different way (vector stores? SQLite? something weirder?), I genuinely want to hear about it in the comments. 🚀

DEV Community