DEV Community

Yuval

The Architecture of Persistent Memory for Claude Code

Claude Code is powerful inside a session. But between sessions, it has amnesia. Every new conversation starts from zero. On a complex project, this means spending the first few minutes re-explaining your architecture, your decisions, your conventions, before you can do any real work.

I built an open-source MCP server called memory-mcp that fixes this. Here's how the architecture works.

The constraint that shaped everything

Claude Code already reads a file called CLAUDE.md on every session start. This is baked into how Claude Code works. Whatever is in that file becomes part of Claude's initial context.

Claude Code also has a hook system. You can register shell commands that fire after every response (Stop), before context compaction (PreCompact), and at session end (SessionEnd).

These two features, combined, are everything you need for persistent memory. The hooks capture knowledge. CLAUDE.md delivers it. No custom protocols, no external databases, no changes to how Claude Code works.
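Wiring this up means registering the extractor under each hook event in Claude Code's settings file. A sketch of what that registration might look like (the schema follows Claude Code's hooks settings format as I understand it, and the extractor path is illustrative — check the hooks docs for the exact shape):

```json
{
  "hooks": {
    "Stop": [
      { "hooks": [{ "type": "command", "command": "node extractor.js" }] }
    ],
    "PreCompact": [
      { "hooks": [{ "type": "command", "command": "node extractor.js" }] }
    ],
    "SessionEnd": [
      { "hooks": [{ "type": "command", "command": "node extractor.js" }] }
    ]
  }
}
```

The same command fires on all three events, so capture happens whether a session ends cleanly, compacts, or simply keeps going.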

Two-tier memory

Not all memory is equal. Some context is needed every session. Some is only needed occasionally. Putting everything into CLAUDE.md wastes the context window. Putting everything into a database means Claude has to actively search for basics.

Two-tier memory architecture

Tier 1: CLAUDE.md (~150 lines)
A compact briefing document, auto-generated and auto-updated. Contains the most important project knowledge ranked by confidence and access frequency. Claude reads this automatically on startup with zero prompting.

Tier 2: .memory/state.json (unlimited)
The full memory store. Every fact, decision, and observation ever captured. Accessible mid-conversation through MCP tools: keyword search, tag-based queries, and natural language questions synthesized by Haiku.

80% of sessions need only Tier 1. The other 20% can pull from Tier 2 on demand.

The capture pipeline

Capture pipeline

When a hook fires, the extractor reads the conversation transcript from where it last left off (tracked by a cursor file). If the new content exceeds 6,000 characters, it is split into chunks small enough for Haiku to process reliably.

Each chunk is sent to Haiku with a structured prompt that includes existing memories. This is critical. Without existing context, Haiku would re-extract the same facts every time. With it, Haiku focuses only on genuinely new information.

Haiku returns JSON. Each memory has a type:

Memory types and lifespans

Types matter because they have different lifespans, which feeds into the decay system. Architecture, decisions, patterns, and gotchas are permanent. Progress fades after 7 days. Context fades after 30.
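A returned entry might look roughly like this (the type names, confidence range, and lifespans are from the article; the exact field names are illustrative):

```json
{
  "type": "decision",
  "content": "Chose SQLite over Postgres to keep the store local-first",
  "confidence": 0.9,
  "tags": ["database", "architecture"]
}
```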

Deduplication: the hard problem

A naive memory system drowns in duplicates within days. The same architectural fact gets mentioned in five conversations. A decision gets re-explained with slightly different wording each time.

Deduplication pipeline

Three mechanisms prevent this:

Write-time dedup: Every incoming memory is tokenized and compared against existing memories of the same type using Jaccard similarity. If token overlap exceeds 60%, the new memory supersedes the old one rather than creating a duplicate.
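Jaccard similarity over token sets takes only a few lines. A sketch — the 60% threshold is the article's, while the naive word tokenizer is my assumption:

```javascript
// Tokenize to lowercase alphanumeric words.
function tokens(text) {
  return new Set(text.toLowerCase().match(/[a-z0-9]+/g) || []);
}

// Jaccard similarity: |intersection| / |union| of the token sets.
function jaccard(a, b) {
  const ta = tokens(a);
  const tb = tokens(b);
  let shared = 0;
  for (const t of ta) if (tb.has(t)) shared++;
  const union = ta.size + tb.size - shared;
  return union === 0 ? 0 : shared / union;
}

// A new memory supersedes an existing one of the same type
// when token overlap exceeds the 60% threshold.
function isDuplicate(newText, oldText) {
  return jaccard(newText, oldText) > 0.6;
}
```

Token-set overlap is deliberately cheap: it catches the "same fact, slightly different wording" case without embeddings or any external service.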

LLM consolidation: Every 10 extractions (or when active memories exceed 80), Haiku reviews all memories grouped by type. It identifies overlapping entries to merge, outdated entries to drop, and contradictions to resolve. This is garbage collection for knowledge.

Confidence decay: Each memory has a confidence score (0 to 1). Progress memories decay toward zero over 7 days. Context decays over 30 days. Architecture, decisions, patterns, and gotchas never decay.

Memories below 0.3 confidence are excluded from CLAUDE.md but remain in the full store, searchable via MCP tools. Nothing is permanently lost, but the active surface area stays clean.
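One way to model this decay is a linear fade toward zero over each type's lifespan. The lifespans and the 0.3 cutoff are from the article; the linear curve itself is an assumption:

```javascript
// Days before a memory type fades to zero; permanent types
// (architecture, decisions, patterns, gotchas) are absent here.
const LIFESPAN_DAYS = { progress: 7, context: 30 };

const DAY_MS = 86_400_000;

function decayedConfidence(memory, nowMs = Date.now()) {
  const lifespan = LIFESPAN_DAYS[memory.type];
  if (!lifespan) return memory.confidence; // permanent: never decays
  const ageDays = (nowMs - memory.createdAt) / DAY_MS;
  const factor = Math.max(0, 1 - ageDays / lifespan);
  return memory.confidence * factor;
}

// Below 0.3 a memory drops out of CLAUDE.md but stays searchable.
function isActive(memory, nowMs = Date.now()) {
  return decayedConfidence(memory, nowMs) >= 0.3;
}
```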

The CLAUDE.md budget system

CLAUDE.md budget allocation

An unbounded CLAUDE.md defeats the purpose. If it grows to 500 lines, you're wasting context window on low-value information.

The generator allocates a fixed line budget per section: 25 lines for architecture, 25 for decisions, 25 for patterns, 20 for gotchas, 30 for progress, 15 for context. Within each section, memories are ranked by confidence * accessCount. Frequently recalled, high-confidence memories surface first.

If a section has unused budget, the surplus redistributes to sections that need more space. If a section overflows, it gets truncated with a note pointing to the MCP search tools.
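The core ranking-and-truncation logic is easy to sketch. The per-section budgets and the confidence * accessCount score are the article's; the rendering and overflow note are illustrative, and surplus redistribution is omitted for brevity:

```javascript
// Per-section line budgets for the generated CLAUDE.md.
const BUDGETS = {
  architecture: 25, decisions: 25, patterns: 25,
  gotchas: 20, progress: 30, context: 15,
};

// Rank a section's memories by confidence * accessCount, keep as
// many as the budget allows, and point overflow at the MCP tools.
function renderSection(type, memories) {
  const budget = BUDGETS[type];
  const ranked = [...memories].sort(
    (a, b) => b.confidence * b.accessCount - a.confidence * a.accessCount
  );
  const lines = ranked.slice(0, budget).map((m) => `- ${m.content}`);
  if (ranked.length > budget) {
    lines.push(`- (${ranked.length - budget} more: use memory_search)`);
  }
  return lines;
}
```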

The output reads like a briefing document written by someone who deeply understands your project and knows exactly what you need to know right now.

MCP tools for deep recall


Sometimes Tier 1 isn't enough. Claude might need to remember a specific API design decision from three weeks ago, or the rationale behind a database schema choice. That's where the MCP tools come in:

  • memory_search: keyword search across all memories, ranked by relevance
  • memory_related: retrieve memories by tag for exploring a topic in depth
  • memory_ask: ask a natural language question, get a synthesized answer from the top 30 matching memories via Haiku

Claude uses these tools naturally during conversation when it needs more context than CLAUDE.md provides.
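On the wire, each of these is an ordinary MCP `tools/call` request. Roughly (the query argument shown is illustrative):

```json
{
  "jsonrpc": "2.0",
  "id": 7,
  "method": "tools/call",
  "params": {
    "name": "memory_search",
    "arguments": { "query": "database schema decision" }
  }
}
```

Because these are standard MCP tools, Claude decides on its own when to call them — no special prompting is required.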

What it costs

Haiku is the only external dependency. Each extraction costs ~$0.001. Consolidation costs ~$0.002 every 10 extractions. A full day of active development runs $0.05-0.10. There's no vector database, no embedding API, no infrastructure to maintain.

What it doesn't do

This is a focused tool. It doesn't orchestrate multiple agents. It doesn't do vector search. It doesn't coordinate swarms or run consensus algorithms. If you need multi-agent orchestration, look at something like claude-flow.

memory-mcp does one thing: it makes sure Claude never forgets your project. Silent capture, smart storage, automatic recovery. The simplest architecture that solves the problem.


npm install -g claude-code-memory && memory-mcp setup

GitHub: yuvalsuede / memory-mcp

memory-mcp

Persistent memory for Claude Code. Never lose context between sessions again.

The Problem

Every few days, you start a new Claude Code session and have to re-explain your project — the architecture, the decisions you made, the patterns you follow, the gotchas you discovered. All that context is gone.

The Solution

memory-mcp silently captures what matters during your sessions and makes it available to every future session — automatically.

Session 1: Claude works → hooks silently extract memories → saved
Session 2: Claude starts → reads CLAUDE.md → instantly knows everything

No commands to run. No "remember this". It just works.

How It Works

graph TB
    subgraph "Phase 1: Silent Capture"
        A[Claude Code Session] -->|User sends message| B[Claude responds]
        B -->|Hook fires: Stop/PreCompact/SessionEnd| C[extractor.js]
        C --> D[Read transcript from cursor]
        D --> E[Chunk if >6000 chars]
        E --> F[Send to Haiku LLM]
        F -->|Extract memories as JSON| G[Dedup and merge]
        G --> H[Save to .memory/state.json]
        H --> I[Regenerate CLAUDE.md]
    end