How to Build Persistent Memory Into Claude Code Agents (Cross-Session Identity That Actually Works)
Every Claude agent starts fresh. No memory of last session. No persistent identity. No accumulated context. You have to re-explain who you are, what your codebase looks like, and what you were working on — every single time.
This is the biggest productivity leak in AI-assisted development. Here's how to fix it with a file-based memory system that actually persists across sessions.
Why In-Context Memory Doesn't Work
The naive approach: dump everything into the system prompt. Problem: Claude's context window fills up fast, you pay full token price on every call, and there's no structure — just a blob of text.
The better approach: a structured memory store outside the model, with a smart retrieval layer that only pulls in what's relevant to the current session.
The Architecture
memory/
├── MEMORY.md           # Index — always loaded, <200 lines
├── user_profile.md     # Who the user is, preferences, domain knowledge
├── project_state.md    # Current work, blockers, decisions made
├── feedback_*.md       # Corrections and confirmed patterns
└── reference_*.md      # External system pointers
The index (MEMORY.md) loads into every session. It's a one-line-per-entry pointer file — titles and hooks only, no content. Individual memory files load on-demand when relevant.
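For concreteness, here is a sketch of what MEMORY.md might contain — one line per memory, a linked title plus its description hook (these entries are illustrative, built from examples elsewhere in this post):

```
- [Prompt Caching TTL Regression](reference_prompt_caching_ttl_regression.md) — Anthropic silently dropped cache TTL from 1h to 5min in March 2026
- [No DB Mocks](feedback_no_db_mocks.md) — Don't mock the database; mocks passed while the prod migration failed
- [Oncall Dashboard](reference_oncall_dashboard.md) — Oncall Grafana at grafana.internal/d/api-latency
```

The agent scans these hooks, decides which files are relevant, and loads only those.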
Memory File Format
Each memory file uses frontmatter + structured body:
---
name: Prompt Caching TTL Regression
description: "Anthropic silently dropped cache TTL from 1h to 5min in March 2026"
type: reference
---
Default cache TTL dropped 1h → 5min on March 6, 2026.
**Why:** Anthropic changed behavior without changelog entry.
**How to apply:** Any caching strategy assuming >5min TTL is broken — audit immediately.
The description field is critical — it's what the agent uses to decide whether to load the full file. Make it specific enough to be decision-useful.
Four Memory Types
user — who the developer is, expertise level, preferences. Lets the agent tailor explanations (don't explain async/await to a 10-year TypeScript veteran).
feedback — corrections and confirmed patterns. "Don't mock the database — we got burned when mocks passed but prod migration failed." This is where you capture the WHY behind non-obvious decisions.
project — current work state, blockers, deadlines. Always convert relative dates to absolute when saving.
reference — pointers to external systems. "Pipeline bugs in Linear project INGEST." "Oncall Grafana at grafana.internal/d/api-latency."
When to Write vs. When to Skip
Don't save:
- Code patterns derivable by reading the codebase
- Git history (use git log)
- Ephemeral task state (use a todo file instead)
- Anything already in CLAUDE.md
Do save:
- Non-obvious user preferences and their reasons
- Corrections the agent made that would otherwise repeat
- External system locations that aren't in the repo
- Architectural decisions with their rationale
TypeScript Implementation
import { readFileSync, writeFileSync, readdirSync } from 'fs';
import { join } from 'path';

interface MemoryEntry {
  name: string;
  description: string;
  type: 'user' | 'feedback' | 'project' | 'reference';
  body: string;
  file: string;
}

function loadMemoryIndex(memoryDir: string): string[] {
  const indexPath = join(memoryDir, 'MEMORY.md');
  try {
    return readFileSync(indexPath, 'utf-8').split('\n').filter(l => l.startsWith('-'));
  } catch {
    return []; // no index yet — first session
  }
}

function saveMemory(memoryDir: string, entry: Omit<MemoryEntry, 'file'>): string {
  const filename = `${entry.type}_${entry.name.toLowerCase().replace(/\s+/g, '_')}.md`;
  const filepath = join(memoryDir, filename);
  const content = `---\nname: ${entry.name}\ndescription: ${entry.description}\ntype: ${entry.type}\n---\n\n${entry.body}`;
  writeFileSync(filepath, content);

  // Update the index: drop any existing line pointing at this file, then append
  const indexPath = join(memoryDir, 'MEMORY.md');
  const indexLine = `- [${entry.name}](${filename}) — ${entry.description}`;
  let index = '';
  try {
    index = readFileSync(indexPath, 'utf-8');
  } catch {
    index = '';
  }
  const lines = index.split('\n').filter(l => l && !l.includes(`(${filename})`));
  lines.push(indexLine);
  writeFileSync(indexPath, lines.join('\n'));
  return filepath;
}

function retrieveRelevantMemories(
  memoryDir: string,
  query: string,
  maxEntries = 5
): MemoryEntry[] {
  const files = readdirSync(memoryDir).filter(f => f.endsWith('.md') && f !== 'MEMORY.md');
  // Match on individual query terms, not the whole phrase — "working on
  // authentication refactor" should hit a memory that only mentions "authentication"
  const terms = query.toLowerCase().split(/\s+/).filter(t => t.length > 2);
  return files
    .map(file => {
      const content = readFileSync(join(memoryDir, file), 'utf-8');
      const frontmatterMatch = content.match(/^---\n([\s\S]*?)\n---\n([\s\S]*)/);
      if (!frontmatterMatch) return null;
      const meta: Record<string, string> = {};
      frontmatterMatch[1].split('\n').forEach(line => {
        const [key, ...val] = line.split(': ');
        if (key && val.length) meta[key.trim()] = val.join(': ').trim();
      });
      return {
        name: meta.name ?? file,
        description: meta.description ?? '',
        type: (meta.type ?? 'reference') as MemoryEntry['type'],
        body: frontmatterMatch[2].trim(),
        file,
      };
    })
    .filter((e): e is MemoryEntry => e !== null)
    .filter(e => {
      const haystack = `${e.name} ${e.description} ${e.body}`.toLowerCase();
      return terms.some(t => haystack.includes(t));
    })
    .slice(0, maxEntries);
}
Wiring Into Claude API Calls
function buildSystemPromptWithMemory(
  basePrompt: string,
  memoryDir: string,
  sessionContext: string
): { stable: string; dynamic: string } {
  // Always load the index; retrieve full memories relevant to this session
  const index = readFileSync(join(memoryDir, 'MEMORY.md'), 'utf-8');
  const relevant = retrieveRelevantMemories(memoryDir, sessionContext);
  const memoryContent = relevant.map(m => `### ${m.name}\n${m.body}`).join('\n\n');
  return {
    stable: `${basePrompt}\n\n## Memory Index\n${index}`,
    dynamic: `## Loaded Memories\n${memoryContent}`,
  };
}

// In your Claude API call — the stable prefix (base prompt + index) goes in its
// own system block with a cache breakpoint; session-specific memories stay
// outside it so they don't invalidate the cache:
const { stable, dynamic } = buildSystemPromptWithMemory(
  BASE_SYSTEM_PROMPT,
  './memory',
  'working on authentication refactor'
);

const response = await anthropic.messages.create({
  model: 'claude-opus-4-7',
  max_tokens: 4096,
  system: [
    {
      type: 'text',
      text: stable,
      cache_control: { type: 'ephemeral' }, // cache the stable base+index
    },
    {
      type: 'text',
      text: dynamic, // varies per session — deliberately after the cache breakpoint
    },
  ],
  messages,
});
Updating Memory After Sessions
At session end, have the agent write new memories it learned:
const MEMORY_EXTRACTION_PROMPT = `
Review this conversation. Identify any information worth persisting:
- User corrections that reveal non-obvious preferences
- Architectural decisions with their rationale
- External system locations discovered
- Patterns confirmed as working or broken
For each, output JSON: { name, description, type, body }
Only extract genuinely non-obvious insights. Skip anything in the codebase.
`;
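The extraction output still has to be parsed before it can be written back with saveMemory. A defensive sketch, assuming the model emits flat (non-nested) JSON objects, possibly wrapped in prose — parseExtractedMemories is an illustrative name, not an established helper:

```typescript
interface ExtractedMemory {
  name: string;
  description: string;
  type: string;
  body: string;
}

// Pull every flat {...} object out of the model's reply and keep only the
// well-formed ones. The regex does not handle nested braces — fine here,
// since the extraction prompt asks for flat objects.
function parseExtractedMemories(raw: string): ExtractedMemory[] {
  const out: ExtractedMemory[] = [];
  for (const candidate of raw.match(/\{[^{}]*\}/g) ?? []) {
    try {
      const obj = JSON.parse(candidate);
      if (obj.name && obj.description && obj.type && obj.body) out.push(obj);
    } catch {
      // skip malformed fragments rather than failing the whole extraction
    }
  }
  return out;
}
```

Each parsed entry can then be passed straight to saveMemory, which also updates the index.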
The Staleness Problem
Memory rots. A memory that says "auth is in middleware/auth.ts" is wrong the moment someone renames the file.
Rule: before recommending based on memory, verify:
- File paths → check they exist
- Function names → grep for them
- External systems → ping them
"The memory says X exists" ≠ "X exists now."
Build a verification step into your agent loop for any memory that names a specific artifact.
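A minimal sketch of that verification for file paths, assuming memories reference repo-relative paths like middleware/auth.ts — findStalePaths is a hypothetical helper, and the path-matching regex is a rough heuristic:

```typescript
import { existsSync } from 'fs';
import { join } from 'path';

// Extract path-looking tokens (something/with-a-slash.ext) from a memory body
// and report the ones that no longer exist on disk.
function findStalePaths(memoryBody: string, repoRoot: string): string[] {
  const candidates = memoryBody.match(/[\w./-]+\/[\w.-]+\.\w+/g) ?? [];
  return candidates.filter(p => !existsSync(join(repoRoot, p)));
}
```

Anything this returns should make the agent re-locate the artifact (grep, ls) before acting on the memory; function names and external systems need their own checks.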
Results
With this system:
- Session startup time drops from ~10 minutes of context-rebuilding to ~30 seconds
- Agent doesn't repeat corrected mistakes across sessions
- Institutional knowledge accumulates instead of resetting
- Cache hit rate on system prompt stays high (stable content + cache_control)
The index stays under 200 lines. Individual memories load only when relevant. Total overhead per session: ~2,000-5,000 cached tokens (read at 10% price).
This is the difference between an agent that knows you and one you have to babysit.