DEV Community

Atlas Whoff

How to Build Persistent Memory Into Claude Code Agents (Cross-Session Identity That Actually Works)

Every Claude agent starts fresh. No memory of last session. No persistent identity. No accumulated context. You have to re-explain who you are, what your codebase looks like, and what you were working on — every single time.

This is the biggest productivity leak in AI-assisted development. Here's how to fix it with a file-based memory system that actually persists across sessions.

Why In-Context Memory Doesn't Work

The naive approach: dump everything into the system prompt. Problem: Claude's context window fills up fast, you pay full token price on every call, and there's no structure — just a blob of text.

The better approach: a structured memory store outside the model, with a smart retrieval layer that only pulls in what's relevant to the current session.

The Architecture

memory/
  MEMORY.md          # Index — always loaded, <200 lines
  user_profile.md    # Who is the user, preferences, domain knowledge
  project_state.md   # Current work, blockers, decisions made
  feedback_*.md      # Corrections and confirmed patterns
  reference_*.md     # External system pointers

The index (MEMORY.md) loads into every session. It's a one-line-per-entry pointer file — titles and hooks only, no content. Individual memory files load on-demand when relevant.
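For illustration, an index entry is just a link plus its description hook. These two entries reuse examples from later in this post; the filenames are hypothetical:

```markdown
- [Prompt Caching TTL Regression](reference_prompt_caching_ttl_regression.md) — Anthropic silently dropped cache TTL from 1h to 5min in March 2026
- [No DB Mocks](feedback_no_db_mocks.md) — Don't mock the database; mocks passed but the prod migration failed
```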

Memory File Format

Each memory file uses frontmatter + structured body:

---
name: Prompt Caching TTL Regression
description: "Anthropic silently dropped cache TTL from 1h to 5min in March 2026"
type: reference
---

Default cache TTL dropped 1h → 5min on March 6, 2026.
**Why:** Anthropic changed behavior without changelog entry.
**How to apply:** Any caching strategy assuming >5min TTL is broken — audit immediately.

The description field is critical — it's what the agent uses to decide whether to load the full file. Make it specific enough to be decision-useful.
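For example, contrasting two descriptions for the same memory file (the first is hypothetical):

```yaml
# Too vague to act on — the agent can't tell when to load this:
description: "Notes about caching"

# Decision-useful — names the system, the change, and the consequence:
description: "Anthropic silently dropped cache TTL from 1h to 5min in March 2026"
```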

Four Memory Types

user — who the developer is, expertise level, preferences. Lets the agent tailor explanations (don't explain async/await to a 10-year TypeScript veteran).

feedback — corrections and confirmed patterns. "Don't mock the database — we got burned when mocks passed but prod migration failed." This is where you capture the WHY behind non-obvious decisions.

project — current work state, blockers, deadlines. Always convert relative dates to absolute when saving.

reference — pointers to external systems. "Pipeline bugs in Linear project INGEST." "Oncall Grafana at grafana.internal/d/api-latency."

When to Write vs. When to Skip

Don't save:

  • Code patterns derivable by reading the codebase
  • Git history (use git log)
  • Ephemeral task state (use a todo file instead)
  • Anything in CLAUDE.md already

Do save:

  • Non-obvious user preferences and their reasons
  • Corrections the agent made that would otherwise repeat
  • External system locations that aren't in the repo
  • Architectural decisions with their rationale

TypeScript Implementation

import { readFileSync, writeFileSync, readdirSync } from 'fs';
import { join } from 'path';

interface MemoryEntry {
  name: string;
  description: string;
  type: 'user' | 'feedback' | 'project' | 'reference';
  body: string;
  file: string;
}

function loadMemoryIndex(memoryDir: string): string[] {
  const indexPath = join(memoryDir, 'MEMORY.md');
  try {
    return readFileSync(indexPath, 'utf-8').split('\n').filter(l => l.startsWith('-'));
  } catch {
    return [];
  }
}

function saveMemory(memoryDir: string, entry: Omit<MemoryEntry, 'file'>): string {
  const filename = `${entry.type}_${entry.name.toLowerCase().replace(/\s+/g, '_')}.md`;
  const filepath = join(memoryDir, filename);

  const content = `---\nname: ${entry.name}\ndescription: ${entry.description}\ntype: ${entry.type}\n---\n\n${entry.body}`;
  writeFileSync(filepath, content);

  // Update the index: one line per memory, pointing at the file
  const indexPath = join(memoryDir, 'MEMORY.md');
  const indexLine = `- [${entry.name}](${filename}) — ${entry.description}`;

  let index = '';
  try {
    index = readFileSync(indexPath, 'utf-8');
  } catch {
    index = '';
  }

  // Remove any existing entry for the same file before appending the new one
  const lines = index.split('\n').filter(l => l.trim() && !l.includes(`(${filename})`));
  lines.push(indexLine);
  writeFileSync(indexPath, lines.join('\n'));

  return filepath;
}

function retrieveRelevantMemories(
  memoryDir: string,
  query: string,
  maxEntries = 5
): MemoryEntry[] {
  const files = readdirSync(memoryDir).filter(f => f.endsWith('.md') && f !== 'MEMORY.md');

  return files
    .map(file => {
      const content = readFileSync(join(memoryDir, file), 'utf-8');
      const frontmatterMatch = content.match(/---\n([\s\S]*?)---\n([\s\S]*)/);
      if (!frontmatterMatch) return null;

      const meta: Record<string, string> = {};
      frontmatterMatch[1].split('\n').forEach(line => {
        const [key, ...val] = line.split(': ');
        if (key && val.length) meta[key.trim()] = val.join(': ').trim();
      });

      return {
        name: meta.name ?? file,
        description: meta.description ?? '',
        type: (meta.type ?? 'reference') as MemoryEntry['type'],
        body: frontmatterMatch[2].trim(),
        file,
      };
    })
    .filter((e): e is MemoryEntry => e !== null)
    .filter(e => {
      const q = query.toLowerCase();
      return (
        e.name.toLowerCase().includes(q) ||
        e.description.toLowerCase().includes(q) ||
        e.body.toLowerCase().includes(q)
      );
    })
    .slice(0, maxEntries);
}

Wiring Into Claude API Calls

async function buildSystemPromptWithMemory(
  basePrompt: string,
  memoryDir: string,
  sessionContext: string
): Promise<string> {
  // Always load the index (loadMemoryIndex returns [] if the file is missing)
  const index = loadMemoryIndex(memoryDir).join('\n');

  // Retrieve relevant memories based on current session context
  const relevant = retrieveRelevantMemories(memoryDir, sessionContext);
  const memoryContent = relevant.map(m => `### ${m.name}\n${m.body}`).join('\n\n');

  return `${basePrompt}\n\n## Memory Index\n${index}\n\n## Loaded Memories\n${memoryContent}`;
}

// In your Claude API call:
const systemPrompt = await buildSystemPromptWithMemory(
  BASE_SYSTEM_PROMPT,
  './memory',
  'working on authentication refactor'
);

const response = await anthropic.messages.create({
  model: 'claude-opus-4-7',
  max_tokens: 4096,
  system: [
    {
      type: 'text',
      text: systemPrompt,
      cache_control: { type: 'ephemeral' }, // cache the stable base+index
    },
  ],
  messages,
});

Updating Memory After Sessions

At session end, have the agent write new memories it learned:

const MEMORY_EXTRACTION_PROMPT = `
Review this conversation. Identify any information worth persisting:
- User corrections that reveal non-obvious preferences
- Architectural decisions with their rationale
- External system locations discovered
- Patterns confirmed as working or broken

For each, output JSON: { name, description, type, body }
Only extract genuinely non-obvious insights. Skip anything in the codebase.
`;
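After the extraction call returns, you still need to parse the model's output defensively — models often wrap JSON in prose. A minimal sketch, assuming the prompt is adjusted to return a single JSON array (`parseExtractedMemories` is a hypothetical helper, not part of any SDK):

```typescript
interface ExtractedMemory {
  name: string;
  description: string;
  type: 'user' | 'feedback' | 'project' | 'reference';
  body: string;
}

function parseExtractedMemories(responseText: string): ExtractedMemory[] {
  // Grab the first-to-last bracket span; models often surround JSON with prose.
  const match = responseText.match(/\[[\s\S]*\]/);
  if (!match) return [];
  try {
    const parsed = JSON.parse(match[0]);
    if (!Array.isArray(parsed)) return [];
    // Drop any entry missing a required field or using an unknown type.
    return parsed.filter(
      (m): m is ExtractedMemory =>
        typeof m?.name === 'string' &&
        typeof m?.description === 'string' &&
        ['user', 'feedback', 'project', 'reference'].includes(m?.type) &&
        typeof m?.body === 'string'
    );
  } catch {
    return [];
  }
}
```

Each validated entry can then be passed straight to `saveMemory('./memory', m)` from the implementation above.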

The Staleness Problem

Memory rots. A memory that says "auth is in middleware/auth.ts" is wrong the moment someone renames the file.

Rule: before recommending based on memory, verify:

  • File paths → check they exist
  • Function names → grep for them
  • External systems → ping them

"The memory says X exists" ≠ "X exists now."

Build a verification step into your agent loop for any memory that names a specific artifact.

Results

With this system:

  • Session startup time drops from ~10 minutes of context-rebuilding to ~30 seconds
  • Agent doesn't repeat corrected mistakes across sessions
  • Institutional knowledge accumulates instead of resetting
  • Cache hit rate on system prompt stays high (stable content + cache_control)

The index stays under 200 lines. Individual memories load only when relevant. Total overhead per session: ~2,000-5,000 cached tokens (read at 10% price).

This is the difference between an agent that knows you and one you have to babysit.
