DEV Community

Atlas Whoff

How to Build Persistent Memory Into Claude Code Agents (Cross-Session Identity That Actually Works)

Every Claude agent starts fresh. No memory of last session. No persistent identity. No accumulated context. You have to re-explain who you are, what your codebase looks like, and what you were working on — every single time.

This is the biggest productivity leak in AI-assisted development. Here's how to fix it with a file-based memory system that actually persists across sessions.

Why In-Context Memory Doesn't Work

The naive approach: dump everything into the system prompt. Problem: Claude's context window fills up fast, you pay full token price on every call, and there's no structure — just a blob of text.

The better approach: a structured memory store outside the model, with a smart retrieval layer that only pulls in what's relevant to the current session.

The Architecture

memory/
  MEMORY.md          # Index — always loaded, <200 lines
  user_profile.md    # Who is the user, preferences, domain knowledge
  project_state.md   # Current work, blockers, decisions made
  feedback_*.md      # Corrections and confirmed patterns
  reference_*.md     # External system pointers

The index (MEMORY.md) loads into every session. It's a one-line-per-entry pointer file — titles and hooks only, no content. Individual memory files load on-demand when relevant.
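For illustration, an index entry is just a link plus its description hook. These two entries reuse examples from later in this post; the filenames are hypothetical:

```markdown
- [Prompt Caching TTL Regression](reference_prompt_caching_ttl_regression.md) — Anthropic silently dropped cache TTL from 1h to 5min in March 2026
- [No DB Mocks](feedback_no_db_mocks.md) — Don't mock the database; mocks passed but the prod migration failed
```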

Memory File Format

Each memory file uses frontmatter + structured body:

---
name: Prompt Caching TTL Regression
description: "Anthropic silently dropped cache TTL from 1h to 5min in March 2026"
type: reference
---

Default cache TTL dropped 1h → 5min on March 6, 2026.
**Why:** Anthropic changed behavior without changelog entry.
**How to apply:** Any caching strategy assuming >5min TTL is broken — audit immediately.

The description field is critical — it's what the agent uses to decide whether to load the full file. Make it specific enough to be decision-useful.
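For example, contrasting two descriptions for the same memory file (the first is hypothetical):

```yaml
# Too vague to act on — the agent can't tell when to load this:
description: "Notes about caching"

# Decision-useful — names the system, the change, and the consequence:
description: "Anthropic silently dropped cache TTL from 1h to 5min in March 2026"
```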

Four Memory Types

user — who the developer is, expertise level, preferences. Lets the agent tailor explanations (don't explain async/await to a 10-year TypeScript veteran).

feedback — corrections and confirmed patterns. "Don't mock the database — we got burned when mocks passed but prod migration failed." This is where you capture the WHY behind non-obvious decisions.

project — current work state, blockers, deadlines. Always convert relative dates to absolute when saving.

reference — pointers to external systems. "Pipeline bugs in Linear project INGEST." "Oncall Grafana at grafana.internal/d/api-latency."

When to Write vs. When to Skip

Don't save:

  • Code patterns derivable by reading the codebase
  • Git history (use git log)
  • Ephemeral task state (use a todo file instead)
  • Anything in CLAUDE.md already

Do save:

  • Non-obvious user preferences and their reasons
  • Corrections the agent made that would otherwise repeat
  • External system locations that aren't in the repo
  • Architectural decisions with their rationale

TypeScript Implementation

import { readFileSync, writeFileSync, readdirSync } from 'fs';
import { join } from 'path';

interface MemoryEntry {
  name: string;
  description: string;
  type: 'user' | 'feedback' | 'project' | 'reference';
  body: string;
  file: string;
}

function loadMemoryIndex(memoryDir: string): string[] {
  const indexPath = join(memoryDir, 'MEMORY.md');
  try {
    return readFileSync(indexPath, 'utf-8').split('\n').filter(l => l.startsWith('-'));
  } catch {
    return [];
  }
}

function saveMemory(memoryDir: string, entry: Omit<MemoryEntry, 'file'>): string {
  const filename = `${entry.type}_${entry.name.toLowerCase().replace(/\s+/g, '_')}.md`;
  const filepath = join(memoryDir, filename);

  const content = `---\nname: ${entry.name}\ndescription: ${entry.description}\ntype: ${entry.type}\n---\n\n${entry.body}`;
  writeFileSync(filepath, content);

  // Update the index: one line per memory, pointing at the file
  const indexPath = join(memoryDir, 'MEMORY.md');
  const indexLine = `- [${entry.name}](${filename}) — ${entry.description}`;

  let index = '';
  try {
    index = readFileSync(indexPath, 'utf-8');
  } catch {
    index = '';
  }

  // Remove any existing entry for the same file before appending the new one
  const lines = index.split('\n').filter(l => l.trim() && !l.includes(`(${filename})`));
  lines.push(indexLine);
  writeFileSync(indexPath, lines.join('\n'));

  return filepath;
}

function retrieveRelevantMemories(
  memoryDir: string,
  query: string,
  maxEntries = 5
): MemoryEntry[] {
  const files = readdirSync(memoryDir).filter(f => f.endsWith('.md') && f !== 'MEMORY.md');

  return files
    .map(file => {
      const content = readFileSync(join(memoryDir, file), 'utf-8');
      const frontmatterMatch = content.match(/---\n([\s\S]*?)---\n([\s\S]*)/);
      if (!frontmatterMatch) return null;

      const meta: Record<string, string> = {};
      frontmatterMatch[1].split('\n').forEach(line => {
        const [key, ...val] = line.split(': ');
        if (key && val.length) meta[key.trim()] = val.join(': ').trim();
      });

      return {
        name: meta.name ?? file,
        description: meta.description ?? '',
        type: (meta.type ?? 'reference') as MemoryEntry['type'],
        body: frontmatterMatch[2].trim(),
        file,
      };
    })
    .filter((e): e is MemoryEntry => e !== null)
    .filter(e => {
      const q = query.toLowerCase();
      return (
        e.name.toLowerCase().includes(q) ||
        e.description.toLowerCase().includes(q) ||
        e.body.toLowerCase().includes(q)
      );
    })
    .slice(0, maxEntries);
}

Wiring Into Claude API Calls

async function buildSystemPromptWithMemory(
  basePrompt: string,
  memoryDir: string,
  sessionContext: string
): Promise<string> {
  // Always load the index (loadMemoryIndex returns [] if the file is missing)
  const index = loadMemoryIndex(memoryDir).join('\n');

  // Retrieve relevant memories based on current session context
  const relevant = retrieveRelevantMemories(memoryDir, sessionContext);
  const memoryContent = relevant.map(m => `### ${m.name}\n${m.body}`).join('\n\n');

  return `${basePrompt}\n\n## Memory Index\n${index}\n\n## Loaded Memories\n${memoryContent}`;
}

// In your Claude API call:
const systemPrompt = await buildSystemPromptWithMemory(
  BASE_SYSTEM_PROMPT,
  './memory',
  'working on authentication refactor'
);

const response = await anthropic.messages.create({
  model: 'claude-opus-4-7',
  max_tokens: 4096,
  system: [
    {
      type: 'text',
      text: systemPrompt,
      cache_control: { type: 'ephemeral' }, // cache the stable base+index
    },
  ],
  messages,
});

Updating Memory After Sessions

At session end, have the agent write new memories it learned:

const MEMORY_EXTRACTION_PROMPT = `
Review this conversation. Identify any information worth persisting:
- User corrections that reveal non-obvious preferences
- Architectural decisions with their rationale
- External system locations discovered
- Patterns confirmed as working or broken

For each, output JSON: { name, description, type, body }
Only extract genuinely non-obvious insights. Skip anything in the codebase.
`;
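After the extraction call returns, you still need to parse the model's output defensively — models often wrap JSON in prose. A minimal sketch, assuming the prompt is adjusted to return a single JSON array (`parseExtractedMemories` is a hypothetical helper, not part of any SDK):

```typescript
interface ExtractedMemory {
  name: string;
  description: string;
  type: 'user' | 'feedback' | 'project' | 'reference';
  body: string;
}

function parseExtractedMemories(responseText: string): ExtractedMemory[] {
  // Grab the first-to-last bracket span; models often surround JSON with prose.
  const match = responseText.match(/\[[\s\S]*\]/);
  if (!match) return [];
  try {
    const parsed = JSON.parse(match[0]);
    if (!Array.isArray(parsed)) return [];
    // Drop any entry missing a required field or using an unknown type.
    return parsed.filter(
      (m): m is ExtractedMemory =>
        typeof m?.name === 'string' &&
        typeof m?.description === 'string' &&
        ['user', 'feedback', 'project', 'reference'].includes(m?.type) &&
        typeof m?.body === 'string'
    );
  } catch {
    return [];
  }
}
```

Each validated entry can then be passed straight to `saveMemory('./memory', m)` from the implementation above.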

The Staleness Problem

Memory rots. A memory that says "auth is in middleware/auth.ts" is wrong the moment someone renames the file.

Rule: before recommending based on memory, verify:

  • File paths → check they exist
  • Function names → grep for them
  • External systems → ping them

"The memory says X exists" ≠ "X exists now."

Build a verification step into your agent loop for any memory that names a specific artifact.

Results

With this system:

  • Session startup time drops from ~10 minutes of context-rebuilding to ~30 seconds
  • Agent doesn't repeat corrected mistakes across sessions
  • Institutional knowledge accumulates instead of resetting
  • Cache hit rate on system prompt stays high (stable content + cache_control)

The index stays under 200 lines. Individual memories load only when relevant. Total overhead per session: ~2,000-5,000 cached tokens (read at 10% price).

This is the difference between an agent that knows you and one you have to babysit.
