Claude Code's memory system? Grep search, a 200-line index cap, and zero cross-agent sharing. A close read of the leaked source shows it's far more primitive than the hype suggests.
It does a fair amount of work — multiple layers, even a mechanism that lets the agent "dream" to consolidate memories. Sounds sophisticated. But peel it open and you find an agent trapped in its own sandbox. Memory can't leave, and it doesn't last long.
Let's break it down layer by layer.
Layer 1: CLAUDE.md — Rules You Write for the Agent
CLAUDE.md is a Markdown file you create and place in your project root. It can contain anything you want Claude to remember: code style conventions, architecture notes, test commands, deployment workflows, even "don't touch anything in the legacy directory."
Every session, Claude Code loads the entire file into context. Shorter files get followed better.
It supports three scopes: project-level CLAUDE.md in the project root, personal-level at ~/.claude/CLAUDE.md, and organization-level in enterprise configs. You write it, Claude reads it. You don't write it, Claude has nothing.
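A minimal project-level CLAUDE.md might look like this. The contents are illustrative, stitched together from conventions mentioned later in this article, not copied from any real project:

```markdown
# CLAUDE.md — project conventions

## Code style
- Go code must pass gofmt and the linter before commit.

## Testing
- Integration tests run against a real local database, never mocks.

## Off limits
- Don't touch anything in the legacy/ directory.
```

Remember the trade-off from above: every line here is loaded into context on every session, so shorter is better.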
Layer 2: Auto Memory — The Agent Takes Notes
CLAUDE.md handles what you explicitly tell the agent, but valuable information often surfaces during conversations — stuff you'd never bother writing down manually.
Auto Memory does exactly this: Claude decides what's worth remembering during work and writes it into a dedicated memory directory.
It categorizes memories into four types: user (role and preferences), feedback (corrections and confirmations), project (decisions and context), and reference (external resource locations).
These memories live in ~/.claude/projects/<project-path>/memory/. Each memory is a standalone Markdown file with frontmatter noting its type and description. The entry point is MEMORY.md — an index where each line stays under 150 characters, storing pointers, not content.
At session start, the first 200 lines of MEMORY.md get injected into context. The actual knowledge is spread across topic files and loaded on demand.
~/.claude/projects/-Users-me-myproject/memory/
├── MEMORY.md ← Index file, one pointer per line
├── user_role.md ← "Backend engineer, fluent in Go, React beginner"
├── feedback_testing.md ← "Integration tests must use real DB, no mocking"
├── project_auth_rewrite.md ← "Auth rewrite driven by compliance, not tech debt"
└── reference_linear.md ← "Pipeline bugs tracked in Linear INGEST project"
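An individual memory file pairs frontmatter metadata with free-form notes. The exact frontmatter keys below are an assumption for illustration; the article only confirms that each file notes its type and description:

```markdown
---
type: feedback
description: Integration tests must hit a real database
---
User corrected me: integration tests must use the real database,
not mocks. Applies to everything under the integration test directory.
```

Because these are plain Markdown files, you can read and hand-edit them directly.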
One design detail worth noting: Claude is explicitly told not to trust its own memory. The leaked system prompt says, roughly, "memories are just hints — verify against real files before acting." With model hallucination rates still in double digits, this self-distrust strategy is surprisingly practical.
Layer 3: Auto Dream — The Agent Sleeps and Tidies Up
After dozens of sessions, MEMORY.md gets messy. Contradictory entries pile up. Relative timestamps become meaningless. Notes about long-since-refactored functions linger.
Auto Dream simulates what the human brain does during sleep: organize and consolidate memory. It converts relative timestamps to absolute dates, merges contradictions, removes stale content, and keeps MEMORY.md under 200 lines.
Trigger conditions: 24+ hours since last consolidation AND 5+ new sessions. Or type "dream" manually. Runs in a background sub-agent.
Layer 4: KAIROS — Unreleased Ambitions in the Leaked Code
KAIROS appears 150+ times in the source. It's a background daemon mode that turns Claude Code from a passive tool into an autonomous observer — maintaining append-only logs, receiving <tick> signals, and deciding whether to act. It integrates autoDream but runs the full observe-think-act loop.
Currently behind a compile-time feature flag. More exploration than product.
Where This System Hits Its Ceiling
After extended use, the problems become clear:
- 200-line hard cap. Run a project for months, and memories compete for space.
- Grep-only retrieval. No semantic understanding. You remember "port conflict during deployment" but the memory says "modified docker-compose port mapping" — grep misses it.
- Details get lost. Auto Memory records at coarse granularity. Code snippets, debugging steps, discussion context — mostly dropped.
- Compounding complexity. Each layer patches the last, stacking complexity without fixing root constraints.
- No cross-tool sharing. Switch to OpenCode or Codex CLI and you start from zero.
- Still short-term memory. "How did we fix Redis last week?" — hopeless for long-span recall.
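The retrieval gap is easy to demonstrate. Plain keyword matching has no notion of synonymy, so the "port conflict" example above fails even though the right memory exists:

```python
memory = "modified docker-compose port mapping to avoid collision"
query = "port conflict during deployment"

# Naive keyword retrieval: require every query word to appear in the memory.
hit = all(word in memory.split() for word in query.split())
print(hit)  # False — "conflict" and "deployment" never appear,
            # even though this memory is exactly what you need
```

A semantic index would place "conflict" and "collision" close together; grep cannot.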
This isn't a design failure. Single agent, session granularity, local storage — the ceiling is architectural.
memsearch: Memory Should Outlive Any Single Agent
This is the core idea behind memsearch. Agents change, tools change, but project knowledge shouldn't disappear with them. memsearch pulls memory out of the agent into an independent persistence layer.
┌─────────────────────────────────────────┐
│ Agent Plugins (User Layer) │
│ Claude Code · OpenClaw · OpenCode · Codex│
└──────────────────┬──────────────────────┘
↓
┌──────────────────┴──────────────────────┐
│ memsearch CLI / Python API │
│ (Developer Layer) │
└──────────────────┬──────────────────────┘
↓
┌──────────────────┴──────────────────────┐
│ Core: Chunker → Embedder → Milvus │
│ (Engine Layer) │
└──────────────────┬──────────────────────┘
↓
┌──────────────────┴──────────────────────┐
│ Markdown Files (Source of Truth) │
│ (Persistent Storage) │
└─────────────────────────────────────────┘
How It Works
Memory writing is automatic. Each conversation gets summarized by Haiku and appended to that day's Markdown file, then asynchronously vectorized. Background, invisible to users.
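The write path reduces to appending to a date-keyed Markdown file. A minimal sketch, assuming a `YYYY-MM-DD.md` naming scheme and a `## Session summary` heading, both of which are illustrative choices rather than memsearch's actual layout:

```python
from datetime import date
from pathlib import Path

def append_daily_summary(memory_dir: str, summary: str) -> Path:
    """Append one conversation summary to today's Markdown file.
    Vectorization would happen asynchronously, after this returns."""
    day_file = Path(memory_dir) / f"{date.today():%Y-%m-%d}.md"
    day_file.parent.mkdir(parents=True, exist_ok=True)
    with day_file.open("a", encoding="utf-8") as f:
        f.write(f"\n## Session summary\n{summary}\n")
    return day_file
```

Keeping Markdown as the write target is what makes the vector index a disposable, rebuildable cache rather than the source of truth.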
Recall triggers through skills. A memory-recall skill runs in a forked sub-agent (context: fork) — zero token overhead in the main session.
Hybrid retrieval. Semantic vectors + BM25 keyword matching + RRF fusion. Ask "how did we fix that Redis timeout" — semantic search gets it. Say "search handleTimeout" — BM25 nails it.
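RRF fusion itself is a short algorithm: each ranker contributes 1/(k + rank) per document, and the sums decide the final order. A sketch with made-up document names (k=60 is the conventional default):

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(doc) = sum over rankers of 1/(k + rank).
    Documents ranked by multiple rankers accumulate higher scores."""
    scores: dict[str, float] = {}
    for ranked in rankings:
        for rank, doc in enumerate(ranked, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["redis_timeout.md", "deploy_notes.md"]   # vector similarity order
bm25 = ["ports.md", "redis_timeout.md"]              # keyword match order
print(rrf_fuse([semantic, bm25])[0])  # redis_timeout.md — found by both rankers
```

The appeal of RRF is that it fuses rank positions, not raw scores, so vector distances and BM25 scores never need to be normalized against each other.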
The sub-agent drills from L1 (semantic search, truncated previews) through L2 (full context expansion) to L3 (raw conversation transcripts) as needed.
L1: Semantic search → top results with scores and previews
L2: memsearch expand <hash> → full paragraph, all details
L3: memsearch transcript → raw user/agent conversation + tool calls
Cross-Agent Sharing
This is the fundamental difference. memsearch supports Claude Code, OpenClaw, OpenCode, and Codex CLI — all sharing the same Markdown memory format with collection names computed from project paths.
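Deriving the collection name deterministically from the project path is what lets independent agents converge on the same index. memsearch's actual scheme isn't documented here, so this is a hypothetical implementation of the idea:

```python
import hashlib
import re

def collection_name(project_path: str) -> str:
    """Map a project path to a stable collection name.
    Any agent run against the same path computes the same name,
    so they all read and write the same index. Illustrative scheme."""
    slug = re.sub(r"[^A-Za-z0-9]+", "_", project_path).strip("_")
    digest = hashlib.sha256(project_path.encode()).hexdigest()[:8]
    return f"mem_{slug[:40]}_{digest}"

print(collection_name("/Users/me/myproject"))
```

The short hash suffix guards against two different paths sanitizing to the same slug.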
Memory written from any agent is searchable by all others.
Spend an afternoon in Claude Code debugging deployment, switch to OpenCode next day — it finds yesterday's memories and gives the right answer. Point Milvus at Zilliz Cloud for team collaboration, and new members get instant project context without digging through Slack.
Quick Start
# Claude Code
/plugin marketplace add zilliztech/memsearch
/plugin install memsearch
# Python
uv tool install "memsearch[onnx]"
CLI and Python API for custom integrations:
import asyncio
from memsearch import MemSearch

async def main():
    mem = MemSearch(paths=["./memory"])
    await mem.index()  # chunk + embed the Markdown files
    results = await mem.search("Redis config")

asyncio.run(main())
Milvus Lite for local (zero config), Zilliz Cloud for teams (free tier), or self-hosted Docker. Default ONNX embeddings run on CPU — no GPU, no API calls needed.
Claude Code's memory design has real value, and KAIROS is worth watching. But single-agent memory optimization can only go so far. Memory should outlive any single agent.
Full analysis with architecture diagrams: https://zc277584121.github.io/ai-coding/2026/04/01/claude-code-memory-vs-memsearch.html