DEV Community

Guatu

Posted on • Originally published at guatulabs.dev

The 6-Layer Memory Architecture I Run for Claude Code

I started where most people start: a single CLAUDE.md at the root of every repo. It worked for a few weeks. Then it started failing in the same boring way every time. The file grew past 200 lines and instructions started getting ignored. The agent re-learned the same infrastructure facts every session. I'd find myself pasting the same context into the chat again, then opening a doc to copy a command I'd already told Claude about three times that week.

So I kept adding layers. Six months later there are six of them. Last week I ripped out the two that didn't earn their keep, sanitized the rest, and pushed the whole thing as a public reference implementation at github.com/futhgar/agent-memory-architecture.

This post is the honest tour — what each layer does, what I got wrong, what I'd skip if I were starting fresh today.

The layers

Session start (always loaded)
├── Layer 1  Auto-memory (tool-provided persistence)
├── Layer 2  System instructions (CLAUDE.md / .cursorrules)
└── Layer 3  Path-scoped rules (load conditionally on file path)

On-demand retrieval (lazy)
├── Layer 4  Wiki knowledge base (markdown + [[wikilinks]])
├── Layer 5  Semantic vector search (Qdrant + nomic-embed-text)
└── Layer 6  Cognitive memory with activation decay (MSAM / Zep / Letta)

The composition is deliberate. Layers 1-3 sit in context every session so the agent starts knowing how to behave. Layers 4-6 are called on demand when the agent needs a specific fact — cheap index lookup (Layer 4) first, semantic search fallback (Layer 5) when the keyword index misses, cognitive memory (Layer 6) only when you need temporal dynamics that flat files can't express.
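That cascade can be sketched in a few lines. Everything here is illustrative: `wiki_lookup`, `vector_search`, and `cognitive_recall` stand in for whatever your wiki index, Qdrant collection, and cognitive-memory backend actually expose.

```python
def retrieve(query, wiki_lookup, vector_search, cognitive_recall=None):
    """Try the cheapest memory layer first, escalating only on a miss."""
    hit = wiki_lookup(query)          # Layer 4: keyword/index lookup
    if hit is not None:
        return ("wiki", hit)
    hit = vector_search(query)        # Layer 5: semantic fallback
    if hit is not None:
        return ("vector", hit)
    if cognitive_recall is not None:  # Layer 6: only if configured
        hit = cognitive_recall(query)
        if hit is not None:
            return ("cognitive", hit)
    return ("miss", None)

# Toy backends standing in for the real layers
wiki = {"qdrant port": "6333"}
layer, value = retrieve("qdrant port", wiki.get, lambda q: None)
```

The point of the ordering is cost: a dict-style lookup is near-free, an embedding search costs a round-trip, and cognitive memory costs infrastructure.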

Most teams stop at Layer 2. Some go to Layer 4 — Karpathy's "LLM Wiki" insight that a disciplined wiki with good cross-references outperforms naive RAG at near-zero operational cost. I kept going because the homelab had enough breadth that even the wiki started missing on keyword lookups, and because session-specific learnings — "we tried X, it failed because Y" — didn't belong in permanent wiki articles but shouldn't evaporate either.
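The Layer 4 discipline is cheap to enforce mechanically. A minimal sketch of the `[[wikilink]]` extraction that a cross-reference graph (or orphan check) builds on, assuming the common `[[Target]]` / `[[Target|label]]` syntax:

```python
import re

# Capture the article name: stop at ']' (plain link), '|' (aliased
# link), or '#' (section link).
WIKILINK = re.compile(r"\[\[([^\]|#]+)")

def outgoing_links(markdown_text):
    """Return the set of article names this page links to."""
    return {m.strip() for m in WIKILINK.findall(markdown_text)}

page = "Runs on [[Proxmox]]; backed up via [[Restic|nightly restic job]]."
links = outgoing_links(page)
```

Run that over every file, invert the map, and any article with zero inbound links is an orphan the agent will likely never find.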

What went wrong along the way

CLAUDE.md bloat. The first mistake. I kept adding "this one more thing" to ~/.claude/CLAUDE.md and watched the agent ignore the back half. Anthropic's Claude Code memory docs explicitly say under 200 lines per file. Take that seriously. Every line above the threshold is making the lines below it less effective.
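This is easy to guard mechanically. A tiny check you could wire into CI or a pre-commit hook; the 200-line threshold is Anthropic's, the rest is a sketch:

```python
def check_memory_file(lines, limit=200):
    """Return (ok, overflow): whether the file fits the line budget,
    and how many lines it is over by."""
    count = len(lines)
    return count <= limit, max(0, count - limit)

# A 250-line CLAUDE.md is 50 lines over budget
ok, overflow = check_memory_file(["# CLAUDE.md"] * 250)
```

If it fails, the fix is usually not deletion but demotion: move the overflow into a Layer 3 rule file or a Layer 4 wiki article.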

Using a vector store for things a keyword index would have solved. I set up a Qdrant claude-code-memory collection early and started dumping session learnings into it. Six months in, it had 451 points and most of them were never retrieved. The wiki could have solved 95% of what I was using it for. I still use the vector store for session-to-session learnings — but I no longer recommend reaching for it before Layer 4 is properly curated.

Treating path-scoped rules as optional. The .claude/rules/*.md pattern lets you write "when editing kubernetes/**, load these K8s conventions" rules that don't eat tokens when you're writing Python. Before I moved K8s conventions from the monolithic project CLAUDE.md into .claude/rules/kubernetes.md, my baseline context load was ~500-800 tokens higher for every session, whether or not I was editing K8s. That's real money — not the dollar kind, the context-window-budget kind that directly reduces how much of the window is left for actual work.
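A path-scoped rule file looks roughly like this. The frontmatter key (`paths:` here) is illustrative — use whatever scoping mechanism your agent's rules loader actually supports:

```markdown
---
paths:
  - "kubernetes/**"
  - "**/*.yaml"
---

# Kubernetes conventions

- All manifests go through Kustomize overlays; never edit rendered output.
- Resource limits are mandatory on every Deployment.
- Secrets come from the external secrets operator, never inline.
```

The payoff is that these ~100 tokens load only when a matching file is in play, instead of sitting in every session's baseline context.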

Cognitive memory before I needed it. I set up MSAM (a custom ACT-R-inspired memory system with activation decay) for three months before I had a single use case that actually needed temporal dynamics. It was cool, but skipping to Layer 6 before Layers 4-5 are mature is the classic over-engineering trap. If I started fresh today, I'd stop at Layer 4 for at least a month and only add Layers 5-6 once the wiki's limitations were obvious from actual use.

Forgetting to validate the "fixes." Mid-project I realized my MSAM MCP integration was silently broken — the wrapper path in .claude.json pointed to a file that didn't exist. Every "use MSAM for this" instruction in CLAUDE.md had been a dead letter for weeks. The lesson: when you configure a memory system, test the round-trip (store → recall) before trusting that it works. Configuration isn't validation.
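The round-trip test generalizes to any backend. A sketch where `store` and `recall` are whatever your memory system exposes (an MSAM MCP tool, a Qdrant upsert/search pair — both names here are placeholders):

```python
import uuid

def roundtrip_ok(store, recall):
    """Store a unique sentinel, then confirm recall can find it.
    This catches silently-broken wiring that configuration alone won't."""
    sentinel = f"memory-smoke-test-{uuid.uuid4()}"
    store(sentinel)
    return sentinel in recall(sentinel)

# In-memory stand-in for a real backend: works
db = []
assert roundtrip_ok(db.append, lambda q: db)

# A backend that silently drops writes: caught
assert not roundtrip_ok(lambda s: None, lambda q: [])
```

Run it once after every config change. Thirty seconds of smoke test would have caught my dead MSAM wrapper on day one instead of week four.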

What's actually in the repo

The repo is opinionated and template-heavy, not a framework. You fork it or cherry-pick pieces; you don't install it and "run" it.

  • templates/global/CLAUDE.md and templates/project/CLAUDE.md — sanitized starting points, under 60 lines each
  • templates/rules/ — path-scoped rule examples for kubernetes, terraform, dockerfiles, and wiki editing
  • templates/memory-files/ — YAML-frontmatter templates for project / reference / feedback memory files
  • scripts/rebuild-memory-index.py — audits memory files for orphans, stale content, oversized files, and credential leaks
  • scripts/build-wiki-graph.py — generates an interactive force-directed graph of your wiki's [[wikilinks]] (or use Cosma for a more polished rendering)
  • scripts/msam-mcp-wrapper.py — a FastMCP wrapper for cognitive memory if you go that far
  • scripts/check-sanitization.sh — pre-publish scanner for secrets, IPs, and personal data if you fork this
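For flavor, here's the shape of what a sanitization scanner does — this is a minimal Python sketch in the spirit of check-sanitization.sh, not the repo's actual pattern list:

```python
import re

# Illustrative patterns: RFC 1918 private IPs, AWS access key IDs,
# and long bearer tokens. A real scanner needs a much longer list.
PATTERNS = {
    "private_ip": re.compile(
        r"\b(?:10|192\.168|172\.(?:1[6-9]|2\d|3[01]))\.\d+\.\d+"
    ),
    "aws_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "bearer_token": re.compile(r"Bearer\s+[A-Za-z0-9._\-]{20,}"),
}

def scan(text):
    """Return the names of patterns that match anywhere in `text`."""
    return sorted(name for name, rx in PATTERNS.items() if rx.search(text))

findings = scan("ssh admin@192.168.1.50  # TODO remove before publishing")
```

Memory files accumulate exactly the kind of operational detail (hostnames, IPs, tokens) you least want in a public fork, so run something like this before every publish.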

There's a one-line installer:

curl -sSL https://raw.githubusercontent.com/futhgar/agent-memory-architecture/main/bootstrap.sh | bash -s -- --layer=2

That auto-detects your agent (Claude Code, Cursor, or Aider), backs up any existing files, and drops in the templates. It has a --dry-run flag because I wouldn't blind-trust someone else's curl-bash either. See docs/getting-started.md for the decision tree on which layer to install when.

What to skip

Honestly: Layer 6 unless you already know you need it. The cognitive memory layer is the most opinionated and least-validated part of the system. MSAM is a research-grade tool. Zep and Letta are the production alternatives. All three require infrastructure, monitoring, and conceptual work. If your wiki (Layer 4) + vector search (Layer 5) aren't yet exhausting your team's patience, Layer 6 is premature.

Also: don't cargo-cult the whole stack. The repo's docs/getting-started.md has a decision tree — "is your CLAUDE.md over 200 lines? yes → try Layer 3; no → stay at Layer 2." Most teams should stop at Layer 4. The repo exists so you can see what the whole road looks like, not because everyone should walk it.

Why open-source it

Two reasons. One is pure: the pattern is genuinely useful and I didn't see it written down anywhere in its full form. There are a lot of "here is my CLAUDE.md" repos and a lot of research on isolated pieces (RAG, knowledge graphs, vector stores), but the composition — which layers to use together, in what order, with what tradeoffs — was a pattern I had to piece together from running it.

The other reason is less pure: I run Guatu Labs, and we help companies implement AI agent infrastructure. A reference implementation that people can actually read is a better pitch than anything I'd write on a services page. If this saves you a week of research, and later you need help rolling something similar out in your org, you know where to find me.

Links

If you build something on top of it, please ping me — that's the only "contribution" it asks for.
