Everyone building multi-agent systems reaches for a vector database at some point.
We didn't. We've been running 5 agents with persistent cross-session memory for 6+ weeks using nothing but structured markdown files.
Here's why it works, when it doesn't, and the exact file structure.
## Why vector DBs fail early-stage agents
Vector databases solve the retrieval problem. But early-stage agents don't have a retrieval problem — they have a curation problem.
You don't know what's worth remembering yet. You don't know what queries agents will run against memory. You don't know what's stale.
Building a vector retrieval layer before you understand your memory access patterns means building the wrong thing fast.
Markdown-first lets you understand the access patterns before you optimize them.
## The memory file structure
```
~/.claude/projects/{project-hash}/memory/
├── MEMORY.md          # index — loaded every session, must stay < 200 lines
├── user_identity.md   # who the user is, role, context
├── feedback_*.md      # corrections + confirmations (highest-value)
├── project_*.md       # ongoing work, goals, decisions
└── reference_*.md     # pointers to external systems
```
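A minimal sketch of how an agent might bucket these files by filename prefix. The project hash (`abc123`) is a placeholder, and the prefix list mirrors the layout above:

```python
from pathlib import Path

# Placeholder project hash; the real path uses the per-project hash.
MEMORY_DIR = Path.home() / ".claude" / "projects" / "abc123" / "memory"
TYPES = ("user", "feedback", "project", "reference")

def classify(path: Path) -> str:
    """Return the memory type implied by a file's name prefix."""
    for t in TYPES:
        if path.name.startswith(t):
            return t
    return "index" if path.name == "MEMORY.md" else "unknown"
```

This keeps type detection trivial: no metadata lookup, just the naming convention doing the work.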
Each memory file has frontmatter:
```markdown
---
name: Prompt Caching TTL Regression
description: "Anthropic dropped default TTL 1h→5m on March 6; disabling telemetry also kills 1h TTL"
type: reference
---
On March 6, 2026, Anthropic changed the default prompt cache TTL from 1 hour to 5 minutes.

**Why:** Confirmed by cache_read_input_tokens dropping to zero on unchanged production code.

**How to apply:** Always verify cache hit rate after any SDK update. Add cache monitoring to CI.
```
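A rough sketch of reading that frontmatter, assuming every memory file opens with a `---`-delimited block of `key: value` lines. A real implementation would reach for a YAML library rather than this regex:

```python
import re

# Matches a leading `---` ... `---` frontmatter block.
FRONTMATTER = re.compile(r"^---\n(.*?)\n---\n", re.DOTALL)

def parse_frontmatter(text: str) -> dict:
    """Extract frontmatter fields from a memory file's contents."""
    m = FRONTMATTER.match(text)
    if not m:
        return {}
    fields = {}
    for line in m.group(1).splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip().strip('"')
    return fields
```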
`MEMORY.md` is an index:
- [Prompt Caching TTL Regression](reference_cache_ttl.md) — Anthropic dropped default TTL 1h→5m on March 6
- [Revenue Priority](feedback_revenue_priority.md) — Revenue ops is top priority; other work is secondary
- [Agent Escalation Rules](feedback_escalation.md) — Gods escalate to Atlas on: complete OR hard blocker only
The index loads every session. Full memory files load on demand.
## The four memory types
- **user** — who they are, expertise level, preferences. Shapes how you respond, not what you do.
- **feedback** — corrections and confirmations. The most valuable type. Record both: "don't do X" AND "yes, exactly that."
- **project** — ongoing work state, goals, decisions. Decays fast — include a "Why:" line so you can judge whether it's still load-bearing.
- **reference** — pointers to external systems ("bugs tracked in Linear project INGEST", "oncall dashboard at grafana.internal/d/api-latency").
## What NOT to save
This is where most implementations go wrong:
- Code patterns, file paths, architecture — derivable from reading the repo
- Git history, who-changed-what — `git log` is authoritative
- Debugging solutions — the fix is in the code, context is in the commit message
- In-progress task state — use a todo list, not memory
Memory is for things that are non-obvious from the codebase and persist across sessions.
## When to upgrade to vector search
You'll know it's time when:
- `MEMORY.md` approaches 200 lines and you're dropping relevant memories
- Agents are asking "what do I know about X?" instead of reading the index
- You've built 3+ months of session logs and agents need to search them
At that point, the access patterns are clear. Build the retrieval layer you actually need.
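A trivial guard for the first signal above — warn when the index nears its 200-line budget. The 0.9 threshold is my assumption, not a number from the post:

```python
LINE_BUDGET = 200  # the index must stay under this, per the file structure

def index_pressure(index_text: str, budget: int = LINE_BUDGET) -> float:
    """Fraction of the line budget the index currently uses."""
    return len(index_text.splitlines()) / budget

def should_consider_vector_search(index_text: str) -> bool:
    """Assumed threshold: flag once the index is 90% full."""
    return index_pressure(index_text) >= 0.9
```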
## The full memory system
The complete memory file structure, frontmatter schema, index format, and auto-memory instructions for Claude Code are in the open-source repo:
github.com/Wh0FF24/whoff-automation
The CONTRIBUTING.md also documents the agent persona system and the spawn-brief format that keep each agent's memory independent.
Part of the multi-agent toolkit at github.com/Wh0FF24/whoff-automation. Running in production since March 2026.