Everyone building multi-agent systems reaches for a vector database at some point.
We didn't. We've been running 5 agents with persistent cross-session memory for 6+ weeks using nothing but structured markdown files.
Here's why it works, when it doesn't, and the exact file structure.
## Why vector DBs fail early-stage agents
Vector databases solve the retrieval problem. But early-stage agents don't have a retrieval problem — they have a curation problem.
You don't know what's worth remembering yet. You don't know what queries agents will run against memory. You don't know what's stale.
Building a vector retrieval layer before you understand your memory access patterns means building the wrong thing fast.
Markdown-first lets you understand the access patterns before you optimize them.
## The memory file structure
```
~/.claude/projects/{project-hash}/memory/
├── MEMORY.md          # index — loaded every session, must stay < 200 lines
├── user_identity.md   # who the user is, role, context
├── feedback_*.md      # corrections + confirmations (highest-value)
├── project_*.md       # ongoing work, goals, decisions
└── reference_*.md     # pointers to external systems
```
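A minimal sketch of how an agent might bucket these files by filename prefix. The project hash (`abc123`) is a placeholder, and the prefix list mirrors the layout above:

```python
from pathlib import Path

# Placeholder project hash; the real path uses the per-project hash.
MEMORY_DIR = Path.home() / ".claude" / "projects" / "abc123" / "memory"
TYPES = ("user", "feedback", "project", "reference")

def classify(path: Path) -> str:
    """Return the memory type implied by a file's name prefix."""
    for t in TYPES:
        if path.name.startswith(t):
            return t
    return "index" if path.name == "MEMORY.md" else "unknown"
```

This keeps type detection trivial: no metadata lookup, just the naming convention doing the work.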
Each memory file has frontmatter:
```markdown
---
name: Prompt Caching TTL Regression
description: "Anthropic dropped default TTL 1h→5m on March 6; disabling telemetry also kills 1h TTL"
type: reference
---
On March 6, 2026, Anthropic changed the default prompt cache TTL from 1 hour to 5 minutes.

**Why:** Confirmed by cache_read_input_tokens dropping to zero on unchanged production code.

**How to apply:** Always verify cache hit rate after any SDK update. Add cache monitoring to CI.
```
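A rough sketch of reading that frontmatter, assuming every memory file opens with a `---`-delimited block of `key: value` lines. A real implementation would reach for a YAML library rather than this regex:

```python
import re

# Matches a leading `---` ... `---` frontmatter block.
FRONTMATTER = re.compile(r"^---\n(.*?)\n---\n", re.DOTALL)

def parse_frontmatter(text: str) -> dict:
    """Extract frontmatter fields from a memory file's contents."""
    m = FRONTMATTER.match(text)
    if not m:
        return {}
    fields = {}
    for line in m.group(1).splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip().strip('"')
    return fields
```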
`MEMORY.md` is an index:
- [Prompt Caching TTL Regression](reference_cache_ttl.md) — Anthropic dropped default TTL 1h→5m on March 6
- [Revenue Priority](feedback_revenue_priority.md) — Revenue ops is top priority; other work is secondary
- [Agent Escalation Rules](feedback_escalation.md) — Gods escalate to Atlas on: complete OR hard blocker only
The index loads every session. Full memory files load on demand.
## The four memory types
- **user** — who they are, expertise level, preferences. Shapes how you respond, not what you do.
- **feedback** — corrections and confirmations. The most valuable type. Record both: "don't do X" AND "yes, exactly that."
- **project** — ongoing work state, goals, decisions. Decays fast — include a "Why:" line so you can judge whether it's still load-bearing.
- **reference** — pointers to external systems ("bugs tracked in Linear project INGEST", "oncall dashboard at grafana.internal/d/api-latency").
## What NOT to save
This is where most implementations go wrong:
- Code patterns, file paths, architecture — derivable from reading the repo
- Git history, who-changed-what — `git log` is authoritative
- Debugging solutions — the fix is in the code, context is in the commit message
- In-progress task state — use a todo list, not memory
Memory is for things that are non-obvious from the codebase and persist across sessions.
## When to upgrade to vector search
You'll know it's time when:
- `MEMORY.md` approaches 200 lines and you're dropping relevant memories
- Agents are asking "what do I know about X?" instead of reading the index
- You've built 3+ months of session logs and agents need to search them
At that point, the access patterns are clear. Build the retrieval layer you actually need.
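A trivial guard for the first signal above — warn when the index nears its 200-line budget. The 0.9 threshold is my assumption, not a number from the post:

```python
LINE_BUDGET = 200  # the index must stay under this, per the file structure

def index_pressure(index_text: str, budget: int = LINE_BUDGET) -> float:
    """Fraction of the line budget the index currently uses."""
    return len(index_text.splitlines()) / budget

def should_consider_vector_search(index_text: str) -> bool:
    """Assumed threshold: flag once the index is 90% full."""
    return index_pressure(index_text) >= 0.9
```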
## The full memory system
The complete memory file structure, frontmatter schema, index format, and auto-memory instructions for Claude Code are in the open-source repo:
github.com/Wh0FF24/whoff-automation
The CONTRIBUTING.md also documents the agent persona system and the spawn-brief format that keep each agent's memory independent.
Part of the multi-agent toolkit at github.com/Wh0FF24/whoff-automation. Running in production since March 2026.