Why I Replaced Vector Databases with Markdown Files for AI Agent Memory

#ai #llm #database #architecture

Everyone building AI agents reaches the same crossroads: where do you store the agent's memory?

The default answer in 2026 is a vector database. Pinecone, Chroma, Weaviate, pgvector — embed everything, similarity search at query time.

I tried it. Then I ripped it out and replaced it with markdown files.

The Vector DB Problem

Vector databases solve a real problem: finding semantically similar content in a large corpus. But for AI agent memory, they introduce three problems:

1. False negatives are silent killers.

Your agent decided something important 3 days ago. At query time, the embedding similarity score is 0.71. Your threshold is 0.75. The memory doesn't surface. The agent contradicts itself. You don't find out until production breaks.

With files: if the memory is in the file and the file is loaded into context, the LLM sees it. Period. Zero false negatives.

2. You can't debug embeddings.

When your agent does something wrong, you need to ask: "what did it remember?" With a vector DB, the answer requires understanding cosine similarity scores and embedding space geometry. With files: open the file, read it. Done.

3. The infrastructure tax.

A vector DB needs: hosting, backups, an embedding model (usually an API call = cost), index management, schema versioning. For an agent with hundreds of memories, this is wildly over-engineered.

The File-Based Alternative

My system uses three markdown files and a keyword router:

brain-index.md          ← keyword → file mapping
brain/tasks/active.md   ← what needs doing  
brain/changes/active.md ← what changed
brain/decisions/active.md ← what was decided

On session start, the agent reads brain-index.md (keyword table mapping topics to files). Based on the conversation, it loads the relevant file into context.

No embeddings. No similarity search. No infrastructure.

How Memories Get Written

A 3-script pipeline runs every 10 minutes:

brain-pipe.sh — Extracts new messages. Truncates to 300 chars/msg. Caps at 2KB.
llama-categorize.sh — Local Llama 3.2 1B categorizes into JSON. ~60% filtered as noise.
brain-filer.sh — Routes to correct file. Rebuilds keyword index. Telegram notification.

Total latency: ~200ms. Total cost: $0.

The Key Insight

Do the semantic work at write time, not read time.

The LLM categorizes the memory into the right file when it's created. At read time, you just need keyword matching. The expensive semantic reasoning happens once (at write time via local Llama), not on every query.

The Keyword Router Pattern

| Keywords | Read From |
|----------|----------|
| trading, P&L, stop-loss | brain/decisions/active.md |
| API, keys, vault | brain/changes/active.md |
| error, crash, bug | brain/open/active.md |

Agent sees "trading strategies" → matches keyword table → loads decisions file. Simple.