Discussion on: How I Built a Memory System for Claude Code and Open-Sourced It

Iurii • Mar 29 • Edited

Claude has a reference implementation of a memory MCP server.

I would very much like to see a full RAG vector memory (kind of what AnythingLLM does but for code specifically).

That way I could ask about "ORM" and Claude would retrieve the "database" and "migrations" topics from memory.

Serhii Kravchenko • Mar 31

Good point, Iurii — and yeah, I'm aware of Anthropic's memory MCP server reference implementation.

The RAG/vector approach is tempting, especially for the semantic retrieval you're describing (ask about "ORM" and get "database" + "migrations" back). In theory, it's cleaner than flat files.

But here's what we found in practice: for most Claude Code workflows, the overhead of maintaining a vector DB (embeddings, indexing, retrieval pipeline) doesn't pay off, for three reasons:

Claude already does semantic matching: when the agent reads the MEMORY.md index and sees a topic file called database.md, it knows to pull it when you ask about ORM or migrations. The model's own understanding of semantic relationships handles 90%+ of the routing without any embeddings.
The bottleneck isn't retrieval, it's curation: the hard part isn't finding the right memory, it's keeping memory clean and useful over time. A vector DB with 2000 noisy entries retrieves noisy results. Our curated 200-line index plus focused topic files stay sharp because the agent is told exactly what to save and what to skip.
Zero infrastructure: no embedding model, no vector store, no indexing step. Just markdown files in a git repo. Works offline, syncs with git, readable by humans. For a solo dev or small team, that simplicity matters a lot.
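
To make the flat-file idea concrete, here is a minimal sketch of the two reads involved: the index that is always loaded, and a topic file pulled on demand. The index line format, the `load_index` / `load_topic` helper names, and treating 200 lines as a hard cap are assumptions for illustration, not the project's actual code.

```python
import tempfile
from pathlib import Path

MAX_INDEX_LINES = 200  # index size mentioned above; used here as a hard cap (assumption)


def load_index(memory_dir: Path) -> str:
    """Return MEMORY.md, truncated so the always-loaded part stays small."""
    lines = (memory_dir / "MEMORY.md").read_text(encoding="utf-8").splitlines()
    return "\n".join(lines[:MAX_INDEX_LINES])


def load_topic(memory_dir: Path, name: str) -> str:
    """Read one topic file on demand, refusing paths outside memory_dir."""
    root = memory_dir.resolve()
    path = (root / name).resolve()
    if root not in path.parents:
        raise ValueError(f"{name!r} is outside the memory directory")
    return path.read_text(encoding="utf-8")


def demo():
    """Build a throwaway memory directory (hypothetical contents) and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        memory_dir = Path(tmp)
        (memory_dir / "MEMORY.md").write_text(
            "# Memory index\n"
            "- database.md: schema decisions, ORM conventions, migrations\n"
            "- testing.md: test layout and fixtures\n",
            encoding="utf-8",
        )
        (memory_dir / "database.md").write_text(
            "Use the ORM for all queries; every schema change gets a migration.\n",
            encoding="utf-8",
        )
        index = load_index(memory_dir)
        topic = load_topic(memory_dir, "database.md")
        return index, topic
```

Note that nothing here does any matching: the decision to open database.md for an "ORM" question is left entirely to the model reading the index, which is the point being made above.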

That said — for large codebases where you need to search across thousands of files semantically, a vector layer absolutely makes sense. It's more of a "what scale are you at" question. For project-level memory (decisions, patterns, preferences), flat files win. For codebase-level search across 100K+ lines, yeah, embeddings would help.

Would be cool to see someone build a hybrid — flat file memory for project context + vector search for codebase navigation. Best of both worlds.
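
A rough sketch of what such a hybrid could look like, under heavy assumptions: the `embed` function below is a toy token-count stand-in for a real embedding model (so it only matches shared words and would not link "ORM" to "database" the way real embeddings might), and `search_code` / `build_context` are hypothetical names, not part of any existing tool.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase token counts."""
    return Counter(re.findall(r"[a-z_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search_code(query: str, chunks: dict, k: int = 2) -> list:
    """Return the names of the k code chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda name: cosine(q, embed(chunks[name])), reverse=True)
    return ranked[:k]


def build_context(query: str, memory_index: str, chunks: dict, k: int = 2) -> str:
    """Combine the curated flat-file index with retrieved code chunks."""
    parts = ["## Project memory", memory_index, "## Relevant code"]
    parts += [f"### {name}\n{chunks[name]}" for name in search_code(query, chunks, k)]
    return "\n\n".join(parts)


# Hypothetical codebase chunks, keyed by file path.
CHUNKS = {
    "models/user.py": "class User(Base): id = Column(Integer) email = Column(String)",
    "migrations/0001_init.py": "def upgrade(): op.create_table('users')",
    "utils/dates.py": "def parse_date(value): return datetime.fromisoformat(value)",
}
QUERY = "where is the users table created in a migration upgrade"
```

Calling `build_context(QUERY, index_text, CHUNKS, k=1)` would put the curated index first and the best-matching code chunk after it, so project decisions stay human-curated while only codebase navigation goes through the vector layer.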

Iurii • Mar 31

Thanks for the answer — makes total sense to me.

On "Claude already does semantic matching": of course, in theory a vector DB would save more of the context window. I'm curious about real-world results, but not curious enough to build it myself 😀
On curation: this is a very valid point. I've seen how quickly CLAUDE.md can "rot" under rapid changes (something like upgrading a framework to a new major version is one prompt away). A mismatch is easier to spot when the memory is human-readable.
On infrastructure: I don't see it as the real issue, but plain Markdown does benefit teams more (everyone shares the same files instead of each person having their own unique binary database).