If you use Claude Code for anything beyond one-off scripts, you've hit the memory wall. Every session starts from zero. Context compaction destroys your working state. MEMORY.md caps out at 200 lines.
The good news: the community has built real solutions. The bad news: there are enough options that choosing one is its own time sink.
I've tested all of the major approaches. Here's an honest comparison — what works, what breaks, and which one fits your situation.
1. CLAUDE.md + MEMORY.md (Built-In)
What it is: Claude Code's native memory system. CLAUDE.md files hold project instructions; MEMORY.md (in ~/.claude/projects/) stores notes Claude writes to itself. Both load into the system prompt at session start.
Setup: Nothing. It's already there.
How it works:
- CLAUDE.md at your project root — team-shared instructions, conventions, architecture notes
- CLAUDE.local.md — personal notes, auto-gitignored
- ~/.claude/CLAUDE.md — global preferences across all projects
- MEMORY.md — auto-generated by Claude, loaded at session start
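For a concrete starting point, a project-root CLAUDE.md is just plain markdown that Claude loads at session start. The contents below are purely illustrative — there's no required schema:

```markdown
# Project conventions
- TypeScript strict mode; no `any` without a justifying comment
- API handlers live in src/routes/, one file per resource

# Architecture notes
- Postgres is the source of truth; Redis is cache-only
- Background jobs go through the queue, never inline in handlers
```

Short, declarative bullets like these survive best — remember the whole file is re-sent with every message.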
The good:
- Zero setup, zero dependencies, zero cost
- Works offline
- CLAUDE.md is version-controlled with your project
- Simple enough to understand in 5 minutes
The bad:
- MEMORY.md has a hard 200-line cap. Content beyond line 200 is silently dropped. No warning. (Issue #25006)
- No search. Claude reads the entire file every session. With 200 lines of notes, it has no way to find specific context by meaning.
- Post-compaction amnesia. Multiple bug reports document Claude ignoring CLAUDE.md after context compaction. (Issue #4017, 20 upvotes)
- No automatic extraction. Claude has to decide what to write down. Important context slips through constantly.
- No cross-device sync. Each machine has its own disconnected MEMORY.md.
- Hidden token cost. Every message re-sends the full CLAUDE.md. One developer found cache reads consuming 99.93% of total token usage.
Best for: Small projects, quick tasks, developers who don't want to install anything.
Verdict: Fine for getting started. Inadequate for serious, multi-session development.
2. Local Vector Database Solutions (~29,700 GitHub Stars)
What it is: The most popular third-party memory approach. Automatically captures session context, compresses it with AI, and stores it in a local database with vector search.
Setup: Typically a single init command, but requires local dependencies.
How it works:
- Hooks into Claude Code's session lifecycle
- Captures conversation context and compresses it into summaries
- Stores summaries in local databases with vector embeddings
- Injects relevant context at session start
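The retrieval loop these tools implement can be sketched in a few lines. This toy version uses a bag-of-words vector and cosine similarity purely for illustration; real tools use learned embedding models and a proper on-disk vector store:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding". Real tools use learned vector models.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

store: list[tuple[str, Counter]] = []

def save_summary(summary: str) -> None:
    # In real tools, a session-end hook writes this to a local vector DB.
    store.append((summary, embed(summary)))

def recall(query: str, k: int = 1) -> list[str]:
    # Rank stored summaries by similarity to the query, best match first.
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [summary for summary, _ in ranked[:k]]

save_summary("Refactored the auth middleware to use JWT validation")
save_summary("Fixed flaky integration tests in the billing pipeline")
print(recall("jwt validation in auth"))
```

The session-level granularity problem discussed below falls out of this design: what you rank and retrieve are whole summaries, not individual facts.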
The good:
- Massive community — tens of thousands of stars, actively maintained
- Battle-tested by thousands of developers
- Open source
- Session summaries are automatic
- Vector search finds relevant context
The bad:
- Local dependencies. Requires multiple runtimes and databases running on your machine.
- Resource consumption. Significant memory usage reported during long sessions, due to in-process vector stores.
- No cross-device sync. Your memories live on one machine. Work from a laptop and desktop? Two separate memory stores.
- Session-level granularity. Captures session summaries, not individual facts. You can't search for a specific architecture decision — you search for sessions that might have mentioned it.
Best for: Developers who want the most popular, proven approach and work from a single machine with plenty of RAM.
Verdict: The community standard. Solid choice if you don't mind local resource usage and single-machine limitations.
3. Other MCP Memory Servers (~1,200 GitHub Stars)
What it is: MCP servers providing persistent memory with knowledge graph features, semantic search, and autonomous memory consolidation.
Setup: Typically requires Python and multiple dependencies, with several configuration steps.
How it works:
- Runs as an MCP server alongside Claude Code
- Stores memories locally with vector embeddings
- Provides tools for saving, searching, and managing memories
- Includes knowledge graph relationships between memories
- Autonomous consolidation merges related memories over time
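Stripped to its essentials, a knowledge-graph memory stores facts as nodes and typed relations as edges, so recalling one fact can pull in its neighbors. The sketch below is hypothetical — the names and storage are illustrative, not any particular server's schema:

```python
from collections import defaultdict

facts: dict[str, str] = {}
# fact_id -> list of (relation, other_fact_id) edges
edges: defaultdict[str, list[tuple[str, str]]] = defaultdict(list)

def remember(fact_id: str, text: str) -> None:
    facts[fact_id] = text

def relate(src: str, relation: str, dst: str) -> None:
    edges[src].append((relation, dst))

def recall_with_context(fact_id: str) -> list[str]:
    # Return the fact plus its graph neighbours for richer context.
    out = [facts[fact_id]]
    out += [f"({rel}) {facts[dst]}" for rel, dst in edges[fact_id]]
    return out

remember("db-choice", "Chose Postgres for the primary datastore")
remember("orm", "Using SQLAlchemy 2.0 with async sessions")
relate("db-choice", "implemented-by", "orm")
print(recall_with_context("db-choice"))
```

This relationship context is the main thing the graph buys you over a flat vector store — at the cost of the heavier setup described below.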
The good:
- Knowledge graph structure adds relationship context
- Semantic search with local embeddings
- Autonomous consolidation reduces memory bloat
- MCP-native — works through the standard protocol
The bad:
- Complex setup. Requires multiple local dependencies and configuration steps.
- Stability varies. Some projects have experienced significant version churn and reliability issues.
- Local-only. Same single-machine limitation as local vector database solutions.
- Smaller community. Fewer people testing edge cases compared to the most popular solutions.
- Heavy dependencies. Embedding model downloads can be hundreds of megabytes and may fail on some platforms.
Best for: Developers who want knowledge graph features and don't mind a more complex setup process.
Verdict: Ambitious architecture, but stability varies across projects. Test thoroughly before committing.
4. CogmemAi (Cloud-Based)
What it is: A cloud-first MCP server that moves all memory intelligence server-side. The local MCP server is a thin HTTP client — no databases, no vector stores, no heavy dependencies. Full disclosure: I built this one.
Setup:
npx cogmemai-mcp setup
How it works:
- 18 MCP tools: save, recall (semantic search), extract (AI-powered), project context, analytics, knowledge graph, import/export, and more
- Memories stored with high-dimensional semantic embeddings server-side
- AI extraction identifies important facts from conversations automatically
- Smart deduplication detects duplicate and conflicting memories
- Automatic project scoping + global preferences that follow you everywhere
- Automatic compaction recovery — context is preserved before compaction and seamlessly restored afterward
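To make the "thin HTTP client" claim concrete: all the local side does is ship small JSON payloads to an API and return the responses. The endpoint, payload fields, and auth scheme below are hypothetical placeholders for illustration, not CogmemAi's actual API:

```python
import json
import urllib.request

# Hypothetical endpoint -- NOT the real CogmemAi API.
API = "https://api.example.com/v1/memories"

def build_save_request(text: str, project: str, token: str) -> urllib.request.Request:
    # No embeddings, no database, no model downloads on the client:
    # the server does all the memory intelligence.
    body = json.dumps({"text": text, "project": project}).encode()
    return urllib.request.Request(
        API,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )

req = build_save_request("Auth uses JWT middleware", "my-app", "TOKEN")
print(req.full_url, req.get_method())
```

That's the whole architectural trade: the client stays trivial because everything stateful lives server-side, which is also why it can't work offline.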
The good:
- Zero local setup. No databases, no Python, no Docker, no vector stores. One command.
- Zero RAM issues. Nothing running locally except a thin HTTP client.
- Cross-device sync. Memories are in the cloud. Work from any machine.
- Compaction recovery. Automatically saves context before compaction and restores it after.
- Semantic search. Find memories by meaning, not keywords.
- AI extraction. Automatically identifies facts worth remembering.
- Document ingestion. Feed in READMEs or docs to quickly build project context.
- Free tier: 1,000 memories, 500 extractions/month, 5 projects.
The bad:
- Requires internet. No network, no memories. Not usable offline.
- Data in the cloud. Your memories (short factual sentences, not source code) are stored on HiFriendbot's servers. If that's a dealbreaker, go local.
- Newer project. Smaller community than the most popular local solutions. Fewer people have battle-tested it.
- Paid tiers for heavy use. Free tier is generous (1,000 memories), but Pro is $14.99/mo for 2,000 memories.
Best for: Developers who want zero-config setup, cross-device sync, and compaction recovery without managing local infrastructure.
Verdict: The trade-off is cloud dependency for zero maintenance. If you're comfortable with that, it's the fastest path to persistent memory.
5. Roll Your Own
What it is: Build a custom memory system tailored to your exact needs. Popular approaches include markdown file collections, SQLite databases with FTS5, or even Neo4j knowledge graphs.
Setup: However long it takes you to build it.
Common approaches:
- Markdown files + grep. Maintain a /memory/ directory with topic-based markdown files. Simple, version-controlled, human-readable. No semantic search.
- SQLite + FTS5. Store memories in SQLite with full-text search. Good for keyword matching, misses semantic similarity.
- Custom MCP server. Build an MCP server that wraps your preferred storage backend. Full control, full responsibility.
- Obsidian vault. Some developers use Obsidian's knowledge graph as a project memory, connected via MCP servers like easy-obsidian-mcp.
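The SQLite + FTS5 route is smaller than it sounds. A minimal sketch, assuming your Python's bundled SQLite was compiled with FTS5 (true of standard CPython builds on major platforms); the table name and facts are illustrative:

```python
import sqlite3

# In-memory DB for the sketch; pass a file path for persistence.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE memories USING fts5(topic, fact)")

facts = [
    ("architecture", "API gateway routes /v2 traffic to the Go service"),
    ("conventions", "All timestamps are stored as UTC ISO-8601 strings"),
    ("decisions", "We chose Postgres over DynamoDB for relational queries"),
]
db.executemany("INSERT INTO memories VALUES (?, ?)", facts)

def recall(query: str, limit: int = 3) -> list[str]:
    # FTS5's built-in BM25 ranking: best keyword match first.
    rows = db.execute(
        "SELECT fact FROM memories WHERE memories MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return [fact for (fact,) in rows]

print(recall("postgres"))
```

This gets you fast keyword recall in ~20 lines, but as noted above it will never match "datastore decision" to the Postgres fact — that's the semantic gap the vector-based options close.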
The good:
- Complete control over storage, format, and retrieval
- No vendor dependency
- Can be exactly what you need and nothing more
- Educational — you learn how memory systems work
The bad:
- Time investment. Building a good memory system is a project in itself. Semantic search alone requires embedding models, vector storage, and retrieval logic.
- Maintenance burden. You own every bug, every upgrade, every edge case.
- No AI extraction. Unless you build it, you're manually deciding what to remember.
- No compaction recovery. You'd need to build that system yourself.
Best for: Developers with specific requirements that no existing tool meets, or those who want to learn by building.
Verdict: Maximum flexibility, maximum effort. Only worth it if the existing tools genuinely don't fit.
The Comparison Table
| Feature | CLAUDE.md | Local Vector DB | Other MCP Servers | CogmemAi | DIY |
|---|---|---|---|---|---|
| Setup time | 0 min | ~5 min | ~15 min | ~1 min | Hours/days |
| Local dependencies | None | Multiple (databases, runtimes) | Multiple (Python, embeddings) | None | Varies |
| Semantic search | No | Yes (local) | Yes (local) | Yes (cloud) | If you build it |
| AI extraction | No | Session summaries | No | Yes | If you build it |
| Compaction recovery | No | Partial | No | Yes | If you build it |
| Cross-device sync | No | No | No | Yes | If you build it |
| Works offline | Yes | Yes | Yes | No | Varies |
| RAM usage | None | Significant | Moderate | None | Varies |
| Memory capacity | 200 lines | Unlimited (local disk) | Unlimited (local disk) | 1,000 free / 50K enterprise | Unlimited |
| Project scoping | Per-directory | Per-project | Tags | Git remote + global | If you build it |
| Cost | Free | Free | Free | Free / $14.99+ | Your time |
My Recommendation
There's no universally "best" option. It depends on what you value:
- "I don't want to install anything." → Stick with CLAUDE.md. Maximize those 200 lines. Use .claude/rules/*.md for topic-scoped instructions.
- "I want the most proven solution." → Local vector database solutions. Huge community, active development. Accept the resource trade-off.
- "I want zero maintenance." → CogmemAi. One command, nothing local to break, memories follow you across machines.
- "I need knowledge graphs." → Other MCP memory servers, but test the current version first.
- "I have specific requirements." → Roll your own. Start with SQLite + FTS5 and add complexity as needed.
The worst option is no memory at all. If you're still re-explaining your architecture every session, pick any solution from this list and set it up today. The 5–15 minutes of setup will save you hours every week.
I'm Scott, a network and systems engineer with 30+ years in the industry. I built CogmemAi after testing every approach on this list and wanting something with zero local infrastructure. Try whichever fits your workflow — the important thing is to stop losing context.