If you use Claude Code for anything beyond one-off scripts, you've hit the memory wall. Every session starts from zero. Context compaction destroys your working state. MEMORY.md caps out at 200 lines.
The good news: the community has built real solutions. The bad news: there are enough options that choosing one is its own time sink.
I've tested all of the major approaches. Here's an honest comparison — what works, what breaks, and which one fits your situation.
1. CLAUDE.md + MEMORY.md (Built-In)
What it is: Claude Code's native memory system. CLAUDE.md files hold project instructions; MEMORY.md (in ~/.claude/projects/) stores notes Claude writes to itself. Both load into the system prompt at session start.
Setup: Nothing. It's already there.
How it works:
- CLAUDE.md at your project root — team-shared instructions, conventions, architecture notes
- CLAUDE.local.md — personal notes, auto-gitignored
- ~/.claude/CLAUDE.md — global preferences across all projects
- MEMORY.md — auto-generated by Claude, loaded at session start
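For a concrete starting point, a project-root CLAUDE.md is just plain markdown that Claude loads at session start. The contents below are purely illustrative — there's no required schema:

```markdown
# Project conventions
- TypeScript strict mode; no `any` without a justifying comment
- API handlers live in src/routes/, one file per resource

# Architecture notes
- Postgres is the source of truth; Redis is cache-only
- Background jobs go through the queue, never inline in handlers
```

Short, declarative bullets like these survive best — remember the whole file is re-sent with every message.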
The good:
- Zero setup, zero dependencies, zero cost
- Works offline
- CLAUDE.md is version-controlled with your project
- Simple enough to understand in 5 minutes
The bad:
- MEMORY.md has a hard 200-line cap. Content beyond line 200 is silently dropped. No warning. (Issue #25006)
- No search. Claude reads the entire file every session. With 200 lines of notes, it has no way to find specific context by meaning.
- Post-compaction amnesia. Multiple bug reports document Claude ignoring CLAUDE.md after context compaction. (Issue #4017, 20 upvotes)
- No automatic extraction. Claude has to decide what to write down. Important context slips through constantly.
- No cross-device sync. Each machine has its own disconnected MEMORY.md.
- Hidden token cost. Every message re-sends the full CLAUDE.md. One developer found cache reads consuming 99.93% of total token usage.
Best for: Small projects, quick tasks, developers who don't want to install anything.
Verdict: Fine for getting started. Inadequate for serious, multi-session development.
2. Local Vector Database Solutions (~29,700 GitHub Stars)
What it is: The most popular third-party memory approach. Automatically captures session context, compresses it with AI, and stores it in a local database with vector search.
Setup: Typically a single init command, but requires local dependencies.
How it works:
- Hooks into Claude Code's session lifecycle
- Captures conversation context and compresses it into summaries
- Stores summaries in local databases with vector embeddings
- Injects relevant context at session start
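The retrieval loop these tools implement can be sketched in a few lines. This toy version uses a bag-of-words vector and cosine similarity purely for illustration; real tools use learned embedding models and a proper on-disk vector store:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding". Real tools use learned vector models.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

store: list[tuple[str, Counter]] = []

def save_summary(summary: str) -> None:
    # In real tools, a session-end hook writes this to a local vector DB.
    store.append((summary, embed(summary)))

def recall(query: str, k: int = 1) -> list[str]:
    # Rank stored summaries by similarity to the query, best match first.
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [summary for summary, _ in ranked[:k]]

save_summary("Refactored the auth middleware to use JWT validation")
save_summary("Fixed flaky integration tests in the billing pipeline")
print(recall("jwt validation in auth"))
```

The session-level granularity problem discussed below falls out of this design: what you rank and retrieve are whole summaries, not individual facts.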
The good:
- Massive community — tens of thousands of stars, actively maintained
- Battle-tested by thousands of developers
- Open source
- Session summaries are automatic
- Vector search finds relevant context
The bad:
- Local dependencies. Requires multiple runtimes and databases running on your machine.
- Resource consumption. Significant memory usage reported during long sessions, due to in-process vector stores.
- No cross-device sync. Your memories live on one machine. Work from a laptop and desktop? Two separate memory stores.
- Session-level granularity. Captures session summaries, not individual facts. You can't search for a specific architecture decision — you search for sessions that might have mentioned it.
Best for: Developers who want the most popular, proven approach and work from a single machine with plenty of RAM.
Verdict: The community standard. Solid choice if you don't mind local resource usage and single-machine limitations.
3. Other MCP Memory Servers (~1,200 GitHub Stars)
What it is: MCP servers providing persistent memory with knowledge graph features, semantic search, and autonomous memory consolidation.
Setup: Typically requires Python and multiple dependencies, with several configuration steps.
How it works:
- Runs as an MCP server alongside Claude Code
- Stores memories locally with vector embeddings
- Provides tools for saving, searching, and managing memories
- Includes knowledge graph relationships between memories
- Autonomous consolidation merges related memories over time
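Stripped to its essentials, a knowledge-graph memory stores facts as nodes and typed relations as edges, so recalling one fact can pull in its neighbors. The sketch below is hypothetical — the names and storage are illustrative, not any particular server's schema:

```python
from collections import defaultdict

facts: dict[str, str] = {}
# fact_id -> list of (relation, other_fact_id) edges
edges: defaultdict[str, list[tuple[str, str]]] = defaultdict(list)

def remember(fact_id: str, text: str) -> None:
    facts[fact_id] = text

def relate(src: str, relation: str, dst: str) -> None:
    edges[src].append((relation, dst))

def recall_with_context(fact_id: str) -> list[str]:
    # Return the fact plus its graph neighbours for richer context.
    out = [facts[fact_id]]
    out += [f"({rel}) {facts[dst]}" for rel, dst in edges[fact_id]]
    return out

remember("db-choice", "Chose Postgres for the primary datastore")
remember("orm", "Using SQLAlchemy 2.0 with async sessions")
relate("db-choice", "implemented-by", "orm")
print(recall_with_context("db-choice"))
```

This relationship context is the main thing the graph buys you over a flat vector store — at the cost of the heavier setup described below.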
The good:
- Knowledge graph structure adds relationship context
- Semantic search with local embeddings
- Autonomous consolidation reduces memory bloat
- MCP-native — works through the standard protocol
The bad:
- Complex setup. Requires multiple local dependencies and configuration steps.
- Stability varies. Some projects have experienced significant version churn and reliability issues.
- Local-only. Same single-machine limitation as local vector database solutions.
- Smaller community. Fewer people testing edge cases compared to the most popular solutions.
- Heavy dependencies. Embedding model downloads can be hundreds of megabytes and may fail on some platforms.
Best for: Developers who want knowledge graph features and don't mind a more complex setup process.
Verdict: Ambitious architecture, but stability varies across projects. Test thoroughly before committing.
4. CogmemAi (Cloud-Based)
What it is: A cloud-first MCP server that moves all memory intelligence server-side. The local MCP server is a thin HTTP client — no databases, no vector stores, no heavy dependencies. Full disclosure: I built this one.
Setup:
npx cogmemai-mcp setup
How it works:
- 18 MCP tools: save, recall (semantic search), extract (AI-powered), project context, analytics, knowledge graph, import/export, and more
- Memories stored with high-dimensional semantic embeddings server-side
- AI extraction identifies important facts from conversations automatically
- Smart deduplication detects duplicate and conflicting memories
- Automatic project scoping + global preferences that follow you everywhere
- Automatic compaction recovery — context is preserved before compaction and seamlessly restored afterward
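To make the "thin HTTP client" claim concrete: all the local side does is ship small JSON payloads to an API and return the responses. The endpoint, payload fields, and auth scheme below are hypothetical placeholders for illustration, not CogmemAi's actual API:

```python
import json
import urllib.request

# Hypothetical endpoint -- NOT the real CogmemAi API.
API = "https://api.example.com/v1/memories"

def build_save_request(text: str, project: str, token: str) -> urllib.request.Request:
    # No embeddings, no database, no model downloads on the client:
    # the server does all the memory intelligence.
    body = json.dumps({"text": text, "project": project}).encode()
    return urllib.request.Request(
        API,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )

req = build_save_request("Auth uses JWT middleware", "my-app", "TOKEN")
print(req.full_url, req.get_method())
```

That's the whole architectural trade: the client stays trivial because everything stateful lives server-side, which is also why it can't work offline.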
The good:
- Zero local setup. No databases, no Python, no Docker, no vector stores. One command.
- Zero RAM issues. Nothing running locally except a thin HTTP client.
- Cross-device sync. Memories are in the cloud. Work from any machine.
- Compaction recovery. Automatically saves context before compaction and restores it after.
- Semantic search. Find memories by meaning, not keywords.
- AI extraction. Automatically identifies facts worth remembering.
- Document ingestion. Feed in READMEs or docs to quickly build project context.
- Free tier: 1,000 memories, 500 extractions/month, 5 projects.
The bad:
- Requires internet. No network, no memories. Not usable offline.
- Data in the cloud. Your memories (short factual sentences, not source code) are stored on HiFriendbot's servers. If that's a dealbreaker, go local.
- Newer project. Smaller community than the most popular local solutions. Fewer people have battle-tested it.
- Paid tiers for heavy use. Free tier is generous (1,000 memories), but Pro is $14.99/mo for 2,000 memories.
Best for: Developers who want zero-config setup, cross-device sync, and compaction recovery without managing local infrastructure.
Verdict: The trade-off is cloud dependency for zero maintenance. If you're comfortable with that, it's the fastest path to persistent memory.
5. Roll Your Own
What it is: Build a custom memory system tailored to your exact needs. Popular approaches include markdown file collections, SQLite databases with FTS5, or even Neo4j knowledge graphs.
Setup: However long it takes you to build it.
Common approaches:
- Markdown files + grep. Maintain a /memory/ directory with topic-based markdown files. Simple, version-controlled, human-readable. No semantic search.
- SQLite + FTS5. Store memories in SQLite with full-text search. Good for keyword matching, misses semantic similarity.
- Custom MCP server. Build an MCP server that wraps your preferred storage backend. Full control, full responsibility.
- Obsidian vault. Some developers use Obsidian's knowledge graph as a project memory, connected via MCP servers like easy-obsidian-mcp.
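The SQLite + FTS5 route is smaller than it sounds. A minimal sketch, assuming your Python's bundled SQLite was compiled with FTS5 (true of standard CPython builds on major platforms); the table name and facts are illustrative:

```python
import sqlite3

# In-memory DB for the sketch; pass a file path for persistence.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE memories USING fts5(topic, fact)")

facts = [
    ("architecture", "API gateway routes /v2 traffic to the Go service"),
    ("conventions", "All timestamps are stored as UTC ISO-8601 strings"),
    ("decisions", "We chose Postgres over DynamoDB for relational queries"),
]
db.executemany("INSERT INTO memories VALUES (?, ?)", facts)

def recall(query: str, limit: int = 3) -> list[str]:
    # FTS5's built-in BM25 ranking: best keyword match first.
    rows = db.execute(
        "SELECT fact FROM memories WHERE memories MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return [fact for (fact,) in rows]

print(recall("postgres"))
```

This gets you fast keyword recall in ~20 lines, but as noted above it will never match "datastore decision" to the Postgres fact — that's the semantic gap the vector-based options close.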
The good:
- Complete control over storage, format, and retrieval
- No vendor dependency
- Can be exactly what you need and nothing more
- Educational — you learn how memory systems work
The bad:
- Time investment. Building a good memory system is a project in itself. Semantic search alone requires embedding models, vector storage, and retrieval logic.
- Maintenance burden. You own every bug, every upgrade, every edge case.
- No AI extraction. Unless you build it, you're manually deciding what to remember.
- No compaction recovery. You'd need to build that system yourself.
Best for: Developers with specific requirements that no existing tool meets, or those who want to learn by building.
Verdict: Maximum flexibility, maximum effort. Only worth it if the existing tools genuinely don't fit.
The Comparison Table
| Feature | CLAUDE.md | Local Vector DB | Other MCP Servers | CogmemAi | DIY |
|---|---|---|---|---|---|
| Setup time | 0 min | ~5 min | ~15 min | ~1 min | Hours/days |
| Local dependencies | None | Multiple (databases, runtimes) | Multiple (Python, embeddings) | None | Varies |
| Semantic search | No | Yes (local) | Yes (local) | Yes (cloud) | If you build it |
| AI extraction | No | Session summaries | No | Yes | If you build it |
| Compaction recovery | No | Partial | No | Yes | If you build it |
| Cross-device sync | No | No | No | Yes | If you build it |
| Works offline | Yes | Yes | Yes | No | Varies |
| RAM usage | None | Significant | Moderate | None | Varies |
| Memory capacity | 200 lines | Unlimited (local disk) | Unlimited (local disk) | 1,000 free / 50K enterprise | Unlimited |
| Project scoping | Per-directory | Per-project | Tags | Git remote + global | If you build it |
| Cost | Free | Free | Free | Free / $14.99+ | Your time |
My Recommendation
There's no universally "best" option. It depends on what you value:
- "I don't want to install anything." → Stick with CLAUDE.md. Maximize those 200 lines. Use .claude/rules/*.md for topic-scoped instructions.
- "I want the most proven solution." → Local vector database solutions. Huge community, active development. Accept the resource trade-off.
- "I want zero maintenance." → CogmemAi. One command, nothing local to break, memories follow you across machines.
- "I need knowledge graphs." → Other MCP memory servers, but test the current version first.
- "I have specific requirements." → Roll your own. Start with SQLite + FTS5 and add complexity as needed.
The worst option is no memory at all. If you're still re-explaining your architecture every session, pick any solution from this list and set it up today. The 5–15 minutes of setup will save you hours every week.
I'm Scott, a network and systems engineer with 30+ years in the industry. I built CogmemAi after testing every approach on this list and wanting something with zero local infrastructure. Try whichever fits your workflow — the important thing is to stop losing context.