Andrey

Posted on • Originally published at glivera.com

When CLAUDE.md Stops Working: Adding Vector Memory to Claude Code

*How I replaced static markdown with semantic search after managing 180+ production workflows*


TL;DR

  • CLAUDE.md is documentation injected into the context window at session start. It tells Claude what to do, but can't remember what Claude has done.
  • At scale (12+ client projects, hundreds of decisions), static markdown becomes a bottleneck: bloated files, irrelevant context, no cross-project knowledge sharing.
  • Vector memory via MCP server (claude-memory-mcp) stores decisions, bug fixes, and patterns in Supabase + pgvector, then surfaces only what's relevant through semantic search.
  • These tools complement each other. CLAUDE.md holds the rules. Vector memory holds the history.

You've explained the same architecture decision three times this week. Your CLAUDE.md is 400 lines long and Claude still asks about things you documented in month one. You added a subdirectory rules file. Then another. The file is a mess and it's only getting worse.

I hit this wall managing 180+ production n8n workflows across 12 client projects. CLAUDE.md worked fine for the first month. By month three it was a liability. I was spending more time maintaining the file than building. So I built something different.

This is what I learned.


How Does CLAUDE.md Actually Work?

CLAUDE.md is injected into Claude's system prompt at the start of every session. That's it. No intelligence, no filtering, no prioritization. Claude reads the whole file, every time, regardless of what you're working on.

The hierarchy works like this:

```
~/.claude/CLAUDE.md          ← global rules, loaded first
/project-root/CLAUDE.md      ← project rules, loaded second
/project-root/.claude/       ← subdirectory rules, loaded on demand
```

As of Claude Code v2.1.59+, there's also auto-memory. Claude can save notes to itself in ~/.claude/projects/<project>/memory/. These are still markdown files. They still load at session start. And MEMORY.md has a hard practical limit: the first 200 lines load, everything after that is silently ignored.

Topic files (like debugging.md or patterns.md in your .claude/ folder) are different. They're NOT loaded at startup. Claude reads them on demand when they seem relevant. That's actually useful, but it requires you to maintain those files manually.

Here's the key thing most people miss: shorter CLAUDE.md files produce better adherence. The docs are explicit about this. When you cram 400 lines of conventions into a single file, Claude doesn't prioritize or filter. It just gets overwhelmed and starts ignoring things.

```
Context Window
┌─────────────────────────────────────────┐
│ System prompt                           │
│ + ~/.claude/CLAUDE.md (full file)       │
│ + /project/CLAUDE.md (full file)        │
│ + MEMORY.md (first 200 lines only)      │
│ + Your current message                  │
│ + Conversation history                  │
└─────────────────────────────────────────┘
```

Everything loads at once. There's no mechanism to say "I'm working on authentication right now, skip the deployment section." The whole file goes in, every session.

This is documentation, not memory. That distinction matters more than it sounds.


Where Does CLAUDE.md Break Down at Scale?

For a single project with one developer, CLAUDE.md is fine. For 12 client projects with hundreds of accumulated decisions, it starts creating problems faster than it solves them.

The scale wall. Each client has unique API quirks, deployment patterns, naming conventions, infrastructure choices. After three months of active development, there are hundreds of important facts per project. You can't fit that into 200 lines. You can't even curate it fast enough. The file either becomes a bloated mess you stop trusting, or you start dropping things and lose institutional knowledge.

Cross-project knowledge. This one is painful. I fixed a Supabase edge case in a client project, documented it in that project's CLAUDE.md, and then hit the exact same issue two weeks later in a different project. CLAUDE.md is directory-scoped. There's no mechanism to say "check what we learned in the other project." I copy-pasted the fix manually. That's not a system, that's a workaround.

The relevance problem. Working on Stripe webhook handling. I don't need the CSS naming conventions or the Docker deployment checklist. But they're in the file, consuming context tokens, adding noise. CLAUDE.md loads everything every time. There's no relevance filtering.

It gets worse when you're near the context limit on a complex task. Those irrelevant tokens aren't free.

Staleness. No timestamps. No versioning. A convention from week one might directly contradict a decision from week eight, and there's no signal which one is current. Both lines sit in the file with equal weight. Claude does its best, but conflicting instructions accumulate silently.

No semantic search. "What did we decide about error handling in the payment flow?" You can't ask that. It's Ctrl+F at best. If you didn't use the exact keyword, you're scanning manually. This is the one that finally broke me: I knew I had documented something important, I just couldn't find it.

None of these are bugs in Claude or failures of the CLAUDE.md format. It's a static file. It does what static files do. The problem is that real projects aren't static.


What Does Vector Memory Architecture Look Like?

The core idea: instead of loading everything into context at session start, store memories in a vector database and retrieve only what's semantically relevant to the current task.

I built an open-source MCP server called claude-memory-mcp that does this using Supabase + pgvector. Here's the architecture:

```
Claude Code
    |
    v
MCP Server (Express + MCP SDK)
    |
    |-- remember   --> OpenAI embed --> Supabase insert
    |-- recall     --> OpenAI embed --> Supabase vector search
    |-- forget     --> Supabase soft-delete (sets expires_at)
    `-- project_status --> stats query
```

Embeddings: text-embedding-3-small

I use OpenAI's text-embedding-3-small model. 1536 dimensions, good quality-to-cost ratio. For the volume of memories in a typical consulting project, the monthly cost is around $0.50, with the database covered by Supabase's free tier. That's not a typo.
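
That $0.50 figure is easy to sanity-check. Here's a back-of-envelope sketch, assuming text-embedding-3-small pricing of roughly $0.02 per million tokens (check OpenAI's current price list) and workload numbers I've made up for illustration:

```typescript
// Rough embedding cost estimate, assuming ~$0.02 per 1M tokens
// for text-embedding-3-small. Workload numbers are illustrative.
const PRICE_PER_TOKEN = 0.02 / 1_000_000;

function monthlyEmbeddingCost(
  memoriesPerDay: number,
  avgTokensPerMemory: number,
  recallsPerDay: number,
  avgTokensPerQuery: number,
): number {
  const dailyTokens =
    memoriesPerDay * avgTokensPerMemory + recallsPerDay * avgTokensPerQuery;
  return dailyTokens * 30 * PRICE_PER_TOKEN;
}

// 50 memories and 100 recalls a day is already a heavy consulting
// workload, and it comes out to roughly a cent a month in embeddings:
console.log(monthlyEmbeddingCost(50, 300, 100, 50));
```

Embedding cost is effectively noise; the bulk of any spend is the database, and the free tier covers that.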

Storage: Supabase + pgvector

Why not SQLite? Three reasons. First, I already run Supabase as the data layer for 180+ n8n workflows, so it's existing infrastructure with existing backups. Second, SQLite is machine-local. If I'm working from a different machine or a client needs access, SQLite breaks the model. Third, Supabase is production-grade. I don't want to debug a corrupted SQLite file at 11pm.

If your stack is different, SQLite or ChromaDB would work fine. The principle is the same.
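
Whatever the backend, the retrieval principle is identical: embed the query, rank stored memory vectors by cosine similarity, return the top matches. A minimal in-memory sketch of that principle (in the actual server, pgvector does this ranking inside PostgreSQL, so none of this code exists there):

```typescript
// Backend-agnostic sketch of vector recall: rank stored embeddings by
// cosine similarity to the query embedding and keep the top k.
type StoredMemory = { id: string; title: string; embedding: number[] };

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function topK(query: number[], memories: StoredMemory[], k: number): StoredMemory[] {
  return [...memories]
    .sort(
      (x, y) =>
        cosineSimilarity(query, y.embedding) - cosineSimilarity(query, x.embedding),
    )
    .slice(0, k);
}
```

Swap the in-memory array for a pgvector column, a ChromaDB collection, or a SQLite extension and the shape of the system doesn't change.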

Transport: Streamable HTTP on port 3101

Standard HTTP transport means multiple Claude Code sessions can connect to the same server simultaneously. You're not locked to a single terminal window. This matters when you're context-switching between projects.

Memory types

Memories aren't free text. There are 8 structured categories:

| Type | What it stores |
| --- | --- |
| `decision` | Architecture or design choices made |
| `bug_fix` | Bugs found and resolved |
| `pattern` | Code patterns that work well |
| `context` | Session summaries, ongoing state |
| `blocker` | Things that blocked progress |
| `learning` | New discoveries about tools/APIs |
| `convention` | Project-specific style rules |
| `dependency` | Library choices and version notes |

Forcing structure here was a deliberate choice. Free-text memories are hard to recall precisely. Structured types let you filter queries: "recall decisions about authentication" returns a different result set than a generic search.
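
A rough sketch of how a closed set of types enables filtered recall. The `Memory` record shape here is my illustration, not the server's actual schema:

```typescript
// The 8 memory types as a closed union. A type or project filter
// narrows the candidate set before similarity ranking ever runs.
type MemoryType =
  | "decision" | "bug_fix" | "pattern" | "context"
  | "blocker" | "learning" | "convention" | "dependency";

interface Memory {
  id: string;
  title: string;
  content: string;
  memory_type: MemoryType;
  project_id?: string;
  tags?: string[];
}

function filterForRecall(
  memories: Memory[],
  opts: { project_id?: string; memory_type?: MemoryType },
): Memory[] {
  return memories.filter(
    (m) =>
      (!opts.project_id || m.project_id === opts.project_id) &&
      (!opts.memory_type || m.memory_type === opts.memory_type),
  );
}
```

Because the union is closed, a typo like `"bugfix"` fails at compile time instead of silently storing an unfilterable memory.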

Token cap on recall

Recall responses are capped at 2,000 tokens. This prevents the same problem that breaks CLAUDE.md: dumping too much into context. Semantic search surfaces the most relevant results, the token cap keeps it manageable.
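
The cap can be implemented as a greedy cut over the ranked results. A sketch, using a crude characters-divided-by-four token estimate rather than a real tokenizer:

```typescript
// Sketch of the 2,000-token recall cap: keep ranked results until the
// next one would blow the budget. chars/4 is a rough token heuristic,
// not what the server necessarily uses.
const RECALL_TOKEN_CAP = 2000;

const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

function capResults(ranked: string[], budget = RECALL_TOKEN_CAP): string[] {
  const kept: string[] = [];
  let used = 0;
  for (const text of ranked) {
    const cost = estimateTokens(text);
    if (used + cost > budget) break; // most relevant results are first, so stop here
    kept.push(text);
    used += cost;
  }
  return kept;
}
```

Because results arrive ranked by similarity, cutting from the tail always drops the least relevant memories first.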

Soft delete

The forget tool never hard-deletes. It sets expires_at. If Claude forgets something it shouldn't have, you can recover it. This has saved me twice.
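
The mechanics are simple enough to sketch in a few lines. Field names follow the article; the record shape is illustrative:

```typescript
// Soft delete as a filter: forget stamps expires_at instead of
// deleting the row, so recovery is just clearing the stamp.
interface Row { id: string; title: string; expires_at: string | null }

const forget = (rows: Row[], id: string, now: Date): Row[] =>
  rows.map((r) => (r.id === id ? { ...r, expires_at: now.toISOString() } : r));

const active = (rows: Row[]): Row[] => rows.filter((r) => r.expires_at === null);

const recover = (rows: Row[], id: string): Row[] =>
  rows.map((r) => (r.id === id ? { ...r, expires_at: null } : r));
```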

The four tools Claude has access to:

```
remember(title, content, memory_type, project_id?, tags?)
  → Embeds and stores a memory

recall(query, project_id?, memory_type?, limit?)
  → Semantic search, returns top matches within token cap

forget(memory_id)
  → Soft-delete by ID

project_status(project_id?)
  → Returns memory count by type, recent activity
```

Simple interface. Claude decides when to call these based on instructions in CLAUDE.md (more on that below).


How Does a Typical Session Work With Vector Memory?

The workflow has three phases. It sounds like overhead. In practice it takes about 30 seconds.

Session start

First thing: project_status to see what's been stored. Then recall with the current task as the query. Sometimes I also run a cross-project recall (no project_id filter) to catch relevant knowledge from other clients.

Concrete example: I'm implementing Stripe webhook handling for a new client. Claude recalls that in a previous project, I discovered Stripe sends duplicate webhook events under load, and the fix was idempotency keys stored in a PostgreSQL table with a unique constraint on the event ID. That knowledge surfaces automatically through vector similarity. I never wrote it in any CLAUDE.md. It was stored as a bug_fix memory from the original project.

That's the moment the architecture clicked for me.
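
The recalled fix itself is a few lines of logic. A sketch with a Set standing in for the PostgreSQL table and its unique constraint on the event ID (handler names are illustrative, not from any of my client projects):

```typescript
// Idempotent webhook handling: process each Stripe event ID at most
// once. In production the Set is a PostgreSQL table with a unique
// constraint, so duplicates fail the insert instead of the lookup.
const seenEventIds = new Set<string>();

function handleWebhook(
  eventId: string,
  process: () => void,
): "processed" | "duplicate" {
  if (seenEventIds.has(eventId)) return "duplicate";
  seenEventIds.add(eventId);
  process();
  return "processed";
}
```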

During work

Claude calls remember after significant decisions, bug fixes, or pattern discoveries. I put instructions for this in CLAUDE.md:

```markdown
## Memory Instructions

At session start:
1. Call project_status to check memory state
2. Call recall with current task description
3. Check for cross-project patterns (recall without project_id)

During work, call remember when:
- You make an architecture decision
- You fix a non-obvious bug
- You discover a pattern worth reusing
- We hit a blocker and resolve it

At session end:
- Save a context summary (memory_type: "context", tags: ["session-summary"])
```

Claude is good at following these instructions. Probably because they're short and specific, not 400 lines of mixed conventions.

Session end

A brief context summary with memory_type: "context" and a session-summary tag. Next session starts with a recall of recent session summaries. Continuity without me re-explaining anything.


CLAUDE.md vs Vector Memory: Which Should You Use?

Both. That's the actual answer.

| Dimension | CLAUDE.md | Vector Memory MCP |
| --- | --- | --- |
| Storage | Flat markdown files | PostgreSQL + pgvector |
| Search | Full load into context | Semantic similarity search |
| Capacity | ~200 lines effective (MEMORY.md) | Thousands of memories |
| Cross-project | No (directory-scoped) | Yes (omit project_id filter) |
| Cost | Zero | ~$0.50/month |
| Latency | Zero | 200-500ms per recall |
| Setup | Create a file | Docker + Supabase + OpenAI key |
| Relevance filtering | None, all or nothing | Similarity threshold + token cap |
| Staleness handling | Manual review | Timestamps + soft-delete + TTL |

These aren't competing tools. CLAUDE.md is the project's constitution: immutable rules, non-negotiable conventions, things that should always be true. Vector memory is institutional knowledge: what was learned, decided, discovered, fixed.

If you only use CLAUDE.md, you hit the scale wall. If you only use vector memory, Claude has no stable rules to work from. The combination is what makes this work.

CLAUDE.md should be short, stable, and authoritative. Vector memory should be long, growing, and searchable. They do different jobs.


What Other Memory Solutions Exist?

I'm not the only person who hit this problem. There are 30+ memory MCP servers listed on PulseMCP as of mid-2025. A few worth knowing:

Anthropic's Memory Tool is built into the Claude API. It's client-side file storage, not server-side vector search. Simpler to set up, less powerful for semantic retrieval.

memsearch ccplugin takes a hooks-based approach, no MCP overhead. Interesting architecture for lower-latency use cases.

Mem0 offers hosted and self-hosted options with a graph memory variant. More infrastructure to manage, but graph relationships between memories could be useful for complex knowledge bases.

episodic-memory uses SQLite with conversation archives. Good if you want everything local and don't need cross-machine access.

I chose Supabase because it's already in my stack. 180+ n8n workflows already depend on it. Adding memory storage was one database table, not a new infrastructure component. If you're running everything locally, SQLite-based options make more sense. If you're on a different cloud provider, the vector database choice might change. The principle, storing structured memories and retrieving by semantic similarity, stays the same regardless of backend.


How Do You Set Up Vector Memory in 15 Minutes?

This is the quick path. Full setup docs are in the claude-memory-mcp GitHub repo.

Prerequisites

  • Docker running locally
  • Supabase project (free tier is fine)
  • OpenAI API key (for embeddings)

Step 1: Clone and configure

```shell
git clone https://github.com/[your-username]/claude-memory-mcp
cd claude-memory-mcp
cp .env.example .env
```

Edit .env with your Supabase URL, Supabase anon key, and OpenAI API key.

Step 2: Start the server

```shell
docker compose up -d
```

Server runs on port 3101. Supabase migrations run automatically on first start.

Step 3: Add to Claude Code MCP config

Edit ~/.claude/mcp.json:

```json
{
  "mcpServers": {
    "memory": {
      "type": "http",
      "url": "http://localhost:3101/mcp"
    }
  }
}
```

Step 4: Add memory instructions to CLAUDE.md

Paste the memory instructions block from the session workflow section above into your global ~/.claude/CLAUDE.md.

Step 5: Verify

Start a Claude Code session and ask: "What MCP tools do you have?" Claude should list remember, recall, forget, and project_status.

First real test: tell Claude something specific about your current project, then start a new session and ask it to recall what it knows. If it surfaces what you stored, the setup is working.

Full installation guide, SQL schema, and troubleshooting in the GitHub repo.


Six months ago I was re-explaining my architecture every Monday morning. Context from last week's session, gone. Decisions from month two, gone. I was the memory. That's not scalable.

Now I start a session, Claude recalls what matters, and we pick up where we left off. CLAUDE.md still lives at the root of every project. It holds the rules. But the memory of what we've built together lives in vectors.

The repo is open source. Star it, fork it, or tell me what I should add next.


FAQ

Does vector memory replace CLAUDE.md?

No. They serve different purposes. CLAUDE.md holds stable rules and project conventions that should always apply. Vector memory stores accumulated knowledge: decisions made, bugs fixed, patterns discovered. Use both together for the best results.

How much does it cost to run claude-memory-mcp?

Approximately $0.50 per month for a typical consulting workload. Supabase free tier covers the database storage, and OpenAI's text-embedding-3-small is cheap enough that embedding costs stay minimal unless you're storing thousands of memories per day.

Can multiple Claude Code sessions share the same memory?

Yes. The server uses streamable HTTP transport on port 3101, so multiple sessions connect to the same server simultaneously. This also means memories stored in one terminal window are immediately available in another.

What happens if OpenAI embeddings are unavailable?

The server degrades gracefully. Memories can still be stored and retrieved by project or type. Semantic similarity search requires embeddings, so recall quality drops, but the system doesn't break.
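
A sketch of what that degradation path can look like. The function names are illustrative, not the server's actual internals:

```typescript
// Graceful degradation sketch: try semantic search, and if the
// embedding call fails, fall back to metadata-only retrieval
// (project/type filters, no similarity ranking).
type Result = { id: string; score: number };

async function recallWithFallback(
  query: string,
  embed: (q: string) => Promise<number[]>,
  vectorSearch: (v: number[]) => Promise<Result[]>,
  metadataSearch: () => Promise<Result[]>,
): Promise<Result[]> {
  try {
    return await vectorSearch(await embed(query));
  } catch {
    // Embeddings are down: serve filtered results instead of erroring out.
    return metadataSearch();
  }
}
```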

Is my code or project data sent to OpenAI?

Only the content of what you explicitly store as memories, the title and description you pass to remember. Your codebase, file contents, and conversation history are not transmitted. You control exactly what gets embedded.

Can I use this with Cursor or other MCP-compatible clients?

Yes. The server implements the standard MCP protocol. Any client that supports MCP over HTTP can connect to it, including Cursor and other editors adding MCP support.

How is this different from Anthropic's built-in Memory Tool?

Anthropic's Memory Tool is client-side file storage, similar in concept to CLAUDE.md's auto-memory feature. claude-memory-mcp is a server-side vector database with semantic search. The practical difference: built-in memory loads everything at session start (same limitation as CLAUDE.md), while vector memory retrieves only what's semantically relevant to your current task.
