Luís Monteiro

Posted on Apr 26

CLAUDE.md is not enough: why I built a local-first memory MCP for Claude Code

#ai #mcp #claude #opensource

I use Claude Code a lot.

One thing kept annoying me, and it was not the mistakes, the occasional wrong assumption, or even the weird confidence.

It was having to re-explain the same project context in every new session.

Things like:

this module looks legacy but still supports a critical flow
this query already caused performance issues
this test failed before because the hook returned formatted text, not an array
this architecture decision looks strange, but it exists for a reason
this project separates bugs from improvements in release notes
do not touch this config unless you understand the install flow

Some of that belongs in CLAUDE.md.

But not all of it.

The problem with putting everything in CLAUDE.md

CLAUDE.md is great for stable project instructions:

how to run the project
how to run tests
coding conventions
architecture guidelines
commands the agent should know
repo-specific workflows

That kind of context rarely changes and is useful for almost every task.

The problem starts when CLAUDE.md becomes the place for every pitfall, debugging note, warning, workaround, decision, and preference.

At that point, it stops being onboarding context and becomes a giant context dump.

That creates two problems.

First, every new session pays for it in tokens, even when the current task only needs one small detail.

Second, the more context you throw in, the easier it is for the important bit to get ignored.

More context is not always better context.

Sometimes it is just a bigger haystack with the same needle inside it.

The split that made more sense to me

I started thinking about project context as two different things:

CLAUDE.md = stable onboarding instructions
working memory = retrieved project-specific notes

CLAUDE.md should explain how the project works.

Working memory should remember what happened while working on the project.

That includes things like:

decisions
facts
patterns
pitfalls
architecture notes
project preferences
session summaries
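One way to make "typed" concrete is a small record with a fixed set of kinds, so a note is always a decision, a pitfall, a preference, and so on, never a free-form blob. A minimal sketch in Python: the kinds mirror the list above, but the `MemoryType`, `Memory`, `render()` and `parse_type()` names and fields are hypothetical and are not Memento MCP's actual schema.

```python
from dataclasses import dataclass
from enum import Enum


class MemoryType(Enum):
    # Hypothetical kinds, mirroring the categories listed above.
    DECISION = "decision"
    FACT = "fact"
    PATTERN = "pattern"
    PITFALL = "pitfall"
    ARCHITECTURE = "architecture"
    PREFERENCE = "preference"
    SESSION_SUMMARY = "session_summary"


@dataclass(frozen=True)
class Memory:
    # Illustrative fields only; not Memento MCP's real schema.
    type: MemoryType
    title: str
    body: str
    tags: tuple[str, ...] = ()

    def render(self) -> str:
        """Format the memory as one compact line suitable for prompt injection."""
        tag_part = f" ({', '.join(self.tags)})" if self.tags else ""
        return f"[{self.type.value}] {self.title}{tag_part}: {self.body}"


def parse_type(raw: str) -> MemoryType:
    """Reject unknown kinds instead of silently accepting free-form labels."""
    try:
        return MemoryType(raw.strip().lower())
    except ValueError:
        allowed = ", ".join(t.value for t in MemoryType)
        raise ValueError(f"unknown memory type {raw!r}; expected one of: {allowed}") from None
```

For example, `Memory(parse_type("Pitfall"), "pagination query", "caused performance issues before", ("db",)).render()` produces `[pitfall] pagination query (db): caused performance issues before`. The fixed set of kinds is what makes filtering by type (say, only pitfalls) straightforward later.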

The key difference is that working memory should not be dumped into every prompt.

It should be searched, ranked, and injected only when relevant.
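The "injected only when relevant" part can be sketched as a relevance threshold plus a size budget. A minimal sketch, assuming each candidate memory already has a relevance score where higher is better; the `select_for_injection` name, the `min_score` threshold, and the character budget (a rough stand-in for a token budget) are all hypothetical.

```python
def select_for_injection(
    candidates: list[tuple[float, str]],
    min_score: float = 0.5,
    max_chars: int = 600,
) -> str:
    """Pick the highest-scoring memories that clear a relevance threshold and
    whose lines fit inside a character budget (the header is not counted).
    Returns "" when nothing qualifies, so unrelated tasks get no memory at all."""
    selected: list[str] = []
    used = 0
    # Highest score first; sorted() is stable, so ties keep their input order.
    for score, text in sorted(candidates, key=lambda c: c[0], reverse=True):
        if score < min_score:
            break  # the list is descending, so everything after is less relevant
        line = f"- {text}"
        cost = len(line) + 1  # +1 for the newline that joins lines
        if used + cost > max_chars:
            continue  # this one does not fit, but a shorter one still might
        selected.append(line)
        used += cost
    if not selected:
        return ""
    return "Relevant project memory:\n" + "\n".join(selected)
```

Returning an empty string, rather than a header with no items, is the point of the design: when nothing clears the threshold, the prompt stays exactly as it was.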

So I built Memento MCP

Memento MCP is a local-first MCP server that gives Claude Code, and other MCP clients that use the stdio transport, persistent project memory.

The basic idea is simple:

Store useful project knowledge as typed memories.
Search and rank memories for the current task.
Inject only the relevant memory into the agent.
Avoid turning every new session into a giant repeated context paste.

Default setup:

local SQLite
FTS5 search
no mandatory cloud account
no hosted vector DB

It also supports:

optional embeddings
team memory sync through git
Obsidian vault indexing
privacy controls
local web inspector
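Because FTS5 ships inside SQLite, the default setup needs nothing beyond a local file. A minimal sketch using Python's built-in `sqlite3` module, assuming the linked SQLite was compiled with FTS5 (true for most builds); the table layout and the `open_store`, `add_memory` and `search` helpers are hypothetical, not Memento MCP's actual schema or code.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a local memory store backed by an FTS5 virtual table.
    'kind' is stored but not indexed for full-text search."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS memories "
        "USING fts5(kind UNINDEXED, title, body)"
    )
    return conn


def add_memory(conn: sqlite3.Connection, kind: str, title: str, body: str) -> None:
    with conn:  # commits the insert on success
        conn.execute(
            "INSERT INTO memories (kind, title, body) VALUES (?, ?, ?)",
            (kind, title, body),
        )


def search(conn: sqlite3.Connection, query: str, limit: int = 5) -> list[tuple[str, str, float]]:
    """Full-text search ranked with FTS5's built-in bm25() function.
    bm25() returns numerically smaller values for better matches, so sort ascending."""
    rows = conn.execute(
        "SELECT kind, title, bm25(memories) AS score "
        "FROM memories WHERE memories MATCH ? "
        "ORDER BY score LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()
```

The query string goes straight into FTS5's query syntax, where space-separated terms are combined with AND, so in real use special characters such as double quotes or colons would need escaping before they reach `MATCH`.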

Example

Instead of adding this to CLAUDE.md forever:

The scheduling module looks legacy but still supports a critical production flow.
Do not rewrite it casually.
The pagination query caused performance issues before.
The release notes must separate bugs from improvements.

Those notes can live as typed memories.

Then, when the agent is working on the scheduling module or release notes, the relevant memory is retrieved.

When the agent is working on something unrelated, that context stays out of the prompt.

That is the part I care about most: reducing repeated context without losing important project knowledge.
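Full-text search is one way to decide what counts as relevant; another signal is which files the agent is touching. A minimal sketch of path-scoped retrieval for the example notes above, assuming each memory carries hypothetical glob patterns (such as `src/scheduling/*`) describing where it applies; the paths, the `MEMORIES` list and `memories_for` are illustrative only, not how Memento MCP actually scopes memories.

```python
from fnmatch import fnmatchcase

# Hypothetical memories, each scoped to the paths it is relevant for.
MEMORIES = [
    (("src/scheduling/*",),
     "The scheduling module looks legacy but still supports a critical "
     "production flow. Do not rewrite it casually."),
    (("src/*/pagination.py", "src/db/*"),
     "The pagination query caused performance issues before."),
    (("docs/release-notes/*", "CHANGELOG.md"),
     "The release notes must separate bugs from improvements."),
]


def memories_for(touched_files: list[str]) -> list[str]:
    """Return memories whose scope matches at least one file being worked on.
    fnmatchcase() is case-sensitive on every OS, and its '*' also matches '/',
    so 'src/scheduling/*' covers nested files like src/scheduling/jobs/run.py."""
    hits = []
    for patterns, text in MEMORIES:
        if any(fnmatchcase(f, p) for f in touched_files for p in patterns):
            hits.append(text)
    return hits
```

Work on `src/scheduling/jobs/run.py` pulls in only the scheduling note, while unrelated work such as `src/auth/login.py` matches nothing, so nothing is injected.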

GitHub:

https://github.com/lfrmonteiro99/memento-mcp

Docs:

https://lfrmonteiro99.github.io/memento-mcp

What I want feedback on

I am mainly trying to validate the workflow.

The questions I care about:

Does the CLAUDE.md vs working memory split make sense?
Would you trust an MCP server to inject memory into Claude Code?
What kind of project memory would you actually want an agent to remember?
What would make this annoying, unsafe, or too noisy?
Should memory be mostly explicit/manual, or should the agent be allowed to suggest memories automatically?

I built this because I got tired of re-explaining the same project context over and over again.

Not because agents need more magic.

They mostly need better memory, fewer repeated instructions, and less context shoved into every prompt like we are packing for the apocalypse.