Every coding agent session starts with a strange contradiction.
The model is capable. It can read code, reason through architecture, write tests, explain tradeoffs, and debug with patience. But the session itself is often forgetful.
It does not remember why we chose SQLite over Postgres for a prototype. It does not remember that the migration layer had a weird compatibility constraint. It does not remember the conversation where we decided not to expose a public API yet. It does not remember the repeated instruction I keep typing into every project: "check the existing pattern first."
So the human becomes the memory layer.
We paste context. We summarize old threads. We say, "We already tried that." We ask the agent to search the repo again. We re-explain preferences. We lose twenty minutes to rediscovery before the actual work begins.
That is the problem local agent memory is meant to solve.
Not "AI consciousness." Not magic. Just durable, queryable context that lives near your tools.
The Shape Of The Problem
Modern coding assistants are increasingly agentic. Claude Code, Cursor, Codex, and similar tools can inspect a project, call tools, edit files, run tests, and continue over multiple steps.
But most agent workflows are still bounded by a session.
A chat thread has context, but only while it is alive and only within that tool. A repository has source code, but not the decisions that led to it. A README has documentation, but not every debugging path, rejected approach, user preference, or handoff note.
The most expensive thing is not always tokens. It is human time spent reconstructing context.
Examples:
- "Where did we decide how migrations work?"
- "What did the last agent already try?"
- "Which files did we inspect when debugging this?"
- "What was the user's preference for this project?"
- "Did we already index this transcript?"
- "Which past session contains the actual reason for this architecture?"
Without memory, the answer is usually: search again, ask again, paste again.
A Local Memory MCP
The useful version of memory, for me, is not a giant prompt stuffed into every request.
It is a local service the agent can query when needed.
That is the idea behind MemPalace: a local-first memory MCP for agentic apps.
MemPalace started from an earlier Python version of the idea. This Rust version is my attempt to make the same local-memory pattern faster, easier to install, and smoother to integrate with everyday agentic coding tools.
It stores memories in a local SQLite database, embeds text locally, and exposes tools through the Model Context Protocol so clients like Claude Code, Cursor, and Codex can retrieve relevant context on demand.
The important bit is "on demand."
Instead of carrying a whole project history in every prompt, the agent can ask a narrower question:
Search memory for how this project handles database migrations.
or:
Query known facts about this project's release process.
That changes memory from "more prompt" into "local infrastructure."
What It Stores
MemPalace stores verbatim text rather than forcing everything through a summary first.
That matters because summaries are useful, but lossy. They often flatten the exact sentence that later becomes important: the exception, the caveat, the "we tried this and it failed because..."
The storage model is intentionally boring:
- One local SQLite database at `~/.mempalace/palace.db`
- Drawers for text content, embeddings, and metadata
- Entity and triple tables for knowledge graph facts
- Local embeddings using ONNX Runtime and `all-MiniLM-L6-v2`
- Local cosine similarity and hybrid retrieval
- No API call required for embedding or search
That boringness is a feature. Local memory should not need a cloud account just to remember your own project context.
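The retrieval math behind "local cosine similarity" is small enough to sketch. Here is a minimal Python illustration of the idea (MemPalace itself is Rust, and the vectors and drawer texts below are made up for the example):

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rank_drawers(query_vec, drawers):
    # drawers: list of (text, embedding) pairs; return best-first.
    scored = [(cosine(query_vec, vec), text) for text, vec in drawers]
    return sorted(scored, reverse=True)

drawers = [
    ("we chose SQLite for the prototype", [0.9, 0.1, 0.0]),
    ("release process uses tagged commits", [0.0, 0.2, 0.9]),
]
query = [0.8, 0.2, 0.1]  # pretend embedding of "why SQLite?"
best = rank_drawers(query, drawers)[0][1]
```

In the real system the embeddings come from the local ONNX model rather than hand-written vectors, but the ranking step is the same shape.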
Why MCP Matters
MCP makes the same memory usable from multiple agentic apps.
A memory layer is less useful if it only belongs to one editor or one chat product. The interesting workflow is when Claude Code, Cursor, and Codex can all consult the same local memory.
MemPalace installs an MCP server and a small rule for each supported client. The rule tells the agent when to call memory tools, for example before answering about a person, project, past decision, or user preference.
The common tools include:
- `mempalace_search` for semantic retrieval
- `mempalace_kg_query` for entity facts
- `mempalace_diary_write` for end-of-session notes
- `mempalace_gain` for local usage/value reporting
That means memory becomes part of the agent's normal tool loop, not a separate app I need to babysit.
The Value Points
1. Less Repeated Context Stuffing
A lot of agent work begins with pasting the same background again:
- project goals
- architecture constraints
- coding preferences
- known broken paths
- previous debugging attempts
- "please follow these local rules"
That context costs tokens every time. More importantly, it costs attention.
A memory MCP lets the agent retrieve the relevant piece instead of asking the user to paste the whole story.
This does not eliminate context windows. It uses them better.
2. Less Rediscovery Time
The biggest gain is often time.
If an agent spends ten minutes re-reading files to rediscover a decision that was already made last week, that cost is invisible in a token bill but very visible in a workday.
Memory helps with questions like:
mempalace search "why did we choose the current migration approach"
or through MCP:
Search memory for the previous discussion about auth token storage.
The value is not just faster answers. It is fewer repeated loops.
3. Better Cross-Session Continuity
Agents are good at local reasoning inside one session. They are worse at carrying a project's lived history across sessions.
A memory diary changes that.
At the end of substantive work, the agent can write a short durable note: what changed, what was learned, what still matters. The next agent can read those diary entries instead of starting cold.
This is especially useful when switching between tools. I may start in Cursor, continue in Codex, and later ask Claude Code to inspect something. The memory is not trapped in one conversation.
4. Project Decisions Become Searchable
Source code shows what exists. It does not always show why.
A knowledge graph can store temporal facts like:
- "Project A uses SQLite as of 2026-05-05"
- "Feature B was deferred because the public API was unstable"
- "User prefers conservative public API changes"
- "Migration strategy changed after release X"
The temporal part matters. Facts change. Memory needs invalidation, not just accumulation.
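The invalidation idea can be sketched as a tiny triple store where asserting a new value closes out the old fact instead of deleting it. This is an illustrative Python sketch with hypothetical field names, not MemPalace's actual schema:

```python
from datetime import date

class TripleStore:
    def __init__(self):
        # Each fact: (subject, predicate, object, valid_from, valid_to).
        # valid_to is None while the fact is still current.
        self.facts = []

    def assert_fact(self, subj, pred, obj, on):
        # Close out any currently-open fact for the same subject/predicate.
        for i, (s, p, o, start, end) in enumerate(self.facts):
            if s == subj and p == pred and end is None:
                self.facts[i] = (s, p, o, start, on)
        self.facts.append((subj, pred, obj, on, None))

    def current(self, subj, pred):
        for s, p, o, start, end in self.facts:
            if s == subj and p == pred and end is None:
                return o
        return None

kg = TripleStore()
kg.assert_fact("project-a", "uses-db", "SQLite", date(2026, 5, 5))
kg.assert_fact("project-a", "uses-db", "Postgres", date(2026, 8, 1))
```

The old fact stays queryable as history, but only one answer is "current", which is the property that keeps accumulated memory from contradicting itself.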
5. Local-First Privacy
For many coding projects, the memory layer should be local by default.
MemPalace keeps the database on disk and runs embeddings locally. That makes it suitable for project notes, private conversations, and internal decisions that you do not want to send to another service just to make them searchable.
Local-first does not mean "never sync anything anywhere." It means the default memory substrate is yours.
6. Duplicate Detection
A practical memory system has to avoid becoming a junk drawer.
If the same file, transcript, or note gets indexed repeatedly, memory becomes noisy and wasteful. Duplicate detection helps skip content that is already stored.
That matters for both quality and measurement: fewer repeated drawers, less re-indexing, less clutter for retrieval.
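A hash-based fingerprint is one common way to implement this kind of skip. The Python sketch below uses SHA-256 over lightly normalized text as an assumption for illustration, not necessarily the exact scheme MemPalace uses:

```python
import hashlib

class Indexer:
    def __init__(self):
        self.seen = set()     # fingerprints of already-stored content
        self.indexed = []     # the drawers actually kept

    def index(self, text):
        # Normalize lightly, then fingerprint the content.
        digest = hashlib.sha256(text.strip().encode("utf-8")).hexdigest()
        if digest in self.seen:
            return False      # duplicate drawer skipped
        self.seen.add(digest)
        self.indexed.append(text)
        return True

ix = Indexer()
first = ix.index("we tried the eager migration and it failed")
repeat = ix.index("we tried the eager migration and it failed")
```

The skip count itself is worth recording, because it feeds directly into the usage report described next.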
7. Measuring The Gain
One thing I like about this approach is that the memory layer can measure its own usefulness.
MemPalace has a gain command and MCP tool that summarize local usage:
mempalace gain
It can report things like:
- tool calls
- search hit rate
- estimated tokens saved
- duplicate drawers skipped
- knowledge graph facts recalled
- diary recalls
- repeat questions avoided
- p95 latency
- top projects or memory wings
The token number is an estimate, not a universal truth. But even an estimate is useful because it turns "this feels helpful" into something you can inspect over time.
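One way such an estimate can work is to compare the size of the retrieved snippet against the background it replaces, using a rough characters-per-token heuristic. The heuristic and the numbers below are illustrative assumptions, not MemPalace's actual formula:

```python
def estimated_tokens_saved(full_context_chars, retrieved_chars, chars_per_token=4):
    # Rough heuristic: about 4 characters per token for English prose.
    saved_chars = max(full_context_chars - retrieved_chars, 0)
    return saved_chars // chars_per_token

# Pasting a 12,000-char background doc vs. retrieving an 800-char snippet:
saved = estimated_tokens_saved(12_000, 800)
```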
A sample report might look like:
MemPalace gain - last 30d (mempalace_rs)
Tool calls : 412 (sessions: 27)
Hit rate : 88% (search hits 142/162)
Tokens saved (est) : ~78,400
Re-index skipped : 31 (duplicate drawers avoided)
KG facts recalled : 56
Diary recalls : 8
Repeat Qs avoided : 19
p95 latency : 41 ms
The time value is harder to price, but more important. If memory avoids even a few repeated 10-minute rediscovery loops per week, the gain is larger than the token number suggests.
Retrieval Quality Matters
Memory is only useful if retrieval works.
In MemPalace's benchmark on the LongMemEval s_cleaned split, retrieval was tested over conversational haystacks of roughly 50 sessions and about 115k tokens each.
The documented session-level recall numbers are:
- R@1: `0.889`
- R@5: `0.981`
- R@10: `0.991`
That means the right session appeared in the top 5 results for 461 of 470 evaluated questions.
Those numbers are not claiming that every agent answer will be correct. They measure retrieval, not final generation. But they do show that a local retriever can make a large conversation history practically searchable without an LLM reranker, answer generator, or cloud embedding service.
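Session-level recall@k is a simple metric to compute from ranked results. A Python sketch of the calculation, with made-up session IDs:

```python
def recall_at_k(results, k):
    # results: list of (ranked_session_ids, gold_session_id) per question.
    # A question counts as a hit if the gold session appears in the top k.
    hits = sum(1 for ranked, gold in results if gold in ranked[:k])
    return hits / len(results)

results = [
    (["s3", "s1", "s9"], "s3"),  # gold session at rank 1
    (["s2", "s7", "s4"], "s4"),  # gold session at rank 3
    (["s5", "s6", "s8"], "s0"),  # gold session not retrieved
]
r1 = recall_at_k(results, 1)
r3 = recall_at_k(results, 3)
```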
The weak spots are also useful to know. Preference-style questions are harder because "What is my favorite X?" may need to match a sentence like "I usually reach for Y." That is a different retrieval problem than matching explicit project facts.
This is another reason I like measurement. It shows where memory helps and where the system still needs better signals.
A Typical Workflow
The workflow is simple:
mempalace install
mempalace doctor
mempalace init ~/my-project
mempalace mine ~/my-project
Then restart the agentic app so it loads the MCP config.
From there, the agent can use memory through MCP, or you can query it directly:
mempalace search "how did we decide on the database schema"
mempalace gain
For conversation history, you can ingest exported transcripts. For project structure, you can mine files. For ongoing work, the agent can write diary entries after meaningful sessions.
The pattern is:
- Store useful project and conversation context.
- Let the agent retrieve only what is relevant.
- Write down what changed after substantial work.
- Measure whether the memory is actually helping.
What This Is Not
This is not a replacement for documentation.
If something belongs in a README, architecture decision record, or public API doc, it should still go there.
Local memory is for the layer below formal documentation: the working memory of a project. The half-decisions, preferences, prior attempts, debugging notes, and connective tissue that usually live only in chat history or someone's head.
It is also not an excuse to trust retrieved context blindly. Agents still need to inspect current files, run tests, and verify assumptions. Memory should guide attention, not replace evidence.
The Real Gain
The obvious value is token savings.
Retrieving a few relevant drawers is cheaper than pasting thousands of tokens of background into every session. A local gain report can estimate that.
But the deeper value is time.
Time not spent re-explaining.
Time not spent searching old threads.
Time not spent watching an agent rediscover the same failed path.
Time not spent translating project history from human memory back into machine context.
That is the part that compounds.
A local memory MCP does not make an agent magically wise. It makes the working relationship less stateless.
And for agentic coding tools, that may be one of the most practical upgrades available: not a smarter model, but a better memory of what already happened.