michielinksee

Posted on May 29

I Built an MCP Server to Stop Claude Code from Forgetting Everything Between Sessions

#claude #mcp #ai #showdev

TL;DR

Problem: Claude Code starts every session cold. It re-suggests libraries I already rejected, asks "should we use X?" about decisions made weeks ago, and re-reads unchanged files from scratch (paying full tokens each time).
Solution: I built linksee-memory — an MCP server with 6 structured memory layers and chunk-level file diff caching. Local SQLite, MIT, no cloud. Works across Claude Code / Cursor / Codex / Gemini CLI from the same database.
The feature that actually changed my workflow: the caveat layer — forget-protected entries for "never do this again". Cut repeat-suggestions of bad patterns to zero.
Install: npm install -g linksee-memory

The problem: my agent kept making the same mistake

Three Mondays in a row last month, I explained the same deploy failure to Claude Code. The root cause was identical each time: our production Cloudflare Workers had a memory-cache incoherence issue between distributed instances. Each session, I walked through the same investigation — same files, same logs, same 30-minute trail to the same conclusion.

Claude Code doesn't remember previous sessions. So every Monday morning, I was re-discovering the same bug with my agent from scratch.

I tried the available solutions; none fit:

Tool	What's good	Where it fell short for me
`CLAUDE.md`	Zero setup, official	Flat structure, model ignores parts of it
Mem0	Hosted, easy install	Cloud-only, no "pain memory" concept
Letta	Built into an agent framework	Can't share memory across MCP clients
Zep	Graph-based, strong relationships	Single-client; I also use Cursor and Codex

All good products optimizing for different things. But the one feature I wanted most — a guarantee that I'll never repeat the same mistake — wasn't in any of them.

So I built linksee-memory.

The design: 6 layers, with `caveat` as the hero

linksee-memory organizes memory into 6 explicit layers:

+-- goal           Why are we doing this?
+-- context        Current situation & constraints
+-- emotion        User's mood, tone of the relationship
+-- implementation How we built it (success or failure)
+-- caveat         Pain lessons (forget-protected)
+-- learning       Growth log & realizations

The key design principle is WHY-first. Other tools store facts ("we use PostgreSQL"). linksee-memory separates the WHY ("we chose PostgreSQL because the workload is OLTP with strict consistency needs") from the WHAT ("connection pool: 20, timeout: 30s").

The caveat layer is special. Entries there are permanently protected from auto-forgetting — even when old memories decay and get consolidated, caveats stay forever. This is how I enforce "never make this mistake again" as a structural property, not prompt discipline.

3 tools, not 8

Early versions had 8 separate tools (remember, update_memory, forget, recall, recall_file, list_entities, consolidate, read_smart). This worked fine in Claude Code where I could teach tool selection via SKILL.md.

But Cursor couldn't tell recall from recall_file. Codex mixed up remember and update_memory. Too many tools = LLM tool selection breaks down.

v0.7 unified everything into 3 tools:

Tool	What it does
`remember`	Create, update, or delete a memory. Mode auto-detected from params.
`recall`	Search memories, get file history, or list entities. Mode auto-detected from params.
`read_smart`	Read a file with chunk-level diff caching.

// Create
remember({ entity_name: "MyProject", layer: "caveat", content: "..." })

// Update (was: update_memory)
remember({ memory_id: 42, content: "updated content" })

// Delete (was: forget)
remember({ forget: true, memory_id: 42 })

// Search (was: recall)
recall({ query: "RLS policy", layer: "caveat" })

// File history (was: recall_file)
recall({ path: "server.ts" })

// Entity overview (was: list_entities)
recall({})

One tool name per intent. Every LLM client handles it correctly now.

The part that surprised me: `read_smart()`

Structured memory is half the tool. The other half is file-diff caching.

Think about what your agent does at the start of every session:

Re-reads package.json
Re-reads your main entry file
Re-reads the config
Re-reads the tests

Each file is maybe 500-2000 lines. Each session, you pay full tokens to re-read all of them. But in most sessions, most of these files haven't changed.

read_smart() fixes this with chunk-level caching:

// First read: full file, chunked and cached
const r1 = read_smart({ path: "src/http-server.ts" });
// -> full content, ~3400 tokens

// Same session, no change: cache hit
const r2 = read_smart({ path: "src/http-server.ts" });
// -> { status: "unchanged", tokens_used: ~50 }

// After editing 2 functions: only changed chunks returned
const r3 = read_smart({ path: "src/http-server.ts" });
// -> only the 2 changed function chunks, ~340 tokens

Chunk boundaries are language-aware:

Code (TS / JS / Python / etc.): AST-based, one chunk per function or class
Markdown: one chunk per h2 / h3 section
JSON / YAML: one chunk per top-level key

Cache keys are sha256(chunk_content). In practice, I see ~86% token reduction on file re-reads.

Concrete example: the caveat that stopped Claude from repeating

Here's a real example of caveat working.

A few weeks ago, for a one-off data migration task, Claude suggested "let's set up a cron job". One-off tasks shouldn't use cron (auth rotation overhead, monitoring cost, retry logic that doesn't match one-off semantics).

I stored one caveat entry:

Don't propose cron for one-off tasks. Alternatives: GitHub Actions workflow_dispatch, or a manual script with a completion notification.

In the 4 sessions before I added that caveat, Claude suggested cron 4 times for one-off tasks.

In the 3 weeks since? Zero.

And because all my LLM clients share the same SQLite file, the caveat also works in Cursor, Codex, and Gemini CLI — not just Claude Code.

Install (2 minutes)

Claude Code

npm install -g linksee-memory
claude mcp add -s user linksee -- npx -y linksee-memory

Cursor

Settings -> Features -> "Model Context Protocol" -> Edit:

"linksee": {
  "command": "npx",
  "args": ["-y", "linksee-memory"]
}

OpenAI Codex

codex mcp add linksee-memory -- npx -y linksee-memory

Gemini CLI

~/.gemini/settings.json:

{
  "mcpServers": {
    "linksee": {
      "command": "npx",
      "args": ["-y", "linksee-memory"]
    }
  }
}

Everything runs locally. SQLite file at ~/.linksee-memory/memory.db. No cloud, no API key, no telemetry. MIT licensed.

Because memory lives in a single SQLite file, it's shared across all MCP clients on your machine. My Claude Code sessions and my Cursor sessions see the same memory.

I'd genuinely love feedback

I've been using this daily for 3+ months, and my results are biased by the fact that I designed it for my own workflow. I'd like to know:

Does the caveat layer actually prevent your agent from repeating mistakes, or am I pattern-matching on one dataset (me)?
How does the chunk cache behave on your codebase? Monorepos, generated code, notebooks — I'd love bug reports.
Is 6 layers too many? Too few? Are there memory types you want that none of these cover?

Open an issue on GitHub, or ping me on X at @ELLECraftsinga1. I respond to everything.

GitHub: https://github.com/michielinksee/linksee-memory
Docs: https://docs.linksee.app

Top comments (2)

Harjot Singh • May 31

The caveat layer is the part that jumped out, because "re-suggests libraries I already rejected" is the most maddening cold-start failure, the agent isn't just forgetting facts, it's forgetting decisions, and re-litigating a settled call wastes more trust than tokens. Forget-protected "never do this again" entries are the right primitive: most memory systems optimize for recall but the higher-value signal is the negative constraint, the thing that should never resurface. I run almost exactly this pattern, a durable store of corrections framed as why-plus-how-to-apply, and it's the single biggest quality lever for keeping an agent on-rails across sessions. The chunk-level diff caching is the other quietly huge win, re-reading unchanged files at full token cost every session is pure waste. The hard part I'd guess you hit: memory hygiene, stale or contradictory entries poisoning future context, since a confidently-recalled wrong memory is worse than no memory. How do you keep the 6 layers from drifting, manual curation, or some staleness/confidence decay? This is the exact problem space I care about with Moonshift's cross-run project memory.

michielinksee • Jun 16 • Edited

Hi Harjot,
Thanks for your message, and sorry for the delay.
You zeroed in on exactly the thing — an agent re-litigating a settled call burns trust, not just tokens. Recall optimizes for "what happened"; the negative constraint ("never do this again") is the higher-value, lower-frequency signal most systems under-weight. Sounds like we landed on the same primitive independently.

On hygiene — you're right it's the hard part. The bulk of memory rides a forgetting curve, so stale low-value entries decay on their own; the caveats are the exception — forget-protected and importance-pinned, so the negative constraints never fade out. And they're declare-don't-mine: the high-stakes entries are explicitly declared, never silently inferred, so the agent can't mint a confident-but-wrong constraint in the first place — which is where most "confidently wrong" memories come from. Consolidation compresses the cold stuff so the layers don't bloat.

The part I'd point you to: a contradiction surfaces as a question, not silent poison. When reality (the actual code/files) diverges from a recorded decision, it shows up as drift — "these two disagree; which is right now?" — and you resolve it (fix / supersede / false alarm). A drift is a question, not a verdict.

Honest frontier: memory-vs-memory contradiction — two entries that disagree, both recalled confidently — is still the unsolved part; the checks are lexical by design (I want file:line evidence, not "the model thinks they differ").

Curious how Moonshift handles it across runs — decay, explicit supersession, or something semantic? Feels like we're circling the same problem from different sides — would love to compare notes.