A cheap, persistent memory that learns your repo so your agent stops re-reading it

#claude #ai #programming #mcp

Live Memory: stop re-reading your repo on every Claude Code session

Every Claude Code session re-discovers your codebase from scratch — re-reading files, re-grepping, re-paying for the same context. live-memory runs a separate, cheaper large-context model in a long-running MCP server whose only job is to accumulate knowledge of your repository over time. Your main agent asks it questions through one read-only tool, ask_live_memory, instead of reloading the repo itself.

It learns passively, for free

PostToolUse and FileChanged hooks tee the content of every file your agent reads or edits into the memory, so it learns the code as a side effect of real work — without paying to re-read it — and stays current as the repo changes: an observed edit is authoritative and applied immediately, while an out-of-band change (external editor, git checkout) is flagged stale and re-read on next use. ask_live_memory is the active fallback for anything the passive layer hasn't seen yet.

One singleton, per-workspace, append-only

The server is a singleton that serves every Claude Code session — many agents, concurrently and over time — keying state per workspace (cwd). Its window is append-only between compactions; compaction is a neutral, query-agnostic summarization into a knowledge ledger (a high/low-watermark keeps it rare and batched) — never front-truncation — so older knowledge is distilled, not dropped. A background keep-warm loop holds the KV/prompt cache hot to cut latency and cost.

Provider-pluggable and zero-config

It's provider-pluggable (Anthropic Messages with cache_control, or any OpenAI-compatible endpoint — DeepSeek, gateways) and zero-config: with no API key but a Claude subscription, it uses the subscription OAuth token (auto-refreshed) on Haiku. A two-tier timeout gives the model a budget and returns a best-effort answer before the hard MCP timeout. Everything it can touch is read-only and path-jailed. Human-facing status is a /live-memory-stats slash command, kept off the agent's tool surface.

Does it pay off?

In a claude -p A/B on a real repo, on understanding-heavy work the building (premium) model offloaded ~93% of its codebase-reading tokens to live-memory, cost ~61% less per task and finished ~22% faster — and its cost got more predictable (the without-memory arm occasionally spiraled into long re-reading loops). Honest scope: pure edit/execution work is roughly break-even, and on realistic hybrid (understand-then-edit) tasks the all-in saving is a smaller ~11% per task — this makes understanding cheaper, not typing.

The companion runs on a cheap model — and it can be very cheap

The premium-model saving above is what you keep regardless; the only cost added back is running the small memory model, and that model is pluggable. Default is Haiku, but on our accuracy set (15 Q × 3 reps) deepseek-v4-flash matched — and slightly edged — Haiku (98% vs 91% correct, fewer hallucinations, both perfect on the negative traps) at ~8× lower token price — or point it at a local model for ≈ free. The cheaper the companion, the closer the all-in cost gets to the full premium saving: −25% all-in on Haiku → −57% on deepseek-v4-flash, versus the −61% building-model number.

Install

Live Memory is an HTTP MCP server you run once, plus a plugin that wires it in — start the server first:

# zero-config on a Claude subscription → Haiku (or: cd ../server && pip install -e . && python -m live_memory)
git clone https://github.com/shofer-dev/claude-code-live-memory
cd claude-code-live-memory/deploy && ./install-service.sh

Then, inside a Claude Code session:

/plugin marketplace add shofer-dev/claude-code-live-memory
/plugin install live-memory@shofer-live-memory

Ask your agent a whole-repo question — it'll call ask_live_memory instead of reading files.

Standalone Claude Code plugin — Python, Apache-2.0. Source: github.com/shofer-dev/claude-code-live-memory

Top comments (2)

Ponsubash Raj R • Jul 6

Interesting tool. Compaction seems to handle knowledge consolidation well, but how do you think about stale facts already incorporated into the ledger or retained from previous Live Memory answers after an out-of-band file change? Would source-linked validation like Copilot's approach help, or would that undermine the simplicity of Live Memory's free-text ledger?

Alexandros Stergiakis • Jul 6

Great question — this is exactly the seam, and it's handled in two tiers.

Raw file contexts are hash-pinned and self-healing: a watcher invalidates them on out-of-band edits/deletes, and on reload they're re-hashed against disk, so stale raw evidence can't survive.

The compacted ledger is where your concern would otherwise bite — a distilled free-text fact could keep asserting something after its source moved on. So each ledger fact carries a small sources: {path → content_hash} sidecar. When a cited file changes — in-session, or re-validated against disk on the next load — the fact is demoted under a "⚠ possibly out of date, re-verify" heading rather than silently trusted.

That's essentially the source-linked validation you're describing, and it doesn't cost the free-text simplicity: the prose stays prose, the hashes are metadata the model never trusts — they only trigger demotion. It reuses hashing already done, so it's cheap and additive: no schema, no per-sentence citation index (which would undermine the "reason over current context, not an index you trust" bet).

The one piece still open: a demoted fact today is flagged, not auto-refreshed — background re-derivation of just the stale facts is the natural next step.