Live Memory: stop re-reading your repo on every Claude Code session
Every Claude Code session re-discovers your codebase from scratch — re-reading files, re-grepping, re-paying for the same context. live-memory runs a separate, cheaper large-context model in a long-running MCP server whose only job is to accumulate knowledge of your repository over time. Your main agent asks it questions through one read-only tool, ask_live_memory, instead of reloading the repo itself.
It learns passively, for free
PostToolUse and FileChanged hooks tee the content of every file your agent reads or edits into the memory, so it learns the code as a side effect of real work — without paying to re-read it — and stays current as the repo changes: an observed edit is authoritative and applied immediately, while an out-of-band change (external editor, git checkout) is flagged stale and re-read on next use. ask_live_memory is the active fallback for anything the passive layer hasn't seen yet.
One singleton, per-workspace, append-only
The server is a singleton that serves every Claude Code session — many agents, concurrently and over time — keying state per workspace (cwd). Its window is append-only between compactions; compaction is a neutral, query-agnostic summarization into a knowledge ledger (a high/low-watermark keeps it rare and batched) — never front-truncation — so older knowledge is distilled, not dropped. A background keep-warm loop holds the KV/prompt cache hot to cut latency and cost.
Provider-pluggable and zero-config
It's provider-pluggable (Anthropic Messages with cache_control, or any OpenAI-compatible endpoint — DeepSeek, gateways) and zero-config: with no API key but a Claude subscription, it uses the subscription OAuth token (auto-refreshed) on Haiku. A two-tier timeout gives the model a budget and returns a best-effort answer before the hard MCP timeout. Everything it can touch is read-only and path-jailed. Human-facing status is a /live-memory-stats slash command, kept off the agent's tool surface.
Does it pay off?
In a claude -p A/B on a real repo, on understanding-heavy work the building (premium) model offloaded ~93% of its codebase-reading tokens to live-memory, cost ~61% less per task and finished ~22% faster — and its cost got more predictable (the without-memory arm occasionally spiraled into long re-reading loops). Honest scope: pure edit/execution work is roughly break-even, and on realistic hybrid (understand-then-edit) tasks the all-in saving is a smaller ~11% per task — this makes understanding cheaper, not typing.
The companion runs on a cheap model — and it can be very cheap
The premium-model saving above is what you keep regardless; the only cost added back is running the small memory model, and that model is pluggable. Default is Haiku, but on our accuracy set (15 Q × 3 reps) deepseek-v4-flash matched — and slightly edged — Haiku (98% vs 91% correct, fewer hallucinations, both perfect on the negative traps) at ~8× lower token price — or point it at a local model for ≈ free. The cheaper the companion, the closer the all-in cost gets to the full premium saving: −25% all-in on Haiku → −57% on deepseek-v4-flash, versus the −61% building-model number.
Install
Live Memory is an HTTP MCP server you run once, plus a plugin that wires it in — start the server first:
# zero-config on a Claude subscription → Haiku (or: cd ../server && pip install -e . && python -m live_memory)
git clone https://github.com/shofer-dev/claude-code-live-memory
cd claude-code-live-memory/deploy && ./install-service.sh
Then, inside a Claude Code session:
/plugin marketplace add shofer-dev/claude-code-live-memory
/plugin install live-memory@shofer-live-memory
Ask your agent a whole-repo question — it'll call ask_live_memory instead of reading files.
Standalone Claude Code plugin — Python, Apache-2.0. Source: github.com/shofer-dev/claude-code-live-memory
Top comments (0)