DEV Community

Cover image for A cheap, persistent memory that learns your repo so your agent stops re-reading it
Alexandros Stergiakis
Alexandros Stergiakis

Posted on • Originally published at github.com

A cheap, persistent memory that learns your repo so your agent stops re-reading it

Live Memory: stop re-reading your repo on every Claude Code session

Every Claude Code session re-discovers your codebase from scratch — re-reading files, re-grepping, re-paying for the same context. live-memory runs a separate, cheaper large-context model in a long-running MCP server whose only job is to accumulate knowledge of your repository over time. Your main agent asks it questions through one read-only tool, ask_live_memory, instead of reloading the repo itself.

It learns passively, for free

PostToolUse and FileChanged hooks tee the content of every file your agent reads or edits into the memory, so it learns the code as a side effect of real work — without paying to re-read it — and stays current as the repo changes: an observed edit is authoritative and applied immediately, while an out-of-band change (external editor, git checkout) is flagged stale and re-read on next use. ask_live_memory is the active fallback for anything the passive layer hasn't seen yet.

One singleton, per-workspace, append-only

The server is a singleton that serves every Claude Code session — many agents, concurrently and over time — keying state per workspace (cwd). Its window is append-only between compactions; compaction is a neutral, query-agnostic summarization into a knowledge ledger (a high/low-watermark keeps it rare and batched) — never front-truncation — so older knowledge is distilled, not dropped. A background keep-warm loop holds the KV/prompt cache hot to cut latency and cost.

Provider-pluggable and zero-config

It's provider-pluggable (Anthropic Messages with cache_control, or any OpenAI-compatible endpoint — DeepSeek, gateways) and zero-config: with no API key but a Claude subscription, it uses the subscription OAuth token (auto-refreshed) on Haiku. A two-tier timeout gives the model a budget and returns a best-effort answer before the hard MCP timeout. Everything it can touch is read-only and path-jailed. Human-facing status is a /live-memory-stats slash command, kept off the agent's tool surface.

Does it pay off?

In a claude -p A/B on a real repo, on understanding-heavy work the building (premium) model offloaded ~93% of its codebase-reading tokens to live-memory, cost ~61% less per task and finished ~22% faster — and its cost got more predictable (the without-memory arm occasionally spiraled into long re-reading loops). Honest scope: pure edit/execution work is roughly break-even, and on realistic hybrid (understand-then-edit) tasks the all-in saving is a smaller ~11% per task — this makes understanding cheaper, not typing.

The companion runs on a cheap model — and it can be very cheap

The premium-model saving above is what you keep regardless; the only cost added back is running the small memory model, and that model is pluggable. Default is Haiku, but on our accuracy set (15 Q × 3 reps) deepseek-v4-flash matched — and slightly edged — Haiku (98% vs 91% correct, fewer hallucinations, both perfect on the negative traps) at ~8× lower token price — or point it at a local model for ≈ free. The cheaper the companion, the closer the all-in cost gets to the full premium saving: −25% all-in on Haiku → −57% on deepseek-v4-flash, versus the −61% building-model number.

Install

Live Memory is an HTTP MCP server you run once, plus a plugin that wires it in — start the server first:

# zero-config on a Claude subscription → Haiku (or: cd ../server && pip install -e . && python -m live_memory)
git clone https://github.com/shofer-dev/claude-code-live-memory
cd claude-code-live-memory/deploy && ./install-service.sh
Enter fullscreen mode Exit fullscreen mode

Then, inside a Claude Code session:

/plugin marketplace add shofer-dev/claude-code-live-memory
/plugin install live-memory@shofer-live-memory
Enter fullscreen mode Exit fullscreen mode

Ask your agent a whole-repo question — it'll call ask_live_memory instead of reading files.


Standalone Claude Code plugin — Python, Apache-2.0. Source: github.com/shofer-dev/claude-code-live-memory

Top comments (0)