<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Alexandros Stergiakis</title>
    <description>The latest articles on DEV Community by Alexandros Stergiakis (@alsterg).</description>
    <link>https://dev.to/alsterg</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4016383%2F7769a916-b097-4890-89b5-390da834d22c.png</url>
      <title>DEV Community: Alexandros Stergiakis</title>
      <link>https://dev.to/alsterg</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/alsterg"/>
    <language>en</language>
    <item>
      <title>A cheap, persistent memory that learns your repo so your agent stops re-reading it</title>
      <dc:creator>Alexandros Stergiakis</dc:creator>
      <pubDate>Sun, 05 Jul 2026 15:54:43 +0000</pubDate>
      <link>https://dev.to/alsterg/a-cheap-persistent-memory-that-learns-your-repo-so-your-agent-stops-re-reading-it-cah</link>
      <guid>https://dev.to/alsterg/a-cheap-persistent-memory-that-learns-your-repo-so-your-agent-stops-re-reading-it-cah</guid>
      <description>&lt;h1&gt;
  
  
  Live Memory: stop re-reading your repo on every Claude Code session
&lt;/h1&gt;

&lt;p&gt;Every Claude Code session re-discovers your codebase from scratch — re-reading files, re-grepping, re-paying for the same context. &lt;strong&gt;live-memory&lt;/strong&gt; runs a &lt;em&gt;separate, cheaper&lt;/em&gt; large-context model in a long-running MCP server whose only job is to &lt;strong&gt;accumulate knowledge of your repository over time&lt;/strong&gt;. Your main agent asks it questions through one read-only tool, &lt;code&gt;ask_live_memory&lt;/code&gt;, instead of reloading the repo itself.&lt;/p&gt;

&lt;h2&gt;
  
  
  It learns passively, for free
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;PostToolUse&lt;/code&gt; and &lt;code&gt;FileChanged&lt;/code&gt; hooks &lt;strong&gt;tee the content&lt;/strong&gt; of every file your agent reads or edits into the memory, so it learns the code as a side effect of real work — without paying to re-read it — and stays current as the repo changes: an observed edit is authoritative and applied immediately, while an out-of-band change (external editor, &lt;code&gt;git checkout&lt;/code&gt;) is flagged stale and re-read on next use. &lt;code&gt;ask_live_memory&lt;/code&gt; is the active fallback for anything the passive layer hasn't seen yet.&lt;/p&gt;

&lt;h2&gt;
  
  
  One singleton, per-workspace, append-only
&lt;/h2&gt;

&lt;p&gt;The server is a &lt;strong&gt;singleton&lt;/strong&gt; that serves every Claude Code session — many agents, concurrently and over time — keying state &lt;strong&gt;per workspace&lt;/strong&gt; (&lt;code&gt;cwd&lt;/code&gt;). Its window is &lt;strong&gt;append-only between compactions&lt;/strong&gt;; compaction is a &lt;em&gt;neutral, query-agnostic&lt;/em&gt; summarization into a knowledge ledger (a high/low-watermark keeps it rare and batched) — never front-truncation — so older knowledge is distilled, not dropped. A background keep-warm loop holds the KV/prompt cache hot to cut latency and cost.&lt;/p&gt;

&lt;h2&gt;
  
  
  Provider-pluggable and zero-config
&lt;/h2&gt;

&lt;p&gt;It's &lt;strong&gt;provider-pluggable&lt;/strong&gt; (Anthropic Messages with &lt;code&gt;cache_control&lt;/code&gt;, or any OpenAI-compatible endpoint — DeepSeek, gateways) and &lt;strong&gt;zero-config&lt;/strong&gt;: with no API key but a Claude subscription, it uses the subscription OAuth token (auto-refreshed) on Haiku. A two-tier timeout gives the model a budget and returns a best-effort answer before the hard MCP timeout. Everything it can touch is &lt;strong&gt;read-only and path-jailed&lt;/strong&gt;. Human-facing status is a &lt;code&gt;/live-memory-stats&lt;/code&gt; slash command, kept off the agent's tool surface.&lt;/p&gt;

&lt;h2&gt;
  
  
  Does it pay off?
&lt;/h2&gt;

&lt;p&gt;In a &lt;code&gt;claude -p&lt;/code&gt; A/B on a real repo, on understanding-heavy work the building (premium) model offloaded &lt;strong&gt;~93%&lt;/strong&gt; of its codebase-reading tokens to live-memory, cost &lt;strong&gt;~61% less per task&lt;/strong&gt; and finished &lt;strong&gt;~22% faster&lt;/strong&gt; — and its cost got &lt;em&gt;more predictable&lt;/em&gt; (the without-memory arm occasionally spiraled into long re-reading loops). Honest scope: pure edit/execution work is roughly break-even, and on realistic hybrid (understand-then-edit) tasks the all-in saving is a smaller &lt;strong&gt;~11% per task&lt;/strong&gt; — this makes &lt;em&gt;understanding&lt;/em&gt; cheaper, not &lt;em&gt;typing&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The companion runs on a cheap model — and it can be &lt;em&gt;very&lt;/em&gt; cheap
&lt;/h2&gt;

&lt;p&gt;The premium-model saving above is what you keep regardless; the only cost added back is running the small memory model, and that model is pluggable. Default is Haiku, but on our accuracy set (15 Q × 3 reps) &lt;strong&gt;deepseek-v4-flash matched — and slightly edged — Haiku&lt;/strong&gt; (98% vs 91% correct, fewer hallucinations, both perfect on the negative traps) at &lt;strong&gt;~8× lower token price&lt;/strong&gt; — or point it at a &lt;strong&gt;local model&lt;/strong&gt; for ≈ free. The cheaper the companion, the closer the all-in cost gets to the full premium saving: &lt;strong&gt;−25% all-in on Haiku → −57% on deepseek-v4-flash&lt;/strong&gt;, versus the &lt;strong&gt;−61%&lt;/strong&gt; building-model number.&lt;/p&gt;

&lt;h2&gt;
  
  
  Install
&lt;/h2&gt;

&lt;p&gt;Live Memory is an HTTP MCP server you run once, plus a plugin that wires it in — &lt;strong&gt;start the server first&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# zero-config on a Claude subscription → Haiku (or: cd ../server &amp;amp;&amp;amp; pip install -e . &amp;amp;&amp;amp; python -m live_memory)&lt;/span&gt;
git clone https://github.com/shofer-dev/claude-code-live-memory
&lt;span class="nb"&gt;cd &lt;/span&gt;claude-code-live-memory/deploy &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; ./install-service.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, inside a Claude Code session:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/plugin marketplace add shofer-dev/claude-code-live-memory
/plugin install live-memory@shofer-live-memory
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Ask your agent a whole-repo question — it'll call &lt;code&gt;ask_live_memory&lt;/code&gt; instead of reading files.&lt;/p&gt;




&lt;p&gt;Standalone Claude Code plugin — Python, Apache-2.0. Source: &lt;a href="https://github.com/shofer-dev/claude-code-live-memory" rel="noopener noreferrer"&gt;github.com/shofer-dev/claude-code-live-memory&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>claude</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
