<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Patdolitse</title>
    <description>The latest articles on DEV Community by Patdolitse (@patdolitse).</description>
    <link>https://dev.to/patdolitse</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3991857%2F7441daea-7e33-4994-b09f-045d265c2763.png</url>
      <title>DEV Community: Patdolitse</title>
      <link>https://dev.to/patdolitse</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/patdolitse"/>
    <language>en</language>
    <item>
      <title>Your agent remembers. that's the problem</title>
      <dc:creator>Patdolitse</dc:creator>
      <pubDate>Fri, 19 Jun 2026 05:45:24 +0000</pubDate>
      <link>https://dev.to/patdolitse/your-agent-remembers-thats-the-problem-27ci</link>
      <guid>https://dev.to/patdolitse/your-agent-remembers-thats-the-problem-27ci</guid>
      <description>&lt;p&gt;We spent a year teaching coding agents to remember things. Remembering turned out to be the easy part. The hard part is knowing which of those memories are still &lt;em&gt;true&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Here's the failure that finally got me.&lt;/p&gt;

&lt;h2&gt;
  
  
  A guess, promoted to a fact
&lt;/h2&gt;

&lt;p&gt;I told Cursor my project had switched from npm to &lt;strong&gt;bun&lt;/strong&gt;. Later, over in Claude Code, the agent confidently ran &lt;code&gt;npm install&lt;/code&gt; and spent a while debugging a build that was never going to work. The "we use npm" belief had outlived the truth — and nothing flagged it, because to the agent it looked exactly like every other fact in memory.&lt;/p&gt;

&lt;p&gt;Same shape as the classic one: "we use Jest" survives a migration to Vitest, and three sessions later an agent is cheerfully writing Jest tests for a repo that doesn't have Jest anymore.&lt;/p&gt;

&lt;p&gt;Neither of these is a &lt;em&gt;forgetting&lt;/em&gt; problem. The agent remembered fine. It remembered something that used to be true and treated it as if it still was.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sync isn't the fix
&lt;/h2&gt;

&lt;p&gt;Most "AI memory" tools are really sync tools: capture what happened in one tool, make it available in the next. That's genuinely useful — but it quietly makes the above &lt;em&gt;worse&lt;/em&gt;, because now your stale guess travels with you.&lt;/p&gt;

&lt;p&gt;And the dangerous memories aren't the obscure ones. They're the ones referenced constantly — your stack, your conventions, your "always do X." High-frequency reads like high-confidence, so those get re-injected into every new session first. If one of them is wrong, importance scoring will lovingly protect it.&lt;/p&gt;

&lt;p&gt;So the question I got stuck on stopped being &lt;em&gt;what should the agent remember&lt;/em&gt;. It became &lt;em&gt;what has the agent actually earned the right to call true&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Treat memory as a trust problem
&lt;/h2&gt;

&lt;p&gt;The model I landed on: nothing an agent writes is trusted by default.&lt;/p&gt;

&lt;p&gt;Every fact an agent records lands as &lt;strong&gt;unverified&lt;/strong&gt; — no matter how confident it sounded writing it. It only graduates to "confirmed" when something &lt;em&gt;outside the agent&lt;/em&gt; says so:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;I sign off on it&lt;/strong&gt; (a human decision), or&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;a hard signal does&lt;/strong&gt; — a passing test, a command result, or a repo anchor (a dependency, a file that actually exists).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The part that matters most: when an anchor breaks — the dependency is gone, the file moved — the fact doesn't sit on a staleness timer waiting to expire. It &lt;strong&gt;drops itself back to a guess&lt;/strong&gt; immediately. The truth went away, so the trust goes with it.&lt;/p&gt;

&lt;p&gt;Which means the agent never grades its own homework. A confident wrong inference and a correct one start life identical — both unverified — and only one of them ever earns "fact" status. You don't have to catch the lie. You just never hand it the badge.&lt;/p&gt;

&lt;h2&gt;
  
  
  The unglamorous parts (where the time actually went)
&lt;/h2&gt;

&lt;p&gt;If you build one of these, the interesting design ideas are about 10% of the work. The rest is the boring stuff that decides whether you can trust the thing at all:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Atomic writes.&lt;/strong&gt; What bit me wasn't a lost write — it was a half-written file after a crash mid-flush, which reads as valid until it suddenly doesn't. Write to a temp file, rename over the target, keep a last-known-good copy so a corrupt write falls back instead of taking the whole store down.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Redaction before persistence.&lt;/strong&gt; Memory is a fantastic place to accidentally store a secret forever. Anything fetched, logged, or pasted gets sanitized before it lands.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Provenance on every record.&lt;/strong&gt; Source tool, when it was written, how it earned trust. Cleanup and trust decisions are basically impossible later without it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of it is glamorous. All of it is the difference between a memory layer you can lean on and a JSON file that occasionally lies to you.&lt;/p&gt;

&lt;h2&gt;
  
  
  Being fair about it
&lt;/h2&gt;

&lt;p&gt;Trust isn't the only axis. Importance/recency pruning — keep what matters, drop the noise — is a real, useful idea, and a lot of good memory tools lead with it. I just think it answers a different question (&lt;em&gt;what's worth keeping&lt;/em&gt;) than the one that kept burning me (&lt;em&gt;what's actually true&lt;/em&gt;). They're complementary; I'd want both.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building this in the open
&lt;/h2&gt;

&lt;p&gt;I packed this into &lt;a href="https://github.com/Patdolitse/piia-engram" rel="noopener noreferrer"&gt;piia-engram&lt;/a&gt; — a local-first, content-blind memory layer for MCP clients (Claude Code, Cursor, Codex). Open source, &lt;code&gt;pip install piia-engram&lt;/code&gt;. It's local-first mostly because the cleanest answer to "is my memory private?" is "nothing left your machine in the first place."&lt;/p&gt;

&lt;p&gt;But the tool is honestly secondary to the idea. If you're building or using cross-tool memory, I'd genuinely like to know: &lt;strong&gt;how do you decide what your agent is allowed to trust?&lt;/strong&gt; Importance? Recency? Provenance? Something else entirely?&lt;/p&gt;

&lt;p&gt;Tell me where this breaks.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Building in public. Local-first. An AI's guess isn't a fact until something proves it.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>productivity</category>
      <category>buildinpublic</category>
    </item>
  </channel>
</rss>
