Your agent remembers. that's the problem

#ai #mcp #productivity #buildinpublic

We spent a year teaching coding agents to remember things. Remembering turned out to be the easy part. The hard part is knowing which of those memories are still true.

Here's the failure that finally got me.

A guess, promoted to a fact

I told Cursor my project had switched from npm to bun. Later, over in Claude Code, the agent confidently ran npm install and spent a while debugging a build that was never going to work. The "we use npm" belief had outlived the truth — and nothing flagged it, because to the agent it looked exactly like every other fact in memory.

Same shape as the classic one: "we use Jest" survives a migration to Vitest, and three sessions later an agent is cheerfully writing Jest tests for a repo that doesn't have Jest anymore.

Neither of these is a forgetting problem. The agent remembered fine. It remembered something that used to be true and treated it as if it still was.

Sync isn't the fix

Most "AI memory" tools are really sync tools: capture what happened in one tool, make it available in the next. That's genuinely useful — but it quietly makes the above worse, because now your stale guess travels with you.

And the dangerous memories aren't the obscure ones. They're the ones referenced constantly — your stack, your conventions, your "always do X." High-frequency reads like high-confidence, so those get re-injected into every new session first. If one of them is wrong, importance scoring will lovingly protect it.

So the question I got stuck on stopped being what should the agent remember. It became what has the agent actually earned the right to call true.

Treat memory as a trust problem

The model I landed on: nothing an agent writes is trusted by default.

Every fact an agent records lands as unverified — no matter how confident it sounded writing it. It only graduates to "confirmed" when something outside the agent says so:

I sign off on it (a human decision), or
a hard signal does — a passing test, a command result, or a repo anchor (a dependency, a file that actually exists).

The part that matters most: when an anchor breaks — the dependency is gone, the file moved — the fact doesn't sit on a staleness timer waiting to expire. It drops itself back to a guess immediately. The truth went away, so the trust goes with it.

Which means the agent never grades its own homework. A confident wrong inference and a correct one start life identical — both unverified — and only one of them ever earns "fact" status. You don't have to catch the lie. You just never hand it the badge.

The unglamorous parts (where the time actually went)

If you build one of these, the interesting design ideas are about 10% of the work. The rest is the boring stuff that decides whether you can trust the thing at all:

Atomic writes. What bit me wasn't a lost write — it was a half-written file after a crash mid-flush, which reads as valid until it suddenly doesn't. Write to a temp file, rename over the target, keep a last-known-good copy so a corrupt write falls back instead of taking the whole store down.
Redaction before persistence. Memory is a fantastic place to accidentally store a secret forever. Anything fetched, logged, or pasted gets sanitized before it lands.
Provenance on every record. Source tool, when it was written, how it earned trust. Cleanup and trust decisions are basically impossible later without it.

None of it is glamorous. All of it is the difference between a memory layer you can lean on and a JSON file that occasionally lies to you.

Being fair about it

Trust isn't the only axis. Importance/recency pruning — keep what matters, drop the noise — is a real, useful idea, and a lot of good memory tools lead with it. I just think it answers a different question (what's worth keeping) than the one that kept burning me (what's actually true). They're complementary; I'd want both.

Building this in the open

I packed this into piia-engram — a local-first, content-blind memory layer for MCP clients (Claude Code, Cursor, Codex). Open source, pip install piia-engram. It's local-first mostly because the cleanest answer to "is my memory private?" is "nothing left your machine in the first place."

But the tool is honestly secondary to the idea. If you're building or using cross-tool memory, I'd genuinely like to know: how do you decide what your agent is allowed to trust? Importance? Recency? Provenance? Something else entirely?

Tell me where this breaks.

Building in public. Local-first. An AI's guess isn't a fact until something proves it.