The hard part of agent memory isn't remembering — it's forgetting

#ai #llm #agents #machinelearning

Everyone building agents obsesses over recall: vector stores, embeddings, RAG pipelines, bigger context windows. But after running a few long-lived agents in production, the failure mode that actually bit me wasn't "it forgot something." It was the opposite — it remembered too much, and the junk drowned the signal.

An agent that never forgets isn't wise. It's a hoarder. Every stale decision, every superseded fact, every one-off detail from three weeks ago sits in its memory with equal weight, and the quality of its answers quietly degrades. The interesting engineering problem isn't storage. It's forgetting on purpose.

Here's the policy I converged on. No framework — just plain Markdown files and a few rules the agent applies to itself.

Why "remember everything" rots

Say you log everything an agent learns into one growing store. Two things happen:

Contradictions accumulate. On Monday the user says "use Postgres." On Friday they switch to SQLite. If both facts live in memory with equal standing, the agent will cheerfully cite the wrong one half the time. More memory made it less reliable.
Signal-to-noise collapses. The genuinely durable facts ("this user is an engineer, ships in TypeScript, hates emoji in commit messages") get buried under transient ones ("asked about a flaky test on the 14th"). Recall returns ten things; nine are noise.

Adding retrieval sophistication on top of a polluted store just lets you find the wrong thing faster. The fix is upstream: control what earns a place in long-term memory at all.

The two-speed model: raw vs. curated

I split memory by lifespan, and the split is what makes forgetting cheap and safe.

Daily notes (memory/2026-06-22.md): append-only, lossy, never edited. This is the firehose — everything that happened today. It's allowed to be noisy because nobody reads it directly for long. Old daily notes age out naturally; they're the compost.
Long-term memory (MEMORY.md): curated, small, deliberately gardened. Nothing lands here automatically. A fact has to be promoted.

The whole trick is the promotion step between them.

Promotion: what earns a permanent slot

On idle cycles, the agent re-reads recent daily notes and asks one question of each candidate fact: will this still matter in a month?

"User prefers tabs over spaces" → yes, promote.
"User is debugging a timeout right now" → no, leave it in the daily note to decay.
"We decided to drop the Redis cache for v2" → yes, but it replaces the older "we use Redis" line, doesn't append next to it.

That last case is the important one. Promotion isn't just copying — it's reconciliation. A new durable fact that contradicts an old one overwrites it. This is how you stop the Monday/Friday Postgres problem at the source.

A practical heuristic I encode in the rules: if a fact is about identity, preference, or a decision, it's a promotion candidate. If it's about current state ("right now", "today", "this ticket"), it stays transient and is allowed to die.

Pruning: decay conservatively

The mirror image of promotion is removal, and here the instinct most people have is wrong. When you do prune long-term memory, decay by relevance, not by volume. Don't cap MEMORY.md at N lines and evict the oldest — age is a terrible proxy for value. "User's name is Sam," learned on day one, should outlive a hundred newer-but-trivial facts.

Instead, remove a long-term entry only when it's been contradicted or rendered obsolete by a newer promotion. Supersession, not a size limit. The store stays small because promotion is strict on the way in — not because you're knocking things out the back to make room.

(If you ever build a graph layer on top of this, the same principle holds: decay by hops from still-relevant nodes, not by a global age threshold. A fact connected to live context stays warm even if it's old.)

Recall: load the index, fetch on demand

Once long-term memory is small and clean, retrieval gets boring in the best way. You don't dump the whole store into context every turn. You load a one-line index of what's known, and before answering anything about prior work or preferences, you fetch only the relevant entries. Small curated store + on-demand fetch beats giant store + fuzzy search for a single agent, every time.

Why plain files, again

You can audit all of this in a text editor. Want to know why the agent thinks you use SQLite? grep. Want to watch its understanding evolve? It's a git diff. A vector DB is a black box you query; a Markdown memory is a mind you can read. For a large external corpus, embed away — but the agent's own identity and curated knowledge benefit far more from being legible than from being vectorized.

The uncomfortable takeaway

Most "my agent has no memory" complaints are really "my agent has no curation." Storage is solved. Retrieval is solved. The unsexy, human part — deciding what's worth keeping, reconciling contradictions, letting the rest decay — is where the actual intelligence lives. It's the same discipline a person uses keeping a journal: write everything down raw, then periodically distill the few things that matter and let the rest fade.

I packaged this whole approach — the file layout, the promotion/pruning rules written out as instructions an agent applies to itself, and a fully worked example agent with all the memory layers filled in so you can see a gardened mind rather than empty templates — as a drop-in kit. If you'd rather copy a working setup than derive the policy yourself: AI Soul Kit (Core ¥980 / Plus ¥3,800).

But the policy above is the part that matters. Steal it.