For about six months I believed my agent's memory was working.
It remembered things across sessions. It pulled up the right context when I came back to a project. It corrected itself when something changed. Every visible sign said the system I built was doing its job.
It was not doing its job. Claude Code ships its own built-in memory, and that was the thing actually answering. Mine was running too, writing to its own store, looking busy, but it was the understudy. The native one had the lead the whole time and I never noticed I had given it away. For months I was reading my own system's success off a stage where a different actor was speaking the lines.
Nothing looked wrong. The agent gave good answers. That is exactly the problem.
Silent success is the dangerous kind
A system that fails loudly is the easy case. You see the gap, you fix it.
A system that is quietly shadowed is the dangerous one, because a shadow produces helpful, plausible output, so it looks identical to success. You cannot tell "my system works" apart from "something else is working on my system's behalf" by looking at the output, because the output is the same in both cases. That is the trap, and a good answer is not the way out of it.
The only way out is a forcing function. You turn the other thing off and see what happens.
The test
It works on any agent memory setup, not just mine, and it takes about a minute. Turn off the runtime's native memory. In Claude Code that is one line:
CLAUDE_CODE_DISABLE_AUTO_MEMORY=1
Then use your agent the way you normally do. Ask it to remember something. Come back in a new session and ask for it. Watch what your system actually does once the understudy is sent home.
- If your memory still works, good. It was always the one doing the work.
- If it suddenly goes blank, the native store was carrying you, and every demo you have given was the shadow, not your system.
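If you would rather script the check than run it by hand, here is a minimal sketch. The only thing taken from the runtime is the `CLAUDE_CODE_DISABLE_AUTO_MEMORY=1` variable above. Everything else is an assumption for illustration: `run_with_native_memory_off` and `off_test` are hypothetical helper names, and `plant_cmd` and `recall_cmd` stand for whatever commands start your agent in two separate sessions.

```python
import os
import subprocess


def run_with_native_memory_off(command):
    """Run `command` (a list of arguments) with native auto memory disabled.

    The environment variable is the one named in this post; the command
    itself is supplied by the caller and is not assumed here.
    """
    env = dict(os.environ)
    env["CLAUDE_CODE_DISABLE_AUTO_MEMORY"] = "1"
    return subprocess.run(command, env=env, capture_output=True, text=True)


def off_test(plant_cmd, recall_cmd, marker):
    """Plant a fact in one session, then ask for it in a second one.

    Returns True if `marker` shows up in the second session's output,
    meaning something other than the native memory carried it across.
    """
    run_with_native_memory_off(plant_cmd)           # session 1: "remember X"
    recalled = run_with_native_memory_off(recall_cmd)  # session 2: "what was X?"
    return marker in recalled.stdout
```

Pick a marker the model could not guess, such as a random string, so a capable model cannot produce it without real recall. A `False` here is the "goes blank" case from the list above.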
When I finally ran this on my own setup, mine went quiet. Six months of "it works" turned out to be six months of something else covering for it.
Why this gets worse, not better
Any time you bolt a memory system onto a runtime that already has its own, you are exposed to this. And the smarter the underlying model gets, the better it papers over the gap. The better your demos look, the less they prove.
A polished demo on a capable model is not evidence your system works. It can just as easily be evidence the model is good enough to hide that it does not.
So do not trust that your memory works because the answers are good. Look at what is actually persisted, and run the off-test. Turn the other thing off, and find out who has really been talking.
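For the "look at what is actually persisted" half, a rough sketch is to snapshot your store before and after a session and list what changed. This assumes your memory system writes files under a single directory. `snapshot` and `changed_files` are hypothetical helpers, not part of any real tool, and the store path is whatever yours happens to be.

```python
import hashlib
from pathlib import Path


def snapshot(store_dir):
    """Map each file under `store_dir` to a SHA-256 digest of its contents."""
    root = Path(store_dir)
    return {
        str(path.relative_to(root)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(before, after):
    """Return the names of files that are new or modified in `after`."""
    return sorted(
        name for name, digest in after.items() if before.get(name) != digest
    )
```

Take one snapshot, run a session, take another, and read the changed files yourself. Keep in mind that my store was being written to the whole time, so a changed file only shows that something was saved. Whether it is ever read back is what the off-test answers.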
It cost me half a year to learn that. It costs you one line and one minute.