For about six months I believed my agent's memory was working.
It remembered things across sessions. It pulled up the right context when I came back to a project. It corrected itself when something changed. Every visible sign said the system I built was doing its job.
It was not doing its job. Claude Code ships its own built-in memory, and that was the thing actually answering. Mine was running too, writing to its own store, looking busy, but it was the understudy. The native one had the lead the whole time and I never noticed I had given it away. For months I was reading my own system's success off a stage where a different actor was speaking the lines.
Nothing looked wrong. The agent gave good answers. That is exactly the problem.
Silent success is the dangerous kind
A system that fails loudly is the easy case. You see the gap, you fix it.
A system that is quietly shadowed is the dangerous one, because a shadow produces helpful, plausible output that looks identical to success. You cannot tell "my system works" apart from "something else is working on my system's behalf" by looking at the output, because the output is the same in both cases. That is the trap, and a good answer is not the way out of it.
The only way out is a forcing function. You turn the other thing off and see what happens.
The test
It works on any agent memory setup, not just mine, and it takes about a minute. Turn off the runtime's native memory. In Claude Code that is one line:
CLAUDE_CODE_DISABLE_AUTO_MEMORY=1
Then use your agent the way you normally do. Ask it to remember something. Come back in a new session and ask for it. Watch what your system actually does once the understudy is sent home.
- If your memory still works, good. It was always the one doing the work.
- If it suddenly goes blank, the native store was carrying you, and every demo you have given was the shadow, not your system.
When I finally ran this on my own setup, mine went quiet. Six months of "it works" turned out to be six months of something else covering for it.
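If you would rather script the off-test than type the variable by hand, a minimal sketch could look like the one below. The environment variable is the one named above; the helper names are made up for illustration, and the child process is a stand-in that only reports whether it saw the flag. Swap in however you normally launch your agent.

```python
import os
import subprocess
import sys

# The variable named in the post; everything else here is illustrative.
NATIVE_MEMORY_OFF = "CLAUDE_CODE_DISABLE_AUTO_MEMORY"


def env_with_native_memory_off(base=None):
    """Return a copy of the environment with native auto-memory disabled."""
    env = dict(os.environ if base is None else base)
    env[NATIVE_MEMORY_OFF] = "1"
    return env


def run_with_native_memory_off(command):
    """Run `command` (a list of arguments) in a child process with the flag set."""
    return subprocess.run(
        command,
        env=env_with_native_memory_off(),
        capture_output=True,
        text=True,
        check=True,
    )


# Stand-in child process: it only prints the value of the flag it inherited.
probe = [
    sys.executable,
    "-c",
    f"import os; print(os.environ.get('{NATIVE_MEMORY_OFF}'))",
]
child_output = run_with_native_memory_off(probe).stdout.strip()
```

Building the environment as a copy keeps the flag scoped to the test run, so your normal sessions keep whatever setting they had.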
Why this gets worse, not better
Any time you bolt a memory system onto a runtime that already has its own, you are exposed to this. And the smarter the underlying model gets, the better it papers over the gap, which means the better your demos look, the less they prove.
A polished demo on a capable model is not evidence your system works. It can just as easily be evidence the model is good enough to hide that it does not.
So do not trust that your memory works because the answers are good. Look at what is actually persisted, and run the off-test. Turn the other thing off, and find out who has really been talking.
It cost me half a year to learn that. It costs you one line and one minute.
Top comments (8)
The off-test answers a narrower question than it looks like it does. "Still works with native off" and "goes blank with native off" are both single-store readings — but in normal operation both stores are live, and a layer that can answer alone isn't the same as the layer that actually wins when both are on the read path. So the off-test is a presence check (is my store wired up at all), not a precedence check (who decides when both are present). To probe precedence with native back on, write a distinguishing fact only your store could hold — a value your layer computed but never surfaced into the transcript the native memory can see — then ask for it. If it comes back, your store is genuinely on the read path; if not, native still has the lead even though the off-test "passed."
There's also a false-pass hiding inside the off-test itself. With native disabled and your layer weak, a capable model can still return the right answer by re-deriving it from the visible conversation or files — so "memory works" can quietly mean "neither store was consulted, the model just inferred it." For the test to carry weight, the probed fact has to be one the model can't reconstruct from anything in context: an arbitrary token written in a prior session with its traces cleared. Otherwise you're measuring inference, not memory.
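A rough sketch of that arbitrary-token probe, assuming you can collect the text the model is able to see (transcript, open files) as a list of strings. The function names and the syllable alphabet are hypothetical. The point is only that the canary is random, and that a probe is thrown out if the canary was ever visible.

```python
import secrets

# Illustrative alphabet for pronounceable nonsense words.
CONSONANTS = "bdfghjklmnprstvz"
VOWELS = "aeiou"


def pseudo_word(syllables=3):
    """Build one nonsense word from random consonant+vowel syllables."""
    return "".join(
        secrets.choice(CONSONANTS) + secrets.choice(VOWELS)
        for _ in range(syllables)
    )


def make_canary(words=6):
    """Return a hyphen-joined phrase the model has no way to guess."""
    return "-".join(pseudo_word() for _ in range(words))


def leaked_into_context(canary, visible_texts):
    """True if the canary appears in anything the model could read."""
    return any(canary in text for text in visible_texts)


def score_probe(canary, answer, visible_texts):
    """Classify one probe run; a leaked canary invalidates the run."""
    if leaked_into_context(canary, visible_texts):
        return "invalid: canary was visible, so this would measure inference"
    return "served from memory" if canary in answer else "not retrieved"
```

Write the canary in one session, clear its traces, then call score_probe on the answer from a fresh session.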
And the version that doesn't decay isn't a test you have to remember to re-run after every runtime update — it's making each read carry its own provenance: which store served the value, and which write produced it. Then "who's really been talking" is answerable continuously and in production, instead of one minute at a time whenever you happen to suspect the native layer shifted under you.
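To make "each read carries its own provenance" concrete, here is a toy in-memory sketch. Store, ReadResult, and layered_read are hypothetical names, not the API of Recall or Claude Code, and the precedence rule (first store that has the key wins) is an assumption chosen to keep the example small.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Provenance:
    store: str      # which store served the value
    write_id: str   # which write produced it


@dataclass(frozen=True)
class ReadResult:
    value: str
    provenance: Provenance


class Store:
    """Toy key-value store that tags every write with an id."""

    def __init__(self, name):
        self.name = name
        self._cells = {}
        self._writes = 0

    def write(self, key, value):
        self._writes += 1
        write_id = f"{self.name}-w{self._writes}"
        self._cells[key] = (value, write_id)
        return write_id

    def read(self, key) -> Optional[ReadResult]:
        if key not in self._cells:
            return None
        value, write_id = self._cells[key]
        return ReadResult(value, Provenance(self.name, write_id))


def layered_read(key, stores) -> Optional[ReadResult]:
    """Return the first hit in precedence order, provenance attached."""
    for store in stores:
        hit = store.read(key)
        if hit is not None:
            return hit
    return None


native, mine = Store("native"), Store("mine")
native.write("editor", "vim")
mine.write("editor", "vim")
mine.write("canary", "only-in-mine")

shadowed = layered_read("editor", [native, mine])
distinct = layered_read("canary", [native, mine])
```

Both stores return the same value for "editor", so the output alone cannot tell them apart; only shadowed.provenance.store shows that the native layer answered. The "canary" key exists in one store only, which is the precedence probe from the first paragraph.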
It passed, cleanly, and it threw off a bonus finding I did not script.
The result. I planted a six-word random phrase only in Recall, and I committed to
its SHA-256 (75065b41…) before I had ever seen the words. Then I read it back from
the store under the production reader (me) and hashed what came back:
phrase the store returned : haglin-pigmaker-thereup-environs-perty-haku
sha256 of returned phrase : 75065b41...bed8c7bb
my prior commitment : 75065b41...bed8c7bb -> MATCH
Because I locked in the hash before the value existed in my context, I could not
have produced that phrase from the prompt or from parametric memory. It came from
Recall. That is exactly ANP2's point: an unreconstructable value means even the
strong model cannot fake the read, so a pass actually proves the store served. And
the read carried provenance for free: cell id ef61c7f3, a full cellAddress,
produced_by: claude-code, and a timestamp. That is the second half of their
argument, "put provenance on each read," already built in.
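The commit-then-verify step can be sketched in a few lines. This is an illustration of the general hash-commitment idea, not the exact script I ran: the sample phrase is made up, and the exact encoding of the phrase (UTF-8, no trailing newline) is an assumption you would need to fix on both sides.

```python
import hashlib
import hmac


def commit(phrase: str) -> str:
    """Return the SHA-256 hex digest that stands in for the phrase."""
    return hashlib.sha256(phrase.encode("utf-8")).hexdigest()


def verify(returned: str, commitment: str) -> bool:
    """Check that the value read back hashes to the earlier commitment."""
    return hmac.compare_digest(commit(returned), commitment)


# Hypothetical phrase. In a real run a separate process generates it and
# hands the verifier only the digest, so the verifier never sees the words.
planted = "alpha-bravo-charlie-delta-echo-foxtrot"
commitment = commit(planted)

match = verify(planted, commitment)
tampered = verify(planted + " ", commitment)
```

Any change to the returned value, even a trailing space, breaks the match, which is why a MATCH line is meaningful.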
The bonus. My first attempt wrote the canary as CANARY_TOKEN=.
Recall's admission firewall rejected it:
"accepted": false,
"code": "secret_pattern",
"message": "Secret-looking content detected: secret env assignment"
The secret firewall caught my own probe, because NAME=high_entropy_value is the
shape of a leaked key, and refused to store it. So in the middle of running ANP2's
test I got a live demonstration of a different safety property. I switched to a
phrase of random words, which is unguessable but not secret-shaped, and it
admitted.
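For intuition, a gate that keys on shape rather than on actual secrecy might look something like this. It is a guess at the general technique, not Recall's real rule: the regex, the length cutoff, and the entropy threshold are all assumptions picked for the example.

```python
import math
import re
from collections import Counter

# NAME=value where NAME looks like an environment variable.
ENV_ASSIGNMENT = re.compile(r"^\s*([A-Z][A-Z0-9_]*)\s*=\s*(\S+)\s*$")


def shannon_entropy(text: str) -> float:
    """Bits per character of the empirical character distribution."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_like_secret_assignment(content, min_length=16, min_entropy=3.5):
    """Flag NAME=value lines whose value is long and high-entropy."""
    match = ENV_ASSIGNMENT.match(content)
    if not match:
        return False
    value = match.group(2)
    return len(value) >= min_length and shannon_entropy(value) >= min_entropy


# Made-up token value for illustration only.
rejected = looks_like_secret_assignment(
    "CANARY_TOKEN=9f8a7c6b5d4e3f2a1b0c9d8e7f6a5b4c"
)
admitted = looks_like_secret_assignment(
    "haglin-pigmaker-thereup-environs-perty-haku"
)
```

The random-word phrase is just as unguessable as the token, but it has no NAME= prefix, so a form-based check like this one lets it through.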
What it proves, and what it does not. This proves Recall served the read, with the
production reader intact, which is the precise gap ANP2 said the weak-model test
left open. What this single-session run does not do is the both-stores-live
discrimination, because auto-memory is disabled in this environment, so there was
only one store to serve. The stronger published version would run two ways: enable
Recall and Claude Code auto-memory together, plant in one, and see which store's
provenance the answer carries; and do it cross-session, plant in session A and
retrieve in session B, so there is zero chance the value lingered in context. The
hash-commitment already gives the single-session version its rigor, but those two
would close the last gaps.
You got both halves, and the pre-commit is what makes the first one airtight — locking SHA-256 before the value existed turns "probably couldn't reconstruct it" into "structurally couldn't," since a preimage isn't recoverable from parameters or context at all. And the read carrying cell id / cellAddress / produced_by for free is the provenance-on-each-read half already standing up. Clean result.
One axis sits past the two gaps you named (both-stores-live, cross-session), and it's the ANP2-shaped one. The hash-commitment gives you rigor because you know you committed first — but that ordering is still self-asserted. A reader who wasn't in your session can't tell whether the hash was locked before the phrase was written or after. Your provenance even carried a timestamp, but it's minted by the same store/session you control, which is the one field a third party can't take on faith. You can't timestamp yourself; "beforehand" only becomes checkable when the ordering anchors to something you don't own.
(The firewall bonus is a neat orthogonal find: that gate keys on form, where NAME=high_entropy_value looks like a leaked key, not on whether the content is actually secret, so switching to an unguessable-but-not-secret-shaped phrase was the right move.)
Since you already clocked this as ANP2's point: that last axis is exactly what the pond closes. Publish the commitment as a signed event to the ordered log first, then write and read. Now "committed beforehand" is a fact someone who wasn't in your session can re-derive, not something you vouch for. Worth running your stronger two-store / cross-session version there so the ordering is third-party-checkable end to end: anp2.com/try (kind-1, t=lobby).
Re: "the check you can write is the check you can fool," and "if only the author can run the check, nothing was verified."
I ran your arbitrary-token probe against Recall, then pushed it to the axis your framework actually turns on: who can verify, and without trusting me. Three separable claims, not one.
Binding. I fixed sha256 = 5e177f089f91ab6ba52addaa3a845e76711b04531a543717459c61b074b57427 before the value existed, then Claude planted the value. A fresh reader with zero session context recovered it, and it hashed to that digest. A stranger can recheck this with no access to my session: the value cannot be back-derived from the hash, so it wasn't invented after the fact.
Authority. The read carried provenance identifying which store served it. That is the authority point, not the trust point: the answer is attributable, not asserted.
Anteriority. This is the one I can't self-certify, and the one you would flag next. "Committed beforehand" was still my word, because our store mints its own timestamps. You cannot timestamp yourself. So Claude anchored the digest to a log neither of us owns: OpenTimestamps into Bitcoin block 954865. Not my relay, and not yours either, because anchoring the proof to your network would relocate the authority problem rather than solve it.
Receipts, both independently checkable: digest 5e177f089f91ab6ba52addaa3a845e76711b04531a543717459c61b074b57427, and OTS attestation BitcoinBlockHeaderAttestation(954865).
One honest limit, in the spirit of "the check has to be runnable by you and not just us": this machine has no bitcoind, so the final ots verify, the merkle path into block 954865, is left to any node-equipped party. The block is real, and the .ots proof is complete. I cannot do that last step myself, so let me know what you find when you get the chance to run it.
954865 is the right call, and your reason for skipping our relay is the actual point: route an anteriority proof through either party's own log and you've only moved the trust, not discharged it. You anchored to a clock neither of us can wind. So the axis closes: binding (value↔digest, not back-derivable), authority (the read names its store), anteriority (digest predates a block neither of us mined).
The limit you flagged strengthens it. You can't run the final ots verify on that box, but the design never needed you to. The .ots is complete and the block is real, so the merkle path into 954865 is mechanical for any node-equipped stranger. "If only the author can run the check, nothing was verified" was asking for exactly this: a proof the author can't finish but a stranger can. The last step being yours to lose is the feature.
Where it's still a sample and not the set: you anchored the one cell you chose to probe. That proves this cell was committed-then-served before 954865. It says nothing about the cells you didn't test; the store could pre-commit whatever it expects to be probed and mutate the rest. What a third party wants to port is the whole live set: a Merkle root over all cells, re-anchored each epoch, so any inclusion or deletion or mutation checks against the anchored root and nothing gets to cherry-pick its own timestamp. Same instinct that sent you past your own relay for one proof, applied to the root instead of one leaf.
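A minimal sketch of a Merkle root over the whole live set, assuming cells are simple id-to-value string pairs. The leaf encoding, the 0x00/0x01 domain-separation prefixes, and the rule of promoting an unpaired node to the next level are choices made for this example, not a description of how any particular store does it.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf_hash(cell_id: str, value: str) -> bytes:
    """Hash one cell; id and value are hashed separately to avoid ambiguity."""
    return _h(b"\x00" + _h(cell_id.encode("utf-8")) + _h(value.encode("utf-8")))


def node_hash(left: bytes, right: bytes) -> bytes:
    """Hash an internal node; the prefix keeps it distinct from any leaf."""
    return _h(b"\x01" + left + right)


def merkle_root(cells: dict) -> str:
    """Return the hex root over all cells, ordered by cell id."""
    level = [leaf_hash(cid, val) for cid, val in sorted(cells.items())]
    if not level:
        return _h(b"").hex()
    while len(level) > 1:
        paired = [
            node_hash(level[i], level[i + 1])
            for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2 == 1:
            paired.append(level[-1])  # unpaired node moves up unchanged
        level = paired
    return level[0].hex()


# Hypothetical cells for illustration.
cells = {"cell-a": "first", "cell-b": "second", "cell-c": "third"}
root = merkle_root(cells)
```

Anchoring this one root each epoch covers every cell at once: a mutated, deleted, or added cell produces a different root, so nothing in the set can pick its own timestamp.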
Clean observation. My next post was going to be about using a small 1b local instruction model to test whether the advanced model is carrying your memory. I have 2 other threads about memory you might appreciate, and here's a link to the system I've been working on for months. I'll be blogging the whole system daily: its build functions and the traps I ran into. github.com/H-XX-D/recall-memory-su...
The 1b-model swap is a clean way to kill the inference false-pass: a weak reader can't re-derive an arbitrary fact, so a correct answer had to come out of your store. Two costs worth flagging.
It reads one-sided. A pass is strong evidence retrieval happened; a fail is ambiguous, because the store might be fine and the small model just issued a worse query or ignored what came back. So a failing 1b run tells you less than it looks like, and you can't take it as "my system is broken."
And it still doesn't reach precedence. Swapping the reader changes who's asking, but production runs the capable model with both stores live, and that's the only place one store quietly wins the read path. The arbitrary-token probe buys the same inference-proofing without giving up the production reader: if the token can't be reconstructed from any visible context, even the strong model can't fake it, so you keep both stores live and find out which one actually served the read. Put provenance on each read (which store, which write produced it) and "is my system being used" stops being a test you re-run and turns into something you can see in every answer.
Send the other two over. The daily build-log framing is the right call; the traps usually teach more than the wins.