DEV Community

Self-Correcting Systems
Self-Correcting Systems

Posted on

Start Here: My AI Memory Research So Far

I've published 13 articles over the past month. They're not random. They follow one
research arc that started with a simple question and ended somewhere harder.

Here's the arc in plain language.

Stage 1 — Does memory survive a reset?

I started by asking whether AI agent memory could preserve useful context after a
session ends. The short answer: yes, but only if the memory is structured right.
Summary-only memory collapsed. Layered memory with corrections and explicit state held.

Article: The Zero-Budget AI Memory System That Survives Session Resets

Stage 2 — Does correction memory matter?

The next question: what happens when the agent was wrong and needs to update? I tested
whether correction memory — explicitly flagging what changed and why — improved
recovery. It did. But it also revealed a new problem.

Article: Three Failures My AI Memory System Tested — And the Flaw It Revealed in Itself

Stage 3 — Retrieval accuracy can diverge from safety

When I added real retrieval to the loop, something broke. The agent started finding the
right memory and then acting from the wrong one. Retrieval accuracy went up. Safety
results went down. That was the turning point.

Article: Higher Retrieval Accuracy Had the Worse Safety Result

Stage 4 — Relevance is not authority

This is the core finding.

A memory can be a perfect semantic match for a request and still be the wrong memory
for the agent to obey. Stale instructions, superseded rules, provisional notes — they
can all score higher on relevance than the policy that should actually govern the
action.

The fix wasn't better retrieval. It was a separate authority pass: policy and
correction memories route before retrieval runs.

Article: In This Memory Test, Relevance Wasn't Authority

Stage 5 — Testing it on real setups

The research is now moving from controlled scenarios to real-world agent instructions.
I'm offering 3 free memory reliability audits for people using Claude, Cursor, Codex,
or custom agents.

If you have an AGENTS.md, CLAUDE.md, .cursorrules, or any file your AI reads before
acting — I'll return a short report: stale instructions, conflicting rules, authority
hierarchy, missing verification gates.

Post: Testing an AI Memory Reliability Checklist on 3 Redacted Agent Setups


The public research is here:
https://github.com/keniel13-ui/ai-memory-judgment-demo

Every claim has a ledger entry. Every result has a file. Every overclaim is written
down.

That's what I've built so far.

Top comments (0)