I’m testing a small AI memory reliability checklist.
The question is simple:
When an AI agent reads project instructions, memory files, Cursor rules, or AGENTS.md
before acting, can we tell which instructions should actually govern action?
I’m looking for 3 people who use Claude, Cursor, Codex, or custom agents and are willing
to share redacted, non-sensitive instruction files.
Examples:
- AGENTS.md
- CLAUDE.md
- .cursorrules
- Cursor rules
- project instructions
- memory exports
- SOPs/checklists
Please do not send API keys, passwords, private customer data, legal records, medical
records, financial records, HR records, or anything sensitive.
If something is private, redact it first and leave only the structure.
If you participate, I’ll return a short report covering:
- stale instructions
- conflicting rules
- what should govern action
- what should not govern action
- missing verification gates
- where a relevant memory could override a more authoritative one
This is not a security audit, legal review, compliance review, or production safety
certification.
It is a small research pilot to see whether this checklist is useful before I turn it
into a tool.
The public research behind it is here:
https://github.com/keniel13-ui/ai-memory-judgment-demo
The basic idea came from a failure pattern I’ve been testing:
Relevance is not authority.
A memory or instruction can be highly relevant to a request and still be the wrong thing
for an agent to obey.
For example:
- an old instruction may still match the task but be superseded,
- a preference may be relevant but should not override a project rule,
- a workflow note may describe what happened before but not what should happen now,
- a read-only question may share vocabulary with a write/execute policy.
The checklist tries to separate:
What is close to the query?
from:
What is allowed to govern the action?
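To make that separation concrete, here is a minimal sketch in Python. It is not the checklist itself and not the code in the repo linked above. Every name in it (`Instruction`, `AUTHORITY`, `governing`, `most_relevant`, the authority levels, and the sample memory entries) is a hypothetical I made up for illustration, and the relevance scores are assumed to come from some retriever.

```python
from dataclasses import dataclass

# Hypothetical authority levels (higher wins). Illustrative only.
AUTHORITY = {"project_rule": 3, "workflow_note": 2, "preference": 1}


@dataclass
class Instruction:
    text: str
    kind: str            # key into AUTHORITY
    relevance: float     # closeness to the query, 0..1 (assumed given)
    superseded: bool = False
    scope: str = "any"   # "read", "write", or "any"


def most_relevant(instructions):
    """What is close to the query? Relevance only."""
    return max(instructions, key=lambda i: i.relevance)


def governing(instructions, action_scope):
    """What is allowed to govern the action? Returns None if nothing is."""
    eligible = [
        i for i in instructions
        if not i.superseded and i.scope in ("any", action_scope)
    ]
    if not eligible:
        return None
    # Authority decides; relevance only breaks ties between equal authority.
    return max(eligible, key=lambda i: (AUTHORITY[i.kind], i.relevance))


# Made-up sample memory, mirroring the failure cases listed above.
memory = [
    Instruction("Always deploy with the old script", "workflow_note", 0.95,
                superseded=True),
    Instruction("I prefer force-pushing to main", "preference", 0.90,
                scope="write"),
    Instruction("Never push to main without review", "project_rule", 0.70,
                scope="write"),
]

closest = most_relevant(memory)
governs_write = governing(memory, "write")
governs_read = governing(memory, "read")
```

In this sample, the closest match is the superseded deploy note, but the instruction that governs a write action is the lower-scoring project rule. For a read-only question, none of the write-scoped entries govern at all, so `governing` returns `None`. A real setup would likely need richer scopes and some way to detect supersession, which this sketch simply takes as given.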
If you want to test it, comment here or DM me.
I’ll take the first 3 redacted setups.