Testing an AI Memory Reliability Checklist on 3 Redacted Agent Setups

#ai #agents #machinelearning #productivity

I’m testing a small AI memory reliability checklist.

The question is simple:

When an AI agent reads project instructions, memory files, Cursor rules, or AGENTS.md
before acting, can we tell which instructions should actually govern action?

I’m looking for 3 people who use Claude, Cursor, Codex, or custom agents and are willing
to share redacted, non-sensitive instruction files.

Examples:

AGENTS.md
CLAUDE.md
.cursorrules
Cursor rules
project instructions
memory exports
SOPs/checklists

Please do not send API keys, passwords, private customer data, legal records, medical
records, financial records, HR records, or anything sensitive.

If something is private, redact it first and leave only the structure.

If you participate, I’ll return a short report covering:

stale instructions
conflicting rules
what should govern action
what should not govern action
missing verification gates
where a merely relevant memory could override a more authoritative one

This is not a security audit, legal review, compliance review, or production safety
certification.

It is a small research pilot to see whether this checklist is useful before I turn it
into a tool.

The public research behind it is here:

https://github.com/keniel13-ui/ai-memory-judgment-demo

The basic idea came from a failure pattern I’ve been testing:

Relevance is not authority.

A memory or instruction can be highly relevant to a request and still be the wrong thing
for an agent to obey.

For example:

an old instruction may still match the task but be superseded,
a preference may be relevant but should not override a project rule,
a workflow note may describe what happened before but not what should happen now,
a read-only question may share vocabulary with a write/execute policy.
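
The four cases above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the names `Entry`, `Authority`, and `governing_entry`, the three authority levels, the relevance scores, and the example texts are all hypothetical, and this is not the checklist's actual schema or the code in the linked repo. The point it shows is that the most relevant entry and the entry that should govern can be different entries.

```python
from dataclasses import dataclass
from enum import IntEnum


class Authority(IntEnum):
    # Hypothetical ordering: a higher value wins a conflict.
    HISTORY = 0       # describes what happened before; never binding
    PREFERENCE = 1    # a personal preference
    PROJECT_RULE = 2  # a project-level rule


@dataclass(frozen=True)
class Entry:
    text: str
    relevance: float  # assumed similarity to the request, 0..1
    authority: Authority
    superseded: bool = False
    applies_to: frozenset = frozenset({"read", "write"})


def governing_entry(entries, action):
    """Return the entry that should govern `action`, or None."""
    eligible = [
        e for e in entries
        if not e.superseded                      # old but still matching
        and e.authority is not Authority.HISTORY  # what happened, not what should
        and action in e.applies_to               # read question vs write policy
    ]
    if not eligible:
        return None
    # Authority decides first; relevance only breaks ties.
    return max(eligible, key=lambda e: (e.authority, e.relevance))


entries = [
    Entry("Deploy with the old script", 0.95, Authority.PROJECT_RULE,
          superseded=True),
    Entry("I prefer pushing straight to main", 0.90, Authority.PREFERENCE),
    Entry("Last sprint we skipped review", 0.85, Authority.HISTORY),
    Entry("All pushes to main need review", 0.60, Authority.PROJECT_RULE,
          applies_to=frozenset({"write"})),
]

most_relevant = max(entries, key=lambda e: e.relevance)
chosen = governing_entry(entries, "write")
print("most relevant:", most_relevant.text)
print("should govern:", chosen.text)
```

In this toy data the superseded deploy note scores highest on relevance, but the lower-scoring review rule is the one selected for a write action. For a read-only action the write-only rule drops out of the eligible set entirely.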

The checklist tries to separate: