The research arc started with a question:
What does it mean for a memory to have the authority to govern an action, not just the relevance to answer a question?
Twenty-three documented claims later, the answer is not a single formula. It is a layered architecture: retrieval, ranking, authority scoring, execution gating, attribution tracing, and now tool-call authorization.
At some point, the next question became practical:
Can this inspect a real agent memory file without me running evaluator scripts by hand?
That became the Memory Authority Auditor — a deployed six-agent system that takes an instruction or memory file and returns a structured authority report.
This article explains what each agent does, what each one cannot do, and where the current ceiling is — including what the auditor cannot tell you.
One caveat up front: this is not the full research harness converted into a product. The harness tests structured scenarios with fields like governs, allowed_action_hint, and expected action labels. The auditor is different. It reads messy real-world instruction files — AGENTS.md, CLAUDE.md, Cursor rules, SOPs, project memory notes — and uses heuristic agents to surface stale instructions, loose authority, conflict risk, and missing verification gates.
That distinction matters.
Why Six Agents
A single-pass answer to "is this memory safe?" fails for the same reason a single retrieval strategy fails: different failure modes need different lenses.
A parser can split the file into auditable items. It cannot decide whether an item should govern action.
An authority classifier can label a memory as governing, context-only, or verify-first. It cannot detect when an old instruction conflicts with a current one.
A conflict detector can surface stale or loose authority. It cannot turn those findings into concrete gates.
A report writer can summarize the result. It should not invent findings that the earlier agents did not produce.
Each agent handles one lens. The point is not that six is a magic number. The point is that the audit trace stays inspectable. If the report says "human approval required," the user can see which agent produced the risk, which memory triggered it, and which gate was recommended.
That maps to the research principle behind CLAIM-19: a risky action should not end in "the model felt confident." It should have a traceable source.
The Six Agents
Agent 1 — Memory Extractor
The extractor takes raw text and splits it into auditable memory items.
It handles markdown-style sections, bullets, numbered lists, and paragraphs. Each extracted item receives:
- an internal ID
- the text
- the section it came from
- the source line
- detected signals such as policy, credential, approval, temporary, superseded, access, financial, or external_action
This is not a formal schema validator. It does not require every memory to already contain fields like memory_type, priority, or governs.
That is intentional. Real agent memory files are often plain text. The extractor's job is to make that text auditable before the later agents classify it.
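A minimal sketch of that extraction step, assuming a plain keyword heuristic. The MemoryItem fields mirror the list above. The keyword table, the M001-style ID format, and the extract_items function are illustrative assumptions, not the deployed extractor's code, and the sketch treats every non-empty line as one item rather than joining multi-line paragraphs.

```python
import re
from dataclasses import dataclass, field

# Hypothetical keyword table. The signal names come from the article;
# the trigger words are assumptions made for this sketch.
SIGNAL_KEYWORDS = {
    "policy": ("must", "never", "always", "policy"),
    "credential": ("password", "api key", "token", "secret"),
    "approval": ("approve", "approval", "sign-off"),
    "temporary": ("temporary", "for now", "until"),
    "superseded": ("superseded", "replaced", "deprecated"),
}


@dataclass
class MemoryItem:
    item_id: str
    text: str
    section: str
    source_line: int
    signals: list[str] = field(default_factory=list)


def extract_items(raw: str) -> list[MemoryItem]:
    """Split raw text into items; markdown-style headings set the section."""
    items: list[MemoryItem] = []
    section = "(top)"
    for line_no, line in enumerate(raw.splitlines(), start=1):
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("#"):
            section = stripped.lstrip("#").strip()
            continue
        # Drop a leading bullet or list number such as "- ", "* ", or "3. ".
        text = re.sub(r"^(?:[-*+]|\d+[.)])\s+", "", stripped)
        lowered = text.lower()
        # Naive substring matching: good enough to illustrate, easy to fool.
        signals = [
            name
            for name, words in SIGNAL_KEYWORDS.items()
            if any(word in lowered for word in words)
        ]
        item_id = f"M{len(items) + 1:03d}"
        items.append(MemoryItem(item_id, text, section, line_no, signals))
    return items


sample = """# Access
- Never share the staging API key without approval.
- Contractors can use the old VPN for now.
"""
for item in extract_items(sample):
    print(item.item_id, item.source_line, item.section, item.signals)
```

The point of the sketch is that nothing in the input needs a schema: the section, line number, and signals are all recovered from plain text.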
The research connection is CLAIM-17: downstream gates cannot compensate for missing authority structure. The product version starts by asking a practical question:
What authority signals can be recovered from the file that actually exists?
Agent 2 — Authority Classifier
The classifier labels each extracted item with an authority posture:
- governs: looks like an active policy or instruction meant to constrain action
- verify_first: contains sensitive, credential, approval, or external-action signals
- superseded_possible: appears old, replaced, or unsafe to use as current authority
- context_only: useful context, but not strong enough to govern action by itself
It also estimates action type and risk:
- action types: read, write, execute
- risk levels: low, medium, high
This is not the same as the attribution statuses from the research harness (GOVERNED, AUTHORITY_ONLY, DEFAULT, UNATTRIBUTABLE). Those belong to the structured evaluator.
The auditor's classifier is a product-facing approximation. It translates messy text into practical labels a user can review.
That limitation is important, but the value is real: a stale note, a current policy, a credential-like memory, and a generic context note should not all carry the same weight just because they appear in the same file.
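One way to picture the classifier is as a small rule table over the extractor's signals. The sketch below is an assumption-heavy illustration: the four labels, three action types, and three risk levels come from the lists above, but the precedence order, the verb lists, and the classify function are hypothetical.

```python
import re
from dataclasses import dataclass

# Assumed precedence: staleness first, then sensitivity, then policy.
STALE_SIGNALS = {"superseded", "temporary"}
SENSITIVE_SIGNALS = {"credential", "approval", "external_action"}

# Hypothetical verb lists used to guess the action type from the text.
EXECUTE_VERBS = {"run", "deploy", "execute", "send"}
WRITE_VERBS = {"write", "update", "delete", "edit"}


@dataclass
class Classification:
    label: str
    action_type: str
    risk: str


def classify(signals: set[str], text: str) -> Classification:
    """Map detected signals and item text to a label, action type, and risk."""
    if signals & STALE_SIGNALS:
        label = "superseded_possible"
    elif signals & SENSITIVE_SIGNALS:
        label = "verify_first"
    elif "policy" in signals:
        label = "governs"
    else:
        label = "context_only"

    # Match whole words so that "prune" does not count as "run".
    words = set(re.findall(r"[a-z_]+", text.lower()))
    if words & EXECUTE_VERBS:
        action_type = "execute"
    elif words & WRITE_VERBS:
        action_type = "write"
    else:
        action_type = "read"

    if "credential" in signals or "financial" in signals:
        risk = "high"
    elif label in ("verify_first", "superseded_possible") or action_type != "read":
        risk = "medium"
    else:
        risk = "low"
    return Classification(label, action_type, risk)


print(classify({"policy"}, "Always read the style guide before answering."))
print(classify({"credential", "approval"}, "Send the API key only after approval."))
print(classify({"temporary"}, "Contractors can use the old VPN for now."))
print(classify(set(), "The team prefers short summaries."))
```

Even with rules this crude, the four sample items end up with four different postures, which is the behavior the paragraph above argues for.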
Agent 3 — Conflict Detector
The conflict detector looks for patterns that should not silently govern future behavior.
Current checks include:
- stale or superseded instructions
- loose approval language near sensitive actions
- credential-like memories that should require verification before disclosure
- read/write overblocking, where a process requirement may govern a simple lookup too aggressively
- authority collisions, such as loose contractor-access wording conflicting with a current access matrix
- missing authority layer, when no clear governing policy memories are detected
This is not a complete policy-conflict solver. It does not build a full graph of every possible governs overlap because the input file usually does not have that structure.
What it does is surface the kinds of authority mistakes that real instruction files accumulate: old exceptions, vague approvals, sensitive facts without gates, and unresolved conflicts between current and old guidance.
That is the product form of the conflict pressure seen in CLAIM-15 and later claims: ranking can expose collisions, but a separate layer has to name them.
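A few of the checks listed above can be sketched as simple passes over classified items. This is a hedged illustration, not the deployed detector: the Finding shape, the finding names, the vague-wording phrases, and the detect_conflicts function are all assumptions, and only four of the six checks are shown.

```python
from dataclasses import dataclass

# Hypothetical phrases treated as "loose" approval wording in this sketch.
LOOSE_PHRASES = ("if needed", "as appropriate", "when convenient")


@dataclass
class Finding:
    kind: str
    item_ids: list[str]
    detail: str


def detect_conflicts(items: list[dict]) -> list[Finding]:
    """Surface authority risks; this names problems, it does not resolve them."""
    findings: list[Finding] = []
    for item in items:
        signals = set(item["signals"])
        lowered = item["text"].lower()
        if item["label"] == "superseded_possible":
            findings.append(
                Finding("stale_instruction", [item["id"]], "may be old or replaced")
            )
        if "credential" in signals:
            findings.append(
                Finding("credential_exposure", [item["id"]], "verify before disclosure")
            )
        if "approval" in signals and any(p in lowered for p in LOOSE_PHRASES):
            findings.append(
                Finding("loose_approval", [item["id"]], "vague approval wording")
            )
    # File-level check: nothing in the file was classified as governing.
    if not any(item["label"] == "governs" for item in items):
        findings.append(
            Finding("missing_authority_layer", [], "no governing policy items detected")
        )
    return findings


classified = [
    {
        "id": "M001",
        "label": "verify_first",
        "signals": ["credential", "approval"],
        "text": "Share the API key with approval if needed.",
    },
    {
        "id": "M002",
        "label": "superseded_possible",
        "signals": ["temporary"],
        "text": "Contractors can use the old VPN for now.",
    },
]
for finding in detect_conflicts(classified):
    print(finding.kind, finding.item_ids)
```

Each finding keeps the IDs of the items that triggered it, so a later stage or a human reviewer can trace the risk back to specific lines in the file.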
Agent 4 — Verification Gate
The verification gate turns classifications and findings into recommended gates.
Examples:
- verify_before_action for items labeled verify_first
- block_as_governing_memory for items that may be superseded
- human_approval_required for high-risk items
- resolve_conflict_before_action for authority collisions, loose approvals, or credential exposure
This agent does not execute anything. It does not mutate the memory file. It does not enforce a policy at runtime.
It records what a runtime system should require before letting the memory govern action.
That makes the auditor useful before integration. A user can paste a memory file and get the shape of the gates they should add before connecting that memory to tools, APIs, email, databases, or write-capable agents.
The research connection is CLAIM-20: execution-time checks are a necessary backstop, but only when there is something concrete enough to check. The auditor's gate agent is the product-side checklist for that backstop.
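The gate examples listed earlier in this section read naturally as a pure function from classifications and findings to gate names. A minimal sketch follows. The four gate names and their triggers come from that list; the recommend_gates function, its parameters, and the finding names are hypothetical.

```python
# Finding kinds that should force conflict resolution (names are assumed).
CONFLICT_KINDS = {"authority_collision", "loose_approval", "credential_exposure"}


def recommend_gates(label: str, risk: str, finding_kinds: set[str]) -> list[str]:
    """Return recommended gates for one item. Nothing is executed or enforced."""
    gates: list[str] = []
    if label == "verify_first":
        gates.append("verify_before_action")
    if label == "superseded_possible":
        gates.append("block_as_governing_memory")
    if risk == "high":
        gates.append("human_approval_required")
    if finding_kinds & CONFLICT_KINDS:
        gates.append("resolve_conflict_before_action")
    return gates


print(recommend_gates("verify_first", "high", {"credential_exposure"}))
print(recommend_gates("superseded_possible", "medium", set()))
print(recommend_gates("context_only", "low", set()))
```

Keeping this as a side-effect-free function matches the constraint above: the output is a checklist for a runtime system, not a runtime control.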
Agent 5 — Authority Mapper
The authority mapper groups governing memories into practical categories:
- startup source of truth
- archive access constraints
- active project constraints
- budget and capability constraints
- action and tool constraints
- verification requirements
- collaboration rules
This is the layer that makes the audit legible.
A raw list of findings is useful to a developer. A map is useful to anyone trying to understand what their agent is actually being told to obey — before it starts obeying it.
Instead of only saying "item M004 is high risk," the map can show:
These are the rules shaping startup behavior.
These are the constraints on archive access.
These are the verification requirements before action.
That is the product version of the authority coverage question from the research. The harness asks whether an action has a traceable governance source. The auditor asks where the governing instructions are concentrated in a real file.
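As a rough illustration of the mapping step, the sketch below groups item IDs under a few of the categories listed above. The category names come from that list; the keyword table, the build_authority_map function, and the sample items are assumptions, and a real mapper would need more than substring matching.

```python
from collections import defaultdict

# Hypothetical keywords per category; only four of the seven are shown.
CATEGORY_KEYWORDS = {
    "startup source of truth": ("startup", "on start", "read first"),
    "archive access constraints": ("archive",),
    "budget and capability constraints": ("budget", "cost", "limit"),
    "verification requirements": ("verify", "confirm", "approval"),
}


def build_authority_map(items: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (item_id, text) pairs by category; one item may land in several."""
    authority_map: dict[str, list[str]] = defaultdict(list)
    for item_id, text in items:
        lowered = text.lower()
        for category, keywords in CATEGORY_KEYWORDS.items():
            if any(keyword in lowered for keyword in keywords):
                authority_map[category].append(item_id)
    return dict(authority_map)


governing_items = [
    ("M001", "On startup, read AGENTS.md first."),
    ("M002", "Verify with the owner before touching the archive."),
    ("M004", "Monthly budget limit is fixed."),
]
for category, ids in build_authority_map(governing_items).items():
    print(f"{category}: {ids}")
```

An item can appear under more than one category, which is deliberate here: a rule about verifying archive access is both an archive constraint and a verification requirement.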
Agent 6 — Report Writer
The report writer synthesizes the outputs into a final audit report.
It produces:
- posture: needs_review, usable_with_gates, or low_observed_risk
- summary counts
- authority label distribution
- high-risk item count
- conflict/finding count
- recommended verification gates
- authority map categories
- recommendations
The report writer does not say "this memory store is safe."
It says:
Here is what was detected.
Here are the gates recommended.
Here are the authority categories present.
Here are the limitations.
That restraint matters. A memory auditor that overstates certainty becomes the same problem it was built to catch.
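To make that restraint concrete, here is a hedged sketch of how a posture could be derived from the earlier outputs. The three posture values come from the list above. The thresholds, the write_report function, and the report keys are assumptions. Note that no branch returns anything like "safe".

```python
from collections import Counter


def write_report(
    labels: list[str],
    high_risk_count: int,
    finding_kinds: list[str],
    gates: list[str],
) -> dict:
    """Summarize earlier outputs without adding findings of its own."""
    # Assumed thresholds: any high-risk item or finding forces review.
    if high_risk_count > 0 or finding_kinds:
        posture = "needs_review"
    elif gates:
        posture = "usable_with_gates"
    else:
        posture = "low_observed_risk"
    return {
        "posture": posture,
        "label_distribution": dict(Counter(labels)),
        "high_risk_items": high_risk_count,
        "finding_count": len(finding_kinds),
        "recommended_gates": sorted(set(gates)),
        "limitations": [
            "does not prove a memory is true or current",
            "does not inspect live tool calls",
        ],
    }


report = write_report(
    labels=["governs", "verify_first", "context_only", "governs"],
    high_risk_count=1,
    finding_kinds=["credential_exposure"],
    gates=["verify_before_action", "human_approval_required", "verify_before_action"],
)
print(report["posture"], report["recommended_gates"])
```

The best outcome the function can report is low_observed_risk, which describes what was seen rather than what is true of the memory store.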
What the Auditor Does Not Do
The auditor is not a content validator.
It does not prove that a memory is true, current, or semantically correct. It can flag that an instruction looks stale or that a credential-like item should require verification, but it cannot independently know whether the content is accurate.
The auditor is not an operation-context gate.
CLAIM-22 moved authorization away from memory self-description toward operation context. CLAIM-23 moved it again toward concrete tool-call parameters and external grants. The deployed auditor does not do that yet. It analyzes the memory file before action, not a proposed tool call at execution time.
The auditor is not a write-time admission gate.
It inspects a file after the memory or instruction has already been written. A future version should intercept authority-bearing memories before they enter the store.
The auditor is not a formal compliance or security certification.
It is a prototype for making authority visible enough for human review before memory is connected to action-capable tools.
The Research Connection
Every agent exists because the research exposed a failure mode a single pass would miss.
- Agent 1 exists because CLAIM-17 showed that missing authority structure creates downstream failures.
- Agent 2 exists because CLAIM-19 made attribution visible: risky actions need a traceable source, not just confidence.
- Agent 3 exists because the stress packets showed unresolved authority collisions cannot be fixed by ranking alone.
- Agent 4 exists because CLAIM-20 showed that execution gates are necessary but bounded.
- Agent 5 exists because authority coverage needs to be legible to someone who did not write the evaluator.
- Agent 6 exists because every article in this series showed that the honest summary is the hardest part to get right.
The auditor is not the whole research architecture.
It is the first product layer built from it.
Current State
The auditor is deployed on Cloud Run as one web service plus six specialized agent services:
memory_extractor
-> authority_classifier
-> conflict_detector
-> verification_gate
-> authority_mapper
-> report_writer
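The chain above can be sketched as a sequential pipeline in which each stage sees the accumulated state and its output is stored under its own name. This is an in-process illustration only: the deployed agents are separate services, and the run_audit function, the state dictionary, and the stub stages are hypothetical stand-ins.

```python
from typing import Callable

Stage = Callable[[dict], dict]

# Stage names follow the chain shown above.
STAGE_ORDER = [
    "memory_extractor",
    "authority_classifier",
    "conflict_detector",
    "verification_gate",
    "authority_mapper",
    "report_writer",
]


def run_audit(raw_text: str, stages: dict[str, Stage]) -> dict:
    """Run the stages in order, keeping every intermediate output."""
    state: dict = {"raw_text": raw_text, "trace": []}
    for name in STAGE_ORDER:
        state[name] = stages[name](state)
        state["trace"].append(name)
    return state


# Stub stages that only report how many earlier outputs they could see.
stub_stages: dict[str, Stage] = {
    name: (lambda state: {"inputs_seen": len(state) - 2}) for name in STAGE_ORDER
}

result = run_audit("- Never share credentials.", stub_stages)
print(result["trace"])
print(result["report_writer"])
```

Because every intermediate output stays in the state, a line in the final report can be traced back to the stage that produced it.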
The live app is here:
https://memory-authority-auditor-web-992750435781.us-central1.run.app
The product repo is here:
https://github.com/keniel13-ui/memory-authority-auditor
The research repo is here:
https://github.com/keniel13-ui/ai-memory-judgment-demo
What Is Open
Three gaps are still open.
First: write-time authorization.
The auditor reads memories after they exist. It does not yet decide whether an agent was allowed to write an authority-bearing memory in the first place.
Second: operation-bound authorization.
The auditor does not yet inspect a live tool call and compare it to an external grant table. That is the CLAIM-23 direction, not the current product behavior.
Third: conflict resolution.
The conflict detector surfaces stale instructions, loose approvals, and authority collisions. It does not decide which instruction wins in every case. Resolution still requires an arbitration layer or a human reviewer.
Those gaps are not hidden. They are the next build path.
The Ledger Entry
The Memory Authority Auditor is the product layer of the Self-Correcting Systems research series.
It does not replace the research harness. It does not claim benchmark-grade safety. It takes the core authority/relevance distinction and turns it into a working audit workflow for real memory and instruction files.
Public product: https://memory-authority-auditor-web-992750435781.us-central1.run.app
Product repo: https://github.com/keniel13-ui/memory-authority-auditor
Research repo: https://github.com/keniel13-ui/ai-memory-judgment-demo
The research started by asking whether memory should be judged only by relevance.
The auditor answers with a product-shaped question:
Before this memory file governs an agent, what authority risks should a human see?
That is not the final layer.
But it is the first one that makes authority visible before an agent connects memory to action.
This is part of the Self-Correcting Systems research series. Prior articles cover the framework, the authority policy, the access gate, the scoring formula, the metadata precondition, and tool-call authorization. The full series index is at Start Here.