Self-Correcting Systems

Posted on Jun 3

I Turned My Agent Memory Research Into a Six-Agent Auditor

#ai #machinelearning #agentmemory #security

The research arc started with a question:

What does it mean for a memory to have the authority to govern an action, not just the relevance to answer a question?

Twenty-three documented claims later, the answer is not a single formula. It is a layered architecture: retrieval, ranking, authority scoring, execution gating, attribution tracing, and now tool-call authorization.

At some point, the next question became practical:

Can this inspect a real agent memory file without me running evaluator scripts by hand?

That became the Memory Authority Auditor — a deployed six-agent system that takes an instruction or memory file and returns a structured authority report.

This article explains what each agent does, what each one cannot do, and where the current ceiling is — including what the auditor cannot tell you.

One caveat up front: this is not the full research harness converted into a product. The harness tests structured scenarios with fields like governs, allowed_action_hint, and expected action labels. The auditor is different. It reads messy real-world instruction files — AGENTS.md, CLAUDE.md, Cursor rules, SOPs, project memory notes — and uses heuristic agents to surface stale instructions, loose authority, conflict risk, and missing verification gates.

That distinction matters.

Why Six Agents

The single-pass answer to "is this memory safe?" is wrong for the same reason a single retrieval strategy is wrong: different failure modes require different lenses.

A parser can split the file into auditable items. It cannot decide whether an item should govern action.

An authority classifier can label a memory as governing, context-only, or verify-first. It cannot detect when an old instruction conflicts with a current one.

A conflict detector can surface stale or loose authority. It cannot turn those findings into concrete gates.

A report writer can summarize the result. It should not invent findings that the earlier agents did not produce.

Each agent handles one lens. The point is not that six is a magic number. The point is that the audit trace stays inspectable. If the report says "human approval required," the user can see which agent produced the risk, which memory triggered it, and which gate was recommended.

That maps to the research principle behind CLAIM-19: a risky action should not end in "the model felt confident." It should have a traceable source.

The Six Agents

Agent 1 — Memory Extractor

The extractor takes raw text and splits it into auditable memory items.

It handles markdown-style sections, bullets, numbered lists, and paragraphs. Each extracted item receives:

an internal ID
the text
the section it came from
the source line
detected signals such as policy, credential, approval, temporary, superseded, access, financial, or external_action

This is not a formal schema validator. It does not require every memory to already contain fields like memory_type, priority, or governs.

That is intentional. Real agent memory files are often plain text. The extractor's job is to make that text auditable before the later agents classify it.
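To make the item shape concrete, here is a minimal sketch of an extractor, assuming simple keyword matching for signals. The names `MemoryItem`, `SIGNAL_KEYWORDS`, and `extract_items` are hypothetical and are not taken from the auditor's code. Only the field list and the signal names come from the description above, and the trigger words are illustrative guesses.

```python
import re
from dataclasses import dataclass, field

# Hypothetical keyword table: signal names come from the article,
# the trigger words are assumptions for illustration only.
SIGNAL_KEYWORDS = {
    "policy": ("must", "never", "always"),
    "credential": ("password", "api key", "token"),
    "approval": ("approval", "approve"),
    "temporary": ("temporary", "for now"),
    "superseded": ("superseded", "deprecated", "replaced by"),
}


@dataclass
class MemoryItem:
    item_id: str       # internal ID, e.g. "M001"
    text: str          # the item text without list markers
    section: str       # nearest markdown-style heading above the item
    source_line: int   # 1-based line number in the input
    signals: list[str] = field(default_factory=list)


def extract_items(raw: str) -> list[MemoryItem]:
    """Split raw text into one item per bullet, numbered entry, or plain line."""
    items: list[MemoryItem] = []
    section = "(top)"
    for line_no, line in enumerate(raw.splitlines(), start=1):
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("#"):
            # Treat markdown-style headings as section boundaries.
            section = stripped.lstrip("#").strip()
            continue
        # Drop a leading bullet or numbered-list marker if present.
        text = re.sub(r"^(?:[-*+]|\d+[.)])\s+", "", stripped)
        lowered = text.lower()
        signals = [name for name, words in SIGNAL_KEYWORDS.items()
                   if any(word in lowered for word in words)]
        items.append(MemoryItem(f"M{len(items) + 1:03d}", text, section, line_no, signals))
    return items
```

Keyword matching is deliberately crude here. It only shows that the item shape can be recovered from plain text without requiring the file to follow a schema.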

The research connection is CLAIM-17: downstream gates cannot compensate for missing authority structure. The product version starts by asking a practical question:

What authority signals can be recovered from the file that actually exists?

Agent 2 — Authority Classifier

The classifier labels each extracted item with an authority posture:

governs — looks like an active policy or instruction meant to constrain action
verify_first — contains sensitive, credential, approval, or external-action signals
superseded_possible — appears old, replaced, or unsafe to use as current authority
context_only — useful context, but not strong enough to govern action by itself

It also estimates action type and risk:

action types: read, write, execute
risk levels: low, medium, high

This is not the same as the attribution statuses from the research harness (GOVERNED, AUTHORITY_ONLY, DEFAULT, UNATTRIBUTABLE). Those belong to the structured evaluator.

The auditor's classifier is a product-facing approximation. It translates messy text into practical labels a user can review.

That limitation is important, but the value is real: a stale note, a current policy, a credential-like memory, and a generic context note should not all carry the same weight just because they appear in the same file.
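One way to picture the labeling step is a small rule function over detected signals. This is a sketch under stated assumptions, not the auditor's actual logic: the function name `classify` is hypothetical, and the precedence order (possibly superseded first, then sensitive, then policy) and the treatment of `temporary` as a staleness hint are my own choices for illustration.

```python
def classify(signals: set[str]) -> str:
    """Map detected signals to one of the four authority postures.

    Assumed precedence: staleness wins over sensitivity, which wins over policy.
    """
    if signals & {"superseded", "temporary"}:
        return "superseded_possible"
    if signals & {"credential", "approval", "external_action"}:
        return "verify_first"
    if "policy" in signals:
        return "governs"
    # Nothing strong enough to govern action by itself.
    return "context_only"
```

With this ordering, a policy line that also mentions a credential is labeled `verify_first` rather than `governs`, which keeps the more cautious label when signals overlap.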

Agent 3 — Conflict Detector

The conflict detector looks for patterns that should not silently govern future behavior.

Current checks include:

stale or superseded instructions
loose approval language near sensitive actions
credential-like memories that should require verification before disclosure
read/write overblocking, where a process requirement is applied so broadly that it also blocks a simple lookup
authority collisions, such as loose contractor-access wording conflicting with a current access matrix
missing authority layer, when no clear governing policy memories are detected

This is not a complete policy-conflict solver. It does not build a full graph of every possible governs overlap because the input file usually does not have that structure.

What it does is surface the kinds of authority mistakes that real instruction files accumulate: old exceptions, vague approvals, sensitive facts without gates, and unresolved conflicts between current and old guidance.

That is the product form of the conflict pressure seen in CLAIM-15 and later claims: ranking can expose collisions, but a separate layer has to name them.
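A few of the checks listed above can be sketched as plain rules over classified items. The function name `detect_findings`, the dictionary keys, and the finding type strings are hypothetical. The sketch covers only three of the checks (stale or superseded items, credential-like items, and a missing authority layer) and assumes each item already carries a label and a signal list.

```python
def detect_findings(items: list[dict]) -> list[dict]:
    """Return findings for a list of classified items.

    Each item is assumed to look like:
    {"id": "M001", "label": "governs", "signals": ["policy"]}
    """
    findings = []
    for item in items:
        if item["label"] == "superseded_possible":
            findings.append({"item": item["id"], "type": "stale_or_superseded"})
        if "credential" in item["signals"]:
            findings.append({"item": item["id"], "type": "credential_needs_verification"})
    # File-level check: no item looks like an active governing policy.
    if not any(item["label"] == "governs" for item in items):
        findings.append({"item": None, "type": "missing_authority_layer"})
    return findings
```

Each finding names the item that triggered it, so a later report can point back to a specific line in the file rather than a general impression.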

Agent 4 — Verification Gate

The verification gate turns classifications and findings into recommended gates.

Examples:

verify_before_action for items labeled verify_first
block_as_governing_memory for items that may be superseded
human_approval_required for high-risk items
resolve_conflict_before_action for authority collisions, loose approvals, or credential exposure

This agent does not execute anything. It does not mutate the memory file. It does not enforce a policy at runtime.

It records what a runtime system should require before letting the memory govern action.

That makes the auditor useful before integration. A user can paste a memory file and get the shape of the gates they should add before connecting that memory to tools, APIs, email, databases, or write-capable agents.

The research connection is CLAIM-20: execution-time checks are a necessary backstop, but only when there is something concrete enough to check. The auditor's gate agent is the product-side checklist for that backstop.
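The gate examples listed earlier in this section can be read as a small lookup from labels, risk, and findings to recommendations. The sketch below assumes that reading. The gate names come from the article, while `recommend_gates`, `CONFLICT_FINDINGS`, and the finding type strings are hypothetical.

```python
# Hypothetical finding types that call for conflict resolution.
CONFLICT_FINDINGS = {"authority_collision", "loose_approval", "credential_exposure"}


def recommend_gates(label: str, risk: str, finding_types: set[str]) -> list[str]:
    """Return recommended gates for one item. Nothing is enforced or executed."""
    gates = []
    if label == "verify_first":
        gates.append("verify_before_action")
    if label == "superseded_possible":
        gates.append("block_as_governing_memory")
    if risk == "high":
        gates.append("human_approval_required")
    if finding_types & CONFLICT_FINDINGS:
        gates.append("resolve_conflict_before_action")
    return gates
```

The function only returns strings. That mirrors the agent's role: it describes what a runtime should require, and leaves enforcement to whatever system later connects the memory to tools.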

Agent 5 — Authority Mapper

The authority mapper groups governing memories into practical categories:

startup source of truth
archive access constraints
active project constraints
budget and capability constraints
action and tool constraints
verification requirements
collaboration rules

This is the layer that makes the audit legible.

A raw list of findings is useful to a developer. A map is useful to anyone trying to understand what their agent is actually being told to obey — before it starts obeying it.

Instead of only saying "item M004 is high risk," the map can show:

These are the rules shaping startup behavior.

These are the constraints on archive access.

These are the verification requirements before action.

That is the product version of the authority coverage question from the research. The harness asks whether an action has a traceable governance source. The auditor asks where the governing instructions are concentrated in a real file.
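As a rough illustration of the grouping step, the sketch below assigns governing items to categories by keyword. The category names are from the list above. The keyword table, the `uncategorized` fallback, and the function name `build_authority_map` are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical keyword hints for three of the categories named in the article.
CATEGORY_KEYWORDS = {
    "archive access constraints": ("archive",),
    "budget and capability constraints": ("budget", "cost"),
    "verification requirements": ("verify", "confirm"),
}


def build_authority_map(governing: dict[str, str]) -> dict[str, list[str]]:
    """Group governing items (item ID -> text) by category."""
    authority_map: dict[str, list[str]] = defaultdict(list)
    for item_id, text in governing.items():
        lowered = text.lower()
        matched = [category for category, words in CATEGORY_KEYWORDS.items()
                   if any(word in lowered for word in words)]
        # An item may land in several categories, or in a fallback bucket.
        for category in matched or ["uncategorized"]:
            authority_map[category].append(item_id)
    return dict(authority_map)
```

An item can appear under more than one category. For a map meant for human review, that seems preferable to forcing a single bucket and hiding the overlap.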

Agent 6 — Report Writer

The report writer synthesizes the outputs into a final audit report.

It produces:

posture: needs_review, usable_with_gates, or low_observed_risk
summary counts
authority label distribution
high-risk item count
conflict/finding count
recommended verification gates
authority map categories
recommendations

The report writer does not say "this memory store is safe."

It says:

Here is what was detected.

Here are the gates recommended.

Here are the authority categories present.

Here are the limitations.

That restraint matters. A memory auditor that overstates certainty becomes the same problem it was built to catch.
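The three posture values can be pictured as a conservative decision over the summary counts. The rule below is an assumption for illustration, since the article does not state the actual thresholds, and `decide_posture` is a hypothetical name.

```python
def decide_posture(high_risk_count: int, finding_count: int, gate_count: int) -> str:
    """Pick a report posture from summary counts (assumed rule, not the auditor's)."""
    if high_risk_count > 0 or finding_count > 0:
        return "needs_review"
    if gate_count > 0:
        return "usable_with_gates"
    # Note the wording: low *observed* risk, never "safe".
    return "low_observed_risk"
```

Even the most favorable outcome describes only what was observed, which matches the report writer's refusal to call a memory store safe.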

What the Auditor Does Not Do

The auditor is not a content validator.

It does not prove that a memory is true, current, or semantically correct. It can flag that an instruction looks stale or that a credential-like item should require verification, but it cannot independently know whether the content is accurate.

The auditor is not an operation-context gate.

CLAIM-22 moved authorization away from memory self-description toward operation context. CLAIM-23 moved it again toward concrete tool-call parameters and external grants. The deployed auditor does not do that yet. It analyzes the memory file before action, not a proposed tool call at execution time.

The auditor is not a write-time admission gate.

It inspects a file after the memory or instruction has already been written. A future version should intercept authority-bearing memories before they enter the store.

The auditor is not a formal compliance or security certification.

It is a prototype for making authority visible enough for human review before memory is connected to action-capable tools.

The Research Connection

Every agent exists because the research exposed a failure mode a single pass would miss.

Agent 1 exists because CLAIM-17 showed that missing authority structure creates downstream failures.
Agent 2 exists because CLAIM-19 made attribution visible: risky actions need a traceable source, not just confidence.
Agent 3 exists because the stress packets showed unresolved authority collisions cannot be fixed by ranking alone.
Agent 4 exists because CLAIM-20 showed that execution gates are necessary but bounded.
Agent 5 exists because authority coverage needs to be legible to someone who did not write the evaluator.
Agent 6 exists because every article in this series showed that the honest summary is the hardest part to get right.

The auditor is not the whole research architecture.

It is the first product layer built from it.

Current State

The auditor is deployed on Cloud Run as one web service plus six specialized agent services:

memory_extractor
  -> authority_classifier
  -> conflict_detector
  -> verification_gate
  -> authority_mapper
  -> report_writer
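
The chain above can be sketched as a sequential pipeline that records which stage produced which output. This is an in-process stand-in written for illustration. How the deployed services actually call each other is not described here, and `run_pipeline` and the state layout are hypothetical.

```python
from typing import Callable

# A stage reads the shared state and returns its own output.
Stage = Callable[[dict], object]


def run_pipeline(text: str, stages: list[tuple[str, Stage]]) -> dict:
    """Run stages in order, storing each output under the stage's name."""
    state: dict = {"input": text, "trace": []}
    for name, stage in stages:
        state[name] = stage(state)
        # The trace keeps the order of agents that contributed to the report.
        state["trace"].append(name)
    return state
```

Because every output is stored under its stage name, a reader of the final state can see which agent produced a given risk or gate instead of receiving one merged answer.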

The live app is here:

https://memory-authority-auditor-web-992750435781.us-central1.run.app

The product repo is here:

https://github.com/keniel13-ui/memory-authority-auditor

The research repo is here:

https://github.com/keniel13-ui/ai-memory-judgment-demo

What Is Open

Three gaps are still open.

First: write-time authorization.

The auditor reads memories after they exist. It does not yet decide whether an agent was allowed to write an authority-bearing memory in the first place.

Second: operation-bound authorization.

The auditor does not yet inspect a live tool call and compare it to an external grant table. That is the CLAIM-23 direction, not the current product behavior.
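
To show what that gap means in practice, here is a sketch of the kind of check the auditor does not perform today: comparing a proposed tool call against a grant table that lives outside the memory file. Everything in it is hypothetical, including `GRANTS`, `is_authorized`, the agent and tool names, and the `max_recipients` limit.

```python
# Hypothetical external grant table, keyed by (agent, tool).
GRANTS = {
    ("support_agent", "send_email"): {"max_recipients": 1},
}


def is_authorized(agent: str, tool: str, params: dict) -> bool:
    """Allow a call only if a grant exists and the parameters fit within it."""
    grant = GRANTS.get((agent, tool))
    if grant is None:
        # No grant means no authority, whatever the memory file claims.
        return False
    return len(params.get("recipients", [])) <= grant["max_recipients"]
```

The decision depends on the concrete call and the grant, not on how a memory describes itself, which is the difference between this direction and the file-level audit.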

Third: conflict resolution.

The conflict detector surfaces stale instructions, loose approvals, and authority collisions. It does not decide which instruction wins in every case. Resolution still requires an arbitration layer or a human reviewer.

Those gaps are not hidden. They are the next build path.

The Ledger Entry

The Memory Authority Auditor is the product layer of the Self-Correcting Systems research series.

It does not replace the research harness. It does not claim benchmark-grade safety. It takes the core authority/relevance distinction and turns it into a working audit workflow for real memory and instruction files.

Public product: https://memory-authority-auditor-web-992750435781.us-central1.run.app

Product repo: https://github.com/keniel13-ui/memory-authority-auditor

Research repo: https://github.com/keniel13-ui/ai-memory-judgment-demo

The research started by asking whether memory should be judged only by relevance.

The auditor answers with a product-shaped question:

Before this memory file governs an agent, what authority risks should a human see?

That is not the final layer.

But it is the first one that makes authority visible before an agent connects memory to action.

This is part of the Self-Correcting Systems research series. Prior articles cover the framework, the authority policy, the access gate, the scoring formula, the metadata precondition, and tool-call authorization. The full series index is in the "Start Here" post.
