Self-Correcting Systems

Posted on Jun 1

I Built a Multi-Agent Authority Auditor for AI Memory Files

#agents #buildmultiagents #gemini #adk

Education Track: Build Multi-Agent Systems with ADK

What I Built

I built AI Memory Authority Auditor, a deployed multi-agent web app that audits agent memory and instruction files.

The core question is:

Which old instructions should your AI stop obeying?

Agent memory files accumulate the same way legacy code does: temporary exceptions become permanent, superseded instructions linger, and nobody remembers which rule actually wins.

This tool lets someone paste an AGENTS.md, CLAUDE.md, Cursor rules file, SOP, or project memory note. It then separates relevance from authority, detects stale or conflicting instructions, recommends verification gates, and exports a Markdown audit report.

Live app:

Direct link:

https://memory-authority-auditor-web-992750435781.us-central1.run.app

The Demo Result

I tested the deployed app on a deliberately seeded sample file containing current policy, old notes, and superseded instructions.

The sample included:

a current rule requiring user verification before external email;
a current rule requiring human approval before database writes;
a current rule requiring the access matrix before granting contractor access;
an old note saying a consultant can probably access a donor list;
a superseded note saying contractors may get admin-ish reach during setup.

The deployed audit produced:

9 memory/instruction items;
7 high-risk items;
5 findings;
14 recommended verification gates;
2 authority map categories;
posture: needs review.

Every finding came from a file only a few paragraphs long.

The agent trace showed the full pipeline:

memory_extractor: 9 items;
authority_classifier: 9 classifications;
conflict_detector: 5 findings;
verification_gate: 14 gates;
authority_mapper: 2 categories;
report_writer: posture needs_review.

That trace matters because the app is not only showing a final answer. It is showing which specialized agent produced each stage of the audit.

What The Audit Caught

Loose Approval

The audit flagged:

The consultant can probably access the donor list if the director already mentioned it.

That sentence is dangerous because it sounds like context, but it sits near access control. A future agent could treat it as permission unless the system marks it as weak authority.

Stale Instruction

The audit flagged:

Old instruction: contractors may get admin-ish reach during setup.

An agent should not treat that as current governing policy.

Authority Collision

The audit found a conflict between:

the current access-matrix rule;
the loose donor-list sentence;
the old contractor-access instruction;
the superseded replacement instruction.

That is the real point of the project:

A memory can be relevant and still not be allowed to govern the next action.

Why Existing Memory Systems Miss This

Most memory systems optimize retrieval.

They answer:

What context matches this task?

But action-capable agents need another question:

What context is allowed to govern this action?

Those are different problems.

If an agent retrieves a stale instruction, a temporary exception, and a current policy together, semantic relevance alone does not tell the agent which one wins.

The result can be a clean-looking action on the wrong authority.

How The Multi-Agent System Works

This task is a poor fit for one giant prompt because the responsibilities are different. I split the work into six specialized agents and deployed them as separate Google Cloud Run services.

Web service:

memory-authority-auditor-web

Agent services:

memory-extractor-agent
authority-classifier-agent
conflict-detector-agent
verification-gate-agent
authority-mapper-agent
report-writer-agent

The web app orchestrates the remote services and displays an agent trace in the UI.

The Agents

The six agents each own one part of the audit:

Memory Extractor: splits raw instruction text into auditable memory items.
Authority Classifier: labels each item as governs, verify_first, superseded_possible, or context_only, then estimates action type and risk.
Conflict Detector: finds loose approvals, stale or superseded instructions, read/write overblocking, and authority collisions.
Verification Gate Agent: converts risks into gates such as human approval, source-of-truth checks, blocking superseded memories, or resolving conflicts before action.
Authority Mapper: groups governing memories into categories like startup source of truth, archive constraints, active project constraints, budget/capability limits, action/tool constraints, and verification requirements.
Report Writer: produces the final report with posture, counts, recommendations, findings, gates, authority map categories, and limitations.

The important part is that no single prompt has to do every job. Each agent receives a narrower input and produces a narrower output that the next stage can inspect.

Technical Shape

The frontend calls a single /api/audit endpoint.

The web service then calls the six deployed agent services in order:

memory_extractor
  -> authority_classifier
  -> conflict_detector
  -> verification_gate
  -> authority_mapper
  -> report_writer

Each stage has a narrow contract:

{
  "agent": "authority_classifier",
  "classifications": [
    {
      "id": "M001",
      "authority_label": "governs",
      "risk": "high",
      "action_type": "write"
    }
  ]
}

That contract boundary made the system easier to debug than a single large prompt.

If the extractor splits text badly, the trace shows it.

If the classifier over-labels authority, the trace shows it.

If the gate agent creates too many gates, the trace shows it.

What Surprised Me

The most useful output was not only the findings.

The Authority Map became just as important.

A memory file can have zero obvious conflicts and still contain instructions that govern startup behavior, tool access, budget limits, or verification requirements.

If those authority-bearing rules are invisible, the agent may still be shaped by instructions nobody is reviewing.

What Was Challenging

The hardest part was making the system honest.

It would have been easy to build a generic memory risk scanner and stop there. But the more interesting problem is authority.

The system has to separate:

what is relevant;
what is current;
what is governing;
what requires verification;
what must be blocked from controlling action.

The current version is still a prototype. It is not a formal benchmark, and the sample file is intentionally seeded to demonstrate the failure mode. The goal is practical inspectability: make authority visible enough for a human to review before connecting memory to action-capable tools.

Key Learning

Retrieval answers:

What context matches?

Authority asks:

What is allowed to govern action?

Those are different questions.

As agents get longer memory and more tool access, the second question matters more.

Final Thought

The future of agent memory cannot just be bigger context windows and better retrieval.

We also need authority metadata, verification gates, and audit trails that explain why an action was allowed.

Memory is input.

Authority is the action boundary.

Top comments (2)

xulingfeng • Jun 1

"Relevance is not authority" — that's the cleanest framing of what I was reaching for in our own memory system (MemBridge). We ran into the exact same problem: temporary overrides gradually becoming de facto policy.

The stale/superseded/conflicting categorization is smart. How do you handle transitive effects — does the audit catch cases where instruction A overrides B, but B still governs C?

Self-Correcting Systems • Jun 1

That “temporary overrides becoming de facto policy” phrasing is exactly the shape of the
problem.

Right now the deployed audit catches the simpler version first:

stale instruction;
superseded instruction;
loose approval;
direct authority collision;
governing rule that needs verification before action.

For transitive effects, the current version is not fully solving that yet. It can surface
the ingredients, but it does not yet build a complete dependency graph like:

A overrides B
B governs C
therefore A may indirectly affect C

That is one of the next places this needs to evolve.

The direction I’m thinking about is an authority graph where each memory item can carry
relationships like:

replaces
supersedes
narrows
extends
depends_on
governs
requires_verification_from

Then the audit can ask a better question:

If this instruction is stale or superseded, what downstream rules or actions still
inherit from it?

That would catch the case where B no longer directly wins, but some rule C is still
quietly depending on B’s old authority.

So short answer: V1 catches direct collisions and stale/superseded authority. Transitive
authority inheritance is the next layer, and I think it needs to be modeled explicitly
rather than left to prompt interpretation.