Self-Correcting Systems

Posted on May 25

AI Memory Needs an Authority Policy, Not Just More Context

#agents #ai #llm #programming

When records conflict, the agent needs explicit rules for which one is allowed to steer the answer.

Long-running AI systems eventually retrieve conflicting but individually valid memories:

an old summary,
a newer source file,
a remembered preference,
an explicit correction,
an unresolved question,
a draft labeled "final,"
an archive note that is useful historically but stale operationally.

Without an authority policy, the agent falls back to implicit heuristics: recency, semantic similarity, confidence of wording, or whatever fits the current prompt.

That is how stale facts start wearing the mask of current truth.

The solution is not just more memory. It is governed memory: a rule for which record wins when retrieved memories disagree.

Principles before layers

I do not think the exact hierarchy should be universal.

A coding agent, research assistant, legal agent, BI agent, and solo writing workflow should not all use the same authority order.

But the policy should follow a few principles.

1. Observed state beats remembered state

Live checks outrank stored memory when the claim depends on current reality: server status, API reachability, latest logs, current price, CI status, or local model availability.

Memory can say what to check. It should not replace the check.

2. Primary records beat compressed records

Summaries are useful because they are fast. They are also lossy.

If a current source file conflicts with a summary, the source file wins. If a raw query result conflicts with a dashboard summary, the query/provenance path deserves inspection before the summary becomes organizational memory.

Compressed memory should orient. Primary records should decide.

3. Negative constraints beat positive preferences

Preferences say what usually works.

Corrections say what must not be repeated.

If memory says:

The user likes vivid language.

but a correction says:

Do not use mythic language in local-business copy.

the correction wins in that project.

Corrections are not permanent truth. They are active constraints until reviewed or superseded.

4. Current operating state beats historical plans

"We planned to do X" is not the same as "X is next."

Old plans are useful history. They should not drive current action unless the current state reactivates them.

5. Unresolved claims block premature certainty

Unresolved memory is not low-value memory.

It is a gate.

If one record says:

The agent is wired to the current brain.

and another says:

Local model availability still needs a live check.

the agent can say the wiring exists. It cannot say the full local fallback is available until verified.

The unresolved record does not replace the decision. It limits what certainty is allowed.

A default order for solo agent work

For my current file-based workflow, this is the v0.1 order:

1. Live external checks and current reality
2. Current source files
3. Explicit corrections
4. Current state files
5. Active decisions and priorities
6. Unresolved questions and verification gates
7. Recent summaries
8. Preferences and style guidance
9. Archive / historical memory

This is not a universal law. It is a default for one class of work: multi-session solo building, writing, and agent operation.

For other domains, I would change the order.

Coding agent:
1. tests / compiler / runtime output
2. current source files
3. issue or PR requirements
4. explicit corrections
5. recent implementation notes
6. summaries
7. archive

Research agent:
1. primary sources / source documents
2. current notes with citations
3. explicit corrections
4. unresolved hypotheses
5. summaries
6. preferences
7. archive

Production incident:
1. live system state
2. active incident document
3. current logs / metrics
4. runbook
5. normal roadmap / plans
6. summaries
7. archive

The point is not to worship one ordering. The point is to make the ordering explicit enough to inspect, test, and correct.

Enforcement: assign authority before generation

A hierarchy written in prose is easy for a model to ignore.

The agent needs an application step before it writes the answer:

For each retrieved memory:
1. classify source type
2. classify epistemic status
3. check whether it is current, archived, superseded, unresolved, or verification-required
4. compare it against higher-authority records
5. assign one allowed action:
   - answer
   - answer as context
   - warn
   - verify first
   - block
   - archive only

The key field is allowed_action.

A memory should not automatically steer the answer just because it was retrieved.

Example:

summary says: Article 01 needs a CTA
current source file says: CTA was intentionally removed
correction says: do not re-add sales CTA to Article 01

policy result:
- summary = context only
- source file = answer
- correction = block CTA reintroduction

The agent should not average those records into "maybe add a softer CTA." The correction blocks the action.

For higher-risk tasks, I would require the model to produce a small authority trace before the final answer:

claim: "Article 01 should not include a sales CTA"
winning_source: "current source file + active correction"
conflicting_sources:
  - "old summary saying CTA was needed"
allowed_action: "answer"
blocked_actions:
  - "re-add Gumroad CTA"
verification_required: false

That trace can be hidden from the user in a product, but it should exist somewhere for audit.

File and metadata support

I would not rely on prompt text alone. Agents can ignore or reinterpret subtle instructions.

In a file-based setup, index.md can make the same policy visible:

# Memory Authority Policy

## Order
1. live checks
2. source/
3. corrections.md
4. current_state.md
5. decisions.md
6. unresolved.md
7. summaries.md
8. preferences.md
9. archive/

## Enforcement Notes
- Prefer current source files over summaries.
- Corrections override preferences in their scoped context.
- Verification-required items must be checked before being used as settled facts.
- Archive supports historical answers, not current-state claims.

For structured memory, the same idea becomes metadata:

source_type: source_file | correction | current_state | decision | unresolved | summary | preference | archive
epistemic_status: settled | inferred | contested | unresolved | verification_required | superseded
status: active | ready | parked | blocked | archived | superseded
priority: next | active | later | none
verification_required: true | false
allowed_action: computed_at_runtime

allowed_action should be computed at retrieval time. A memory that could answer yesterday may need verification today.

Dynamic authority needs scope

Sometimes a lower layer should temporarily outrank a higher one.

During a production incident, an active incident file may outrank the normal roadmap. During legal review, approved policy language may outrank product preference. During a live deployment, current logs may outrank yesterday's state file.

Temporary authority needs scope:

temporary_authority:
  mode: production_incident
  elevated_source: incident_current.md
  outranks: roadmap.md
  scope: payment outage response
  expires_when: incident closed

If elevation has no scope or expiry, it becomes a loophole.

Tradeoffs

Authority policies are not free.

They add:

more careful prompting,
more metadata,
more token use when checking conflicts,
more chances for the agent to become overcautious,
more maintenance when the policy itself fails.

A flatter memory setup may be better for short projects, creative exploration, low-stakes drafting, live debugging, or pair programming where the human is actively supervising.

The hierarchy earns its cost when work is long-running, multi-session, expensive to correct, or easy to corrupt with stale context.

What I have measured so far

This is not benchmark-grade.

But I did run a small deterministic calibration pass against the authority-policy logic:

12 scenarios
35 memory objects
3 threshold settings in v0.1
1 risk-adjusted threshold in v0.2
no model generation; policy logic only

The result:

strict:                 29/35 correct, 0 false-certainty errors, 6 overblocking errors
balanced:               34/35 correct, 0 false-certainty errors, 1 overblocking error
permissive:             35/35 correct, 0 false-certainty errors, 0 overblocking errors
balanced_risk_adjusted: 35/35 correct, 0 false-certainty errors, 0 overblocking errors

The useful finding was not "this is proven."

The useful finding was the tradeoff: strict gating prevented false certainty but overblocked legitimate low-risk answers. A risk-adjusted threshold fixed that edge case on this scenario set.

The next test is harder: external scenarios, external scoring, and an actual retrieval-plus-generation loop.

Failure modes

An authority policy can fail in both directions.

Too loose: stale summaries beat current records.

Example: process health is mistaken for full local-model availability because a stale summary says "agent healthy."

Fix: split process health from model availability and mark model availability verification_required.

Too rigid: the agent overblocks low-risk settled tasks.

Example: a simple design choice is downgraded to a warning because the score is barely below the answer threshold.

Fix: allow low-risk, settled, verification-free memories to answer under a lower threshold.

Dynamic override abuse: temporary elevation becomes permanent.

Fix: require mode, scope, elevated source, and expiry.

Version the policy itself

The authority policy is also memory.

It needs versioning.

authority_policy:
  version: "0.2"
  default_profile: "solo_agent_work"
  active_since: "2026-05-25"
  last_changed_because: "balanced threshold overblocked low-risk settled memory"
  review_trigger: "two repeated hierarchy failures or one high-risk failure"

If the same class of failure repeats, update the policy.

If the hierarchy blocks too many correct answers, loosen the relevant threshold or add a scoped exception.

The policy itself should be corrected by the same system it governs.

Practical check

Before acting on memory:

1. What claim am I making?
2. What source supports it?
3. Is there a conflicting higher-authority source?
4. Is there an active correction?
5. Is the claim unresolved, stale, contested, or verification-required?
6. Is this memory allowed to answer, warn, block, verify first, or only provide history?
7. What would change this answer?

That is the core habit.

Not more memory. Governed memory.

This is part of a short series on AI memory as judgment infrastructure: the zero-budget foundation, correction memory, preserving unresolved questions, and three failures that tested the system.