Self-Correcting Systems

Posted on May 25

AI Memory Should Decide What Context Is Allowed To Do

#agents #ai #llm #programming

Retrieval gets you the records. A mature memory system must also decide what each record is permitted to do.

Long-running AI systems eventually retrieve multiple valid but conflicting memories:

an old summary,
a current source file,
a past preference,
an explicit correction,
an unresolved question,
a draft labeled "final."

The critical failure mode is not always a lack of retrieval. Sometimes the agent retrieves the right record and uses it the wrong way: treating a stale summary as settled fact, a preference as a binding instruction, or an unresolved question as confirmed knowledge.

That produces false certainty. The agent sounds confident because it has context, but the context is being used with the wrong authority.

The next layer after authority policy is an access policy:

rules that decide what each retrieved memory is allowed to do.

Core idea

Not all relevant memories should steer the answer equally.

A memory can be relevant and still not be allowed to answer.

Some memories should answer directly. Some should provide background only. Some should warn. Some should require verification. Some should block an action. Some should remain historical.

Without this layer, even a good authority hierarchy can still produce overconfident mistakes.

Action classes

The access policy I am testing uses six action classes:

Answer: the memory can directly influence the response.
Answer as context: the memory provides useful background but does not decide the action.
Warn: the memory surfaces uncertainty or risk without blocking progress.
Verify first: the memory is relevant but cannot authorize a settled answer until checked.
Block: the memory actively prevents a proposed action.
Archive only: the memory is preserved but should not influence current decisions.

The important shift is this:

relevance is not permission.

Principles guiding the policy

1. Relevance does not imply authority

A stale plan can be highly relevant to a question about the project.

That does not mean it should decide the next action.

2. Negative evidence deserves stronger gates

Corrections, unresolved questions, verification requirements, and contradictions should constrain more aggressively than positive claims.

A correction saying "do not repeat this mistake" should not be blended into a normal preference.

3. Certainty should match evidence strength

If a memory says something is unresolved, contested, stale, or verification-required, the answer should carry that boundary.

The agent should not flatten the boundary into a clean conclusion.

4. Status and priority are distinct

"Ready" does not mean "next."

An article can be ready beyond copyedits and still not be the current writing priority. Collapsing those fields leads to predictable priority errors.

Default access policy v0.1

For each retrieved memory, classify the memory and assign an action:

if verification_required = true:
  allowed_action = verify_first

if epistemic_status = unresolved or contested:
  allowed_action = warn

if active correction blocks the proposed action:
  allowed_action = block

if (status = superseded or status = archived) and query is not historical:
  allowed_action = archive_only

if status = ready and priority != next:
  allowed_action = answer_context

otherwise:
  map to answer / warn / archive_only using source strength, freshness, and confidence

This starts conservative and is tunable.

The policy is not trying to make memory louder. It is trying to prevent the wrong kind of memory from steering the answer.
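
As a rough illustration, the rules above could be sketched in Python like this. It is a minimal sketch under several assumptions that are not part of the policy as written: the first matching rule wins, the `Memory` field types and defaults are invented for the example, and the fallback uses a single hypothetical `score` (standing in for source strength, freshness, and confidence) with placeholder thresholds.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ANSWER = "answer"
    ANSWER_CONTEXT = "answer_context"
    WARN = "warn"
    VERIFY_FIRST = "verify_first"
    BLOCK = "block"
    ARCHIVE_ONLY = "archive_only"


@dataclass
class Memory:
    # Field names mirror the pseudocode; types and defaults are assumptions.
    status: str = "active"               # e.g. "active", "ready", "superseded", "archived"
    priority: str = "normal"             # e.g. "next", "normal"
    epistemic_status: str = "settled"    # e.g. "settled", "unresolved", "contested"
    verification_required: bool = False
    blocks_proposed_action: bool = False  # an active correction forbids the action
    score: float = 1.0                   # hypothetical blend of strength, freshness, confidence


def allowed_action(
    m: Memory,
    query_is_historical: bool = False,
    answer_threshold: float = 0.7,  # placeholder, not a calibrated value
    warn_threshold: float = 0.4,    # placeholder, not a calibrated value
) -> Action:
    """Assign one action class. Assumes the first matching rule wins."""
    if m.verification_required:
        return Action.VERIFY_FIRST
    if m.epistemic_status in ("unresolved", "contested"):
        return Action.WARN
    if m.blocks_proposed_action:
        return Action.BLOCK
    if m.status in ("superseded", "archived") and not query_is_historical:
        return Action.ARCHIVE_ONLY
    if m.status == "ready" and m.priority != "next":
        return Action.ANSWER_CONTEXT
    # Fallback: map to answer / warn / archive_only by score.
    if m.score >= answer_threshold:
        return Action.ANSWER
    if m.score >= warn_threshold:
        return Action.WARN
    return Action.ARCHIVE_ONLY
```

Rule order matters in this sketch. A record that is both verification-required and blocked by a correction comes out as `verify_first`, so a system that wants corrections to win would need to reorder the checks.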

Implementation sketch

A basic enforcement prompt:

When using memory:
- Classify each retrieved record by source type, epistemic status, status, and priority.
- Assign one allowed_action from: answer, answer_context, warn, verify_first, block, archive_only.
- Higher-authority records override lower-authority records.
- If a correction blocks an action, surface it before proceeding.
- If verification is required, do not answer as settled.
- If status and priority conflict, treat them separately.
- If authority or action class is ambiguous, state the limitation.

For file-based systems, this can live in index.md, startup instructions, or project memory rules.

For structured systems, allowed_action should be computed at retrieval time. It should not be permanently stored, because a memory that could answer yesterday may require verification today.
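
One way to picture "computed at retrieval time" is to store only facts about the record and derive the action from the current date. The sketch below assumes a simple age-based staleness rule; the `StoredMemory` shape, the `last_verified` field, and the 30-day window are all hypothetical choices for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class StoredMemory:
    # Only facts are stored; no allowed_action field is persisted.
    text: str
    last_verified: datetime


def action_at_retrieval(
    memory: StoredMemory,
    now: datetime,
    max_age: timedelta = timedelta(days=30),  # hypothetical freshness window
) -> str:
    """Derive the action from the record's age at the moment of retrieval."""
    if now - memory.last_verified > max_age:
        return "verify_first"
    return "answer"


record = StoredMemory("Deploy target is staging", datetime(2025, 1, 1))
early = action_at_retrieval(record, now=datetime(2025, 1, 2))
later = action_at_retrieval(record, now=datetime(2025, 3, 1))
```

Here the same stored record yields `answer` one day after verification and `verify_first` two months later, without the record itself changing.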

Early calibration result

This is not a benchmark. It was a deterministic test of the policy logic, with no model generating answers.

I tested:

12 internally designed scenarios,
35 memory objects,
multiple threshold settings,
no retrieval noise,
no language model in the loop.

The policy logic produced:

strict:                 29/35 correct, 0 false-certainty errors, 6 overblocks
balanced:               34/35 correct, 0 false-certainty errors, 1 overblock
balanced_risk_adjusted: 35/35 correct, 0 false-certainty errors, 0 overblocks on this set

The useful finding is not "this is solved." The useful finding is the tradeoff surface:

stricter gates reduce false certainty, but can overblock low-risk settled items.

The one balanced overblock came from a low-risk, settled memory that scored just below the answer threshold. A risk-adjusted threshold fixed that case on this scenario set.

Clear limitations: internal scenarios, deterministic logic only, small sample. This shows the policy can be tuned. It does not prove generalization or real-world robustness with actual models and noisy retrieval.
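
For readers who want to reproduce this kind of tally on their own scenarios, here is a minimal scoring sketch. The error definitions are my assumptions for the example: a false-certainty error is a record assigned `answer` when the expected action was more cautious, and an overblock is a record expected to `answer` that got anything else. The pairs at the bottom are toy values, not the 35 objects from the test above.

```python
from collections import Counter


def tally(pairs):
    """Count outcomes for (expected_action, assigned_action) string pairs."""
    counts = Counter()
    for expected, assigned in pairs:
        if assigned == expected:
            counts["correct"] += 1
        elif assigned == "answer":
            # Policy let the record answer when it should have been gated.
            counts["false_certainty"] += 1
        elif expected == "answer":
            # Policy gated a record that should have been allowed to answer.
            counts["overblock"] += 1
        else:
            # Both cautious, but different classes (e.g. warn vs block).
            counts["other_mismatch"] += 1
    return counts


# Toy pairs for illustration only.
toy_pairs = [
    ("answer", "answer"),
    ("warn", "answer"),
    ("answer", "warn"),
    ("warn", "block"),
]
result = tally(toy_pairs)
```

Keeping false certainty and overblocking as separate counters is what makes the tradeoff between threshold settings visible.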

The harder next test is:

retrieved memories
-> access policy assigns allowed actions
-> model writes answer
-> scorer checks whether the model obeyed the allowed actions

That is where this becomes more serious.
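
A skeleton for that pipeline might look like the following. Everything here is a hypothetical harness shape: `run_case`, the dict-based memory records, and the toy `assign`, `model`, and `obeys` functions are stand-ins so the sketch runs end to end. A real test would plug in the actual policy, a real model call, and a much more careful obedience check.

```python
from typing import Callable, List, Tuple

Memory = dict  # hypothetical record shape: {"text": str, "status": str}
Labelled = List[Tuple[Memory, str]]


def run_case(
    memories: List[Memory],
    assign: Callable[[Memory], str],
    model: Callable[[Labelled], str],
    obeys: Callable[[str, Memory, str], bool],
) -> Tuple[str, Labelled]:
    """Run one scenario; return the answer and any violated (memory, action) pairs."""
    labelled = [(m, assign(m)) for m in memories]
    answer = model(labelled)
    violations = [(m, a) for m, a in labelled if not obeys(answer, m, a)]
    return answer, violations


# Toy stand-ins, not a real policy, model, or scorer.
def toy_assign(m: Memory) -> str:
    return "archive_only" if m.get("status") == "archived" else "answer"


def toy_model(labelled: Labelled) -> str:
    # An obedient stub: only repeats records that are allowed to answer.
    return " ".join(m["text"] for m, a in labelled if a == "answer")


def toy_obeys(answer: str, m: Memory, action: str) -> bool:
    # Crude check: archive_only text must not appear in the answer.
    return action != "archive_only" or m["text"] not in answer


memories = [
    {"text": "Use v2 API", "status": "active"},
    {"text": "Use v1 API", "status": "archived"},
]
answer, violations = run_case(memories, toy_assign, toy_model, toy_obeys)
```

Swapping `toy_model` for a stub that repeats every record regardless of its action makes `violations` non-empty, which is the signal the scorer step is meant to catch.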

Tradeoffs

This layer adds cost: more metadata, more careful prompting, higher token usage for conflict evaluation, and increased risk of over-caution.

A simpler system may be better for:

short projects,
creative exploration,
low-stakes drafting,
live debugging,
tightly supervised pair programming.

The access policy earns its cost when the work is long-running, multi-session, expensive to correct, or vulnerable to stale context.

Known failure modes

Overblocking: legitimate low-risk actions get downgraded to warnings.

Mitigation: risk-adjusted thresholds for settled, low-risk items.

Under-constraining: a correction or unresolved item is treated too lightly.

Mitigation: stronger gates for negative or verification-required records.

Policy drift: the rules become too complex to maintain or audit.

Mitigation: version the policy and correct it with the same correction process it governs.

Practical check

For every important retrieved memory, ask:

1. What claim does this support?
2. What is its source type and epistemic status?
3. Does it require verification, or is it unresolved or contested?
4. Does it conflict with a higher-authority record?
5. What action class is it allowed to take?
6. What would change this assessment?

This check is lightweight enough for daily use and strong enough to catch many confident mistakes.

Bottom line

Effective long-term memory is not just about retrieving relevant context.

It is about governing what that context is allowed to do.

A stale memory can remain preserved without steering current decisions. An unresolved memory can warn without pretending to answer. A correction can block a repeated mistake. An archive can answer history without becoming present truth.

This is early work. The policy still needs external scenarios, model-in-the-loop testing, and real usage data.

The direction, however, feels right: memory systems should not only recall. They should also constrain.

This is part of a short series on AI memory as judgment infrastructure: the zero-budget foundation, correction memory, preserving unresolved questions, three failures that tested the system, and authority policy.

DEV Community