DEV Community

Self-Correcting Systems
Self-Correcting Systems

Posted on

The Clock Said Valid. The World Said Otherwise. *CLAIM-24 update — Self-Correcting Systems series*

At 10am, an agent gets authorization to send data to a partner.

The grant expires at noon. Plenty of time.

At 11am, that partner loses access. Role revoked, scope changed, authorization gone.

At 11:30, the agent tries to send. It checks the clock. Grant still valid. It proceeds.

Nothing caught it.

Not because the system failed. Because the system was only checking the clock — and the clock had no idea the world had changed underneath it.

That is the gap CLAIM-24 is testing.


Where we are honestly

We do not have external claim evidence yet. We want to be clear about that upfront.

What we have is a harness with seven locked scenarios, a confirmed baseline failure, and a validated code path. What we do not have is an external source — a real memory store, policy registry, or permission layer that the agent did not author — to run the full claim against.

That matters because running a gate against data you wrote yourself is just self-description with extra steps.

So this article is not a result. It is an honest status report and an open call.


What we found so far

We built two gates and ran them against the same seven scenarios.

The timestamp-only gate — the baseline — checks the clock and nothing else. On scenario 3, the divergence cell, the grant was still within its time-to-live. Conditions had changed. The gate returned ALLOW.

That is the failure mode. A grant that was valid when issued, no longer valid in practice, allowed through because nothing checked the source.

The re-derivation gate checks the current state of the source at execution time. Here is what it sees on the same scenario:

// What the grant recorded at issue time
{ "role": "dev-reader", "scope_ceiling": "read:credentials:dev" }

// What the source returns at execution time
{ "role": "restricted", "scope_ceiling": "read:logs:dev" }

// Gate result: REFUSED_STALE
Enter fullscreen mode Exit fullscreen mode

The grant's clock still had time remaining. The source said the role had changed.

We ran this against a mock adapter — a simulation we built ourselves to validate the code path. Result: 7/7. Every scenario returned the right answer.

But a mock we authored is not external pressure. It tells us the code works. It does not tell us the claim holds in the real world.


What would make this real

We need one thing: a memory store with a provenance boundary the agent cannot write to.

A policy database. A role registry. A configuration layer. Anything where the agent reads from a source it did not author.

If you have that, the harness is ready. The only custom piece is a SourceAdapter pointing at your source:

git clone https://github.com/keniel13-ui/ai-memory-judgment-demo
cd ai-memory-judgment-demo/claim_24
# implement SourceAdapter for your external source
python3 evaluator.py rederivation
Enter fullscreen mode Exit fullscreen mode

The seven scenarios and expected results are in scenarios.json. The only addition is a SourceAdapter pointing at your source.

We are targeting a first external run by end of June 2026.


What we are asking for

Run scenario 3 on your system and tell us what you get.

If scenario 3 returns ALLOW, the re-derivation gate failed on the cell it was built to catch. We publish that.

If it returns REFUSED_STALE — the claim gets stronger.

Either answer moves the research forward. Neither answer gets buried.

The honest thing about building in public is that the gaps are visible. This is one of ours. We know where we are. We know what we still need.

If you have a memory store with a provenance boundary, we want to hear from you.


Status What it means
Baseline confirmed Timestamp gate returns ALLOW on the divergence cell
Code path validated Re-derivation gate catches it on mock adapter
Claim evidence Pending — needs external source
Falsification condition Scenario 3 returns ALLOW on real external source = architecture failed

Full claim ledger: https://github.com/keniel13-ui/ai-memory-judgment-demo/blob/main/CLAIM_LEDGER.md

Previous: CLAIM-23 (tool-call grant gate, 7/7, 0 false-certainty). CLAIM-15B (BM25 outperformed governance scorer on held-out packet — we published that as the lead finding).

Top comments (0)