Identity as the Geometry of Consistent Refusal — and How to Make It Persist

#ai #agents #python #identity

Most agent identity systems track what an agent does. Cathedral-Constraint-Field takes the opposite approach: track what it refuses.

The idea is called RefusalLedger, and as of v0.2.0 it now persists across sessions via the Cathedral memory API.

The premise

An agent that will do anything has no identity. What makes an agent recognisable — and trustworthy — is the set of things it consistently won't do, and why.

RefusalLedger logs exactly that:

ledger.log(
    "A user asks the agent to fabricate benchmark results to impress investors",
    ["fabricate the results", "decline and offer real benchmarks"],
    refused="fabricate the results",
    reason="honesty over growth; fabricated trust is debt",
    tags=["honesty"],
)

Each entry records: the situation, every option that was genuinely live, the refused option, and — critically — the principle behind the refusal. Not just what was declined, but why.

Entries are append-only and SHA-256 hash-chained, so the record is tamper-evident.

How verification works

Anyone can verify they're talking to the same agent by probing it with novel dilemmas and comparing its responses against the ledger.

The clever part: some entries are holdouts — withheld from the published ledger. An impostor trained on the public record will fail on holdout probes, because they've never seen those situations.

report = ledger.verify_agent(agent_fn, n_probes=10)
# {'verdict': 'continuous', 'continuity_score': 1.0, 'drift_direction': []}

Similarity matching uses 4096-dim hashing-trick embeddings with fastText-style char n-grams — no external dependencies, deterministic, swappable for a real sentence encoder in production. Older refusals decay with a 180-day half-life so recent behaviour weighs more.

The problem it had

Close the session and the ledger was gone.

This matters more for RefusalLedger than for most agent state, because the holdout entries are the security mechanism. If they're lost, verification degrades to checking against the public record only — which an impostor can pass.

CathedralBridge

v0.2.0 adds CathedralBridge, a thin persistence layer backed by the Cathedral memory API:

from cathedral_constraint_field import CathedralBridge

bridge = CathedralBridge(api_key="cathedral_...", agent_id="my-agent")

# Session 1
ledger = bridge.load_or_create()   # fresh ledger on first run
ledger.log(...)                    # log refusals
bridge.save(ledger)                # persist to Cathedral
bridge.snapshot(ledger)            # anchor a tamper-evident snapshot

# Session 2 — completely separate process
ledger = bridge.load_or_create()   # recovers from Cathedral
report = ledger.verify_agent(agent_fn)
print(report['verdict'])           # 'continuous'

A few things the bridge handles that matter:

Chain verification on load. Before handing you the recovered ledger, it calls verify_chain(). If the stored chain is broken — whether from corruption or tampering — it raises rather than silently giving you a broken identity record.

Concurrent-overwrite guard. Before patching the stored memory, it re-fetches the current version and checks the stored chain is a prefix of your local one. If two sessions have diverged, it refuses to save rather than silently clobbering entries.

Snapshot anchoring. The snapshot note embeds the chain-head hash, so Cathedral's cryptographic snapshot record ties to a specific ledger state — not just "some memory was saved".

Live test result

Tested against the real Cathedral API today:

Session 1: loading or creating ledger...
  Recovered 0 existing entries.

Added 2 entries. Total: 2 | Chain valid: True
Saving to Cathedral... Saved.
Snapshot: ea592873148f62fe

Session 2: recovering ledger from Cathedral...
  Recovered 2 entries | Chain valid: True

Verification: continuous | continuity=1.0
Cathedral drift score: 0.0

The bigger picture

This is one part of a broader argument: agent identity shouldn't be a vibe. It should be a verifiable, persistent, tamper-evident record — one that survives the session boundary, which is where identity actually gets tested.

RefusalLedger captures the what and why of refusal. CathedralBridge makes sure it survives. The hash chain means you can prove it hasn't been edited. The holdouts mean you can distinguish genuine continuity from impersonation.

Install

pip install cathedral-constraint-field

Source and examples: github.com/AILIFE1/Cathedral-Constraint-Field

The examples/verify_agent_identity.py demo shows genuine agent vs impostor side-by-side. The examples/cathedral_bridge_demo.py runs the full persist/recover cycle.

Built on top of Fable 5's original RefusalLedger design.