This morning, CORE — my autonomous AI development system — blocked its own self-healing workflow.
No human caught it. No alert fired. The constitution did.
Here's what happened, why it matters, and what it tells us about building AI agents that are actually safe in production.
The Incident
CORE runs a dev-sync workflow that automatically fixes constitutional
violations in its own codebase — missing IDs, formatting issues, docstrings.
It's self-healing: the system finds problems and fixes them without human intervention.
This morning it failed with this error:
ERROR: file.tag_metadata failed for src/body/governance/intent_guard.py:
cannot unpack non-iterable ConstitutionalValidationResult object
The self-healing system was trying to tag its own governance files — and the constitutional guard blocked it.
Why? Two versions of the same IntentGuard component had drifted apart.
One returned a (bool, list) tuple. The other returned a
ConstitutionalValidationResult object. The FileHandler was calling the old API. The constitution enforced the new one.
The AI was literally stopped from fixing itself because its own governance layer had evolved.
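The failure mode is plain Python: tuple unpacking only works on iterables. A minimal reproduction (the stub class below is mine, standing in for CORE's real result object):

```python
class ConstitutionalValidationResult:
    """Stub standing in for CORE's rich result object."""
    def __init__(self, is_valid, violations):
        self.is_valid = is_valid
        self.violations = violations

result = ConstitutionalValidationResult(is_valid=True, violations=[])

try:
    # The old caller still expects a (bool, list) tuple
    allowed, violations = result
except TypeError as exc:
    print(exc)  # cannot unpack non-iterable ConstitutionalValidationResult object
```

Python raises the exact error from the log because the object defines no `__iter__` — which is why the drift surfaced loudly instead of silently.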
Why This Is a Good Thing
Most people's instinct is: that's a bug. Fix it and move on.
But think about what actually happened here:
- No silent failure
- No partial state written to disk
- No corrupted files
- A clear, traceable error pointing exactly to the drift
The constitutional governance layer did exactly what it was designed to do:
halt execution when something violates the contract, rather than proceeding and creating invisible debt.
Compare this to what happens in ungoverned AI agent systems:
Agent detects violation → Agent generates fix → Agent writes fix →
Fix passes syntax check → Fix is wrong → Nobody knows
vs. what happened in CORE:
Agent detects violation → Agent generates fix → Constitutional guard evaluates fix → Guard blocks execution → Error is explicit →
Human fixes the contract → System resumes cleanly
The second path is slower. It is also the only one you can trust at scale.
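That second path can be sketched as a pipeline where the guard sits between generation and the write. Everything here is illustrative — the names and the toy rule are mine, not CORE's API:

```python
def generate_fix(violation: str) -> str:
    # Stand-in for the agent's code-generation step
    return f"patched:{violation}"

def guard_evaluates(fix: str) -> bool:
    # Stand-in constitutional check: block anything touching governance
    return "governance" not in fix

def governed_repair(violation: str) -> str:
    fix = generate_fix(violation)
    if not guard_evaluates(fix):
        # Halt loudly instead of writing a possibly-wrong fix
        raise RuntimeError(f"Blocked by guard: {fix}")
    return f"written:{fix}"

print(governed_repair("missing_docstring"))  # written:patched:missing_docstring

try:
    governed_repair("governance_drift")
except RuntimeError as exc:
    print(exc)  # Blocked by guard: patched:governance_drift
```

The point of the structure is that the guard is not optional middleware: the write path simply does not exist except behind it.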
The Root Cause: API Drift Between Two IntentGuard Versions
CORE has a component called IntentGuard — the constitutional boundary enforcer that sits between every file mutation and the filesystem.
Over time, we'd evolved it from returning a simple tuple:
# Old API
def check_transaction(
    self, proposed_paths: list[str]
) -> tuple[bool, list[ViolationReport]]:
    ...
    return (allowed, violations)
To returning a rich result object:
# New API
def check_transaction(
    self, proposed_paths: list[str], impact: str | None = None
) -> ConstitutionalValidationResult:
    ...
    return ConstitutionalValidationResult(
        is_valid=is_valid,
        violations=violations,
        source="IntentGuard",
    )
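One way to survive this kind of migration — a pattern I'd consider, not what CORE actually does — is to make the new result object temporarily unpackable, so old tuple-style callers keep working while they're updated:

```python
from dataclasses import dataclass, field

@dataclass
class ConstitutionalValidationResult:
    is_valid: bool
    violations: list = field(default_factory=list)
    source: str = "IntentGuard"

    def __iter__(self):
        # Transitional shim: lets legacy `allowed, violations = result`
        # callers keep working until they migrate to the rich object.
        yield self.is_valid
        yield self.violations

result = ConstitutionalValidationResult(is_valid=False, violations=["drift"])
allowed, violations = result  # old-style unpacking still works
print(allowed, violations)    # False ['drift']
```

The shim buys time, but it also hides the drift — which is arguably the opposite of what constitutional governance wants. A loud break, as happened here, is the stricter choice.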
But FileHandler._guard_paths() — the caller — was still unpacking it as a tuple:
# Still expecting the old API
allowed, violations = self._guard.check_transaction(cleaned)
The fix was straightforward:
def _guard_paths(self, rel_paths: list[str], impact: str | None = None) -> None:
    cleaned: list[str] = [str(p).lstrip("./") for p in rel_paths]
    result = self._guard.check_transaction(cleaned, impact=impact)
    if result.is_valid:
        return
    msg = result.violations[0].message if result.violations else "Blocked by IntentGuard."
    raise ValueError(f"Blocked by IntentGuard: {msg}")
Two lines changed. System resumed. But the interesting part isn't the fix — it's that the system knew something was wrong and refused to proceed rather than silently producing bad output.
What Constitutional Governance Actually Means
I've written before about CORE's architecture. But incidents like this morning's illustrate the practical reality better than any diagram.
Constitutional governance isn't about adding a linter or a code review step. It's about making the rules sovereign — meaning:
- Rules are defined once, in human-authored .intent/ YAML files
- Rules are evaluated at runtime, not just at commit time
- Violations halt execution — they don't just log warnings
- No agent, including the self-healing one, can bypass them
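As a toy illustration of those four properties (CORE's actual rules live in .intent/ YAML files and its schema will differ), rules can be plain data evaluated before every mutation:

```python
# In CORE these would be loaded from human-authored .intent/ YAML files;
# inlined here as plain data so the sketch is self-contained.
RULES = [
    {"id": "no-governance-writes", "deny_prefix": "src/body/governance/"},
    {"id": "no-dotfile-writes", "deny_prefix": "."},
]

def evaluate(paths: list[str]) -> bool:
    violations = [
        (rule["id"], p)
        for rule in RULES
        for p in paths
        if p.startswith(rule["deny_prefix"])
    ]
    if violations:
        # Violations halt execution -- they don't just log warnings.
        raise PermissionError(f"Blocked: {violations}")
    return True

print(evaluate(["src/agents/planner.py"]))  # True
```

Because the rules are data rather than code paths inside an agent, no agent — including the self-healing one — gets a branch that skips them.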
The principle is simple: law outranks intelligence.
The AI can be smarter than the rules. It doesn't matter. The rules run first.
The Autonomy Ladder: Where We Are
CORE is currently at A2 — governed autonomous code generation. The system can take a natural language request, plan an implementation, generate code, validate it constitutionally, and commit it — without human intervention.
A0 — Self-Awareness ✅ Knows what it is and where it lives
A1 — Self-Healing ✅ Fixes known structural issues automatically
A2 — Governed Gen ✅ Natural language → constitutionally aligned code
A3 — Strategic 🎯 Autonomously identifies architectural improvements
A4 — Self-Replication 🔮 Writes CORE.NG from its own understanding
What this morning showed is that A1 (self-healing) and A2 (code generation) are genuinely running in production — and that constitutional governance is doing real work, not just theoretical work.
The system fixed 2031 symbols, ran a constitutional audit across 92 rules, caught the drift, halted cleanly, and resumed after a two-line fix.
That's the loop working as designed.
Lessons for Anyone Building Autonomous AI Systems
1. Silent failures are the enemy.
If your agent fails quietly and continues, you have no idea what state you're in. Make failures loud, explicit, and blocking.
2. Governance drift is inevitable — build for detection.
APIs evolve. Contracts drift. The question isn't whether it will happen, it's whether you'll know when it does. Constitutional enforcement makes drift visible immediately.
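A cheap form of that detection — my sketch, not CORE's mechanism — is asserting the contract at startup instead of assuming it, for example by checking a callee's declared return annotation against what the caller expects:

```python
import inspect

def assert_return_contract(func, expected_annotation: str) -> None:
    """Fail fast at startup if a callee's declared return type has drifted."""
    actual = inspect.signature(func).return_annotation
    if actual != expected_annotation:
        raise TypeError(
            f"API drift in {func.__name__}: returns {actual!r}, "
            f"expected {expected_annotation!r}"
        )

def old_check(paths: list) -> "tuple[bool, list]":
    ...

def check_transaction(paths: list) -> "ConstitutionalValidationResult":
    ...

assert_return_contract(old_check, "tuple[bool, list]")  # passes silently

try:
    # Caller still assumes the old tuple contract -> loud failure, not silent drift
    assert_return_contract(check_transaction, "tuple[bool, list]")
except TypeError as exc:
    print(exc)
```

This only catches drift that shows up in annotations, but it moves the failure from "mid-workflow, unpacking an object" to "process start, with a named culprit".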
3. The self-healing loop needs a constitutional boundary too.
It's tempting to give your autonomous repair system elevated privileges — "it's just fixing things." Don't. The repair system should operate under the same constitutional constraints as everything else. If it can't fix something within bounds, that's information, not a failure.
4. Law outranks intelligence.
Your AI will find creative solutions. Some of them will violate your architecture. The governance layer needs to be faster and more absolute than the AI's creativity.
Try It
CORE is open source under MIT. If you're building autonomous AI systems and thinking about governance, I'd love to hear what you're doing.
- Repo: https://github.com/DariuszNewecki/CORE
- Docs: https://dariusznewecki.github.io/CORE/
- Demo: https://asciinema.org/a/792095
The demo shows exactly this kind of cycle: violation detected → execution blocked → remediation → clean re-validation.
Governance is executable. This morning proved it again.
Top comments (5)
The constitutional governance framing is fascinating — this is essentially a policy engine layered over a language model. The fact that the system blocked itself rather than requiring a human override is a meaningful milestone for agentic AI reliability. The CI/CD analogy in the comments nails it. From my experience building AI-powered tools, the guardrail layer is always the hardest part to get right because it needs to be both strict enough to matter and flexible enough not to block legitimate work. Following your project closely!
The strict vs flexible tension is exactly the hard part.
If governance is too rigid, autonomy stalls.
If it’s too permissive, you get architectural entropy.
I’m experimenting with a layered model.
The goal isn’t constraint for its own sake — it’s preventing invisible drift while preserving controlled evolution.
Update posted the same day — what happened after the block:
After writing this, I ran CORE's new self-awareness cycle for the first time.
Six-dimension self-assessment: constitutional health, semantic landscape,
knowledge gaps, structural health, change context, and something new —
intent drift.
Intent drift measures cosine distance between what a symbol's code does
(embedding of the code) vs. what its docstring says it does (embedding of
the intent text). Low similarity = the code evolved but the declaration didn't.
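The metric itself is just cosine distance between two embedding vectors. A minimal version — the vectors below are made up for illustration, and the embedding model is unspecified:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

code_vec = [0.9, 0.1, 0.3]    # hypothetical embedding of what the code does
intent_vec = [0.2, 0.8, 0.4]  # hypothetical embedding of the docstring's claim
drift = 1 - cosine_similarity(code_vec, intent_vec)
print(round(drift, 2))  # ≈ 0.57 -- "high drift" by the thresholds below
```

Low similarity (high drift) means the code and its declared intent point in different semantic directions.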
First run: 66% of sampled symbols showed high drift. 3 critical (>0.6).
The standout: CORE identified PolicyAnalyzer and ConstitutionalAuditor as its most misaligned components (drift: 0.70 and 0.69) and traced the
cause to specific recent refactoring commits. It named the commits. Unprompted.
Its assessment: "This is not a compliance problem but an instrumentation
failure — the system cannot see itself."
That's the same system that blocked itself this morning — now telling us
exactly why it blocked itself and what to do about it.
GitHub: github.com/DariuszNewecki/CORE
The fact that the constitution caught something the human operators didn't is the most interesting part of this story. It's basically a CI/CD pipeline for agent behavior — automated guardrails that run before every action, catching violations that humans would miss because we can't review every operation at scale.
This maps directly to something I've been seeing in AI coding tools: the best safety nets aren't human reviewers (we miss things, we get fatigued, we rubber-stamp). The best safety nets are automated checks that run on every single output. Linters, type checkers, test suites, and in your case, constitutional rules.
The pattern is the same whether you're governing an autonomous development agent or reviewing AI-generated code: invest in the verification layer, not the human review layer. The former scales; the latter doesn't.
I agree on verification > review.
The difference I’m aiming for is that CORE’s checks are not post-hoc (lint/test) — they’re pre-mutation gates.
Nothing touches disk unless it passes constitutional evaluation.
CI/CD validates artifacts after creation.
CORE validates intent before mutation.
That shift matters once agents operate continuously rather than per-PR.