I am not a programmer. I wrote zero lines of code today. The system fixed itself.
We've been building CORE — a deterministic governance runtime that surrounds AI with constitutional law so that AI mistakes are detectable, traceable, and recoverable. The pitch is simple: a non-programmer governor holds the why, AI and workers handle the how, and the constitution ensures nothing unauthorized happens.
Today we proved it works. Not in a demo. Not against a toy example. Against a real system that had been stuck for four days.
The State of Things This Morning
The autonomous loop — detect violation → propose fix → approve → execute → verify — hadn't produced a successful commit in four days. The dashboard said last_consequence: 4d ago. The blackboard (our shared state surface) had 55 open findings, none of which the loop could act on. Proposals were being generated and immediately rejected as structurally incoherent.
From the outside it looked alive. Twenty active workers, sensors firing, heartbeats posting. But nothing was moving.
The Investigation
We didn't start by writing code. We started by asking questions of the system itself.
The first query revealed the shape of the problem: 150 failed proposals, 0 executed today, the last consequence four days old. Dig deeper: 128 of those 150 failures were the same error — a constitutional gate blocking the same action, over and over. That's not a bug in the traditional sense. That's the system correctly enforcing its own laws while an upstream generator keeps producing proposals that violate them.
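That "dig deeper" step is an ordinary aggregation. As a sketch, assuming a hypothetical `proposals` table with `status` and `error_code` columns (the real schema isn't shown in this post), it might look like:

```python
# Illustrative triage query; table and column names are assumptions.
import sqlite3

def failures_by_error(conn: sqlite3.Connection) -> list:
    # Group failed proposals by error code to spot a dominant cause
    # (here, 128 of 150 sharing one constitutional-gate rejection).
    return conn.execute("""
        SELECT error_code, COUNT(*) AS n
        FROM proposals
        WHERE status = 'failed'
        GROUP BY error_code
        ORDER BY n DESC
    """).fetchall()
```

One query turns "150 failures" into "one gate, 128 times" — which is what redirected the investigation upstream to the generator.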
Then: the 55 "open" findings the remediator was supposed to act on — what were they actually? Mostly blackboard.entry_stale meta-findings. The loop was trying to remediate its own observability noise. The actual code violations — 25 of them, confirmed by audit — were invisible, blocked by their own historical entries sitting in abandoned status, which the sensor dedup treated as permanent silencers.
Seven distinct root causes, nested. Each one blocking the diagnostic of the next.
What We Fixed
In order of discovery:
The stale-finding storm. The BlackboardShopManager was scanning all entry types for SLA violations — including heartbeats with a 10-minute SLA. On every daemon restart, thousands of old heartbeat entries immediately exceeded their SLA. One line added to the WHERE clause: AND entry_type IN ('finding', 'proposal'). Storm stopped: zero new stale findings in the three minutes after the fix, versus three per minute before.
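A minimal sketch of that scan with the added clause in place. The table and column names (`blackboard`, `entry_type`, `created_at`, `sla_seconds`) are assumptions; the real BlackboardShopManager schema may differ.

```python
import sqlite3

def find_sla_violations(conn: sqlite3.Connection) -> list:
    # Before the fix this query scanned ALL entry types, so every
    # heartbeat older than its 10-minute SLA surfaced as a stale finding.
    # The one-line fix restricts the scan to the types that carry work.
    return conn.execute("""
        SELECT id, entry_type FROM blackboard
        WHERE status = 'open'
          AND strftime('%s', 'now') - strftime('%s', created_at) > sla_seconds
          AND entry_type IN ('finding', 'proposal')  -- the added clause
    """).fetchall()
```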
The consequence chain gap. When a proposal completed successfully, the findings it had addressed stayed in deferred_to_proposal status forever. The failure path had a revival method. The success path had nothing. New method: resolve_deferred_entries_for_completed_proposal(). Symmetric with the failure path. Twelve lines of code.
The proposal collapse. The proposal generator was creating proposals for N files but only including one action — always targeting scope.files[0]. A proposal claiming to fix 8 violations would touch exactly one file and leave 7 untouched. The fix: one ProposalAction per affected file, ordered 0 through N-1. The executor already supported multi-action proposals. Nobody had ever wired the generator correctly.
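The corrected wiring is a one-liner once you see it. A sketch, with a hypothetical minimal ProposalAction (the real class surely carries more fields):

```python
from dataclasses import dataclass

@dataclass
class ProposalAction:
    order: int
    target_file: str

def build_actions(scope_files):
    """One action per affected file, ordered 0 through N-1. The broken
    generator emitted a single action targeting scope_files[0] and
    silently dropped the other N-1 files."""
    return [ProposalAction(order=i, target_file=path)
            for i, path in enumerate(scope_files)]
```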
The DELEGATE routing gap. modularity.class_too_large violations — class-level refactors that require human judgment — were marked PENDING in the remediation map. PENDING entries are excluded from the active map by the loader. So those findings were claimed, found unmappable, and released back to open every 60 seconds. Forever. The fix was a YAML status change: PENDING → DELEGATE. The loader already handled DELEGATE entries. One word changed.
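The loader behavior described above, sketched with the YAML map modeled as a dict (entry shapes and the second and third map keys are invented for illustration):

```python
# The remediation map lives in YAML; modeled here as a dict.
remediation_map = {
    "modularity.class_too_large": {
        "status": "DELEGATE",  # was "PENDING": the one-word fix
        "route": "human_review",
    },
    "style.line_too_long": {"status": "ACTIVE", "route": "autofix"},   # hypothetical
    "experimental.rule": {"status": "PENDING", "route": None},          # hypothetical
}

def load_active_map(raw_map):
    """PENDING entries are excluded from the active map, so their
    findings get claimed, found unmappable, and released every cycle.
    DELEGATE entries stay in the map and route to a human instead."""
    return {k: v for k, v in raw_map.items() if v["status"] != "PENDING"}
```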
The permanent-silence bug. When we cleared the stale queue, we used abandoned status. What we didn't know: abandoned is treated the same as open by the sensor dedup logic. "Already represented on the blackboard, do not re-post." So the violations we'd cleaned up were now permanently invisible. Filed as a design-level issue — abandoned and "deliberately suppressed" need to be different states. Immediate fix: flip the cleaned-up audit.violation:: entries to resolved, which the sensor correctly treats as "re-detectable."
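The dedup predicate at the heart of this bug, in sketch form (status names from the post; the function and set names are illustrative):

```python
# The bug: 'abandoned' silenced re-detection just like 'open' did.
SILENCING_STATUSES = {"open", "abandoned"}

def should_repost(existing_status):
    """True if the sensor may post a fresh finding for a violation that
    already has a blackboard entry in this status. 'resolved' is
    re-detectable; 'abandoned' was (wrongly) a permanent silencer."""
    return existing_status not in SILENCING_STATUSES
```

Hence the design-level issue: "abandoned" and "deliberately suppressed" collapse into one status here, and they shouldn't.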
13:16:18
With the queue clean, the sensors unblocked, and the DELEGATE routing live, the loop had something to work with. A needs_split violation appeared. The remediator created a proposal. We approved it — the first manual approval of the day.
At 13:16:18, ProposalConsumerWorker picked it up. fix.modularity ran. The LLM took 33 seconds to analyze the file. It returned a plan.
The plan had one module. The validator requires at least two for a split.
mark_failed ran. The file changes were reverted. The proposal was marked failed.
Then: revive_findings_for_failed_proposal ran. The deferred finding flipped back to open.
At 13:17:37, 79 seconds after the worker had picked up the doomed proposal, the finding was re-claimed, a new proposal was created, and it was sitting in the approval inbox.
The loop had self-healed. Without intervention. Traceable at every step.
What "Self-Heal" Actually Means
The LLM produced bad output. The system caught it, reverted the change, put the work back in the queue, and asked again. No data was corrupted. No state was left inconsistent. The governor's role was to review the next proposal and decide whether to approve it.
This is the regulated-industry argument for this kind of governance. You don't need AI to never fail. You need failure to be:
- Detectable. The validator caught a 1-module "split" plan before anything was committed.
- Bounded. The gate order — Conservation Gate, IntentGuard, plan validator — ensures AI output can't bypass constitutional constraints even if it tries.
- Recoverable. The revival mechanism returned the system to a known-good state. The finding was exactly as it was before the failed attempt.
- Traceable. Every step — finding posted, claimed, deferred, proposal created, approved, executing, failed, revived, re-claimed — is a timestamped row in a queryable table.
The audit trail isn't bolted on. It's how the loop works.
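As a sketch, that trail can be read back with an ordinary query. The schema, the entity id, and the intermediate timestamps below are invented for illustration; only 13:16:18 and 13:17:37 come from the run described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lifecycle (ts TEXT, entity_id TEXT, event TEXT)")
events = [
    ("13:15:02", "finding_posted"),        # illustrative timestamp
    ("13:15:05", "claimed"),               # illustrative timestamp
    ("13:15:06", "deferred_to_proposal"),  # illustrative timestamp
    ("13:16:18", "executing"),             # from the run above
    ("13:16:51", "failed"),                # illustrative timestamp
    ("13:16:52", "revived"),               # illustrative timestamp
    ("13:17:37", "re_claimed"),            # from the run above
]
conn.executemany("INSERT INTO lifecycle VALUES (?, 'finding-42', ?)", events)

def trail(conn, entity_id):
    """Replay every state transition for one finding, in order."""
    return conn.execute(
        "SELECT ts, event FROM lifecycle WHERE entity_id = ? ORDER BY ts",
        (entity_id,)).fetchall()
```

Every step of the 79-second recovery is one row; the whole story of a finding is one SELECT.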
The Governor Role
I am not a programmer. I wrote zero lines of code today.
What I did: asked questions of the system, recognized when an answer pointed to a design gap rather than a bug, held the line on architectural decisions (backbone workers don't get split autonomously, regardless of what the violation detector says), and approved one proposal when the conditions were right.
The rest was diagnosis, sequencing, and constitutional reasoning. The code came from Claude Code on the development machine, prompted by the analysis. The analysis came from reading the system's own outputs — queries, logs, dashboard — not from reading source files.
That's the governor role. Not "I don't code therefore I'm not involved in technical work." The opposite: deeply involved in technical decisions, operating at the right level of abstraction, with a system that surfaces the right information to make those decisions.
The 79-second self-heal wasn't despite the governance architecture. It was because of it.
What's Next
The loop machinery is sound. The next bottleneck is fix.modularity's prompt — the LLM needs to be told explicitly to produce at least two modules and given responsibility-grouping context from the audit findings. That's prompt engineering work, not infrastructure.
When that's fixed, CORE will autonomously split files, verify the split, commit, re-audit, and confirm the finding is resolved — without a human writing a line of code.
We're close.
CORE is a governed software factory, actively being built by the method it describes — source on GitHub. If you're building in the governed-AI or regulated-software space and this resonates, comments are open.