Dariusz Newecki

Posted on Apr 14

When My Governance System Governed Itself Wrong

#ai #automation #codequality #softwareengineering

I built a sensor to detect import order violations. It found 152. The fixer found 0. One of them was lying.

Background

CORE is a deterministic governance runtime I'm building around AI code generation. The core idea is simple: AI produces code, but AI is never trusted. Every output passes through constitutional rules, audit engines, and remediation loops before anything touches the codebase.

One of those loops works like this:

AuditViolationSensor detects violation
    → posts finding to Blackboard
ViolationRemediatorWorker claims finding
    → dispatches AtomicAction (fix.imports, fix.ids, fix.headers, etc.)
Sensor runs again
    → confirms violation gone or re-posts

This is the convergence loop. The goal is that the Blackboard empties over time as violations get fixed. That's what I call A3 — the daemon runs continuously and the codebase converges without me touching anything.
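As a rough illustration of that loop (not CORE's actual code), the sketch below models the Blackboard as a set of findings and takes the sensor and fixer as plain callables. The names Finding and converge, and the toy tab-replacement rule, are assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Finding:
    """One Blackboard entry: which rule fired on which file (hypothetical shape)."""
    rule: str
    path: str


def converge(
    files: dict[str, str],
    sensor: Callable[[str], bool],
    fixer: Callable[[str], str],
    rule: str = "style.import_order",
    max_rounds: int = 5,
) -> tuple[int, set[Finding]]:
    """Sense, fix, and re-sense until nothing is flagged or rounds run out.

    Returns (fix rounds used, findings still open). Mutates `files` in place.
    """
    for round_no in range(max_rounds):
        # Sensor pass: post one finding per file the sensor flags.
        blackboard = {Finding(rule, p) for p, src in files.items() if sensor(src)}
        if not blackboard:
            return round_no, blackboard
        # Remediator pass: claim each finding and apply the fix action.
        for finding in blackboard:
            files[finding.path] = fixer(files[finding.path])
    # Out of rounds: report whatever the sensor still flags.
    remaining = {Finding(rule, p) for p, src in files.items() if sensor(src)}
    return max_rounds, remaining


# Toy rule: the sensor flags tabs, the fixer replaces them. Both agree.
files = {"a.py": "x\t= 1\n", "b.py": "y = 2\n"}
print(converge(files, lambda s: "\t" in s, lambda s: s.replace("\t", "    ")))
```

The loop only terminates early when the sensor and the fixer share a definition of "clean". If the sensor flags something the fixer never changes, every round re-posts the same findings.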

In this session I was closing sensor coverage gaps. Several fix actions in dev sync had no corresponding sensor, meaning the daemon was blind to those violations and a human had to run dev sync manually to keep things clean. Not autonomous. Not A3.

One of the gaps was style.import_order. I wrote the sensor, wired it up, restarted the daemon.

152 findings.

The Problem

The sensor was using an AST-based implementation — check_import_order — that classifies imports into groups: future, stdlib, third_party, internal. It then checks that the groups appear in the right order.

The fixer uses ruff --select I, which does the same job but reads its configuration from pyproject.toml:

[tool.ruff.lint.isort]
known-first-party = ["api", "body", "cli", "features", "mind", "services", "shared", "will"]
section-order = ["future", "standard-library", "third-party", "first-party", "local-folder"]

I ran fix.imports --write to clean up before activating the sensor. Zero violations after. Then I activated the sensor. 152 violations.

The sensor and the fixer disagreed on what "correctly ordered imports" means.

Finding the Root Cause

I picked the simplest failing file — src/cli/resources/admin/patterns.py — violation at line 7:

import typer                              # third_party → idx 2
from shared.cli_utils import core_command # internal   → idx 3
from .hub import app                      # ???

The sensor's _classify_root function takes the module name and classifies it. For from .hub import app, a relative import, stmt.module is "hub"; the leading dot is recorded separately in stmt.level, which the sensor never looked at. "hub" is not in stdlib_names and not in internal_roots, so it falls through to third_party, index 2.

But shared was classified as internal — index 3.

Index 2 after index 3 → violation.

Ruff treats relative imports as local-folder, which comes after first-party in the section order. So ruff considers this file clean. The sensor considers it broken.
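The misclassification can be reproduced with nothing but the standard ast module. This is a simplified stand-in for the sensor, not its real code: classify_root_naive, describe, and the tiny STDLIB_NAMES set are assumptions for illustration, while the internal roots are the hardcoded list the sensor had at the time.

```python
import ast

SOURCE = """\
import typer
from shared.cli_utils import core_command
from .hub import app
"""

# Hypothetical stand-ins for the sensor's lookup tables.
STDLIB_NAMES = {"os", "sys", "typing"}
INTERNAL_ROOTS = {"shared", "mind", "body", "will", "features"}
ORDER = {"future": 0, "stdlib": 1, "third_party": 2, "internal": 3}


def classify_root_naive(root: str) -> str:
    """Classify by root name only, ignoring whether the import is relative."""
    if root == "__future__":
        return "future"
    if root in STDLIB_NAMES:
        return "stdlib"
    if root in INTERNAL_ROOTS:
        return "internal"
    return "third_party"


def describe(source: str) -> list[tuple[str, int, str, int]]:
    """Return (module, level, group, index) for each top-level import."""
    rows = []
    for stmt in ast.parse(source).body:
        if isinstance(stmt, ast.Import):
            module, level = stmt.names[0].name, 0
        elif isinstance(stmt, ast.ImportFrom):
            module, level = stmt.module or "", stmt.level
        else:
            continue
        group = classify_root_naive(module.split(".")[0])
        rows.append((module, level, group, ORDER[group]))
    return rows


for row in describe(SOURCE):
    print(row)
# ('typer', 0, 'third_party', 2)
# ('shared.cli_utils', 0, 'internal', 3)
# ('hub', 1, 'third_party', 2)
```

The third row carries level 1, which is the only signal that "hub" is a sibling module rather than a package named hub. A classifier that reads only the module name throws that signal away.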

Two problems:

Problem 1: relative imports. The sensor had no concept of them. Any from .something import X got classified as third_party because the module name (something) didn't match any known root. Fix: detect stmt.level > 0 in ast.ImportFrom and classify as local with the highest order index.

Problem 2: internal roots mismatch. The sensor hardcoded ["shared", "mind", "body", "will", "features"]. Ruff's known-first-party includes ["api", "body", "cli", "features", "mind", "services", "shared", "will"]. Missing: api, cli, services. When a file imports from cli after importing from body, ruff sees two first-party imports in the same section, which is fine. The sensor sees third_party (cli) after internal (body) and reports a violation.

Fix: pass internal_roots as a parameter in the enforcement mapping so the sensor reads from configuration rather than hardcoding.
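Both fixes fit in a few lines. The sketch below is a simplified reconstruction under assumptions, not the code in import_checks.py: classify, import_order_violations, and the STDLIB_NAMES stand-in are hypothetical names, and body.x and cli.y are made-up module paths used only as parse input.

```python
import ast
from typing import Iterable

ORDER = {"future": 0, "stdlib": 1, "third_party": 2, "internal": 3, "local": 4}
STDLIB_NAMES = frozenset({"os", "sys", "typing"})  # stand-in for a real stdlib list


def classify(stmt: ast.stmt, internal_roots: Iterable[str]) -> str:
    """Classify one import statement; relative imports are always 'local'."""
    if isinstance(stmt, ast.ImportFrom) and stmt.level > 0:
        return "local"
    if isinstance(stmt, ast.ImportFrom):
        root = (stmt.module or "").split(".")[0]
    else:
        # For `import a, b` only the first name is considered in this sketch.
        root = stmt.names[0].name.split(".")[0]
    if root == "__future__":
        return "future"
    if root in STDLIB_NAMES:
        return "stdlib"
    if root in set(internal_roots):
        return "internal"
    return "third_party"


def import_order_violations(source: str, internal_roots: Iterable[str]) -> list[int]:
    """Line numbers of top-level imports whose group index drops below one already seen."""
    roots = set(internal_roots)
    highest = -1
    violations = []
    for stmt in ast.parse(source).body:
        if not isinstance(stmt, (ast.Import, ast.ImportFrom)):
            continue
        idx = ORDER[classify(stmt, roots)]
        if idx < highest:
            violations.append(stmt.lineno)
        highest = max(highest, idx)
    return violations


OLD_ROOTS = ["shared", "mind", "body", "will", "features"]
NEW_ROOTS = ["api", "body", "cli", "features", "mind", "services", "shared", "will"]

patterns = "import typer\nfrom shared.cli_utils import core_command\nfrom .hub import app\n"
mixed = "from body.x import a\nfrom cli.y import b\n"

print(import_order_violations(patterns, OLD_ROOTS))  # [] (relative import is now local)
print(import_order_violations(mixed, OLD_ROOTS))     # [2] (cli still looks third-party)
print(import_order_violations(mixed, NEW_ROOTS))     # [] (cli is first-party)
```

Passing internal_roots in as an argument is what lets the enforcement mapping supply the same list ruff uses, so the two tools cannot drift apart through a forgotten hardcoded value.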

After both fixes: 0 violations. Sensor and fixer agreed.

The Architectural Lesson

This is an instrument qualification problem.

In GxP-regulated environments (pharma, medical devices), before you trust a measurement instrument, you qualify it. You verify that it measures what it claims to measure, using a known reference. An unqualified instrument is not a trusted instrument — even if it produces numbers.

I deployed a sensor without qualifying it against the fixer. The sensor was measuring something real (import order), but measuring it differently than the tool that fixes it. The result was 152 false positives — governance debt that looked real but wasn't.

A sensor that disagrees with its corresponding fixer is worse than no sensor. It creates noise, erodes trust in the Blackboard, and — if the remediator were running — would dispatch fix actions that produce no change, loop, and dispatch again.

The correct pattern before activating any new sensor:

1. Run the fixer in dry-run mode. Collect what it would change.
2. Run the sensor. Collect what it would flag.
3. Verify the two sets agree on the same files.
4. Only then activate.
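The steps above can be sketched as a set comparison. This is one possible shape for such a check, not something CORE ships: CoherenceReport, compare, and ruff_import_findings are hypothetical names, and the ruff helper assumes that ruff check --output-format json emits a list of diagnostics that each carry a filename field.

```python
import json
import subprocess
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class CoherenceReport:
    sensor_only: frozenset[str]  # flagged by the sensor, untouched by the fixer
    fixer_only: frozenset[str]   # changed by the fixer, missed by the sensor

    @property
    def coherent(self) -> bool:
        return not self.sensor_only and not self.fixer_only


def compare(fixer_files: Iterable[str], sensor_files: Iterable[str]) -> CoherenceReport:
    """Compare the files the fixer would change with the files the sensor flags."""
    fixer_set, sensor_set = frozenset(fixer_files), frozenset(sensor_files)
    return CoherenceReport(
        sensor_only=sensor_set - fixer_set,
        fixer_only=fixer_set - sensor_set,
    )


def ruff_import_findings(path: str) -> set[str]:
    """Files ruff reports for import sorting, without writing anything.

    Assumption: the JSON output is a list of objects with a "filename" key.
    Paths may need normalising before comparison with the sensor's paths.
    """
    proc = subprocess.run(
        ["ruff", "check", "--select", "I", "--output-format", "json", path],
        capture_output=True,
        text=True,
        check=False,  # ruff exits non-zero when it finds violations
    )
    return {item["filename"] for item in json.loads(proc.stdout or "[]")}


# The situation from this post: fixer reports nothing, sensor flags a file.
report = compare(fixer_files=set(), sensor_files={"src/cli/resources/admin/patterns.py"})
print(report.coherent)     # False
print(report.sensor_only)
```

A non-empty sensor_only set is the false-positive case described here. A non-empty fixer_only set is the opposite gap, where the daemon would stay blind to something the fixer still changes.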

CORE doesn't enforce this yet. The gap is now in the backlog as governance.sensor_fixer_coherence — a meta-rule that validates governance components against each other before they're trusted.

What Got Fixed

Three separate changes at three separate levels:

AST logic (src/mind/logic/engines/ast_gate/checks/import_checks.py):

# Before: relative imports fell through to third_party
# After: detect stmt.level > 0 and classify as local (idx=4)
if isinstance(stmt, ast.ImportFrom) and stmt.level > 0:
    grp = "local"
    idx = 4  # always last — after internal

Configuration (.intent/enforcement/mappings/code/style.yaml):

style.import_order:
  engine: ast_gate
  params:
    check_type: import_order
    internal_roots: ["api", "body", "cli", "features", "mind", "services", "shared", "will"]

Tooling: a new core-admin workers blackboard purge command that clears the stale findings a sensor posted before its false positives were fixed.

Current State

7 sensors active. 52 rules. 0 findings. Blackboard clean.

The convergence loop is running. The daemon detects violations, the remediator dispatches fixes, the sensor confirms they're gone. That's A3.

The sensor-fixer coherence check doesn't exist yet. Until it does, every new sensor I add needs manual qualification before activation. That's a human step where CORE should eventually do the work itself.

Which is the point of the whole project.

CORE is open source: github.com/DariuszNewecki/CORE
Previous posts in this series cover the constitutional model, the autonomous loop, and the ViolationExecutor implementation.
