Dariusz Newecki

How CORE Used Its Own Pipeline to Eliminate 36 Constitutional Violations — Including Its Own

Or: what happens when the enforcer has to obey its own law.


I've written before about CORE blocking itself — that single moment when constitutional governance catches a violation in real time. This post is about something harder: systematically eliminating 36 violations across the codebase, using CORE's own autonomous workers to do it.

The milestone: BLOCKING: 0. Zero. After 36.

Here's how it happened, and what it revealed.


The Rule

CORE has a constitutional rule called ai.prompt.model_required. It's declared in .intent/rules/ai/prompt_governance.json and enforced as blocking — meaning any audit run with a violation fails unconditionally.

The rule is simple: every LLM call in CORE must flow through PromptModel.invoke(). No direct calls to make_request_async(). Anywhere. Ever.

{
  "id": "ai.prompt.model_required",
  "level": "blocking",
  "description": "All AI invocations must use PromptModel.invoke(). Direct make_request_async() calls are prohibited."
}

The reasoning is architectural. If you can call an LLM directly from anywhere in the codebase, you get:

  • Inline prompts scattered across files (ungoverned, unversioned, untestable)
  • No input contract — any string can be passed
  • No output validation — any response is accepted
  • No visibility into what the system is actually asking AI to do

PromptModel solves this by requiring every AI invocation to declare itself in var/prompts/<name>/:

var/prompts/docstring_writer/
    model.yaml     # id, role, input contract, output contract
    system.txt     # constitutional system prompt
    user.txt       # user-turn template with {placeholders}
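The post doesn't show the contents of model.yaml; a hypothetical sketch, with field names assumed from the directory annotations above:

```yaml
# Hypothetical model.yaml for the docstring_writer artifact.
# Field names are assumed from the annotations above, not taken from CORE's source.
id: docstring_writer
role: Coder                  # resolved by CognitiveService
input_contract:
  required:
    - code_content           # the function or class body to document
output_contract:
  format: text               # the generated docstring
```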

Before this work began, 36 places in the codebase were bypassing this entirely.


The Pipeline

Rather than fix them manually, I built a 4-worker autonomous pipeline:

AuditIngestWorker → PromptExtractorWorker → PromptArtifactWriter → CallSiteRewriter

Each worker posts findings to a shared blackboard. The next worker claims and processes them.

AuditIngestWorker runs core-admin code audit, parses the output, and posts one blackboard entry per violation with file path and line number.

PromptExtractorWorker reads each violation, fetches the source, and uses an LLM (via PromptModel — yes, the pipeline is itself constitutional) to extract the inline prompt and identify its inputs.

PromptArtifactWriter materialises the var/prompts/<name>/ directory: model.yaml, system.txt, user.txt.

CallSiteRewriter rewrites the Python file — replaces the direct make_request_async() call with PromptModel.load(...).invoke(context={...}).
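The blackboard mechanics aren't shown here; a minimal sketch of the post-claim-process pattern the workers follow (all names hypothetical, not CORE's actual implementation):

```python
import uuid

class Blackboard:
    """Minimal in-memory blackboard: one worker posts findings, the next claims them."""
    def __init__(self):
        self.entries = []

    def post(self, kind, payload):
        self.entries.append({"id": str(uuid.uuid4()), "kind": kind,
                             "payload": payload, "claimed_by": None})

    def claim(self, kind, worker):
        """Claim the next unclaimed entry of a given kind, or None if none remain."""
        for entry in self.entries:
            if entry["kind"] == kind and entry["claimed_by"] is None:
                entry["claimed_by"] = worker
                return entry
        return None

board = Blackboard()
# AuditIngestWorker posts one entry per violation...
board.post("violation", {"file": "core/llm_gate.py", "line": 42})
# ...and PromptExtractorWorker claims it downstream.
entry = board.claim("violation", worker="PromptExtractorWorker")
```

Claimed entries stay on the board, so a crashed worker's progress is inspectable rather than lost.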

Result after one pipeline run: 36 → 10 violations.

The remaining 10 had patterns the pipeline couldn't handle automatically (complex conditional prompts, unusual call sites). Those required manual fixes, which I'll come back to.


The Irony: The Enforcer Had to Obey Its Own Law

Here's where it gets interesting.

One of the remaining violations was in llm_gate.py — the engine that enforces ai.prompt.model_required. It contained a direct make_request_async() call to perform its LLM-based semantic check.

The temptation was to add an exclusion. But that would be duct tape. The principle is: law outranks intelligence. If the rule applies everywhere, it applies to the enforcer.

The fix required two things:

  1. Rename the protocol method from make_request to invoke_semantic_check — killing the false positive at the source, not via exclusion
  2. Give llm_gate.py its own PromptModel artifact at var/prompts/llm_gate/
# Before (constitutional violation)
response = await client.make_request_async(prompt, user_id="llm_gate")

# After (constitutional compliance)
model = PromptModel.load("llm_gate")
response = await model.invoke(
    context={"instruction": rule.instruction, "rationale": rule.rationale, "code_content": code},
    client=client,
    user_id="llm_gate",
)

The principle this enforces: no exceptions, no exclusions, no "this one is special". Every AI call is governed. The enforcer is not exempt.


The Manual Fixes

The pipeline left 10 violations it couldn't resolve. I worked through them one by one. A few patterns worth noting:

Branch-dependent prompts (llm_correction.py) — the file had two different prompts chosen at runtime based on syntax_only. This required two separate artifacts: llm_correction_syntax and llm_correction_structural. The branch logic stayed in Python; the prompts moved to var/prompts/.

if syntax_only:
    model = PromptModel.load("llm_correction_syntax")
else:
    model = PromptModel.load("llm_correction_structural")

response = await model.invoke(context={...}, client=client, user_id="self_correction")

Module-level prompt constants (body_contracts_fixer.py) — a textwrap.dedent() string defined at module level, concatenated with runtime data in the worker. The constant became system.txt, the dynamic parts became {placeholders} in user.txt.
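A sketch of that migration, assuming the {placeholders} in user.txt are filled with standard string formatting (names hypothetical):

```python
import textwrap

# Before: a module-level constant, concatenated with runtime data in the worker.
FIXER_PROMPT = textwrap.dedent("""\
    You repair body contracts in Python code.
    """)

# After: the constant lives in system.txt; the dynamic parts become
# {placeholders} in user.txt, rendered at invoke time.
user_template = "Fix the contracts in this file:\n{code_content}"

def render(template: str, context: dict) -> str:
    """Fill {placeholders} from the context, raising KeyError on missing inputs."""
    return template.format(**context)

prompt = render(user_template, {"code_content": "def f(): ..."})
```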

Dead code (llm_api_client.py) — the file wasn't imported anywhere in the codebase. Deleted entirely. Violation gone.


The Audit Progression

State                                      Blocking violations
Start                                      36
After dead code deletion + llm_gate fix    33
After pipeline (22 files rewritten)        10
After manual fixes batch 1                  8
After manual fixes batch 2                  3
After final 3                               0

What Zero Blocking Violations Actually Means

PASSED in the audit output now means something concrete. Every LLM invocation in CORE:

  • Has a declared identity (id in model.yaml)
  • Has a role (Coder, Architect, CodeReviewer — resolved by CognitiveService)
  • Has a validated input contract (missing required inputs raise before the call is made)
  • Has a versioned, reviewable prompt (in var/prompts/, not scattered across source files)
  • Has an output contract (optional, but available)
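The third point — missing required inputs raise before the call is made — can be sketched like this (a hypothetical implementation, not CORE's actual code):

```python
class ContractViolation(Exception):
    """Raised when an invocation doesn't satisfy its declared input contract."""

def validate_inputs(required: list, context: dict) -> None:
    """Check the declared input contract before any LLM request is issued."""
    missing = [key for key in required if key not in context]
    if missing:
        raise ContractViolation(f"missing required inputs: {missing}")

# A complete context passes silently...
validate_inputs(["code_content"], {"code_content": "def f(): ..."})

# ...an incomplete one fails before any tokens are spent.
try:
    validate_inputs(["code_content"], {})
    raised = False
except ContractViolation:
    raised = True
```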

The codebase went from "AI calls happen somewhere, somehow" to "every AI call is a first-class declared artifact".

This is what git tag v-a3-candidate means in CORE's autonomy ladder. Not that the system is feature-complete — there are 107 reporting warnings still open (modularity, file size, dead code). But the constitutional foundation is clean. The system can trust its own audit output.


The Broader Point

The pattern here isn't specific to CORE. If you're building any system that makes LLM calls:

Inline prompts are technical debt. They're ungoverned strings scattered across your codebase. You can't audit them, version them, test them, or enforce contracts on them.

Governance at the call site doesn't scale. You can't review every make_request_async() manually as the codebase grows.

The right abstraction is a declared artifact. Every AI invocation type should have an identity, a contract, and a home on disk. The invocation surface should be narrow and enforced — not wide open.

Whether you call it PromptModel, a prompt registry, or something else: the principle is the same. Prompts are policy. Policy belongs in declarations, not in code.
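A minimal version of that registry idea, independent of CORE (all names hypothetical — a sketch of the pattern, not a production implementation):

```python
import tempfile
from pathlib import Path

class PromptRegistry:
    """Loads declared prompt artifacts from disk: the only sanctioned entry point."""
    def __init__(self, root: Path):
        self.root = root

    def load(self, name: str) -> dict:
        home = self.root / name
        if not home.is_dir():
            raise FileNotFoundError(f"no declared prompt artifact: {name}")
        return {
            "name": name,
            "system": (home / "system.txt").read_text(),
            "user": (home / "user.txt").read_text(),
        }

# Usage: materialise an artifact on disk, then load it through the registry.
root = Path(tempfile.mkdtemp())
(root / "docstring_writer").mkdir()
(root / "docstring_writer" / "system.txt").write_text("You write docstrings.")
(root / "docstring_writer" / "user.txt").write_text("Document:\n{code_content}")
model = PromptRegistry(root).load("docstring_writer")
```

Because every prompt has a home on disk, "audit all AI calls" reduces to walking one directory tree.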


Credit: the PromptModel artifact pattern was inspired by Ruben Hassid's prompt engineering work.

CORE is open source: github.com/DariuszNewecki/CORE

If you're working on constitutional AI governance, autonomous code generation, or similar problems — I'd genuinely like to hear what you're building.
