
Dariusz Newecki


My AI Has 22 Workers, 2,470 Resolved Violations, and Still Can't Call Itself Autonomous. Here's the Gap.

Or: what it actually takes to close the loop.


I've been building CORE for a while now. It's a constitutional governance system for AI agents — the thing that gives them law instead of vibes. Workers, a Blackboard, a Constitution, autonomous remediation. The full picture.

This week I sat down and did something I'd been avoiding: I honestly assessed whether CORE is ready to declare A3 — strategic autonomy. Zero blocking violations. Zero constitutional violations. Continuous self-audit. No human in the loop except at the beginning and the end.

The answer was no. But the reason why taught me more about autonomous systems than six months of building did.


What's Actually Running

First, the good news. Here's the live runtime health dashboard from this morning:

Workers:          22 active / 24 total
Blackboard:       2,470 resolved  |  1,369 open
Open findings:    1,367
Recent crawls:    3 × completed clean (1,410 files each)

Twenty-two workers heartbeating. An audit→remediate→execute pipeline that actually closes. Workers that detect constitutional violations, post them to a shared Blackboard, claim them, generate fixes via LLM, validate through a Crate/Canary ceremony, and commit to git — all without a human touching anything.

That's real. It took months to get there. The loop runs.

And yet.


The Gap Nobody Talks About

Here's what the Observer reported alongside those numbers:

Stale entries:      1,258
Orphaned symbols:   1,193
Silent workers:     8

The open finding count, before I truncated the Blackboard to get a clean baseline, was growing at +1,268 per night despite +2,755 resolutions. The sensors were posting faster than the remediator could close. The system was working — and losing ground simultaneously.

This is the thing nobody warns you about when you start building autonomous systems: activity is not the same as progress. A system can be busy, healthy, heartbeating, resolving thousands of items — and still be drifting in the wrong direction. Without a way to observe the trend, you only ever have a snapshot. Snapshots lie.

That's Gap #1: the Reporter. Not a scanner. A reader. Something that looks at accumulated history and tells you: is this system getting healthier, or is it slowly drowning?
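The computation a Reporter needs is almost embarrassingly small — the hard part is having the history at all. A sketch, using the nightly figures from above (the function name and tuple shape are mine, not CORE's):

```python
def backlog_trend(nightly: list[tuple[int, int]]) -> int:
    """Net change in open findings over the window.

    Each entry is (opened, resolved) for one night.
    Positive means the backlog is growing: the system is drowning.
    """
    return sum(opened - resolved for opened, resolved in nightly)

# The post's numbers: +1,268 net per night despite 2,755 resolutions,
# so sensors were posting 4,023 new findings a night.
history = [(4023, 2755)] * 3          # three such nights
print(backlog_trend(history))          # -> 3804
```

A snapshot of "2,470 resolved" looks great; the trend of +1,268 per night is the number that actually matters.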


The Validator Problem

Gap #2 is scarier.

Right now, when a Worker generates a fix and creates a Proposal, that Proposal can reach live code without passing through a deterministic gate. There's no component that checks, before execution:

  • Does this file start with a path comment? (trivially checkable)
  • Does it violate layer boundaries? (AST check)
  • Does the Blackboard entry carry a valid registered Worker UUID? (DB lookup)

These are not intelligence problems. They're rule problems. And the architecture explicitly calls for them — I wrote it into the implementation plan months ago as Phase 4. They just... haven't been built yet.

The absence of a Validator Chain means the system's integrity is currently dependent on the LLM getting things right. That's not a governance model. That's hope.

Smart systems get manipulated. Rules do not.

A Validator should be as dumb and reliable as possible. The whole point is that it doesn't need to understand anything — it just checks the rule. Deterministic. Unambiguous. Unbypassable.
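A sketch of what such a chain could look like, using two of the checks listed above. Every name here (`Proposal`, `validate`, the registered-UUID set) is illustrative — the point is that each validator is a plain predicate with no model and no judgment in it:

```python
from dataclasses import dataclass

@dataclass
class Proposal:
    file_path: str
    file_text: str
    worker_uuid: str

# Stand-in for the real DB lookup of registered Worker UUIDs.
REGISTERED_WORKERS = {"worker-uuid-registered-example"}

def has_path_comment(p: Proposal) -> bool:
    # Rule: the first line must be a comment naming the file's own path.
    first = p.file_text.splitlines()[0] if p.file_text else ""
    return first.strip() == f"# {p.file_path}"

def worker_is_registered(p: Proposal) -> bool:
    return p.worker_uuid in REGISTERED_WORKERS

VALIDATORS = [has_path_comment, worker_is_registered]

def validate(p: Proposal) -> list[str]:
    """Return the names of every failed check; empty list means pass."""
    return [v.__name__ for v in VALIDATORS if not v(p)]

good = Proposal("src/app.py", "# src/app.py\nx = 1\n", "worker-uuid-registered-example")
print(validate(good))  # -> []
```

No check in that chain can be argued with, sweet-talked, or prompt-injected. Either the first line is the path comment or it isn't.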


The Deepest Gap: Logic Conservation

This one I knew about but kept deprioritising.

My own notes from a session two weeks ago read:

Phase 3.2 (Logic Conservation Gate): NOT IMPLEMENTED — Octopus A3 NOT READY for complex infrastructure refactors

The Logic Conservation Gate is the thing that prevents the autonomous loop from "fixing" something in a way that is structurally valid but semantically wrong. The code compiles. The tests pass. The logic is broken.

Without it, A3 is not autonomy. It's a very confident system making mistakes at scale and committing them to git.

This is the gate that has to exist before I'm willing to declare the loop closed. Not because the system is untrustworthy — but because the architecture must not require trust. The chain enforces integrity regardless of the quality of the model's reasoning on any given day.
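One way to approximate such a gate — purely a sketch of the idea, since CORE's actual gate is not implemented yet — is to replay the old and new versions of a function on probe inputs and reject the proposal if any output diverges:

```python
def conserves_logic(old_fn, new_fn, probes) -> bool:
    """True iff both versions agree on every probe input."""
    return all(old_fn(x) == new_fn(x) for x in probes)

# A "structurally valid but semantically wrong" fix: it parses, it runs,
# the happy-path tests pass -- and it silently changes the empty case.
def total_old(xs):
    return sum(xs)

def total_new(xs):
    return sum(xs) if xs else -1   # broken edge case introduced by the "fix"

print(conserves_logic(total_old, total_new, probes=[[1, 2], [5], []]))  # -> False
```

Real equivalence checking is of course much harder than probing a few inputs, but even this crude version catches the class of mistake that compiles cleanly and still breaks the logic.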


What This Has to Do With LangChain

(You knew this was coming.)

The reason I'm building CORE instead of using an agent framework is precisely this: every framework I looked at gives you capability without governance. You can make the agent do impressive things. You cannot make it accountable.

When something goes wrong — and it will — you have no paper trail. No constitutional record. No way to audit what the agent decided and why. No way to know whether the fix it applied was within its declared mandate or a creative interpretation of "just fix it."

CORE's Workers are constitutional officers. Their authority comes from their declaration, not their intelligence. The LLM inside a Worker is labour. The Worker's constitution is the law.

That inversion is the whole idea.
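The inversion is easy to express in code. A sketch, with illustrative names: authorization checks the declaration, never the cleverness of the proposed action.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WorkerDeclaration:
    worker_id: str
    mandate: frozenset[str]   # the actions this Worker is constitutionally allowed

def authorize(decl: WorkerDeclaration, action: str) -> bool:
    # The check never asks whether the action is smart, only whether it is lawful.
    return action in decl.mandate

remediator = WorkerDeclaration(
    worker_id="remediator-07",
    mandate=frozenset({"fix_header", "fix_imports"}),
)

print(authorize(remediator, "fix_header"))     # -> True
print(authorize(remediator, "drop_database"))  # -> False
```

The LLM inside the Worker can propose anything it likes; the declaration decides what actually executes.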


The Actual Roadmap to A3

Here's what needs to happen, in order, before I'm willing to say the words:

Safety gates (non-negotiable):

  1. Logic Conservation Gate — the autonomous loop cannot corrupt logic
  2. Validator Chain — four deterministic checks before any Proposal reaches live code
  3. WorkerAuditor hardening — governance must not fail silently when its own audit cycle errors out

Stability (required for sustained operation):

  1. Test Writer Worker — a real autonomous worker, not the stub that currently logs "not yet implemented" with confidence: 0.5
  2. Reporter with trend data — snapshots are not enough; direction is what matters

Each of these is individually deployable. Each closes a portion of the loop.


The Vision Behind All of This

I have a slogan for CORE: "the last programmer you will ever need."

I mean that with complete respect for every programmer reading this. But the dynamic is changing. AI wins on raw intelligence. Humans still win on discipline — because most AI agents are brilliant but lawless.

CORE's edge is disciplined intelligence. A constitutional AI that cannot skip steps, cannot forget its mandate, cannot act without a paper trail.

The end state: you hand CORE a document — a spec, a brief, a half-finished repo, a conversation. CORE talks back until it can say: "I understand." You confirm. CORE goes quiet and delivers. No human in the loop except at the beginning (intent) and end (review).

You have an idea? CORE asks questions until it has a constitutional Intent Document you've signed off on. Then it works from that document. If it hits ambiguity it resolves it constitutionally — not by guessing and not by interrupting you.

You have a half-baked repo? CORE reads everything — code, comments, README, commit history, test names. It builds a model of intent. It tells you: "I understand this. Here is what I think it intends. Here are the gaps. Here is what I can and cannot deliver." You confirm or correct. Then it works.

That's not a feature. That's a different category of thing.


Why the Gap Is Actually Good News

Here's what struck me when I laid all this out honestly:

The nervous system is nearly done. Twenty-two workers running. The audit loop closing. Constitutional enforcement working. That took the hard part — the architecture, the law, the infrastructure.

What's missing is five specific, well-defined, individually deployable items. Not "figure out how autonomous AI governance works." Not "invent a new architecture." Five implementation tasks with clear definitions of done.

The gap between A2 and A3 is no longer philosophical. It's a checklist.

Rome wasn't built in a day. But at some point you stop laying foundations and start counting the remaining columns.

I'm counting columns.


CORE is an open development — I write about it here and on X as it happens, including the parts that break and the parts that block themselves. If the constitutional governance angle interests you, check my previous posts.


DariuszNewecki / CORE

Governance runtime enforcing immutable constitutional rules on AI coding agents

CORE

Executable constitutional governance for AI-assisted software development.

License: MIT


The Problem

AI coding tools generate code fast. Too fast to stay sane.

Without enforcement, AI-assisted codebases accumulate invisible debt — layer violations, broken architectural contracts, files that grow unbounded. And agents, left unconstrained, will eventually do something like this:

Agent: "I'll delete the production database to fix this bug"
System: Executes.
You:    😱

CORE makes that impossible — not detectable after the fact. Impossible.

Agent: "I'll delete the production database to fix this bug"
Constitution: BLOCKED — Violates data.ssot.database_primacy
System: Execution halted. Violation logged.
You:    😌

CORE is a governance runtime that constrains AI agents with machine-enforced constitutional law — enforcing architectural invariants, blocking invalid mutations automatically, and making autonomous workflows auditable and deterministic.

LLMs operate inside CORE. Never above it.


🎬 Live Enforcement Demo

Blocking rule → targeted drilldown → automated remediation → verified compliance.


This demo shows:

  • A structural…
