Dariusz Newecki

How I Achieved 70% Autonomous Code Generation with Constitutional AI Governance

After three months of development, CORE v2.0.0 just achieved something I wasn't sure was possible: a 70% success rate on autonomous code generation, with constitutional governance ensuring it stays safe and bounded.

The Problem

AI agents can write code incredibly fast. Tools like Claude, GPT-4, and DeepSeek are crushing benchmarks. But there's a catch:

  • They break your architecture
  • They skip tests
  • They ignore naming conventions
  • They create files in random places
  • They have no accountability

How do you give AI agents autonomy without losing control?

The Solution: Constitutional AI Governance

CORE uses a "constitution" - human-authored policies stored as YAML files that AI agents can semantically understand. Every autonomous action goes through:

  1. Constitutional Audit - Validates against architecture rules, naming conventions, file placement
  2. Semantic Validation - Uses a knowledge graph (513 symbols) to understand context
  3. Test Execution - Runs tests, auto-fixes failures
  4. Clean Merge - Only merges if everything passes

Think of it like a constitutional democracy for AI: the AI has agency, but operates within defined boundaries.
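The four gates above can be sketched as a simple pipeline. To be clear, the function names and signatures here are invented for illustration; CORE's real checks live in its constitutional auditor and test runner.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class GateResult:
    passed: bool
    reason: str = ""

# Each gate inspects a proposed change and either passes it on or blocks the
# merge. These implementations are placeholders, not CORE's actual logic.
def constitutional_audit(change: dict) -> GateResult:
    allowed_dirs = ("src/", "tests/")  # hypothetical placement rule
    ok = change["path"].startswith(allowed_dirs)
    return GateResult(ok, "" if ok else f"bad placement: {change['path']}")

def semantic_validation(change: dict) -> GateResult:
    # Stand-in for the knowledge-graph lookup: require semantic context.
    return GateResult(bool(change.get("context")), "missing semantic context")

def run_tests(change: dict) -> GateResult:
    return GateResult(change.get("tests_pass", False), "tests failed")

GATES: list[Callable[[dict], GateResult]] = [
    constitutional_audit, semantic_validation, run_tests,
]

def try_merge(change: dict) -> bool:
    """Merge only if every gate passes -- the 'clean merge' step."""
    for gate in GATES:
        result = gate(change)
        if not result.passed:
            print(f"blocked by {gate.__name__}: {result.reason}")
            return False
    return True
```

The key property is ordering: cheap structural checks run before expensive test execution, and a single failure short-circuits the merge.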

Real Metrics from v2.0.0

Starting from scratch with no autonomous capabilities:

  • Code generation success: 0% → 70%
  • Semantic placement accuracy: 45% → 100%
  • Knowledge graph: 513 symbols vectorized
  • Module anchors: 66 for accurate code placement
  • Policy chunks: 48 chunks enabling constitutional understanding

How It Works: Mind-Body-Will Architecture

Mind (.intent/ directory)

  • Constitution defines immutable laws
  • Policies stored as human-authored YAML
  • Cryptographic signing for governance changes

Body (src/ code)

  • Deterministic execution tools
  • Constitutional auditor
  • Knowledge graph (PostgreSQL)
  • Vector storage (Qdrant)

Will (AI agents)

  • PlannerAgent creates execution plans
  • CoderAgent generates code within bounds
  • ExecutionAgent validates and tests
  • All operate in defined "autonomy lanes"
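One way to picture an "autonomy lane" is a capability check every agent action must pass before it runs. The lane names below mirror the autonomy ladder, but the code itself is a hypothetical sketch, not CORE's implementation:

```python
from enum import IntEnum

class AutonomyLane(IntEnum):
    """Ordered autonomy levels: an agent may only act within its granted lane."""
    SELF_AWARENESS = 0    # read-only: query the knowledge graph
    SELF_HEALING = 1      # fix drift, formatting, compliance
    CODE_GENERATION = 2   # create new features
    STRATEGIC_REFACTOR = 3

# Hypothetical mapping of actions to the minimum lane that permits them.
REQUIRED_LANE = {
    "query_graph": AutonomyLane.SELF_AWARENESS,
    "fix_formatting": AutonomyLane.SELF_HEALING,
    "generate_feature": AutonomyLane.CODE_GENERATION,
    "refactor_multi_file": AutonomyLane.STRATEGIC_REFACTOR,
}

def is_permitted(agent_lane: AutonomyLane, action: str) -> bool:
    """An action is allowed only if the agent's granted lane covers it."""
    return agent_lane >= REQUIRED_LANE[action]
```

Using an ordered enum makes the boundary explicit: a CoderAgent granted `CODE_GENERATION` can also self-heal, but can never attempt a multi-file refactor.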

The Key Innovation: Semantic Policy Understanding

Instead of hardcoded rules, policies are vectorized and semantically searchable. The AI understands:

  • Which policies apply to which code
  • Why a placement violates architecture
  • How to remediate violations autonomously

Example policy chunk:

```yaml
agent_rules:
  - id: "no_direct_db_access"
    statement: "Agents must use ServiceRegistry, not direct imports"
    enforcement: "error"
    rationale: "Prevents split-brain dependency injection"
```

The AI reads this, understands it semantically, and generates code that complies.
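Retrieval is the part worth seeing in code. In the real system the policy chunks are embedded and queried through Qdrant; the toy stand-in below scores relevance by token overlap just to show the shape of the lookup:

```python
import re

# Miniature policy store; in CORE these chunks come from the .intent/ YAML.
POLICY_CHUNKS = [
    {"id": "no_direct_db_access",
     "statement": "Agents must use ServiceRegistry, not direct imports",
     "enforcement": "error"},
    {"id": "tests_required",
     "statement": "New modules must ship with unit tests",
     "enforcement": "error"},
]

def tokens(text: str) -> set[str]:
    """Lowercase word tokens, punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def relevant_policies(task: str, chunks=POLICY_CHUNKS, top_k: int = 1):
    """Return the chunks whose statements overlap most with the task text.
    A real implementation would rank by vector similarity instead."""
    scored = sorted(
        chunks,
        key=lambda c: len(tokens(task) & tokens(c["statement"])),
        reverse=True,
    )
    return scored[:top_k]
```

The point is that the agent never sees the whole rulebook at once: it retrieves only the chunks relevant to the task at hand and injects them into its generation context.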

The Autonomy Ladder

CORE progresses through defined autonomy levels:

  • A0: Self-Awareness - Knowledge graph operational
  • A1: Self-Healing - Auto-fix drift, formatting, compliance
  • A2: Code Generation - Create new features autonomously (current)
  • 🎯 A3: Strategic Refactoring - Multi-file architectural improvements (next)
  • 🔮 A4: Self-Replication - Write CORE.NG from scratch

What Makes This Different

vs. GitHub Copilot/Cursor:

  • CORE focuses on autonomy with governance, not assisted coding
  • Constitutional boundaries prevent drift
  • Cryptographic approval for governance changes

vs. AutoGPT/BabyAGI:

  • Constitutional framework prevents "going off the rails"
  • Audit system catches violations before they merge
  • Human quorum for critical decisions

vs. Traditional CI/CD:

  • AI understands why rules exist, not just what they are
  • Autonomous remediation, not just detection
  • Self-evolving within constitutional bounds

Example: Autonomous Code Generation Flow

User request: "Create a health endpoint"

  1. PlannerAgent creates execution plan with constitutional check
  2. CoderAgent generates code using semantic context (knows where health endpoints go, what imports to use, naming patterns)
  3. Constitutional Audit validates placement, naming, imports
  4. ExecutionAgent runs tests, auto-fixes failures
  5. Merge - Only if audit passes and tests succeed

Success rate: 70% end-to-end without human intervention.
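The "runs tests, auto-fixes failures" step is essentially a bounded retry loop. A hedged sketch, with placeholder test-runner and fixer functions (a real agent would re-prompt the LLM with the failure messages):

```python
def run_tests(code: str) -> list[str]:
    """Placeholder test runner: returns a list of failure messages."""
    return [] if "def health" in code else ["missing health handler"]

def auto_fix(code: str, failures: list[str]) -> str:
    """Placeholder fixer: a real ExecutionAgent would feed failures
    back to the LLM and regenerate the offending code."""
    if "missing health handler" in failures:
        code += "\ndef health():\n    return {'status': 'ok'}\n"
    return code

def validate_with_retries(code: str, max_attempts: int = 3) -> tuple[bool, str]:
    """Run tests, feeding failures to the fixer, up to max_attempts times."""
    for _ in range(max_attempts):
        failures = run_tests(code)
        if not failures:
            return True, code   # clean: safe to hand off to the merge gate
        code = auto_fix(code, failures)
    return False, code          # still failing: escalate to a human
```

Bounding the attempts matters: without `max_attempts`, a failure mode the fixer can't address would loop forever instead of escalating.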

Tech Stack

  • Python 3.12+ with Poetry
  • PostgreSQL for knowledge graph (Single Source of Truth)
  • Qdrant for vector storage
  • LLM Providers: DeepSeek, Claude, OpenAI (configurable)
  • Constitutional Framework: YAML policies with cryptographic signing

Try It Yourself

```shell
git clone https://github.com/DariuszNewecki/CORE.git
cd CORE
poetry install
cp .env.example .env
# Add your LLM API keys

# Build knowledge graph
poetry run core-admin fix vector-sync --write

# Run constitutional audit
poetry run core-admin check audit

# Try autonomous code generation
poetry run core-admin chat "create a CLI command that validates JSON"
```

What's Next: A3 Autonomy

The next frontier is strategic refactoring - multi-file architectural improvements that require understanding the entire system, not just individual modules.

This requires:

  • Cross-file dependency analysis
  • Impact assessment before changes
  • Coordinated multi-file modifications
  • Validation that architecture improved, not regressed
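Cross-file dependency analysis can start from something as simple as parsing imports with Python's `ast` module. This sketch maps each file to the top-level modules it imports; real A3-grade analysis would walk the whole repository and resolve relative imports:

```python
import ast

def imported_modules(source: str) -> set[str]:
    """Collect top-level module names imported by a piece of Python source."""
    deps: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            deps.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            deps.add(node.module.split(".")[0])
    return deps

def dependency_graph(files: dict[str, str]) -> dict[str, set[str]]:
    """Map each file path to the modules it depends on -- the raw material
    for assessing the blast radius of a multi-file refactor."""
    return {path: imported_modules(src) for path, src in files.items()}
```

Once you have this graph, impact assessment becomes graph traversal: before touching a module, enumerate everything that transitively depends on it.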

Lessons Learned

1. You can't automate what you can't do manually
Building comprehensive manual tooling first made automation straightforward: CORE already had roughly 80% of the infrastructure the autonomous features needed.

2. Context quality matters more than model quality
Moving from basic string concatenation to rich context packages (semantic search, graph traversal, structured metadata) improved success from 0% to 70%.

3. Constitutional governance enables safe autonomy
Policies as human-authored documents + AI semantic understanding = agents that are powerful yet provably bounded.

4. Incremental improvement > perfectionism
40-60% success with gradual overnight processing beats attempting perfect single-run results.

Open Source & MIT Licensed

CORE is fully open source under MIT license. The goal isn't a commercial product - it's exploring constitutional AI governance as a research direction.

If you're interested in:

  • Constitutional AI governance
  • Autonomous code generation
  • Making AI agents safe and auditable
  • Self-improving systems

Check out the repo: https://github.com/DariuszNewecki/CORE

Documentation: https://dariusznewecki.github.io/CORE/


Questions? Comments? Critiques? Drop them below - I'm here to discuss the constitutional governance approach and learn from the community.
