Dariusz Newecki

How I Achieved 70% Autonomous Code Generation with Constitutional AI Governance

After three months of development, CORE v2.0.0 just achieved something I wasn't sure was possible: a 70% success rate on autonomous code generation, with constitutional governance ensuring it stays safe and bounded.

The Problem

AI agents can write code incredibly fast. Tools like Claude, GPT-4, and DeepSeek are crushing benchmarks. But there's a catch:

  • They break your architecture
  • They skip tests
  • They ignore naming conventions
  • They create files in random places
  • They have no accountability

How do you give AI agents autonomy without losing control?

The Solution: Constitutional AI Governance

CORE uses a "constitution" - human-authored policies stored as YAML files that AI agents can semantically understand. Every autonomous action goes through:

  1. Constitutional Audit - Validates against architecture rules, naming conventions, file placement
  2. Semantic Validation - Uses a knowledge graph (513 symbols) to understand context
  3. Test Execution - Runs tests, auto-fixes failures
  4. Clean Merge - Only merges if everything passes

Think of it like a constitutional democracy for AI: the AI has agency, but operates within defined boundaries.
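The four gates above can be sketched as a simple pipeline. To be clear, the function names and signatures here are invented for illustration; CORE's real checks live in its constitutional auditor and test runner.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class GateResult:
    passed: bool
    reason: str = ""

# Each gate inspects a proposed change and either passes it on or blocks the
# merge. These implementations are placeholders, not CORE's actual logic.
def constitutional_audit(change: dict) -> GateResult:
    allowed_dirs = ("src/", "tests/")  # hypothetical placement rule
    ok = change["path"].startswith(allowed_dirs)
    return GateResult(ok, "" if ok else f"bad placement: {change['path']}")

def semantic_validation(change: dict) -> GateResult:
    # Stand-in for the knowledge-graph lookup: require semantic context.
    return GateResult(bool(change.get("context")), "missing semantic context")

def run_tests(change: dict) -> GateResult:
    return GateResult(change.get("tests_pass", False), "tests failed")

GATES: list[Callable[[dict], GateResult]] = [
    constitutional_audit, semantic_validation, run_tests,
]

def try_merge(change: dict) -> bool:
    """Merge only if every gate passes -- the 'clean merge' step."""
    for gate in GATES:
        result = gate(change)
        if not result.passed:
            print(f"blocked by {gate.__name__}: {result.reason}")
            return False
    return True
```

The key property is ordering: cheap structural checks run before expensive test execution, and a single failure short-circuits the merge.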

Real Metrics from v2.0.0

Starting from scratch with no autonomous capabilities:

  • Code generation success: 0% → 70%
  • Semantic placement accuracy: 45% → 100%
  • Knowledge graph: 513 symbols vectorized
  • Module anchors: 66 for accurate code placement
  • Policy chunks: 48 chunks enabling constitutional understanding

How It Works: Mind-Body-Will Architecture

Mind (.intent/ directory)

  • Constitution defines immutable laws
  • Policies stored as human-authored YAML
  • Cryptographic signing for governance changes

Body (src/ code)

  • Deterministic execution tools
  • Constitutional auditor
  • Knowledge graph (PostgreSQL)
  • Vector storage (Qdrant)

Will (AI agents)

  • PlannerAgent creates execution plans
  • CoderAgent generates code within bounds
  • ExecutionAgent validates and tests
  • All operate in defined "autonomy lanes"
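One way to picture an "autonomy lane" is a capability check every agent action must pass before it runs. The lane names below mirror the autonomy ladder, but the code itself is a hypothetical sketch, not CORE's implementation:

```python
from enum import IntEnum

class AutonomyLane(IntEnum):
    """Ordered autonomy levels: an agent may only act within its granted lane."""
    SELF_AWARENESS = 0    # read-only: query the knowledge graph
    SELF_HEALING = 1      # fix drift, formatting, compliance
    CODE_GENERATION = 2   # create new features
    STRATEGIC_REFACTOR = 3

# Hypothetical mapping of actions to the minimum lane that permits them.
REQUIRED_LANE = {
    "query_graph": AutonomyLane.SELF_AWARENESS,
    "fix_formatting": AutonomyLane.SELF_HEALING,
    "generate_feature": AutonomyLane.CODE_GENERATION,
    "refactor_multi_file": AutonomyLane.STRATEGIC_REFACTOR,
}

def is_permitted(agent_lane: AutonomyLane, action: str) -> bool:
    """An action is allowed only if the agent's granted lane covers it."""
    return agent_lane >= REQUIRED_LANE[action]
```

Using an ordered enum makes the boundary explicit: a CoderAgent granted `CODE_GENERATION` can also self-heal, but can never attempt a multi-file refactor.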

The Key Innovation: Semantic Policy Understanding

Instead of hardcoded rules, policies are vectorized and semantically searchable. The AI understands:

  • Which policies apply to which code
  • Why a placement violates architecture
  • How to remediate violations autonomously

Example policy chunk:

```yaml
agent_rules:
  - id: "no_direct_db_access"
    statement: "Agents must use ServiceRegistry, not direct imports"
    enforcement: "error"
    rationale: "Prevents split-brain dependency injection"
```

The AI reads this, understands it semantically, and generates code that complies.
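Retrieval is the part worth seeing in code. In the real system the policy chunks are embedded and queried through Qdrant; the toy stand-in below scores relevance by token overlap just to show the shape of the lookup:

```python
import re

# Miniature policy store; in CORE these chunks come from the .intent/ YAML.
POLICY_CHUNKS = [
    {"id": "no_direct_db_access",
     "statement": "Agents must use ServiceRegistry, not direct imports",
     "enforcement": "error"},
    {"id": "tests_required",
     "statement": "New modules must ship with unit tests",
     "enforcement": "error"},
]

def tokens(text: str) -> set[str]:
    """Lowercase word tokens, punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def relevant_policies(task: str, chunks=POLICY_CHUNKS, top_k: int = 1):
    """Return the chunks whose statements overlap most with the task text.
    A real implementation would rank by vector similarity instead."""
    scored = sorted(
        chunks,
        key=lambda c: len(tokens(task) & tokens(c["statement"])),
        reverse=True,
    )
    return scored[:top_k]
```

The point is that the agent never sees the whole rulebook at once: it retrieves only the chunks relevant to the task at hand and injects them into its generation context.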

The Autonomy Ladder

CORE progresses through defined autonomy levels:

  • A0: Self-Awareness - Knowledge graph operational
  • A1: Self-Healing - Auto-fix drift, formatting, compliance
  • A2: Code Generation - Create new features autonomously (current)
  • 🎯 A3: Strategic Refactoring - Multi-file architectural improvements (next)
  • 🔮 A4: Self-Replication - Write CORE.NG from scratch

What Makes This Different

vs. GitHub Copilot/Cursor:

  • CORE focuses on autonomy with governance, not assisted coding
  • Constitutional boundaries prevent drift
  • Cryptographic approval for governance changes

vs. AutoGPT/BabyAGI:

  • Constitutional framework prevents "going off the rails"
  • Audit system catches violations before they merge
  • Human quorum for critical decisions

vs. Traditional CI/CD:

  • AI understands why rules exist, not just what they are
  • Autonomous remediation, not just detection
  • Self-evolving within constitutional bounds

Example: Autonomous Code Generation Flow

User request: "Create a health endpoint"

  1. PlannerAgent creates execution plan with constitutional check
  2. CoderAgent generates code using semantic context (knows where health endpoints go, what imports to use, naming patterns)
  3. Constitutional Audit validates placement, naming, imports
  4. ExecutionAgent runs tests, auto-fixes failures
  5. Merge - Only if audit passes and tests succeed

Success rate: 70% end-to-end without human intervention.
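The "runs tests, auto-fixes failures" step is essentially a bounded retry loop. A hedged sketch, with placeholder test-runner and fixer functions (a real agent would re-prompt the LLM with the failure messages):

```python
def run_tests(code: str) -> list[str]:
    """Placeholder test runner: returns a list of failure messages."""
    return [] if "def health" in code else ["missing health handler"]

def auto_fix(code: str, failures: list[str]) -> str:
    """Placeholder fixer: a real ExecutionAgent would feed failures
    back to the LLM and regenerate the offending code."""
    if "missing health handler" in failures:
        code += "\ndef health():\n    return {'status': 'ok'}\n"
    return code

def validate_with_retries(code: str, max_attempts: int = 3) -> tuple[bool, str]:
    """Run tests, feeding failures to the fixer, up to max_attempts times."""
    for _ in range(max_attempts):
        failures = run_tests(code)
        if not failures:
            return True, code   # clean: safe to hand off to the merge gate
        code = auto_fix(code, failures)
    return False, code          # still failing: escalate to a human
```

Bounding the attempts matters: without `max_attempts`, a failure mode the fixer can't address would loop forever instead of escalating.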

Tech Stack

  • Python 3.12+ with Poetry
  • PostgreSQL for knowledge graph (Single Source of Truth)
  • Qdrant for vector storage
  • LLM Providers: DeepSeek, Claude, OpenAI (configurable)
  • Constitutional Framework: YAML policies with cryptographic signing

Try It Yourself

```shell
git clone https://github.com/DariuszNewecki/CORE.git
cd CORE
poetry install
cp .env.example .env
# Add your LLM API keys

# Build knowledge graph
poetry run core-admin fix vector-sync --write

# Run constitutional audit
poetry run core-admin check audit

# Try autonomous code generation
poetry run core-admin chat "create a CLI command that validates JSON"
```

What's Next: A3 Autonomy

The next frontier is strategic refactoring - multi-file architectural improvements that require understanding the entire system, not just individual modules.

This requires:

  • Cross-file dependency analysis
  • Impact assessment before changes
  • Coordinated multi-file modifications
  • Validation that architecture improved, not regressed
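Cross-file dependency analysis can start from something as simple as parsing imports with Python's `ast` module. This sketch maps each file to the top-level modules it imports; real A3-grade analysis would walk the whole repository and resolve relative imports:

```python
import ast

def imported_modules(source: str) -> set[str]:
    """Collect top-level module names imported by a piece of Python source."""
    deps: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            deps.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            deps.add(node.module.split(".")[0])
    return deps

def dependency_graph(files: dict[str, str]) -> dict[str, set[str]]:
    """Map each file path to the modules it depends on -- the raw material
    for assessing the blast radius of a multi-file refactor."""
    return {path: imported_modules(src) for path, src in files.items()}
```

Once you have this graph, impact assessment becomes graph traversal: before touching a module, enumerate everything that transitively depends on it.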

Lessons Learned

1. You can't automate what you can't do manually
Building comprehensive manual tooling first made automation straightforward: CORE already had roughly 80% of the infrastructure the autonomous features needed.

2. Context quality matters more than model quality
Moving from basic string concatenation to rich context packages (semantic search, graph traversal, structured metadata) improved success from 0% to 70%.

3. Constitutional governance enables safe autonomy
Policies as human-authored documents + AI semantic understanding = agents that are powerful yet provably bounded.

4. Incremental improvement > perfectionism
40-60% success with gradual overnight processing beats attempting perfect single-run results.

Open Source & MIT Licensed

CORE is fully open source under MIT license. The goal isn't a commercial product - it's exploring constitutional AI governance as a research direction.

If you're interested in:

  • Constitutional AI governance
  • Autonomous code generation
  • Making AI agents safe and auditable
  • Self-improving systems

Check out the repo: https://github.com/DariuszNewecki/CORE

Documentation: https://dariusznewecki.github.io/CORE/


Questions? Comments? Critiques? Drop them below - I'm here to discuss the constitutional governance approach and learn from the community.
