gabrielgadea
Claude Code as a Meta Code Orchestrator: When the AI Governs Itself

The Missing Ingredient

You've seen it happen. Claude Code generates a brilliant solution — elegant, idiomatic, exactly what you needed. Then, three messages later, it makes the same structural mistake it "fixed" two iterations ago. Or it skips writing the test it just committed to writing. Or it reaches for a bash one-liner when a proper tool exists right there in its kit.

The problem isn't intelligence. The models are extraordinary. The problem is discipline at scale — and discipline is not a property that emerges from raw capability alone.

Every senior engineer knows this. Brilliant developers who can't estimate scope, can't stick to a refactoring constraint, can't remember what they decided an hour ago — they create chaos proportional to their talent. AI systems without internalized governance do the same thing, just faster.

But what happens when you architect the discipline into the system itself? When the AI doesn't just respond to prompts, but operates under a self-imposed governance layer that it cannot bypass?

That's what claude-code-kazuba is: a framework for turning Claude Code into a Meta Code Orchestrator.


What "Meta Code Orchestrator" Actually Means

The term is precise. "Meta" because the system operates on its own operation — it governs how it governs. "Orchestrator" because it doesn't just execute tasks; it routes, classifies, enforces, and accelerates across a three-layer architecture.

Layer 1: Cognitive Directives (CLAUDE.md)

This is the system's internalized knowledge. Not a README the AI reads once and forgets — a structured specification that loads at session start and defines the operating contract: classification taxonomy, communication protocol, self-monitoring triggers, delivery checklist. The AI doesn't consult it; it is it.

## CILA Taxonomy (Code-Integration Levels for AI)
| Level | Type           | Scope                      | Validation               |
|-------|----------------|----------------------------|--------------------------|
| L1    | Bugfix         | Single behavior fix        | Targeted test            |
| L2    | Optimization   | Performance, no API change | Benchmark before/after   |
| L3    | Refactoring    | Structure, same behavior   | Full test suite          |
| L4    | Architecture   | Fundamental change         | Full suite + integration |
| L5    | Paradigm shift | New approach/abstraction   | Exploratory spike first  |

Layer 2: Runtime Enforcement (Hooks)

Sixteen hooks across three modules — hooks-essential, hooks-quality, hooks-routing — intercept five Claude Code events: PreToolUse, PostToolUse, UserPromptSubmit, SessionStart, PreCompact. Exit code 2 blocks the action. Exit code 0 allows it. The system cannot override these at inference time.
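To make the exit-code contract concrete, here is a minimal sketch of what a blocking PreToolUse hook could look like. This is a hypothetical example, not one of the sixteen shipped hooks; the payload field names (`tool_input`, `file_path`) and the secrets-directory rule are assumptions for illustration.

```python
#!/usr/bin/env python3
"""Hypothetical PreToolUse hook: block writes under a secrets/ directory."""
import json
import sys

def decide(payload: dict) -> int:
    """Return the hook exit code for a PreToolUse event payload."""
    path = payload.get("tool_input", {}).get("file_path", "")
    if "secrets/" in path:
        # Exit code 2 blocks the tool call; stderr explains why.
        print(f"Blocked: {path} is inside secrets/", file=sys.stderr)
        return 2
    return 0  # exit code 0 allows the action

if __name__ == "__main__":
    sys.exit(decide(json.load(sys.stdin)))  # event data arrives as JSON on stdin
```

Because the verdict is an exit code returned by a separate process, the model has no channel to argue with it at inference time.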

Layer 3: Invisible Acceleration (Rust)

Secrets scanning and hot-path validation compiled to native code via PyO3, with graceful Python fallback. The detection pipeline uses Aho-Corasick as a two-phase pre-filter: 14 keyword patterns eliminate 95%+ of files in O(n) before 9 regex patterns run on the remaining candidates. The AI doesn't think about this layer. It just runs 10-50x faster than pure Python for typical files.
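The two-phase idea is independent of Rust. Here is a pure-Python sketch of the same shape, using plain substring search in place of Aho-Corasick; the keyword and regex lists are made-up stand-ins for the real 14 keywords and 9 patterns.

```python
import re

# Hypothetical patterns; the real module ships 14 keywords and 9 regexes.
KEYWORDS = ("akia", "secret", "token", "password")         # phase 1: cheap pre-filter
PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                       # AWS-style access key id
    re.compile(r"(?i)(secret|token|password)\s*=\s*\S+"),  # assigned credentials
]

def scan(text: str) -> list[str]:
    """Two-phase scan: run the expensive regexes only if a keyword hits."""
    lowered = text.lower()
    if not any(kw in lowered for kw in KEYWORDS):          # O(n) pre-filter
        return []                                          # most files stop here
    return [m.group(0) for p in PATTERNS for m in p.finditer(text)]
```

The pre-filter is why the pipeline stays fast: files with no keyword candidate never pay the regex cost at all.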

Three layers. One emergent property: the system governs itself.


Meta-Code-First: The Plan Is Executable Code

This is the architectural decision that separates this framework from a sophisticated prompt collection.

In most AI-assisted workflows, a "plan" is a Markdown document — a checklist of intentions that the AI may or may not follow, with no enforcement mechanism between the plan and its execution.

In kazuba, plans are generated by code and validated by code.

python scripts/generate_plan.py --validate

This script doesn't write text. It generates 24 implementation phases, each with:

  • Explicit inputs and outputs
  • Auto-generated validation scripts
  • Cross-phase dependency declarations
  • A unified validator that runs all phases in sequence
python plans/validation/validate_all.py
# Phase 01 [PASS] Module loader
# Phase 02 [PASS] Settings merge
# ...
# Phase 24 [PASS] Integration smoke test
# All 24 phases: PASS

The plan executes. If Phase 07 fails, you know exactly where, exactly why, and exactly what the expected output was. This isn't documentation — it's a contract enforced by the runtime.

The recursive chain runs five levels deep:

  1. Business requirement → architect prompt
  2. Architect prompt → generate_plan.py
  3. generate_plan.py → phase definitions + validators
  4. Phase execution → auto-validated outputs
  5. All-phases validator → CI gate

At no level does a human manually write a plan and hope the AI follows it. The plan is the code. The validation is the proof.
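A phase-as-code structure might look like the following sketch. The field names and the `Phase` dataclass are assumptions for illustration; `generate_plan.py`'s actual schema may differ.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set

@dataclass
class Phase:
    """Hypothetical shape of one generated implementation phase."""
    number: int
    name: str
    inputs: List[str]                        # explicit inputs
    outputs: List[str]                       # explicit outputs
    validate: Callable[[], bool]             # auto-generated validation check
    depends_on: List[int] = field(default_factory=list)  # cross-phase dependencies

def run_all(phases: List[Phase]) -> bool:
    """Unified validator: run phases in order, stop at the first failure."""
    done: Set[int] = set()
    for phase in sorted(phases, key=lambda p: p.number):
        if not all(dep in done for dep in phase.depends_on):
            print(f"Phase {phase.number:02d} [SKIP] unmet dependency")
            return False
        ok = phase.validate()
        print(f"Phase {phase.number:02d} [{'PASS' if ok else 'FAIL'}] {phase.name}")
        if not ok:
            return False
        done.add(phase.number)
    return True
```

Because each phase carries its own validator, a failure localizes itself: the runner names the phase, and the validator encodes what "expected output" meant.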


CILA: Formal Taxonomy for Intelligent Routing

You cannot route intelligently without a formal classification system.

This is the gap that makes most AI coding assistants feel unpredictable: they treat "fix this typo" and "redesign the authentication layer" as structurally equivalent problems — both are "things to do." The difference in scope, risk, validation depth, and reversibility is enormous. Without taxonomy, the system cannot distinguish them.

CILA (Code-Integration Levels for AI) provides that taxonomy. Every task is classified before the first line of code is touched:

  • L1 (Bugfix): Single targeted fix. Validation: one test that was failing now passes.
  • L2 (Optimization): Performance change, no interface mutation. Validation: benchmark before and after — the number must improve.
  • L3 (Refactoring): Structure changes, behavior identical. Validation: full suite green, zero regressions.
  • L4 (Architecture): Fundamental structural change. Validation: full suite + integration tests + branch isolation.
  • L5 (Paradigm Shift): New abstraction or approach. Protocol: exploratory spike first, then plan, then execute.

Classification isn't bureaucracy. It's the signal that tells the Meta Orchestrator what governance mode to apply. An L1 fix doesn't need a spike. An L5 change that skips the spike creates an expensive archaeology problem three weeks later.

The taxonomy also activates the circuit breakers defined in Layer 1. L4+ changes trigger mandatory user confirmation before execution. The system is aware of its own risk surface.
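As a sketch, the routing decision reduces to a small table keyed by CILA level; the field names here are hypothetical, but the L4+ confirmation gate mirrors the circuit breaker described above.

```python
# Hypothetical routing table mirroring the CILA taxonomy.
CILA = {
    "L1": {"validation": "targeted test",            "needs_confirmation": False},
    "L2": {"validation": "benchmark before/after",   "needs_confirmation": False},
    "L3": {"validation": "full test suite",          "needs_confirmation": False},
    "L4": {"validation": "full suite + integration", "needs_confirmation": True},
    "L5": {"validation": "exploratory spike first",  "needs_confirmation": True},
}

def governance_mode(level: str) -> dict:
    """Return the enforcement profile for a classified task."""
    if level not in CILA:
        raise ValueError(f"Unknown CILA level: {level}")
    return CILA[level]
```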


Self-Hosting: The Proof That It Works

The strongest evidence for an architecture is when it runs on itself.

The claude-code-kazuba repository contains a .claude/ directory. The framework installs itself. The hooks that prevent secrets from leaking in client projects also protect the framework's own source. The CILA taxonomy that governs feature development on client codebases governed the development of the framework itself.

claude-code-kazuba/
├── .claude/             # Framework governing itself
│   ├── CLAUDE.md        # Cognitive directives for dev sessions
│   ├── hooks/           # Runtime enforcement for framework code
│   └── skills/          # Reusable skill library
├── claude_code_kazuba/  # The framework source
│   └── data/modules/    # 3 modules, 16 hooks
└── tests/               # 1,577 tests, 90%+ coverage

1,577 tests. 90%+ coverage. CI green across lint, type check, and test matrix. These aren't aspirational metrics — they're the observable output of a system that enforces its own quality standards on its own development.

The self-hosting property is not a feature claim. It's empirical validation. If the governance architecture couldn't sustain its own development, you'd see it in the numbers. You don't.


The Convergence

What makes the three layers a system rather than a stack of tools?

Each layer alone is familiar:

  • A well-written CLAUDE.md is good practice
  • Hooks are a documented Claude Code feature
  • Rust extensions are a performance choice

The property that emerges from their convergence is what's uncommon: a session that starts already knowing what matters, where every action passes through enforcement rules that cannot be reasoned around, where acceleration happens below the visibility threshold.

The Meta Code Orchestrator doesn't replace the engineer's judgment. It preserves the engineer's judgment across sessions, across distractions, across the pressure to cut corners at 11pm when the deployment is in two hours.

That's the real differentiator. Not the hooks. Not the Rust. The fact that the system's internalized discipline is durable — it doesn't erode with fatigue, doesn't forget what was decided last session, doesn't skip the test when it's inconvenient.


Install and What's Next

pip install claude-code-kazuba
kazuba install --preset standard

Five presets are available: minimal, standard, research, professional, enterprise. Each installs a curated combination of modules, merging hooks into your existing settings.json without overwriting your configuration.

The framework uses a non-destructive merge: permissions.allow and permissions.deny are extended (no duplicates), hook arrays are appended per event, env keys are added only if absent. Your existing setup is preserved.
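A merge with those three rules could be sketched like this. The settings shapes (`permissions.allow`/`deny` as lists, `hooks` keyed by event, `env` as a flat dict) follow Claude Code's `settings.json` layout; the function itself is an illustrative sketch, not the installer's actual code.

```python
import copy

def merge_settings(existing: dict, incoming: dict) -> dict:
    """Non-destructive merge sketch: never overwrite, only extend."""
    merged = copy.deepcopy(existing)          # leave the user's dict untouched
    perms = merged.setdefault("permissions", {})
    for key in ("allow", "deny"):
        seen = perms.get(key, [])
        # Extend without duplicates, preserving the user's order.
        new = incoming.get("permissions", {}).get(key, [])
        perms[key] = seen + [rule for rule in new if rule not in seen]
    hooks = merged.setdefault("hooks", {})
    for event, entries in incoming.get("hooks", {}).items():
        hooks[event] = hooks.get(event, []) + entries   # append per event
    env = merged.setdefault("env", {})
    for key, value in incoming.get("env", {}).items():
        env.setdefault(key, value)                      # add only if absent
    return merged
```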


The claim isn't that claude-code-kazuba makes Claude Code smarter. The claim is that it makes Claude Code disciplined. Intelligence without discipline scales chaos. Intelligence with internalized governance scales competence.

The architecture exists. The tests pass. The self-hosting works. Whether that changes how you think about AI-assisted development is something you'll discover by running it on a real project.

kazuba install --preset standard

Gabriel Gadea

Source: github.com/gabrielgadea/claude-code-kazuba
