Andrej Karpathy described the ideal AI system as a "command center" — observable, debuggable, steerable. Most agent frameworks give you none of that.
Here's the gap: your agent runs 50 tasks, fails silently on 3, and you find out from a customer complaint. There's no audit trail, no record of what went wrong, and no way to prevent it next time.
The enforcement ladder approach gives agents five escalating levels of structural control:
- L1 (Prose): Instructions in CLAUDE.md — easily ignored
- L2 (Convention): Naming patterns, file structure — fragile
- L3 (Template): Structured output formats — moderate enforcement
- L4 (Test): Automated verification — catches violations
- L5 (Hook): Pre-commit/pre-deploy automation — prevents violations
The key insight: L1 (prose instructions) fails ~47% of the time under context pressure. L5 (hooks) fails 0% of the time — the commit or deploy literally cannot proceed if the check fails.
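To make L5 concrete, here's a minimal sketch of what a hook-level check might look like. The policy itself (rejecting unresolved `TODO` markers) and the inline file contents are hypothetical, chosen just for illustration; a real pre-commit hook would read the staged file list from `git diff --cached --name-only` and enforce whatever policy your ladder defines.

```python
#!/usr/bin/env python3
"""Sketch of an L5-style pre-commit hook: the commit cannot complete
while the check fails, which is what makes the enforcement structural."""
import sys

FORBIDDEN = "TODO"  # hypothetical policy: no unresolved TODOs may be committed

def check_files(staged: dict[str, str]) -> list[str]:
    """Scan staged file contents; return violations (empty = commit may proceed)."""
    violations = []
    for path, text in staged.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if FORBIDDEN in line:
                violations.append(f"{path}:{lineno}: contains {FORBIDDEN}")
    return violations

if __name__ == "__main__":
    # Stand-in for reading staged files out of git; contents are made up.
    staged = {"example.py": "x = 1\n# TODO: fix this\n"}
    problems = check_files(staged)
    for p in problems:
        print(p)
    # Exiting nonzero is the enforcement: git aborts the commit.
    sys.exit(1 if problems else 0)
```

Installed as `.git/hooks/pre-commit`, a script like this doesn't depend on the model reading its instructions — the violation is blocked regardless.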
When Karpathy talks about a command center, this is what he means: structural enforcement that doesn't depend on the model reading its instructions correctly.
Try it yourself: Free AI Governance Scanner — paste any GitHub repo and get a scored assessment in 30 seconds.