Stop Building Autonomous AI Agents. Build Governed Execution Runtimes Instead.
We’ve all seen the standard AI agent architecture:
LLM → Tool → Reflection → Retry → More Tools → Chaos
It works well for demos.
It fails the moment you need:
- auditability
- replayability
- deterministic boundaries
- regulator-facing guarantees
- operational observability
The core problem is simple:
Most AI systems use probabilistic orchestration.
The LLM controls:
- execution flow
- tool selection
- branching semantics
- retry topology
That means your runtime behavior changes dynamically based on latent model state.
For enterprise systems, especially in FinTech, KYC/AML, DevSecOps, and LegalTech, this is operationally unacceptable.
So we built something different:
Governed Probabilistic Execution
Instead of treating the LLM as the subject of orchestration, we treat it as a constrained compute unit operating inside a deterministic runtime.
Traditional agents:
LLM decides → System adapts
Governed execution:
System decides → LLM computes
The project implements this model explicitly. It consists of:
- llm-nano-vm
- nano-vm-mcp
- kyc-demo-streamlit
The Runtime Model
The architecture is built around a deterministic Finite State Machine (FSM).
The LLM:
- does not own control flow
- does not mutate execution topology
- does not dynamically create new execution semantics
- cannot escape governance boundaries
Instead, every execution step is bounded and explicitly governed.
FSM Runtime
↓
Projection Layer
↓
Bounded LLM Step
↓
Typed Transition
↓
Execution Trace
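To make the pipeline above concrete, here is a minimal sketch of a runtime where the transition table is fixed and the model only returns a typed verdict. The names `State`, `Verdict`, `TRANSITIONS`, and `Runtime` are hypothetical and chosen for illustration; they are not taken from llm-nano-vm, and the `step` callable stands in for a real bounded LLM call.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class State(Enum):
    COLLECT = "collect"
    ASSESS = "assess"
    APPROVED = "approved"
    REJECTED = "rejected"


class Verdict(Enum):
    # The only outputs a bounded step may return in this sketch.
    PASS = "pass"
    FAIL = "fail"


# Fixed transition table: the runtime, not the model, owns the topology.
TRANSITIONS: dict[tuple[State, Verdict], State] = {
    (State.COLLECT, Verdict.PASS): State.ASSESS,
    (State.COLLECT, Verdict.FAIL): State.REJECTED,
    (State.ASSESS, Verdict.PASS): State.APPROVED,
    (State.ASSESS, Verdict.FAIL): State.REJECTED,
}
TERMINAL = {State.APPROVED, State.REJECTED}


@dataclass
class Runtime:
    step: Callable[[State], str]  # stand-in for the bounded LLM call
    state: State = State.COLLECT
    trace: list[tuple[str, str, str]] = field(default_factory=list)

    def run(self, max_steps: int = 10) -> State:
        steps = 0
        while self.state not in TERMINAL:
            if steps >= max_steps:
                raise RuntimeError("step budget exhausted")
            raw = self.step(self.state)
            verdict = Verdict(raw)  # raises ValueError for any untyped output
            nxt = TRANSITIONS[(self.state, verdict)]
            self.trace.append((self.state.value, verdict.value, nxt.value))
            self.state = nxt
            steps += 1
        return self.state
```

If the model returns anything outside `Verdict`, the step fails before any transition is recorded, so the model can influence which edge is taken but cannot add edges.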
ProjectionLayer: Evaluator Blindness
One of the most important architectural properties is evaluator blindness.
The model never receives full runtime context.
It only sees a target-specific projection:
ProjectionLayer(target=LLM)
This creates several important guarantees:
1. Reduced semantic contamination
The model cannot condition its output on:
- governance prompts
- rollback metrics
- entropy alerts
- audit metadata
- unrelated historical state
2. Anti-Hawthorne behavior
The evaluator cannot adapt its behavior simply because it knows it is being monitored.
3. Capability isolation
The Projection Layer acts as:
- a semantic firewall
- a capability boundary
- an information minimization layer
This architecture is closer to capability-security systems than to prompt engineering.
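As a rough illustration of the idea, a projection can be as simple as a per-target allow-list applied to the runtime context. This is a sketch under assumptions: the `project` function, the `ALLOWED_KEYS` table, and every key name in it are hypothetical, and the real ProjectionLayer may work quite differently.

```python
from types import MappingProxyType
from typing import Any, Mapping

# Hypothetical allow-lists: which runtime-context keys each target may see.
ALLOWED_KEYS: dict[str, frozenset[str]] = {
    "LLM": frozenset({"task", "document_text"}),
    "ANALYZER": frozenset({"task", "rollback_count", "entropy_bits", "audit_id"}),
}


def project(context: Mapping[str, Any], target: str) -> Mapping[str, Any]:
    """Return a read-only view holding only the keys allowed for `target`."""
    allowed = ALLOWED_KEYS[target]  # unknown targets fail with KeyError
    visible = {key: value for key, value in context.items() if key in allowed}
    # MappingProxyType blocks item assignment on the view itself;
    # nested mutable values would still need their own protection.
    return MappingProxyType(visible)
```

The design choice worth noting is that the default is denial: a new context key stays invisible to the model until someone explicitly adds it to the allow-list.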
ASTEngine Instead of eval()
The runtime never executes arbitrary Python.
There is:
- no eval()
- no exec()
- no unrestricted expression execution
Conditions are evaluated through a constrained AST engine.
The important point is not just security.
The real goal is bounded semantic expressiveness.
The DSL intentionally forbids:
- method calls
- arbitrary arithmetic
- dynamic execution
- unrestricted Python semantics
Why?
Because unrestricted expressiveness destroys:
- replayability
- analyzability
- deterministic guarantees
- formal reasoning
This design philosophy is much closer to:
- Terraform HCL
- Open Policy Agent (Rego)
- AWS IAM policy DSLs
than to traditional AI orchestration frameworks.
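A minimal sketch of what such a constrained evaluator can look like, built on Python's standard `ast` module. The grammar here (names, literals, comparisons, and/or/not) is an assumption made for illustration, as are the names `evaluate` and `ForbiddenExpression`; the actual ASTEngine may allow a different set of nodes.

```python
import ast
import operator
from typing import Any, Callable, Mapping

# Comparison operators the sketch allows; everything else is rejected.
_COMPARATORS: dict[type, Callable[[Any, Any], bool]] = {
    ast.Eq: operator.eq,
    ast.NotEq: operator.ne,
    ast.Lt: operator.lt,
    ast.LtE: operator.le,
    ast.Gt: operator.gt,
    ast.GtE: operator.ge,
}


class ForbiddenExpression(ValueError):
    """Raised when an expression uses a construct outside the allowed grammar."""


def evaluate(expr: str, env: Mapping[str, Any]) -> bool:
    tree = ast.parse(expr, mode="eval")  # parses only, never executes
    return bool(_eval(tree.body, env))


def _eval(node: ast.AST, env: Mapping[str, Any]) -> Any:
    if isinstance(node, ast.Constant):
        if isinstance(node.value, (bool, int, float, str)):
            return node.value
        raise ForbiddenExpression("unsupported literal")
    if isinstance(node, ast.Name):
        if node.id not in env:
            raise ForbiddenExpression(f"unknown name: {node.id}")
        return env[node.id]
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
        return not _eval(node.operand, env)
    if isinstance(node, ast.BoolOp):
        # Operands are evaluated eagerly (no short-circuit) to keep this small.
        values = [bool(_eval(value, env)) for value in node.values]
        return all(values) if isinstance(node.op, ast.And) else any(values)
    if isinstance(node, ast.Compare):
        left = _eval(node.left, env)
        for op, right_node in zip(node.ops, node.comparators):
            compare = _COMPARATORS.get(type(op))
            if compare is None:
                raise ForbiddenExpression(type(op).__name__)
            right = _eval(right_node, env)
            if not compare(left, right):
                return False
            left = right
        return True
    # Calls, attribute access, arithmetic, subscripts, lambdas, and so on.
    raise ForbiddenExpression(type(node).__name__)
```

With this shape, `risk_score >= 70 and country == 'DE'` evaluates normally, while `country.lower() == 'de'` or `risk_score + 1 > 2` is rejected before anything runs, because `Call` and `BinOp` nodes are simply not in the grammar.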
Observability Beyond Tokens
Most AI observability tooling measures:
- latency
- token usage
- cost
- prompt traces
We wanted to measure something deeper:
Structural execution instability
The runtime tracks:
- path variance
- rollback density
- transition sequence variance
- transition entropy
Transition entropy is especially important.
If transition entropy exceeds an empirically chosen threshold (2.5 bits), the runtime flags structural degradation.
This is not “AI monitoring”.
It is execution topology observability.
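One plausible way to compute such a metric is the Shannon entropy, in bits, of the empirical distribution of observed state-to-state transitions. The exact formula the runtime uses is not given here, so treat this as an assumption; only the 2.5-bit threshold comes from the text, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence

ENTROPY_THRESHOLD_BITS = 2.5  # threshold quoted in the text


def transition_entropy(states: Sequence[str]) -> float:
    """Shannon entropy (bits) of the observed (from, to) transition frequencies."""
    transitions = list(zip(states, states[1:]))
    if not transitions:
        return 0.0
    counts = Counter(transitions)
    total = len(transitions)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def is_structurally_degraded(states: Sequence[str]) -> bool:
    return transition_entropy(states) > ENTROPY_THRESHOLD_BITS
```

Under this definition, a run that alternates between two states produces two equally likely transitions and scores 1 bit. A run that takes eight distinct transitions exactly once each scores 3 bits and would be flagged.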
Failure Laboratory
The KYC Governance Simulator intentionally includes adversarial injectors:
- tool_injection
- policy_bypass
- skip_step
- reorder_steps
- corrupt_receipt
- gdpr_erase
The point is not to showcase a happy path.
The point is to demonstrate deterministic failure semantics under attack conditions.
Most AI demos try to hide instability.
We intentionally surface it.
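To show what deterministic failure semantics can mean in practice, here is a small sketch with two injectors and a checker that maps each deviation to a fixed failure code. The injector names match the list above, but their bodies, the step names in `EXPECTED_ORDER`, and the failure codes are all hypothetical and not taken from the simulator.

```python
# Hypothetical KYC step order used only for this illustration.
EXPECTED_ORDER = ["collect_documents", "verify_identity", "screen_sanctions", "decide"]


def skip_step(steps: list[str]) -> list[str]:
    """Injector: drop the second step."""
    return steps[:1] + steps[2:]


def reorder_steps(steps: list[str]) -> list[str]:
    """Injector: swap the first two steps."""
    return [steps[1], steps[0]] + steps[2:]


def check_trace(steps: list[str]) -> str:
    """Map any deviation from the expected order to a fixed failure code."""
    if steps == EXPECTED_ORDER:
        return "OK"
    if sorted(steps) == sorted(EXPECTED_ORDER):
        return "FAIL_REORDERED"
    if set(steps) < set(EXPECTED_ORDER):
        return "FAIL_MISSING_STEP"
    return "FAIL_UNEXPECTED_SEQUENCE"
```

The same attack always yields the same code, which is what makes the failure replayable and testable rather than something to hide.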
Trace ≠ Receipt
Another core architectural principle:
Execution → Trace → Analyzer → Receipt
Where:
- Trace = source of truth
- Receipt = deterministic projection
- Analyzer = post-hoc interpretation layer
Receipts are:
- recomputable
- deterministic
- derived artifacts
They are not mutable runtime state.
This is heavily inspired by event-sourcing philosophy.
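The relationship can be sketched as a pure function from trace to receipt. The `TraceEvent` and `Receipt` shapes and the `derive_receipt` function below are assumptions made for illustration; `derive_receipt` plays the Analyzer role, and hashing a canonical JSON form is just one way to make the projection checkable.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceEvent:
    seq: int
    state: str
    outcome: str


@dataclass(frozen=True)
class Receipt:
    steps: int
    final_state: str
    trace_digest: str


def derive_receipt(trace: list[TraceEvent]) -> Receipt:
    """Pure function: the same trace always yields the same receipt."""
    # Canonical serialization so the digest does not depend on formatting.
    canonical = json.dumps(
        [[event.seq, event.state, event.outcome] for event in trace],
        separators=(",", ":"),
    )
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    final_state = trace[-1].state if trace else "none"
    return Receipt(steps=len(trace), final_state=final_state, trace_digest=digest)
```

Because the receipt is recomputed from the trace rather than stored as mutable state, a corrupted receipt can be detected by deriving it again and comparing.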
Transactional AI Code Mutation
We applied the same principles to repository mutation.
The companion nano-vm-dev-agent performs code changes transactionally:
stage_patch()
→ validate_staged_mypy(tmpdir)
→ pytest
→ commit OR rollback
The repository is never mutated before type validation succeeds.
This creates CI-grade mutation safety for AI-assisted development.
Most coding agents operate on best-effort mutation semantics.
This runtime applies transactional guarantees instead.
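The stage, validate, commit-or-rollback flow above can be sketched as follows. This is not the nano-vm-dev-agent implementation: `apply_transactionally`, the `Validator` type, and the patch format (relative path to new file content) are hypothetical, and `mypy_strict` is only an example validator that assumes mypy is installed.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, Iterable, Mapping

Validator = Callable[[Path], bool]


def mypy_strict(staged: Path) -> bool:
    """Example validator: type-check the staged copy, not the real repository."""
    result = subprocess.run(
        [sys.executable, "-m", "mypy", "--strict", str(staged)],
        capture_output=True,
    )
    return result.returncode == 0


def apply_transactionally(
    repo: Path, patch: Mapping[str, str], validators: Iterable[Validator]
) -> bool:
    """Apply `patch` to `repo` only if every validator accepts the staged copy."""
    with tempfile.TemporaryDirectory() as tmp:
        staged = Path(tmp) / "staged"
        shutil.copytree(repo, staged)  # stage: work on a throwaway copy
        for rel_path, content in patch.items():
            target = staged / rel_path
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(content, encoding="utf-8")
        if not all(check(staged) for check in validators):
            return False  # rollback: the copy is discarded, repo untouched
        for rel_path in patch:  # commit: copy only the patched files back
            destination = repo / rel_path
            destination.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(staged / rel_path, destination)
        return True
```

A caller would pass something like `[mypy_strict, run_pytest]` as validators, where `run_pytest` is a similar hypothetical wrapper. Note that the final copy-back in this sketch is not itself atomic; a production version would need a stronger commit step.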
Why Streamlit?
We intentionally skipped:
- React
- Vite
- complex async frontend state systems
The UI is built entirely in Python using Streamlit.
Why?
Because the project optimizes for:
- governance correctness
- deterministic behavior
- engineering simplicity
- type safety
- operational transparency
Not frontend maximalism.
Current Status
Current ecosystem status:
- llm-nano-vm v0.8.4
- nano-vm-mcp v0.4.3
- kyc-demo-streamlit
- nano-vm-dev-agent v0.2.0
Engineering discipline:
- mypy --strict
- pytest
- deterministic constraints
- no arbitrary runtime execution
The KYC demo currently passes:
- 51/51 tests
- 0 mypy errors
The Bigger Shift
The industry is saturated with autonomous agent hype.
But critical infrastructure does not need autonomous orchestration.
It needs:
- bounded execution
- deterministic governance
- replayability
- auditability
- operational observability
The future may not belong to autonomous agents.
It may belong to governed execution runtimes for probabilistic systems.