
Alex Delov


Stop Building Autonomous AI Agents. Build Governed Execution Runtimes Instead.


We’ve all seen the standard AI agent architecture:

LLM → Tool → Reflection → Retry → More Tools → Chaos

It works well for demos.

It fails the moment you need:

  • auditability
  • replayability
  • deterministic boundaries
  • regulator-facing guarantees
  • operational observability

The core problem is simple:

Most AI systems use probabilistic orchestration.

The LLM controls:

  • execution flow
  • tool selection
  • branching semantics
  • retry topology

That means control flow is decided at run time by model output, so two runs on the same input can take different paths.

For enterprise systems, especially in FinTech, KYC/AML, DevSecOps, and LegalTech, this is operationally unacceptable.

So we built something different:

Governed Probabilistic Execution

Instead of treating the LLM as the subject of orchestration, we treat it as a constrained compute unit operating inside a deterministic runtime.

Traditional agents:
LLM decides → System adapts

Governed execution:
System decides → LLM computes

The project implements this model explicitly across three repositories:

  • llm-nano-vm
  • nano-vm-mcp
  • kyc-demo-streamlit


The Runtime Model

The architecture is built around a deterministic Finite State Machine (FSM).

The LLM:

  • does not own control flow
  • does not mutate execution topology
  • does not dynamically create new execution semantics
  • cannot escape governance boundaries

Instead, every execution step is bounded and explicitly governed.

FSM Runtime
    ↓
Projection Layer
    ↓
Bounded LLM Step
    ↓
Typed Transition
    ↓
Execution Trace
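To make "System decides → LLM computes" concrete, here is a minimal sketch of a runtime that owns the transition table while the model only returns a typed verdict. This is not the llm-nano-vm API: the state names, the `Verdict` vocabulary, and the `step` and `execute` helpers are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class State(Enum):  # hypothetical states for a tiny KYC-style flow
    COLLECT = "collect"
    SCREEN = "screen"
    APPROVED = "approved"
    REJECTED = "rejected"


class Verdict(Enum):  # the only outputs the model is allowed to produce
    PASS = "pass"
    FAIL = "fail"


# The transition table is static data owned by the runtime, not by the model.
TRANSITIONS: dict[tuple[State, Verdict], State] = {
    (State.COLLECT, Verdict.PASS): State.SCREEN,
    (State.COLLECT, Verdict.FAIL): State.REJECTED,
    (State.SCREEN, Verdict.PASS): State.APPROVED,
    (State.SCREEN, Verdict.FAIL): State.REJECTED,
}
TERMINAL = {State.APPROVED, State.REJECTED}


@dataclass
class Run:
    state: State = State.COLLECT
    trace: list[tuple[str, str, str]] = field(default_factory=list)


def step(run: Run, llm: Callable[[State], str]) -> None:
    """Execute one bounded step: the model computes, the table decides."""
    raw = llm(run.state)
    try:
        verdict = Verdict(raw)
    except ValueError:
        # Output outside the typed vocabulary is treated as a failure,
        # so the model cannot introduce a new branch.
        verdict = Verdict.FAIL
    next_state = TRANSITIONS[(run.state, verdict)]
    run.trace.append((run.state.value, verdict.value, next_state.value))
    run.state = next_state


def execute(llm: Callable[[State], str], max_steps: int = 10) -> Run:
    """Drive the FSM until a terminal state or the step budget is reached."""
    run = Run()
    for _ in range(max_steps):
        if run.state in TERMINAL:
            break
        step(run, llm)
    return run
```

Calling `execute(lambda state: "pass")` walks the happy path, while a model that returns anything unexpected, such as a tool-call string, is coerced to `FAIL` and lands in `REJECTED`. The model never selects the next state directly.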

ProjectionLayer: Evaluator Blindness

One of the most important architectural properties is evaluator blindness.

The model never receives full runtime context.

It only sees a target-specific projection:

ProjectionLayer(target=LLM)

This creates several important guarantees:

1. Reduced semantic contamination

The model cannot condition its output on:

  • governance prompts
  • rollback metrics
  • entropy alerts
  • audit metadata
  • unrelated historical state

2. Anti-Hawthorne behavior

Because monitoring signals never reach the model, it cannot change its behavior in response to being observed.

3. Capability isolation

The Projection Layer acts as:

  • a semantic firewall
  • a capability boundary
  • an information minimization layer

This architecture is closer to capability-security systems than to prompt engineering.
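One way to picture the projection idea is a per-target allowlist over the runtime context. The sketch below is an assumption about how such a layer could look, not the project's actual ProjectionLayer: the `PROJECTIONS` table, the key names, and the `project` function are hypothetical.

```python
from types import MappingProxyType
from typing import Any, Mapping

# Hypothetical allowlists: which context keys each target may see.
PROJECTIONS: dict[str, frozenset[str]] = {
    "LLM": frozenset({"task", "document_text"}),
    "AUDITOR": frozenset({"task", "rollback_count", "entropy_bits", "audit_id"}),
}


def project(context: Mapping[str, Any], target: str) -> Mapping[str, Any]:
    """Return a read-only view with only the keys allowed for `target`.

    Unknown targets raise KeyError, so the layer fails closed.
    The view is shallow: nested mutable values are not frozen.
    """
    allowed = PROJECTIONS[target]
    visible = {key: value for key, value in context.items() if key in allowed}
    return MappingProxyType(visible)
```

With a context that contains `rollback_count` and `entropy_bits`, `project(context, "LLM")` returns only `task` and `document_text`, so governance metadata never enters the prompt.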


ASTEngine Instead of eval()

The runtime never executes arbitrary Python.

There is:

  • no eval()
  • no exec()
  • no unrestricted expression execution

Conditions are evaluated through a constrained AST engine.

The important point is not just security.

The real goal is bounded semantic expressiveness.

The DSL intentionally forbids:

  • method calls
  • arbitrary arithmetic
  • dynamic execution
  • unrestricted Python semantics

Why?

Because unrestricted expressiveness destroys:

  • replayability
  • analyzability
  • deterministic guarantees
  • formal reasoning

This design philosophy is much closer to:

  • Terraform HCL
  • Open Policy Agent (Rego)
  • AWS IAM policy DSLs

than to traditional AI orchestration frameworks.
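As an illustration of what a constrained condition evaluator can look like, here is a small sketch built on Python's standard `ast` module. It is not the project's ASTEngine; the set of allowed node types and the `evaluate` function are assumptions chosen to match the restrictions listed above (no calls, no arithmetic, no attribute access).

```python
import ast
import operator
from typing import Any, Mapping

# Only plain comparisons are permitted; everything else is rejected.
_COMPARE = {
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
    ast.Lt: operator.lt, ast.LtE: operator.le,
    ast.Gt: operator.gt, ast.GtE: operator.ge,
}


def evaluate(expr: str, variables: Mapping[str, Any]) -> bool:
    """Evaluate a boolean condition without eval() or exec()."""
    tree = ast.parse(expr, mode="eval")
    return bool(_eval(tree.body, variables))


def _eval(node: ast.AST, variables: Mapping[str, Any]) -> Any:
    if isinstance(node, ast.Constant):
        if isinstance(node.value, (bool, int, float, str)):
            return node.value
        raise ValueError("forbidden constant")
    if isinstance(node, ast.Name):
        if node.id not in variables:
            raise ValueError(f"unknown variable: {node.id}")
        return variables[node.id]
    if isinstance(node, ast.BoolOp):
        # Operands are evaluated eagerly so every sub-expression is validated.
        values = [_eval(value, variables) for value in node.values]
        return all(values) if isinstance(node.op, ast.And) else any(values)
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
        return not _eval(node.operand, variables)
    if isinstance(node, ast.Compare):
        left = _eval(node.left, variables)
        for op, comparator in zip(node.ops, node.comparators):
            compare = _COMPARE.get(type(op))
            if compare is None:
                raise ValueError(f"forbidden operator: {type(op).__name__}")
            right = _eval(comparator, variables)
            if not compare(left, right):
                return False
            left = right
        return True
    # Calls, attribute access, arithmetic, lambdas, etc. all end up here.
    raise ValueError(f"forbidden syntax: {type(node).__name__}")
```

An expression such as `risk_score >= 70 and not pep_match` evaluates normally, while `country.lower() == 'de'` or `risk_score + 1 > 2` raises `ValueError` before anything runs. Because the grammar is this small, every condition can be statically enumerated and replayed.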


Observability Beyond Tokens

Most AI observability tooling measures:

  • latency
  • token usage
  • cost
  • prompt traces

We wanted to measure something deeper:

Structural execution instability

The runtime tracks:

  • path variance
  • rollback density
  • transition sequence variance
  • transition entropy

Transition entropy is especially important.

If execution entropy exceeds an empirical threshold (2.5 bits), the runtime flags structural degradation.

This is not “AI monitoring”.

It is execution topology observability.
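The article does not spell out how transition entropy is computed. A reasonable reading, and the assumption behind this sketch, is Shannon entropy over the empirical distribution of observed `(from_state, to_state)` pairs, compared against the 2.5-bit threshold. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable

ENTROPY_THRESHOLD_BITS = 2.5  # the empirical threshold quoted above


def transition_entropy(transitions: Iterable[tuple[str, str]]) -> float:
    """Shannon entropy, in bits, of the observed transition distribution."""
    counts = Counter(transitions)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum(
        (count / total) * math.log2(count / total) for count in counts.values()
    )


def is_degraded(transitions: Iterable[tuple[str, str]]) -> bool:
    """Flag structural degradation when entropy exceeds the threshold."""
    return transition_entropy(transitions) > ENTROPY_THRESHOLD_BITS
```

Under this definition, a run that always takes the same transition scores 0 bits, four equally likely transitions score 2 bits, and eight equally likely transitions score 3 bits, which would cross the threshold.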


Failure Laboratory

The KYC Governance Simulator intentionally includes adversarial injectors:

  • tool_injection
  • policy_bypass
  • skip_step
  • reorder_steps
  • corrupt_receipt
  • gdpr_erase

The point is not to showcase a happy path.

The point is to demonstrate deterministic failure semantics under attack conditions.

Most AI demos try to hide instability.

We intentionally surface it.
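"Deterministic failure semantics" can be sketched as a classifier that maps each tampered trace to the same typed failure every time. The example below is illustrative only: the step names, the `Failure` enum, and the simplified `skip_step` and `reorder_steps` injectors are assumptions, not the simulator's real implementation.

```python
from enum import Enum

# Hypothetical required step order for a KYC flow.
EXPECTED_STEPS: tuple[str, ...] = (
    "collect_documents",
    "verify_identity",
    "sanctions_screen",
    "decision",
)


class Failure(Enum):
    NONE = "ok"
    UNKNOWN_STEP = "unknown_step"
    MISSING_STEP = "missing_step"
    OUT_OF_ORDER = "out_of_order"


def skip_step(trace: tuple[str, ...], index: int) -> tuple[str, ...]:
    """Injector: drop the step at `index`."""
    return trace[:index] + trace[index + 1:]


def reorder_steps(trace: tuple[str, ...], i: int, j: int) -> tuple[str, ...]:
    """Injector: swap the steps at positions `i` and `j`."""
    swapped = list(trace)
    swapped[i], swapped[j] = swapped[j], swapped[i]
    return tuple(swapped)


def classify(trace: tuple[str, ...]) -> Failure:
    """Map a trace to exactly one failure class, checked in a fixed order."""
    if any(step not in EXPECTED_STEPS for step in trace):
        return Failure.UNKNOWN_STEP
    if any(step not in trace for step in EXPECTED_STEPS):
        return Failure.MISSING_STEP
    if trace != EXPECTED_STEPS:
        return Failure.OUT_OF_ORDER  # also covers repeated steps
    return Failure.NONE
```

Because `classify` is a pure function of the trace, replaying the same attack always produces the same verdict: a skipped step yields `MISSING_STEP`, a swap yields `OUT_OF_ORDER`, and an injected step that is not in the plan yields `UNKNOWN_STEP`.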


Trace ≠ Receipt

Another core architectural principle:

Execution → Trace → Analyzer → Receipt

Where:

  • Trace = source of truth
  • Receipt = deterministic projection
  • Analyzer = post-hoc interpretation layer

Receipts are:

  • recomputable
  • deterministic
  • derived artifacts

They are not mutable runtime state.

This is heavily inspired by event-sourcing philosophy.
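A short sketch of the "receipt as deterministic projection" idea: the receipt is computed purely from the trace, so it can be recomputed at any time and compared. The receipt fields, the `from`/`to` keys in trace entries, and both function names are assumptions for illustration.

```python
import hashlib
import json
from typing import Any, Mapping, Sequence


def derive_receipt(trace: Sequence[Mapping[str, Any]]) -> dict[str, Any]:
    """Compute a receipt from the trace alone (no hidden runtime state)."""
    # Canonical JSON: sorted keys and fixed separators give stable bytes.
    canonical = json.dumps(
        [dict(entry) for entry in trace], sort_keys=True, separators=(",", ":")
    )
    return {
        "steps": len(trace),
        "final_state": trace[-1]["to"] if trace else None,
        "trace_sha256": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
    }


def verify_receipt(
    trace: Sequence[Mapping[str, Any]], receipt: Mapping[str, Any]
) -> bool:
    """A receipt is valid only if recomputing it from the trace matches."""
    return derive_receipt(trace) == dict(receipt)
```

Since the trace is the source of truth, a tampered receipt (the `corrupt_receipt` case) is detected by recomputation rather than by trusting the stored artifact.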


Transactional AI Code Mutation

We applied the same principles to repository mutation.

The companion nano-vm-dev-agent performs code changes transactionally:

stage_patch()
→ validate_staged_mypy(tmpdir)
→ pytest
→ commit OR rollback

The repository is never mutated before type validation succeeds.

This creates CI-grade mutation safety for AI-assisted development.

Most coding agents operate on best-effort mutation semantics.

This runtime applies transactional guarantees instead.
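The stage, validate, commit-or-rollback sequence can be sketched with a temporary copy of the repository. This is a simplified illustration and not the nano-vm-dev-agent code: `apply_transactionally`, `run_tool`, and the patch format (relative path to new file content) are assumptions, and the final copy-back is not atomic at the filesystem level.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, Mapping, Sequence

Check = Callable[[Path], bool]


def run_tool(*args: str) -> Check:
    """Build a check that runs `python -m <args>` inside the staged copy."""
    def check(staged: Path) -> bool:
        result = subprocess.run([sys.executable, "-m", *args], cwd=staged)
        return result.returncode == 0
    return check


def apply_transactionally(
    repo: Path, patch: Mapping[str, str], checks: Sequence[Check]
) -> bool:
    """Stage `patch` in a temp copy, run `checks` there, commit only on success."""
    with tempfile.TemporaryDirectory() as tmp:
        staged = Path(tmp) / "staged"
        shutil.copytree(repo, staged)
        for rel_path, content in patch.items():
            target = staged / rel_path
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(content, encoding="utf-8")
        if not all(check(staged) for check in checks):
            # Rollback: the staged copy is discarded, the repo was never touched.
            return False
        # Commit: copy only the patched files back into the real repository.
        for rel_path in patch:
            destination = repo / rel_path
            destination.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(staged / rel_path, destination)
        return True
```

A call such as `apply_transactionally(repo, patch, [run_tool("mypy", "--strict", "."), run_tool("pytest")])` would mirror the pipeline above, assuming both tools are installed in the active environment. If either check fails, the working tree stays byte-for-byte unchanged.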


Why Streamlit?

We intentionally skipped:

  • React
  • Vite
  • complex async frontend state systems

The UI is built entirely in Python using Streamlit.

Why?

Because the project optimizes for:

  • governance correctness
  • deterministic behavior
  • engineering simplicity
  • type safety
  • operational transparency

Not frontend maximalism.


Current Status

Current ecosystem status:

  • llm-nano-vm v0.8.4
  • nano-vm-mcp v0.4.3
  • kyc-demo-streamlit
  • nano-vm-dev-agent v0.2.0

Engineering discipline:

  • mypy --strict
  • pytest
  • deterministic constraints
  • no arbitrary runtime execution

The KYC demo currently passes:

  • 51/51 tests
  • 0 mypy errors

The Bigger Shift

The industry is saturated with autonomous agent hype.

But critical infrastructure does not need autonomous orchestration.

It needs:

  • bounded execution
  • deterministic governance
  • replayability
  • auditability
  • operational observability

The future may not belong to autonomous agents.

It may belong to governed execution runtimes for probabilistic systems.

Repositories
