<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Alex Delov</title>
    <description>The latest articles on DEV Community by Alex Delov (@ale007xd).</description>
    <link>https://dev.to/ale007xd</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3943262%2Fead831e3-7141-4c6e-8903-282ea5a80e86.jpg</url>
      <title>DEV Community: Alex Delov</title>
      <link>https://dev.to/ale007xd</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ale007xd"/>
    <language>en</language>
    <item>
      <title>Models shouldn't have execution authority. Why we built a deterministic FSM runtime for AI agents.</title>
      <dc:creator>Alex Delov</dc:creator>
      <pubDate>Thu, 21 May 2026 04:49:39 +0000</pubDate>
      <link>https://dev.to/ale007xd/models-shouldnt-have-execution-authority-why-we-built-a-deterministic-fsm-runtime-for-ai-agents-1op5</link>
      <guid>https://dev.to/ale007xd/models-shouldnt-have-execution-authority-why-we-built-a-deterministic-fsm-runtime-for-ai-agents-1op5</guid>
      <description>&lt;p&gt;Modern agent frameworks implicitly treat a probabilistic model as an execution authority. That is acceptable for read-only tasks (e.g., summarizing logs or searching the web). But once an agent can mutate external state — payments, databases, infrastructure, PII — the architecture becomes fundamentally unsafe.&lt;/p&gt;

&lt;p&gt;When preparing our internal agents (PlanBot, SkillBot) for white-label distribution, we realized we needed to change the control plane. &lt;strong&gt;nano-vm&lt;/strong&gt; does not attempt to make the model trustworthy. Instead, it assumes model output is untrusted intent and constrains its blast radius through strict deterministic execution semantics.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Runtime Guarantees (Not just another wrapper)
&lt;/h3&gt;

&lt;p&gt;We built &lt;strong&gt;nano-vm&lt;/strong&gt; — a deterministic FSM runtime for stateful AI systems. The value isn't just in having an FSM; the value is that the execution graph is finite, verifiable, and known ahead of time.&lt;/p&gt;

&lt;p&gt;The runtime enforces:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Deterministic transition graph:&lt;/strong&gt; The execution graph cannot modify itself at runtime.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compile-time ordering:&lt;/strong&gt; Step order is fixed at compile time, so a &lt;code&gt;reorder_steps&lt;/code&gt; attack is structurally impossible.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Capability gating:&lt;/strong&gt; Strictly bounded side-effects.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Replay resistance:&lt;/strong&gt; Idempotency boundaries built into the state transitions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Immutable auditability:&lt;/strong&gt; Cryptographic history of every action.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  ASTEngine: Limitation as a Security Property
&lt;/h3&gt;

&lt;p&gt;In most agent runtimes, the execution loop is essentially: &lt;code&gt;prompt -&amp;gt; JSON -&amp;gt; dynamic dispatch -&amp;gt; side-effect&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;We removed &lt;code&gt;eval()&lt;/code&gt; entirely. Conditions and side-effects are evaluated by a sandboxed &lt;code&gt;DeterministicSanitizer&lt;/code&gt; using an isolated &lt;code&gt;ASTEngine&lt;/code&gt;. It supports a small set of operations (&lt;code&gt;==&lt;/code&gt;, &lt;code&gt;contains&lt;/code&gt;, and &lt;code&gt;$var.field&lt;/code&gt; lookups) and has no loops or system calls.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The policy layer is intentionally less expressive than Python.&lt;/strong&gt; That limitation is a security property, not a missing feature. Loop exhaustion and ReDoS attacks are structurally impossible.&lt;/p&gt;
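
&lt;p&gt;A minimal sketch of what such a restricted evaluator could look like, assuming conditions have already been parsed into a tiny tree. The names &lt;code&gt;Lit&lt;/code&gt;, &lt;code&gt;Var&lt;/code&gt;, &lt;code&gt;Compare&lt;/code&gt;, and &lt;code&gt;evaluate&lt;/code&gt; are hypothetical and are not nano-vm's actual API. The point is only that a grammar with two operators and field lookup cannot express a loop, a regex, or a system call.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Any, Mapping, Union


@dataclass(frozen=True)
class Lit:
    value: Any  # a literal such as "USD" or 100


@dataclass(frozen=True)
class Var:
    path: str  # "invoice.currency" stands for $invoice.currency


@dataclass(frozen=True)
class Compare:
    op: str  # only "==" and "contains" are accepted
    left: Union[Lit, Var]
    right: Union[Lit, Var]


def resolve(node: Union[Lit, Var], context: Mapping[str, Any]) -> Any:
    """Return a literal's value or walk a dotted path through the context."""
    if isinstance(node, Lit):
        return node.value
    if isinstance(node, Var):
        current: Any = context
        for key in node.path.split("."):
            if not isinstance(current, Mapping) or key not in current:
                raise KeyError(f"unknown variable: ${node.path}")
            current = current[key]
        return current
    raise TypeError(f"unsupported node: {type(node).__name__}")


def evaluate(cond: Compare, context: Mapping[str, Any]) -> bool:
    """Evaluate one comparison; anything outside the whitelist is rejected."""
    left = resolve(cond.left, context)
    right = resolve(cond.right, context)
    if cond.op == "==":
        return left == right
    if cond.op == "contains":
        if not isinstance(left, (str, list, tuple)):
            raise TypeError("contains needs a string or sequence on the left")
        return right in left
    raise ValueError(f"unsupported operator: {cond.op!r}")


# Hypothetical context and condition, for illustration only.
context = {"invoice": {"currency": "USD", "tags": ["paid", "eu"]}}
condition = Compare("contains", Var("invoice.tags"), Lit("eu"))
print(evaluate(condition, context))
```

&lt;p&gt;Because the node types are a closed set, adding expressiveness requires a deliberate change to the evaluator rather than a cleverly crafted input.&lt;/p&gt;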

&lt;h3&gt;
  
  
  Sabotage Mode: Demonstrating Failure Semantics
&lt;/h3&gt;

&lt;p&gt;To show how the runtime behaves under adversarial conditions, we built a 7-step fintech pipeline (PDF invoice -&amp;gt; Stripe test-mode adapter) with an integrated &lt;strong&gt;Sabotage Mode&lt;/strong&gt;. Instead of a happy-path demo, it exposes 5 fault injectors directly in the UI, each exercising a different failure path. Three of them are described below.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. &lt;code&gt;tool_injection&lt;/code&gt; (Capability boundary violation)&lt;/strong&gt;&lt;br&gt;
Proposed tool invocations are treated as untrusted intent. If the LLM attempts to initiate an unauthorized &lt;code&gt;wire_transfer($50,000)&lt;/code&gt;, the &lt;code&gt;ExecutionVM&lt;/code&gt; resolves the request against a compile-time capability snapshot. The transition is rejected before any external side-effect layer becomes reachable. Zero side effects reach the network.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqbsbhyv16cp57d8bw34j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqbsbhyv16cp57d8bw34j.png" alt="(The ExecutionVM blocking an unauthorized tool injection at the capability boundary)." width="800" height="401"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. &lt;code&gt;double_exec&lt;/code&gt; (Replay &amp;amp; Idempotency)&lt;/strong&gt;&lt;br&gt;
External side-effects run through idempotent adapters keyed by &lt;code&gt;execution_id&lt;/code&gt;, so internal state can be replayed deterministically during recovery without duplicating external mutations. Terminal states (&lt;code&gt;SUCCESS&lt;/code&gt; and &lt;code&gt;FAILED&lt;/code&gt;) are absorbing: &lt;code&gt;δ(SUCCESS|FAILED, *) = NOP&lt;/code&gt;. Replays against a finished execution are silently dropped.&lt;/p&gt;
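
&lt;p&gt;A rough sketch of these two mechanisms together, assuming an in-memory adapter. &lt;code&gt;IdempotentChargeAdapter&lt;/code&gt; and &lt;code&gt;Execution&lt;/code&gt; are hypothetical names, and the &lt;code&gt;PENDING&lt;/code&gt; state is added only so the example has a starting point.&lt;/p&gt;

```python
from enum import Enum
from typing import Dict, List


class State(Enum):
    PENDING = "PENDING"
    SUCCESS = "SUCCESS"
    FAILED = "FAILED"


# Absorbing states: every transition out of them is a no-op.
TERMINAL = frozenset({State.SUCCESS, State.FAILED})


class IdempotentChargeAdapter:
    """Stand-in for an external adapter; results are cached by execution_id."""

    def __init__(self) -> None:
        self._results: Dict[str, str] = {}
        self.external_calls: List[str] = []

    def charge(self, execution_id: str, amount_cents: int) -> str:
        if execution_id in self._results:
            # Replay: return the recorded result, perform no new mutation.
            return self._results[execution_id]
        self.external_calls.append(execution_id)
        receipt = f"receipt-{execution_id}-{amount_cents}"
        self._results[execution_id] = receipt
        return receipt


class Execution:
    def __init__(self, execution_id: str, adapter: IdempotentChargeAdapter) -> None:
        self.execution_id = execution_id
        self.adapter = adapter
        self.state = State.PENDING

    def step(self, amount_cents: int) -> State:
        if self.state in TERMINAL:
            return self.state  # delta(SUCCESS|FAILED, *) = NOP
        try:
            self.adapter.charge(self.execution_id, amount_cents)
            self.state = State.SUCCESS
        except Exception:
            self.state = State.FAILED
        return self.state


adapter = IdempotentChargeAdapter()
run = Execution("exec-1", adapter)
run.step(500)
run.step(500)  # dropped: the execution is already terminal
print(len(adapter.external_calls))
```

&lt;p&gt;The two guards cover different cases: the absorbing state handles a replay within one live execution, while the adapter's key handles a recovered process that rebuilds the execution from scratch under the same &lt;code&gt;execution_id&lt;/code&gt;.&lt;/p&gt;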

&lt;p&gt;&lt;strong&gt;3. &lt;code&gt;corrupt_hash&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
Tampering with the validation hash instantly throws the FSM into a &lt;code&gt;FAILED&lt;/code&gt; state, resulting in a zeroed envelope chain. The audit trail cannot be silently broken.&lt;/p&gt;
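
&lt;p&gt;Tamper detection of this kind is commonly built on a hash chain. The sketch below assumes SHA-256 over canonical JSON, which the post does not specify; &lt;code&gt;append&lt;/code&gt;, &lt;code&gt;verify&lt;/code&gt;, and the envelope layout are hypothetical. It shows only the detection step and does not model the zeroing of the chain.&lt;/p&gt;

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # assumed placeholder for the first envelope's predecessor


def canonical_hash(payload: Dict[str, Any], prev_hash: str) -> str:
    """Hash the previous link together with a canonical JSON form of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(chain: List[Dict[str, Any]], payload: Dict[str, Any]) -> None:
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({
        "payload": payload,
        "prev": prev_hash,
        "hash": canonical_hash(payload, prev_hash),
    })


def verify(chain: List[Dict[str, Any]]) -> str:
    """Recompute every link; any mismatch fails the whole chain."""
    prev_hash = GENESIS
    for envelope in chain:
        expected = canonical_hash(envelope["payload"], prev_hash)
        if envelope["prev"] != prev_hash or envelope["hash"] != expected:
            return "FAILED"
        prev_hash = envelope["hash"]
    return "OK"


chain: List[Dict[str, Any]] = []
append(chain, {"step": "parse_invoice"})
append(chain, {"step": "create_charge", "amount_cents": 500})
print(verify(chain))
chain[1]["payload"]["amount_cents"] = 1  # simulated tampering
print(verify(chain))
```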

&lt;h3&gt;
  
  
  GDPR Art.17 vs. Immutable Audit Trails
&lt;/h3&gt;

&lt;p&gt;Handling the "Right to Erasure" without breaking cryptographic audit chains is a major headache in fintech.&lt;/p&gt;

&lt;p&gt;We implemented a &lt;code&gt;GDPR-erase&lt;/code&gt; mechanism that targets specific &lt;code&gt;vault://secret/ref&lt;/code&gt; pointers and replaces the PII with a &lt;code&gt;[REDACTED_TOMBSTONE]&lt;/code&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The PII becomes completely inaccessible.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;hash_chain&lt;/code&gt; and &lt;code&gt;canonical_hash&lt;/code&gt; survive.&lt;/li&gt;
&lt;li&gt;Cryptographic continuity is maintained.&lt;/li&gt;
&lt;li&gt;Referential integrity is preserved.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You delete the data, but you do not destroy the mathematical proof that the operation occurred safely.&lt;/p&gt;
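
&lt;p&gt;One way this can work, sketched under the assumption that audit envelopes store only the vault reference and never the PII itself. &lt;code&gt;Vault&lt;/code&gt;, its methods, and the reference &lt;code&gt;vault://secret/customer-42&lt;/code&gt; are hypothetical; the real mechanism may differ.&lt;/p&gt;

```python
import hashlib
import json
from typing import Dict

TOMBSTONE = "[REDACTED_TOMBSTONE]"


class Vault:
    """Hypothetical secret store addressed by vault:// references."""

    def __init__(self) -> None:
        self._secrets: Dict[str, str] = {}

    def put(self, ref: str, value: str) -> str:
        self._secrets[ref] = value
        return ref

    def get(self, ref: str) -> str:
        return self._secrets[ref]

    def erase(self, ref: str) -> None:
        # The reference stays valid; only the value behind it is replaced.
        self._secrets[ref] = TOMBSTONE


def canonical_hash(envelope: Dict[str, str]) -> str:
    """Hash the envelope as canonical JSON; it contains the pointer, not the PII."""
    body = json.dumps(envelope, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


vault = Vault()
ref = vault.put("vault://secret/customer-42", "Jane Doe")
envelope = {"step": "create_charge", "customer": ref}

hash_before = canonical_hash(envelope)
vault.erase(ref)
hash_after = canonical_hash(envelope)

print(vault.get(ref))
print(hash_before == hash_after)
```

&lt;p&gt;Since the hashed envelope never contained the personal data, erasing the vault entry leaves every recorded hash verifiable.&lt;/p&gt;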

&lt;h3&gt;
  
  
  Execution Authority vs. Model Quality
&lt;/h3&gt;

&lt;p&gt;LLMs are excellent planners. They are terrible sources of execution truth.&lt;/p&gt;

&lt;p&gt;The core design question for stateful AI systems may not be model quality.&lt;br&gt;
It may be execution authority.&lt;/p&gt;

&lt;p&gt;Should a probabilistic model be allowed to mutate state directly?&lt;br&gt;
Or should execution pass through a deterministic control layer first?&lt;/p&gt;

&lt;p&gt;If you want to try breaking the FSM yourself, the Sabotage Mode is live, and the core is open-source:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Core runtime:&lt;/strong&gt; &lt;a href="https://github.com/Ale007XD/nano_vm" rel="noopener noreferrer"&gt;github.com/Ale007XD/nano_vm&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP gateway layer:&lt;/strong&gt; &lt;a href="https://github.com/Ale007XD/nano-vm-mcp" rel="noopener noreferrer"&gt;github.com/Ale007XD/nano-vm-mcp&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Live Sabotage Demo:&lt;/strong&gt; &lt;a href="http://demo.bannerbot.ru:8843" rel="noopener noreferrer"&gt;demo.bannerbot.ru:8843&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Curious how others here are approaching capability boundaries, replay resistance, and auditability in agent runtimes.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>security</category>
      <category>ai</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
