<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Adam Munawar Rahman</title>
    <description>The latest articles on DEV Community by Adam Munawar Rahman (@msradam).</description>
    <link>https://dev.to/msradam</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3965802%2F2a1c6dea-e938-42de-817c-a3cf42e40010.png</url>
      <title>DEV Community: Adam Munawar Rahman</title>
      <link>https://dev.to/msradam</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/msradam"/>
    <language>en</language>
    <item>
      <title>meet elliot: a robot sim an llm drives, kept honest by a state machine</title>
      <dc:creator>Adam Munawar Rahman</dc:creator>
      <pubDate>Tue, 09 Jun 2026 13:29:01 +0000</pubDate>
      <link>https://dev.to/msradam/meet-elliot-a-robot-sim-an-llm-drives-kept-honest-by-a-state-machine-2pfg</link>
      <guid>https://dev.to/msradam/meet-elliot-a-robot-sim-an-llm-drives-kept-honest-by-a-state-machine-2pfg</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8pn8rnmto1hyq87tni0q.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8pn8rnmto1hyq87tni0q.gif" alt="Elliot running" width="600" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Elliot drives. The state machine says no.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I gave a language model the wheel. Then I gave a state machine the power to&lt;br&gt;
refuse any move it tried to make.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;30 seconds to running.&lt;/strong&gt; No API key, no model, fully offline (a deterministic&lt;br&gt;
navigator drives):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/msradam/elliot &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;cd &lt;/span&gt;elliot
uv venv &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; uv pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt;
python run.py &lt;span class="nt"&gt;--offline&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The GIF at the top is Elliot, a robot in a 2D simulation, with an LLM driving him. The LLM narrates&lt;br&gt;
every step in its own words and reaches for the next phase the moment it thinks&lt;br&gt;
it is ready. The red lines are the state machine telling it no.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; the model proposes the next move; a finite state machine validates it&lt;br&gt;
against real state and refuses the rest. The refusing is done by Theodosia, a&lt;br&gt;
small open-source library I built, and it is one line of glue. The robot is just&lt;br&gt;
the most fun way I found to explain it.&lt;/p&gt;
&lt;h2&gt;the problem: LLMs are confident, not correct&lt;/h2&gt;

&lt;p&gt;If you have built an agent that does more than one step, you know the failure&lt;br&gt;
mode. The model decides it has finished a step it has not finished. It skips&lt;br&gt;
ahead. It claims the file is written, the order is placed, the deploy is done.&lt;/p&gt;

&lt;p&gt;Mine once marked an order "placed" because the payment call timed out instead of&lt;br&gt;
returning an error. The model read the non-answer as a yes, moved on, and fired&lt;br&gt;
the confirmation email for an order that did not exist.&lt;/p&gt;

&lt;p&gt;It is not lying on purpose. It has no ground truth, only its own belief, and its&lt;br&gt;
belief sits upstream of reality.&lt;/p&gt;

&lt;p&gt;The usual fix is to prompt harder: add "do not proceed until X." That works&lt;br&gt;
until it doesn't, because a prompt is a request, not a check.&lt;/p&gt;
&lt;h2&gt;the idea: make the workflow a state machine, and let the model only propose&lt;/h2&gt;

&lt;p&gt;What if the workflow itself were a finite state machine, with explicit states&lt;br&gt;
and explicit transitions, and the model could only &lt;em&gt;propose&lt;/em&gt; the next&lt;br&gt;
transition? Something else validates the proposal against real state and either&lt;br&gt;
runs it or refuses it and hands back the moves that are actually legal.&lt;/p&gt;

&lt;p&gt;You get determinism where you need it (the graph and the gates) and model&lt;br&gt;
judgment where you want it (which legal move, and why). The model drives. It&lt;br&gt;
does not get to redraw the road.&lt;/p&gt;
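
&lt;p&gt;Here is a minimal sketch of that propose-and-validate loop in plain Python. It is not Burr or Theodosia code: the &lt;code&gt;TRANSITIONS&lt;/code&gt; table, the &lt;code&gt;propose&lt;/code&gt; function, the shape of the reply, and the &lt;code&gt;target_in_range&lt;/code&gt; flag are all names assumed for illustration.&lt;/p&gt;

```python
# Hypothetical sketch of the propose/validate idea.
# This is NOT the Burr or Theodosia API; every name here is illustrative.

# Legal transitions: (from_state, to_state) mapped to a gate over real state.
# A gate is a fact the world has to supply, not a claim the model can make.
TRANSITIONS = {
    ("recon", "recon"): lambda world: True,
    ("recon", "exploit"): lambda world: world["target_in_range"],
    ("exploit", "exploit"): lambda world: True,
    ("exploit", "exfil"): lambda world: world["target_reached"],
}


def legal_moves(current, world):
    """Return every state reachable from `current` given the real world state."""
    return sorted(
        dst for (src, dst), gate in TRANSITIONS.items()
        if src == current and gate(world)
    )


def propose(current, requested, world):
    """Accept a proposed move if it is legal; otherwise refuse and list what is."""
    allowed = legal_moves(current, world)
    if requested in allowed:
        return {"ok": True, "state": requested}
    return {"ok": False, "state": current, "requested": requested, "allowed": allowed}


if __name__ == "__main__":
    world = {"target_in_range": False, "target_reached": False}
    print(propose("recon", "exploit", world))  # refused: the gate is not satisfied
    world["target_in_range"] = True
    print(propose("recon", "exploit", world))  # accepted: the world now agrees
```

&lt;p&gt;The refusal carries the legal moves back to the caller, so a model that reaches early can pick something it is actually allowed to do.&lt;/p&gt;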
&lt;h2&gt;meet Elliot&lt;/h2&gt;

&lt;p&gt;Elliot is that idea staged in a sim, and a little theatrical.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;His rulebook is a finite state machine, built with &lt;a href="https://github.com/apache/burr" rel="noopener noreferrer"&gt;Apache Burr&lt;/a&gt;
(a state-machine library): five phases (boot, recon, exploit, exfil, ghost)
and the legal moves between them. It decides nothing. It only says which moves
exist. (Yes, named after that Elliot. If you know, you know.)&lt;/li&gt;
&lt;li&gt;The mind is an LLM. It reads the state, narrates, and picks which legal move
to make, through litellm, so the model is swappable.&lt;/li&gt;
&lt;li&gt;His senses are the state: a 2D lidar, his position, a collision flag, an
arrival flag, all from &lt;a href="https://github.com/hanruihua/ir-sim" rel="noopener noreferrer"&gt;ir-sim&lt;/a&gt; (a 2D
robot simulator), running headless.&lt;/li&gt;
&lt;li&gt;A plain controller does the actual steering, because that is motor work and
the model is bad at motor work.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7scqceobbg9jcl6cx728.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7scqceobbg9jcl6cx728.png" alt="Elliot's cockpit" width="799" height="666"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Watch the console in that screenshot. The model is in RECON, closing on the&lt;br&gt;
target. It is eager, so it keeps reaching for EXPLOIT. And the machine keeps&lt;br&gt;
answering:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;REFUSED&lt;/strong&gt; reached for &lt;code&gt;exploit&lt;/code&gt; from &lt;code&gt;recon&lt;/code&gt;; not earned. allowed: &lt;code&gt;recon&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It cannot move to EXPLOIT until the target is actually within sensing range. It&lt;br&gt;
cannot move to EXFIL until the simulator's own arrival flag fires. It cannot&lt;br&gt;
GHOST until it is genuinely back home. Every gate is a fact the world has to&lt;br&gt;
supply, not a claim the model can make. When the model reaches early, it gets&lt;br&gt;
the refusal plus the list of moves it is allowed, and it works the phase it is&lt;br&gt;
in instead.&lt;/p&gt;

&lt;p&gt;That refusal is not an error. It is the point. It is what makes it safe to hand&lt;br&gt;
the model the wheel.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;the machine refuses anything the world has not earned.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;the thing doing the refusing: Theodosia&lt;/h2&gt;

&lt;p&gt;Here is the part I actually want to show you.&lt;/p&gt;

&lt;p&gt;Theodosia takes an Apache Burr state machine and mounts it as an&lt;br&gt;
&lt;a href="https://modelcontextprotocol.io" rel="noopener noreferrer"&gt;MCP&lt;/a&gt; server. One call:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;theodosia&lt;/span&gt;

&lt;span class="n"&gt;server&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;theodosia&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mount&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;burr_app&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;elliot&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Now any MCP client (Claude, your own agent, a script) sees a tiny, constant tool&lt;br&gt;
surface: mostly one &lt;code&gt;step&lt;/code&gt; tool. The client calls &lt;code&gt;step(action)&lt;/code&gt;. Theodosia&lt;br&gt;
checks that action against the transitions actually reachable from the current&lt;br&gt;
state. If it is legal, it runs. If it is not, the client gets told no, and told&lt;br&gt;
what it &lt;em&gt;can&lt;/em&gt; do:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"error"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"invalid_transition"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"requested"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"exfil"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"valid_next_actions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"exploit"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"action 'exfil' is not reachable from current state. Valid actions now: ['exploit']."&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The graph is the contract.&lt;/p&gt;

&lt;p&gt;The gates are conditions on real state, and you write them in plain Burr:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;burr.core&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;when&lt;/span&gt;

&lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;with_transitions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;exploit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;exfil&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;when&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;target_reached&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;  &lt;span class="c1"&gt;# only once the world says so
&lt;/span&gt;    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;exploit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;exploit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;                            &lt;span class="c1"&gt;# otherwise, keep driving in
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Two more things Theodosia gives you that I leaned on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;State lives on the server.&lt;/strong&gt; The model never holds the state and cannot
drift it. It proposes; the server is the source of truth.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Every step is recorded.&lt;/strong&gt; Theodosia keeps a hash-chained ledger of every
action and every refusal: each entry carries a hash of the one before it, so a
single edited or dropped step breaks the chain. Replay a session and you can
verify nothing was quietly changed. For anything auditable (payments, deploys,
support actions) that is the part that matters.&lt;/li&gt;
&lt;/ul&gt;
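
&lt;p&gt;To make the hash-chain idea concrete, here is a small standalone sketch using only the Python standard library. It is an assumption about the general technique, not Theodosia's actual ledger format: the &lt;code&gt;append&lt;/code&gt; and &lt;code&gt;verify&lt;/code&gt; helpers, the entry fields, and the all-zero genesis hash are invented for illustration.&lt;/p&gt;

```python
# Hypothetical hash-chained ledger sketch; not Theodosia's real ledger format.
import hashlib
import json

GENESIS = "0" * 64  # stand-in "previous hash" for the very first entry


def entry_hash(prev_hash, payload):
    """Hash the previous entry's hash together with this entry's payload."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append(ledger, payload):
    """Append a payload, chaining it to the hash of the entry before it."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({
        "prev": prev_hash,
        "payload": payload,
        "hash": entry_hash(prev_hash, payload),
    })


def verify(ledger):
    """Return True only if every entry links to, and matches, its predecessor."""
    prev_hash = GENESIS
    for entry in ledger:
        if entry["prev"] != prev_hash:
            return False  # an entry was dropped or reordered
        if entry["hash"] != entry_hash(entry["prev"], entry["payload"]):
            return False  # an entry was edited after it was written
        prev_hash = entry["hash"]
    return True


if __name__ == "__main__":
    ledger = []
    append(ledger, {"action": "recon", "result": "ok"})
    append(ledger, {"action": "exploit", "result": "refused"})
    append(ledger, {"action": "recon", "result": "ok"})
    print(verify(ledger))                  # chain is intact
    ledger[1]["payload"]["result"] = "ok"  # quietly edit one recorded step
    print(verify(ledger))                  # chain no longer verifies
```

&lt;p&gt;One caveat of this sketch: trimming entries off the end is only detectable if the latest hash is also kept somewhere the editor cannot reach.&lt;/p&gt;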

&lt;p&gt;The Burr graph is the only thing you write. Theodosia is the one line that turns&lt;br&gt;
it into something an agent can drive but cannot break.&lt;/p&gt;
&lt;h2&gt;this is not really about robots&lt;/h2&gt;

&lt;p&gt;Elliot is a robot because a robot is fun to watch and easy to understand. The&lt;br&gt;
pattern is for any LLM-driven workflow where the model should drive and should&lt;br&gt;
not be trusted to report its own progress:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a checkout flow where "payment captured" has to be true, not claimed&lt;/li&gt;
&lt;li&gt;a deploy pipeline where you cannot run the next stage until the last one passed&lt;/li&gt;
&lt;li&gt;a multi-step form, a support runbook, an agent task graph&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those are not hypothetical. The production-grade version is&lt;br&gt;
&lt;a href="https://github.com/msradam/leavitt" rel="noopener noreferrer"&gt;Leavitt&lt;/a&gt;, an on-call incident-triage agent&lt;br&gt;
built on Theodosia: it reads Grafana metrics and logs, k6 load, and deployment&lt;br&gt;
context, correlates them, and writes a triage report whose disposition is&lt;br&gt;
constrained by the evidence, not the model's confidence.&lt;/p&gt;

&lt;p&gt;Leavitt only ever reads, so you can point it at production and walk away. On Microsoft&lt;br&gt;
Research's AIOpsLab benchmark the enforcement layer costs nothing in accuracy; it&lt;br&gt;
just turns a confident wrong report into a "degraded" or "inconclusive" one.&lt;/p&gt;

&lt;p&gt;Draw it as a state machine. Put the conditions on real state. Mount it with&lt;br&gt;
Theodosia. The model gets to be smart inside the rails, and the rails do not&lt;br&gt;
move.&lt;/p&gt;
&lt;h2&gt;why not LangGraph or LangChain?&lt;/h2&gt;

&lt;p&gt;LangGraph and LangChain are in-process orchestration layers. You compose nodes,&lt;br&gt;
hold the state, and run the loop inside your own program, and they are good at&lt;br&gt;
that: building graphs, wiring tools, threading memory and retrieval through a&lt;br&gt;
chain. If you want a flexible framework for assembling an agent's logic, that is&lt;br&gt;
exactly what they are for.&lt;/p&gt;

&lt;p&gt;Theodosia solves a different problem, and here is where I will plant a flag: a&lt;br&gt;
guardrail that runs in the same process as the agent is a guardrail the agent can&lt;br&gt;
route around. Theodosia is a contract enforced at the server level, not a&lt;br&gt;
framework you orchestrate from. It mounts a plain state machine as an MCP server,&lt;br&gt;
and the server decides which transition is even allowed and refuses the rest. The&lt;br&gt;
state lives server-side, out of the model's reach, and every step lands in a&lt;br&gt;
tamper-evident ledger. You can put Theodosia behind an agent written in&lt;br&gt;
LangGraph, LangChain, or nothing at all, because the enforcement does not live in&lt;br&gt;
the client.&lt;/p&gt;
&lt;h2&gt;try it&lt;/h2&gt;

&lt;p&gt;Two steps, and the second one is a single line:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;theodosia
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;theodosia&lt;/span&gt;

&lt;span class="n"&gt;server&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;theodosia&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mount&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;your_burr_app&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# now it is an MCP server that refuses illegal moves
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The &lt;a href="https://github.com/msradam/theodosia" rel="noopener noreferrer"&gt;Theodosia repo&lt;/a&gt; has the full guide.&lt;br&gt;
And if you want to watch it drive something before you wire up your own graph,&lt;br&gt;
Elliot is the runnable demo: run the offline command at the top of this post.&lt;/p&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/msradam" rel="noopener noreferrer"&gt;
        msradam
      &lt;/a&gt; / &lt;a href="https://github.com/msradam/theodosia" rel="noopener noreferrer"&gt;
        theodosia
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      Put an AI agent on rails: mount a Burr state machine as an MCP server so the agent can only take the next allowed step, with every step recorded and replayable.
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;Theodosia&lt;/h1&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a href="https://pypi.org/project/theodosia/" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/1c0d4cf75aebeb8bd1675e45912d53a4da2f930c3d8c57e558cff2f8b22720f1/68747470733a2f2f696d672e736869656c64732e696f2f707970692f762f7468656f646f7369613f7374796c653d666c61742d73717561726526636f6c6f723d323836393833266c6f676f3d70797069266c6f676f436f6c6f723d7768697465" alt="PyPI"&gt;&lt;/a&gt;
&lt;a href="https://github.com/msradam/theodosia/actions/workflows/ci.yml" rel="noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/bafffe7f05de073f7ddae92bff5a73e7c600a5f4e52033b92b6a41bddde992d2/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f616374696f6e732f776f726b666c6f772f7374617475732f6d73726164616d2f7468656f646f7369612f63692e796d6c3f6272616e63683d6d61696e267374796c653d666c61742d737175617265266c6162656c3d7465737473" alt="tests"&gt;&lt;/a&gt;
&lt;a href="https://github.com/msradam/theodosia/LICENSE" rel="noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/2dcdedac5245d3d31b8a74851d75723bb008863040b7c3d32461193146b7490d/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f6c6963656e73652d417061636865253230322e302d3238363938333f7374796c653d666c61742d737175617265" alt="License"&gt;&lt;/a&gt;
&lt;a href="https://msradam.github.io/theodosia/" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/d5272f38ee8ad1947b52a73064639c1247f8df99f7b00f2d4cf43f785501000c/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f646f63732d7468656f646f7369612d3238363938333f7374796c653d666c61742d737175617265266c6f676f3d617374726f266c6f676f436f6c6f723d7768697465" alt="Docs"&gt;&lt;/a&gt;
&lt;a href="https://github.com/apache/burr" rel="noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/ced39fc4706bcecb1db8184c3c04f9394820c63a300da42d54d0182073755ac1/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f6275696c742532306f6e2d417061636865253230427572722d3761336334613f7374796c653d666c61742d737175617265" alt="Built on Apache Burr"&gt;&lt;/a&gt;
&lt;a href="https://github.com/jlowin/fastmcp" rel="noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/feb7c3a2a7d551b5d9d791f7f6b2c3227e09d29c15c5c196718f6d6f81aa9357/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f6275696c742532306f6e2d466173744d43502d3761336334613f7374796c653d666c61742d737175617265" alt="Built on FastMCP"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Theodosia mounts a &lt;a href="https://burr.dagworks.io/" rel="nofollow noopener noreferrer"&gt;Burr&lt;/a&gt; &lt;code&gt;Application&lt;/code&gt; as an MCP server. Every Burr action is reachable through a single &lt;code&gt;step(action, inputs)&lt;/code&gt; tool; the server checks reachability against the graph before each action runs, refuses out-of-order calls with the legal next moves, and records every attempt.&lt;/p&gt;
&lt;p&gt;&lt;a rel="noopener noreferrer" href="https://github.com/msradam/theodosia/demos/demo.gif"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Fmsradam%2Ftheodosia%2FHEAD%2Fdemos%2Fdemo.gif" alt="A coffee-order FSM driven over MCP, with a refusal and recovery"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Install&lt;/h2&gt;
&lt;/div&gt;
&lt;p&gt;Python 3.11, 3.12, or 3.13 (Burr does not yet support 3.14). On a fresh Python 3.14 install you will see "no version that satisfies the requirement theodosia"; create a 3.11–3.13 venv first.&lt;/p&gt;
&lt;div class="highlight highlight-source-shell notranslate position-relative overflow-auto js-code-highlight"&gt;
&lt;pre&gt;uv venv --python 3.13       &lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; or: python3.13 -m venv .venv&lt;/span&gt;
uv pip install theodosia    &lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;#&lt;/span&gt; or: pip install theodosia&lt;/span&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;Optional extras: &lt;code&gt;theodosia[observability]&lt;/code&gt;, &lt;code&gt;theodosia[ui]&lt;/code&gt;, &lt;code&gt;theodosia[claude]&lt;/code&gt;, &lt;code&gt;theodosia[mellea]&lt;/code&gt;, &lt;code&gt;theodosia[all]&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;On a slim Docker image (&lt;code&gt;python:3.13-slim&lt;/code&gt;, Alpine) the install pulls a &lt;code&gt;psutil&lt;/code&gt; build that needs &lt;code&gt;gcc&lt;/code&gt; and &lt;code&gt;python3-dev&lt;/code&gt;. Either use the full &lt;code&gt;python:3.13&lt;/code&gt; image, or &lt;code&gt;apt-get install -y gcc python3-dev&lt;/code&gt; before &lt;code&gt;pip install&lt;/code&gt;.&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Try it without an API&lt;/h2&gt;…&lt;/div&gt;&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/msradam/theodosia" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;



&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/msradam" rel="noopener noreferrer"&gt;
        msradam
      &lt;/a&gt; / &lt;a href="https://github.com/msradam/elliot" rel="noopener noreferrer"&gt;
        elliot
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      a robot whose mind is a finite state machine. an llm drives the transitions, the machine refuses any move the world hasn't earned. hello, friend.
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;&lt;p&gt;&lt;a rel="noopener noreferrer" href="https://github.com/msradam/elliot/media/elliot.gif"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Fmsradam%2Felliot%2FHEAD%2Fmedia%2Felliot.gif" alt="Elliot" width="900"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class="snippet-clipboard-content notranslate position-relative overflow-auto"&gt;&lt;pre class="notranslate"&gt;&lt;code&gt;  ════════════════════════════════════════════════════════════════

     E L L I O T                          (yes, named after him.)

  ════════════════════════════════════════════════════════════════

   a robot that is a finite state machine. its sensors are the
   state. an llm drives the transitions. the machine refuses any
   move the world has not earned.

   hello, friend. you are about to run something that thinks, and i
   would rather you knew how little of it actually gets to decide.
   read this before you run me.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div class="snippet-clipboard-content notranslate position-relative overflow-auto"&gt;
&lt;pre class="notranslate"&gt;&lt;code&gt;─[ what i am ]──────────────────────────────────────────────────
  one robot in a small, unmapped 2d world. somewhere out there is
  a target, and an obstacle or two between me and it. my job is a
  four-step break-in:

      ◉ boot  ▸  ◈ recon  ▸  ◆ exploit  ▸  ◇ exfil  ▸  ✕ ghost

    boot     wake up. read my own senses twice, make them agree,
             then start. i trust nothing yet, least of all me.
    recon    close on the target through open ground, around
             whatever is&lt;/code&gt;&lt;/pre&gt;…&lt;/div&gt;&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/msradam/elliot" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;What is the worst version of this you have hit? I want the story: the time an&lt;br&gt;
agent told you, with total confidence, that it had finished a step it never&lt;br&gt;
actually ran. What was it supposed to do, and how did you catch it? Drop it in&lt;br&gt;
the comments.&lt;/p&gt;

&lt;p&gt;hello, friend.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>opensource</category>
      <category>agents</category>
    </item>
  </channel>
</rss>
