<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Markov</title>
    <description>The latest articles on DEV Community by Markov (@mandalore-wang).</description>
    <link>https://dev.to/mandalore-wang</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3898811%2F47ccd23d-9d3a-4230-a701-544351d17696.png</url>
      <title>DEV Community: Markov</title>
      <link>https://dev.to/mandalore-wang</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mandalore-wang"/>
    <language>en</language>
    <item>
      <title>How I cut my OpenAI Agent latency by replacing cloud sandboxes with a local microVM</title>
      <dc:creator>Markov</dc:creator>
      <pubDate>Tue, 28 Apr 2026 14:06:11 +0000</pubDate>
      <link>https://dev.to/mandalore-wang/how-i-cut-my-openai-agent-latency-by-replacing-cloud-sandboxes-with-a-local-microvm-1fp3</link>
      <guid>https://dev.to/mandalore-wang/how-i-cut-my-openai-agent-latency-by-replacing-cloud-sandboxes-with-a-local-microvm-1fp3</guid>
      <description>&lt;p&gt;A few days ago, I was building a coding agent using the new OpenAI Agents SDK. Like everyone else, I plugged in one of the official cloud sandboxes (I won't name names, they are all generally good). &lt;/p&gt;

&lt;p&gt;My agent was working, but it felt incredibly sluggish. &lt;/p&gt;

&lt;p&gt;I looked at the logs. My agent was averaging about 15 tool calls per task. Because the sandbox was hosted in the cloud, the physical path looked like this:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;My Agent Runtime → Internet → Cloud Sandbox → MicroVM → Internet → My Agent Runtime&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Every single &lt;code&gt;exec_command&lt;/code&gt; crossed the public internet twice: once on the way out, once on the way back. That's 30 network hops per task. The cloud provider advertised a "90ms cold start", but what was actually killing my UX was the constant RTT overhead on every tool call.&lt;/p&gt;
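&lt;p&gt;A quick back-of-envelope makes the cost concrete (the 50ms RTT is an assumption on my part; measure your own with &lt;code&gt;ping&lt;/code&gt;):&lt;/p&gt;

```python
# Back-of-envelope: per-task network overhead of a cloud sandbox.
# RTT_MS is an assumed figure (50 ms is plausible cross-region);
# the 15 tool calls per task comes straight from my agent logs.
RTT_MS = 50
TOOL_CALLS = 15

overhead_ms = RTT_MS * TOOL_CALLS  # one full round trip per tool call
hops = TOOL_CALLS * 2              # each call crosses the internet twice
print(f"{overhead_ms} ms of pure network wait, {hops} hops per task")
# → 750 ms of pure network wait, 30 hops per task
```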

&lt;p&gt;I tried falling back to the SDK's default local option (&lt;code&gt;bubblewrap&lt;/code&gt; on Linux). It was fast, but it isolates with kernel namespaces and seccomp syscall filters, which means untrusted LLM-generated code still runs against my host kernel. That felt like a disaster waiting to happen.&lt;/p&gt;

&lt;h3&gt;Finding the middle ground: BoxLite&lt;/h3&gt;

&lt;p&gt;I wanted the hardware isolation of a cloud VM with the near-zero latency of a local process. I found &lt;strong&gt;BoxLite&lt;/strong&gt;: &lt;a href="https://github.com/boxlite-ai/boxlite" rel="noopener noreferrer"&gt;https://github.com/boxlite-ai/boxlite&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;BoxLite is essentially the SQLite of sandboxing. It's an embedded microVM that uses KVM (Linux) or Hypervisor.framework (macOS) to spin up a dedicated guest kernel right on your machine.&lt;/p&gt;
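&lt;p&gt;If you want to sanity-check that your host can actually back a microVM before trying it, a generic probe (these are OS-level checks, not a BoxLite API) looks like this:&lt;/p&gt;

```python
# Generic host check for hardware virtualization support.
# Not a BoxLite API; just the usual OS-level signals.
import os
import platform

system = platform.system()
if system == "Linux":
    # KVM exposes itself as a device node; the microVM needs access to it.
    kvm_ready = os.path.exists("/dev/kvm")
    print(f"KVM device present: {kvm_ready}")
elif system == "Darwin":
    # Hypervisor.framework ships with modern macOS; there is no device node to check.
    print("macOS: Hypervisor.framework expected to be available")
else:
    print(f"{system}: no supported hypervisor backend")
```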

&lt;p&gt;The best part? No daemons to configure, no Docker sockets, no root access. Just a pip install:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;boxlite-openai-agents
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;The 1-Line Swap&lt;/h3&gt;

&lt;p&gt;I didn't have to rewrite my agent logic. I just changed the &lt;code&gt;client&lt;/code&gt; in my &lt;code&gt;RunConfig&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;boxlite_openai_agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BoxLiteSandboxClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;BoxLiteSandboxClientOptions&lt;/span&gt;

&lt;span class="c1"&gt;# ... agent setup ...
&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;Runner&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write fizzbuzz.py for n=15 and run it.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;run_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;RunConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;sandbox&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;SandboxRunConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;BoxLiteSandboxClient&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="c1"&gt;# &amp;lt;-- Changed this line
&lt;/span&gt;            &lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;BoxLiteSandboxClientOptions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;python:3.12-slim&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;The Result&lt;/h3&gt;

&lt;p&gt;The latency dropped immediately. Because the microVM runs in the same process as the agent runtime, the internet hops went from 30 down to &lt;strong&gt;zero&lt;/strong&gt;. The communication is all microsecond-level IPC.&lt;/p&gt;
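&lt;p&gt;You can get a feel for the gap with a quick ping-pong over a local socket pair. This is a stand-in for the host-to-guest channel (BoxLite's actual transport may differ), but the order of magnitude is the point:&lt;/p&gt;

```python
# Times a 1-byte ping-pong over a local socketpair, as a rough stand-in
# for host-to-guest IPC. Compare the result to a ~50,000 µs internet RTT.
import socket
import time

a, b = socket.socketpair()
N = 1000
start = time.perf_counter()
for _ in range(N):
    a.sendall(b"x")
    b.recv(1)
    b.sendall(b"y")
    a.recv(1)
elapsed = time.perf_counter() - start
avg_us = elapsed / N * 1e6
print(f"avg local round-trip: {avg_us:.1f} µs")
a.close()
b.close()
```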

&lt;p&gt;Plus, because it uses QCOW2 snapshots, I stopped having to re-run &lt;code&gt;pip install pandas&lt;/code&gt; on every session. I just snapshot the VM state and resume it the next day in under a second.&lt;/p&gt;

&lt;p&gt;If you are building coding agents on your laptop and are tired of cloud latency and timeouts, definitely give local microVMs a try. It completely changed my workflow.&lt;/p&gt;

</description>
      <category>sandbox</category>
      <category>agents</category>
      <category>openai</category>
      <category>python</category>
    </item>
  </channel>
</rss>
