How I cut my OpenAI Agent latency by replacing cloud sandboxes with a local microVM

A few days ago, I was building a coding agent using the new OpenAI Agents SDK. Like everyone else, I plugged in one of the official cloud sandboxes (I won't name names; they're all generally good).

My agent was working, but it felt incredibly sluggish.

I looked at the logs. My agent was averaging about 15 tool calls per task. Because the sandbox was hosted in the cloud, the physical path looked like this:

My Agent Runtime → Internet → Cloud Sandbox → MicroVM → Internet → My Agent Runtime

Every single exec_command crossed the public internet twice: request out, result back. At 15 tool calls, that's 30 internet traversals per task. The cloud provider advertised a "90ms cold start", but what was actually killing my UX was the constant RTT overhead on every tool call.
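
To put a number on it (the RTT here is an assumed figure, not something I measured precisely):

# Back-of-envelope network overhead per task (rtt_ms is an assumption)
rtt_ms = 100      # assumed round trip to the sandbox's cloud region
tool_calls = 15   # average tool calls per task, from my logs
print(f"~{tool_calls * rtt_ms / 1000:.1f}s of pure network wait per task")  # ~1.5s

That's one and a half seconds per task that has nothing to do with the model or the code being run.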

I tried falling back to the SDK's default local option (bubblewrap on Linux). It was fast, but it relies on process-level syscall filters. Running untrusted LLM-generated code directly on my host kernel just felt like a disaster waiting to happen.

Finding the middle ground: BoxLite

I wanted the hardware isolation of a cloud VM with the near-zero latency of a local process. That search led me to BoxLite: https://github.com/boxlite-ai/boxlite

BoxLite is essentially the SQLite of sandboxing. It's an embedded microVM that uses KVM (Linux) or Hypervisor.framework (macOS) to spin up a dedicated guest kernel right on your machine.
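
One thing worth checking on Linux before you start: KVM has to be available to your user. This is generic KVM knowledge rather than anything BoxLite-specific, but the /dev/kvm device node is the quick test:

test -e /dev/kvm && echo "KVM available" || echo "no /dev/kvm (enable virtualization in BIOS/UEFI)"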

The best part? No daemons to configure, no Docker sockets, no root access. Just a pip install:

pip install boxlite-openai-agents
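
A quick way to confirm the package is importable before wiring it into the agent:

python -c "from boxlite_openai_agents import BoxLiteSandboxClient; print('ok')"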

The 1-Line Swap

I didn't have to rewrite my agent logic. I just changed the client in my RunConfig:

from agents import Agent, Runner, RunConfig, SandboxRunConfig  # OpenAI Agents SDK; the exact import path for SandboxRunConfig may vary by version
from boxlite_openai_agents import BoxLiteSandboxClient, BoxLiteSandboxClientOptions

# ... agent setup ...

await Runner.run(
    agent,
    "Write fizzbuzz.py for n=15 and run it.",
    run_config=RunConfig(
        sandbox=SandboxRunConfig(
            client=BoxLiteSandboxClient(),  # <-- Changed this line
            options=BoxLiteSandboxClientOptions(
                image="python:3.12-slim"  # guest image the microVM boots with
            ),
        ),
    ),
)

The Result

The latency dropped immediately. Because the microVM runs in the same process as the agent runtime, those 30 internet traversals per task dropped to zero. Every tool call is now microsecond-level local IPC.
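
If you want to reproduce the comparison, the crudest measurement works fine: wall-clock time around Runner.run with each client. A minimal sketch reusing the agent from above (boxlite_config and cloud_config are stand-in names for the two RunConfig objects, not real identifiers from either SDK):

import time

# Rough A/B timing sketch. `boxlite_config` is the RunConfig from the
# snippet above; `cloud_config` is whatever sandbox config you used before.
async def timed_run(run_config):
    start = time.perf_counter()
    await Runner.run(agent, "Write fizzbuzz.py for n=15 and run it.", run_config=run_config)
    return time.perf_counter() - start

print(f"boxlite: {await timed_run(boxlite_config):.1f}s")
print(f"cloud:   {await timed_run(cloud_config):.1f}s")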

Plus, because it uses QCOW2 snapshots, I stopped having to re-run pip install pandas on every session. I just snapshot the VM state and resume it the next day in under a second.
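
I haven't looked at how BoxLite manages these internally, but QCOW2 internal snapshots are a standard feature of the format; conceptually it's the same operation qemu-img exposes (box.qcow2 is a hypothetical image name):

qemu-img snapshot -c deps-installed box.qcow2   # create a named snapshot
qemu-img snapshot -l box.qcow2                  # list snapshots
qemu-img snapshot -a deps-installed box.qcow2   # revert to it later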

If you are building coding agents on your laptop and are tired of cloud latency and timeouts, definitely give local microVMs a try. It completely changed my workflow.
