Ajay Kumar

I built a sandbox that boots an AI agent VM in ~300ms — here's how

If you've ever built an AI agent that runs code, you've hit the same wall I did:
how do you run untrusted LLM-generated code safely, without it taking forever?

I tried Docker. Shared kernel — felt too risky for arbitrary code execution. I tried full VMs. Safe, but 5–10 second cold starts killed the UX.

So I built Sandflare. It uses Firecracker microVMs to launch isolated sandboxes, which initially took ~1-2s. After some further tuning, it's now ~300ms. Sandflare's Postgres databases also launch in milliseconds.

How we get to ~300ms

The trick is snapshot + restore with userfaultfd (UFFD).

  1. Boot a VM once, fully configured
  2. Take a memory snapshot
  3. On every new sandbox request, restore from that snapshot
  4. Memory pages fault in on-demand — the VM is responsive before it's fully loaded into RAM

This is the same technique AWS uses internally. The result: consistent sub-400ms cold starts.
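To make the four steps concrete, here is a minimal sketch of the requests involved, assuming Firecracker's documented snapshot API (pause the VM, `PUT /snapshot/create`, then `PUT /snapshot/load` with a `Uffd` memory backend). The file paths and socket name are hypothetical, and this is not Sandflare's actual code; check the field names against the Firecracker docs for your version.

```python
import json


def snapshot_requests(snapshot_path, mem_path):
    """Steps 1-2: requests for an already-booted VM (pause it, then write a full snapshot)."""
    return [
        ("PATCH", "/vm", {"state": "Paused"}),
        ("PUT", "/snapshot/create", {
            "snapshot_type": "Full",
            "snapshot_path": snapshot_path,  # VM and device state
            "mem_file_path": mem_path,       # guest memory contents
        }),
    ]


def restore_request(snapshot_path, uffd_socket):
    """Steps 3-4: request for a fresh Firecracker process (load state, serve memory lazily)."""
    return ("PUT", "/snapshot/load", {
        "snapshot_path": snapshot_path,
        # With the Uffd backend, backend_path is a Unix socket where a separate
        # page-fault handler process listens and supplies pages on demand.
        "mem_backend": {"backend_type": "Uffd", "backend_path": uffd_socket},
        "resume_vm": True,
    })


# Hypothetical paths, for illustration only.
STATE = "/srv/snapshots/base.vmstate"
MEMORY = "/srv/snapshots/base.mem"
UFFD_SOCKET = "/run/uffd-handler.sock"

# Each tuple would be sent as an HTTP request over Firecracker's API Unix socket.
for method, path, body in snapshot_requests(STATE, MEMORY) + [restore_request(STATE, UFFD_SOCKET)]:
    print(method, path, json.dumps(body))
```

The key design point is that the restore call returns once the VM state is loaded; guest memory is only read when the guest touches a page, which is why the sandbox can respond before all of its RAM has been populated.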

What Sandflare actually does

Beyond fast boots, I added the things I kept needing for agent workloads:

Run code and stream output:

from sandflare import Sandbox

sb = Sandbox.create("agent", size="nano")

for event in sb.stream("python3 analyse.py"):
    if event.event == "stdout":
        print(event.line)
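If you want to keep the streamed output rather than just print it, a small helper can group lines by event type. This is a sketch that relies only on the `.event` and `.line` attributes used in the snippet above; the `"stderr"` event name and the stand-in events are assumptions for a dry run, not something confirmed by the Sandflare docs.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_lines(events):
    """Group streamed lines by event type, preserving order within each type."""
    grouped = defaultdict(list)
    for ev in events:
        grouped[ev.event].append(ev.line)
    return dict(grouped)


# Stand-in events; real ones would come from sb.stream("python3 analyse.py").
fake_events = [
    SimpleNamespace(event="stdout", line="loading data"),
    SimpleNamespace(event="stderr", line="warning: 3 rows skipped"),  # assumed event name
    SimpleNamespace(event="stdout", line="done"),
]

output = group_lines(fake_events)
print("\n".join(output["stdout"]))
```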

Wire a Postgres database in one call:

from sandflare import Sandbox, Database

db = Database.create("memory", password="Secret#42")
sb = Sandbox.create("agent")
sb.wire(db.name)

# DATABASE_URL is now injected into the sandbox env
# your agent can just connect normally
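On the agent side, "connect normally" might look like the sketch below: read `DATABASE_URL` and split it into connection parameters. It assumes the injected value is a standard `postgres://` URI, which the post does not spell out; the example URL, host, and user are made up. Note that a password like `Secret#42` has to be percent-encoded inside a URI (`#` becomes `%23`), so it is decoded here.

```python
import os
from urllib.parse import unquote, urlsplit


def parse_database_url(url):
    """Split a postgres:// URI into the pieces most drivers accept."""
    parts = urlsplit(url)
    return {
        "host": parts.hostname,
        "port": parts.port or 5432,
        "user": unquote(parts.username or ""),
        "password": unquote(parts.password or ""),
        "dbname": parts.path.lstrip("/"),
    }


# Hypothetical value, used only when DATABASE_URL is not set.
EXAMPLE_URL = "postgres://agent:Secret%2342@db.internal:5432/memory"

config = parse_database_url(os.environ.get("DATABASE_URL", EXAMPLE_URL))
print({key: value for key, value in config.items() if key != "password"})
```

Most Postgres drivers also accept the URI string directly, so the parsing step is only needed when you want the individual fields.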

Upload files, download results:

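# sb is the Sandbox created above; csv_bytes holds the raw CSV file contents
sb.upload(csv_bytes, "/home/agent/data.csv")
# fetch the file the agent wrote inside the sandbox
chart = sb.download("/home/agent/chart.png")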
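The `csv_bytes` value can be built in memory without touching disk. A minimal sketch using only the standard library, assuming `upload` accepts raw bytes (as the variable name suggests) and `download` returns bytes; the column names and rows are made up:

```python
import csv
import io


def rows_to_csv_bytes(header, rows):
    """Serialize a header and rows to UTF-8 encoded CSV bytes."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue().encode("utf-8")


csv_bytes = rows_to_csv_bytes(["day", "visits"], [["mon", 120], ["tue", 95]])

# With a live sandbox (not run here):
#   sb.upload(csv_bytes, "/home/agent/data.csv")
#   chart = sb.download("/home/agent/chart.png")
#   open("chart.png", "wb").write(chart)  # assumes download returns bytes
```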

Real benchmark numbers

Running from a GCP worker in the same region:

run 1: 363ms
run 2: 292ms
run 3: 307ms
run 4: 303ms
run 5: 345ms

min:  292ms  |  mean: 322ms  |  p50: 307ms
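If you want to reproduce this kind of measurement, here is a rough harness. It is a sketch, not the script used for the numbers above: `time_runs` times any callable (for example, a function that calls `Sandbox.create`), and `summarize` computes the same three statistics from the five runs listed.

```python
import statistics
import time


def time_runs(fn, n=5):
    """Call fn n times and return each wall-clock duration in whole milliseconds."""
    durations = []
    for _ in range(n):
        start = time.perf_counter()
        fn()
        durations.append(round((time.perf_counter() - start) * 1000))
    return durations


def summarize(ms):
    """Return min, mean, and median (p50) for a list of millisecond timings."""
    return {
        "min": min(ms),
        "mean": statistics.mean(ms),
        "p50": statistics.median(ms),
    }


# The five runs reported above.
runs = [363, 292, 307, 303, 345]
print(summarize(runs))
```

With only five samples, the median is a more stable summary than the mean, since a single slow run (like the 363ms first run) shifts the mean noticeably.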

Who is this for?

  • Coding agents — give Claude/GPT a real sandbox to write and run code
  • AI data analysis — upload a CSV, run pandas, download the chart
  • CI/CD pipelines — clean isolated environment per run
  • Multi-agent pipelines — multiple sandboxes sharing one Postgres DB

Try it

Free tier: 10 sandboxes, no credit card.

👉 sandflare.io

📖 docs.sandflare.io


Would love to hear how you're handling code execution in your agent projects — and if anyone has pushed Firecracker cold starts below 200ms, I'm all ears!
