A deep-dive into building a sandbox orchestrator that gives AI agents their own isolated machines. Firecracker microVMs, snapshot restore, and why 28ms matters.
tags: go, opensource, ai, devops
I've been building AI agents that generate and execute code. The agents write Python scripts, run data analysis, generate charts, process files. Standard stuff in 2026.
The problem I kept hitting: where does that code actually run?
I tried Docker. It works, but containers share the host kernel. When the runc CVEs dropped in 2024-2025 (CVE-2024-21626, then three more in 2025), I started thinking harder about what "isolation" actually means when an AI is writing arbitrary code on my machine.
I tried E2B. Great product, but my data was leaving my machine. For an internal tool processing company data, that was a non-starter.
So I built ForgeVM. A single Go binary that orchestrates isolated sandboxes. This article is about the hardest part: getting Firecracker microVMs to boot in 28ms.
What Firecracker actually is
Firecracker is AWS's microVM monitor (a VMM, not a container runtime). It's what powers Lambda and Fargate. Open source, written in Rust, runs on KVM.
The key insight: Firecracker is not QEMU. QEMU emulates an entire PC with hundreds of devices. Firecracker emulates only a handful:
- virtio-block (disk)
- virtio-net (network)
- virtio-vsock (host communication)
- serial console
- a minimal keyboard controller (just enough to signal shutdown)
That's it. No USB, no GPU, no sound card, no PCI bus. This minimal device model is why it's fast and why the attack surface is tiny.
Each Firecracker microVM gets:
- Its own Linux kernel
- Its own root filesystem
- Its own network namespace
- Communication with the host via vsock (virtio socket, not TCP)
A guest exploit can't reach the host because there's a hardware boundary (KVM) between them. Compare that to Docker where a kernel vulnerability affects every container on the host.
The cold boot problem
Here's the thing though. Booting a Firecracker microVM from scratch takes about 1 second. That includes:
- Firecracker process starts (~50ms)
- Load kernel into memory (~100ms)
- Kernel boots, init runs (~500ms)
- Guest agent starts and signals ready (~200ms)
1 second is fine for long-running workloads. It's not fine when your AI agent needs to run print(1+1) and return the result in a chat interface. Users notice 1 second of latency.
I needed sub-100ms. Ideally sub-50ms.
The snapshot trick
Firecracker supports snapshotting a running VM's complete state to disk. This includes:
- Full memory contents (the entire RAM, written to a file)
- CPU register state (instruction pointer, stack pointer, all registers)
- Device state (virtio queues, serial port state)
When you restore from a snapshot, Firecracker doesn't boot a kernel. It doesn't run init. It doesn't start your agent. It memory-maps the snapshot file, loads the CPU state, and resumes execution from exactly where it left off.
The VM doesn't know it was ever stopped. From the guest's perspective, time just skipped forward. (That's mostly a feature, but one practical caveat: the guest's wall clock is stale after a restore and needs to be resynced.)
Here's what this looks like in practice:
```
# First spawn (cold boot) - ~1 second
1. Start Firecracker process
2. Boot kernel + rootfs
3. Wait for guest agent to signal ready
4. Pause the VM
5. Snapshot memory + CPU + devices to disk
6. Resume the VM, hand it to the user

# Every subsequent spawn - ~28ms
1. Copy the snapshot files (copy-on-write, nearly instant)
2. Start new Firecracker process with --restore-from-snapshot
3. VM resumes exactly where the snapshot was taken
4. Guest agent is already running, already ready
```
The 28ms breaks down roughly as:
- ~5ms: Firecracker process startup
- ~8ms: mmap the memory snapshot file
- ~10ms: restore CPU and device state
- ~5ms: vsock reconnection and ready signal
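Under the hood, pause/snapshot/restore map onto Firecracker's HTTP API, served over a unix socket: `PATCH /vm` pauses or resumes, `PUT /snapshot/create` writes the memory and vmstate files, and `PUT /snapshot/load` restores into a fresh process. A sketch of the plumbing in Go (endpoint and field names follow the Firecracker API docs, but verify them against the version you run; the helper names are mine):

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net"
	"net/http"
)

// apiClient talks to a Firecracker process's HTTP API over its unix socket.
func apiClient(socketPath string) *http.Client {
	return &http.Client{
		Transport: &http.Transport{
			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
				return (&net.Dialer{}).DialContext(ctx, "unix", socketPath)
			},
		},
	}
}

// snapshotCreateBody builds the payload for PUT /snapshot/create.
func snapshotCreateBody(snapPath, memPath string) ([]byte, error) {
	return json.Marshal(map[string]string{
		"snapshot_type": "Full",
		"snapshot_path": snapPath,
		"mem_file_path": memPath,
	})
}

// vmStateBody builds the payload for PATCH /vm ("Paused" or "Resumed").
func vmStateBody(state string) ([]byte, error) {
	return json.Marshal(map[string]string{"state": state})
}

// putJSON sends a JSON body to the API; the "localhost" host is a
// placeholder, since routing happens via the unix socket dialer above.
func putJSON(c *http.Client, path string, body []byte) error {
	req, err := http.NewRequest(http.MethodPut, "http://localhost"+path, bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := c.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("firecracker API %s: %s", path, resp.Status)
	}
	return nil
}

func main() {
	body, _ := snapshotCreateBody("/snapshots/python-3.12/vmstate", "/snapshots/python-3.12/mem")
	fmt.Println(string(body))
}
```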
How I implemented it in Go
ForgeVM's Firecracker provider manages the snapshot lifecycle. Here's the simplified flow:
```go
func (f *FirecrackerProvider) Spawn(ctx context.Context, opts SpawnOptions) (string, error) {
	// Check if we have a snapshot for this image
	if snap := f.getSnapshot(opts.Image); snap != nil {
		// Fast path: restore from snapshot (~28ms)
		return f.restoreFromSnapshot(ctx, snap, opts)
	}

	// Slow path: cold boot + create snapshot (~1s)
	vm, err := f.coldBoot(ctx, opts)
	if err != nil {
		return "", err
	}

	// Wait for the guest agent to signal ready
	if err := f.waitForAgent(ctx, vm); err != nil {
		return "", err
	}

	// Pause the VM, snapshot it, then resume and hand it to the caller
	if err := f.pauseVM(ctx, vm); err != nil {
		return "", err
	}
	if err := f.createSnapshot(ctx, vm); err != nil {
		return "", err
	}
	if err := f.resumeVM(ctx, vm); err != nil {
		return "", err
	}
	return vm.ID, nil
}
```
The snapshot files are per-image. First time someone spawns python:3.12, it cold-boots, snapshots, and every subsequent python:3.12 spawn restores in 28ms. Different images get different snapshots.
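The per-image lookup doesn't need to be fancy; a mutex-guarded map is enough. A sketch (the types and names here are illustrative, not ForgeVM's actual internals):

```go
package main

import (
	"fmt"
	"sync"
)

// Snapshot points at the memory and vmstate files for one base image.
// Hypothetical type, not ForgeVM's real one.
type Snapshot struct {
	MemPath   string
	StatePath string
}

// snapshotCache maps image name -> snapshot, safe for concurrent spawns.
type snapshotCache struct {
	mu    sync.RWMutex
	snaps map[string]*Snapshot
}

func newSnapshotCache() *snapshotCache {
	return &snapshotCache{snaps: make(map[string]*Snapshot)}
}

func (c *snapshotCache) get(image string) *Snapshot {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.snaps[image]
}

func (c *snapshotCache) put(image string, s *Snapshot) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.snaps[image] = s
}

func main() {
	cache := newSnapshotCache()
	if cache.get("python:3.12") == nil {
		fmt.Println("no snapshot yet: cold boot, then snapshot")
	}
	cache.put("python:3.12", &Snapshot{
		MemPath:   "/snapshots/python-3.12/mem",
		StatePath: "/snapshots/python-3.12/vmstate",
	})
	if cache.get("python:3.12") != nil {
		fmt.Println("snapshot cached: later spawns take the fast path")
	}
}
```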
The copy-on-write detail
You can't share a single snapshot file across multiple running VMs because each VM writes to memory. The solution is copy-on-write:
- The base snapshot is read-only
- Each new VM gets a CoW overlay for both the memory file and the rootfs
- Writes go to the overlay, reads fall through to the base
- On destroy, delete the overlay. Base snapshot stays pristine.
This means 50 running VMs from the same snapshot share most of their memory pages. Only the pages that each VM actually wrote are unique. Memory efficient.
The guest agent
Each Firecracker VM runs a custom agent binary (forgevm-agent) as PID 1. The agent:
- Listens on vsock for commands from the host
- Executes commands via `os/exec`
- Handles file read/write/list/delete operations
- Streams stdout/stderr back to the host in real time
- Uses a length-prefixed JSON protocol over the vsock connection
The protocol is simple:
```
[4 bytes: message length][JSON payload]
```

Request:

```json
{"type": "exec", "command": "python3 /app/main.py", "workdir": "/workspace"}
```

Response (streamed):

```json
{"type": "stdout", "data": "hello world\n"}
{"type": "exit", "code": 0}
```
vsock is important here. It's a virtio socket, not TCP/IP. The control channel has no IP address, no port, no routing: just a direct channel between the guest and host kernels. This eliminates an entire class of network-based attacks.
Why not just Docker?
I actually built a Docker provider too. ForgeVM has a provider interface, and Docker is one of the backends. Here's the honest comparison:
Docker containers:
- Boot: ~200-500ms
- Isolation: Linux namespaces + cgroups + seccomp
- Attack surface: Shared host kernel. Every syscall from the container hits the real kernel.
- KVM needed: No
- Runs on: Linux, Mac, Windows
Firecracker microVMs:
- Boot: ~28ms (snapshot) / ~1s (cold)
- Isolation: KVM hardware virtualization. Separate kernel per sandbox.
- Attack surface: Minimal VMM with 4 devices. Guest kernel is a separate kernel.
- KVM needed: Yes
- Runs on: Linux with /dev/kvm
gVisor (via Docker provider with runsc runtime):
- Boot: ~300-800ms
- Isolation: User-space kernel intercepts syscalls. ~70 host syscalls exposed.
- Attack surface: Much smaller than Docker, larger than Firecracker.
- KVM needed: No
- Runs on: Linux
In ForgeVM, you switch between these with one config change:
```yaml
providers:
  default: "firecracker"  # or "docker"
  docker:
    runtime: "runc"       # or "runsc" for gVisor
```
Same API. Same SDKs. Same pool mode. Different isolation level.
For development, I use Docker (runs on my Mac). For production, Firecracker. The application code doesn't know or care which provider is active.
Pool mode: the resource trick
This is the part I'm most proud of and it has nothing to do with Firecracker specifically.
Traditional sandbox tools: 1 user = 1 VM (or container). If you have 100 concurrent users, you need 100 VMs. At 512MB each, that's 50GB of RAM just for sandboxes.
ForgeVM's pool mode: 1 VM serves up to N users. Each user gets a logical "sandbox" with its own workspace directory (/workspace/{sandbox-id}/). The orchestrator:
- Routes all exec calls to the shared VM but sets `WorkDir` to the user's workspace
- Rewrites all file paths through `scopedPath()` to prevent directory traversal
- Tracks user count per VM and creates new VMs when capacity is full
- Destroys VMs only when all users have left
```go
// scopedPath prevents user A from accessing user B's workspace
func scopedPath(vmID, sandboxID, path string) string {
	if vmID == "" {
		return path // 1:1 mode, no scoping
	}
	base := "/workspace/" + sandboxID
	cleaned := filepath.Clean(filepath.Join(base, path))
	if !strings.HasPrefix(cleaned, base+"/") && cleaned != base {
		return base // traversal attempt, return base
	}
	return cleaned
}
```
100 users, 20 VMs instead of 100. 80% less infrastructure.
The security trade-off is real: pool mode gives you directory-level isolation, not kernel-level. Users in the same VM share a kernel. For internal tools where you trust the users but want to isolate the AI-generated code from the host, this is fine. For multi-tenant public platforms, you'd want the optional per-user UID and PID namespace hardening on top.
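For the curious, per-user hardening inside a pooled VM mostly means running each exec under a dedicated UID. A sketch of what the guest agent could do (illustrative only; the real work is provisioning the users, chowning the workspaces, and running the agent with the privileges to drop them):

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
)

// userScopedCmd builds a command that runs as a per-sandbox UID/GID,
// so users in a pooled VM can't read each other's files even if path
// scoping were bypassed. The caller must run with enough privilege
// (root inside the guest) for the credential switch to succeed.
func userScopedCmd(uid, gid uint32, workdir, command string) *exec.Cmd {
	cmd := exec.Command("/bin/sh", "-c", command)
	cmd.Dir = workdir
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Credential: &syscall.Credential{Uid: uid, Gid: gid},
		// A separate process group lets the agent kill a runaway
		// command tree without killing itself.
		Setpgid: true,
	}
	return cmd
}

func main() {
	cmd := userScopedCmd(10001, 10001, "/workspace/sb-abc", "id")
	fmt.Println("would run as uid", cmd.SysProcAttr.Credential.Uid, "in", cmd.Dir)
}
```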
Numbers
Some benchmarks from my development machine (AMD Ryzen 7, 32GB RAM, NVMe SSD):
| Operation | Time |
|---|---|
| Firecracker cold boot | ~1.1s |
| Firecracker snapshot restore | ~28ms |
| Docker container start (alpine) | ~180ms |
| Docker container start (python:3.12) | ~450ms |
| Exec "echo hello" (Firecracker) | ~3ms |
| Exec "echo hello" (Docker) | ~8ms |
| Exec "python3 -c 'print(1)'" (Firecracker) | ~45ms |
| File write 1MB (Firecracker, vsock) | ~12ms |
| File write 1MB (Docker, tar copy) | ~25ms |
| Sandbox destroy (Firecracker) | ~15ms |
| Sandbox destroy (Docker) | ~50ms |
The Firecracker exec latency is lower because vsock is a direct kernel channel, while Docker exec creates a new exec instance and attaches via the Docker daemon.
What I'd do differently
Start with Docker, not Firecracker. I built the Firecracker provider first because I was excited about 28ms boots. But 80% of people trying ForgeVM don't have KVM available (Mac users, CI/CD, cloud VMs without nested virt). The Docker provider should have been day one.
The guest agent protocol should have been gRPC, not custom JSON. The length-prefixed JSON protocol works fine but I'm essentially maintaining a custom RPC framework. gRPC over vsock would have given me streaming, error codes, and code generation for free.
Pool mode security should have been built-in from the start. The directory-level isolation works, but per-user UIDs and PID namespace isolation should be default-on, not optional. I'm retrofitting this now.
Try it
```shell
git clone https://github.com/DohaerisAI/forgevm && cd forgevm
./scripts/setup.sh
./forgevm serve
```

```python
from forgevm import Client

client = Client("http://localhost:7423")
with client.spawn(image="python:3.12") as sb:
    result = sb.exec("print('hello from a 28ms sandbox')")
    print(result.stdout)
```
MIT licensed. Single binary. No telemetry. No cloud.
GitHub: github.com/DohaerisAI/forgevm
If you made it this far and found this useful, a star on GitHub genuinely helps with discoverability. Happy to answer questions in the comments about the Firecracker internals, the provider architecture, or the pool mode design.
