Adwitiya Trivedi
How I built sandboxes that boot in 28ms using Firecracker snapshots

A deep-dive into building a sandbox orchestrator that gives AI agents their own isolated machines. Firecracker microVMs, snapshot restore, and why 28ms matters.
tags: go, opensource, ai, devops

I've been building AI agents that generate and execute code. The agents write Python scripts, run data analysis, generate charts, process files. Standard stuff in 2026.

The problem I kept hitting: where does that code actually run?

I tried Docker. It works, but containers share the host kernel. When the runc CVEs dropped in 2024-2025 (CVE-2024-21626, then three more in 2025), I started thinking harder about what "isolation" actually means when an AI is writing arbitrary code on my machine.

I tried E2B. Great product, but my data was leaving my machine. For an internal tool processing company data, that was a non-starter.

So I built ForgeVM. A single Go binary that orchestrates isolated sandboxes. This article is about the hardest part: getting Firecracker microVMs to boot in 28ms.


What Firecracker actually is

Firecracker is AWS's microVM manager. It's what powers Lambda and Fargate. Open source, written in Rust, runs on KVM.

The key insight: Firecracker is not QEMU. QEMU emulates an entire PC with hundreds of devices. Firecracker emulates exactly 4 devices:

  • virtio-block (disk)
  • virtio-net (network)
  • serial console
  • 1-button keyboard (just to stop the VM)

That's it. No USB, no GPU, no sound card, no PCI bus. This minimal device model is why it's fast and why the attack surface is tiny.

Each Firecracker microVM gets:

  • Its own Linux kernel
  • Its own root filesystem
  • Its own network namespace
  • Communication with the host via vsock (virtio socket, not TCP)

A guest exploit can't reach the host because there's a hardware boundary (KVM) between them. Compare that to Docker where a kernel vulnerability affects every container on the host.


The cold boot problem

Here's the thing though. Booting a Firecracker microVM from scratch takes about 1 second. That includes:

  1. Firecracker process starts (~50ms)
  2. Load kernel into memory (~100ms)
  3. Kernel boots, init runs (~500ms)
  4. Guest agent starts and signals ready (~200ms)

1 second is fine for long-running workloads. It's not fine when your AI agent needs to run print(1+1) and return the result in a chat interface. Users notice 1 second of latency.

I needed sub-100ms. Ideally sub-50ms.


The snapshot trick

Firecracker supports snapshotting a running VM's complete state to disk. This includes:

  • Full memory contents (the entire RAM, written to a file)
  • CPU register state (instruction pointer, stack pointer, all registers)
  • Device state (virtio queues, serial port state)

When you restore from a snapshot, Firecracker doesn't boot a kernel. It doesn't run init. It doesn't start your agent. It memory-maps the snapshot file, loads the CPU state, and resumes execution from exactly where it left off.

The VM doesn't know it was ever stopped. From the guest's perspective, time just skipped forward.

Here's what this looks like in practice:

```
# First spawn (cold boot) - ~1 second
1. Start Firecracker process
2. Boot kernel + rootfs
3. Wait for guest agent to signal ready
4. Pause the VM
5. Snapshot memory + CPU + devices to disk
6. Resume the VM, hand it to the user

# Every subsequent spawn - ~28ms
1. Copy the snapshot files (copy-on-write, nearly instant)
2. Start a new Firecracker process and load the snapshot via its API
3. VM resumes exactly where the snapshot was taken
4. Guest agent is already running, already ready
```

The 28ms breaks down roughly as:

  • ~5ms: Firecracker process startup
  • ~8ms: mmap the memory snapshot file
  • ~10ms: restore CPU and device state
  • ~5ms: vsock reconnection and ready signal

How I implemented it in Go

ForgeVM's Firecracker provider manages the snapshot lifecycle. Here's the simplified flow:

```go
func (f *FirecrackerProvider) Spawn(ctx context.Context, opts SpawnOptions) (string, error) {
	// Check if we have a snapshot for this image
	snap := f.getSnapshot(opts.Image)

	if snap != nil {
		// Fast path: restore from snapshot (~28ms)
		return f.restoreFromSnapshot(ctx, snap, opts)
	}

	// Slow path: cold boot + create snapshot (~1s)
	vm, err := f.coldBoot(ctx, opts)
	if err != nil {
		return "", err
	}

	// Wait for guest agent to be ready
	if err := f.waitForAgent(ctx, vm); err != nil {
		return "", err
	}

	// Pause VM, snapshot it, then resume
	if err := f.pauseVM(ctx, vm); err != nil {
		return "", err
	}
	if err := f.createSnapshot(ctx, vm); err != nil {
		return "", err
	}
	if err := f.resumeVM(ctx, vm); err != nil {
		return "", err
	}

	return vm.ID, nil
}
```

The snapshot files are per-image. First time someone spawns python:3.12, it cold-boots, snapshots, and every subsequent python:3.12 spawn restores in 28ms. Different images get different snapshots.
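That getSnapshot lookup can be as simple as a mutex-guarded map keyed by image name. This is a hypothetical sketch; ForgeVM's real types aren't shown in this post, so the field and type names below are illustrative:

```go
package main

import (
	"fmt"
	"sync"
)

// Snapshot records where one image's snapshot files live on disk.
// Field names here are illustrative, not ForgeVM's actual ones.
type Snapshot struct {
	Image     string
	StateFile string
	MemFile   string
}

// snapshotCache is the per-image registry: the first spawn of an image
// cold-boots and stores a snapshot; every later spawn hits the fast path.
type snapshotCache struct {
	mu    sync.RWMutex
	byImg map[string]*Snapshot
}

func newSnapshotCache() *snapshotCache {
	return &snapshotCache{byImg: make(map[string]*Snapshot)}
}

func (c *snapshotCache) get(image string) *Snapshot {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.byImg[image]
}

func (c *snapshotCache) put(s *Snapshot) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.byImg[s.Image] = s
}

func main() {
	cache := newSnapshotCache()
	fmt.Println(cache.get("python:3.12") == nil) // no snapshot yet: cold boot
	cache.put(&Snapshot{Image: "python:3.12", StateFile: "vmstate", MemFile: "mem"})
	fmt.Println(cache.get("python:3.12") != nil) // fast path from now on
}
```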

The copy-on-write detail

You can't share a single snapshot file across multiple running VMs because each VM writes to memory. The solution is copy-on-write:

  1. The base snapshot is read-only
  2. Each new VM gets a CoW overlay for both the memory file and the rootfs
  3. Writes go to the overlay, reads fall through to the base
  4. On destroy, delete the overlay. Base snapshot stays pristine.

This means 50 running VMs from the same snapshot share most of their memory pages. Only the pages that each VM actually wrote are unique. Memory efficient.


The guest agent

Each Firecracker VM runs a custom agent binary (forgevm-agent) as PID 1. The agent:

  • Listens on vsock for commands from the host
  • Executes commands via os/exec
  • Handles file read/write/list/delete operations
  • Streams stdout/stderr back to the host in real-time
  • Uses a length-prefixed JSON protocol over the vsock connection

The protocol is simple:

```
[4 bytes: message length][JSON payload]
```

Request:

```json
{"type": "exec", "command": "python3 /app/main.py", "workdir": "/workspace"}
```

Response (streamed):

```json
{"type": "stdout", "data": "hello world\n"}
{"type": "exit", "code": 0}
```

vsock is important here. It's a virtio socket, not TCP/IP. The guest has no network stack visible to the host. There's no IP address, no port, no routing. Just a direct kernel-to-kernel channel. This eliminates an entire class of network-based attacks.


Why not just Docker?

I actually built a Docker provider too. ForgeVM has a provider interface, and Docker is one of the backends. Here's the honest comparison:

Docker containers:

  • Boot: ~200-500ms
  • Isolation: Linux namespaces + cgroups + seccomp
  • Attack surface: Shared host kernel. Every syscall from the container hits the real kernel.
  • KVM needed: No
  • Runs on: Linux, Mac, Windows

Firecracker microVMs:

  • Boot: ~28ms (snapshot) / ~1s (cold)
  • Isolation: KVM hardware virtualization. Separate kernel per sandbox.
  • Attack surface: Minimal VMM with 4 devices. Guest kernel is a separate kernel.
  • KVM needed: Yes
  • Runs on: Linux with /dev/kvm

gVisor (via Docker provider with runsc runtime):

  • Boot: ~300-800ms
  • Isolation: User-space kernel intercepts syscalls. ~70 host syscalls exposed.
  • Attack surface: Much smaller than Docker, larger than Firecracker.
  • KVM needed: No
  • Runs on: Linux

In ForgeVM, you switch between these with one config change:

```yaml
providers:
  default: "firecracker"  # or "docker"
  docker:
    runtime: "runc"        # or "runsc" for gVisor
```

Same API. Same SDKs. Same pool mode. Different isolation level.

For development, I use Docker (runs on my Mac). For production, Firecracker. The application code doesn't know or care which provider is active.


Pool mode: the resource trick

This is the part I'm most proud of and it has nothing to do with Firecracker specifically.

Traditional sandbox tools: 1 user = 1 VM (or container). If you have 100 concurrent users, you need 100 VMs. At 512MB each, that's 50GB of RAM just for sandboxes.

ForgeVM's pool mode: 1 VM serves up to N users. Each user gets a logical "sandbox" with its own workspace directory (/workspace/{sandbox-id}/). The orchestrator:

  1. Routes all exec calls to the shared VM but sets WorkDir to the user's workspace
  2. Rewrites all file paths through scopedPath() to prevent directory traversal
  3. Tracks user count per VM and creates new VMs when capacity is full
  4. Destroys VMs only when all users have left

```go
// scopedPath prevents user A from accessing user B's workspace
func scopedPath(vmID, sandboxID, path string) string {
	if vmID == "" {
		return path // 1:1 mode, no scoping
	}
	base := "/workspace/" + sandboxID
	cleaned := filepath.Clean(filepath.Join(base, path))
	if !strings.HasPrefix(cleaned, base+"/") && cleaned != base {
		return base // traversal attempt, return base
	}
	return cleaned
}
```

100 users, 20 VMs instead of 100: an 80% reduction in sandbox VMs.
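The VM bookkeeping behind that number is simple. A sketch, assuming a capacity of 5 users per VM to match the 100-users/20-VMs figure (in reality the capacity would be configurable):

```go
package main

import "fmt"

// pool assigns users to shared VMs with capacity maxUsers each,
// creating a new VM only when every existing one is full.
type pool struct {
	maxUsers int
	vms      []int // current user count per VM
}

// assign places a user on the first VM with spare capacity.
func (p *pool) assign() (vmIndex int) {
	for i, n := range p.vms {
		if n < p.maxUsers {
			p.vms[i]++
			return i
		}
	}
	p.vms = append(p.vms, 1) // all full: create a new VM
	return len(p.vms) - 1
}

// release removes a user; the VM is destroyed only when the last
// user leaves (real code would also reclaim the slot in the slice).
func (p *pool) release(vmIndex int) (destroyVM bool) {
	p.vms[vmIndex]--
	return p.vms[vmIndex] == 0
}

func main() {
	p := &pool{maxUsers: 5}
	for i := 0; i < 100; i++ {
		p.assign()
	}
	fmt.Println(len(p.vms)) // 100 users at 5 per VM
}
```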

The security trade-off is real: pool mode gives you directory-level isolation, not kernel-level. Users in the same VM share a kernel. For internal tools where you trust the users but want to isolate the AI-generated code from the host, this is fine. For multi-tenant public platforms, you'd want the optional per-user UID and PID namespace hardening on top.


Numbers

Some benchmarks from my development machine (AMD Ryzen 7, 32GB RAM, NVMe SSD):

| Operation | Time |
| --- | --- |
| Firecracker cold boot | ~1.1s |
| Firecracker snapshot restore | ~28ms |
| Docker container start (alpine) | ~180ms |
| Docker container start (python:3.12) | ~450ms |
| Exec "echo hello" (Firecracker) | ~3ms |
| Exec "echo hello" (Docker) | ~8ms |
| Exec "python3 -c 'print(1)'" (Firecracker) | ~45ms |
| File write 1MB (Firecracker, vsock) | ~12ms |
| File write 1MB (Docker, tar copy) | ~25ms |
| Sandbox destroy (Firecracker) | ~15ms |
| Sandbox destroy (Docker) | ~50ms |

The Firecracker exec latency is lower because vsock is a direct kernel channel, while Docker exec creates a new exec instance and attaches via the Docker daemon.


What I'd do differently

Start with Docker, not Firecracker. I built the Firecracker provider first because I was excited about 28ms boots. But 80% of people trying ForgeVM don't have KVM available (Mac users, CI/CD, cloud VMs without nested virt). The Docker provider should have been day one.

The guest agent protocol should have been gRPC, not custom JSON. The length-prefixed JSON protocol works fine but I'm essentially maintaining a custom RPC framework. gRPC over vsock would have given me streaming, error codes, and code generation for free.

Pool mode security should have been built-in from the start. The directory-level isolation works, but per-user UIDs and PID namespace isolation should be default-on, not optional. I'm retrofitting this now.


Try it

```bash
git clone https://github.com/DohaerisAI/forgevm && cd forgevm
./scripts/setup.sh
./forgevm serve
```
```python
from forgevm import Client

client = Client("http://localhost:7423")
with client.spawn(image="python:3.12") as sb:
    result = sb.exec("print('hello from a 28ms sandbox')")
    print(result.stdout)
```

MIT licensed. Single binary. No telemetry. No cloud.

GitHub: github.com/DohaerisAI/forgevm


If you made it this far and found this useful, a star on GitHub genuinely helps with discoverability. Happy to answer questions in the comments about the Firecracker internals, the provider architecture, or the pool mode design.
