Last week I ran a small experiment that I should have run a year ago.
Same agent code. Same model. Same task list: install three Python packages from a CSV, fetch a few APIs, write JSON to disk, run a long-running scheduled job. Two isolation modes. One was a Docker container with the agent process inside, mounted volume, the usual. The other was a Firecracker microVM running a slim Linux image with the agent on top. Both got sudo inside their sandbox. I ran the whole thing over seven days, switching isolation modes partway through.
I went in expecting the difference to be small. Memory overhead, maybe boot time. The actual difference was bigger than that.
Day one and two: Docker
Setup was the part everyone has done before. docker run with a few mounts, the agent gets a shell inside, away we go. The agent is told to apt-get install a couple of system libraries it has decided it needs. That works. It writes a 40 MB cache file to /tmp. That works. It runs a long job that opens a few hundred sockets to a public API.
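For reference, the Docker side was nothing exotic. A sketch of the shape of it, with the image name, mount path, and agent command as placeholders rather than my exact setup:

```bash
# A sketch of the Docker side, not the exact setup: the image name, mount
# path, and agent command are placeholders. The image has sudo configured
# for the agent's user, which matters later.
docker run -d --name agent-sandbox \
  -v "$PWD/agent-data:/data" \
  --tmpfs /tmp \
  my-agent-image:latest \
  python3 /app/agent.py --tasks /data/tasks.csv
```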
Around hour eighteen the host machine's dmesg started printing memory-pressure warnings. Not from the agent itself. From a different container running on the same host. The Python process inside my agent's container had a retry loop that never released its file descriptors. That was the leak. Without a per-container memory limit, the kernel's OOM killer scores processes across the whole host; it does not care which container a process lives in when it picks a victim. It just picks. The neighbor went down.
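Both signals are visible from the host with stock tools, if you want to watch for the same failure on your own machines:

```bash
# Watch OOM kills as they happen; the kernel logs which victim it picked.
dmesg --follow | grep -i "out of memory"

# Host-wide file descriptor pressure: allocated, free, and the global ceiling.
cat /proc/sys/fs/file-nr
```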
This is the part of Docker that no production person likes to talk about. Containers share the host kernel. They share the host scheduler. When one container goes off the rails, the rest of them feel it on the same host. If you're a small shop running one agent on one host, fine. None of this matters yet. For anything that looks like a tenant model, it stops being fine fast.
The other thing I noticed on day two: the agent decided to chmod 777 a folder it didn't own. Not malicious. Just a Python script doing what Python scripts do when permissions throw an error. With sudo available inside the container, it succeeded. The host filesystem was untouched, but only because that folder lived in the container's own writable layer rather than on a bind mount; anything inside the container was now wide open to whatever the agent did next.
Day three: rebuild as a microVM
I tore down the Docker setup and rebuilt the same agent inside a Firecracker microVM. Same code, same packages, same task list. Boot time went from about 200 ms (Docker) to about 700 ms (microVM). Memory baseline went up by roughly 60 MB for the kernel itself.
That is the cost. You pay it once.
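For orientation, here is roughly what the rebuild boils down to. The kernel and rootfs paths are placeholders; the JSON keys are Firecracker's standard config-file format:

```bash
# Minimal sketch of booting an equivalent microVM. The kernel image and
# rootfs paths are placeholders; the JSON keys are Firecracker's standard
# config-file schema.
cat > vm_config.json <<'EOF'
{
  "boot-source": {
    "kernel_image_path": "vmlinux.bin",
    "boot_args": "console=ttyS0 reboot=k panic=1"
  },
  "drives": [
    {
      "drive_id": "rootfs",
      "path_on_host": "ubuntu-22.04.ext4",
      "is_root_device": true,
      "is_read_only": false
    }
  ],
  "machine-config": { "vcpu_count": 2, "mem_size_mib": 1024 }
}
EOF
firecracker --api-sock /tmp/fc.sock --config-file vm_config.json
```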
What you get is a separate kernel. The agent's sudo is a real Linux sudo inside a real kernel that nobody else on the host shares. When the agent ran the same chmod 777 thing, it still happened, but the blast radius was a single VM that I could destroy and recreate in under a second. When the agent leaked file descriptors, only the VM's per-process limits got hit. The host kernel didn't notice.
Day four I let the agent install a kernel module on purpose. In Docker this would be a host-level event (or blocked outright, depending on capabilities). In the microVM it loaded into the VM's own kernel and stayed there. Whether that is useful or annoying is up to you. Either way, it stayed inside the VM.
Day five through seven: chaos
I wrote a script that told the agent to do five terrible things in a row, the kind of thing a real long-running agent might trip into over a week (a rough reconstruction of the script follows the list):
1. Open 10,000 file descriptors and never close them
2. Allocate memory in a tight loop until something breaks
3. Spawn 200 child processes
4. Write a 5 GB file to its own filesystem
5. Add a sudo iptables rule that drops all incoming packets
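The reconstruction, from memory rather than the original file, run as the agent's user inside each sandbox:

```bash
# Rough reconstruction of the five chaos tests, not the exact script.
# Run inside the sandbox as the agent's user (which has sudo).

# (1) hold file descriptors open until something refuses; stops early if
#     the shell's own nofile limit is lower than 10,000
for i in $(seq 10000); do exec {fd}</dev/null; done

# (2) allocate memory until something breaks: tail tries to buffer the
#     entire (infinite) input
tail /dev/zero

# (3) 200 idle child processes
for i in $(seq 200); do sleep 3600 & done

# (4) a 5 GB file on the sandbox's own filesystem
dd if=/dev/zero of=/var/tmp/bigfile bs=1M count=5120

# (5) drop all incoming packets, including the agent's own
sudo iptables -A INPUT -j DROP
```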
Docker handled (1) and (2) with cgroup limits if you remember to set them. Most people don't. The default is no limit. By the time I noticed (1) on the third run, the host had hit its global file descriptor ceiling and SSH started refusing connections.
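The flags that would have contained the first three are one-liners each, which is exactly why forgetting them stings. Illustrative values, not tuned ones:

```bash
# The cgroup limits that would have contained (1) through (3). None of
# these are set by default; values here are illustrative, not tuned.
docker run -d --name agent-sandbox \
  --memory 1g \
  --pids-limit 256 \
  --ulimit nofile=4096:4096 \
  my-agent-image:latest
```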
The microVM handled (1) through (4) inside its own boundary. The kernel inside the VM hit its limits, the VM started thrashing, and then I killed it from outside. The host machine never noticed any of it. (5) was the most interesting. The agent set the iptables rule and locked itself out of network. In Docker the agent had affected the container's network namespace, recoverable but messy. In the microVM the agent had only touched the VM's network stack. I rebooted the VM in 800 ms and was back.
Snapshots are where the gap really opened. Firecracker can snapshot a running VM to disk and resume it later. I snapshotted the agent mid-task on day six, killed the host, restored the snapshot on a different machine, and the agent picked up its loop one second later without knowing anything had happened. Try that with a Docker container and you will spend the afternoon learning about CRIU and giving up.
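The snapshot itself is two calls against Firecracker's API socket, and the restore is one more on the new host. The endpoints below are the documented snapshot API; the socket and file paths are placeholders:

```bash
# Pause the VM, then write memory and device state to disk.
curl --unix-socket /tmp/fc.sock -X PATCH http://localhost/vm \
  -d '{"state": "Paused"}'
curl --unix-socket /tmp/fc.sock -X PUT http://localhost/snapshot/create \
  -d '{"snapshot_type": "Full", "snapshot_path": "snap.file", "mem_file_path": "mem.file"}'

# On the new host: start a fresh Firecracker process, then restore and resume.
curl --unix-socket /tmp/fc.sock -X PUT http://localhost/snapshot/load \
  -d '{"snapshot_path": "snap.file", "mem_backend": {"backend_type": "File", "backend_path": "mem.file"}, "resume_vm": true}'
```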
The link to actually running this in production
Doing this experiment locally is one thing. Running an agent like this for a paying customer, on hardware you have to keep alive for 30+ days at a stretch, is a different problem. The boring infrastructure problem nobody writes about is that the isolation primitive isn't the hard part. Anyone can spin up Firecracker. The hard part is babysitting a hundred of these things at once, snapshotting them every so often, recovering them when a host dies, and not losing the agent's state in the meantime.
I'll plug the thing I work on once and move on. The Builder Sandbox tier is a managed wrapper around exactly this microVM-with-sudo model, with the snapshot and recovery loop already wired up. If you don't want to babysit it, that's the option. If you want to babysit it yourself, Firecracker is open source and the docs are fine.
What I'd tell past me
Running one tiny agent on a host you own? Docker is fine. The overhead is real, the boundary is real enough for that scope, and you already know the tools.
The moment you have an agent that needs sudo, runs for more than a few days, and might do something weird at 3am, switch to a real VM. The 60 MB and the 500 ms of extra boot time will pay for themselves the first time the agent does something stupid. The snapshot story alone is worth the migration.
The thing I didn't expect, going in, was how much my mental model changed. With Docker I treat the container as a thing the agent lives in. With a microVM I treat the VM as a thing the agent is. That shift, more than any individual feature, is what made the seven-day test feel different on day three than it did on day one.
A few specifics, in case you try this
The microVM rebuild was Firecracker 1.5 with a vanilla Ubuntu 22.04 rootfs, 2 vCPU, 1 GB RAM, virtio-net for the network. Boot times stayed under a second consistently once I trimmed the kernel config. I used jailer to drop privileges on the Firecracker process itself, and seccomp filters on the agent's user inside the VM. None of that is exotic. The Firecracker docs cover all of it. The only thing I had to figure out the hard way was the snapshot directory layout, which the docs assume you already understand.
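For the record, the jailer invocation is short. The uid, gid, and chroot base below are placeholders for whatever fits your host:

```bash
# Running Firecracker under jailer so the VMM process itself is
# unprivileged. The uid/gid and chroot base are placeholders; everything
# after "--" is passed through to Firecracker, and those paths are
# resolved inside the jail's chroot.
jailer --id agent-vm-01 \
  --exec-file /usr/bin/firecracker \
  --uid 1000 --gid 1000 \
  --chroot-base-dir /srv/jailer \
  -- --config-file vm_config.json
```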
For Docker the comparison build was the standard python:3.12-slim base with the agent process as the entrypoint, a tmpfs mount for /tmp, and --cap-drop=ALL plus only the capabilities the agent actually needed. Even with that, the chmod-777 case still worked inside the container because sudo plus CAP_FOWNER is enough for filesystem-mode changes. You can lock this down further with seccomp profiles, but at that point you have built a worse VM with extra steps.
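For completeness, the shape of that comparison build. The capability list below is illustrative; the point is to start from nothing and add back only what your agent demonstrably needs:

```bash
# The hardened Docker comparison build. The cap-add list is whatever your
# agent actually needs; this set is an example, not a recommendation. Note
# that FOWNER is what let the chmod-777 case through.
docker run -d --name agent-sandbox \
  --tmpfs /tmp \
  --cap-drop=ALL \
  --cap-add=CHOWN --cap-add=FOWNER --cap-add=SETUID --cap-add=SETGID \
  python:3.12-slim \
  python3 /app/agent.py
```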
If you want the longer cost breakdown of running this yourself versus paying someone to keep it alive, I wrote that up here: https://rapidclaw.dev/blog/openclaw-hosting-cost-self-host-vs-managed.