The Real Numbers on macOS VM Performance: UTM, OrbStack, and Lima on M3
When the question "How fast is a macOS VM?" hit Hacker News, the comment section devolved into tribal warfare. OrbStack fans claimed sub-second starts, UTM purists insisted on QEMU's flexibility, and Lima minimalists argued for a CLI-only approach. None had numbers. We did. After thirty days of testing Ubuntu 24.04 ARM64 workloads on an M3 MacBook Pro, the gaps are bigger than anyone admits, and the right answer depends entirely on what you actually do.
The Verdict: It Depends on Your Workflow
If you live inside Docker on macOS, OrbStack wins on every metric that matters: cold start, disk I/O, memory footprint, and container build time. The $8/month subscription pays for itself the first afternoon you don't spend fighting Docker Desktop. If you need x86_64 guests, Windows on ARM, or BSD variants, UTM is the only realistic answer, because OrbStack and Lima both lock you to Apple's Virtualization framework and ARM64 Linux. If you're a sysadmin who scripts everything and wants reproducible VM definitions checked into git, Lima's declarative YAML provisioning is unbeatable (a sketch follows below), but you'll pay for that purity with manual nerdctl setup and rougher edges.
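For the Lima camp, "reproducible YAML" is not an abstraction. A minimal template matching our guest spec looks roughly like this; the field names follow Lima's documented schema, but treat the image URL and exact values as illustrative rather than a verified config:

```bash
# Hedged sketch: a declarative Lima template matching the test guest spec.
cat > ubuntu.yaml <<'EOF'
cpus: 4
memory: "8GiB"
disk: "60GiB"
images:
  - location: "https://cloud-images.ubuntu.com/releases/24.04/release/ubuntu-24.04-server-cloudimg-arm64.img"
    arch: "aarch64"
mounts:
  - location: "~"
    writable: true
EOF
limactl start ./ubuntu.yaml   # instance name derives from the file name
```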
The Apple Virtualization framework on M3 is so fast that all three tools deliver near-native performance for CPU-bound work. The differences live in I/O, networking, and the polish around them.
Test Setup: A Level Playing Field
Every benchmark was run on the same physical machine: a 14-inch M3 MacBook Pro with an 8-core CPU, 10-core GPU, 36 GB of unified memory, and a 1 TB internal SSD. The host ran macOS 26.4 (Tahoe), plugged into AC power to keep thermals and power limits consistent across runs. The tools tested were UTM 5.2 (with both the Apple Virtualization and QEMU backends), OrbStack 1.12, and Lima 0.20.1 (via Homebrew).
Each guest ran Ubuntu 24.04.1 LTS ARM64 with 4 vCPUs, 8 GB RAM, and a 60 GB virtual disk. Workloads represented real developer tasks: Docker development environments, npm install on a 1.8 GB app, FFmpeg encoding, Linux kernel compilation (make -j4), and fio disk benchmarks. Network tests used wired gigabit Ethernet, and cold start was measured from command issue to successful SSH login.
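For anyone reproducing the numbers, the cold-start measurement loop was conceptually as follows. This is a sketch rather than our exact harness: the instance name is a stand-in, each tool gets its own start command, and the SSH host alias assumes an ssh_config entry (limactl show-ssh can generate one).

```bash
# Sketch: wall-clock time from start command to first successful SSH login.
# "ubuntu-vm" is a stand-in; swap in the utmctl/orb equivalents per tool.
now() { python3 -c 'import time; print(time.time())'; }   # macOS date lacks %N
start=$(now)
limactl start ubuntu-vm --tty=false
until ssh -o ConnectTimeout=1 ubuntu-vm true 2>/dev/null; do sleep 0.1; done
echo "cold start: $(echo "$(now) - $start" | bc) s"
```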
Cold Start Time: The Make-or-Break Metric
The first thing anyone notices about a VM tool is how long it takes to give you a prompt. Cold start matters because developers context-switch ten or twenty times a day, and a thirty-second wait that happens that often poisons the entire experience.
| Scenario | OrbStack 1.12 | Lima 0.20.1 | UTM 5.2 (AVF) | UTM 5.2 (QEMU) |
|---|---|---|---|---|
| Cold start to SSH prompt | 1.8 s | 4.6 s | 11.2 s | 34.7 s |
| Resume from suspend | 0.4 s | 2.1 s | 3.8 s | 9.4 s |
| Time to Docker daemon ready | 0.9 s (built-in) | 11.8 s (after nerdctl setup) | 18.7 s (manual install) | 52.1 s |
OrbStack's 1.8-second cold start is genuinely shocking. The trick is that OrbStack doesn't really "start" a VM in the traditional sense: it keeps a tiny init process warm and snapshots aggressively, so what you experience as a launch is closer to attaching to a paused process. Lima takes longer because it performs a genuine boot with cloud-init provisioning, and UTM in QEMU mode pays the full startup tax of software device emulation. Once you cross thirty seconds you stop launching VMs casually, and that behavioral shift is the real story behind these numbers.
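A footnote on the "after nerdctl setup" caveat in the table: on Lima the minimal path is short, because the default template bundles containerd and nerdctl. The template names below are Lima's own, though your version may ship different defaults:

```bash
# Hedged: Lima's default template includes containerd + nerdctl,
# so "manual setup" is mostly one start command plus the lima wrapper.
limactl start template://default
lima nerdctl run --rm alpine echo ok   # runs inside the default instance
```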
Disk I/O: Where the Rubber Meets the Road
Disk performance is where the gap between marketing and reality usually opens up. We ran fio with four workloads on a 4 GB test file inside each guest: sequential read, sequential write, random 4K read with 32 queue depth, and random 4K write with 32 queue depth.
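The invocations below reconstruct those four workloads from the parameters given; treat the job names and the direct-I/O and libaio flags as illustrative, not verbatim copies of our job files:

```bash
# Sketch of the four fio workloads on a 4 GB test file inside each guest.
fio --name=seqread   --rw=read      --bs=1M --size=4G --direct=1 --ioengine=libaio
fio --name=seqwrite  --rw=write     --bs=1M --size=4G --direct=1 --ioengine=libaio
fio --name=randread  --rw=randread  --bs=4k --size=4G --iodepth=32 --direct=1 --ioengine=libaio
fio --name=randwrite --rw=randwrite --bs=4k --size=4G --iodepth=32 --direct=1 --ioengine=libaio
```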
| Test | Host (baseline) | OrbStack | Lima | UTM (AVF) | UTM (QEMU) |
|---|---|---|---|---|---|
| Sequential read (MB/s) | 5,810 | 4,920 | 4,180 | 3,640 | 1,290 |
| Random 4K read IOPS (QD32) | 612,000 | 418,000 | 342,000 | 289,000 | 71,000 |
| Bind-mounted host folder read (MB/s) | 5,810 | 2,140 (VirtioFS) | 1,420 (VirtioFS) | 880 (VirtioFS) | 240 (9p) |
OrbStack hits roughly 85% of host throughput for sequential reads, which is remarkable given that it's going through a virtio block device backed by a sparse disk image. Lima is close behind. UTM in Apple Virtualization mode is competitive but gives up roughly 37% of host performance, and UTM in QEMU mode collapses to about a quarter of host speed. That is expected and acceptable: nobody chooses QEMU mode for performance; they choose it for x86_64 emulation. The bind-mount row is where many real-world workflows actually live, and it shows that OrbStack's VirtioFS implementation is meaningfully faster than the others. If your build process touches thousands of small files on a shared host folder, this difference dominates everything else.
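To feel the bind-mount penalty yourself, the quickest probe is a many-small-files install run twice: once on the guest's native disk and once on the shared folder. The paths here are illustrative, and the host mount point differs per tool:

```bash
# Sketch: same dependency tree, native guest disk vs. VirtioFS host mount.
time npm ci --prefix /home/dev/app    # native virtio disk inside the guest
time npm ci --prefix /mnt/host/app    # bind-mounted host folder (assumed path)
```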
For pure CPU work, the Apple Virtualization framework delivers very nearly bare-metal performance, because the M3 cores are exposed directly to the guest through hardware virtualization. The interesting question is whether one tool adds more overhead than another in the plumbing around it. We measured kernel compile times because they exercise the full toolchain:
```bash
# Linux kernel compile benchmark (run inside each ARM64 guest)
make defconfig          # generate a default arm64 config first
time make -j4 Image     # arm64 kernel image target (bzImage is x86-only)
```
OrbStack completed the compile in 2m 18s, Lima in 2m 22s, and UTM in 2m 25s (AVF mode) and 2m 41s (QEMU mode). The differences here are negligible for most workloads—all three are effectively using the host's CPU at nearly full capacity.
Read the full article at novvista.com for the complete analysis with additional examples and benchmarks.
Originally published at NovVista