vlad
I Spent $350 on an SSD to Fix a Configuration Problem (ESXi VM Migration Deep Dive)

I needed to migrate VMs off an ESXi host. Standard approach: mount an NFS datastore, use vmkfstools to copy VMDKs across.

Expected speed: ~100 MB/s.

Actual speed: 3–5 MB/s.

My conclusion: "The HDD is dying."

So I did what any reasonable person would do: ordered a 4 TB SSD for $350, cloned the entire VMFS datastore onto it, and prepared to write a guide about how SSDs are essential for ESXi migration.

Spoiler: I'm an idiot.

πŸ“Š Interactive benchmark charts β€” full matrix, sync penalty, IOPS data
πŸ™ Source β€” GitHub (raw data coming later)


The Aha Moment (That Took Too Long)

After getting the SSD and seeing 98 MB/s, I benchmarked the original HDD one more time for comparison.

Result: 82 MB/s.

Same HDD. Same VM. Same everything. Except now it was only 20% slower than the brand new $350 SSD.

I hadn't fixed the hardware. I'd accidentally fixed the configuration β€” and had no idea what I'd changed.


The Real Culprit: NFS Sync Mode

After two days of bisecting changes, I found it.

# /etc/exports β€” the line that cost me $350
/mnt/raid/esxi-import 192.168.0.0/24(rw,sync,no_root_squash,no_subtree_check)
#                                        ^^^^
#                                        THIS

`sync` mode forces the NFS server to commit every write to stable storage before acknowledging it to the client. vmkfstools issues sequential writes and waits for each acknowledgement before sending the next chunk. On spinning rust, each commit takes ~10 ms. Do the math: 100 writes/sec × ~50 KB per write = 5 MB/s. Exactly what I was seeing.
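You can feel this penalty without any NFS in the path. A minimal sketch with `dd` (the target path is just an example; point `TARGET` at the disk behind your export): `oflag=dsync` flushes every block before the next write, which is roughly the discipline sync-mode NFS imposes.

```shell
# Scratch file on the disk you want to test (example path; override TARGET)
TARGET=${TARGET:-/tmp/sync-test.bin}

# sync-like: commit each 50K block to stable storage before the next one
dd if=/dev/zero of="$TARGET" bs=50K count=200 oflag=dsync 2>&1 | tail -n1

# async-like: let the page cache absorb the writes
dd if=/dev/zero of="$TARGET" bs=50K count=200 2>&1 | tail -n1

rm -f "$TARGET"
```

On an HDD-backed path, the dsync run should land in the single-digit MB/s range, mirroring the 3–5 MB/s I saw over NFS.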

The fix:

# /etc/exports
/mnt/raid/esxi-import 192.168.0.0/24(rw,async,no_root_squash,no_subtree_check)

# Apply β€” DON'T FORGET THIS STEP
exportfs -ra

# Verify it actually loaded
exportfs -v | grep async

That last exportfs -ra is important. I've seen people edit the file and wonder why nothing changed.
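If you script migrations, it's worth failing fast on this. A small guard sketch — the export lines below are hardcoded samples in `exportfs -v` output format, just to show the match; in real use you'd feed it live output:

```shell
# check_async: succeed only when an exportfs -v line carries the async flag.
# The [(,] and [,)] anchors make "async" match only as its own option,
# never as a substring of another flag.
check_async() {
  printf '%s\n' "$1" | grep -q '[(,]async[,)]'
}

# Sample lines (real use: exportfs -v | grep esxi-import, then check_async)
check_async '/mnt/raid/esxi-import 192.168.0.0/24(rw,async,wdelay)' \
  && echo "async: ok"
check_async '/mnt/raid/esxi-import 192.168.0.0/24(rw,sync,wdelay)' \
  || echo "sync detected: fix /etc/exports, then exportfs -ra"
```

Note that a naive `grep sync` would match `async` too — which is exactly how this misconfiguration hides.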


The Numbers (16-Run Matrix)

Once I understood the real variable, I ran a proper benchmark β€” 16 combinations of source Γ— destination Γ— NIC speed Γ— MTU. Same VM throughout: 32 GB provisioned, ~19 GB allocated (thin/sparse).

| Source | Dest | NIC | MTU | MB/s |
|--------|------|-----|------|-------|
| HDD | HDD | 1G | 1500 | 82.4 |
| HDD | HDD | 1G | 9000 | 85.7 |
| HDD | NVMe | 1G | 1500 | 82.9 |
| HDD | NVMe | 1G | 9000 | 85.8 |
| HDD | HDD | 25G | 1500 | 110.2 |
| HDD | HDD | 25G | 9000 | 114.5 |
| HDD | NVMe | 25G | 1500 | 122.9 |
| HDD | NVMe | 25G | 9000 | 127.4 |
| SSD | HDD | 1G | 1500 | 98.1 |
| SSD | HDD | 1G | 9000 | 102.6 |
| SSD | NVMe | 1G | 1500 | 98.6 |
| SSD | NVMe | 1G | 9000 | 103.0 |
| SSD | HDD | 25G | 1500 | 155.1 |
| SSD | HDD | 25G | 9000 | 157.4 |
| SSD | NVMe | 25G | 1500 | 285.6 |
| SSD | NVMe | 25G | 9000 | 289.7 |

All runs with NFS async. Sync numbers below.
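Each figure is just allocated bytes over wall-clock time — in practice that means `du -m` on the flat file plus `time` around the copy. A sketch of the arithmetic, with values chosen to reproduce the table's 82.4 MB/s HDD→HDD row (the elapsed time is back-calculated, not a logged measurement):

```shell
# MB/s = allocated megabytes / elapsed seconds for one vmkfstools run.
# ALLOCATED_MB would come from `du -m` on the flat file,
# SECONDS_ELAPSED from wrapping the copy in `time`.
ALLOCATED_MB=19456       # ~19 GB actually allocated (thin disk)
SECONDS_ELAPSED=236      # wall-clock time of the copy
awk -v mb="$ALLOCATED_MB" -v s="$SECONDS_ELAPSED" \
    'BEGIN { printf "%.1f MB/s\n", mb / s }'
```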

The Sync Penalty β€” Measured Properly

| Setup | Async | Sync | Multiplier |
|-------|-------|------|------------|
| HDD→HDD 1G | 82.4 MB/s | 6.5 MB/s | 12.7× |
| HDD→HDD 25G | 110.2 MB/s | 6.5 MB/s | 16.9× |
| SSD→NVMe 25G | 289.7 MB/s | 12.3 MB/s | 23.5× |

The faster the rest of the stack, the more sync costs you. On 25G with NVMe, sync cuts throughput by 23.5×. This is not a minor tuning knob.


The Bottleneck Hierarchy

Each layer proven by isolating variables:

1. NFS async mode     β€” 13-24Γ— improvement, always, no exceptions
2. Source drive       β€” 1.2Γ— on 1G, up to 2.3Γ— on 25G+NVMe  
3. Network speed      β€” 1.3Γ— with HDD, up to 2.9Γ— with SSD+NVMe
4. Destination drive  β€” ~0 on 1G, +84% on 25G+SSD
5. MTU 9000           β€” consistent +3-4 MB/s absolute, everywhere

The hierarchy is strict. Fix layer N before upgrading layer N+1.

NVMe destination on 1G = +0.4 MB/s.

NVMe destination on 25G with SSD source = +130 MB/s.

Same hardware. Completely different result.


Step-by-Step: The Correct Way

NFS Server Setup

# /etc/exports
/mnt/raid/esxi-import 192.168.0.0/24(rw,async,no_root_squash,no_subtree_check)

exportfs -ra
exportfs -v | grep async   # verify
systemctl enable --now nfs-server

ESXi NFS Mount

esxcli storage nfs add \
  -H 192.168.0.105 \
  -s /mnt/raid/esxi-import \
  -v migration_target

esxcli storage nfs list   # verify

Migrate

vmkfstools -i /vmfs/volumes/source/vm/disk.vmdk \
  -d thin \
  /vmfs/volumes/migration_target/vm/disk.vmdk
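One disk at a time gets old on multi-disk VMs. A dry-run sketch that plans one `vmkfstools` command per descriptor VMDK in a folder (paths follow the article's layout; drop the `echo` to actually execute):

```shell
# Print one vmkfstools command per descriptor .vmdk in a VM folder.
# -flat/-delta files are data extents: vmkfstools copies those itself
# when given the descriptor, so they are skipped here.
plan_copies() {
  src=$1; dst=$2
  for vmdk in "$src"/*.vmdk; do
    [ -e "$vmdk" ] || continue                 # folder had no vmdks
    case "$vmdk" in
      *-flat.vmdk|*-delta.vmdk) continue ;;
    esac
    echo vmkfstools -i "$vmdk" -d thin "$dst/$(basename "$vmdk")"
  done
}

plan_copies /vmfs/volumes/source/vm /vmfs/volumes/migration_target/vm
```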

Validate

# Hash the copied disk on ESXi, from inside the datastore so the
# recorded path is relative and checkable on the other side
cd /vmfs/volumes/migration_target/vm
md5sum disk-flat.vmdk > disk.md5

# Cross-check against the source while it's still mounted
md5sum /vmfs/volumes/source/vm/disk-flat.vmdk

# Unmount the source, then verify the copy cold on the NFS server
cd /mnt/raid/esxi-import/vm
md5sum -c disk.md5

The Economics

| Option | Cost | Speed gain |
|--------|------|------------|
| Fix NFS async | $0 | 13-24× |
| MTU 9000 | $0 | +3-4 MB/s |
| Upgrade to 25G NIC | ~$150 used | +30-150% (stack dependent) |
| SSD staging | $80-350 | +20% on 1G, +84% on 25G |

Fix the config before buying hardware. Every time.


Lessons Learned

  1. Configuration mistakes cost more than hardware β€” $350 SSD, fixed by editing one word in a config file
  2. Measure before you buy β€” 5 minutes of benchmarking would have saved the SSD purchase entirely
  3. The checklist exists for a reason β€” exportfs -v | grep async takes 2 seconds
  4. Upgrades multiply, they don't add β€” NVMe destination means nothing without the network to feed it

Conclusion

If you take away one thing:

NFS sync mode will ruin your day, your week, and your migration project.

Enable async. Reload the config. Verify it loaded. Test it works. Then worry about whether you need an SSD.


Want the Full Data?

This article covers the fundamentals. The Advanced Deep Dive ($9) goes further:

  • Multi-target NFS fan-out (8 simultaneous copies, throughput plateau analysis)
  • tmpfs as migration target β€” eliminating destination storage from the critical path entirely
  • 2026 hardware update: dual 25G + iSCSI over NVMe RAID0
  • Parallelism scaling data and where it stops helping
  • Raw NVMe baseline (concurrent dd readers, queue depth analysis)
  • Interactive benchmark charts β€” full matrix, sync penalty, IOPS (free)
  • Source β€” raw data coming later

The destination storage is almost never the real constraint. The advanced article explains what is β€” and what to do about it.

get in touch: vlad [at] cipeople [dot] com β€” this is a solved problem.


All numbers from real hardware, real workloads. NFS async throughout unless explicitly marked sync. February 2026.
