
Zoltan Toma

Originally published at zoltantoma.com

Three VM Crashes Later, I Built Data Disk Support

Three Times

My colleague lost his VM three times. Not over a year - within the few months we’ve been working together.

Each time it happened, we’d shrug it off. That’s what Vagrant is for, right? Just run vagrant up and you’re back in business. The development environment is code - it’s reproducible.

Except it wasn’t.

All three times, he had uncommitted work on that VM. Database migrations half-done. Configuration files tweaked just right but not yet committed. Local test data. You know, the stuff you’re “definitely going to commit tomorrow.”

Why Everything Lives on the VM

We work on Windows with VirtualBox VMs. But here’s the thing - we clone everything into the VM, not to shared folders. Why?

Linux filesystem compatibility. Case-sensitive paths. Symlinks that actually work. File permissions that make sense. Performance that doesn’t crawl to a halt when node_modules has 50,000 files.

So yeah, everything lives inside the VM. Which means when the VM dies, everything dies with it.

After the third time, I decided: data disk support isn’t a nice-to-have. It’s more important than networking, more important than fancy features. My colleague shouldn’t lose work because we didn’t have a way to separate persistent data from ephemeral VMs.

The Plan

I’d been building a Vagrant provider for WSL2 (because VirtualBox and WSL2 don’t play nice together, but that’s another story). Version 0.2.0 had snapshots and Docker support. Version 0.3.0 was going to be data disks.

The idea was simple: mount a VHD as a data disk, store your code there, blow away the VM whenever you want. The data persists.

Looked at how VirtualBox and VMware do it. Both support multiple data disks, VHD and VHDX formats. WSL2 has wsl --mount --vhd for mounting VHD files. Should be straightforward, right?

The First Problem: VHD Format

Started implementing. First disk worked - created a VHDX, mounted it, formatted it. Great.

Then I tried VHD format (for VirtualBox compatibility) and hit this:

New-VHD : A parameter cannot be found that matches parameter name 'VHDType'


Wait, what? The PowerShell docs said… oh. The format isn’t determined by a parameter. It’s determined by the file extension. .vhd vs .vhdx. The -VHDType Dynamic parameter doesn’t exist.

Fixed it. Removed the parameter, let the extension do the work. Both formats working.
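
The creation path then boils down to “pick the extension, let New-VHD do the rest.” Here’s a simplified sketch of how a provider can drive it from Ruby - not the provider’s exact code; the helper name is made up, and it assumes the Hyper-V PowerShell module (which provides New-VHD) and wsl.exe are available:

# Simplified sketch, not the provider's exact code. The VHD vs VHDX format
# comes purely from the file extension in `path`; -Dynamic makes the disk
# dynamically expanding (there is no -VHDType parameter).
def create_and_attach_disk(path, size_gb)
  Vagrant::Util::Subprocess.execute(
    "powershell.exe", "-NoProfile", "-Command",
    "New-VHD -Path '#{path}' -SizeBytes #{size_gb}GB -Dynamic"
  )

  # Attach the disk to WSL2 without mounting a filesystem (--bare); the
  # guest provisioning script formats and mounts it at /mnt/dataN later.
  Vagrant::Util::Subprocess.execute("wsl.exe", "--mount", path, "--vhd", "--bare")
end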

The Second Problem: Provisioning Ate Everything

Got multiple disks working. Example Vagrantfile had three disks - two default, one persistent. Ran vagrant up.

Checked inside the VM:

ls -la /mnt/


Four data directories. /mnt/data1, /mnt/data2, /mnt/data3, /mnt/data4.

What? I only configured three disks.

The provisioning script was mounting every /dev/sd[b-z] device it found. Turns out WSL2 has some system disks (sda-sdd), and then our data disks start at sde. My script was finding an extra disk somewhere and mounting it.

Quick fix: change the loop to only process /dev/sd[e-z] and limit it to the expected number of disks.

EXPECTED_DISKS=3
disk_count=0
for device in /dev/sd[e-z]; do
  # If no matching devices exist, the glob stays literal - skip it
  [ -b "$device" ] || continue
  # Stop once the configured number of data disks has been handled
  if [ "$disk_count" -ge "$EXPECTED_DISKS" ]; then
    break
  fi
  # ... rest of the mounting logic
  disk_count=$((disk_count + 1))
done


Now we have exactly three disks. No more, no less.

The Third Problem: Admin Rights

Ran the tests. Green across the board when running as admin. Great.

Then tried vagrant up as a regular user. Boom:

Failed to mount VHD. Administrator privileges are required.


Of course. Both New-VHD (creating VHD files) and wsl --mount (mounting them) need admin rights. That’s… annoying.

But wait. After vagrant halt, if you run vagrant up again as a regular user, does it work?

Tested it. The VM started fine. No errors.

Looked closer. When the VM halts, the VHDs stay attached to WSL2 (the /dev/sde, /dev/sdf devices are still visible in the distribution). When the VM starts again, those devices are already present. No need to re-mount, no need for admin rights.

Added a check: before trying to mount, count how many /dev/sd[e-z] devices are already in the distribution. If we have enough, skip the mounting step entirely.

# Returns true when the expected number of data disk devices (sde..sdz)
# is already visible inside the distribution, so the admin-only mount
# step can be skipped.
def data_disk_already_mounted?(expected_disk_count)
  result = Vagrant::Util::Subprocess.execute(
    "wsl", "-d", @config.distribution_name, "--", "lsblk", "-nd", "-o", "NAME"
  )
  # lsblk -nd prints one bare device name per line (sda, sdb, ...)
  device_count = result.stdout.lines.count { |line| line.match?(/^sd[e-z]$/) }
  device_count >= expected_disk_count
end

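The call site is then just a guard in front of the mount step. A hypothetical sketch - the actual action class and config accessor may be named differently:

# Hypothetical call site: skip the admin-only mounting when the devices
# are already visible inside the distribution.
return if data_disk_already_mounted?(@config.data_disks.length)
mount_data_disks   # needs elevation; only reached on the first `vagrant up`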

Now the workflow is:

  • First vagrant up (admin required) - creates and mounts VHDs
  • vagrant halt - VM stops, VHDs stay mounted
  • Second vagrant up (no admin required) - disks already there, just start the VM

Perfect for my use case. You only need admin once to set things up.

The Fourth Problem: Integration Tests

Integration tests need admin rights too. But what if someone runs the test suite without admin?

Added a check at the start of the data disk test:

$isAdmin = ([Security.Principal.WindowsPrincipal] [Security.Principal.WindowsIdentity]::GetCurrent()).IsInRole([Security.Principal.WindowsBuiltInRole]::Administrator)

if (-not $isAdmin) {
    Write-Host "=== $TestName Test SKIPPED ===" -ForegroundColor Yellow
    Write-Host "Reason: Administrator privileges required for data disk tests"
    exit 0
}


Now the test gracefully skips instead of failing with cryptic errors.

What Actually Works Now

Version 0.3.0 ships with:

Multiple data disks per VM:

wsl.data_disk do |disk|
  disk.size = 10
  disk.format = 'vhdx'
end

wsl.data_disk do |disk|
  disk.size = 5
  disk.format = 'vhd'
end

wsl.data_disk do |disk|
  disk.path = '../persistent-data.vhdx' # Custom path, survives destroy
end

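For context, here’s roughly how those blocks sit in a complete Vagrantfile. This is a minimal sketch, not a copy of the project’s example - it assumes the provider registers as :wsl2 (as used in vagrant up --provider=wsl2) and that distribution_name is a provider config option; treat the names as placeholders:

# Minimal sketch of a Vagrantfile using the data disk blocks above.
# Box/network settings omitted; adjust names to your actual setup.
Vagrant.configure("2") do |config|
  config.vm.provider :wsl2 do |wsl|
    wsl.distribution_name = "Ubuntu-24.04"   # placeholder distro name

    wsl.data_disk do |disk|
      disk.size = 10           # default location under .vagrant/
      disk.format = 'vhdx'
    end

    wsl.data_disk do |disk|
      disk.path = '../persistent-data.vhdx'  # custom path, survives destroy
    end
  end
end

Put your working directory on the custom-path disk and the VM itself stays disposable.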

Smart mounting:

  • Checks if disks are already accessible before trying to mount
  • Skips admin-requiring operations when possible
  • Clear error messages when admin is actually needed

Persistence:

  • Default disks (in .vagrant/ directory) get cleaned up on vagrant destroy
  • Custom-path disks survive destroy - your data is safe (see the sketch after this list)
  • After destroy/up cycle, data on the persistent disk is intact
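
The cleanup rule above is simple to express. A rough illustration, not the provider’s actual implementation - the helper and the way disk paths are resolved are invented for the example:

# Rough illustration of the destroy-time rule (invented helper, not the
# provider's real code): disks living under the machine's .vagrant data
# directory are treated as default and deleted; custom-path disks are kept.
def cleanup_data_disks(machine, disks)
  disks.each do |disk|
    default_disk = File.expand_path(disk.path).start_with?(machine.data_dir.to_s)
    File.delete(disk.path) if default_disk && File.exist?(disk.path)
  end
end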

Testing:

  • Integration tests verify all scenarios
  • Graceful skip when admin rights aren’t available
  • Tests cover VHD/VHDX formats, persistence, cleanup

PowerShell Integration Tests

Since this is Windows-only functionality, I wrote PowerShell-based integration tests. Each test creates a real Vagrant environment, performs operations, and verifies the results.

The data disk test verifies:

# Test 1: Creating VM with data disks
vagrant up --provider=wsl2

# Test 2: Verifying VHD files exist
# Checks for 2 default + 1 persistent disk

# Test 3-6: Mount verification and data writes
# Writing test data to each disk

# Test 7: VHD cleanup on destroy
vagrant destroy
# Default VHDs should be deleted
# Persistent VHD should survive

# Test 8: VM recreation
vagrant up
# New default disks (clean, no old data)
# Persistent disk retained data


Running the full test suite:

> rake test

Running: test_data_disk.ps1

Test 1: Creating VM with data disks
[PASS] VM created with data disks

Test 2: Verifying VHD files exist
[PASS] All VHD files created (2 default + 1 persistent)
  - data-disk-0.vhdx: 516 MB (default)
  - data-disk-1.vhd: 56.06 MB (default)
  - test-data-disk.vhdx: 740 MB (persistent)

Test 3: Verifying data disks are mounted
[PASS] Data disks are mounted

Test 4-6: Writing data to disks
[PASS] Data written to first disk
[PASS] Data written to second disk
[PASS] Data written to persistent disk

Test 7: Testing VHD cleanup on destroy
[PASS] Default VHD files cleaned up after destroy
[PASS] Persistent VHD survived destroy

Test 8: Testing VM recreation
[PASS] New default VHD files created on up
[PASS] New default disks are clean (no old data)
[PASS] Persistent disk retained data across destroy/up cycle

=== DataDisk Test PASSED ===

Test Summary
Passed: 6
Failed: 0

OVERALL: PASSED


The tests run the actual Vagrant commands, create real VHD files, mount them in WSL2, write data, destroy the VM, and verify persistence. No mocks. Real integration testing.

Claude: Writing these tests caught several bugs before they shipped. The disk counting issue? Found it during test development. The admin rights check? Added after the test failed on a non-admin console. Integration tests are tedious to write but worth it.

The Real Win

My colleague can now put his working directory on a persistent data disk. The VM is ephemeral. The data isn’t.

VM got corrupted? vagrant destroy && vagrant up. Code is still there.

Want to try a different distro? Switch VMs, mount the same data disk. Code is still there.

Accidentally broke the system? Doesn’t matter. The stuff that matters is on a disk that survives.

This is what I should have built first. Not snapshots, not networking, not fancy features. The ability to separate “the environment” from “the work.”

Because losing code sucks. And it shouldn’t happen just because a VM died.

What’s Next

The data disk support is solid. Next up: actually figuring out networking so VMs can talk to each other. But that can wait.

Right now, my colleague’s code is safe. That’s worth more than any feature.

Code’s at github.com/LeeShan87/vagrant-wsl2-provider. Version 0.3.0 is tagged and ready. Admin rights required for setup, but after that you’re good.

Try it. Mount your code on a persistent disk. Stop worrying about losing work.
