Three Times
My colleague lost his VM three times. Not spread over a year - in just the few months we've been working together.
Each time it happened, we’d shrug it off. That’s what Vagrant is for, right? Just run vagrant up and you’re back in business. The development environment is code - it’s reproducible.
Except it wasn’t.
All three times, he had uncommitted work on that VM. Database migrations half-done. Configuration files tweaked just right but not yet committed. Local test data. You know, the stuff you’re “definitely going to commit tomorrow.”
Why Everything Lives on the VM
We work on Windows with VirtualBox VMs. But here’s the thing - we clone everything into the VM, not to shared folders. Why?
Linux filesystem compatibility. Case-sensitive paths. Symlinks that actually work. File permissions that make sense. Performance that doesn’t crawl to a halt when node_modules has 50,000 files.
So yeah, everything lives inside the VM. Which means when the VM dies, everything dies with it.
After the third time, I decided: data disk support isn’t a nice-to-have. It’s more important than networking, more important than fancy features. My colleague shouldn’t lose work because we didn’t have a way to separate persistent data from ephemeral VMs.
The Plan
I’d been building a Vagrant provider for WSL2 (because VirtualBox and WSL2 don’t play nice together, but that’s another story). Version 0.2.0 had snapshots and Docker support. Version 0.3.0 was going to be data disks.
The idea was simple: mount a VHD as a data disk, store your code there, blow away the VM whenever you want. The data persists.
Looked at how VirtualBox and VMware do it. Both support multiple data disks, VHD and VHDX formats. WSL2 has wsl --mount --vhd for mounting VHD files. Should be straightforward, right?
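For context, the mount side on its own is one WSL command. A rough sketch - the path and distro name here are placeholders, not what the provider actually generates:

# Attach a VHD/VHDX file to WSL2; --bare attaches the block device without auto-mounting it
wsl --mount "C:\vms\data-disk-0.vhdx" --vhd --bare
# Inside the distro it shows up as the next /dev/sdX device
wsl -d vagrant-wsl2 -- lsblk -nd -o NAME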
The First Problem: VHD Format
Started implementing. First disk worked - created a VHDX, mounted it, formatted it. Great.
Then I tried VHD format (for VirtualBox compatibility) and hit this:
New-VHD : A parameter cannot be found that matches parameter name 'VHDType'
Wait, what? The PowerShell docs said… oh. The format isn’t determined by a parameter. It’s determined by the file extension. .vhd vs .vhdx. The -VHDType Dynamic parameter doesn’t exist.
Fixed it. Removed the parameter, let the extension do the work. Both formats working.
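For reference, the corrected calls look something like this (a sketch - sizes and paths are arbitrary). Same cmdlet both times; the extension picks the format, and dynamic allocation comes from the -Dynamic switch rather than a -VHDType parameter:

# VHDX format, selected by the .vhdx extension
New-VHD -Path "C:\vms\data-disk-0.vhdx" -SizeBytes 10GB -Dynamic
# VHD format (VirtualBox-compatible), selected by the .vhd extension
New-VHD -Path "C:\vms\data-disk-1.vhd" -SizeBytes 5GB -Dynamic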
The Second Problem: Provisioning Ate Everything
Got multiple disks working. Example Vagrantfile had three disks - two default, one persistent. Ran vagrant up.
Checked inside the VM:
ls -la /mnt/
Four data directories. /mnt/data1, /mnt/data2, /mnt/data3, /mnt/data4.
What? I only configured three disks.
The provisioning script was mounting every /dev/sd[b-z] device it found. Turns out WSL2 has its own system disks (sda through sdd), and our data disks only start at sde. The glob was picking up one of those system disks and mounting it as a fourth data disk.
Quick fix: change the loop to only process /dev/sd[e-z] and limit it to the expected number of disks.
EXPECTED_DISKS=3
disk_count=0
for device in /dev/sd[e-z]; do
  if [ "$disk_count" -ge "$EXPECTED_DISKS" ]; then
    break
  fi
  # ... rest of the mounting logic
  disk_count=$((disk_count + 1))
done
Now we have exactly three disks. No more, no less.
The Third Problem: Admin Rights
Ran the tests. Green across the board when running as admin. Great.
Then tried vagrant up as a regular user. Boom:
Failed to mount VHD. Administrator privileges are required.
Of course. Both New-VHD (creating VHD files) and wsl --mount (mounting them) need admin rights. That’s… annoying.
But wait. After vagrant halt, if you run vagrant up again as a regular user, does it work?
Tested it. The VM started fine. No errors.
Looked closer. When the VM halts, the VHD files stay mounted at the host level (the /dev/sde, /dev/sdf devices are still there). When the VM starts again, those devices are already present. No need to re-mount. No need for admin rights.
Added a check: before trying to mount, count how many /dev/sd[e-z] devices are already in the distribution. If we have enough, skip the mounting step entirely.
def data_disk_already_mounted?(expected_disk_count)
  # List block device names (no header, no partitions) inside the distribution
  result = Vagrant::Util::Subprocess.execute(
    "wsl", "-d", @config.distribution_name, "--", "lsblk", "-nd", "-o", "NAME"
  )
  # Data disks start at sde; sda-sdd are WSL2's own system disks
  device_count = result.stdout.lines.count { |line| line.match?(/^sd[e-z]$/) }
  device_count >= expected_disk_count
end
Now the workflow is:
- First vagrant up (admin required) - creates and mounts the VHDs
- vagrant halt - VM stops, VHDs stay mounted
- Second vagrant up (no admin required) - disks already there, just start the VM
Perfect for my use case. You only need admin once to set things up.
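In command form, that's (the distro name in the lsblk check is just an assumption for illustration):

# Elevated prompt: first boot creates the VHDs and mounts them
vagrant up --provider=wsl2
vagrant halt
# Regular prompt: the devices are still attached, so no admin needed
wsl -d vagrant-wsl2 -- lsblk -nd -o NAME   # sde, sdf, ... still listed
vagrant up --provider=wsl2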
The Fourth Problem: Integration Tests
Integration tests need admin rights too. But what if someone runs the test suite without admin?
Added a check at the start of the data disk test:
$isAdmin = ([Security.Principal.WindowsPrincipal] [Security.Principal.WindowsIdentity]::GetCurrent()).IsInRole([Security.Principal.WindowsBuiltInRole]::Administrator)
if (-not $isAdmin) {
    Write-Host "=== $TestName Test SKIPPED ===" -ForegroundColor Yellow
    Write-Host "Reason: Administrator privileges required for data disk tests"
    exit 0
}
Now the test gracefully skips instead of failing with cryptic errors.
What Actually Works Now
Version 0.3.0 ships with:
Multiple data disks per VM:
wsl.data_disk do |disk|
  disk.size = 10
  disk.format = 'vhdx'
end

wsl.data_disk do |disk|
  disk.size = 5
  disk.format = 'vhd'
end

wsl.data_disk do |disk|
  disk.path = '../persistent-data.vhdx' # Custom path, survives destroy
end
Smart mounting:
- Checks if disks are already accessible before trying to mount
- Skips admin-requiring operations when possible
- Clear error messages when admin is actually needed
Persistence:
- Default disks (in the .vagrant/ directory) get cleaned up on vagrant destroy
- Custom-path disks survive destroy - your data is safe
- After a destroy/up cycle, data on the persistent disk is intact (quick host-side check below)
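A quick host-side sanity check, assuming the example Vagrantfile above (the exact filenames under .vagrant/ are up to the provider, so this just searches recursively):

vagrant destroy -f
# Default disks live under .vagrant/ and should be gone now
Get-ChildItem .vagrant -Recurse -Include *.vhd,*.vhdx
# The custom-path disk is untouched
Test-Path ..\persistent-data.vhdx    # True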
Testing:
- Integration tests verify all scenarios
- Graceful skip when admin rights aren’t available
- Tests cover VHD/VHDX formats, persistence, cleanup
PowerShell Integration Tests
Since this is Windows-only functionality, I wrote PowerShell-based integration tests. Each test creates a real Vagrant environment, performs operations, and verifies the results.
The data disk test verifies:
# Test 1: Creating VM with data disks
vagrant up --provider=wsl2
# Test 2: Verifying VHD files exist
# Checks for 2 default + 1 persistent disk
# Test 3-6: Mount verification and data writes
# Writing test data to each disk
# Test 7: VHD cleanup on destroy
vagrant destroy
# Default VHDs should be deleted
# Persistent VHD should survive
# Test 8: VM recreation
vagrant up
# New default disks (clean, no old data)
# Persistent disk retained data
Running the full test suite:
> rake test
Running: test_data_disk.ps1
Test 1: Creating VM with data disks
[PASS] VM created with data disks
Test 2: Verifying VHD files exist
[PASS] All VHD files created (2 default + 1 persistent)
- data-disk-0.vhdx: 516 MB (default)
- data-disk-1.vhd: 56.06 MB (default)
- test-data-disk.vhdx: 740 MB (persistent)
Test 3: Verifying data disks are mounted
[PASS] Data disks are mounted
Test 4-6: Writing data to disks
[PASS] Data written to first disk
[PASS] Data written to second disk
[PASS] Data written to persistent disk
Test 7: Testing VHD cleanup on destroy
[PASS] Default VHD files cleaned up after destroy
[PASS] Persistent VHD survived destroy
Test 8: Testing VM recreation
[PASS] New default VHD files created on up
[PASS] New default disks are clean (no old data)
[PASS] Persistent disk retained data across destroy/up cycle
=== DataDisk Test PASSED ===
Test Summary
Passed: 6
Failed: 0
OVERALL: PASSED
The tests run the actual Vagrant commands, create real VHD files, mount them in WSL2, write data, destroy the VM, and verify persistence. No mocks. Real integration testing.
Claude: Writing these tests caught several bugs before they shipped. The disk counting issue? Found it during test development. The admin rights check? Added after the test failed on a non-admin console. Integration tests are tedious to write but worth it.
The Real Win
My colleague can now put his working directory on a persistent data disk. The VM is ephemeral. The data isn’t.
VM got corrupted? vagrant destroy && vagrant up. Code is still there.
Want to try a different distro? Switch VMs, mount the same data disk. Code is still there.
Accidentally broke the system? Doesn’t matter. The stuff that matters is on a disk that survives.
This is what I should have built first. Not snapshots, not networking, not fancy features. The ability to separate “the environment” from “the work.”
Because losing code sucks. And it shouldn’t happen just because a VM died.
What’s Next
The data disk support is solid. Next up: actually figuring out networking so VMs can talk to each other. But that can wait.
Right now, my colleague’s code is safe. That’s worth more than any feature.
Code’s at github.com/LeeShan87/vagrant-wsl2-provider. Version 0.3.0 is tagged and ready. Admin rights required for setup, but after that you’re good.
Try it. Mount your code on a persistent disk. Stop worrying about losing work.