Zoltan Toma

Posted on Nov 2 • Originally published at zoltantoma.com on Oct 5

Snapshots and Testing: Building Real Tests for a Vagrant Provider

#vagrant #wsl2 #testing #snapshots

The Testing Problem Nobody Talks About

After getting Docker support working, I wanted to add snapshot functionality to the vagrant-wsl2-provider. But there was a nagging problem I’d been avoiding: how do you actually test a Vagrant provider plugin?

The “proper” way would be to write Ruby unit tests with RSpec, mock all of Vagrant’s internals, and test each component in isolation. But here’s the thing - Vagrant is a massive gem with heavy dependencies. Just getting the test environment set up requires pulling in the entire Vagrant gem, which then requires native extensions, specific Ruby versions, and a whole dependency chain that’s… painful.

I tried. I really did. Created a spec/ directory, added RSpec, started writing tests. Got errors about missing Vagrant classes. Added Vagrant as a dev dependency. Got compilation errors for native extensions. Spent an hour on dependency hell before I stopped and asked myself: what am I actually trying to test here?

Integration Tests: The Pragmatic Choice

I don’t care if my Driver class can be instantiated in isolation. I care if vagrant snapshot save actually works when a user runs it. I care if vagrant ssh -c "command" returns output. I care if vagrant up creates a working WSL2 distribution.

So I made a decision that probably horrifies some people: PowerShell-based integration tests.

# test/integration/test_basic.ps1
vagrant up --provider=wsl2
if ($LASTEXITCODE -eq 0) {
    Write-Host "[PASS] vagrant up succeeded" -ForegroundColor Green
} else {
    throw "vagrant up failed"
}

Dead simple. No mocks. No stubs. Just run the actual command and check if it works.

Why This Makes Sense

Tests real behavior - What users actually experience
No mocking infrastructure - No dependency on Vagrant’s internal APIs
Fast to write - PowerShell is native to Windows where WSL2 runs
Easy to debug - When a test fails, you can literally copy-paste the command
No Ruby dependency hell - Just PowerShell and Vagrant installed

The test suite structure is straightforward:

test/integration/
├── test_basic.ps1 # vagrant up, status, ssh, destroy
├── test_snapshot.ps1 # snapshot lifecycle
└── run_all_tests.ps1 # runs everything

Run it with rake test. That’s it.
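
The Rakefile itself isn't shown here, so the following is only a minimal sketch of how those tasks could be wired up. It assumes powershell is on the PATH and reuses the script names from the tree above. The powershell_argv helper and the TASKS table are hypothetical names made up for illustration.

```ruby
require "rake"

# Directory taken from the tree above; everything else here is an assumption.
INTEGRATION_DIR = File.join("test", "integration")

# Hypothetical helper: argv for running one integration script in PowerShell.
def powershell_argv(script)
  ["powershell", "-NoProfile", "-ExecutionPolicy", "Bypass",
   "-File", File.join(INTEGRATION_DIR, script)]
end

# Rake task name => script it runs.
TASKS = {
  test: "run_all_tests.ps1",
  test_basic: "test_basic.ps1",
  test_snapshot: "test_snapshot.ps1"
}.freeze

TASKS.each do |name, script|
  desc "Run #{script}"
  task name do
    # sh raises when the script exits non-zero, which fails the rake task.
    sh(*powershell_argv(script))
  end
end
```

Because the scripts throw on failure, a failing test surfaces as a non-zero exit code, and rake reports it without any extra glue.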

The Snapshot Implementation

WSL2 already has everything we need for snapshots: wsl --export and wsl --import. The challenge was wrapping this in Vagrant’s provider capability system.

Driver Methods

The Driver class handles the actual WSL2 operations:

def save_snapshot(snapshot_name)
  snapshot_file = snapshot_path(snapshot_name)

  @machine.ui.info "Saving snapshot: #{snapshot_name}"
  execute("wsl", "--export", @config.distribution_name, snapshot_file)
  @machine.ui.success "Snapshot saved: #{snapshot_name}"
end

def restore_snapshot(snapshot_name)
  snapshot_file = snapshot_path(snapshot_name)

  # Unregister current distribution
  halt if state == :running
  execute("wsl", "--unregister", @config.distribution_name)

  # Import the snapshot
  dist_dir = distribution_path
  execute("wsl", "--import", @config.distribution_name,
          dist_dir, snapshot_file, "--version", @config.version.to_s)
end

Snapshots are just tar files stored in .vagrant/machines/{name}/wsl2/snapshots/. Nothing fancy, which is exactly what we want.
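
Since a snapshot is just a file, listing and deleting come down to directory operations. Here is a minimal sketch of that idea, not the plugin's actual code: SnapshotStore and its method names are hypothetical, and the .tar extension is an assumption. The fetch! check matters because restore unregisters the distribution before importing, so a missing file should fail before anything destructive happens.

```ruby
require "fileutils"
require "pathname"

# Hypothetical helper: maps snapshot names to tar files under
# <data_dir>/snapshots. Not the plugin's real implementation.
class SnapshotStore
  def initialize(data_dir)
    @dir = Pathname.new(data_dir).join("snapshots")
    FileUtils.mkdir_p(@dir)
  end

  # Path a snapshot would be exported to or imported from.
  def path_for(name)
    unless name.match?(/\A[\w.-]+\z/)
      raise ArgumentError, "invalid snapshot name: #{name.inspect}"
    end
    @dir.join("#{name}.tar")
  end

  # Names of all snapshots on disk, sorted alphabetically.
  def list
    @dir.glob("*.tar").map { |file| file.basename(".tar").to_s }.sort
  end

  # Path of an existing snapshot; raises if the file is missing.
  def fetch!(name)
    path = path_for(name)
    raise ArgumentError, "no such snapshot: #{name}" unless path.file?
    path
  end

  # Remove a snapshot file from disk.
  def delete(name)
    fetch!(name).delete
  end
end
```

A restore built on this would call fetch! first and only then run wsl --unregister, so a typo in the snapshot name can't cost you the distribution.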

Provider Capabilities

Vagrant expects providers to register capabilities for snapshot operations:

# lib/vagrant-wsl2-provider/plugin.rb
provider_capability "wsl2", "snapshot_list" do
  require_relative "cap/snapshot_list"
  Cap::SnapshotList
end

provider_capability "wsl2", "snapshot_save" do
  require_relative "cap/snapshot_save"
  Cap::SnapshotSave
end

Each capability is just a thin wrapper that calls the driver:

# lib/vagrant-wsl2-provider/cap/snapshot_save.rb
module VagrantPlugins
  module WSL2
    module Cap
      class SnapshotSave
        def self.snapshot_save(machine, snapshot_name)
          driver = machine.provider.instance_variable_get(:@driver)
          driver.save_snapshot(snapshot_name)
        end
      end
    end
  end
end

This gives us full Vagrant snapshot support:

vagrant snapshot save before-experiment
vagrant snapshot restore before-experiment
vagrant snapshot list
vagrant snapshot delete old-snapshot

# Bonus: push/pop work automatically!
vagrant snapshot push
vagrant snapshot pop
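
Why push/pop come for free is easier to see in a toy model: they are a stack layered on save, restore, delete and list. The sketch below is a simplified illustration, not Vagrant's source. FakeProvider is a hypothetical in-memory stand-in, and the push_ name format is an assumption.

```ruby
# Hypothetical in-memory stand-in for a provider's snapshot primitives.
class FakeProvider
  attr_accessor :state

  def initialize
    @snapshots = {} # name => saved state; Hash keeps insertion order
    @state = "fresh"
  end

  def snapshot_save(name)
    @snapshots[name] = @state
  end

  def snapshot_restore(name)
    @state = @snapshots.fetch(name)
  end

  def snapshot_delete(name)
    @snapshots.delete(name)
  end

  def snapshot_list
    @snapshots.keys
  end
end

# push: save under a generated name, so the user never has to pick one.
def snapshot_push(provider, sequence)
  name = "push_#{sequence}"
  provider.snapshot_save(name)
  name
end

# pop: restore the most recently pushed snapshot, then delete it.
def snapshot_pop(provider)
  name = provider.snapshot_list.select { |n| n.start_with?("push_") }.last
  raise "no pushed snapshot to pop" unless name
  provider.snapshot_restore(name)
  provider.snapshot_delete(name)
  name
end

vm = FakeProvider.new
snapshot_push(vm, 1)
vm.state = "broken by experiment"
snapshot_pop(vm)
puts vm.state # prints "fresh"
```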

The `vagrant ssh -c` Rabbit Hole

While testing snapshots, I noticed vagrant ssh -c "echo test" would execute successfully (exit code 0) but show no output. This was… not great.

The problem was that Vagrant’s built-in SSHRun action wasn’t compatible with the WSL2 communicator. It expected SSH-style communication, but we use direct WSL command execution.

Custom SSHRun Action

The solution was a custom action that properly streams output:

# lib/vagrant-wsl2-provider/action/ssh_run.rb
def call(env)
  command = env[:ssh_run_command]

  if command
    exit_status = env[:machine].communicate.execute(command, error_check: false) do |type, data|
      case type
      when :stdout
        $stdout.print(data)
        $stdout.flush
      when :stderr
        $stderr.print(data)
        $stderr.flush
      end
    end

    env[:ssh_run_exit_status] = exit_status
  end

  @app.call(env)
end

But this required updating the communicator to accept a block parameter:

# lib/vagrant-wsl2-provider/communicator.rb
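def execute(command, opts = {}, &block)
  # encoded_command and distribution_name come from elsewhere in the
  # communicator (omitted from this excerpt)
  result = Vagrant::Util::Subprocess.execute(
    "wsl", "-d", distribution_name, "-u", "vagrant", "--",
    "bash", "-l", "-c", encoded_command,
    :notify => [:stdin, :stdout, :stderr]
  ) do |type, data|
    if block
      # Stream each chunk to the caller
      block.call(type, data)
    elsif type == :stdout
      # Default output handling; print avoids adding a newline per chunk
      print data
    end
  end

  # Return the exit code so SSHRun can store it as the exit status
  result.exit_code
end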

Now vagrant ssh -c works as expected:

$ vagrant ssh -c "uname -a"
Linux vagrant-wsl2-basic 5.15.133.1-microsoft-standard-WSL2 ...

$ vagrant ssh -c "docker ps"
CONTAINER ID IMAGE COMMAND ...

Perfect for scripting!
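
The block-based streaming pattern is easy to try outside Vagrant. Below is a minimal standalone sketch using Ruby's Open3 instead of Vagrant's Subprocess class. stream_command is a hypothetical name, and the child process is just another Ruby interpreter, so it runs anywhere.

```ruby
require "open3"
require "rbconfig"

# Run a command, pass each chunk of output to the block as
# (:stdout | :stderr, data), and return the integer exit status.
def stream_command(*argv, &block)
  Open3.popen3(*argv) do |stdin, stdout, stderr, wait_thr|
    stdin.close
    lock = Mutex.new
    # One reader thread per stream, so neither pipe can fill up and stall.
    pumps = { stdout: stdout, stderr: stderr }.map do |type, io|
      Thread.new do
        begin
          loop do
            data = io.readpartial(4096)
            lock.synchronize { block.call(type, data) } if block
          end
        rescue EOFError
          # Stream closed; nothing more to read.
        end
      end
    end
    pumps.each(&:join)
    wait_thr.value.exitstatus
  end
end

script = '$stdout.print "out\n"; $stderr.print "err\n"; exit 3'
status = stream_command(RbConfig.ruby, "-e", script) do |type, data|
  (type == :stdout ? $stdout : $stderr).print(data)
end
puts "exit status: #{status}"
```

The caller decides what to do with each chunk, and the method still returns the exit status. That is the same split of responsibilities as between SSHRun and the communicator above.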

WSL2 Output Encoding Hell

One weird issue that bit me during testing: wsl.exe writes its own output (wsl -l -v, for example) as UTF-16LE, so when PowerShell captures it, the text comes back with a null byte (\0) after every ASCII character. This breaks PowerShell’s string matching.

# This doesn't work: the captured text has \0 between the characters
$wslList = wsl -l -v
if ($wslList -match "vagrant-wsl2-basic") {
    Write-Host "distribution found"   # never reached
}

The fix is to strip null bytes:

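# This works: join the lines into one string, then strip the null bytes
$wslList = (wsl -l -v | Out-String) -replace '\0', ''
if ($wslList -match "vagrant-wsl2-basic") {
    Write-Host "distribution found"   # reached as expected
}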

This is now documented in the test template so future tests don’t hit this.
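
The same quirk affects any Ruby code that parses wsl.exe output. Assuming the null bytes really do come from UTF-16LE text, decoding is more precise than stripping, because non-ASCII names survive too. This is only a sketch: decode_wsl_output is a hypothetical helper, and the sample bytes are built in-process rather than captured from a real wsl run.

```ruby
# Hypothetical helper: turn raw bytes from wsl.exe into a UTF-8 string.
# Assumes the bytes are UTF-16LE, optionally starting with a BOM.
def decode_wsl_output(raw)
  text = raw.dup.force_encoding(Encoding::UTF_16LE)
  text = text.encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
  text.delete_prefix("\uFEFF").gsub("\r\n", "\n")
end

# Simulated output (made-up sample), encoded the way wsl.exe is assumed to emit it.
listing = "  NAME                  STATE\r\n* vagrant-wsl2-basic    Running\r\n"
raw = listing.encode(Encoding::UTF_16LE).b

puts raw.include?("vagrant-wsl2-basic")                    # false: \0 between letters
puts decode_wsl_output(raw).include?("vagrant-wsl2-basic") # true
```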

Test Infrastructure as Documentation

The test/ directory felt wrong. These weren’t just tests - they were working examples. So I renamed it to examples/:

examples/
├── basic/ # Minimal Vagrantfile
├── snapshot/ # Snapshot demo
├── provisioners/ # Shell/file/ansible examples
├── docker-test/ # Docker with systemd
└── test-distros/ # Various Linux distributions

Each directory has a working Vagrantfile that serves three purposes:

User documentation - “Here’s how to use feature X”
Integration test fixture - Tests use these examples
Manual testing - Quick vagrant up to try something

The integration tests just reference these examples:

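$ExampleDir = Join-Path $PSScriptRoot "..\..\examples\snapshot"
Push-Location $ExampleDir
try {
    vagrant up --provider=wsl2
    if ($LASTEXITCODE -ne 0) { throw "vagrant up failed" }
    vagrant snapshot save test-snapshot
    if ($LASTEXITCODE -ne 0) { throw "snapshot save failed" }
    vagrant snapshot restore test-snapshot
    if ($LASTEXITCODE -ne 0) { throw "snapshot restore failed" }
} finally {
    Pop-Location   # always return to the original directory
}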

No duplication. The docs and tests stay in sync automatically.

Test Results

The test suite validates:

Basic lifecycle: up, status, ssh-config, destroy
WSL distribution creation and cleanup
SSH command execution with output
Snapshot save/restore/list/delete
Snapshot push/pop (auto-generated names)

Current status:

========================================
Test Summary
========================================
Passed: 2
Failed: 0

OVERALL: PASSED

Not a huge test suite, but it covers the core workflows users actually care about.

Lessons Learned

1. Integration tests > Unit tests for infrastructure tools

When you’re wrapping external commands (WSL, Docker, systemd), integration tests give you more confidence. Mock-heavy unit tests just test your mocks.

2. Use the platform’s native tools

PowerShell on Windows is fine. Don’t fight it by trying to force Ruby/RSpec everywhere.

3. Examples should be runnable

If your documentation includes code, make it actual working code that’s tested. Docs that lie are worse than no docs.

4. Block parameters are powerful

Ruby blocks for streaming callbacks are elegant. Don’t be afraid to use them.

5. Exit codes matter

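Always check $LASTEXITCODE in PowerShell: by default, a failing native command like vagrant does not stop the script. And exit code 0 is not the whole story either. vagrant ssh -c “succeeded” while printing nothing, so assert on the output too.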

What’s Next

With snapshots and testing infrastructure in place, v0.2.0 is almost ready. The last major feature is WSL mount support for VHD/VHDX data disks, but that’s for the next iteration.

For now, I’m happy that:

Snapshots work reliably
vagrant ssh -c returns output
Tests actually test real behavior
The code is properly documented

Not bad for a plugin nobody asked for that I’m building because I’m stuck on Windows. 😄

Try It Out

The snapshot support is now merged into main:

git clone https://github.com/LeeShan87/vagrant-wsl2-provider.git
cd vagrant-wsl2-provider
rake install_local

cd examples/snapshot
vagrant up --provider=wsl2
vagrant snapshot save clean
# ... experiment ...
vagrant snapshot restore clean

Run the tests:

rake test # All tests
rake test_basic # Just basic functionality
rake test_snapshot # Just snapshot tests

Questions? Open an issue on GitHub. I’m always curious how people are (or aren’t) using this thing.
