The Testing Journey So Far
When I started this Vagrant WSL2 Provider project, I knew unit tests would be a waste of tokens. My requirements change daily. AI pair programming with Claude means rapid iteration, constant refactoring, and features that evolve as I discover what WSL2 can and can’t do.
So I skipped unit tests entirely and went straight to integration tests. Simple PowerShell scripts that do the real thing:
vagrant up --provider=wsl2
if ($LASTEXITCODE -ne 0) { throw "failed" }
Write-Host "[PASS] vagrant up succeeded" -ForegroundColor Green
It worked. It was fast to write. And for a while, it was enough.
When the Custom Framework Started Hurting
The project grew. We shipped v0.3.0 with snapshot support and data disks. Then came networking support, and that’s where things got interesting (read: broken).
I started noticing weird behavior:
- vagrant status always showed “running” even after vagrant halt
- SSH commands with background processes didn’t work
- The & character in vagrant ssh -c "python3 -m http.server &" just… disappeared
The networking feature needed these to work. Start a web server on one VM, connect to it from another. Basic stuff. Except it didn’t work.
And I had no tests for it.
The Real Problem
I could write more custom PowerShell test scripts. That wasn’t the issue. The issue was:
- Token waste - Claude and I were spending time debugging test output formatting instead of fixing bugs
- No structure - Every test script had its own cleanup logic, error handling, and output style
- Missing coverage - We had tests for features that worked, not for bugs we needed to fix
- Manual everything - Want better output? Write more Write-Host calls
I started thinking: “What if this custom testing framework became an open-source project?”
Then I realized: Nobody would use it. Because Pester already exists.
Claude: Can confirm. We literally googled “PowerShell testing framework” and found Pester in about 10 seconds.
Why Pester, Why Now
Pester is the standard PowerShell testing framework. It’s:
- Built into Windows 10/11 (though we needed to upgrade to 5.x)
- BDD-style syntax (Describe, Context, It)
- Proper assertions (Should -Be, Should -Match)
- Automatic setup/teardown (BeforeAll, AfterAll)
- Better reporting (colored output, timing, clear pass/fail)
More importantly: I’m about to write tests that document bugs. Not tests that pass. Tests that fail, on purpose, to show exactly what’s broken.
For that, I need a real framework.
The Migration
The old way (119 lines):
try {
    Push-Location $ExampleDir
    vagrant destroy -f 2>$null
    vagrant up --provider=wsl2
    if ($LASTEXITCODE -ne 0) {
        throw "vagrant up failed with exit code $LASTEXITCODE"
    }
    Write-Host "[PASS] vagrant up succeeded" -ForegroundColor Green
    # ... more tests ...
    exit 0
} catch {
    Write-Host "=== Test FAILED ===" -ForegroundColor Red
    vagrant destroy -f 2>$null
    exit 1
} finally {
    Pop-Location
}
The new way (67 lines):
BeforeAll {
    $script:ExampleDir = Join-Path $PSScriptRoot "..\..\examples\basic"
    Push-Location $script:ExampleDir
    vagrant destroy -f 2>$null | Out-Null
}

AfterAll {
    vagrant destroy -f 2>$null | Out-Null
    Pop-Location
}

Describe "Vagrant WSL2 Provider - Basic Operations" {
    Context "When creating a new VM" {
        It "Should successfully run 'vagrant up --provider=wsl2'" {
            vagrant up --provider=wsl2
            $LASTEXITCODE | Should -Be 0
        }
    }
}
Cleaner. More structure. Automatic cleanup. And when something fails, I get this:
[-] Should show correct state after halt (not running) 6.27s (6.27s|1ms)
Expected regular expression 'running' to not match
'Current machine states:
default running (wsl2)
The WSL2 distribution is running', but it did match.
That’s way better than parsing my custom [PASS] / [FAIL] output.
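Getting that output is one command. Roughly what I run (the specs live under test/integration/, and -Output Detailed is the Pester 5 switch for per-test reporting):
# Run the whole integration suite with per-test output
Invoke-Pester -Path .\test\integration -Output Detailed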
Documenting Bugs with Tests
Here’s the important part. I didn’t just migrate existing tests. I added new ones that fail on purpose:
Context "When managing VM state" {
It "Should successfully halt the VM" {
vagrant halt
$LASTEXITCODE | Should -Be 0
}
It "Should show correct state after halt (not running)" {
Start-Sleep -Seconds 2
$status = (vagrant status) -join "`n"
# After halt, should NOT show "running"
$status | Should -Not -Match "running"
}
}
This test fails. As it should. Because vagrant halt doesn’t actually change the status output. That’s the bug.
Another one:
It "Should start a background process (Python web server)" {
$output = vagrant ssh -c "python3 -m http.server 8888 > /dev/null 2>&1 &" 2>&1
$LASTEXITCODE | Should -Be 0
Start-Sleep -Seconds 2
$psOutput = vagrant ssh -c "ps aux | grep 'http.server' | grep -v grep" 2>&1
$psOutput | Should -Match "http.server"
}
Also fails. The background process doesn’t start. SSH command escaping is broken.
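I haven’t isolated exactly where the character gets eaten yet. A quick manual check from the Windows side (a debugging step I reach for, not part of the suite) is to make the guest echo the command string back:
# If the echoed string comes back without the trailing '&', the character is being
# consumed somewhere on the Windows side before it ever reaches bash in the guest.
vagrant ssh -c "echo 'python3 -m http.server 8888 &'"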
Test Results
After the migration:
Tests Passed: 11, Failed: 3
- 11 passing tests - The features that work
- 3 failing tests - The bugs I need to fix
That’s the state of the project right now. Green for confidence, red for work to do.
Why This Matters
I’m not writing tests for test coverage. I’m writing tests because I don’t want to pull my hair out debugging regressions.
When someone opens a PR, I want them to run rake test_pester and see what they broke (or fixed). When I refactor the SSH action, I want to know immediately if I broke ansible provisioner support.
Unit tests wouldn’t catch that. Integration tests do.
When This Approach Works
This testing strategy isn’t universal. It works specifically because:
The context is right:
- Hobbyist/MVP project with changing requirements
- AI-assisted development (not wasting tokens on test boilerplate)
- Open source (contributors need to understand what’s safe to change)
The motivation is right:
- Not “we need tests because best practices say so”
- But “I found bugs, I need to document them, and I don’t want to break working features while fixing them”
The migration is pragmatic:
- Not “throw away all tests and rewrite”
- But “keep legacy tests running, migrate gradually, remove old ones when confident”
This is the opposite of cargo cult programming. No testing pyramids, no TDD dogma, no “you must have 80% coverage” rules. Just: what does this project actually need right now?
And right now, it needs:
- Confidence that basic features work (11 green tests)
- Documentation of known bugs (3 red tests)
- A framework that won’t waste our time when we add more tests
Pester gives us that. A custom framework wouldn’t scale. Unit tests would miss the real issues.
The Setup
Getting Pester working was surprisingly smooth:
# Rakefile
desc "Ensure Pester 5.x is installed (>= 5.0, < 6.0)"
task :ensure_pester do
  # Each heredoc line is a complete statement, because gsub joins them with "; "
  pester_check = <<~POWERSHELL
    $pester = Get-Module -ListAvailable -Name Pester | Where-Object { $_.Version -ge '5.0' -and $_.Version -lt '6.0' } | Select-Object -First 1
    if (-not $pester) { Install-PackageProvider -Name NuGet -MinimumVersion 2.8.5.201 -Force; Install-Module -Name Pester -RequiredVersion 5.7.1 -Force -Scope CurrentUser }
  POWERSHELL
  sh "powershell -Command \"#{pester_check.gsub("\n", "; ")}\""
end
Now rake test_pester automatically checks for Pester 5.x and installs it if needed. Version locked to avoid the 6.x alpha. One less thing to think about.
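If you want to see what that version check will find on your machine, a manual sanity check (not part of the Rake task) is:
# List every Pester version visible to PowerShell; Windows ships 3.4.0 system-wide,
# and the task installs 5.7.1 into the current user's module path alongside it.
Get-Module -ListAvailable -Name Pester | Select-Object Name, Version, ModuleBase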
What’s Next
Now comes the fun part: fixing the bugs.
The vagrant halt status issue is probably in the state tracking logic. The SSH background process issue is definitely in how we escape shell commands. Both have tests that document exactly what should happen.
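Before touching the provider code, a quick way to see which layer is lying about the state (again, a manual check rather than a test):
# Ask WSL directly which distributions are running; if the distro is absent here
# but `vagrant status` still says "running", the provider's state lookup is the bug.
wsl.exe --list --running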
This is reverse TDD. Write the tests after you find the bugs, then fix the code until the tests pass.
Not ideal, but way better than:
- Fixing a bug
- Manually testing it
- Breaking it again three commits later
- Discovering it broke when trying to demo the feature
Been there. Done that. Got the T-shirt.
Try It Yourself
The Vagrant WSL2 Provider is on GitHub. The Pester tests are in test/integration/. If you want to see a real-world example of integration testing a Vagrant provider, check it out:
git clone https://github.com/LeeShan87/vagrant-wsl2-provider.git
cd vagrant-wsl2-provider
rake ensure_pester
rake test_pester
You’ll see the same 11 passing, 3 failing tests I see. And when I fix those bugs, you’ll see 14 passing tests.
That’s the plan, anyway.
Claude: Bold of you to assume the fixes won’t reveal more bugs.
Fair point.