Zoltan Toma

Originally published at zoltantoma.com

Why I Chose Pester Over Custom Test Scripts (And Why Now)

The Testing Journey So Far

When I started this Vagrant WSL2 Provider project, I knew unit tests would be a waste of tokens. My requirements change daily. AI pair programming with Claude means rapid iteration, constant refactoring, and features that evolve as I discover what WSL2 can and can’t do.

So I skipped unit tests entirely and went straight to integration tests. Simple PowerShell scripts that do the real thing:

vagrant up --provider=wsl2
if ($LASTEXITCODE -ne 0) { throw "failed" }
Write-Host "[PASS] vagrant up succeeded" -ForegroundColor Green

It worked. It was fast to write. And for a while, it was enough.

When the Custom Framework Started Hurting

The project grew. We shipped v0.3.0 with snapshot support and data disks. Then came networking support, and that’s where things got interesting (read: broken).

I started noticing weird behavior:

  • vagrant status always showed “running” even after vagrant halt
  • SSH commands with background processes didn’t work
  • The & character in vagrant ssh -c "python3 -m http.server &" just… disappeared

The networking feature needed these to work. Start a web server on one VM, connect to it from another. Basic stuff. Except it didn’t work.
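
Concretely, the workflow looks something like this — a sketch with hypothetical machine names ("server" and "client") and a placeholder address, not the project's actual example:

# The workflow the networking feature needs (hypothetical multi-machine names;
# <server-ip> is a placeholder, not a real address):
vagrant ssh server -c "python3 -m http.server 8080 > /dev/null 2>&1 &"
vagrant ssh client -c "curl -s http://<server-ip>:8080/"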

And I had no tests for it.

The Real Problem

I could write more custom PowerShell test scripts. That wasn’t the issue. The issue was:

  1. Token waste - Claude and I were spending time debugging test output formatting instead of fixing bugs
  2. No structure - Every test script had its own cleanup logic, error handling, and output style
  3. Missing coverage - We had tests for features that worked, not for bugs we needed to fix
  4. Manual everything - Want better output? Write more Write-Host calls

I started thinking: “What if this custom testing framework became an open-source project?”

Then I realized: Nobody would use it. Because Pester already exists.

Claude: Can confirm. We literally googled “PowerShell testing framework” and found Pester in about 10 seconds.

Why Pester, Why Now

Pester is the standard PowerShell testing framework. It’s:

  • Built into Windows 10/11 (though we needed to upgrade to 5.x)
  • BDD-style syntax (Describe, Context, It)
  • Proper assertions (Should -Be, Should -Match)
  • Automatic setup/teardown (BeforeAll, AfterAll)
  • Better reporting (colored output, timing, clear pass/fail)
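
For a feel of how that syntax reads, here is a tiny generic sampler (illustration only, not from this project's test suite):

Describe "Pester syntax sampler" {
    Context "Assertions" {
        It "compares values" {
            1 + 1 | Should -Be 2
        }
        It "matches text" {
            "The WSL2 distribution is running" | Should -Match "running"
        }
        It "expects errors" {
            { throw "boom" } | Should -Throw
        }
    }
}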

More importantly: I’m about to write tests that document bugs. Not tests that pass. Tests that fail, on purpose, to show exactly what’s broken.

For that, I need a real framework.

The Migration

The old way (119 lines):

try {
    Push-Location $ExampleDir
    vagrant destroy -f 2>$null

    vagrant up --provider=wsl2
    if ($LASTEXITCODE -ne 0) {
        throw "vagrant up failed with exit code $LASTEXITCODE"
    }
    Write-Host "[PASS] vagrant up succeeded" -ForegroundColor Green

    # ... more tests ...

    exit 0
} catch {
    Write-Host "=== Test FAILED ===" -ForegroundColor Red
    vagrant destroy -f 2>$null
    exit 1
} finally {
    Pop-Location
}

The new way (67 lines):

BeforeAll {
    $script:ExampleDir = Join-Path $PSScriptRoot "..\..\examples\basic"
    Push-Location $script:ExampleDir
    vagrant destroy -f 2>$null | Out-Null
}

AfterAll {
    vagrant destroy -f 2>$null | Out-Null
    Pop-Location
}

Describe "Vagrant WSL2 Provider - Basic Operations" {
    Context "When creating a new VM" {
        It "Should successfully run 'vagrant up --provider=wsl2'" {
            vagrant up --provider=wsl2
            $LASTEXITCODE | Should -Be 0
        }
    }
}

Cleaner. More structure. Automatic cleanup. And when something fails, I get this:

[-] Should show correct state after halt (not running) 6.27s (6.27s|1ms)
    Expected regular expression 'running' to not match
    'Current machine states:

    default running (wsl2)

    The WSL2 distribution is running', but it did match.

That’s way better than parsing my custom [PASS] / [FAIL] output.

Documenting Bugs with Tests

Here’s the important part. I didn’t just migrate existing tests. I added new ones that fail on purpose:

Context "When managing VM state" {
    It "Should successfully halt the VM" {
        vagrant halt
        $LASTEXITCODE | Should -Be 0
    }

    It "Should show correct state after halt (not running)" {
        Start-Sleep -Seconds 2
        $status = (vagrant status) -join "`n"

        # After halt, should NOT show "running"
        $status | Should -Not -Match "running"
    }
}

This test fails. As it should. Because vagrant halt doesn’t actually change the status output. That’s the bug.

Another one:

It "Should start a background process (Python web server)" {
    $output = vagrant ssh -c "python3 -m http.server 8888 > /dev/null 2>&1 &" 2>&1
    $LASTEXITCODE | Should -Be 0

    Start-Sleep -Seconds 2

    $psOutput = vagrant ssh -c "ps aux | grep 'http.server' | grep -v grep" 2>&1
    $psOutput | Should -Match "http.server"
}

Also fails. The background process doesn’t start. SSH command escaping is broken.

Test Results

After the migration:

Tests Passed: 11, Failed: 3

  • 11 passing tests - The features that work
  • 3 failing tests - The bugs I need to fix

That’s the state of the project right now. Green for confidence, red for work to do.

Why This Matters

I’m not writing tests for test coverage. I’m writing tests because I don’t want to pull my hair out debugging regressions.

When someone opens a PR, I want them to run rake test_pester and see what they broke (or fixed). When I refactor the SSH action, I want to know immediately if I broke Ansible provisioner support.

Unit tests wouldn’t catch that. Integration tests do.

When This Approach Works

This testing strategy isn’t universal. It works specifically because:

The context is right:

  • Hobbyist/MVP project with changing requirements
  • AI-assisted development (not wasting tokens on test boilerplate)
  • Open source (contributors need to understand what’s safe to change)

The motivation is right:

  • Not “we need tests because best practices say so”
  • But “I found bugs, I need to document them, and I don’t want to break working features while fixing them”

The migration is pragmatic:

  • Not “throw away all tests and rewrite”
  • But “keep legacy tests running, migrate gradually, remove old ones when confident”

This is the opposite of cargo cult programming. No testing pyramids, no TDD dogma, no “you must have 80% coverage” rules. Just: what does this project actually need right now?

And right now, it needs:

  1. Confidence that basic features work (11 green tests)
  2. Documentation of known bugs (3 red tests)
  3. A framework that won’t waste our time when we add more tests

Pester gives us that. A custom framework wouldn’t scale. Unit tests would miss the real issues.

The Setup

Getting Pester working was surprisingly smooth:

# Rakefile
desc "Ensure Pester 5.x is installed (>= 5.0, < 6.0)"
task :ensure_pester do
  # Newlines are collapsed into "; " before the call below, so every heredoc
  # line has to be a complete PowerShell statement (no trailing pipes).
  pester_check = <<~POWERSHELL
    $pester = Get-Module -ListAvailable -Name Pester | Where-Object { $_.Version -ge '5.0' -and $_.Version -lt '6.0' } | Select-Object -First 1
    if (-not $pester) {
      Install-PackageProvider -Name NuGet -MinimumVersion 2.8.5.201 -Force
      Install-Module -Name Pester -RequiredVersion 5.7.1 -Force -Scope CurrentUser
    }
  POWERSHELL
  sh "powershell -Command \"#{pester_check.gsub("\n", "; ")}\""
end

Now rake test_pester automatically checks for Pester 5.x and installs it if needed. Version locked to avoid the 6.x alpha. One less thing to think about.
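
The test_pester task body isn't shown above, but it boils down to an Invoke-Pester call against the integration suite — roughly something like this (a sketch; paths assumed from the repo layout):

# Roughly what rake test_pester ends up running (sketch, not the exact task body)
Invoke-Pester -Path ./test/integration -Output Detailed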

What’s Next

Now comes the fun part: fixing the bugs.

The vagrant halt status issue is probably in the state tracking logic. The SSH background process issue is definitely in how we escape shell commands. Both have tests that document exactly what should happen.
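
While iterating on a fix, Pester's name filter makes it easy to re-run only the red tests — for example (assuming Pester 5's simple interface; the wildcard pattern here is just for illustration):

# Re-run just the halt/status test while working on the state-tracking fix
Invoke-Pester -Path ./test/integration -FullNameFilter "*after halt*"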

This is reverse TDD. Write the tests after you find the bugs, then fix the code until the tests pass.

Not ideal, but way better than:

  1. Fixing a bug
  2. Manually testing it
  3. Breaking it again three commits later
  4. Discovering it broke when trying to demo the feature

Been there. Done that. Got the T-shirt.

Try It Yourself

The Vagrant WSL2 Provider is on GitHub. The Pester tests are in test/integration/. If you want to see a real-world example of integration testing a Vagrant provider, check it out:

git clone https://github.com/LeeShan87/vagrant-wsl2-provider.git
cd vagrant-wsl2-provider
rake ensure_pester
rake test_pester

You’ll see the same 11 passing, 3 failing tests I see. And when I fix those bugs, you’ll see 14 passing tests.

That’s the plan, anyway.

Claude: Bold of you to assume the fixes won’t reveal more bugs.

Fair point.
