“This Should Be Quick”
Famous last words.
After implementing basic networking support and learning about WSL2’s shared network architecture, I wanted to make it work reliably across multiple distributions. The plan: add integration tests, support more distros (Debian, Fedora, AlmaLinux, Kali, openSUSE), and document the limitations properly.
What I thought would be a quick Sunday afternoon turned into a deep dive into systemd services, DNS resolution, and why background processes in SSH are surprisingly hard.
Claude: Can confirm. We went from “let’s add a test” to “why is DNS broken on Debian” to “let’s refactor everything to use systemd services” in about 4 hours.
The Multi-Distro Challenge
Ubuntu was easy - it has netplan. But what about:
- Debian - Uses systemd-networkd
- Fedora/AlmaLinux/Kali - Use NetworkManager
- openSUSE - Uses wicked
Each has its own network configuration system. My first instinct: support each one natively.
def write_netplan_config
  # Ubuntu with netplan
  netplan_config = <<~YAML
    network:
      version: 2
      ethernets:
        eth0:
          dhcp4: true
          addresses:
            - #{ip}/#{prefix}
  YAML
  @machine.communicate.sudo("netplan apply")
end

def write_networkmanager_config
  # Fedora/AlmaLinux/Kali
  @machine.communicate.sudo(
    "nmcli connection modify 'eth0' +ipv4.addresses #{ip}/#{prefix}"
  )
  @machine.communicate.sudo("nmcli connection up 'eth0'")
end

def write_systemd_networkd_config
  # Debian
  # ... and so on
end
Seemed reasonable, right?
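For a sense of what "support each one natively" implies, here is a minimal sketch of the detection step that would sit in front of those methods. The constant and method names are hypothetical, not the plugin's actual code, and the openSUSE IDs assume the usual Leap and Tumbleweed values in /etc/os-release.

```ruby
# Hypothetical lookup from the os-release ID to the network backend
# each distro uses, following the list above.
NETWORK_BACKENDS = {
  "ubuntu" => :netplan,
  "debian" => :systemd_networkd,
  "fedora" => :network_manager,
  "almalinux" => :network_manager,
  "kali" => :network_manager,
  "opensuse-leap" => :wicked,
  "opensuse-tumbleweed" => :wicked
}.freeze

# Extract the ID= value from the text of /etc/os-release.
# ID_LIKE= lines are skipped because they do not start with "ID=".
def distro_id(os_release_text)
  line = os_release_text.each_line.find { |l| l.start_with?("ID=") }
  return nil unless line

  line.split("=", 2).last.strip.delete('"').downcase
end

# Map the os-release text to a backend symbol, or :unknown.
def network_backend(os_release_text)
  NETWORK_BACKENDS.fetch(distro_id(os_release_text), :unknown)
end

puts network_backend("NAME=\"Ubuntu\"\nID=ubuntu\nID_LIKE=debian\n")
```

Every new distro means another entry here plus another writer method behind it, which is where the line count starts to grow.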
The DNS Problem
Debian was the first to break.
vm2: Warning: Failed to fetch http://deb.debian.org/debian/dists/trixie/InRelease
vm2: Temporary failure resolving 'deb.debian.org'
The static IP was configured, but DNS stopped working after the provision script ran. Why?
After some debugging:
vagrant ssh vm2 -c "ip a show eth0"
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
inet 192.168.50.11/24 scope global eth0
# Where's the WSL2 DHCP IP (172.x.x.x)?
Oh. The systemd-networkd restart command wiped out the WSL2 DHCP IP, which also provides DNS resolution and the default route. No DHCP IP = no DNS = broken apt.
Claude: This is the WSL2 version of “I deleted production.” Except you’re deleting your own network stack.
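This failure mode is easy to check for mechanically. The sketch below parses `ip addr show` style output and reports whether eth0 still carries an address other than the static ones. The method names are made up for illustration, and the 172.x address in the example is a placeholder, not a value from a real run.

```ruby
# Pull every IPv4 address out of `ip addr show` style output.
def ipv4_addresses(ip_output)
  ip_output.scan(/inet (\d{1,3}(?:\.\d{1,3}){3})\/\d+/).flatten
end

# True when the interface still has an address that is not one of our
# static IPs, i.e. the WSL2-assigned address survived provisioning.
def wsl_address_intact?(ip_output, static_ips)
  (ipv4_addresses(ip_output) - static_ips).any?
end

# The broken state from the output above: only the static IP is left.
broken = <<~OUT
  2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
      inet 192.168.50.11/24 scope global eth0
OUT

# A healthy state, with a placeholder WSL2 address next to the static one.
healthy = <<~OUT
  2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
      inet 172.20.0.2/20 scope global eth0
      inet 192.168.50.11/24 scope global eth0
OUT

puts wsl_address_intact?(broken, ["192.168.50.11"])
puts wsl_address_intact?(healthy, ["192.168.50.11"])
```

Running a check like this right after provisioning would flag the problem before apt does.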
Ubuntu Wasn’t Better
Tried Ubuntu with netplan:
==> vm1: Netplan configuration written with 1 static IP(s)
systemd-networkd is not running, output might be incomplete.
Failed to reload network settings: Unit dbus-org.freedesktop.network1.service not found.
Falling back to a hard restart of systemd-networkd.service
Same problem. netplan apply tries to restart systemd-networkd, which breaks WSL2’s network management.
NetworkManager: Different Problem
Fedora seemed promising - NetworkManager should handle multiple IPs gracefully, right?
Error: Connection activation failed: No suitable device found for this connection
(device eth0 not available because profile is not compatible with device
(permanent MAC address doesn't match)).
Ah. WSL2’s MAC address changes on every restart. NetworkManager stores the MAC in the connection profile and refuses to work when it doesn’t match.
Great.
The Systemd Service Revelation
At this point I had three different broken approaches for three different distros. Time to step back.
What do we actually need?
- Add static IP to eth0
- Keep WSL2’s DHCP IP intact (for DNS/routing)
- Persist across reboots
- Work on all distros
What if… we just don’t touch the native network config systems at all?
def write_systemd_static_ip_service(after_services, distro_name)
  # Universal method for ALL distros
  # Join with "; " so the whole command stays on a single ExecStart line
  # (a unit file value cannot span lines without a trailing backslash)
  ip_commands = @static_ips.map { |ip_info|
    "ip addr add #{ip_info[:ip]}/#{ip_info[:prefix]} dev eth0 || true"
  }.join("; ")

  service_config = <<~SERVICE
    [Unit]
    Description=Vagrant Static IP Configuration
    After=#{after_services}
    Wants=network-online.target

    [Service]
    Type=oneshot
    RemainAfterExit=yes
    ExecStart=/bin/bash -c '#{ip_commands}'

    [Install]
    WantedBy=multi-user.target
  SERVICE

  # Write service file
  @machine.communicate.sudo("mv #{service_path} /etc/systemd/system/vagrant-static-ip.service")
  @machine.communicate.sudo("systemctl daemon-reload")
  @machine.communicate.sudo("systemctl enable vagrant-static-ip.service")
  @machine.communicate.sudo("systemctl start vagrant-static-ip.service")
end
A systemd oneshot service that:
- Runs after network is up
- Adds static IPs with ip addr add
- Uses || true so it’s idempotent
- Doesn’t restart anything
- Doesn’t touch WSL2’s DHCP configuration
And here’s the beautiful part - it works on every distro because they all use systemd.
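To poke at the generated unit outside of Vagrant, the rendering step can be pulled into a pure function. This is a sketch under assumptions: `render_static_ip_unit` and its `after:` default are hypothetical names, the `IPAddr` validation is an addition, and the commands are joined with `; ` so that ExecStart stays on one line when there are several IPs.

```ruby
require "ipaddr"

# Render the oneshot unit as a string. Hypothetical helper for
# illustration, not the plugin's API.
def render_static_ip_unit(static_ips, after: "network-online.target")
  commands = static_ips.map do |info|
    # Raises IPAddr::InvalidAddressError on a malformed address.
    IPAddr.new(info[:ip])
    # `ip addr add` fails if the address already exists; `|| true`
    # swallows that so a second run is harmless.
    "ip addr add #{info[:ip]}/#{info[:prefix]} dev eth0 || true"
  end

  <<~UNIT
    [Unit]
    Description=Vagrant Static IP Configuration
    After=#{after}
    Wants=network-online.target

    [Service]
    Type=oneshot
    RemainAfterExit=yes
    ExecStart=/bin/bash -c '#{commands.join("; ")}'

    [Install]
    WantedBy=multi-user.target
  UNIT
end

puts render_static_ip_unit([
  { ip: "192.168.50.10", prefix: 24 },
  { ip: "192.168.50.11", prefix: 24 }
])
```

Keeping the rendering separate from the `sudo` calls also means the unit text can be unit-tested without a running VM.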
The Refactor
Wait, why are we even detecting distros? The whole point of the systemd service is that it’s universal. And network-online.target works on all of them.
def write_netplan_config
  # Universal solution for all distros - systemd service
  # No need to detect distro or network manager
  write_systemd_static_ip_service
end
That’s it. No detection. No branching. One implementation.
From ~160 lines of distro-specific code to ~45 lines of universal code. DRY for the win.
Testing: The SSH Background Process Bug
Integration test time. Need to test VM-to-VM communication. Python HTTP server seems perfect:
# Start HTTP server on vm1
vagrant ssh vm1 -c "python3 -m http.server 8080 --bind 192.168.50.10 > /dev/null 2>&1 &"
# Test from vm2
vagrant ssh vm2 -c "curl http://192.168.50.10:8080/"
Except… it doesn’t work. The & background process never starts.
Why? Because of how we encode commands:
def encode_command(command)
  encoded = Base64.strict_encode64(command)
  "echo '#{encoded}' | base64 -d | bash"
end
That pipe to bash is blocking. Even with & at the end of the command, the SSH session waits for bash to finish. And bash waits for the backgrounded process because… pipes.
Tried eval instead:
"eval \"$(echo '#{encoded}' | base64 -d)\""
But that breaks redirect parsing because of the double quotes.
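One thing worth ruling out is the encoding itself. The sketch below reuses the same `encode_command` shape and round-trips the server command through Base64 to confirm that the `&` and the redirects survive untouched. The extraction regex is only there for the demonstration.

```ruby
require "base64"

# Same shape as encode_command above.
def encode_command(command)
  encoded = Base64.strict_encode64(command)
  "echo '#{encoded}' | base64 -d | bash"
end

command = "python3 -m http.server 8080 --bind 192.168.50.10 > /dev/null 2>&1 &"
wrapped = encode_command(command)
puts wrapped

# Pull the payload back out of the wrapper and decode it. The Base64
# alphabet contains no single quotes, so the quoting is always safe.
payload = wrapped[/echo '([^']+)'/, 1]
puts Base64.strict_decode64(payload) == command
```

Since the decoded text is identical to the input, the trouble appears to be in how the decoded script gets executed, not in the encoding step.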
Solution: Leave it as a known bug for now, use PowerShell jobs in the test to work around it:
$serverJob = Start-Job -ScriptBlock {
    vagrant ssh vm1 -c "python3 -m http.server 8080 --bind 192.168.50.10"
}
Start-Sleep -Seconds 3
$http_result = vagrant ssh vm2 -c "curl http://192.168.50.10:8080/"
Stop-Job $serverJob
Not elegant, but it works. The SSH command encoding is a problem for another day.
Claude: Translation: “I’ll fix this later” = “This will ship as-is”
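The fixed three-second sleep in that workaround is a guess at how long the server takes to come up. A polling loop is usually sturdier. The test itself is PowerShell, so treat this Ruby version as a sketch of the idea only: `wait_until` is a hypothetical helper, and the attempt count and delay are arbitrary.

```ruby
# Retry a check until it returns a truthy value or the attempts run out.
# Returns true on success, false if every attempt failed.
def wait_until(attempts: 10, delay: 0.5)
  attempts.times do |i|
    return true if yield

    # No point sleeping after the final failed attempt.
    sleep(delay) if i < attempts - 1
  end
  false
end

# Example with a stand-in check that succeeds on the third call.
calls = 0
ready = wait_until(attempts: 5, delay: 0) { (calls += 1) >= 3 }
puts "ready=#{ready} after #{calls} checks"
```

In the real test the block would run the curl through vm2, for example `system("vagrant", "ssh", "vm2", "-c", "curl -sf http://192.168.50.10:8080/")`, so the check goes over the same VM-to-VM path the test is meant to exercise.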
The README: Managing Expectations
After all this, I wrote what is probably the most important piece of documentation: the limitations README. Because this feature works, but it has constraints:
## WSL2 Networking Limitations
⚠️ **Important:** Private network support in WSL2 is experimental.
### Shared Network Infrastructure
- All WSL2 VMs share the same virtual network switch
- Same MAC address - every VM gets the same MAC on each WSL restart
- Shared base IP - all VMs share the same WSL2 DHCP IP
- IP visibility - you may see other VMs' static IPs on a single VM
### What This Means
**VM-to-VM Communication:** ⚠️ LIMITED
- VMs share the same physical NIC and MAC address
- Ping between VMs using static IPs does not work reliably
- TCP/UDP application traffic may work if routing is configured correctly
**Windows Host Access:** ❌ LIMITED
- Use port forwarding instead
**Process Isolation:** ✅ WORKS
- Each distribution runs in its own PID namespace
Being honest about limitations is better than users discovering them the hard way.
Lessons Learned
1. Don’t Fight the Platform
My first instinct was to use each distro’s native network config system. But WSL2 isn’t a traditional VM - it has its own quirks. Fighting those quirks with netplan/NetworkManager/etc. just created more problems.
The systemd service approach works with WSL2’s design instead of against it.
2. Sometimes “Good Enough” Is the Win
Perfect VM-to-VM networking in WSL2? Not possible. The architecture doesn’t support it.
But adding static IPs that survive reboots and work across distros? That’s achievable. And for development/testing use cases, it’s useful.
3. Documentation > Implementation
The most important code I wrote today was the README explaining why things don’t work perfectly. Setting expectations upfront saves everyone frustration later.
4. Integration Tests Reveal Real Problems
Writing the test exposed the SSH background process bug, the DNS issues, and the MAC address problem. All things that wouldn’t show up in manual testing.
Even though the test needed workarounds, it was still valuable.
What’s Next
The networking feature works across 6 distributions (Ubuntu, Debian, Fedora, AlmaLinux, Kali, openSUSE). It’s documented. It has tests (mostly).
But there are TODOs:
- Fix the SSH command encoding for background processes
- Maybe explore WSL2 mirrored networking mode (Windows 11 22H2+)
- Test with more complex network scenarios
For now though, it’s good enough. Users can run multi-VM setups for testing, the limitations are clear, and the code is maintainable.
That’s a win.
Claude: Started the session thinking “quick feature add.” Ended it having refactored the entire networking implementation and written a philosophical README about WSL2’s limitations. Classic Sunday.
Try It
The multi-VM networking example is in the repo:
git clone https://github.com/LeeShan87/vagrant-wsl2-provider
cd vagrant-wsl2-provider/examples/multi-vm-network
vagrant up
Check the README for the full list of limitations and workarounds. And maybe don’t expect VirtualBox-level networking - this is WSL2, after all.
Actual time spent: 4+ hours
Lines of code written: ~300
Lines of code deleted: ~100
Times I questioned my life choices: Several
Would I do it again: Probably