CatButtes

Posted on • Updated on • Originally published at buttes.dev

Getting Vagrant and Ansible working on Windows 10

One of the big things I have started using, both at work and at home, is Ansible. It is an automated deployment tool that makes it really easy to create reproducible machine builds. That means I have needed a way to test the playbooks I use to deploy and upgrade servers. Enter Vagrant.

At home, getting this working was easy. My home machine is a MacBook and everything just works there: I can run Ansible and Vagrant natively and all is happy. Work is a different matter. Like many professional developers, I use Windows 10 at work. Where I have been using Ansible, I had set up an environment using the Windows Subsystem for Linux (WSL) (specifically WSL2 - this is important later). I figured I could reuse this setup for Vagrant, and I was partly right.

Round 1 - Hyper-V doesn't want to listen

My first attempt to get things working was met with failure. I installed Vagrant both on Windows and in the WSL environment (both installs are required for it to work on Windows). I was using the Vagrantfile below:

Vagrant.configure("2") do |config|
  config.vm.define "machine_test" do |machine|
    machine.vm.box = "bento/debian-10.6"
    machine.vm.hostname = "machine"

    machine.vm.provision "shell", inline: "mkdir -p /data"

    machine.vm.provision "ansible", run: "always" do |ansible|
      ansible.playbook = "main.yml"
    end
  end
end

When it attempted to run the ansible provisioner, it would fail to connect with the error Warning: Connection Timeout. Retrying.... It took a while, but I eventually discovered that Vagrant doesn't know how to configure networking for Hyper-V virtual machines. As a result, the host couldn't reach the VM to continue the setup.

Round 2 - VirtualBox doesn't want to listen

I then installed VirtualBox, whose networking Vagrant is able to configure. This attempt was met with the same failure. It took me a while, but I eventually realised that WSL2 is significantly different from WSL1 - which has implications for networked apps.

WSL2 runs in a lightweight VM, whereas WSL1 runs directly in the Windows 10 space. This means Microsoft had to tweak networking to make it behave roughly as you would expect. 127.0.0.1 points to the WSL2 VM, while an entry in /etc/hosts points localhost at 127.0.1.1, which is then routed through to the host machine. When Vagrant tries to connect to a VM, it uses 127.0.0.1 by default, which fails inside the WSL2 VM because the guest VM actually lives on Windows and is therefore reachable at 127.0.1.1.
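You can see this split from inside the WSL distro itself; a quick diagnostic sketch (the exact entries depend on your Windows and WSL builds):

```shell
# Inside the WSL distro: check what "localhost" actually resolves to,
# and what /etc/hosts has to say about the loopback addresses.
getent hosts localhost
grep 127 /etc/hosts
```

On my setup, localhost mapped through to the Windows side while 127.0.0.1 stayed inside the WSL2 VM.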

For some reason, pointing directly at 127.0.1.1 also failed with the same error. The upshot here seemed to be that I needed to downgrade from WSL2 to WSL1...

Round 3 - Ubuntu doesn't want to run

Once the Ubuntu WSL install had been downgraded to WSL1 (using wsl --set-version Ubuntu 1), it was time to try again. This time Vagrant refused to run at all, complaining that fuse was missing. After some fruitless messing around trying to fix that, I installed a Debian WSL1 distribution instead, and Vagrant ran.

While Vagrant was running, it was still having an issue. Attempts to run vagrant up would result in the error __connect_nonblock': Operation already in progress - connect(2) for 127.0.0.1:2222 (Errno::EALREADY). More googling gave me the answer: it is a bug in Vagrant 2.2.10. A quick downgrade to 2.2.9 and it all worked!
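If you hit the same bug, HashiCorp keeps old builds in its release archive; a sketch of the downgrade for a Debian-based WSL environment (the URL follows HashiCorp's release naming scheme, so verify it before running):

```shell
# Remove the buggy 2.2.10 build and install 2.2.9 from the release archive.
sudo apt-get remove -y vagrant
wget https://releases.hashicorp.com/vagrant/2.2.9/vagrant_2.2.9_x86_64.deb
sudo dpkg -i vagrant_2.2.9_x86_64.deb
vagrant --version   # should now report Vagrant 2.2.9
```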

Round 4 - SSH is looking out for us

Finally, Vagrant is able to set up a VM for us, and the shell provisioner works. Now we are getting errors from Ansible - this is progress and we are nearly there! The error is WARNING: UNPROTECTED PRIVATE KEY FILE! when Ansible tries to connect to the VM over SSH.
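Before digging into the cause, it is worth looking at the key's permissions from inside WSL; a small diagnostic sketch of the check OpenSSH performs (the key path follows Vagrant's usual .vagrant layout, so adjust it for your machine name and provider):

```shell
# OpenSSH rejects private keys that other users can read. On a Windows
# drive mounted into WSL1, every file reports mode 777, so this check fails.
key=".vagrant/machines/machine_test/virtualbox/private_key"
perms=$(stat -c '%a' "$key" 2>/dev/null || echo 777)
if [ "$perms" != "600" ]; then
  echo "ssh would refuse this key (mode $perms)"
fi
```

chmod typically has no lasting effect on these mounted drives, which is why the fix below falls back to password authentication.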

This error occurs because the Vagrantfile lives on a Windows drive mounted into the WSL1 area, which means every file there has 777 permissions set, including the SSH private key. SSH refuses to accept a world-readable key, instead throwing the error we are seeing. The quickest fix is to swap private keys for passwords, which means adding the password at both the VM level and the Ansible level. Our new Vagrantfile looks like this:

Vagrant.configure("2") do |config|
  config.vm.define "machine_test" do |machine|
    machine.vm.box = "bento/debian-10.6"
    machine.vm.hostname = "machine"

    machine.ssh.password = "vagrant"
    machine.ssh.insert_key = false

    machine.vm.provision "shell", inline: "mkdir -p /data"

    machine.vm.provision "ansible", run: "always" do |ansible|

      ansible.playbook = "main.yml"
      ansible.extra_vars = {
        ansible_connection: "ssh",
        ansible_user: "vagrant",
        ansible_ssh_pass: "vagrant",
      }
    end
  end
end

Running with this, it all works. We have a Vagrant installation on Windows that is able to use the Ansible provisioner to stand up and tear down VMs.

The final setup

In the end, our setup looks like this:

  • Windows 10 with WSL installed
  • Debian installed as a WSL1 environment (other distros will probably work, I just like Debian-based ones)
  • Ansible installed in the WSL1 environment
  • Vagrant 2.2.9 installed in the WSL1 environment
  • Vagrantfile configured to use passwords instead of private keys

With this setup, we are able to use the same Vagrantfile on both Windows and Mac. Although it isn't perfect (passwords in place of keys), it is good enough for my use case, where the boxes are always temporary and sit behind a network firewall.
