This started because I was lazy.
Every time I needed a new container on my Proxmox server, I'd open the web UI, click Create CT, pick a template, type a hostname, set the specs, configure networking, hit create, wait, then SSH in and install the same packages I always install. Fifteen minutes of clicking for something I do regularly.
So one evening I wrote a main.tf to automate it. That should have been the end of it. It wasn't.
The first version was embarrassing
```hcl
provider "proxmox" {
  pm_api_url = "https://192.168.2.250:8006/api2/json"
}

resource "proxmox_lxc" "test_container" {
  target_node = "proxmox"
  hostname    = "test-lxc"
  cores       = 1
  memory      = 512
}
```
API URL hardcoded in source. Specs as magic numbers. Want a second container? Copy-paste the whole block. It worked, but anyone looking at this in a code review would have things to say.
Then I kept going
I split it into proper files. Moved every value into variables with defaults. Added validation so cores = 0 or disk_size = "big" would fail at plan time instead of blowing up on the Proxmox API with some cryptic error.
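A validation block of that kind looks roughly like this — the variable names and limits here are illustrative, not the repo's exact code:

```hcl
variable "cores" {
  type    = number
  default = 1

  validation {
    condition     = var.cores >= 1
    error_message = "cores must be at least 1."
  }
}

variable "disk_size" {
  type    = string
  default = "8G"

  validation {
    condition     = can(regex("^[0-9]+[MG]$", var.disk_size))
    error_message = "disk_size must look like \"8G\" or \"512M\"."
  }
}
```

With this in place, `tofu plan` rejects `cores = 0` or `disk_size = "big"` immediately, with a message you wrote, instead of a cryptic API error at apply time.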
Then I switched to a map with for_each:
```hcl
lxc_containers = {
  testbox = {}
  docker  = { cores = 4, memory = 4096, disk_size = "64G" }
}
```
testbox = {} gives you a container with sane defaults. Override only what's different. Adding a container is one line, not a resource block.
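The resource side of that map is roughly this shape — attribute names follow the telmate/proxmox provider, and the `try()` fallbacks are a sketch of the defaulting, not the repo's exact code:

```hcl
resource "proxmox_lxc" "container" {
  for_each = var.lxc_containers

  target_node = "proxmox"
  hostname    = each.key

  # Fall back to sane defaults for any key the map entry omits.
  cores  = try(each.value.cores, 1)
  memory = try(each.value.memory, 512)

  rootfs {
    storage = "local-lvm"
    size    = try(each.value.disk_size, "8G")
  }
}
```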
The Tarantino thing
I needed hostnames. Didn't want container-1, container-2. That's boring and you can never remember which is which.
So I threw 30 Tarantino characters into a list and used random_shuffle to pick one per container, plus a random_id hex suffix to avoid collisions. Now when I tofu apply, my container might come back as beatrix-a3f1b2 or django-7c9e04. My Proxmox dashboard looks ridiculous and I love it.
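Wired up with the hashicorp/random provider, the naming looks something like this — the list is trimmed from 30 and the resource names are assumptions:

```hcl
variable "tarantino_names" {
  type    = list(string)
  default = ["beatrix", "django", "vincent", "jules", "mia"] # trimmed from 30
}

resource "random_shuffle" "name" {
  for_each     = var.lxc_containers
  input        = var.tarantino_names
  result_count = 1
}

resource "random_id" "suffix" {
  for_each    = var.lxc_containers
  byte_length = 3 # three bytes -> six hex chars, e.g. a3f1b2
}

locals {
  hostnames = {
    for key in keys(var.lxc_containers) :
    key => "${random_shuffle.name[key].result[0]}-${random_id.suffix[key].hex}"
  }
}
```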
SSH keys without managing SSH keys
Instead of copying keys around, the config pulls them from GitHub:
```hcl
data "http" "github_keys" {
  url = "https://github.com/${var.github_username}.keys"
}
```
Every apply fetches the latest keys. Add a new key to GitHub, run the pipeline, done. No ssh-copy-id, no Ansible vault for public keys.
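Wiring the fetched keys into a container is then one attribute — a sketch against the telmate/proxmox provider, where `response_body` is the http data source's output:

```hcl
resource "proxmox_lxc" "container" {
  # ...

  # GitHub's .keys endpoint returns one public key per line,
  # which is the format ssh_public_keys expects.
  ssh_public_keys = data.http.github_keys.response_body
}
```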
Where it got hard: Ansible + DHCP
After creating containers, I wanted Ansible to configure them. Install packages, set timezone, run updates. But the containers use DHCP, so at plan time there's no IP. The output just says ip = "dhcp". Not helpful.
I wrote a script that calls the Proxmox API after boot to get the actual IP:
```bash
ip=$(curl -sk "${PM_API_URL}/nodes/${node}/lxc/${vmid}/interfaces" \
  -H "Authorization: PVEAPIToken=..." \
  | jq -r '.data[] | select(.name=="eth0") | .["ip-addresses"][]
    | select(.["ip-address-type"]=="inet") | .["ip-address"]')
```
Generates an Ansible inventory on the fly. No hardcoded IPs.
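The inventory step has roughly this shape — a hypothetical sketch, not the repo's script, that turns "name ip" pairs from the API lookup into a minimal YAML inventory Ansible can consume:

```shell
#!/bin/sh
# Read "hostname ip" pairs from stdin and emit a minimal YAML inventory.
write_inventory() {
  printf 'all:\n  hosts:\n'
  while read -r name ip; do
    printf '    %s:\n      ansible_host: %s\n' "$name" "$ip"
  done
}
```

Usage would be something like `printf 'beatrix-a3f1b2 192.168.2.85\n' | write_inventory > inventory.yml`.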
Sounds clean, right? It wasn't.
Everything that went wrong
Wrong field name. The script was producing ansible_host: null for every container. Ansible tried to resolve the hostname, which obviously didn't exist in DNS, and died. I spent 15 minutes curling the Proxmox API manually before I noticed: the field is ip-address, not inet-addr. Every example I'd seen online used inet-addr. One jq filter change and it worked.
Race condition I didn't think about. The Makefile ran tofu apply then immediately generated inventory. But the container had literally just booted. DHCP hadn't assigned an IP yet. The inventory came back empty, Ansible ran against zero hosts, and the whole thing "succeeded" with nothing actually provisioned. No errors. No warnings. Just silence.
I added retry logic. The script now polls the API every 5 seconds, up to a minute, waiting for all containers to have IPs before generating inventory. Not elegant, but it works every time.
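The polling idea is simple enough to sketch — names here are assumptions, and `fetch_ip` stands in for the curl/jq call against the Proxmox API:

```shell
#!/bin/sh
# Poll for a container's IP until DHCP has handed out a lease,
# or give up after the retry budget is spent.
wait_for_ip() {
  vmid="$1"
  tries="${MAX_TRIES:-12}"            # 12 tries x 5s = one-minute budget
  while [ "$tries" -gt 0 ]; do
    ip=$(fetch_ip "$vmid")
    if [ -n "$ip" ]; then
      echo "$ip"
      return 0
    fi
    sleep "${POLL_INTERVAL:-5}"
    tries=$((tries - 1))
  done
  echo "timed out waiting for an IP on vmid $vmid" >&2
  return 1
}
```

The key property: inventory generation only runs once every container has an address, so Ansible can never again "succeed" against zero hosts.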
Lifecycle rules that created a deadlock. I wanted to protect important containers from accidental tofu destroy. OpenTofu has prevent_destroy for this, but it has to be a literal true or false in the code. You can't pass a variable. So I ended up with two resource blocks: proxmox_lxc.protected and proxmox_lxc.unprotected, and a flag in tfvars to control which pool a container landed in.
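The constraint in question, as a minimal sketch — lifecycle meta-arguments only accept literals, so this is the only form that parses:

```hcl
resource "proxmox_lxc" "protected" {
  # ...same arguments as the unprotected block...

  lifecycle {
    # Must be a literal true or false; a variable here
    # (prevent_destroy = var.protect) is rejected at parse time.
    prevent_destroy = true
  }
}
```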
This worked until I tried to flip the flag. Changing a container from protected to unprotected meant OpenTofu wanted to destroy the protected one and recreate it in the unprotected block. But prevent_destroy blocked the destroy. Complete deadlock. The only fix was tofu state mv to manually move resources between blocks.
I spent an hour on this before deciding it was absurd for a homelab. Deleted the whole thing. One resource block, no flags, no state gymnastics.
The Cloudflare Tunnel rabbit hole. I thought it'd be cool to SSH into my containers from anywhere. Set up Cloudflare Tunnel, created DNS records, installed cloudflared on a container. Everything looked right. Then: remote error: tls: handshake failure. Spent another hour debugging tunnel configs, ingress rules, Access policies.
Then I asked myself when I last needed to SSH into my homelab from outside my house. The answer was never. Deleted the entire Cloudflare integration. Over-engineering is a real disease.
CI that passed locally and failed in Actions. ansible-lint was green on my machine because I had community.general installed from an old project. The GitHub Actions runner didn't have it. Pipeline went red. Then lint complained my variables didn't have a role prefix. Then it complained about a missing newline at the end of a file. Three separate commits to fix linting issues. Should have run it locally first.
Where it ended up
```bash
make up
```
That's the whole thing. Creates containers, waits for IPs, provisions with Ansible, prints this:
```
============================================
Infrastructure up and provisioned
============================================

SSH into your containers:
  ssh root@192.168.2.85
```
make down tears it all down. make up again and you get a fresh set. Different Tarantino name, same packages, same config. Ansible is idempotent: first run changed=4, second run changed=0.
What I'd skip next time
The prevent_destroy experiment. Not worth the state complexity for a homelab.
Cloudflare Tunnel. Cool idea, but I didn't need it. Building for a problem you don't have is how features become baggage.
And I'd run the linter locally before pushing. Not after CI tells me what I already should have caught.
The repo
github.com/ayushverma8/homelab
Clone it, fill in your Proxmox credentials, make up. Let me know what Tarantino character you get.