Hamdi (KHELIL) LION

Posted on Jan 20

Building a Kubernetes homelab the hard but right way 🧱☸️

#devops #kubernetes #containers #cloud

If you have been following my previous articles, you already know the drill 😄
I like setups that are:

boring
repeatable
close to real production

This article is not about spinning up a quick cluster.
It is about building a platform.

A platform you can:

rebuild from scratch
scale without fear
explain to a client or a teammate
reuse in real consulting missions

And yes, it runs at home 👀

The mindset first 🧠

Before touching any tool, I decided on one rule:

👉 no manual action that I cannot reproduce with code

That single rule drives everything else.

So the stack becomes very natural:

Proxmox for virtualization
Terraform to create machines
cloud-init to bootstrap the OS
Kubespray to install Kubernetes
Helmfile for everything after day 1

Each layer has one job only.
No overlap. No shortcuts.

The big picture 🗺️

Here is the full flow, starting from the moment I type terraform apply 👇

Proxmox clones VMs from a cloud-init template
Each VM boots with the right user, SSH keys and network
Nodes are reachable immediately
Kubespray turns them into a real HA Kubernetes cluster
Helmfile deploys platform components on top

Once you understand that flow, debugging becomes easy and scaling becomes boring.
And boring is good 😌
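
To make the flow concrete, here is a minimal Python sketch of the pipeline as an ordered list of commands, one per layer. The file names (cluster.tfvars, inventory/homelab/hosts.yaml, helmfile.yaml) and the run_pipeline helper are assumptions for illustration, not the actual layout of my repository, and each command would normally run from its own working directory. By default it only prints the commands.

```python
import shlex
import subprocess

# Hypothetical paths: adjust them to your own repository layout.
STEPS = [
    ("create VMs", ["terraform", "apply", "-auto-approve", "-var-file=cluster.tfvars"]),
    ("install Kubernetes", ["ansible-playbook", "-i", "inventory/homelab/hosts.yaml",
                            "--become", "cluster.yml"]),
    ("deploy platform", ["helmfile", "-f", "helmfile.yaml", "apply"]),
]


def run_pipeline(steps=STEPS, dry_run=True):
    """Run each layer in order and stop at the first failure."""
    executed = []
    for name, cmd in steps:
        line = shlex.join(cmd)
        print(f"[{name}] {line}")
        if not dry_run:
            # check=True raises CalledProcessError, so a failing layer
            # never lets the next one start.
            subprocess.run(cmd, check=True)
        executed.append(line)
    return executed


run_pipeline()  # dry run: only prints the three commands
```

The point of the sketch is the ordering: each layer starts only after the previous one succeeded, which is what makes a failure easy to locate.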

Step 1 Proxmox as a real compute layer ⚙️

I treat Proxmox like a private cloud, not like a lab UI.

No clicking. No guessing.

Terraform talks directly to the Proxmox API and creates Kubernetes nodes exactly the same way every time.

What I am doing here

cloning from a golden cloud-init template
injecting user data via snippets
assigning static IPs
tagging nodes for clarity
keeping compute concerns separate from Kubernetes

At this stage, Kubernetes does not exist yet.
And that is exactly what I want 👍
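
As an illustration of the user data part, here is a small Python sketch that renders a cloud-init document for one node. The hostname, the ops user and the SSH key are placeholders, and render_user_data is a hypothetical helper; in my setup the equivalent content is shipped as a Proxmox snippet, so take this as a sketch of the shape rather than the exact file.

```python
import json


def render_user_data(hostname, user, ssh_keys):
    """Build a cloud-init user-data document for one node."""
    config = {
        "hostname": hostname,
        "users": [
            {
                "name": user,
                "sudo": "ALL=(ALL) NOPASSWD:ALL",
                "shell": "/bin/bash",
                "ssh_authorized_keys": list(ssh_keys),
            }
        ],
    }
    # cloud-init expects YAML after the "#cloud-config" header. JSON is
    # used here because it is also valid YAML and needs no extra library.
    return "#cloud-config\n" + json.dumps(config, indent=2) + "\n"


# Placeholder values, for illustration only.
print(render_user_data("k8s-cp-1", "ops", ["ssh-ed25519 AAAA...placeholder ops@homelab"]))
```

Static IPs are not part of this document: with Proxmox they are usually set on the VM definition itself, so the sketch only covers the user and SSH access.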

Step 2 The cluster is just data 📐

This is one of my favorite parts.

The cluster topology lives in a tfvars file.
No logic. No magic. Just data.

This is extremely important because it means:

the same code can create one cluster or ten
topology changes do not require refactoring
environments stay consistent over time

Adding a node or a whole new cluster is just editing data 🔁
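
Here is a minimal sketch of the "just data" idea, written in Python rather than HCL so it can be run as is. The cluster name, node counts, IP range and the expand_nodes helper are all hypothetical; the real topology lives in my tfvars file and is not reproduced here.

```python
import ipaddress
import json

# Hypothetical topology: names, counts and addresses are illustrative only.
CLUSTERS = {
    "homelab": {
        "first_ip": "192.168.1.50",
        "control_planes": 3,
        "workers": 2,
    },
}


def expand_nodes(clusters):
    """Turn the compact topology into one flat record per VM."""
    nodes = []
    for cluster, spec in sorted(clusters.items()):
        ip = ipaddress.ip_address(spec["first_ip"])
        roles = (("cp", spec["control_planes"]), ("worker", spec["workers"]))
        for role, count in roles:
            for i in range(1, count + 1):
                nodes.append({"name": f"{cluster}-{role}-{i}", "role": role, "ip": str(ip)})
                ip += 1  # static IPs are allocated sequentially
    return nodes


print(json.dumps(expand_nodes(CLUSTERS), indent=2))
```

Changing "workers" from 2 to 3 adds one record and touches no logic. Terraform can also read variables from a JSON file (for example a *.tfvars.json file), so data shaped like this can be fed to it directly.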

Step 3 Reusable Terraform modules 🧩

Everything goes through a compute module.

Terraform is responsible for:

machines
networking
bootstrapping access

And nothing else.

Once the VMs exist, Terraform is basically done.

This clear boundary avoids a lot of confusion later.

Step 4 Kubespray does the heavy lifting ☸️

Kubespray is where Kubernetes actually comes to life.

It handles:

HA control plane
stacked etcd
container runtime
CNI and kubelet configuration
sane defaults and hardening

This is not a quick kubeadm script.
This is a production-grade installer that I fully trust.
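
To show what Kubespray needs as input, here is a hedged Python sketch that builds an Ansible inventory with the groups Kubespray uses (kube_control_plane, etcd, kube_node, k8s_cluster). The node list and the build_inventory helper are assumptions for illustration; "stacked etcd" is expressed simply by putting the control plane nodes in the etcd group.

```python
import json

# Hypothetical nodes: in practice this list would come from Terraform outputs.
NODES = [
    {"name": "homelab-cp-1", "role": "cp", "ip": "192.168.1.50"},
    {"name": "homelab-cp-2", "role": "cp", "ip": "192.168.1.51"},
    {"name": "homelab-cp-3", "role": "cp", "ip": "192.168.1.52"},
    {"name": "homelab-worker-1", "role": "worker", "ip": "192.168.1.53"},
]


def build_inventory(nodes):
    """Build an Ansible inventory with the groups Kubespray expects."""
    control_planes = [n["name"] for n in nodes if n["role"] == "cp"]
    workers = [n["name"] for n in nodes if n["role"] == "worker"]
    return {
        "all": {
            "hosts": {
                n["name"]: {"ansible_host": n["ip"], "ip": n["ip"], "access_ip": n["ip"]}
                for n in nodes
            },
            "children": {
                "kube_control_plane": {"hosts": {name: None for name in control_planes}},
                # Stacked etcd: etcd members are the control plane nodes.
                "etcd": {"hosts": {name: None for name in control_planes}},
                "kube_node": {"hosts": {name: None for name in workers}},
                "k8s_cluster": {"children": {"kube_control_plane": None, "kube_node": None}},
            },
        }
    }


# JSON is valid YAML, so this output can be saved as hosts.yaml.
print(json.dumps(build_inventory(NODES), indent=2))
```

Three control plane nodes give etcd an odd number of members, which is what an HA setup needs to keep quorum when one node goes down.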

Step 5 Bootstrap only the essentials 🚦

At cluster creation time, I only enable what is strictly required:

Metrics Server
ingress-nginx
MetalLB
Gateway API

Nothing fancy. Nothing opinionated.

The goal here is to have a usable Kubernetes cluster, not a fully loaded platform.

Everything else can wait.
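
As a rough sketch, the bootstrap set can be expressed as a handful of boolean switches passed to Kubespray as extra vars. The variable names below are the ones I remember from Kubespray's sample group_vars and should be treated as assumptions: check them against the release you actually run. The render_extra_vars helper is hypothetical.

```python
import json

# Assumed Kubespray variable names; verify them against the
# inventory/sample/group_vars of the Kubespray release you use.
BOOTSTRAP_ADDONS = {
    "metrics_server_enabled": True,
    "ingress_nginx_enabled": True,
    "metallb_enabled": True,
    "gateway_api_enabled": True,
}


def render_extra_vars(addons):
    """Serialise the addon switches so they can be passed as extra vars."""
    not_bool = [key for key, value in addons.items() if not isinstance(value, bool)]
    if not_bool:
        raise ValueError(f"addon switches must be booleans: {not_bool}")
    return json.dumps(addons, indent=2, sort_keys=True)


print(render_extra_vars(BOOTSTRAP_ADDONS))
```

Saved to a file such as addons.json, this can be handed to ansible-playbook with -e @addons.json. MetalLB usually needs more than a switch (an address pool, for example), which is deliberately left out of this sketch.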

Step 6 Day 2 is simple and explicit with Helmfile 📦

This is important.

There is no fancy GitOps setup here.

No Argo CD bootstrap.
No Flux managing Flux.
No recursive GitOps inception 😄

Instead:

Helmfile is run explicitly
changes are intentional
failures are visible
debugging stays simple

For a single cluster or a homelab, this is often the best tradeoff.
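
Here is what "run explicitly" can look like as a minimal Python sketch: always show the diff, and apply only when a human approved it. The environment name homelab, the helmfile.yaml path and both helpers are assumptions for illustration. Note that helmfile diff relies on the helm-diff plugin being installed.

```python
import subprocess


def helmfile_commands(environment, path="helmfile.yaml"):
    """Return the diff and apply commands for one environment."""
    base = ["helmfile", "-f", path, "-e", environment]
    return {"diff": base + ["diff"], "apply": base + ["apply"]}


def deploy(environment, approved=False, runner=subprocess.run):
    """Always show the diff; apply only when explicitly approved."""
    cmds = helmfile_commands(environment)
    # check=True makes a failing command raise instead of passing silently.
    runner(cmds["diff"], check=True)
    if not approved:
        return "diff only"
    runner(cmds["apply"], check=True)
    return "applied"


# Printing the commands is side-effect free; deploy() is what runs them.
for step, cmd in helmfile_commands("homelab").items():
    print(step, "->", " ".join(cmd))
```

Calling deploy("homelab") only shows what would change, and deploy("homelab", approved=True) applies it. Nothing reconciles in the background, so every change to the cluster maps to a command someone ran on purpose.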

When GitOps really makes sense 🚀

Now, things change when you have multiple clusters to manage.

At that point, this setup becomes extremely powerful.

Because:

Terraform already creates the machines
Kubespray already installs Kubernetes
Helmfile already describes the platform state

You can easily plug a GitOps layer on top to:

create a new cluster from scratch
apply a standard baseline
end up with a ready-to-use Kubernetes platform with almost zero manual action

Think of it as:
👉 one command to pop a full Kubernetes cluster, ready for workloads

This is where Argo CD or Flux start to shine, but only when the scale justifies it.

Until then, keeping things simple is often the most mature choice 🧘

Why I like this approach so much ❤️

Because it grows with you.

simple when you have one cluster
scalable when you have many
understandable at every step
close to how real platforms are built

No magic.
No hidden automation.
Just solid engineering.

If you can run this at home, you can run it anywhere 🌍

What is next 🔮

In upcoming articles I will dive into:

storage choices and CSI
DNS and external-dns
security with Kyverno
observability and metrics
multi-cluster patterns

Stay tuned 👋 and happy clustering :)