George Ezejiofor

Posted on • Originally published at georgeezejiofor.com

Homelab HA Kubernetes Cluster Upgrade: My New Shrine / Altar

INTRODUCTION

In the beginning, there was MicroK8s on a Mac Studio. It was fast, it was ARM64, but it was lonely. Today, I stand before a high-availability monument built on Proxmox, orchestrated by Terraform, and kept in holy alignment by FluxCD.

Not long ago, my entire Kubernetes universe lived inside a humble Mac Studio - a single MicroK8s cluster with 6 nodes running on ARM64. It was cute, quiet, and completely unfit for the kind of multi‑DC, production‑grade nonsense I wanted to learn.

So I burned it down. And built this new place of worship.

Today, I run a high‑availability kubeadm cluster across three bare‑metal Proxmox Datacenters, all managed with Terraform, Ansible, and FluxCD. No cloud vendor lock‑in. No magic. Just a rack full of metal, a bunch of cables, and a lot of terminal time.

This is the story of my shrine - and how you can build one too.

UGLY WIRING:

MAJOR REASON WHY I CALLED IT SHRINE 😂



Traffic Flow at a Glance

Before we dive into the layers, here's how the traffic moves from my "pulpit" (Mac Studio) to the "shrine" (the cluster):

No inbound holes – all management traffic originates from my Mac or the cluster itself (GitOps pulls). This is how real datacenters work.

┌──────────────────────────────────────────────────────────────────────────────────────┐
│                                                                                      │
│                         🖥️ macOS COMMAND CENTER (The Pulpit)                         │
│                                                                                      │
│              kubectl  │  Terraform  │  Ansible  │  Flux CLI  │  Git                  │
│                                                                                      │
│                        (All management tools installed locally)                      │
│                                                                                      │
└─────────────────────────────────────────┬────────────────────────────────────────────┘
                                          
                          SSH │ API (HTTPS) │ Git (SSH/HTTPS)
                                          
                                          
┌──────────────────────────────────────────────────────────────────────────────────────┐
│                                                                                      │
│                         🛡️ OPNsense Firewall (10.0.1.1)                              │
│                                                                                      │
│   ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐                      │
│   │ DHCP Server     │  │ Static DHCP     │  │ WireGuard VPN   │                      │
│   │ 10.0.1.100-200  │  │ MAC → IP Pinning │  │ Remote Access   │                      │
│   └─────────────────┘  └─────────────────┘  └─────────────────┘                      │
│                                                                                      │
│   • Split-Horizon DNS: *.georgehomelab.com → 10.0.1.x                               │
│   • Gateway for all Proxmox + Kubernetes traffic                                     │
│   • Firewall rules: WAN → LAN passes for management                                  │
│                                                                                      │
└─────────────────────────────────────────┬────────────────────────────────────────────┘
                                          
                                          │ LAN (10.0.1.0/16)
                                          │ 2.5GbE Links
                                          
┌──────────────────────────────────────────────────────────────────────────────────────┐
│                                                                                      │
│                         🔌 Zyxel XMG1915-10E Switch                                  │
│                                                                                      │
│                     Star topology │ 8× 2.5GbE + 2× SFP+                              │
│                                                                                      │
└─────────────────────────────────────────┬────────────────────────────────────────────┘
                                          
              ┌───────────────────────────┼───────────────────────────┐
              │                           │                           │
              ▼                           ▼                           ▼
┌─────────────────────────┐ ┌─────────────────────────┐ ┌─────────────────────────┐
│                         │ │                         │ │                         │
│  🏗️ Proxmox Node 1      │ │  🏗️ Proxmox Node 2      │ │  🏗️ Proxmox Node 3      │
│  (proxmox-dc-1)         │ │  (proxmox-dc-2)         │ │  (proxmox-dc-3)         │
│  10.0.1.10              │ │  10.0.1.11              │ │  10.0.1.12              │
│                         │ │                         │ │                         │
│  • Local ZFS Storage    │ │  • Local ZFS Storage    │ │  • Local ZFS Storage    │
│  • vmbr0 Bridge         │ │  • vmbr0 Bridge         │ │  • vmbr0 Bridge         │
│  • NFS Client (Backups) │ │  • NFS Client (Backups) │ │  • NFS Client (Backups) │
│                         │ │                         │ │                         │
│  Terraform → VM Creation via Proxmox API (telmate/provider)                          │
│  Packer → Ubuntu Cloud-Init Templates                                                │
│                                                                                      │
└─────────────────────────┘ └─────────────────────────┘ └─────────────────────────┘
              │                           │                           │
              │ Cloud-Init DHCP (Static Reservations → Predictable IPs)                │
              │                           │                           │
              └───────────────────────────┼───────────────────────────┘
                                          
┌──────────────────────────────────────────────────────────────────────────────────────┐
│                                                                                      │
│                         ☸️ HA Kubernetes Cluster (kubeadm)                           │
│                                                                                      │
│   ┌─────────────────────────┐ ┌─────────────────────────┐ ┌─────────────────────────┐
│   │ Control Plane Node 1    │ │ Control Plane Node 2    │ │ Control Plane Node 3    │
│   │ k8s-cp-1                │ │ k8s-cp-2                │ │ k8s-cp-3                │
│   │ 10.0.1.110              │ │ 10.0.1.111              │ │ 10.0.1.112              │
│   │                         │ │                         │ │                         │
│   │ • etcd (stacked)        │ │ • etcd (stacked)        │ │ • etcd (stacked)        │
│   │ • kube-apiserver        │ │ • kube-apiserver        │ │ • kube-apiserver        │
│   │ • kube-vip (VIP)        │ │ • kube-vip (VIP)        │ │ • kube-vip (VIP)        │
│   └─────────────────────────┘ └─────────────────────────┘ └─────────────────────────┘
│                                                                                      │
│   ┌─────────────────────────┐ ┌─────────────────────────┐ ┌─────────────────────────┐
│   │ Worker Node 1           │ │ Worker Node 2           │ │ Worker Node 3           │
│   │ k8s-worker-1            │ │ k8s-worker-2            │ │ k8s-worker-3            │
│   │ 10.0.1.120              │ │ 10.0.1.121              │ │ 10.0.1.122              │
│   │                         │ │                         │ │                         │
│   │ • Calico CNI (BGP)      │ │ • Calico CNI (BGP)      │ │ • Calico CNI (BGP)      │
│   │ • kube-proxy            │ │ • kube-proxy            │ │ • kube-proxy            │
│   │ • Workload Pods         │ │ • Workload Pods         │ │ • Workload Pods         │
│   └─────────────────────────┘ └─────────────────────────┘ └─────────────────────────┘
│                                                                                      │
│   Pod CIDR: 10.244.0.0/16 │ Service CIDR: 10.245.0.0/16 │ MetalLB: 10.0.1.200-210  │
│                                                                                      │
│   🔧 Bootstrapped entirely by Ansible (kubeadm playbook)                            │
│                                                                                      │
└─────────────────────────────────────────┬────────────────────────────────────────────┘
                                          
                                          │ GitOps Sync (Outbound Only)
                                          │ FluxCD pulls from GitHub (no inbound!)
                                          
┌──────────────────────────────────────────────────────────────────────────────────────┐
│                                                                                      │
│                         🔄 FluxCD System (Inside Cluster)                            │
│                                                                                      │
│   ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐                      │
│   │ source-         │  │ kustomize-      │  │ helm-           │                      │
│   │ controller      │  │ controller      │  │ controller      │                      │
│   └─────────────────┘  └─────────────────┘  └─────────────────┘                      │
│                                                                                      │
│   ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐                      │
│   │ notification-   │  │ image-reflector-│  │ image-          │                      │
│   │ controller      │  │ controller      │  │ automation-     │                      │
│   └─────────────────┘  └─────────────────┘  └─────────────────┘                      │
│                                                                                      │
│   • Deployed as part of Ansible playbook (not a separate step)                       │
│   • Continuously reconciles cluster state with Git                                   │
│   • Auto-heals configuration drift                                                   │
│                                                                                      │
└─────────────────────────────────────────┬────────────────────────────────────────────┘
                                          
                                          │ HTTPS/SSH (Outbound Pull)
                                          
                                          
┌──────────────────────────────────────────────────────────────────────────────────────┐
│                                                                                      │
│                              📦 GitHub Private Repository                            │
│                                                                                      │
│   ┌─────────────────────────────────────────────────────────────────────────────┐    │
│   │  clusters/prod/                                                             │    │
│   │  ├── flux-system/          # Flux bootstrapping config                      │    │
│   │  │   ├── gotk-components.yaml                                               │    │
│   │  │   └── gotk-sync.yaml                                                     │    │
│   │  ├── apps/                  # Application deployments                       │    │
│   │  │   ├── metallb/                                                          │    │
│   │  │   ├── istio-ingress/                                                    │    │
│   │  │   └── prometheus-stack/                                                 │    │
│   │  └── infrastructure/        # Cluster-wide config                          │    │
│   │      ├── namespaces.yaml                                                   │    │
│   │      └── storage-class.yaml                                                │    │
│   └─────────────────────────────────────────────────────────────────────────────┘    │
│                                                                                      │
│   🔑 Source of Truth: Every change starts as a PR, reviewed, merged, then applied   │
│                                                                                      │
└──────────────────────────────────────────────────────────────────────────────────────┘
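The MetalLB range in the diagram (10.0.1.200-210) maps to a tiny pair of manifests in the repo's apps/metallb/ path. A minimal sketch of what that might look like, assuming MetalLB runs in L2 mode (the post doesn't say which mode is used, and the resource names are my own placeholders):

```yaml
# apps/metallb/pool.yaml (sketch) - hands out LoadBalancer IPs from the reserved range
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: homelab-pool          # placeholder name
  namespace: metallb-system
spec:
  addresses:
    - 10.0.1.200-10.0.1.210   # range from the diagram above
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: homelab-l2            # placeholder name
  namespace: metallb-system
spec:
  ipAddressPools:
    - homelab-pool
```

Keeping the pool outside the OPNsense DHCP range (10.0.1.100-200) avoids address conflicts; note the one overlapping address at .200 is worth excluding on one side or the other.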

Level 1: The Physical Layer (The Foundations)

Every altar begins with something tangible.

Hardware: A fleet of three Minisforum MS-01 machines acting as the compute datacenter (96GB RAM and an 8GB NVIDIA GPU in each machine). That's a total of 288GB RAM and 24GB of NVIDIA GPU memory across the three MS-01s.

Network Entry Point: My Mac Studio (the “pulpit”), connected via Wi-Fi

Firewall: OPNsense bridging external (192.168.1.x) to internal lab network (10.0.1.x)

Out-of-Band Access: TinyPilot Voyager 2a and TESmart 4‑port HDMI KVM — BIOS-level control even when the OS is down

Switch: The Zyxel XMG1915-10E (2.5GbE + SFP+) is the central nervous system, carrying high-velocity east-west traffic (low latency and high throughput for etcd and storage)

Why I worship here:

Physical simplicity enables logical complexity.

No mystery cables. Everything is deliberate. This playground gives me the opportunity to experiment with any cloud-native tool with ease.

Level 2: The Infrastructure Layer (Proxmox Datacenter)

Before automation, there must be a foundation.

Proxmox VE installed manually on all three Minisforum MS-01 machines

Clustered into a single datacenter abstraction

Networking:

  • vmbr0 → Kubernetes network

Static host IPs:

  • 10.0.1.1x
  • 10.0.1.1x
  • 10.0.1.1x

Gateway: 10.0.1.x (OPNsense)

Storage:

  • Local ZFS (NVMe)
  • NFS for shared ISO + backups

The ritual:

I installed Proxmox VE manually on each machine via TinyPilot’s virtual media, driven from my Mac Studio’s browser over Wi-Fi.

No HDMI cable ever touched my desk.

Level 3: The Node Layer (Terraform Automation)

I no longer click buttons to create infrastructure.

I declare it.

Using the Proxmox Terraform provider, I define:

  • VM CPU, memory, disk
  • Network interfaces
  • Clone source (Ubuntu template from Packer)
# One VM per entry in var.nodes, cloned from the Packer-built template
resource "proxmox_vm_qemu" "k8s_node" {
  for_each = var.nodes

  name        = each.value.name
  target_node = each.value.proxmox_node   # which Proxmox host runs this VM
  clone       = "ubuntu-24-04-template"   # Ubuntu cloud-init template from Packer
  cores       = each.value.cores
  memory      = each.value.memory

  network {
    model     = "virtio"
    bridge    = "vmbr0"
    ipconfig0 = "ip=dhcp"                 # DHCP, pinned via OPNsense static reservations
  }
}
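For completeness, the `var.nodes` map that drives the `for_each` might be shaped like this. This is a hypothetical sketch: the post doesn't show the variable, and the core/memory sizing here is my own illustration, not the actual values.

```hcl
# Hypothetical shape of var.nodes; one entry per VM
variable "nodes" {
  type = map(object({
    name         = string
    proxmox_node = string
    cores        = number
    memory       = number
  }))
  default = {
    cp1 = { name = "k8s-cp-1", proxmox_node = "proxmox-dc-1", cores = 4, memory = 8192 }
    w1  = { name = "k8s-worker-1", proxmox_node = "proxmox-dc-1", cores = 4, memory = 16384 }
    # ...remaining control plane and worker entries
  }
}
```

Spreading entries across the three `proxmox_node` values is what lands one control plane and one worker on each physical machine.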

The DHCP Decision (And Why It Matters)

This was one of the most important lessons in my journey.

In my old Mac Studio setup, I used pure DHCP for Kubernetes nodes.

It worked… until every restart broke my cluster access.

What went wrong?

  • Control plane nodes changed IPs
  • kubeconfig became invalid
  • API server endpoints broke
  • etcd stability was at risk

Even with 3 control planes, the cluster wasn’t truly stable.

Why Not Static IPs?

Because static IPs inside the OS mean:

  • Manual netplan configuration
  • Hardcoding network logic into templates
  • Reduced rebuild flexibility

That’s not how cloud-native systems behave.
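For contrast, pinning a static IP inside the OS would mean baking something like the following netplan fragment into every template. This is illustrative only: the interface name, address, and prefix are assumptions, not taken from my actual setup.

```yaml
# /etc/netplan/01-static.yaml (the approach I rejected)
network:
  version: 2
  ethernets:
    eth0:                          # interface name varies per image
      dhcp4: false
      addresses: [10.0.1.110/24]   # adjust prefix to your LAN
      routes:
        - to: default
          via: 10.0.1.1            # OPNsense gateway
      nameservers:
        addresses: [10.0.1.1]
```

Every rebuild would need this file regenerated per node, which is exactly the coupling between template and network that I wanted to avoid.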

The Solution: DHCP + Reservations

I used DHCP everywhere — but configured static reservations in OPNsense.

  • ✔ Nodes auto-configure
  • ✔ IPs never change
  • ✔ Rebuilds are seamless
  • ✔ etcd remains stable

💡 The Real Insight

Kubernetes doesn’t care how IPs are assigned — only that they don’t change.

Level 4: The Cluster Layer (Ansible + Kubeadm)

Once the infrastructure exists, it must be transformed.

Using Ansible:

  • OS hardening
  • Swap disabled
  • containerd installed
  • kubeadm, kubelet, kubectl configured
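The node-prep steps above boil down to a handful of Ansible tasks. A trimmed sketch, not my full playbook (it also omits adding the Kubernetes apt repository, which must happen before the package install):

```yaml
# Sketch of the node-prep tasks (standard ansible.builtin modules)
- name: Disable swap immediately
  ansible.builtin.command: swapoff -a
  changed_when: true

- name: Comment out swap entries so it stays off after reboot
  ansible.builtin.replace:
    path: /etc/fstab
    regexp: '^([^#].*\sswap\s.*)$'
    replace: '# \1'

- name: Install containerd and Kubernetes packages
  ansible.builtin.apt:
    name:
      - containerd
      - kubeadm
      - kubelet
      - kubectl
    state: present
    update_cache: true
```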

HA Control Plane

  • 3 control plane nodes
  • Stacked etcd (homelab-friendly)
  • kube-vip for API virtual IP
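A stacked-etcd HA init mostly comes down to pointing `controlPlaneEndpoint` at the kube-vip virtual IP. A minimal sketch of the kubeadm config: the VIP address and Kubernetes version here are my illustrations (the post doesn't state them), while the pod and service CIDRs come from the diagram above.

```yaml
# kubeadm-config.yaml (sketch)
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.30.0                # assumption: pin your actual version
controlPlaneEndpoint: "10.0.1.250:6443"   # kube-vip VIP (illustrative address)
networking:
  podSubnet: 10.244.0.0/16                # Calico pod CIDR
  serviceSubnet: 10.245.0.0/16
```

With a stacked topology, each `kubeadm join --control-plane` brings its own etcd member, so three control plane nodes give you etcd quorum without dedicated etcd machines.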

Level 5: The Application Layer (GitOps with FluxCD)

This is where everything changes.

Instead of imperative deployments or ad-hoc declarative applies with kubectl, I use GitOps with FluxCD.

GitOps From Day One

FluxCD is not an add-on.

It is deployed during cluster creation via Ansible.

That means:

  • Cluster is GitOps-ready immediately
  • No manual bootstrap later
  • No drift from day one

The Pull Model

  • Flux runs inside the cluster
  • Watches Git repository
  • Pulls changes automatically

No inbound access required.
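The pull model is really just two Flux objects: a GitRepository source and a Kustomization that applies it. A sketch, assuming the repo layout shown earlier; the repo URL is a placeholder:

```yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: flux-system
  namespace: flux-system
spec:
  interval: 1m
  url: ssh://git@github.com/<user>/<repo>   # placeholder
  ref:
    branch: main
  secretRef:
    name: flux-system        # deploy key for the private repo
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: flux-system
  namespace: flux-system
spec:
  interval: 10m
  path: ./clusters/prod
  prune: true                # auto-heal drift by pruning removed objects
  sourceRef:
    kind: GitRepository
    name: flux-system
```

Both connections are initiated from inside the cluster, which is why no inbound firewall rule ever needs to exist.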


Traffic Flow

Mac Studio (192.168.1.x)
        ↓
OPNsense Firewall (10.0.1.x)
        ↓
Proxmox Cluster (10.0.1.1x–1x)
        ↓
Kubernetes Nodes (DHCP → Reserved IPs)
        ↓
FluxCD Controllers (inside cluster)
        ↓
GitHub (OUTBOUND pull model)

Key Insight:

  • ❌ GitHub never connects to your cluster
  • ❌ No firewall holes needed
  • ✅ Flux initiates outbound sync

**Current State of the Shrine**

  • 3 control plane nodes ✅
  • 3 worker nodes ✅
  • etcd cluster healthy ✅
  • Flux controllers distributed across nodes ✅
  • Calico networking active ✅

This is no longer a lab.

It is a self-healing platform.


What I Learned

  • DHCP + reservations is the sweet spot
  • etcd requires stable identity, not static config
  • GitOps removes human drift completely
  • Terraform + Ansible + FluxCD = powerful combination
  • Firewalls must allow internal routing for automation
  • Never use root API for automation — use scoped tokens
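That last lesson translates directly into the telmate provider configuration: authenticate with a scoped API token rather than root@pam credentials. A sketch, with the token ID and variable name as placeholders:

```hcl
provider "proxmox" {
  pm_api_url          = "https://10.0.1.10:8006/api2/json"
  pm_api_token_id     = "terraform@pve!mytoken"   # scoped token, not root@pam
  pm_api_token_secret = var.pm_token_secret       # keep out of version control
  pm_tls_insecure     = true                      # homelab self-signed certs
}
```

The token's role only needs VM and storage privileges, so a leaked Terraform state or CI secret can't touch the rest of the Proxmox datacenter.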

What’s Next on the Altar

  • Ceph or Longhorn for HA storage
  • Velero for cluster backups
  • External Secrets + Vault
  • Cluster autoscaler experiments

Final Words

This homelab is more than a project.

It is a practice ground for real-world platform engineering.

The move from a single ARM node to a distributed HA cluster wasn’t just an upgrade in hardware — it was an upgrade in mindset.

My Mac Studio is no longer the host.

It is the pulpit.

The Shrine runs independently.

If you’re thinking of building something like this — do it.
Start small. Break things. Rebuild them better.

Now go build your own altar. 🛐

🤝 Stay Connected

Found this guide helpful? Follow my journey into homelabbing on LinkedIn! Click the blue LinkedIn button to connect: George Ezejiofor. Let’s keep building scalable, secure cloud-native systems, one project at a time! 🌐🔧
