
Otobong Edoho


Building a Production-Grade Kubernetes Cluster from Scratch — Part 1: Cluster Setup, Workloads, and a Real App

Part 1 of 2 — From bare VMs to a fully running 3-service application on a self-managed Kubernetes cluster. No managed services. No shortcuts. Just raw kubeadm.

Series navigation:

  • Part 1 (you are here): Cluster setup, foundational workloads, deploying a full-stack app with ConfigMaps, Secrets, StatefulSets, and CI pipelines
  • Part 2: Networking deep dive — MetalLB, Nginx Ingress, clean hostnames, and NetworkPolicy enforcement

Full source code: All application code, Kubernetes manifests, and CI pipelines are available at
github.com/otie16/k8s-homelab-vm-project


There are two kinds of Kubernetes engineers.

The first kind provisions an EKS cluster, deploys a workload, and moves on. They know Kubernetes from the outside — the API, the manifests, the kubectl commands.

The second kind wants to know what's happening underneath. How does the scheduler actually decide where to place a pod? What does kubeadm actually do when you run kubeadm init? Why does Calico need kernel modules? What breaks when your pod CIDR overlaps with your host network?

This series is for the second kind.

I built a two-node Kubernetes cluster from scratch on two Ubuntu VMs in my homelab, deployed a production-style full-stack application on top of it, built CI pipelines with SAST and vulnerability scanning, debugged every error the cluster threw at me, and documented every step. This is that documentation — written so you can follow along, break things, fix them, and walk away understanding why Kubernetes works the way it does.

By the end of Part 1, you'll have:

  • A fully functional two-node kubeadm cluster
  • A 3-service application running on it (Next.js + Django REST API + PostgreSQL)
  • Proper use of ConfigMaps, Secrets, StatefulSets, Jobs, probes, and resource limits
  • CI pipelines with SAST, dependency scanning, and Docker image vulnerability scanning
  • A deep understanding of every concept you implemented

The Architecture

Your Laptop
    │
    ├── k8s-master  (192.168.1.100) — Control Plane
    │       kube-apiserver, etcd, scheduler,
    │       controller-manager, CoreDNS, Calico
    │
    └── k8s-worker-node (192.168.1.101) — Worker
            kubelet, kube-proxy, Calico
            Your actual workloads run here

Node specs:

Node             Role           IP              OS                 Specs
k8s-master       Control Plane  192.168.1.100   Ubuntu 24.04 LTS   2 vCPU, 4GB RAM
k8s-worker-node  Worker         192.168.1.101   Ubuntu 24.04 LTS   2 vCPU, 4GB RAM

Both VMs run on VMware. You can use VirtualBox, Hyper-V, or any hypervisor — the Kubernetes setup is identical.


Part 1 — Bootstrapping the Cluster

Why kubeadm Instead of a Managed Service?

Managed Kubernetes (EKS, GKE, AKS) hides the control plane from you. You never see the API server. You never touch etcd. You never configure a CNI plugin from scratch. That's great for production but terrible for learning.

kubeadm is the official Kubernetes cluster bootstrapping tool. It handles the hard parts — generating certificates, writing control plane manifests, configuring etcd — while still giving you full access to everything. Running kubeadm once teaches you more about how Kubernetes actually works than months of using managed services.

Step 1 — Set Hostnames

On the control plane VM:

sudo hostnamectl set-hostname k8s-master

On the worker VM:

sudo hostnamectl set-hostname k8s-worker-node

On both VMs, add entries to /etc/hosts so nodes can resolve each other by name:

sudo nano /etc/hosts

Add at the bottom:

192.168.1.100  k8s-master
192.168.1.101  k8s-worker-node

Step 2 — Disable Swap (Both VMs)

Kubernetes requires swap to be off. The kubelet enforces this because swap causes unpredictable memory behaviour that breaks scheduling guarantees — if a container exceeds its memory limit, it should be OOMKilled immediately, not start swapping to disk and silently degrading.

# Disable immediately
sudo swapoff -a

# Disable permanently across reboots
sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab

Verify:

free -h
# Swap row should show 0B

Step 3 — Load Kernel Modules (Both VMs)

Kubernetes networking needs two kernel modules:

  • overlay — handles the layered filesystem that containers use
  • br_netfilter — allows iptables to see traffic crossing network bridges (required for pod-to-pod networking)

sudo modprobe overlay
sudo modprobe br_netfilter

# Make them persist across reboots
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF

Step 4 — Set Kernel Networking Parameters (Both VMs)

These sysctl settings tell the kernel to let iptables process bridged traffic and to forward IPv4 packets — both required for Kubernetes networking:

cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF

# Apply without rebooting
sudo sysctl --system

Step 5 — Install containerd (Both VMs)

Kubernetes needs a container runtime that implements the CRI (Container Runtime Interface). containerd is the standard choice — it's what Docker uses under the hood.

sudo apt-get update
sudo apt-get install -y ca-certificates curl gnupg lsb-release

# Add Docker's GPG key (containerd ships in Docker's repo)
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | \
  sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg

# Add the repository
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
  https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

sudo apt-get update
sudo apt-get install -y containerd.io

Critical — configure containerd to use the systemd cgroup driver:

Both containerd and kubelet must agree on the cgroup driver. On modern Ubuntu that's systemd. Mismatching them causes kubelet crashes that look completely unrelated to cgroups.

sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml

# Set SystemdCgroup = true
sudo sed -i 's/SystemdCgroup \= false/SystemdCgroup \= true/g' \
  /etc/containerd/config.toml

sudo systemctl restart containerd
sudo systemctl enable containerd
sudo systemctl status containerd

Step 6 — Install kubeadm, kubelet, kubectl (Both VMs)

sudo apt-get install -y apt-transport-https ca-certificates curl gpg

# Add Kubernetes apt repo — write as a single line, not multi-line
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.29/deb/Release.key | \
  sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg

echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.29/deb/ /" | \
  sudo tee /etc/apt/sources.list.d/kubernetes.list

sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl

# Pin versions — prevents accidental upgrades that break version skew rules
sudo apt-mark hold kubelet kubeadm kubectl
sudo systemctl enable kubelet

Important: Write the Kubernetes apt repo entry as a single unbroken line. Multi-line echo commands with backslashes cause malformed entries in the .list file that break apt-get update with E: Malformed entry 1.


Part 2 — Initialising the Control Plane

Step 7 — kubeadm init (Master Only)

sudo kubeadm init \
  --pod-network-cidr=10.244.0.0/16 \
  --apiserver-advertise-address=192.168.1.100

Why 10.244.0.0/16 and not 192.168.0.0/16?

This is a mistake I made the first time. My VMs are on 192.168.1.x. If I used 192.168.0.0/16 as the pod CIDR, it would overlap with the host network. Calico would get confused about which interface belongs to the pod network and which belongs to the host, and every pod would fail to start with stat /var/lib/calico/nodename: no such file or directory. Always choose a pod CIDR that doesn't overlap with your host network.

What kubeadm init does behind the scenes:

  1. Generates all TLS certificates for cluster components
  2. Writes static pod manifests for the API server, etcd, scheduler, and controller manager
  3. Starts the control plane components
  4. Installs CoreDNS
  5. Outputs a kubeadm join command — copy this immediately

Step 8 — Configure kubectl

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Test it:

kubectl get nodes
# k8s-master should show NotReady — normal, CNI isn't installed yet

Step 9 — Install Calico CNI

Without a CNI plugin, pods can't communicate; that missing network layer is exactly why the node still shows NotReady.

# Download the manifest so we can edit it
curl -O https://raw.githubusercontent.com/projectcalico/calico/v3.27.0/manifests/calico.yaml

# Set the correct pod CIDR to match what we used in kubeadm init
sed -i 's|# - name: CALICO_IPV4POOL_CIDR|- name: CALICO_IPV4POOL_CIDR|' calico.yaml
sed -i 's|#   value: "192.168.0.0/16"|  value: "10.244.0.0/16"|' calico.yaml

# Verify the change
grep -A1 "CALICO_IPV4POOL_CIDR" calico.yaml

kubectl apply -f calico.yaml

Watch Calico come up:

watch kubectl get pods -n kube-system

The calico-node pod goes through Init:0/3 → Init:1/3 → Init:2/3 → Running. The init containers pull ~250MB of images, so this takes a few minutes. Once calico-node hits Running, the master goes Ready within 60 seconds.

Step 10 — Join the Worker Node

Generate a fresh join command on the master:

kubeadm token create --print-join-command

Run the output on the worker node with sudo:

sudo kubeadm join 192.168.1.100:6443 \
  --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash>

Watch the worker appear:

watch kubectl get nodes

Expected output:

NAME              STATUS   ROLES           AGE   VERSION
k8s-master        Ready    control-plane   10m   v1.28.15
k8s-worker-node   Ready    <none>          2m    v1.29.15

Step 11 — Install the Storage Provisioner

On bare metal, there's no cloud provider to fulfill PersistentVolumeClaim requests automatically. Without a storage provisioner, any pod that requests a PVC will be stuck with FailedScheduling: pod has unbound immediate PersistentVolumeClaims.

Install Rancher's local-path provisioner:

kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/v0.0.26/deploy/local-path-storage.yaml

# Set it as the default StorageClass
kubectl patch storageclass local-path \
  -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'

# Verify
kubectl get storageclass

Expected:

NAME                   PROVISIONER             AGE
local-path (default)   rancher.io/local-path   1m

From this point every PVC in the cluster gets fulfilled automatically. No manual PV creation ever again.
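
For example, a claim shaped like the sketch below (the name and size are illustrative) now gets a volume provisioned for it automatically. Note that local-path uses WaitForFirstConsumer binding, so the claim stays Pending until the first pod that mounts it is scheduled.

# Illustrative PVC: the default local-path StorageClass provisions the volume
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi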


Part 3 — The Application

We're deploying a real 3-service task manager:

  • Next.js frontend — React UI for creating, completing, and deleting tasks
  • Django REST API — CRUD endpoints backed by PostgreSQL
  • PostgreSQL — StatefulSet with persistent storage

All application source code is on GitHub:
github.com/otie16/k8s-homelab-vm-project

The repo contains:

  • backend/ — Django REST API (models, serializers, views, urls, wsgi, settings)
  • frontend/ — Next.js task manager UI with App Router
  • k8s/ — All Kubernetes manifests
  • .github/workflows/ — CI pipelines for both services

Project Structure

k8s-homelab-vm-project/
├── backend/
│   ├── Dockerfile
│   ├── requirements.txt
│   ├── manage.py
│   ├── core/
│   │   ├── settings.py       # DB config from env, INSTALLED_APPS, TEMPLATES
│   │   ├── urls.py           # health/, ready/, api/ endpoints
│   │   └── wsgi.py           # get_wsgi_application() with django.setup()
│   └── tasks/
│       ├── apps.py           # TasksConfig AppConfig
│       ├── models.py         # Task model
│       ├── serializers.py
│       ├── views.py          # ModelViewSet
│       └── migrations/
│           └── 0001_initial.py
├── frontend/
│   ├── Dockerfile            # Multi-stage with Next.js standalone output
│   ├── next.config.js        # output: 'standalone'
│   └── app/
│       ├── layout.js         # Required root layout for App Router
│       └── page.js           # Task manager UI
├── k8s/
│   ├── namespace.yaml
│   ├── configmap.yaml
│   ├── secret.yaml
│   ├── postgres-statefulset.yaml
│   ├── postgres-service.yaml
│   ├── migrate-job.yaml
│   ├── backend-deployment.yaml
│   ├── backend-service.yaml
│   ├── frontend-deployment.yaml
│   ├── frontend-service.yaml
│   └── deploy.sh
└── .github/
    └── workflows/
        ├── backend-ci.yml
        └── frontend-ci.yml

Key Implementation Decisions Worth Understanding

wsgi.py — the bug that cost me the most time

The single most frustrating error in this entire project was AppRegistryNotReady: Apps aren't loaded yet. The cause: Django's WSGI handler instantiated at module import time before django.setup() runs.

# Wrong — WSGIHandler() instantiated at import time, before the app registry is ready
from django.core.handlers.wsgi import WSGIHandler
application = WSGIHandler()

# Correct — handles initialisation order properly
import os
import django
from django.core.wsgi import get_wsgi_application

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'core.settings')
django.setup()
application = get_wsgi_application()

This looks like a trivial difference but it determines whether Django's app registry is populated before URL patterns are loaded. The get_wsgi_application() function is the correct public API for exactly this reason. Without it, every request returns 500 and the pod enters CrashLoopBackOff.

Next.js standalone output — why it matters for image size

Without standalone mode, copying node_modules into the final Docker image produces ~800MB. With standalone mode enabled in next.config.js, Next.js traces exactly which files the production server needs. The final image runs node server.js directly — no npm, no Next.js CLI, no node_modules at runtime. Result: ~160MB instead of ~800MB.

The init container pattern — dependency ordering without hacks

Both the migration job and the backend deployment use an init container to wait for PostgreSQL:

initContainers:
- name: wait-for-postgres
  image: busybox
  command: ['sh', '-c',
    'until nc -z postgres 5432; do echo waiting; sleep 2; done']

This blocks the main container from starting until port 5432 responds. No sleep hacks, no retry logic in application code, no race conditions. The main container never starts until the dependency is ready.

Liveness vs Readiness probes — they're not the same thing

Both probes hit different endpoints for a reason:

  • /health/ (liveness): "Is this container alive?" A failure triggers a container restart.
  • /ready/ (readiness): "Is this container ready for traffic?" A failure removes the pod from the Service endpoint list without restarting it.

Your Django app might be alive but still warming up. Readiness handles that gracefully — the pod stays up but doesn't receive traffic until it signals it's ready.
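
In a container spec, the two probes look roughly like this. The paths match the endpoints above; the port and timing values are illustrative assumptions, not the repo's exact numbers:

livenessProbe:
  httpGet:
    path: /health/
    port: 8000
  initialDelaySeconds: 15
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /ready/
    port: 8000
  initialDelaySeconds: 5
  periodSeconds: 5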


Part 4 — Kubernetes Manifests Deep Dive

The full manifests are in k8s/ in the GitHub repo. Here's what each one does and the important decisions behind them.

Namespace

Everything lives in k8s-vm-app. Namespaces isolate resources, apply RBAC boundaries, and scope NetworkPolicy.

ConfigMap and Secret — The Right Separation

This is one of the most important patterns to get right in Kubernetes.

ConfigMap — anything you'd commit to a public git repo. Hostnames, ports, feature flags, non-sensitive config. In our case: DB_HOST, DB_PORT, DB_NAME, DEBUG, ALLOWED_HOSTS, NEXT_PUBLIC_API_URL.

Secret — anything you'd never commit. Passwords, API keys, tokens. In our case: DB_USER, DB_PASSWORD, POSTGRES_PASSWORD, DJANGO_SECRET_KEY.

Both are consumed via envFrom in the pod spec — containers get all keys as environment variables automatically without any hardcoded credentials touching the manifest files.

See → k8s/configmap.yaml | k8s/secret.yaml
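
As a minimal sketch of the split (the names and values here are placeholders, not the repo's exact manifests):

apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
  namespace: k8s-vm-app
data:
  DB_HOST: postgres
  DB_PORT: "5432"
  DB_NAME: tasksdb
  DEBUG: "False"
---
apiVersion: v1
kind: Secret
metadata:
  name: app-secrets
  namespace: k8s-vm-app
type: Opaque
stringData:
  DB_USER: app_user
  DB_PASSWORD: change-me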

PostgreSQL — Why StatefulSet and Not Deployment

This is the most important architectural decision for databases on Kubernetes.

A Deployment treats pods as interchangeable. Any pod can replace any other. No stable identity, no guaranteed ordering.

A StatefulSet gives pods:

  • Stable identity — pods are named postgres-0, postgres-1, not random hashes. postgres-0.postgres.k8s-vm-app.svc.cluster.local is always that specific pod.
  • Ordered startup/shutdown — pods start in order and terminate in reverse. Critical for primary/replica database setups.
  • Per-pod PVCs via volumeClaimTemplates — each replica gets its own PersistentVolumeClaim that follows it even if rescheduled to a different node.

The headless service (clusterIP: None) is required for StatefulSets — it allows DNS to resolve directly to individual pod IPs rather than a virtual cluster IP.

See → k8s/postgres-statefulset.yaml
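
Condensed, the shape is roughly the sketch below. The image, labels, and storage size are assumptions; the real manifest is linked above.

apiVersion: v1
kind: Service
metadata:
  name: postgres
  namespace: k8s-vm-app
spec:
  clusterIP: None            # headless Service: DNS returns the pod IPs directly
  selector:
    app: postgres
  ports:
    - port: 5432
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: k8s-vm-app
spec:
  serviceName: postgres      # ties stable pod DNS names to the headless Service
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:16
          ports:
            - containerPort: 5432
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:      # one PVC per replica, kept across rescheduling
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 2Gi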

The Migration Job

Database migrations run once, must complete before the application starts, and should retry on failure. A Kubernetes Job is the exact right primitive for this.

The manifest uses an init container that blocks until postgres:5432 responds, then runs python manage.py migrate --noinput. backoffLimit: 3 means Kubernetes retries up to 3 times on failure.

See → k8s/migrate-job.yaml
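
In manifest form it looks roughly like this (the image and env object names are assumptions; the real file is linked above):

apiVersion: batch/v1
kind: Job
metadata:
  name: django-migrate-job
  namespace: k8s-vm-app
spec:
  backoffLimit: 3                 # retry up to 3 times on failure
  template:
    spec:
      restartPolicy: Never
      initContainers:
        - name: wait-for-postgres
          image: busybox
          command: ['sh', '-c', 'until nc -z postgres 5432; do echo waiting; sleep 2; done']
      containers:
        - name: migrate
          image: YOUR_USERNAME/k8s-vm-app-backend:latest
          command: ['python', 'manage.py', 'migrate', '--noinput']
          envFrom:
            - configMapRef:
                name: app-config
            - secretRef:
                name: app-secrets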

Backend and Frontend Deployments

Both deployments use:

  • RollingUpdate with maxUnavailable: 0 — zero downtime deploys, new pods must be ready before old ones are removed
  • imagePullPolicy: Always — ensures every rollout pulls the latest image from Docker Hub even if the tag hasn't changed
  • envFrom consuming both ConfigMap and Secret
  • Liveness and readiness probes
  • Resource requests and limits to prevent any single pod from starving others on the node

See → k8s/backend-deployment.yaml | k8s/frontend-deployment.yaml
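
The relevant fragments of the backend Deployment look roughly like this (the replica count, resource numbers, and config names are illustrative):

spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0      # never drop below full capacity during a rollout
      maxSurge: 1
  template:
    spec:
      containers:
        - name: django-backend
          image: YOUR_USERNAME/k8s-vm-app-backend:latest
          imagePullPolicy: Always
          envFrom:
            - configMapRef:
                name: app-config
            - secretRef:
                name: app-secrets
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
            limits:
              cpu: 500m
              memory: 512Mi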

The Deploy Script

Never apply manifests one by one manually. The deploy script handles ordering and waiting automatically:

chmod +x /home/oty-k8s/k8s/deploy.sh
/home/oty-k8s/k8s/deploy.sh

It applies resources in the correct dependency order, waits for PostgreSQL readiness before migrations, waits for the migration job to complete before the application starts, and stops immediately on any failure (set -e).

See → k8s/deploy.sh


Part 5 — CI Pipelines with Real Security Gates

Every image passes through a security pipeline before reaching Docker Hub. The pipeline architecture:

Push to main
    ↓
Lint + unit tests (flake8 / eslint)
    ↓
SAST: Bandit + pip-audit + Trivy filesystem scan
    ↓
Docker build + Trivy image scan (CRITICAL = fail hard)
    ↓
Push to Docker Hub (only on main, only if all gates pass)

GitHub Secrets Required

Go to your repo → Settings → Secrets and variables → Actions:

DOCKERHUB_USERNAME    your Docker Hub username
DOCKERHUB_TOKEN       Docker Hub access token (not your password)

What Each Security Tool Does

Bandit scans Python source code for security anti-patterns — hardcoded passwords, subprocess with shell=True, SQL string formatting, weak cryptography. Reads your code the way a security reviewer would, without executing it.

pip-audit cross-references every package in requirements.txt against the Python Packaging Advisory Database for known CVEs. If your Django version has a known vulnerability, it fails before the image is built.

Trivy filesystem scan runs against the source directory before the Docker build. Catches secrets accidentally committed, misconfigured files, and dependency vulnerabilities through a different database than pip-audit — the overlap is intentional.

Trivy image scan runs against the final built image layers. This is the deepest scan — it catches OS-level vulnerabilities that no source-level tool would see. A vulnerable libssl in the Alpine base image, for example. CRITICAL severity fails the pipeline hard. HIGH severity generates a report but doesn't block.

The key gate: nothing reaches Docker Hub unless lint, SAST, and image scanning all pass, and only on pushes to main.

See → .github/workflows/backend-ci.yml | .github/workflows/frontend-ci.yml
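
As a rough sketch of how those gates chain together in a GitHub Actions workflow (the job names, action versions, and flags here are illustrative, not the repo's exact pipeline):

name: backend-ci
on:
  push:
    branches: [main]
  pull_request:

jobs:
  lint-and-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Lint, SAST, dependency audit
        run: |
          pip install flake8 bandit pip-audit
          flake8 backend/
          bandit -r backend/
          pip-audit -r backend/requirements.txt
      - name: Trivy filesystem scan
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: fs
          scan-ref: backend/
          severity: CRITICAL
          exit-code: '1'

  build-scan-push:
    needs: lint-and-scan
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build image
        run: docker build -t ${{ secrets.DOCKERHUB_USERNAME }}/k8s-vm-app-backend:latest backend/
      - name: Trivy image scan (CRITICAL fails hard)
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: ${{ secrets.DOCKERHUB_USERNAME }}/k8s-vm-app-backend:latest
          severity: CRITICAL
          exit-code: '1'
      - name: Push to Docker Hub
        run: |
          echo "${{ secrets.DOCKERHUB_TOKEN }}" | docker login -u "${{ secrets.DOCKERHUB_USERNAME }}" --password-stdin
          docker push ${{ secrets.DOCKERHUB_USERNAME }}/k8s-vm-app-backend:latest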


Part 6 — Deploying to the Cluster

Clone the repo:

git clone https://github.com/otie16/k8s-homelab-vm-project.git
cd k8s-homelab-vm-project

Update the image names in the deployment manifests to your Docker Hub username, then build and push both images:

# Backend
cd backend
docker build -t YOUR_USERNAME/k8s-vm-app-backend:latest .
docker push YOUR_USERNAME/k8s-vm-app-backend:latest

# Frontend
cd ../frontend
docker build -t YOUR_USERNAME/k8s-vm-app-frontend:latest .
docker push YOUR_USERNAME/k8s-vm-app-frontend:latest

Copy manifests to the master node:

scp -r k8s/ oty-k8s@192.168.1.100:/home/oty-k8s/

SSH to the master and run the deploy script:

ssh oty-k8s@192.168.1.100
chmod +x /home/oty-k8s/k8s/deploy.sh
/home/oty-k8s/k8s/deploy.sh

Watch everything come up:

kubectl get all -n k8s-vm-app

Expected final state:

NAME                                  READY   STATUS
pod/django-backend-xxx                1/1     Running
pod/django-backend-yyy                1/1     Running
pod/nextjs-frontend-xxx               1/1     Running
pod/nextjs-frontend-yyy               1/1     Running
pod/postgres-0                        1/1     Running
pod/django-migrate-job-xxx            0/1     Completed

NAME                      TYPE        PORT(S)
service/django-backend    NodePort    8000:30000/TCP
service/nextjs-frontend   NodePort    3000:30001/TCP
service/postgres          ClusterIP   None

Access the app:

Frontend:    http://192.168.1.100:30001
Backend API: http://192.168.1.100:30000/api/tasks/
Health:      http://192.168.1.100:30000/health/

The Errors That Cost Me The Most Time

No honest Kubernetes writeup skips the debugging. Here are the ones worth knowing about:

stat /var/lib/calico/nodename: no such file or directory
Pod CIDR overlapping with the host network. Calico can't figure out which interface is for pods vs the host. Fix: use a CIDR that doesn't overlap — 10.244.0.0/16 when hosts are on 192.168.x.x.

AppRegistryNotReady: Apps aren't loaded yet
Django's WSGI handler instantiated at module import time before django.setup() runs. Fix: use get_wsgi_application() with explicit django.setup(). One line of difference, hours of debugging.

E: Malformed entry 1 in list file /etc/apt/sources.list.d/kubernetes.list
Multi-line echo commands with backslashes write literal newlines into the apt sources file. Fix: always write the deb entry as a single unbroken line.

pod has unbound immediate PersistentVolumeClaims
No storage provisioner on bare metal. Fix: install local-path-provisioner.

secret "app-secret" not found
Secret name mismatch — created as app-secrets (with an s) but manifests referenced app-secret. Fix: audit all references with grep -r "secretRef" k8s/.

Calico token expiry — Unauthorized on pod sandbox creation
Calico's CNI kubeconfig uses a projected ServiceAccount token with a 24-hour TTL. When it expires, new pod sandboxes fail. Workaround: delete the calico-node pod on the affected node — the daemonset recreates it with a fresh token.

# Refresh the Calico token on the worker
kubectl delete pod -n kube-system \
  $(kubectl get pods -n kube-system -o wide | grep calico-node | grep worker | awk '{print $1}')

What's Next

You now have a production-style cluster running a real application with proper security patterns. But two things aren't production-ready yet:

  1. Services exposed on ugly NodePort high ports (30000, 30001)
  2. No network-level isolation between pods

Part 2 fixes both — MetalLB for real LoadBalancer IPs, Nginx Ingress for clean hostname routing on port 80, and NetworkPolicy with real tests to verify traffic isolation works.

👉 Continue to Part 2: MetalLB, Nginx Ingress, and NetworkPolicy


Key Takeaways

kubeadm is the best learning tool for Kubernetes. Managed services hide the control plane. kubeadm forces you to understand certificates, etcd, CNI plugins, and component communication at a level that makes you significantly better at operating any Kubernetes cluster.

Bare metal is harder and more educational. No cloud LoadBalancer. No storage provisioner. No managed node groups. Every abstraction you take for granted in EKS has to be built manually — and every time you build it manually, you understand it better.

The debugging process is the education. Every error in this post was a lesson. The AppRegistryNotReady error taught me how Django's WSGI initialisation works at a depth I never would have reached following a happy-path tutorial.

Version skew matters. My master runs v1.28.15 and my worker joined at v1.29.15 — one minor version difference. Kubernetes tolerates this, but in production you manage it carefully.


Source code: github.com/otie16/k8s-homelab-vm-project

Follow for Part 2 — MetalLB, Nginx Ingress, and NetworkPolicy.

Tags: Kubernetes, DevOps, Platform Engineering, kubeadm, Homelab, Cloud Native, Docker, Django, NextJS
