I set up a kubeadm cluster on Hetzner Cloud last week.
It broke in 6 different ways before it worked.
Here's every error, every fix, and the exact commands that solved each one.
TL;DR: conntrack not installed, private NIC named enp7s0 not eth1, Falcosidekick nil pointer crash on missing secret, fluent-bit chart deprecated (use Promtail), Loki distributed defaults breaking on a two-node cluster (use SingleBinary + emptyDir), cpx21/cx32 unavailable in nbg1 (used cpx32/cpx22). All fixed. Commands below.
The Setup
Two-node kubeadm cluster on Hetzner Cloud (nbg1 region):
- Control plane: cpx32 (4 vCPU, 8GB RAM, Ubuntu 22.04)
- Worker node: cpx22 (3 vCPU, 4GB RAM, Ubuntu 22.04)
- Private network enabled (Hetzner Cloud Networks)
- CNI: Flannel
- Goal: foundation for a Kubernetes security detection stack — Falco, Loki, Grafana, Trivy Operator, kube-bench
Break 1 — The Node Types I Wanted Didn't Exist
What happened
I planned around cpx21 (control plane) and cx32 (worker). When I went to create them in nbg1:
Error: server type cpx21 is not available in location nbg1
Not deprecated. Not removed. Just not available in that datacentre at that moment.
The fix
# Check availability before planning
hcloud server-type list | grep cpx
Went one tier up: cpx32 and cpx22. Slightly more expensive but available immediately.
💡 Lesson: Hetzner inventory varies by location and changes without notice. Always check hcloud server-type list against your target location before committing to a server type in your Terraform or scripts.
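One way to build this lesson into a provisioning script is to try the preferred type first and fall back to the next candidate. This is a minimal sketch, assuming the hcloud CLI is installed and authenticated; the function name create_with_fallback and the ubuntu-22.04 image argument are illustrative choices.

```shell
# Try each server type in order and stop at the first one Hetzner accepts.
# Usage: create_with_fallback <server-name> <location> <type> [<type>...]
# Prints the type that was created; returns 1 if none could be created.
create_with_fallback() {
  name=$1
  location=$2
  shift 2
  for type in "$@"; do
    # hcloud exits non-zero when the type is unavailable in the location
    # (and on any other error, so read stderr if every candidate fails).
    if hcloud server create --name "$name" --type "$type" \
         --image ubuntu-22.04 --location "$location" >/dev/null; then
      echo "$type"
      return 0
    fi
  done
  echo "no candidate server type could be created in $location" >&2
  return 1
}
```

For the control plane in this post that would be called as create_with_fallback control-plane nbg1 cpx21 cpx32, where control-plane is a placeholder server name.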
Break 2 — conntrack Was Missing on Both Nodes
What happened
First kubeadm init attempt on the control plane:
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR FileNotFound]: /usr/sbin/conntrack not found
conntrack handles network connection tracking and is required for kube-proxy. Not installed by default on Hetzner's Ubuntu 22.04 images. Not mentioned clearly in the official kubeadm docs.
The fix
apt-get install -y conntrack
Add this to your node provisioning script before you ever run kubeadm:
apt-get update
apt-get install -y \
  apt-transport-https \
  ca-certificates \
  curl \
  conntrack \
  socat \
  ipset
💡 Lesson: conntrack is missing from Hetzner's default Ubuntu image and the kubeadm docs don't mention it clearly. Add it to every node bootstrap script before running anything else.
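A bootstrap script can also check this itself, so a missing binary shows up before kubeadm's preflight does. This is a small sketch; the function name missing_tools is an illustrative helper, and the tool list is the one from this post.

```shell
# Print every tool from the argument list that is not on PATH.
# Returns 1 if anything is missing, 0 otherwise.
missing_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      missing=1
    fi
  done
  return "$missing"
}

# Example (run as root on each node before kubeadm init / kubeadm join):
#   missing_tools conntrack socat ipset curl || exit 1
```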
Break 3 — The Private Network Interface Wasn't eth1
What happened
Every tutorial, Stack Overflow answer, and blog post assumes Hetzner's private NIC is named eth1. On these nodes it wasn't:
ip addr show
1: lo: <LOOPBACK>
2: eth0: <BROADCAST> inet 5.x.x.x/32 ← public interface
3: enp7s0: <BROADCAST> inet 10.0.0.2/24 ← private interface
The private NIC was enp7s0. This caused two downstream problems:
- kubeadm advertised the public IP for the API server — worker joins routed over the public internet
- Flannel defaulted to the public interface for pod-to-pod traffic
The fix
Find your actual interface name first. This prints the interface that holds the 10.x private address:
ip -o -4 addr show | awk '$4 ~ /^10\./ {print $2}'
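For kubeadm init, explicitly set the advertise address. The node IP is a kubelet flag, not a kubeadm init flag, so set it through KUBELET_EXTRA_ARGS before running init:
echo "KUBELET_EXTRA_ARGS=--node-ip=10.0.0.2" >> /etc/default/kubelet
kubeadm init \
  --apiserver-advertise-address=10.0.0.2 \
  --pod-network-cidr=10.244.0.0/16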
For Flannel, patch the manifest to specify the interface:
# In kube-flannel.yml, under the kube-flannel container:
args:
  - --ip-masq
  - --kube-subnet-mgr
  - --iface=enp7s0  # Add this line
For the worker node, set the node IP before joining:
echo "KUBELET_EXTRA_ARGS=--node-ip=10.0.0.3" \
>> /etc/default/kubelet
systemctl daemon-reload
systemctl restart kubelet
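Appending with >> adds a second KUBELET_EXTRA_ARGS line each time the bootstrap script is re-run. A rough sketch of an idempotent alternative follows; the function name set_kubelet_node_ip is hypothetical, and it assumes the node IP is the only extra kubelet argument you need.

```shell
# Write KUBELET_EXTRA_ARGS=--node-ip=<ip> to the kubelet defaults file,
# replacing any earlier KUBELET_EXTRA_ARGS line instead of appending another.
# Caveat: other extra args on that line are dropped, so merge them by hand.
# Usage: set_kubelet_node_ip <ip> [<file>]
set_kubelet_node_ip() {
  node_ip=$1
  defaults_file=${2:-/etc/default/kubelet}
  touch "$defaults_file"
  # grep -v exits 1 when it outputs nothing, which is fine here.
  grep -v '^KUBELET_EXTRA_ARGS=' "$defaults_file" > "$defaults_file.tmp" || true
  echo "KUBELET_EXTRA_ARGS=--node-ip=$node_ip" >> "$defaults_file.tmp"
  mv "$defaults_file.tmp" "$defaults_file"
}

# Example (as root on the worker), followed by the restart shown above:
#   set_kubelet_node_ip 10.0.0.3
```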
💡 Lesson: Never assume eth1. Run ip addr show on your Hetzner nodes before planning your networking. The private NIC name depends on the server type and can change.
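To avoid hard-coding either name, a script can look the interface up from the private address instead. This is a sketch only: private_nic is an illustrative helper name, and it assumes the one-line-per-address format that ip -o -4 addr show prints.

```shell
# Read `ip -o -4 addr show` output on stdin and print "<interface> <ip>"
# for the first address that starts with the given prefix (default "10.").
# Exits 1 if no address matches.
# Usage: ip -o -4 addr show | private_nic [<prefix>]
private_nic() {
  awk -v prefix="${1:-10.}" '
    index($4, prefix) == 1 {
      split($4, addr, "/")   # strip the /mask suffix
      print $2, addr[1]
      found = 1
      exit
    }
    END { exit found ? 0 : 1 }
  '
}
```

Once Flannel is running, the node also has pod-network addresses in 10.244.x.x, so passing a narrower prefix such as 10.0.0. is the safer call on this cluster.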
Break 4 — The fluent-bit Helm Chart Was Deprecated
What happened
My original logging plan used fluent-bit. I added the Helm repo and ran the install:
Error: chart "fluent-bit" not found in stable repository
WARNING: This chart is deprecated
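The stable/fluent-bit chart was deprecated along with the rest of the old Helm stable repository (Fluent Bit's maintained chart now lives in the fluent/helm-charts repo). Rather than re-plan around it, I moved to Promtail, Grafana's own log collector for Loki.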
The fix
Switch to Promtail — purpose-built for Loki with better Kubernetes metadata enrichment:
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm install promtail grafana/promtail \
  --namespace monitoring \
  --set "config.clients[0].url=http://loki:3100/loki/api/v1/push"
Promtail runs as a DaemonSet, picks up pod logs automatically via the Kubernetes API, and enriches every line with namespace, pod name, container name, and node name.
💡 Lesson: For a Loki pipeline, Promtail was the simpler choice over fluent-bit here: tighter integration, and Kubernetes metadata enrichment works out of the box with no extra configuration.
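The URL in the Promtail config is Loki's push endpoint, and sending one line to it by hand is a quick way to confirm Loki accepts writes before blaming the collector. This is a minimal sketch: loki_push_payload is a hypothetical helper, and the curl call assumes it runs from a pod in the monitoring namespace where the loki service name resolves.

```shell
# Build the JSON body Loki's push endpoint expects for a single log line.
# Usage: loki_push_payload <job-label> <message> <timestamp-in-nanoseconds>
# Note: no JSON escaping is done, so keep quotes and backslashes out of the arguments.
loki_push_payload() {
  printf '{"streams":[{"stream":{"job":"%s"},"values":[["%s","%s"]]}]}\n' \
    "$1" "$3" "$2"
}

# Example (the timestamp is seconds since the epoch padded to nanoseconds):
#   curl -s -H "Content-Type: application/json" -X POST \
#     --data "$(loki_push_payload smoke-test hello "$(date +%s)000000000")" \
#     http://loki:3100/loki/api/v1/push
```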
Break 5 — Falcosidekick Went Into CrashLoopBackOff
What happened
Falco installed cleanly. Falcosidekick — the component that routes Falco alerts to Slack — did not:
kubectl get pods -n falco
NAME READY STATUS RESTARTS AGE
falco-abcd1 1/1 Running 0 4m
falcosidekick-xyz99 0/1 CrashLoopBackOff 6 4m
kubectl logs falcosidekick-xyz99 -n falco
panic: runtime error: invalid memory address or nil pointer dereference
error: failed to load configuration:
SLACK_WEBHOOKURL is required when Slack output is enabled
The webhook URL wasn't being passed through correctly from Helm values. A nil pointer in config loading caused a crash rather than a clean validation error.
The fix
Create the Slack webhook URL as a Kubernetes secret:
kubectl create secret generic falcosidekick-secrets \
--from-literal=slackWebhookUrl="https://hooks.slack.com/services/YOUR/WEBHOOK/URL" \
-n falco
Reference it in Helm values using existingSecret:
# falcosidekick-values.yaml
config:
  slack:
    webhookurl: ""
    minimumpriority: "notice"
  existingSecret: "falcosidekick-secrets"
helm upgrade --install falcosidekick falcosecurity/falcosidekick \
--namespace falco \
-f falcosidekick-values.yaml
Pod came up clean. First Slack alert arrived within 30 seconds.
💡 Lesson: Falcosidekick config errors crash rather than validate gracefully. Always put webhook URLs in a Kubernetes secret and reference existingSecret in Helm values. It is cleaner and avoids the nil pointer crash entirely.
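Since the crash comes from the secret not being there, an install script can refuse to run Helm until the secret exists. This is a small sketch, assuming kubectl is already pointed at the cluster; require_secret is an illustrative function name.

```shell
# Succeed only if the named secret exists in the namespace.
# Usage: require_secret <namespace> <secret-name>
require_secret() {
  secret_namespace=$1
  secret_name=$2
  if kubectl get secret "$secret_name" --namespace "$secret_namespace" >/dev/null 2>&1; then
    return 0
  fi
  echo "secret $secret_name not found in namespace $secret_namespace" >&2
  return 1
}

# Example, placed right before the helm upgrade --install command:
#   require_secret falco falcosidekick-secrets || exit 1
```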
Break 6 — Loki Refused to Start on a Two-Node Cluster
What happened
This was the most time-consuming of the six. Loki's default Helm chart assumes a distributed deployment with multiple replicas, persistent volumes, a gateway component, and a caching layer:
kubectl get pods -n monitoring
NAME READY STATUS RESTARTS AGE
loki-backend-0 0/1 Pending 0 8m
loki-read-0 0/1 Pending 0 8m
loki-write-0 0/1 Pending 0 8m
loki-gateway-xyz 0/1 CrashLoopBackOff 4 8m
The pending pods were waiting for PVCs that couldn't bind — no storage class configured. The gateway crashed because the backend wasn't ready. Classic dependency deadlock.
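The missing storage class can be confirmed from the kubectl get storageclass listing, where a default class is shown with a (default) marker after its name. Here is a rough sketch of that check; default_storageclass is a hypothetical helper that reads the listing on stdin.

```shell
# Read `kubectl get storageclass` output on stdin and print the name of the
# class marked "(default)". Exits 1 if there is none, which is the situation
# that leaves PVCs without an explicit class stuck in Pending.
default_storageclass() {
  awk '
    $2 == "(default)" { print $1; found = 1; exit }
    END { exit found ? 0 : 1 }
  '
}

# Example:
#   kubectl get storageclass | default_storageclass \
#     || echo "no default StorageClass: PVC-backed charts will stay Pending"
```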
The fix
SingleBinary deployment mode — Loki as a single process, no distributed components, no PVC required:
# loki-values.yaml
loki:
  commonConfig:
    replication_factor: 1
  storage:
    type: filesystem
  schemaConfig:
    configs:
      - from: "2024-01-01"
        store: tsdb
        object_store: filesystem
        schema: v13
        index:
          prefix: loki_index_
          period: 24h
deploymentMode: SingleBinary
singleBinary:
  replicas: 1
  persistence:
    enabled: false
  extraVolumes:
    - name: loki-data
      emptyDir: {}
  extraVolumeMounts:
    - name: loki-data
      mountPath: /var/loki
read:
  replicas: 0
write:
  replicas: 0
backend:
  replicas: 0
gateway:
  enabled: false
chunksCache:
  enabled: false
resultsCache:
  enabled: false
lokiCanary:
  enabled: false
test:
  enabled: false
helm upgrade --install loki grafana/loki \
--namespace monitoring \
-f loki-values.yaml
kubectl get pods -n monitoring
# NAME READY STATUS RESTARTS AGE
# loki-0 1/1 Running 0 45s
💡 Lesson: For Loki on small clusters (under 5 nodes, no storage class), deploymentMode: SingleBinary with an emptyDir volume is a sensible starting point. The distributed defaults are built for production scale, not a two-node homelab cluster. The trade-off is that emptyDir data does not survive the pod being deleted or rescheduled.
Full Working Install Order
# Step 1 — Node prerequisites (run on BOTH nodes)
apt-get update && apt-get install -y \
conntrack socat ipset curl
# Step 2 — Container runtime + kubeadm/kubelet/kubectl
# (standard Ubuntu kubeadm installation docs)
# Step 3 — Find your private NIC name
ip addr show
# Note the interface name next to your 10.x.x.x address
# Step 4 — kubeadm init (control plane only)
kubeadm init \
--apiserver-advertise-address=10.0.0.2 \
--pod-network-cidr=10.244.0.0/16
# Step 5 — Flannel with explicit interface
# Download manifest, add --iface=enp7s0 to container args, apply
kubectl apply -f kube-flannel-enp7s0.yml
# Step 6 — Worker node prep (run on worker)
echo "KUBELET_EXTRA_ARGS=--node-ip=10.0.0.3" \
>> /etc/default/kubelet
systemctl daemon-reload && systemctl restart kubelet
# Step 7 — Worker join (run on worker)
kubeadm join 10.0.0.2:6443 \
--token <token> \
--discovery-token-ca-cert-hash sha256:<hash>
# Step 8 — Namespaces
kubectl create namespace monitoring
kubectl create namespace falco
# Step 9 — Falco secret first
kubectl create secret generic falcosidekick-secrets \
--from-literal=slackWebhookUrl="YOUR_WEBHOOK_URL" \
-n falco
# Step 10 — Falco + Falcosidekick
helm repo add falcosecurity \
https://falcosecurity.github.io/charts
helm install falco falcosecurity/falco \
--namespace falco -f falco-values.yaml
helm install falcosidekick falcosecurity/falcosidekick \
--namespace falco -f falcosidekick-values.yaml
# Step 11 — Loki SingleBinary
helm repo add grafana https://grafana.github.io/helm-charts
helm install loki grafana/loki \
--namespace monitoring -f loki-values.yaml
# Step 12 — Promtail
helm install promtail grafana/promtail \
  --namespace monitoring \
  --set "config.clients[0].url=http://loki:3100/loki/api/v1/push"
# Step 13 — Grafana
helm install grafana grafana/grafana \
--namespace monitoring \
--set adminPassword=changeme
# Step 14 — Trivy Operator
helm repo add aquasecurity https://aquasecurity.github.io/helm-charts/
helm install trivy-operator \
  aquasecurity/trivy-operator \
  --namespace monitoring \
  --set trivy.ignoreUnfixed=true
Summary
| Break | Root cause | Fix |
|---|---|---|
| Server types unavailable | Hetzner inventory varies by region | Check hcloud server-type list first |
| conntrack missing | Not in Ubuntu default image | Add to bootstrap script |
| Wrong NIC name | Hetzner uses enp7s0 not eth1 | Run ip addr show before planning |
| fluent-bit deprecated | Chart moved | Use Promtail instead |
| Falcosidekick crash | Nil pointer on missing secret | Use existingSecret in Helm values |
| Loki pending | Distributed defaults need PVC | Use SingleBinary + emptyDir |
What's Next
With the stack running I moved straight into attack simulation.
The next post in this series covers Attack 1 — deploying a cryptominer into the cluster and watching Falco catch it in 47 seconds with three correlated alerts, full Loki log correlation, and MITRE ATT&CK evidence.
Full config files, patched Flannel manifest, and Helm values are in the repo: github.com/chrisazzo/k8s-soc-foundation
Building a DevSecOps portfolio targeting AI Security Architect work in London and Zurich. Follow the series for the full attack simulation, hardening, and CKS build logs.