Christopher Azzopardi

Posted on May 30 • Edited on Jun 5 • Originally published at Medium

The Six Things That Broke During My kubeadm Setup on Hetzner — and How I Fixed Them

#security #kubernetes #devops #tutorial

I set up a kubeadm cluster on Hetzner Cloud last week.

It broke in 6 different ways before it worked.

Here's every error, every fix, and the exact commands that solved each one.

TL;DR: conntrack not installed, private NIC named enp7s0 not eth1, Falcosidekick nil pointer crash on missing secret, fluent-bit chart deprecated (use Promtail), Loki distributed defaults breaking on a two-node cluster (use SingleBinary + emptyDir), cpx21/cx32 unavailable in nbg1 (used cpx32/cpx22). All fixed. Commands below.

The Setup

Two-node kubeadm cluster on Hetzner Cloud (nbg1 region):

Control plane: cpx32 — 4 vCPU, 8GB RAM, Ubuntu 22.04
Worker node: cpx22 — 3 vCPU, 4GB RAM, Ubuntu 22.04
Private network enabled (Hetzner Cloud Networks)
CNI: Flannel
Goal: foundation for a Kubernetes security detection stack — Falco, Loki, Grafana, Trivy Operator, kube-bench

Break 1 — The Node Types I Wanted Didn't Exist

What happened

I planned around cpx21 (control plane) and cx32 (worker). When I went to create them in nbg1:

Error: server type cpx21 is not available in location nbg1

Not deprecated. Not removed. Just not available in that datacentre at that moment.

The fix

# Check availability before planning
hcloud server-type list | grep cpx

Went one tier up: cpx32 and cpx22. Slightly more expensive but available immediately.

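💡 Lesson: Hetzner inventory varies by location and changes without notice. hcloud server-type list shows every type Hetzner offers, not what a given location can deliver right now, so confirm availability for your target location (for example with hcloud datacenter describe, which lists the server types a datacenter currently offers) before committing to a server type in your Terraform or scripts.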

Break 2 — conntrack Was Missing on Both Nodes

What happened

First kubeadm init attempt on the control plane:

[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
    [ERROR FileNotFound]: /usr/sbin/conntrack not found

conntrack is the userspace tool for the kernel's connection-tracking table, which kube-proxy relies on, and kubeadm's preflight checks treat it as mandatory. It isn't installed by default on Hetzner's Ubuntu 22.04 images, and the official kubeadm docs don't call it out clearly.

The fix

apt-get install -y conntrack

Add this to your node provisioning script before you ever run kubeadm:

apt-get update
apt-get install -y \
  apt-transport-https \
  ca-certificates \
  curl \
  conntrack \
  socat \
  ipset
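To fail fast in a provisioning script, a small check can confirm the binaries are actually on PATH before kubeadm ever runs. This is a minimal POSIX sh sketch, assuming the binaries are on root's PATH; missing_bins is a hypothetical helper name, not part of kubeadm:

```shell
#!/bin/sh
# Hypothetical helper: report node prerequisites that are not on PATH.
# missing_bins BIN...  prints each missing BIN on its own line and
# returns 1 if anything is missing, 0 if everything was found.
missing_bins() {
  _mb_status=0
  for _mb_bin in "$@"; do
    if ! command -v "$_mb_bin" >/dev/null 2>&1; then
      printf '%s\n' "$_mb_bin"
      _mb_status=1
    fi
  done
  return "$_mb_status"
}

# Usage in a bootstrap script, after the apt-get step and before kubeadm:
#   if ! missing_bins conntrack socat ipset curl; then
#     echo "prerequisites missing, not running kubeadm" >&2
#     exit 1
#   fi
```

Printing every missing binary (rather than stopping at the first) means one run of the script tells you the whole list to install.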

💡 Lesson: conntrack is missing from Hetzner's Ubuntu default image and the kubeadm docs don't mention it clearly. Add it to every node bootstrap script before running anything else.

Break 3 — The Private Network Interface Wasn't eth1

What happened

Every tutorial, Stack Overflow answer, and blog post assumes Hetzner's private NIC is named eth1. On these nodes it wasn't:

ip addr show

1: lo: <LOOPBACK>
2: eth0: <BROADCAST> inet 5.x.x.x/32       ← public interface
3: enp7s0: <BROADCAST> inet 10.0.0.2/24    ← private interface

The private NIC was enp7s0. This caused two downstream problems:

kubeadm advertised the public IP for the API server — worker joins routed over the public internet
Flannel defaulted to the public interface for pod-to-pod traffic

The fix

Find your actual interface name first:

ip -o -4 addr show | awk '$4 ~ /^10\.0\./ {print $2}'
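The interface name and its address both feed the kubeadm and kubelet flags that follow, so it can be handy to extract them together. This sketch assumes the Hetzner network sits in 10.0.0.0/16, as in this cluster (the prefix is a parameter); private_addr is a hypothetical helper that parses the one-line-per-address output of ip -o -4 addr show:

```shell
#!/bin/sh
# Hypothetical helper: read `ip -o -4 addr show` output on stdin and print
# "<interface> <address>" for the first IPv4 address starting with PREFIX.
# In that format, field 2 is the interface, field 3 is "inet" and
# field 4 is "address/prefixlen".
# private_addr [PREFIX]   (PREFIX defaults to "10.0.", an assumption)
private_addr() {
  awk -v p="${1:-10.0.}" '$3 == "inet" && index($4, p) == 1 {
    split($4, a, "/")   # drop the /prefixlen suffix
    print $2, a[1]
    exit                # first match only
  }'
}

# Usage on a node:
#   ip -o -4 addr show | private_addr    # prints "<interface> <address>"
```

The narrow 10.0. default keeps Flannel's own 10.244.x interfaces (flannel.1, cni0) from matching once the CNI is running.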

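For kubeadm init, pin both addresses to the private IP. kubeadm init has no --node-ip flag, so the node IP goes to the kubelet (the same mechanism used on the worker below) and the advertise address goes to kubeadm:

echo "KUBELET_EXTRA_ARGS=--node-ip=10.0.0.2" \
  >> /etc/default/kubelet

kubeadm init \
  --apiserver-advertise-address=10.0.0.2 \
  --pod-network-cidr=10.244.0.0/16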

For Flannel, patch the manifest to specify the interface:

# In kube-flannel.yml, under kube-flannel container args:
args:
- --ip-masq
- --kube-subnet-mgr
- --iface=enp7s0    # Add this line
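Rather than hand-editing the manifest every time it is re-downloaded, the patch can be scripted. A sketch using awk; add_flannel_iface is a hypothetical helper, and it assumes the args list contains a "- --kube-subnet-mgr" entry on a line of its own, as in the snippet above:

```shell
#!/bin/sh
# Hypothetical helper: copy a kube-flannel manifest from stdin to stdout,
# adding "- --iface=NIC" directly after each "- --kube-subnet-mgr" line and
# reusing that line's indentation. All other lines pass through unchanged.
# add_flannel_iface NIC
add_flannel_iface() {
  awk -v nic="$1" '
    { print }
    /^[ \t]*- --kube-subnet-mgr[ \t]*$/ {
      match($0, /^[ \t]*/)   # RLENGTH = width of the leading indentation
      printf "%s- --iface=%s\n", substr($0, 1, RLENGTH), nic
    }'
}

# Usage (manifest URL as given in the Flannel README; check it is current):
#   curl -fsSLO https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
#   add_flannel_iface enp7s0 < kube-flannel.yml > kube-flannel-enp7s0.yml
#   kubectl apply -f kube-flannel-enp7s0.yml
```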

For the worker node, set the node IP before joining:

echo "KUBELET_EXTRA_ARGS=--node-ip=10.0.0.3" \
  >> /etc/default/kubelet
systemctl daemon-reload
systemctl restart kubelet
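Once the worker has joined, it is worth confirming that neither node registered its public address. In this sketch kubectl's JSONPath output prints each node's name and InternalIP, and a hypothetical public_nodes helper flags any node whose address falls outside the private prefix (10.0. by default, matching this cluster):

```shell
#!/bin/sh
# Hypothetical helper: read "<node> <internal-ip>" lines on stdin and print
# every node whose InternalIP does not start with PREFIX (or is missing).
# Returns 1 if any node is listed, 0 otherwise.
# public_nodes [PREFIX]
public_nodes() {
  awk -v p="${1:-10.0.}" 'NF && index($2, p) != 1 { print $1; bad = 1 }
                          END { exit bad }'
}

# Usage from the control plane:
#   kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.addresses[?(@.type=="InternalIP")].address}{"\n"}{end}' \
#     | public_nodes || echo "some nodes registered a non-private InternalIP" >&2
```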

💡 Lesson: Never assume eth1. Run ip addr show on your Hetzner nodes before planning your networking. The private NIC name depends on the server type and can change.

Break 4 — The fluent-bit Helm Chart Was Deprecated

What happened

My original logging plan used fluent-bit. I added the Helm repo and ran the install:

Error: chart "fluent-bit" not found in stable repository
WARNING: This chart is deprecated

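The old stable/fluent-bit chart was deprecated along with the rest of the retired stable repository; the maintained Fluent Bit chart now lives in the fluent/helm-charts repository. For a Loki pipeline, though, Promtail was the more direct choice.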

The fix

Switch to Promtail, which is purpose-built for Loki and attaches Kubernetes metadata to every log line:

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

helm install promtail grafana/promtail \
  --namespace monitoring \
  --set "config.clients[0].url=http://loki:3100/loki/api/v1/push"

Promtail runs as a DaemonSet, picks up pod logs automatically via the Kubernetes API, and enriches every line with namespace, pod name, container name, and node name.

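💡 Lesson: For a Loki pipeline, Promtail is the shortest path: tight integration, and Kubernetes metadata enrichment works out of the box without extra configuration. Be aware that Grafana has since moved Promtail into long-term support in favour of Grafana Alloy, so plan a migration for anything long-lived.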

Break 5 — Falcosidekick Went Into CrashLoopBackOff

What happened

Falco installed cleanly. Falcosidekick — the component that routes Falco alerts to Slack — did not:

kubectl get pods -n falco

NAME                          READY   STATUS             RESTARTS   AGE
falco-abcd1                   1/1     Running            0          4m
falcosidekick-xyz99           0/1     CrashLoopBackOff   6          4m

kubectl logs falcosidekick-xyz99 -n falco

panic: runtime error: invalid memory address or nil pointer dereference
error: failed to load configuration:
SLACK_WEBHOOKURL is required when Slack output is enabled

The webhook URL wasn't being passed through correctly from Helm values. A nil pointer in config loading caused a crash rather than a clean validation error.

The fix

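Create the Slack webhook URL as a Kubernetes secret. The chart loads an existingSecret into the pod as environment variables, so the key must be the variable name Falcosidekick reads, SLACK_WEBHOOKURL:

kubectl create secret generic falcosidekick-secrets \
  --from-literal=SLACK_WEBHOOKURL="https://hooks.slack.com/services/YOUR/WEBHOOK/URL" \
  -n falco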
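Because Falcosidekick crashes instead of reporting a bad value, it can help to sanity-check the URL before it goes into the secret. A minimal shape check, not a reachability test; valid_slack_webhook is a hypothetical helper, and the only assumption is that Slack incoming webhooks start with https://hooks.slack.com/services/:

```shell
#!/bin/sh
# Hypothetical helper: succeed only if URL has the shape of a Slack incoming
# webhook (https://hooks.slack.com/services/<a>/<b>/<c>). It does not
# contact Slack, so a well-formed placeholder will still pass.
# valid_slack_webhook URL
valid_slack_webhook() {
  case "$1" in
    https://hooks.slack.com/services/?*/?*/?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage before creating the secret:
#   WEBHOOK="https://hooks.slack.com/services/..."   # your real URL
#   valid_slack_webhook "$WEBHOOK" || { echo "bad Slack webhook URL" >&2; exit 1; }
```

It catches the obvious mistakes, such as an empty variable or a bare placeholder like YOUR_WEBHOOK_URL, which is exactly the kind of value that leads to the crash above.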

Reference it in Helm values using existingSecret:

# falcosidekick-values.yaml
config:
  slack:
    webhookurl: ""
    minimumpriority: "notice"
  existingSecret: "falcosidekick-secrets"

helm upgrade --install falcosidekick falcosecurity/falcosidekick \
  --namespace falco \
  -f falcosidekick-values.yaml

Pod came up clean. First Slack alert arrived within 30 seconds.

💡 Lesson: Falcosidekick config errors crash rather than validate gracefully. Always put webhook URLs in a Kubernetes secret and reference existingSecret in Helm values — cleaner and avoids the nil pointer crash entirely.

Break 6 — Loki Refused to Start on a Two-Node Cluster

What happened

This was the most time-consuming of the six. By default the Loki Helm chart deploys a scalable topology (separate read, write and backend components) with persistent volumes, a gateway, and a caching layer:

kubectl get pods -n monitoring

NAME                  READY   STATUS             RESTARTS   AGE
loki-backend-0        0/1     Pending            0          8m
loki-read-0           0/1     Pending            0          8m
loki-write-0          0/1     Pending            0          8m
loki-gateway-xyz      0/1     CrashLoopBackOff   4          8m

The pending pods were waiting for PersistentVolumeClaims that could never bind, because the cluster had no StorageClass. The gateway then crash-looped because the components behind it never became ready: one missing prerequisite cascading through the whole stack.

The fix

SingleBinary deployment mode — Loki as a single process, no distributed components, no PVC required:

# loki-values.yaml
loki:
  commonConfig:
    replication_factor: 1
  storage:
    type: filesystem
  schemaConfig:
    configs:
      - from: "2024-01-01"
        store: tsdb
        object_store: filesystem
        schema: v13
        index:
          prefix: loki_index_
          period: 24h

deploymentMode: SingleBinary

singleBinary:
  replicas: 1
  persistence:
    enabled: false
  extraVolumes:
    - name: loki-data
      emptyDir: {}
  extraVolumeMounts:
    - name: loki-data
      mountPath: /var/loki

read:
  replicas: 0
write:
  replicas: 0
backend:
  replicas: 0
gateway:
  enabled: false
chunksCache:
  enabled: false
resultsCache:
  enabled: false
lokiCanary:
  enabled: false
test:
  enabled: false

helm upgrade --install loki grafana/loki \
  --namespace monitoring \
  -f loki-values.yaml

kubectl get pods -n monitoring
# NAME     READY   STATUS    RESTARTS   AGE
# loki-0   1/1     Running   0          45s
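A Running pod does not always mean Loki is ready to accept pushes yet. Loki exposes a GET /ready endpoint, so a small retry loop can gate the next install step (Promtail) on it. A sketch: retry is a hypothetical helper, and the port-forward assumes the chart's service is named loki, matching the loki:3100 URL used for Promtail in this post:

```shell
#!/bin/sh
# Hypothetical helper: run CMD up to N times, sleeping RETRY_DELAY seconds
# (default 2) between failed attempts. Returns 0 on the first success and
# 1 if every attempt fails.
# retry N CMD [ARGS...]
retry() {
  _retry_max=$1
  shift
  _retry_i=1
  while [ "$_retry_i" -le "$_retry_max" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$_retry_i" -lt "$_retry_max" ]; then
      sleep "${RETRY_DELAY:-2}"
    fi
    _retry_i=$((_retry_i + 1))
  done
  return 1
}

# Usage:
#   kubectl -n monitoring port-forward svc/loki 3100:3100 >/dev/null &
#   retry 30 curl -fsS http://localhost:3100/ready
```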

💡 Lesson: For Loki on small clusters (under 5 nodes, no storage class), deploymentMode: SingleBinary with an emptyDir volume is the correct starting point. The scalable defaults are built for production, not a two-node homelab cluster. Keep in mind that emptyDir data is deleted along with the pod, so stored logs do not survive a reschedule.

Full Working Install Order


# Step 1 — Node prerequisites (run on BOTH nodes)
apt-get update && apt-get install -y \
  conntrack socat ipset curl

# Step 2 — Container runtime + kubeadm/kubelet/kubectl
# (standard Ubuntu kubeadm installation docs)

# Step 3 — Find your private NIC name
ip addr show
# Note the interface name next to your 10.x.x.x address

# Step 4 — kubeadm init (control plane only)
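# kubeadm init has no --node-ip flag; pin the kubelet's node IP first
echo "KUBELET_EXTRA_ARGS=--node-ip=10.0.0.2" \
  >> /etc/default/kubelet
kubeadm init \
  --apiserver-advertise-address=10.0.0.2 \
  --pod-network-cidr=10.244.0.0/16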

# Step 5 — Flannel with explicit interface
# Download manifest, add --iface=enp7s0 to container args, apply
kubectl apply -f kube-flannel-enp7s0.yml

# Step 6 — Worker node prep (run on worker)
echo "KUBELET_EXTRA_ARGS=--node-ip=10.0.0.3" \
  >> /etc/default/kubelet
systemctl daemon-reload && systemctl restart kubelet

# Step 7 — Worker join (run on worker)
kubeadm join 10.0.0.2:6443 \
  --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash>

# Step 8 — Namespaces
kubectl create namespace monitoring
kubectl create namespace falco

# Step 9 — Falco secret first
# Key must be SLACK_WEBHOOKURL: the chart exposes existingSecret as env vars
kubectl create secret generic falcosidekick-secrets \
  --from-literal=SLACK_WEBHOOKURL="YOUR_WEBHOOK_URL" \
  -n falco

# Step 10 — Falco + Falcosidekick
helm repo add falcosecurity \
  https://falcosecurity.github.io/charts
helm install falco falcosecurity/falco \
  --namespace falco -f falco-values.yaml
helm install falcosidekick falcosecurity/falcosidekick \
  --namespace falco -f falcosidekick-values.yaml

# Step 11 — Loki SingleBinary
helm repo add grafana https://grafana.github.io/helm-charts
helm install loki grafana/loki \
  --namespace monitoring -f loki-values.yaml

# Step 12 — Promtail
helm install promtail grafana/promtail \
  --namespace monitoring \
  --set "config.clients[0].url=http://loki:3100/loki/api/v1/push"

# Step 13 — Grafana
helm install grafana grafana/grafana \
  --namespace monitoring \
  --set adminPassword=changeme

# Step 14 — Trivy Operator
helm repo add aquasecurity \
  https://aquasecurity.github.io/helm-charts/
helm install trivy-operator \
  aquasecurity/trivy-operator \
  --namespace monitoring \
  --set trivy.ignoreUnfixed=true

Summary

| Break | Root cause | Fix |
|---|---|---|
| Server types unavailable | Hetzner inventory varies by location | Check `hcloud server-type list` first |
| conntrack missing | Not in Hetzner's Ubuntu 22.04 image | Add to bootstrap script |
| Wrong NIC name | Private NIC was enp7s0, not eth1 | Run `ip addr show` before planning |
| fluent-bit deprecated | Old stable/ chart retired | Use Promtail instead |
| Falcosidekick crash | Nil pointer on missing secret | Use `existingSecret` in Helm values |
| Loki pending | Default scalable mode needs PVCs | Use `SingleBinary` + `emptyDir` |

What's Next

With the stack running, I moved straight into the attack simulation. The first post in this series covers Attack 1 — cryptominer deployment — and how Falco caught it in 47 seconds with three correlated alerts.

Attacks 2 through 4 (privileged container escape, service account token abuse, kubectl exec) follow in subsequent posts.

Full config files, patched Flannel manifest, and Helm values are in the repo: github.com/chrisazzo/k8s-soc-foundation

Building a DevSecOps portfolio targeting AI Security Architect contract work. Follow the series for the full attack simulation, hardening, and CKS build logs.

Next: Attack 1 — Cryptominer Deployment →
