Ian Packard for Octasoft Ltd

Posted on • Originally published at wsl-ui.octasoft.co.uk

From Zero to K3s - Bootstrap Scripts and Time Sync Nightmares

In Part 1, I explained why my homelab runs in a Hyper-V VM instead of WSL2. Now let's talk about how I actually bootstrap the cluster - and the time synchronisation issue that had me questioning my life choices.

The Goal: One Script to Rule Them All

I wanted a single bootstrap.sh that could take a fresh Ubuntu VM and produce a working Kubernetes cluster with:

  • K3s as the distribution
  • Cilium as the CNI (with Hubble for observability)
  • Gateway API CRDs installed
  • External Secrets Operator for secrets management
  • ArgoCD for GitOps

The script needed to be idempotent - safe to run multiple times. Because you will run it multiple times while debugging.

Phase-Based Installation

The bootstrap script runs in phases. Each phase completes fully before the next begins, and each phase can be re-run independently if needed.


This structure saved me countless hours. When something broke in Phase 4, I didn't have to start from scratch.
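
To make the phase idea concrete, here's a minimal sketch of a phase runner with partial re-run support. The `run_phase` function and the `START_PHASE` variable are illustrative, not lifted from my actual bootstrap.sh:

```shell
# Run one phase, or skip it if we're resuming from a later phase.
run_phase() {
    local num="$1" name="$2" fn="$3"
    # Allow e.g. START_PHASE=4 ./bootstrap.sh to skip completed phases
    if [ "$num" -lt "${START_PHASE:-0}" ]; then
        echo "Phase ${num}: ${name} (skipped)"
        return 0
    fi
    echo "Phase ${num}: ${name}"
    "$fn"
}

# Example wiring:
# run_phase 0 "System Prerequisites" setup_time_sync
# run_phase 2 "K3s Installation" install_k3s
```

Each phase is just a function, so "re-run Phase 4" is an environment variable away instead of a copy-paste session.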

Phase 0: The Time Sync Disaster

Let me tell you about the most frustrating bug I've encountered in this entire project.

Everything would work perfectly after a fresh install. Then I'd close my laptop, come back the next day, resume the VM, and chaos. Pods failing. Certificate errors everywhere. DNS not resolving. ArgoCD unable to sync.

The culprit? Time drift.

Hyper-V VMs don't maintain accurate time when suspended. When you resume a VM that's been asleep for hours, the VM's clock can be significantly off. And Kubernetes really doesn't like that:

  • TLS certificates appear expired (or not yet valid)
  • Tokens fail validation
  • Let's Encrypt challenges time out
  • Istio's mTLS goes haywire

The fix is chrony, configured aggressively for VM environments:

setup_time_sync() {
    sudo apt-get install -y chrony

    # Configure for VM environment with aggressive correction
    sudo tee /etc/chrony/chrony.conf > /dev/null <<EOF
server time.google.com iburst
server time.cloudflare.com iburst
server pool.ntp.org iburst

# Allow instant time correction for any offset up to 1 day
makestep 86400 -1

# Log any time changes larger than 0.5 seconds
logchange 0.5
EOF

    sudo systemctl restart chrony
}

The key is makestep 86400 -1. This tells chrony to immediately step the clock (rather than gradually adjusting) for any offset up to 86400 seconds (24 hours), with no limit on how many times it can do this.

After adding this, resume-from-suspend just works: the clock jumps forward, chrony notices and steps it immediately, and everything carries on. chronyc tracking confirms a healthy clock:

Reference ID    : A29FC801 (time.cloudflare.com)
Stratum         : 4
Ref time (UTC)  : Thu Jan 16 10:23:45 2026
System time     : 0.000000023 seconds fast of NTP time
Last offset     : +0.000000012 seconds
RMS offset      : 0.000000156 seconds
Frequency       : 1.234 ppm slow
Residual freq   : +0.000 ppm
Skew            : 0.012 ppm
Root delay      : 0.012345678 seconds
Root dispersion : 0.000123456 seconds
Update interval : 1024.0 seconds
Leap status     : Normal
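If you want the bootstrap to fail fast while the clock is still wrong, you can parse that same output. A small sketch, with function names of my own invention rather than anything from the script:

```shell
# Pull the numeric offset out of `chronyc tracking` output
# (the "System time : <offset> seconds ..." line)
parse_system_offset() {
    awk '/^System time/ {print $4}'
}

# Poll until the offset drops under a second, or give up
wait_for_time_sync() {
    for i in $(seq 1 12); do
        offset=$(chronyc tracking | parse_system_offset)
        # awk handles the decimal comparison; < 1s counts as synced
        if awk -v o="$offset" 'BEGIN {exit !(o + 0 < 1)}'; then
            echo "clock in sync (offset ${offset}s)"
            return 0
        fi
        sleep 5
    done
    echo "time sync verification failed" >&2
    return 1
}
```

Calling something like `wait_for_time_sync` at the end of Phase 0 turns a silent drift into an immediate, obvious failure.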

Phase 2: K3s Installation

K3s is delightfully simple to install, but needs specific flags for our setup:

curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="server \
  --bind-address=${VM_IP} \
  --advertise-address=${VM_IP} \
  --disable=traefik \
  --flannel-backend=none \
  --disable-network-policy \
  --cluster-cidr=10.42.0.0/16 \
  --service-cidr=10.43.0.0/16 \
  --write-kubeconfig-mode=644" sh -

Why these flags:

  • --bind-address / --advertise-address: Bind to VM IP, not localhost, so WSL2 can reach it
  • --disable=traefik: We're using Istio Gateway, not Traefik
  • --flannel-backend=none: Disables K3s's default CNI - we're using Cilium
  • --disable-network-policy: Cilium handles network policies
  • --write-kubeconfig-mode=644: Makes kubeconfig readable without sudo
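
For completeness, `${VM_IP}` doesn't have to be hardcoded - you can derive it from the routing table. A sketch (the helper name is mine, and it assumes the VM has a single default route):

```shell
# `ip -4 route get <dst>` prints "... src <IP> ..."; grab the field after "src"
primary_ip_from_route() {
    awk '{for (i = 1; i < NF; i++) if ($i == "src") { print $(i + 1); exit }}'
}

# Usage on the VM:
#   VM_IP=$(ip -4 route get 1.1.1.1 | primary_ip_from_route)
```

That way a second cluster on a different virtual switch picks up the right address without editing the script.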

The script also updates the kubeconfig to use the VM's IP:

kubectl config set-cluster default --server=https://${VM_IP}:6443


Phase 3: Cilium Bootstrap

With K3s running but no CNI, pods are stuck in Pending. Time to install Cilium:

cilium install --version "1.18.1" \
    --set cluster.name="homelab" \
    --set cluster.id="1" \
    --set cni.exclusive=false \
    --set hubble.enabled=true \
    --set hubble.relay.enabled=true \
    --set hubble.ui.enabled=true

Setting cni.exclusive=false is important - it allows CNI chaining, which we need later when Istio's CNI joins the party.

After installation, the script waits for DNS to actually work:

verify_cilium_functionality() {
    # Wait for CoreDNS pods to report Ready
    kubectl -n kube-system wait --for=condition=ready pod -l k8s-app=kube-dns --timeout=120s

    # Test actual DNS resolution. Use -i without -t (there's no TTY in a
    # script), and a unique pod name per attempt so a leftover pod from a
    # failed run can't collide with the next one.
    for i in {1..30}; do
        if kubectl run "dns-test-$i" --image=busybox:1.36 --rm -i --restart=Never \
            -- nslookup kubernetes.default.svc.cluster.local; then
            echo "DNS is working"
            return 0
        fi
        sleep 10
    done
    echo "DNS verification failed"
    return 1
}

This caught so many race conditions. CoreDNS pods can be "Ready" but not actually resolving queries yet.

Hubble UI showing network flows between pods

Phase 4: CRDs and External Secrets

Before ArgoCD can deploy anything, we need:

Gateway API CRDs:

kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.1.0/standard-install.yaml

External Secrets Operator (for pulling secrets from Infisical):

helm install external-secrets external-secrets/external-secrets \
    -n external-secrets --create-namespace \
    --set resources.requests.cpu=10m \
    --set resources.requests.memory=32Mi

I use Infisical as my secrets backend. The External Secrets Operator syncs secrets into Kubernetes automatically. No more committing secrets to git or manually creating them.
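
What that sync looks like in practice: an ExternalSecret resource points at a secret store and names the remote key, and the operator materialises a Kubernetes Secret from it. A hypothetical example - the store name, namespace, and key are placeholders, not my actual manifests:

```shell
kubectl apply -f - <<'EOF'
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: demo-credentials
  namespace: default
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: infisical          # a store configured with Infisical credentials
  target:
    name: demo-credentials   # the Kubernetes Secret that gets created
  data:
    - secretKey: password
      remoteRef:
        key: DEMO_PASSWORD   # the key as it exists in the backend
EOF
```

Rotate the value in the backend and the in-cluster Secret follows on the next refresh interval.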

Phase 5: ArgoCD

Finally, ArgoCD:

helm install argocd argo/argo-cd \
    -n argocd --create-namespace \
    --set server.service.type=ClusterIP \
    --set configs.secret.argocdServerAdminPassword="$BCRYPT_PASSWORD" \
    --set controller.args.appResyncPeriod=60

The admin password comes from Infisical, fetched at the start of the bootstrap. ArgoCD stores the admin password as a bcrypt hash, so the plain value gets hashed before it's passed to the chart (htpasswd comes from apache2-utils):

ARGOCD_PASSWORD=$(infisical secrets get ARGOCD_ADMIN_PASSWORD \
    --env=dev --projectId="${PROJECT_ID}" --plain)
BCRYPT_PASSWORD=$(htpasswd -nbBC 10 "" "$ARGOCD_PASSWORD" | tr -d ':\n')

The Full Flow

Here's what running the bootstrap looks like:

$ ./bootstrap.sh

Phase 0: System Prerequisites
  ✓ Installing chrony for time synchronisation
  ✓ Time sync configured and verified

Phase 1: Validation & Setup
  ✓ Infisical credentials validated
  ✓ ArgoCD password retrieved
  ✓ VM IP detected: 192.168.100.2
  ✓ External connectivity verified

Phase 2: K3s Installation
  ✓ Cleaning up any existing K3s installation
  ✓ Installing K3s (no CNI)
  ✓ Kubeconfig configured for external access

Phase 3: Cilium CNI
  ✓ Installing Cilium v1.18.1
  ✓ Waiting for Cilium to be ready
  ✓ DNS resolution verified

Phase 4: CRDs & Operators
  ✓ Gateway API CRDs installed
  ✓ External Secrets Operator deployed

Phase 5: ArgoCD
  ✓ ArgoCD installed
  ✓ ArgoCD server ready

Bootstrap complete!

Configuration Management

All the cluster-specific values live in a config.env file:

CLUSTER_NAME="homelab"
CLUSTER_ID="1"
CLUSTER_CIDR="10.42.0.0/16"
SERVICE_CIDR="10.43.0.0/16"
CILIUM_VERSION="1.18.1"

The bootstrap script sources this and uses the values throughout. Makes it easy to spin up a second cluster with different settings.
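
Sourcing with fallback defaults keeps the script usable even when the file is missing. A sketch - the defaults mirror the config above, but the `load_config` helper itself is illustrative:

```shell
# Source a config file if present, then fill in any missing values
load_config() {
    local file="${1:-config.env}"
    if [ -f "$file" ]; then
        . "$file"
    fi
    : "${CLUSTER_NAME:=homelab}"
    : "${CLUSTER_ID:=1}"
    : "${CILIUM_VERSION:=1.18.1}"
}

# Usage at the top of bootstrap.sh:
#   load_config config.env
```

The `: "${VAR:=default}"` idiom assigns only when the variable is unset, so the file always wins over the defaults.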

Lessons Learned

  1. Time sync is critical - Add it to Phase 0 and never think about it again
  2. Phase-based scripts save sanity - Isolate failures, enable partial re-runs
  3. DNS verification is not optional - Don't assume CoreDNS is ready just because pods are running
  4. Bind to real IPs - Localhost doesn't cut it when you're accessing from WSL2
  5. Idempotency matters - You will run the script many times
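
Lesson 5 mostly comes down to guarding every step. One way to sketch the pattern (an illustrative helper, not verbatim from the script):

```shell
# Run an install step only when its binary isn't already on PATH
ensure_installed() {
    local cmd="$1"; shift
    if command -v "$cmd" >/dev/null 2>&1; then
        echo "${cmd} already present, skipping"
    else
        "$@"    # run the provided installer command
    fi
}

# Example: ensure_installed cilium install_cilium_cli
```

Wrap every "install X" step in a check like this and re-running the script becomes boring, which is exactly what you want.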

What's Next

The cluster is up, but it's empty. In Part 3, I'll cover how ArgoCD and the app-of-apps pattern deploy everything else - Istio, cert-manager, monitoring, and all the applications.


This is Part 2 of a 4-part series on building a homelab Kubernetes setup on Windows.


Originally published at https://wsl-ui.octasoft.co.uk/blog/homelab-part-2-bootstrap
