Boriss V

Posted on Mar 25

Hybrid k8s cluster | Talos & Kubespan | Kilo wireguard

#kubernetes #networking #cloud #proxmox

I was interested in trying out a hybrid Kubernetes setup and exploring some use cases.

This setup involves creating three control plane (master) nodes in the cloud (AWS) and booting Talos worker nodes on-premises, then connecting them to the master nodes in the cloud.

For the on-premises worker nodes, you can use any solution that fits your needs, from a hypervisor to local QEMU virtual machines. In this guide, I use Proxmox because it makes things easier for me.

Repository: https://github.com/kubebn/aws-talos-terraform-hybrid

AWS master nodes provisioning

Set the variables in vars/dev.tfvars
Run Terraform:

terraform apply -var-file=vars/dev.tfvars -auto-approve

Output:

local_file.kubeconfig: Creating...
local_file.kubeconfig: Creation complete after 0s [id=d3443f0dfed1dbbf0e71f99dfbf0684dc1ca8b95]

Apply complete! Resources: 25 added, 0 changed, 0 destroyed.

Outputs:

control_plane_private_ips = tolist([
  "192.168.1.135",
  "192.168.2.122",
  "192.168.0.157",
])
control_plane_public_ips = tolist([
  "18.184.164.166",
  "3.122.238.249",
  "3.77.57.73",
])
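The IP lists in these outputs are handy later, because talosctl accepts a comma-separated list for its --nodes (-n) flag. Here is a minimal sketch, assuming that `terraform output -json <name>` prints a flat JSON array of strings for these outputs. The helper name json_list_to_csv is made up for illustration and is not part of the repository:

```shell
# Hypothetical helper: turn a flat JSON array of strings read from stdin,
# e.g. ["192.168.1.135","192.168.2.122"], into a comma-separated list.
# This is a plain text transformation, not a JSON parser, so it only suits
# simple values such as IP addresses.
json_list_to_csv() {
  tr -d '][" \n\t'
  echo
}

# Usage (run from the Terraform folder, once talosctl is installed):
#   NODES=$(terraform output -json control_plane_private_ips | json_list_to_csv)
#   talosctl version --nodes "$NODES"
```

If jq is available, it is the more robust choice for anything beyond a flat list of addresses.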

Install talosctl and kubectl:

curl -sL https://talos.dev/install | sh
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl

Point talosctl and kubectl at the generated talosconfig and kubeconfig files. Both are written to the folder where the terraform apply command was executed.

export TALOSCONFIG="${PWD}/talosconfig"
export KUBECONFIG="${PWD}/kubeconfig"

After that, you will see that the master nodes are ready, Kubespan is up, and Cilium is fully installed:

kubectl get node -o wide
NAME                 STATUS   ROLES           AGE     VERSION   INTERNAL-IP      EXTERNAL-IP        OS-IMAGE         KERNEL-VERSION   CONTAINER-RUNTIME
aws-controlplane-1   Ready    control-plane   2m25s   v1.32.3   192.168.1.135    18.184.164.166     Talos (v1.9.5)   6.12.18-talos    containerd://2.0.3
aws-controlplane-2   Ready    control-plane   2m36s   v1.32.3   192.168.2.122    3.122.238.249      Talos (v1.9.5)   6.12.18-talos    containerd://2.0.3
aws-controlplane-3   Ready    control-plane   2m24s   v1.32.3   192.168.0.157    3.77.57.73         Talos (v1.9.5)   6.12.18-talos    containerd://2.0.3

kubectl get po -A
NAMESPACE     NAME                                              READY   STATUS      RESTARTS        AGE
kube-system   cilium-599jz                                      1/1     Running     0               80s
kube-system   cilium-5j6wl                                      1/1     Running     0               80s
kube-system   cilium-fkkwv                                      1/1     Running     0               80s
kube-system   cilium-install-tkfrf                              0/1     Completed   0               111s
kube-system   cilium-operator-657bdd678b-lxblc                  1/1     Running     0               80s
kube-system   coredns-578d4f8ffc-5lqfm                          1/1     Running     0               111s
kube-system   coredns-578d4f8ffc-n4hwz                          1/1     Running     0               111s
kube-system   kube-apiserver-aws-controlplane-1                 1/1     Running     0               79s
kube-system   kube-apiserver-aws-controlplane-2                 1/1     Running     0               107s
kube-system   kube-apiserver-aws-controlplane-3                 1/1     Running     0               80s
kube-system   kube-controller-manager-aws-controlplane-1        1/1     Running     2 (2m11s ago)   79s
kube-system   kube-controller-manager-aws-controlplane-2        1/1     Running     0               107s
kube-system   kube-controller-manager-aws-controlplane-3        1/1     Running     0               80s
kube-system   kube-scheduler-aws-controlplane-1                 1/1     Running     2 (2m11s ago)   79s
kube-system   kube-scheduler-aws-controlplane-2                 1/1     Running     0               107s
kube-system   kube-scheduler-aws-controlplane-3                 1/1     Running     0               80s
kube-system   talos-cloud-controller-manager-599fddb46d-9mmdk   1/1     Running     0               111s

Important Notes for Talos Machine Configuration on Master Nodes

We filter the AWS private VPC subnet out of the Kubespan endpoints, because the on-premises workers cannot reach those addresses anyway.

        kubespan:
            enabled: true
            filters:
                endpoints:
                - 0.0.0.0/0
                - '!${vpc_subnet}'
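To make the intent of this filter list concrete, here is a small sketch in plain shell. It assumes a simplified rule (an address is advertised if it matches at least one plain CIDR and no CIDR prefixed with "!"), handles IPv4 only, and uses 192.168.0.0/16 as a stand-in for ${vpc_subnet}. It illustrates the configuration above and is not how Talos implements the filter; all function names are hypothetical.

```shell
# Convert a dotted IPv4 address to an integer.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Succeed if IPv4 address $1 lies inside CIDR $2.
ip_in_cidr() (
  bits=${2#*/}
  [ "$bits" -eq 0 ] && exit 0
  mask=$(( (4294967295 << (32 - bits)) & 4294967295 ))
  [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "${2%/*}") & mask )) ]
)

# Succeed if address $1 passes the filters given as the remaining arguments
# (assumed rule: at least one plain match, no negated match).
endpoint_allowed() (
  ip=$1
  shift
  allowed=1
  for f in "$@"; do
    case $f in
      '!'*) ip_in_cidr "$ip" "${f#?}" && exit 1 ;;
      *)    ip_in_cidr "$ip" "$f" && allowed=0 ;;
    esac
  done
  exit "$allowed"
)

# Usage, with 192.168.0.0/16 standing in for the VPC subnet:
#   endpoint_allowed 18.184.164.166 0.0.0.0/0 '!192.168.0.0/16'  # succeeds
#   endpoint_allowed 192.168.1.135  0.0.0.0/0 '!192.168.0.0/16'  # fails
```

Under that assumption, only the public addresses of the control plane nodes are offered as WireGuard endpoints, which matches the peer endpoints shown later in the Kubespan status output.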

However, both kubelet and etcd are still configured to use the internal VPC subnet.

    kubelet:
        nodeIP:
          validSubnets:
            - ${vpc_subnet}
    etcd:
        extraArgs:
          election-timeout: "5000"
          heartbeat-interval: "1000"
        advertisedSubnets:
          - ${vpc_subnet}

We added kubelet extraArgs to enable server certificate rotation, and Talos CCM approves the resulting certificate signing requests. You can find more details here.

    kubelet:
        defaultRuntimeSeccompProfileEnabled: true
        registerWithFQDN: true
        extraArgs:
          cloud-provider: external
          rotate-server-certificates: true
...
    externalCloudProvider:
      enabled: true
      manifests:
        - https://raw.githubusercontent.com/siderolabs/talos-cloud-controller-manager/main/docs/deploy/cloud-controller-manager.yml

DHCP/TFTP configuration in Proxmox

We have set up networking for virtual machines and LXC containers:

LAN - 10.1.1.0/24
Proxmox node - 10.1.1.1
DHCP/TFTP LXC container - 10.1.1.2

Steps for the LXC container:

Install curl and docker
Download vmlinuz and initramfs.xz from the Talos release repository
Copy the matchbox contents into the LXC container

pct create 1000 local:vztmpl/ubuntu-24.10-standard_24.10-1_amd64.tar.zst \
  --hostname net-lxc \
  --net0 name=eth0,bridge=vmbr0,ip=10.1.1.2/24,gw=10.1.1.1 \
  --nameserver 1.1.1.1 \
  --features keyctl=1,nesting=1 \
  --storage local-lvm \
  --rootfs local:8 \
  --ssh-public-keys .ssh/id_rsa.pub \
  --unprivileged=true

pct start 1000

ssh root@10.1.1.2

apt update && apt install curl -y
curl -fsSL https://get.docker.com | sh


curl -L https://github.com/siderolabs/talos/releases/download/v1.9.5/initramfs-amd64.xz -o initramfs-amd64.xz
curl -L https://github.com/siderolabs/talos/releases/download/v1.9.5/vmlinuz-amd64 -o vmlinuz-amd64

Configure the DHCP range, gateway, and matchbox endpoint IP address in the matchbox/docker-compose.yaml file.
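The actual settings live in the repository's matchbox/docker-compose.yaml, which is not reproduced here. As a rough illustration of what the three values control, the sketch below prints the kind of dnsmasq arguments commonly used to chainload iPXE clients to matchbox. The helper name, the example range, the 12h lease time, and port 8080 are all assumptions, so check them against the real compose file:

```shell
# Hypothetical helper: print dnsmasq arguments for PXE booting via matchbox.
#   $1/$2 = first and last address of the DHCP range
#   $3    = default gateway handed out to clients (DHCP option 3)
#   $4    = IP address where matchbox listens (port 8080 assumed)
dnsmasq_args() {
  cat <<EOF
--dhcp-range=$1,$2,12h
--dhcp-option=3,$3
--enable-tftp
--tftp-root=/var/lib/tftpboot
--dhcp-userclass=set:ipxe,iPXE
--dhcp-boot=tag:#ipxe,undionly.kpxe
--dhcp-boot=tag:ipxe,http://$4:8080/boot.ipxe
EOF
}

# Usage with the network layout from this guide (range is an example):
#   dnsmasq_args 10.1.1.10 10.1.1.50 10.1.1.1 10.1.1.2
```

The two --dhcp-boot lines express the usual two-stage boot: plain PXE clients first fetch an iPXE binary over TFTP, and clients that already run iPXE are sent to the matchbox HTTP endpoint.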

Start DHCP/TFTP server via docker compose:

docker compose up -d
---

docker ps
CONTAINER ID   IMAGE                                               COMMAND                  CREATED       STATUS       PORTS     NAMES
9f04e46194bf   quay.io/poseidon/dnsmasq:v0.5.0-32-g4327d60-amd64   "/usr/sbin/dnsmasq -…"   2 hours ago   Up 2 hours             dnsmasq
5db45718aa0a   root-matchbox                                       "/matchbox -address=…"   2 hours ago   Up 2 hours             matchbox

Connecting on-premises workers

The Talos machine configuration for workers is generated by Terraform (look for worker.yaml). We are going to apply it to each worker in Proxmox.

Create VMs

for id in {1001..1003}; do
  qm create $id --name vm$id --memory 12088 --cores 3 --net0 virtio,bridge=vmbr0 --ostype l26 --scsihw virtio-scsi-pci --sata0 lvm1:32 --cpu host && qm start $id
done

Scan for Talos API open ports and extract IP addresses

WORKER_IPS=$(nmap -Pn -n -p 50000 10.1.1.0/24 -vv | grep 'Discovered' | awk '{print $6}')

Apply configuration to each discovered IP

echo "$WORKER_IPS" | while read -r WORKER_IP; do
  talosctl apply-config --insecure --nodes "$WORKER_IP" --file worker.yaml
done
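The discovery step depends on the wording of nmap's verbose messages. A slightly more robust variant, sketched here as an alternative, parses nmap's greppable output (-oG) instead. The helper name talos_maintenance_ips is hypothetical:

```shell
# Hypothetical helper: read nmap greppable output (-oG -) on stdin and print
# the address of every host that has TCP port 50000 (Talos API) open.
talos_maintenance_ips() {
  awk '$1 == "Host:" && / 50000\/open\/tcp/ { print $2 }'
}

# Usage:
#   nmap -Pn -n -p 50000 --open -oG - 10.1.1.0/24 | talos_maintenance_ips |
#     while read -r ip; do
#       talosctl apply-config --insecure --nodes "$ip" --file worker.yaml
#     done
```

Note that this finds every node answering on port 50000 in the subnet, so on a shared network it is worth reviewing the list before applying the configuration.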

Let’s take a look at the result

kubectl get node -o wide
NAME                 STATUS   ROLES           AGE     VERSION   INTERNAL-IP      EXTERNAL-IP        OS-IMAGE         KERNEL-VERSION   CONTAINER-RUNTIME
aws-controlplane-1   Ready    control-plane   2m25s   v1.32.3   192.168.1.135    18.184.164.166     Talos (v1.9.5)   6.12.18-talos    containerd://2.0.3
aws-controlplane-2   Ready    control-plane   2m36s   v1.32.3   192.168.2.122    3.122.238.249      Talos (v1.9.5)   6.12.18-talos    containerd://2.0.3
aws-controlplane-3   Ready    control-plane   2m24s   v1.32.3   192.168.0.157    3.77.57.73         Talos (v1.9.5)   6.12.18-talos    containerd://2.0.3
talos-9mf-ujc        Ready    <none>          29s     v1.32.3   10.1.1.24        <none>             Talos (v1.9.5)   6.12.18-talos    containerd://2.0.3
talos-lv8-bc7        Ready    <none>          29s     v1.32.3   10.1.1.10        <none>             Talos (v1.9.5)   6.12.18-talos    containerd://2.0.3
talos-ohr-1c3        Ready    <none>          29s     v1.32.3   10.1.1.23        <none>             Talos (v1.9.5)   6.12.18-talos    containerd://2.0.3

talosctl get kubespanpeerstatuses -n 192.168.1.135,192.168.2.122,192.168.0.157,10.1.1.24,10.1.1.19,10.1.1.12
NODE            NAMESPACE   TYPE                 ID                                             VERSION   LABEL                ENDPOINT                STATE   RX         TX
192.168.1.135   kubespan    KubeSpanPeerStatus   Hh2ldeBX7kuct6Bynehjdgo6xkcOlZ4yUWWQVBqHLWQ=   11        talos-fgw-qpv        proxmo-publicip:53639   up      84312      69860
192.168.1.135   kubespan    KubeSpanPeerStatus   If8ZqCQ1jp0mFV8igLGpfXpYycQSVBTSlT88YUr7eEM=   11        talos-4re-kxo        proxmo-publicip:53923   up      82160      99412
192.168.1.135   kubespan    KubeSpanPeerStatus   iKXIODHF3Tx2b4JsA433j8ey9+CWTKrx5To+4lsofTg=   11        talos-7ye-8j0        proxmo-publicip:51820   up      43608      58280
192.168.1.135   kubespan    KubeSpanPeerStatus   oxU0e9yTFvN+lCGcIO4s13erkWjtIKrVzb8dX+GYLxE=   26        aws-controlplane-3   3.77.57.73:51820        up      7484600    16097980
192.168.1.135   kubespan    KubeSpanPeerStatus   vAHCI1pwTbaP/LHwO0MnCGELXsEstQahS0o9WkdVK0g=   26        aws-controlplane-2   3.122.238.249:51820     up      6742536    15472144
192.168.2.122   kubespan    KubeSpanPeerStatus   Hh2ldeBX7kuct6Bynehjdgo6xkcOlZ4yUWWQVBqHLWQ=   11        talos-fgw-qpv        proxmo-publicip:53639   up      178960     354776
192.168.2.122   kubespan    KubeSpanPeerStatus   If8ZqCQ1jp0mFV8igLGpfXpYycQSVBTSlT88YUr7eEM=   11        talos-4re-kxo        proxmo-publicip:53923   up      232064     897928
192.168.2.122   kubespan    KubeSpanPeerStatus   iKXIODHF3Tx2b4JsA433j8ey9+CWTKrx5To+4lsofTg=   11        talos-7ye-8j0        proxmo-publicip:51820   up      92856      254520
192.168.2.122   kubespan    KubeSpanPeerStatus   oxU0e9yTFvN+lCGcIO4s13erkWjtIKrVzb8dX+GYLxE=   23        aws-controlplane-3   3.77.57.73:51820        up      1557896    1954328
192.168.2.122   kubespan    KubeSpanPeerStatus   uzi7NCL64o+ILeyqa6/Pq0UWdcVyfjWulZQB+a2Av30=   27        aws-controlplane-1   18.184.164.166:51820    up      15506236   6773440
192.168.0.157   kubespan    KubeSpanPeerStatus   Hh2ldeBX7kuct6Bynehjdgo6xkcOlZ4yUWWQVBqHLWQ=   11        talos-fgw-qpv        proxmo-publicip:53639   up      261464     916524
192.168.0.157   kubespan    KubeSpanPeerStatus   If8ZqCQ1jp0mFV8igLGpfXpYycQSVBTSlT88YUr7eEM=   11        talos-4re-kxo        proxmo-publicip:53923   up      141868     277180
192.168.0.157   kubespan    KubeSpanPeerStatus   iKXIODHF3Tx2b4JsA433j8ey9+CWTKrx5To+4lsofTg=   11        talos-7ye-8j0        proxmo-publicip:51820   up      172996     830504
192.168.0.157   kubespan    KubeSpanPeerStatus   uzi7NCL64o+ILeyqa6/Pq0UWdcVyfjWulZQB+a2Av30=   25        aws-controlplane-1   18.184.164.166:51820    up      16126096   7507456
192.168.0.157   kubespan    KubeSpanPeerStatus   vAHCI1pwTbaP/LHwO0MnCGELXsEstQahS0o9WkdVK0g=   24        aws-controlplane-2   3.122.238.249:51820     up      1954180    1557896
10.1.1.24       kubespan    KubeSpanPeerStatus   Hh2ldeBX7kuct6Bynehjdgo6xkcOlZ4yUWWQVBqHLWQ=   10        talos-fgw-qpv        10.1.1.12:51820         up      12204      11960
10.1.1.24       kubespan    KubeSpanPeerStatus   iKXIODHF3Tx2b4JsA433j8ey9+CWTKrx5To+4lsofTg=   10        talos-7ye-8j0        10.1.1.19:51820         up      16316      15624
10.1.1.24       kubespan    KubeSpanPeerStatus   oxU0e9yTFvN+lCGcIO4s13erkWjtIKrVzb8dX+GYLxE=   10        aws-controlplane-3   3.77.57.73:51820        up      278988     143592
10.1.1.24       kubespan    KubeSpanPeerStatus   uzi7NCL64o+ILeyqa6/Pq0UWdcVyfjWulZQB+a2Av30=   10        aws-controlplane-1   18.184.164.166:51820    up      100932     84220
10.1.1.24       kubespan    KubeSpanPeerStatus   vAHCI1pwTbaP/LHwO0MnCGELXsEstQahS0o9WkdVK0g=   10        aws-controlplane-2   3.122.238.249:51820     up      897880     231472
10.1.1.19       kubespan    KubeSpanPeerStatus   Hh2ldeBX7kuct6Bynehjdgo6xkcOlZ4yUWWQVBqHLWQ=   10        talos-fgw-qpv        10.1.1.12:51820         up      12692      12732
10.1.1.19       kubespan    KubeSpanPeerStatus   If8ZqCQ1jp0mFV8igLGpfXpYycQSVBTSlT88YUr7eEM=   10        talos-4re-kxo        10.1.1.24:51820         up      15476      16316
10.1.1.19       kubespan    KubeSpanPeerStatus   oxU0e9yTFvN+lCGcIO4s13erkWjtIKrVzb8dX+GYLxE=   10        aws-controlplane-3   3.77.57.73:51820        up      832688     176132
10.1.1.19       kubespan    KubeSpanPeerStatus   uzi7NCL64o+ILeyqa6/Pq0UWdcVyfjWulZQB+a2Av30=   10        aws-controlplane-1   18.184.164.166:51820    up      59680      45224
10.1.1.19       kubespan    KubeSpanPeerStatus   vAHCI1pwTbaP/LHwO0MnCGELXsEstQahS0o9WkdVK0g=   10        aws-controlplane-2   3.122.238.249:51820     up      255472     94296
10.1.1.12       kubespan    KubeSpanPeerStatus   If8ZqCQ1jp0mFV8igLGpfXpYycQSVBTSlT88YUr7eEM=   10        talos-4re-kxo        10.1.1.24:51820         up      11812      12204
10.1.1.12       kubespan    KubeSpanPeerStatus   iKXIODHF3Tx2b4JsA433j8ey9+CWTKrx5To+4lsofTg=   10        talos-7ye-8j0        10.1.1.19:51820         up      12732      12840
10.1.1.12       kubespan    KubeSpanPeerStatus   oxU0e9yTFvN+lCGcIO4s13erkWjtIKrVzb8dX+GYLxE=   10        aws-controlplane-3   3.77.57.73:51820        up      920660     265640
10.1.1.12       kubespan    KubeSpanPeerStatus   uzi7NCL64o+ILeyqa6/Pq0UWdcVyfjWulZQB+a2Av30=   10        aws-controlplane-1   18.184.164.166:51820    up      67776      82092
10.1.1.12       kubespan    KubeSpanPeerStatus   vAHCI1pwTbaP/LHwO0MnCGELXsEstQahS0o9WkdVK0g=   10        aws-controlplane-2   3.122.238.249:51820     up      355552     180240
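With this many rows, a peer that is not up is easy to miss. The following sketch filters the table for such peers; it assumes the column layout shown above (STATE in the eighth column), and the helper name is made up:

```shell
# Hypothetical helper: read `talosctl get kubespanpeerstatuses` output on
# stdin, print every peer whose STATE is not "up", and return non-zero if
# at least one such peer was found.
kubespan_peers_not_up() {
  awk '$1 != "NODE" && NF >= 8 && $8 != "up" {
         print $1 " -> " $6 " (" $8 ")"
         bad = 1
       }
       END { exit bad + 0 }'
}

# Usage:
#   talosctl get kubespanpeerstatuses -n 192.168.1.135,10.1.1.24 | kubespan_peers_not_up \
#     && echo "all peers up"
```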

Let’s check if the Kubernetes networking is functioning correctly:

kubectl create ns network-test
kubectl label ns network-test pod-security.kubernetes.io/enforce=privileged
kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/refs/heads/main/examples/kubernetes/connectivity-check/connectivity-check.yaml -n network-test

kubectl get po -n network-test
NAME                                                     READY   STATUS    RESTARTS   AGE
echo-a-54dcdd77c-6wjnw                                   1/1     Running   0          62s
echo-b-549fdb8f8c-j4sw7                                  1/1     Running   0          61s
echo-b-host-7cfdb688b7-ppz9f                             1/1     Running   0          61s
host-to-b-multi-node-clusterip-c54bf67bf-hhm5h           1/1     Running   0          60s
host-to-b-multi-node-headless-55f66fc4c7-f8fc4           1/1     Running   0          60s
pod-to-a-5f56dc8c9b-kk6c2                                1/1     Running   0          61s
pod-to-a-allowed-cnp-5dc859fd98-pvxzj                    1/1     Running   0          61s
pod-to-a-denied-cnp-68976d7584-wm52m                     1/1     Running   0          61s
pod-to-b-intra-node-nodeport-5884978697-c5rs2            1/1     Running   0          60s
pod-to-b-multi-node-clusterip-7d65578cf5-2jh97           1/1     Running   0          61s
pod-to-b-multi-node-headless-8557d86d6f-shvzx            1/1     Running   0          61s
pod-to-b-multi-node-nodeport-7847b5df8f-9kg89            1/1     Running   0          60s
pod-to-external-1111-797c647566-666l4                    1/1     Running   0          61s
pod-to-external-fqdn-allow-google-cnp-5688c867dd-dkvpk   0/1     Running   0          61s # can be ignored
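Rather than reading the READY column by eye, the check can be scripted. This is a minimal sketch that assumes the default `kubectl get po` column layout; the helper name pods_not_ready is hypothetical:

```shell
# Hypothetical helper: read `kubectl get po` output on stdin and print the
# names of pods whose READY column (ready/total) does not match.
pods_not_ready() {
  awk '$1 != "NAME" { split($2, r, "/"); if (r[1] != r[2]) print $1 }'
}

# Usage:
#   kubectl get po -n network-test | pods_not_ready
```

For the output above it lists only the pod-to-external-fqdn-allow-google-cnp pod, which is the one marked as safe to ignore.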

Kubernetes Network Benchmark

We will use k8s-bench-suite to benchmark networking performance between the nodes.

From the control plane to worker:

kubectl taint nodes aws-controlplane-1 node-role.kubernetes.io/control-plane:NoSchedule-

knb --verbose --client-node aws-controlplane-1 --server-node talos-4re-kxo

=========================================================
 Benchmark Results
=========================================================
 Name            : knb-3420914
 Date            : 2025-03-18 04:14:12 UTC
 Generator       : knb
 Version         : 1.5.0
 Server          : talos-4re-kxo
 Client          : aws-controlplane-1
 UDP Socket size : auto
=========================================================
  Discovered CPU         : Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz
  Discovered Kernel      : 6.12.18-talos
  Discovered k8s version :
  Discovered MTU         : 1420
  Idle :
    bandwidth = 0 Mbit/s
    client cpu = total 13.45% (user 3.81%, nice 0.00%, system 2.92%, iowait 0.53%, steal 6.19%)
    server cpu = total 3.28% (user 1.75%, nice 0.00%, system 1.53%, iowait 0.00%, steal 0.00%)
    client ram = 985 MB
    server ram = 673 MB
  Pod to pod :
    TCP :
      bandwidth = 887 Mbit/s
      client cpu = total 75.29% (user 4.07%, nice 0.00%, system 57.20%, iowait 0.29%, steal 13.73%)
      server cpu = total 36.16% (user 2.06%, nice 0.00%, system 34.10%, iowait 0.00%, steal 0.00%)
      client ram = 995 MB
      server ram = 697 MB
    UDP :
      bandwidth = 350 Mbit/s
      client cpu = total 79.95% (user 6.68%, nice 0.00%, system 54.20%, iowait 0.22%, steal 18.85%)
      server cpu = total 20.95% (user 2.61%, nice 0.00%, system 18.34%, iowait 0.00%, steal 0.00%)
      client ram = 1000 MB
      server ram = 653 MB
  Pod to Service :
    TCP :
      bandwidth = 1063 Mbit/s
      client cpu = total 80.58% (user 3.89%, nice 0.00%, system 68.36%, iowait 0.05%, steal 8.28%)
      server cpu = total 42.44% (user 2.26%, nice 0.00%, system 40.18%, iowait 0.00%, steal 0.00%)
      client ram = 1008 MB
      server ram = 696 MB
    UDP :
      bandwidth = 322 Mbit/s
      client cpu = total 78.57% (user 6.24%, nice 0.00%, system 57.02%, iowait 0.18%, steal 15.13%)
      server cpu = total 21.02% (user 2.54%, nice 0.00%, system 18.48%, iowait 0.00%, steal 0.00%)
      client ram = 995 MB
      server ram = 668 MB
=========================================================

From worker to worker locally:

knb --verbose --client-node talos-7ye-8j0 --server-node talos-4re-kxo

=========================================================
 Benchmark Results
=========================================================
 Name            : knb-3423407
 Date            : 2025-03-18 04:16:29 UTC
 Generator       : knb
 Version         : 1.5.0
 Server          : talos-4re-kxo
 Client          : talos-7ye-8j0
 UDP Socket size : auto
=========================================================
  Discovered CPU         : Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz
  Discovered Kernel      : 6.12.18-talos
  Discovered k8s version :
  Discovered MTU         : 1420
  Idle :
    bandwidth = 0 Mbit/s
    client cpu = total 2.97% (user 1.50%, nice 0.00%, system 1.47%, iowait 0.00%, steal 0.00%)
    server cpu = total 3.34% (user 1.72%, nice 0.00%, system 1.62%, iowait 0.00%, steal 0.00%)
    client ram = 552 MB
    server ram = 688 MB
  Pod to pod :
    TCP :
      bandwidth = 1868 Mbit/s
      client cpu = total 61.44% (user 2.59%, nice 0.00%, system 58.82%, iowait 0.03%, steal 0.00%)
      server cpu = total 70.40% (user 2.82%, nice 0.00%, system 67.58%, iowait 0.00%, steal 0.00%)
      client ram = 546 MB
      server ram = 801 MB
    UDP :
      bandwidth = 912 Mbit/s
      client cpu = total 64.14% (user 3.75%, nice 0.06%, system 60.30%, iowait 0.03%, steal 0.00%)
      server cpu = total 52.89% (user 4.29%, nice 0.00%, system 48.60%, iowait 0.00%, steal 0.00%)
      client ram = 556 MB
      server ram = 679 MB
  Pod to Service :
    TCP :
      bandwidth = 1907 Mbit/s
      client cpu = total 58.43% (user 2.15%, nice 0.00%, system 56.25%, iowait 0.00%, steal 0.03%)
      server cpu = total 61.83% (user 2.35%, nice 0.03%, system 59.45%, iowait 0.00%, steal 0.00%)
      client ram = 547 MB
      server ram = 813 MB
    UDP :
      bandwidth = 887 Mbit/s
      client cpu = total 57.50% (user 3.88%, nice 0.00%, system 53.62%, iowait 0.00%, steal 0.00%)
      server cpu = total 52.33% (user 4.18%, nice 0.03%, system 48.12%, iowait 0.00%, steal 0.00%)
      client ram = 556 MB
      server ram = 685 MB
=========================================================

Interestingly, even traffic between local workers on the same network is routed through the Kubespan/WireGuard tunnel. For comparison, I created a new local cluster without Kubespan, and pod-to-pod TCP bandwidth was more than three times higher (6276 Mbit/s versus 1868 Mbit/s):

kubectl get node -o wide
NAME            STATUS   ROLES           AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE         KERNEL-VERSION   CONTAINER-RUNTIME
talos-pft-tax   Ready    control-plane   52s   v1.32.0   10.1.1.17     <none>        Talos (v1.9.1)   6.12.6-talos     containerd://2.0.1
talos-tfy-nig   Ready    <none>          49s   v1.32.0   10.1.1.16     <none>        Talos (v1.9.1)   6.12.6-talos     containerd://2.0.1

knb --verbose --client-node talos-pft-tax --server-node talos-tfy-nig

=========================================================
 Benchmark Results
=========================================================
 Name            : knb-3432162
 Date            : 2025-03-18 04:32:21 UTC
 Generator       : knb
 Version         : 1.5.0
 Server          : talos-tfy-nig
 Client          : talos-pft-tax
 UDP Socket size : auto
=========================================================
  Discovered CPU         : QEMU Virtual CPU version 2.5+
  Discovered Kernel      : 6.12.6-talos
  Discovered k8s version :
  Discovered MTU         : 1450
  Idle :
    bandwidth = 0 Mbit/s
    client cpu = total 3.56% (user 1.85%, nice 0.00%, system 1.62%, iowait 0.09%, steal 0.00%)
    server cpu = total 1.41% (user 0.59%, nice 0.00%, system 0.82%, iowait 0.00%, steal 0.00%)
    client ram = 775 MB
    server ram = 401 MB
  Pod to pod :
    TCP :
      bandwidth = 6276 Mbit/s
      client cpu = total 23.32% (user 2.97%, nice 0.00%, system 20.24%, iowait 0.11%, steal 0.00%)
      server cpu = total 25.49% (user 2.13%, nice 0.00%, system 23.36%, iowait 0.00%, steal 0.00%)
      client ram = 709 MB
      server ram = 391 MB
    UDP :
      bandwidth = 861 Mbit/s
      client cpu = total 55.67% (user 6.55%, nice 0.00%, system 49.02%, iowait 0.05%, steal 0.05%)
      server cpu = total 46.81% (user 7.85%, nice 0.00%, system 38.96%, iowait 0.00%, steal 0.00%)
      client ram = 711 MB
      server ram = 382 MB
  Pod to Service :
    TCP :
      bandwidth = 6326 Mbit/s
      client cpu = total 24.26% (user 4.05%, nice 0.00%, system 20.11%, iowait 0.10%, steal 0.00%)
      server cpu = total 25.99% (user 2.22%, nice 0.00%, system 23.77%, iowait 0.00%, steal 0.00%)
      client ram = 693 MB
      server ram = 333 MB
    UDP :
      bandwidth = 877 Mbit/s
      client cpu = total 52.58% (user 7.26%, nice 0.00%, system 45.27%, iowait 0.05%, steal 0.00%)
      server cpu = total 46.81% (user 7.76%, nice 0.00%, system 39.05%, iowait 0.00%, steal 0.00%)
      client ram = 705 MB
      server ram = 334 MB
=========================================================
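To compare runs without scrolling through the full reports, the headline number can be pulled out of saved knb output. This sketch assumes the text layout shown above; the helper names and result file names are hypothetical:

```shell
# Hypothetical helper: read a knb text report on stdin and print the
# "Pod to pod" TCP bandwidth in Mbit/s.
knb_pod_to_pod_tcp() {
  awk '
    /Pod to pod/               { section = "pod" }
    /Pod to Service/           { section = "svc" }
    $1 == "TCP" || $1 == "UDP" { proto = $1 }
    $1 == "bandwidth" && section == "pod" && proto == "TCP" { print $3; exit }
  '
}

# Hypothetical helper: print $1 / $2 with two decimals.
bandwidth_ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# Usage (file names are examples):
#   plain=$(knb_pod_to_pod_tcp < local-no-kubespan.txt)   # 6276 in this post
#   span=$(knb_pod_to_pod_tcp < local-kubespan.txt)       # 1868 in this post
#   bandwidth_ratio "$plain" "$span"                      # prints 3.36
```

Keep in mind that the comparison cluster also differs in CPU type, Talos version, and MTU, so the ratio is an indication rather than a controlled measurement.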

Although Kubespan works well out of the box, it does not yet support meshed topologies, where you need to control how traffic is routed for specific node pools.

There is already an open issue for this, which you can find here.

Kilo

Kilo is a solution that supports meshed logical topologies. It lets you manage traffic between nodes in multiple datacenters while keeping native networking intact for communication within each datacenter.

The downside is that Kilo requires some customizations, which means additional logic and automation would need to be applied.

Apply terraform using Talos machine configuration for Kilo

The kilo-controlplane.tpl file has Kubespan disabled and kube-proxy enabled. For the CNI, we deploy Kilo, and we also add a CustomResourceDefinition for peers.kilo.squat.ai in the inlineManifests.

Update the paths in the talos.tf file by changing controlplane.tpl and worker.tpl to kilo-controlplane.tpl and kilo-worker.tpl, respectively.

We follow the same process for spinning up the nodes in both AWS and Proxmox.

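Once the masters are ready, Kilo will fail because it cannot find a kubeconfig. This happens because we are using Talos instead of plain kubeadm, so the kube-proxy ConfigMap that Kilo reads its kubeconfig from does not exist. To fix this, create it from the generated kubeconfig: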

kubectl create configmap kube-proxy --from-file=kubeconfig.conf=kubeconfig -n kube-system

kubectl get node -o wide
NAME                 STATUS   ROLES           AGE     VERSION   INTERNAL-IP    EXTERNAL-IP      OS-IMAGE         KERNEL-VERSION   CONTAINER-RUNTIME
aws-controlplane-1   Ready    control-plane   15m     v1.32.3   192.168.1.55   3.73.119.119     Talos (v1.9.5)   6.12.18-talos    containerd://2.0.3
aws-controlplane-2   Ready    control-plane   16m     v1.32.3   192.168.0.75   18.195.244.87    Talos (v1.9.5)   6.12.18-talos    containerd://2.0.3
aws-controlplane-3   Ready    control-plane   16m     v1.32.3   192.168.2.68   18.185.241.183   Talos (v1.9.5)   6.12.18-talos    containerd://2.0.3
talos-8vs-lte        Ready    <none>          2m20s   v1.32.3   10.1.1.22      <none>           Talos (v1.9.5)   6.12.18-talos    containerd://2.0.3
talos-8wf-r6g        Ready    <none>          2m19s   v1.32.3   10.1.1.21      <none>           Talos (v1.9.5)   6.12.18-talos    containerd://2.0.3
talos-ub5-bc2        Ready    <none>          2m14s   v1.32.3   10.1.1.23      <none>           Talos (v1.9.5)   6.12.18-talos    containerd://2.0.3

...
kube-system   kilo-286hz  1/1     Running            0             6m46s   10.1.1.22      talos-8vs-lte              <none>           <none>
kube-system   kilo-4h8kn  1/1     Running            0             12m     10.1.1.21      talos-8wf-r6g              <none>           <none>
kube-system   kilo-gcclp  1/1     Running            0             13m     192.168.2.68   aws-controlplane-3         <none>           <none>
kube-system   kilo-rq2sl  1/1     Running            0             4m17s   192.168.1.55   aws-controlplane-1         <none>           <none>

Next, we need to specify the topology, set logical locations, and ensure that at least one node in each location has an IP address that is routable from the other locations.

For AWS control planes (location):

for node in $(kubectl get nodes | grep -i aws | awk '{print $1}'); do kubectl annotate node "$node" kilo.squat.ai/location="aws"; done

For workers (location):

kubectl annotate node talos-8vs-lte talos-8wf-r6g talos-ub5-bc2 kilo.squat.ai/location="on-prem"

Endpoint for each location:

kubectl annotate node talos-8vs-lte kilo.squat.ai/force-endpoint="proxmox-public-ip:51820"

kubectl annotate node aws-controlplane-1 kilo.squat.ai/force-endpoint="3.73.119.119:51820"
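Because these annotations have to be applied after the nodes register, it can help to wrap them in small functions that are safe to re-run. This is a sketch under that assumption; the function names are hypothetical, and only the annotation keys come from the commands above:

```shell
# Hypothetical helper: set the Kilo location annotation on the given nodes.
#   $1 = location name, remaining arguments = node names
annotate_location() {
  location=$1
  shift
  for node in "$@"; do
    kubectl annotate node "$node" "kilo.squat.ai/location=$location" --overwrite
  done
}

# Hypothetical helper: pin the WireGuard endpoint for one node.
#   $1 = node name, $2 = host:port reachable from the other location
force_endpoint() {
  kubectl annotate node "$1" "kilo.squat.ai/force-endpoint=$2" --overwrite
}

# Usage:
#   annotate_location aws $(kubectl get nodes -l node-role.kubernetes.io/control-plane \
#     -o jsonpath='{.items[*].metadata.name}')
#   annotate_location on-prem talos-8vs-lte talos-8wf-r6g talos-ub5-bc2
#   force_endpoint aws-controlplane-1 3.73.119.119:51820
```

Selecting control plane nodes by label instead of grepping for "aws" in the node name is a design choice here; it keeps working if the naming scheme changes.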

Rolling out and checking the network again:

kubectl rollout restart ds/kilo -n kube-system

kubectl get po -n network-test
NAME                                                     READY   STATUS    RESTARTS   AGE
echo-a-54dcdd77c-psgqb                                   1/1     Running   0          34s
echo-b-549fdb8f8c-5pjbk                                  1/1     Running   0          34s
echo-b-host-7cfdb688b7-zff5b                             1/1     Running   0          34s
host-to-b-multi-node-clusterip-c54bf67bf-f7d6c           1/1     Running   0          33s
host-to-b-multi-node-headless-55f66fc4c7-kd2wv           1/1     Running   0          33s
pod-to-a-5f56dc8c9b-64k7v                                1/1     Running   0          34s
pod-to-a-allowed-cnp-5dc859fd98-684lh                    1/1     Running   0          34s
pod-to-a-denied-cnp-68976d7584-6w6fn                     1/1     Running   0          34s
pod-to-b-intra-node-nodeport-5884978697-ddtz4            1/1     Running   0          32s
pod-to-b-multi-node-clusterip-7d65578cf5-rx9ws           1/1     Running   0          33s
pod-to-b-multi-node-headless-8557d86d6f-8dqr8            1/1     Running   0          33s
pod-to-b-multi-node-nodeport-7847b5df8f-zjlzt            1/1     Running   0          33s
pod-to-external-1111-797c647566-qnc22                    1/1     Running   0          34s
pod-to-external-fqdn-allow-google-cnp-5688c867dd-9btvt   1/1     Running   0          33s

Kilo knb benchmark

Let’s test the network performance by running the same test between the master and worker nodes, as well as between worker nodes.

Master to worker:

kubectl taint nodes aws-controlplane-1 node-role.kubernetes.io/control-plane:NoSchedule-
knb --verbose --client-node aws-controlplane-1 --server-node talos-8vs-lte

=========================================================
 Benchmark Results
=========================================================
 Name            : knb-3461466
 Date            : 2025-03-18 05:25:08 UTC
 Generator       : knb
 Version         : 1.5.0
 Server          : talos-8vs-lte
 Client          : aws-controlplane-1
 UDP Socket size : auto
=========================================================
  Discovered CPU         : Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz
  Discovered Kernel      : 6.12.18-talos
  Discovered k8s version :
  Discovered MTU         : 1420
  Idle :
    bandwidth = 0 Mbit/s
    client cpu = total 15.67% (user 4.70%, nice 0.00%, system 2.74%, iowait 0.44%, steal 7.79%)
    server cpu = total 2.82% (user 1.41%, nice 0.00%, system 1.41%, iowait 0.00%, steal 0.00%)
    client ram = 799 MB
    server ram = 417 MB
  Pod to pod :
    TCP :
      bandwidth = 942 Mbit/s
      client cpu = total 61.10% (user 5.12%, nice 0.00%, system 36.35%, iowait 0.39%, steal 19.24%)
      server cpu = total 22.78% (user 1.35%, nice 0.00%, system 21.43%, iowait 0.00%, steal 0.00%)
      client ram = 787 MB
      server ram = 479 MB
    UDP :
      bandwidth = 448 Mbit/s
      client cpu = total 75.97% (user 6.76%, nice 0.00%, system 49.81%, iowait 0.28%, steal 19.12%)
      server cpu = total 18.07% (user 2.96%, nice 0.00%, system 15.11%, iowait 0.00%, steal 0.00%)
      client ram = 798 MB
      server ram = 391 MB
  Pod to Service :
    TCP :
      bandwidth = 1253 Mbit/s
      client cpu = total 69.80% (user 3.58%, nice 0.00%, system 45.60%, iowait 0.36%, steal 20.26%)
      server cpu = total 30.59% (user 1.88%, nice 0.00%, system 28.71%, iowait 0.00%, steal 0.00%)
      client ram = 787 MB
      server ram = 592 MB
    UDP :
      bandwidth = 478 Mbit/s
      client cpu = total 80.83% (user 6.97%, nice 0.00%, system 53.98%, iowait 0.25%, steal 19.63%)
      server cpu = total 18.77% (user 2.70%, nice 0.03%, system 16.04%, iowait 0.00%, steal 0.00%)
      client ram = 795 MB
      server ram = 391 MB
=========================================================

Worker to worker:

knb --verbose --client-node talos-8vs-lte --server-node talos-8wf-r6g

=========================================================
 Benchmark Results
=========================================================
 Name            : knb-3467118
 Date            : 2025-03-18 05:27:23 UTC
 Generator       : knb
 Version         : 1.5.0
 Server          : talos-8wf-r6g
 Client          : talos-8vs-lte
 UDP Socket size : auto
=========================================================
  Discovered CPU         : Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz
  Discovered Kernel      : 6.12.18-talos
  Discovered k8s version :
  Discovered MTU         : 1420
  Idle :
    bandwidth = 0 Mbit/s
    client cpu = total 2.57% (user 1.29%, nice 0.00%, system 1.28%, iowait 0.00%, steal 0.00%)
    server cpu = total 2.05% (user 1.10%, nice 0.00%, system 0.95%, iowait 0.00%, steal 0.00%)
    client ram = 416 MB
    server ram = 521 MB
  Pod to pod :
    TCP :
      bandwidth = 8409 Mbit/s
      client cpu = total 17.34% (user 1.85%, nice 0.00%, system 15.49%, iowait 0.00%, steal 0.00%)
      server cpu = total 22.70% (user 1.96%, nice 0.00%, system 20.74%, iowait 0.00%, steal 0.00%)
      client ram = 398 MB
      server ram = 508 MB
    UDP :
      bandwidth = 1403 Mbit/s
      client cpu = total 36.43% (user 3.27%, nice 0.00%, system 33.16%, iowait 0.00%, steal 0.00%)
      server cpu = total 35.20% (user 5.36%, nice 0.00%, system 29.84%, iowait 0.00%, steal 0.00%)
      client ram = 405 MB
      server ram = 542 MB
  Pod to Service :
    TCP :
      bandwidth = 8366 Mbit/s
      client cpu = total 21.10% (user 1.64%, nice 0.04%, system 19.42%, iowait 0.00%, steal 0.00%)
      server cpu = total 22.15% (user 1.85%, nice 0.00%, system 20.30%, iowait 0.00%, steal 0.00%)
      client ram = 398 MB
      server ram = 515 MB
    UDP :
      bandwidth = 1349 Mbit/s
      client cpu = total 36.52% (user 3.21%, nice 0.00%, system 33.31%, iowait 0.00%, steal 0.00%)
      server cpu = total 34.03% (user 5.38%, nice 0.00%, system 28.65%, iowait 0.00%, steal 0.00%)
      client ram = 414 MB
      server ram = 535 MB
=========================================================

Conclusion

We can see that the nodes are aware of each other and use the internal connection based on their logical location, as demonstrated in the diagram below (generated using this):

To conclude, since the Kilo node annotations cannot be set in advance, fully automating the setup at Talos/kubeadm bootstrap time could be problematic. It would still require an additional layer to set up the mesh. From this perspective, Kubespan is much easier to implement, but it currently doesn't support logical separation.

Delete cluster

# Delete VMs
for id in {1001..1003}; do
  qm stop $id && qm destroy $id
done

terraform destroy -auto-approve -var-file=vars/dev.tfvars
