TruongHoangPhuLoc

Posted on • Originally published at Medium
The Journey To CAPI — Part 1: Building A Scalable K8S Cluster At Home

Context

For more than a year, I’ve been running a Kubernetes cluster in my homelab. It worked fine, but since it was fully on-premises, I had to live with the usual DIY trade-offs. Features that cloud providers take for granted — like native external load balancing and CSI integration — weren’t available out of the box.

To keep things running, I had to stitch together my own solutions:

  • Networking & ingress: HAProxy as an external load balancer (with PROXY protocol v2) and NGINX ingress.
  • Storage: Longhorn for CSI and persistent volumes.
  • Automation & certificates: externalDNS for DNS records, cert-manager for TLS.
  • Observability: Loki for logging, Prometheus + Grafana for monitoring.

This setup worked, but it wasn’t exactly what I’d call “cloud-native”. I started wondering: what if I could build a Kubernetes platform that feels closer to what you’d get in a real cloud environment?

That’s when I turned to Cluster API (CAPI). For me, OpenStack was simply the environment I had available in my homelab — but the real point is that CAPI is cloud-provider agnostic. Whether you’re on AWS, Azure, vSphere, or even bare metal, the model is the same: CAPI treats the infrastructure as a pluggable provider and focuses on managing Kubernetes clusters consistently across them.

My goals with this project were simple:

  • Deploy Kubernetes clusters in a cloud-native way
  • Achieve real scalability, e.g., dynamically scale worker nodes with Cluster Autoscaler (even from zero)
  • Lay the groundwork for multi-tenant workloads in future setups

This post is Part 1 of my journey: deploying a kubeadm-based CAPI cluster on OpenStack, with two node pools and autoscaling from zero.

What is Cluster API (CAPI)?

Cluster API is a Kubernetes sub-project that brings the Kubernetes-style declarative model to managing Kubernetes clusters themselves. Instead of manually provisioning clusters or relying on vendor-specific tools, you define clusters the same way you define Pods or Deployments — using YAML manifests and Kubernetes controllers.

At its core, CAPI revolves around two main concepts:

  • Management Cluster (“mothership”): A Kubernetes cluster where the Cluster API controllers run. This is the control center that provisions, upgrades, and manages other clusters.
  • Workload Clusters: The actual Kubernetes clusters where your applications run. These clusters are created and managed by the management cluster, but the magic comes from the Custom Resource Definitions (CRDs) that Cluster API introduces.

To give a clearer picture, here’s the CAPI architecture from Cluster API: A Deep Dive On Declarative Cluster Lifecycle Management:

CAPI doesn’t invent a new toolchain — instead, it extends Kubernetes itself with new API objects that describe clusters and their underlying infrastructure. For example:

  • Cluster — the high-level definition of a Kubernetes cluster (control plane + workers).
  • Machine — represents a single node (VM, bare-metal host, cloud instance).
  • MachineDeployment — manages a scalable set of machines (similar to a Deployment for Pods).
  • MachineSet — lower-level replica set of Machines.
  • Infrastructure Providers — CRDs that describe how to provision infra (AWS, Azure, OpenStack, vSphere, BareMetal, etc).
  • Bootstrap Providers — CRDs that handle how nodes get bootstrapped (e.g., via kubeadm).

Each Infrastructure Provider (e.g., AWS, Azure, OpenStack, vSphere, BareMetal, Docker) defines its own CRDs and controllers that extend CAPI’s core objects. These provider-specific APIs describe how to provision and configure infrastructure in that environment. For example:

  • AWS Provider (CAPA) adds AWSCluster, AWSMachine, AWSMachineTemplate.
  • OpenStack Provider (CAPO) adds OpenStackCluster, OpenStackMachine.
  • Docker Provider (CAPD) adds DockerCluster, DockerMachine.
  • vSphere Provider (CAPV) adds VSphereCluster, VSphereMachine.

So the generic Cluster object might reference an AWSCluster or an OpenStackCluster, depending on which provider you’re using.

Cluster API’s separation of core lifecycle logic (CAPI) and provider-specific implementations (CAPx) makes it possible to manage Kubernetes clusters in a consistent way across many environments: public clouds (AWS, Azure, GCP), on-prem/private cloud (OpenStack, vSphere, bare metal), and even edge (IoT devices).

With a single management cluster, we can provision workload clusters in different environments — without tying ourselves to one platform.

This flexibility gives us freedom of placement: the ability to choose where workloads run depending on cost, compliance, or locality, without changing the workflow.

Even more interesting:

  • A workload cluster can itself be bootstrapped into a management cluster (the “grandmothership” idea).
  • This “cluster of clusters” pattern enables nested orchestration:
      • Management cluster → provisions workload clusters
      • Some workload clusters → promoted into management clusters
      • These can then manage Hosted Control Plane (HCP) child clusters

In effect, the grandmothership model lets you grow an organically expanding “cluster of clusters”. Each layer can take responsibility for the next, ultimately allowing one seed cluster to fan out into a fleet spanning cloud, on-prem, and edge — all declaratively, with Kubernetes-style APIs.

👉 In Part 2, I’ll take this grandmothership idea even further — giving each tenant their own Hosted Control Plane (HCP) model, so they can run and scale Kubernetes clusters with full isolation.

Now that we’ve covered the basics of CAPI and the grandmothership concept, let’s roll up our sleeves and actually build something. In this first part of the journey, I’ll start simple but powerful: deploying a kubeadm-based CAPI cluster on my own cloud. I’ll set up two separate node pools, configure them for autoscaling (from zero nodes!), and prove that this isn’t just theory — it works in a real homelab.

Prerequisites

Infrastructure: OpenStack environment (already set up; you can choose whatever infrastructure provider fits your demand)

  • Compute Service (Nova)
  • Network Service (Neutron)
  • Image Service (Glance)
  • Storage Service (Cinder)
  • LBaaS (Octavia)

Base tools:

  • clusterctl (CAPI CLI)
  • kubectl
  • k9s

Networking:

  • A reachable Neutron network for control plane and worker nodes, with a Floating IP attached to the LB-based kube-apiserver if required.
  • For simplicity, I placed Neutron’s external provider network (which assigns Floating IPs to resources) in the same L2 domain as my Legacy Cluster. As a result, the workload cluster’s kube-apiserver is reachable from the management cluster, allowing the management cluster to communicate with the workload cluster through its public kube-apiserver endpoint.

The diagram below illustrates the network and architecture:

Images: Kubernetes-compatible VM images uploaded to Glance

The image should contain prebuilt components such as kubeadm, kubelet, and containerd in order to work with a kubeadm-based cluster, since I’ll be using the kubeadm bootstrap provider. Here is the GitHub repository I used to build the images: kubernetes-sigs/image-builder, and here is the tutorial you can follow to build an image.
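As a rough sketch of that workflow (the make target, output path, image name, and Kubernetes version below are assumptions — adjust them to your target OS and environment):

```shell
# Build a CAPI-ready qcow2 with kubeadm, kubelet, and containerd baked in
git clone https://github.com/kubernetes-sigs/image-builder.git
cd image-builder/images/capi
make build-qemu-ubuntu-2204

# Upload the result to Glance so CAPO can boot instances from it
openstack image create \
  --disk-format qcow2 --container-format bare \
  --file ./output/ubuntu-2204-kube-v1.30.0/ubuntu-2204-kube-v1.30.0 \
  ubuntu-2204-kube-v1.30.0
```

The image name you pick here is what later goes into the `image.filter.name` field of the OpenStackMachineTemplates.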

Initialize Management Cluster

  1. Start with a management cluster (can be kind-based or a small k0s/kubeadm cluster). In my case, I already have the existing on-premises cluster mentioned above, which is quite decent, so I decided to promote it into a management cluster, which I’ll call the Legacy Cluster from now on.

  2. Install Cluster API components for OpenStack:

# In order to use some experimental features, e.g. ClusterClass, enabling the following flag is required
export CLUSTER_TOPOLOGY=true 
clusterctl init --infrastructure openstack

Fetching providers
Skipping installing cert-manager as it is already installed
Installing provider="cluster-api" version="v1.10.5" targetNamespace="capi-system"
Installing provider="bootstrap-kubeadm" version="v1.10.5" targetNamespace="capi-kubeadm-bootstrap-system"
Installing provider="control-plane-kubeadm" version="v1.10.5" targetNamespace="capi-kubeadm-control-plane-system"
Installing provider="infrastructure-openstack" version="v0.12.4" targetNamespace="capo-system"

Your management cluster has been initialized successfully!

You can now create your first workload cluster by running the following:

  clusterctl generate cluster [name] --kubernetes-version [version] | kubectl apply -f -

This deploys the controllers as well as the CRDs for CAPI.

Now my legacy cluster has been turned into a management cluster and is ready to provision other clusters.
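A quick sanity check that everything came up, using the namespaces from the init output above:

```shell
# Each provider deployment should be Available
kubectl get deploy -n capi-system
kubectl get deploy -n capi-kubeadm-bootstrap-system
kubectl get deploy -n capi-kubeadm-control-plane-system
kubectl get deploy -n capo-system

# The CAPI CRDs should now exist
kubectl get crds | grep cluster.x-k8s.io
```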

Defining the Workload Cluster: From YAML to Reality

Now that the management cluster is in place, the next step is to describe the desired state of the workload cluster.

CAPI does this through a set of Custom Resource Definitions (CRDs), each focusing on a different layer of the cluster.

I’ll break it down into four key parts:

  1. Cluster  — the top-level resource that ties everything together (infrastructure + control plane).
  2. Control Plane  — defines how the Kubernetes API servers and etcd are deployed and managed.
  3. Worker Nodes (Node Pool) — the scalable pool of worker machines that run workloads.
  4. Optional Add-ons  — extras like a CNI plugin or Cluster Autoscaler to complete the cluster.

Cloud Credentials Preparation

Cluster API needs access to the OpenStack APIs to create networks, servers, and volumes. This is done by creating a Kubernetes Secret that stores your clouds.yaml:

apiVersion: v1
kind: Secret
metadata:
  name: capi-ubuntu-openstack-cloud-config
  namespace: capi-ubuntu-openstack
  labels:
    clusterctl.cluster.x-k8s.io/move: "true"
type: Opaque
data:
  cacert: Cg==
  clouds.yaml: # base64-encoded content from clouds.yaml
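To produce the base64 payload for the `data` field, you can encode your clouds.yaml directly (the stand-in file created below is only for illustration — use your real one):

```shell
# Stand-in clouds.yaml for this demo only; point at your real file instead
printf 'clouds:\n  openstack:\n    auth:\n      auth_url: https://keystone.example:5000/v3\n' > clouds.yaml

# Single-line base64 for the Secret's data."clouds.yaml" field
base64 -w0 clouds.yaml > clouds.yaml.b64
cat clouds.yaml.b64
```

Alternatively, `kubectl create secret generic capi-ubuntu-openstack-cloud-config --from-file=clouds.yaml --dry-run=client -o yaml` renders an equivalent Secret without hand-encoding anything.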

How this is used later

When you define the OpenStackCluster and OpenStackMachineTemplate, you’ll reference this secret like so:

spec:
  identityRef:
    kind: Secret
    name: capi-ubuntu-openstack-cloud-config

That’s how CAPI controllers authenticate against your OpenStack APIs.

Building the Cluster

The Cluster CRD ties everything together. It doesn’t launch machines itself — instead, it points to:

A KubeadmControlPlane (how to bootstrap control plane nodes).

An OpenStackCluster (how to talk to the underlying OpenStack infrastructure).

apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: capi-ubuntu-openstack
  namespace: capi-ubuntu-openstack
  labels:
    cloud: capi-ubuntu-openstack
spec:
  clusterNetwork:
    pods:
      # CNI network CIDRs for the pods
      cidrBlocks:
      - <POD CIDR Block>
    serviceDomain: cluster.local
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: capi-ubuntu-openstack-control-plane
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: OpenStackCluster
    name: capi-ubuntu-openstack
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: OpenStackCluster
metadata:
  name: capi-ubuntu-openstack
  namespace: capi-ubuntu-openstack
spec:
  # Enabled LB for API server to communicate with legacy cluster
  apiServerLoadBalancer:
    enabled: true
  externalNetwork:
  # The network ID of external IP pools which assigns FIP to LB for API server
    id: <External-NetworkID>
  identityRef:
    cloudName: openstack
    name: capi-ubuntu-openstack-cloud-config
  # Define the Security Groups. Basically, we will have 2 groups, control plane and worker. Each group will be assigned the respective rules
  managedSecurityGroups:
    allowAllInClusterTraffic: true
    allNodesSecurityGroupRules:
    - description: Created by cluster-api-provider-openstack - BGP (calico)
      direction: ingress
      etherType: IPv4
      name: BGP (Calico)
      portRangeMax: 179
      portRangeMin: 179
      protocol: tcp
      remoteManagedGroups:
      - controlplane
      - worker
    - description: Created by cluster-api-provider-openstack - IP-in-IP (calico)
      direction: ingress
      etherType: IPv4
      name: IP-in-IP (calico)
      protocol: "4"
      remoteManagedGroups:
      - controlplane
      - worker
  # Create a subnet for my cluster to live in
  managedSubnets:
  - cidr: <Subnet CIDR block>
    dnsNameservers:
    - <NameServer1>
    - <NameServer2>

Notice how this YAML doesn’t specify any nodes yet. It’s simply saying:

➡️ “This cluster will use kubeadm for control plane, and OpenStack for infra”

Control Plane — The Brain of Your Kubernetes

With our Cluster object in place, the next step is telling CAPI how to manage the control plane. In Cluster API, this is handled by the KubeadmControlPlane resource (used with the kubeadm bootstrap provider).

Think of it as answering the key questions:

  • Which Kubernetes version should they run?
  • How many control plane nodes should exist?
  • How should the control plane’s components be configured?
  • Are there any kubeadm-specific tweaks, or pre/post bootstrap steps?

Here’s what my setup looks like:

Control Plane Definition

apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: capi-ubuntu-openstack-control-plane
  namespace: capi-ubuntu-openstack
spec:
  kubeadmConfigSpec:
    clusterConfiguration:
      controllerManager:
        extraArgs:
          cloud-provider: external
    files: []
    initConfiguration:
      nodeRegistration:
        kubeletExtraArgs:
          cloud-provider: external
          provider-id: openstack:///'{{ instance_id }}'
        name: '{{ local_hostname }}'
    joinConfiguration:
      nodeRegistration:
        kubeletExtraArgs:
          cloud-provider: external
          provider-id: openstack:///'{{ instance_id }}'
        name: '{{ local_hostname }}'
  machineTemplate:
    infrastructureRef:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: OpenStackMachineTemplate
      name: capi-ubuntu-openstack-control-plane
  replicas: 3
  version: <k8s-version>

Here’s what this does for us:

  • HA control plane: 3 nodes for high availability.
  • Infrastructure-aware: Nodes are provisioned with an OpenStackMachineTemplate.
  • The cloud provider is set to external, which integrates with a cloud controller manager later.
  • kubeadm bootstraps nodes with OpenStack-specific provider settings.

  • Granular kubelet tuning: extra kubelet arguments can be passed via kubeletExtraArgs (see the docs).

Control Plane Machines Template

The OpenStackMachineTemplate defines what VM specs the control plane nodes should have:

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: OpenStackMachineTemplate
metadata:
  name: capi-ubuntu-openstack-control-plane
  namespace: capi-ubuntu-openstack
spec:
  template:
    spec:
      flavor: m1.k8s-master
      image:
        filter:
          name: <image-name>
      sshKeyName: <SSH-KeyName>

Key points:

  • Flavor: Determines compute resources (CPU/RAM).
  • Image: The prebuilt Ubuntu VM image with kubeadm, kubelet, and container runtime.
  • SSH Key: lets an admin SSH into the nodes.

With this, our control plane is fully defined and ready for the management cluster to bring it to life.

Worker Nodes (Node Pools) — Powering Your Workloads

Once the control plane is defined and ready to be deployed, the next step is to define where your applications will actually run: the worker nodes. In CAPI, worker nodes are managed via MachineDeployments, which give you fine-grained control over how many nodes you want, how they are provisioned, and how they scale.

Think of MachineDeployments like a Deployment for VMs:

  • They manage replicas (number of nodes).
  • They allow scaling, even dynamically via Cluster Autoscaler.
  • They reference MachineTemplates and KubeadmConfigTemplates to configure how each node boots and what resources it gets.
  • Labels, taints, and kubelet extra args help you customize node behavior.

Here’s a practical example of how I set up two worker node pools: a “Normal” pool for general workloads, and a “Small” pool for lightweight or bursty workloads.

Bootstrap Config for Worker Nodes

# Worker Nodes sections
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: capi-ubuntu-openstack-config-template
  namespace: capi-ubuntu-openstack
spec:
  template:
    spec:
      files: []
      joinConfiguration:
        nodeRegistration:
          kubeletExtraArgs:
            cloud-provider: external
            provider-id: openstack:///'{{ instance_id }}'
          name: '{{ local_hostname }}'

This template ensures every worker node registers correctly with the control plane and integrates with the external OpenStack cloud provider.

Normal Worker Pool

# Normal Pool
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: capi-ubuntu-openstack-normal-worker
  namespace: capi-ubuntu-openstack
  labels:
    cluster.x-k8s.io/cluster-name: capi-ubuntu-openstack
  annotations:
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "0"
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "2"
    capacity.cluster-autoscaler.kubernetes.io/memory: "8G"
    capacity.cluster-autoscaler.kubernetes.io/cpu: "4"
    capacity.cluster-autoscaler.kubernetes.io/ephemeral-disk: "30Gi"
spec:
  clusterName: capi-ubuntu-openstack
  replicas: 0
  selector:
    matchLabels: null
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: capi-ubuntu-openstack-config-template
      clusterName: capi-ubuntu-openstack
      failureDomain: nova
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: OpenStackMachineTemplate
        name: capi-ubuntu-openstack-normal-worker
      version: <k8s-version>

Normal Worker Machine Template

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: OpenStackMachineTemplate
metadata:
  name: capi-ubuntu-openstack-normal-worker
  namespace: capi-ubuntu-openstack
spec:
  template:
    spec:
      flavor: m1.k8s-worker
      image:
        filter:
          name: <image-name>
      sshKeyName: <SSH-KeyName>

Small Worker Pool

apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: capi-ubuntu-openstack-small-worker
  namespace: capi-ubuntu-openstack
  labels:
    cluster.x-k8s.io/cluster-name: capi-ubuntu-openstack
  annotations:
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "0"
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "2"
    capacity.cluster-autoscaler.kubernetes.io/memory: "4G"
    capacity.cluster-autoscaler.kubernetes.io/cpu: "2"
    capacity.cluster-autoscaler.kubernetes.io/ephemeral-disk: "30Gi"
spec:
  clusterName: capi-ubuntu-openstack
  replicas: 0
  selector:
    matchLabels: null
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: capi-ubuntu-openstack-config-template
      clusterName: capi-ubuntu-openstack
      failureDomain: nova
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: OpenStackMachineTemplate
        name: capi-ubuntu-openstack-small-worker
      version: <k8s-version>

Small Worker Machine Template

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: OpenStackMachineTemplate
metadata:
  name: capi-ubuntu-openstack-small-worker
  namespace: capi-ubuntu-openstack
spec:
  template:
    spec:
      flavor: m1.k8s-worker-small
      image:
        filter:
          name: <image-name>
      sshKeyName: <SSH-KeyName>

✅ Key Highlights

  • Each pool has its own flavor, size, and autoscaler annotations.
  • Initial replicas are set to 0 so Cluster Autoscaler can scale from zero.
  • Node pools reference a shared KubeadmConfigTemplate for consistency.
  • This setup allows fine-grained control and flexible scaling across workloads.

⚡ Tip: Autoscaling From Zero

Simply setting replicas: 0 is not enough for Cluster Autoscaler to scale from zero: with no running node to inspect, some providers (including OpenStack) cannot infer the capacity of a node group that has no members yet.

That’s where capacity annotations come in:

capacity.cluster-autoscaler.kubernetes.io/memory: "4G"
capacity.cluster-autoscaler.kubernetes.io/cpu: "2"
capacity.cluster-autoscaler.kubernetes.io/ephemeral-disk: "30Gi"

💡 Without these annotations, the autoscaler cannot bring nodes to life, and your zero-replica node groups will fail to scale. Reference

Add-ons — Giving Your Cluster Its Superpowers

While the control plane orchestrates and the workers run workloads, add-ons bring the essential capabilities that make your cluster truly functional. Some add-ons are so critical, though, that your cluster won’t even get off the ground without them.

According to the Cluster API kubeadm bootstrap provider (CABPK):

Bootstrap Orchestration
CABPK supports multiple control plane machines initing at the same time. The generation of cloud-init scripts of different machines is orchestrated in order to ensure a cluster bootstrap process that will be compliant with the correct Kubeadm init/join sequence. More in detail:

1. cloud-config-data generation starts only after Cluster.Status.InfrastructureReady flag is set to true.

2. at this stage, cloud-config-data will be generated for the first control plane machine only, keeping on hold additional control plane machines existing in the cluster, if any (kubeadm init).

3. after the ControlPlaneInitialized conditions on the cluster object is set to true, the cloud-config-data for all the other machines are generated (kubeadm join/join —control-plane).

Without a CNI, the ControlPlaneInitialized condition never becomes true, and your cluster remains stuck waiting forever.

  • Cluster Autoscaler — Dynamically scales your worker nodes based on workload. When paired with capacity annotations on your MachineDeployments, it even allows scaling from zero.
  • Cloud Controller Manager (CCM) — Bridges Kubernetes with your cloud provider. For OpenStack, this manages floating IPs, volumes, load balancer provisioning, and more.
  • Optional Add-ons — Logging, monitoring, ingress controllers, or any custom tools that make your cluster production-ready.

I’ve structured the add-ons in a kustomize-style layout:

addons/
├── calico/
├── cluster-autoscaler/
└── openstack-cloud-controller-manager/

Handing off add-on deployment to the workload cluster

CAPI offers a couple of ways to automatically deploy the add-ons you need onto workload clusters:

  • Using the ClusterResourceSet CRD — The standard, declarative way to apply YAMLs for CNIs, Cluster Autoscaler, CCM, and more.
  • Workload bootstrap using GitOps — An alternative where a GitOps agent pulls manifests from a repository. To be honest, I haven’t explored this yet, but it’s a promising method for multi-cluster automation.

ClusterResourceSet in CAPI

In Cluster API, ClusterResourceSet allows us to automatically apply Kubernetes resources (like ConfigMaps, Secrets, or manifests) to newly created workload clusters. This is extremely useful for bootstrapping clusters with networking, storage, or other essential components.

Basically, we handle two types of resources:

  1. Deployable resources (Manifests, ConfigMaps, etc.)
  2. Secrets
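For instance, the Calico ClusterResourceSet looks roughly like this (a sketch: the clusterSelector matches the cloud label set on the Cluster object earlier, and the ConfigMap names mirror my repo layout):

```yaml
apiVersion: addons.cluster.x-k8s.io/v1beta1
kind: ClusterResourceSet
metadata:
  name: calico
  namespace: capi-ubuntu-openstack
spec:
  strategy: ApplyOnce
  clusterSelector:
    # Matches the "cloud: capi-ubuntu-openstack" label on the Cluster object
    matchLabels:
      cloud: capi-ubuntu-openstack
  resources:
  # ConfigMaps in the same namespace wrapping the manifests to apply
  - kind: ConfigMap
    name: calico-bootstrap-rbac
  - kind: ConfigMap
    name: calico-bootstrap-job
```

Once the workload cluster matches the selector, the controller applies every wrapped resource into it.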

However, we ran into a practical issue with Calico:

The ConfigMap "calico-install" is invalid: metadata.annotations: Too long: must have at most 262144 bytes

Embedding the full Calico YAML in a ConfigMap exceeded Kubernetes’ size limit.

Solution: instead of embedding the whole manifest, we delegate the installation to a Job that runs after the API server is ready.

# ServiceAccount & ClusterRoleBinding for the Job
apiVersion: v1
kind: ServiceAccount
metadata:
  name: calico-installer
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: calico-installer-binding
subjects:
- kind: ServiceAccount
  name: calico-installer
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
---
# Job to install Calico dynamically
apiVersion: batch/v1
kind: Job
metadata:
  name: install-calico
  namespace: kube-system
spec:
  backoffLimit: 3
  completions: 1
  parallelism: 1
  template:
    spec:
      serviceAccountName: calico-installer
      restartPolicy: OnFailure
      # Bypass the CNI, because at this point no CNI is installed yet
      hostNetwork: true
      # Only one control plane node exists at this stage; tolerate its taint so the pod can be scheduled there
      tolerations:
      - key: "node-role.kubernetes.io/control-plane"
        operator: "Exists"
        effect: "NoSchedule"
      initContainers:
      - name: wait-for-api
        image: busybox:1.36
        command:
        - /bin/sh
        - -c
        - |
          echo "Waiting for API server..."
          until nc -z ${KUBERNETES_SERVICE_HOST} ${KUBERNETES_SERVICE_PORT}; do sleep 5; done
      containers:
      - name: install-calico
        image: bitnami/kubectl:latest
        command:
        - /bin/sh
        - -c
        - |
          # Just a double check
          until kubectl get nodes; do sleep 5; done
          echo "API server ready, applying Calico..."
          kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.30.2/manifests/calico.yaml
          # I could also check the exit code of the command above to trigger the restart policy if needed.

To see how this approach fits into the broader cluster bootstrap, check out the full directory structure of our CAPI setup — including the Calico Job, ClusterResourceSets, and other addons — in the sanitized repository tree below.

.
├── addons
│   ├── calico
│   │   ├── calico-bootstrap-job.yaml
│   │   ├── calico-bootstrap-rbac.yaml
│   │   ├── clusterresourceset.yaml
│   │   └── kustomization.yaml
│   ├── cluster-autoscaler
│   │   ├── cluster-autoscaler.yaml.template
│   │   ├── clusterresourceset.yaml
│   │   ├── kustomization.yaml
│   │   └── secret.yaml
│   └── openstack-cloud-controller-manager
│       ├── clusterresourceset.yaml
│       ├── kustomization.yaml
│       ├── openstack-cloud-controller.yaml
│       └── secret.yaml
└── capi-ubuntu-openstack
    ├── capi-ubuntu-openstack.yaml
    └── kustomization.yaml

Each addon is organized in its own folder (Calico, Cluster Autoscaler, OpenStack CCM), making it easy to see how ClusterResourceSets, jobs, and secrets are structured.

The capi-ubuntu-openstack folder contains the main cluster manifests for the kubeadm-based CAPI cluster. All of these manifests are available in a sanitized GitHub repository here for reference; you can browse, clone, and adapt them to your own homelab or OpenStack environment without exposing sensitive data.

🚀 Hands-On: Deploying Our Scalable CAPI Cluster

Generate the deployment file for cluster autoscaler

There are some placeholders in the cluster-autoscaler.yaml.template file that we need to substitute with our actual values:

export NAMESPACE=capi-ubuntu-openstack  
export CLUSTER_NAME=capi-ubuntu-openstack
envsubst < cluster-autoscaler.yaml.template > cluster-autoscaler-deployment.yaml
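For context, those two variables typically land in the autoscaler’s node-group auto-discovery flag. The relevant container args look roughly like this after substitution (a fragment; the image tag and kubeconfig mount path are assumptions, while the least-waste expander matches my setup):

```yaml
# (fragment of cluster-autoscaler-deployment.yaml)
- name: cluster-autoscaler
  image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.30.0
  command:
  - /cluster-autoscaler
  args:
  - --cloud-provider=clusterapi
  # Discovers MachineDeployments of $CLUSTER_NAME in $NAMESPACE on the management cluster
  - --node-group-auto-discovery=clusterapi:namespace=capi-ubuntu-openstack,clusterName=capi-ubuntu-openstack
  # Prefer the node group that wastes the fewest resources when scaling up
  - --expander=least-waste
  # Kubeconfig of the workload cluster being scaled
  - --kubeconfig=/mnt/kubeconfig/value
```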

Apply kustomization file

kubectl apply -k capi-ubuntu-openstack/
configmap/calico-bootstrap-job created
configmap/calico-bootstrap-rbac created
configmap/cloud-provider-openstack created
configmap/cluster-autoscaler created
secret/capi-ubuntu-openstack-cloud-config created
secret/cloud-config created
secret/kubeconfig-secret created
clusterresourceset.addons.cluster.x-k8s.io/calico created
clusterresourceset.addons.cluster.x-k8s.io/cloud-provider-openstack created
clusterresourceset.addons.cluster.x-k8s.io/cluster-autoscaler-openstack created
kubeadmconfigtemplate.bootstrap.cluster.x-k8s.io/capi-ubuntu-openstack-config-template created
cluster.cluster.x-k8s.io/capi-ubuntu-openstack created
machinedeployment.cluster.x-k8s.io/capi-ubuntu-openstack-normal-worker created
machinedeployment.cluster.x-k8s.io/capi-ubuntu-openstack-small-worker created
kubeadmcontrolplane.controlplane.cluster.x-k8s.io/capi-ubuntu-openstack-control-plane created
openstackcluster.infrastructure.cluster.x-k8s.io/capi-ubuntu-openstack created
openstackmachinetemplate.infrastructure.cluster.x-k8s.io/capi-ubuntu-openstack-control-plane created
openstackmachinetemplate.infrastructure.cluster.x-k8s.io/capi-ubuntu-openstack-normal-worker created
openstackmachinetemplate.infrastructure.cluster.x-k8s.io/capi-ubuntu-openstack-small-worker created
Enter fullscreen mode Exit fullscreen mode

CAPI will spin up the LB for the kube-apiserver as well as the first control plane node.

Get the kubeconfig of the workload cluster and observe the resource creation:

clusterctl get kubeconfig capi-ubuntu-openstack -n capi-ubuntu-openstack > ubuntu.conf
export KUBECONFIG=ubuntu.conf
k9s

Once installation finishes, we have a cluster with 3 control plane nodes up and running.
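A quick way to double-check from both sides (resource and namespace names per the manifests above):

```shell
# Against the workload cluster: all three control-plane nodes should be Ready
kubectl --kubeconfig ubuntu.conf get nodes -o wide

# Against the management cluster: the control plane and its machines should be Running
kubectl -n capi-ubuntu-openstack get kubeadmcontrolplane
kubectl -n capi-ubuntu-openstack get machines
```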

🔥 Hands-On: Testing node autoscaling from Zero

To validate our cluster’s autoscaling behavior, I prepared two workloads:

Small worker pool: 2 vCPU / 4 GiB RAM per node

  • We deploy a flask-system-info-small app whose replicas request ~2.5Gi memory and 500m CPU each.
  • The following deployment forces the autoscaler to add nodes from the small pool. This happens because I set --expander=least-waste on cluster-autoscaler, meaning it prefers the node group that wastes the fewest resources while still satisfying the request:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flask-system-info-small
  labels:
    app: flask-system-info-small
spec:
  replicas: 1
  selector:
    matchLabels:
      app: flask-system-info-small
  template:
    metadata:
      labels:
        app: flask-system-info-small
    spec:
      hostNetwork: true
      # Add pod anti-affinity to spread pods across nodes
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - flask-system-info-small
              topologyKey: kubernetes.io/hostname
      containers:
      - name: flask-system-info-small
        image: locthp/flask_api_system_info:latest
        imagePullPolicy: Always
        env:
        - name: LISTEN_PORT
          value: "5000"
        ports:
        - containerPort: 5000
          protocol: TCP
        resources:
          requests:
            memory: "2.5Gi"
            cpu: "500m"
          limits:
            memory: "3Gi"
            cpu: "1000m"
        livenessProbe:
          httpGet:
            path: /
            port: 5000
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /
            port: 5000
          initialDelaySeconds: 5
          periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: flask-system-info-small-service
spec:
  selector:
    app: flask-system-info-small
  ports:
  - protocol: TCP
    port: 80
    targetPort: 5000
  type: ClusterIP
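For reference, the expander and scale-down behavior discussed in this post come from the Cluster Autoscaler's startup flags. Below is a minimal sketch of the relevant container args for the clusterapi provider; the image tag, volume layout, and kubeconfig path are illustrative assumptions, not taken from the original setup:

```yaml
# Sketch of the Cluster Autoscaler container spec (clusterapi provider),
# showing only the flags relevant to this post.
containers:
- name: cluster-autoscaler
  image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.32.0  # illustrative tag
  command:
  - /cluster-autoscaler
  args:
  - --cloud-provider=clusterapi            # drive CAPI MachineDeployments on the management cluster
  - --expander=least-waste                 # prefer the node group with the least leftover CPU/memory
  - --scale-down-unneeded-time=10m         # how long a node must be idle before removal (default)
  - --kubeconfig=/mnt/workload/kubeconfig  # workload cluster, where the pods actually run
```

With this setup the autoscaler watches pending pods in the workload cluster and adjusts MachineDeployment replica counts in the management cluster.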

Normal worker pool : 4 vCPU / 8 GiB RAM per node

  • Deployment: flask-system-info-normal, with replicas each requesting ~6Gi of memory and 500m of CPU.
  • The autoscaler scales up the normal pool only because the small pool cannot accommodate the request: a 6Gi pod does not fit on a 4 GiB node.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flask-system-info-normal
  labels:
    app: flask-system-info-normal
spec:
  replicas: 1
  selector:
    matchLabels:
      app: flask-system-info-normal
  template:
    metadata:
      labels:
        app: flask-system-info-normal
    spec:
      hostNetwork: true
      containers:
      - name: flask-system-info-normal
        image: locthp/flask_api_system_info:latest
        imagePullPolicy: Always
        env:
        - name: LISTEN_PORT
          value: "5001"
        ports:
        - containerPort: 5001
          protocol: TCP
        resources:
          requests:
            memory: "6Gi"
            cpu: "500m"
          limits:
            memory: "7Gi"
            cpu: "1000m"
        livenessProbe:
          httpGet:
            path: /
            port: 5001
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /
            port: 5001
          initialDelaySeconds: 5
          periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: flask-system-info-normal-service
spec:
  selector:
    app: flask-system-info-normal
  ports:
  - protocol: TCP
    port: 80
    targetPort: 5001
  type: ClusterIP

About the Test Deployments

Each deployment simply runs a small Flask-based service that reports the system resources of the node it’s running on. This includes:

  • CPU and memory usage
  • Node IP addresses
  • System uptime

These apps don’t do any heavy computation; they are designed purely to force the Cluster Autoscaler to scale nodes according to resource requests. By observing which nodes the pods land on, we can validate that the autoscaler correctly adds nodes to the appropriate pool (small or normal) and respects the --expander=least-waste strategy.
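Scaling from zero works because the autoscaler discovers each node group's size bounds from annotations on the CAPI resources. A hedged sketch of what that looks like on a MachineDeployment (the max-size value is illustrative; the capacity hints are needed for scale-from-zero, since with zero nodes there is no live node to inspect for a template):

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: capi-ubuntu-openstack-small-worker
  annotations:
    # node-group bounds read by the clusterapi autoscaler provider
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "0"
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "3"
    # capacity hints so the autoscaler can simulate a node while the pool is at zero
    capacity.cluster-autoscaler.kubernetes.io/cpu: "2"
    capacity.cluster-autoscaler.kubernetes.io/memory: "4Gi"
spec:
  replicas: 0
  # ... cluster, template, and infrastructure references as generated by clusterctl
```

The normal pool would carry the same annotations with "4" CPU and "8Gi" memory.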

Deploy the Test Deployments

kubectl apply -f test-small.yaml
deployment.apps/flask-system-info-small created
service/flask-system-info-small-service created

kubectl apply -f test-normal.yaml
deployment.apps/flask-system-info-normal created
service/flask-system-info-normal-service created
  • The small deployment triggered scale-up of the small pool

  • The normal deployment triggered scale-up of the normal pool as well

  • Checking the Cluster Autoscaler logs confirms the scaling was driven through the management cluster

  • Eventually, we had two worker nodes, one from each pool, and both deployments ready

🔥 Hands-On: Testing Cloud Controller Manager — Expose The Services Publicly

Now that our small and normal worker deployments are running, let’s expose them through an ingress using an external load balancer. This allows us to validate our Cloud Controller Manager (CCM) functionality.
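Under the hood, the CCM reacts to any Service of type LoadBalancer: the OpenStack provider provisions an Octavia load balancer and writes its address back into the Service's status. The ingress-nginx manifest below creates exactly such a Service; as a minimal standalone illustration (the name, selector, and ports here are hypothetical):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: demo-lb            # hypothetical name
spec:
  type: LoadBalancer       # watched by the OpenStack CCM, which creates an Octavia LB
  selector:
    app: demo
  ports:
  - port: 80
    targetPort: 8080
```

Once the LB is ready, the assigned address appears under EXTERNAL-IP in `kubectl get svc`.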

Install Ingress Controller

kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.13.1/deploy/static/provider/cloud/deploy.yaml
namespace/ingress-nginx created
serviceaccount/ingress-nginx created
serviceaccount/ingress-nginx-admission created
role.rbac.authorization.k8s.io/ingress-nginx created
role.rbac.authorization.k8s.io/ingress-nginx-admission created
clusterrole.rbac.authorization.k8s.io/ingress-nginx created
clusterrole.rbac.authorization.k8s.io/ingress-nginx-admission created
rolebinding.rbac.authorization.k8s.io/ingress-nginx created
rolebinding.rbac.authorization.k8s.io/ingress-nginx-admission created
clusterrolebinding.rbac.authorization.k8s.io/ingress-nginx created
clusterrolebinding.rbac.authorization.k8s.io/ingress-nginx-admission created
configmap/ingress-nginx-controller created
service/ingress-nginx-controller created
service/ingress-nginx-controller-admission created
deployment.apps/ingress-nginx-controller created
job.batch/ingress-nginx-admission-create created
job.batch/ingress-nginx-admission-patch created
ingressclass.networking.k8s.io/nginx created
validatingwebhookconfiguration.admissionregistration.k8s.io/ingress-nginx-admission created


✅ Ingress controller is installed, and the external LB is provisioned.

Create Ingress Resource

Next, we define an Ingress resource to route traffic to our small and normal services:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: flask-system-info-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
    nginx.ingress.kubernetes.io/ssl-redirect: "false"
spec:
  ingressClassName: nginx
  rules:
  - http:
      paths:
      - path: /small
        pathType: Prefix
        backend:
          service:
            name: flask-system-info-small-service
            port:
              number: 80

      - path: /normal
        pathType: Prefix
        backend:
          service:
            name: flask-system-info-normal-service
            port:
              number: 80

Test access

locthp@tasks-automation-server:~$ curl 172.16.1.130/small
{
  "hostname": "capi-ubuntu-openstack-small-worker-jtcd9-9rd5b",
  "main_ip": "10.8.0.194",
  "cpu_count": 2,
  "memory_gb": 3.82,
  "uptime_seconds": 969.43,
  "uptime_formatted": "16m 9s",
  "interfaces": {
    "lo": "127.0.0.1",
    "ens3": "10.8.0.194",
    "tunl0": "192.168.224.128"
  }
}locthp@tasks-automation-server:~$ curl 172.16.1.130/normal
{
  "hostname": "capi-ubuntu-openstack-normal-worker-lcqjv-hn4xs",
  "main_ip": "10.8.0.237",
  "cpu_count": 4,
  "memory_gb": 7.76,
  "uptime_seconds": 982.59,
  "uptime_formatted": "16m 22s",
  "interfaces": {
    "lo": "127.0.0.1",
    "ens3": "10.8.0.237",
    "tunl0": "192.168.7.64"
  }
}locthp@tasks-automation-server:~$

🧹 Cleaning Up & Testing Scale-Down

After validating that the Cluster Autoscaler successfully scaled nodes up, it’s time to test scale-down behavior. First, we clean up the test workloads:

kubectl delete -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.13.1/deploy/static/provider/cloud/deploy.yaml
kubectl delete -f test-small.yaml
kubectl delete -f test-normal.yaml
kubectl delete -f ingress.yaml

After about 10 minutes (the default idle time before the Cluster Autoscaler triggers scale-down), we observe the scaling-down event:

🔍 What’s Happening

  • When a node no longer hosts any non-DaemonSet pods, Cluster Autoscaler taints it with DeletionCandidateOfClusterAutoscaler (effect: PreferNoSchedule). This marks it as a removal candidate. By default, the autoscaler waits 10 minutes (--scale-down-unneeded-time) before deleting the node if it remains idle
  • Idle nodes are then removed from the cluster, returning it to its minimal state.
  • This demonstrates that the autoscaler not only handles scale-up effectively but also optimizes resource usage by scaling down when nodes are no longer needed
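The taint the autoscaler applies looks roughly like this in the node spec; its value records when the node was marked as a candidate (the timestamp below is illustrative):

```yaml
spec:
  taints:
  - key: DeletionCandidateOfClusterAutoscaler
    value: "1692780000"       # unix timestamp of when the node was marked (illustrative)
    effect: PreferNoSchedule  # new pods avoid the node, but existing ones keep running
```

You can spot it with `kubectl describe node <name>` while waiting for the scale-down to fire.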

And only the 3 control planes remain:

kubectl get nodes
NAME                                        STATUS   ROLES           AGE   VERSION
capi-ubuntu-openstack-control-plane-6psbk   Ready    control-plane   85m   v1.32.4
capi-ubuntu-openstack-control-plane-pv5l4   Ready    control-plane   86m   v1.32.4
capi-ubuntu-openstack-control-plane-rvmvl   Ready    control-plane   92m   v1.32.4

The CAPI controllers also removed the corresponding machines on the infrastructure side.

Bonus: 🌐 Exposing Services to the Internet

While our focus so far has been on cluster architecture and node pools, it’s worth noting that both “small” and “normal” nodes can serve real workloads accessible externally. For example, a simple web service or visualization app can be deployed on a normal node and exposed via an Ingress or LoadBalancer. I can then simply plug the public IP of the Ingress/LoadBalancer into the tunnel I mentioned in my previous blog

For example, if the public IP address of the LoadBalancer for the Ingress is:

kubectl get svc -n ingress-nginx
NAME                       TYPE           CLUSTER-IP     EXTERNAL-IP    PORT(S)                      AGE
ingress-nginx-controller   LoadBalancer   10.110.55.92   172.16.1.144   80:31200/TCP,443:31010/TCP   56m

I just need to plug the IP 172.16.1.144 into the end of the tunnel and set the hostname. Cloudflare will handle SSL termination by default, so there’s no need to set up SSL manually.

Wrapping Up & Looking Ahead

We now have a homelab Kubernetes cluster that behaves like a cloud-native cluster: it scales dynamically, integrates with cloud infrastructure, and is managed declaratively via CAPI. This lays a solid foundation for multi-tenant clusters, automated add-on deployment, and future expansion using the “grandmothership” model.

In Part 2, we'll take this further by enabling multi-tenant hosted control planes: each user gets a fully isolated control plane with automatically scaling workloads, all managed seamlessly via Cluster API.

Originally published at https://truonghoangphuloc.github.io on August 23, 2025.
