DEV Community

Constantine Ukah
How to upgrade an Enterprise Grade Kubernetes Cluster with Zero Downtime.

Introduction

One of the common tasks performed by DevOps Engineers is upgrading their organization's Kubernetes cluster, typically at least once every 3 months, because Kubernetes keeps releasing newer versions while maintaining only the last 3 released minor versions.

For instance, if the newest version is v1.34, the supported versions would be v1.34, v1.33 & v1.32.
This makes it important to understand how the upgrade process can be achieved with zero downtime.
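To make the support window concrete, here is a minimal Python sketch that derives the supported versions from the newest release. The function names (`supported_versions`, `is_supported`) are made up for this illustration, and the window of 3 minor versions is taken from the statement above.

```python
def supported_versions(newest: str, window: int = 3) -> list[str]:
    """Return the `window` most recent minor versions, newest first."""
    major, minor = newest.lstrip("v").split(".")[:2]
    minor_num = int(minor)
    # Never go below minor version 0.
    count = min(window, minor_num + 1)
    return [f"v{major}.{minor_num - i}" for i in range(count)]


def is_supported(current: str, newest: str) -> bool:
    """True if `current` (e.g. 'v1.33.4') is inside the window for `newest`."""
    current_minor = ".".join(current.lstrip("v").split(".")[:2])
    return f"v{current_minor}" in supported_versions(newest)


print(supported_versions("v1.34"))       # ['v1.34', 'v1.33', 'v1.32']
print(is_supported("v1.31.2", "v1.34"))  # False
```

A cluster for which `is_supported` returns False has already fallen out of the window and should be upgraded first.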

Prerequisites:

  • Cordon your Nodes: This simply means marking your nodes as unschedulable. No new Pods will be scheduled on a cordoned node, while the Pods already running on it are left untouched.

  • Review and understand the changelog in the release notes - Ensure that the changes or updated components won't affect your production environment.

  • Kubernetes upgrades are irreversible - You can't downgrade your cluster after an upgrade. A fresh installation would be required in the event of an issue with the upgraded version.

  • Lower Level Environment Test (Unit, Staging or Pre-Production) - Given that Kubernetes upgrades are irreversible, always test the newer version in a lower environment and monitor it for about 2 weeks before upgrading the production cluster.

  • Control Plane & Nodes should be on the same version before the upgrade begins.

  • Cluster Autoscaler: If you are using this feature within your Kubernetes environment, ensure that its version is the same as, or compatible with, that of your control plane to avoid issues during the cluster upgrade.

  • IP Addresses: Make sure at least 5 free IP addresses are available within the cluster subnet.

  • Kubelet: This component should also match the version of your control plane before the upgrade.
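The checklist above can be sketched as a small pre-flight check. This is only an illustration: the `ClusterState` class, its fields, and the `preflight` function are hypothetical names, and the values would have to be filled in from your own tooling. Nothing here talks to a real cluster.

```python
from dataclasses import dataclass


@dataclass
class ClusterState:
    """Hypothetical snapshot of the facts the checklist cares about."""
    control_plane_version: str         # e.g. "v1.33"
    kubelet_versions: dict[str, str]   # node name -> kubelet version
    free_subnet_ips: int               # free IPs in the cluster subnet
    release_notes_reviewed: bool = False
    tested_in_lower_env: bool = False


def minor(version: str) -> str:
    """Reduce 'v1.33.4' or '1.33' to '1.33' so versions can be compared."""
    return ".".join(version.lstrip("v").split(".")[:2])


def preflight(state: ClusterState, min_free_ips: int = 5) -> list[str]:
    """Return a list of problems; an empty list means ready to upgrade."""
    problems = []
    control_plane = minor(state.control_plane_version)
    for node, version in sorted(state.kubelet_versions.items()):
        if minor(version) != control_plane:
            problems.append(
                f"{node}: kubelet {version} does not match "
                f"control plane {state.control_plane_version}"
            )
    if state.free_subnet_ips < min_free_ips:
        problems.append(
            f"only {state.free_subnet_ips} free IPs, need {min_free_ips}"
        )
    if not state.release_notes_reviewed:
        problems.append("release notes have not been reviewed")
    if not state.tested_in_lower_env:
        problems.append("new version not yet tested in a lower environment")
    return problems


state = ClusterState(
    control_plane_version="v1.33",
    kubelet_versions={"node-a": "v1.33.4", "node-b": "v1.32.9"},
    free_subnet_ips=3,
    release_notes_reviewed=True,
    tested_in_lower_env=True,
)
for problem in preflight(state):
    print(problem)
```

In this example the check reports two problems: `node-b` is behind the control plane, and fewer than 5 IP addresses are free.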

What are the actual upgrade steps?

  1. Control Plane Upgrade: If you are using a managed Kubernetes cluster (EKS, AKS, GKE), the cloud provider takes care of managing the control plane. However, the upgrade of the cluster doesn't happen automatically, so you have to trigger it yourself via the CLI, the UI, eksctl, etc.

  2. Node Group or Data Plane Upgrade:

    • Managed Node Groups - This is easier because you can use a rolling update approach: the nodes are upgraded one after the other.
    • Nodes managed by you or custom nodes - This is a bit trickier, as you first need to cordon each node, making it unschedulable, before upgrading the nodes individually.
    • Hybrid - The combination of both approaches.
  3. Add-ons Upgrade: On a managed cluster, this can usually be done with a click of a button in the provider's console.

  4. Functional Test: Run a functional test to ensure that all components of your cluster work properly after the upgrade.
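For self-managed nodes and the final check, the steps above can be sketched roughly as follows. The helper names and the sample data are hypothetical. The `kubectl drain` step is an addition that the list above does not mention: it evicts the running Pods after the node is cordoned, which is a common way to avoid downtime. The sketch only builds the commands and parses output, it does not run anything against a cluster.

```python
import json


def node_maintenance_commands(node: str) -> tuple[list[list[str]], list[list[str]]]:
    """Return the kubectl commands to run before and after upgrading one node."""
    before = [
        # Mark the node unschedulable so no new Pods land on it.
        ["kubectl", "cordon", node],
        # Assumption: evict running Pods so they restart on other nodes.
        ["kubectl", "drain", node, "--ignore-daemonsets", "--delete-emptydir-data"],
    ]
    after = [
        # Allow scheduling again once the node itself has been upgraded.
        ["kubectl", "uncordon", node],
    ]
    return before, after


def not_ready_nodes(nodes_json: str) -> list[str]:
    """List nodes that are not Ready, given `kubectl get nodes -o json` output."""
    failing = []
    for item in json.loads(nodes_json)["items"]:
        conditions = item["status"].get("conditions", [])
        ready = any(
            c["type"] == "Ready" and c["status"] == "True" for c in conditions
        )
        if not ready:
            failing.append(item["metadata"]["name"])
    return failing


# Made-up sample in the shape of `kubectl get nodes -o json`.
SAMPLE = json.dumps({
    "items": [
        {"metadata": {"name": "node-a"},
         "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
        {"metadata": {"name": "node-b"},
         "status": {"conditions": [{"type": "Ready", "status": "False"}]}},
    ]
})

before, after = node_maintenance_commands("node-a")
for command in before + after:
    print(" ".join(command))
print(not_ready_nodes(SAMPLE))
```

In a real run, each command list could be passed to `subprocess.run(command, check=True)`, with the actual node upgrade happening between the `before` and `after` commands. A full functional test would go further than node readiness, for example by also checking that your workloads respond as expected.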
