EKS Cluster Upgrade Steps with Zero Downtime
Introduction:
Amazon Elastic Kubernetes Service (EKS) is a managed Kubernetes service offered by Amazon Web Services (AWS) that simplifies the process of deploying, managing, and scaling containerized applications using Kubernetes on AWS infrastructure. Kubernetes is an open-source container orchestration platform that automates the deployment, scaling, and management of containerized applications.
EKS abstracts away the complexities of Kubernetes cluster management, allowing users to focus on developing and running their applications. With EKS, AWS manages the control plane, which includes components like the API server, scheduler, and etcd, ensuring high availability and scalability of the Kubernetes control plane.
One of the key concepts in EKS is node groups. Node groups are a collection of EC2 instances (virtual machines) that act as worker nodes in the Kubernetes cluster. These nodes run the containerized applications and handle responsibilities such as running pods, networking, and storage. Node groups can be configured with various instance types, sizes, and configurations to meet specific workload requirements.
Node groups in EKS are highly flexible and scalable, allowing users to add or remove nodes dynamically based on workload demands. This elasticity ensures optimal resource utilization and cost efficiency. Additionally, node groups can be spread across multiple Availability Zones for improved fault tolerance and high availability.
Objective:
EKS clusters require periodic upgrades to maintain their stability, security, and performance. It is essential to keep the EKS cluster updated, as AWS discontinues support for older Kubernetes versions over time. Regular upgrades keep the cluster compatible with AWS services, eligible for ongoing support, and current with the new features and security patches provided by AWS.
Overview:
When upgrading an EKS cluster from one version to another (e.g., from 1.24 to 1.28), it's crucial to minimize downtime for applications. During the upgrade process, you will typically encounter errors caused by node groups whose version no longer matches the cluster. To avoid downtime, follow the steps below.
Key components:
Create Manifest for Intermediate Version: Generate a manifest for the intermediate version (e.g., 1.26) of the EKS cluster.
Create Node Group with Intermediate Version: Deploy a new node group using the intermediate version (1.26). This ensures compatibility between the cluster and the new nodes.
Update Deployment Affinity: Update the affinity settings in your application's deployment configuration to ensure a smooth rolling update to the new node group. Set the max surge to 100% to maintain application availability during the update.
Delete Old Node Group: Once the new node group is successfully deployed and your application is running smoothly on it, delete the old node group with the previous version (1.24).
Update EKS Cluster Version: In the EKS console, update the cluster version from the intermediate version (1.26) to the target version (1.28).
Create Manifest for Target Version: Generate a new manifest for the target version (1.28) of the EKS cluster.
Create Node Group with Target Version: Deploy a new node group using the target version (1.28) of the EKS cluster.
Delete Intermediate Node Group: Once the new node group with the target version is operational and applications are running smoothly, delete the node group with the intermediate version (1.26).
Update kubectl Version: Finally, update the kubectl version on your local machine to the latest version compatible with the upgraded EKS cluster.
By following these steps, you can minimize downtime during the EKS cluster upgrade process and ensure the smooth transition of your applications to the new cluster version.
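Before starting, it helps to confirm the current control plane and node group versions. A minimal sketch using the AWS CLI and eksctl (the cluster name and region are placeholders):
aws eks describe-cluster --name <cluster-name> --region <region> --query "cluster.version" --output text
eksctl get nodegroup --cluster=<cluster-name> --region=<region>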
Cluster Upgrade process in EKS console:
To begin, note that the cluster in this example is currently at version 1.24. To ensure optimal performance and compatibility, aim to keep your cluster at the latest N-1 version (1.28 at the time of writing, while the latest is 1.29), which gives you access to new features while maintaining stability and support from AWS.
The cluster contains one node group, currently aligned with the cluster's version 1.24, which also needs to be upgraded so that it stays compatible with the cluster and with AWS services.
Select the "Upgrade Now" option to initiate the upgrade process from version 1.24 to version 1.25, ensuring the cluster remains up-to-date with the latest enhancements and features.
The cluster has been successfully updated to the newer version, ensuring it remains current with the latest features, improvements, and security enhancements.
Repeat the previous steps to upgrade the cluster from version 1.25 to version 1.26, ensuring continued alignment with the latest advancements and optimizations in Kubernetes.
After upgrading the cluster to version 1.26, you may encounter an error when attempting to further upgrade to version 1.27 due to the need to update the node groups to match the cluster version. Refer to the provided image for details on the error message.
Creating a manifest file with version 1.26:
To minimize downtime, we can create new node groups with version 1.26, avoiding the need to delete existing ones. Below is the manifest file for creating the new node groups:
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: zerodowntime-cluster
  region: ap-south-2
  version: "1.26"
vpc:
  id: "vpc-0d13d6b46ae0a1e28"
  cidr: "172.31.0.0/16"
  subnets:
    public:
      Zero-Pub-2a:
        id: "subnet-075876fd0bcead8a2"
      Zero-Pub-2c:
        id: "subnet-0d97057ad00f8f0c9"
managedNodeGroups:
  - name: zerodowntime-clusterNG-new
    ami: "ami-0879d75b626dd9f41"
    amiFamily: AmazonLinux2
    overrideBootstrapCommand: |
      #!/bin/bash
      /etc/eks/bootstrap.sh zerodowntime-cluster --container-runtime containerd
    minSize: 1
    maxSize: 2
    desiredCapacity: 1
    instanceType: "t3.xlarge"
    volumeSize: 20
    volumeEncrypted: true
    privateNetworking: true
    subnets:
      - Zero-Pub-2a
      - Zero-Pub-2c
    labels: {role: zero-downtime-test}
    ssh:
      publicKeyName: zero-downgrade-cluster
    tags:
      nodegroup-role: Zero-downgrade-cluster-role
      nodegroup-name: zerodowntime-clusterNG-new
      Project: POC
      Env: Zero-app
      Layer: App-OD
      Managedby: Workmates
    iam:
      attachPolicyARNs:
        - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
        - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
        - arn:aws:iam::aws:policy/ElasticLoadBalancingFullAccess
        - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
        - arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy
      withAddonPolicies:
        autoScaler: true
        externalDNS: true
        certManager: true
        ebs: true
        efs: true
        albIngress: true
        cloudWatch: true
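The ami value above is specific to this example. To look up the EKS-optimized Amazon Linux 2 AMI ID for version 1.26 in your own region, the same SSM lookup shown later for 1.28 can be pointed at 1.26 (a sketch; adjust the region as needed):
aws ssm get-parameter --name /aws/service/eks/optimized-ami/1.26/amazon-linux-2/recommended/image_id --region ap-south-2 --query "Parameter.Value" --output text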
Access the Jenkins server and switch to the Jenkins user. Create a new YAML manifest file, save it with a .yaml extension, and paste the provided manifest for creating the new node groups with version 1.26.
The following is the content of the manifest file saved in YAML format.
Verify the available node groups in the cluster using the following command.
eksctl get nodegroup --cluster=<cluster-name> --region=<region>
As you can see in the above picture, there is only one node group.
Creating the node groups with the 1.26 version manifest file:
Create the new node groups from the 1.26 manifest using the command below.
eksctl create nodegroup -f <manifest-file.yaml>
Now you can see that the second node group, running version 1.26, has been created successfully.
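To confirm that the new nodes have joined the cluster and report the expected kubelet version, a quick check using the role label from the manifest above:
kubectl get nodes -l role=zero-downtime-test -o wide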
Next, update the affinity in the deployment files and configure a rolling update with a max surge of 100%.
Updating Affinity in Deployment:
Once the node groups are created:
Modify the deployment YAML files to adjust the update strategy and affinity:
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 100%
    maxUnavailable: 0%
Also, update the labels.
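The exact affinity depends on your application, but a minimal sketch of a nodeAffinity rule that steers pods onto the new node group, using the role: zero-downtime-test label from the manifest above (in practice, pick a label that only the new node group carries), looks like this:
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: role
              operator: In
              values:
                - zero-downtime-test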
Apply the same affinity changes to all applications:
Repeat the above process for all deployments, ensuring consistent updates to the affinity and labels across all applications. This ensures that all applications utilize the new node groups effectively.
Deleting the node group with the older version:
First, check the node groups in the cluster with the command below.
eksctl get nodegroup --cluster=<cluster-name> --region=<region>
Since we now have two node groups, we can go ahead and delete the old node group with the command below.
eksctl delete nodegroup --cluster=<cluster-name> --name=<old-nodegroup-name> --region=<region> --disable-eviction
Now you can see that the old node group has been deleted.
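At this point it is worth confirming that all pods have been rescheduled onto the new nodes and are healthy, for example:
kubectl get pods -A -o wide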
Upgrading the cluster from version 1.26 to 1.28 in the EKS console:
Since the new node group now matches the cluster version, go ahead and update the cluster from 1.26 to 1.27.
The cluster has been successfully updated to the newer version, ensuring it remains current with the latest features, improvements, and security enhancements.
Repeat the previous steps to upgrade the cluster from version 1.27 to version 1.28, ensuring continued alignment with the latest advancements and optimizations in Kubernetes.
The cluster is now upgraded to version 1.28.
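If you prefer the command line over the console, the same control plane upgrades can be performed one minor version at a time with eksctl (a sketch; the cluster name and region are placeholders):
eksctl upgrade cluster --name <cluster-name> --region <region> --version 1.27 --approve
eksctl upgrade cluster --name <cluster-name> --region <region> --version 1.28 --approve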
Creating the manifest file with version 1.28:
Now we need to prepare a manifest for version 1.28 so the node groups can be updated to 1.28.
First, retrieve the AMI ID that supports cluster version 1.28 using the command below, specifying your region.
aws ssm get-parameter --name /aws/service/eks/optimized-ami/1.28/amazon-linux-2/recommended/image_id --region ap-south-2 --query "Parameter.Value" --output text
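To avoid copying the value by hand, the lookup can also be captured into a shell variable (a sketch):
AMI_ID=$(aws ssm get-parameter --name /aws/service/eks/optimized-ami/1.28/amazon-linux-2/recommended/image_id --region ap-south-2 --query "Parameter.Value" --output text)
echo "$AMI_ID"   # use this value for the ami field in the 1.28 manifest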
Add the AMI ID to the manifest and save the file.
This manifest file creates node groups for version 1.28 of the EKS cluster.
Review the manifest below and update the necessary fields.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: zerodowntime-cluster
  region: ap-south-2
  version: "1.28"
vpc:
  id: "vpc-0d13d6b46ae0a1e28"
  cidr: "172.31.0.0/16"
  subnets:
    public:
      Zero-Pub-2a:
        id: "subnet-075876fd0bcead8a2"
      Zero-Pub-2c:
        id: "subnet-0d97057ad00f8f0c9"
managedNodeGroups:
  - name: zerodowntime-clusterNG
    ami: "ami-06cd43657081b7a56"
    amiFamily: AmazonLinux2
    overrideBootstrapCommand: |
      #!/bin/bash
      /etc/eks/bootstrap.sh zerodowntime-cluster --container-runtime containerd
    minSize: 1
    maxSize: 2
    desiredCapacity: 1
    instanceType: "t3.xlarge"
    volumeSize: 20
    volumeEncrypted: true
    privateNetworking: true
    subnets:
      - Zero-Pub-2a
      - Zero-Pub-2c
    labels: {role: zero-downtime-test}
    ssh:
      publicKeyName: zero-downgrade-cluster
    tags:
      nodegroup-role: Zero-downgrade-cluster-role
      nodegroup-name: zerodowntime-clusterNG
      Project: POC
      Env: Zero-app
      Layer: App-OD
      Managedby: Workmates
    iam:
      attachPolicyARNs:
        - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
        - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
        - arn:aws:iam::aws:policy/ElasticLoadBalancingFullAccess
        - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
        - arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy
      withAddonPolicies:
        autoScaler: true
        externalDNS: true
        certManager: true
        ebs: true
        efs: true
        albIngress: true
        cloudWatch: true
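Before applying the manifest, recent eksctl versions can validate it without creating any resources via a dry run (a sketch; the file name is a placeholder):
eksctl create nodegroup -f <manifest-1.28.yaml> --dry-run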
Creating the node groups with the 1.28 version manifest file:
First, check the node groups available in the cluster with the command below.
eksctl get nodegroup --cluster=<cluster-name> --region=<region>
As you can see in the above image, there is only one node group, carrying the -new suffix.
Now go ahead and apply the manifest file to create the new node group with version 1.28.
eksctl create nodegroup -f <manifest-1.28.yaml>
Now you can see that a new node group running version 1.28, matching the EKS cluster version, has been created.
Updating Affinity in Deployment:
Once the node groups are created:
Modify the deployment YAML files to adjust the affinity:
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 100%
    maxUnavailable: 0%
Also, update the labels.
Apply the same affinity changes to all applications:
Repeat the above process for all deployments, ensuring consistent updates to the affinity and labels across all applications. This ensures that all applications utilize the new node groups effectively.
Deleting the node group with the older version:
Once the affinity is updated, delete the node group with the -new suffix (the 1.26 node group) using the command below.
eksctl delete nodegroup --cluster=<cluster-name> --name=<nodegroup-name-new> --region ap-south-2 --disable-eviction
As you can see in the image below, the node group with the -new suffix is in the deleting state.
Now the old node group is deleted, leaving only one node group running version 1.28.
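As a final check, confirm that the remaining nodes report a v1.28 kubelet and that all deployments are healthy:
kubectl get nodes
kubectl get deployments -A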
Installing or updating kubectl:
Determine which version of kubectl you are currently using with the command below.
kubectl version --client
Refer to the AWS documentation link below for more information on installing kubectl.
https://docs.aws.amazon.com/eks/latest/userguide/install-kubectl.html
The documentation covers installing or updating kubectl on macOS, Linux, and Windows; the command below downloads the Linux (amd64) kubectl binary for Kubernetes 1.28.
curl -O https://s3.us-west-2.amazonaws.com/amazon-eks/1.28.8/2024-04-19/bin/linux/amd64/kubectl
Apply execute permissions to the binary.
chmod +x ./kubectl
Copy the binary to a folder in your PATH. If you already have a version of kubectl installed, we recommend placing the new binary at $HOME/bin/kubectl and ensuring that $HOME/bin comes first in your $PATH.
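A minimal sketch of those steps on Linux, following the AWS documentation linked above:
mkdir -p $HOME/bin && cp ./kubectl $HOME/bin/kubectl && export PATH=$HOME/bin:$PATH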
(Optional) Add the $HOME/bin path to your shell initialization file so that it is configured when you open a shell.
echo 'export PATH=$HOME/bin:$PATH' >> ~/.bashrc
After installation, verify the kubectl client version again with the command below.
kubectl version --client