Ifedayo Adesiyan

How to Upgrade AWS EKS Node Groups Without Downtime (Step-by-Step Guide)

Upgrading Kubernetes node groups is key for better performance, scaling, and cost savings. But doing it carelessly can cause downtime. If you’re running on Amazon EKS (Elastic Kubernetes Service), AWS’s managed Kubernetes platform, this process becomes easier, but you still need a careful approach.

This guide walks you through a safe, step-by-step upgrade (using an example of moving from t3.small to t3.medium in EKS) without any disruption to your workloads.

Prerequisites

Make sure your apps run more than one replica, that you can access the cluster with kubectl or your cloud CLI, and that you understand how node pools/node groups work.

In this tutorial, I’ll demonstrate upgrading from an AWS t3.small node to a t3.medium node. You’ll notice the commands use a --kubeconfig=kubeconfig.yaml flag. This explicitly points kubectl to the correct Kubernetes cluster configuration file, which is useful when managing multiple clusters or when your kubeconfig isn’t in the default location (~/.kube/config).
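If you don’t already have a dedicated kubeconfig file, you can generate one with the AWS CLI. The cluster name and region below are placeholders for your own values:

aws eks update-kubeconfig --name my-cluster --region us-east-1 --kubeconfig kubeconfig.yaml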

Step 1: Ensure Your Deployment Has Multiple Replicas

Check that your Deployments or StatefulSets run more than one replica; this redundancy keeps apps available during rolling updates.

#deployment.yml
spec: 
  replicas: 3

You can confirm by running the command below:

kubectl --kubeconfig=kubeconfig.yaml get deploy
NAME            READY   UP-TO-DATE   AVAILABLE   AGE
test-api        3/3     3            3           12d

NAME → The deployment name (test-api).
READY (3/3) → Shows replicas: 3 pods are ready out of 3 desired replicas.
UP-TO-DATE (3) → Number of replicas that match the latest Deployment spec.
AVAILABLE (3) → Number of replicas actually available to serve traffic.
AGE (12d) → How long the Deployment has existed.
So the replica count is reflected in the READY column (3/3). If your Deployment was set to 5 replicas, you’d see 5/5 here when all are ready.

If replicas = 1, scale up before starting the upgrade by running:

kubectl --kubeconfig=kubeconfig.yaml scale deployment test-api --replicas=3

Step 2: Edit the Node Group / Instance Type

Add new EC2 instances with the t3.medium type (either by updating the existing node group or creating a new one).

Join them to your cluster using your EKS configuration (this happens automatically if you update the node group, or manually if you create new nodes).
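For example, one way to add t3.medium capacity is to create a new managed node group with eksctl. This is only a sketch; the cluster name and node group name are placeholders for your own setup:

eksctl create nodegroup \
  --cluster my-cluster \
  --name medium-nodes \
  --node-type t3.medium \
  --nodes 2 \
  --nodes-min 2 \
  --nodes-max 3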

Label the new nodes (e.g., instance-type=t3.medium) so you can target workloads with nodeSelector or nodeAffinity. This ensures pods are scheduled onto the new t3.medium nodes first, before draining the old t3.small ones.

kubectl --kubeconfig=kubeconfig.yaml label nodes test-api-new-instance instance-type=t3.medium

This command applies a label to a Kubernetes node. Specifically, it adds (or updates) the label instance-type=t3.medium on the node named test-api-new-instance.

These are the current 2 instances we have:

(Screenshot: list of available nodes)

i-036xxxxxxxxxx49ab is the t3.small node we would like to take out.

Step 3: Use nodeSelector or Affinity

Now that you’ve added new EC2 t3.medium instances to your EKS cluster, you want Kubernetes to place your workloads on them instead of the older t3.small instances.

Each EC2 instance in the cluster gets a label that includes its instance type. For example, a t3.medium instance will have:

node.kubernetes.io/instance-type=t3.medium
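You can confirm this built-in label on your nodes before updating the Deployment, for example:

kubectl --kubeconfig=kubeconfig.yaml get nodes -L node.kubernetes.io/instance-type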

By updating your Deployment with a nodeSelector, you instruct Kubernetes to run your pods only on the t3.medium EC2 instances:

# deployment.yml
spec:
  template:
    spec:
      nodeSelector:
        node.kubernetes.io/instance-type: t3.medium
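If you prefer nodeAffinity (the other option this step’s title mentions), an equivalent rule would look roughly like this sketch:

# deployment.yml (alternative using nodeAffinity)
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: node.kubernetes.io/instance-type
                    operator: In
                    values:
                      - t3.medium

requiredDuringSchedulingIgnoredDuringExecution behaves like nodeSelector here; preferredDuringSchedulingIgnoredDuringExecution would make the placement a soft preference instead.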

Step 4: Trigger a Rolling Update

Once the new nodes are ready and labeled, restart your Deployment to shift workloads:

kubectl --kubeconfig=kubeconfig.yaml -n production rollout restart deployment test-api

Kubernetes will then:

Spin up new pods on t3.medium nodes
Wait for them to be Ready
Gradually terminate old pods running on t3.small nodes
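You can monitor the rollout and confirm the pods landed on the new nodes (the -o wide output includes a NODE column):

kubectl --kubeconfig=kubeconfig.yaml -n production rollout status deployment test-api
kubectl --kubeconfig=kubeconfig.yaml -n production get pods -o wide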

Step 5: Drain & Remove Old Nodes

Once your application is fully running on the new nodes, clean up the old ones by draining them:

kubectl --kubeconfig=kubeconfig.yaml drain i-036xxxxxxxxxx49ab --ignore-daemonsets --delete-emptydir-data

This gracefully evicts pods from the old node while ignoring system DaemonSets. After the node is drained, you can safely terminate the t3.small EC2 instance from the AWS console or CLI.
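For example, you could remove the node object and then terminate the instance with the AWS CLI. This is a sketch; if the node belongs to a managed node group or Auto Scaling group, scale the group down or delete the node group instead, otherwise the terminated instance will simply be replaced:

kubectl --kubeconfig=kubeconfig.yaml delete node i-036xxxxxxxxxx49ab
aws ec2 terminate-instances --instance-ids i-036xxxxxxxxxx49ab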

Conclusion

Upgrading EKS node groups safely ensures better performance and cost efficiency without downtime. With careful scaling, labeling, and draining, your workloads move smoothly to the new instances.

Additional Resources

If you encounter any issues while following this guide or while migrating or upgrading your node groups, feel free to reach out to me on LinkedIn.
