Upgrading Kubernetes node groups is key to better performance, scaling, and cost savings, but doing it carelessly can cause downtime. If you're running on Amazon EKS (Elastic Kubernetes Service), AWS's managed Kubernetes platform, the process becomes easier, but you still need a careful approach.
This guide walks you through a safe, step-by-step upgrade (using an example of moving from t3.small to t3.medium in EKS) without any disruption to your workloads.
Prerequisites
Make sure your apps have more than one replica, that you can access the cluster with kubectl or your cloud CLI, and that you understand how node pools/node groups work.
In this tutorial, I'll demonstrate upgrading from an AWS t3.small node to a t3.medium node. You'll notice the commands use a --kubeconfig=kubeconfig.yaml flag. This explicitly points kubectl to the correct Kubernetes cluster configuration file, which is useful when managing multiple clusters or when your kubeconfig isn't in the default location (~/.kube/config).
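Before starting, it helps to confirm that kubectl can actually reach the cluster with that kubeconfig. A quick sanity check (using the same kubeconfig.yaml referenced throughout this guide):
kubectl --kubeconfig=kubeconfig.yaml get nodes
If this lists your worker nodes, you're ready to proceed.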
Step 1: Ensure Your Deployment Has Multiple Replicas
Check that your Deployments or StatefulSets run more than one replica; this redundancy keeps apps available during rolling updates.
# deployment.yml
spec:
  replicas: 3
You can confirm by running the command below:
kubectl --kubeconfig=kubeconfig.yaml get deploy
NAME READY UP-TO-DATE AVAILABLE AGE
test-api 3/3 3 3 12d
NAME → The deployment name (test-api).
READY (3/3) → Shows replicas: 3 pods are ready out of 3 desired replicas.
UP-TO-DATE (3) → Number of replicas that match the latest Deployment spec.
AVAILABLE (3) → Number of replicas actually available to serve traffic.
AGE (12d) → How long the Deployment has existed.
So the replica count is reflected in the READY column (3/3). If your Deployment was set to 5 replicas, you'd see 5/5 here when all are ready.
If replicas = 1, scale up before starting the upgrade by running:
kubectl --kubeconfig=kubeconfig.yaml scale deployment test-api --replicas=3
Step 2: Edit the Node Group / Instance Type
Add new EC2 instances with the t3.medium type (either by updating the existing node group or creating a new one).
Join them to your cluster using your EKS configuration (this happens automatically if you update the node group, or manually if you create new nodes).
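For example, if you choose to create a new node group rather than edit the existing one, an eksctl command along these lines works. This is only a sketch: the cluster name my-cluster and node group name medium-nodes are placeholders for your own values.
eksctl create nodegroup \
  --cluster my-cluster \
  --name medium-nodes \
  --node-type t3.medium \
  --nodes 2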
Label the new nodes (e.g., node-type=medium) so you can target workloads with nodeSelector or nodeAffinity. This ensures pods are scheduled onto the new t3.medium nodes first, before draining the old t3.small ones.
kubectl --kubeconfig=kubeconfig.yaml label nodes test-api-new-instance instance-type=t3.medium
This command applies a label to a Kubernetes node. Specifically, it adds (or updates) the label instance-type=t3.medium on the node named test-api-new-instance.
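To confirm the label was applied, you can list nodes with that label displayed as a column (assuming the same kubeconfig and node name as above):
kubectl --kubeconfig=kubeconfig.yaml get nodes -L instance-type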
These are the current two instances we have:
i-036xxxxxxxxxx49ab is the t3.small node we would like to take out.
Step 3: Use nodeSelector or Affinity
Now that you’ve added new EC2 t3.medium instances to your EKS cluster, you want Kubernetes to place your workloads on them instead of the older t3.small instances.
Each EC2 instance in the cluster gets a label that includes its instance type. For example, a t3.medium EC2 will have:
kubernetes.io/instance-type=t3.medium
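You can verify which instance type each node reports by showing that well-known label as a column:
kubectl --kubeconfig=kubeconfig.yaml get nodes -L kubernetes.io/instance-type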
By updating your Deployment with a nodeSelector, you instruct Kubernetes to run your pods only on the t3.medium EC2 instances:
# deployment.yml
spec:
  template:
    spec:
      nodeSelector:
        kubernetes.io/instance-type: t3.medium
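Since this step mentions affinity as an alternative, here is the same constraint expressed with nodeAffinity instead of a nodeSelector. This is a sketch of an equivalent required affinity rule:
# deployment.yml (nodeAffinity alternative)
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: kubernetes.io/instance-type
                    operator: In
                    values:
                      - t3.medium
nodeAffinity is more verbose, but it supports richer rules, such as matching several instance types with a single In expression.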
Step 4: Trigger a rolling update
Once the new nodes are ready and labeled, restart your Deployment to shift workloads:
kubectl --kubeconfig=kubeconfig.yaml -n production rollout restart deployment test-api
Kubernetes will then:
Spin up new pods on t3.medium nodes
Wait for them to be Ready
Gradually terminate old pods running on t3.small nodes
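You can watch the rollout and confirm the new pods landed on the right nodes. A quick check, assuming the same namespace and Deployment name used above:
kubectl --kubeconfig=kubeconfig.yaml -n production rollout status deployment test-api
kubectl --kubeconfig=kubeconfig.yaml -n production get pods -o wide
Once the rollout finishes, the NODE column in the second command's output should show only the new t3.medium nodes.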
Step 5: Drain & remove old nodes
Once your application is fully running on the new nodes, clean up the old ones by draining them:
kubectl --kubeconfig=kubeconfig.yaml drain i-036xxxxxxxxxx49ab --ignore-daemonsets --delete-emptydir-data
This gracefully evicts pods from the old node while ignoring system DaemonSets. After the node is drained, you can safely terminate the t3.small EC2 instance from the AWS console or CLI.
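If you want to be extra cautious, you can cordon the node first so nothing new is scheduled onto it while it drains, and then terminate the instance with the AWS CLI. A sketch using the example node from this guide (adjust for your own instance ID):
kubectl --kubeconfig=kubeconfig.yaml cordon i-036xxxxxxxxxx49ab
# drain as shown above, then terminate the EC2 instance
aws ec2 terminate-instances --instance-ids i-036xxxxxxxxxx49ab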
Conclusion
Upgrading EKS node groups safely ensures better performance and cost efficiency without downtime. With careful scaling, labeling, and draining, your workloads move smoothly to the new instances.
Additional Resources
If you encounter any issues while following this guide or migrating/upgrading your node groups, feel free to reach out to me on LinkedIn.