DEV Community

srinu nuthi
srinu nuthi

Posted on • Originally published at srinun.in

EKS Upgrades Are No Longer a One-Way Door: Kubernetes Version Rollback

For years, upgrading an EKS control plane was a one-way door. You bumped the Kubernetes minor version, and if something broke afterwards, you had to fix forward — under pressure, in production, with no escape hatch. AWS just changed that: Amazon EKS now supports Kubernetes version rollback.

The problem: upgrades you couldn't take back

You move a cluster from 1.29 to 1.30. The upgrade succeeds. Hours later, something misbehaves — a deprecated API your workloads still call, an add-on that isn't happy on the new version, a controller acting strangely under real traffic.

Until now, there was no going back. The control plane version was a one-way ratchet, and your only option was to debug and patch in place. That fear is exactly why so many teams sit several versions behind.

What AWS shipped

  • Revert to the previous minor version (roll 1.30 back to 1.29)
  • Within 7 days of the upgrade completing
  • Trigger it from the console, AWS CLI, or SDK
  • No additional cost, in all regions where EKS is available

It's not a blind revert: readiness checks

Before rolling back, EKS evaluates rollback readiness insights — automated checks including:

  • API compatibility — will your workloads still work on the older version?
  • Version skew — the allowed gap between control plane and nodes
  • Add-on compatibility — managed add-ons that must match the target version
  • Cluster health — general readiness signals

For EKS Auto Mode clusters, EKS rolls back the worker nodes first, then the control plane, honoring your disruption controls — the correct order given version skew rules.

"In-place upgrade only" — what that means

There are two ways to move an EKS cluster to a new version:

In-place upgrade Blue/green (migration)
What upgrades The existing cluster (1.29 → 1.30) A new cluster; migrate workloads over
Rollback method AWS's new 7-day rollback Point back to the old cluster (it still exists)
Effort Low — a click / one API call High — build, migrate, cut over

With blue/green you already have a rollback path (the old cluster is still running), so there's nothing for AWS to revert. The new feature targets the far more common in-place path, where you previously had no safety net at all.

The catch

This is a safety net, not an "undo anytime" button:

  • In-place upgrades only — blue/green isn't covered (and doesn't need it)
  • 7-day window — once it closes, the previous version is gone

The intent: use the window to validate the new version under real production load, and pull the ripcord fast if it goes wrong.

How this should change your playbook

  1. Still test in non-prod first. Rollback lowers risk; it doesn't remove the need to test.
  2. Treat the 7 days as an active validation window. Deliberately exercise the risky paths — deprecated APIs, finicky add-ons, peak load — while the escape hatch is open.
  3. Know your rollback command before you need it. Don't learn the API during an incident.
  4. Mind version skew on nodes — which is exactly why Auto Mode sequences nodes first.

Takeaway

This quietly removes one of the scariest parts of operating EKS. Upgrades stop being a leap of faith and become something you can validate and, if needed, reverse. If you've been sitting a few versions behind because upgrading felt too risky, it's a good moment to revisit that.


Originally published on srinun.in. I write about DevOps, AWS, and Kubernetes — connect with me on LinkedIn.

Top comments (0)