For years, upgrading an EKS control plane was a one-way door. You bumped the Kubernetes minor version, and if something broke afterwards, you had to fix forward — under pressure, in production, with no escape hatch. AWS just changed that: Amazon EKS now supports Kubernetes version rollback.
The problem: upgrades you couldn't take back
You move a cluster from 1.29 to 1.30. The upgrade succeeds. Hours later, something misbehaves — a deprecated API your workloads still call, an add-on that isn't happy on the new version, a controller acting strangely under real traffic.
Until now, there was no going back. The control plane version was a one-way ratchet, and your only option was to debug and patch in place. That fear is exactly why so many teams sit several versions behind.
What AWS shipped
-
Revert to the previous minor version (roll
1.30back to1.29) - Within 7 days of the upgrade completing
- Trigger it from the console, AWS CLI, or SDK
- No additional cost, in all regions where EKS is available
It's not a blind revert: readiness checks
Before rolling back, EKS evaluates rollback readiness insights — automated checks including:
- API compatibility — will your workloads still work on the older version?
- Version skew — the allowed gap between control plane and nodes
- Add-on compatibility — managed add-ons that must match the target version
- Cluster health — general readiness signals
For EKS Auto Mode clusters, EKS rolls back the worker nodes first, then the control plane, honoring your disruption controls — the correct order given version skew rules.
"In-place upgrade only" — what that means
There are two ways to move an EKS cluster to a new version:
| In-place upgrade | Blue/green (migration) | |
|---|---|---|
| What upgrades | The existing cluster (1.29 → 1.30) | A new cluster; migrate workloads over |
| Rollback method | AWS's new 7-day rollback | Point back to the old cluster (it still exists) |
| Effort | Low — a click / one API call | High — build, migrate, cut over |
With blue/green you already have a rollback path (the old cluster is still running), so there's nothing for AWS to revert. The new feature targets the far more common in-place path, where you previously had no safety net at all.
The catch
This is a safety net, not an "undo anytime" button:
- In-place upgrades only — blue/green isn't covered (and doesn't need it)
- 7-day window — once it closes, the previous version is gone
The intent: use the window to validate the new version under real production load, and pull the ripcord fast if it goes wrong.
How this should change your playbook
- Still test in non-prod first. Rollback lowers risk; it doesn't remove the need to test.
- Treat the 7 days as an active validation window. Deliberately exercise the risky paths — deprecated APIs, finicky add-ons, peak load — while the escape hatch is open.
- Know your rollback command before you need it. Don't learn the API during an incident.
- Mind version skew on nodes — which is exactly why Auto Mode sequences nodes first.
Takeaway
This quietly removes one of the scariest parts of operating EKS. Upgrades stop being a leap of faith and become something you can validate and, if needed, reverse. If you've been sitting a few versions behind because upgrading felt too risky, it's a good moment to revisit that.
Originally published on srinun.in. I write about DevOps, AWS, and Kubernetes — connect with me on LinkedIn.
Top comments (0)