DEV Community

cloud-sky-ops

Post 3/10 — Safe Cadence: Kubernetes Upgrades, Version Skew, and Feature Gates

Executive Summary

Upgrades are where clusters live or die by their discipline. A rushed kubeadm upgrade can silently break workloads, APIs, or CRDs.
In this post, I’ll walk you through how to plan, test, and execute Kubernetes upgrades safely, understand version-skew guarantees, manage feature gates, and validate everything before production.

By the end, you’ll know why cadence matters more than version chasing.


Prereqs

  • Familiarity with kubectl, kubeadm, or managed control planes (EKS, GKE, AKS).
  • Working understanding of pods, CRDs, controllers, and API objects.
  • Access to a staging or pre-prod environment where you can test.

Concepts

1️⃣ Support Windows & Version Skew

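The Kubernetes project maintains release branches for the three most recent minor releases (e.g., 1.29, 1.30, 1.31), and each minor receives roughly one year of patch support.
Supported skew relative to kube-apiserver:

| Component | Supported skew |
| --- | --- |
| kube-apiserver (HA instances) | within 1 minor of each other |
| kube-controller-manager, kube-scheduler | never newer than kube-apiserver; up to 1 minor older |
| kubelet, kube-proxy | never newer than kube-apiserver; up to 3 minors older (2 if the kubelet is older than 1.25) |
| kubectl | ±1 minor |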

Decision cue: If kubelets lag the control plane by more than the supported skew, or kubectl is more than 1 minor away from it → plan the upgrade immediately.

Before → After

| Cluster state | Verdict |
| --- | --- |
| 1.24 control plane, 1.21 nodes | ❌ Unsupported (skew = 3) |
| 1.28 control plane, 1.27 nodes | ✅ Supported (skew = 1) |
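
The skew arithmetic above is simple enough to script as a pre-flight check. The sketch below is a hedged illustration, not part of any Kubernetes tooling: the function names `minor_of`, `minor_skew`, and `kubelet_skew_ok` are hypothetical, and the default allowed lag of 1 minor is an assumed house rule that you should set to whatever limit you enforce.

```shell
# Extract the minor number from a version such as "v1.28.3" or "1.28".
minor_of() (
  v=${1#v}            # drop a leading "v" if present
  rest=${v#*.}        # drop the major number and the first dot
  echo "${rest%%.*}"  # keep everything before the next dot
)

# Print how many minors the second version lags behind the first.
# A negative result means the second version is newer.
minor_skew() {
  echo $(( $(minor_of "$1") - $(minor_of "$2") ))
}

# Succeed when the kubelet ($2) is not newer than the API server ($1)
# and lags by at most max_lag minors ($3, assumed default: 1).
kubelet_skew_ok() (
  skew=$(minor_skew "$1" "$2")
  max_lag=${3:-1}
  [ "$skew" -ge 0 ] && [ "$skew" -le "$max_lag" ]
)
```

For example, `kubelet_skew_ok v1.28.3 v1.27.9` succeeds, while `kubelet_skew_ok 1.24 1.21` fails because the lag is 3 minors. Feed it the server version and each kubelet version reported by `kubectl get nodes`.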

2️⃣ Upgrade Order: Control Plane → Nodes

Always upgrade the control plane first, then worker nodes, then addons.

Flow:
etcd → apiserver → controller-manager → scheduler → kubelet → kube-proxy → CNI → CSI → custom controllers

Decision cue:

Never let newer nodes talk to an older API Server.
The API Server must always be the newest component in the cluster.


3️⃣ Deprecation Checklist & API Migration

Minor releases can deprecate or remove API versions. To list what affects your cluster, install the deprecations plugin via krew and run it against the target version:

kubectl krew install deprecations
kubectl deprecations --k8s-version v1.31

Or, if you don’t have that plugin, grep the API server's OpenAPI document for beta versions it still serves (this shows what is served, not what your manifests use):

kubectl get --raw /openapi/v2 | grep v1beta1

Then rewrite the manifests with the kubectl-convert plugin, which is installed separately from kubectl:

kubectl convert -f old.yaml --output-version apps/v1 > new.yaml

Before → After Example

Before:

apiVersion: apps/v1beta2
kind: Deployment

After:

apiVersion: apps/v1
kind: Deployment

Decision cue:
→ Fix API versions before upgrading; otherwise the API server rejects those manifests once the old version is no longer served.
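
To catch pinned API versions in a manifest repository before the upgrade, a plain text scan is often enough. This is a minimal sketch, assuming manifests live as `.yaml`/`.yml` files under one directory; the function name `find_old_apis` is hypothetical, and the list of versions to search for is yours to supply.

```shell
# Print "file:line:content" for every manifest line that pins one of the
# given apiVersions. Usage: find_old_apis <dir> <apiVersion>...
find_old_apis() (
  dir=$1
  shift
  for api in "$@"; do
    # /dev/null forces grep to print the file name even for a single file;
    # "|| true" keeps a no-match result from aborting under "set -e".
    find "$dir" -type f \( -name '*.yaml' -o -name '*.yml' \) \
      -exec grep -nF "apiVersion: $api" /dev/null {} + || true
  done
)
```

Calling `find_old_apis manifests/ apps/v1beta2 extensions/v1beta1` prints every offending line; empty output means nothing in that directory pins those versions. It only sees files on disk, so objects created by Helm or operators still need a cluster-side check.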


4️⃣ Feature Gate Evaluation in Pre-Prod

Feature gates switch alpha and beta features on or off per component.
Check that your kube-apiserver binary exposes the flag (the full gate list with defaults follows it in the help output):

kube-apiserver --help | grep feature-gates

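Enable safely in the kubeadm ClusterConfiguration. Gate names must exist in your target version; an unrecognized gate prevents kube-apiserver from starting:

apiServer:
  extraArgs:
    feature-gates: "PodDisruptionConditions=false,JobPodFailurePolicy=true"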

Decision cue:

Enable gates in pre-prod first; validate e2e conformance before rollout.
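
One way to keep gate changes deliberate is to compare the `--feature-gates` string against a reviewed allowlist before it reaches any cluster. The sketch below is an illustration under assumptions: the allowlist file (one gate name per line) and the function names `list_gates` and `gates_approved` are hypothetical, not part of kubeadm or kubectl.

```shell
# Print one "Gate=value" pair per line from a comma-separated
# --feature-gates value.
list_gates() {
  printf '%s\n' "$1" | tr ',' '\n'
}

# Succeed only if every gate name in the string ($1) appears as a whole
# line in the allowlist file ($2). Reports the first unapproved gate.
gates_approved() (
  list_gates "$1" | while IFS='=' read -r name value; do
    grep -qxF "$name" "$2" || { echo "unapproved gate: $name" >&2; exit 1; }
  done
)
```

For example, `gates_approved "JobPodFailurePolicy=true" approved-gates.txt` succeeds only when `JobPodFailurePolicy` is listed in the file, which makes it easy to wire into a CI step for the staging configuration.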


🧪 Mini-Lab — Plan a Minor Upgrade

Goal: Upgrade 1.29 → 1.30 in staging safely.

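# 1. View current versions
kubectl get nodes -o wide

# 2. On the control-plane node: upgrade kubeadm, then review the plan
sudo apt-get update && sudo apt-get install -y kubeadm='1.30.1-*'
sudo kubeadm upgrade plan

# 3. Upgrade the control plane
sudo kubeadm upgrade apply v1.30.1

# 4. Drain one node
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data

# 5. On node-1: upgrade kubelet/kubectl, then restart the kubelet
#    (on worker nodes, upgrade kubeadm and run `sudo kubeadm upgrade node` first)
sudo apt-get install -y kubelet='1.30.1-*' kubectl='1.30.1-*'
sudo systemctl daemon-reload
sudo systemctl restart kubelet

# 6. Uncordon and validate
kubectl uncordon node-1
kubectl get nodes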
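
The drain, upgrade, uncordon cycle repeats for every node, one at a time. A small runbook generator can make that sequence explicit before anyone touches the cluster. This is a hedged sketch: `plan_node_rollout` is a hypothetical helper that only prints commands, and the node names are whatever you pass in.

```shell
# Print (not run) the per-node sequence for each node name given, so the
# rollout can be reviewed before it is executed one node at a time.
plan_node_rollout() {
  for node in "$@"; do
    echo "kubectl drain $node --ignore-daemonsets --delete-emptydir-data"
    echo "# upgrade and restart the kubelet on $node"
    echo "kubectl uncordon $node"
  done
}
```

Running `plan_node_rollout node-1 node-2` prints six lines, three per node, in the order they should be executed. Printing instead of executing is a deliberate choice here: it keeps the helper safe to run anywhere and leaves the pace of the rollout to the operator.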

Validate with conformance tests:

sonobuoy run --mode=certified-conformance
sonobuoy status
sonobuoy retrieve .

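Toggle a feature gate safely. On kubeadm clusters kube-apiserver runs as a static Pod rather than a Deployment, so edit its manifest on the control-plane node instead of using kubectl patch:

# back up the static Pod manifest first
sudo cp /etc/kubernetes/manifests/kube-apiserver.yaml ~/kube-apiserver.yaml.bak
# add "- --feature-gates=DynamicResourceAllocation=true" to the container's command list
sudo vi /etc/kubernetes/manifests/kube-apiserver.yaml
# the kubelet recreates the Pod on save; confirm it is Running again
kubectl -n kube-system get pods -l component=kube-apiserver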

Preview a rollback with kubectl diff -f rollback.yaml, then kubectl apply -f rollback.yaml --server-side --dry-run=server; drop --dry-run=server to apply it.


Cheatsheet

| Task | Command |
| --- | --- |
| List deprecations | kubectl deprecations --k8s-version X.Y |
| Convert manifest | kubectl convert -f old.yaml --output-version apps/v1 |
| Dry-run on server | kubectl apply -f file.yaml --server-side --dry-run=server |
| Diff check | kubectl diff -f manifests/ |
| Upgrade order | etcd → control plane → nodes → addons → CRDs |

Pitfalls

CRD / Version Drift
CRDs registered via Helm or Operators may pin old API versions (e.g., apiextensions.k8s.io/v1beta1).
→ Always run kubectl get crd and check .spec.versions.

Admission Policy Surprises
Mutating / Validating webhooks compiled against old libraries can block Pods post-upgrade.
→ Audit webhooks with kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations and test dry-runs in staging.

Etcd Storage Version Drift
Even after an API has been migrated, etcd may still hold objects serialized in the old version.
→ Use kube-storage-version-migrator to reconcile.
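
For CRDs, this drift often shows up as a `.status.storedVersions` list with more than one entry. A minimal sketch for spotting those, assuming you feed it tab-separated `name<TAB>storedVersions` lines; the function name `crds_with_mixed_storage` is hypothetical, and the kubectl query in the comment assumes the array is printed as JSON such as `["v1beta1","v1"]`.

```shell
# Read "<crd-name><TAB><storedVersions array>" lines on stdin and print the
# names of CRDs whose array holds more than one version (contains a comma).
#
# Possible input source (an assumption, run against your own cluster):
#   kubectl get crd -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.storedVersions}{"\n"}{end}'
crds_with_mixed_storage() {
  awk -F '\t' '$2 ~ /,/ { print $1 }'
}
```

Any CRD name it prints is a candidate for storage migration before the old version is dropped from the CRD.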


Diagram — Upgrade Workflow

[Figure: upgrade workflow diagram]


✅ Wrap-Up

Upgrades are not a sprint—they’re a cadence.
Each minor version should move through your environments with predictable rhythm, pre-validation, and rollback clarity.

Next in the series (Post 4) → Smart Scaling & Cost Control: HPA/KEDA + Cluster Autoscaler vs Karpenter.

