Most teams switch for the cost savings. They stay for the headaches they didn't see coming.
Before Karpenter, we were running fixed managed node groups — over-provisioned, expensive, and sized by gut feel. Cluster Autoscaler helped, but it scaled based on what you pre-configured, not what your workloads actually needed. You were still making the sizing decisions. You were just automating the slow parts.
Karpenter changes the contract entirely. It provisions exactly what your pods ask for. Which sounds great, until you realise your pods have been lying about what they need.
Here's what actually happened, and what to do instead.
The bootstrap deadlock nobody mentions
Karpenter can't run on nodes it provisions. It has a nodeAffinity that actively avoids them. If you remove your managed node group too early, Karpenter has nowhere to live.
Fix: Always keep a small managed node group tainted with karpenter.sh/controller. It runs Karpenter and CoreDNS. Nothing else.
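A minimal sketch of that setup, assuming an eksctl-managed cluster (the node group name and instance type are placeholders, not a recommendation):

```yaml
# eksctl ClusterConfig fragment: a small, fixed node group for Karpenter itself
managedNodeGroups:
  - name: karpenter-system        # hypothetical name
    instanceType: m5.large
    minSize: 2                    # two nodes for controller HA
    maxSize: 2
    taints:
      - key: karpenter.sh/controller
        value: "true"
        effect: NoSchedule        # keeps application pods off this group
---
# Karpenter Helm chart values fragment: tolerate the taint above
tolerations:
  - key: karpenter.sh/controller
    operator: Exists
    effect: NoSchedule
```

CoreDNS needs the same toleration added to its deployment, or it will stay Pending once the old general-purpose node groups are gone.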
Karpenter only sees requests. Not limits.
If your pods have requests of 200Mi but limits of 1Gi, Karpenter packs 5 pods onto a node and thinks it's fine.
When those pods actually use 400Mi each — the node explodes.
Fix: Align requests closer to actual usage. Use VPA in recommendation-only mode to get data from real historical usage, not guesses.
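In recommendation-only mode the VPA observes real usage but never evicts anything. A sketch, assuming a Deployment named `api` (hypothetical):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa                 # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  updatePolicy:
    updateMode: "Off"           # recommend only, never restart pods
```

After a week or so, `kubectl describe vpa api-vpa` shows lower-bound, target, and upper-bound recommendations you can copy into your requests.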
Never allow "small" instances in production NodePools
On smaller instance types, DaemonSets alone (aws-node, kube-proxy, ebs-csi, log shippers) can eat 30-40% of allocatable memory before a single application pod lands.
Fix: Remove small instance sizes from your NodePool requirements. Set a minimum memory floor. We use 4Gi as the baseline.
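A sketch of that floor in a Karpenter `NodePool`, assuming the `karpenter.sh/v1` API and a default `EC2NodeClass` (names are placeholders). The `karpenter.k8s.io/instance-memory` label is denominated in MiB, so `Gt "4095"` means at least 4Gi:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.io
        kind: EC2NodeClass
        name: default             # hypothetical node class
      requirements:
        - key: karpenter.k8s.io/instance-memory
          operator: Gt
          values: ["4095"]        # MiB: excludes anything under 4Gi
        - key: karpenter.k8s.io/instance-category
          operator: In
          values: ["m", "c", "r"] # general-purpose families only
```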
PDBs won't save you from OOMKill
PodDisruptionBudgets protect against voluntary disruptions — consolidation, node drains.
The kernel doesn't ask permission. It kills the process instantly.
Worse: pods deployed at the same time leak memory at the same rate and OOM simultaneously. nginx has no backends. 503s for everyone.
Fix: For Node.js workloads, set your V8 heap limit below your container memory limit. Give the garbage collector a chance before the kernel gets involved.
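One way to wire that up is via `NODE_OPTIONS` in the pod spec, keeping the V8 old-space cap well under the container limit so buffers, stack, and native allocations have headroom (image name and numbers are illustrative):

```yaml
containers:
  - name: api
    image: my-api:latest                      # hypothetical image
    env:
      - name: NODE_OPTIONS
        value: "--max-old-space-size=768"     # V8 heap cap in MiB
    resources:
      requests:
        memory: 512Mi
      limits:
        memory: 1Gi                           # ~25% headroom above the heap cap
```

With this gap, the process hits GC pressure and throws a catchable heap error before the cgroup limit triggers a kernel OOMKill.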
Missing IAM permissions show up as cryptic failures
Everyone knows Karpenter needs the obvious EC2 and IAM permissions.
Almost nobody mentions iam:AddRoleToInstanceProfile and iam:RemoveRoleFromInstanceProfile. Missing these doesn't throw a clear error. It just quietly fails to launch nodes and you spend an hour staring at logs.
Fix: Audit your Karpenter controller IAM role before your first migration, not during it.
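A sketch of the instance-profile statement to check for on the Karpenter controller role. `Resource: "*"` is shown for brevity; in practice you'd scope it to your cluster's instance profiles:

```json
{
  "Effect": "Allow",
  "Action": [
    "iam:CreateInstanceProfile",
    "iam:AddRoleToInstanceProfile",
    "iam:RemoveRoleFromInstanceProfile",
    "iam:DeleteInstanceProfile",
    "iam:GetInstanceProfile",
    "iam:TagInstanceProfile",
    "iam:PassRole"
  ],
  "Resource": "*"
}
```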
The migration order matters more than you think
Infrastructure dependencies have a sequence — get it wrong and you'll be untangling failures instead of validating success.
The same applies to environment rollouts. Don't treat lower environments as a formality. They exist to surface exactly the problems your production can't afford to have.
Fix: Define your rollout order before you start and treat each environment as a proper stage gate with explicit validation criteria.
The real lesson?
Karpenter is genuinely excellent. But it shifts responsibility — from "provision nodes manually" to "understand exactly what your pods need."
Cluster Autoscaler let you get away with sloppy resource definitions. Karpenter exposes them. Loudly. In production.
Get your requests and limits honest before you migrate. Everything else is manageable.