NTCTech

Posted on • Originally published at rack2cloud.com

Vertical Pod Autoscaler in Production: In-Place Resize Works — Until It Doesn't

Kubernetes 1.35 made in-place pod resize stable. Most of the coverage stopped there.

The narrative wrote itself: Vertical Pod Autoscaler finally works for stateful workloads. No more restarts. Enable InPlaceOrRecreate and let the autoscaler do its job. The restart tax is gone.

That framing is accurate about one thing and misleading about everything else.

In-place resize eliminates restart disruption. It does not eliminate the other reasons VPA automation fails in production. VPA has five failure modes. Kubernetes 1.35 fixed one of them.

Here's what's still live.


Failure Mode 1: Node Capacity Constraints

InPlaceOrRecreate has a fallback path the feature name obscures: if the node can't satisfy the resize in-place, VPA evicts the pod anyway.

The failure is silent. The pod restarts on a new node. It looks like a normal rescheduling event.

```shell
kubectl describe vpa <vpa-name> -n <namespace>
kubectl get events --field-selector involvedObject.name=<pod-name> -n <namespace>
```

Look for EvictedByVPA events followed by scheduling delays. If the pod is landing on a new node each cycle, the in-place attempt failed and the fallback fired.

Fix: Node headroom ≥ 20% before enabling automation. Tight clusters trigger the fallback path regularly.
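For reference, this is roughly what the VPA object enabling that mode looks like — a minimal sketch, assuming a Deployment target; the names `app-vpa` and `app` are placeholders:

```yaml
# Sketch: VPA with in-place mode. When the node can't absorb the resize,
# this mode silently falls back to eviction — the behavior described above.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: app-vpa        # placeholder
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app          # placeholder
  updatePolicy:
    updateMode: "InPlaceOrRecreate"
```

The mode name promises "in-place or recreate", and the "or recreate" half is exactly the fallback path to watch for in the events above.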


Failure Mode 2: The JVM Heap Ceiling

[Diagram: JVM heap ceiling — container memory limit raised by VPA while -Xmx remains at the original startup value]

VPA raises the cgroup memory limit. The JVM ignores it.

-Xmx is a startup parameter. Raising the container memory limit after startup doesn't raise the heap ceiling. Container memory usage stays flat. VPA interprets this as confirmation the recommendation worked. The GC pressure continues.

```yaml
containers:
- name: app
  resources:
    limits:
      memory: "4Gi"   # VPA raised this from 2Gi
  env:
  - name: JAVA_OPTS
    value: "-Xmx1536m"   # Heap ceiling hasn't moved
```

Fix: resizePolicy: RestartContainer for memory on JVM workloads. Silent heap expansion is worse than no resize — VPA thinks it fixed the problem.
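A sketch of what that fix looks like in the pod spec — the container name is a placeholder, and the `resizePolicy` field is the stable Kubernetes API for per-resource resize behavior:

```yaml
containers:
- name: app              # placeholder
  resizePolicy:
  - resourceName: memory
    restartPolicy: RestartContainer   # restart the container so the JVM re-reads its heap settings
  - resourceName: cpu
    restartPolicy: NotRequired        # CPU can still resize in place, no restart
  resources:
    limits:
      memory: "4Gi"
```

Pairing this with `-XX:MaxRAMPercentage` instead of a fixed `-Xmx` lets the heap ceiling track the new container limit on each restart.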


Failure Mode 3: Memory Shrink Is Dangerous

VPA recommends less memory. The kubelet attempts to lower the cgroup limit. If the process is still holding peak allocator footprint — glibc and the JVM GC don't release memory proactively — the gap between VPA's target and retained memory is where OOM kills happen.

[Diagram: Memory shrink OOM risk — allocator retained footprint versus VPA lower bound recommendation, with the OOM kill zone highlighted]

```shell
kubectl top pod <pod-name> -n <namespace>
kubectl describe vpa <vpa-name> -n <namespace> | grep -A 5 "Lower Bound\|Target\|Upper Bound"
```

Compare VPA's lower bound against current consumption. If the gap is small, you're operating close to the OOM boundary.

Fix: Disable downward memory automation on workloads with large allocator footprints. Set minAllowed conservatively — it lives under resourcePolicy.containerPolicies, not updatePolicy.
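VPA has no "grow-only" switch for a single resource, so the common workaround is to take memory out of automation entirely via `controlledResources`. A sketch, assuming the container is named `app`:

```yaml
# Sketch: automate CPU only; memory stays under manual control,
# which removes the dangerous downward-memory path entirely.
resourcePolicy:
  containerPolicies:
  - containerName: app
    controlledResources: ["cpu"]   # memory is excluded from VPA actuation
    minAllowed:
      cpu: 100m
```

The trade-off: you give up automated memory growth too. For allocator-heavy workloads, that is usually the right trade.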


Failure Mode 4: Scheduler Fragmentation

In-place resize keeps the pod on its node and consumes more of it. At scale — hundreds of services, tight bin-packing — repeated resizes fragment node capacity. Pending pods appear even when aggregate cluster capacity looks sufficient.

```shell
kubectl get pods --all-namespaces --field-selector=status.phase=Pending
kubectl describe node <node-name> | grep -A 10 "Allocated resources"
```

Pending pods alongside fragmented node capacity — enough aggregate capacity, but no single node with enough headroom — is the signature of in-place resize fragmentation contributing.

Fix: VPA + cluster autoscaler coordination. Headroom settings must account for in-place resize growth, not just new pod scheduling demand.
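One widely used pattern for reserving that headroom is low-priority "balloon" pods: placeholder pods the scheduler evicts the moment real workloads need the space, which keeps the cluster autoscaler provisioning ahead of demand. A sketch — the name, replica count, and request sizes are placeholders to tune per cluster:

```yaml
# Sketch: overprovisioning placeholders. Negative priority means these are
# preempted first, freeing pre-warmed node capacity for real pods.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -10
globalDefault: false
description: "Placeholder pods, evicted first when real pods need room"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: headroom         # placeholder
spec:
  replicas: 2            # tune to the headroom you need
  selector:
    matchLabels: {app: headroom}
  template:
    metadata:
      labels: {app: headroom}
    spec:
      priorityClassName: overprovisioning
      containers:
      - name: pause
        image: registry.k8s.io/pause:3.9
        resources:
          requests: {cpu: "1", memory: 2Gi}   # sizes the reserved slot
```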


Failure Mode 5: VPA Recommendation Drift

VPA's default observation window is 8 days. For batch jobs, weekly traffic cycles, or anything with irregular load — the recommendation window may capture only the low-load period. Automation applies the recommendation. High-load returns. OOM kill or CPU throttle.

[Diagram: VPA recommendation drift — 8-day observation window capturing a low-load period while the high-load spike falls outside the window]

```yaml
resourcePolicy:
  containerPolicies:
  - containerName: app
    minAllowed:
      cpu: 100m
      memory: 512Mi
    maxAllowed:
      cpu: 4
      memory: 8Gi
```

Fix: minAllowed acts as a floor. Set it conservatively for variable workloads. Cross-reference status.recommendation against your observability stack before trusting automation.
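The observation window itself is a recommender setting, not a per-VPA field. A sketch of the relevant vpa-recommender Deployment args — these flag names come from the upstream VPA recommender, but defaults and availability vary by version, so verify against your installed release:

```yaml
# Sketch: excerpt of the vpa-recommender container spec.
# Flag names per upstream VPA; check your version before applying.
containers:
- name: recommender
  image: registry.k8s.io/autoscaling/vpa-recommender:1.2.0   # placeholder tag
  args:
  - --memory-aggregation-interval=24h
  - --memory-aggregation-interval-count=14   # widen the window from the 8-day default
  - --cpu-histogram-decay-half-life=48h      # weight older CPU samples more heavily
```

Widening the window trades responsiveness for safety: recommendations react more slowly, but weekly peaks stay inside the sample.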


Before You Enable Automation

  • Node headroom ≥ 20%
  • resizePolicy: RestartContainer for memory on JVM workloads
  • Disable downward memory automation on allocator-heavy workloads
  • Extend observation window for variable load patterns
  • Start with CPU automation only
  • Run Off mode for 2+ weeks before enabling
  • Test per workload class in non-production
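The "Off mode" item in the checklist maps to the VPA's recommendation-only mode — a minimal sketch, with placeholder names:

```yaml
# Sketch: observation-only VPA. Recommendations are computed and published
# in status.recommendation, but nothing is evicted or resized.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: app-vpa        # placeholder
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app          # placeholder
  updatePolicy:
    updateMode: "Off"
```

Two weeks in this mode gives you a recommendation history to compare against real peaks before any automation fires.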

The Verdict

1.35 solved the restart problem. That was the right problem to solve.

The mistake is treating restart elimination as the whole problem. Start in observation mode, validate per workload class, bound recommendations with minAllowed and maxAllowed, and treat memory shrink automation as higher risk than memory growth.

The teams that get the most out of this feature are the ones who understood why VPA automation was disabled in the first place — and addressed those reasons deliberately.


Full post on Rack2Cloud with detailed diagnostics, YAML configs, and the pre-automation checklist:
Vertical Pod Autoscaler in Production: In-Place Resize Works — Until It Doesn't
