Kubernetes VPA In-Place Pod Resize Is GA — Here's What Actually Changes for Stateful Workloads
The resource sizing trap on Kubernetes is well-documented. Set requests too low and your pod lands on an overcommitted node and gets evicted under pressure. Set requests too high and you pay for capacity you never use. Set CPU limits at exactly the wrong level and your app gets throttled at the worst possible moment.
VPA (Vertical Pod Autoscaler) was supposed to fix this. And it does — for stateless workloads. The problem was always stateful services: traditional VPA has to evict and restart your pod to change resource allocations.
For Postgres, Kafka, Elasticsearch, or Redis, that's a restart. Which means connection drops, replication lag, or at minimum, a disruption your users notice. So teams running stateful workloads on Kubernetes either ignored VPA entirely, or used it with carefully tuned PodDisruptionBudgets and prayed during maintenance windows.
Kubernetes 1.35 changes this. VPA in-place pod resize — first proposed in 2019 as KEP-1287, alpha in 1.27, beta in 1.33 — is now GA.
Here's what that actually means in practice.
The Problem With Traditional VPA on Stateful Workloads
Quick recap on how VPA works:
VPA watches your pods, analyzes actual resource usage over time (via the VPA recommender), and applies adjustments to CPU and memory requests/limits. Three modes: Off (recommendations only), Initial (set once at pod creation), and Auto (continuously applies recommendations).
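Off mode is the safe way to see what the recommender would do before letting it act — the recommendations land in the VPA object's status. A quick sketch (the VPA name and namespace here are illustrative, matching the example used later in this post):

```shell
# Inspect recommendations without letting VPA act (updateMode: "Off").
# "postgres-vpa" / "data" are example names — substitute your own.
kubectl -n data describe vpa postgres-vpa
# The Status section shows Target, Lower Bound, and Upper Bound
# per container — Target is what Auto mode would apply.
```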
The "Auto" mode is where you want to be for real cost optimization. But until 1.35, "Auto" on a stateful pod meant this sequence at every resource change:
- VPA decides the pod needs different resources
- VPA evicts the pod
- Pod terminates — container stops, connections drop
- Pod reschedules — usually on the same node, sometimes not
- Container restarts — init time, warmup time, replica catch-up time
For a stateless API pod behind a load balancer: annoying but manageable. For a primary Postgres with 200 active connections or a Kafka broker mid-replication: that's an incident.
What In-Place Resize Changes
The core change: the kubelet can now update a container's resource allocations without restarting the container.
Specifically, it patches cgroups on the running container. For CPU, this is truly zero-disruption — cgroup limits change, no process restart, no connection drop. For memory, it's more nuanced:
- Increasing memory limits → zero disruption. Kubelet expands the cgroup.
- Decreasing memory limits → the container must have already freed that memory. If it hasn't, the kubelet waits (and optionally falls back to eviction).
This is the caveat you need to internalize: in-place resize for memory decreases isn't magic. It works when there's headroom. If your Postgres instance is actively using 4GB of the 8GB limit and VPA wants to lower it to 5GB, fine. If it wants to lower it to 3.5GB and the container is still holding 4GB in use — you're still looking at an eviction path, or the resize just waits.
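You can see whether a resize is stuck waiting for headroom directly on the pod. Recent versions surface this through pod conditions — the exact condition names have shifted across releases, so treat this as a sketch:

```shell
# Check whether a resize is pending or in progress on a pod.
# Condition names (e.g. PodResizePending, PodResizeInProgress) may
# vary by Kubernetes version; "postgres-0" is an example pod name.
kubectl -n data get pod postgres-0 -o jsonpath=\
'{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.message}{"\n"}{end}'
```

A long-lived pending condition on a memory decrease is exactly the "resize just waits" case described above.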
Start with CPU-only in-place resize for critical stateful workloads. Add memory once you've observed behavior for a week or two.
Setting It Up
Assuming you're on K8s 1.35 and have VPA installed (from the kubernetes/autoscaler repo):
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: postgres-vpa
  namespace: data
spec:
  targetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: postgres
  updatePolicy:
    updateMode: "InPlaceOrRecreate"
  resourcePolicy:
    containerPolicies:
      - containerName: postgres
        minAllowed:
          cpu: 250m
          memory: 512Mi
        maxAllowed:
          cpu: 4
          memory: 8Gi
        controlledResources: ["cpu"] # start CPU-only
```
InPlaceOrRecreate is the key mode: VPA tries in-place first, falls back to recreate only if in-place isn't possible.
Your pod template also needs the resizePolicy field in the container spec:
```yaml
containers:
  - name: postgres
    resizePolicy:
      - resourceName: cpu
        restartPolicy: NotRequired
      - resourceName: memory
        restartPolicy: NotRequired
```
restartPolicy: NotRequired is the optimistic setting — kubelet tries not to restart. If it can't satisfy the change without a restart (e.g., a memory decrease that exceeds headroom), it falls back regardless. This is the right default for most stateful workloads.
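Before handing control to VPA, it's worth exercising the in-place path by hand once. Assuming a kubectl recent enough to support the resize subresource (the pod name and values below are illustrative):

```shell
# Manually trigger an in-place CPU resize — no VPA involved.
# Assumes kubectl with --subresource=resize support; "postgres-0" is an example.
kubectl -n data patch pod postgres-0 --subresource resize --type merge -p \
  '{"spec":{"containers":[{"name":"postgres",
    "resources":{"requests":{"cpu":"750m"},"limits":{"cpu":"2"}}}]}}'

# Confirm the container was NOT restarted by the resize:
kubectl -n data get pod postgres-0 \
  -o jsonpath='{.status.containerStatuses[0].restartCount}'
```

If the restart count is unchanged and the new values show up in the pod spec, the in-place path works on your nodes.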
What This Enables in Practice
Three patterns that become genuinely viable now:
Right-sizing without scheduled maintenance windows. Before in-place resize, right-sizing a production Postgres meant picking a low-traffic window, updating the resource spec, watching the pod restart, monitoring for replication lag. With VPA in-place resize on CPU, that happens continuously and automatically — no maintenance window, no planned disruption.
Faster reaction to CPU spikes before HPA kicks in. HPA scales horizontally; VPA scales vertically. Traditionally they conflict when driven by the same metric, so you pick one. With in-place resize, VPA can respond to a CPU spike by expanding the current pod's allocation within seconds, while HPA takes a few minutes to provision and warm up a new replica. Better first-response behavior, fewer cold-start gaps during traffic surges.
Cost optimization with lower operational risk. The main reason teams over-provision stateful workloads is fear of the eviction disruption during VPA-driven changes. With CPU in-place resize, that fear is gone for CPU. You can run VPA Auto mode and let it trim idle CPU from your Elasticsearch cluster at 3am without touching any running process. On a mid-size cluster, this kind of idle CPU recovery can add up to meaningful monthly savings.
What Will Bite You If You Don't Plan
Admission controllers and webhooks. If you have admission webhooks validating resource specs (Kyverno, OPA Gatekeeper, custom validators), they may reject VPA's in-place patches if the policy only allows resource changes at pod creation time. Test this before enabling Auto mode on production workloads.
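One cheap way to probe this: a server-side dry run of a resource patch passes through your admission webhooks without persisting anything (webhooks that declare side effects may be skipped during dry runs, so this is a smoke test, not a guarantee):

```shell
# Probe admission webhooks with a server-side dry run of a resize.
# A policy that forbids post-creation resource mutations will reject this.
kubectl -n data patch pod postgres-0 --subresource resize --type merge \
  --dry-run=server -p \
  '{"spec":{"containers":[{"name":"postgres",
    "resources":{"requests":{"cpu":"600m"}}}]}}'
```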
StatefulSet rolling update contention. VPA operates at the pod level. StatefulSets have their own update controller. If both try to make changes simultaneously — say, a Helm upgrade changes the pod spec while VPA is mid-resize — behavior can be surprising. Add updatePolicy.updateMode: "Off" temporarily during planned StatefulSet rollouts.
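A sketch of the pause-and-resume pattern, using the example VPA from above:

```shell
# Pause VPA before a planned StatefulSet rollout...
kubectl -n data patch vpa postgres-vpa --type merge \
  -p '{"spec":{"updatePolicy":{"updateMode":"Off"}}}'

# ...run the Helm upgrade / rollout, then re-enable:
kubectl -n data patch vpa postgres-vpa --type merge \
  -p '{"spec":{"updatePolicy":{"updateMode":"InPlaceOrRecreate"}}}'
```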
Memory VPA recommendations that exceed headroom. Monitor what VPA is recommending vs. what it can actually apply in-place. A VPA recommendation that consistently requires eviction because memory headroom never materializes is worse than no VPA — you get the disruption without the control.
Observability gap. Add a panel to your Grafana dashboard tracking VPA recommendation vs. actual resource usage per pod. Without this, you're flying blind on whether VPA is converging on good values or oscillating. The VPA metrics are available in the vpa-recommender pod — expose them to Prometheus.
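Even before the dashboard exists, a manual spot check is possible from the VPA status plus metrics-server, assuming metrics-server is installed (names below follow the example manifest):

```shell
# Compare VPA's target recommendation against live usage for one pod.
kubectl -n data get vpa postgres-vpa \
  -o jsonpath='{.status.recommendation.containerRecommendations[0].target}'
kubectl -n data top pod postgres-0 --containers
```

If the target and actual usage track each other over a few days, VPA is converging; if the target swings widely, investigate before enabling Auto mode.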
The Bottom Line
In-place pod resize has been in development for six years. That long runway means the feature is genuinely production-ready — not a "try it in staging" situation.
For teams currently avoiding VPA on stateful workloads because of the eviction problem: that blocker is largely gone for CPU. Memory follows, with the headroom caveat.
The practical starting point: add a VPA in Initial mode first to get baseline recommendations. Review them for a week. Then switch to InPlaceOrRecreate with controlledResources: ["cpu"]. Watch the behavior. Add memory once you're confident.
I still don't fully understand why it took six years for something that's conceptually "just update the cgroup." The kubelet surface area is apparently not "just" anything.
Have you hit the admission controller edge case in production, or found that memory headroom is the main limiting factor in practice? Curious what the real-world friction points are as more teams migrate to 1.35.