Seamless scaling with VPA In-place Pod Resize on GKE

#kubernetes #ai #gke #googlecloud

Right-sizing Kubernetes workloads is a common platform engineering challenge. Set your requests too high, and you burn cloud budgets on idle capacity; set your limits too low, and your applications face throttling or dreaded OOMKills.

For years, the Vertical Pod Autoscaler (VPA) has been the standard answer to this problem, automatically adjusting CPU and memory requests based on observed usage. However, it came with a significant catch that held it back for critical workloads: applying new resource values required evicting the Pod and recreating it.

This disruption was often unacceptable for stateful applications, long-running connections, or latency-sensitive services.
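VPA derives its recommendations from observed usage. The upstream recommender is considerably more involved than this (it aggregates samples into decaying histograms). To make the core idea concrete, the sketch below is a deliberately simplified model. It assumes plain CPU samples in millicores, a nearest-rank 90th percentile, and a 15% safety margin. Those values are chosen for illustration and are not taken from GKE's configuration.

```python
def percentile_nearest_rank(samples, pct):
    """Return the nearest-rank percentile (pct in 1..100) of a list of integers."""
    if not samples:
        raise ValueError("need at least one usage sample")
    ordered = sorted(samples)
    # Smallest rank r with r / n >= pct / 100, using integer ceiling division.
    rank = max(1, -(-pct * len(ordered) // 100))
    return ordered[rank - 1]


def recommend_cpu_millicores(samples, pct=90, margin_pct=15):
    """Toy VPA-style target: a high usage percentile plus a safety margin, rounded up."""
    base = percentile_nearest_rank(samples, pct)
    return -(-base * (100 + margin_pct) // 100)


# Hypothetical CPU usage samples for one container, in millicores.
usage = [120, 150, 180, 200, 210, 220, 230, 250, 400, 900]
print(recommend_cpu_millicores(usage))  # 400m at p90, plus 15% -> 460
```

Working from a percentile rather than the maximum keeps a single spike (the 900m sample) from inflating the request. The margin leaves headroom above typical peaks.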

Introducing In-place Pod Resize (IPPR) on GKE

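In-place Pod Resize (IPPR) removes that catch. It lets Kubernetes change the CPU and memory requests and limits of running containers directly through the container runtime. The Pod is not recreated, and by default the container is not restarted. A container can still opt into a restart for a given resource through its resizePolicy.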

By combining the intelligence of VPA with the non-disruptive nature of IPPR, GKE users finally have a viable path to dynamic, seamless, and automated right-sizing.
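Under the hood, an in-place resize is an update to the Pod's `resize` subresource, and VPA performs the equivalent update for you. To show what such a request looks like, the sketch below builds a patch body. You could pass it to `kubectl patch pod POD_NAME --subresource resize --patch '...'`, which needs kubectl v1.32 or later. The container name `app` and all resource values are hypothetical.

```python
import json


def resize_patch(container_name, requests, limits):
    """Build a strategic-merge patch that sets one container's requests and limits.

    Containers in the patch are matched to the Pod's containers by name.
    """
    return {
        "spec": {
            "containers": [
                {
                    "name": container_name,
                    "resources": {"requests": requests, "limits": limits},
                }
            ]
        }
    }


# Hypothetical container name and values: grow CPU and memory for container "app".
patch = resize_patch(
    "app",
    requests={"cpu": "500m", "memory": "256Mi"},
    limits={"cpu": "1", "memory": "512Mi"},
)
print(json.dumps(patch, separators=(",", ":")))
```

Strategic merge is kubectl's default patch type, so the printed JSON can be passed as-is. It illustrates the mechanism only; with VPA in `InPlaceOrRecreate` mode you do not issue these patches yourself.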

Note: At the time of writing, IPPR support in VPA is in Preview on GKE. While it is a big step forward, I recommend evaluating it in a staging environment before rolling it out to production workloads.

Getting started with IPPR

To use In-place Pod Resize, you need a GKE cluster running version 1.34.0-gke.2201000 or later.

GKE Autopilot: VPA is enabled by default.
GKE Standard: Requires the Vertical Pod Autoscaling feature to be enabled.
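GKE version strings cannot be compared as plain text: as strings, "1.34.0-gke.999" would sort after "1.34.0-gke.2201000". The sketch below parses them into numbers first. It assumes the `MAJOR.MINOR.PATCH-gke.BUILD` pattern shown above. You can read the control plane version with `gcloud container clusters describe CLUSTER_NAME --location=LOCATION --format="value(currentMasterVersion)"`. Node pools carry their own versions, so check those too. The sample versions in the loop are hypothetical.

```python
import re

# Minimum version from this article; GKE versions look like MAJOR.MINOR.PATCH-gke.BUILD.
MIN_IPPR_VERSION = "1.34.0-gke.2201000"
_GKE_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-gke\.(\d+)$")


def parse_gke_version(version):
    """Turn '1.34.0-gke.2201000' into (1, 34, 0, 2201000) for numeric comparison."""
    match = _GKE_VERSION_RE.match(version.strip())
    if match is None:
        raise ValueError(f"unrecognized GKE version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def supports_ippr(version, minimum=MIN_IPPR_VERSION):
    """True if `version` is at or above the minimum; tuples compare element by element."""
    return parse_gke_version(version) >= parse_gke_version(minimum)


# Hypothetical versions to check.
for v in ["1.33.5-gke.1162000", "1.34.0-gke.2201000", "1.34.1-gke.1431000"]:
    print(v, supports_ippr(v))
```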

1. Enable the feature

If you aren't using Autopilot, enable Vertical Pod Autoscaling when you create the cluster, or update an existing cluster to enable it:

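gcloud container clusters create CLUSTER_NAME \
  --project=PROJECT_ID \
  --location=us-east1 \
  --release-channel=rapid \
  --enable-vertical-pod-autoscaling

# For an existing Standard cluster, enable VPA instead with:
gcloud container clusters update CLUSTER_NAME \
  --project=PROJECT_ID --location=us-east1 --enable-vertical-pod-autoscaling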

2. Define your VPA object

Create a VerticalPodAutoscaler resource targeting your Deployment or StatefulSet. The crucial element here is setting spec.updatePolicy.updateMode to InPlaceOrRecreate.

apiVersion: "autoscaling.k8s.io/v1"
kind: "VerticalPodAutoscaler"
metadata:
  name: "my-vpa"
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: "Deployment"
    name: "my-deployment"
  updatePolicy:
    updateMode: "InPlaceOrRecreate"
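If you generate manifests from code, you can build the same object as a dictionary and pipe it to `kubectl apply -f -`, which accepts JSON as well as YAML. The sketch below adds two things:

- A guard against mistyped update modes.
- An optional `resourcePolicy.containerPolicies[].maxAllowed` cap.

The cap is my addition, not something IPPR requires. It keeps recommendations from outgrowing your Nodes entirely. The list of update modes reflects the upstream VPA v1 API as I understand it, so verify it against the VPA version you run. The cap values are hypothetical.

```python
import json

# Update modes in the upstream VPA v1 API at the time of writing (check your VPA version).
UPDATE_MODES = {"Off", "Initial", "Recreate", "Auto", "InPlaceOrRecreate"}


def vpa_manifest(name, deployment, update_mode="InPlaceOrRecreate", max_allowed=None):
    """Build the VerticalPodAutoscaler above as a dict, optionally capping recommendations."""
    if update_mode not in UPDATE_MODES:
        raise ValueError(f"unknown updateMode: {update_mode!r}")
    spec = {
        "targetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": deployment},
        "updatePolicy": {"updateMode": update_mode},
    }
    if max_allowed is not None:
        # "*" applies the policy to every container in the targeted Pods.
        spec["resourcePolicy"] = {
            "containerPolicies": [{"containerName": "*", "maxAllowed": max_allowed}]
        }
    return {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        "metadata": {"name": name},
        "spec": spec,
    }


manifest = vpa_manifest("my-vpa", "my-deployment", max_allowed={"cpu": "2", "memory": "4Gi"})
print(json.dumps(manifest, indent=2))
```

Saved as a script, it can be applied with `python3 vpa.py | kubectl apply -f -`.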

3. Watch it scale

Apply the resource to your cluster and monitor your application under load. Instead of watching Pods terminate and get recreated, you can watch their resources change live with kubectl describe:

kubectl describe pod POD_NAME

Look for the AllocatedResources field or check the Events section. You will see the requests change in real time to match the VPA recommendations, while the Restart Count stays exactly the same.
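To check this from a script rather than by eye, capture the Pod as JSON before and after a resize and compare the two snapshots. For example, use `kubectl get pod POD_NAME -o json > before.json`. The sketch below treats a resize as in place when three things hold:

- The Pod's `metadata.uid` is unchanged (a recreated Pod gets a new UID).
- Its requests changed.
- No container's `restartCount` changed.

The sample Pods are hypothetical and trimmed to the fields the check reads.

```python
def requests_by_container(pod):
    """Map container name -> desired requests, read from the Pod spec."""
    return {c["name"]: c.get("resources", {}).get("requests", {}) for c in pod["spec"]["containers"]}


def restarts_by_container(pod):
    """Map container name -> restartCount, read from the Pod status."""
    statuses = pod.get("status", {}).get("containerStatuses", [])
    return {s["name"]: s.get("restartCount", 0) for s in statuses}


def resized_in_place(before, after):
    """True if requests changed on the same Pod object with no container restarts."""
    same_pod = before["metadata"]["uid"] == after["metadata"]["uid"]
    changed = requests_by_container(before) != requests_by_container(after)
    no_restarts = restarts_by_container(before) == restarts_by_container(after)
    return same_pod and changed and no_restarts


def sample_pod(uid, cpu, restarts):
    """Hypothetical, heavily trimmed `kubectl get pod -o json` output."""
    return {
        "metadata": {"uid": uid},
        "spec": {"containers": [{"name": "app", "resources": {"requests": {"cpu": cpu}}}]},
        "status": {"containerStatuses": [{"name": "app", "restartCount": restarts}]},
    }


before = sample_pod("uid-1", "250m", 0)
after = sample_pod("uid-1", "400m", 0)
print(resized_in_place(before, after))  # True
```

With real snapshots, load `before.json` and `after.json` with `json.load` instead of calling `sample_pod`.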

The "Or Recreate" fallback: Keep in mind that a Node's capacity is finite. If VPA recommends more resources than the Node currently running your Pod has left, an in-place resize is impossible. In that case, VPA falls back to evicting and recreating the Pod so it can be scheduled onto a larger or emptier Node.
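The sketch below is a simplified model of that decision: the Pod's new requests either fit alongside the requests of the other Pods already on the Node, or they do not. It is not the VPA updater's actual code. In the real flow, the kubelet accepts or defers the resize, and it accounts for details this sketch ignores. All quantities are hypothetical integers in millicores and MiB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    cpu_millicores: int
    memory_mib: int


def fits_on_node(recommended, node_allocatable, other_pods_requested):
    """True if the Pod's new requests fit next to everything else already on the Node."""
    return (
        other_pods_requested.cpu_millicores + recommended.cpu_millicores
        <= node_allocatable.cpu_millicores
        and other_pods_requested.memory_mib + recommended.memory_mib
        <= node_allocatable.memory_mib
    )


def choose_action(current, recommended, node_allocatable, other_pods_requested):
    """Simplified InPlaceOrRecreate decision: resize if it fits, otherwise recreate."""
    if recommended == current:
        return "no-op"
    if fits_on_node(recommended, node_allocatable, other_pods_requested):
        return "resize-in-place"
    return "evict-and-recreate"


# Hypothetical Node: 4 vCPU / 16 GiB allocatable, 3 vCPU / 8 GiB requested by other Pods.
node = Resources(cpu_millicores=4000, memory_mib=16384)
others = Resources(cpu_millicores=3000, memory_mib=8192)
current = Resources(cpu_millicores=500, memory_mib=1024)

print(choose_action(current, Resources(900, 2048), node, others))   # resize-in-place
print(choose_action(current, Resources(1500, 2048), node, others))  # evict-and-recreate
```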

Ready to dive deeper?

While this introduction covers the basics of IPPR, right-sizing is only one part of a robust scaling strategy: VPA usually works hand in hand with horizontal Pod autoscaling and cluster autoscaling. To master scaling on GKE, check out the guide Run full-stack workloads at scale on GKE.