Daya Shankar
Memory Ballooning Effects in Virtualized Cloud Environments

Memory ballooning is a host memory reclaim method used during VM overcommit. The hypervisor inflates a balloon driver inside a VM to claw back RAM. 

It can avoid host swapping, but it also shrinks guest page cache and can trigger paging. In Kubernetes, you see MemoryPressure, pod evictions, and tail-latency spikes. 

What memory ballooning is

Ballooning is cooperative reclaim. It’s not “free memory.”

On VMware, the balloon driver (vmmemctl) works with the host to reclaim pages the guest considers least valuable. 

VMware’s own perf guidance is blunt: avoid overcommit that forces regular host swapping, because that’s where performance collapses. 

What you actually see in a managed Kubernetes service

You don’t see “ballooned MB.” You see consequences.

Kubelet enforces node-pressure eviction. The default hard threshold on Linux is memory.available<100Mi, and hard evictions have no grace period.
So any reclaim event that drops memory.available below that line can turn into kills.
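Managed services often don't let you change these thresholds, but where a platform does accept kubelet configuration, the hard-eviction defaults can be made explicit. The values below are the documented Linux defaults; how the file is delivered to the node is an assumption about your provisioning path:

```yaml
# KubeletConfiguration fragment: hard eviction thresholds.
# These mirror the Linux defaults; hard evictions have no grace period.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "100Mi"
  nodefs.available: "10%"
  imagefs.available: "15%"
  nodefs.inodesFree: "5%"
```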

How ballooning pressure turns into outages on K8s nodes

This is the failure chain you should expect under overcommit.

  1. Cache gets punched → more disk reads → p95 climbs.
  2. Paging starts → jitter rises.
  3. Kubelet evicts → restarts + thundering herd.

You don’t need hypervisor access to catch this. You just need node metrics and events.

What AceCloud gives you to control blast radius

You control node sizing and node-group policy, not the host reclaim knobs.

AceCloud Managed Kubernetes exposes worker node configurations such as 2 vCPU/4 GiB, 4 vCPU/8 GiB, and 8 vCPU/16 GiB (per their published comparison table).
If you need bigger worker nodes, AceCloud’s flavor catalog shows Standard Instance options like S3a.2xlarge (8 vCPU/32 GiB) up through S3a.8xlarge (32 vCPU/128 GiB) and beyond. 

Guardrails that work when overcommit is “yes”

These are defaults you can deploy without cluster-specific tuning.

Split worker node groups by risk

One node pool for prod latency. One for batch.

  • Protected pool: ingress, API, user-facing services.
  • Best-effort pool: ETL, async jobs, rebuilds.

This keeps batch from turning your prod nodes into the provider’s pressure valve.
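One way to wire the split is labels plus taints on the node groups, so batch work can only land on the best-effort pool. The label/taint names here are illustrative, not anything AceCloud defines:

```yaml
# Assumes the best-effort node group carries label pool=batch and
# taint pool=batch:NoSchedule set at node-group creation time.
apiVersion: v1
kind: Pod
metadata:
  name: etl-job
spec:
  nodeSelector:
    pool: batch          # only schedule onto the batch pool
  tolerations:
    - key: pool
      operator: Equal
      value: batch
      effect: NoSchedule # allowed onto the tainted batch nodes
  containers:
    - name: etl
      image: registry.example.com/etl:latest  # placeholder image
```

Protected-pool pods simply omit the toleration, so the taint keeps them and batch apart by default.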

Enforce requests/limits everywhere

The scheduler packs based on requests. If you don’t set them, you’re gambling.

Use Kubernetes resource requests/limits for CPU and memory.
For latency pods, run Guaranteed QoS: requests == limits.

resources:
  requests:
    cpu: "1"
    memory: "2Gi"
  limits:
    cpu: "1"
    memory: "2Gi"

Keep headroom by design

If you can’t tune Node Allocatable, you simulate it with request budgets.

Kubernetes calls this concept Node Allocatable (reserving resources for system daemons).
In a managed service, you may not get to set kube-reserved / system-reserved, so leave headroom in pod requests.

Baseline rule (protected nodes): don’t schedule more than 75% of node RAM by requested memory.
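Kubernetes has no per-node request budget, but if a namespace maps roughly one-to-one to a node group, a ResourceQuota is a workable approximation: cap total requested memory at ~75% of the pool's aggregate RAM. The numbers below assume a hypothetical four-node pool of 8 GiB workers (32 GiB total):

```yaml
# ~75% of 4 × 8 GiB = 24Gi of total requested memory for this namespace.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: prod-memory-budget
  namespace: prod
spec:
  hard:
    requests.memory: 24Gi   # scheduler-visible budget
    limits.memory: 32Gi     # hard ceiling across all pods
```

It's an approximation because the scheduler can still pack one node unevenly, but it stops the namespace as a whole from overrunning the pool.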

Pod density: don’t chase 110

Kubernetes “supported scale” guidance says no more than 110 pods per node.
Some platforms can configure higher, but pod IP and CNI limits usually bite first. 
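Where your platform accepts per-node-group kubelet configuration, the cap is a single field. Pick a value sized to node memory (40 here is an example for a mid-size protected node, not a recommendation):

```yaml
# KubeletConfiguration fragment: cap pods per node below the 110 ceiling.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
maxPods: 40
```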

Use caps that match memory, not bragging rights.

Starting caps for AceCloud-sized worker nodes

Assumptions: typical daemonsets, no hugepages/DPDK, overcommit exists somewhere upstream.

| Worker node | Role | Max total pod memory requests | Pod cap | Why |
| --- | --- | --- | --- | --- |
| 4 GiB | best-effort | 2.5–3.0 GiB | 15–25 | leaves OS + kube headroom |
| 8 GiB | protected | 5.5–6.0 GiB | 25–40 | avoids eviction on small dips |
| 16 GiB | protected | 11–12 GiB | 40–70 | room for spikes + cache |
| 32 GiB | mixed | 24–26 GiB | 70–110 | only if requests are real |

Anchor: the 110-pods/node guidance is a ceiling, not a target. 

Evictions: make them predictable

If you can’t set kubelet flags, you still control which pods die first.

  • Assign PriorityClasses.
  • Put best-effort on best-effort nodes.
  • Put strict limits on batch so it can’t eat the node.
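A sketch of the priority split (class names and values are illustrative):

```yaml
# Higher value = preempted and evicted last under pressure.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: prod-critical
value: 100000
globalDefault: false
description: "User-facing services on protected nodes"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: batch-low
value: 1000
description: "Preemptible batch; first to go under pressure"
```

Pods opt in with `priorityClassName: prod-critical` (or `batch-low`) in their spec.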

Know the kubelet defaults: memory.available<100Mi is the hard tripwire on Linux. 

Swap: pick a stance and document it

Swap support exists now, but it’s not “turn it on and pray.”

Kubernetes documents swap memory management and node swap behaviors (including LimitedSwap). 

Practical policy:

  • Protected nodes: swap off unless you’ve load-tested tail latency with swap on.
  • Best-effort nodes: consider LimitedSwap if you accept slower jobs.
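On node groups where you do control kubelet configuration and swap is provisioned on the host, LimitedSwap is two fields. Availability depends on your Kubernetes version and whether the provider enables node swap at all, so treat this as a sketch:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
failSwapOn: false        # required: kubelet refuses to start on swap otherwise
memorySwap:
  swapBehavior: LimitedSwap
```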

What to alert on (works in any managed K8s)

You don’t need vCenter. You need signals.

Kubernetes-level

  • Node condition: MemoryPressure=True
  • Events: eviction messages

(Default eviction behavior is documented upstream.) 

Node-level (Prometheus / node-exporter)

Alert on:

  • sustained low MemAvailable
  • paging activity (pgmajfault, pswpin, pswpout)
  • memory PSI pressure rising

If those light up during latency spikes, you’re in reclaim/paging territory.
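Assuming node-exporter's default collectors, those three signals map to Prometheus rules like these. The thresholds are starting points to tune, not tested values:

```yaml
groups:
  - name: node-memory-reclaim
    rules:
      - alert: NodeMemAvailableLow
        # sustained low available memory as a fraction of total
        expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes < 0.10
        for: 10m
      - alert: NodeMajorPageFaults
        # paging activity: major faults mean reads from disk
        expr: rate(node_vmstat_pgmajfault[5m]) > 50
        for: 10m
      - alert: NodeSwapActivity
        # any sustained swap-in/out is suspect on protected nodes
        expr: rate(node_vmstat_pswpin[5m]) + rate(node_vmstat_pswpout[5m]) > 0
        for: 10m
      - alert: NodeMemoryPSI
        # memory pressure stall information (kernels with PSI enabled)
        expr: rate(node_pressure_memory_waiting_seconds_total[5m]) > 0.1
        for: 10m
```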

Where AceCloud fits in this story

This is how you use their catalog without lying to yourself.

  • Start with AceCloud’s published worker sizes (4/8/16 GiB) for general pools. 
  • For memory-heavy services (Kafka, JVM heaps, model servers), move the protected pool to bigger flavors from the standard catalog (ex: 8 vCPU/32 GiB and up). 
  • Scale node groups earlier instead of packing nodes to the cliff. Node-group autoscaling is part of their managed Kubernetes offering. 

If you want the “tight” version for your cluster

You can do it later from any terminal with cluster access, but you don’t need it to start.

  • Use the caps table above.
  • Enforce requests/limits + PriorityClasses.
  • Split node groups.
  • Keep 20–25% memory headroom on protected nodes.

That stops the common eviction storm even when the provider is running overcommit behind the scenes.
