Memory ballooning is a host memory reclaim method used during VM overcommit. The hypervisor inflates a balloon driver inside a VM to claw back RAM.
It can avoid host swapping, but it also shrinks the guest's page cache and can trigger paging inside the guest. In Kubernetes, that surfaces as MemoryPressure node conditions, pod evictions, and tail-latency spikes.
## What memory ballooning is
Ballooning is cooperative reclaim. It’s not “free memory.”
On VMware, the balloon driver (vmmemctl) works with the host to reclaim pages the guest considers least valuable.
VMware’s own perf guidance is blunt: avoid overcommit that forces regular host swapping, because that’s where performance collapses.
## What you actually see in a managed Kubernetes service
You don’t see “ballooned MB.” You see consequences.
Kubelet enforces node-pressure eviction. The default hard threshold on Linux is `memory.available<100Mi`, and hard evictions have no grace period.
So any reclaim event that drops memory.available can turn into kills.
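For reference, here is a minimal sketch of those upstream Linux defaults expressed as an explicit `KubeletConfiguration`. In most managed services you can't change these values; the sketch just shows you exactly where the tripwire sits:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "100Mi"    # the hard memory tripwire; no grace period
  nodefs.available: "10%"
  imagefs.available: "15%"
  nodefs.inodesFree: "5%"
```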
## How ballooning pressure turns into outages on K8s nodes
This is the failure chain you should expect under overcommit.
- Cache gets punched → more disk reads → p95 climbs.
- Paging starts → jitter rises.
- Kubelet evicts → restarts + thundering herd.
You don’t need hypervisor access to catch this. You just need node metrics and events.
## What AceCloud gives you to control blast radius
You control node sizing and node-group policy, not the host reclaim knobs.
AceCloud Managed Kubernetes exposes worker node configurations like 2 vCPU/4 GiB, 4 vCPU/8 GiB, 8 vCPU/16 GiB (their published comparison table).
If you need bigger worker nodes, AceCloud’s flavor catalog shows Standard Instance options like S3a.2xlarge (8 vCPU/32 GiB) up through S3a.8xlarge (32 vCPU/128 GiB) and beyond.
## Guardrails that work when overcommit is “yes”
These are defaults you can deploy without cluster-specific tuning.
### Split worker node groups by risk
One node pool for prod latency. One for batch.
- Protected pool: ingress, API, user-facing services.
- Best-effort pool: ETL, async jobs, rebuilds.
This keeps batch from turning your prod nodes into the provider’s pressure valve.
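A minimal sketch of the wiring, assuming your node groups carry a `pool` label and the batch pool is tainted `pool=best-effort:NoSchedule` (the label/taint names and the image are placeholders; use whatever your node groups actually expose):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: nightly-etl
spec:
  template:
    spec:
      nodeSelector:
        pool: best-effort          # schedule only onto the batch pool
      tolerations:
        - key: pool
          operator: Equal
          value: best-effort
          effect: NoSchedule       # tolerate the taint that keeps prod pods out
      containers:
        - name: etl
          image: example.com/etl:latest
          resources:
            requests:
              cpu: "500m"
              memory: "1Gi"
            limits:
              cpu: "1"
              memory: "1Gi"        # strict limit: batch can't eat the node
      restartPolicy: Never
```

The taint keeps prod workloads off batch nodes; the nodeSelector keeps batch off prod nodes. You need both directions or one pool leaks into the other.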
### Enforce requests/limits everywhere
The scheduler packs based on requests. If you don’t set them, you’re gambling.
Use Kubernetes resource requests/limits for CPU and memory.
For latency pods, run Guaranteed QoS: requests == limits.
```yaml
resources:
  requests:
    cpu: "1"
    memory: "2Gi"
  limits:
    cpu: "1"       # requests == limits on both resources → Guaranteed QoS
    memory: "2Gi"
```
### Keep headroom by design
If you can’t tune Node Allocatable, you simulate it with request budgets.
Kubernetes calls this concept Node Allocatable (reserving resources for system daemons).
In a managed service, you may not get to set kube-reserved / system-reserved, so leave headroom in pod requests.
Baseline rule (protected nodes): don’t schedule more than 75% of node RAM by requested memory.
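Kubernetes has no per-node request budget, but if a protected pool is fed from one namespace you can approximate the 75% rule with a ResourceQuota. Sketch, assuming a pool of four 16 GiB nodes (4 × 12 GiB = 48 GiB budget; the namespace name is a placeholder):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: protected-pool-budget
  namespace: prod              # assumption: this namespace maps to the protected pool
spec:
  hard:
    requests.memory: 48Gi      # 75% of 4 nodes × 16 GiB
    limits.memory: 64Gi
```

This is coarse (quota is per-namespace, not per-node), but it stops the scheduler from packing the pool to the cliff.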
### Pod density: don’t chase 110
Kubernetes “supported scale” guidance says no more than 110 pods per node.
Some platforms can configure higher, but pod IP and CNI limits usually bite first.
Use caps that match memory, not bragging rights.
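If your provider lets you pass kubelet settings per node group, the cap itself is one line; otherwise set it wherever the node-group configuration exposes max pods. A sketch, with a value matching the caps below rather than the 110 ceiling:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
maxPods: 40    # sized to node memory, not to the upstream ceiling
```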
### Starting caps for AceCloud-sized worker nodes
Assumptions: typical daemonsets, no hugepages/DPDK, overcommit exists somewhere upstream.
| Worker node | Role | Max total pod memory requests | Pod cap | Why |
|---|---|---|---|---|
| 4 GiB | best-effort | 2.5–3.0 GiB | 15–25 | leaves OS+kube headroom |
| 8 GiB | protected | 5.5–6.0 GiB | 25–40 | avoids eviction on small dips |
| 16 GiB | protected | 11–12 GiB | 40–70 | room for spikes + cache |
| 32 GiB | mixed | 24–26 GiB | 70–110 | only if requests are real |
Anchor: the 110-pods/node guidance is a ceiling, not a target.
## Evictions: make them predictable
If you can’t set kubelet flags, you still control which pods die first.
- Assign PriorityClasses.
- Put best-effort on best-effort nodes.
- Put strict limits on batch so it can’t eat the node.
Know the kubelet defaults: `memory.available<100Mi` is the hard tripwire on Linux.
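A minimal PriorityClass sketch for the two-pool model (names and values are placeholders; only the relative ordering matters):

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: prod-latency
value: 100000
description: "User-facing services; evicted last."
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: batch-best-effort
value: 1000
description: "Batch; first in line under node pressure."
```

Pods opt in via `priorityClassName: prod-latency`. One caveat: node-pressure eviction ranks pods whose usage exceeds their requests ahead of priority, so strict limits on batch still matter even with the classes in place.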
## Swap: pick a stance and document it
Swap support exists now, but it’s not “turn it on and pray.”
Kubernetes documents swap memory management and node swap behaviors (including LimitedSwap).
Practical policy:
- Protected nodes: swap off unless you’ve load-tested tail latency with swap on.
- Best-effort nodes: consider LimitedSwap if you accept slower jobs.
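If you do control kubelet config on a best-effort node group, opting into LimitedSwap is a small change. A sketch, assuming swap is enabled on the host and your Kubernetes version supports node swap (on older versions the NodeSwap feature gate must also be enabled):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
failSwapOn: false            # required: let the kubelet start with swap present
memorySwap:
  swapBehavior: LimitedSwap  # only Burstable pods may swap, proportional to requests
```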
## What to alert on (works in any managed K8s)
You don’t need vCenter. You need signals.
### Kubernetes-level
- Node condition: MemoryPressure=True
- Events: eviction messages
(Default eviction behavior is documented upstream.)
### Node-level (Prometheus / node-exporter)
Alert on:
- sustained low MemAvailable
- paging activity (pgmajfault, pswpin, pswpout)
- memory PSI pressure rising
If those light up during latency spikes, you’re in reclaim/paging territory.
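A sketch of those three signals as Prometheus alerting rules. The metric names are standard node-exporter (`vmstat` and `pressure` collectors enabled); the thresholds and durations are starting points to tune, not tested values:

```yaml
groups:
  - name: node-memory-reclaim
    rules:
      - alert: NodeLowMemAvailable
        expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes < 0.10
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "MemAvailable under 10% for 10m on {{ $labels.instance }}"
      - alert: NodePagingActivity
        expr: rate(node_vmstat_pgmajfault[5m]) > 100
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Sustained major page faults (paging) on {{ $labels.instance }}"
      - alert: NodeMemoryPSI
        expr: rate(node_pressure_memory_waiting_seconds_total[5m]) > 0.1
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Memory PSI: tasks stalled >10% of the time on {{ $labels.instance }}"
```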
## Where AceCloud fits in this story
This is how you use their catalog without lying to yourself.
- Start with AceCloud’s published worker sizes (4/8/16 GiB) for general pools.
- For memory-heavy services (Kafka, JVM heaps, model servers), move the protected pool to bigger flavors from the standard catalog (ex: 8 vCPU/32 GiB and up).
- Scale node groups earlier instead of packing nodes to the cliff. Node-group autoscaling is part of their managed Kubernetes offering.
## If you want the “tight” version for your cluster
You can do it later from any terminal with cluster access, but you don’t need it to start.
- Use the caps table above.
- Enforce requests/limits + PriorityClasses.
- Split node groups.
- Keep 20–25% memory headroom on protected nodes.
That stops the common eviction storm even when the provider is running overcommit behind the scenes.