Recently, I noticed something strange on one of my KVM hypervisors.
The server wasn’t heavily loaded, but earlier I saw:
- `qemu-system-x86` consuming 800%+ CPU
- `kswapd` running hot
- Swap usage near 100%
But when I checked later:
- CPU was low
- RAM had plenty free
- Swap was still full
Here’s the exact troubleshooting flow I followed — and how you can do the same.
🧠 Environment Context
- Hypervisor: KVM + libvirt
- Host RAM: 314 GB
- Swap: 976 MB
- Multiple VMs running
- Problem VM: `testnet-node3`
🔍 Step 1 — Identify High CPU Process
First signal:
ps -eo pid,comm,%cpu,%mem --sort=-%cpu | head -n 10
Output showed:
qemu-system-x86 818%
⚠️ Important: in `ps` output, 100% CPU = 1 fully used core.
So:
- 800% = ~8 cores fully used
That means one VM was heavily consuming CPU.
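As a rule of thumb, dividing the %CPU figure by 100 gives the approximate number of fully busy cores. A quick sketch using the 818% sample from above:

```shell
# ps reports per-process CPU summed across all cores, so 100% == one full core
pct=818
echo "~$(( pct / 100 )) cores fully busy"
```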
🔎 Step 2 — Identify Which VM Maps to That Process
Each running VM appears as its own `qemu-system-x86` process.
To map PID to VM:
ps -fp <PID>
Or list VMs:
virsh list --all
To see details:
virsh dominfo <vm-name>
This is how I identified:
testnet-node3
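Another way to go straight from a PID to the VM name is to parse the `-name guest=...` argument that libvirt passes to every qemu process. A minimal sketch; the `cmdline` string below is a sample, and on a live host you would read the process command line from `/proc/<PID>/cmdline` instead:

```shell
# libvirt starts guests as: qemu-system-x86_64 ... -name guest=<vm>,debug-threads=on ...
# Sample command line; live version: cmdline=$(tr '\0' ' ' < /proc/<PID>/cmdline)
cmdline="qemu-system-x86_64 -name guest=testnet-node3,debug-threads=on -m 98304"
vm=$(echo "$cmdline" | sed -n 's/.*-name guest=\([^,]*\).*/\1/p')
echo "$vm"
```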
📊 Step 3 — Check Host Memory & Swap
Next, I checked memory:
free -h
Output:
- Mem: 314Gi total, 217Gi used, 94Gi free
- Swap: 976Mi total, 963Mi used
Swap was 98% used.
But RAM still had 94GB free.
This is where people often panic unnecessarily.
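The 98% figure comes straight from the free(1) numbers above; as a sketch:

```shell
# Swap usage percent from the values above: 963 MiB used of 976 MiB total
used=963; total=976
echo "$(( used * 100 / total ))% of swap in use"
```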
🧪 Step 4 — Check If System Is Under Active Memory Pressure
The key command:
vmstat 1 5
Focus on:
- `si` → swap in
- `so` → swap out
If both are 0:
You are NOT under active memory pressure.
In my case:
si = 0
so = 0
Meaning:
- Swap usage was historical
- Not current
- System was stable
🔥 Why Swap Can Stay Full Even With Free RAM
Linux does NOT proactively move swapped pages back into RAM; they stay in swap until a process touches them again.
So:
- VM previously caused pressure
- Kernel swapped ~1GB
- Memory pressure disappeared
- Swap remained full
This is normal Linux behavior.
🧠 Step 5 — Check VM Memory Allocation
Then I inspected the VM:
virsh dominfo testnet-node3
Output:
Max memory: 98304000 KiB
Convert:
98304000 KiB ≈ 94 GB
So the VM had ~94GB allocated.
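The conversion is just two divisions by 1024, since virsh reports memory in KiB. A sketch:

```shell
# 98304000 KiB -> GiB (integer math truncates 93.75 to 93, i.e. ~94 GB)
kib=98304000
echo "$(( kib / 1024 / 1024 )) GiB"
```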
❓ Was The VM Actually Memory Starved?
Before increasing RAM, you must check inside the guest.
Inside VM:
free -h
vmstat 1 5
If inside the VM:
- Swap used
- OOM killer logs
- Memory >90% used
Then increasing RAM makes sense.
If not — CPU issue may be workload-related instead.
🚀 Step 6 — Increase VM RAM Safely
Since the VM was already stopped:
Target: 128 GB
128GB in KiB:
128 × 1024 × 1024 = 134217728 KiB
Commands:
virsh setmaxmem testnet-node3 128G --config
virsh setmem testnet-node3 128G --config
Verify:
virsh dominfo testnet-node3
Then start:
virsh start testnet-node3
📊 Step 7 — Verify Host Stability After Resize
After starting the VM:
free -h
Mem: 314Gi total
221Gi used
89Gi free
Swap: 0B used
Swap cleared.
Then:
vmstat 1 5
Confirmed:
- si = 0
- so = 0
- CPU mostly idle
System healthy.
🧩 Root Cause Pattern
Here’s the chain that usually happens:
- VM workload spikes
- Guest consumes heavy memory
- Host experiences memory pressure
- Host swap fills
- kswapd increases CPU
- qemu process CPU rises
- After workload stabilizes → swap remains full
Without checking vmstat, people misdiagnose this.
🛑 Common Mistakes
❌ Increasing RAM without checking guest usage
❌ Assuming 100% swap = system dying
❌ Ignoring vmstat
❌ Allocating 100% host RAM to VMs
📐 Capacity Planning Rule for KVM Hosts
For large-memory hosts (like 314GB):
- Leave 16–32GB minimum for host OS
- Never allocate 100% to guests
- Monitor swap regularly
- Keep swap small (1–4GB is fine for large RAM systems)
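Applied to this host, the rule works out as follows (the 24 GB reserve is an example value inside the 16–32 GB band):

```shell
# Guest memory budget = host RAM minus a fixed host-OS reserve
host_gb=314
reserve_gb=24   # example reserve within the 16-32 GB recommendation
echo "max to allocate across all guests: $(( host_gb - reserve_gb )) GB"
```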
🧠 Pro Tips
Check total VM memory allocation:
virsh list --name | while read -r vm; do
  [ -n "$vm" ] && virsh dominfo "$vm" | grep -i memory
done
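To turn that loop into a single headroom number, the `Max memory` lines can be summed with awk. A sketch with sample dominfo lines; on a live host, pipe the loop's output into the awk instead:

```shell
# Sample "Max memory" lines as printed by `virsh dominfo` for three VMs;
# sum them and convert KiB -> GiB
dominfo='Max memory:     98304000 KiB
Max memory:     16777216 KiB
Max memory:     33554432 KiB'
echo "$dominfo" | awk '/Max memory/ { kib += $3 } END { printf "%d GiB allocated to guests\n", kib / 1024 / 1024 }'
```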
See if swapping is active:
vmstat 1
See which process consumes most memory:
ps -eo pid,comm,%mem,%cpu --sort=-%mem | head
🎯 Final Takeaway
Swap usage alone does NOT mean memory problem.
The real indicators are:
- Active swap in/out (`vmstat`)
- OOM events
- Sustained high CPU from kswapd
- Guest-level memory pressure
In my case:
- VM memory was increased from 94GB → 128GB
- Host remained healthy
- No swap pressure
- System stable
If you're running KVM in production, understanding this memory + swap + CPU interaction is critical.
Blindly adding RAM is easy.
Diagnosing correctly is what makes you a good systems engineer.