Introduction
Recently, I faced a production issue where our observability tooling flagged sustained CPU utilization above 95% on a particular pod in Kubernetes. Investigation revealed the Java process was hitting the pod's 3-core CPU limit even though the node had spare capacity, pointing to application-level saturation.
Using kubectl and in-container diagnostics, I confirmed the JVM as the source.
In this post, I’ll walk through the step-by-step process: how I diagnosed it, the safe remediation (increasing pod CPU limits and optionally scaling replicas), and the follow-up JVM and query checks to prevent recurrence.
Goals
- Confirm the alert (pod and node metrics).
- Determine whether the node or the pod (application) caused the high CPU.
- Identify what inside the pod is CPU-hot (process / JVM threads / GC / queries).
- Apply safe remediation and verify.
Step 1 — Confirm pod metrics (kubectl top)
Command:
kubectl top pod webapp-deployment-rfc4f -n stgapp
Output:
NAME                      CPU(cores)   MEMORY(bytes)
webapp-deployment-rfc4f   2863m        2662Mi
Values:
CPU = 2863m (≈ 2.86 cores)
Memory = 2662Mi (≈ 2.6 GiB)
Step 2 — Check the pod's resource limits (deployment spec)
kubectl get deployment webapp-deployment-rfc4f -n stgapp -o yaml | grep -A5 resources
Output:
resources:
  limits:
    cpu: "3"
    memory: 3Gi
  requests:
    cpu: "3"
Interpretation: The pod is consuming ~2.86 cores — very close to its configured CPU limit.
CPU request = 3 cores
CPU limit = 3 cores
With both request and limit set to 3 cores, usage of ~2.86 cores is roughly 95% of the limit, which explains the sustained >95% alert.
Step 3 — Confirm node health (is it node or pod that’s saturated?)
Command:
kubectl top node
Output:
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
ip-xxxxxxxxxx.us-west-2.compute.internal 122m 1% 5575Mi 37%
ip-xxxxxxxxxx.us-west-2.compute.internal 181m 2% 9653Mi 65%
ip-xxxxxxxxxx.us-west-2.compute.internal 86m 1% 7030Mi 47%
ip-xxxxxxxxxx.us-west-2.compute.internal 3045m 39% 7057Mi 47%
... other nodes show low CPU %
Interpretation: The busiest node sits at ~39% CPU, so there is plenty of headroom at the node level; the saturation is confined to the pod.
Step 4 — Check process level inside the pod
We exec'd into the pod and listed processes.
Command:
kubectl exec -it webapp-deployment-rfc4f -n stgapp -- ps aux --sort=-%cpu | head -20
Output:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
app_run+ 46 225 17.1 8475108 2745168 pts/0 Sl+ 04:42 129:33 /usr/lib/jvm/
app_run+ 1 0.0 0.0 2664 960 pts/0 Ss 04:42 0:00 /usr/bin/tini
... (other minor processes)
Interpretation:
The Java process (PID 46) is the heavy consumer, observed at ~225% CPU (ps reports %CPU per core, so roughly 2.25 cores).
This sustained high CPU from the Java process accounts for the pod-level metric.
We also obtained thread count:
Command:
kubectl exec -it webapp-deployment-rfc4f -n stgapp -- bash -c "ps -eLf | grep java | wc -l"
Output:
180
Interpretation: ~180 Java threads, a fairly large thread count for a single service and worth mapping against per-thread CPU (see the sketch below).
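To go a level deeper and attribute the CPU to specific threads, a common follow-up is to combine per-thread CPU figures with a JVM thread dump; the nid field in a dump is the native thread ID in hex. This is only a sketch: it assumes the image ships top and the JDK tools (jcmd), reuses PID 46 from the ps output above, and the thread ID 1234 is purely illustrative.
# Per-thread CPU of the Java process (PID 46); the hottest threads appear near the top of the %CPU column
kubectl exec webapp-deployment-rfc4f -n stgapp -- top -H -b -n 1 -p 46 | head -25
# Convert the hottest thread's ID to hex (example: 1234 -> 4d2)
printf '%x\n' 1234
# Take a thread dump and find that thread by nid to see what it is doing
kubectl exec webapp-deployment-rfc4f -n stgapp -- jcmd 46 Thread.print | grep -A 20 'nid=0x4d2'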
Root cause:
- The pod was configured with a CPU limit of 3 cores.
- The Java search process consistently consumed most of that (observed 225%–293% at different times).
- Node had spare capacity → this was application-level saturation (the application needed more CPU than allotted).
- No evidence of node-level resource pressure or of cgroup throttling blocking the pod from running; the pod simply used up its CPU quota (one way to double-check throttling is sketched below).
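For completeness, CFS throttling can be checked directly from the cgroup CPU statistics inside the container; nonzero and growing nr_throttled / throttled-time values would mean the container is being held back at its quota. The exact file depends on whether the node runs cgroup v1 or v2, so treat this as a sketch and use whichever path exists in your image:
kubectl exec webapp-deployment-rfc4f -n stgapp -- cat /sys/fs/cgroup/cpu.stat
# cgroup v1 equivalent
kubectl exec webapp-deployment-rfc4f -n stgapp -- cat /sys/fs/cgroup/cpu/cpu.stat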
Solution implemented:
Below are two safe approaches:
Option A — Increase pod CPU limit (vertical fix)
If the application legitimately needs more CPU (sustained), increase limits.cpu. Example: raise limit from 3 → 4 or 6 cores.
Command to update the deployment (for Deployment-managed pods this triggers a rolling restart, so the change is applied without downtime):
Command:
kubectl set resources deployment/webapp-deployment-rfc4f -n stgapp \
--limits=cpu=4 --requests=cpu=3
Or edit YAML:
kubectl edit deployment/webapp-deployment-rfc4f -n stgapp
# update resources: limits.cpu to "4"
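For reference, once edited the resources block should end up looking like the snippet below (memory left unchanged; this matches the post-fix output shown later in this post):
resources:
  limits:
    cpu: "4"
    memory: 3Gi
  requests:
    cpu: "3"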
Verify:
kubectl get deployment webapp-deployment-rfc4f -n stgapp -o yaml | grep -A5 resources
# and
kubectl top pod webapp-deployment-rfc4f -n stgapp
Expected result:
The pod has a larger CPU quota. With the same load, utilization relative to the limit drops to roughly 72% (≈ 2.86 of 4 cores) and the alert recovers.
Option B — Scale replicas (horizontal fix)
If requests can be load-balanced across replicas, scale out to reduce per-pod load:
Command:
kubectl scale deployment webapp-deployment-rfc4f -n stgapp --replicas=2
Verify:
Command:
kubectl get pods -n stgapp -l app=webapp-deployment-rfc4f -o wide
kubectl top pods -n stgapp
Expected result:
- Per-pod CPU load drops (if incoming workload is split across replicas).
In our case we applied both: we increased the CPU limit to 4 cores and scaled to 2 replicas.
Increase CPU limit:
Command:
kubectl set resources deployment/webapp-deployment-rfc4f -n stgapp \
--limits=cpu=4 --requests=cpu=3
Scale replicas:
kubectl scale deployment webapp-deployment-rfc4f -n stgapp --replicas=2
Verify changes:
kubectl get deployment webapp-deployment-rfc4f -n stgapp -o yaml | sed -n '/resources:/,+6p'
kubectl get pods -n stgapp -l app=webapp-deployment-rfc4f -o wide
kubectl top pod -n stgapp
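It also helps to confirm the rolling update has completed before trusting the new kubectl top numbers; this extra check is my addition rather than part of the original runbook:
kubectl rollout status deployment/webapp-deployment-rfc4f -n stgapp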
Post-fix verification:
Deployment resources:
kubectl get deployment webapp-deployment-rfc4f -n stgapp -o yaml | grep -A5 resources
Output:
resources:
  limits:
    cpu: "4"
    memory: 3Gi
  requests:
    cpu: "3"
Summary
- Alert triggered because pod CPU usage was ≈ 2.86 cores against a 3-core limit, i.e. sustained usage above 95%.
- Investigation steps: kubectl top pod, kubectl top node, ps inside pod, check deployment resources.
- Root cause: Java search process saturating the pod CPU (application-level).
- Remediation: increase the CPU limit (vertical), or scale replicas (horizontal), and investigate hot threads, GC behavior, and slow queries for a permanent fix (a quick GC check is sketched below).
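As a starting point for the JVM follow-up mentioned above, a quick GC sample can rule garbage collection in or out as the CPU consumer. A minimal sketch, assuming the JDK tools are present in the image and reusing PID 46 from earlier:
# Sample GC utilization every 2 seconds, 5 times; rapidly growing FGC/GCT values point to GC pressure
kubectl exec webapp-deployment-rfc4f -n stgapp -- jstat -gcutil 46 2000 5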