Series: Java in Real Production — This is the second article of the series. If you haven't read the first one yet, it covers the fundamentals of Virtual Threads, Thread Pinning, and the Stampede Effect — concepts we'll build on here. Read Part 1 here — Virtual Threads in Java 21: The End of the Scarcity Era (and the Pitfalls That Can Take You Down).
You read about Virtual Threads. You understood the mental model. You fixed Thread Pinning, put a Semaphore in front of the database. The application is working in development.
Then you deploy.
And the weirdness begins: latency spiking for no apparent reason, container being killed by the kernel at peak hours, dashboards showing low CPU while requests pile up in the queue. Everything seems fine — until it isn't.
This article is about what happens after the deploy. The production environment — Docker, Kubernetes, and observability — has its own pitfalls for Virtual Thread applications, and most of them are invisible until it's too late.
Stack Cost and the OOM Kill Risk in Docker
Let's start with memory, because this is where a risk lives that can literally kill your container — with no stack trace, no warning, no graceful shutdown.
The fundamental difference between the two models:
- Platform Thread: up to ~1MB of stack (the usual default size on 64-bit Linux) reserved in the JVM's native space, outside the Heap
- Virtual Thread: stack stored as Java objects on the Heap, subject to GC
This migration from "native stack" to "Heap objects" has a direct consequence: the -Xmx that used to be enough may no longer be.
The Equation Changed
With Platform Threads, memory was predictable:
Total Memory ≈ Heap (-Xmx) + MetaSpace + (N_threads × ~1MB native)
With Virtual Threads, the thread stack moved into the Heap:
Total Memory ≈ Heap (includes VT stacks) + MetaSpace + Carrier Thread stacks
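When you set --memory=512m in Docker (or resources.limits.memory in Kubernetes), the Linux cgroup applies that limit to everything the process uses, not just the Heap: Metaspace, thread stacks, direct buffers, the code cache, and the GC's own structures all count. If the total exceeds the limit, the kernel's OOM killer terminates the process with SIGKILL. That's the OOM Kill, and it doesn't warn you.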
🐳 Golden rule for Docker: Monitor Heap usage with Virtual Threads active. The -Xmx that used to be enough may need a 20–30% increase to accommodate Virtual Thread stacks on the Heap. Adjust the container limit with a safety margin of at least 15% above -Xmx.
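The margin rule is plain arithmetic, so it can be checked at startup instead of being remembered. A minimal sketch, assuming the container limit is handed to the program by the caller; the class name, method names, and the 512 MiB fallback are hypothetical, and the 15% threshold is simply the rule stated above:

```java
/** Hypothetical startup check for the "limit at least 15% above -Xmx" rule. */
final class MemoryHeadroom {

    /** Percentage by which the container limit exceeds the max heap. */
    static double headroomPercent(long containerLimitBytes, long maxHeapBytes) {
        if (maxHeapBytes <= 0) {
            throw new IllegalArgumentException("maxHeapBytes must be positive");
        }
        return (containerLimitBytes - maxHeapBytes) * 100.0 / maxHeapBytes;
    }

    /** True when the limit leaves at least minPercent of headroom above the heap. */
    static boolean hasSafeMargin(long containerLimitBytes, long maxHeapBytes, double minPercent) {
        return headroomPercent(containerLimitBytes, maxHeapBytes) >= minPercent;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        // Runtime.maxMemory() reports the maximum heap the JVM will attempt to use.
        long maxHeap = Runtime.getRuntime().maxMemory();
        // Assumption: the limit is passed as the first argument in MiB; defaults to 512.
        long limit = (args.length > 0 ? Long.parseLong(args[0]) : 512L) * mib;
        System.out.printf("heap=%d MiB, limit=%d MiB, headroom=%.1f%%, safe=%b%n",
                maxHeap / mib, limit / mib,
                headroomPercent(limit, maxHeap), hasSafeMargin(limit, maxHeap, 15.0));
    }
}
```

With the values used in this article (384 MiB heap, 512 MiB limit) the headroom comes out at roughly 33%, comfortably above the 15% floor.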
# docker-compose.yml: safe configuration for Virtual Threads
services:
  app:
    image: my-app:latest
    environment:
      JAVA_OPTS: >-
        -Xms128m
        -Xmx384m
        -XX:+UseZGC
        -Djdk.virtualThreadScheduler.parallelism=4
    deploy:
      resources:
        limits:
          memory: 512m # ~33% margin above Xmx; never set Xmx = limit
Note the -Djdk.virtualThreadScheduler.parallelism=4. This parameter sets how many Carrier Threads the scheduler uses to run Virtual Threads; by default it equals the number of processors the JVM detects. On a container with 4 CPUs the default would already be 4, but configuring it explicitly ensures the behavior doesn't change silently if the container's CPU count changes.
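To see which value actually applies inside a given container, it helps to print what the JVM detects. A small sketch; the class and method names are hypothetical, and the fallback only mirrors the documented default (parallelism equals the available processor count) rather than reading the scheduler's internal state:

```java
/** Hypothetical helper that reports how many Carrier Threads the scheduler is likely to use. */
final class CarrierParallelism {

    static final String PROPERTY = "jdk.virtualThreadScheduler.parallelism";

    /** The explicit property if set, otherwise the processor count the JVM detected. */
    static int effectiveParallelism() {
        String configured = System.getProperty(PROPERTY);
        if (configured != null && !configured.isBlank()) {
            return Integer.parseInt(configured.trim());
        }
        // Inside a container this normally reflects the cgroup CPU limit, not the host's cores.
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("availableProcessors   = " + Runtime.getRuntime().availableProcessors());
        System.out.println("effective parallelism = " + effectiveParallelism());
    }
}
```

Logging these two numbers at startup makes a mismatch between the flag and the container's CPU allocation visible in the first lines of the log.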
Why ZGC?
With high volumes of Virtual Threads, the Heap becomes a high-turnover environment: stack objects are created and discarded constantly. Collectors whose pauses can grow under heavy load, as G1's sometimes do, may introduce noticeable latency precisely at peak pressure moments. ZGC and Shenandoah do most of their work concurrently and are designed to keep pauses very short regardless of Heap size (ZGC targets sub-millisecond pauses). For latency-sensitive Virtual Thread applications in production, they are usually the safer choice.
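Because JAVA_OPTS is only honored if the image's entrypoint passes it to the JVM, it is worth confirming which collector is really running. A minimal sketch using the standard management beans; the class name is hypothetical, and the bean names printed vary by collector and JDK version:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

/** Hypothetical helper that lists the collectors registered by the running JVM. */
final class ActiveCollectors {

    /** Names of the garbage collector beans, as reported by the JVM. */
    static List<String> names() {
        return ManagementFactory.getGarbageCollectorMXBeans().stream()
                .map(GarbageCollectorMXBean::getName)
                .toList();
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionTime() is the accumulated collection time in ms, or -1 if undefined.
            System.out.printf("%s: collections=%d, time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```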
CPU Throttling in Kubernetes — The Silent Enemy of Carrier Threads
Kubernetes adds one more layer of complexity. And this one is especially treacherous because it acts completely silently.
The Mechanism
When you set resources.limits.cpu: "2" on your Pod, Kubernetes enforces it with a cgroup CPU quota: in every scheduling period (100 ms by default) the container may consume at most 200 ms of CPU time across all of its threads. Once that quota is spent, the kernel throttles the container, and none of its threads run again until the next period begins.
Remember the Carrier Threads from the previous article? They are OS threads that run Virtual Threads. If Kubernetes is throttling your container, Carrier Threads can't be scheduled by the OS. The result: even with 1,000,000 Virtual Threads ready to execute, they sit idle waiting for Carrier Threads to get CPU back.
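The effect is easier to feel with numbers. The sketch below is a deliberately simplified model, not a measurement: it assumes fully CPU-bound threads, a host with enough cores to run them all at once, and the 100 ms quota period that Kubernetes uses by default. The class and method names are hypothetical.

```java
/** Hypothetical model of a single CPU quota period for fully CPU-bound threads. */
final class CfsQuotaModel {

    /** Milliseconds within one period during which the container is not allowed to run. */
    static double throttledMillisPerPeriod(double cpuLimit, int busyThreads, double periodMillis) {
        if (cpuLimit <= 0 || busyThreads <= 0 || periodMillis <= 0) {
            throw new IllegalArgumentException("all arguments must be positive");
        }
        double quotaCpuMillis = cpuLimit * periodMillis;            // CPU time granted per period
        double millisUntilExhausted = quotaCpuMillis / busyThreads; // wall-clock time to spend it
        return Math.max(0.0, periodMillis - millisUntilExhausted);
    }

    public static void main(String[] args) {
        // 2 Carrier Threads under a 2 CPU limit: the quota lasts the whole period.
        System.out.println(throttledMillisPerPeriod(2.0, 2, 100.0)); // prints 0.0
        // 8 Carrier Threads under a 2 CPU limit: the quota is gone after 25 ms.
        System.out.println(throttledMillisPerPeriod(2.0, 8, 100.0)); // prints 75.0
    }
}
```

In the second case every Virtual Thread, however many there are, is frozen for three quarters of each period.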
The Misleading Symptom
High latency with apparently low CPU on dashboards.
The process isn't using CPU because it's being throttled, yet the graphs show 40% usage: throttled intervals are time in which the process simply doesn't run, which pulls down the measured average. The metric that matters isn't CPU usage, it's container_cpu_cfs_throttled_seconds_total, exposed by the cAdvisor built into the kubelet.
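The same counters can be read from inside the container when no dashboard is at hand. A sketch, assuming a cgroup v2 host where the file lives at /sys/fs/cgroup/cpu.stat and contains nr_periods, nr_throttled, and throttled_usec; on cgroup v1 the path and field names differ. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical reader for the throttling counters in a cgroup v2 cpu.stat file. */
final class CpuStat {

    /** Parses "key value" lines; lines in any other shape are ignored. */
    static Map<String, Long> parse(String content) {
        Map<String, Long> stats = new HashMap<>();
        for (String line : content.split("\\R")) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length == 2) {
                stats.put(parts[0], Long.parseLong(parts[1]));
            }
        }
        return stats;
    }

    /** Fraction of quota periods in which the cgroup was throttled (0.0 to 1.0). */
    static double throttledRatio(Map<String, Long> stats) {
        long periods = stats.getOrDefault("nr_periods", 0L);
        long throttled = stats.getOrDefault("nr_throttled", 0L);
        return periods == 0 ? 0.0 : (double) throttled / periods;
    }

    public static void main(String[] args) throws IOException {
        Path file = Path.of("/sys/fs/cgroup/cpu.stat"); // assumption: cgroup v2 layout
        if (!Files.isReadable(file)) {
            System.out.println("cpu.stat is not readable here (not cgroup v2?)");
            return;
        }
        Map<String, Long> stats = parse(Files.readString(file));
        System.out.printf("throttled in %.1f%% of periods, %d us in total%n",
                throttledRatio(stats) * 100, stats.getOrDefault("throttled_usec", 0L));
    }
}
```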
# kubernetes deployment: configuration aware of Virtual Threads
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
        - name: app
          resources:
            requests:
              cpu: "1"
              memory: "256Mi"
            limits:
              cpu: "2" # Sets the effective ceiling for active Carrier Threads
              memory: "512Mi"
          env:
            # The folded scalar joins lines with spaces, so each JVM option
            # (including the whole StartFlightRecording list) stays on one line.
            - name: JAVA_OPTS
              value: >-
                -Xmx384m
                -XX:+UseZGC
                -Djdk.virtualThreadScheduler.parallelism=2
                -XX:StartFlightRecording=filename=/tmp/jfr/recording.jfr,duration=60s,settings=profile
⚠️ Critical alignment: The value of virtualThreadScheduler.parallelism must be consistent with limits.cpu. If you set a 2 CPU limit but 8 Carrier Threads, the extra Carrier Threads will compete for CPU, increase throttling, and make things worse. Keep both values aligned.
Observability with JDK Flight Recorder (JFR)
JFR is the most powerful observability tool for diagnosing Virtual Thread problems in production. It has had native support for Virtual Thread-specific events since Java 21, and with the default settings its overhead is low enough to run continuously in production without noticeable impact. The heavier profile settings are better suited to short, targeted recordings.
The Events That Matter
| JFR Event | What it reveals |
|---|---|
| jdk.VirtualThreadPinned | Active Thread Pinning: synchronized + I/O in the critical path |
| jdk.VirtualThreadSubmitFailed | Failures submitting Virtual Threads: a signal of scheduler saturation |
| jdk.VirtualThreadStart / jdk.VirtualThreadEnd | Total volume of VTs created: detects creation explosion |
| jdk.ThreadSleep | Threads in unnecessarily long sleep |
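These events can also be consumed inside the application as they happen, using the JFR streaming API. A minimal sketch; the PinningWatcher class is hypothetical, the threshold is whatever the caller picks, and writing to System.err stands in for a real logger or metric:

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicLong;
import jdk.jfr.consumer.RecordingStream;

/** Hypothetical in-process watcher that counts pinning events as they are emitted. */
final class PinningWatcher implements AutoCloseable {

    private final RecordingStream stream = new RecordingStream();
    private final AtomicLong pinned = new AtomicLong();

    PinningWatcher(Duration threshold) {
        // Only report pins that lasted at least `threshold`, and capture where they happened.
        stream.enable("jdk.VirtualThreadPinned").withThreshold(threshold).withStackTrace();
        stream.onEvent("jdk.VirtualThreadPinned", event -> {
            pinned.incrementAndGet();
            System.err.println("Pinned for " + event.getDuration().toMillis() + " ms\n"
                    + event.getStackTrace());
        });
        stream.startAsync(); // events are delivered on a background thread
    }

    /** Number of pinning events seen since the watcher started. */
    long pinnedCount() {
        return pinned.get();
    }

    @Override
    public void close() {
        stream.close();
    }
}
```

Created once at startup (for example `new PinningWatcher(Duration.ofMillis(20))`) and closed on shutdown, it surfaces pinning without anyone having to pull a recording first.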
Runtime Diagnosis (No Restart Required)
# Start a 2-minute recording without restarting the application
jcmd <PID> JFR.start name=vt-diagnosis \
settings=profile \
duration=120s \
filename=/tmp/vt-diagnosis.jfr
# Analyze pinning events directly in the terminal
jfr print --events jdk.VirtualThreadPinned /tmp/vt-diagnosis.jfr
For a complete visual analysis, JDK Mission Control (JMC) is the official GUI — open the .jfr file and get a full event timeline with drill-down by thread, method, and time.
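When neither the terminal output nor JMC is convenient, for instance in a CI job, the recording can be summarized with the JFR consumer API. A sketch, assuming the file path from the jcmd command above as the default; the class and method names are hypothetical:

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

/** Hypothetical summary of a .jfr file: how many events of each matching type it contains. */
final class JfrEventCounts {

    /** Counts events per type name, keeping only names that start with the given prefix. */
    static Map<String, Long> countByType(Path recording, String prefix) throws IOException {
        Map<String, Long> counts = new TreeMap<>();
        try (RecordingFile file = new RecordingFile(recording)) {
            while (file.hasMoreEvents()) {
                RecordedEvent event = file.readEvent();
                String name = event.getEventType().getName();
                if (name.startsWith(prefix)) {
                    counts.merge(name, 1L, Long::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        Path recording = Path.of(args.length > 0 ? args[0] : "/tmp/vt-diagnosis.jfr");
        countByType(recording, "jdk.VirtualThread")
                .forEach((name, count) -> System.out.println(name + " = " + count));
    }
}
```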
Prometheus Integration via Micrometer
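If you use Spring Boot 3.2+, JVM metrics are exported via Micrometer. Which Virtual Thread metrics are available, and under what names, depends on your Micrometer version, so confirm the names below against your own /actuator/prometheus output before relying on them. Configure alerts for: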
# Alert: Thread Pinning detected in production
# increase() lets the alert clear once pinning stops; a raw counter stays above zero until restart
- alert: VirtualThreadPinningDetected
  expr: increase(jvm_threads_virtual_pinned_count[5m]) > 0
  for: 1m
  annotations:
    summary: "Active Thread Pinning: investigate synchronized + I/O"
# Alert: CPU Throttling above acceptable threshold
- alert: ContainerCPUThrottling
  expr: rate(container_cpu_cfs_throttled_seconds_total[5m]) > 0.25
  for: 5m
  annotations:
    summary: "Container being throttled: Carrier Threads impacted"
🔍 Golden tip: If VirtualThreadPinningDetected fires, you have Thread Pinning in production. If ContainerCPUThrottling fires alongside high latency, you have Carrier Threads being strangled by the cgroup. These are different problems with different causes, and separate alerts keep you from investigating in the wrong place.
The Modern Developer's Checklist
Consolidating everything from the series into an operational checklist:
Before Enabling Virtual Threads
- [ ] Java 21+ in your environment — don't negotiate this
- [ ] Check JDBC driver versions — PostgreSQL ≥ 42.6, MySQL Connector/J ≥ 9.0
- [ ] Audit synchronized in critical I/O paths and migrate to ReentrantLock
- [ ] Define concurrency limits for scarce resources via Semaphore or Resilience4j Bulkhead
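The last item is the one most often skipped, so here is what it looks like in its simplest form. A sketch using only the JDK (Java 21+): the class name is hypothetical, the 5 ms sleep stands in for a database call, and the permit count would normally match the connection pool size:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical demo: many Virtual Threads, but at most `permits` inside the guarded section. */
final class BoundedAccessDemo {

    /** Runs `tasks` Virtual Threads and returns the highest concurrency seen inside the guard. */
    static int maxObservedConcurrency(int tasks, int permits) {
        Semaphore guard = new Semaphore(permits);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger maxSeen = new AtomicInteger();

        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    try {
                        guard.acquire(); // parks the Virtual Thread; the carrier is released
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    try {
                        int now = inFlight.incrementAndGet();
                        maxSeen.accumulateAndGet(now, Math::max);
                        Thread.sleep(5); // stand-in for a database call
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        inFlight.decrementAndGet();
                        guard.release();
                    }
                });
            }
        } // close() waits for all submitted tasks to finish
        return maxSeen.get();
    }

    public static void main(String[] args) {
        System.out.println("max concurrent inside guard: " + maxObservedConcurrency(1_000, 10));
    }
}
```

However many tasks are submitted, the reported maximum never exceeds the permit count, which is exactly the handbrake Platform Thread pools used to provide by accident.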
Docker Configuration
- [ ] Re-check -Xmx under load (it may need 20–30% more for Virtual Thread stacks) and keep the container memory limit at least 15% above -Xmx
- [ ] Configure -Djdk.virtualThreadScheduler.parallelism explicitly based on allocated CPUs
- [ ] Use ZGC or Shenandoah as GC: shorter pauses, better for high Heap object turnover
Kubernetes Configuration
- [ ] Monitor container_cpu_cfs_throttled_seconds_total from cAdvisor: throttling is the silent enemy of Carrier Threads
- [ ] Align virtualThreadScheduler.parallelism with resources.limits.cpu
- [ ] Enable JFR (settings=profile) in staging and review the Virtual Thread events before going to production
Production Observability
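- [ ] Alert for jdk.VirtualThreadPinned: any value above zero deserves investigation
- [ ] Alert when the rate of container_cpu_cfs_throttled_seconds_total stays above 0.25 (25%)
- [ ] Dashboard with jvm_threads_states_threads{state="runnable"} for platform and Carrier Thread activity; this gauge is built on ThreadMXBean, which does not count Virtual Threads, so track VT volume through the JFR start/end events
- [ ] Health checks that treat Bulkhead saturation as a degraded health state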
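For the last item, the idea can be sketched with a plain Semaphore standing in for the Bulkhead. Everything here is hypothetical, including the two-state Status enum and the free-permit ratio used as the threshold; a real setup would map this onto whatever health API your framework provides:

```java
import java.util.concurrent.Semaphore;

/** Hypothetical health check: reports DEGRADED when the bulkhead is nearly exhausted. */
final class BulkheadHealth {

    enum Status { UP, DEGRADED }

    /** DEGRADED when less than `minFreeRatio` of the permits is still available. */
    static Status check(Semaphore bulkhead, int totalPermits, double minFreeRatio) {
        if (totalPermits <= 0) {
            throw new IllegalArgumentException("totalPermits must be positive");
        }
        double freeRatio = (double) bulkhead.availablePermits() / totalPermits;
        return freeRatio < minFreeRatio ? Status.DEGRADED : Status.UP;
    }

    public static void main(String[] args) {
        Semaphore bulkhead = new Semaphore(10);
        System.out.println(check(bulkhead, 10, 0.1)); // prints UP
        bulkhead.tryAcquire(10);                      // simulate full saturation
        System.out.println(check(bulkhead, 10, 0.1)); // prints DEGRADED
    }
}
```

Reporting saturation as degraded rather than down lets a load balancer shed traffic without the orchestrator restarting a Pod that is merely busy.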
Conclusion
The era of thread scarcity is over. The restaurant can have 1 million waiters.
But the database still has 100 tables. Kubernetes still has limited CPU. The container still has memory defined by the cgroup. And the kernel still sends SIGKILL without asking permission.
Virtual Threads solve the thread scarcity problem — and only that. The other problems still exist, and some become even more visible because the accidental handbrake that Platform Threads provided is gone.
The correct mental model isn't "Virtual Threads = free performance". It's: Virtual Threads = I stop worrying about threads and start worrying about the real resources my application consumes.
With that model in mind, the tool is genuinely transformative.
Have a question or want to go deeper on any of the points? Comment below — I answer all of them. 🙌
References
JEP 444: Virtual Threads (Java 21)
Conceptual foundation for Carrier Thread behavior and the CPU throttling impact discussed in this article.
https://openjdk.org/jeps/444
OpenJDK: JDK Flight Recorder (JFR) Event Reference
Documentation for jdk.VirtualThreadPinned, jdk.VirtualThreadStart, and other Virtual Thread events available via JFR.
https://docs.oracle.com/en/java/javase/21/docs/api/jdk.jfr/jdk/jfr/package-summary.html
Spring Boot 3.2 Release Notes: Virtual Threads
Reference for Virtual Thread configuration with Spring Boot, including Micrometer integration for the metrics cited in the alert configurations.
https://github.com/spring-projects/spring-boot/wiki/Spring-Boot-3.2-Release-Notes
Resilience4j: Official CircuitBreaker Documentation
Reference for failureRateThreshold, slidingWindowSize, and waitDurationInOpenState configuration used in the resilience examples.
https://resilience4j.readme.io/docs/circuitbreaker
Source Code
If you haven't seen the series repository yet, it contains executable demos of the Part 1 concepts — Stampede Effect, Thread Pinning, and Platform vs Virtual Threads benchmark — each with logs that make the behavior visible in real time.