DEV Community

Dhellano Castro

Virtual Threads in Real Production: Docker, Kubernetes, and What the Dashboards Don't Tell You

Series: Java in Real Production — This is the second article of the series. If you haven't read the first one yet, it covers the fundamentals of Virtual Threads, Thread Pinning, and the Stampede Effect — concepts we'll build on here. Read Part 1 here — Virtual Threads in Java 21: The End of the Scarcity Era (and the Pitfalls That Can Take You Down).


You read about Virtual Threads. You understood the mental model. You fixed Thread Pinning, put a Semaphore in front of the database. The application is working in development.

Then you deploy.

And the weirdness begins: latency spiking for no apparent reason, container being killed by the kernel at peak hours, dashboards showing low CPU while requests pile up in the queue. Everything seems fine — until it isn't.

This article is about what happens after the deploy. The production environment — Docker, Kubernetes, and observability — has its own pitfalls for Virtual Thread applications, and most of them are invisible until it's too late.


Stack Cost and the OOM Kill Risk in Docker

Let's start with memory, because this is where a risk lives that can literally kill your container — with no stack trace, no warning, no graceful shutdown.

The fundamental difference between the two models:

  • Platform Thread: ~1MB of stack allocated in the JVM's native space, outside the Heap
  • Virtual Thread: stack stored as Java objects on the Heap, subject to GC

This migration from "native stack" to "Heap objects" has a direct consequence: the -Xmx that used to be enough may no longer be.
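That difference is easy to observe from inside the JVM. The sketch below (JDK 21+, numbers purely illustrative) parks 100,000 virtual threads and prints roughly how much the Heap grew; with Platform Threads, the same count would reserve on the order of 100GB of native stack.

```java
// Sketch: parked Virtual Threads live as heap objects, subject to GC.
// Heap-growth numbers vary by JVM and GC timing — treat the printout as a rough signal.
import java.util.concurrent.CountDownLatch;

public class HeapStacks {
    public static void main(String[] args) throws InterruptedException {
        Runtime rt = Runtime.getRuntime();
        System.gc();
        long before = rt.totalMemory() - rt.freeMemory();

        int count = 100_000;
        CountDownLatch done = new CountDownLatch(1);
        for (int i = 0; i < count; i++) {
            // Each VT parks immediately; its stack is a small heap object, not ~1MB native
            Thread.ofVirtual().start(() -> {
                try { done.await(); } catch (InterruptedException ignored) {}
            });
        }

        System.gc();
        long after = rt.totalMemory() - rt.freeMemory();
        System.out.printf("Heap growth for %d parked VTs: ~%d MB%n",
                count, (after - before) / (1024 * 1024));
        done.countDown();
    }
}
```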

The Equation Changed

With Platform Threads, memory was predictable:

Total Memory ≈ Heap (-Xmx) + MetaSpace + (N_threads × ~1MB native)

With Virtual Threads, the thread stack moved into the Heap:

Total Memory ≈ Heap (includes VT stacks) + MetaSpace + Carrier Thread stacks

When you set --memory=512m in Docker (or resources.limits.memory in Kubernetes), the Linux cgroup applies that limit to the entire process memory. If the JVM exceeds that limit, the kernel sends a SIGKILL. That's the OOM Kill — and it doesn't warn you.

🐳 Golden rule for Docker: Monitor Heap usage with Virtual Threads active. The -Xmx that used to be enough may need a 20–30% increase to accommodate Virtual Thread stacks on the Heap. Adjust the container limit with a safety margin of at least 15% above -Xmx.

# docker-compose.yml — safe configuration for Virtual Threads
services:
  app:
    image: my-app:latest
    environment:
      JAVA_OPTS: >-
        -Xms128m
        -Xmx384m
        -XX:+UseZGC
        -Djdk.virtualThreadScheduler.parallelism=4
    deploy:
      resources:
        limits:
          memory: 512m  # ~33% margin above Xmx — never set Xmx = limit

Note the -Djdk.virtualThreadScheduler.parallelism=4. This parameter controls how many Carrier Threads exist. On a container with 4 CPUs, keeping the default makes sense — but configuring it explicitly ensures the behavior doesn't change if the container's CPU count changes.
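You can inspect the carrier arrangement directly: on JDK 21+, a virtual thread's toString() includes the carrier it is currently mounted on (e.g. ForkJoinPool-1-worker-1). A minimal probe:

```java
// Sketch: show CPUs visible to the JVM, the configured scheduler parallelism,
// and which Carrier Thread a Virtual Thread is mounted on.
public class CarrierPeek {
    public static void main(String[] args) throws Exception {
        System.out.println("CPUs visible to the JVM: "
                + Runtime.getRuntime().availableProcessors());
        System.out.println("Configured parallelism: "
                + System.getProperty("jdk.virtualThreadScheduler.parallelism",
                                     "(default = CPU count)"));
        // The carrier's name appears after the '@' in the VT's toString()
        Thread.ofVirtual().start(() ->
                System.out.println("Running on: " + Thread.currentThread())
        ).join();
    }
}
```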

Why ZGC?

With high volumes of Virtual Threads, the Heap becomes a high-turnover environment: stack objects being created and destroyed constantly. Garbage collectors with long pauses — like G1 under heavy load — will introduce noticeable latency precisely at peak pressure moments. ZGC (and Shenandoah) were designed for sub-millisecond pauses regardless of Heap size. For Virtual Thread applications in production, they are the safest choice.


CPU Throttling in Kubernetes — The Silent Enemy of Carrier Threads

Kubernetes adds one more layer of complexity. And this one is especially treacherous because it acts completely silently.

The Mechanism

When you set resources.limits.cpu: "2" on your Pod, Kubernetes uses cgroup CPU quotas to ensure your container doesn't use more than 2 cores. If the process tries to use more, the kernel throttles it — literally strangling the process, preventing it from executing for a period proportional to the excess.

Remember the Carrier Threads from the previous article? They are OS threads that run Virtual Threads. If Kubernetes is throttling your container, Carrier Threads can't be scheduled by the OS. The result: even with 1,000,000 Virtual Threads ready to execute, they sit idle waiting for Carrier Threads to get CPU back.

The Misleading Symptom

High latency with apparently low CPU on dashboards.

The process isn't using CPU because it's being throttled — but the graphs show 40% usage (since throttle periods are cycles where the process simply doesn't run, pulling down the measured average). The metric that matters isn't cpu_usage, it's container_cpu_cfs_throttled_seconds_total — exposed by cAdvisor in any Kubernetes cluster.
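Before reaching for Prometheus, you can read the raw counters that back that metric straight from the cgroup v2 filesystem inside the container (cat /sys/fs/cgroup/cpu.stat). The snippet below parses a sample cpu.stat — the values are made up for illustration — and computes the fraction of scheduling periods that were throttled:

```shell
# Sample cpu.stat content (illustrative values, not from a real container)
cat > /tmp/cpu.stat.sample <<'EOF'
usage_usec 843213456
nr_periods 12000
nr_throttled 3100
throttled_usec 95000000
EOF

# Percentage of CFS scheduling periods in which the cgroup was throttled
awk '/nr_periods/ {p=$2} /nr_throttled/ {t=$2} \
     END {printf "throttled in %.1f%% of periods\n", 100*t/p}' /tmp/cpu.stat.sample
```

Anything sustained above ~25% of periods means your Carrier Threads are regularly losing their CPU slices.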

# kubernetes deployment — aware configuration for Virtual Threads
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
        - name: app
          resources:
            requests:
              cpu: "1"
              memory: "256Mi"
            limits:
              cpu: "2"       # Sets the effective ceiling for active Carrier Threads
              memory: "512Mi"
          env:
            - name: JAVA_OPTS
              value: >-
                -Xmx384m
                -XX:+UseZGC
                -Djdk.virtualThreadScheduler.parallelism=2
-XX:StartFlightRecording=filename=/tmp/jfr/recording.jfr,duration=60s,settings=profile

⚠️ Critical alignment: The value of virtualThreadScheduler.parallelism must be consistent with limits.cpu. If you set a 2 CPU limit but 8 Carrier Threads, the extra Carrier Threads will compete for CPU, increase throttling, and make things worse. Keep both values aligned.


Observability with JDK Flight Recorder (JFR)

JFR is the most powerful observability tool for diagnosing Virtual Thread problems in production. It has native support for Virtual Thread-specific events since Java 21 — and its overhead is so low it can run continuously in production without noticeable impact.

The Events That Matter

  • jdk.VirtualThreadPinned: active Thread Pinning — synchronized + I/O in the critical path
  • jdk.VirtualThreadSubmitFailed: failures submitting Virtual Threads — a signal of scheduler saturation
  • jdk.VirtualThreadStart / jdk.VirtualThreadEnd: total volume of VTs created — detects creation explosions
  • jdk.ThreadSleep: threads in unnecessarily long sleeps
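If you want to see the first of those events fire on demand, the sketch below (JDK 21, where monitors still pin the carrier) blocks inside a synchronized block on a virtual thread. Run it under JFR, or with -Djdk.tracePinnedThreads=full to dump the pinned stack to stderr:

```java
// Minimal pinning reproduction (JDK 21): parking while holding a monitor
// pins the carrier and emits a jdk.VirtualThreadPinned event.
public class Pinning {
    private static final Object LOCK = new Object();

    static void pinnedSleep() throws InterruptedException {
        synchronized (LOCK) {      // monitor held...
            Thread.sleep(100);     // ...while parking -> the Carrier Thread is pinned
        }
    }

    public static void main(String[] args) throws Exception {
        Thread vt = Thread.ofVirtual().start(() -> {
            try { pinnedSleep(); } catch (InterruptedException ignored) {}
        });
        vt.join();
        System.out.println("done (check JFR for jdk.VirtualThreadPinned)");
    }
}
```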

Runtime Diagnosis (No Restart Required)

# Start a 2-minute recording without restarting the application
jcmd <PID> JFR.start name=vt-diagnosis \
  settings=profile \
  duration=120s \
  filename=/tmp/vt-diagnosis.jfr

# Analyze pinning events directly in the terminal
jfr print --events jdk.VirtualThreadPinned /tmp/vt-diagnosis.jfr

For a complete visual analysis, JDK Mission Control (JMC) is the official GUI — open the .jfr file and get a full event timeline with drill-down by thread, method, and time.

Prometheus Integration via Micrometer

If you use Spring Boot 3.2+, Virtual Thread metrics are already available via Micrometer. Configure alerts for:

# Alert: Thread Pinning detected in production
- alert: VirtualThreadPinningDetected
  expr: jvm_threads_virtual_pinned_count > 0
  for: 1m
  annotations:
    summary: "Active Thread Pinning: investigate synchronized + I/O"

# Alert: CPU Throttling above acceptable threshold
- alert: ContainerCPUThrottling
  expr: rate(container_cpu_cfs_throttled_seconds_total[5m]) > 0.25
  for: 5m
  annotations:
    summary: "Container being throttled: Carrier Threads impacted"

🔍 Golden tip: If VirtualThreadPinned fires, you have Thread Pinning in production. If CPUThrottling fires alongside high latency, you have Carrier Threads being strangled by the cgroup. These are different problems with different causes — separate alerts prevent investigating in the wrong place.
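Getting those metrics flowing in the first place is configuration-only. A minimal application.yml sketch, assuming Spring Boot 3.2+ with Actuator and micrometer-registry-prometheus on the classpath:

```yaml
# application.yml — assumes Spring Boot 3.2+, Actuator, and
# micrometer-registry-prometheus as dependencies
spring:
  threads:
    virtual:
      enabled: true   # serve HTTP requests on Virtual Threads
management:
  endpoints:
    web:
      exposure:
        include: health,prometheus   # /actuator/prometheus for scraping
```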


The Modern Developer's Checklist

Consolidating everything from the series into an operational checklist:

Before Enabling Virtual Threads

  • [ ] Java 21+ in your environment — don't negotiate this
  • [ ] Check JDBC driver versions — PostgreSQL ≥ 42.6, MySQL Connector/J ≥ 9.0
  • [ ] Audit synchronized in critical I/O paths — migrate to ReentrantLock
  • [ ] Define concurrency limits for scarce resources via Semaphore or Resilience4j Bulkhead
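The last two checklist items fit in a few lines of Java; the permit count of 10 below is illustrative, not a recommendation — match it to your actual connection pool.

```java
// Sketch: synchronized replaced by ReentrantLock (virtual-thread friendly on
// JDK 21), plus a Semaphore as a bulkhead in front of a scarce resource.
import java.util.concurrent.Semaphore;
import java.util.concurrent.locks.ReentrantLock;

public class VtSafePatterns {
    private final ReentrantLock lock = new ReentrantLock();
    private final Semaphore dbPermits = new Semaphore(10); // match your pool size

    // Before: synchronized (this) { doIo(); } — pins the carrier on JDK 21.
    // After: ReentrantLock lets the VT unmount while waiting for the lock.
    public String guardedIo() {
        lock.lock();
        try {
            return "result"; // blocking I/O would go here
        } finally {
            lock.unlock();
        }
    }

    // Bulkhead: at most 10 Virtual Threads touch the database at once
    public String query() throws InterruptedException {
        dbPermits.acquire();
        try {
            return "row"; // the actual JDBC call would go here
        } finally {
            dbPermits.release();
        }
    }
}
```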

Docker Configuration

  • [ ] Add 20–30% margin on the container memory limit above -Xmx
  • [ ] Configure -Djdk.virtualThreadScheduler.parallelism explicitly based on allocated CPUs
  • [ ] Use ZGC or Shenandoah as GC — shorter pauses, better for high Heap object turnover

Kubernetes Configuration

  • [ ] Monitor container_cpu_cfs_throttled_seconds_total from cAdvisor — throttling is the silent enemy of Carrier Threads
  • [ ] Align virtualThreadScheduler.parallelism with resources.limits.cpu
  • [ ] Enable JFR with Virtual Thread profile in staging before going to production

Production Observability

  • [ ] Alert for jdk.VirtualThreadPinned — any value above zero deserves investigation
  • [ ] Alert for container_cpu_cfs_throttled_seconds_total above 25%
  • [ ] Dashboard with jvm_threads_states_threads_total{state="runnable"} for active VT volume
  • [ ] Health checks that treat Bulkhead saturation as a degraded health state
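That last item needs no framework to prototype. A framework-free sketch — in Spring Boot the same logic would live inside a HealthIndicator; the names and the 80% threshold are illustrative:

```java
// Sketch: report DEGRADED when the bulkhead is nearly exhausted, so load
// balancers can shed traffic before requests start queuing.
import java.util.concurrent.Semaphore;

public class BulkheadHealth {
    enum Status { UP, DEGRADED }

    private final Semaphore bulkhead;
    private final int capacity;

    BulkheadHealth(Semaphore bulkhead, int capacity) {
        this.bulkhead = bulkhead;
        this.capacity = capacity;
    }

    // DEGRADED once more than 80% of permits are in use
    Status check() {
        int free = bulkhead.availablePermits();
        return (free < capacity * 0.2) ? Status.DEGRADED : Status.UP;
    }
}
```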

Conclusion

The era of thread scarcity is over. The restaurant can have 1 million waiters.

But the database still has 100 tables. Kubernetes still has limited CPU. The container still has memory defined by the cgroup. And the kernel still sends SIGKILL without asking permission.

Virtual Threads solve the thread scarcity problem — and only that. The other problems still exist, and some become even more visible because the accidental handbrake that Platform Threads provided is gone.

The correct mental model isn't "Virtual Threads = free performance". It's: Virtual Threads = I stop worrying about threads and start worrying about the real resources my application consumes.

With that model in mind, the tool is genuinely transformative.


Have a question or want to go deeper on any of the points? Comment below — I answer all of them. 🙌


Source Code

If you haven't seen the series repository yet, it contains executable demos of the Part 1 concepts — Stampede Effect, Thread Pinning, and Platform vs Virtual Threads benchmark — each with logs that make the behavior visible in real time.

🔗 github.com/DheCastro/java-virtual-threads-pitfalls
