<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Dhellano Castro</title>
    <description>The latest articles on DEV Community by Dhellano Castro (@dhellano_castro_c5aba0c56).</description>
    <link>https://dev.to/dhellano_castro_c5aba0c56</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3573092%2F3d09fb55-88e3-4341-9c65-d0cbb20eb754.jpg</url>
      <title>DEV Community: Dhellano Castro</title>
      <link>https://dev.to/dhellano_castro_c5aba0c56</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/dhellano_castro_c5aba0c56"/>
    <language>en</language>
    <item>
      <title>Virtual Threads in Real Production: Docker, Kubernetes, and What the Dashboards Don't Tell You</title>
      <dc:creator>Dhellano Castro</dc:creator>
      <pubDate>Sun, 22 Feb 2026 19:29:59 +0000</pubDate>
      <link>https://dev.to/dhellano_castro_c5aba0c56/virtual-threads-in-real-production-docker-kubernetes-and-what-the-dashboards-dont-tell-you-4pg</link>
      <guid>https://dev.to/dhellano_castro_c5aba0c56/virtual-threads-in-real-production-docker-kubernetes-and-what-the-dashboards-dont-tell-you-4pg</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Series:&lt;/strong&gt; &lt;em&gt;Java in Real Production&lt;/em&gt; — This is the second article of the series. If you haven't read the first one yet, it covers the fundamentals of Virtual Threads, Thread Pinning, and the Stampede Effect — concepts we'll build on here. Read Part 1 here — &lt;a href="https://dev.to/dhellano_castro_c5aba0c56/virtual-threads-in-java-21-the-end-of-the-scarcity-era-and-the-pitfalls-that-can-take-you-down-4bml"&gt;Virtual Threads in Java 21: The End of the Scarcity Era (and the Pitfalls That Can Take You Down)&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;You read about Virtual Threads. You understood the mental model. You fixed Thread Pinning, put a Semaphore in front of the database. The application is working in development.&lt;/p&gt;

&lt;p&gt;Then you deploy.&lt;/p&gt;

&lt;p&gt;And the weirdness begins: latency spiking for no apparent reason, container being killed by the kernel at peak hours, dashboards showing low CPU while requests pile up in the queue. Everything seems fine — until it isn't.&lt;/p&gt;

&lt;p&gt;This article is about what happens &lt;strong&gt;after&lt;/strong&gt; the deploy. The production environment — Docker, Kubernetes, and observability — has its own pitfalls for Virtual Thread applications, and most of them are invisible until it's too late.&lt;/p&gt;




&lt;h2&gt;
  
  
  Stack Cost and the OOM Kill Risk in Docker
&lt;/h2&gt;

&lt;p&gt;Let's start with memory, because it hides a risk that can literally kill your container — with no stack trace, no warning, no graceful shutdown.&lt;/p&gt;

&lt;p&gt;The fundamental difference between the two models:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Platform Thread:&lt;/strong&gt; ~1MB of stack allocated in the JVM's &lt;strong&gt;native&lt;/strong&gt; space, outside the Heap&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Virtual Thread:&lt;/strong&gt; stack stored as &lt;strong&gt;Java objects on the Heap&lt;/strong&gt;, subject to GC&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This migration from "native stack" to "Heap objects" has a direct consequence: the &lt;code&gt;-Xmx&lt;/code&gt; that used to be enough may no longer be.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Equation Changed
&lt;/h3&gt;

&lt;p&gt;With Platform Threads, memory was predictable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Total Memory ≈ Heap (-Xmx) + MetaSpace + (N_threads × ~1MB native)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With Virtual Threads, the thread stack moved into the Heap:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Total Memory ≈ Heap (includes VT stacks) + MetaSpace + Carrier Thread stacks
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
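&lt;p&gt;You can see this shift empirically: park a batch of Virtual Threads and measure how much Heap they occupy. A minimal sketch — the numbers vary by JDK version and workload, so treat the output as a rough estimate, not a spec:&lt;/p&gt;

```java
public class VtFootprint {
    // Rough estimate of heap bytes consumed per parked virtual thread.
    static long bytesPerParkedThread(int count) throws InterruptedException {
        Runtime rt = Runtime.getRuntime();
        System.gc();
        long before = rt.totalMemory() - rt.freeMemory();
        var latch = new java.util.concurrent.CountDownLatch(1);
        for (int i = count; i > 0; i--) {
            Thread.ofVirtual().start(() -> {
                try { latch.await(); } catch (InterruptedException ignored) { }
            });
        }
        Thread.sleep(1_000); // give the threads time to start and park
        long after = rt.totalMemory() - rt.freeMemory();
        latch.countDown();
        return (after - before) / count;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("~" + bytesPerParkedThread(50_000)
            + " bytes of heap per parked virtual thread");
    }
}
```

&lt;p&gt;Multiply that per-thread cost by your peak concurrency and you have the Heap headroom your &lt;code&gt;-Xmx&lt;/code&gt; needs to absorb.&lt;/p&gt;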



&lt;p&gt;When you set &lt;code&gt;--memory=512m&lt;/code&gt; in Docker (or &lt;code&gt;resources.limits.memory&lt;/code&gt; in Kubernetes), the Linux cgroup applies that limit to &lt;strong&gt;the entire process memory&lt;/strong&gt;. If the JVM exceeds that limit, the kernel sends a &lt;strong&gt;SIGKILL&lt;/strong&gt;. That's the OOM Kill — and it doesn't warn you.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;🐳 &lt;strong&gt;Golden rule for Docker:&lt;/strong&gt; Monitor Heap usage with Virtual Threads active. The &lt;code&gt;-Xmx&lt;/code&gt; that used to be enough may need a 20–30% increase to accommodate Virtual Thread stacks on the Heap. Adjust the container limit with a safety margin of at least 15% above &lt;code&gt;-Xmx&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# docker-compose.yml — safe configuration for Virtual Threads&lt;/span&gt;
&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-app:latest&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;JAVA_OPTS&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;-&lt;/span&gt;
        &lt;span class="s"&gt;-Xms128m&lt;/span&gt;
        &lt;span class="s"&gt;-Xmx384m&lt;/span&gt;
        &lt;span class="s"&gt;-XX:+UseZGC&lt;/span&gt;
        &lt;span class="s"&gt;-Djdk.virtualThreadScheduler.parallelism=4&lt;/span&gt;
    &lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;512m&lt;/span&gt;  &lt;span class="c1"&gt;# ~33% margin above Xmx — never set Xmx = limit&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note the &lt;code&gt;-Djdk.virtualThreadScheduler.parallelism=4&lt;/code&gt;. This parameter controls how many &lt;strong&gt;Carrier Threads&lt;/strong&gt; exist. On a container with 4 CPUs, keeping the default makes sense — but configuring it explicitly ensures the behavior doesn't change if the container's CPU count changes.&lt;/p&gt;
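&lt;p&gt;A quick sanity check of what the runtime actually sees inside the container (the property prints &lt;code&gt;null&lt;/code&gt; unless you set it explicitly):&lt;/p&gt;

```java
public class CpuCheck {
    public static void main(String[] args) {
        // Respects cgroup CPU limits (and therefore Docker --cpus) on modern JDKs
        System.out.println("availableProcessors = "
            + Runtime.getRuntime().availableProcessors());
        // Null unless overridden with -Djdk.virtualThreadScheduler.parallelism=N
        System.out.println("parallelism override = "
            + System.getProperty("jdk.virtualThreadScheduler.parallelism"));
    }
}
```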

&lt;h3&gt;
  
  
  Why ZGC?
&lt;/h3&gt;

&lt;p&gt;With high volumes of Virtual Threads, the Heap becomes a high-turnover environment: stack objects being created and destroyed constantly. Garbage collectors with long pauses — like G1 under heavy load — will introduce noticeable latency precisely at peak pressure moments. &lt;strong&gt;ZGC&lt;/strong&gt; (and Shenandoah) were designed for sub-millisecond pauses regardless of Heap size. For Virtual Thread applications in production, they are the safest choice.&lt;/p&gt;




&lt;h2&gt;
  
  
  CPU Throttling in Kubernetes — The Silent Enemy of Carrier Threads
&lt;/h2&gt;

&lt;p&gt;Kubernetes adds one more layer of complexity. And this one is especially treacherous because it acts completely silently.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Mechanism
&lt;/h3&gt;

&lt;p&gt;When you set &lt;code&gt;resources.limits.cpu: "2"&lt;/code&gt; on your Pod, Kubernetes uses cgroup &lt;strong&gt;CPU quotas&lt;/strong&gt; to ensure your container doesn't use more than 2 cores. If the process exhausts its quota within the current CFS period (100ms by default), the kernel &lt;strong&gt;throttles&lt;/strong&gt; it — the process simply isn't scheduled again until the next period begins.&lt;/p&gt;

&lt;p&gt;Remember the Carrier Threads from the previous article? They are OS threads that run Virtual Threads. If Kubernetes is throttling your container, Carrier Threads can't be scheduled by the OS. The result: even with 1,000,000 Virtual Threads ready to execute, they sit idle waiting for Carrier Threads to get CPU back.&lt;/p&gt;
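&lt;p&gt;You can feel this ceiling locally, no Kubernetes required: run the same CPU-bound batch with different &lt;code&gt;parallelism&lt;/code&gt; values and compare wall-clock time. A toy sketch (all names here are illustrative):&lt;/p&gt;

```java
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.LongAdder;

// CPU-bound tasks never yield their carrier, so throughput is capped by the
// carrier pool size no matter how many virtual threads are "ready".
public class CarrierBottleneck {
    static final LongAdder sink = new LongAdder(); // defeats dead-code elimination

    static void burnCpu() {
        long acc = 0;
        for (long i = 5_000_000; i > 0; i--) acc += i; // pure CPU work, no blocking
        sink.add(acc);
    }

    static long runBatchMillis(int tasks) {
        long start = System.nanoTime();
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = tasks; i > 0; i--) executor.submit(CarrierBottleneck::burnCpu);
        } // close() waits for all tasks to finish
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Run once with -Djdk.virtualThreadScheduler.parallelism=1 and once with =8
        System.out.println("200 CPU-bound tasks took " + runBatchMillis(200) + " ms");
    }
}
```

&lt;p&gt;With &lt;code&gt;parallelism=1&lt;/code&gt; the batch takes roughly 8x longer than with &lt;code&gt;parallelism=8&lt;/code&gt; on an 8-core machine — the same effect CFS throttling produces in production, just made visible.&lt;/p&gt;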

&lt;h3&gt;
  
  
  The Misleading Symptom
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;High latency with apparently low CPU on dashboards.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The process isn't using CPU because it's being throttled — but the graphs show 40% usage (since throttle periods are cycles where the process simply doesn't run, pulling down the measured average). The metric that matters isn't &lt;code&gt;cpu_usage&lt;/code&gt;, it's &lt;code&gt;container_cpu_cfs_throttled_seconds_total&lt;/code&gt; — exposed by cAdvisor in any Kubernetes cluster.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# kubernetes deployment — aware configuration for Virtual Threads&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apps/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deployment&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app&lt;/span&gt;
          &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1"&lt;/span&gt;
              &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;256Mi"&lt;/span&gt;
            &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2"&lt;/span&gt;       &lt;span class="c1"&gt;# Sets the effective ceiling for active Carrier Threads&lt;/span&gt;
              &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;512Mi"&lt;/span&gt;
          &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;JAVA_OPTS&lt;/span&gt;
              &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;-&lt;/span&gt;
                &lt;span class="s"&gt;-Xmx384m&lt;/span&gt;
                &lt;span class="s"&gt;-XX:+UseZGC&lt;/span&gt;
                &lt;span class="s"&gt;-Djdk.virtualThreadScheduler.parallelism=2&lt;/span&gt;
                &lt;span class="s"&gt;-XX:StartFlightRecording=filename=/tmp/jfr/recording.jfr,&lt;/span&gt;
                  &lt;span class="s"&gt;duration=60s,settings=profile&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;Critical alignment:&lt;/strong&gt; The value of &lt;code&gt;virtualThreadScheduler.parallelism&lt;/code&gt; must be consistent with &lt;code&gt;limits.cpu&lt;/code&gt;. If you set a 2 CPU limit but 8 Carrier Threads, the extra Carrier Threads will compete for CPU, increase throttling, and make things worse. Keep both values aligned.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Observability with JDK Flight Recorder (JFR)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;JFR&lt;/strong&gt; is the most powerful observability tool for diagnosing Virtual Thread problems in production. It has native support for Virtual Thread-specific events since Java 21 — and its overhead is so low it can run continuously in production without noticeable impact.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Events That Matter
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;JFR Event&lt;/th&gt;
&lt;th&gt;What it reveals&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;jdk.VirtualThreadPinned&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Active Thread Pinning — &lt;code&gt;synchronized&lt;/code&gt; + I/O in the critical path&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;jdk.VirtualThreadSubmitFailed&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Failures submitting Virtual Threads — signal of scheduler saturation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;jdk.VirtualThreadStart&lt;/code&gt; / &lt;code&gt;End&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Total volume of VTs created — detects creation explosion&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;jdk.ThreadSleep&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Threads in unnecessarily long sleep&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
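&lt;p&gt;These events can also be consumed in-process via the JFR streaming API (&lt;code&gt;jdk.jfr.consumer.RecordingStream&lt;/code&gt;) — for example, to log every pinning incident as it happens. A minimal watcher sketch:&lt;/p&gt;

```java
import java.time.Duration;
import jdk.jfr.consumer.RecordingStream;

public class PinningWatcher {
    public static void main(String[] args) throws InterruptedException {
        long seconds = args.length > 0 ? Long.parseLong(args[0]) : 60;
        try (RecordingStream rs = new RecordingStream()) {
            // Only pinning incidents longer than 20ms, with stack traces
            rs.enable("jdk.VirtualThreadPinned")
              .withThreshold(Duration.ofMillis(20))
              .withStackTrace();
            rs.onEvent("jdk.VirtualThreadPinned", event ->
                System.out.println("Pinned for " + event.getDuration().toMillis()
                    + " ms on " + event.getThread().getJavaName()));
            rs.startAsync();
            Thread.sleep(seconds * 1_000); // keep the watcher alive
        }
    }
}
```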

&lt;h3&gt;
  
  
  Runtime Diagnosis (No Restart Required)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Start a 2-minute recording without restarting the application&lt;/span&gt;
jcmd &amp;lt;PID&amp;gt; JFR.start &lt;span class="nv"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;vt-diagnosis &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nv"&gt;settings&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;profile &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nv"&gt;duration&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;120s &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nv"&gt;filename&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/tmp/vt-diagnosis.jfr

&lt;span class="c"&gt;# Analyze pinning events directly in the terminal&lt;/span&gt;
jfr print &lt;span class="nt"&gt;--events&lt;/span&gt; jdk.VirtualThreadPinned /tmp/vt-diagnosis.jfr
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For a complete visual analysis, &lt;strong&gt;JDK Mission Control (JMC)&lt;/strong&gt; is the official GUI — open the &lt;code&gt;.jfr&lt;/code&gt; file and get a full event timeline with drill-down by thread, method, and time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prometheus Integration via Micrometer
&lt;/h3&gt;

&lt;p&gt;If you use Spring Boot 3.2+, Virtual Thread metrics are already available via Micrometer. Configure alerts for:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Alert: Thread Pinning detected in production&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;alert&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;VirtualThreadPinningDetected&lt;/span&gt;
  &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;jvm_threads_virtual_pinned_count &amp;gt; &lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;
  &lt;span class="na"&gt;for&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1m&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Active&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Thread&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Pinning&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;—&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;investigate&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;synchronized&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;+&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;I/O"&lt;/span&gt;

&lt;span class="c1"&gt;# Alert: CPU Throttling above acceptable threshold&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;alert&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ContainerCPUThrottling&lt;/span&gt;
  &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rate(container_cpu_cfs_throttled_seconds_total[5m]) &amp;gt; &lt;/span&gt;&lt;span class="m"&gt;0.25&lt;/span&gt;
  &lt;span class="na"&gt;for&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5m&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Container&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;being&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;throttled&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;—&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Carrier&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Threads&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;impacted"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;🔍 &lt;strong&gt;Golden tip:&lt;/strong&gt; If &lt;code&gt;VirtualThreadPinned&lt;/code&gt; fires, you have Thread Pinning in production. If &lt;code&gt;CPUThrottling&lt;/code&gt; fires alongside high latency, you have Carrier Threads being strangled by the cgroup. These are different problems with different causes — separate alerts prevent investigating in the wrong place.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Modern Developer's Checklist
&lt;/h2&gt;

&lt;p&gt;Consolidating everything from the series into an operational checklist:&lt;/p&gt;

&lt;h3&gt;
  
  
  Before Enabling Virtual Threads
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;[ ] &lt;strong&gt;Java 21+&lt;/strong&gt; in your environment — don't negotiate this&lt;/li&gt;
&lt;li&gt;[ ] Check JDBC driver versions — PostgreSQL ≥ 42.6, MySQL Connector/J ≥ 9.0&lt;/li&gt;
&lt;li&gt;[ ] Audit &lt;code&gt;synchronized&lt;/code&gt; in critical I/O paths — migrate to &lt;code&gt;ReentrantLock&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;[ ] Define concurrency limits for scarce resources via &lt;code&gt;Semaphore&lt;/code&gt; or Resilience4j &lt;code&gt;Bulkhead&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
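&lt;p&gt;The last item fits in a few lines. Here &lt;code&gt;queryDatabase&lt;/code&gt; is a hypothetical stand-in for the real JDBC call:&lt;/p&gt;

```java
import java.util.concurrent.Semaphore;

// Concurrency gate in front of a scarce resource: at most 100 virtual
// threads hit the connection pool at once, the rest park cheaply.
public class DbGate {
    private static final Semaphore permits = new Semaphore(100);

    static String queryDatabase(long id) {
        return "row-" + id; // stand-in for the real JDBC call
    }

    public static String findById(long id) throws InterruptedException {
        permits.acquire();
        try {
            return queryDatabase(id);
        } finally {
            permits.release();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(findById(42));
    }
}
```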

&lt;h3&gt;
  
  
  Docker Configuration
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Add &lt;strong&gt;20–30% margin&lt;/strong&gt; on the container memory limit above &lt;code&gt;-Xmx&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;[ ] Configure &lt;code&gt;-Djdk.virtualThreadScheduler.parallelism&lt;/code&gt; explicitly based on allocated CPUs&lt;/li&gt;
&lt;li&gt;[ ] Use &lt;strong&gt;ZGC&lt;/strong&gt; or &lt;strong&gt;Shenandoah&lt;/strong&gt; as GC — shorter pauses, better for high Heap object turnover&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Kubernetes Configuration
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Monitor &lt;code&gt;container_cpu_cfs_throttled_seconds_total&lt;/code&gt; in cAdvisor — throttling is the silent enemy of Carrier Threads&lt;/li&gt;
&lt;li&gt;[ ] Align &lt;code&gt;virtualThreadScheduler.parallelism&lt;/code&gt; with &lt;code&gt;resources.limits.cpu&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;[ ] Enable JFR with Virtual Thread profile in staging before going to production&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Production Observability
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Alert for &lt;code&gt;jdk.VirtualThreadPinned&lt;/code&gt; — any value above zero deserves investigation&lt;/li&gt;
&lt;li&gt;[ ] Alert for &lt;code&gt;container_cpu_cfs_throttled_seconds_total&lt;/code&gt; above 25%&lt;/li&gt;
&lt;li&gt;[ ] Dashboard with &lt;code&gt;jvm_threads_states_threads_total{state="runnable"}&lt;/code&gt; for active VT volume&lt;/li&gt;
&lt;li&gt;[ ] Health checks that treat &lt;code&gt;Bulkhead&lt;/code&gt; saturation as a degraded health state&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The era of thread scarcity is over. The restaurant can have 1 million waiters.&lt;/p&gt;

&lt;p&gt;But the database still has 100 tables. Kubernetes still has limited CPU. The container still has memory defined by the cgroup. And the kernel still sends SIGKILL without asking permission.&lt;/p&gt;

&lt;p&gt;Virtual Threads solve the thread scarcity problem — and only that. The other problems still exist, and some become even more visible because the accidental handbrake that Platform Threads provided is gone.&lt;/p&gt;

&lt;p&gt;The correct mental model isn't "Virtual Threads = free performance". It's: &lt;strong&gt;Virtual Threads = I stop worrying about threads and start worrying about the real resources my application consumes.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;With that model in mind, the tool is genuinely transformative.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Have a question or want to go deeper on any of the points? Comment below — I answer all of them.&lt;/em&gt; 🙌&lt;/p&gt;




&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;JEP 444 — Virtual Threads (Java 21)&lt;/strong&gt;&lt;br&gt;
Conceptual foundation for Carrier Thread behavior and the CPU throttling impact discussed in this article.&lt;br&gt;
&lt;a href="https://openjdk.org/jeps/444" rel="noopener noreferrer"&gt;https://openjdk.org/jeps/444&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;OpenJDK — JDK Flight Recorder (JFR) Event Reference&lt;/strong&gt;&lt;br&gt;
Documentation for &lt;code&gt;jdk.VirtualThreadPinned&lt;/code&gt;, &lt;code&gt;jdk.VirtualThreadStart&lt;/code&gt;, and other Virtual Thread events available via JFR.&lt;br&gt;
&lt;a href="https://docs.oracle.com/en/java/javase/21/docs/api/jdk.jfr/jdk/jfr/package-summary.html" rel="noopener noreferrer"&gt;https://docs.oracle.com/en/java/javase/21/docs/api/jdk.jfr/jdk/jfr/package-summary.html&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Spring Boot 3.2 Release Notes — Virtual Threads&lt;/strong&gt;&lt;br&gt;
Reference for Virtual Thread configuration with Spring Boot, including Micrometer integration for the metrics cited in the alert configurations.&lt;br&gt;
&lt;a href="https://github.com/spring-projects/spring-boot/wiki/Spring-Boot-3.2-Release-Notes" rel="noopener noreferrer"&gt;https://github.com/spring-projects/spring-boot/wiki/Spring-Boot-3.2-Release-Notes&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Resilience4j — Official CircuitBreaker Documentation&lt;/strong&gt;&lt;br&gt;
Reference for &lt;code&gt;failureRateThreshold&lt;/code&gt;, &lt;code&gt;slidingWindowSize&lt;/code&gt;, and &lt;code&gt;waitDurationInOpenState&lt;/code&gt; configuration used in the resilience examples.&lt;br&gt;
&lt;a href="https://resilience4j.readme.io/docs/circuitbreaker" rel="noopener noreferrer"&gt;https://resilience4j.readme.io/docs/circuitbreaker&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Source Code
&lt;/h2&gt;

&lt;p&gt;If you haven't seen the series repository yet, it contains executable demos of the Part 1 concepts — Stampede Effect, Thread Pinning, and Platform vs Virtual Threads benchmark — each with logs that make the behavior visible in real time.&lt;/p&gt;

&lt;p&gt;🔗 &lt;strong&gt;&lt;a href="https://github.com/DheCastro/java-virtual-threads-pitfalls" rel="noopener noreferrer"&gt;github.com/DheCastro/java-virtual-threads-pitfalls&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>docker</category>
      <category>java</category>
      <category>kubernetes</category>
      <category>performance</category>
    </item>
    <item>
      <title>Virtual Threads in Java 21: The End of the Scarcity Era (and the Pitfalls That Can Take You Down)</title>
      <dc:creator>Dhellano Castro</dc:creator>
      <pubDate>Sun, 22 Feb 2026 19:29:37 +0000</pubDate>
      <link>https://dev.to/dhellano_castro_c5aba0c56/virtual-threads-in-java-21-the-end-of-the-scarcity-era-and-the-pitfalls-that-can-take-you-down-4bml</link>
      <guid>https://dev.to/dhellano_castro_c5aba0c56/virtual-threads-in-java-21-the-end-of-the-scarcity-era-and-the-pitfalls-that-can-take-you-down-4bml</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Series:&lt;/strong&gt; &lt;em&gt;Java in Real Production&lt;/em&gt; — This is the first of two articles. Here we cover the fundamentals, the right mental model, and the two pitfalls that silently bring down applications. In the second, we go deeper into Docker, Kubernetes, and observability with JFR.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;Imagine a fine-dining restaurant. Every table — an HTTP request — needs a dedicated waiter. The waiter takes the order, walks to the kitchen... and &lt;strong&gt;just stands there, waiting for the chef to finish the dish&lt;/strong&gt;. Meanwhile, new tables keep arriving. But there are no waiters available. The maître d' starts turning customers away at the door.&lt;/p&gt;

&lt;p&gt;The restaurant is full of waiters standing idle in the kitchen — and the dining room is empty of service.&lt;/p&gt;

&lt;p&gt;This is the classic &lt;strong&gt;Platform Threads&lt;/strong&gt; model in Java. Each thread consumes roughly &lt;strong&gt;1MB of stack&lt;/strong&gt; in the operating system. On a server with 4GB dedicated to threads, you get at most ~4,000 waiters. Sounds like a lot? For a modern application with heavy I/O — database calls, external HTTP, messaging — it isn't.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Project Loom&lt;/strong&gt;, introduced as a preview in Java 19 and stable since &lt;strong&gt;Java 21&lt;/strong&gt;, changed the rules of the game. The core idea is elegant: &lt;em&gt;what if the waiter could leave the table in the kitchen, go back to the dining room to serve other tables, and return when the dish was ready?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That's &lt;strong&gt;Virtual Threads&lt;/strong&gt;. Millions of them. With memory cost in the &lt;strong&gt;kilobytes&lt;/strong&gt; range. The restaurant can now have 1,000 real waiters serving 1,000,000 simultaneous tables.&lt;/p&gt;
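&lt;p&gt;That claim is easy to check on a laptop. A minimal sketch (Java 21) — try replacing the executor with &lt;code&gt;Executors.newFixedThreadPool(200)&lt;/code&gt; and watch the difference:&lt;/p&gt;

```java
import java.time.Duration;
import java.util.concurrent.Executors;

// 100,000 concurrent "waiters", each blocked on simulated I/O.
public class ManyWaiters {
    static long runWaiters(int count) throws Exception {
        long start = System.nanoTime();
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = count; i > 0; i--) {
                executor.submit(() -> {
                    Thread.sleep(Duration.ofMillis(200)); // blocking call; carrier is released
                    return null;
                });
            }
        } // close() waits for every task
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("100000 blocked tasks finished in "
            + runWaiters(100_000) + " ms");
    }
}
```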

&lt;p&gt;But — and there's always a "but" — a restaurant with 1 million waiters and a single kitchen with 4 stoves will still clog up. This is where the story gets interesting.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Engine Under the Hood
&lt;/h2&gt;

&lt;p&gt;Before rushing off to create Virtual Threads everywhere, it's worth understanding what's happening under the hood. The JVM manages three distinct concepts that coexist in this ecosystem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform Threads&lt;/strong&gt; are the old, honest model: a Java thread mapped 1:1 to an operating system thread. The OS schedules it, the OS blocks it, the OS pays the memory bill. They're expensive, powerful, and limited in number.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Virtual Threads&lt;/strong&gt; are threads managed by &lt;em&gt;the JVM itself&lt;/em&gt;, not the OS. They're lightweight, cheap, and can exist in absurd quantities. When a Virtual Thread needs to wait for I/O, it is &lt;strong&gt;unmounted&lt;/strong&gt; from the OS thread and its context is saved on the heap — as regular Java objects, subject to GC.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Carrier Threads&lt;/strong&gt; are the missing link that most articles ignore. They are OS Platform Threads that the JVM's internal &lt;em&gt;ForkJoinPool&lt;/em&gt; uses to &lt;strong&gt;run&lt;/strong&gt; Virtual Threads. Think of them as subway rails: the cars (Virtual Threads) ride on top of the rails (Carrier Threads). You can have 1,000 cars, but if there are only 4 rails, only 4 cars move at a time.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────────────┐
│                      JVM                            │
│                                                     │
│   Virtual Thread 1  ──┐                             │
│   Virtual Thread 2  ──┤                             │
│   Virtual Thread 3  ──┼──► Carrier Thread 1 ──► OS  │
│   Virtual Thread 4  ──┤                             │
│   Virtual Thread ...──┘                             │
│                        ──► Carrier Thread 2 ──► OS  │
│                        ──► Carrier Thread N ──► OS  │
│                                                     │
│   (N = number of available CPUs, by default)        │
└─────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The default number of Carrier Threads equals the number of available CPUs. In production, inside a Docker container with &lt;code&gt;--cpus=2&lt;/code&gt;, you have &lt;strong&gt;2 rails&lt;/strong&gt; for potentially millions of cars. This will matter — a lot — in the second article of this series.&lt;/p&gt;




&lt;h2&gt;
  
  
  Pitfall 1 — Thread Pinning: The Bolt in the Floor
&lt;/h2&gt;

&lt;p&gt;Remember the waiter who could leave the table in the kitchen and go serve others? Well. There's a situation where they &lt;strong&gt;can't leave&lt;/strong&gt;. Someone bolted their chair to the kitchen floor. That bolt is called &lt;code&gt;synchronized&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;When a Virtual Thread enters a &lt;code&gt;synchronized&lt;/code&gt; block or method and hits a blocking point — I/O, for example — it &lt;strong&gt;cannot be unmounted from the Carrier Thread&lt;/strong&gt;. It pins. The Carrier Thread gets stuck with it, waiting. If all Carrier Threads get pinned, your application freezes. Completely.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;Important:&lt;/strong&gt; &lt;code&gt;synchronized&lt;/code&gt; is not inherently a villain. It's perfectly safe to use it to protect fast in-memory operations, like manipulating a shared &lt;code&gt;HashMap&lt;/code&gt;. The problem arises when inside the &lt;code&gt;synchronized&lt;/code&gt; block there's a &lt;strong&gt;slow I/O operation&lt;/strong&gt; — a database query, an HTTP call, a file read.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;See the difference in practice:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// ❌ PROBLEMATIC: synchronized + I/O = Thread Pinning guaranteed&lt;/span&gt;
&lt;span class="c1"&gt;// The Carrier Thread gets stuck while the database responds&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;synchronized&lt;/span&gt; &lt;span class="nc"&gt;User&lt;/span&gt; &lt;span class="nf"&gt;findById&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Long&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;jdbcTemplate&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;queryForObject&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
        &lt;span class="s"&gt;"SELECT * FROM users WHERE id = ?"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;userRowMapper&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;id&lt;/span&gt;
    &lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// ✅ CORRECT: ReentrantLock is "Virtual Thread aware"&lt;/span&gt;
&lt;span class="c1"&gt;// The Virtual Thread can be unmounted while waiting for the database&lt;/span&gt;
&lt;span class="c1"&gt;// The Carrier Thread is free to execute other Virtual Threads&lt;/span&gt;
&lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;ReentrantLock&lt;/span&gt; &lt;span class="n"&gt;lock&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ReentrantLock&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;User&lt;/span&gt; &lt;span class="nf"&gt;findById&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Long&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="kd"&gt;throws&lt;/span&gt; &lt;span class="nc"&gt;InterruptedException&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;lock&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;lock&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;jdbcTemplate&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;queryForObject&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
            &lt;span class="s"&gt;"SELECT * FROM users WHERE id = ?"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;userRowMapper&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;id&lt;/span&gt;
        &lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;finally&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;lock&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;unlock&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why does &lt;code&gt;ReentrantLock&lt;/code&gt; solve it?&lt;/strong&gt; Because it doesn't rely on the JVM's native object monitors: it parks waiters through &lt;code&gt;java.util.concurrent&lt;/code&gt;, a blocking mechanism the Virtual Thread scheduler understands. When a Virtual Thread needs to wait inside a &lt;code&gt;ReentrantLock&lt;/code&gt;, the JVM can unmount it from the Carrier Thread normally. The waiter can finally get up from the chair.&lt;/p&gt;

&lt;p&gt;To identify pinning in production, enable the JVM diagnostic flag:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nt"&gt;-Djdk&lt;/span&gt;.tracePinnedThreads&lt;span class="o"&gt;=&lt;/span&gt;full
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
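&lt;p&gt;To see the flag in action locally, a minimal reproduction helps. A hedged sketch, assuming Java 21 (&lt;code&gt;PinningDemo&lt;/code&gt; is an illustrative class name, not code from the article's repository): a Virtual Thread that blocks while holding a monitor makes the JVM print the pinned stack trace when run with the flag above.&lt;br&gt;
&lt;/p&gt;

```java
// PinningDemo.java - illustrative demo class; assumes Java 21.
// Run with: java -Djdk.tracePinnedThreads=full PinningDemo.java
public class PinningDemo {

    private static final Object LOCK = new Object();

    static void blockingInsideMonitor() {
        synchronized (LOCK) {          // monitor held by the Virtual Thread...
            try {
                Thread.sleep(100);     // ...while blocking: the carrier stays pinned
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread vt = Thread.ofVirtual().start(PinningDemo::blockingInsideMonitor);
        vt.join();
    }
}
```

&lt;p&gt;With the flag enabled, the JVM logs the full stack of the pinned Virtual Thread, pointing at the exact &lt;code&gt;synchronized&lt;/code&gt; frame that caused it.&lt;/p&gt;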



&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Note for framework users:&lt;/strong&gt; Older JDBC drivers and some &lt;code&gt;DataSource&lt;/code&gt; implementations still use &lt;code&gt;synchronized&lt;/code&gt; internally. Check your versions. The PostgreSQL driver removed the problematic &lt;code&gt;synchronized&lt;/code&gt; usages starting from version &lt;strong&gt;42.6&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;📌 &lt;strong&gt;Note on Java 24:&lt;/strong&gt; JEP 491, delivered in Java 24, resolves this limitation in most cases. Starting from Java 24, &lt;code&gt;synchronized&lt;/code&gt; with I/O no longer causes pinning. For those still on Java 21/22/23 — which is most production environments today — the pitfall remains valid and migrating to &lt;code&gt;ReentrantLock&lt;/code&gt; is still the right recommendation.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Pitfall 2 — The Stampede Effect
&lt;/h2&gt;

&lt;p&gt;You fixed the pinning. Your application is running with Virtual Threads smooth as butter. Requests coming in, threads responding. Then you look at your database and see this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ERROR: FATAL: remaining connection slots are reserved
       for replication superuser connections
Max connections: 100. Active: 100. Waiting: 4,847.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Welcome to the &lt;strong&gt;Stampede Effect&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The problem is subtle and cruel: with Platform Threads, the thread pool &lt;em&gt;was&lt;/em&gt; the natural limiter of database connections. If you had 200 threads in the pool, at most 200 simultaneous connections reached the database. It was accidental contention, but it worked as a handbrake.&lt;/p&gt;

&lt;p&gt;With Virtual Threads, that handbrake is gone. The JVM can create unlimited Virtual Threads. Each one, upon hitting an I/O point, stays "parked" waiting for the response — but keeps &lt;em&gt;existing&lt;/em&gt; and &lt;em&gt;holding an open connection&lt;/em&gt; to the database. A flood of 50,000 simultaneous requests can turn into 50,000 connections trying to open on the database at once.&lt;/p&gt;

&lt;p&gt;The database collapses. It wasn't the Virtual Thread that was slow — it was the absence of governance over the shared resource.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;🎯 &lt;strong&gt;The central paradigm shift of Project Loom:&lt;/strong&gt; With Virtual Threads, control moves away from the &lt;em&gt;thread&lt;/em&gt; and toward the &lt;em&gt;resource&lt;/em&gt;. You no longer limit threads. You limit access to scarce resources.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Mitigation — The Intelligent Handbrake
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Semaphore: The Database Doorman
&lt;/h3&gt;

&lt;p&gt;The most direct solution is to use a &lt;code&gt;Semaphore&lt;/code&gt; as an access controller. Think of it as a doorman at the database entrance: regardless of how many clients show up, only &lt;code&gt;N&lt;/code&gt; get in at a time.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Repository&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ProductRepository&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

    &lt;span class="c1"&gt;// Doorman: maximum 80 simultaneous connections to the database&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;Semaphore&lt;/span&gt; &lt;span class="n"&gt;dbGatekeeper&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Semaphore&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Product&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;findAllByCategory&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;category&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;dbGatekeeper&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;acquire&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// Wait for the doorman's permission&lt;/span&gt;
            &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;jdbcTemplate&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;query&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
                    &lt;span class="s"&gt;"SELECT * FROM products WHERE category = ?"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;productRowMapper&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;category&lt;/span&gt;
                &lt;span class="o"&gt;);&lt;/span&gt;
            &lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;finally&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;dbGatekeeper&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;release&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// Release the slot on exit&lt;/span&gt;
            &lt;span class="o"&gt;}&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;InterruptedException&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;currentThread&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;interrupt&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
            &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;DatabaseAccessException&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Interrupted while waiting for DB slot"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The beauty here: &lt;code&gt;Semaphore.acquire()&lt;/code&gt; is a &lt;em&gt;virtual-thread-friendly&lt;/em&gt; blocking point. The Virtual Thread waiting for the doorman's slot is unmounted from the Carrier Thread, which is free to execute other Virtual Threads. Zero CPU waste.&lt;/p&gt;
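&lt;p&gt;One variant worth knowing: instead of waiting at the door indefinitely, fail fast when the database is saturated. A hedged sketch (&lt;code&gt;BoundedGatekeeper&lt;/code&gt; and the timeout value are illustrative assumptions, not code from the repository):&lt;br&gt;
&lt;/p&gt;

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.IntSupplier;

// Fail-fast doorman: wait at most maxWaitMillis for a slot, then reject.
// BoundedGatekeeper is an illustrative name, not code from the repository.
public class BoundedGatekeeper {

    private final Semaphore slots;
    private final long maxWaitMillis;

    public BoundedGatekeeper(int permits, long maxWaitMillis) {
        this.slots = new Semaphore(permits);
        this.maxWaitMillis = maxWaitMillis;
    }

    // Returns the result of dbCall, or throws if no slot frees up in time.
    public int execute(IntSupplier dbCall) throws InterruptedException {
        if (!slots.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS)) {
            throw new IllegalStateException(
                "Database saturated: no slot within " + maxWaitMillis + "ms");
        }
        try {
            return dbCall.getAsInt();
        } finally {
            slots.release();
        }
    }
}
```

&lt;p&gt;Rejecting early keeps the queue short: a request that cannot get a slot within the timeout fails immediately instead of joining thousands of callers parked behind the doorman.&lt;/p&gt;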

&lt;h3&gt;
  
  
  Resilience4j: Mission Control
&lt;/h3&gt;

&lt;p&gt;For real production, a raw &lt;code&gt;Semaphore&lt;/code&gt; is just the baseline. &lt;strong&gt;Resilience4j&lt;/strong&gt; offers a complete set of resilience primitives, all compatible with Virtual Threads.&lt;/p&gt;

&lt;p&gt;Resilience4j's semaphore-based &lt;code&gt;Bulkhead&lt;/code&gt; is essentially a &lt;code&gt;Semaphore&lt;/code&gt; on steroids: metrics, fallbacks, timeouts, and native integration with Micrometer and Prometheus.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Bulkhead configuration&lt;/span&gt;
&lt;span class="nd"&gt;@Bean&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;BulkheadRegistry&lt;/span&gt; &lt;span class="nf"&gt;bulkheadRegistry&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;BulkheadConfig&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BulkheadConfig&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;custom&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;maxConcurrentCalls&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;                 &lt;span class="c1"&gt;// Maximum simultaneous calls&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;maxWaitDuration&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Duration&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ofSeconds&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt; &lt;span class="c1"&gt;// Queue wait timeout&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;BulkheadRegistry&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Usage in the service&lt;/span&gt;
&lt;span class="nd"&gt;@Service&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ProductService&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;Bulkhead&lt;/span&gt; &lt;span class="n"&gt;dbBulkhead&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;ProductRepository&lt;/span&gt; &lt;span class="n"&gt;repository&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;ProductService&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;BulkheadRegistry&lt;/span&gt; &lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;ProductRepository&lt;/span&gt; &lt;span class="n"&gt;repository&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;dbBulkhead&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;bulkhead&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"database-bulkhead"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;repository&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;repository&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Product&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;getProductsByCategory&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;category&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Bulkhead&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;decorateSupplier&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;dbBulkhead&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
            &lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;repository&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;findAllByCategory&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;category&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Combine this with a &lt;strong&gt;CircuitBreaker&lt;/strong&gt; so that if the database starts rejecting connections, the circuit opens automatically — giving the database time to recover before the situation escalates.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Bean&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;CircuitBreakerConfig&lt;/span&gt; &lt;span class="nf"&gt;circuitBreakerConfig&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;CircuitBreakerConfig&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;custom&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;failureRateThreshold&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;                        &lt;span class="c1"&gt;// Opens if 50% of calls fail&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;waitDurationInOpenState&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Duration&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ofSeconds&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt; &lt;span class="c1"&gt;// Waits 30s before retrying&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;slidingWindowSize&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;                           &lt;span class="c1"&gt;// Evaluates the last 20 calls&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Want to See the Numbers in Practice?
&lt;/h2&gt;

&lt;p&gt;There's a complete, self-contained demo available in the repository — Java 21, zero dependencies — showing both scenarios running and printing the results. The output is brutal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SCENARIO 1 — WITHOUT control:
✅ Success:   80 requests
❌ Rejected:  420 requests  ← 84% of requests lost

SCENARIO 2 — WITH Semaphore:
✅ Success:   500 requests
❌ Rejected:  0 requests
📈 Peak:      80 connections (never exceeded the limit)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;🔗 &lt;strong&gt;&lt;a href="https://github.com/DheCastro/java-virtual-threads-pitfalls" rel="noopener noreferrer"&gt;github.com/DheCastro/java-virtual-threads-pitfalls&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Coming in the Next Article
&lt;/h2&gt;

&lt;p&gt;Now that the mental model is correct, let's go deeper into where most Java applications actually live: &lt;strong&gt;containers in production&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In the next article of this series, we'll cover:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Stack cost in Docker:&lt;/strong&gt; why the &lt;code&gt;-Xmx&lt;/code&gt; that used to be enough may no longer be — and how to calculate the right margin to avoid OOM Kill&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CPU Throttling in Kubernetes:&lt;/strong&gt; how CPU limits affect Carrier Threads and cause high latency with apparently low CPU on dashboards&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observability with JFR:&lt;/strong&gt; the exact events to monitor Thread Pinning and saturation in production&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complete checklist&lt;/strong&gt; for the modern developer planning a safe migration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Continue reading:&lt;/strong&gt; &lt;a href="https://dev.to/dhellano_castro_c5aba0c56/virtual-threads-in-real-production-docker-kubernetes-and-what-the-dashboards-dont-tell-you-4pg"&gt;Part 2 — Virtual Threads in Real Production: Docker, Kubernetes, and What the Dashboards Don't Tell You&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;If this article was helpful, drop a reaction — it really helps to know if the series is worth continuing.&lt;/em&gt; 🙌&lt;/p&gt;




&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;JEP 444 — Virtual Threads (Java 21)&lt;/strong&gt;&lt;br&gt;
Official Project Loom specification. Documents the mount/unmount model, &lt;code&gt;synchronized&lt;/code&gt; behavior, and the role of Carrier Threads.&lt;br&gt;
&lt;a href="https://openjdk.org/jeps/444" rel="noopener noreferrer"&gt;https://openjdk.org/jeps/444&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;JEP 491 — Synchronize Virtual Threads without Pinning (Java 24)&lt;/strong&gt;&lt;br&gt;
The direct evolution of the Thread Pinning pitfall discussed in this article. Starting from Java 24, &lt;code&gt;synchronized&lt;/code&gt; with I/O no longer causes pinning in most cases.&lt;br&gt;
&lt;a href="https://openjdk.org/jeps/491" rel="noopener noreferrer"&gt;https://openjdk.org/jeps/491&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Spring Boot 3.2 Release Notes — Virtual Threads&lt;/strong&gt;&lt;br&gt;
Official documentation for the &lt;code&gt;spring.threads.virtual.enabled&lt;/code&gt; property and what it configures automatically (Tomcat, Jetty, &lt;code&gt;@Async&lt;/code&gt;, executors).&lt;br&gt;
&lt;a href="https://github.com/spring-projects/spring-boot/wiki/Spring-Boot-3.2-Release-Notes" rel="noopener noreferrer"&gt;https://github.com/spring-projects/spring-boot/wiki/Spring-Boot-3.2-Release-Notes&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Resilience4j — Official Bulkhead Documentation&lt;/strong&gt;&lt;br&gt;
Reference for &lt;code&gt;SemaphoreBulkhead&lt;/code&gt; and &lt;code&gt;BulkheadConfig&lt;/code&gt; used in the mitigation section.&lt;br&gt;
&lt;a href="https://resilience4j.readme.io/docs/bulkhead" rel="noopener noreferrer"&gt;https://resilience4j.readme.io/docs/bulkhead&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Source Code
&lt;/h2&gt;

&lt;p&gt;All examples from this article — and more — are available in the repository below.&lt;br&gt;
Each class is self-contained and runs with a single command (&lt;code&gt;java ClassName.java&lt;/code&gt;).&lt;br&gt;
No external dependencies, just Java 21.&lt;/p&gt;

&lt;p&gt;🔗 &lt;strong&gt;&lt;a href="https://github.com/DheCastro/java-virtual-threads-pitfalls" rel="noopener noreferrer"&gt;github.com/DheCastro/java-virtual-threads-pitfalls&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>backend</category>
      <category>java</category>
      <category>performance</category>
      <category>programming</category>
    </item>
    <item>
      <title>Virtual Threads in Real Production: Docker, Kubernetes, and What the Dashboards Don't Tell You</title>
      <dc:creator>Dhellano Castro</dc:creator>
      <pubDate>Sun, 22 Feb 2026 15:49:48 +0000</pubDate>
      <link>https://dev.to/dhellano_castro_c5aba0c56/virtual-threads-em-producao-de-verdade-docker-kubernetes-e-o-que-os-dashboards-nao-te-contam-3na6</link>
      <guid>https://dev.to/dhellano_castro_c5aba0c56/virtual-threads-em-producao-de-verdade-docker-kubernetes-e-o-que-os-dashboards-nao-te-contam-3na6</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Series:&lt;/strong&gt; &lt;em&gt;Java in Real Production&lt;/em&gt; — This is the second article of the series. If you haven't read the first one yet, it covers the fundamentals of Virtual Threads, Thread Pinning, and the Stampede Effect — concepts we'll build on here. Read Part 1 here — &lt;a href="https://dev.to/dhellano_castro_c5aba0c56/virtual-threads-no-java-21-o-fim-da-era-da-escassez-e-as-armadilhas-que-podem-lhe-derrubar-1m4k"&gt;Virtual Threads in Java 21: The End of the Scarcity Era (and the Pitfalls That Can Take You Down)&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;You've read about Virtual Threads. You understand the mental model. You fixed Thread Pinning and put a Semaphore in front of the database. The application is working in development.&lt;/p&gt;

&lt;p&gt;Then you deploy.&lt;/p&gt;

&lt;p&gt;And the strangeness begins: latency oscillating for no apparent reason, the container getting killed by the kernel at peak hours, dashboards showing low CPU while requests pile up in the queue. Everything looks fine — until it doesn't.&lt;/p&gt;

&lt;p&gt;This article is about what happens &lt;strong&gt;after&lt;/strong&gt; the deploy. The production environment — Docker, Kubernetes, and observability — has its own pitfalls for applications running Virtual Threads, and most of them are invisible until it's too late.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Stack Cost and the OOM Kill Risk in Docker
&lt;/h2&gt;

&lt;p&gt;Let's start with memory, because here lives a risk that can literally kill your container — no stack trace, no warning, no graceful shutdown.&lt;/p&gt;

&lt;p&gt;The fundamental difference between the two models:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Platform Thread:&lt;/strong&gt; ~1MB of stack allocated in the JVM's &lt;strong&gt;native&lt;/strong&gt; space, outside the Heap&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Virtual Thread:&lt;/strong&gt; stack stored as &lt;strong&gt;Java objects on the Heap&lt;/strong&gt;, subject to the GC&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This move from "native stack" to "objects on the Heap" has a direct consequence: the &lt;code&gt;-Xmx&lt;/code&gt; that used to be enough may no longer be.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Equation Changed
&lt;/h3&gt;

&lt;p&gt;With Platform Threads, memory was predictable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Memória Total ≈ Heap (-Xmx) + MetaSpace + (N_threads × ~1MB nativo)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With Virtual Threads, thread stacks moved into the Heap:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Memória Total ≈ Heap (inclui stacks das VTs) + MetaSpace + Carrier Thread stacks
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you set &lt;code&gt;--memory=512m&lt;/code&gt; in Docker (or &lt;code&gt;resources.limits.memory&lt;/code&gt; in Kubernetes), the Linux cgroup enforces that limit on the process's &lt;strong&gt;entire memory&lt;/strong&gt;. If the JVM exceeds it, the kernel sends a &lt;strong&gt;SIGKILL&lt;/strong&gt;. That's the OOM Kill — and it gives no warning.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;🐳 &lt;strong&gt;Golden rule for Docker:&lt;/strong&gt; Monitor Heap usage with Virtual Threads enabled. The &lt;code&gt;-Xmx&lt;/code&gt; that used to be enough may need a 20–30% increase to accommodate the Virtual Thread stacks on the Heap. Set the container limit with a safety margin of at least 15% above &lt;code&gt;-Xmx&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# docker-compose.yml — configuração segura para Virtual Threads&lt;/span&gt;
&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;minha-app:latest&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;JAVA_OPTS&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;-&lt;/span&gt;
        &lt;span class="s"&gt;-Xms128m&lt;/span&gt;
        &lt;span class="s"&gt;-Xmx384m&lt;/span&gt;
        &lt;span class="s"&gt;-XX:+UseZGC&lt;/span&gt;
        &lt;span class="s"&gt;-Djdk.virtualThreadScheduler.parallelism=4&lt;/span&gt;
    &lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;512m&lt;/span&gt;  &lt;span class="c1"&gt;# ~33% de margem acima do Xmx — nunca coloque Xmx = limite&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
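&lt;p&gt;As a sanity check, the golden rule above can be turned into a small calculation. A rough sketch; the 30% heap increment and 15% container margin are this section's rules of thumb, not guarantees (&lt;code&gt;MemorySizing&lt;/code&gt; is an illustrative name):&lt;br&gt;
&lt;/p&gt;

```java
// Back-of-the-envelope sizing when migrating a container to Virtual Threads.
// The 1.30 and 1.15 factors come from the rule of thumb above; tune with real metrics.
public final class MemorySizing {

    // Heap may need roughly 20-30% more to absorb Virtual Thread stacks.
    static long newXmxMb(long oldXmxMb) {
        return Math.round(oldXmxMb * 1.30);
    }

    // Container limit: at least 15% above -Xmx, leaving room for
    // MetaSpace, Carrier Thread stacks, and other native memory.
    static long containerLimitMb(long xmxMb) {
        return Math.round(xmxMb * 1.15);
    }

    public static void main(String[] args) {
        long xmx = 384; // the -Xmx from the compose file above
        System.out.println("-Xmx" + xmx + "m needs a container limit of at least "
                + containerLimitMb(xmx) + "m");
    }
}
```

&lt;p&gt;For the compose file above, the minimum would be ~442m; the configured 512m limit leaves an extra cushion on top of that.&lt;/p&gt;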



&lt;p&gt;Note the &lt;code&gt;-Djdk.virtualThreadScheduler.parallelism=4&lt;/code&gt;. This parameter controls how many &lt;strong&gt;Carrier Threads&lt;/strong&gt; exist. In a container with 4 CPUs, keeping the default makes sense — but setting it explicitly guarantees the behavior won't change if the container's CPU count changes.&lt;/p&gt;
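&lt;p&gt;A small startup check makes the scheduler's inputs visible. A hedged sketch (&lt;code&gt;SchedulerCheck&lt;/code&gt; is an illustrative name; when the property is unset, the scheduler defaults to the number of available processors):&lt;br&gt;
&lt;/p&gt;

```java
// Prints the effective inputs to Virtual Thread scheduler sizing at startup.
public class SchedulerCheck {

    public static void main(String[] args) {
        // Inside a container this reflects the cgroup CPU limits the JVM detected.
        int cpus = Runtime.getRuntime().availableProcessors();
        // Null when -Djdk.virtualThreadScheduler.parallelism was not set;
        // the scheduler then defaults to the number of available processors.
        String configured = System.getProperty("jdk.virtualThreadScheduler.parallelism");
        System.out.println("availableProcessors = " + cpus);
        System.out.println("carrier parallelism = "
                + (configured != null ? configured : "(default: " + cpus + ")"));
    }
}
```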

&lt;h3&gt;
  
  
  Why ZGC?
&lt;/h3&gt;

&lt;p&gt;With a high volume of Virtual Threads, the Heap becomes a high-churn environment: stack objects constantly created and destroyed. Garbage collectors with long pauses — like G1 under heavy load — introduce noticeable latency precisely at the moments of highest pressure. &lt;strong&gt;ZGC&lt;/strong&gt; (and Shenandoah) were designed for sub-millisecond pauses regardless of Heap size. For applications running Virtual Threads in production, they are the safer choice.&lt;/p&gt;




&lt;h2&gt;
  
  
  CPU Throttling in Kubernetes — The Silent Enemy of Carrier Threads
&lt;/h2&gt;

&lt;p&gt;Kubernetes adds one more layer of complexity. And this one is especially treacherous because it acts completely silently.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Mechanism
&lt;/h3&gt;

&lt;p&gt;When you set &lt;code&gt;resources.limits.cpu: "2"&lt;/code&gt; on your Pod, Kubernetes uses cgroup &lt;strong&gt;CPU quotas&lt;/strong&gt; to guarantee your container never uses more than 2 cores. If the process tries to use more, the kernel &lt;strong&gt;throttles&lt;/strong&gt; it — literally stalls the process, blocking it from running for a period proportional to the excess.&lt;/p&gt;

&lt;p&gt;Remember the Carrier Threads from the previous article? They are OS threads that execute the Virtual Threads. If Kubernetes is throttling your container, the Carrier Threads can't get scheduled. The result: even with 1,000,000 Virtual Threads ready to run, they sit idle waiting for the Carrier Threads to get CPU time back.&lt;/p&gt;
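&lt;p&gt;You can observe the throttling from inside the container by reading the cgroup's CPU statistics directly. A hedged sketch, assuming cgroup v2 (on cgroup v1 the counters live under &lt;code&gt;cpu/cpu.stat&lt;/code&gt; with slightly different names):&lt;br&gt;
&lt;/p&gt;

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Reads cgroup v2 CPU statistics from inside the container.
// nr_throttled / nr_periods is the fraction of CFS periods in which the
// kernel stopped scheduling this container's threads, Carrier Threads included.
public class ThrottleCheck {

    static double throttledRatio(String cpuStat) {
        long periods = 0;
        long throttled = 0;
        for (String line : cpuStat.split("\n")) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length == 2) {
                if (parts[0].equals("nr_periods")) {
                    periods = Long.parseLong(parts[1]);
                } else if (parts[0].equals("nr_throttled")) {
                    throttled = Long.parseLong(parts[1]);
                }
            }
        }
        return periods == 0 ? 0.0 : (double) throttled / periods;
    }

    public static void main(String[] args) throws IOException {
        String stat = Files.readString(Path.of("/sys/fs/cgroup/cpu.stat"));
        System.out.printf("Throttled in %.1f%% of CFS periods%n",
                throttledRatio(stat) * 100);
    }
}
```

&lt;p&gt;If that ratio grows while your dashboards show comfortable CPU usage, you are looking at exactly the misleading symptom described below.&lt;/p&gt;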

&lt;h3&gt;
  
  
  The Misleading Symptom
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;High latency with apparently low CPU on the dashboards.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The process isn't using CPU because it's being throttled — but the graphs show 40% usage (since throttle periods are cycles in which the process simply doesn't run, lowering the measured average). The metric that matters isn't &lt;code&gt;cpu_usage&lt;/code&gt;, it's &lt;code&gt;container_cpu_cfs_throttled_seconds_total&lt;/code&gt; — available from cAdvisor in any Kubernetes cluster.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# kubernetes deployment — configuração consciente para Virtual Threads&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apps/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deployment&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app&lt;/span&gt;
          &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1"&lt;/span&gt;
              &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;256Mi"&lt;/span&gt;
            &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2"&lt;/span&gt;       &lt;span class="c1"&gt;# Define o teto efetivo de Carrier Threads ativas&lt;/span&gt;
              &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;512Mi"&lt;/span&gt;
          &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;JAVA_OPTS&lt;/span&gt;
              &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;-&lt;/span&gt;
                &lt;span class="s"&gt;-Xmx384m&lt;/span&gt;
                &lt;span class="s"&gt;-XX:+UseZGC&lt;/span&gt;
                &lt;span class="s"&gt;-Djdk.virtualThreadScheduler.parallelism=2&lt;/span&gt;
                &lt;span class="s"&gt;-XX:StartFlightRecording=filename=/tmp/jfr/recording.jfr,&lt;/span&gt;
                  &lt;span class="s"&gt;duration=60s,settings=profile&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;Critical alignment:&lt;/strong&gt; The value of &lt;code&gt;virtualThreadScheduler.parallelism&lt;/code&gt; must be consistent with &lt;code&gt;limits.cpu&lt;/code&gt;. If you set a limit of 2 CPUs but 8 Carrier Threads, the extra Carrier Threads will compete for CPU, increase throttling and make things worse. Keep the two values aligned.&lt;/p&gt;
&lt;/blockquote&gt;
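&lt;p&gt;That alignment can also be checked at startup. A minimal guard sketch, using only standard system properties (the class name is illustrative):&lt;/p&gt;

```java
// Sketch: warn at startup when the Virtual Thread scheduler parallelism
// diverges from the CPUs the container actually exposes.
public class ParallelismGuard {
    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        // Set via -Djdk.virtualThreadScheduler.parallelism=N; null means "default = CPUs"
        String prop = System.getProperty("jdk.virtualThreadScheduler.parallelism");
        int parallelism = (prop == null) ? cpus : Integer.parseInt(prop);

        if (parallelism != cpus) {
            System.err.println("WARN: scheduler parallelism (" + parallelism
                + ") differs from visible CPUs (" + cpus + "): extra carriers will be throttled");
        } else {
            System.out.println("OK: " + parallelism + " carrier threads for " + cpus + " CPUs");
        }
    }
}
```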




&lt;h2&gt;
  
  
  Observability with JDK Flight Recorder (JFR)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;JFR&lt;/strong&gt; is the most powerful observability tool for diagnosing Virtual Thread problems in production. It has had native support for Virtual Thread-specific events since Java 21 — and its overhead is so low that it can run continuously in production with no perceptible impact.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Events That Matter
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;JFR Event&lt;/th&gt;
&lt;th&gt;What it reveals&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;jdk.VirtualThreadPinned&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Active Thread Pinning — &lt;code&gt;synchronized&lt;/code&gt; + I/O on the critical path&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;jdk.VirtualThreadSubmitFailed&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Failures submitting Virtual Threads — a sign of scheduler saturation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;jdk.VirtualThreadStart&lt;/code&gt; / &lt;code&gt;End&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Total volume of VTs created — detects creation explosions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;jdk.ThreadSleep&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Threads sleeping unnecessarily long&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
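&lt;p&gt;These events can also be enabled programmatically with the standard &lt;code&gt;jdk.jfr.Recording&lt;/code&gt; API, without any command-line flags. A minimal sketch (the threshold and output path are illustrative):&lt;/p&gt;

```java
import jdk.jfr.Recording;
import java.nio.file.Path;
import java.time.Duration;

// Sketch: start an in-process JFR recording with only the Virtual Thread
// events from the table above enabled.
public class VtRecording {
    public static void main(String[] args) throws Exception {
        try (Recording recording = new Recording()) {
            // Capture pinning events longer than 20ms, with stack traces
            recording.enable("jdk.VirtualThreadPinned")
                     .withThreshold(Duration.ofMillis(20))
                     .withStackTrace();
            recording.enable("jdk.VirtualThreadSubmitFailed");
            recording.start();

            // ... the workload under diagnosis runs here ...
            Thread.ofVirtual().start(() -> { }).join();

            recording.dump(Path.of("/tmp/vt-events.jfr"));
        }
        System.out.println("recording dumped to /tmp/vt-events.jfr");
    }
}
```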

&lt;h3&gt;
  
  
  Runtime Diagnosis (No Restart)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Inicia uma gravação de 2 minutos sem reiniciar a aplicação&lt;/span&gt;
jcmd &amp;lt;PID&amp;gt; JFR.start &lt;span class="nv"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;vt-diagnosis &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nv"&gt;settings&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;profile &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nv"&gt;duration&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;120s &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nv"&gt;filename&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/tmp/vt-diagnosis.jfr

&lt;span class="c"&gt;# Analisa eventos de pinning diretamente no terminal&lt;/span&gt;
jfr print &lt;span class="nt"&gt;--events&lt;/span&gt; jdk.VirtualThreadPinned /tmp/vt-diagnosis.jfr
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For a full visual analysis, &lt;strong&gt;JDK Mission Control (JMC)&lt;/strong&gt; is the official GUI — open the &lt;code&gt;.jfr&lt;/code&gt; file and you get a complete timeline of every event, with drill-down by thread, by method and by time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prometheus Integration via Micrometer
&lt;/h3&gt;

&lt;p&gt;If you're on Spring Boot 3.2+, Virtual Thread metrics are already available via Micrometer. Configure alerts for:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Alerta: Thread Pinning detectado em produção&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;alert&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;VirtualThreadPinningDetected&lt;/span&gt;
  &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;jvm_threads_virtual_pinned_count &amp;gt; &lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;
  &lt;span class="na"&gt;for&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1m&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Thread&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Pinning&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;ativo&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;—&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;investigar&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;synchronized&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;+&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;I/O"&lt;/span&gt;

&lt;span class="c1"&gt;# Alerta: CPU Throttling acima do aceitável&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;alert&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ContainerCPUThrottling&lt;/span&gt;
  &lt;span class="na"&gt;expr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rate(container_cpu_cfs_throttled_seconds_total[5m]) &amp;gt; &lt;/span&gt;&lt;span class="m"&gt;0.25&lt;/span&gt;
  &lt;span class="na"&gt;for&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5m&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Container&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;sendo&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;throttled&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;—&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Carrier&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Threads&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;impactadas"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;🔍 &lt;strong&gt;Golden tip:&lt;/strong&gt; If &lt;code&gt;VirtualThreadPinned&lt;/code&gt; fires, you have Thread Pinning in production. If &lt;code&gt;CPUThrottling&lt;/code&gt; fires together with high latency, you have Carrier Threads being strangled by the cgroup. They are different problems with different causes — separate alerts keep you from investigating in the wrong place.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Modern Developer's Checklist
&lt;/h2&gt;

&lt;p&gt;Consolidating the whole series into an operational checklist:&lt;/p&gt;

&lt;h3&gt;
  
  
  Before Turning On Virtual Threads
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;[ ] &lt;strong&gt;Java 21+&lt;/strong&gt; in the environment — non-negotiable&lt;/li&gt;
&lt;li&gt;[ ] Check JDBC driver versions — PostgreSQL ≥ 42.6, MySQL Connector/J ≥ 9.0&lt;/li&gt;
&lt;li&gt;[ ] Audit &lt;code&gt;synchronized&lt;/code&gt; on critical I/O paths — migrate to &lt;code&gt;ReentrantLock&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;[ ] Define concurrency limits for scarce resources via &lt;code&gt;Semaphore&lt;/code&gt; or Resilience4j &lt;code&gt;Bulkhead&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Docker Configuration
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Add a &lt;strong&gt;20–30% margin&lt;/strong&gt; to the container memory limit above &lt;code&gt;-Xmx&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;[ ] Set &lt;code&gt;-Djdk.virtualThreadScheduler.parallelism&lt;/code&gt; explicitly based on the allocated CPUs&lt;/li&gt;
&lt;li&gt;[ ] Use &lt;strong&gt;ZGC&lt;/strong&gt; or &lt;strong&gt;Shenandoah&lt;/strong&gt; as the GC — shorter pauses, better for high object churn on the heap&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Kubernetes Configuration
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Monitor &lt;code&gt;container_cpu_cfs_throttled_seconds_total&lt;/code&gt; in cAdvisor — throttling is the silent enemy of Carrier Threads&lt;/li&gt;
&lt;li&gt;[ ] Align &lt;code&gt;virtualThreadScheduler.parallelism&lt;/code&gt; with &lt;code&gt;resources.limits.cpu&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;[ ] Enable JFR with a Virtual Thread profile in staging before going to production&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Production Observability
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Alert on &lt;code&gt;jdk.VirtualThreadPinned&lt;/code&gt; — any value above zero deserves investigation&lt;/li&gt;
&lt;li&gt;[ ] Alert on &lt;code&gt;container_cpu_cfs_throttled_seconds_total&lt;/code&gt; above 25%&lt;/li&gt;
&lt;li&gt;[ ] Dashboard with &lt;code&gt;jvm_threads_states_threads_total{state="runnable"}&lt;/code&gt; for the volume of active VTs&lt;/li&gt;
&lt;li&gt;[ ] Health checks that treat &lt;code&gt;Bulkhead&lt;/code&gt; saturation as a degraded state&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The era of thread scarcity is over. The restaurant can have 1 million waiters.&lt;/p&gt;

&lt;p&gt;But the database still has 100 tables. Kubernetes still has limited CPU. The container still has memory defined by the cgroup. And the kernel still sends SIGKILL without asking permission.&lt;/p&gt;

&lt;p&gt;Virtual Threads solve the thread-scarcity problem — and only that one. The other problems are still there, and some become even more visible because the accidental brake that Platform Threads provided is gone.&lt;/p&gt;

&lt;p&gt;The correct mental model is not "Virtual Threads = free performance". It is: &lt;strong&gt;Virtual Threads = I stop worrying about threads and start worrying about the real resources my application consumes.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;With that model in mind, the tool is genuinely transformative.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Any questions, or want to dig deeper into any of these points? Drop a comment below — I answer every one.&lt;/em&gt; 🙌&lt;/p&gt;




&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;JEP 444 — Virtual Threads (Java 21)&lt;/strong&gt;&lt;br&gt;
Conceptual basis for the behavior of Carrier Threads and the CPU throttling impact discussed in this article.&lt;br&gt;
&lt;a href="https://openjdk.org/jeps/444" rel="noopener noreferrer"&gt;https://openjdk.org/jeps/444&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;OpenJDK — JDK Flight Recorder (JFR) Event Reference&lt;/strong&gt;&lt;br&gt;
Documentation of the &lt;code&gt;jdk.VirtualThreadPinned&lt;/code&gt;, &lt;code&gt;jdk.VirtualThreadStart&lt;/code&gt; and other Virtual Thread events available via JFR.&lt;br&gt;
&lt;a href="https://docs.oracle.com/en/java/javase/21/docs/api/jdk.jfr/jdk/jfr/package-summary.html" rel="noopener noreferrer"&gt;https://docs.oracle.com/en/java/javase/21/docs/api/jdk.jfr/jdk/jfr/package-summary.html&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Spring Boot 3.2 Release Notes — Virtual Threads&lt;/strong&gt;&lt;br&gt;
Reference for configuring Virtual Threads with Spring Boot, including the Micrometer integration behind the metrics used in the alert configurations.&lt;br&gt;
&lt;a href="https://github.com/spring-projects/spring-boot/wiki/Spring-Boot-3.2-Release-Notes" rel="noopener noreferrer"&gt;https://github.com/spring-projects/spring-boot/wiki/Spring-Boot-3.2-Release-Notes&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Resilience4j — Official CircuitBreaker Documentation&lt;/strong&gt;&lt;br&gt;
Reference for the &lt;code&gt;failureRateThreshold&lt;/code&gt;, &lt;code&gt;slidingWindowSize&lt;/code&gt; and &lt;code&gt;waitDurationInOpenState&lt;/code&gt; settings used in the resilience examples.&lt;br&gt;
&lt;a href="https://resilience4j.readme.io/docs/circuitbreaker" rel="noopener noreferrer"&gt;https://resilience4j.readme.io/docs/circuitbreaker&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Source Code
&lt;/h2&gt;

&lt;p&gt;If you haven't seen the series repository yet, it contains runnable demos of the Part 1 concepts — the Stampede Effect, Thread Pinning and the Platform vs Virtual Threads benchmark — each with logs that make the behavior visible in real time.&lt;/p&gt;

&lt;p&gt;🔗 &lt;strong&gt;&lt;a href="https://github.com/DheCastro/java-virtual-threads-pitfalls" rel="noopener noreferrer"&gt;github.com/DheCastro/java-virtual-threads-pitfalls&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>docker</category>
      <category>java</category>
      <category>kubernetes</category>
      <category>performance</category>
    </item>
    <item>
      <title>Virtual Threads in Java 21: The End of the Scarcity Era (and the Pitfalls That Can Take You Down)</title>
      <dc:creator>Dhellano Castro</dc:creator>
      <pubDate>Sun, 22 Feb 2026 15:46:51 +0000</pubDate>
      <link>https://dev.to/dhellano_castro_c5aba0c56/virtual-threads-no-java-21-o-fim-da-era-da-escassez-e-as-armadilhas-que-podem-lhe-derrubar-1m4k</link>
      <guid>https://dev.to/dhellano_castro_c5aba0c56/virtual-threads-no-java-21-o-fim-da-era-da-escassez-e-as-armadilhas-que-podem-lhe-derrubar-1m4k</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Series:&lt;/strong&gt; &lt;em&gt;Java in Real Production&lt;/em&gt; — This is the first of two articles. Here we cover the fundamentals, the correct mental model and the two pitfalls that silently take applications down. In the second, we go down to Docker, Kubernetes and observability with JFR.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;Picture an upscale restaurant. Each table — an HTTP request — needs a dedicated waiter. The waiter takes the order, walks to the kitchen... and &lt;strong&gt;stands there, waiting for the chef to finish the dish&lt;/strong&gt;. Meanwhile, new tables arrive. But no waiter is available. The maître starts turning customers away at the door.&lt;/p&gt;

&lt;p&gt;The restaurant is full of waiters standing idle in the kitchen — and the dining room has no service.&lt;/p&gt;

&lt;p&gt;That is the classic &lt;strong&gt;Platform Threads&lt;/strong&gt; model in Java. Each thread consumes about &lt;strong&gt;1MB of stack&lt;/strong&gt; in the operating system. On a server with 4GB dedicated to threads, you get at most ~4,000 waiters. Sounds like a lot? For a modern application with heavy I/O — database calls, external HTTP, messaging — it isn't.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Project Loom&lt;/strong&gt;, introduced as a preview in Java 19 and stable as of &lt;strong&gt;Java 21&lt;/strong&gt;, changed the rules of the game. The central idea is elegant: &lt;em&gt;what if the waiter could leave the table at the kitchen, go back to the dining room to serve other tables, and return when the dish was ready?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That's what &lt;strong&gt;Virtual Threads&lt;/strong&gt; are. Millions of them. With a memory cost in the &lt;strong&gt;kilobytes&lt;/strong&gt;. The restaurant can now have 1,000 real waiters serving 1,000,000 simultaneous tables.&lt;/p&gt;

&lt;p&gt;But — and there is always a "but" — a restaurant with 1 million waiters and a single kitchen with 4 stoves will still clog up. This is where the story gets interesting.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Engine Under the Hood
&lt;/h2&gt;

&lt;p&gt;Before you go off creating Virtual Threads everywhere, it's worth understanding what is happening under the covers. The JVM manages three distinct concepts that live together in this ecosystem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform Threads&lt;/strong&gt; are the old, honest model: a Java thread mapped 1:1 to an operating-system thread. The OS schedules it, the OS blocks it, the OS pays the memory bill. They are expensive, powerful and limited in number.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Virtual Threads&lt;/strong&gt; are threads managed &lt;em&gt;by the JVM itself&lt;/em&gt;, not by the OS. They are lightweight, cheap and can exist in absurd quantities. When a Virtual Thread needs to wait for I/O, it is &lt;strong&gt;unmounted&lt;/strong&gt; from the OS thread and its context is saved on the heap — as ordinary Java objects, subject to GC.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Carrier Threads&lt;/strong&gt; are the missing link most articles ignore. They are OS Platform Threads that the JVM's internal &lt;em&gt;ForkJoinPool&lt;/em&gt; uses to &lt;strong&gt;execute&lt;/strong&gt; the Virtual Threads. Think of them as subway rails: the cars (Virtual Threads) run on top of the rails (Carrier Threads). You can have 1,000 cars, but with only 4 rails, only 4 cars move at the same time.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────────────┐
│                      JVM                            │
│                                                     │
│   Virtual Thread 1  ──┐                             │
│   Virtual Thread 2  ──┤                             │
│   Virtual Thread 3  ──┼──► Carrier Thread 1 ──► OS  │
│   Virtual Thread 4  ──┤                             │
│   Virtual Thread ...──┘                             │
│                        ──► Carrier Thread 2 ──► OS  │
│                        ──► Carrier Thread N ──► OS  │
│                                                     │
│   (N = number of available CPUs, by default)        │
└─────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The default number of Carrier Threads equals the number of available CPUs. In production, inside a Docker container with &lt;code&gt;--cpus=2&lt;/code&gt;, you have &lt;strong&gt;2 rails&lt;/strong&gt; for potentially millions of cars. This will matter — a lot — in the second article of this series.&lt;/p&gt;
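&lt;p&gt;The rails-and-cars model is easy to see from code. A minimal sketch (class and thread names are illustrative):&lt;/p&gt;

```java
// Sketch: the JVM mounts many virtual threads onto a small pool of carriers.
public class WagonsDemo {
    public static void main(String[] args) throws InterruptedException {
        // Default number of carrier threads ("rails")
        System.out.println("rails: " + Runtime.getRuntime().availableProcessors());

        // One "car": a virtual thread created with the Java 21 builder API
        Thread car = Thread.ofVirtual().name("car-1").start(() ->
            System.out.println("virtual? " + Thread.currentThread().isVirtual()));
        car.join();
    }
}
```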




&lt;h2&gt;
  
  
  Pitfall 1 — Thread Pinning: The Screw in the Floor
&lt;/h2&gt;

&lt;p&gt;Remember the waiter who could leave the table at the kitchen and go serve others? Well. There is one situation in which he &lt;strong&gt;cannot leave&lt;/strong&gt;. Someone screwed his chair to the kitchen floor. That screw is called &lt;code&gt;synchronized&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;When a Virtual Thread enters a &lt;code&gt;synchronized&lt;/code&gt; block or method and hits a blocking point — I/O, for example — it &lt;strong&gt;cannot be unmounted from the Carrier Thread&lt;/strong&gt;. It pins. The Carrier Thread is stuck along with it, waiting. If every Carrier Thread gets pinned, your application freezes. Completely.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;Important:&lt;/strong&gt; &lt;code&gt;synchronized&lt;/code&gt; is not a villain by nature. It is perfectly safe for protecting fast in-memory operations, like manipulating a shared &lt;code&gt;HashMap&lt;/code&gt;. The problem arises when the &lt;code&gt;synchronized&lt;/code&gt; block contains a &lt;strong&gt;slow I/O&lt;/strong&gt; operation — a database query, an HTTP call, a file read.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;See the difference in practice:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// ❌ PROBLEMÁTICO: synchronized + I/O = Thread Pinning garantido&lt;/span&gt;
&lt;span class="c1"&gt;// A Carrier Thread fica presa enquanto o banco responde&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;synchronized&lt;/span&gt; &lt;span class="nc"&gt;User&lt;/span&gt; &lt;span class="nf"&gt;findById&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Long&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;jdbcTemplate&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;queryForObject&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
        &lt;span class="s"&gt;"SELECT * FROM users WHERE id = ?"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;userRowMapper&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;id&lt;/span&gt;
    &lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// ✅ CORRETO: ReentrantLock é "Virtual Thread aware"&lt;/span&gt;
&lt;span class="c1"&gt;// A Virtual Thread pode ser desmontada enquanto aguarda o banco&lt;/span&gt;
&lt;span class="c1"&gt;// A Carrier Thread fica livre para executar outras Virtual Threads&lt;/span&gt;
&lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;ReentrantLock&lt;/span&gt; &lt;span class="n"&gt;lock&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ReentrantLock&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;User&lt;/span&gt; &lt;span class="nf"&gt;findById&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Long&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="kd"&gt;throws&lt;/span&gt; &lt;span class="nc"&gt;InterruptedException&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;lock&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;lock&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;jdbcTemplate&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;queryForObject&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
            &lt;span class="s"&gt;"SELECT * FROM users WHERE id = ?"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;userRowMapper&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;id&lt;/span&gt;
        &lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;finally&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;lock&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;unlock&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why does &lt;code&gt;ReentrantLock&lt;/code&gt; fix it?&lt;/strong&gt; Because it doesn't use the OS's native object monitors. When the Virtual Thread needs to wait inside a &lt;code&gt;ReentrantLock&lt;/code&gt;, the JVM can unmount it from the Carrier Thread normally. The waiter finally gets out of the chair.&lt;/p&gt;

&lt;p&gt;To identify pinning in production, enable diagnostics with the JVM flag:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nt"&gt;-Djdk&lt;/span&gt;.tracePinnedThreads&lt;span class="o"&gt;=&lt;/span&gt;full
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
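&lt;p&gt;To verify the flag works, a deliberately pinned thread is enough. A minimal repro sketch (run it with the flag above; the class name is illustrative):&lt;/p&gt;

```java
// Sketch: blocking inside synchronized pins the virtual thread to its carrier.
// Run with -Djdk.tracePinnedThreads=full to see the pin's stack trace on stderr.
public class PinRepro {
    static final Object LOCK = new Object();

    public static void main(String[] args) throws InterruptedException {
        Thread vt = Thread.ofVirtual().start(() -> {
            synchronized (LOCK) {          // monitor held...
                try {
                    Thread.sleep(100);     // ...across a blocking call: pinning
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        vt.join();
        System.out.println("done: check stderr for the pinned-thread trace");
    }
}
```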



&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Note for framework users:&lt;/strong&gt; Old JDBC drivers and some &lt;code&gt;DataSource&lt;/code&gt; implementations still use &lt;code&gt;synchronized&lt;/code&gt; internally. Check your versions. The PostgreSQL driver removed the problematic &lt;code&gt;synchronized&lt;/code&gt; blocks as of version &lt;strong&gt;42.6&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Pitfall 2 — The Stampede Effect
&lt;/h2&gt;

&lt;p&gt;You fixed the pinning. Your application is running on Virtual Threads, smooth as butter. Requests coming in, threads responding. Then you look at the database and see this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ERROR: FATAL: remaining connection slots are reserved
       for replication superuser connections
Max connections: 100. Active: 100. Waiting: 4,847.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Welcome to the &lt;strong&gt;Stampede Effect&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The problem is subtle and cruel: with Platform Threads, the thread pool &lt;em&gt;was&lt;/em&gt; the natural limiter of database connections. If you had 200 threads in the pool, at most 200 simultaneous connections reached the database. It was accidental contention, but it worked as a handbrake.&lt;/p&gt;

&lt;p&gt;With Virtual Threads, that brake is gone. The JVM can create unlimited Virtual Threads. Each one, upon hitting an I/O point, sits "parked" waiting for the response — but it keeps &lt;em&gt;existing&lt;/em&gt; and &lt;em&gt;holding an open connection&lt;/em&gt; to the database. A flood of 50,000 simultaneous requests can turn into 50,000 connections trying to open against the database at the same time.&lt;/p&gt;

&lt;p&gt;The database collapses. It wasn't the Virtual Thread that was slow — it was the absence of governance over the shared resource.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;🎯 &lt;strong&gt;Project Loom's central paradigm shift:&lt;/strong&gt; With Virtual Threads, control moves from the &lt;em&gt;thread&lt;/em&gt; to the &lt;em&gt;resource&lt;/em&gt;. You no longer limit threads. You limit access to scarce resources.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Mitigation — The Smart Handbrake
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Semaphore: The Database Doorman
&lt;/h3&gt;

&lt;p&gt;The most direct solution is to use a &lt;code&gt;Semaphore&lt;/code&gt; as an access controller. Think of it as a doorman at the database's door: no matter how many clients arrive, only &lt;code&gt;N&lt;/code&gt; get in at the same time.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Repository&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ProductRepository&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

    &lt;span class="c1"&gt;// Porteiro: máximo 80 conexões simultâneas ao banco&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;Semaphore&lt;/span&gt; &lt;span class="n"&gt;dbGatekeeper&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Semaphore&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Product&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;findAllByCategory&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;category&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;dbGatekeeper&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;acquire&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// Aguarda permissão do porteiro&lt;/span&gt;
            &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;jdbcTemplate&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;query&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
                    &lt;span class="s"&gt;"SELECT * FROM products WHERE category = ?"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;productRowMapper&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;category&lt;/span&gt;
                &lt;span class="o"&gt;);&lt;/span&gt;
            &lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;finally&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;dbGatekeeper&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;release&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// Libera a vaga ao sair&lt;/span&gt;
            &lt;span class="o"&gt;}&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;InterruptedException&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;currentThread&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;interrupt&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
            &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;DatabaseAccessException&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Interrupted while waiting for DB slot"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The beauty here: &lt;code&gt;Semaphore.acquire()&lt;/code&gt; is a &lt;em&gt;virtual-thread-friendly&lt;/em&gt; blocking point. The Virtual Thread waiting for a gatekeeper slot is unmounted from its Carrier Thread, which is then free to run other Virtual Threads. Zero CPU waste.&lt;/p&gt;
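
&lt;p&gt;A minimal, self-contained sketch of that claim (the class name and the numbers are illustrative, not from the article's repo): 200 virtual threads compete for 10 slots, and the measured peak concurrency never exceeds the semaphore's limit.&lt;/p&gt;

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: 200 virtual threads compete for 10 "DB slots".
// A virtual thread parked in acquire() is unmounted from its carrier
// thread, so waiting consumes no OS thread and no CPU.
public class GatekeeperSketch {
    public static void main(String[] args) throws InterruptedException {
        Semaphore gatekeeper = new Semaphore(10);   // at most 10 concurrent "queries"
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();

        Thread[] workers = new Thread[200];
        for (int i = 0; i != 200; i++) {
            workers[i] = Thread.ofVirtual().start(() -&gt; {
                try {
                    gatekeeper.acquire();           // virtual-thread-friendly blocking point
                    try {
                        int now = active.incrementAndGet();
                        peak.accumulateAndGet(now, Math::max);
                        Thread.sleep(20);           // simulated I/O work
                    } finally {
                        active.decrementAndGet();
                        gatekeeper.release();       // free the slot on the way out
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        for (Thread w : workers) w.join();
        System.out.println("Peak concurrency: " + peak.get());
    }
}
```

&lt;p&gt;Running it prints a peak that stays at or below 10, even though 200 threads were started.&lt;/p&gt;
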

&lt;h3&gt;
  
  
  Resilience4j: Mission Control
&lt;/h3&gt;

&lt;p&gt;For real production, a plain &lt;code&gt;Semaphore&lt;/code&gt; is just the minimum. &lt;strong&gt;Resilience4j&lt;/strong&gt; offers a complete set of resilience primitives, all compatible with Virtual Threads.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;BulkheadConfig&lt;/code&gt; is essentially a Semaphore with superpowers: metrics, fallbacks, timeouts, and native integration with Micrometer and Prometheus.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Configuração do Bulkhead&lt;/span&gt;
&lt;span class="nd"&gt;@Bean&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;BulkheadRegistry&lt;/span&gt; &lt;span class="nf"&gt;bulkheadRegistry&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;BulkheadConfig&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BulkheadConfig&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;custom&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;maxConcurrentCalls&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;                 &lt;span class="c1"&gt;// Máximo de chamadas simultâneas&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;maxWaitDuration&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Duration&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ofSeconds&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt; &lt;span class="c1"&gt;// Timeout na fila de espera&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;BulkheadRegistry&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Uso no serviço&lt;/span&gt;
&lt;span class="nd"&gt;@Service&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ProductService&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;Bulkhead&lt;/span&gt; &lt;span class="n"&gt;dbBulkhead&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;ProductRepository&lt;/span&gt; &lt;span class="n"&gt;repository&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;ProductService&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;BulkheadRegistry&lt;/span&gt; &lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;ProductRepository&lt;/span&gt; &lt;span class="n"&gt;repository&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;dbBulkhead&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;bulkhead&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"database-bulkhead"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;repository&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;repository&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Product&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;getProductsByCategory&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;category&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Bulkhead&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;decorateSupplier&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;dbBulkhead&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
            &lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;repository&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;findAllByCategory&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;category&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Combine this with a &lt;strong&gt;CircuitBreaker&lt;/strong&gt; so that, if the database starts rejecting connections, the circuit opens automatically, giving the database time to recover before the situation escalates.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Bean&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;CircuitBreakerConfig&lt;/span&gt; &lt;span class="nf"&gt;circuitBreakerConfig&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;CircuitBreakerConfig&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;custom&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;failureRateThreshold&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;                       &lt;span class="c1"&gt;// Abre se 50% das calls falharem&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;waitDurationInOpenState&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Duration&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ofSeconds&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;&lt;span class="c1"&gt;// Espera 30s para tentar de novo&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;slidingWindowSize&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;                          &lt;span class="c1"&gt;// Avalia as últimas 20 chamadas&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Want to See the Numbers in Practice?
&lt;/h2&gt;

&lt;p&gt;There is a complete, self-contained demo available in the repository (Java 21, zero dependencies) that runs both scenarios and prints the results. The output is brutal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CENÁRIO 1 — SEM controle:
✅ Sucesso:    80 requisições
❌ Rejeitadas: 420 requisições  ← 84% das requisições perdidas

CENÁRIO 2 — COM Semaphore:
✅ Sucesso:    500 requisições
❌ Rejeitadas: 0 requisições
📈 Pico:       80 conexões (nunca ultrapassou o limite)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;🔗 &lt;strong&gt;[&lt;a href="https://github.com/DheCastro/java-virtual-threads-pitfalls" rel="noopener noreferrer"&gt;https://github.com/DheCastro/java-virtual-threads-pitfalls&lt;/a&gt;]&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Coming in the Next Article
&lt;/h2&gt;

&lt;p&gt;Now that the mental model is right, let's descend to where most Java applications actually live: &lt;strong&gt;containers in production&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In the next article in this series, we'll cover:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The stack cost in Docker:&lt;/strong&gt; why the &lt;code&gt;-Xmx&lt;/code&gt; that used to be enough may no longer be, and how to calculate the right headroom to avoid an OOM Kill&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CPU throttling on Kubernetes:&lt;/strong&gt; how CPU limits affect the Carrier Threads and cause high latency while CPU looks low on the dashboards&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observability with JFR:&lt;/strong&gt; the exact events for monitoring Thread Pinning and saturation in production&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The complete checklist&lt;/strong&gt; of the modern developer for a safe migration&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;Keep reading:&lt;/strong&gt; &lt;a href="https://dev.to/dhellano_castro_c5aba0c56/virtual-threads-em-producao-de-verdade-docker-kubernetes-e-o-que-os-dashboards-nao-te-contam-3na6"&gt;Part 2 — Virtual Threads in Real Production: Docker, Kubernetes, and What the Dashboards Don't Tell You&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;If this article was useful, leave a reaction; it really helps me know whether the series is worth continuing.&lt;/em&gt; 🙌&lt;/p&gt;




&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;JEP 444 — Virtual Threads (Java 21)&lt;/strong&gt;&lt;br&gt;
The official Project Loom specification. It documents the mount/unmount model, the behavior of &lt;code&gt;synchronized&lt;/code&gt;, and the role of Carrier Threads.&lt;br&gt;
&lt;a href="https://openjdk.org/jeps/444" rel="noopener noreferrer"&gt;https://openjdk.org/jeps/444&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;JEP 491 — Synchronize Virtual Threads without Pinning (Java 24)&lt;/strong&gt;&lt;br&gt;
The direct evolution of the Thread Pinning pitfall discussed in this article. Starting with Java 24, &lt;code&gt;synchronized&lt;/code&gt; with I/O no longer causes pinning in most cases, which is relevant if you are on Java 24+ or plan to be.&lt;br&gt;
&lt;a href="https://openjdk.org/jeps/491" rel="noopener noreferrer"&gt;https://openjdk.org/jeps/491&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Spring Boot 3.2 Release Notes — Virtual Threads&lt;/strong&gt;&lt;br&gt;
Official documentation for the &lt;code&gt;spring.threads.virtual.enabled&lt;/code&gt; property and what it configures automatically (Tomcat, Jetty, &lt;code&gt;@Async&lt;/code&gt;, executors).&lt;br&gt;
&lt;a href="https://github.com/spring-projects/spring-boot/wiki/Spring-Boot-3.2-Release-Notes" rel="noopener noreferrer"&gt;https://github.com/spring-projects/spring-boot/wiki/Spring-Boot-3.2-Release-Notes&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Resilience4j — Official Bulkhead Documentation&lt;/strong&gt;&lt;br&gt;
Reference for the &lt;code&gt;SemaphoreBulkhead&lt;/code&gt; and &lt;code&gt;BulkheadConfig&lt;/code&gt; used in the mitigation section.&lt;br&gt;
&lt;a href="https://resilience4j.readme.io/docs/bulkhead" rel="noopener noreferrer"&gt;https://resilience4j.readme.io/docs/bulkhead&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Source Code
&lt;/h2&gt;

&lt;p&gt;All the examples from this article, and more, are available in the repository below.&lt;br&gt;
Each class is self-contained and runs with a single command (&lt;code&gt;java NomeClasse.java&lt;/code&gt;).&lt;br&gt;
No external dependencies, just Java 21.&lt;/p&gt;

&lt;p&gt;🔗 &lt;strong&gt;&lt;a href="https://github.com/DheCastro/java-virtual-threads-pitfalls" rel="noopener noreferrer"&gt;github.com/DheCastro/java-virtual-threads-pitfalls&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Node and Pod Management with Karpenter - A Developer's Perspective</title>
      <dc:creator>Dhellano Castro</dc:creator>
      <pubDate>Sat, 18 Oct 2025 21:44:20 +0000</pubDate>
      <link>https://dev.to/dhellano_castro_c5aba0c56/gestao-de-nodes-e-pods-com-karpenter-perspectiva-do-desenvolvedor-1an8</link>
      <guid>https://dev.to/dhellano_castro_c5aba0c56/gestao-de-nodes-e-pods-com-karpenter-perspectiva-do-desenvolvedor-1an8</guid>
      <description>&lt;h2&gt;
  
  
  The problem
&lt;/h2&gt;

&lt;p&gt;I recently "had the opportunity" to run into a problem involving the deployment of a service with a canary strategy (if you don't know what a canary deploy is, you can read more about it here: &lt;a href="https://www.locaweb.com.br/blog/temas/codigo-aberto/canary-deployment-como-funciona/" rel="noopener noreferrer"&gt;Entendendo como funciona o Canary Deployment&lt;/a&gt;) and the Karpenter node autoscaler. While the canary was running, Karpenter was draining several of the application's pods and hurting the service's latency.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Karpenter and what does it do?
&lt;/h2&gt;

&lt;p&gt;Think of Karpenter as the professional who does the "smart housekeeping" of the (K8s) cluster.&lt;/p&gt;

&lt;p&gt;It constantly monitors the cluster's nodes (EC2 machines). To what end?&lt;/p&gt;

&lt;p&gt;Basically, to check whether machines are full, empty, or poorly utilized. When it identifies spare capacity, it can terminate underutilized machines; when it detects a capacity shortage, it can also provision new ones.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In other words:&lt;/strong&gt; Karpenter doesn't just "kill nodes"; it also creates new nodes as demand requires.&lt;/p&gt;

&lt;p&gt;It is, by definition, a node autoscaler, a parallel to Keda, which autoscales pods.&lt;/p&gt;

&lt;p&gt;While Keda adjusts the number of pods, Karpenter adjusts the number of machines.&lt;/p&gt;

&lt;h2&gt;
  
  
  And how does the canary deployment scenario relate to this?
&lt;/h2&gt;

&lt;p&gt;In a canary, the following steps happen:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the old version of the app keeps running&lt;/li&gt;
&lt;li&gt;the new version brings up a few pods&lt;/li&gt;
&lt;li&gt;little by little, the new version replaces the old one, and the two coexist for a while&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This coexistence (across several nodes, including new ones created to support the new version) can unbalance the cluster. For example, some old pods die, new ones come up, and some nodes end up almost empty. When Karpenter "sees" these lightly loaded nodes, it considers them underutilized and starts to act: it drains the pods from the node, relocates them onto better-utilized nodes, and then terminates the EC2 instance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;But note:&lt;/strong&gt; Karpenter doesn't usually drain two nodes at the same time.&lt;br&gt;
It acts progressively, recognizing and adjusting utilization as conditions change. A mass drain would only happen if dozens of pods left at once, which is an unusual situation. Also, before terminating a node, Karpenter first relocates its pods, which can cause restarts, but the service tends to stay up during the process. If the behavior looks aggressive (many pods terminated at once), that's a sign the configuration is too sensitive and needs tuning.&lt;/p&gt;

&lt;h2&gt;
  
  
  The role of the PDB (Pod Disruption Budget)
&lt;/h2&gt;

&lt;p&gt;This is where the PDB comes in: it defines a "budget" for how many pods can be disrupted at the same time. With it, you instruct the cluster and Karpenter not to drain all the pods at once.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;E.g.:&lt;/strong&gt; imagine you have 10 pods running&lt;br&gt;
&lt;strong&gt;Without a PDB:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Karpenter decides to remove 2 nodes&lt;/li&gt;
&lt;li&gt;Each node has 5 pods&lt;/li&gt;
&lt;li&gt;It drains both at the same time&lt;/li&gt;
&lt;li&gt;The application can be completely down for a few seconds&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;With a PDB defined (e.g.: 80% of the pods must remain available):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Karpenter and K8s drain at most 2 of the 10 pods at a time&lt;/li&gt;
&lt;li&gt;The app stays available and the transition happens smoothly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So the PDB protects the operation and prevents high latency or total unavailability.&lt;/p&gt;
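
&lt;p&gt;The 80% rule maps to a standard &lt;code&gt;PodDisruptionBudget&lt;/code&gt; manifest. A sketch (the name and the label are hypothetical):&lt;/p&gt;

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb         # hypothetical name
spec:
  minAvailable: 80%        # at least 80% of pods must stay up during voluntary disruptions
  selector:
    matchLabels:
      app: my-app          # hypothetical label of the protected pods
```

&lt;p&gt;Karpenter respects this budget while draining: evictions that would violate it are refused until enough pods are healthy again.&lt;/p&gt;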

&lt;h2&gt;
  
  
  Interaction between Keda and Karpenter
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Keda scales pods (based on metrics such as queue length, CPU, etc.)&lt;/li&gt;
&lt;li&gt;Karpenter scales nodes (bringing up or terminating EC2 machines).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;A common scenario:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Keda detects a full queue and scales from 2 → 10 pods&lt;/li&gt;
&lt;li&gt;The queue drains quickly, and Keda scales the pods back down&lt;/li&gt;
&lt;li&gt;Karpenter notices nearly empty nodes and starts draining and terminating machines&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All fine, so far.&lt;/p&gt;

&lt;p&gt;The problem appears when the reaction times of the two are not well synchronized.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This can cause the famous pod flapping:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Keda scales the pods up because the queue is full → Karpenter creates new nodes&lt;/li&gt;
&lt;li&gt;The queue drains quickly and Keda scales the pods back down → Karpenter may still be creating new machines&lt;/li&gt;
&lt;li&gt;Karpenter detects low utilization → starts draining and shutting machines down&lt;/li&gt;
&lt;li&gt;The queue fills up again → Keda tries to scale → no nodes are available (Karpenter is still killing machines)&lt;/li&gt;
&lt;li&gt;Pods go Pending → Karpenter starts creating new nodes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And the cycle repeats until it stabilizes.&lt;/p&gt;

&lt;p&gt;That's why it's important to configure less aggressive autoscaling: set longer intervals between upscale and downscale, and limit how many pods can scale at a time.&lt;/p&gt;
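
&lt;p&gt;In KEDA, that pacing lives on the &lt;code&gt;ScaledObject&lt;/code&gt;. A sketch with illustrative target and values; &lt;code&gt;pollingInterval&lt;/code&gt; and &lt;code&gt;cooldownPeriod&lt;/code&gt; are the fields that slow the reaction down:&lt;/p&gt;

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app-scaler        # hypothetical name
spec:
  scaleTargetRef:
    name: my-app             # hypothetical Deployment
  pollingInterval: 30        # check the trigger every 30s rather than continuously
  cooldownPeriod: 300        # wait 5 min of calm before scaling back down to the minimum
  minReplicaCount: 2
  maxReplicaCount: 10
```

&lt;p&gt;Finer scale-down velocity (e.g. "at most N pods per minute") can be tuned through the HPA behavior section that KEDA exposes under &lt;code&gt;advanced.horizontalPodAutoscalerConfig&lt;/code&gt;.&lt;/p&gt;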

&lt;h2&gt;
  
  
  Impacts on Java applications
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;During this flapping, Java apps suffer a lot, because:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each new pod takes some time to "warm up" (JIT, caches, pools, connections...)&lt;/li&gt;
&lt;li&gt;Pods come up and die before they are ready&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The result:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Higher app response times&lt;/li&gt;
&lt;li&gt;Lost throughput&lt;/li&gt;
&lt;li&gt;More initial GC time&lt;/li&gt;
&lt;li&gt;Lots of startup and shutdown log entries&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  When new pods degrade old pods
&lt;/h2&gt;

&lt;p&gt;Even with anti-affinity configured (which prevents several pods of the same service from landing on the same node), new pods may come up on nodes that are already busy. K8s schedules pods based on the declared requests, which work as a capacity reservation. Karpenter and K8s honor that reservation: they don't place more pods than the requests allow. Any spare capacity can be used on demand, up to the configured limit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;E.g.:&lt;/strong&gt; imagine a node with 4 vCPUs&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;2 pods limited (limit, not request) to 2 vCPUs each&lt;/li&gt;
&lt;li&gt;The cluster tries to bring up one more pod on the same node&lt;/li&gt;
&lt;li&gt;Now there are 3 JVMs competing for 4 CPUs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;During the third pod's startup, it consumes a lot of CPU (JIT, caches, initialization...), which hurts the performance of the older pods. This can increase latency and even lead Keda to scale up more pods (if CPU is the trigger), making the situation worse.&lt;/p&gt;
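
&lt;p&gt;In the pod spec, that distinction is the difference between &lt;code&gt;requests&lt;/code&gt; (what the scheduler reserves) and &lt;code&gt;limits&lt;/code&gt; (a burst ceiling). A sketch with illustrative values:&lt;/p&gt;

```yaml
resources:
  requests:
    cpu: "1"          # reserved capacity; scheduling and bin-packing use this
    memory: 1Gi
  limits:
    cpu: "2"          # burst ceiling; three such pods can contend for a 4 vCPU node
    memory: 1Gi
```

&lt;p&gt;If the request is much lower than the limit, the scheduler will happily pack pods whose bursts, together, exceed the node, which is exactly the 3-JVMs-on-4-CPUs scenario above.&lt;/p&gt;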

&lt;h2&gt;
  
  
  The old readiness problem
&lt;/h2&gt;

&lt;p&gt;If a readiness probe is misconfigured, it can mark a pod as "OK" before it is actually ready.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The result:&lt;/strong&gt; the load balancer sends traffic to it, responses get slow, latency spikes appear and, in some cases, K8s itself starts killing and recreating pods: a degradation loop.&lt;/p&gt;
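
&lt;p&gt;A sketch of a more conservative readiness probe. The path, port, and timings are illustrative; a Spring Boot app would typically expose &lt;code&gt;/actuator/health/readiness&lt;/code&gt;:&lt;/p&gt;

```yaml
readinessProbe:
  httpGet:
    path: /actuator/health/readiness   # assumption: Spring Boot Actuator health group
    port: 8080
  initialDelaySeconds: 20   # give the JVM time to warm up before the first check
  periodSeconds: 5
  failureThreshold: 3       # tolerate a blip instead of flapping ready/unready
```

&lt;p&gt;The key is that the endpoint should only report ready after pools, caches, and downstream connections are actually usable, not merely after the process has started.&lt;/p&gt;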

&lt;h2&gt;
  
  
  Another critical point: database connections
&lt;/h2&gt;

&lt;p&gt;When new pods come up, they typically open new database connections, create connections to other services (Kafka, RabbitMQ...), and try to register with a service discovery, among other things. If there is no cap on database connections, the database can become overloaded and/or start refusing connections, which also affects the older pods that were already running on the same node where the new pods came up.&lt;/p&gt;
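
&lt;p&gt;In a Spring Boot app with HikariCP, the per-pod cap is a single property; multiply it by the maximum pod count to know the worst case the database must absorb (the value below is illustrative):&lt;/p&gt;

```properties
# Hard cap on the connections each pod may open (HikariCP's default is 10)
spring.datasource.hikari.maximum-pool-size=10
```

&lt;p&gt;With 10 pods at most, this keeps the database at a predictable ceiling of 100 connections even during the worst scaling burst.&lt;/p&gt;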

&lt;h2&gt;
  
  
  Beyond resource and probe configuration: Graceful shutdown
&lt;/h2&gt;

&lt;p&gt;One of the most common causes of pods restarting for "any reason" is that they keep receiving requests while they are already "dying", or, as mentioned, before they are ready. A solution for the first case is configuring Graceful Shutdown, which prevents the pod from being terminated while it is still serving requests. This avoids noise in the probe checks (liveness and readiness) and the resulting churn of pods going up and down, with the increased latency and other side effects already mentioned.&lt;/p&gt;
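
&lt;p&gt;In Spring Boot (2.3+), graceful shutdown is two properties; it is worth pairing them with a pod &lt;code&gt;terminationGracePeriodSeconds&lt;/code&gt; longer than the timeout (values illustrative):&lt;/p&gt;

```properties
# Stop accepting new requests and let in-flight ones finish before exiting
server.shutdown=graceful
spring.lifecycle.timeout-per-shutdown-phase=30s
```

&lt;p&gt;On SIGTERM, the server then rejects new connections but finishes in-flight requests for up to 30 seconds before the JVM exits.&lt;/p&gt;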

&lt;h2&gt;
  
  
  In summary
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Karpenter is important for the functional and financial management of the cluster's nodes.&lt;/li&gt;
&lt;li&gt;Keda is important for the functional and financial management of the cluster's resources.&lt;/li&gt;
&lt;li&gt;And it is important that engineering understands how these tools work, so services are configured in a sustainable and effective way, making sense functionally for customers (based on need) and financially for us as a platform.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Supporting documentation
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://karpenter.sh/docs/" rel="noopener noreferrer"&gt;https://karpenter.sh/docs/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://kubernetes.io/pt-br/docs/concepts/configuration/manage-resources-containers/" rel="noopener noreferrer"&gt;https://kubernetes.io/pt-br/docs/concepts/configuration/manage-resources-containers/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aws</category>
      <category>devops</category>
      <category>kubernetes</category>
    </item>
  </channel>
</rss>
