Series: Java in Real Production — This is the first of two articles. Here we cover the fundamentals, the right mental model, and the two pitfalls that silently bring down applications. In the second, we go deeper into Docker, Kubernetes, and observability with JFR.
Imagine a fine-dining restaurant. Every table — an HTTP request — needs a dedicated waiter. The waiter takes the order, walks to the kitchen... and just stands there, waiting for the chef to finish the dish. Meanwhile, new tables keep arriving. But there are no waiters available. The maître d' starts turning customers away at the door.
The restaurant is full of waiters standing idle in the kitchen — and the dining room is empty of service.
This is the classic Platform Threads model in Java. Each thread consumes roughly 1MB of stack in the operating system. On a server with 4GB dedicated to threads, you get at most ~4,000 waiters. Sounds like a lot? For a modern application with heavy I/O — database calls, external HTTP, messaging — it isn't.
Project Loom, introduced as a preview in Java 19 and stable since Java 21, changed the rules of the game. The core idea is elegant: what if the waiter could leave the table in the kitchen, go back to the dining room to serve other tables, and return when the dish was ready?
That's Virtual Threads. Millions of them. With memory cost in the kilobytes range. The restaurant can now have 1,000 real waiters serving 1,000,000 simultaneous tables.
But — and there's always a "but" — a restaurant with 1 million waiters and a single kitchen with 4 stoves will still clog up. This is where the story gets interesting.
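To make the scale difference concrete, here's a minimal, self-contained sketch (class name is mine, Java 21+ assumed) that parks 100,000 Virtual Threads on simulated I/O and waits for all of them to finish:

```java
import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class VirtualThreadDemo {
    public static void main(String[] args) {
        AtomicInteger completed = new AtomicInteger();

        // try-with-resources: close() waits for all submitted tasks to finish
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 100_000; i++) {
                executor.submit(() -> {
                    Thread.sleep(Duration.ofMillis(10)); // simulated I/O wait
                    return completed.incrementAndGet();
                });
            }
        }
        System.out.println("Completed: " + completed.get());
    }
}
```

Swap `newVirtualThreadPerTaskExecutor()` for a platform-thread-per-task executor and the same run would reserve on the order of 100GB of stack (100,000 × ~1MB), long before the OS lets you get there.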
The Engine Under the Hood
Before rushing off to create Virtual Threads everywhere, it's worth understanding what's happening under the hood. The JVM manages three distinct concepts that coexist in this ecosystem.
Platform Threads are the old, honest model: a Java thread mapped 1:1 to an operating system thread. The OS schedules it, the OS blocks it, the OS pays the memory bill. They're expensive, powerful, and limited in number.
Virtual Threads are threads managed by the JVM itself, not the OS. They're lightweight, cheap, and can exist in absurd quantities. When a Virtual Thread needs to wait for I/O, it is unmounted from the OS thread and its context is saved on the heap — as regular Java objects, subject to GC.
Carrier Threads are the missing link that most articles ignore. They are OS Platform Threads that the JVM's internal ForkJoinPool uses to run Virtual Threads. Think of them as subway rails: the cars (Virtual Threads) ride on top of the rails (Carrier Threads). You can have 1,000 cars, but if there are only 4 rails, only 4 cars move at a time.
┌─────────────────────────────────────────────────────┐
│                         JVM                         │
│                                                     │
│  Virtual Thread 1 ──┐                               │
│  Virtual Thread 2 ──┤                               │
│  Virtual Thread 3 ──┼──► Carrier Thread 1 ──► OS    │
│  Virtual Thread 4 ──┤                               │
│  Virtual Thread ...──┘                              │
│                      ──► Carrier Thread 2 ──► OS    │
│                      ──► Carrier Thread N ──► OS    │
│                                                     │
│  (N = number of available CPUs, by default)         │
└─────────────────────────────────────────────────────┘
The default number of Carrier Threads equals the number of available CPUs. In production, inside a Docker container with --cpus=2, you have 2 rails for potentially millions of cars. This will matter — a lot — in the second article of this series.
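You can inspect both numbers from inside the JVM. A small sketch (class name is mine); the system property `jdk.virtualThreadScheduler.parallelism` is the JDK's documented knob for overriding the carrier pool size:

```java
public class CarrierCount {
    public static void main(String[] args) {
        // The default carrier pool size tracks the CPUs the JVM can see;
        // inside a container this reflects the cgroup limit (e.g. --cpus=2)
        int cpus = Runtime.getRuntime().availableProcessors();

        // Override with: -Djdk.virtualThreadScheduler.parallelism=N
        String override = System.getProperty("jdk.virtualThreadScheduler.parallelism");

        System.out.println("Available CPUs: " + cpus);
        System.out.println("Scheduler parallelism: "
                + (override != null ? override : cpus + " (default)"));
    }
}
```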
Pitfall 1 — Thread Pinning: The Bolt in the Floor
Remember the waiter who could leave the table in the kitchen and go serve others? Well. There's a situation where they can't leave. Someone bolted their chair to the kitchen floor. That bolt is called synchronized.
When a Virtual Thread enters a synchronized block or method and hits a blocking point — I/O, for example — it cannot be unmounted from the Carrier Thread. It pins. The Carrier Thread gets stuck with it, waiting. If all Carrier Threads get pinned, your application freezes. Completely.
⚠️ Important:
`synchronized` is not inherently a villain. It's perfectly safe to use it to protect fast in-memory operations, like manipulating a shared `HashMap`. The problem arises when inside the `synchronized` block there's a slow I/O operation: a database query, an HTTP call, a file read.
See the difference in practice:
// ❌ PROBLEMATIC: synchronized + I/O = Thread Pinning guaranteed
// The Carrier Thread gets stuck while the database responds
public synchronized User findById(Long id) {
    return jdbcTemplate.queryForObject(
        "SELECT * FROM users WHERE id = ?",
        userRowMapper,
        id
    );
}
// ✅ CORRECT: ReentrantLock is "Virtual Thread aware"
// The Virtual Thread can be unmounted while waiting for the database
// The Carrier Thread is free to execute other Virtual Threads
private final ReentrantLock lock = new ReentrantLock();

// Note: lock() does not throw InterruptedException (lockInterruptibly() does)
public User findById(Long id) {
    lock.lock();
    try {
        return jdbcTemplate.queryForObject(
            "SELECT * FROM users WHERE id = ?",
            userRowMapper,
            id
        );
    } finally {
        lock.unlock();
    }
}
Why does ReentrantLock solve it? Because it is built on `LockSupport.park()` rather than native object monitors, and the Virtual Thread scheduler understands parking: when a Virtual Thread needs to wait inside a ReentrantLock, the JVM can unmount it from the Carrier Thread normally. The waiter can finally get up from the chair.
To identify pinning in production, enable the JVM diagnostic flag:
-Djdk.tracePinnedThreads=full
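To see the flag in action, here's a deliberately bad sketch (class name is mine) that blocks inside `synchronized` on a Virtual Thread. Run it on Java 21 with the flag above and the JDK prints the stack trace of the pinned thread; remove the `synchronized` block and nothing is reported:

```java
import java.time.Duration;

public class PinningDemo {
    private static final Object LOCK = new Object();

    public static void main(String[] args) throws InterruptedException {
        // Run with: java -Djdk.tracePinnedThreads=full PinningDemo.java
        Thread vt = Thread.ofVirtual().start(() -> {
            synchronized (LOCK) { // monitor held...
                try {
                    // ...across a blocking call: on Java 21/22/23 the Virtual
                    // Thread pins its Carrier Thread for the whole wait
                    Thread.sleep(Duration.ofMillis(100));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        vt.join();
        System.out.println("done");
    }
}
```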
💡 Note for framework users: Older JDBC drivers and some `DataSource` implementations still use `synchronized` internally. Check your versions. The PostgreSQL driver removed the problematic `synchronized` usages starting from version 42.6.

📌 Note on Java 24: JEP 491, delivered in Java 24, resolves this limitation in most cases. Starting from Java 24, `synchronized` with I/O no longer causes pinning. For those still on Java 21/22/23, which is most production environments today, the pitfall remains valid and migrating to `ReentrantLock` is still the right recommendation.
Pitfall 2 — The Stampede Effect
You fixed the pinning. Your application is running with Virtual Threads smooth as butter. Requests coming in, threads responding. Then you look at your database and see this:
ERROR: FATAL: remaining connection slots are reserved
for replication superuser connections
Max connections: 100. Active: 100. Waiting: 4,847.
Welcome to the Stampede Effect.
The problem is subtle and cruel: with Platform Threads, the thread pool was the natural limiter of database connections. If you had 200 threads in the pool, at most 200 simultaneous connections reached the database. It was accidental contention, but it worked as a handbrake.
With Virtual Threads, that handbrake is gone. The JVM can create unlimited Virtual Threads. Each one, upon hitting an I/O point, stays "parked" waiting for the response — but keeps existing and holding an open connection to the database. A flood of 50,000 simultaneous requests can turn into 50,000 connections trying to open on the database at once.
The database collapses. It wasn't the Virtual Thread that was slow — it was the absence of governance over the shared resource.
🎯 The central paradigm shift of Project Loom: With Virtual Threads, control moves away from the thread and toward the resource. You no longer limit threads. You limit access to scarce resources.
Mitigation — The Intelligent Handbrake
Semaphore: The Database Doorman
The most direct solution is to use a Semaphore as an access controller. Think of it as a doorman at the database entrance: regardless of how many clients show up, only N get in at a time.
@Repository
public class ProductRepository {

    // Doorman: maximum 80 simultaneous connections to the database
    private final Semaphore dbGatekeeper = new Semaphore(80);

    public List<Product> findAllByCategory(String category) {
        try {
            dbGatekeeper.acquire(); // Wait for the doorman's permission
            try {
                return jdbcTemplate.query(
                    "SELECT * FROM products WHERE category = ?",
                    productRowMapper,
                    category
                );
            } finally {
                dbGatekeeper.release(); // Release the slot on exit
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new DatabaseAccessException("Interrupted while waiting for DB slot", e);
        }
    }
}
The beauty here: Semaphore.acquire() is a virtual-thread-friendly blocking point. The Virtual Thread waiting for the doorman's slot is unmounted from the Carrier Thread, which is free to execute other Virtual Threads. Zero CPU waste.
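A self-contained way to verify the doorman actually holds the line (class name and numbers are mine, scaled down for a quick run): 1,000 Virtual Threads compete for 10 permits, and we record the peak number of concurrent holders.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

public class SemaphoreGate {
    public static void main(String[] args) {
        Semaphore gate = new Semaphore(10);        // the "doorman": 10 slots
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();

        try (var pool = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 1_000; i++) {
                pool.submit(() -> {
                    gate.acquire();                // virtual-thread-friendly wait
                    try {
                        int now = active.incrementAndGet();
                        peak.accumulateAndGet(now, Math::max);
                        Thread.sleep(5);           // simulated query
                        active.decrementAndGet();
                    } finally {
                        gate.release();
                    }
                    return null;
                });
            }
        } // close() waits for all tasks

        // Bounded by the permit count: never above 10
        System.out.println("Peak concurrency: " + peak.get());
    }
}
```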
Resilience4j: Mission Control
For real production, a bare Semaphore is just the starting point. Resilience4j offers a complete set of resilience primitives, all compatible with Virtual Threads.
Its semaphore-based Bulkhead, configured via BulkheadConfig, is essentially a Semaphore on steroids: metrics, fallbacks, timeouts, and native integration with Micrometer and Prometheus.
// Bulkhead configuration
@Bean
public BulkheadRegistry bulkheadRegistry() {
    BulkheadConfig config = BulkheadConfig.custom()
        .maxConcurrentCalls(80)                 // Maximum simultaneous calls
        .maxWaitDuration(Duration.ofSeconds(2)) // Queue wait timeout
        .build();
    return BulkheadRegistry.of(config);
}

// Usage in the service
@Service
public class ProductService {

    private final Bulkhead dbBulkhead;
    private final ProductRepository repository;

    public ProductService(BulkheadRegistry registry, ProductRepository repository) {
        this.dbBulkhead = registry.bulkhead("database-bulkhead");
        this.repository = repository;
    }

    public List<Product> getProductsByCategory(String category) {
        return Bulkhead.decorateSupplier(
            dbBulkhead,
            () -> repository.findAllByCategory(category)
        ).get();
    }
}
Combine this with a CircuitBreaker so that if the database starts rejecting connections, the circuit opens automatically — giving the database time to recover before the situation escalates.
@Bean
public CircuitBreakerConfig circuitBreakerConfig() {
    return CircuitBreakerConfig.custom()
        .failureRateThreshold(50)                        // Opens if 50% of calls fail
        .waitDurationInOpenState(Duration.ofSeconds(30)) // Waits 30s before retrying
        .slidingWindowSize(20)                           // Evaluates the last 20 calls
        .build();
}
Want to See the Numbers in Practice?
There's a complete, self-contained demo available in the repository — Java 21, zero dependencies — showing both scenarios running and printing the results. The output is brutal:
SCENARIO 1 — WITHOUT control:
✅ Success: 80 requests
❌ Rejected: 420 requests ← 84% of requests lost
SCENARIO 2 — WITH Semaphore:
✅ Success: 500 requests
❌ Rejected: 0 requests
📈 Peak: 80 connections (never exceeded the limit)
🔗 github.com/DheCastro/java-virtual-threads-pitfalls
What's Coming in the Next Article
Now that the mental model is correct, let's go deeper into where most Java applications actually live: containers in production.
In the next article of this series, we'll cover:
- Stack cost in Docker: why the `-Xmx` that used to be enough may no longer be, and how to calculate the right margin to avoid OOM Kill
- CPU Throttling in Kubernetes: how CPU limits affect Carrier Threads and cause high latency with apparently low CPU on dashboards
- Observability with JFR: the exact events to monitor Thread Pinning and saturation in production
- A complete checklist for the modern developer for a safe migration
Continue reading: Part 2 — Virtual Threads in Real Production: Docker, Kubernetes, and What the Dashboards Don't Tell You
If this article was helpful, drop a reaction — it really helps to know if the series is worth continuing. 🙌
References
JEP 444 — Virtual Threads (Java 21)
Official Project Loom specification. Documents the mount/unmount model, `synchronized` behavior, and the role of Carrier Threads.
https://openjdk.org/jeps/444

JEP 491 — Synchronize Virtual Threads without Pinning (Java 24)
The direct evolution of the Thread Pinning pitfall discussed in this article. Starting from Java 24, `synchronized` with I/O no longer causes pinning in most cases.
https://openjdk.org/jeps/491

Spring Boot 3.2 Release Notes — Virtual Threads
Official documentation for the `spring.threads.virtual.enabled` property and what it configures automatically (Tomcat, Jetty, `@Async`, executors).
https://github.com/spring-projects/spring-boot/wiki/Spring-Boot-3.2-Release-Notes

Resilience4j — Official Bulkhead Documentation
Reference for `SemaphoreBulkhead` and `BulkheadConfig` used in the mitigation section.
https://resilience4j.readme.io/docs/bulkhead
Source Code
All examples from this article — and more — are available in the repository below.
Each class is self-contained and runs with a single command (java ClassName.java).
No external dependencies, just Java 21.