S M Tahosin

Posted on Jun 4

I Replaced 200 Threads With 10,000. Java Finished 13.5x Faster.

#programming #java #discuss #beginners

I expected the fans to spin.

I had just asked Java to start 10,000 tasks, give each task its own virtual
thread, and make every one wait for 100 milliseconds.

Instead, the program finished before I could move my hand away from Enter.

So I ran it again. Then three more times.

On my 12-logical-processor laptop, the median result looked like this:

Executor	10,000 waiting tasks
Fixed pool of 200 platform threads	5,116 ms
One virtual thread per task	378 ms

That is 13.5x faster completion after changing the executor, not the task.

This is not proof that virtual threads make Java code 13.5x faster.

It is proof that I had been thinking about threads incorrectly.

Let us rebuild that mental model from the inside.

First, Make a Prediction

Each task does this:

Thread.sleep(Duration.ofMillis(100));

There are 10,000 tasks.

How long should the whole program take?

A: About 1,000 seconds, because 10,000 x 100 ms = 1,000 seconds
B: About 5 seconds, because 200 platform threads process the work in waves
C: Well under 1 second, because waiting virtual threads can step aside

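All three answers can be correct: A with a single worker thread, B with a
fixed pool of 200 platform threads, C with one virtual thread per task. The
executor decides which world you live in.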

The Old Mental Model

For most of Java's life, a Java thread was a thin wrapper around an operating
system thread.

That made threads useful, but expensive enough to treat as a limited resource.

If your server had a pool of 200 platform threads and all 200 were waiting for
a slow database, request 201 had to stand in line.

request -> platform thread -> OS thread -> wait
request -> platform thread -> OS thread -> wait
request ->       queue       ->          -> wait for a free thread

The code was blocked, but the operating system thread assigned to it was still
occupied.

Virtual threads break that one-to-one relationship.

A virtual thread is still a real java.lang.Thread.

The difference is that it does not permanently own an OS thread. The JVM
schedules many virtual threads onto a smaller number of platform threads,
called carrier threads.

You can see the distinction directly:

Thread platform = Thread.ofPlatform().start(
        () -> System.out.println(Thread.currentThread().isVirtual())
);
platform.join(); // finish first so the output order is deterministic

Thread virtual = Thread.ofVirtual().start(
        () -> System.out.println(Thread.currentThread().isVirtual())
);
virtual.join();

Output:

false
true

Same Thread API. Different scheduling model.

What Happens When a Virtual Thread Waits?

Imagine a virtual thread running on carrier thread 3.

It calls a supported blocking operation, such as Thread.sleep() or blocking
network I/O.

The JVM can:

Pause the virtual thread.
Unmount it from carrier thread 3.
Use carrier thread 3 to run other virtual threads.
Remount the original virtual thread when its wait is over.

The virtual thread did not make the database, network, or timer faster.

It stopped wasting a scarce carrier thread while waiting.

That sentence is the whole feature:

Virtual threads make waiting cheap. They do not make work cheap.
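
One way to watch the unmounting described above, as a sketch rather than the lab's actual code: count how many tasks are waiting at the same moment and compare that with the number of logical processors. The class and method names here are made up for illustration.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class WaitingIsCheap {

    // Starts `tasks` virtual threads that each sleep for `wait`, and returns the
    // largest number of tasks that were in flight (started, not yet done waiting)
    // at the same moment.
    static int peakSleepers(int tasks, Duration wait) {
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();

        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    int now = inFlight.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    try {
                        Thread.sleep(wait);
                    } finally {
                        inFlight.decrementAndGet();
                    }
                    return null;
                });
            }
        } // close() waits for every submitted task to finish

        return peak.get();
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        int peak = peakSleepers(2_000, Duration.ofMillis(500));
        System.out.println("logical processors:  " + cores);
        System.out.println("peak waiting tasks:  " + peak);
    }
}
```

If sleeping tasks kept their carriers, the peak could not climb far past the size of the carrier pool. Seeing a peak in the thousands on a machine with a dozen processors is the unmounting at work.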

The Experiment

Here is the important part of the benchmark.

private static final int TASKS = 10_000;
private static final Duration WAIT = Duration.ofMillis(100);

private static void run(ExecutorService executor) throws Exception {
    try (executor) {
        List<Future<Integer>> futures = new ArrayList<>(TASKS);

        for (int task = 0; task < TASKS; task++) {
            int taskId = task;

            futures.add(executor.submit(() -> {
                Thread.sleep(WAIT);
                return taskId;
            }));
        }

        for (Future<Integer> future : futures) {
            future.get();
        }
    }
}

I ran the same method with two executors:

run(Executors.newFixedThreadPool(200));

run(Executors.newVirtualThreadPerTaskExecutor());

The first executor lets at most 200 tasks wait at once.

The virtual-thread executor starts one virtual thread for every task. When the
tasks sleep, the JVM can unmount them and keep its carrier threads available.

That is why the fixed pool behaves roughly like this:

10,000 tasks / 200 threads = 50 waves
50 waves x 100 ms          = about 5 seconds

The virtual-thread version does not need 50 waves. Almost every task can begin,
sleep, and get out of the carriers' way.
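
The wave arithmetic above fits in a few lines. This is an idealized estimator, not a measurement: it assumes every task waits for exactly the same time and ignores thread creation and scheduling costs. `WaveEstimate` is a name invented for this sketch.

```java
import java.time.Duration;

class WaveEstimate {

    // Idealized completion time when at most `concurrency` tasks can wait at once.
    // Startup and scheduling overhead are ignored, so treat the result as a floor.
    static Duration estimate(int tasks, int concurrency, Duration wait) {
        int waves = Math.ceilDiv(tasks, concurrency);
        return wait.multipliedBy(waves);
    }

    public static void main(String[] args) {
        int tasks = 10_000;
        Duration wait = Duration.ofMillis(100);

        // prints 1000000 ms, 5000 ms, and 100 ms respectively
        System.out.println("one at a time:   " + estimate(tasks, 1, wait).toMillis() + " ms");
        System.out.println("pool of 200:     " + estimate(tasks, 200, wait).toMillis() + " ms");
        System.out.println("thread per task: " + estimate(tasks, tasks, wait).toMillis() + " ms");
    }
}
```

Real runs land above these floors, because creating and scheduling 10,000 threads is not free.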

The measured medians from three runs were:

WAITING WORK
200 platform threads        5,116 ms
virtual thread per task       378 ms

CPU WORK
platform threads            2,387 ms
virtual threads             2,300 ms

The waiting result changed dramatically.

The CPU result did not.
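
If you want to reproduce medians like these yourself, a timing wrapper could look roughly like this. It is a sketch under my own assumptions, not the lab's code: `MedianTimer` and its methods are hypothetical names, there is no warm-up run, and the measured time includes creating and shutting down the executor.

```java
import java.time.Duration;
import java.util.Arrays;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

class MedianTimer {

    // Submits `tasks` sleeping tasks to a fresh executor and returns elapsed milliseconds.
    static long timeOnce(Supplier<ExecutorService> factory, int tasks, Duration wait) {
        long start = System.nanoTime();
        try (ExecutorService executor = factory.get()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    Thread.sleep(wait);
                    return null;
                });
            }
        } // close() blocks until all submitted tasks have completed
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Repeats the measurement and returns the middle sample
    // (the upper middle one when `runs` is even).
    static long medianMillis(Supplier<ExecutorService> factory, int tasks, Duration wait, int runs) {
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            samples[i] = timeOnce(factory, tasks, wait);
        }
        Arrays.sort(samples);
        return samples[runs / 2];
    }

    public static void main(String[] args) {
        int tasks = 10_000;
        Duration wait = Duration.ofMillis(100);

        long pooled = medianMillis(() -> Executors.newFixedThreadPool(200), tasks, wait, 3);
        long virtual = medianMillis(Executors::newVirtualThreadPerTaskExecutor, tasks, wait, 3);

        System.out.println("fixed pool of 200: " + pooled + " ms");
        System.out.println("virtual per task:  " + virtual + " ms");
    }
}
```

Expect your own numbers to differ from the table above. Hardware, JDK build, and background load all move them.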

The Benchmark Trap

Virtual threads are not tiny turbo buttons.

To test that, I also submitted 48 CPU-heavy tasks that counted primes up to
1,000,000.

Both executors finished in roughly the same time because my laptop still had
only 12 logical processors.

You can create one million virtual threads.

You cannot create one million CPU cores.

Good virtual-thread workloads spend meaningful time waiting:

HTTP requests
database queries
many file operations, after profiling
message queues
remote API calls
many independent sleep() or timer waits

Poor candidates spend most of their time calculating:

image processing
video encoding
compression
machine-learning inference
large in-memory transformations
number crunching

For CPU-bound work, use bounded parallelism near the amount of CPU your machine
can actually execute.
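
As a sketch of what bounded parallelism can look like for CPU-bound work: size a fixed pool to the number of logical processors and submit the computation there. The prime counter below is my own simple trial-division version, assumed for illustration; the lab's implementation may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class CpuBoundSketch {

    // Counts primes in [2, limit] by trial division: pure computation, no waiting.
    static int countPrimes(int limit) {
        int count = 0;
        for (int n = 2; n <= limit; n++) {
            boolean prime = true;
            for (int d = 2; (long) d * d <= n; d++) {
                if (n % d == 0) {
                    prime = false;
                    break;
                }
            }
            if (prime) count++;
        }
        return count;
    }

    // Runs `tasks` copies of the computation on the executor and sums the results.
    static long runAll(ExecutorService executor, int tasks, int limit) {
        try (executor) {
            List<Future<Integer>> futures = new ArrayList<>(tasks);
            for (int i = 0; i < tasks; i++) {
                futures.add(executor.submit(() -> countPrimes(limit)));
            }
            long total = 0;
            for (Future<Integer> future : futures) {
                total += future.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Bound the pool by what the machine can actually run in parallel.
        int cores = Runtime.getRuntime().availableProcessors();

        long start = System.nanoTime();
        long total = runAll(Executors.newFixedThreadPool(cores), 48, 1_000_000);
        long millis = (System.nanoTime() - start) / 1_000_000;

        System.out.println("primes counted across 48 tasks: " + total + " in " + millis + " ms");
    }
}
```

Swapping in `Executors.newVirtualThreadPerTaskExecutor()` is a quick way to check the claim on your own machine: the total stays the same, and the elapsed time should not improve much.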

The Simplest Useful Rule

When tasks mostly wait:

try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
    Future<String> user = executor.submit(() -> loadUser());
    Future<List<Order>> orders = executor.submit(() -> loadOrders());

    renderProfile(user.get(), orders.get());
}

This code is ordinary, blocking, and readable.

That is intentional.

For years, developers often had to choose between simple thread-per-request
code that did not scale and asynchronous code that scaled but split the
workflow across callbacks, futures, or reactive operators.

Virtual threads make the simple shape practical for many high-throughput
blocking applications.

They do not remove every concurrency problem. They remove one expensive
assumption: that every concurrent task needs its own OS thread.
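
Here is a self-contained version of the profile example you can actually run. `loadUser` and `loadOrders` are hypothetical stand-ins that just sleep for 100 ms, and orders are plain strings to keep the sketch short.

```java
import java.time.Duration;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ProfileSketch {

    // Hypothetical blocking call: pretend this is a database query.
    static String loadUser() throws InterruptedException {
        Thread.sleep(Duration.ofMillis(100));
        return "alice";
    }

    // Hypothetical blocking call: pretend this is a remote API request.
    static List<String> loadOrders() throws InterruptedException {
        Thread.sleep(Duration.ofMillis(100));
        return List.of("order-1", "order-2");
    }

    // Both loads run concurrently, each on its own virtual thread.
    static String renderProfile() {
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            Future<String> user = executor.submit(() -> loadUser());
            Future<List<String>> orders = executor.submit(() -> loadOrders());

            return user.get() + " has " + orders.get().size() + " orders";
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        String profile = renderProfile();
        long millis = (System.nanoTime() - start) / 1_000_000;

        System.out.println(profile + " (" + millis + " ms)");
    }
}
```

Because the two waits overlap, the elapsed time should sit closer to one 100 ms wait than to two.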

Do Not Pool Virtual Threads

This feels wrong at first.

We learned to pool threads because platform threads were expensive. A pool
limited how many of those scarce threads existed.

Virtual threads are designed to be created per task.

So this is the normal pattern:

Executors.newVirtualThreadPerTaskExecutor();

Not this:

a tiny pool of reusable virtual threads
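
In code, that anti-pattern usually looks like a fixed pool that has been handed a virtual-thread factory. A sketch, with class and method names invented for illustration:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class PoolingAntiPattern {

    // Anti-pattern: a fixed pool whose workers happen to be virtual threads.
    // The pool size, not the workload, now caps how many tasks can wait at once.
    static ExecutorService pooledVirtualThreads(int size) {
        return Executors.newFixedThreadPool(size, Thread.ofVirtual().factory());
    }

    // Preferred shape: a new virtual thread for every submitted task, no reuse.
    static ExecutorService virtualThreadPerTask() {
        return Executors.newVirtualThreadPerTaskExecutor();
    }

    // Runs one task on the executor and reports whether it ran on a virtual thread.
    static boolean runsOnVirtualThread(ExecutorService executor) {
        try (executor) {
            return executor.submit(() -> Thread.currentThread().isVirtual()).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("pooled:   " + runsOnVirtualThread(pooledVirtualThreads(20)));
        System.out.println("per task: " + runsOnVirtualThread(virtualThreadPerTask()));
    }
}
```

Both executors run tasks on virtual threads. The pooled one simply brings back the queue that virtual threads were meant to remove.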

If you must limit access to something scarce, limit that thing.

Suppose a partner API permits only 20 concurrent requests:

Semaphore partnerApiSlots = new Semaphore(20);

String callPartnerApi() throws InterruptedException {
    partnerApiSlots.acquire();

    try {
        return makeBlockingHttpRequest();
    } finally {
        partnerApiSlots.release();
    }
}

The executor can still create a virtual thread per task.

The semaphore protects the actual bottleneck.

This separation is useful far beyond virtual threads:

Concurrency is how much work can be in progress. Capacity is how much work a
dependency can safely accept.
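
To check that the semaphore really is the limit, you can count simultaneous calls while a large number of virtual threads compete for permits. This is a sketch: the 10 ms sleep is a stand-in for the blocking HTTP request, and the names are hypothetical.

```java
import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

class SemaphoreLimitSketch {

    // Runs `tasks` virtual threads against a simulated dependency that allows
    // `permits` concurrent calls; returns the peak number of simultaneous calls.
    static int peakConcurrentCalls(int tasks, int permits) {
        Semaphore slots = new Semaphore(permits);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();

        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    slots.acquire(); // waits here, cheaply, until a slot is free
                    int now = inFlight.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    try {
                        Thread.sleep(Duration.ofMillis(10)); // stand-in for the real call
                    } finally {
                        inFlight.decrementAndGet();
                        slots.release();
                    }
                    return null;
                });
            }
        }
        return peak.get();
    }

    public static void main(String[] args) {
        int peak = peakConcurrentCalls(1_000, 20);
        System.out.println("peak concurrent calls: " + peak + " (limit 20)");
    }
}
```

All 1,000 virtual threads exist at once, yet the simulated dependency never sees more than 20 callers.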

The Quiet `ThreadLocal` Trap

Virtual threads support ThreadLocal, so request context such as a user ID or
trace ID can continue to work.

The dangerous pattern is using ThreadLocal as a tiny object pool:

private static final ThreadLocal<ExpensiveClient> CLIENT =
        ThreadLocal.withInitial(ExpensiveClient::new);

That may look efficient when 200 pooled platform threads reuse 200 clients.

With one virtual thread per task, it can quietly become thousands of expensive
clients that are barely reused.

Keep context in thread-local variables only when it truly belongs to the task.
Do not use them to cache heavy reusable objects per virtual thread.
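
The difference is easy to count. The sketch below uses a hypothetical `ExpensiveClient` whose constructor bumps a counter, then runs the same tasks on a small platform pool and on a virtual-thread-per-task executor.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class ThreadLocalCostSketch {

    // Hypothetical heavy object; the shared counter records each construction.
    static final class ExpensiveClient {
        ExpensiveClient(AtomicInteger created) {
            created.incrementAndGet();
        }

        void call() {
            // pretend to do work
        }
    }

    // Runs `tasks` tasks that each fetch a client from a ThreadLocal, then
    // returns how many clients were constructed in total.
    static int clientsCreated(ExecutorService executor, int tasks) {
        AtomicInteger created = new AtomicInteger();
        ThreadLocal<ExpensiveClient> client =
                ThreadLocal.withInitial(() -> new ExpensiveClient(created));

        try (executor) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> client.get().call());
            }
        } // close() waits for all tasks, so the count is final here
        return created.get();
    }

    public static void main(String[] args) {
        int pooled = clientsCreated(Executors.newFixedThreadPool(200), 10_000);
        int virtual = clientsCreated(Executors.newVirtualThreadPerTaskExecutor(), 10_000);

        System.out.println("clients with 200 pooled threads: " + pooled);  // at most 200
        System.out.println("clients with virtual threads:    " + virtual); // one per task
    }
}
```

Every virtual thread is new, so its thread-local starts empty and the initializer runs again for each task.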

You Can Observe Them

The operating system does not see virtual threads individually. It sees only
the carrier threads that run them.

The JDK understands them, though.

You can create a virtual-thread-aware dump with:

jcmd <pid> Thread.dump_to_file -format=json threads.json

That distinction matters during debugging. An OS dashboard may show a modest
thread count while the JVM is managing thousands of virtual threads.

The right question is not only "how many threads exist?"

It is "what are those threads waiting for?"
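
The same blind spot exists inside older Java APIs. `Thread.getAllStackTraces()` is documented to cover live platform threads only, so a sketch like this one (names are mine) reports zero virtual threads even while thousands are parked:

```java
import java.util.concurrent.CountDownLatch;

class VisibilitySketch {

    // Parks `count` virtual threads on a latch, then reports how many virtual
    // threads the classic Thread.getAllStackTraces() listing contains.
    static long virtualThreadsInClassicListing(int count) {
        CountDownLatch release = new CountDownLatch(1);

        for (int i = 0; i < count; i++) {
            Thread.ofVirtual().start(() -> {
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        try {
            return Thread.getAllStackTraces().keySet().stream()
                    .filter(Thread::isVirtual)
                    .count();
        } finally {
            release.countDown(); // let the parked virtual threads finish
        }
    }

    public static void main(String[] args) {
        long seen = virtualThreadsInClassicListing(5_000);
        int platform = Thread.getAllStackTraces().size();

        System.out.println("virtual threads in classic listing: " + seen);
        System.out.println("platform threads in classic listing: " + platform);
    }
}
```

That is why the `jcmd` dump above matters: it is one of the views that does include virtual threads.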

One Outdated Warning

You may have read this advice:

Never block inside synchronized code when using virtual threads, because it
pins the carrier thread.

That warning mattered when virtual threads became final in Java 21.

Java 24 changed the implementation through JEP 491. Virtual threads can now
release their carrier when blocking inside synchronized code in the normal
case.

Pinning has not vanished completely. Native and foreign-function calls can
still pin a virtual thread.

But the blanket "virtual threads and synchronized do not mix" rule is
outdated on modern JDKs.

This is one reason I ran the experiment on Java 25 LTS instead of repeating an
old Java 21 checklist.
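
You can probe this on whatever JDK you have installed. In the sketch below, every task sleeps while holding its own private monitor, so the tasks never compete for a lock. I would expect a JDK with JEP 491 to finish in roughly one wait, and a Java 21 runtime to take noticeably longer because sleeping inside `synchronized` pins the carrier there. Treat that as an expectation to test, not a measured result. The class name is hypothetical.

```java
import java.time.Duration;
import java.util.concurrent.Executors;

class SynchronizedBlockingProbe {

    // Each task sleeps inside a synchronized block on its own lock object.
    // There is no lock contention, so any slowdown comes from pinned carriers.
    static long elapsedMillis(int tasks, Duration wait) {
        long start = System.nanoTime();
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                Object lock = new Object();
                executor.submit(() -> {
                    synchronized (lock) {
                        Thread.sleep(wait);
                    }
                    return null;
                });
            }
        } // close() waits for all tasks
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        System.out.println("JDK feature version: " + Runtime.version().feature());
        System.out.println("elapsed: " + elapsedMillis(1_000, Duration.ofMillis(100)) + " ms");
    }
}
```

Running it on two JDK versions side by side shows whether the old warning still applies to your runtime.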

A Five-Minute Migration Checklist

Do not rewrite an application because virtual threads sound exciting.

Take one blocking workflow and inspect it.

Confirm the workload waits. Look for database calls, HTTP calls, file access, queues, and sleeps.
Replace the task executor. Try Executors.newVirtualThreadPerTaskExecutor().
Keep downstream limits. Connection pools, API quotas, and rate limits still exist.
Load test the real path. A sleep benchmark teaches the model, not your production capacity.
Measure CPU and memory too. Cheap threads can still run expensive code or retain large objects.
Check native integrations. Native calls are one of the remaining pinning cases.

The goal is not "use virtual threads everywhere."

The goal is "stop paying for idle OS threads where you do not need them."

The Mental Model I Am Keeping

Before this experiment, I thought:

More concurrent Java work requires a larger thread pool.

Now I think:

Waiting work wants cheap virtual threads. CPU work wants bounded
parallelism. Scarce dependencies want explicit limits.

That model is simple enough for a beginner and accurate enough to prevent a
surprising number of production mistakes.

The full runnable lab behind the numbers uses only the JDK. No framework, build
tool, or dependency is required.

Compile and run it with Java 25:

javac VirtualThreadsLab.java
java VirtualThreadsLab

Open the complete runnable VirtualThreadsLab.java

Virtual threads became final in Java 21. Java 25 is not required for the basic
API, but it gives us the current LTS behavior, including the post-Java-24
improvements discussed above.

What should I put through this lab next: a database connection pool, 10,000
real HTTP calls, or a ThreadLocal-heavy application?

Top comments (16)

Tamim Rao • Jun 5

Since virtual threads are getting a lot of attention lately, experiments like this are a good reminder of why. The surprising part isn't that 10,000 tasks ran, it's how little overhead there was compared to the mental model many of us still have from platform threads.

What I find interesting is that it also highlights a common misconception. Seeing 10,000 threads complete smoothly doesn't mean we should start creating threads everywhere. It means the cost model has changed, so we can focus more on expressing concurrency in a straightforward way and less on building complex pooling strategies for I/O-heavy workloads.

I'd be curious to see the same experiment with blocking network calls, database operations, and some CPU-bound work mixed in. That's usually where the real trade-offs start to show up.

S M Tahosin • Jun 5

That's exactly the takeaway I was hoping readers would get from the experiment. The interesting part isn't the number itself, it's that virtual threads let us go back to a much simpler concurrency model without paying the same cost we used to associate with threads.

I also agree that "10,000 threads worked" can easily turn into the wrong conclusion if people stop there. Virtual threads make waiting cheap, but they don't magically make CPU work cheaper.

The mixed workload scenario you mentioned would be a great follow-up. Network I/O, database calls, and CPU-bound tasks in the same benchmark would probably show a much more nuanced picture of where virtual threads shine and where the underlying hardware limits still dominate. That's actually the direction I'm thinking of exploring next.

mote • Jun 8

The "cheap waiting, not cheap work" framing is the part that took me longest to internalize. I kept trying to use virtual threads as a drop-in for thread pools on CPU-bound tasks, then wondering why memory usage spiked without speed improvement. The Semaphore pattern for rate-limiting actual bottlenecks is underrated — most devs reach for it too late, after they've already blown up a downstream API's rate limit.

One thing I'd push back on slightly: the article mentions ThreadLocal as a gotcha, but the deeper issue is that virtual threads fundamentally change the cost model. In Rust's async model, you'd handle this differently — instead of Semaphore + blocking calls, you'd reach for async channels or futures that yield without thread blocking. Same problem, different primitives. Neither is wrong, just requires rethinking what "waiting" means in your specific runtime.

What's your take on structured concurrency here? Virtual threads make it easier to accidentally spawn fire-and-forget tasks that outlive their parent scope.

S M Tahosin • Jun 9

That's a great point. I think the ThreadLocal example is really just one symptom of the broader shift in the cost model. Virtual threads let us write code in a more direct style, but they also force us to revisit assumptions that were built around expensive threads.

I also agree with the Rust comparison. The primitives are different, but the underlying challenge is the same: expressing concurrency without confusing waiting with useful work.

As for structured concurrency, I'm a big fan of it for exactly the reason you mentioned. Once spawning work becomes cheap, lifecycle management becomes more important, not less. It's very easy to create tasks that technically work but are no longer tied to the scope that created them. Structured concurrency feels like the missing guardrail that keeps that power manageable.

Ankita Sarkar • Jun 5

Really enjoyed this experiment. A lot of developers still think "threads are expensive" without considering what those threads are actually doing. Your results are a good reminder that modern JVMs and operating systems handle idle threads much better than many of us expect.

What stood out to me is how easy it is to carry old assumptions forward without testing them. It would be interesting to see the same experiment with CPU-heavy work instead of sleeping threads to compare where the real limits start showing up.

Thanks for sharing actual measurements instead of just repeating common wisdom.

S M Tahosin • Jun 5

Exactly. That was one of the main motivations behind the experiment. It's surprisingly easy to inherit assumptions from older threading models and never revisit them.

A CPU-heavy version would be a great comparison because that's where I'd expect the hardware limits to become much more visible. Thanks for the thoughtful observation.

Ismail Hasan • Jun 5

This experiment is fascinating because it really challenges our intuition about what modern hardware can handle. Most people assume starting thousands of threads would instantly bring a laptop to its knees, but seeing it barely notice is eye-opening. It also makes me think about how much the JVM and modern operating systems optimize thread management behind the scenes. I wonder how this would scale on different workloads, especially when threads are doing more than just sleeping. It’s a great reminder that sometimes our assumptions about performance bottlenecks are outdated, and testing can reveal surprising truths about the tools we use every day.

S M Tahosin • Jun 5

I completely agree. One of the biggest lessons for me was realizing how many performance assumptions I was carrying around without ever testing them.

You're also right that the workload matters. Sleeping threads are one thing, but CPU-heavy work or blocking I/O can tell a very different story. That's why benchmarks are so valuable. They often reveal that the bottleneck isn't where we expected it to be.

Thanks for sharing your thoughts.

Mansa Datta • Jun 5

What I liked about this experiment is that it challenges a common assumption many developers have: seeing a huge thread count and immediately expecting the system to fall apart. The interesting takeaway isn't that 10,000 threads worked, but understanding why they worked. Most of them were likely waiting rather than actively competing for CPU time.

It's also a good reminder that concurrency discussions are often more nuanced than "more threads = bad." Thread state, memory usage, and workload type matter just as much as the raw number of threads.

A follow-up comparison with CPU-bound tasks or Java virtual threads would be really interesting. That would show where traditional threads start to hit their limits and how different concurrency approaches compare in practice.

Great experiment and a nice reality check for many of the assumptions we carry about threads.

S M Tahosin • Jun 5

That's a great way to put it. I think many of us still carry the mental model that a large thread count automatically means trouble, because that's often true with platform threads. What surprised me most wasn't the number itself, but how little actual contention there was once you look at what those threads were doing.

I also like your point that concurrency discussions often get reduced to a single metric. The raw thread count is easy to focus on, but thread state and workload characteristics usually tell a much more useful story.

A CPU-bound comparison is definitely on my list. My expectation is that the gap becomes much smaller there, which would reinforce the idea that virtual threads make waiting cheap, not computation cheap. That's where the distinction between concurrency and parallelism becomes really interesting in practice.

Adrian Ng • Jun 5

What stood out to me is how this highlights the gap between theory and reality. We often hear "threads are expensive" and stop there, but seeing 10,000 threads barely make a modern laptop sweat puts that advice into context. The most interesting part wasn't the number itself, it was the reminder to test assumptions instead of repeating them. Nice experiment and a fun read.

S M Tahosin • Jun 5

I couldn't agree more. The phrase "threads are expensive" isn't wrong, but it's often repeated without enough context.

What surprised me most was how different the actual result was from the picture I had in my head before running the test. That's exactly why I love small experiments like this. They have a way of exposing assumptions we didn't even realize we were carrying around.

Daniel Markus • Jun 5

This was a fun reminder that many of us still think about Java threads using rules from a different era. The most interesting part wasn't that 10,000 threads could be created, but how little impact it had when those threads weren't actively doing work.

It also highlights an important distinction between thread count and actual concurrency pressure. Numbers alone can be misleading. Thanks for sharing a simple experiment that challenges assumptions instead of repeating them.

S M Tahosin • Jun 5

Exactly. I think that's the key distinction people often miss. A large thread count sounds scary until you look at what those threads are actually doing.

The experiment was less about proving that 10,000 is a magic number and more about questioning an assumption I've seen repeated for years. Sometimes the mental model becomes outdated long before we realize it.

Hani Lieu • Jun 9

Really cool experiment. It's amazing how something that used to feel impossible is now running comfortably on a regular laptop thanks to virtual threads.

S M Tahosin • Jun 9

That's what surprised me too. A few years ago, "10,000 threads on a laptop" would have sounded like a terrible idea. Virtual threads really change what feels practical for highly concurrent workloads.