Julien Dubois for Microsoft Azure

Posted on Apr 6, 2020

Spring Boot performance benchmarks with Tomcat, Undertow and Webflux

#java #azure #spring #performance

Tomcat vs Undertow vs Webflux

JHipster is used by thousands of people to generate production-ready Spring Boot applications. We've been using Undertow for years, with great success, but as we are planning for JHipster 7 we started discussing migrating away from Undertow.

This is the kind of discussion which happens very frequently in the JHipster community: with so many people contributing, we often test and try alternatives to our current approach.

We ended up discussing 3 different application servers:

Undertow, from Red Hat/IBM: it is known for being lightweight, and we have years of (good) experience with it.
Tomcat, from the Apache Software Foundation: by far the most popular option, and also the default embedded server in Spring Boot.
Webflux, from VMware: this isn't really an application server, but Spring Webflux running on top of Netty. It requires using reactive APIs, which are supposed to provide better performance and scalability. It's a whole different approach, which JHipster also supports.

The test applications

I created a performance benchmarks GitHub repository, with applications generated by JHipster.

Those applications are more complex and realistic than simple "hello, world" applications created with Spring Boot. For instance, they use Spring Security and Spring Boot Actuator: those libraries will impact application start-up time and performance, but they are what you would use in the real world.

However, they are not as complex as they could be: I didn't configure a database or an application cache. Those would make running performance tests a lot more complicated without adding any specific value: as we would use the same drivers or caching solution with all three application servers, we would end up testing the same things.

In the GitHub repository, you will find 4 directories: one for each application, and one with the performance tests.

Using Azure Virtual Machines for the test

As we needed to set up a test environment, using the cloud was obviously the best solution! For this test, I created 2 virtual machines, connected through their own private network.

I used:

2 Azure Virtual Machines, using their default configuration called "Standard D2s v3" (2 vCPUs, 8 GiB memory).
1 private Azure Virtual Network.

One of the machines was hosting the application server, and the other one was used to run the performance test suite.

The VMs were created using the Ubuntu image provided by default by Azure, on which I installed the latest LTS AdoptOpenJDK JVM (openjdk version "11.0.6" 2020-01-14).

To configure your virtual machines easily, I maintain this script which you might find useful: https://github.com/jdubois/jdubois-configuration.

Start-up time

Start-up time is all the rage now, but in JHipster we usually give more importance to runtime performance. For example, that's why we use the Jackson Afterburner module: it slows down start-up, but gives about 10% better runtime performance than a "normal" Spring Boot application.

Here are the start-up times measured over 10 rounds for each application server:

Round	Undertow (s)	Tomcat (s)	Webflux (s)
1	4.879	5.237	4.285
2	4.847	5.125	4.225
3	4.889	5.103	4.221
4	5.013	5.129	4.232
5	4.840	5.134	4.271
6	5.007	5.141	4.191
7	4.868	5.214	4.147
8	4.826	5.032	4.251
9	4.856	5.069	4.274
10	4.908	5.078	4.128
Mean	4.8933	5.1262	4.2225
Difference vs. Undertow	baseline	+4.76%	-13.71%

As expected, Undertow starts faster than Tomcat, although the difference is quite small (under 5%). Webflux, however, is the fastest to start, taking about 14% less time than Undertow.
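To double-check the "Mean" and "Difference" rows, here is a small Java sketch that recomputes them from the ten rounds, using Undertow as the baseline. The class and method names are hypothetical, not taken from the benchmark repository.

```java
import java.util.Arrays;

/**
 * Recomputes the "Mean" and "Difference" rows of the start-up table.
 * Hypothetical helper: names are illustrative, not from the benchmark repository.
 */
final class StartupStats {
    private StartupStats() {}

    /** Arithmetic mean of start-up times (in seconds) measured over several rounds. */
    static double mean(double[] samples) {
        return Arrays.stream(samples)
                .average()
                .orElseThrow(() -> new IllegalArgumentException("no samples"));
    }

    /** Percentage difference of value relative to baseline: (value - baseline) / baseline * 100. */
    static double percentDifference(double value, double baseline) {
        return (value - baseline) / baseline * 100.0;
    }
}
```

Calling `percentDifference` with the Tomcat and Webflux means against the Undertow mean gives about +4.76% and -13.71%, which matches the table.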

The runtime performance test

To measure runtime performance, we needed a dedicated test suite.

The performance tests were written in Scala for the Gatling load-testing tool. They are pretty simple (we are just doing POST and GET requests), and are available in the GitHub repository.

This test does the following:

Each user does 100 GET requests and 100 POST requests, every 1.5 seconds.
10,000 users run those requests, with a ramp-up of 1 minute.

The objective here is to stay under 5,000 requests per second: above that level, you usually need some specific OS tuning.
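With a fixed population of virtual users, throughput and response time are linked: the interactive response time law for a closed workload states R = N / X - Z, where N is the number of users, X the throughput and Z the think time. The Java sketch below applies it as a rough sanity check. It assumes (our reading, not confirmed by the Gatling simulation) that each virtual user sends one GET and one POST back to back, then pauses 1.5 seconds, and that all 10,000 users are active at steady state; the class name is hypothetical.

```java
/**
 * Closed-workload sanity check based on the interactive response time law, R = N / X - Z.
 * Assumption (not confirmed by the Gatling simulation): each virtual user sends
 * requestsPerCycle requests back to back, then pauses for thinkTimeSeconds,
 * and all users are active at steady state. Class and method names are hypothetical.
 */
final class ClosedLoadModel {
    private ClosedLoadModel() {}

    /**
     * Estimates the mean response time of one request, in seconds, from the measured throughput.
     * One cycle lasts requestsPerCycle * R + Z seconds, so the request throughput is
     * X = users * requestsPerCycle / (requestsPerCycle * R + Z); solving for R gives the formula below.
     */
    static double meanResponseTime(int users, int requestsPerCycle,
                                   double thinkTimeSeconds, double requestsPerSecond) {
        if (users <= 0 || requestsPerCycle <= 0 || requestsPerSecond <= 0) {
            throw new IllegalArgumentException("users, requests per cycle and throughput must be positive");
        }
        // Time one user needs to complete a full cycle at the observed throughput.
        double cycleSeconds = (double) users * requestsPerCycle / requestsPerSecond;
        return (cycleSeconds - thinkTimeSeconds) / requestsPerCycle;
    }
}
```

With 10,000 users, 2 requests per cycle, a 1.5 second pause and the roughly 2,700 requests per second measured for Undertow below, `meanResponseTime(10_000, 2, 1.5, 2_700)` gives about 2.95 seconds, in the same range as the "about 3 seconds" reported for Undertow. A different reading of the scenario (for example, a pause after every request) would change this estimate.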

Undertow performance benchmarks

The Undertow results were quite good: it handled the whole load without a single failed request, delivering about 2,700 requests per second:

Response times were quite slow, with nearly all users waiting about 3 seconds for a response. However, they were also quite stable (or "fair" to all users), as the 50th percentile is not far from the 99th percentile, or even from the "max" time:
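The "fairness" argument compares percentiles of the response-time distribution. Gatling computes these itself, possibly with a different method; as an illustration only, here is the textbook nearest-rank percentile in Java (the class name is hypothetical), which you can use to compare the 50th and 99th percentiles of any set of response times.

```java
import java.util.Arrays;

/**
 * Textbook nearest-rank percentile, for illustration only: Gatling computes
 * its own percentiles. The class name is hypothetical.
 */
final class Percentiles {
    private Percentiles() {}

    /** Smallest sample such that at least p percent of the samples are less than or equal to it. */
    static double nearestRank(double[] samples, double p) {
        if (samples.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        double[] sorted = samples.clone(); // keep the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[rank - 1];
    }
}
```

When all response times are close to each other, `nearestRank(times, 50)` and `nearestRank(times, 99)` are close too, which is the "fair" behavior described above; a long tail of slow responses pushes the 99th percentile far above the median.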

Tomcat performance benchmarks

About 5% of Tomcat's requests failed:

Those failures explain why the graph below looks so irregular. Tomcat was also only delivering about 2,100 requests per second, compared to 2,700 requests per second for Undertow:

Last but not least, response times were good for about 10% of the requests, but much worse than Undertow's at the 95th and 99th percentiles, which shows that Tomcat could not handle all requests correctly. That's also why it had such a bad standard deviation (2.76 seconds!):
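A large standard deviation means that response times are widely spread around their mean, which is what a minority of very slow requests produces even when most requests are fast. This Java sketch uses the population formula; Gatling's own computation may differ in details, and the class name is hypothetical.

```java
/**
 * Population standard deviation of a set of response times.
 * Illustrative sketch; the class name is hypothetical.
 */
final class Spread {
    private Spread() {}

    static double populationStdDev(double[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        // First pass: arithmetic mean.
        double mean = 0;
        for (double s : samples) mean += s;
        mean /= samples.length;
        // Second pass: mean of squared deviations from the mean.
        double sumSquares = 0;
        for (double s : samples) sumSquares += (s - mean) * (s - mean);
        return Math.sqrt(sumSquares / samples.length);
    }
}
```

For example, four identical response times give a standard deviation of 0, while adding a single slow outlier to an otherwise uniform set is enough to raise it sharply.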

Webflux performance benchmarks

About 1% of Webflux's requests failed:

Those failures happened at the beginning of the test, after which the server handled the load correctly: it seems to have had trouble coping with the traffic growth because it was quite sudden, and then it stabilized.

Once stabilized, however, Webflux showed some strange variations, which is why we see all those peaks in the blue graph below: it would suddenly go from handling nearly 5,000 requests/second to less than 1,000 requests/second. On average, it handled a bit more than 2,700 requests/second, the same as Undertow, but with big variations that Undertow didn't have.

The variations noticed in the previous graph also explain why, compared to Undertow, Webflux has a lower 50th percentile but a higher 95th percentile. That's also why its standard deviation is much worse:

Conclusion

Undertow definitely had impressive results in those tests! Compared to Tomcat, it started faster, handled more load, and had a far more stable throughput. Compared to Webflux, which has a completely different programming model, the difference was smaller: Webflux started faster, but about 1% of its requests failed at the beginning of the test. It seems to have had some trouble handling the initial load, but that wasn't a huge issue.

In JHipster, this is probably one of the many choices we have made that make JHipster applications much faster and more stable than classic Spring Boot applications. So this performance test will definitely weigh heavily in our decision to keep Undertow or move away from it. If you'd like to participate in the discussion, because you ran more tests or have good insights, please don't hesitate to comment on our "migrating away from Undertow" ticket, or on this blog post.

Top comments (6)

undqurek • Apr 6 '23

To get much better results, we can also use the APR native libraries: dirask.com/posts/Spring-Boot-2-exa...

Advantages:

the motivation for using the APR native libraries is better connection-management performance,
with HTTPS, we can enable Brotli compression, which is better than gzip,
with HTTP/2 enabled, we can get better connection and data-transmission performance than with HTTP/1.1.

Henri Gomez • Apr 7 '20

Julien, could you share the configuration details of Tomcat, Undertow and Webflux?

For Tomcat, there are so many ways to configure and tune it for high load that it would deserve a look.

Thanks

Julien Dubois • Apr 7 '20

All the code is in the GitHub repo - those are basically the “normal” configurations that come out-of-the-box with Spring Boot.

Henri Gomez • Apr 8 '20

For Tomcat, this one?

github.com/jdubois/jhipster-benchm...

Sorry, but I'm missing some information:

Tomcat version used
Connector used: APR / NIO / NIO2
Connector tuning

Julien Dubois • Apr 8 '20

It is this configuration file, but not this line: in fact there is no such line, as it's the default configuration that comes with Spring Boot.
It should be Tomcat 9.0.31, and everything else is by default.

Henri Gomez • Apr 8 '20

So basically, what you demonstrate is not Tomcat vs Undertow vs Webflux performance but Spring default for Tomcat vs Spring default for Undertow vs Spring default for Webflux.

You are comparing Spring's default settings for each servlet engine rather than the servlet engines themselves; it would be fair to mention this.

DEV Community
