DEV Community

Cover image for Concurrency in modern programming languages: Rust vs Go vs Java vs Node.js vs Deno vs .NET 6
Deepu K Sasidharan
Deepu K Sasidharan

Posted on • Edited on • Originally published at deepu.tech

Concurrency in modern programming languages: Rust vs Go vs Java vs Node.js vs Deno vs .NET 6

Originally published at deepu.tech.

This is a multi-part series where I'll discuss concurrency in modern programming languages. I will be building and benchmarking a concurrent web server, inspired by the example from the Rust book, in popular languages like Rust, Go, JavaScript (NodeJS), TypeScript (Deno), Kotlin, and Java to compare concurrency and its performance between these languages/platforms. The chapters of this series are as below.

  1. Introduction
  2. Concurrent web server in Rust
  3. Concurrent web server in Golang
  4. Concurrent web server in JavaScript with NodeJS
  5. Concurrent web server in TypeScript with Deno
  6. Concurrent web server in Java with JVM
  7. Comparison and conclusion of benchmarks

What is concurrency

Concurrency is one of the most complex aspects of programming, and depending on your language of choice, the complexity can be anywhere from "that looks confusing" to "what black magic is this".

Concurrency is the ability where multiple tasks can be executed in overlapping time periods, in no specific order without affecting the final outcome. Concurrency is a very broad term and can be achieved by multi-threading, parallelism, and/or asynchronous processing.

concurrency

First, I suggest you read the introduction post to understand this post better.

Benchmarking & comparison

In the previous posts, I built a simple web server in Rust, Go, Node.js, Deno, and Java. I kept it as simple as possible without using external dependencies as much as possible. I also kept the code similar across languages. In this final post, we will compare the performance of all these implementations to see which language offers the best performance for a concurrent web server.

If the language supports both asynchronous and multi-threaded concurrency, we will try both and a combination of both and pick the best performer for the comparison. The complexity of the application will hence depend on language features and language complexity. We will use whatever the language provides to make concurrency performance as good as possible without over-complicating stuff. The web server will just serve one endpoint, and it will add a sleep of two seconds on every tenth request. This will simulate a more realistic load, IMO.

We will use promises, thread pools, and workers if required and if the language supports it. We won't use any unnecessary I/O in the application.

The code implementations are probably not the best possible; if you have a suggestion for improvement, please open and issue or PR on this repository. Further improvements possible are:

  • Use a thread pool for Java multi-threaded version
  • Use a Java webserver library
  • Use createReadStream for Node.js
  • Use Warp, Rocket or actix-web for Rust Added a Rust actix-web sample to comparison

Disclaimer: I'm not claiming this to be an accurate scientific method or the best benchmark for concurrency. I'm pretty sure different use cases will have different results, and real-world web servers will have more complexity that requires communication between concurrent processes affecting performance. I'm just trying to provide some simple base comparisons for a simple use case. Also, my knowledge of some languages is better than others; hence I might miss some optimizations here and there. So please don't shout at me. If you think the code for a particular language can be improved out of the box to enhance concurrency performance, let me know. If you think this benchmark is useless, well, please suggest a better one :)

Update: Despite the above disclaimer, people were still mad at me for using thread.sleep to simulate blocking and for using ApacheBench for this benchmark. I have since updated the post with more benchmarks using different tools. It's still not scientific or the best way to benchmark concurrency. This is just me, doing experiments. If you have better ideas, please feel free to use the code and publish a follow-up or comment with your results, and I'll update the post with it and attribute you.

All the implementations used in this comparison can be found in the nosleep branch of this GitHub repository.

Benchmarking conditions

These will be some of the conditions I'll use for the benchmark.

  • The latest stable release versions of language/runtimes available are used, and as of writing, those are:
    • Rust: 1.58.1-Stable
    • Go: 1.17.6
    • Java: OpenJDK 17.0.2
    • Node.js: 17.4.0
    • Deno: 1.18.1
    • .NET: 6.0.100
  • Update: Thread.sleep has been removed from all implementations.
  • We will be using external dependencies only if that is the standard recommended way in the language.
    • latest versions of such dependencies as of writing will be used
  • We are not going to look at improving concurrency performance using any configuration tweaks
  • Update: Many people pointed out that ApacheBench is not the best tool for this benchmark. I have hence also included results from wrk and drill
  • We will use ApacheBench for the benchmarks with the below settings:
    • Concurrency factor of 100 requests
    • 10000 total requests
    • The benchmark will be done ten times for each language with a warmup round, and the mean values will be used.
    • ApacheBench version on Fedora: httpd-tools-2.4.52-1.fc35.x86_64
    • Command used: ab -c 100 -n 10000 http://localhost:8080/
  • All the benchmarks are run on the same machine running Fedora 35 on an Intel i9-11900H (8 core/16 thread) processor with 64GB memory.
    • The wrk and drill clients were run from another similar machine on the same network and also from the same computer; the results were more or less the same; I used the results from the client computer for comparisons.

Comparison parameters

I'll be comparing the below aspects related to concurrency as well.

  • Performance, based on benchmark results
  • Community consensus
  • Ease of use and simplicity, especially for complex use cases
  • External libraries and ecosystem for concurrency

Benchmark results

Updated: I have updated the benchmark results with the results from wrk, drill and also updated previous results from ApacheBench after tweaks suggested by various folks.

Update 2: There is a .NET 6 version in the repo now, thanks to srollinet for the PR. Benchmarks updated with the .NET results.

Update 3: Rust using actix-web and Java undertow is now included in the wrk and drill benchmarks. The implementations were simplified to return just a string instead of doing a file I/O for these, and hence they are shown as a separate set. I started this series as a concurrency in languages experiment. Now, this feels like a benchmark of web server frameworks; while concurrency is an important aspect of these, I'm not sure if the results mean anything from a concurrency of the language aspect.

Results from wrk

Benchmark using wrk with the below command (Threads 8, Connections 500, duration 30 seconds):



wrk -t8 -c500 -d30s http://127.0.0.1:8080


Enter fullscreen mode Exit fullscreen mode

wrk benchmarks with Go HTTP version

Update comparison of Go HTTP, Rust actix-web, Java Undertow, and .NET 6

wrk benchmarks with web servers

The Go, Rust, and Java web server versions blow everything out of the water when it comes to req/second performance. If we remove it, we get a better picture as below.

wrk benchmarks without web servers

Results from drill

Benchmark using drill with concurrency 1000 and 1 million requests

drill benchmark 1

Update comparison of Go HTTP, Rust actix-web, Java Undertow, and .NET 6

drill benchmark 1 with web servers

Benchmark using drill with concurrency 2000 and 1 million requests

drill benchmark 2

Update comparison of Go HTTP, Rust actix-web, Java Undertow, and .NET 6

drill benchmark 2 with web servers

Previous ApacheBench results with thread blocking

The average values for different metrics with a thread.sleep every ten requests across ten benchmark runs are as below:

Apache bench average

You can find all the results used in the GitHub repo

Conclusion

Based on the benchmark results, these are my observations.

Benchmark observations

Since recommendations based on benchmarks are hot topics, I'll just share my observations, and you can make decisions yourself.

  • For the HTTP server benchmark using wrk, Go HTTP wins in request/sec, latency, and throughput, but it uses more memory and CPU than Rust. This might be because Go has one of the best built-in HTTP libraries, and it's extremely tuned for the best possible performance; hence it's not fair to compare that with the simple TCP implementations I did for Java and Rust. But you can compare it to Node.js and Deno as they also have standard HTTP libs that are used here for benchmarks. Update: I have now compared Go HTTP to Rust actix-web and Java Undertow, and surprisingly Undertow performs better, and actix-web comes second. Probably a Go web framework, like Gin, will come closer to Undertow and actix-web.
  • The Go TCP version is a fair comparison to the Rust and Java implementations, and in this case, Both Java and Rust outperforms Go and hence would be logical to expect third party HTTP libraries in Rust and Java that can compete with Go and if I'm a betting person I would bet that there is a Rust library that can outperform Go.
  • Resource usage is a whole different story, Rust seems to use the least memory and CPU consistently in all the benchmarks, while Java uses the most memory, and Node.js multi-threaded version uses the most CPU.
  • Asynchronous Rust seems to perform worst than multi-threaded Rust implementations.
  • In the benchmarks using drill, the Asynchronous Java version outperformed Rust and was a surprise to me.
  • Java and Deno have more failed requests than others.
  • When concurrent requests are increased from 1000 to 2000, most implementations have a very high failure rate. The Go HTTP and Rust Tokio versions have nearly 100% failure rates, while multi-threaded Node.js have the least failure and have good performance at that concurrency level but with high CPU usage. It runs multiple versions of V8 for multi-threading, which explains the high CPU use.
  • Overall, Node.js still seems to perform better than Deno.
  • Another important takeaway is that benchmarking tools like ApacheBench, wrk, or drill seem to offer very different results, and hence micro-benchmarks are not as reliable as ultimate performance benchmarks. Based on the actual use case and implementation-specific details, there could be a lot of differences. Thanks to Eamon Nerbonne for pointing it out.
  • Apache Benchmarks run on versions with and without thread.sleep doesn't say much as the results are similar for all implementations, and it might be due to limitations of the ApacheBench tool. Hence as many people pointed out, I'm disregarding them.

For more comprehensive benchmarks for web frameworks, I recommend checking out TechEmpower's Web framework benchmarks

With ApacheBench, as you can see, there isn't any significant difference between the languages when it comes to total time taken for 10k requests for a system with considerable thread blocking, which means for a real-world use case, the language choice isn't going to be a huge factor for concurrency performance. But of course, if you want the best possible performance, then Rust clearly seems faster than other languages as it gives you the highest throughput, followed by Java and Golang. JavaScript and TypeScript are behind them, but not by a considerable margin. The Go version using the built-in HTTP server is the slowest of the bunch due to inconsistent performance across runs, probably due to garbage collection (GC) kicking in, causing spikes. Also interesting is to see the difference between the multi-threaded and asynchronous approaches. While for Rust, multi-threaded implementation performs the best by a slight margin, the asynchronous version performs slightly better for Java and JavaScript. But none of the differences is significant enough to justify suggesting one approach over another for this particular case. But in general, I would recommend using the asynchronous approach if available as it's more flexible without some of the limitations you might encounter with threads.

Community consensus

The community consensus when it comes to concurrency performance is quite split. For example, both Rust and Go communities claim to be the best in concurrency performance. From personal experience, I find them relatively close in performance, with Rust having a slight lead over Go. The Node.js ecosystem was built over the promise of asynchronous concurrency performance, and there are testimonials of huge performance improvements when switching to Node.js. Java also boasts of real-world projects serving millions of concurrent requests without any issues; hence it's hard to take a side here.

Another general observation is that Rust was quite consistent in terms of performance across runs while all other languages had some variance, especially when GC kicks in.

Simplicity

While performance is an important aspect, ease of use and simplicity is also very important. I think it's also important to differentiate between asynchronous and multi-threaded approaches.

Asynchronous: I personally find Node.js and Deno the simplest and easy-to-use platforms for async concurrency. Golang would be my second choice as it's also easy to use and simple without compromising on features or performance. Rust follows it as it is a bit more complex as it has more features and needs getting used to. I would rate Java last as it requires much more boilerplate, and doing asynchronous programming is more complex than in others. I hope project Loom fixes that for Java.

Multi-threaded: For multi-threaded concurrency, I will put Rust first as it's packed with features, and doing multi-threading is easy and worry-free in Rust due to memory and thread-safety. You don't have to worry about race conditions and such. I'll put Java and Go second here. Java has a mature ecosystem for multi-threading and is not too difficult to use. Go is very easy to use, but you don't have a lot of control over OS threads else I would rate Go higher than Java. Finally, there are multi-threading capabilities in Node.js and Deno, but they are not as flexible as other languages; hence I'll put them last.

Ecosystem

Rust has the best ecosystem for concurrency, in my opinion, followed by Java and Golang, which have matured options. Node.js and Deno, while not as good as others, offer a descent ecosystem as well.


If you like this article, please leave a like or a comment.

You can follow me on Twitter and LinkedIn.

Latest comments (62)

Collapse
 
der_gopher profile image
Alex Pliutau

Rust for Gophers with John Arundel packagemain.tech/p/rust-for-gophers

Collapse
 
der_gopher profile image
Alex Pliutau

We'll be publishing a post soon about comparing Rust and Go in 2024, stay tuned here - packagemain.tech/

Collapse
 
oliszymanski profile image
Oli | Developer

These results are very interesting! I wonder how will they look like when the full version of Zig will come out. I'm pretty sure that it'll knock out Rust

Collapse
 
mtrantalainen profile image
Mikko Rantalainen

Updates to benchmark testing were great!

I would recommend also testing servers using one (or more) computer as client and another as server. For example, you were testing localhost connections only which doesn't represent real world performance with real sockets that well. In addition, wrk was running on 8 CPU cores so unless you had reserved additional identical number of physical cores for all the test servers, asynchronous servers would get extra boost compared to multi-threaded servers due not over-booking the CPU that badly.

With real sockets I'd expect the server with lowest latency (Rust with async + multi-threaded) to get the best results.

If your benchmark software supports it, usually a better way to test servers is to decide timeout for a request (say 50 ms) and then test how many request/s you can execute until you start to get timeouts. Some server software is really unfair and fails to serve older request first to keep worst case latency sensible. This kind of testing would preferably ramp the request rate slowly until the timeout is triggered for a request. Best output for this kind of test would be a graph with request/s on horizontal axis and worst case latency on vertical axis.

If you end the test on first timeout, I'd expect Java servers to fail early because those often stall during GC and if your timeout is pretty small, a single world-stop GC may be enough to ruin the run. It's possible to create Java servers that do not exhibit stalls but the most simple implementation often fails on that.

Collapse
 
assertnotnull profile image
Patrice Gauthier

Not mentioned but Elixir parallelism is something else for it being based on the Erlang VM called BEAM.

Erlang was built as a “concurrency-oriented programming language.” The Erlang VM can create and manage its own lightweight, internal processes, and was designed to run millions of them. Erlang processes are lighter weight than threads, but independent like OS processes, so they can’t corrupt one another’s data. Instead of sharing memory, they use message passing to coordinate their work.

The Erlang VM runs multiple schedulers — one per CPU core — and ensures that its processes are efficiently spread across them. This means you get the full benefit of that multi-core server. Also, if a process runs long enough to need garbage collection (and many do not), other processes do not have to pause while that happens.

Referenced from this article

The concurrency system has been tested with a single really buff machine to handle 2 millions concurrent websockets.

Also in this Erlang VM there's the OTP system you can do cron job internally, have caching without Redis and have processes restart when their parent process notice they did crash.

Collapse
 
pkolaczk profile image
Piotr Kołaczkowski

How did you compile each program?

Collapse
 
deepu105 profile image
Deepu K Sasidharan

Was compiled using their native compilers in production mode of available

Collapse
 
pkolaczk profile image
Piotr Kołaczkowski

When publishing benchmarks you have to give exact steps to repeat, so exact command line parameters and compiler flags used to compile should be given.

Also this:

When concurrent requests are increased from 1000 to 2000, most implementations have a very high failure rate. The Go HTTP and Rust Tokio versions have nearly 100% failure rates

Suggests that something is way off either with your setup or your code. Async Rust and async Go are capable of running hundred thousands concurrent connections.

Thread Thread
 
deepu105 profile image
Deepu K Sasidharan

All the commands can be found in the code repository mentioned. And for breakdown at 2000 concurrency, yes it's possible that the code is a problem. Do you see anything obvious?

Thread Thread
 
deepu105 profile image
Deepu K Sasidharan

First, I thought, It could also be the tool used itself which fails at those rates but then Node.js with multiple workers seems to work better so I'm not sure anymore

Collapse
 
karanpratapsingh profile image
Karan Pratap Singh

Great post! I really like how easy is to right concurrent code in Go!

Collapse
 
federicoviscomi profile image
federico viscomi

What about the memory footprint? It seems like it would be an important feature to consider

Collapse
 
deepu105 profile image
Deepu K Sasidharan

I have updated the benchmarks with more data. WDYT now?

Collapse
 
mtrantalainen profile image
Mikko Rantalainen

What was the actual process doing? It seems that every request had 200 ms baseline delay and for example Rust took 0.7 ms over that vs Node.js taking 4-7 ms. If you get rid of that 200 latency, Rust should be 5-10x faster than Node.js in this test.

Collapse
 
deepu105 profile image
Deepu K Sasidharan

I'm gonna look into that

Collapse
 
stuta profile image
stuta

When you add 2 second delay every 10 requests you make the comparison totally meaningless. You are mesuring delays, not the code.

Reading the file in every loop mesures reading from disk, not actual program performance.

Also, ab is not a good tool for measuring. Usually you are measuring the performace of ab, not the system that can be 20 times faster than what ab can measure. Use github.com/wg/wrk instead.

When I remove the delay then go program crashes when testing with wrk: Error reading:EOF.

Collapse
 
deepu105 profile image
Deepu K Sasidharan

I have updated the benchmarks with more data. WDYT now?

Collapse
 
stuta profile image
stuta

Good. Could you update the code in repo too - they contain sleep().

Thread Thread
 
deepu105 profile image
Deepu K Sasidharan

Its a different branch now (nosleep)

Collapse
 
robingoupil profile image
Robin Goupil

I was about to comment on the topic, thank you for pointing this out and not blindly trusting the internet.
The massive hint is the extremely similar performance between all languages. I'm sure the intent of this article was to help the community, but I hope the author will understand their mistake and update the results accordingly.

Collapse
 
deepu105 profile image
Deepu K Sasidharan

I have updated the benchmarks with more data. WDYT now?

Collapse
 
eamonnerbonne profile image
Eamon Nerbonne

It's remarkable too that none of the other commenters noted this. As the adage goes: just because you read it on the internet doesn't make it true! Good catch.

Collapse
 
deepu105 profile image
Deepu K Sasidharan

I have updated the benchmarks with more data. WDYT now?

Thread Thread
 
eamonnerbonne profile image
Eamon Nerbonne

I think it's a lot better! I suspect that some of the differences now are due to technical trivia of superficially irrelevant details you happened to choose when implementing these programs, but that's the real-world for you. The current data highlights much more clearly just how many req/sec any of this options can handle - because I think that's the real takeaway here; the web-frameworks themselves are unlikely to be a significant bottleneck in any real-world usecase; and if the language and/or framework matters for heavier, real workloads - well, that's the kind of thing you can't microbenchmark well; you need a real use case.

What the current data also highlights more clearly is just how finicky perf at this level is; e.g. the way the program using the go http stack apparently is much more efficient than the program you labelled TCP; or how wrk and drill results are quite different. And that's important to understand; microbenchmarks are notoriously flaky and sensitive to all kinds of details you don't actually care about. Taking a microbenchmark to mean that task X takes time Y is usually the wrong way to think about it - it takes time Y only in the hyper-specific circumstances of the test - but generalizing a specific result is quite error-prone.

Thread Thread
 
deepu105 profile image
Deepu K Sasidharan

I think this is an excellent takeaway

Collapse
 
deepu105 profile image
Deepu K Sasidharan • Edited

I did add a disclaimer that this is a simple concurrency benchmark. I don't agree that its meaningless as I'm comparing exact same impl across languages to see if the language/platform makes any difference, sleep was added to introduce a concurrency bottleneck. This is not a HTTP performance comparison, its a concurrency comparison and for that I think AB is as good as any other tool. I'll try wrk and post the results.

Collapse
 
mtrantalainen profile image
Mikko Rantalainen

Instead of sleeping 2 sec or even 200 ms, you should sleep 3-8 ms to simulate access to fast SQL server. Then you would have meaningful request rate.

Thread Thread
 
deepu105 profile image
Deepu K Sasidharan

I have updated the benchmarks with more data. WDYT now?

Collapse
 
stuta profile image
stuta • Edited

System sleep is not exactly 2 seconds, it can vary wildly.

Basically you are comparing nonsense. This is like comparing people who jump at top of 2000m mountain and and you tell that someone jumped 2000.3 meters and someone else jumped 2000.26 meters and you tell that "results are almost same". And acctually in every jump the mountain height varies a lot.

Thread Thread
 
stuta profile image
stuta • Edited

If you want ot test concurrency then take the web code away. Give every program same amount of loops to run. And let the tests run 5 seconds or preferably more, anything less is not statistically valid.

Collapse
 
eamonnerbonne profile image
Eamon Nerbonne

All you demonstrated is that thread sleep works in all languages, and that overhead is significantly smaller than 200ms. You did not really benchmark the various language / server combos.

Thread Thread
 
deepu105 profile image
Deepu K Sasidharan

If it makes you all happy I'll update the post with numbers from the code without sleep. Oh and there are cases where a request takes more than 2 seconds, I have performance tuned many such systems when I was working for enterprise companies. Also do suggest better approach to simulate a thread blocking request

Thread Thread
 
deepu105 profile image
Deepu K Sasidharan

And I reiterate again if it wasn't clear from the title, intro or all the previous posts. I was trying to benchmark performance of concurrency and not web server performance. They are related but not the same