Deepu K Sasidharan


Concurrency in modern programming languages: Rust vs Go vs Java vs Node.js vs Deno vs .NET 6

Originally published at deepu.tech.

This is a multi-part series where I'll discuss concurrency in modern programming languages. I will be building and benchmarking a concurrent web server, inspired by the example from the Rust book, in popular languages like Rust, Go, JavaScript (NodeJS), TypeScript (Deno), Kotlin, and Java to compare concurrency and its performance between these languages/platforms. The chapters of this series are as below.

  1. Introduction
  2. Concurrent web server in Rust
  3. Concurrent web server in Golang
  4. Concurrent web server in JavaScript with NodeJS
  5. Concurrent web server in TypeScript with Deno
  6. Concurrent web server in Java with JVM
  7. Comparison and conclusion of benchmarks

What is concurrency

Concurrency is one of the most complex aspects of programming, and depending on your language of choice, the complexity can be anywhere from "that looks confusing" to "what black magic is this".

Concurrency is the ability to execute multiple tasks in overlapping time periods, in no specific order, without affecting the final outcome. Concurrency is a very broad term and can be achieved by multi-threading, parallelism, and/or asynchronous processing.


First, I suggest you read the introduction post to understand this post better.

Benchmarking & comparison

In the previous posts, I built a simple web server in Rust, Go, Node.js, Deno, and Java. I kept it as simple as I could, avoiding external dependencies as far as possible. I also kept the code similar across languages. In this final post, we will compare the performance of all these implementations to see which language offers the best performance for a concurrent web server.

If the language supports both asynchronous and multi-threaded concurrency, we will try both and a combination of both and pick the best performer for the comparison. The complexity of the application will hence depend on language features and language complexity. We will use whatever the language provides to make concurrency performance as good as possible without over-complicating stuff. The web server will just serve one endpoint, and it will add a sleep of two seconds on every tenth request. This will simulate a more realistic load, IMO.

We will use promises, thread pools, and workers if required and if the language supports it. We won't use any unnecessary I/O in the application.
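
To make this concrete, here is a minimal, illustrative sketch (in Rust, in the spirit of the Rust book example this series is based on) of the kind of handler described above: a thread-per-connection TCP server that blocks for two seconds on every tenth request. This is not the exact code from the repository; the request counter and the names are my own simplification.

use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;
use std::time::Duration;

// Global request counter used to decide which requests get the artificial delay.
static REQUEST_COUNT: AtomicUsize = AtomicUsize::new(1);

fn handle_connection(mut stream: TcpStream) {
    let mut buffer = [0u8; 1024];
    let _ = stream.read(&mut buffer); // read the request; contents are ignored here

    // Simulate a slow request: block for two seconds on every tenth request.
    if REQUEST_COUNT.fetch_add(1, Ordering::SeqCst) % 10 == 0 {
        thread::sleep(Duration::from_secs(2));
    }

    let response = "HTTP/1.1 200 OK\r\nContent-Length: 13\r\n\r\nHello, world!";
    let _ = stream.write_all(response.as_bytes());
}

fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:8080")?;
    for stream in listener.incoming() {
        let stream = stream?;
        // One OS thread per connection; the real implementations also use
        // thread pools or async tasks depending on the variant.
        thread::spawn(move || handle_connection(stream));
    }
    Ok(())
}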

The code implementations are probably not the best possible; if you have a suggestion for improvement, please open an issue or PR on this repository. Possible further improvements are:

  • Use a thread pool for Java multi-threaded version
  • Use a Java webserver library
  • Use createReadStream for Node.js
  • Use Warp, Rocket, or actix-web for Rust (Update: a Rust actix-web sample has been added to the comparison)

Disclaimer: I'm not claiming this to be an accurate scientific method or the best benchmark for concurrency. I'm pretty sure different use cases will have different results, and real-world web servers will have more complexity that requires communication between concurrent processes affecting performance. I'm just trying to provide some simple base comparisons for a simple use case. Also, my knowledge of some languages is better than others; hence I might miss some optimizations here and there. So please don't shout at me. If you think the code for a particular language can be improved out of the box to enhance concurrency performance, let me know. If you think this benchmark is useless, well, please suggest a better one :)

Update: Despite the above disclaimer, people were still mad at me for using thread.sleep to simulate blocking and for using ApacheBench for this benchmark. I have since updated the post with more benchmarks using different tools. It's still not scientific or the best way to benchmark concurrency. This is just me, doing experiments. If you have better ideas, please feel free to use the code and publish a follow-up or comment with your results, and I'll update the post with it and attribute you.

All the implementations used in this comparison can be found in the nosleep branch of this GitHub repository.

Benchmarking conditions

These will be some of the conditions I'll use for the benchmark.

  • The latest stable release versions of the languages/runtimes available as of writing are used:
    • Rust: 1.58.1-Stable
    • Go: 1.17.6
    • Java: OpenJDK 17.0.2
    • Node.js: 17.4.0
    • Deno: 1.18.1
    • .NET: 6.0.100
  • Update: Thread.sleep has been removed from all implementations.
  • We will be using external dependencies only if that is the standard recommended way in the language.
    • latest versions of such dependencies as of writing will be used
  • We are not going to look at improving concurrency performance using any configuration tweaks
  • Update: Many people pointed out that ApacheBench is not the best tool for this benchmark. I have hence also included results from wrk and drill
  • We will use ApacheBench for the benchmarks with the below settings:
    • Concurrency factor of 100 requests
    • 10000 total requests
    • The benchmark will be done ten times for each language with a warmup round, and the mean values will be used.
    • ApacheBench version on Fedora: httpd-tools-2.4.52-1.fc35.x86_64
    • Command used: ab -c 100 -n 10000 http://localhost:8080/
  • All the benchmarks are run on the same machine running Fedora 35 on an Intel i9-11900H (8 core/16 thread) processor with 64GB memory.
    • The wrk and drill clients were run both from another similar machine on the same network and from the same computer; the results were more or less the same, so I used the results from the separate client machine for the comparisons.

Comparison parameters

I'll be comparing the below aspects related to concurrency as well.

  • Performance, based on benchmark results
  • Community consensus
  • Ease of use and simplicity, especially for complex use cases
  • External libraries and ecosystem for concurrency

Benchmark results

Update: I have updated the benchmark results with the results from wrk and drill, and also updated the previous ApacheBench results after tweaks suggested by various folks.

Update 2: There is a .NET 6 version in the repo now, thanks to srollinet for the PR. Benchmarks updated with the .NET results.

Update 3: Rust using actix-web and Java using Undertow are now included in the wrk and drill benchmarks. The implementations were simplified to return just a string instead of doing file I/O for these, and hence they are shown as a separate set. I started this series as an experiment in language concurrency. Now this feels more like a benchmark of web server frameworks; while concurrency is an important aspect of these, I'm not sure the results mean much from a language-concurrency perspective.

Results from wrk

Benchmark using wrk with the below command (Threads 8, Connections 500, duration 30 seconds):

wrk -t8 -c500 -d30s http://127.0.0.1:8080

wrk benchmarks with Go HTTP version

Update: comparison of Go HTTP, Rust actix-web, Java Undertow, and .NET 6

wrk benchmarks with web servers

The Go, Rust, and Java web server versions blow everything out of the water when it comes to req/second performance. If we remove them, we get a better picture, as below.

wrk benchmarks without web servers

Results from drill

Benchmark using drill with concurrency 1000 and 1 million requests

drill benchmark 1

Update: comparison of Go HTTP, Rust actix-web, Java Undertow, and .NET 6

drill benchmark 1 with web servers

Benchmark using drill with concurrency 2000 and 1 million requests

drill benchmark 2

Update: comparison of Go HTTP, Rust actix-web, Java Undertow, and .NET 6

drill benchmark 2 with web servers

Previous ApacheBench results with thread blocking

The average values for different metrics, with a thread.sleep on every tenth request, across ten benchmark runs are as below:

Apache bench average

You can find all the results used in the GitHub repo.

Conclusion

Based on the benchmark results, these are my observations.

Benchmark observations

Since recommendations based on benchmarks are hot topics, I'll just share my observations, and you can make decisions yourself.

  • For the HTTP server benchmark using wrk, Go HTTP wins in request/sec, latency, and throughput, but it uses more memory and CPU than Rust. This might be because Go has one of the best built-in HTTP libraries, and it's extremely tuned for the best possible performance; hence it's not fair to compare it with the simple TCP implementations I did for Java and Rust. But you can compare it to Node.js and Deno, as they also have standard HTTP libs that are used here for the benchmarks. Update: I have now compared Go HTTP to Rust actix-web and Java Undertow, and surprisingly Undertow performs better, with actix-web coming second. A Go web framework like Gin would probably come closer to Undertow and actix-web.
  • The Go TCP version is a fairer comparison to the Rust and Java implementations, and in this case, both Java and Rust outperform Go. Hence it's logical to expect third-party HTTP libraries in Rust and Java that can compete with Go, and if I were a betting person, I would bet that there is a Rust library that can outperform Go.
  • Resource usage is a whole different story: Rust seems to use the least memory and CPU consistently in all the benchmarks, while Java uses the most memory and the Node.js multi-threaded version uses the most CPU.
  • Asynchronous Rust seems to perform worse than the multi-threaded Rust implementations.
  • In the benchmarks using drill, the asynchronous Java version outperformed Rust, which was a surprise to me.
  • Java and Deno have more failed requests than others.
  • When concurrent requests are increased from 1000 to 2000, most implementations have a very high failure rate. The Go HTTP and Rust Tokio versions have nearly 100% failure rates, while the multi-threaded Node.js version has the fewest failures and good performance at that concurrency level, but with high CPU usage. It runs multiple instances of V8 for multi-threading, which explains the high CPU use.
  • Overall, Node.js still seems to perform better than Deno.
  • Another important takeaway is that benchmarking tools like ApacheBench, wrk, or drill seem to produce very different results, and hence micro-benchmarks like this are not reliable as definitive performance benchmarks. Based on the actual use case and implementation-specific details, there could be a lot of differences. Thanks to Eamon Nerbonne for pointing it out.
  • The ApacheBench runs on versions with and without thread.sleep don't say much, as the results are similar for all implementations; this might be due to limitations of the ApacheBench tool. Hence, as many people pointed out, I'm disregarding them.

For more comprehensive benchmarks of web frameworks, I recommend checking out TechEmpower's web framework benchmarks.

With ApacheBench, as you can see, there isn't any significant difference between the languages when it comes to the total time taken for 10k requests on a system with considerable thread blocking, which means that for a real-world use case, the language choice isn't going to be a huge factor in concurrency performance. But of course, if you want the best possible performance, then Rust clearly seems faster than the other languages, as it gives you the highest throughput, followed by Java and Golang. JavaScript and TypeScript are behind them, but not by a considerable margin. The Go version using the built-in HTTP server is the slowest of the bunch due to inconsistent performance across runs, probably caused by garbage collection (GC) kicking in and causing spikes. It is also interesting to see the difference between the multi-threaded and asynchronous approaches. While for Rust the multi-threaded implementation performs best by a slight margin, the asynchronous version performs slightly better for Java and JavaScript. None of the differences is significant enough to justify suggesting one approach over the other for this particular case, but in general I would recommend using the asynchronous approach if available, as it's more flexible and avoids some of the limitations you might encounter with threads.
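
To make the distinction concrete, here is a simplified Rust sketch of the two styles compared throughout this series: spawning OS threads versus spawning asynchronous tasks on a runtime. The async variant assumes Tokio (which the async Rust implementation in this series uses); the task counts and names are illustrative only.

use std::thread;
use std::time::Duration;

// Multi-threaded style: one OS thread per unit of work.
fn threaded() {
    let handles: Vec<_> = (0..4)
        .map(|i| {
            thread::spawn(move || {
                thread::sleep(Duration::from_millis(100)); // stand-in for blocking I/O
                println!("thread {i} done");
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
}

// Asynchronous style: many lightweight tasks multiplexed over a small thread pool.
async fn asynchronous() {
    let tasks: Vec<_> = (0..4)
        .map(|i| {
            tokio::spawn(async move {
                tokio::time::sleep(Duration::from_millis(100)).await; // non-blocking wait
                println!("task {i} done");
            })
        })
        .collect();
    for t in tasks {
        t.await.unwrap();
    }
}

// Assumes tokio = { version = "1", features = ["full"] } in Cargo.toml.
#[tokio::main]
async fn main() {
    threaded();
    asynchronous().await;
}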

Community consensus

The community consensus when it comes to concurrency performance is quite split. For example, both Rust and Go communities claim to be the best in concurrency performance. From personal experience, I find them relatively close in performance, with Rust having a slight lead over Go. The Node.js ecosystem was built over the promise of asynchronous concurrency performance, and there are testimonials of huge performance improvements when switching to Node.js. Java also boasts of real-world projects serving millions of concurrent requests without any issues; hence it's hard to take a side here.

Another general observation is that Rust was quite consistent in terms of performance across runs while all other languages had some variance, especially when GC kicks in.

Simplicity

While performance is an important aspect, ease of use and simplicity are also very important. I think it's also important to differentiate between asynchronous and multi-threaded approaches.

Asynchronous: I personally find Node.js and Deno the simplest and easiest-to-use platforms for async concurrency. Golang would be my second choice, as it's also easy to use and simple without compromising on features or performance. Rust follows, as it is a bit more complex, has more features, and needs some getting used to. I would rate Java last as it requires much more boilerplate, and doing asynchronous programming is more complex than in the others. I hope Project Loom fixes that for Java.

Multi-threaded: For multi-threaded concurrency, I will put Rust first as it's packed with features, and doing multi-threading is easy and worry-free in Rust due to memory and thread safety. You don't have to worry about race conditions and such. I'll put Java and Go second here. Java has a mature ecosystem for multi-threading and is not too difficult to use. Go is very easy to use, but you don't have a lot of control over OS threads; otherwise, I would rate Go higher than Java. Finally, there are multi-threading capabilities in Node.js and Deno, but they are not as flexible as in the other languages; hence I'll put them last.
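
As a small illustration of what "worry-free" means here (a generic sketch, not code from the benchmark repository): shared mutable state in Rust has to go through thread-safe wrappers such as Arc and Mutex, and handing non-thread-safe data to another thread is rejected at compile time rather than failing at runtime.

use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // Shared mutable state is wrapped in Arc (shared ownership) + Mutex (exclusive access).
    // Passing a plain &mut i32 or an Rc across threads would not compile.
    let counter = Arc::new(Mutex::new(0));

    let handles: Vec<_> = (0..8)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                *counter.lock().unwrap() += 1;
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }

    // Always prints 8; a data race is impossible by construction.
    println!("final count: {}", *counter.lock().unwrap());
}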

Ecosystem

Rust has the best ecosystem for concurrency, in my opinion, followed by Java and Golang, which have mature options. Node.js and Deno, while not as good as the others, offer a decent ecosystem as well.


If you like this article, please leave a like or a comment.

You can follow me on Twitter and LinkedIn.

Oldest comments (60)

Basti Ortiz • Edited

Interesting results! I didn't expect everyone to perform about the same. This is great!

I would hypothesize that this is due to the fact that network requests are mostly I/O-bound. That is to say, the CPU remains idle most of the time as it waits for the network to respond.

Therefore, the underlying runtime—namely Node.js for JavaScript, Tokio for Rust and Deno, etc.—is experimentally irrelevant for this use case, as you have shown in your data. It seems that under the hood, all runtimes manage to process the requests faster than the network/hardware can provide the bytes, hence the insignificant differences in the various time-based metrics. TL;DR: the network may be the bottleneck, not the languages.

With that said, I would be very interested in a follow-up post where you go beyond time-based metrics since they don't paint the full picture. Namely, I would like to cite Discord's case study on why they switched from Go to Rust.

In their article, the major performance gains mostly came from the absence of garbage collection, which you also briefly mentioned in your conclusion. Go's garbage-collected runtime caused large spikes in latency and CPU usage every two minutes or so, which ultimately proved to be unacceptable at Discord's scale. I highly recommend reading their thoughts on it. 👌

Anyway, what I'm trying to say is that I look forward to an investigation into other metrics beyond "requests-per-second". As Discord's engineering team has shown, this does not always paint the full picture. Data on CPU and memory usage would definitely make your series more comprehensive.

Nevertheless, this is an excellent write-up!

FJones

Aye, the findings about Go match my experience as well. It's very useful for static caches (e.g. ZIP code and address data), but horrible at LRU caches and the like. If you have an upper bound on your memory usage and know you can keep that in memory on your instance, it's great and super quick. If you need to free up memory and dynamically replace cache entries, it falls apart.

Deepu K Sasidharan

Yes, I fully agree, and that's why I added a disclaimer. This is a very simple benchmark; for a real-world use case there are considerations beyond this, and Rust has way more benefits than concurrency to win over Go. I would choose Rust over Go any day. And thanks for the Discord article, I didn't see that before; it's very interesting.

Gerard Klijs

It might be interesting to also measure resource use while the test is running. I did something like that earlier between Java and Rust, where CPU use was pretty comparable, but memory use with Rust was much lower.

Deepu K Sasidharan

Rust uses way fewer resources. Actually, that would be an interesting metric to look at. I know from experience that Rust uses way less memory than all the others for the same stuff. I wrote KDash in Rust, which is way more graphically intensive than kubectl, but it still uses 6-7 times less memory than kubectl. For memory usage my bet would be Rust < Go < Deno < Node.js < Java.

arunnabraham

Well, many of you miss a concurrent async extension of PHP called Swoole, or Workerman. It also shows impressive results, close to Golang.

Patrice Ferlet

I'm sorry, but this is one more time a biased argument for Rust. Speaking about threading control is OK, but what can we see in reality? You have no native access to an HTTP library with Rust, you have more lines of code to type, and it's not as simple to read as the Go TCP version.
More, you say that you don't have as much control over threading with Go... But goroutines are made to use concurrency or threading without the need to develop the switch. Then, if you want threading and control, you can use C inside Go, or there are packages for this, so you can avoid goroutines.
Rust is cool for memory management, and why not low-level development like in the Linux kernel. But it's way more complicated than Go for developing such an HTTP service. Instead of switching to Rust, I prefer to ask the Go creators to help with fixing garbage collection control, and to leave Go managing concurrency and threading with ease and efficiency.
Sorry for my answer, but I see too many pro-Rust articles with too much criticism of Go.

Piotr Kołaczkowski

The biggest selling point of Rust is, IMHO, fearless concurrency with a guarantee of no data races. So while Go (and JS and Java) programs may appear initially simpler to write, because they give a bit more freedom to the programmer, at the end of the day they are often not as easy to reason about. It is trivial to guarantee that a piece of code won't be called concurrently in Rust; I can see explicitly what is allowed to run concurrently and what is not, and if I try to invoke non-thread-safe code accidentally in a multithreaded context, it simply won't compile. Fixing a compile-time error vs fixing code that fails only once a week in production under heavy load: the choice is pretty obvious to me.

Deepu K Sasidharan

Exactly, and that's why I said Rust is better for multi-threading. Performance is just an added bonus.

Patrice Ferlet

There are tools in Go to check for race conditions.

The fact is that there are two paradigms, not a better one over the other.

Rust is not suitable for developing REST APIs, at least not as easily as with Go or even Python. Rust is very cool for developing low-level tools with increased control of memory management, for low-CPU-cost applications.
But when we develop an HTTP application with very complex management of coroutines to handle SSEs, with messages coming from different routines, Rust becomes purely and simply infernal.

Rust has its advantages, but you have to keep in mind that other languages are not outdone. I don't see myself developing machine learning in Go or Rust - I don't develop kernel modules in JS, and I definitely don't do REST APIs in Rust or C.

And for so many reasons, I'd need a multi-step article to demonstrate that Rust doesn't fit so well in many areas.

Mads Hvelplund

So, having made a few REST services with Rust, I can say that it's fine for it. I just generated the stub code with the OpenAPI CLI generator from a spec file, and then implemented the business logic as I would do in most languages.

The main disadvantage of Rust is that it's more difficult to learn. Of course, that is just my opinion. But being more difficult, it is also more expensive to hire competent devs to maintain your application once you, the master programmer, have finished it.

I think I could have made the same services with Node or Python in 25% of the time, with no fear of data races, due to the nature of the services. Also, I/O to the cloud provider would be the bottleneck in most of the applications, not time spent in logic.

So my takeaway after an enjoyable 16 months of exclusively programming Rust is that it is not the tool for everything. If you are writing an MPEG encoder, or a scientific calculation library, it would be great, but if you are writing wrappers for other services, there are better languages with cheaper development costs.

Deepu K Sasidharan

I agree. It's definitely not suitable for everything

Deepu K Sasidharan • Edited

I'm a polyglot developer with roots in Java and JavaScript, who later did Go, C#, PHP, Python, and Rust, so I'm not trying to be biased. I have done most of my coding in Java, JS/TS, Go, and Rust, and what I wrote is based on what I experienced. For me, the only selling points of Go over Rust are that it's simple to read (not to write; Go has way more boilerplate due to the lack of generics for bigger projects) and that concurrency is easier to write. When I said control over concurrency, I just stated the fact that Rust offers more control in that area than Go. I did say Go was easier for async than Rust. So for me this is not a fanboy standpoint but more a user finding one product better than another. I'm not married to Rust or Go; if there is a new language that is better than Rust, I'll sing its praises in a heartbeat.

See this PR for example; in Rust you don't even have to think about data races.

Deepu K Sasidharan

Also, this post from another comment talks in detail about why people who try both Go and Rust end up preferring Rust for such use cases: discord.com/blog/why-discord-is-sw...

Patrice Ferlet

That's not the case for everyone. There are plenty of examples of developers who switched from Rust to Go, from JS to Python, from Python to Go, from Java to Rust...

95% of users are on Windows; that doesn't make Windows the best OS. The same goes for languages and technologies.

Go is simpler to use, like Python is simpler to use. That helps you develop faster and with a certain level of needed control. Discord needed to avoid long LRU cache cleaning: OK. Now, is this very important for 95% of the websites in the world?

You know what, I have never cleaned the cache of any website I built in Go. Their memory footprint is low... so... why use Rust here? (And one of the APIs I develop has 100k requests/sec to manage.)

Marek Krzyżowski

I hope next time you will also include V-lang in this comparison challenge. Perhaps the results will be similar, but this language is really worth looking at.

Mikko Rantalainen

V-lang is indeed interesting, but last time I checked, the automatic memory freeing was really buggy (check the issues on GitHub for details), and if you don't free memory, RAM usage is obviously going to explode pretty fast if you handle e.g. 100k requests.

Piotr Kołaczkowski

There are two things that look fishy to me in those results:

  • There are almost no differences between the server implementations. Not saying it is impossible, but differences between server implementations that small are very unlikely.
  • The absolute number of handled requests per second is very low - I was able to easily get 250k HTTP requests per second handled by an Actix-web server, loaded with ab, on a laptop, so more than 2 orders of magnitude faster. Also got 100k+ req/s in some server implementations in Go (but far different from Rust).

This suggests there was a common bottleneck outside of your server implementations, and you've measured the performance of that bottleneck, not the servers. Which also means the results are probably inconclusive and you can't interpret them as "Rust has won".

I looked quickly at your code and it seems you're opening a new connection for each request. This typically adds a large amount of latency and system load to each request and might become a problem, particularly at low concurrency levels like 100.

A few suggestions for better benchmarking:

  • For a throughput comparison you need to verify if the servers are really working at their full speed, so you should capture CPU load. It is also good to capture other system metrics like system CPU time, cache misses, context switches and syscalls, which are often a good indicator of how efficiently the server app interacts with the system.

  • Cache connections and leverage HTTP keep-alive. That makes a tremendous difference in throughput.

  • Play with different concurrency levels. If concurrency is too low and latency is too high, you won't get the max throughput. The server would simply wait for requests, handle them quickly and go idle waiting for more. Also switching between idle and active is costly (context switch).

  • In latency tests, the latency median is not as interesting as a full histogram. I'd expect large differences in P99 between GCed and non-GCed servers. So even if the medians are very close, it doesn't mean the servers would work equally well in production. Obviously you should do latency tests at lower throughput than max, so those should be separate experiments.

Anyway I'd love to see updated results, because you seem to have put a lot of work into multiple implementations and it would be a pity if you stopped now ;)

Deepu K Sasidharan

When I started the series, I did want to capture more metrics, but that kept getting pushed back, and it's been months, so I decided to do something simple at least. The source is on GitHub, so if you are interested, feel free to use it and publish a follow-up. I might not have time anytime in the near future due to other commitments. The metrics you are suggesting will take a lot of effort and time to do properly. The bottleneck is the sleep introduced, so theoretically 25 seconds is the best possible for this code. If I remove the sleep, this is the result for the same 10k requests with 100 concurrency:

Concurrency Level:      100
Time taken for tests:   0.309 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      2830000 bytes
HTML transferred:       1760000 bytes
Requests per second:    32344.98 [#/sec] (mean)
Time per request:       3.092 [ms] (mean)
Time per request:       0.031 [ms] (mean, across all concurrent requests)
Transfer rate:          8939.09 [Kbytes/sec] received
Mikko Rantalainen

Concurrency level 100 is way too small. Try something in the range 500-5000. Beware that ab is not a good tool for testing high concurrency.

Deepu K Sasidharan

Ya, I wasn't expecting people to take this simple experiment of mine so seriously. I'll try to update the tests to something better.

Bruno Borges

Never run client/server benchmarks on the same computer.

The process to generate loads will inevitably impact the process to serve the requests.

The best infra for benchmarking is two independent physical machines. Not even VMs, as they also compete for CPU resources.

Deepu K Sasidharan

Honestly, I didn't expect people to take this so seriously or even for the post to do well. I was just wrapping up a series that was taking a lot of effort and not getting much interest in terms of views. But man, this blew up. Now I think I have to rework this into something better 😂

Bruno Borges

That's what happens when you publish benchmarks! 😂

Deepu K Sasidharan

lesson learned :P

Piotr Kołaczkowski

Depends on how efficient the load generation tool is vs how much work on the server side is required to handle the request. You can also pin those two processes to different CPU core sets. This way one computer is enough to get meaningful results. Obviously, if you don't know what you're doing, it is better to use two separate machines.

Deepu K Sasidharan

Ya, in this case the server is quite simple and doesn't need too many resources, which might explain why I got similar results from both. I would be interested in learning more about pinning processes to cores. Do you have any resource you can recommend?

Piotr Kołaczkowski

man taskset

Deepu K Sasidharan

I have updated the benchmarks with more data. WDYT now?

stuta

When you add a 2-second delay every 10 requests, you make the comparison totally meaningless. You are measuring delays, not the code.

Reading the file in every loop measures reading from disk, not actual program performance.

Also, ab is not a good tool for measuring. Usually you are measuring the performance of ab, not the system, which can be 20 times faster than what ab can measure. Use github.com/wg/wrk instead.

When I remove the delay, the Go program crashes when testing with wrk: Error reading: EOF.

Deepu K Sasidharan • Edited

I did add a disclaimer that this is a simple concurrency benchmark. I don't agree that it's meaningless, as I'm comparing the exact same implementation across languages to see if the language/platform makes any difference; the sleep was added to introduce a concurrency bottleneck. This is not an HTTP performance comparison, it's a concurrency comparison, and for that I think ab is as good as any other tool. I'll try wrk and post the results.

Eamon Nerbonne

All you demonstrated is that thread sleep works in all languages, and that overhead is significantly smaller than 200ms. You did not really benchmark the various language / server combos.

Deepu K Sasidharan

If it makes you all happy, I'll update the post with numbers from the code without sleep. Oh, and there are cases where a request takes more than 2 seconds; I have performance-tuned many such systems when I was working for enterprise companies. Also, do suggest a better approach to simulate a thread-blocking request.

Deepu K Sasidharan

And I reiterate, in case it wasn't clear from the title, the intro, or all the previous posts: I was trying to benchmark the performance of concurrency and not web server performance. They are related but not the same.

stuta • Edited

System sleep is not exactly 2 seconds; it can vary wildly.

Basically you are comparing nonsense. This is like comparing people who jump at the top of a 2000 m mountain, and you tell that someone jumped 2000.3 meters and someone else jumped 2000.26 meters, and you say that "the results are almost the same". And actually, in every jump the mountain height varies a lot.

stuta • Edited

If you want to test concurrency, then take the web code away. Give every program the same number of loops to run. And let the tests run 5 seconds or preferably more; anything less is not statistically valid.

Mikko Rantalainen

Instead of sleeping 2 sec or even 200 ms, you should sleep 3-8 ms to simulate access to a fast SQL server. Then you would have a meaningful request rate.

Deepu K Sasidharan

I have updated the benchmarks with more data. WDYT now?

Eamon Nerbonne

It's remarkable too that none of the other commenters noted this. As the adage goes: just because you read it on the internet doesn't make it true! Good catch.

Deepu K Sasidharan

I have updated the benchmarks with more data. WDYT now?

Eamon Nerbonne

I think it's a lot better! I suspect that some of the differences now are due to technical trivia of superficially irrelevant details you happened to choose when implementing these programs, but that's the real world for you. The current data highlights much more clearly just how many req/sec any of these options can handle, because I think that's the real takeaway here: the web frameworks themselves are unlikely to be a significant bottleneck in any real-world use case, and if the language and/or framework matters for heavier, real workloads, well, that's the kind of thing you can't microbenchmark well; you need a real use case.

What the current data also highlights more clearly is just how finicky perf at this level is; e.g., the way the program using the Go HTTP stack apparently is much more efficient than the program you labelled TCP, or how the wrk and drill results are quite different. And that's important to understand; microbenchmarks are notoriously flaky and sensitive to all kinds of details you don't actually care about. Taking a microbenchmark to mean that task X takes time Y is usually the wrong way to think about it: it takes time Y only in the hyper-specific circumstances of the test, and generalizing a specific result is quite error-prone.

Deepu K Sasidharan

I think this is an excellent takeaway

Robin Goupil

I was about to comment on the topic; thank you for pointing this out and not blindly trusting the internet.
The massive hint is the extremely similar performance between all languages. I'm sure the intent of this article was to help the community, but I hope the author will understand their mistake and update the results accordingly.

Deepu K Sasidharan

I have updated the benchmarks with more data. WDYT now?

Deepu K Sasidharan

I have updated the benchmarks with more data. WDYT now?

stuta

Good. Could you update the code in the repo too? It still contains sleep().

Deepu K Sasidharan

It's a different branch now (nosleep).

Mikko Rantalainen

What was the actual process doing? It seems that every request had a 200 ms baseline delay, and for example Rust took 0.7 ms over that vs Node.js taking 4-7 ms. If you get rid of that 200 ms latency, Rust should be 5-10x faster than Node.js in this test.

Deepu K Sasidharan

I'm gonna look into that

federico viscomi

What about the memory footprint? It seems like it would be an important feature to consider

Deepu K Sasidharan

I have updated the benchmarks with more data. WDYT now?

Karan Pratap Singh

Great post! I really like how easy it is to write concurrent code in Go!

Piotr Kołaczkowski

How did you compile each program?

Deepu K Sasidharan

They were compiled using their native compilers, in production/release mode where available.

Piotr Kołaczkowski

When publishing benchmarks you have to give exact steps to repeat, so exact command line parameters and compiler flags used to compile should be given.

Also this:

When concurrent requests are increased from 1000 to 2000, most implementations have a very high failure rate. The Go HTTP and Rust Tokio versions have nearly 100% failure rates

Suggests that something is way off, either with your setup or your code. Async Rust and async Go are capable of handling hundreds of thousands of concurrent connections.

Deepu K Sasidharan

All the commands can be found in the code repository mentioned. As for the breakdown at 2000 concurrency, yes, it's possible that the code is the problem. Do you see anything obvious?

Deepu K Sasidharan

At first I thought it could also be the tool itself failing at those rates, but then Node.js with multiple workers seems to work better, so I'm not sure anymore.

Patrice Gauthier

Not mentioned here, but Elixir parallelism is something else, as it is based on the Erlang VM, called BEAM.

Erlang was built as a “concurrency-oriented programming language.” The Erlang VM can create and manage its own lightweight, internal processes, and was designed to run millions of them. Erlang processes are lighter weight than threads, but independent like OS processes, so they can’t corrupt one another’s data. Instead of sharing memory, they use message passing to coordinate their work.

The Erlang VM runs multiple schedulers — one per CPU core — and ensures that its processes are efficiently spread across them. This means you get the full benefit of that multi-core server. Also, if a process runs long enough to need garbage collection (and many do not), other processes do not have to pause while that happens.

Referenced from this article

The concurrency system has been tested with a single, really buff machine handling 2 million concurrent websockets.

Also, in the Erlang VM there's the OTP system: you can run cron jobs internally, have caching without Redis, and have processes restart when their parent process notices they crashed.

Mikko Rantalainen

Updates to benchmark testing were great!

I would recommend also testing the servers using one (or more) computer as the client and another as the server. You were testing localhost connections only, which doesn't represent real-world performance with real sockets that well. In addition, wrk was running on 8 CPU cores, so unless you had reserved an additional, identical number of physical cores for the test servers, asynchronous servers would get an extra boost compared to multi-threaded servers by not over-booking the CPU as badly.

With real sockets I'd expect the server with lowest latency (Rust with async + multi-threaded) to get the best results.

If your benchmark software supports it, usually a better way to test servers is to decide on a timeout for a request (say 50 ms) and then test how many requests/s you can execute until you start to get timeouts. Some server software is really unfair and fails to serve older requests first to keep the worst-case latency sensible. This kind of testing would preferably ramp the request rate slowly until the timeout is triggered for a request. The best output for this kind of test would be a graph with requests/s on the horizontal axis and worst-case latency on the vertical axis.

If you end the test on the first timeout, I'd expect the Java servers to fail early because they often stall during GC, and if your timeout is pretty small, a single stop-the-world GC may be enough to ruin the run. It's possible to create Java servers that do not exhibit stalls, but the most simple implementation often fails at that.

Oli | Developer

These results are very interesting! I wonder how they will look when the full version of Zig comes out. I'm pretty sure that it'll knock out Rust.