TL;DR: comparing Go and Python on code execution time, Docker image builds and native package distribution.
Recently I was working on a programming challenge and I thought it would be interesting to write an implementation in both Go and Python.
The challenge consists of processing a dataset (specifically a ~100mb CSV), making some aggregations and outputting a JSON file that is used as the data source for a handful of REST API endpoints.
But enough chit-chat, let's dig into the comparison. The criteria are:
- Execution time: the difference in speed between the two implementations when running the data aggregation algorithm.
- Docker image: how complex it is to produce a Docker image, and the size of the resulting image.
- Native distribution: how easy it is to install and run the app in a native environment (aka on your laptop).
The Go and Python versions are, respectively:
Both implementations are quite small:
- PLZ Go: 250 LOC
- PLZ Py: 191 LOC (~23% smaller)
Metrics were generated with tokei using:
- for Go: tokei -f -t=Go -e="*_test.go"
- for Py: tokei -f -t=Python -e="tests"
Both implementations use the same sequential algorithm to process the data, and both take the same input and produce the same output.
Average execution time¹ to process the ~100mb CSV input into a 32kb JSON output:
- PLZ Go: ~0.6s (~2.2x faster)
- PLZ Py: ~1.3s
If you dig into the code and spot an obvious problem affecting performance, please let me know in the comments.
Not much to say here: Go is about twice as fast as Python in this scenario.
Since the application provides a REST API to serve the results of the aggregation, it makes sense to ship the app as a Docker image.
The approach to building the Docker image is a multi-stage build: first build the app and aggregate the data, then assemble a "production" image.
For this criterion Go is hands down the best: you can instruct the compiler to build a self-contained binary (typically by disabling cgo with CGO_ENABLED=0) and package it in a scratch image, for a resulting image size of little more than 9mb (3.5mb compressed).
On the other hand, with Python the final image (based on 3.8-slim-buster) is a staggering 175mb (55mb compressed). This is mostly due to the contents of the layers that make up the 3.8-slim-buster base image (a Debian Buster system plus the Python interpreter), so there is room for improvement by building a custom image, but that will likely require significant effort to build and maintain.
- PLZ Go: ~9.3mb (~18x smaller)
- PLZ Py: ~175mb
By native distribution I mean installing the app on a server or on your laptop without having to tinker too much with software requirements. I strongly believe this is an important metric, since complex installation procedures demand a lot of cognitive effort that is ultimately wasted.
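In this area Go and Python are more or less equivalent on the surface: with Go you can install the package by running go get github.com/noandrea/plz, and with Python by running pip install plzpy. (Since Go 1.18, go get no longer builds and installs binaries; on recent toolchains the equivalent should be go install github.com/noandrea/plz@latest.)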
A caveat is that Python is far more popular and ships by default with many operating systems, while for Go you will likely have to download and install the Go toolchain.
Another notable difference is that with pip install you are installing a "binary" distribution, while with go get you are fetching the source code and compiling it locally.
It's worth mentioning that the maintainer(s) of a Go package can ship prebuilt binaries for specific OS/architecture combinations and package managers (apt, ...), but that requires additional work and maintenance.
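To make "binaries for specific OS/architecture combinations" concrete: a Go binary is compiled for a single GOOS/GOARCH pair, so a maintainer shipping prebuilt binaries produces one artifact per target. The sketch below is hypothetical; the target matrix and the naming scheme are assumptions, not what the plz project actually ships.

```go
package main

import (
	"fmt"
	"runtime"
)

// target is one GOOS/GOARCH pair a release could be cross-compiled for.
type target struct {
	GOOS, GOARCH string
}

// artifactName builds the file name of one prebuilt binary using a
// hypothetical "<name>-<os>-<arch>" convention (".exe" on Windows).
func artifactName(name string, t target) string {
	n := fmt.Sprintf("%s-%s-%s", name, t.GOOS, t.GOARCH)
	if t.GOOS == "windows" {
		n += ".exe"
	}
	return n
}

func main() {
	// runtime.GOOS and runtime.GOARCH report the target this binary was
	// compiled for: each prebuilt binary runs on exactly one pair.
	fmt.Printf("this binary targets %s/%s\n", runtime.GOOS, runtime.GOARCH)

	// A hypothetical release matrix; each entry would be produced by a
	// cross-compiling build such as: GOOS=linux GOARCH=amd64 go build -o <artifact>
	matrix := []target{
		{"linux", "amd64"},
		{"linux", "arm64"},
		{"darwin", "arm64"},
		{"windows", "amd64"},
	}
	for _, t := range matrix {
		fmt.Println(artifactName("plz", t))
	}
}
```

Every extra entry in such a matrix is another artifact to build, test and publish, which is exactly the additional maintenance mentioned above.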
Slightly better: Python
I hope you enjoyed this comparison as much as I enjoyed making it. As for which one is better, Go or Python, the answer is... 🥁 ... both and neither 😉!
¹ Consistent over multiple executions.