These days, a single laptop can chomp through gigabytes of data in seconds. So why was it taking ~1.5min to compress & upload 2 GB? Why was it taking ~10s to download just 100 MB?
I get bothered by code that "should" be fast but isn't, especially when I have to wait around for it. Maybe it's 30+ years of experience with software, 25+ of them in web dev: I have a pretty good sense of when something is slower than it "should" be.
And oh, I am never satisfied with needlessly slow code.
Time is both time and money. The faster cancer researchers can process data, the sooner we get to innovative treatments that save lives. And going 2x as fast on the same hardware typically means spending half as much. In an eventual clinical setting, every cent matters when it comes to tests being given freely… which can mean life & death.
I checked with my co-conspirator Lynn Langit: "these speeds, really though?" She pointed me at the gcloud CLI's far superior file-transfer performance.
That began an investigation into optimizing transfers. The short version: the standard Python (& other) client libraries move a Blob on a single thread. So much computing power just… sitting there, sad & idle.
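For context, here's roughly what that looks like with the google-cloud-storage Python client (bucket and object names below are placeholders): the default Blob call pulls bytes over one stream, while the transfer_manager module, if you opt into it, splits the object into chunks and downloads them in parallel.

```python
from google.cloud import storage
from google.cloud.storage import transfer_manager

client = storage.Client()
bucket = client.bucket("my-bucket")          # placeholder bucket name

# Default path: one request, one stream, one core doing the work.
blob = bucket.blob("big-file.npz")           # placeholder object name
blob.download_to_filename("big-file.npz")

# Parallel path (google-cloud-storage >= 2.7): fetch metadata so the object
# size is known, then download fixed-size chunks across worker processes.
blob = bucket.get_blob("big-file.npz")
transfer_manager.download_chunks_concurrently(
    blob,
    "big-file.npz",
    max_workers=8,
)
```

Getting the parallel path right (chunk sizes, worker counts, compression) is exactly the kind of tuning a default setting should hide from you.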
It's nice when default settings "just work" – correctly, but also fast. The numpy library is absolutely brilliant because it brings all kinds of low-level hardware optimization into Python; you don't have to think about it.
In that spirit, I hope to make cloud storage file transfer just that much easier, so that you don't have to think about it to get fast performance.
Without further ado: introducing gs-fastcopy:
https://medium.com/@dchaley/introducing-gs-fastcopy-36bb3bb71818
It's my first open-source public Python package 🐍 📦 🎉
Package: https://pypi.org/project/gs-fastcopy/
Source code: https://github.com/redwoodconsulting-io/gs-fastcopy-python
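To give a feel for the goal, here's an illustrative sketch of how I'd like using it to read: open a gs:// URI much like a local file, and let the parallel transfer and (de)compression happen behind the scenes. The gs_fastcopy.read/write names and the exact call pattern below are paraphrased from memory, so treat them as hypothetical and check the README above for the real API.

```python
import gs_fastcopy
import numpy as np

# Write: save locally-generated data straight to a gs:// URI (placeholder bucket).
with gs_fastcopy.write("gs://my-bucket/example.npz") as f:
    np.savez(f, a=np.zeros(12), b=np.ones(23))

# Read: stream the object back down and load it as usual.
with gs_fastcopy.read("gs://my-bucket/example.npz") as f:
    npz = np.load(f)
    a = npz["a"]
```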
Now I download & uncompress those 100 MB in just a couple of seconds instead of 10. I'll take a 5x speedup. And the impact only grows as the files get larger.