This is an interesting list. I'm not usually a fan of listicles (article + list = listicle) since they are rarely carefully curated and someone basically throws the list together to fulfill some quota. You obviously spent a great deal of time putting this together, including figuring out what tool they replace. Very nice.
Two observations/thoughts:
speedtest-cli is fine as long as you understand that the results might not be realistic. Ookla, the organization behind speedtest.net, may have the largest selection of donated speed test servers globally, but ISPs are fully aware of their existence. Some ISPs even go out of their way to lie to their customers, such as removing bandwidth caps for connections to the known, public list of Ookla speed test servers so that you get the advertised speeds you are paying for in just that one specific use case. Ookla also sells co-branded versions of their speed testing widget to ISPs, tailored to each ISP's whims/desires, which I think is pretty sus. That said, something that does speed testing is better than nothing. Speed testing tools are also useful in the datacenter. For example, DigitalOcean advertises a minimum 1 Gbps link speed, but I've seen burst rates up to 3 Gbps even on their cheapest VPS servers.
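One way to partially sanity-check the whitelisting concern is to run the test against a few different servers instead of only the auto-selected nearest one, and compare the numbers. A rough sketch using the Python API that ships with the speedtest-cli package (the server ID in the comment is a placeholder, not a recommendation):

```python
# Rough sketch using the Python module bundled with speedtest-cli.
# Wildly different results across servers can hint at per-server
# traffic shaping by the ISP.
import speedtest

st = speedtest.Speedtest()
st.get_servers()       # fetch Ookla's public server list
# st.get_servers([12345])  # or pass specific server IDs (placeholder ID)
st.get_best_server()   # pick the lowest-latency server from the set
st.download()
st.upload()

r = st.results.dict()
print(f"{r['server']['sponsor']}: "
      f"{r['download'] / 1e6:.1f} Mbps down / "
      f"{r['upload'] / 1e6:.1f} Mbps up")
```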
From the current librsync GitHub repo:

"The rsync algorithm is different from most differencing algorithms because it does not require the presence of the two files to calculate the delta. Instead, it requires a set of checksums of each block of one file, which together form a signature for that file. Blocks at any position in the other file which have the same checksum are likely to be identical, and whatever remains is the difference."
The key phrase is "are likely to be identical," and it's what makes both librsync and anything built on it likely to introduce data corruption over time. There is no guarantee that two blocks with the same checksum actually contain identical data. I have personally witnessed significant data corruption due to the underlying algorithm used in librsync. rsync and librsync are, in my experience, unsuitable for the purposes that they claim to be suitable for.
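To make the failure mode concrete, here is a minimal sketch of signature-based block matching using a deliberately weak toy 16-bit checksum so that a collision is trivial to construct. librsync's real rolling checksum paired with a stronger hash makes collisions far rarer, but the principle is the same: checksum equality is not data equality.

```python
# Toy illustration of signature-based block matching. This uses a
# deliberately weak 16-bit byte-sum checksum (far weaker than librsync's
# actual checksums) purely to make the collision easy to demonstrate.
BLOCK_SIZE = 4

def weak_checksum(block: bytes) -> int:
    # Sum of bytes, truncated to 16 bits.
    return sum(block) & 0xFFFF

def signature(data: bytes) -> dict:
    # Map each block's checksum to its offset, as the receiver would.
    return {weak_checksum(data[off:off + BLOCK_SIZE]): off
            for off in range(0, len(data), BLOCK_SIZE)}

basis = b"ABCD" * 8          # the "old" file held by the receiver
sig = signature(basis)

# A different block whose bytes happen to sum to the same value as b"ABCD":
collider = b"BACD"
assert collider != b"ABCD"
assert weak_checksum(collider) == weak_checksum(b"ABCD")

# A sender that trusts the checksum alone would wrongly reuse the old
# block here, silently corrupting the reconstructed file.
print("false match:", weak_checksum(collider) in sig)  # True
```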
Thanks @cubiclesocial - I wasn't actually aware of that librsync issue. I've not experienced corruption myself, but it's very good to be aware of. On the topic, is there anything similar that you recommend for incremental file transfers?
I have tried unison in the past, but since it works in a similar way, I'd imagine it would also be susceptible to this issue, especially when dealing with large data sets. For backups specifically, possibly restic could be an option?
Possibly. For backups, I use my own 3rd-generation software called Cloud Backup. I back up my servers via a Remoted API Server + Cloud Storage Server instance. I've been running that combo for years with only the occasional network connectivity hiccup, which usually clears itself up by automatically re-running the backup until it succeeds. Whenever I've needed to retrieve something, it's there, ready for use, and can be pulled back onto the system from the backup within a few seconds. The setup handles multiple GB of transfers daily across multiple systems. I even use Cloud Backup when I need to migrate between *NIX systems because it faithfully preserves timestamps, owner, group, permissions, symlinks, etc.
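As an aside, whatever tool handles a migration, it's cheap to verify afterwards that the metadata actually survived. A hypothetical spot-check script (not part of Cloud Backup, just a generic sanity check) comparing the attributes listed above:

```python
# Hypothetical post-migration spot check: compare the metadata a faithful
# restore should preserve (mtime, owner, group, mode, symlink targets).
import os
import sys

def meta(path: str) -> tuple:
    st = os.lstat(path)  # lstat so symlinks themselves are compared
    link = os.readlink(path) if os.path.islink(path) else None
    return (int(st.st_mtime), st.st_uid, st.st_gid, st.st_mode, link)

src_root, dst_root = sys.argv[1], sys.argv[2]
for dirpath, dirnames, filenames in os.walk(src_root):
    for name in dirnames + filenames:
        src = os.path.join(dirpath, name)
        dst = os.path.join(dst_root, os.path.relpath(src, src_root))
        if not os.path.lexists(dst):
            print("missing:", dst)
        elif meta(src) != meta(dst):
            print("metadata differs:", src)
```

Run it as `python check_meta.py /old/tree /restored/tree` after the restore completes.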
Cloud Backup, Cloud Storage Server, and Remoted API Server were written before I came up with "question-answer" CLI interfaces. As a result, initial setup is really awkward and overly complex. So I can't really recommend using what I use for that reason. I'll eventually get around to fixing that problem.
I also have a very large drive attached to a cheap-o mini PC that is firewalled onto its own VLAN, and I pay for a single-user license of Backblaze (approx. $7/mo) on it. I push backup data to the Cloud Storage Server instance running on that system in a couple of different ways. The Backblaze client software then picks up everything on the drive and puts it in the cloud. Basically, this setup gives me unlimited online backup storage for all my computer systems (instead of paying for cloud storage for each system). The key to saving money with Backblaze is to dump everything onto one system with at least one large external drive attached over USB. In my case, by using Cloud Backup, Backblaze just gets compressed, encrypted data blobs, but someone could go a lot simpler than my setup and just dump straight files onto a similar setup for cheap, "unlimited" cloud storage (unlimited = as much locally attached storage as can be afforded). I'm a penny-pinching fiend. I wish Backblaze had a Linux client, but I suspect they don't because they don't want people to abuse their system any more than it currently gets abused.
The DIY alternative to paying for online cloud storage is to set up a backup system with Cloud Storage Server at a friend's house and run Remoted API Server on any public-facing VPS so that the friend's home IP can change freely and you don't have to worry about router/firewall rules. Then point Cloud Backup at the running Remoted API Server. Once set up, the bonus with this approach is that restoring everything from the backup takes a fraction of the time it would take over the Internet (especially when restoring multiple TB of data): drive to the friend's house, pick up the equipment, drive home, adjust the configs to point at the local network, restore everything locally, revert the configs, drive back, and put the equipment back in place. Fully restored in mere hours instead of days or weeks. Buy the friend a pizza or coffee to celebrate.