padaki-pavan

Posted on Mar 25, 2020

What software-based backup solutions do you use?

#discuss #devops #ask #askdev

I'm currently using restic to back up system images to Backblaze, and rsync to create those images. What do you use as your go-to backup tools for work and home systems?

Top comments (10)

Austin S. Hemmelgarn • Mar 25 '20

My standard backup stack consists of borg for the actual backups, and then rclone to copy the backup repositories out to Backblaze.

Specific advantages include:

Borg uses a repository-style structure with a cap on file sizes, which makes incremental transfers/replication much simpler than other tools with otherwise similar feature sets (for example, ZPAQ).
Borg has good support for a variety of compression algorithms. I personally use really aggressive zstd compression on my backups as it runs faster than XZ for my use cases but gets similarly good ratios.
Borg has integrated encryption support that's trivial to work with. This is really a baseline requirement IMO for a backup tool. If it can't do transparent encryption, I won't even consider it.
Borg does inline deduplication with a variable block size. It's a bit slow, but generally does a remarkably good job of cutting down on overall archive size. This also extends to handling of incremental backups. Internally, an 'archive' just references a bunch of data chunks, and chunks get garbage collected when the last reference is removed.
Rclone works very well with Borg repositories and also gives me the option of changing my storage backend to any of a huge number of options.
The killer feature here for me is that you can mount borg archives (or entire repositories) and browse through backups through the mount-point. This is really big IMO, because it lets you easily pull out individual files.
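To illustrate the "archive just references chunks" model from the dedup point above, here is a minimal sketch. It is a toy, not Borg's actual code: the `ChunkStore` class and its method names are made up for illustration, and the "chunks" are just hand-picked byte strings.

```python
import hashlib
from collections import Counter


class ChunkStore:
    """Toy content-addressed store: an archive is a list of chunk IDs."""

    def __init__(self):
        self.chunks = {}            # chunk id -> bytes
        self.refcount = Counter()   # chunk id -> number of references
        self.archives = {}          # archive name -> list of chunk ids

    def create_archive(self, name, pieces):
        ids = []
        for piece in pieces:
            cid = hashlib.sha256(piece).hexdigest()
            self.chunks.setdefault(cid, piece)  # store each unique chunk once
            self.refcount[cid] += 1
            ids.append(cid)
        self.archives[name] = ids

    def delete_archive(self, name):
        for cid in self.archives.pop(name):
            self.refcount[cid] -= 1
            if self.refcount[cid] == 0:  # last reference gone: collect it
                del self.refcount[cid]
                del self.chunks[cid]


store = ChunkStore()
store.create_archive("monday", [b"etc", b"home", b"var"])
store.create_archive("tuesday", [b"etc", b"home", b"var-changed"])
unique_after_two = len(store.chunks)       # 4 unique chunks for 6 references
store.delete_archive("monday")
unique_after_delete = len(store.chunks)    # 3: only b"var" was collected
```

The second archive only costs one new chunk, and deleting the first archive only frees the chunk nothing else points at.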

padaki-pavan • Mar 26 '20

How do you think restic compares with borg? I haven't used borg personally, but some of its features are similar to restic's.

Austin S. Hemmelgarn • Mar 26 '20 • Edited

I've dug a bit deeper, and there are three other major differences I've found, namely:

Restic is multi-threaded, borg is not. This translates to restic being extremely fast in comparison to borg, but borg having less impact on average on CPU usage while running. This limitation in borg is actually a direct consequence of the next point.
Borg does actual deduplication, while restic only does classic incremental backups. With restic, you store a copy of every file, but the files are reference counted so that each version of a file only gets stored once. Borg, however, operates on blocks, not files, and deduplicates within individual backups. So if you have a dozen copies of the same data in your backup, restic stores each copy, but borg only stores the first and makes all the others references to that. The main benefit of this is that borg produces much smaller backups when you have lots of duplicate data and actually does more space efficient incremental backups (because it only stores what actually changed, not the whole changed file).
Borg supports compression, while Restic seemingly does not (and doesn't handle sparse files very well either). This too has a huge impact on space efficiency, and may explain why restic is lightning fast on my systems when compared to borg.

The second and third differences are going to keep me using borg, since they have a huge impact on backup sizes for some of my systems. As a very simple example, the two dozen plus VMs I use for testing take up:

656GB of apparent space
81GB of actual space on disk (the disk images are sparse and I'm using transparent compression on the backing filesystem)
28GB for four backups using borg (after compression and deduplication)
662GB of space for a single backup using restic

Note, however, that restic finishes an initial backup in about 30 minutes for this on the system in question, but borg takes almost 6 hours for an initial backup (though incrementals with borg for this complete in about 1-2 hours depending on how much changed over the week).
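For a rough sense of scale, the ratios implied by those numbers can be worked out as below. The variable names are mine, "almost 6 hours" is rounded to 6, and keep in mind the borg figure covers four backups while the restic figure covers one.

```python
apparent_gb = 656
on_disk_gb = 81
borg_four_backups_gb = 28
restic_one_backup_gb = 662

# Space: how many times larger each figure is than the borg repository.
restic_vs_borg = restic_one_backup_gb / borg_four_backups_gb
on_disk_vs_borg = on_disk_gb / borg_four_backups_gb

# Time: initial backup, borg (~6 hours) versus restic (~30 minutes).
initial_time_ratio = (6 * 60) / 30

print(f"restic repo is ~{restic_vs_borg:.1f}x the borg repo")
print(f"live data on disk is ~{on_disk_vs_borg:.1f}x the borg repo")
print(f"borg's initial backup takes ~{initial_time_ratio:.0f}x as long")
```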

padaki-pavan • Mar 26 '20

Thanks for the comparison. I'll definitely give borg a try, although I must say restic is doing the job just fine for me so far.

Austin S. Hemmelgarn • Mar 26 '20

For a 'normal' desktop use case, and even a decent number of server use cases, I'd actually expect restic to do just fine despite the lack of compression and deduplication. Unless you are dealing with a lot of duplicate or easily compressible data or have certain very specific write patterns, it's not likely that you're missing out on much space savings.

It's when you get into situations where you have lots of duplicated data, or lots of data that compresses very well that you'll be most likely to see a difference. In my case, most of my usage fits that, so using borg makes sense for me. Were that not the case, I actually probably would use restic instead given how fast it is.

@RubenKelevra • Jan 27 '21

Austin, I have to correct you:

Restic does indeed do deduplication at the block level. It uses a rolling hash algorithm called Rabin as a chunker.

In short, a rolling hash algorithm reacts to patterns within the file and cuts the file at those points. If two files contain the same patterns, there's a high chance the cuts land at the same positions, which lets it deduplicate files whose data is not aligned to any specific block size.

So if, for example, you have multiple VM images to back up from multiple machines, and the data is laid out differently in them, say with a 4K block size in one and a 512-byte block size within the VM in the other, Rabin can still identify the similar streams of data and deduplicate the redundancies.
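Here is a small sketch of that idea. To be clear about the assumptions: it uses a simple "gear"-style rolling hash rather than the Rabin fingerprint restic uses, the table, cut condition, and chunk sizes are arbitrary choices for the demo, and the function names are made up.

```python
import hashlib
import random

_table_rng = random.Random(1)
GEAR = [_table_rng.getrandbits(32) for _ in range(256)]  # fixed per-byte values


def chunk(data, bits=8):
    """Cut wherever the top `bits` bits of a rolling hash are all zero."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Older bytes shift out of the 32-bit value, so h depends only on
        # the last 32 bytes seen, not on their offset within the file.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        if h >> (32 - bits) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes after the last cut
    return chunks


def digests(chunks):
    return {hashlib.sha256(c).hexdigest() for c in chunks}


data_rng = random.Random(0)
original = bytes(data_rng.getrandbits(8) for _ in range(64 * 1024))
shifted = b"X" + original  # same content, moved by one byte

a, b = digests(chunk(original)), digests(chunk(shifted))
shared = len(a & b) / len(a)
print(f"{len(a)} chunks, {shared:.0%} still shared after a one-byte shift")
```

Because the cut points follow the content, only the chunks near the inserted byte change; everything after the next cut point lines up again.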

Compare this, for example, to ZFS, which has variable-size blocks of up to 128 K that can be deduplicated. When your data is slightly misaligned, like with a new version of a file that moved the data, or when VM images don't align well with the block sizes, those blocks tend to get zero deduplication.

The speed at which restic processes my system is pretty impressive. I back up only certain parts of the system with it, around 40 GB in total, and it calculates the diff in just 85 seconds:
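The misalignment problem is easy to reproduce with a toy example. This is a generic fixed-size-block dedup sketch, not how ZFS is actually implemented; the 4096-byte block size and the helper name are assumptions for the demo.

```python
import hashlib
import random

BLOCK = 4096  # arbitrary fixed block size for the demo


def fixed_blocks(data, size=BLOCK):
    """Hash every fixed-size block, cut at multiples of `size`."""
    return {hashlib.sha256(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)}


rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(64 * BLOCK))
aligned_copy = original           # identical data at identical offsets
shifted_copy = b"X" + original    # identical data, moved by one byte

base = fixed_blocks(original)
shared_aligned = len(base & fixed_blocks(aligned_copy))
shared_shifted = len(base & fixed_blocks(shifted_copy))
print(f"aligned copy: {shared_aligned} of {len(base)} blocks deduplicated")
print(f"shifted copy: {shared_shifted} of {len(base)} blocks deduplicated")
```

A single inserted byte moves every block boundary relative to the data, so none of the block hashes match anymore.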

scan finished in 85.643s: 733157 files, 40.726 GiB

Files:         577 new,  4757 changed, 727820 unmodified
Dirs:          122 new,   611 changed, 110675 unmodified
Data Blobs:   1614 new
Tree Blobs:    711 new
Added to the repo: 395.722 MiB

As you can see, a daily snapshot weighs only about 400 MB in this case :)

Austin S. Hemmelgarn • Mar 26 '20

Unfortunately, I've never actually used restic.

Major differences I can see just from a cursory look though:

Borg is written in Python, Restic is written in Go.
Borg explicitly allows for the option of an unencrypted repository, Restic does not appear to.
Borg doesn't support Windows, Restic does.
Borg gives you the option to exert some fine-grained control over the chunking/deduplication process, Restic doesn't appear to.

Borg is derived ultimately from Attic. It looks like Restic has similar roots, but went a different way. I'll have to look a bit deeper myself at it, though unless it can mount backups using FUSE like Borg can I probably won't switch.

Ghost • Mar 25 '20

I only really care about the code, docs, and configuration of my Linux system. I use borg to back up my /home and /etc with a daily cron job, also keeping 4 Sundays and 4 first days of the month, on an external drive connected to my PC (I don't care too much about that one). What I really care about is all in git repos: etc/, my ~/.config, docs, and code. Those are, of course, stored locally, on an external drive, on a USB thumb drive chained to my pants, and on a Raspberry Pi. I also mail very important files to myself at two different email providers (encrypted locally, of course).
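A retention scheme like that (recent dailies plus 4 Sundays and 4 first-of-month backups) could be sketched as follows. This is only an illustration, not how borg itself decides what to prune: the function name is made up, and the count of 7 dailies is an assumption, since the comment doesn't say how many dailies are kept.

```python
from datetime import date, timedelta


def backups_to_keep(dates, daily=7, sundays=4, month_starts=4):
    """Return the backup dates to keep; everything else could be pruned."""
    newest_first = sorted(dates, reverse=True)
    keep = set(newest_first[:daily])                        # latest dailies
    keep.update([d for d in newest_first
                 if d.weekday() == 6][:sundays])            # latest Sundays
    keep.update([d for d in newest_first
                 if d.day == 1][:month_starts])             # latest 1st-of-month
    return sorted(keep)


today = date(2020, 3, 25)
history = [today - timedelta(days=n) for n in range(120)]  # one backup per day
kept = backups_to_keep(history)
for d in kept:
    print(d.isoformat(), d.strftime("%a"))
```

A date that matches more than one rule (for example, a Sunday that is also the 1st) is only kept once, so the total can be smaller than the sum of the three limits.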

Darshan kumar • Mar 25 '20

This is cool. I think I am just relying on the cloud and my external hard drive.

yokotobe • Mar 26 '20

Here you are: backuppc.sourceforge.net/