DEV Community

Discussion on: what software based backup solutions do you use?

 
padaki-pavan

Thanks for the comparison. I'll definitely give borg a try, although I must say restic is doing the job just fine for me so far.

Austin S. Hemmelgarn

For a 'normal' desktop use case, and even a decent number of server use cases, I'd actually expect restic to do just fine despite the lack of compression and deduplication. Unless you are dealing with a lot of duplicate or easily compressible data or have certain very specific write patterns, it's not likely that you're missing out on much space savings.

It's when you get into situations where you have lots of duplicated data, or lots of data that compresses very well that you'll be most likely to see a difference. In my case, most of my usage fits that, so using borg makes sense for me. Were that not the case, I actually probably would use restic instead given how fast it is.

@RubenKelevra

Austin, I have to correct you:

Restic does indeed do deduplication at the block level. It uses a rolling-hash algorithm, a Rabin fingerprint, as its chunker.

In short, a rolling-hash algorithm reacts to patterns within the file and cuts it there. If two files contain the same patterns, there's a high chance the cuts land at the same positions, which lets it deduplicate files whose data is not aligned to any specific block size.

So if, for example, you back up multiple VM images from multiple machines, and the data inside them is laid out differently, say a 4K block size inside one VM and a 512-byte block size inside the other, Rabin chunking can still identify the matching streams of data and deduplicate the redundancies.
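The idea can be sketched with a toy content-defined chunker. This is illustrative only, not restic's actual implementation: restic uses a Rabin fingerprint with tuned window, mask, and chunk-size parameters, while this sketch uses a simple polynomial rolling hash with made-up constants. The boundary-stability property it demonstrates is the same, though: cut positions depend only on the last few bytes of content, so inserting a prefix (i.e. misaligning the data) shifts the boundaries along with the data instead of changing them.

```python
import random

# Toy content-defined chunker (illustrative; constants are assumptions,
# not restic's real parameters).
WINDOW = 16            # rolling-hash window in bytes
MASK = (1 << 11) - 1   # cut when the low 11 bits are zero (~2 KiB avg)
BASE = 257
MOD = (1 << 61) - 1
POW_W = pow(BASE, WINDOW, MOD)   # BASE**WINDOW, for dropping the oldest byte
MIN_CHUNK = 64

def chunk(data: bytes) -> list[bytes]:
    """Split data at positions chosen purely by the last WINDOW bytes."""
    chunks, h, start = [], 0, 0
    for i, b in enumerate(data):
        h = (h * BASE + b) % MOD
        if i >= WINDOW:                       # slide the window forward
            h = (h - data[i - WINDOW] * POW_W) % MOD
        if (h & MASK) == 0 and i + 1 - start >= MIN_CHUNK:
            chunks.append(data[start:i + 1])  # the content says: cut here
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks

random.seed(42)
data = bytes(random.randrange(256) for _ in range(200_000))
a = chunk(data)
b = chunk(b"VMHDR" + data)  # same data behind a 5-byte "misalignment"

# The cut positions re-synchronise right after the inserted prefix, so
# almost every chunk is shared and would be stored only once.
print(len(a), len(set(a) & set(b)))
```

With fixed-size blocks, the same 5-byte shift would change the content of every single block; here only the first chunk differs.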

Compare this to ZFS, which has variable-sized blocks, but they only go up to 128K and are deduplicated at fixed alignments. When your data is slightly misaligned, like a new version of a file where content has moved, or VM images whose internal layout doesn't line up with the block boundaries, ZFS tends to get zero deduplication.
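The failure mode of aligned-block dedup is easy to demonstrate. This sketch (my own illustration, not ZFS code) splits a buffer into fixed 4 KiB blocks and shows that inserting a single byte leaves no block unchanged, so block-level dedup saves nothing:

```python
import random

# Fixed-alignment blocking: every block's content shifts when the
# underlying data moves, even though the data itself is identical.
def fixed_blocks(data: bytes, size: int = 4096) -> list[bytes]:
    return [data[i:i + size] for i in range(0, len(data), size)]

random.seed(42)
data = bytes(random.randrange(256) for _ in range(64 * 1024))
a = fixed_blocks(data)
b = fixed_blocks(b"X" + data)   # one inserted byte misaligns every block

print(len(set(a) & set(b)))     # prints 0: no block survives the shift
```

Run against the rolling-hash chunker above, the same one-byte insertion leaves nearly all chunks shared, which is exactly the difference being described.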

The speed at which restic processes my system is pretty impressive. I back up only certain parts of the system with it, around 40 GB, and it calculates the diff in just 85 seconds:

scan finished in 85.643s: 733157 files, 40.726 GiB

Files:         577 new,  4757 changed, 727820 unmodified
Dirs:          122 new,   611 changed, 110675 unmodified
Data Blobs:   1614 new
Tree Blobs:    711 new
Added to the repo: 395.722 MiB

As you can see, a daily snapshot weighs only about 400 MB in this case :)