Backing up terabytes of data across multiple drives, NAS boxes, or even different computers? You want everything in one safe, offsite place without paying a fortune for redundant uploads or duplicate storage.
That's exactly why I built b2-dedup — a parallel, streaming deduplicating uploader tailored for Backblaze B2.
In this tutorial you'll learn:
- Why Backblaze B2 is (in my opinion) the smartest choice for personal/large-scale backups vs. AWS S3, Azure Blob, etc.
- How deduplication across drives saves you time, bandwidth, and money
- Step-by-step setup and usage of b2-dedup
Why Backblaze B2 Beats S3, Azure Blob, and Most Alternatives (for Backup Use Cases)
Backblaze B2 is purpose-built for simplicity and low cost — especially for "set it and forget it" backup scenarios.
Here's a quick 2025–2026 comparison (based on published rates and common backup patterns):
Storage cost

- B2 → ~$6/TB/month (recently adjusted from $5/TB; still extremely competitive)
- AWS S3 Standard → ~$23/TB/month (first 50 TB tier)
- Azure Blob Hot → similar ballpark to S3 (~$18–$23/TB)

→ B2 is roughly 1/4 to 1/5 the price for always-hot, instantly accessible storage.

Egress (downloads)

- B2 → Free egress up to 3× your monthly stored average (and unlimited free to many CDNs/partners like Cloudflare, Fastly, etc.)
- S3/Azure → $0.08–$0.09/GB after the free tier — restore 1 TB and you're looking at ~$80+ just to get your data back (quick math below)
Other gotchas
- No upload fees, delete penalties, minimum file sizes, or hidden API call charges that sting on large backups (Class A calls are free on B2, and the other call classes come with a generous daily free allowance).
- S3-compatible API → works with rclone, restic, Veeam, etc.
- No complex storage tiers/classes to accidentally get stuck in (unless you specifically want a Glacier/Archive-style cold tier for rarely touched data, which B2 doesn't offer).
For personal users, homelab hoarders, photographers/videographers, or small businesses doing offsite backups, B2 wins on predictable low cost + sane egress.
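To put real numbers on that, here's a quick back-of-the-envelope comparison using the list prices above. The 5 TB stored / 1 TB restored figures are just example assumptions, and the script ignores free tiers, request fees, and tiered discounts:

```python
# Rough yearly cost sketch using the list prices quoted above.
# Assumptions: 5 TB stored year-round, 1 TB downloaded once; free tiers,
# request/API fees, and tiered discounts are ignored.

STORED_TB = 5
RESTORED_TB = 1
MONTHS = 12

rates = {
    # provider: (storage $/TB/month, egress $/GB)
    "Backblaze B2":    (6.0,  0.00),   # egress treated as free (within the 3x allowance)
    "AWS S3 Standard": (23.0, 0.09),
    "Azure Blob Hot":  (20.0, 0.087),  # approximate mid-range hot-tier rate
}

for provider, (storage_rate, egress_rate) in rates.items():
    storage = STORED_TB * storage_rate * MONTHS
    egress = RESTORED_TB * 1000 * egress_rate
    print(f"{provider:16} ${storage:7.2f} storage + ${egress:6.2f} egress = ${storage + egress:8.2f}/yr")
```

With those example numbers, B2 works out to about $360/year while S3 and Azure land in the ~$1,300–$1,500/year range.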
Shameless plug time (you knew it was coming 😄):
I've personally used Backblaze since ~2017 (after CrashPlan killed their unlimited home service). I've backed up tens of terabytes reliably with zero drama. If you're new to B2, feel free to use my referral link — we both get a little credit:
👉 Sign up for Backblaze here
What b2-dedup Does (and Why You Need It for Multi-Drive Backups)
When you back up multiple drives (e.g., main PC SSD, external HDDs, media NAS), you often have tons of duplicate files — same photos, same movies, same installers, same OS images across machines.
Standard tools (rclone, Duplicati, etc.) usually deduplicate within one backup job, but not across entirely separate sources.
b2-dedup fixes that:
- Uses a local SQLite DB (`~/b2_dedup.db`) to remember SHA-256 hashes of every file it has ever seen (see the sketch after this list).
- When you point it at Drive #2, it skips anything already uploaded from Drive #1.
- Parallel uploads (default 10 workers, tunable) + streaming chunked uploads → low memory, high speed.
- Resumable — interrupted jobs pick up where they left off.
- Scan-only / dry-run modes for safety.
Result: One B2 bucket, many "drive-name" prefixes (e.g., PC2025/, MediaNAS/, Laptop/) — but real storage usage is minimized because duplicates aren't re-uploaded.
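Conceptually, the core loop is: stream-hash each file, look the hash up in the local DB, and only upload content that hasn't been seen before. Here's a minimal sketch of that idea; it's illustrative rather than b2-dedup's actual code, and the table schema, chunk size, and `upload_file` placeholder are assumptions:

```python
import hashlib
import sqlite3
from pathlib import Path

DB_PATH = Path.home() / "b2_dedup.db"
CHUNK = 4 * 1024 * 1024  # stream in 4 MiB pieces so huge files never sit in RAM (size is an assumption)

db = sqlite3.connect(DB_PATH)
db.execute("CREATE TABLE IF NOT EXISTS seen (sha256 TEXT PRIMARY KEY, first_path TEXT)")

def sha256_of(path: Path) -> str:
    """Hash a file's content without loading the whole file into memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for piece in iter(lambda: f.read(CHUNK), b""):
            h.update(piece)
    return h.hexdigest()

def backup_drive(root: Path, drive_name: str, upload_file) -> None:
    """Walk `root` and upload only files whose content hash hasn't been seen on any drive."""
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        digest = sha256_of(path)
        if db.execute("SELECT 1 FROM seen WHERE sha256 = ?", (digest,)).fetchone():
            continue  # identical content already uploaded from an earlier drive: skip it
        remote_key = f"{drive_name}/{path.relative_to(root).as_posix()}"
        upload_file(path, remote_key)  # stand-in for the real B2 upload call
        db.execute("INSERT INTO seen (sha256, first_path) VALUES (?, ?)", (digest, str(path)))
        db.commit()
```

Because hashing streams through each file in small pieces, even huge video files never need to fit in memory, which is what keeps the footprint low.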
Step-by-Step Tutorial: Install & Use b2-dedup
1. Prerequisites
- Python 3.8+
- A Backblaze B2 account + bucket created
- B2 Application Key (KeyID + Application Key) — generate one with Read + Write access to your bucket
2. Install
git clone https://github.com/n0nag0n/b2-dedup.git
cd b2-dedup
pip install -r requirements.txt
3. Authenticate
Easiest way: Install the official b2 CLI and authorize it:
pip install b2
b2 account authorize # follow prompts with your KeyID + App Key
b2-dedup will automatically use those credentials.
(Alternatively, export env vars: B2_KEY_ID and B2_APPLICATION_KEY)
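For the curious, env-var-based authorization with Backblaze's b2sdk library looks roughly like this. It's a generic sketch rather than b2-dedup's internals, and the variable names follow this post:

```python
import os

from b2sdk.v2 import B2Api, InMemoryAccountInfo

# Read credentials from the environment (names as used in this post).
key_id = os.environ["B2_KEY_ID"]
app_key = os.environ["B2_APPLICATION_KEY"]

# SqliteAccountInfo() would instead reuse the credentials cached by `b2 account authorize`.
api = B2Api(InMemoryAccountInfo())
api.authorize_account("production", key_id, app_key)

bucket = api.get_bucket_by_name("my-backup-bucket-123")
print("Authorized, bucket:", bucket.name)
```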
4. Recommended Workflow for Multiple Drives
First drive (baseline)
# Optional: just scan & hash everything first (no upload)
python b2_dedup.py /mnt/primary-drive --drive-name PrimaryPC --bucket my-backup-bucket-123 --scan-only
# Then do the real upload
python b2_dedup.py /mnt/primary-drive --drive-name PrimaryPC --bucket my-backup-bucket-123
Second (or Nth) drive — duplicates are skipped!
python b2_dedup.py /mnt/media-drive --drive-name MediaNAS --bucket my-backup-bucket-123 --workers 20
Pro tip: Use --dry-run first to preview:
python b2_dedup.py /mnt/media-drive --drive-name MediaNAS --bucket my-backup-bucket-123 --dry-run
5. Useful Flags
- `--workers 25` — more parallelism if your internet/upload can handle it (see the worker-pool sketch below)
- `--refresh-count` — force re-count files (useful if source changed a lot)
- `--scan-only` — build/populate the hash DB without touching B2
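For a sense of what `--workers` actually controls: parallel uploads are typically just a pool of uploader threads draining a list of pending files. A rough sketch, where `upload_file` is a stand-in for the real B2 upload call:

```python
from concurrent.futures import ThreadPoolExecutor

def upload_all(pending, upload_file, workers: int = 10):
    """Upload (local_path, remote_key) pairs concurrently.

    workers=10 mirrors the documented default; --workers 25 would simply raise
    max_workers. Uploads are I/O-bound, so threads are a good fit here.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(upload_file, path, key) for path, key in pending]
        for future in futures:
            future.result()  # re-raise any upload error instead of silently dropping it
```

More workers help until your upload bandwidth is saturated, which is why the flag is tunable rather than fixed.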
6. Where Files End Up in B2
- Everything under one bucket
- Prefix = `--drive-name` (e.g., `PrimaryPC/Documents/report.docx`)
- Deduplication happens on content hash — identical files get stored only once, regardless of path/name (the listing sketch below shows how to check what landed under each prefix).
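If you want to verify what actually ended up under a given prefix, listing the bucket with b2sdk looks roughly like this (a sketch; the bucket name is the example one from above, and credentials come from the env vars mentioned earlier):

```python
import os

from b2sdk.v2 import B2Api, InMemoryAccountInfo

api = B2Api(InMemoryAccountInfo())
api.authorize_account("production", os.environ["B2_KEY_ID"], os.environ["B2_APPLICATION_KEY"])
bucket = api.get_bucket_by_name("my-backup-bucket-123")

# Print everything stored under one drive's prefix.
for file_version, _folder in bucket.ls(folder_to_list="PrimaryPC/", recursive=True):
    print(file_version.file_name, file_version.size)
```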
Tips & Gotchas
- The DB lives at `~/b2_dedup.db` — back it up! (it's tiny, but losing it means re-hashing everything)
- If you rename drives or reorganize, just use a new `--drive-name`
- For very large initial scans, start with `--scan-only` overnight
- Combine with rclone/Borg/restic for versioning if you need it — b2-dedup is purely for initial / incremental deduped uploads
Wrap-Up
With b2-dedup + Backblaze B2 you get:
- One cheap, durable offsite location
- Cross-drive deduplication to slash upload time & storage bills
- Parallel, resumable, low-memory operation
I've been running similar setups for years — it's rock-solid for hoarding photos, videos, ISOs, and irreplaceable documents.
Give it a spin, star the repo if you find it useful → https://github.com/n0nag0n/b2-dedup
Questions or want to contribute? Open an issue or PR!
Happy (deduplicated) backing up! 🚀