Back Up Multiple Drives to Backblaze with Deduplication – Introducing b2-dedup

Backing up terabytes of data across multiple drives, NAS boxes, or even different computers? You want everything in one safe, offsite place without paying a fortune for redundant uploads or duplicate storage.

That's exactly why I built b2-dedup — a parallel, streaming deduplicating uploader tailored for Backblaze B2.

In this tutorial you'll learn:

  • Why Backblaze B2 is (in my opinion) the smartest choice for personal/large-scale backups vs. AWS S3, Azure Blob, etc.
  • How deduplication across drives saves you time, bandwidth, and money
  • Step-by-step setup and usage of b2-dedup

Why Backblaze B2 Beats S3, Azure Blob, and Most Alternatives (for Backup Use Cases)

Backblaze B2 is purpose-built for simplicity and low cost — especially for "set it and forget it" backup scenarios.

Here's a quick 2025–2026 comparison (based on published rates and common backup patterns):

  • Storage cost

    B2 → ~$6/TB/month (recently adjusted from $5/TB; still extremely competitive)

    AWS S3 Standard → ~$23/TB/month (first 50 TB tier)

    Azure Blob Hot → similar ballpark to S3 (~$18–$23/TB)

    → B2 is roughly a quarter to a third of the price for always-hot, instantly accessible storage.

  • Egress (downloads)

    B2 → Free egress up to 3× your monthly stored average (and unlimited free to many CDNs/partners like Cloudflare, Fastly, etc.)

    S3/Azure → $0.08–$0.09/GB after free tier — restore 1 TB and you're looking at ~$80+ just to get your data back.

  • Other gotchas

    • No upload fees, delete penalties, minimum file sizes, or hidden API call charges that sting on large backups (B2's upload-side Class A calls are free, and the other call classes get a generous daily free allowance).
    • S3-compatible API → works with rclone, restic, Veeam, etc.
    • No complex storage tiers/classes to accidentally get stuck in (on S3/Azure you'd need Glacier/Archive tiers to approach B2's pricing for cold data).

For personal users, homelab hoarders, photographers/videographers, or small businesses doing offsite backups, B2 wins on predictable low cost + sane egress.
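
To make that concrete with the rates above: storing 10 TB costs about 10 × $6 = $60/month on B2 versus roughly 10 × $23 = $230/month on S3 Standard, and pulling all 10 TB back down stays inside B2's free egress allowance (well under 3× what you're storing), while the same restore at ~$0.09/GB on S3 or Azure adds roughly $900 on top.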

Shameless plug time (you knew it was coming 😄):

I've personally used Backblaze since ~2017 (after CrashPlan killed their unlimited home service). I've backed up tens of terabytes reliably with zero drama. If you're new to B2, feel free to use my referral link — we both get a little credit:

👉 Sign up for Backblaze here

What b2-dedup Does (and Why You Need It for Multi-Drive Backups)

When you back up multiple drives (e.g., main PC SSD, external HDDs, media NAS), you often have tons of duplicate files — same photos, same movies, same installers, same OS images across machines.

Standard tools (rclone, Duplicati, etc.) usually deduplicate within one backup job, but not across entirely separate sources.

b2-dedup fixes that:

  • Uses a local SQLite DB (~/b2_dedup.db) to remember SHA-256 hashes of every file it has ever seen.
  • When you point it at Drive #2, it skips anything already uploaded from Drive #1.
  • Parallel uploads (default 10 workers, tunable) + streaming chunked uploads → low memory, high speed.
  • Resumable — interrupted jobs pick up where they left off.
  • Scan-only / dry-run modes for safety.

Result: One B2 bucket, many "drive-name" prefixes (e.g., PC2025/, MediaNAS/, Laptop/) — but real storage usage is minimized because duplicates aren't re-uploaded.
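
If you're curious what that looks like conceptually, here's a minimal sketch of the hash-then-check flow: stream each file through SHA-256, look the digest up in a local SQLite DB, and only upload when it's new. (This is illustrative logic only, not b2-dedup's actual schema or function names.)

import hashlib
import sqlite3
from pathlib import Path

# Separate example DB; the real tool keeps its own at ~/b2_dedup.db
DB_PATH = Path.home() / "b2_dedup_example.db"

def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Stream the file through SHA-256 so huge files never sit in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def should_upload(conn: sqlite3.Connection, path: Path) -> bool:
    """True only if this content hash has never been seen on any drive."""
    h = sha256_of(path)
    if conn.execute("SELECT 1 FROM files WHERE sha256 = ?", (h,)).fetchone():
        return False  # duplicate content from an earlier drive: skip it
    conn.execute("INSERT INTO files (sha256, local_path) VALUES (?, ?)", (h, str(path)))
    conn.commit()
    return True

conn = sqlite3.connect(DB_PATH)
conn.execute("CREATE TABLE IF NOT EXISTS files (sha256 TEXT PRIMARY KEY, local_path TEXT)")
for item in Path("/mnt/primary-drive").rglob("*"):
    if item.is_file() and should_upload(conn, item):
        print(f"would upload {item}")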

Step-by-Step Tutorial: Install & Use b2-dedup

1. Prerequisites

  • Python 3.8+
  • A Backblaze B2 account + bucket created
  • B2 Application Key (KeyID + Application Key) — generate one with Read + Write access to your bucket

2. Install

git clone https://github.com/n0nag0n/b2-dedup.git
cd b2-dedup
pip install -r requirements.txt


3. Authenticate

Easiest way: Install the official b2 CLI and authorize it:

pip install b2
b2 account authorize   # follow prompts with your KeyID + App Key

b2-dedup will automatically use those credentials.

(Alternatively, export env vars: B2_KEY_ID and B2_APPLICATION_KEY)
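
If you're scripting around the tool, the underlying b2sdk library can authorize straight from those same environment variables. A rough sketch (the env var names follow this article; b2-dedup's own internals may differ):

import os
from b2sdk.v2 import B2Api, InMemoryAccountInfo

# Pull the credentials from the env vars mentioned above
key_id = os.environ["B2_KEY_ID"]
app_key = os.environ["B2_APPLICATION_KEY"]

api = B2Api(InMemoryAccountInfo())
api.authorize_account("production", key_id, app_key)  # one-time per process

bucket = api.get_bucket_by_name("my-backup-bucket-123")
print("Authorized OK, bucket ID:", bucket.id_)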

4. Recommended Workflow for Multiple Drives

First drive (baseline)

# Optional: just scan & hash everything first (no upload)
python b2_dedup.py /mnt/primary-drive --drive-name PrimaryPC --bucket my-backup-bucket-123 --scan-only

# Then do the real upload
python b2_dedup.py /mnt/primary-drive --drive-name PrimaryPC --bucket my-backup-bucket-123

Second (or Nth) drive — duplicates are skipped!

python b2_dedup.py /mnt/media-drive --drive-name MediaNAS --bucket my-backup-bucket-123 --workers 20

Pro tip: Use --dry-run first to preview:

python b2_dedup.py /mnt/media-drive --drive-name MediaNAS --bucket my-backup-bucket-123 --dry-run
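
Once you're happy with the dry run, a small wrapper can walk all of your drives with the exact same flags (a sketch; the drive names, mount points, and bucket are just the examples from above):

import subprocess

# (B2 drive-name prefix, local mount point) pairs -- adjust to your setup
DRIVES = [
    ("PrimaryPC", "/mnt/primary-drive"),
    ("MediaNAS", "/mnt/media-drive"),
]
BUCKET = "my-backup-bucket-123"

for drive_name, mount in DRIVES:
    subprocess.run(
        [
            "python", "b2_dedup.py", mount,
            "--drive-name", drive_name,
            "--bucket", BUCKET,
            "--workers", "20",
        ],
        check=True,  # stop if one drive's run fails instead of silently continuing
    )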

5. Useful Flags

  • --workers 25 — more parallelism if your internet/upload can handle it
  • --refresh-count — force re-count files (useful if source changed a lot)
  • --scan-only — build/populate the hash DB without touching B2

6. Where Files End Up in B2

  • Everything under one bucket
  • Prefix = --drive-name (e.g., PrimaryPC/Documents/report.docx)
  • Deduplication happens on content hash — identical files get stored only once, regardless of path/name.
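
To sanity-check what actually landed under each drive prefix, you can list the bucket with b2sdk (same hypothetical credentials and bucket name as earlier; adjust to yours):

import os
from b2sdk.v2 import B2Api, InMemoryAccountInfo

api = B2Api(InMemoryAccountInfo())
api.authorize_account(
    "production", os.environ["B2_KEY_ID"], os.environ["B2_APPLICATION_KEY"]
)
bucket = api.get_bucket_by_name("my-backup-bucket-123")

# Walk everything that was uploaded from the PrimaryPC drive
for file_version, _folder in bucket.ls("PrimaryPC/", recursive=True):
    print(file_version.file_name, file_version.size)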

Tips & Gotchas

  • The DB lives at ~/b2_dedup.db — back it up! It's tiny, but losing it means re-hashing everything (see the snippet after this list)
  • If you rename drives or reorganize, just use a new --drive-name
  • For very large initial scans, start with --scan-only overnight
  • Combine with rclone/Borg/restic for versioning if you need it — b2-dedup is purely for initial / incremental deduped uploads
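
On that first tip about the DB: it's just a regular file, so a dated copy is a one-liner (filename is illustrative; stash the copy somewhere safe):

import shutil
from datetime import date
from pathlib import Path

db = Path.home() / "b2_dedup.db"
# Timestamped copy next to the original; move it to another drive or cloud sync
shutil.copy2(db, db.with_name(f"b2_dedup-{date.today()}.db"))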

Wrap-Up

With b2-dedup + Backblaze B2 you get:

  • One cheap, durable offsite location
  • Cross-drive deduplication to slash upload time & storage bills
  • Parallel, resumable, low-memory operation

I've been running similar setups for years — it's rock-solid for hoarding photos, videos, ISOs, and irreplaceable documents.

Give it a spin, star the repo if you find it useful → https://github.com/n0nag0n/b2-dedup

Questions or want to contribute? Open an issue or PR!

Happy (deduplicated) backing up! 🚀
