Introduction
A few years ago, I was working on a system that required synchronizing very large datasets — sometimes close to 1 TB — across several servers belonging to different companies.
Some servers were in the same building, others were remote, some were behind locked-down firewalls, and in many cases I had:
- no VPN,
- no direct link,
- no control over the remote infra,
- and machines that didn’t even know each other existed.
To move initial datasets, I relied on traditional transfer tools.
But the real problem appeared after that first copy:
How do you verify that datasets across multiple locations are fully identical, and resynchronize only the missing deltas — especially after an interrupted or incomplete transfer?
Double-checking terabytes manually wasn’t an option.
Running massive checksums remotely was slow and error-prone.
And multi-endpoint scenarios (A ↔ B ↔ C) made it worse still: every extra endpoint multiplies the pairs of copies you have to compare.
This pain eventually led me to prototype a custom sync engine…
and that prototype turned into ByteSync, an open-source, on-demand file synchronization tool.
This article is the story of that journey — the architecture, the challenges, the strange bugs, and the “aha” moments.
Why on-demand sync still matters
Continuous sync tools are amazing — Syncthing is a work of art.
But continuous sync wasn’t compatible with the environments I worked in.
When you synchronize across companies or infrastructures you don't manage, you often have:
- strict maintenance windows
- servers that are offline most of the time
- compliance rules against background daemons
- sensitive data that must move only at specific times
- endpoints that can’t stay permanently connected
So syncing needed to happen only when everyone explicitly agreed on the time slot.
On-demand sync wasn’t a preference — it was a requirement.
It let me run comparisons, verify integrity, and apply deltas exactly when it was permitted.
This shaped almost every architectural decision that came later.
Challenge #1 — Picking a delta algorithm that works everywhere
I wanted block-level deltas.
Full file transfers would kill the purpose of multi-site sync.
Naturally, rsync came to mind.
Then I discovered FastRsyncNet, a .NET library implementing rsync's signature and delta algorithm.
It gave me:
- rolling checksums
- block signatures
- efficient delta construction
- rsync-like behaviour, but portable inside a modern .NET app
ByteSync became technically very close to rsync’s internal diffing engine, with higher-level orchestration on top.
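Here is roughly what that flow looks like with FastRsyncNet. This is a condensed sketch based on the library's documented signature → delta → patch API; the file paths are placeholders and progress reporting is omitted:

```csharp
using System.IO;
using FastRsync.Delta;
using FastRsync.Signature;

// 1. On the machine holding the OLD copy: build a block signature
//    (rolling checksum + strong hash per block).
var signatureBuilder = new SignatureBuilder();
using (var basisStream = File.OpenRead("data.old"))
using (var signatureStream = File.Create("data.sig"))
{
    signatureBuilder.Build(basisStream, new SignatureWriter(signatureStream));
}

// 2. On the machine holding the NEW copy: diff against that signature.
//    Only the changed blocks end up in the delta file.
var deltaBuilder = new DeltaBuilder();
using (var newFileStream = File.OpenRead("data.new"))
using (var signatureStream = File.OpenRead("data.sig"))
using (var deltaStream = File.Create("data.delta"))
{
    deltaBuilder.BuildDelta(
        newFileStream,
        new SignatureReader(signatureStream, null),   // no progress reporting in this sketch
        new AggregateCopyOperationsDecorator(new BinaryDeltaWriter(deltaStream)));
}

// 3. Back on the first machine: patch the old copy with the delta.
var deltaApplier = new DeltaApplier();
using (var basisStream = File.OpenRead("data.old"))
using (var deltaStream = File.OpenRead("data.delta"))
using (var patchedStream = File.Create("data.patched"))
{
    deltaApplier.Apply(basisStream, new BinaryDeltaReader(deltaStream, null), patchedStream);
}
```

The signature is small enough to ship to the other side cheaply; only the delta (the changed blocks) travels back.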
Challenge #2 — Merging LAN and WAN connections into a single model
This was the hardest architectural problem.
I had prior experience with SignalR, so I used it for the realtime communication layer.
On top of that:
- Azure Functions
- Azure Redis
- Azure Blob Storage
…initially formed the relay for remote sync operations.
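To give a feel for the realtime layer, here is a minimal SignalR hub sketch. It is a hypothetical hub, not ByteSync's actual code (the production setup runs behind Azure Functions), but it shows the session-as-group idea:

```csharp
using System.Threading.Tasks;
using Microsoft.AspNetCore.SignalR;

// Hypothetical hub: each sync session maps to a SignalR group, so
// session members can exchange coordination messages in realtime.
public class SyncSessionHub : Hub
{
    public async Task JoinSession(string sessionId)
    {
        await Groups.AddToGroupAsync(Context.ConnectionId, sessionId);
        await Clients.OthersInGroup(sessionId)
                     .SendAsync("MemberJoined", Context.ConnectionId);
    }

    // Relay a session-level event (e.g. "inventory ready") to the
    // other members; the hub itself never inspects the payload.
    public Task NotifySession(string sessionId, string eventName, string payload)
        => Clients.OthersInGroup(sessionId).SendAsync(eventName, payload);
}
```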
But ByteSync originally had two separate modes:
- Local mode (LAN only)
- Cloud mode (WAN only)
I couldn’t find a clean way to bridge them.
Users had to pick a mode upfront, which didn’t match real-world workflows.
The breakthrough came with the concept of DataNodes.
A DataNode abstracts a sync participant — local or remote — so the orchestration doesn’t care about the distance between nodes.
This allowed:
✔ direct LAN connections when devices can see each other
✔ encrypted relayed connections when they can’t
✔ both in the same sync session
Suddenly, we had hybrid sessions.
And it changed everything.
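To make the idea concrete, here is a minimal sketch of what a DataNode-style abstraction can look like. All the types and names below are hypothetical, not ByteSync's actual implementation; the point is that the orchestrator only sees nodes and a transport interface:

```csharp
using System.Threading.Tasks;

// Hypothetical types: the orchestrator only sees DataNodes and a
// transport; whether bytes travel over the LAN or through an
// encrypted cloud relay is decided per node pair at session start.
public record DataNode(string NodeId, string DisplayName);

public interface INodeTransport
{
    Task SendBlockAsync(DataNode target, byte[] encryptedBlock);
}

public class LanTransport : INodeTransport
{
    // Direct connection to a node that is visible on the local network.
    public Task SendBlockAsync(DataNode target, byte[] encryptedBlock)
        => Task.CompletedTask; // placeholder: write to a local socket
}

public class RelayTransport : INodeTransport
{
    // Upload the encrypted block to the relay store and notify the target.
    public Task SendBlockAsync(DataNode target, byte[] encryptedBlock)
        => Task.CompletedTask; // placeholder: PUT to relay + realtime notify
}

public class TransportSelector
{
    // Hybrid sessions fall out naturally: each pair gets the best route.
    public INodeTransport For(DataNode a, DataNode b, bool canSeeEachOther)
        => canSeeEachOther ? new LanTransport() : new RelayTransport();
}
```

Because the route is chosen per node pair, a single session can mix direct and relayed links, which is exactly what hybrid sessions require.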
Challenge #3 — Azure Blob Storage and the “egress bill from hell”
Originally, remote exchanges used Azure Blob Storage.
It worked.
But then I ran the cost estimate.
And… no.
Azure egress fees were far too high for multi-site sync.
Not just high — non-viable.
That pushed me to migrate the relay layer to Cloudflare R2:
- no egress fees
- great performance
- straightforward API
- predictable costs
- perfect for temporary encrypted blobs
Switching to R2 turned out to be one of the best decisions of the project.
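Since R2 exposes an S3-compatible API, the migration mostly means pointing an S3 client at the R2 endpoint. A minimal sketch with the AWS SDK for .NET, where the account ID, credentials, bucket and object keys are all placeholders:

```csharp
using System.Threading;
using Amazon.S3;
using Amazon.S3.Model;

// R2 speaks the S3 protocol: point the AWS SDK for .NET at the R2 endpoint.
var config = new AmazonS3Config
{
    ServiceURL = "https://<ACCOUNT_ID>.r2.cloudflarestorage.com", // placeholder
    ForcePathStyle = true
};
var client = new AmazonS3Client("ACCESS_KEY", "SECRET_KEY", config); // placeholders

// The sender uploads a temporary encrypted blob for the relay...
await client.PutObjectAsync(new PutObjectRequest
{
    BucketName = "relay-bucket",
    Key = "sessions/abc123/block-0001.bin",
    FilePath = "block-0001.bin"
});

// ...and the receiving node downloads it (no egress fees on R2).
var response = await client.GetObjectAsync("relay-bucket", "sessions/abc123/block-0001.bin");
await response.WriteResponseStreamToFileAsync("block-0001.bin.download", append: false, CancellationToken.None);
```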
Challenge #4 — The first fully working remote sync (the “aha” moment)
I remember the first time a full remote sync completed successfully.
It wasn’t just correct — it was fast, considering the conditions.
Encrypted cloud relay.
Rolling checksums.
Delta blocks.
Multi-endpoint convergence.
Everything clicked that day.
That was my “OK yes — this is worth continuing” moment.
Challenge #5 — The strangest bug I’ve ever hit
This one cost me three days of my life.
Every time I added a file or folder on my machine…
the connection dropped.
Every.
Single.
Time.
I debugged:
- SignalR
- Azure Functions
- caching
- threading
- reconnection logic
- manifests
- cancellation tokens
- TCP vs WebSockets
Nothing made sense.
The culprit?
Opening Windows Explorer caused a 20–30 second network hang… but only on my machine.
Not on the servers.
Not in production.
Just… my Windows environment being haunted.
Once I understood it, everything made sense — and nothing made sense at the same time.
Challenge #6 — Validating real-world use cases
The architecture later proved itself in scenarios like:
- multi-site synchronization across organizations
- multi-folder split comparisons
- integrity verification after partial transfers
- deduplication across several endpoints
- syncing nodes that were sometimes LAN, sometimes WAN, sometimes both
- combining several independent datasets into a unified comparison
The more complex the scenario, the more the architecture made sense.
Conclusion
I didn’t initially plan to build a synchronization tool.
I just needed a way to reliably synchronize large datasets across machines that couldn’t talk to each other.
But challenge after challenge, the project grew into something more robust and more general than I expected.
This article isn’t meant as a product pitch — just an honest breakdown of the problems I faced and how I solved them.
If you're curious about the tool behind these experiments, ByteSync is open-source:
- GitHub: https://github.com/POW-Software/ByteSync
- Website: https://www.bytesyncapp.com
Feedback is always welcome.
Thanks for reading.



