Paul Fresquet

How I built a hybrid LAN/WAN file sync engine without VPN (and why on-demand sync still matters)

🎥 Video demo:


Introduction

A few years ago, I was working on a system that required synchronizing very large datasets — sometimes close to 1 TB — across several servers belonging to different companies.

Some servers were in the same building, others were remote, some were behind locked-down firewalls, and in many cases I had:

  • no VPN,
  • no direct link,
  • no control over the remote infra,
  • and machines that didn’t even know each other existed.

To move initial datasets, I relied on traditional transfer tools.

But the real problem appeared after that first copy:

How do you verify that datasets across multiple locations are fully identical, and resynchronize only the missing deltas — especially after an interrupted or incomplete transfer?

Double-checking terabytes manually wasn’t an option.

Running massive checksums remotely was slow and error-prone.

And multi-endpoint scenarios (A ↔ B ↔ C) made it exponentially worse.
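To make the problem concrete: the naive approach is to hash every file on each side and diff the manifests. A minimal sketch (illustrative only, not ByteSync's actual code):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

static class ManifestCheck
{
    // Hash every file under root, keyed by relative path.
    public static Dictionary<string, string> BuildManifest(string root)
    {
        using var sha = SHA256.Create();
        return Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories)
            .ToDictionary(
                path => Path.GetRelativePath(root, path),
                path =>
                {
                    using var stream = File.OpenRead(path);
                    return Convert.ToHexString(sha.ComputeHash(stream));
                });
    }

    // Anything missing or different on the remote side must be re-sent.
    public static IEnumerable<string> Diff(
        Dictionary<string, string> local, Dictionary<string, string> remote) =>
        local.Where(kv => !remote.TryGetValue(kv.Key, out var hash) || hash != kv.Value)
             .Select(kv => kv.Key);
}
```

At terabyte scale, re-hashing everything on every check is exactly the slow part, and it only answers "what differs", not "how do I transfer just the difference".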

This pain eventually led me to prototype a custom sync engine…

and that prototype turned into ByteSync, an open-source, on-demand file synchronization tool.

This article is the story of that journey — the architecture, the challenges, the strange bugs, and the “aha” moments.


Why on-demand sync still matters

Continuous sync tools are amazing — Syncthing is a work of art.

But continuous sync wasn’t compatible with the environments I worked in.

When you synchronize across companies or infrastructures you don't manage, you often have:

  • strict maintenance windows
  • servers that are offline most of the time
  • compliance rules against background daemons
  • sensitive data that must move only at specific times
  • endpoints that can’t stay permanently connected

So syncing needed to happen only when everyone explicitly agreed on the time slot.

On-demand sync wasn’t a preference — it was a requirement.

It let me run comparisons, verify integrity, and apply deltas exactly when it was permitted.

This shaped almost every architectural decision that came later.


Challenge #1 — Picking a delta algorithm that works everywhere

I wanted block-level deltas.

Re-sending whole files would defeat the purpose of multi-site sync.

Naturally, rsync came to mind.

Then I discovered FastRsyncNet — a .NET implementation inspired by rsync’s signature and delta algorithm.

It gave me:

  • rolling checksums
  • block signatures
  • efficient delta construction
  • rsync-like behaviour, but portable inside a modern .NET app

Internally, ByteSync ended up very close to rsync’s diffing engine, with higher-level orchestration layered on top.
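For flavor, here is roughly what the signature/delta/patch cycle looks like with FastRsyncNet. This is a sketch based on its Octodiff-derived API; treat exact type names and overloads as approximate, since they vary between versions:

```csharp
using System.IO;
using FastRsync.Core;
using FastRsync.Delta;
using FastRsync.Signature;

// 1. The receiver summarizes its current copy as a compact signature
//    (rolling checksum + strong hash per block).
using (var basis = File.OpenRead("data.bin"))
using (var sigOut = File.Create("data.sig"))
{
    new SignatureBuilder().Build(basis, new SignatureWriter(sigOut));
}

// 2. The sender diffs its newer copy against that signature and emits
//    only the changed blocks as a delta.
using (var newFile = File.OpenRead("data.new.bin"))
using (var sig = File.OpenRead("data.sig"))
using (var deltaOut = File.Create("data.delta"))
{
    new DeltaBuilder().BuildDelta(
        newFile,
        new SignatureReader(sig, null),
        new AggregateCopyOperationsDecorator(new BinaryDeltaWriter(deltaOut)));
}

// 3. The receiver rebuilds the new version by replaying the delta over
//    its old copy; the full file never crosses the wire.
using (var basis = File.OpenRead("data.bin"))
using (var delta = File.OpenRead("data.delta"))
using (var patched = File.Create("data.patched.bin"))
{
    new DeltaApplier().Apply(basis, new BinaryDeltaReader(delta, null), patched);
}
```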


Challenge #2 — Merging LAN and WAN connections into a single model

This was the hardest architectural problem.

I had prior experience with SignalR, so I used it for the real-time communication layer.

On top of that:

  • Azure Functions
  • Azure Redis
  • Azure Blob Storage

…initially formed the relay for remote sync operations.
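On the SignalR side, the hub is mostly session plumbing. A hypothetical, stripped-down example (invented names, not the real ByteSync hub):

```csharp
using System.Threading.Tasks;
using Microsoft.AspNetCore.SignalR;

// Hypothetical hub: clients join a named sync session and exchange
// orchestration messages. File contents never pass through here.
public class SyncSessionHub : Hub
{
    public async Task JoinSession(string sessionId)
    {
        await Groups.AddToGroupAsync(Context.ConnectionId, sessionId);
        await Clients.OthersInGroup(sessionId)
                     .SendAsync("MemberJoined", Context.ConnectionId);
    }

    // Relay an already-encrypted inventory/manifest to the other members.
    public Task PushInventory(string sessionId, string encryptedManifest) =>
        Clients.OthersInGroup(sessionId)
               .SendAsync("InventoryReceived", encryptedManifest);
}
```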

But ByteSync originally had two separate modes:

  • Local mode (LAN only)
  • Cloud mode (WAN only)

I couldn’t find a clean way to bridge them.

Users had to pick a mode upfront, which didn’t match real-world workflows.

The breakthrough came with the concept of DataNodes.

A DataNode abstracts a sync participant — local or remote — so the orchestration doesn’t care about the distance between nodes.

This allowed:

✔ direct LAN connections when devices can see each other

✔ encrypted relayed connections when they can’t

✔ both in the same sync session
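In code, the idea reduces to one interface (hypothetical names, heavily simplified): the orchestrator only ever sees IDataNode, and only the transport implementation knows whether a peer is one switch away or behind a relay.

```csharp
using System;
using System.Threading.Tasks;

// A sync participant, local or remote; the orchestrator never cares which.
public interface IDataNode
{
    string NodeId { get; }
    Task<byte[]> GetBlockSignatureAsync(string relativePath);
    Task SendDeltaAsync(string relativePath, byte[] delta);
}

// Peer on the same network: direct LAN transport.
public sealed class LanDataNode : IDataNode
{
    public string NodeId { get; init; } = "";
    public Task<byte[]> GetBlockSignatureAsync(string relativePath) =>
        Task.FromResult(Array.Empty<byte>()); // would call the peer directly
    public Task SendDeltaAsync(string relativePath, byte[] delta) =>
        Task.CompletedTask;                   // direct socket/HTTP push
}

// Peer that can't be reached directly: encrypted blobs via the relay.
public sealed class RelayedDataNode : IDataNode
{
    public string NodeId { get; init; } = "";
    public Task<byte[]> GetBlockSignatureAsync(string relativePath) =>
        Task.FromResult(Array.Empty<byte>()); // fetched through the relay
    public Task SendDeltaAsync(string relativePath, byte[] delta) =>
        Task.CompletedTask;                   // upload blob + notify via SignalR
}
```

A sync session is then just a list of IDataNode instances, some local, some relayed.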

Suddenly, we had hybrid sessions.

And it changed everything.


Challenge #3 — Azure Blob Storage and the “egress bill from hell”

Originally, remote exchanges used Azure Blob Storage.

It worked.

But then I ran the cost estimate.

And… no.

Azure egress fees were far too high for multi-site sync.

Not just high — non-viable.

That pushed me to migrate the relay layer to Cloudflare R2:

  • no egress fees
  • great performance
  • straightforward API
  • predictable costs
  • perfect for temporary encrypted blobs

Switching to R2 turned out to be one of the best decisions of the project.
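R2 speaks the S3 wire protocol, so any S3 client can talk to it. Here's a hypothetical sketch with the AWS SDK for .NET — bucket name, keys, and flow are invented for illustration, not ByteSync's actual relay code:

```csharp
using System;
using System.Threading.Tasks;
using Amazon.Runtime;
using Amazon.S3;
using Amazon.S3.Model;

// Point the S3 client at the R2 account endpoint instead of AWS.
var r2 = new AmazonS3Client(
    new BasicAWSCredentials("<access-key>", "<secret-key>"),
    new AmazonS3Config
    {
        ServiceURL = "https://<account-id>.r2.cloudflarestorage.com",
        ForcePathStyle = true
    });

var key = $"sessions/{Guid.NewGuid():N}/block-000001.delta.enc";

// Upload an already-encrypted delta blob for the other endpoint to fetch.
await r2.PutObjectAsync(new PutObjectRequest
{
    BucketName = "sync-relay",
    Key = key,
    FilePath = "data.delta.enc"
});

// Hand out a time-limited download link instead of credentials.
var url = r2.GetPreSignedURL(new GetPreSignedUrlRequest
{
    BucketName = "sync-relay",
    Key = key,
    Verb = HttpVerb.GET,
    Expires = DateTime.UtcNow.AddMinutes(30)
});
Console.WriteLine(url);
```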


Challenge #4 — The first fully working remote sync (the “aha” moment)

I remember the first time a full remote sync completed successfully.

It wasn’t just correct — it was fast, considering the conditions.

Encrypted cloud relay.

Rolling checksums.

Delta blocks.

Multi-endpoint convergence.

Everything clicked that day.

That was my “OK yes — this is worth continuing” moment.


Challenge #5 — The strangest bug I’ve ever hit

This one cost me three days of my life.

Every time I added a file or folder on my machine…

the connection dropped.

Every.

Single.

Time.

I debugged:

  • SignalR
  • Azure Functions
  • caching
  • threading
  • reconnection logic
  • manifests
  • cancellation tokens
  • TCP vs WebSockets

Nothing made sense.

The culprit?

Opening Windows Explorer caused a 20–30 second network hang… but only on my machine.

Not on the servers.

Not in production.

Just… my Windows environment being haunted.

Once I understood it, everything made sense — and nothing made sense at the same time.


Challenge #6 — Validating real-world use cases

The architecture later proved itself in scenarios like:

  • multi-site synchronization across organizations
  • multi-folder split comparisons
  • integrity verification after partial transfers
  • deduplication across several endpoints
  • syncing nodes that were sometimes LAN, sometimes WAN, sometimes both
  • combining several independent datasets into a unified comparison

The more complex the scenario, the more the architecture made sense.


Conclusion

I didn’t initially plan to build a synchronization tool.

I just needed a way to reliably synchronize large datasets across machines that couldn’t talk to each other.

But challenge after challenge, the project grew into something more robust and more general than I expected.

This article isn’t meant as a product pitch — just an honest breakdown of the problems I faced and how I solved them.

If you're curious about the tool behind these experiments, ByteSync is open-source.

Feedback is always welcome.

Thanks for reading.
