
samsungplay

On a Quest for the Fastest P2P File Transfer CLI: Thruflux (Open Alpha)

GitHub repo: https://github.com/samsungplay/Thruflux

Demo:

My goal

One day, I awoke with a single ambition burning within my heart: I wanted to create the fastest file transfer tool in existence. Having some prior experience in self-led web development projects, I instantly turned to the web to find out what browser-based solutions were already out there.

The problem with existing file transfer tools

First off, what do people use to transfer large files across the internet? I'm pretty sure most people use popular commercial tools such as Dropbox or Google Drive, or messaging mediums such as Discord and email. However, for developers, and surprisingly for many non-developers as well, such tools are simply not sufficient. We all know their limits: popular client solutions not only impose limits on the size of the files you can transfer, but also rely on a client-server model where an upload must complete in full before anyone else can download the files. Frankly, while this is excellent for securely storing your files in the cloud, it makes for a terrible one-off file-sharing experience. Furthermore, because the files have to be stored on a server, there are always innate security concerns.

Exploring the P2P landscape

Then I turned to the world of P2P file transfers. Popular P2P transfer tools include Blip, Magic Wormhole, scp, rsync, and croc. Unfortunately, none of them satisfied me. First off, Blip is a GUI application; installing a GUI app was friction I did not want to deal with, and something I would not want to ask of the receiving end either. It is also closed-source and proprietary. Magic Wormhole and croc are great CLI-based tools, but they lack first-class support for multi-file transfers. To give you a picture, Magic Wormhole compresses the entire folder before transferring, and the receiving end has to decompress it all over again. Croc requires heavy amounts of hashing across multiple files before sending them, and in my experience it simply was not optimized for sending many files at high speed. Finally, there were the good old scp and rsync: while fast, they assume connections are completely open, which is usually not the case in real-world environments.

The key realization

But then I discovered one thing they all have in common: they use TCP connections to transfer files. I had recently developed a strong interest in QUIC and the UDP protocol in general, so I knew I had to make use of them if I had any chance of beating these tools. I was confident that, if well engineered, QUIC could match if not outperform TCP-based connections. This became the core focus of my tool: leverage QUIC to build a CLI transfer tool that is not only dead simple to use but also incredibly fast. I had these seven requirements in mind:

Design goals

1). It must use the QUIC protocol with an ICE exchange to support transfers between two arbitrary peers with zero setup. UDP hole punching is traditionally more likely to succeed than TCP hole punching, so using QUIC benefits from a higher rate of direct connectivity establishment, as well as from advanced congestion control algorithms like BBR.

2). It must have first-class support for multi-file transfers. Transferring multiple files should be as fast, if not faster, than transferring a single file of the same total size.

3). It must support multiple receivers. Why not? That means you can open one session and receivers can trickle in at any time to download your files. No need to restart the send and re-scan your files every time.

4). It must be cross-platform across Windows, Linux, and Mac.

5). It must have automatic resume support. And resume should be FAST. This is non-negotiable for large file transfers, which inherently take a long time.

6). It must be open source.

7). Finally, above all: SPEED over everything else. I didn't want to go overboard with other aspects of file transfer (i.e. security) beyond the viable minimum. Speed should be the core selling point of my tool.

So then, it was time to choose a language. While Rust and Go were strong candidates, I decided on C++. I believed the mature ecosystem of C++ QUIC/ICE libraries, the low-level configurability, and the absence of garbage-collector churn would let me truly squeeze out the performance.

After several painful weeks of coding and debugging in C++ day and night, I finally arrived at a working implementation. Here are some interesting observations I made along the way:

1). I initially assumed that using multiple threads with multiple QUIC connections and streams was the correct way to achieve maximum throughput: more parallel connections = better. It turns out the C++ libraries and language are so efficient that I could saturate the network with a single thread and a single connection. This greatly simplified the app logic.

2). QUIC scales heavily with the CPU power and core count of the host machine. While QUIC performed worse on low-end devices, on any reasonable CPU released in the last 10 years with at least 2 cores, QUIC outperformed TCP.

3). The BBR congestion control algorithm made a huge difference in throughput in my implementation, showing roughly 4x the throughput of CUBIC. The OS's UDP buffer size also mattered a lot: transfers became roughly 1.3x faster once the UDP buffers were at least 8 MiB.
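On Linux, the UDP buffer ceiling mentioned above is governed by two sysctl keys. A minimal sketch of how one might raise them to the ~8 MiB threshold (the keys are the standard Linux ones; whether Thruflux needs any additional per-socket configuration is an implementation detail not covered here):

```shell
# Inspect the current per-socket UDP buffer ceilings (Linux)
sysctl net.core.rmem_max net.core.wmem_max

# Raise both ceilings to 8 MiB (8 * 1024 * 1024 = 8388608 bytes)
# for the current boot; requires root
sudo sysctl -w net.core.rmem_max=8388608
sudo sysctl -w net.core.wmem_max=8388608
```

Note that these only raise the allowed maximum; the application must still request a large buffer via `setsockopt(SO_RCVBUF/SO_SNDBUF)` to benefit.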

And finally, the moment of truth came: benchmarking my tool against the existing ones. Here are the results:

Benchmarks


Environment

  • Vultr Dedicated CPU instance: 2 vCPU (AMD EPYC Genoa), 4 GB RAM, NVMe SSD, Ubuntu 22.04

  • Tested over the public internet, with the sender located in Chicago and the receiver in New Jersey.

  • Method: median of 3 runs; all times are end-to-end wall-clock times including the setup/closing phases, not just the pure transfer time.

(The timed commands below account for only the receiving phase.)
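The post doesn't show how the benchmark payloads were created; here is one plausible way to generate files matching the names used in the commands below (the use of `/dev/zero` is an assumption; it is harmless here since none of the tested configurations compress data):

```shell
# Single-file payload: one 10 GiB file (10240 x 1 MiB blocks)
dd if=/dev/zero of=test_10GiB.bin bs=1M count=10240 status=none

# Multi-file payload: 1000 files of 10 MiB each
mkdir -p benchmark_files
for i in $(seq 1 1000); do
  dd if=/dev/zero of="benchmark_files/file_$i.bin" bs=1M count=10 status=none
done
```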

Benchmark commands (wall-clock measured with time)

# scp — single file
time scp test_10GiB.bin root@207.246.86.142:/root/

# scp — many files
time scp -r benchmark_files root@207.246.86.142:/root/

# rsync — single file
time rsync -a --info=progress2 --no-compress --whole-file --inplace \
  test_10GiB.bin root@207.246.86.142:/root/

# rsync — many files
time rsync -a --info=progress2 --no-compress --whole-file --inplace \
  benchmark_files root@207.246.86.142:/root/

# croc — single file
time CROC_SECRET="code" croc --yes

# croc — many files
time CROC_SECRET="code" croc --yes

# Thruflux — direct (single + many files)
time ./thru join --overwrite <CODE>

# Thruflux — forced TURN relay
time ./thru join --overwrite --force-turn <CODE>

# wormhole — single / many files
time wormhole receive <CODE> --accept-file

Summary

| Tool | Transport | Random Remote Peers | Multi-Receiver | 10 GiB File | 1000×10 MiB |
|---|---|---|---|---|---|
| thruflux (direct) | QUIC | ✅ | ✅ | 1m34s | 1m31s |
| rsync | TCP (SSH) | ❌ | ❌ | 1m43s | 1m39s |
| scp | TCP (SSH) | ❌ | ❌ | 1m41s | 2m20s |
| croc | TCP relay | ✅ | ❌ | 2m42s | 9m22s |
| wormhole | TCP relay | ✅ | ❌ | 2m45s | ❌ stalled at ~8.8 GiB around 3m |

...and it seemed very promising! Even with the ~6 seconds of initial P2P handshake (which scp and rsync don't have), my tool beat scp and rsync in wall-clock time. Compared to the existing P2P tools, it was clearly faster; in fact, on the 1000-file transfer it showed a dominant lead that cannot be dismissed as a statistical anomaly. Croc spends time hashing files, and wormhole spends a lot of time compressing everything for multi-file sends; since my tool skips all that extra work, the difference in actual wall-clock time was even bigger. But what I really wanted to highlight is that its performance barely changed between transferring 1000 files and a single file of the same total size.

Tradeoffs and limitations

So... it all seemed nice. What were the catches?

1). CPU-dependent: my tool requires more CPU power than the other tools. On devices with a low-end, single-core CPU, it performed marginally worse than rsync and scp.

2). TURN relay fallback: I include default TURN relays for when direct connectivity cannot be established, but my self-hosted TURN server is not that powerful (and sits in a pretty bad location), so relayed transfers were slower than the other tools. Networks behind symmetric NATs will therefore see much lower speeds.

3). UDP quirks: some restrictive networks block outbound UDP entirely, in which case not even TURN works. QUIC is simply infeasible in that situation.

4). Longer connection phase: since I use a full ICE exchange, the initial connection phase is certainly slower than other tools'. I think I can improve this by switching from the current gather-all approach to trickle ICE.

5). Lack of verification: for speed, my tool trusts QUIC's network-level integrity (which is stronger than TCP's by nature). There are rare edge cases, such as disk corruption, that could still corrupt a file, but they are arguably rare enough that I decided to skip end-to-end verification for now.
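For users who want that extra assurance anyway, a one-off check out of band with standard tooling works fine. A minimal sketch (the file names are hypothetical; `checksums.txt` must reach the receiver through some trusted channel):

```shell
# Sender: record checksums of the files before hosting them
sha256sum photos/* > checksums.txt

# Receiver: after the transfer (and after obtaining checksums.txt),
# verify that every file arrived intact
sha256sum -c checksums.txt
```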

6). Bloated join code: unlike croc/wormhole, I do not use a PAKE; I rely on WSS (TLS) encryption for signaling and QUIC's innate AEAD encryption in transit. The join code must therefore carry as much entropy as possible to compensate. I understand some may not love the current join code system, but hopefully it won't matter too much, since we all copy-paste anyway.
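To put the entropy argument in rough numbers (the 8-character length and 36-symbol alphabet below are illustrative assumptions, not Thruflux's actual parameters): an N-character code over an alphabet of size A carries N × log2(A) bits of entropy, so longer codes and bigger alphabets pay off quickly.

```shell
# Entropy of an 8-character code over a 36-symbol alphabet (A-Z, 0-9)
awk 'BEGIN { printf "%.1f bits\n", 8 * log(36) / log(2) }'
# prints "41.4 bits"
```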

Yet, regardless of the tradeoffs, I believe Thruflux shows real promise - especially in multi-file transfer scenarios where traditional tools often struggle. There's still plenty of room to improve, but the early results have been really encouraging for me.

Open alpha release

After more internal testing, I’ve decided to open the project to the public and move it into its alpha stage. As a brief note on project status: all core functionality is implemented and basic testing has been done, but please expect rough edges. Cross-platform networking software in particular tends to reveal unexpected bugs in the wild.

Wrapping up, if you decide to try Thruflux (I hope you do!), I would genuinely appreciate your feedback. Bug reports, performance comparisons, edge cases, and constructive criticism are all incredibly valuable at this stage. After all, without them, Thruflux will never move out of its alpha stage.

Quickstart 🚀

Install

Linux Kernel 3.10+ / glibc 2.17+ (Ubuntu, Debian, CentOS, etc.)

curl -fsSL https://raw.githubusercontent.com/samsungplay/Thruflux/refs/heads/main/install_linux.sh | bash

Mac 11.0+ (Intel & Apple Silicon)

curl -fsSL https://raw.githubusercontent.com/samsungplay/Thruflux/refs/heads/main/install_macos.sh | bash

Windows 10+ recommended (technically may still work on Windows 7/8)

iwr -useb https://raw.githubusercontent.com/samsungplay/Thruflux/refs/heads/main/install_windows.ps1 | iex

Use

# host files
thru host ./photos ./videos

# share the join code with multiple peers
thru join ABCDEFGH --out ./downloads

Finally, if you would like to know more about the project, check out the repo; a detailed README is waiting for you there :)

https://github.com/samsungplay/Thruflux
