samsungplay

Posted on Feb 21 • Edited on Mar 8

On a Quest for the Fastest P2P File Transfer Tool: Thruflux (Open Beta)

#p2p #networking #productivity #showdev

Update: The Desktop UI version has been released, check out the repo for more information!
Click here for Github repo

Demo:

Desktop-

CLI-

My goal

One day, I awoke with this single ambition burning within my heart:
I wanted to create the fastest file transfer tool in existence. Having had some prior experience in self-led web development projects, I instantly turned to the web to find out what existing browser-based solutions are out there.

The problem with existing file transfer tools

First off, what do people use to transfer large files across the internet? I'm pretty sure most people use popular commercial tools such as Dropbox, Google Drive, or other messaging mediums such as Discord and Email. However, for developers, and surprisingly for many non-developer individuals as well, such tools are simply not sufficient. We all know their limits - popular client solutions not only impose limit to the size of file you can transfer, but also thrive on "client-server" model where an upload must first complete before anyone else can download the files. And frankly speaking, while this is excellent for securely storing your files on the cloud, it is quite a terrible experience for one-off file sharing. Furthermore, because files have to be stored in the server, there are always innate security concerns.

Exploring the P2P landscape

Then I had turned to the world of p2p file transfers. Some popular p2p file transfer tools included Blip, Magic-wormhole, Scp, Rsync, and Croc. Unfortunately, none of these tools satisfied me. First off, Blip is a GUI-only application, which introduced friction I didn’t want to deal with. I prefer tools that also provide a CLI interface, both for automation and for situations where I wouldn’t want to ask the receiving end to install a full GUI application. Additionally, Blip is closed-source and proprietary, which didn’t align with my preference for open and transparent tools. Magic-wormhole and croc are great CLI based tools, but they lacked first-class support for multi-file transfers. To give you a picture, magic-wormhole compresses the entire folder before transferring, and on receiving end it needs to decompress it all over again. For croc, sending multiple files required heavy amounts of hashing prior to sending them, not to mention it was simply not optimized for sending multiple files with high speed in my experience. Finally, there were the good old scp and rsync - while fast, they were meant to be used under assumptions that connections are completely open, which is usually not the case in real world environments.

The key realization

But then I discovered one thing they all have in common: they use TCP connections to transfer files. Recently, I've had great interest in QUIC and UDP protocol in general, so I knew I had to make use of this if I had any chance to beat these tools. I was confident that if well-engineered, QUIC can match if not outperform TCP based connections. This was the core focus of my tool : leverage QUIC to build the fastest CLI transfer tool that is not only dead simple to use but also incredibly fast. I had these seven requirements in mind:

Design goals

1). It must use the QUIC protocol and ICE exchange to support transfer between random two peers with zero setup. I've heard traditionally udp hole punching is more likely to succeed over tcp hole punching. Using QUIC would therefore benefit from the higher success of direct connectivity establishment, and usage of advanced congestion control algorithm like BBR.

2). It must have first-class support for multi-file transfers. Transferring multiple files should be as fast, if not faster, than a single file of same size.

3). It must support multiple receivers. Why not? That means you can just open one session, and receivers can trickle in at anytime to download your files. No need to restart sending and scanning your files all over again everytime.

4). It must be cross-platform across Windows, Linux, and Mac.

5). It must have automatic resume support. And it should be FAST. This would be non-negotiable for large file transfers that inherently takes long time.

6). It must be open source.

7). Finally, above all : SPEED over anything else. I didn't want to go too overboard with other aspects of file transfer (i.e. security) beyond viable minimum. This should be the core selling point of my tool.

So then, it was time to choose a language. While Rust and Go were strong candidates, I decided to use C++. I believed in the mature ecosystem of C++ quic / ice libraries, low-level configurability, and zero garbage collector churns to truly juice out the performance.

After several painful weeks working day and night coding and debugging in C++, I finally managed to come up with a working implementation. Here are some interesting observations I learned throughout the process:

1). Initially, I assumed that achieving maximum throughput would require multiple threads and several parallel QUIC connections and streams. However, after some experiments I discovered that the modern C++, QUIC libraries are so efficient that it allowed the app to saturate the network with only a single thread and a single connection. This realization brought three major benefits. First, the app consumed significantly less CPU and memory, since it avoided the overhead of managing multiple threads and connections. Second, the simplified architecture made it possible to implement fast resume-point detection that scales with file metadata and not the size of file itself, something that many traditional transfer tools struggle with because they rely on hashing entire files to determine progress. Lastly, because everything is transferred sequentially, there was considerably less chance of files getting corrupted on receiver end, even without extra file hash verification which would slow down the whole transfer one way or the other.

2). QUIC scales heavily with CPU power and core count of the host machine. While QUIC performed worse on low-end devices, for a reasonable CPU released in last 10 years and with at least 2 cores, QUIC outperformed TCP.

3). BBR congestion algorithm made huge difference in terms of throughput in my implementation, almost showing ~x4 throughput compared to CUBIC. Also, the UDP buffer size of the OS mattered a lot. Transfers became nearly ~x1.3 faster given plenty of UDP buffer size of at least 8 MiB.

And finally, the moment of truth came.. to benchmark my tool against existing ones. Here are the results:

Benchmarks

Environment

Vultr instance: 2 vCPU (AMD EPYC Genoa), 4 GB RAM, NVMe SSD, Ubuntu 24.04
Tests conducted over the public internet between Chicago (sender) and Seoul (receiver), representing a high-latency link.
Methodology: results show the median of 3 runs. Times represent end-to-end wall-clock duration, measured from the moment the sender command is executed until the receiver completes (or the sender finishes if no separate receive command exists), not just the raw transfer time.
For specific benchmarking script used, refer to the ultimate_bench.sh script in the github repo

Summary

Tool	Transport	Random Remote Peers	Multi-Receiver	10 GiB File	1000 × 10 MiB
Thruflux	QUIC	Yes	Yes	2m 20s	2m 18s
Croc	TCP	Yes	No	2m 40s	19m 33s
Wormhole	TCP (relay)	Yes	No	22m 20s	N/A (stalled around ~38:59)
SCP	TCP	No	No	15m 06s	26m 14s
Rsync	TCP	No	No	15m 18s	14m 53s

And TURN relay benchmarks here as well:

Mode 10 GiB File 1000 × 10 MiB

Thruflux (TURN relay) 5m 40s 5m 35s

Mode	10 GiB File	1000 × 10 MiB
Thruflux (TURN relay)	5m 40s	5m 35s

..and it seemed very promising! Even with 6 seconds of initial p2p handshake phase (which scp and rsync doesn't have) my tool was able to significantly outperform scp and rsync in terms of wall clock time. Besides scp/rsync, compared to existing p2p tools, my tool still appeared to be faster - in fact, for 1000 files transfer, it had shown dominant performance that cannot be ruled out as statistical anomaly. Furthermore, even when direct connectivity was not possible, the turn relay mode still have shown a solid performance that were faster than or competitive with other tools. Above all, what I particularly wanted to highlight was how its performance on transferring 1000 or a single file of same size did not change at all.

Tradeoffs and limitations

So.. it seemed all nice. What were the catches?

1). CPU dependent : My tool required higher CPU power compared to other tools. On devices with low end cpu and only 1 core, it performed marginally worse than rsync and scp. However, the CPU usage itself was lower due to use of single connection.

2). Need for TURN relay fallback: While I included default turn relays in case direct connectivity cannot be established, my self-hosted turn server is not that powerful and capacity-limited. Therefore, it showed worse results compared to direct scenario. While using QUIC helps boost chance of direct connectivity establishment, there will likely be some users who needs TURN relay.

3). UDP quirks: I found out that some restrictive networks completely block outbound UDP sometimes so not even TURN would work in this case. Basically QUIC is infeasible in this situation.

4). Longer connection phase: Since I'm using a full ICE exchange protocol, initial connection phase is slower than other tools for sure. But I think this is something I can improve upon if I use trickle ICE instead of gather-all like right now.

5). Lack of verification: For speed, my tool trusts the QUIC protocol's network level integrity (which is stronger than TCP by nature). However, there can be rare edge cases such as disk corruption that may corrupt the file. But this is arguably quite rare (more so because the tool only uses a single connection) that I decided to skip for now.

6). Bloated join code: Unlike croc/wormhole, I do not use PAKE but I rely on WSS TLS encryption and QUIC's innate AED encryption in transit. Therefore, join code must have as high entropy as possible to compensate for the security. I understand some may not love the current join code system, but I thought hopefully it wouldn't matter too much because we all copy paste anyways.

Yet, regardless of the tradeoffs, I believe Thruflux shows real promise - especially in multi-file transfer scenarios where traditional tools often struggle. There's still plenty of room to improve, but the early results have been really encouraging for me.

Open beta release

After more internal testing, I’ve decided to open the project to the public and move onto its alpha stage. To give a brief note about the project status, all core functionality is implemented and basic testing has been done, but please expect rough edges. I unfortunately find that cross-platform and networking software in particular tends to reveal unexpected bugs in the wild.

Wrapping up, if you decide to try Thruflux (I hope you do!), I would genuinely appreciate your feedback. Bug reports, performance comparisons, edge cases, and constructive criticism are all incredibly valuable at this stage. After all, without them, Thruflux will never be able to move out from its beta stage.

⸻

Quickstart 🚀

Install

Linux Kernel 3.10+ / glibc 2.17+ (Ubuntu, Debian, CentOS, etc.)

curl -fsSL https://raw.githubusercontent.com/samsungplay/Thruflux/refs/heads/main/install_linux.sh | bash

Mac 11.0+ (Intel & Apple Silicon)

curl -fsSL https://raw.githubusercontent.com/samsungplay/Thruflux/refs/heads/main/install_macos.sh | bash

Windows 10+ (10+ Recommended, technically still could work on Windows 7/8)

iwr -useb https://raw.githubusercontent.com/samsungplay/Thruflux/refs/heads/main/install_windows.ps1 | iex

Use

# host files
thru host ./photos ./videos

# share the join code with multiple peers
thru join ABCDEFGH --out ./downloads

Finally, if you would like to know more about the project and follow its latest updates, you may examine the repo, I have some detailed README ready there :) :

https://github.com/samsungplay/Thruflux