
Fast Small File Uploads to S3

Moving files into S3 as fast as possible has been a side-quest of mine for, well, two decades (yep, S3 is more than 20 years old).

My first encounter with moving large numbers of files to S3 was from an on-premises system, working over a backlog of many thousands of files, with thousands of new ones appearing each day.

For that project I tried the obvious path (a foreach loop doing uploads one by one) and failed. I discovered success with concurrency -- running multiple uploads simultaneously from multiple workers. It worked, and a similar technique has been built into the AWS CLI (CRT and concurrent connections config setting).

Concurrency was helpful because bandwidth wasn't the limit. The files were small: at most 1-2 MB each, and many just a few KB. The bandwidth to S3 was relatively low by today's standards, but it wasn't the bottleneck. Round-trip latency, connection negotiation and retransmission were what got in the way.


A few years down the road, I was tasked with putting many millions of small JSON files into S3 to use as static data for a web application (a "static API"). I knew upfront the API PUT request costs would be high for this project, but as a one-time expense that wasn't a big deal.

The problem I feared was that I had to run the workload at the edge (on premises, around the world) because the source data was there. Even if we could have run inside AWS on EC2 it would still be a remarkably slow and CPU intensive process (again because of connection negotiation for governance-required TLS).

I wanted to explore UDP for file transfers back then, but didn't get the chance. Recently, for the 20th anniversary of S3, I finally have.


The problems with using UDP for file transfers are three-fold:

  • UDP packets are small, and files can be big.
  • The protocol itself doesn't support reliable delivery or ordering.
  • There is no real standard for encrypting UDP.

When it comes to transmitting files over a network, the points above are all significant.

To transmit a file larger than a single packet (typically capped around 1,500 bytes by the network MTU), we need to split it. Doing so means we need a way to know which packets belong to which files (some kind of ID). That's simple enough to add, but to stay true to the stateless nature of UDP and serverless backends it needs to be added to each packet (a little redundant).

SIDE NOTE: TCP based APIs (including HTTP/1 and HTTP/2) use state attached to the session and kept on the backend to map data to files. This is a fundamental difference when it comes to designing UDP APIs versus TCP APIs.
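To make the per-packet ID concrete, here's a minimal sketch of what such a header could look like. The field names, sizes, and layout are my own illustration, not part of any standard or of the raptor tool itself.

```csharp
// Hypothetical per-packet header: every datagram repeats the file ID so a
// stateless backend can group packets without any session to lean on.
// Field layout and sizes are illustrative only.
using System;
using System.Buffers.Binary;

readonly record struct PacketHeader(Guid FileId, ushort PayloadLength)
{
    public const int Size = 16 + 2;

    public void WriteTo(Span<byte> buffer)
    {
        FileId.TryWriteBytes(buffer[..16]);
        BinaryPrimitives.WriteUInt16BigEndian(buffer[16..18], PayloadLength);
    }

    public static PacketHeader ReadFrom(ReadOnlySpan<byte> buffer) =>
        new(new Guid(buffer[..16]),
            BinaryPrimitives.ReadUInt16BigEndian(buffer[16..18]));
}
```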

To ensure we have the same contents as the original requires getting all the file data (reliability) and knowing what order the chunks go in.

With UDP it's easy enough to add data to each packet to ensure the payloads they contain are assembled in the right order -- a sequence number (a.k.a. an index). With that in place it doesn't matter what order the packets arrive; we can sort them on the backend to get the contents arranged correctly.
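Here's the idea on the backend side, as a minimal sketch (the tuple shape is hypothetical; in the real pipeline the RaptorQ decoder discussed next does this work):

```csharp
// Order received chunks by sequence number and concatenate their payloads;
// the arrival order of the UDP packets doesn't matter.
using System.Collections.Generic;
using System.Linq;

static class Reassembler
{
    public static byte[] Reassemble(IEnumerable<(uint Sequence, byte[] Payload)> chunks) =>
        chunks.OrderBy(c => c.Sequence)
              .SelectMany(c => c.Payload)
              .ToArray();
}
```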

But we also need to make sure we have all the contents of the file. Note I didn't say "we have all the packets" -- we just need the file data.

This problem is exactly what RaptorQ solves. In a nutshell, it sends the content of the file plus extra error-correction data, so that a few missing packets don't prevent the file from being reconstructed. It also handles sequence numbers, so we have a standard to follow.
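I'm not going to reproduce any particular library's API here; this hypothetical interface just captures the shape of the idea:

```csharp
// Hypothetical fountain-coding interface (not any specific RaptorQ library's API).
// The encoder emits the file's own "source" symbols plus extra "repair" symbols;
// the decoder can rebuild the file from any sufficiently large subset of them.
using System.Collections.Generic;

interface IRaptorQEncoder
{
    // symbolSize: what fits in one UDP payload.
    // repairOverhead: 0.4 means roughly 40% extra repair symbols on top of the source symbols.
    IEnumerable<(uint SymbolId, byte[] Data)> Encode(byte[] file, int symbolSize, double repairOverhead);
}

interface IRaptorQDecoder
{
    // Returns null until enough symbols (source or repair, in any order) have arrived.
    byte[]? TryDecode(IReadOnlyCollection<(uint SymbolId, byte[] Data)> received, long originalFileLength);
}
```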

So that covers moving the file data reliably. One check mark.


What about privacy? RaptorQ doesn't include any encryption. If we used plain UDP and RaptorQ only we'd be sharing our content with the internet at large. That's a problem for many folks.

Luckily, there are some "almost standard" options.

DTLS exists, but it's not mainstream, is ungainly to implement and enforces some "features" we don't need.

A more modern approach is WireGuard, which uses the Noise handshake and fixed ChaCha20-Poly1305 encryption. It's a lightweight, proven way of protecting data in flight, and it runs over UDP. Most VPN products support it. Hopefully it becomes a recognized standard someday.

Putting UDP, RaptorQ and WireGuard encryption together we get a very lightweight way to send files from the edge to the cloud.

Potentially. All this was just a thought experiment until recently.


I first prototyped this idea with a WireGuard tunnel set up with the common wg command.

The files to be sent were passed through a RaptorQ encoder library and saved to disk (each packet as a separate file). I'm lazy like that. I then sent them with cat {file} > /dev/udp/1.2.3.4/4576. Note that sending via bash like this is slow, which is good.

Don't send UDP as fast as your computer can; it isn't nice.

To keep it short, that prototype worked and I got started on a more "real" backend. Serverless, of course.

To handle the packets without a server I used a UDP Gateway Listener configured for WireGuard. Since I was using the standard wg on the client side I enabled DecapsulatedDelivery which strips the IP and UDP headers before delivering the payloads to the backend (in this case a simple Lambda).

From there the packets were collected in DynamoDB, and once a sufficient number had been received, the reassembly process kicked off. That process read all the records, decoded them, and reassembled them into an S3 object.
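As a rough sketch of that collection step -- the table name, attribute names, and the assumption that the payload arrives already parsed are all mine, not the actual backend:

```csharp
// Hypothetical packet-collection step for the Lambda: write each received
// symbol to DynamoDB keyed by file ID. Table and attribute names are made up.
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.Model;

public class PacketStore
{
    private readonly IAmazonDynamoDB _ddb = new AmazonDynamoDBClient();

    public Task StoreAsync(string fileId, uint symbolId, byte[] payload) =>
        _ddb.PutItemAsync(new PutItemRequest
        {
            TableName = "raptor-packets",   // assumed table name
            Item = new Dictionary<string, AttributeValue>
            {
                ["file_id"]   = new AttributeValue { S = fileId },
                ["symbol_id"] = new AttributeValue { N = symbolId.ToString() },
                ["payload"]   = new AttributeValue { B = new MemoryStream(payload) }
            }
        });
}
```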

Magic happened. I got files.

I got files even with significant "packet loss" (intentionally skipping random packets).

The potential was manifest, but I needed to go fast to make it interesting.


I wanted a CLI that was small, dependency-free and fast. I chose C# and .NET 10 with Ahead-of-Time compilation to build native, dependency-free, single-file executables. The raptor executable weighs in at about 3.5 MB on both Linux and Windows. The codebase is the same for both platforms.

Modern cross-platform development is so very nice. No #ifdef conditional compilation anywhere.

The WireGuard encryption comes courtesy of the Proxylity.WireGuardClient NuGet library (MIT license, repo here), which implements a simple, UdpClient-like interface. It's about 800 lines of code in total.
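For context, "UdpClient-like" means it looks roughly like .NET's built-in UdpClient. Below is the plain, unencrypted UdpClient shape for comparison (not the library's own API); the WireGuard client wraps the same kind of send call in the Noise handshake and ChaCha20 encryption.

```csharp
// Plain .NET UdpClient for comparison -- no encryption here; this is just the
// API shape the WireGuard client library mimics.
using System.Net.Sockets;
using System.Text;

using var udp = new UdpClient();
byte[] datagram = Encoding.UTF8.GetBytes("hello");
await udp.SendAsync(datagram, datagram.Length, "203.0.113.10", 51820);
```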

As I noted above, spewing packets blindly isn't cool. Some kind of congestion control (slowing down and not overburdening an already busy network) is mandatory for a tool like this -- even when the goal is speed.

My solution to congestion control (for now) is a fixed, configurable bandwidth for the transfer (--rate-mbps). It's easy to implement (just a delay between packets) and effective. The downsides are that it underutilizes the connection and can still add to congestion on a very busy network, but it's a workable approach for me that allows tinkering.
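A minimal sketch of that kind of pacing (not the actual raptor implementation) looks something like this:

```csharp
// Fixed-rate pacing: sleep after each send so the average throughput stays at
// the configured megabits per second. Task.Delay is coarse for sub-millisecond
// waits, which is fine for a sketch.
using System;
using System.Threading.Tasks;

static class Pacer
{
    public static async Task SendPacedAsync(
        byte[][] packets, double rateMbps, Func<byte[], Task> sendAsync)
    {
        foreach (var packet in packets)
        {
            await sendAsync(packet);
            // Time (seconds) this packet "occupies" the link at the target rate.
            double seconds = packet.Length * 8 / (rateMbps * 1_000_000);
            await Task.Delay(TimeSpan.FromSeconds(seconds));
        }
    }
}
```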

I added a couple more interesting options. The first implements different confirmation levels, from "none" to "yep, the object is in S3" (see --confirm), and the second controls how many extra repair packets to send (see --overhead). Using these together allows sending a file's packets without waiting for confirmations (--confirm NONE) while being pretty sure the file will get there (--overhead 0.4 or whatever, depending on packet loss).
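A hypothetical invocation might look like the following; the flags are the ones described above, but I'm hand-waving the positional arguments, so check the repo for the exact syntax.

```
raptor --rate-mbps 50 --confirm NONE --overhead 0.4 ./out/*.json s3://my-bucket/static-api/
```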

So is it faster? Yes!

In the benchmarks with 1KB, 10KB and 1MB files raptor outperforms aws s3 sync by a bit. It outperforms successive aws s3 cp by 25x. Nice.

The smaller the files, the bigger the advantage. For a few small files, it's essentially instant. And, it's more flexible.

Being real, raptor loses its advantage as files get larger because of the fixed bandwidth. The --rate-mbps option can be used to saturate a link, but in the end the repair and CPU overhead of the RaptorQ encoding becomes significant. There are ways to fix this, but given my target use case (and limited knowledge) I'm stopping here for now.

Takeaway

Sure, it's faster, but the most significant benefit of raptor is that it uses 10x less CPU power. Think about that. The world runs on TLS-protected requests, and each one could use 90% less electricity. Billions of requests per second -- 200 million per second for S3 alone -- the potential electricity savings are astounding.

Why aren't we doing this more? HTTPS and TLS are so entrenched they are rarely questioned. And moving legacy systems to more modern and efficient network stacks is understandably infeasible.

But if you're working on something new, think about making that greenfield project a little more "green".

Can you make raptor faster? Clone the repo, tweak it, experiment with different backends and let me know.

Contributions welcome!
