There is much to be said for the merits of eBPF when it comes to the common problems of network filtering, and one cannot help but observe the swift evolution of this splendid technology. However, one must admit that certain undertakings remain decidedly more tiresome than others. While dropping packets comes across as a rather straightforward affair, the pursuit of accelerating a stateful, high-performance userspace relay server presents a most formidable set of challenges.
Notably, TURN (RFC 8656: Traversal Using Relays around NAT) is a textbook example of a service that burns CPU cycles: every relayed datagram must be read into userspace and written back out, incurring an eye-watering number of switches between kernel and user mode. Yet offloading the channel traffic to the eBPF layer demands a most substantial and, one might say, exhaustive quantity of logic to process packets with surgical precision.
The proposition to accelerate TURN using an eBPF bypass scheme has been mooted for some time. There have been feature requests (for instance, in the coturn project) as well as solutions, such as the exemplary work of Tamás Lévai et al., entitled "Supercharge WebRTC: Accelerate TURN Services with eBPF/XDP". Although such solutions are thoughtfully designed to handle the protocol with utmost care, they require substantial cooperation from the TURN server itself to do their job. While this approach is certainly commendable, I take a rather different view of what the more beneficial arrangement might be.
When it comes to unencrypted TURN traffic, one may construct a stateful eBPF component that is fully protocol-aware yet learns about new TURN channel bindings by "snooping" on the control exchanges as they cross the wire, so that the userspace server remains unmodified and blissfully unaware of the offload. In this post, I should like to present a humble prototype of this very approach: an open-source project named TURN-BPF.
TURN-BPF: research into eBPF offloads of RFC 8656 channels
TURN-BPF is a personal development effort aiming to reckon the feasibility
of using XDP programs to bypass the userspace for the TURN channel traffic
without the need to tamper with the code of the TURN implementation itself.
These programs conduct NAT (client <> TURN | relay <> peer), strip/add the
TURN channel tag, update the checksums, rewrite the MAC addresses and send
the resulting packets onto the wire via interfaces chosen based on the FIB.
The tool requires no configuration from the user, except for the interface
name(s) and is supposed to snoop on relay allocations and channel bindings
by capturing the said control packet handshakes at the XDP/TC hooks on the
main network interface. For the sake of keeping the channels active in the
userspace TURN server, the tool employs a 'heartbeat' approach, spilling a
small fraction of data packets…

       [ CLIENTS ]                                 [ PEERS ]
         |     |                                       |
  (STUN) |     | (Tagged Data)                  (Untagged Data)
         |     |                                       |
+--------+-----+--------+                     +--------+--------+
| Interface for Clients |                     | Relay Interface |
+--------+-----+--------+                     +--------+--------+
         |     |                                       |
         |     +-----+                                 |
         |           |                                 |
=========|===========|=================================|=========
KERNEL   |           |                                 |
         |           |                                 |
+--------+--------+  |   +-------------------------+   |
| XDP/TC Snoopers |  +--<|> XDP cli2rem / rem2cli <|>--+
+--------+--------+      +---+---------------------+   |
  (Maps) |                   | (Fast Path)             |
         |                   |                         ^
         |           (Heartbeat Spill)                 |
         |                   |                         |
=========|===================|=========================|=========
USERLAND |                   |                         |
         |                   +---->+-------------+-->--+
         |                         | TURN Server |
         +------------------------<+>------------+
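The README excerpt mentions NAT and checksum updates. One common way to patch a checksum after rewriting a few header fields, without re-summing the whole packet, is the incremental update from RFC 1624. The following is a minimal userspace sketch of that arithmetic, not code taken from TURN-BPF: the function names are hypothetical, and header words are treated as plain 16-bit values for brevity. XDP programs cannot use the skb-based checksum helpers available to TC programs, so arithmetic of this kind is typically done by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit one's-complement accumulator into 16 bits. */
static uint16_t csum_fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Full Internet checksum over n 16-bit words (used here to cross-check). */
static uint16_t csum_full(const uint16_t *words, size_t n)
{
    assert(words != NULL);
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += words[i];
    return (uint16_t)~csum_fold(sum);
}

/* RFC 1624, eqn. 3: HC' = ~(~HC + ~m + m') for one changed 16-bit word. */
static uint16_t csum_update16(uint16_t csum, uint16_t old_val, uint16_t new_val)
{
    uint32_t sum = (uint16_t)~csum;
    sum += (uint16_t)~old_val;
    sum += new_val;
    return (uint16_t)~csum_fold(sum);
}

/* A 32-bit field (e.g. an IPv4 address) is simply two 16-bit words. */
static uint16_t csum_update32(uint16_t csum, uint32_t old_val, uint32_t new_val)
{
    csum = csum_update16(csum, (uint16_t)(old_val >> 16), (uint16_t)(new_val >> 16));
    return csum_update16(csum, (uint16_t)old_val, (uint16_t)new_val);
}
```

For example, rewriting a hypothetical 10.0.0.2 -> 10.0.0.1 header into 10.0.1.1 -> 10.0.1.9 and shrinking the total length by the four bytes of a stripped channel tag takes three calls (one csum_update16, two csum_update32) and yields the same value as a full recomputation. The UDP checksum can be patched the same way, with the caveat that it also covers the pseudo-header and the payload, so the four stripped tag bytes must be backed out as well.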
Architecture: The Snooping Approach
The structural underpinnings of the arrangement are a set of XDP and TC (Traffic Control) programs written in C. Some of them snoop on control packets to commit channels for offload, while others conduct the offload per se: cli2rem (client to remote peer) and rem2cli (remote peer to client). The layout depends on the configuration. When a single network interface acts as both the client endpoint and the relay, the eBPF component provides one combined XDP section with cli2rem, rem2cli and the STUN snooper baked in; when there is a separate client-facing interface and one or more relay interfaces, it provides a separate XDP rem2cli program.
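At the heart of cli2rem and rem2cli is the ChannelData framing from RFC 8656: a 2-byte channel number (0x4000 through 0x4FFF) and a 2-byte length precede the application data. The sketch below is a userspace model of parsing and adding that 4-byte tag; it is not the project's code, and the helper names are hypothetical. In an actual XDP program, stripping the tag would additionally involve moving the Ethernet, IPv4 and UDP headers forward by four bytes and calling bpf_xdp_adjust_head(ctx, 4) (a negative delta makes room when adding it), followed by fixing up the length fields and checksums.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHANNELDATA_HDR_LEN 4
#define CHANNEL_MIN 0x4000 /* RFC 8656: valid channel numbers are */
#define CHANNEL_MAX 0x4FFF /* 0x4000 through 0x4FFF               */

/* Read a 16-bit big-endian (network byte order) field. */
static uint16_t rd_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Hypothetical helper (cli2rem direction): check that a UDP payload starts
 * with a ChannelData header and report the channel and data length.
 * Returns 0 on success, -1 otherwise (e.g. for STUN, whose first two bits
 * are 0b00). The application data starts at payload + CHANNELDATA_HDR_LEN. */
static int channeldata_parse(const uint8_t *payload, size_t payload_len,
                             uint16_t *channel, uint16_t *data_len)
{
    assert(payload != NULL && channel != NULL && data_len != NULL);
    if (payload_len < CHANNELDATA_HDR_LEN)
        return -1;
    uint16_t ch = rd_be16(payload);
    uint16_t len = rd_be16(payload + 2);
    if (ch < CHANNEL_MIN || ch > CHANNEL_MAX)
        return -1;
    /* Over UDP, up to 3 bytes of padding may follow the data, so only the
     * lower bound on the datagram size is enforced here. */
    if ((size_t)len + CHANNELDATA_HDR_LEN > payload_len)
        return -1;
    *channel = ch;
    *data_len = len;
    return 0;
}

/* Hypothetical helper (rem2cli direction): write the 4-byte tag followed by
 * the application data. 'out' needs CHANNELDATA_HDR_LEN + data_len bytes.
 * Returns the number of bytes written. */
static size_t channeldata_build(uint8_t *out, uint16_t channel,
                                const uint8_t *data, uint16_t data_len)
{
    assert(out != NULL && (data != NULL || data_len == 0));
    out[0] = (uint8_t)(channel >> 8);
    out[1] = (uint8_t)channel;
    out[2] = (uint8_t)(data_len >> 8);
    out[3] = (uint8_t)data_len;
    if (data_len > 0)
        memcpy(out + CHANNELDATA_HDR_LEN, data, data_len);
    return CHANNELDATA_HDR_LEN + (size_t)data_len;
}
```

Checking the channel-number range first also serves to tell ChannelData apart from STUN control messages arriving on the same port, since the two are distinguished by their first two bits.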
One might inquire why the tool employs a TC hook for snooping on the server's responses rather than keeping everything within the XDP layer. The reasoning is as follows: XDP is unparalleled for raw speed, but it runs only on the receive path, so it never sees the packets the server sends. The TC egress hook on the client-facing interface, by contrast, observes outgoing packets after the userspace server has emitted them and the kernel's networking stack has processed them. At this stage, the ChannelBind success response is fully formed and ready to depart. By intercepting it here, one can ensure that a channel is only committed to the fast path once the server has officially sanctioned the binding.
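Because a ChannelBind success response does not repeat the CHANNEL-NUMBER or XOR-PEER-ADDRESS attributes, a snooper of this kind has to remember the client's request and match the response by its 96-bit transaction ID. The following userspace sketch models that pairing; it is an illustration under assumptions, not the project's code. The fixed-size table and all function names are hypothetical (an eBPF version would more plausibly use a BPF hash map shared between the XDP and TC programs), and the sketch deliberately ignores MESSAGE-INTEGRITY, trusting the server's verdict instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STUN_HDR_LEN        20
#define STUN_MAGIC_COOKIE   0x2112A442u
#define CHANNELBIND_REQUEST 0x0009 /* method 0x009, class: request       */
#define CHANNELBIND_SUCCESS 0x0109 /* method 0x009, class: success resp. */
#define CHANNELBIND_ERROR   0x0119 /* method 0x009, class: error resp.   */
#define ATTR_CHANNEL_NUMBER 0x000C
#define ATTR_XOR_PEER_ADDR  0x0012
#define MAX_PENDING         8      /* hypothetical table size */

/* Hypothetical state remembered between a request and its response. */
struct pending_bind {
    int in_use;
    uint8_t txid[12];
    uint16_t channel;
    uint32_t peer_ip;   /* host byte order */
    uint16_t peer_port; /* host byte order */
};

static struct pending_bind pending[MAX_PENDING];

static uint16_t be16(const uint8_t *p) { return (uint16_t)((p[0] << 8) | p[1]); }

static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns the STUN message type, or -1 if m is not a plausible STUN message. */
static int stun_type(const uint8_t *m, size_t len)
{
    if (len < STUN_HDR_LEN || (m[0] & 0xC0) != 0 ||
        be32(m + 4) != STUN_MAGIC_COOKIE ||
        (size_t)be16(m + 2) + STUN_HDR_LEN > len)
        return -1;
    return be16(m);
}

/* Ingress (client -> server): stash channel and peer of a ChannelBind request. */
static int snoop_request(const uint8_t *m, size_t len)
{
    assert(m != NULL);
    if (stun_type(m, len) != CHANNELBIND_REQUEST)
        return -1;
    struct pending_bind pb = { .in_use = 1 };
    int have_ch = 0, have_peer = 0;
    size_t end = STUN_HDR_LEN + (size_t)be16(m + 2);
    for (size_t off = STUN_HDR_LEN; off + 4 <= end;) {
        uint16_t type = be16(m + off), alen = be16(m + off + 2);
        const uint8_t *v = m + off + 4;
        if (off + 4 + alen > end)
            return -1;
        if (type == ATTR_CHANNEL_NUMBER && alen == 4) {
            pb.channel = be16(v);
            have_ch = 1;
        } else if (type == ATTR_XOR_PEER_ADDR && alen == 8 && v[1] == 0x01) {
            /* IPv4: port XORed with the cookie's top 16 bits, address with all 32. */
            pb.peer_port = (uint16_t)(be16(v + 2) ^ (STUN_MAGIC_COOKIE >> 16));
            pb.peer_ip = be32(v + 4) ^ STUN_MAGIC_COOKIE;
            have_peer = 1;
        }
        off += 4 + (((size_t)alen + 3) & ~(size_t)3); /* 4-byte alignment */
    }
    if (!have_ch || !have_peer)
        return -1;
    memcpy(pb.txid, m + 8, sizeof pb.txid);
    for (int i = 0; i < MAX_PENDING; i++) {
        if (!pending[i].in_use) {
            pending[i] = pb;
            return 0;
        }
    }
    return -1; /* table full */
}

/* Egress (server -> client, TC): commit only on a matching success response. */
static int snoop_response(const uint8_t *m, size_t len, struct pending_bind *out)
{
    assert(m != NULL && out != NULL);
    int type = stun_type(m, len);
    if (type != CHANNELBIND_SUCCESS && type != CHANNELBIND_ERROR)
        return -1;
    for (int i = 0; i < MAX_PENDING; i++) {
        if (pending[i].in_use && memcmp(pending[i].txid, m + 8, 12) == 0) {
            pending[i].in_use = 0; /* either way, the transaction is over */
            if (type != CHANNELBIND_SUCCESS)
                return -1;
            *out = pending[i];
            return 0;
        }
    }
    return -1;
}
```

A complete snooper would also key this state by the client's transport address and learn the relayed address from the Allocate success response (XOR-RELAYED-ADDRESS); both are omitted here for brevity.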
Under the Hood: FIB Lookups & Heartbeats
Indubitably, the implementation possesses considerably more depth than might initially meet the eye. In particular, the knowledge of which IP addresses map to which MAC addresses and, more crucially, to which network interfaces comes from the FIB (Forwarding Information Base) lookup. The lookup is performed in the control path (in the snoopers) when a channel binding is committed, and it relies upon a remarkably helpful and powerful building block from the kernel: the bpf_fib_lookup helper, which consults the kernel's routing and neighbour tables and returns the egress interface together with the source and destination MAC addresses.
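The division of labour described above, an expensive lookup once at commit time and cheap byte copies per packet, can be modelled in userspace as follows. In the eBPF code, the snooper would fill a struct bpf_fib_lookup (family AF_INET, ipv4_src, ipv4_dst, ingress ifindex) and call bpf_fib_lookup(ctx, &params, sizeof(params), 0); on BPF_FIB_LKUP_RET_SUCCESS the kernel writes back ifindex, smac and dmac. Here that result is replaced by a hypothetical struct fib_result, and fwd_state, fwd_state_from_fib and rewrite_eth are likewise illustrative names rather than the project's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6
#define ETH_HLEN 14

/* Hypothetical stand-in for the outputs bpf_fib_lookup() writes back into
 * struct bpf_fib_lookup (ifindex, smac, dmac) on success. */
struct fib_result {
    uint32_t ifindex;
    uint8_t smac[ETH_ALEN];
    uint8_t dmac[ETH_ALEN];
};

/* Hypothetical per-direction forwarding state kept with a committed channel. */
struct fwd_state {
    uint32_t egress_ifindex;
    uint8_t src_mac[ETH_ALEN];
    uint8_t dst_mac[ETH_ALEN];
};

/* Control path (snooper): run once, when the channel is committed. */
static void fwd_state_from_fib(struct fwd_state *st, const struct fib_result *res)
{
    assert(st != NULL && res != NULL);
    st->egress_ifindex = res->ifindex;
    memcpy(st->src_mac, res->smac, ETH_ALEN);
    memcpy(st->dst_mac, res->dmac, ETH_ALEN);
}

/* Fast path: per packet, copy the cached MACs into the Ethernet header
 * (destination first, then source) and return the egress ifindex,
 * or 0 if the frame is too short to hold an Ethernet header. */
static uint32_t rewrite_eth(uint8_t *frame, size_t len, const struct fwd_state *st)
{
    assert(frame != NULL && st != NULL);
    if (len < ETH_HLEN)
        return 0;
    memcpy(frame, st->dst_mac, ETH_ALEN);
    memcpy(frame + ETH_ALEN, st->src_mac, ETH_ALEN);
    return st->egress_ifindex;
}
```

In XDP, the returned ifindex would then be handed to bpf_redirect(ifindex, 0), which yields XDP_REDIRECT. One consequence of caching at commit time is that later route or neighbour (ARP) changes are not picked up unless the entry is refreshed.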
Furthermore, to ensure the longevity of the session within the userspace daemon, the tool employs a "heartbeat spill" mechanism. By deliberately passing a minute fraction of the data packets up the network stack, we allow the TURN server to perceive the channel as active, thereby preventing the premature expiration of the allocation, while the bulk of the throughput enjoys the celerity of the eBPF fast path. Regrettably, there are certain limitations, too: the project, which is merely a proof of concept, does not handle encrypted channels and supports only IPv4 traffic.
The tool comes with a loader program written in Rust which, once the kernel component has been successfully attached, blocks until Ctrl+C is pressed. This part is a hodgepodge of foundational knowledge from the textbook I have been reading, some AI advice, and the occupational hazards of ten years of C programming; therefore, one should take the code quality with a grain of salt.
Deployment & Verification
On Debian 13 (kernel 6.12), one may run the tool as follows. This should be done in a separate terminal on the server machine before TURN is launched, so that the snoopers are already in place to observe the allocation and channel-binding handshakes:
sudo apt update
sudo apt install --yes cargo clang git libelf-dev pkg-config
git clone https://github.com/ivanmtech/turn-bpf
cd turn-bpf
cargo build
sudo ./target/debug/turn-bpf <main_ifname> [relay_ifnames...]
My poor man's test rig consisted of a pair of laptops connected via a commodity 100 Mbit/s USB Ethernet link. The server-side coturn configuration, saved as ~/test.cfg, was:
listening-port=3478
listening-ip=192.168.47.1
relay-ip=192.168.47.1
user=user:password
realm=turn.test
These are my configuration steps on the server-side laptop:
sudo apt update
sudo apt install --yes coturn
sudo service coturn stop
sudo nmcli device set enx0 managed no
sudo ip addr add 192.168.47.1/24 dev enx0
sudo ip link set enx0 up
For the client-side laptop:
sudo nmcli device set enx0 managed no
sudo ip addr add 192.168.47.2/24 dev enx0
sudo ip link set enx0 up
Running coturn on the server-side laptop is as simple as:
sudo turnserver -c ~/test.cfg
The client-side laptop, meanwhile, acts as both the client and the remote peer:
turnutils_peer -L 192.168.47.2
turnutils_uclient -u user -w password \
-e 192.168.47.2 -n 100000 -m 50 -g 192.168.47.1
Performance Results
In my case, with the offload active, pidstat reports coturn's CPU usage at a negligible level, virtually 0%, which jumps swiftly to 20% when Ctrl+C is pressed in the TURN-BPF terminal and the offload is removed. System-wide CPU usage is 0-1% when the offload is active, versus 6-7% in its absence. Again, my test rig was lacking in many ways (not least its 100 Mbit/s link), so these results may not carry over to production-grade deployments.
As I continue to explore the frontiers of high-performance networking, I should like to remain open to communication with peers and would be delighted to converse about any other architectural and system design challenges of the modern age.