Pushing wireguard-go from 3.5 Gbps to 5 Gbps: A battle against the Go GC on an old CPU 🚀

Hey DEV community! 👋

I’ve been on a mission to push standard wireguard-go to its absolute limits. I was sitting at a respectable 3.5 Gbps on my old CPU, but I wanted that magic 5 Gbps number.

Here is how I squeezed out that last 1.5 Gbps and hit my goal:

  1. Slaying the Allocation Dragon 🐉
    At millions of packets per second, Go's garbage collector is your biggest enemy. I used an LLM as an "escape analysis copilot" to hunt down heap allocations on the hot path. We dropped allocs/op from 72 down to exactly 1.

  2. Stack > Heap 📚
    To keep the GC asleep, hot-path variables must not escape to the heap. We swapped the standard golang.org/x/crypto/curve25519 for Cloudflare's circl/dh/x25519 (which uses fixed-size, stack-friendly arrays) and rewrote constructors to fill a caller-provided value in place (CreateMessageInitiationInto) instead of returning pointers.

  3. The "Duh" Moment 🤦‍♂️
    After all that complex crypto optimization, I gained an instant 200+ Mbps just by completely disabling verbose debug logs. Even when the level check itself is fast, the mutex locks and I/O syscalls hidden behind each log call are silent performance killers. This got me to 4.5 Gbps.

  4. Clearing the Runway ✈️
    The final push to a flat 5.0 Gbps? Realizing the CPU was busy with background noise. Once the code is this heavily optimized, the bottleneck shifts to how the OS schedules CPU time. Killing background processes and giving the Go app a clear runway unlocked the final 500 Mbps.

The Takeaway:
You can write ultra-fast networking gear in pure Go, but you have to treat the Garbage Collector like an enemy on the hot path.

Performance report: https://github.com/blinkinglight/wireguard-go/blob/performance/PERF_REPORT.md
Branch, if you want to test it: performance ( https://github.com/blinkinglight/wireguard-go/blob/performance )

_P.S. An LLM was used to translate this article._
