When Go 1.26 was released, the team shipped something interesting under the experimental tag:
SIMD: Single Instruction, Multiple Data.
The problem modern CPUs face is that clock rates have stagnated.
So consider the scalar approach to executing an instruction. Take float multiplication, for example:
we take two numbers and multiply them, which is fine and dandy.
But hardware designers long ago (literally) asked a question: "What if we could widen the operation, so developers could pack more data into one instruction?"
And they did!
We got:

- XMM0–XMM15, which are 128 bits wide
- YMM0–YMM15, which are 256 bits wide
- ZMM0–ZMM31, which are 512 bits wide
Let's take 256 bits, for example.
A float is 4 bytes == 32 bits, which means we can fit 8 floats into one 256-bit register, right?
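To make that lane math concrete, here is a plain-Go sketch (no SIMD package needed) of what a single 256-bit multiply does: eight float32 lanes, each multiplied independently. Scalar code does this as 8 separate multiplications; on AVX hardware one packed instruction (VMULPS) handles all 8 lanes at once.

```go
package main

import "fmt"

func main() {
	// A 256-bit YMM register holds 256 / 32 = 8 float32 lanes.
	const lanes = 256 / 32

	a := [lanes]float32{1, 2, 3, 4, 5, 6, 7, 8}
	b := [lanes]float32{2, 2, 2, 2, 2, 2, 2, 2}

	// Scalar view: 8 separate multiplications.
	// SIMD view: one VMULPS ymm, ymm, ymm does all 8 at once.
	var c [lanes]float32
	for i := 0; i < lanes; i++ {
		c[i] = a[i] * b[i]
	}
	fmt.Println(c) // [2 4 6 8 10 12 14 16]
}
```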
SIMD shines on data-parallel workloads — operations where you apply the same transformation to every element of a large array.
Image processing, audio DSP, physics simulations, matrix math, cryptography, text search (finding bytes in a buffer), and vector similarity search all fit this pattern.
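As a tiny illustration of the pattern (same operation applied to every element, no cross-element dependencies), here is a hypothetical scalar gain loop from the audio DSP category. The function name and values are mine, not from any benchmark in this post; the point is that each iteration is independent, so a compiler or hand-written SIMD kernel can process several samples per instruction.

```go
package main

import "fmt"

// applyGain multiplies every sample by the same factor: a textbook
// data-parallel loop. No iteration depends on another, so groups of
// samples could be handled by one packed SIMD multiply.
func applyGain(samples []float32, gain float32) {
	for i := range samples {
		samples[i] *= gain
	}
}

func main() {
	s := []float32{0.5, -1.0, 0.25, 2.0}
	applyGain(s, 0.5)
	fmt.Println(s) // [0.25 -0.5 0.125 1]
}
```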
The Go team announced fantastic news: they are working on a SIMD standard library package, and they have put it out for testing behind an experimental flag.
It works only on the amd64 architecture for now.
I prepared a short demonstration using XOR, an operation heavily used in cryptography.
SIMD in Go — Cryptographic XOR Benchmark
Demonstrates Go 1.26's simd/archsimd package applied to the core operation behind
stream ciphers (AES-CTR, ChaCha20, OTP): XOR of a plaintext buffer with a keystream.
What's benchmarked
| Implementation | Strategy | Instructions |
|---|---|---|
| `XORScalar` | Byte-by-byte loop | `XOR r8, r8` |
| `XORSimd256` | 32 bytes/iteration via AVX2 | `VPXOR ymm, ymm, ymm` |
| `XORSimd256Unrolled` | 128 bytes/iteration (4× unrolled) | 4× `VPXOR ymm` per iter |
Run (macOS with Docker):

```shell
./run.sh
```
Expected speedup
On x86_64 (emulated via Docker on Apple Silicon, real numbers will be higher on native):
- Small payloads (256B): ~5-10× faster with SIMD
- Large payloads (1MB+): ~15-25× faster with SIMD (memory-bandwidth limited)
On native x86 hardware (e.g. Emerald Rapids Xeon), expect even better numbers since VPXOR has 1-cycle latency and can retire 3 per cycle on ports p0/p1/p5.
uops.info reference
For the presentation, show the Emerald Rapids measurements for:
- VPXOR (YMM, YMM, YMM), the instruction behind `Uint8x32.Xor()` (link)
- VAESENC (XMM, XMM,…
First, let's check out what the scalar operation looks like:
```go
func XORScalar(destination, plaintext, keystream []byte) {
	for i := range plaintext {
		destination[i] = plaintext[i] ^ keystream[i]
	}
}
```
Pretty straightforward!
Now let's look at the SIMD-vectorized code:
```go
// XORSimd256 XORs plaintext with a keystream using 256-bit (32-byte)
// SIMD vectors. Each VPXOR instruction XORs 32 bytes, and with three
// execution ports each able to retire a VPXOR per cycle, the CPU can
// XOR up to 96 bytes per clock cycle with 256-bit vectors.
func XORSimd256(destination, plaintext, keystream []byte) {
	n := len(plaintext)
	i := 0
	// Process 32 bytes at a time using AVX2 VPXOR.
	for i+32 <= n {
		p := archsimd.LoadUint8x32((*[32]byte)(plaintext[i : i+32]))
		k := archsimd.LoadUint8x32((*[32]byte)(keystream[i : i+32]))
		r := p.Xor(k)
		// r holds the XOR result in a SIMD register (a CPU vector register, not memory).
		// Store() writes those 32 bytes from the register back into the destination
		// slice in main memory. Without it, the result would be discarded when the
		// register gets reused.
		r.Store((*[32]byte)(destination[i : i+32]))
		i += 32
	}
	// Handle remaining bytes (< 32) with a scalar fallback.
	for ; i < n; i++ {
		destination[i] = plaintext[i] ^ keystream[i]
	}
}
```
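The table above also lists an `XORSimd256Unrolled` variant doing 128 bytes per iteration, whose source isn't shown here. As a portable stand-in for the idea, here is a word-at-a-time sketch with the same structure: four independent XORs per loop iteration (here 8-byte `uint64` XORs, so 32 bytes/iteration, versus 4× 32-byte VPXOR in the real thing). This is my plain-Go illustration, not the benchmark's archsimd code; the point is that the four XORs have no data dependencies on each other, so a superscalar CPU can keep them all in flight while paying the loop bookkeeping only once.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// xorUnrolled XORs plaintext with keystream into destination,
// 32 bytes (4 independent uint64 XORs) per iteration.
func xorUnrolled(destination, plaintext, keystream []byte) {
	n := len(plaintext)
	i := 0
	for i+32 <= n {
		p0 := binary.LittleEndian.Uint64(plaintext[i:])
		p1 := binary.LittleEndian.Uint64(plaintext[i+8:])
		p2 := binary.LittleEndian.Uint64(plaintext[i+16:])
		p3 := binary.LittleEndian.Uint64(plaintext[i+24:])
		k0 := binary.LittleEndian.Uint64(keystream[i:])
		k1 := binary.LittleEndian.Uint64(keystream[i+8:])
		k2 := binary.LittleEndian.Uint64(keystream[i+16:])
		k3 := binary.LittleEndian.Uint64(keystream[i+24:])
		binary.LittleEndian.PutUint64(destination[i:], p0^k0)
		binary.LittleEndian.PutUint64(destination[i+8:], p1^k1)
		binary.LittleEndian.PutUint64(destination[i+16:], p2^k2)
		binary.LittleEndian.PutUint64(destination[i+24:], p3^k3)
		i += 32
	}
	// Scalar tail for the remaining < 32 bytes.
	for ; i < n; i++ {
		destination[i] = plaintext[i] ^ keystream[i]
	}
}

func main() {
	plaintext := make([]byte, 100)
	keystream := make([]byte, 100)
	for i := range plaintext {
		plaintext[i] = byte(i)
		keystream[i] = byte(255 - i)
	}
	got := make([]byte, 100)
	xorUnrolled(got, plaintext, keystream)

	ok := true
	for i := range got {
		if got[i] != plaintext[i]^keystream[i] {
			ok = false
		}
	}
	fmt.Println(ok) // true
}
```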
And the results were stunning! Even though I ran it on a MacBook, where we lost some performance due to the architecture mismatch!
You can run it for yourselves and test the results!
Here are mine:
The bottom line is that the Go dev team is working on something great, and I think it's a step in the right direction. In my opinion we shouldn't need to hand-write custom for loops for this...
But for those sweet low-level optimisations where you squeeze everything out of the hardware, this is a great step forward!


