Diogo Martins

Posted on May 27

C# Networking Deep Dive with io_uring part 6 - Numbers

#csharp #linux #performance #networking

For part 6, let's run some benchmarks.

What is going to be benchmarked

io_uring read+write with IValueTaskSource (IVTS) reactor inline continuations (RunAsynchronousContinuation = false)
io_uring read+write without IVTS reactor inline continuations (threadpool continuations, RunAsynchronousContinuation = true)
io_uring read + libc send write without IVTS reactor inline continuations (threadpool continuations, RunAsynchronousContinuation = true)
epoll read+write with IVTS reactor inline continuations
epoll read+write without IVTS reactor inline continuations
System.Net.Socket (Kestrel stock) - epoll threadpool
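The inline and threadpool variants differ only in where an IVTS continuation runs once the reactor completes an operation. The sketch below assumes the series' RunAsynchronousContinuation setting maps onto the real `ManualResetValueTaskSourceCore<T>.RunContinuationsAsynchronously` property; `ReactorValueTaskSource`, `RunOnce`, `ResumeAfterAsync` and the plain `Thread` standing in for a reactor are hypothetical names for illustration, not the series' code. With the flag off, the awaiting code resumes on the reactor thread itself; with it on, the continuation is queued to the threadpool.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Sources;

// Same completion, two settings: inline (reactor thread) vs. threadpool continuation.
var inline = await RunOnce(runContinuationsAsynchronously: false);
var pooled = await RunOnce(runContinuationsAsynchronously: true);

Console.WriteLine($"inline: resumed on reactor thread = {object.ReferenceEquals(inline.Reactor, inline.Resumed)}");
Console.WriteLine($"pooled: resumed on threadpool thread = {pooled.Resumed.IsThreadPoolThread}");

static async Task<(Thread Reactor, Thread Resumed)> RunOnce(bool runContinuationsAsynchronously)
{
    var source = new ReactorValueTaskSource(runContinuationsAsynchronously);

    // Runs synchronously up to its await; the source is still pending, so the
    // continuation is registered with the source instead of running now.
    Task<Thread> resumed = ResumeAfterAsync(source.AsValueTask());

    // A dedicated thread plays the reactor: it "completes the I/O" (here, 42 bytes read).
    var reactor = new Thread(() => source.Complete(42));
    reactor.Start();
    reactor.Join();

    return (reactor, await resumed);
}

static async Task<Thread> ResumeAfterAsync(ValueTask<int> operation)
{
    await operation;
    return Thread.CurrentThread; // the thread that ran the continuation
}

// Hypothetical reactor-owned IVTS; the reactor calls Complete() when a read/write finishes.
sealed class ReactorValueTaskSource : IValueTaskSource<int>
{
    private ManualResetValueTaskSourceCore<int> _core; // mutable struct: must stay a non-readonly field

    public ReactorValueTaskSource(bool runContinuationsAsynchronously) =>
        _core.RunContinuationsAsynchronously = runContinuationsAsynchronously;

    public ValueTask<int> AsValueTask() => new(this, _core.Version);

    public void Complete(int bytesTransferred) => _core.SetResult(bytesTransferred);

    public int GetResult(short token) => _core.GetResult(token);

    public ValueTaskSourceStatus GetStatus(short token) => _core.GetStatus(token);

    public void OnCompleted(Action<object?> continuation, object? state, short token,
        ValueTaskSourceOnCompletedFlags flags) =>
        _core.OnCompleted(continuation, state, token, flags);
}
```

Inline continuations save a threadpool hop per operation, but whatever the handler does then runs on the reactor thread, so slow work delays every other connection that reactor owns.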

Tests

(No pipelining)

Sync: a lightweight synchronous plaintext "OK" response.
Async: an asynchronous workload that serializes a very large object.

Hardware

Intel Core i9-14900K
64GB DDR5 6400MHz
Linux Kernel 6.17.0-22-generic

All tests run over the localhost loopback interface (no NIC involved).
MTU 1500

Load generators

HTTP/1.1, no TLS

wrk (epoll)
gcannon (io_uring)

io_uring read+write with IVTS reactor inline continuations

This is the exact model explored throughout the series; it is expected to deliver high performance on the synchronous test.

Reactor count: 12

Sync Workload

wrk -c 512 -t18 -d5s http://localhost:8080/

18 threads and 512 connections
Thread Stats   Avg      Stdev     Max   +/- Stdev
  Latency   121.45us  178.81us   8.32ms   99.05%
  Req/Sec   201.31k    40.61k  350.92k    73.09%
  18299278 requests in 5.10s, 1.12GB read
Requests/sec: 3588059.25
Transfer/sec:    225.84MB

gcannon http://localhost:8080/ -c 512 -t 16 -d 5

gcannon v0.5.3
Target:    localhost:8080/
Threads:   16
Conns:     512 (32/thread)
Pipeline:  1
Req/conn:  unlimited (keep-alive)
Expected:  200
Duration:  5s

Thread Stats   Avg      p50      p90      p99    p99.9
  Latency    129us    125us    185us    245us    317us

19735722 requests in 5.00s, 19735721 responses
Throughput: 3.95M req/s
Bandwidth:  248.42MB/s
Status codes: 2xx=19735721, 3xx=0, 4xx=0, 5xx=0
Latency samples: 19735657 / 19735721 responses (100.0%)

Async Workload (results were very unstable)

wrk -c 512 -t18 -d5s http://localhost:8080/

18 threads and 512 connections
Thread Stats   Avg      Stdev     Max   +/- Stdev
  Latency   435.74us  795.84us  12.73ms   88.81%
  Req/Sec   142.93k    29.31k  265.52k    68.29%
12883294 requests in 5.10s, 810.91MB read
Requests/sec: 2526866.89
Transfer/sec:    159.05MB

gcannon http://localhost:8080/ -c 512 -t 16 -d 5

gcannon v0.5.3
Target:    localhost:8080/
Threads:   16
Conns:     512 (32/thread)
Pipeline:  1
Req/conn:  unlimited (keep-alive)
Expected:  200
Duration:  5s

Thread Stats   Avg      p50      p90      p99    p99.9
  Latency    185us    135us    229us   1.84ms   4.10ms

13797048 requests in 5.00s, 13797048 responses
Throughput: 2.76M req/s
Bandwidth:  173.67MB/s
Status codes: 2xx=13797048, 3xx=0, 4xx=0, 5xx=0
Latency samples: 13796999 / 13797048 responses (100.0%)

io_uring read+write without IVTS reactor inline continuations

The same model explored throughout the series, but with RunAsynchronousContinuation set to true on both IVTS, so continuations run on the threadpool. It is expected to deliver similar results on both tests.

Reactor count: 12

Sync Workload

wrk -c 512 -t18 -d5s http://localhost:8080/

18 threads and 512 connections
Thread Stats   Avg      Stdev     Max   +/- Stdev
  Latency   515.72us  821.99us  12.67ms   87.67%
  Req/Sec   110.03k    21.14k  212.25k    71.55%
9946282 requests in 5.10s, 626.04MB read
Requests/sec: 1950919.66
Transfer/sec:    122.80MB

gcannon http://localhost:8080/ -c 512 -t 16 -d 5

gcannon v0.5.3
Target:    localhost:8080/
Threads:   16
Conns:     512 (32/thread)
Pipeline:  1
Req/conn:  unlimited (keep-alive)
Expected:  200
Duration:  5s

Thread Stats   Avg      p50      p90      p99    p99.9
  Latency    211us    164us    273us   1.55ms   3.79ms

12080236 requests in 5.00s, 12080325 responses
Throughput: 2.41M req/s
Bandwidth:  151.97MB/s
Status codes: 2xx=12080325, 3xx=0, 4xx=0, 5xx=0
Latency samples: 12080192 / 12080325 responses (100.0%)

Async Workload

wrk -c 512 -t18 -d5s http://localhost:8080/

18 threads and 512 connections
Thread Stats   Avg      Stdev     Max   +/- Stdev
  Latency   530.17us  842.05us  13.37ms   87.50%
  Req/Sec   108.43k    26.31k  204.89k    71.33%
9726083 requests in 5.03s, 612.18MB read
Requests/sec: 1935462.26
Transfer/sec:    121.82MB

gcannon http://localhost:8080/ -c 512 -t 16 -d 5

gcannon v0.5.3
Target:    localhost:8080/
Threads:   16
Conns:     512 (32/thread)
Pipeline:  1
Req/conn:  unlimited (keep-alive)
Expected:  200
Duration:  5s

Thread Stats   Avg      p50      p90      p99    p99.9
  Latency    213us    146us    265us   2.27ms   4.38ms

11952675 requests in 5.00s, 11952749 responses
Throughput: 2.39M req/s
Bandwidth:  150.45MB/s
Status codes: 2xx=11952749, 3xx=0, 4xx=0, 5xx=0
Latency samples: 11952633 / 11952749 responses (100.0%)

io_uring read + libc send write without IVTS reactor inline continuations

The same model, again with RunAsynchronousContinuation set to true on both IVTS, but the write path does not use io_uring; it calls libc's send directly instead. It is expected to deliver similar results on both tests. This is a hybrid approach and should be a middle ground between the first two models.
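A write path that bypasses the ring can be sketched as a direct P/Invoke into libc's `send`. This is an illustrative assumption, not the series' implementation: the `SendAll` helper and the loopback socket pair are hypothetical, and `MSG_NOSIGNAL` (0x4000) is the Linux value. Because `send` may write only part of the buffer, the helper loops until everything is written.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Runtime.InteropServices;
using System.Text;

// Demo only: a connected loopback pair created with System.Net.Sockets,
// so the libc call has a real socket fd to write to.
using var listener = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
listener.Bind(new IPEndPoint(IPAddress.Loopback, 0));
listener.Listen(1);
using var client = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
client.Connect(listener.LocalEndPoint!);
using var server = listener.Accept();

byte[] response = Encoding.ASCII.GetBytes("HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nOK");
int sent = SendAll((int)server.Handle, response); // on Linux, Socket.Handle is the raw fd

// Read the bytes back on the other end to confirm the write went through.
byte[] buffer = new byte[response.Length];
int received = 0;
while (received < buffer.Length)
{
    int n = client.Receive(buffer, received, buffer.Length - received, SocketFlags.None);
    if (n == 0) break; // peer closed
    received += n;
}
Console.WriteLine(Encoding.ASCII.GetString(buffer, 0, received));

// Hypothetical helper: writes the whole buffer with libc send(), looping on partial writes.
// Assumes a blocking socket; a non-blocking reactor would treat EAGAIN as
// "wait until writable" and resume later instead of failing.
static int SendAll(int fd, byte[] data)
{
    const int MsgNoSignal = 0x4000; // Linux MSG_NOSIGNAL: report EPIPE instead of raising SIGPIPE
    int offset = 0;
    while (offset < data.Length)
    {
        nint written = send(fd, ref data[offset], (nuint)(data.Length - offset), MsgNoSignal);
        if (written < 0)
            throw new InvalidOperationException($"send failed, errno {Marshal.GetLastWin32Error()}");
        offset += (int)written;
    }
    return offset;
}

// ssize_t send(int sockfd, const void *buf, size_t len, int flags);
[DllImport("libc", SetLastError = true)]
static extern nint send(int sockfd, ref byte buf, nuint len, int flags);
```

When the socket's send buffer has room, this write completes inside a single syscall on the calling thread, with no submission/completion round trip through the ring; reads still go through io_uring.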

Reactor count: 12

Sync Workload

wrk -c 512 -t18 -d5s http://localhost:8080/

18 threads and 512 connections
Thread Stats   Avg      Stdev     Max   +/- Stdev
  Latency   410.23us  782.03us  12.08ms   87.21%
  Req/Sec   158.40k    45.57k  251.18k    63.78%
14361239 requests in 5.10s, 0.88GB read
Requests/sec: 2817277.09

gcannon http://localhost:8080/ -c 512 -t 16 -d 5

gcannon v0.5.3
Target:    localhost:8080/
Threads:   16
Conns:     512 (32/thread)
Pipeline:  1
Req/conn:  unlimited (keep-alive)
Expected:  200
Duration:  5s

Thread Stats   Avg      p50      p90      p99    p99.9
  Latency    154us     84us    176us   2.68ms   4.32ms

16551871 requests in 5.00s, 16551875 responses
Throughput: 3.31M req/s
Bandwidth:  208.27MB/s
Status codes: 2xx=16551875, 3xx=0, 4xx=0, 5xx=0
Latency samples: 16551825 / 16551875 responses (100.0%)

Async Workload

wrk -c 512 -t18 -d5s http://localhost:8080/

18 threads and 512 connections
Thread Stats   Avg      Stdev     Max   +/- Stdev
  Latency   418.96us  824.32us  17.51ms   88.51%
  Req/Sec   154.72k    25.68k  240.94k    68.76%
13955371 requests in 5.09s, 0.86GB read
Requests/sec: 2742025.94
Transfer/sec:    172.59MB

gcannon http://localhost:8080/ -c 512 -t 16 -d 5

gcannon v0.5.3
Target:    localhost:8080/
Threads:   16
Conns:     512 (32/thread)
Pipeline:  1
Req/conn:  unlimited (keep-alive)
Expected:  200
Duration:  5s

Thread Stats   Avg      p50      p90      p99    p99.9
  Latency    159us     85us    198us   1.99ms   4.41ms

15997491 requests in 5.00s, 15997498 responses
Throughput: 3.20M req/s
Bandwidth:  201.18MB/s
Status codes: 2xx=15997498, 3xx=0, 4xx=0, 5xx=0
Latency samples: 15997425 / 15997498 responses (100.0%)

epoll read+write with IVTS reactor inline continuations

A pure epoll approach with the same reactor threading architecture, using inline handler continuations for both IVTS.

Reactor count: 12

Sync Workload

wrk -c 512 -t18 -d5s http://localhost:8080/

18 threads and 512 connections
Thread Stats   Avg      Stdev     Max   +/- Stdev
  Latency   284.42us  610.90us  11.06ms   91.79%
  Req/Sec   188.08k    42.17k  288.89k    60.15%
17141225 requests in 5.10s, 2.01GB read
Requests/sec: 3358876.80
Transfer/sec:    403.61MB

gcannon http://localhost:8080/ -c 512 -t 16 -d 5

gcannon v0.5.3
Target:    localhost:8080/
Threads:   16
Conns:     512 (32/thread)
Pipeline:  1
Req/conn:  unlimited (keep-alive)
Expected:  200
Duration:  5s

Thread Stats   Avg      p50      p90      p99    p99.9
  Latency    160us     86us    194us   2.07ms   4.39ms

15856691 requests in 5.00s, 15856698 responses
Throughput: 3.17M req/s
Bandwidth:  199.56MB/s
Status codes: 2xx=15856698, 3xx=0, 4xx=0, 5xx=0
Latency samples: 15856636 / 15856698 responses (100.0%)

Async Workload

wrk -c 512 -t18 -d5s http://localhost:8080/

18 threads and 512 connections
Thread Stats   Avg      Stdev     Max   +/- Stdev
  Latency   458.63us    0.90ms  15.96ms   88.39%
  Req/Sec   150.84k    25.75k  232.74k    65.71%
13670697 requests in 5.10s, 1.60GB read
Requests/sec: 2680674.42
Transfer/sec:    322.12MB

gcannon http://localhost:8080/ -c 512 -t 16 -d 5

gcannon v0.5.3
Target:    localhost:8080/
Threads:   16
Conns:     512 (32/thread)
Pipeline:  1
Req/conn:  unlimited (keep-alive)
Expected:  200
Duration:  5s

Thread Stats   Avg      p50      p90      p99    p99.9
  Latency    159us     74us    185us   2.68ms   5.32ms

15386279 requests in 5.00s, 15386278 responses
Throughput: 3.08M req/s
Bandwidth:  369.72MB/s
Status codes: 2xx=15386278, 3xx=0, 4xx=0, 5xx=0
Latency samples: 15386230 / 15386278 responses (100.0%)

epoll read+write without IVTS reactor inline continuations

A pure epoll approach with the same reactor threading architecture, using threadpool handler continuations for both IVTS.

Reactor count: 6

Sync Workload

wrk -c 512 -t18 -d5s http://localhost:8080/

18 threads and 512 connections
Thread Stats   Avg      Stdev     Max   +/- Stdev
  Latency   391.31us  764.42us  13.71ms   88.16%
  Req/Sec   167.26k    26.31k  244.01k    75.88%
15179066 requests in 5.10s, 1.78GB read
Requests/sec: 2975933.84
Transfer/sec:    357.60MB

gcannon http://localhost:8080/ -c 512 -t 16 -d 5

gcannon v0.5.3
Target:    localhost:8080/
Threads:   16
Conns:     512 (32/thread)
Pipeline:  1
Req/conn:  unlimited (keep-alive)
Expected:  200
Duration:  5s

Thread Stats   Avg      p50      p90      p99    p99.9
  Latency    140us     96us    150us   2.06ms   4.15ms

18019801 requests in 5.00s, 18019801 responses
Throughput: 3.60M req/s
Bandwidth:  432.83MB/s
Status codes: 2xx=18019801, 3xx=0, 4xx=0, 5xx=0
Latency samples: 18019763 / 18019801 responses (100.0%)

Async Workload

wrk -c 512 -t18 -d5s http://localhost:8080/

18 threads and 512 connections
Thread Stats   Avg      Stdev     Max   +/- Stdev
  Latency   464.15us  838.78us  10.74ms   87.28%
  Req/Sec   158.12k    14.36k  266.80k    72.35%
14231176 requests in 5.10s, 1.18GB read
Requests/sec: 2790992.53
Transfer/sec:    236.89MB

gcannon http://localhost:8080/ -c 512 -t 16 -d 5

gcannon v0.5.3
Target:    localhost:8080/
Threads:   16
Conns:     512 (32/thread)
Pipeline:  1
Req/conn:  unlimited (keep-alive)
Expected:  200
Duration:  5s

Thread Stats   Avg      p50      p90      p99    p99.9
  Latency    154us     96us    154us   2.22ms   4.48ms

16342325 requests in 5.00s, 16342325 responses
Throughput: 3.27M req/s
Bandwidth:  277.35MB/s
Status codes: 2xx=16342325, 3xx=0, 4xx=0, 5xx=0
Latency samples: 16342273 / 16342325 responses (100.0%)

System.Net.Socket (Kestrel stock) - epoll threadpool

Kestrel's stock network I/O, with some tuning:

listener.SetSocketOption(SocketOptionLevel.Socket, SocketOptionName.ReuseAddress, true); // SO_REUSEADDR on the listening socket
client.NoDelay = true;   // TCP_NODELAY on each accepted connection (disables Nagle's algorithm)
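The two tuning lines show no surrounding context. A minimal sketch of where each option would be applied, assuming `listener` is the listening `Socket` and `client` an accepted connection; the `peer` socket and port 0 exist only so the demo has something to accept and are not part of the benchmarked server.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

// SO_REUSEADDR has to be set before Bind() to take effect.
using var listener = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
listener.SetSocketOption(SocketOptionLevel.Socket, SocketOptionName.ReuseAddress, true);
listener.Bind(new IPEndPoint(IPAddress.Loopback, 0)); // port 0: any free port, for the demo
listener.Listen(512);

// Stand-in for a load generator connection.
using var peer = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
peer.Connect(listener.LocalEndPoint!);

// TCP_NODELAY is a per-connection option, so it is set on every accepted socket.
using Socket client = listener.Accept();
client.NoDelay = true;

bool reuse = (int)listener.GetSocketOption(SocketOptionLevel.Socket, SocketOptionName.ReuseAddress)! != 0;
Console.WriteLine($"SO_REUSEADDR={reuse}, TCP_NODELAY={client.NoDelay}");
```

With small request/response pairs and no pipelining, as in these tests, disabling Nagle's algorithm keeps each short response from being held back while the kernel waits to coalesce it with more data.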

Sync Workload

wrk -c 512 -t18 -d5s http://localhost:8080/

18 threads and 512 connections
Thread Stats   Avg      Stdev     Max   +/- Stdev
  Latency   156.79us  342.31us   6.98ms   96.45%
  Req/Sec   174.25k    35.85k  266.63k    73.35%
15748223 requests in 5.10s, 0.97GB read
Requests/sec: 3088338.61
Transfer/sec:    194.39MB

gcannon http://localhost:8080/ -c 512 -t 16 -d 5

gcannon v0.5.3
Target:    localhost:8080/
Threads:   16
Conns:     512 (32/thread)
Pipeline:  1
Req/conn:  unlimited (keep-alive)
Expected:  200
Duration:  5s

Thread Stats   Avg      p50      p90      p99    p99.9
  Latency    141us    129us    176us    305us   3.17ms

18024579 requests in 5.00s, 18024579 responses
Throughput: 3.60M req/s
Bandwidth:  226.84MB/s
Status codes: 2xx=18024579, 3xx=0, 4xx=0, 5xx=0
Latency samples: 18024567 / 18024579 responses (100.0%)

Async Workload

wrk -c 512 -t18 -d5s http://localhost:8080/

18 threads and 512 connections
Thread Stats   Avg      Stdev     Max   +/- Stdev
  Latency   255.07us  507.29us  12.53ms   93.36%
  Req/Sec   150.64k    15.91k  235.46k    73.35%
13618906 requests in 5.10s, 857.21MB read
Requests/sec: 2671254.72
Transfer/sec:    168.14MB

gcannon http://localhost:8080/ -c 512 -t 16 -d 5

gcannon v0.5.3
Target:    localhost:8080/
Threads:   16
Conns:     512 (32/thread)
Pipeline:  1
Req/conn:  unlimited (keep-alive)
Expected:  200
Duration:  5s

Thread Stats   Avg      p50      p90      p99    p99.9
  Latency    169us    123us    237us   1.25ms   3.89ms

15043820 requests in 5.00s, 15043820 responses
Throughput: 3.01M req/s
Bandwidth:  189.25MB/s
Status codes: 2xx=15043820, 3xx=0, 4xx=0, 5xx=0
Latency samples: 15043756 / 15043820 responses (100.0%)

DEV Community
