For part 6, let's run some benchmarks.
What is going to be benchmarked
io_uring read+write with IVTS reactor inline continuations (RunAsynchronousContinuation = false)
io_uring read+write without IVTS reactor inline continuations (threadpool) (RunAsynchronousContinuation = true)
io_uring read + libc send write without IVTS reactor inline continuations (threadpool) (RunAsynchronousContinuation = true)
epoll read+write with IVTS reactor inline continuations
epoll read+write without IVTS reactor inline continuations
System.Net.Sockets (Kestrel stock) - epoll threadpool
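To make the inline versus threadpool distinction concrete, here is a minimal sketch of an awaitable built on the BCL's ManualResetValueTaskSourceCore. It assumes IVTS refers to IValueTaskSource and that the flag written as RunAsynchronousContinuation above corresponds to the BCL property RunContinuationsAsynchronously. The ReadSource type and its Complete method are hypothetical names for illustration, not the types used in the series.

```csharp
#nullable enable
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Sources;

// Hypothetical awaitable, for illustration only.
public sealed class ReadSource : IValueTaskSource<int>
{
    // Mutable struct: must not be a readonly field.
    private ManualResetValueTaskSourceCore<int> _core;

    public ReadSource(bool runContinuationsAsynchronously)
    {
        // false: the awaiting handler resumes inline on the thread that calls
        //        SetResult (here, the reactor thread).
        // true:  the continuation is queued instead, typically to the thread pool.
        _core.RunContinuationsAsynchronously = runContinuationsAsynchronously;
    }

    // Handed to the handler, which awaits it.
    public ValueTask<int> AsValueTask() => new ValueTask<int>(this, _core.Version);

    // Called by the reactor when a completion for this operation arrives.
    public void Complete(int bytesTransferred) => _core.SetResult(bytesTransferred);

    public int GetResult(short token)
    {
        try { return _core.GetResult(token); }
        finally { _core.Reset(); } // make the source reusable for the next operation
    }

    public ValueTaskSourceStatus GetStatus(short token) => _core.GetStatus(token);

    public void OnCompleted(Action<object?> continuation, object? state, short token,
        ValueTaskSourceOnCompletedFlags flags)
        => _core.OnCompleted(continuation, state, token, flags);
}
```

With the flag set to false, the handler code after the await runs on the reactor thread itself, which avoids a thread hop but means a slow handler stalls that reactor. With it set to true, the reactor only signals completion and returns to its loop.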
Tests
(No pipelining)
- Synchronous lightweight plaintext "OK" response.
- Asynchronous workload to serialize a very large object.
Hardware
Intel Core i9-14900K
64GB DDR5 6400MHz
Linux Kernel 6.17.0-22-generic
Tests are done through localhost loopback (no NIC influence)
MTU 1500
Load generators
HTTP/1.1, no TLS
wrk (epoll)
gcannon (io_uring)
io_uring read+write with IVTS reactor inline continuations
This is the exact model explored throughout the series. It is expected to deliver high performance on the synchronous test.
Reactor count: 12
Sync Workload
wrk -c 512 -t18 -d5s http://localhost:8080/
18 threads and 512 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 121.45us 178.81us 8.32ms 99.05%
Req/Sec 201.31k 40.61k 350.92k 73.09%
18299278 requests in 5.10s, 1.12GB read
Requests/sec: 3588059.25
Transfer/sec: 225.84MB
gcannon http://localhost:8080/ -c 512 -t 16 -d 5
gcannon v0.5.3
Target: localhost:8080/
Threads: 16
Conns: 512 (32/thread)
Pipeline: 1
Req/conn: unlimited (keep-alive)
Expected: 200
Duration: 5s
Thread Stats Avg p50 p90 p99 p99.9
Latency 129us 125us 185us 245us 317us
19735722 requests in 5.00s, 19735721 responses
Throughput: 3.95M req/s
Bandwidth: 248.42MB/s
Status codes: 2xx=19735721, 3xx=0, 4xx=0, 5xx=0
Latency samples: 19735657 / 19735721 responses (100.0%)
Async Workload (Very unstable)
wrk -c 512 -t18 -d5s http://localhost:8080/
18 threads and 512 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 435.74us 795.84us 12.73ms 88.81%
Req/Sec 142.93k 29.31k 265.52k 68.29%
12883294 requests in 5.10s, 810.91MB read
Requests/sec: 2526866.89
Transfer/sec: 159.05MB
gcannon http://localhost:8080/ -c 512 -t 16 -d 5
gcannon v0.5.3
Target: localhost:8080/
Threads: 16
Conns: 512 (32/thread)
Pipeline: 1
Req/conn: unlimited (keep-alive)
Expected: 200
Duration: 5s
Thread Stats Avg p50 p90 p99 p99.9
Latency 185us 135us 229us 1.84ms 4.10ms
13797048 requests in 5.00s, 13797048 responses
Throughput: 2.76M req/s
Bandwidth: 173.67MB/s
Status codes: 2xx=13797048, 3xx=0, 4xx=0, 5xx=0
Latency samples: 13796999 / 13797048 responses (100.0%)
io_uring read+write without IVTS reactor inline continuations
The same model explored throughout the series, but with RunAsynchronousContinuation set to true on both IVTS, so handler continuations run on the thread pool. It is expected to deliver similar results across the two tests.
Reactor count: 12
Sync Workload
wrk -c 512 -t18 -d5s http://localhost:8080/
18 threads and 512 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 515.72us 821.99us 12.67ms 87.67%
Req/Sec 110.03k 21.14k 212.25k 71.55%
9946282 requests in 5.10s, 626.04MB read
Requests/sec: 1950919.66
Transfer/sec: 122.80MB
gcannon http://localhost:8080/ -c 512 -t 16 -d 5
gcannon v0.5.3
Target: localhost:8080/
Threads: 16
Conns: 512 (32/thread)
Pipeline: 1
Req/conn: unlimited (keep-alive)
Expected: 200
Duration: 5s
Thread Stats Avg p50 p90 p99 p99.9
Latency 211us 164us 273us 1.55ms 3.79ms
12080236 requests in 5.00s, 12080325 responses
Throughput: 2.41M req/s
Bandwidth: 151.97MB/s
Status codes: 2xx=12080325, 3xx=0, 4xx=0, 5xx=0
Latency samples: 12080192 / 12080325 responses (100.0%)
Async Workload
wrk -c 512 -t18 -d5s http://localhost:8080/
18 threads and 512 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 530.17us 842.05us 13.37ms 87.50%
Req/Sec 108.43k 26.31k 204.89k 71.33%
9726083 requests in 5.03s, 612.18MB read
Requests/sec: 1935462.26
Transfer/sec: 121.82MB
gcannon http://localhost:8080/ -c 512 -t 16 -d 5
gcannon v0.5.3
Target: localhost:8080/
Threads: 16
Conns: 512 (32/thread)
Pipeline: 1
Req/conn: unlimited (keep-alive)
Expected: 200
Duration: 5s
Thread Stats Avg p50 p90 p99 p99.9
Latency 213us 146us 265us 2.27ms 4.38ms
11952675 requests in 5.00s, 11952749 responses
Throughput: 2.39M req/s
Bandwidth: 150.45MB/s
Status codes: 2xx=11952749, 3xx=0, 4xx=0, 5xx=0
Latency samples: 11952633 / 11952749 responses (100.0%)
io_uring read + libc send write without IVTS reactor inline continuations
The same model explored throughout the series, but with RunAsynchronousContinuation set to true on both IVTS and a write path that bypasses io_uring and calls libc's send instead. It is expected to deliver similar results across the two tests. This is a hybrid approach and should be a middle ground between the first two models.
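As a rough illustration of the write path in this variant, the sketch below calls libc's send through P/Invoke. It is an assumption about how such a call could look, not the code used for these runs: the LibcSend class and TrySend method are hypothetical names, the MSG_NOSIGNAL value is the Linux one, and the project needs AllowUnsafeBlocks enabled.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper, for illustration only.
public static unsafe class LibcSend
{
    // Linux value; prevents SIGPIPE when the peer has already closed the socket.
    private const int MSG_NOSIGNAL = 0x4000;

    // ssize_t send(int sockfd, const void *buf, size_t len, int flags);
    [DllImport("libc", SetLastError = true)]
    private static extern nint send(int sockfd, byte* buf, nuint len, int flags);

    // Returns the number of bytes accepted by the kernel, or -1 on failure.
    // On failure, errno is available via Marshal.GetLastPInvokeError().
    public static nint TrySend(int fd, ReadOnlySpan<byte> payload)
    {
        fixed (byte* p = payload)
        {
            return send(fd, p, (nuint)payload.Length, MSG_NOSIGNAL);
        }
    }
}
```

A call like this completes the write as a direct syscall on the calling thread rather than queuing a submission entry on the ring. A real caller would also need to handle short writes and EAGAIN on a non-blocking socket, which this sketch leaves out.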
Reactor count: 12
Sync Workload
wrk -c 512 -t18 -d5s http://localhost:8080/
18 threads and 512 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 410.23us 782.03us 12.08ms 87.21%
Req/Sec 158.40k 45.57k 251.18k 63.78%
14361239 requests in 5.10s, 0.88GB read
Requests/sec: 2817277.09
gcannon http://localhost:8080/ -c 512 -t 16 -d 5
gcannon v0.5.3
Target: localhost:8080/
Threads: 16
Conns: 512 (32/thread)
Pipeline: 1
Req/conn: unlimited (keep-alive)
Expected: 200
Duration: 5s
Thread Stats Avg p50 p90 p99 p99.9
Latency 154us 84us 176us 2.68ms 4.32ms
16551871 requests in 5.00s, 16551875 responses
Throughput: 3.31M req/s
Bandwidth: 208.27MB/s
Status codes: 2xx=16551875, 3xx=0, 4xx=0, 5xx=0
Latency samples: 16551825 / 16551875 responses (100.0%)
Async Workload
wrk -c 512 -t18 -d5s http://localhost:8080/
18 threads and 512 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 418.96us 824.32us 17.51ms 88.51%
Req/Sec 154.72k 25.68k 240.94k 68.76%
13955371 requests in 5.09s, 0.86GB read
Requests/sec: 2742025.94
Transfer/sec: 172.59MB
gcannon http://localhost:8080/ -c 512 -t 16 -d 5
gcannon v0.5.3
Target: localhost:8080/
Threads: 16
Conns: 512 (32/thread)
Pipeline: 1
Req/conn: unlimited (keep-alive)
Expected: 200
Duration: 5s
Thread Stats Avg p50 p90 p99 p99.9
Latency 159us 85us 198us 1.99ms 4.41ms
15997491 requests in 5.00s, 15997498 responses
Throughput: 3.20M req/s
Bandwidth: 201.18MB/s
Status codes: 2xx=15997498, 3xx=0, 4xx=0, 5xx=0
Latency samples: 15997425 / 15997498 responses (100.0%)
epoll read+write with IVTS reactor inline continuations
A pure epoll approach with the same reactor threading architecture. Handler continuations run inline for both IVTS.
Reactor count: 12
Sync Workload
wrk -c 512 -t18 -d5s http://localhost:8080/
18 threads and 512 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 284.42us 610.90us 11.06ms 91.79%
Req/Sec 188.08k 42.17k 288.89k 60.15%
17141225 requests in 5.10s, 2.01GB read
Requests/sec: 3358876.80
Transfer/sec: 403.61MB
gcannon http://localhost:8080/ -c 512 -t 16 -d 5
gcannon v0.5.3
Target: localhost:8080/
Threads: 16
Conns: 512 (32/thread)
Pipeline: 1
Req/conn: unlimited (keep-alive)
Expected: 200
Duration: 5s
Thread Stats Avg p50 p90 p99 p99.9
Latency 160us 86us 194us 2.07ms 4.39ms
15856691 requests in 5.00s, 15856698 responses
Throughput: 3.17M req/s
Bandwidth: 199.56MB/s
Status codes: 2xx=15856698, 3xx=0, 4xx=0, 5xx=0
Latency samples: 15856636 / 15856698 responses (100.0%)
Async Workload
wrk -c 512 -t18 -d5s http://localhost:8080/
18 threads and 512 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 458.63us 0.90ms 15.96ms 88.39%
Req/Sec 150.84k 25.75k 232.74k 65.71%
13670697 requests in 5.10s, 1.60GB read
Requests/sec: 2680674.42
Transfer/sec: 322.12MB
gcannon http://localhost:8080/ -c 512 -t 16 -d 5
gcannon v0.5.3
Target: localhost:8080/
Threads: 16
Conns: 512 (32/thread)
Pipeline: 1
Req/conn: unlimited (keep-alive)
Expected: 200
Duration: 5s
Thread Stats Avg p50 p90 p99 p99.9
Latency 159us 74us 185us 2.68ms 5.32ms
15386279 requests in 5.00s, 15386278 responses
Throughput: 3.08M req/s
Bandwidth: 369.72MB/s
Status codes: 2xx=15386278, 3xx=0, 4xx=0, 5xx=0
Latency samples: 15386230 / 15386278 responses (100.0%)
epoll read+write without IVTS reactor inline continuations
A pure epoll approach with the same reactor threading architecture. Handler continuations run on the thread pool for both IVTS.
Reactor count: 6
Sync Workload
wrk -c 512 -t18 -d5s http://localhost:8080/
18 threads and 512 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 391.31us 764.42us 13.71ms 88.16%
Req/Sec 167.26k 26.31k 244.01k 75.88%
15179066 requests in 5.10s, 1.78GB read
Requests/sec: 2975933.84
Transfer/sec: 357.60MB
gcannon http://localhost:8080/ -c 512 -t 16 -d 5
gcannon v0.5.3
Target: localhost:8080/
Threads: 16
Conns: 512 (32/thread)
Pipeline: 1
Req/conn: unlimited (keep-alive)
Expected: 200
Duration: 5s
Thread Stats Avg p50 p90 p99 p99.9
Latency 140us 96us 150us 2.06ms 4.15ms
18019801 requests in 5.00s, 18019801 responses
Throughput: 3.60M req/s
Bandwidth: 432.83MB/s
Status codes: 2xx=18019801, 3xx=0, 4xx=0, 5xx=0
Latency samples: 18019763 / 18019801 responses (100.0%)
Async Workload
wrk -c 512 -t18 -d5s http://localhost:8080/
18 threads and 512 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 464.15us 838.78us 10.74ms 87.28%
Req/Sec 158.12k 14.36k 266.80k 72.35%
14231176 requests in 5.10s, 1.18GB read
Requests/sec: 2790992.53
Transfer/sec: 236.89MB
gcannon http://localhost:8080/ -c 512 -t 16 -d 5
gcannon v0.5.3
Target: localhost:8080/
Threads: 16
Conns: 512 (32/thread)
Pipeline: 1
Req/conn: unlimited (keep-alive)
Expected: 200
Duration: 5s
Thread Stats Avg p50 p90 p99 p99.9
Latency 154us 96us 154us 2.22ms 4.48ms
16342325 requests in 5.00s, 16342325 responses
Throughput: 3.27M req/s
Bandwidth: 277.35MB/s
Status codes: 2xx=16342325, 3xx=0, 4xx=0, 5xx=0
Latency samples: 16342273 / 16342325 responses (100.0%)
System.Net.Sockets (Kestrel stock) - epoll threadpool
Kestrel's stock network I/O with some tuning:
listener.SetSocketOption(SocketOptionLevel.Socket, SocketOptionName.ReuseAddress, true);
client.NoDelay = true; // TCP_NODELAY
Sync Workload
wrk -c 512 -t18 -d5s http://localhost:8080/
18 threads and 512 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 156.79us 342.31us 6.98ms 96.45%
Req/Sec 174.25k 35.85k 266.63k 73.35%
15748223 requests in 5.10s, 0.97GB read
Requests/sec: 3088338.61
Transfer/sec: 194.39MB
gcannon http://localhost:8080/ -c 512 -t 16 -d 5
gcannon v0.5.3
Target: localhost:8080/
Threads: 16
Conns: 512 (32/thread)
Pipeline: 1
Req/conn: unlimited (keep-alive)
Expected: 200
Duration: 5s
Thread Stats Avg p50 p90 p99 p99.9
Latency 141us 129us 176us 305us 3.17ms
18024579 requests in 5.00s, 18024579 responses
Throughput: 3.60M req/s
Bandwidth: 226.84MB/s
Status codes: 2xx=18024579, 3xx=0, 4xx=0, 5xx=0
Latency samples: 18024567 / 18024579 responses (100.0%)
Async Workload
wrk -c 512 -t18 -d5s http://localhost:8080/
18 threads and 512 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 255.07us 507.29us 12.53ms 93.36%
Req/Sec 150.64k 15.91k 235.46k 73.35%
13618906 requests in 5.10s, 857.21MB read
Requests/sec: 2671254.72
Transfer/sec: 168.14MB
gcannon http://localhost:8080/ -c 512 -t 16 -d 5
gcannon v0.5.3
Target: localhost:8080/
Threads: 16
Conns: 512 (32/thread)
Pipeline: 1
Req/conn: unlimited (keep-alive)
Expected: 200
Duration: 5s
Thread Stats Avg p50 p90 p99 p99.9
Latency 169us 123us 237us 1.25ms 3.89ms
15043820 requests in 5.00s, 15043820 responses
Throughput: 3.01M req/s
Bandwidth: 189.25MB/s
Status codes: 2xx=15043820, 3xx=0, 4xx=0, 5xx=0
Latency samples: 15043756 / 15043820 responses (100.0%)