<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Benny</title>
    <description>The latest articles on DEV Community by Benny (@fbio_reis_355b87b508598e).</description>
    <link>https://dev.to/fbio_reis_355b87b508598e</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3673917%2Fad6b20f4-3d45-4db6-b007-55d1f1c0da16.jpeg</url>
      <title>DEV Community: Benny</title>
      <link>https://dev.to/fbio_reis_355b87b508598e</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/fbio_reis_355b87b508598e"/>
    <language>en</language>
    <item>
      <title>FastPySGI-WSGI: How a Libuv-Powered Python Server Hits 7.5 Million Requests Per Second</title>
      <dc:creator>Benny</dc:creator>
      <pubDate>Tue, 31 Mar 2026 12:46:24 +0000</pubDate>
      <link>https://dev.to/fbio_reis_355b87b508598e/fastpysgi-wsgi-how-a-libuv-powered-python-server-hits-75-million-requests-per-second-fgd</link>
      <guid>https://dev.to/fbio_reis_355b87b508598e/fastpysgi-wsgi-how-a-libuv-powered-python-server-hits-75-million-requests-per-second-fgd</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;When most developers think of Python web performance, they think "slow." Frameworks like Flask and Django are beloved for developer experience, but rarely win benchmarking contests. FastPySGI-WSGI challenges that assumption entirely.&lt;/p&gt;

&lt;p&gt;In the HttpArena benchmark suite -- a standardized HTTP framework benchmark platform running on dedicated 64-core hardware with 18 test profiles -- FastPySGI-WSGI delivers numbers that rival Rust and Go implementations. We're talking &lt;strong&gt;1.3 million RPS on baseline tests&lt;/strong&gt; and &lt;strong&gt;707K RPS while processing JSON&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Let's break down how it works, why it's fast, and what lessons we can take away.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is FastPySGI?
&lt;/h2&gt;

&lt;p&gt;FastPySGI is an ultra-fast WSGI/ASGI server for Python built on top of &lt;strong&gt;libuv&lt;/strong&gt; -- the same C-based event loop that powers Node.js. Unlike traditional Python servers (Gunicorn, Uvicorn), FastPySGI bypasses Python's asyncio entirely and handles networking at the C level.&lt;/p&gt;

&lt;p&gt;The "WSGI" variant specifically uses the standard WSGI interface, meaning it's synchronous Python code running on an asynchronous C event loop. This is a critical architectural choice: you get libuv's raw networking speed without requiring async/await in your application code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Repository:&lt;/strong&gt; &lt;a href="https://github.com/remittor/fastpysgi" rel="noopener noreferrer"&gt;https://github.com/remittor/fastpysgi&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Architecture: Fewer Layers, More Speed
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Minimal Dependencies
&lt;/h3&gt;

&lt;p&gt;The entire dependency list fits on a sticky note:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fastpysgi==0.4
orjson==3.10.15
psycopg[binary]==3.2.4
psycopg_pool==3.2.6
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Four packages. Compare that to FastAPI's 20+ transitive dependencies or Django's sprawling ecosystem. Every layer you remove is latency you eliminate.&lt;/p&gt;

&lt;h3&gt;
  
  
  Single-File Application
&lt;/h3&gt;

&lt;p&gt;The entire benchmark implementation is a single 349-line Python file. No framework overhead, no middleware chains, no decorator magic. Just a WSGI callable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;app&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;start_response&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;method&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;REQUEST_METHOD&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PATH_INFO&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;method&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GET&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;POST&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;respond_405&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;start_response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/pipeline&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;respond_ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;start_response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/baseline11&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;handle_baseline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;start_response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# ... more routes
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Routing is a simple if/elif chain. No regex compilation, no route tree traversal, no parameter extraction framework. For a benchmark, this is the right call -- every nanosecond in routing overhead gets multiplied by millions of requests.&lt;/p&gt;

&lt;h3&gt;
  
  
  Multi-Worker Model
&lt;/h3&gt;

&lt;p&gt;FastPySGI spawns one worker per available CPU core, capped at 128:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;WRK_COUNT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sched_getaffinity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each worker runs its own libuv event loop, and the OS distributes connections across them. On the benchmark's 64-core machine, that's 64 workers hammering through requests in parallel.&lt;/p&gt;
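&lt;p&gt;The same worker-count logic is easy to reproduce with the standard library. &lt;code&gt;os.sched_getaffinity&lt;/code&gt; is Linux-only, so this portable sketch (mine, not the repository's code) falls back to &lt;code&gt;os.cpu_count&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import os

def worker_count(cap=128):
    # Prefer the CPUs this process is actually allowed to run on (Linux);
    # fall back to the machine's core count elsewhere.
    if hasattr(os, "sched_getaffinity"):
        cpus = len(os.sched_getaffinity(0))
    else:
        cpus = os.cpu_count() or 1
    return min(cpus, cap)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;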




&lt;h2&gt;
  
  
  Performance-Critical Design Choices
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Pre-Loaded Static Files
&lt;/h3&gt;

&lt;p&gt;Static files aren't read from disk on each request. They're loaded entirely into memory at startup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;STATIC_DIR&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/data/static&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;static_files&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;fname&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;listdir&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;STATIC_DIR&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;fpath&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;STATIC_DIR&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fname&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fpath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;mime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_mime_type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fname&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;static_files&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;fname&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mime&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When a request hits &lt;code&gt;/static/main.css&lt;/code&gt;, it's a dictionary lookup and a pointer return. Zero disk I/O, zero syscalls.&lt;/p&gt;
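&lt;p&gt;The serving side is equally small. This sketch (mine, following the pattern above) handles both a hit and a miss against the in-memory dict:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Serving from the in-memory dict: one lookup, no disk I/O.
# static_files maps a request path to a (bytes, mime) tuple.
static_files = {"/main.css": (b"body { margin: 0 }", "text/css")}

def serve_static(path, start_response):
    entry = static_files.get(path)
    if entry is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    content, mime = entry
    start_response("200 OK", [
        ("Content-Type", mime),
        ("Content-Length", str(len(content))),
    ])
    return [content]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;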

&lt;h3&gt;
  
  
  2. Fast JSON with orjson
&lt;/h3&gt;

&lt;p&gt;The standard library &lt;code&gt;json&lt;/code&gt; module is comparatively slow even with its C accelerator. FastPySGI uses &lt;strong&gt;orjson&lt;/strong&gt;, a Rust-based JSON serializer that's typically 3-10x faster:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;orjson&lt;/span&gt;

&lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;orjson&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For the JSON benchmark profile, this choice alone could account for hundreds of thousands of additional RPS.&lt;/p&gt;
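&lt;p&gt;One practical detail: &lt;code&gt;orjson.dumps&lt;/code&gt; returns &lt;code&gt;bytes&lt;/code&gt; directly, skipping the separate UTF-8 encode step that &lt;code&gt;json.dumps&lt;/code&gt; needs before the body can hit the socket. A hedged sketch with a stdlib fallback for machines without orjson installed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json

try:
    import orjson
    def dumps_bytes(obj):
        return orjson.dumps(obj)          # bytes straight from Rust
except ImportError:
    def dumps_bytes(obj):
        return json.dumps(obj).encode()   # stdlib fallback

body = dumps_bytes({"message": "Hello, World!"})
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;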

&lt;h3&gt;
  
  
  3. Pre-Compressed Responses
&lt;/h3&gt;

&lt;p&gt;For the compression test, the large JSON dataset is compressed once at startup, not on every request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;large_buf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;orjson&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;large_dataset&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;compressed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;zlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compress&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;large_buf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;level&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every request gets the cached compressed buffer. No CPU is burned on repeated compression passes (note that &lt;code&gt;zlib.compress&lt;/code&gt; produces zlib/deflate output, not gzip).&lt;/p&gt;
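&lt;p&gt;The whole pattern fits in a few lines. This sketch (mine, with a made-up payload) compresses once at import time and serves the cached buffer on every request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import zlib

# Done once at startup; level=1 trades compression ratio for speed.
PAYLOAD = b'{"items": [' + b", ".join([b"1"] * 10000) + b"]}"
COMPRESSED = zlib.compress(PAYLOAD, level=1)

def handle_compressed(start_response):
    # Per-request cost is a header list and a cached buffer.
    start_response("200 OK", [
        ("Content-Type", "application/json"),
        ("Content-Encoding", "deflate"),
        ("Content-Length", str(len(COMPRESSED))),
    ])
    return [COMPRESSED]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;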

&lt;h3&gt;
  
  
  4. Tuned Server Parameters
&lt;/h3&gt;

&lt;p&gt;Socket backlog and read buffers are explicitly tuned for high throughput:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;fastpysgi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;backlog&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;      &lt;span class="c1"&gt;# 16K pending connections
&lt;/span&gt;&lt;span class="n"&gt;fastpysgi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_buffer_size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;256000&lt;/span&gt;  &lt;span class="c1"&gt;# 256KB read buffer
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These aren't arbitrary numbers -- they're sized for the benchmark's connection patterns (up to 16,384 concurrent connections).&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Thread-Local SQLite with MMAP
&lt;/h3&gt;

&lt;p&gt;Database tests use thread-local SQLite connections with memory-mapped I/O:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;db_local&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;threading&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;local&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_db&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="nf"&gt;hasattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;db_local&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;conn&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sqlite3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/data/benchmark.db&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PRAGMA mmap_size=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="mi"&gt;268&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# 268MB
&lt;/span&gt;        &lt;span class="n"&gt;db_local&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;db_local&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Memory-mapped I/O lets SQLite read database pages straight out of the OS page cache, skipping &lt;code&gt;read()&lt;/code&gt; syscalls and an extra buffer copy -- dramatically reducing query latency.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. PostgreSQL Connection Pooling
&lt;/h3&gt;

&lt;p&gt;For the async database (PostgreSQL) profile, a bounded connection pool prevents connection-storm overhead:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;psycopg_pool&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ConnectionPool&lt;/span&gt;

&lt;span class="n"&gt;pool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ConnectionPool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;conninfo&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;host=...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;min_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The Benchmark Numbers
&lt;/h2&gt;

&lt;p&gt;All benchmarks run on identical 64-core dedicated hardware via Docker containers, using h2load as the load generator with 64 threads. Duration: 5 seconds per run, best of 3 kept.&lt;/p&gt;

&lt;h3&gt;
  
  
  Baseline (Simple Response)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Connections&lt;/th&gt;
&lt;th&gt;RPS&lt;/th&gt;
&lt;th&gt;Avg Latency&lt;/th&gt;
&lt;th&gt;P99 Latency&lt;/th&gt;
&lt;th&gt;Memory&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;512&lt;/td&gt;
&lt;td&gt;1,301,932&lt;/td&gt;
&lt;td&gt;392us&lt;/td&gt;
&lt;td&gt;2.00ms&lt;/td&gt;
&lt;td&gt;408 MiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4,096&lt;/td&gt;
&lt;td&gt;1,371,836&lt;/td&gt;
&lt;td&gt;2.99ms&lt;/td&gt;
&lt;td&gt;33.80ms&lt;/td&gt;
&lt;td&gt;922 MiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16,384&lt;/td&gt;
&lt;td&gt;1,324,561&lt;/td&gt;
&lt;td&gt;11.93ms&lt;/td&gt;
&lt;td&gt;60.10ms&lt;/td&gt;
&lt;td&gt;2.5 GiB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Over &lt;strong&gt;1.3 million requests per second&lt;/strong&gt; on a simple response. Latency stays sub-millisecond at 512 connections.&lt;/p&gt;

&lt;h3&gt;
  
  
  JSON Processing
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Connections&lt;/th&gt;
&lt;th&gt;RPS&lt;/th&gt;
&lt;th&gt;Avg Latency&lt;/th&gt;
&lt;th&gt;P99 Latency&lt;/th&gt;
&lt;th&gt;Bandwidth&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;4,096&lt;/td&gt;
&lt;td&gt;707,282&lt;/td&gt;
&lt;td&gt;4.56ms&lt;/td&gt;
&lt;td&gt;17.20ms&lt;/td&gt;
&lt;td&gt;5.63 GB/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16,384&lt;/td&gt;
&lt;td&gt;670,914&lt;/td&gt;
&lt;td&gt;21.88ms&lt;/td&gt;
&lt;td&gt;67.70ms&lt;/td&gt;
&lt;td&gt;5.34 GB/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;707K RPS&lt;/strong&gt; while parsing and serializing JSON, pushing &lt;strong&gt;5.6 GB/s&lt;/strong&gt; of bandwidth. The orjson investment pays off massively here.&lt;/p&gt;

&lt;h3&gt;
  
  
  Static File Serving
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Connections&lt;/th&gt;
&lt;th&gt;RPS&lt;/th&gt;
&lt;th&gt;Avg Latency&lt;/th&gt;
&lt;th&gt;Bandwidth&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;4,096&lt;/td&gt;
&lt;td&gt;724,526&lt;/td&gt;
&lt;td&gt;3.00ms&lt;/td&gt;
&lt;td&gt;10.91 GB/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Nearly &lt;strong&gt;11 GB/s&lt;/strong&gt; of throughput from pre-loaded static files. Memory-resident serving eliminates disk I/O entirely.&lt;/p&gt;

&lt;h3&gt;
  
  
  Async Database (PostgreSQL)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Connections&lt;/th&gt;
&lt;th&gt;RPS&lt;/th&gt;
&lt;th&gt;Avg Latency&lt;/th&gt;
&lt;th&gt;P99 Latency&lt;/th&gt;
&lt;th&gt;Memory&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1,024&lt;/td&gt;
&lt;td&gt;79,200&lt;/td&gt;
&lt;td&gt;12.16ms&lt;/td&gt;
&lt;td&gt;31.10ms&lt;/td&gt;
&lt;td&gt;1.0 GiB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Even with real PostgreSQL queries over the network, it sustains &lt;strong&gt;79K RPS&lt;/strong&gt; with reasonable latency.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mixed Workload (Realistic Traffic)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Connections&lt;/th&gt;
&lt;th&gt;RPS&lt;/th&gt;
&lt;th&gt;Avg Latency&lt;/th&gt;
&lt;th&gt;P99 Latency&lt;/th&gt;
&lt;th&gt;Bandwidth&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;4,096&lt;/td&gt;
&lt;td&gt;53,005&lt;/td&gt;
&lt;td&gt;72.87ms&lt;/td&gt;
&lt;td&gt;658.60ms&lt;/td&gt;
&lt;td&gt;1.70 GB/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16,384&lt;/td&gt;
&lt;td&gt;48,546&lt;/td&gt;
&lt;td&gt;312.08ms&lt;/td&gt;
&lt;td&gt;2.09s&lt;/td&gt;
&lt;td&gt;1.56 GB/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The mixed test blends baseline, JSON, database, upload, and compression requests -- a more realistic workload. Still delivers &lt;strong&gt;53K RPS&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  How Does It Compare to Other Python Frameworks?
&lt;/h2&gt;

&lt;p&gt;While exact apples-to-apples comparisons depend on the specific benchmark run, the architectural differences are telling:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;FastPySGI-WSGI&lt;/th&gt;
&lt;th&gt;FastAPI&lt;/th&gt;
&lt;th&gt;Flask&lt;/th&gt;
&lt;th&gt;Django&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Server&lt;/td&gt;
&lt;td&gt;Built-in (libuv)&lt;/td&gt;
&lt;td&gt;Uvicorn (asyncio)&lt;/td&gt;
&lt;td&gt;Gunicorn (prefork)&lt;/td&gt;
&lt;td&gt;Gunicorn (prefork)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Event Loop&lt;/td&gt;
&lt;td&gt;libuv (C)&lt;/td&gt;
&lt;td&gt;uvloop (libuv via asyncio)&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dependencies&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;20+&lt;/td&gt;
&lt;td&gt;8+&lt;/td&gt;
&lt;td&gt;15+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Routing&lt;/td&gt;
&lt;td&gt;if/elif chain&lt;/td&gt;
&lt;td&gt;Decorator + Starlette&lt;/td&gt;
&lt;td&gt;Decorator + Werkzeug&lt;/td&gt;
&lt;td&gt;URL patterns + ORM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;JSON&lt;/td&gt;
&lt;td&gt;orjson (Rust)&lt;/td&gt;
&lt;td&gt;stdlib json&lt;/td&gt;
&lt;td&gt;stdlib json&lt;/td&gt;
&lt;td&gt;stdlib json&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The key insight: FastPySGI removes Python from the hot path of networking. The event loop, connection handling, and buffer management all happen in C (libuv). Python only runs for application logic -- routing, data processing, response building.&lt;/p&gt;




&lt;h2&gt;
  
  
  Lessons for Your Own Projects
&lt;/h2&gt;

&lt;p&gt;You probably shouldn't rewrite your production Flask app as a raw WSGI handler. But there are transferable lessons:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Know Your Bottleneck
&lt;/h3&gt;

&lt;p&gt;FastPySGI is strong evidence that Python application code usually isn't the bottleneck -- it's the layers between the OS and your code. If you're I/O bound, the event loop implementation matters more than your language choice.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Pre-compute What You Can
&lt;/h3&gt;

&lt;p&gt;Pre-loading static files, pre-compressing responses, and pre-serializing datasets at startup are techniques that work in any framework. If data doesn't change per-request, don't process it per-request.&lt;/p&gt;
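&lt;p&gt;The same idea transfers directly: hoist the work to import time and let the handler pick a cached buffer. A framework-free sketch (mine, with hypothetical catalog data):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
import zlib

# Computed once at startup, not per request.
CATALOG = [{"id": i, "name": "item-" + str(i)} for i in range(1000)]
CATALOG_JSON = json.dumps(CATALOG).encode()
CATALOG_DEFLATE = zlib.compress(CATALOG_JSON, level=1)

def catalog_response(accepts_deflate=False):
    # Per-request work is just choosing the right cached buffer.
    if accepts_deflate:
        return CATALOG_DEFLATE, "deflate"
    return CATALOG_JSON, "identity"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;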

&lt;h3&gt;
  
  
  3. Choose Your Serializer Wisely
&lt;/h3&gt;

&lt;p&gt;Swapping &lt;code&gt;json&lt;/code&gt; for &lt;code&gt;orjson&lt;/code&gt; is a one-line change in most Python projects and can yield 3-10x faster serialization. For API-heavy services, this is low-hanging fruit.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Tune Your Server Parameters
&lt;/h3&gt;

&lt;p&gt;Most developers never touch socket backlog, buffer sizes, or connection pool bounds. The defaults are conservative. If you know your traffic patterns, tuning these can unlock significant performance.&lt;/p&gt;
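&lt;p&gt;As one concrete example, Gunicorn exposes several of these knobs in its config file. The values below are illustrative, not recommendations (Gunicorn's default &lt;code&gt;backlog&lt;/code&gt; is 2048):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# gunicorn.conf.py -- illustrative values; tune to your own traffic.
import multiprocessing

workers = multiprocessing.cpu_count()  # one worker per core
backlog = 16384            # pending-connection queue (default 2048)
worker_connections = 4096  # per-worker cap for async worker classes
keepalive = 5              # seconds to hold idle keep-alive connections
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;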

&lt;h3&gt;
  
  
  5. Fewer Dependencies = Fewer Layers
&lt;/h3&gt;

&lt;p&gt;Every middleware, every abstraction, every framework feature adds overhead. When performance matters, audit your dependency tree and question whether each layer is earning its keep.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;FastPySGI-WSGI demonstrates that Python can compete at the highest levels of HTTP performance when you strip away the abstractions and let C do what C does best. By building on libuv, minimizing dependencies, and making smart caching decisions, it achieves numbers that most developers would associate with Rust or Go.&lt;/p&gt;

&lt;p&gt;The HttpArena project (&lt;a href="https://www.http-arena.com/" rel="noopener noreferrer"&gt;https://www.http-arena.com/&lt;/a&gt;) provides a fascinating lens into how different frameworks and languages approach the same problems. FastPySGI-WSGI stands out not because it reinvents Python, but because it strategically removes Python from the parts of the stack where it's slowest.&lt;/p&gt;

&lt;p&gt;Whether you're building the next high-performance Python server or just optimizing your existing API, the principles behind FastPySGI's design are worth studying.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;All benchmark data from &lt;a href="https://www.http-arena.com/" rel="noopener noreferrer"&gt;HttpArena&lt;/a&gt;, run on dedicated 64-core hardware with standardized Docker containers. Results reflect framework performance under controlled conditions.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Check out the HttpArena repository on GitHub to explore how 78+ frameworks compare: &lt;a href="https://github.com/MDA2AV/HttpArena" rel="noopener noreferrer"&gt;https://github.com/MDA2AV/HttpArena&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>backend</category>
      <category>performance</category>
      <category>python</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Fiber: Built on fasthttp, But 28x Slower at Pipelining — What Happened? (HttpArena Deep Dive)</title>
      <dc:creator>Benny</dc:creator>
      <pubDate>Sun, 29 Mar 2026 14:22:36 +0000</pubDate>
      <link>https://dev.to/fbio_reis_355b87b508598e/fiber-built-on-fasthttp-but-28x-slower-at-pipelining-what-happened-httparena-deep-dive-1o9c</link>
      <guid>https://dev.to/fbio_reis_355b87b508598e/fiber-built-on-fasthttp-but-28x-slower-at-pipelining-what-happened-httparena-deep-dive-1o9c</guid>
      <description>&lt;p&gt;Fiber is one of the most popular Go web frameworks on GitHub. 34K+ stars. Express-inspired API. And it's built on top of fasthttp — the same engine that crushes most benchmarks.&lt;/p&gt;

&lt;p&gt;So you'd expect Fiber to be &lt;em&gt;fast&lt;/em&gt;, right? Maybe not quite as fast as raw fasthttp, but close?&lt;/p&gt;

&lt;p&gt;I dug into &lt;a href="https://mda2av.github.io/HttpArena/" rel="noopener noreferrer"&gt;HttpArena's benchmark data&lt;/a&gt; to find out. The results surprised me.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Quick Summary
&lt;/h2&gt;

&lt;p&gt;Fiber is the &lt;strong&gt;most memory-efficient Go framework&lt;/strong&gt; in almost every test. It's also &lt;strong&gt;last place among Go frameworks&lt;/strong&gt; in most throughput tests. And in pipelining, it's not just slower than fasthttp — it's &lt;strong&gt;28x slower&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;But the story is more nuanced than "Fiber is slow." Let's dig in.&lt;/p&gt;

&lt;h2&gt;
  
  
  Baseline Performance: Last Among Go Peers
&lt;/h2&gt;

&lt;p&gt;In the standard baseline test at 4,096 connections:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Framework&lt;/th&gt;
&lt;th&gt;Requests/sec&lt;/th&gt;
&lt;th&gt;Memory&lt;/th&gt;
&lt;th&gt;Avg Latency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;go-fasthttp&lt;/td&gt;
&lt;td&gt;1,464,168&lt;/td&gt;
&lt;td&gt;188 MB&lt;/td&gt;
&lt;td&gt;2.79ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gin&lt;/td&gt;
&lt;td&gt;430,086&lt;/td&gt;
&lt;td&gt;375 MB&lt;/td&gt;
&lt;td&gt;9.45ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;echo&lt;/td&gt;
&lt;td&gt;424,337&lt;/td&gt;
&lt;td&gt;249 MB&lt;/td&gt;
&lt;td&gt;9.54ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;chi&lt;/td&gt;
&lt;td&gt;422,523&lt;/td&gt;
&lt;td&gt;359 MB&lt;/td&gt;
&lt;td&gt;9.62ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;fiber&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;397,172&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;144 MB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;6.47ms&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Fiber comes in 5th out of 5 Go frameworks for raw throughput, and #39 out of 51 frameworks overall. That's... not what you'd expect from something built on fasthttp.&lt;/p&gt;

&lt;p&gt;But look at that memory column. 144 MB. That's the lowest of any Go framework by a wide margin — 42% less than echo, and 62% less than gin. And the latency is actually better than gin/echo/chi despite lower throughput.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Pipelining Gap: 28x
&lt;/h2&gt;

&lt;p&gt;This is where things get wild. HTTP pipelining at 4,096 connections with 16 requests per pipeline:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Framework&lt;/th&gt;
&lt;th&gt;Requests/sec&lt;/th&gt;
&lt;th&gt;Memory&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;go-fasthttp&lt;/td&gt;
&lt;td&gt;17,808,031&lt;/td&gt;
&lt;td&gt;196 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gin&lt;/td&gt;
&lt;td&gt;1,046,933&lt;/td&gt;
&lt;td&gt;1,003 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;echo&lt;/td&gt;
&lt;td&gt;1,016,858&lt;/td&gt;
&lt;td&gt;492 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;chi&lt;/td&gt;
&lt;td&gt;937,099&lt;/td&gt;
&lt;td&gt;692 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;fiber&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;623,248&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;96 MB&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;go-fasthttp does &lt;strong&gt;17.8 million requests per second&lt;/strong&gt;. Fiber does 623K. That's a 28.6x gap.&lt;/p&gt;

&lt;p&gt;Even gin manages 1M rps in pipelining — 68% more than Fiber. And again, look at gin's memory usage: over 1 GB. Fiber? 96 MB. Sipping resources.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Such a Massive Gap?
&lt;/h3&gt;

&lt;p&gt;I read both implementations. The difference is architectural.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;go-fasthttp&lt;/strong&gt; in HttpArena uses &lt;code&gt;SO_REUSEPORT&lt;/code&gt; — it spawns one listener per CPU core, each with its own &lt;code&gt;fasthttp.Server&lt;/code&gt;. Incoming connections get distributed by the kernel. The routing is a raw &lt;code&gt;switch&lt;/code&gt; statement on &lt;code&gt;ctx.Path()&lt;/code&gt;. Zero middleware, zero overhead, zero allocations on the hot path.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// go-fasthttp: one listener per CPU core&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;numCPU&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;ln&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;reuseport&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Listen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"tcp4"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;":8080"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;fasthttp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Server&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;Handler&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Serve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ln&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}()&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Fiber&lt;/strong&gt; runs a single &lt;code&gt;app.Listen(":8080")&lt;/code&gt; with its Express-style router, middleware chain, and &lt;code&gt;compress.New()&lt;/code&gt; applied globally. Every request walks through middleware functions. The router does pattern matching instead of a switch statement.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// fiber: single listener with middleware chain&lt;/span&gt;
&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;fiber&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;New&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fiber&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Config&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Use&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;compress&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;New&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;compress&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Config&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;Level&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;compress&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;LevelBestSpeed&lt;/span&gt;&lt;span class="p"&gt;}))&lt;/span&gt;
&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/pipeline"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Listen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;":8080"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That compression middleware is applied globally — even on the &lt;code&gt;/pipeline&lt;/code&gt; endpoint that returns a 2-byte &lt;code&gt;"ok"&lt;/code&gt; response. Every baseline request pays the cost of checking Accept-Encoding headers for no reason.&lt;/p&gt;

&lt;p&gt;This is the cost of ergonomics. Fiber gives you Express-style middleware, clean routing, and a nice API. That costs CPU cycles.&lt;/p&gt;
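Some of that cost is avoidable, though: Fiber supports per-route middleware, so compression could be confined to the one endpoint that benefits from it. A hedged sketch assuming Fiber v2's API (the handler body and payload are placeholders, not the benchmark's code):

```go
package main

import (
	"log"

	"github.com/gofiber/fiber/v2"
	"github.com/gofiber/fiber/v2/middleware/compress"
)

// largeJSON stands in for a payload big enough to be worth compressing.
var largeJSON = []byte(`{"items":[]}`)

func main() {
	app := fiber.New()

	// Attach compression to just this route instead of app.Use(compress.New()),
	// so only requests here pay the Accept-Encoding inspection.
	app.Get("/compression", compress.New(compress.Config{
		Level: compress.LevelBestSpeed,
	}), func(c *fiber.Ctx) error {
		return c.Send(largeJSON)
	})

	// The 2-byte baseline route now bypasses the middleware entirely.
	app.Get("/pipeline", func(c *fiber.Ctx) error {
		return c.SendString("ok")
	})

	log.Fatal(app.Listen(":8080"))
}
```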

&lt;h2&gt;
  
  
  Where Fiber Actually Wins
&lt;/h2&gt;

&lt;p&gt;Here's the twist: there are two categories where Fiber outperforms its Go peers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Limited Connections (512)
&lt;/h3&gt;

&lt;p&gt;When connections are capped at 512 and there is constant keep-alive and reconnection churn:&lt;/p&gt;


&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Framework&lt;/th&gt;
&lt;th&gt;Requests/sec&lt;/th&gt;
&lt;th&gt;Memory&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;fiber&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;178,746&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;68 MB&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gin&lt;/td&gt;
&lt;td&gt;149,330&lt;/td&gt;
&lt;td&gt;94 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;go-fasthttp&lt;/td&gt;
&lt;td&gt;147,847&lt;/td&gt;
&lt;td&gt;100 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;chi&lt;/td&gt;
&lt;td&gt;144,893&lt;/td&gt;
&lt;td&gt;94 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;echo&lt;/td&gt;
&lt;td&gt;136,646&lt;/td&gt;
&lt;td&gt;93 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Fiber is #1 among Go frameworks here, beating even raw fasthttp by 21%. Under connection churn at lower concurrency, Fiber's lightweight connection handling shines. Fasthttp's multi-listener approach actually hurts here — distributing 512 connections across many listeners means some sit idle while others are busy.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mixed Workload
&lt;/h3&gt;

&lt;p&gt;The mixed workload test hits all endpoints (baseline, JSON, DB, uploads, compression, static files) simultaneously at 4,096 connections:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Framework&lt;/th&gt;
&lt;th&gt;Requests/sec&lt;/th&gt;
&lt;th&gt;Memory&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;go-fasthttp&lt;/td&gt;
&lt;td&gt;71,173&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;79.9 GB&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;fiber&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;58,490&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;761 MB&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;echo&lt;/td&gt;
&lt;td&gt;36,125&lt;/td&gt;
&lt;td&gt;1.7 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;chi&lt;/td&gt;
&lt;td&gt;34,365&lt;/td&gt;
&lt;td&gt;595 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gin&lt;/td&gt;
&lt;td&gt;32,477&lt;/td&gt;
&lt;td&gt;988 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Fiber is solidly #2, beating echo/chi/gin by 60-80%. And look at that memory story: go-fasthttp uses &lt;strong&gt;79.9 GB of RAM&lt;/strong&gt; to achieve 71K rps. Fiber uses 761 MB for 58.5K rps. That's 105x less memory for only 18% less throughput.&lt;/p&gt;

&lt;p&gt;On per-megabyte efficiency, Fiber is the clear winner in mixed workloads.&lt;/p&gt;

&lt;h2&gt;
  
  
  Compression: The Hidden Strength
&lt;/h2&gt;

&lt;p&gt;Fiber's global compression middleware — the same thing that hurts pipelining — actually pays off here:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Framework&lt;/th&gt;
&lt;th&gt;Requests/sec&lt;/th&gt;
&lt;th&gt;Memory&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;go-fasthttp&lt;/td&gt;
&lt;td&gt;14,771&lt;/td&gt;
&lt;td&gt;14.4 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;fiber&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;9,483&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;5.9 GB&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;chi&lt;/td&gt;
&lt;td&gt;7,602&lt;/td&gt;
&lt;td&gt;3.4 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gin&lt;/td&gt;
&lt;td&gt;7,578&lt;/td&gt;
&lt;td&gt;2.9 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;echo&lt;/td&gt;
&lt;td&gt;7,536&lt;/td&gt;
&lt;td&gt;3.1 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Second place among Go frameworks. Fiber uses &lt;code&gt;andybalholm/brotli&lt;/code&gt; and &lt;code&gt;klauspost/compress&lt;/code&gt; through its middleware — solid libraries. The 25% lead over gin/echo/chi is real.&lt;/p&gt;

&lt;h2&gt;
  
  
  JSON Serialization: The Weak Spot
&lt;/h2&gt;

&lt;p&gt;JSON processing at 4,096 connections:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Framework&lt;/th&gt;
&lt;th&gt;Requests/sec&lt;/th&gt;
&lt;th&gt;Memory&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;go-fasthttp&lt;/td&gt;
&lt;td&gt;314,945&lt;/td&gt;
&lt;td&gt;696 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gin&lt;/td&gt;
&lt;td&gt;174,851&lt;/td&gt;
&lt;td&gt;433 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;echo&lt;/td&gt;
&lt;td&gt;164,227&lt;/td&gt;
&lt;td&gt;371 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;chi&lt;/td&gt;
&lt;td&gt;158,040&lt;/td&gt;
&lt;td&gt;390 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;fiber&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;125,297&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;171 MB&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Last place again. 28% slower than gin. The implementation is interesting — Fiber's handler allocates a new &lt;code&gt;[]ProcessedItem&lt;/code&gt; slice on every request, processes the dataset, then marshals to JSON with &lt;code&gt;json.Marshal&lt;/code&gt;. The net/http-based frameworks do essentially the same thing, but they have more CPU available since they're not running through Fiber's middleware stack.&lt;/p&gt;

&lt;h2&gt;
  
  
  Database Performance: Quietly Strong
&lt;/h2&gt;

&lt;p&gt;Async database queries via PostgreSQL at 1,024 connections:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Framework&lt;/th&gt;
&lt;th&gt;Requests/sec&lt;/th&gt;
&lt;th&gt;Memory&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;go-fasthttp&lt;/td&gt;
&lt;td&gt;30,784&lt;/td&gt;
&lt;td&gt;359 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;fiber&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;19,196&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;192 MB&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gin&lt;/td&gt;
&lt;td&gt;17,660&lt;/td&gt;
&lt;td&gt;220 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;echo&lt;/td&gt;
&lt;td&gt;17,486&lt;/td&gt;
&lt;td&gt;284 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;chi&lt;/td&gt;
&lt;td&gt;17,324&lt;/td&gt;
&lt;td&gt;211 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Clear #2 among Go frameworks, and 9-11% ahead of the gin/echo/chi cluster. Both Fiber and go-fasthttp use &lt;code&gt;pgxpool&lt;/code&gt; with &lt;code&gt;NumCPU * 4&lt;/code&gt; max connections. The gap between them is mostly the overhead of Fiber's framework layer, but when the bottleneck shifts to the database, that overhead matters less.&lt;/p&gt;
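The shared pool sizing looks roughly like this, assuming pgx v5's pgxpool API (the DSN is a placeholder, not the benchmark's connection string):

```go
package main

import (
	"context"
	"fmt"
	"log"
	"runtime"

	"github.com/jackc/pgx/v5/pgxpool"
)

func main() {
	// DSN is a placeholder for illustration only.
	cfg, err := pgxpool.ParseConfig("postgres://user:pass@localhost:5432/bench")
	if err != nil {
		log.Fatal(err)
	}

	// Both the Fiber and go-fasthttp entries cap the pool at NumCPU * 4.
	cfg.MaxConns = int32(runtime.NumCPU() * 4)

	pool, err := pgxpool.NewWithConfig(context.Background(), cfg)
	if err != nil {
		log.Fatal(err)
	}
	defer pool.Close()

	fmt.Println("max conns:", cfg.MaxConns)
}
```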

&lt;h2&gt;
  
  
  The Memory Story
&lt;/h2&gt;

&lt;p&gt;Let's talk about what Fiber does really well. Across every single test, Fiber uses less memory than any other Go framework:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Baseline 4K:&lt;/strong&gt; 144 MB (vs gin's 375 MB)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pipelined 4K:&lt;/strong&gt; 96 MB (vs gin's 1 GB)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;JSON 4K:&lt;/strong&gt; 171 MB (vs gin's 433 MB)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mixed 4K:&lt;/strong&gt; 761 MB (vs go-fasthttp's 79.9 GB)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Uploads 256:&lt;/strong&gt; 296 MB (vs echo's 541 MB)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Limited Conn:&lt;/strong&gt; 68 MB (vs go-fasthttp's 100 MB)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is fasthttp's zero-allocation philosophy showing through. Even with Fiber's middleware layer on top, the underlying engine reuses buffers aggressively and avoids heap allocations. In constrained environments — containers, edge deployments, shared hosting — this matters more than raw throughput.&lt;/p&gt;

&lt;h2&gt;
  
  
  Uploads: Beating fasthttp
&lt;/h2&gt;

&lt;p&gt;Here's a fun one. File uploads at 256 connections:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Framework&lt;/th&gt;
&lt;th&gt;Requests/sec&lt;/th&gt;
&lt;th&gt;Memory&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;echo&lt;/td&gt;
&lt;td&gt;1,334&lt;/td&gt;
&lt;td&gt;541 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;chi&lt;/td&gt;
&lt;td&gt;1,326&lt;/td&gt;
&lt;td&gt;509 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gin&lt;/td&gt;
&lt;td&gt;1,320&lt;/td&gt;
&lt;td&gt;582 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;fiber&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1,222&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;296 MB&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;go-fasthttp&lt;/td&gt;
&lt;td&gt;910&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;15.5 GB&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Fiber is 4th, but go-fasthttp is &lt;em&gt;last&lt;/em&gt; — and using 15.5 GB of RAM to process uploads at only 910 rps. Fiber handles uploads with &lt;code&gt;StreamRequestBody: true&lt;/code&gt; and &lt;code&gt;c.Request().BodyStream()&lt;/code&gt;, which streams the body to &lt;code&gt;/dev/null&lt;/code&gt; efficiently. The net/http-based frameworks (gin, echo, chi) do slightly better here, but fasthttp's approach of reading the entire body into memory is catastrophic for large file uploads.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture Deep Dive
&lt;/h2&gt;

&lt;p&gt;Looking at Fiber's HttpArena implementation:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The good:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;StreamRequestBody: true&lt;/code&gt; — avoids buffering entire request bodies&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;BodyLimit: 25 * 1024 * 1024&lt;/code&gt; — explicit limits prevent OOM&lt;/li&gt;
&lt;li&gt;Pre-computed JSON for the compression endpoint (&lt;code&gt;jsonLargeResponse&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Static files loaded into memory at startup (&lt;code&gt;staticFiles&lt;/code&gt; map)&lt;/li&gt;
&lt;li&gt;Using &lt;code&gt;pgxpool&lt;/code&gt; for async DB connections, &lt;code&gt;modernc.org/sqlite&lt;/code&gt; for sync DB&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The concerning:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Global compression middleware penalizes all endpoints&lt;/li&gt;
&lt;li&gt;Single &lt;code&gt;app.Listen()&lt;/code&gt; vs go-fasthttp's per-core listeners with &lt;code&gt;SO_REUSEPORT&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;JSON endpoint allocates new slices per request (could pre-compute like the compression endpoint)&lt;/li&gt;
&lt;li&gt;Manual &lt;code&gt;json.Marshal&lt;/code&gt; instead of writing directly to the response writer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What could be improved:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Apply compression only to the &lt;code&gt;/compression&lt;/code&gt; endpoint&lt;/li&gt;
&lt;li&gt;Use Fiber's &lt;code&gt;Prefork&lt;/code&gt; mode (built-in!) to match go-fasthttp's multi-listener approach&lt;/li&gt;
&lt;li&gt;Pre-compute the JSON response like the compression response&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;sonic&lt;/code&gt; or &lt;code&gt;go-json&lt;/code&gt; instead of &lt;code&gt;encoding/json&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Fiber actually has a &lt;code&gt;Prefork: true&lt;/code&gt; config option that does &lt;code&gt;SO_REUSEPORT&lt;/code&gt; under the hood. The benchmark implementation doesn't use it. That alone could close a significant chunk of the gap with raw fasthttp.&lt;/p&gt;
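Enabling it is a one-line config change. A minimal sketch, assuming Fiber v2's API:

```go
package main

import (
	"log"

	"github.com/gofiber/fiber/v2"
)

func main() {
	// Prefork forks one child process per CPU core, each binding its own
	// listener with SO_REUSEPORT -- the same kernel-level connection
	// distribution the go-fasthttp entry wires up by hand.
	app := fiber.New(fiber.Config{Prefork: true})

	app.Get("/pipeline", func(c *fiber.Ctx) error {
		return c.SendString("ok")
	})

	log.Fatal(app.Listen(":8080"))
}
```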

&lt;h2&gt;
  
  
  The Verdict: Who Should Use Fiber?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Fiber is perfect for you if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You want Express-like ergonomics in Go&lt;/li&gt;
&lt;li&gt;Memory efficiency matters (containers, K8s with resource limits)&lt;/li&gt;
&lt;li&gt;You're building APIs that do real work (DB queries, mixed workloads) rather than pure I/O benchmarks&lt;/li&gt;
&lt;li&gt;You want a single framework that handles compression, routing, and middleware cleanly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Consider raw fasthttp if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You need maximum pipelining throughput (17.8M vs 623K rps is hard to ignore)&lt;/li&gt;
&lt;li&gt;You're building a proxy or gateway where every microsecond counts&lt;/li&gt;
&lt;li&gt;You don't mind manual routing and zero framework niceties&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Consider gin/echo if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You want net/http compatibility and the broader Go ecosystem&lt;/li&gt;
&lt;li&gt;Upload performance matters more than memory efficiency&lt;/li&gt;
&lt;li&gt;You're more comfortable with net/http patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Fiber occupies an interesting niche: it's the &lt;strong&gt;most memory-efficient Go framework&lt;/strong&gt; while being &lt;strong&gt;competitive in real-world mixed workloads&lt;/strong&gt;. It's not the throughput king, and it probably shouldn't be — that's not what frameworks are for. Frameworks trade raw speed for developer experience. Fiber makes that trade while keeping memory usage remarkably low.&lt;/p&gt;

&lt;p&gt;The 28x pipelining gap is eye-catching, but pipelining is a synthetic benchmark that few production workloads actually use. In mixed workloads — which better simulate real APIs — Fiber beats gin by 80% while using less RAM.&lt;/p&gt;

&lt;p&gt;That's a framework doing its job well.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;All data from &lt;a href="https://mda2av.github.io/HttpArena/" rel="noopener noreferrer"&gt;HttpArena&lt;/a&gt; (&lt;a href="https://github.com/MDA2AV/HttpArena" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;). Test environment: 64 threads, various connection counts. Check the site for the full leaderboards and methodology.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>go</category>
      <category>webdev</category>
      <category>performance</category>
      <category>benchmarks</category>
    </item>
    <item>
      <title>GenHTTP vs ASP.NET Minimal APIs: The C# Benchmark Showdown Nobody Expected</title>
      <dc:creator>Benny</dc:creator>
      <pubDate>Fri, 27 Mar 2026 18:46:34 +0000</pubDate>
      <link>https://dev.to/fbio_reis_355b87b508598e/genhttp-vs-aspnet-minimal-apis-the-c-benchmark-showdown-nobody-expected-1dhf</link>
      <guid>https://dev.to/fbio_reis_355b87b508598e/genhttp-vs-aspnet-minimal-apis-the-c-benchmark-showdown-nobody-expected-1dhf</guid>
      <description>&lt;p&gt;Two C# frameworks walk into a benchmark. One is the industry standard backed by Microsoft. The other is a scrappy indie framework most .NET developers have never heard of. You'd expect a blowout — and you'd be right. You just might be wrong about &lt;em&gt;who gets blown out where&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://mda2av.github.io/HttpArena/" rel="noopener noreferrer"&gt;HttpArena&lt;/a&gt; recently added both &lt;strong&gt;GenHTTP&lt;/strong&gt; and &lt;strong&gt;ASP.NET Minimal APIs&lt;/strong&gt; to their benchmark suite, and the results tell a story that's way more interesting than "Microsoft wins." Let's dig in.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Contenders
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;ASP.NET Minimal APIs&lt;/strong&gt; needs no introduction. It's Microsoft's lightweight API framework running on Kestrel, the battle-tested HTTP server that powers half the internet's .NET workloads. Minimal APIs strip away controllers and give you &lt;code&gt;app.MapGet("/route", handler)&lt;/code&gt; — clean, fast, no ceremony.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GenHTTP&lt;/strong&gt; is a modular, embeddable C# web server that runs its own HTTP engine. It layers on abstractions — layouts, services, concerns, resource methods — giving you a higher-level programming model. Think "convention over configuration" but for HTTP servers.&lt;/p&gt;

&lt;p&gt;Both target &lt;strong&gt;.NET 10&lt;/strong&gt;. Both speak C#. Same runtime, same GC, same JIT. So any performance delta comes down to the framework itself.&lt;/p&gt;

&lt;p&gt;Let the games begin.&lt;/p&gt;

&lt;h2&gt;
  
  
  Round 1: Baseline — The 10x Gap Nobody Expected
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Concurrency&lt;/th&gt;
&lt;th&gt;GenHTTP&lt;/th&gt;
&lt;th&gt;ASP.NET Minimal&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;512&lt;/td&gt;
&lt;td&gt;49,507 req/s&lt;/td&gt;
&lt;td&gt;495,831 req/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4,096&lt;/td&gt;
&lt;td&gt;48,551 req/s&lt;/td&gt;
&lt;td&gt;459,783 req/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16,384&lt;/td&gt;
&lt;td&gt;22,169 req/s&lt;/td&gt;
&lt;td&gt;353,394 req/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Yeah. Read that again. ASP.NET is doing &lt;strong&gt;10x&lt;/strong&gt; the requests per second on a simple GET-with-query-params endpoint.&lt;/p&gt;

&lt;p&gt;This is where GenHTTP pays the tax for its abstraction layer. Look at the code — ASP.NET's handler is a direct &lt;code&gt;IResult&lt;/code&gt; return. GenHTTP routes through a &lt;code&gt;LayoutBuilder&lt;/code&gt;, resolves a &lt;code&gt;ServiceResource&lt;/code&gt;, matches a &lt;code&gt;ResourceMethod&lt;/code&gt; attribute, and wraps things in a &lt;code&gt;Concern&lt;/code&gt; pipeline. Every request walks through that entire abstraction stack.&lt;/p&gt;

&lt;p&gt;Is it elegant? Sure. Is it 350,000 requests per second worth of overhead? Also yes.&lt;/p&gt;

&lt;p&gt;To be fair, ~50K req/s is still plenty fast for most real applications. But if your use case is "handle a simple request as fast as physically possible," ASP.NET Minimal isn't even trying and it's lapping GenHTTP.&lt;/p&gt;

&lt;h2&gt;
  
  
  Round 2: Pipelined — Wait, They're &lt;em&gt;Tied&lt;/em&gt;?
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Concurrency&lt;/th&gt;
&lt;th&gt;GenHTTP&lt;/th&gt;
&lt;th&gt;ASP.NET Minimal&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;512&lt;/td&gt;
&lt;td&gt;12,349,888 req/s&lt;/td&gt;
&lt;td&gt;12,790,124 req/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4,096&lt;/td&gt;
&lt;td&gt;13,148,857 req/s&lt;/td&gt;
&lt;td&gt;13,748,303 req/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16,384&lt;/td&gt;
&lt;td&gt;10,999,584 req/s&lt;/td&gt;
&lt;td&gt;12,741,100 req/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Thirteen. Million. Requests per second. From &lt;em&gt;both&lt;/em&gt; of them.&lt;/p&gt;

&lt;p&gt;The pipelined test is the great equalizer. When you shove requests down a persistent connection as fast as TCP allows, you're basically benchmarking the I/O layer, not the framework. And GenHTTP, despite its abstraction overhead, keeps pace with ASP.NET within ~15%.&lt;/p&gt;

&lt;p&gt;This tells us something important: GenHTTP has solid I/O fundamentals. The bottleneck in baseline isn't bad networking code — it's the request processing pipeline on top. Good bones, heavy coat.&lt;/p&gt;

&lt;h2&gt;
  
  
  Round 3: Upload — The 23GB Elephant in the Room
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Concurrency&lt;/th&gt;
&lt;th&gt;GenHTTP (req/s / RAM)&lt;/th&gt;
&lt;th&gt;ASP.NET Minimal (req/s / RAM)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;64&lt;/td&gt;
&lt;td&gt;609 / 429MB&lt;/td&gt;
&lt;td&gt;184 / &lt;strong&gt;23.2GB&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;256&lt;/td&gt;
&lt;td&gt;630 / 591MB&lt;/td&gt;
&lt;td&gt;187 / &lt;strong&gt;24.1GB&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;512&lt;/td&gt;
&lt;td&gt;622 / 862MB&lt;/td&gt;
&lt;td&gt;160 / &lt;strong&gt;22.7GB&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;I need you to look at this table and understand what you're seeing. ASP.NET Minimal is using &lt;strong&gt;twenty-three gigabytes of RAM&lt;/strong&gt; to handle file uploads at 160 requests per second. GenHTTP handles nearly 4x the throughput while sipping under a gig.&lt;/p&gt;

&lt;p&gt;What is happening here? Let's look at the code.&lt;/p&gt;

&lt;p&gt;ASP.NET's upload handler:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;IResult&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;Upload&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;HttpRequest&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;var&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;MemoryStream&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;CopyToAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ms&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;Results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ms&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Length&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ToString&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It copies the &lt;em&gt;entire request body into a MemoryStream&lt;/em&gt;. Every upload, fully buffered in RAM. With concurrent uploads, you're stacking multi-megabyte buffers on top of each other. The GC is crying.&lt;/p&gt;

&lt;p&gt;GenHTTP's approach:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="n"&gt;ValueTask&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;long&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;Compute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Stream&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CanSeek&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ValueTask&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;FromResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Length&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;ComputeManually&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;GenHTTP's engine provides a seekable stream, so it just reads the length directly — zero buffering, zero copying. When running on Kestrel, it falls back to reading chunks through a small buffer. Either way, it's dramatically more memory-efficient.&lt;/p&gt;

&lt;p&gt;Now, is this an ASP.NET framework problem? Not exactly — it's a handler implementation choice. You &lt;em&gt;could&lt;/em&gt; write a streaming handler in ASP.NET. But the fact that the "obvious" way to handle uploads in Minimal APIs leads to 23GB of RAM usage is... not great. GenHTTP's abstraction actually &lt;em&gt;protects&lt;/em&gt; you from this footgun by giving you a smarter stream interface out of the box.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GenHTTP wins this round by a country mile&lt;/strong&gt;, and it's not even close.&lt;/p&gt;

&lt;h2&gt;
  
  
  Round 4: JSON Serialization — The Underdog Bites
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Concurrency&lt;/th&gt;
&lt;th&gt;GenHTTP&lt;/th&gt;
&lt;th&gt;ASP.NET Minimal&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;4,096&lt;/td&gt;
&lt;td&gt;582,510 req/s&lt;/td&gt;
&lt;td&gt;515,135 req/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16,384&lt;/td&gt;
&lt;td&gt;561,453 req/s&lt;/td&gt;
&lt;td&gt;440,034 req/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Wait. The framework that was 10x slower on baseline is now &lt;strong&gt;beating&lt;/strong&gt; ASP.NET on JSON serialization? By 13% at 4K connections and 28% at 16K?&lt;/p&gt;

&lt;p&gt;The implementations are virtually identical — both deserialize a dataset, transform it, and serialize the response. Same &lt;code&gt;System.Text.Json&lt;/code&gt; under the hood. The difference likely comes down to how each framework handles response writing and buffering for larger payloads. GenHTTP's response pipeline may have less overhead when serializing structured data compared to ASP.NET's &lt;code&gt;Results.Json()&lt;/code&gt; wrapper.&lt;/p&gt;

&lt;p&gt;This is where the "faster framework" narrative breaks down. Performance isn't one number. It's a &lt;em&gt;profile&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Round 5: Compression — GenHTTP Keeps Winning
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Concurrency&lt;/th&gt;
&lt;th&gt;GenHTTP&lt;/th&gt;
&lt;th&gt;ASP.NET Minimal&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;4,096&lt;/td&gt;
&lt;td&gt;8,565 req/s&lt;/td&gt;
&lt;td&gt;7,183 req/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16,384&lt;/td&gt;
&lt;td&gt;7,572 req/s&lt;/td&gt;
&lt;td&gt;5,949 req/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;GenHTTP leads by 19-27% on compressed responses. Both use gzip at the fastest compression level, so the delta is in how they pipe compressed bytes to the wire. Another quiet win for the underdog.&lt;/p&gt;

&lt;h2&gt;
  
  
  Round 6: Noisy Neighbors — Dead Heat
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Concurrency&lt;/th&gt;
&lt;th&gt;GenHTTP&lt;/th&gt;
&lt;th&gt;ASP.NET Minimal&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;512&lt;/td&gt;
&lt;td&gt;1,080,435 req/s&lt;/td&gt;
&lt;td&gt;1,110,339 req/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4,096&lt;/td&gt;
&lt;td&gt;1,218,298 req/s&lt;/td&gt;
&lt;td&gt;1,210,574 req/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16,384&lt;/td&gt;
&lt;td&gt;1,017,992 req/s&lt;/td&gt;
&lt;td&gt;1,007,141 req/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Under CPU contention with background noise, they trade punches within 3%. At a million requests per second each, both frameworks handle stress gracefully. No complaints from either corner.&lt;/p&gt;

&lt;h2&gt;
  
  
  Round 7: Mixed Workload &amp;amp; Database — GenHTTP's Sweet Spot
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Test&lt;/th&gt;
&lt;th&gt;GenHTTP&lt;/th&gt;
&lt;th&gt;ASP.NET Minimal&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Mixed 4,096c&lt;/td&gt;
&lt;td&gt;7,962 req/s&lt;/td&gt;
&lt;td&gt;6,650 req/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mixed 16,384c&lt;/td&gt;
&lt;td&gt;9,269 req/s&lt;/td&gt;
&lt;td&gt;6,240 req/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Async DB 512c&lt;/td&gt;
&lt;td&gt;138,179 req/s&lt;/td&gt;
&lt;td&gt;122,812 req/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Async DB 1,024c&lt;/td&gt;
&lt;td&gt;161,488 req/s&lt;/td&gt;
&lt;td&gt;147,924 req/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;GenHTTP pulls ahead by 20-49% on mixed workloads and 9-12% on async database queries. For realistic API workloads that juggle multiple endpoint types and database access — you know, &lt;em&gt;actual applications&lt;/em&gt; — GenHTTP holds its own and then some.&lt;/p&gt;

&lt;h2&gt;
  
  
  Round 8: HTTP/2 and HTTP/3 — GenHTTP Sits This One Out
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Test&lt;/th&gt;
&lt;th&gt;ASP.NET Minimal&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;H2 Baseline 256c&lt;/td&gt;
&lt;td&gt;254,371 req/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;H2 Baseline 1,024c&lt;/td&gt;
&lt;td&gt;203,325 req/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;H3 Baseline 256c&lt;/td&gt;
&lt;td&gt;41,922 req/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;ASP.NET supports HTTP/2 and HTTP/3 out of the box via Kestrel's QUIC integration. GenHTTP doesn't participate in these tests — it's HTTP/1.1 only for now. If you need modern protocol support, that's a clear win for ASP.NET.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Verdict: It's Complicated (Obviously)
&lt;/h2&gt;

&lt;p&gt;If you looked at the baseline numbers alone, you'd conclude ASP.NET Minimal is in a different league. And for raw request routing throughput, it is.&lt;/p&gt;

&lt;p&gt;But real applications don't just route requests. They serialize JSON, compress responses, handle file uploads, query databases, and juggle mixed workloads. And in those scenarios, GenHTTP frequently &lt;strong&gt;wins&lt;/strong&gt; — sometimes by significant margins.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose ASP.NET Minimal if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Raw request throughput is critical&lt;/li&gt;
&lt;li&gt;You need HTTP/2 or HTTP/3 support&lt;/li&gt;
&lt;li&gt;You want the massive .NET ecosystem and tooling&lt;/li&gt;
&lt;li&gt;You're building at scale and need Microsoft's backing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Choose GenHTTP if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You want a lightweight, embeddable server&lt;/li&gt;
&lt;li&gt;Memory efficiency matters (looking at you, upload numbers)&lt;/li&gt;
&lt;li&gt;Your workload is JSON-heavy, database-driven, or mixed&lt;/li&gt;
&lt;li&gt;You appreciate a higher-level API that handles footguns for you&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The most interesting takeaway? A small indie framework with a handful of contributors is genuinely competitive with — and sometimes faster than — one of the most optimized web stacks on the planet. That's impressive engineering from the GenHTTP team.&lt;/p&gt;

&lt;p&gt;Performance isn't a single number. It's a conversation. And this one just got a lot more interesting.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;All benchmarks from &lt;a href="https://mda2av.github.io/HttpArena/" rel="noopener noreferrer"&gt;HttpArena&lt;/a&gt; — an open-source, reproducible HTTP benchmark suite. Full source code and methodology available on &lt;a href="https://github.com/MDA2AV/HttpArena" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>csharp</category>
      <category>dotnet</category>
      <category>performance</category>
      <category>benchmarks</category>
    </item>
    <item>
      <title>Bun HTTP Server: #1 in Mixed Workloads, #41 in Pipelining — The Full Picture (HttpArena Deep Dive)</title>
      <dc:creator>Benny</dc:creator>
      <pubDate>Fri, 27 Mar 2026 18:34:40 +0000</pubDate>
      <link>https://dev.to/fbio_reis_355b87b508598e/bun-http-server-1-in-mixed-workloads-41-in-pipelining-the-full-picture-httparena-deep-dive-4h6e</link>
      <guid>https://dev.to/fbio_reis_355b87b508598e/bun-http-server-1-in-mixed-workloads-41-in-pipelining-the-full-picture-httparena-deep-dive-4h6e</guid>
      <description>&lt;p&gt;Every few months, someone posts "Bun is fast" on Twitter and the replies turn into a warzone. Node fans say it doesn't matter. Deno fans say their runtime is better. Rust folks just post flamegraphs.&lt;/p&gt;

&lt;p&gt;So let's look at actual numbers. I ran Bun through &lt;a href="https://mda2av.github.io/HttpArena/" rel="noopener noreferrer"&gt;HttpArena&lt;/a&gt;, an open-source benchmark suite that tests HTTP frameworks across a bunch of real-world-ish scenarios — not just "hello world" in a loop. We're talking baseline throughput, pipelining, JSON serialization, compression, mixed workloads, uploads, noisy neighbor tolerance, and more.&lt;/p&gt;

&lt;p&gt;The results are... honestly fascinating. Bun is a study in contrasts.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Bun?
&lt;/h2&gt;

&lt;p&gt;If you've been living under a rock: &lt;a href="https://github.com/oven-sh/bun" rel="noopener noreferrer"&gt;Bun&lt;/a&gt; is a JavaScript/TypeScript runtime built from scratch using &lt;strong&gt;JavaScriptCore&lt;/strong&gt; (Safari's engine) instead of V8. It's written in Zig and aims to be a drop-in replacement for Node.js — but faster at everything.&lt;/p&gt;

&lt;p&gt;Its built-in HTTP server (&lt;code&gt;Bun.serve()&lt;/code&gt;) skips the Node.js &lt;code&gt;http&lt;/code&gt; module entirely and goes straight to optimized native code. In the HttpArena benchmark, the implementation uses &lt;code&gt;reusePort&lt;/code&gt; and spawns one Bun process per CPU core — simple multiprocess scaling with no clustering library needed.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Headline Numbers
&lt;/h2&gt;

&lt;p&gt;Let me just lay out where Bun landed across the key tests (at 4,096 connections unless noted):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Test&lt;/th&gt;
&lt;th&gt;Rank&lt;/th&gt;
&lt;th&gt;RPS&lt;/th&gt;
&lt;th&gt;Latency (avg)&lt;/th&gt;
&lt;th&gt;Memory&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Baseline&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;#13/51&lt;/td&gt;
&lt;td&gt;1,557,305&lt;/td&gt;
&lt;td&gt;2.62ms&lt;/td&gt;
&lt;td&gt;2.2 GiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pipelined&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;#41/51&lt;/td&gt;
&lt;td&gt;491,345&lt;/td&gt;
&lt;td&gt;106.30ms&lt;/td&gt;
&lt;td&gt;2.0 GiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;JSON&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;#9/50&lt;/td&gt;
&lt;td&gt;708,960&lt;/td&gt;
&lt;td&gt;4.58ms&lt;/td&gt;
&lt;td&gt;2.9 GiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Compression&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;#2/49&lt;/td&gt;
&lt;td&gt;15,804&lt;/td&gt;
&lt;td&gt;251.28ms&lt;/td&gt;
&lt;td&gt;3.3 GiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Mixed workload&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;#1/47&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;52,274&lt;/td&gt;
&lt;td&gt;72.41ms&lt;/td&gt;
&lt;td&gt;6.1 GiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Noisy neighbor&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;#11/47&lt;/td&gt;
&lt;td&gt;1,939,652&lt;/td&gt;
&lt;td&gt;1.25ms&lt;/td&gt;
&lt;td&gt;2.3 GiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Limited conn&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;#6/51&lt;/td&gt;
&lt;td&gt;1,388,768&lt;/td&gt;
&lt;td&gt;2.85ms&lt;/td&gt;
&lt;td&gt;2.4 GiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Upload (256 conn)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;#42/48&lt;/td&gt;
&lt;td&gt;264&lt;/td&gt;
&lt;td&gt;866.85ms&lt;/td&gt;
&lt;td&gt;10.3 GiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;H2 baseline (256)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;#18/21&lt;/td&gt;
&lt;td&gt;378,032&lt;/td&gt;
&lt;td&gt;72.87ms&lt;/td&gt;
&lt;td&gt;2.2 GiB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Read that again. &lt;strong&gt;#1 in mixed workloads. #2 in compression. But #41 in pipelining and #42 in uploads.&lt;/strong&gt; That's a wild range for a single runtime.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Bun Dominates
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Mixed Workload: The Overall Champion
&lt;/h3&gt;

&lt;p&gt;The mixed workload test is the closest thing to a "real app" benchmark — it combines baseline requests, JSON serialization, compression, static file serving, and database queries all in one stream. And Bun sits at #1.&lt;/p&gt;

&lt;p&gt;Bun beats go-fasthttp, which is usually a throughput monster. And it does it with &lt;strong&gt;6.1 GiB of RAM&lt;/strong&gt; vs go-fasthttp's absurd &lt;strong&gt;80.7 GiB&lt;/strong&gt;. That's over 13x more memory-efficient.&lt;/p&gt;

&lt;p&gt;Three of the top 5 run on the Bun runtime (bun, Elysia, Hono). The Bun ecosystem basically owns this test.&lt;/p&gt;

&lt;p&gt;Why? My theory: Bun's built-in gzip (&lt;code&gt;Bun.gzipSync()&lt;/code&gt;), native JSON handling, and pre-loaded static files all contribute. When you mix these operations together, Bun's "everything is native" approach pays off versus frameworks that rely on separate npm packages for each concern.&lt;/p&gt;

&lt;h3&gt;
  
  
  Compression: Natively Fast Gzip
&lt;/h3&gt;

&lt;p&gt;Bun's &lt;code&gt;Bun.gzipSync()&lt;/code&gt; at compression level 1 keeps it competitive with Rust's salvo and ahead of nearly everything else. Deno edges it out here (probably because Deno's compression pipeline is also quite optimized), but check the memory: Bun uses &lt;strong&gt;3.3 GiB&lt;/strong&gt; vs Deno's &lt;strong&gt;12.8 GiB&lt;/strong&gt;. Nearly 4x more efficient.&lt;/p&gt;

&lt;p&gt;The implementation is elegant too: pre-compute the JSON buffer once at startup, then compress it per request.&lt;/p&gt;

&lt;h3&gt;
  
  
  JSON Serialization: Top 10 Overall
&lt;/h3&gt;

&lt;p&gt;At #9/50 with &lt;strong&gt;708,960 rps&lt;/strong&gt;, Bun is the fastest JS/TS runtime for JSON workloads (tied with Elysia, which itself runs on Bun).&lt;/p&gt;

&lt;p&gt;Bun is &lt;strong&gt;~20% faster than Node&lt;/strong&gt; and &lt;strong&gt;~23% faster than Fastify&lt;/strong&gt; at JSON. Not the 10x improvement some marketing suggests, but a solid, consistent edge.&lt;/p&gt;

&lt;h3&gt;
  
  
  Limited Connections &amp;amp; Noisy Neighbors
&lt;/h3&gt;

&lt;p&gt;Bun handles constrained scenarios well. At limited connections (#6/51, 1,388,768 rps), it beats everything in the JS/TS world by a comfortable margin — Elysia is next at #12. Under noisy neighbor conditions (#11/47, ~1.94M rps), Bun stays stable and leads the JS/TS pack again.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Bun Struggles
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Pipelining: The Big Miss
&lt;/h3&gt;

&lt;p&gt;This is the elephant in the room. &lt;strong&gt;#41 out of 51 frameworks&lt;/strong&gt; in pipelining at 4,096 connections, with only 491,345 rps and 106ms average latency.&lt;/p&gt;

&lt;p&gt;Node.js is nearly &lt;strong&gt;5x faster&lt;/strong&gt; than Bun at pipelining. Even Express — &lt;em&gt;Express!&lt;/em&gt; — only drops to #42. The entire Bun ecosystem (bun, Elysia, Hono) clusters at the bottom.&lt;/p&gt;

&lt;p&gt;What's happening? HTTP pipelining sends multiple requests over a single TCP connection without waiting for responses. Bun's &lt;code&gt;Bun.serve()&lt;/code&gt; likely processes requests one at a time per connection rather than batching pipelined requests. Node's &lt;code&gt;http&lt;/code&gt; module has years of pipelining optimization baked in.&lt;/p&gt;
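&lt;p&gt;To make the mechanism concrete, here's what pipelining means on the wire (an illustrative sketch, not HttpArena's actual load generator; the depth of 16 matches the benchmark's pipelining profile):&lt;/p&gt;

```typescript
// A pipelining client writes several complete requests back to back on one
// connection before reading any response. A pipelining-friendly server
// parses the whole buffer and answers in order; a server that handles one
// request per read serializes them and latency balloons.
function buildPipelinedBatch(path: string, count: number): string {
  const single = "GET " + path + " HTTP/1.1\r\nHost: localhost\r\n\r\n";
  return single.repeat(count); // sent as a single write
}

const batch = buildPipelinedBatch("/", 16);
console.log(batch.split("\r\n\r\n").length - 1); // 16 complete requests
```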

&lt;p&gt;Is this a dealbreaker? Honestly, &lt;strong&gt;not for most real apps&lt;/strong&gt;. HTTP pipelining is rarely used in production (browsers don't even support it over HTTP/1.1 anymore). But if you're building an internal service-to-service API where clients pipeline aggressively, this matters.&lt;/p&gt;

&lt;h3&gt;
  
  
  Uploads: Surprisingly Weak
&lt;/h3&gt;

&lt;p&gt;At 256 concurrent connections uploading data, Bun lands at &lt;strong&gt;#42/48&lt;/strong&gt; with only 264 rps and 867ms latency, while using 10.3 GiB of memory.&lt;/p&gt;

&lt;p&gt;Node is &lt;strong&gt;3.5x faster&lt;/strong&gt; at uploads. Express — the framework everyone loves to call slow — handles uploads 3.6x faster than Bun. This suggests Bun's request body reading has significant overhead for large payloads, or that there's a memory management issue when buffering upload data.&lt;/p&gt;

&lt;h3&gt;
  
  
  HTTP/2: Not There Yet
&lt;/h3&gt;

&lt;p&gt;Bun's H2 support exists but isn't competitive: at 256 connections it manages 378,032 rps, ranking #18 of 21.&lt;/p&gt;

&lt;p&gt;Node.js beats Bun by almost &lt;strong&gt;4x&lt;/strong&gt; in HTTP/2. Even at higher connection counts (1,024), Bun only reaches 558,342 rps. If you're doing H2-heavy work, Node is the better runtime right now.&lt;/p&gt;

&lt;h2&gt;
  
  
  The "Bun Ecosystem" Effect
&lt;/h2&gt;

&lt;p&gt;One of the coolest things the data reveals: frameworks running &lt;strong&gt;on&lt;/strong&gt; Bun tend to perform very similarly to bare Bun. At 4,096 connections:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Framework&lt;/th&gt;
&lt;th&gt;Baseline RPS&lt;/th&gt;
&lt;th&gt;JSON RPS&lt;/th&gt;
&lt;th&gt;Mixed RPS&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;bun (bare)&lt;/td&gt;
&lt;td&gt;1,557,305&lt;/td&gt;
&lt;td&gt;708,960&lt;/td&gt;
&lt;td&gt;52,274&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Elysia&lt;/td&gt;
&lt;td&gt;1,458,341&lt;/td&gt;
&lt;td&gt;722,557&lt;/td&gt;
&lt;td&gt;51,251&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hono (Bun)&lt;/td&gt;
&lt;td&gt;1,242,917&lt;/td&gt;
&lt;td&gt;662,019&lt;/td&gt;
&lt;td&gt;49,378&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The abstraction cost of a framework on top of Bun is remarkably small — maybe 5-20%. Compare that to Node.js, where Express is &lt;strong&gt;76% slower&lt;/strong&gt; than bare Node in baseline. Bun's &lt;code&gt;Bun.serve()&lt;/code&gt; API is apparently so close to what frameworks need that there's minimal overhead in wrapping it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture: What Makes It Tick
&lt;/h2&gt;

&lt;p&gt;Looking at the &lt;a href="https://github.com/MDA2AV/HttpArena/tree/main/frameworks/bun" rel="noopener noreferrer"&gt;HttpArena implementation&lt;/a&gt;, a few things stand out:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multi-process via reusePort&lt;/strong&gt;: The entrypoint script spawns one Bun process per CPU core, each calling &lt;code&gt;Bun.serve()&lt;/code&gt; with &lt;code&gt;reusePort&lt;/code&gt; enabled. The kernel load-balances connections across processes. Simple, effective, no IPC overhead.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Everything pre-loaded&lt;/strong&gt;: Static files, datasets, and the SQLite database are all loaded at startup. The JSON endpoint re-processes the dataset per request (as the benchmark requires), but the raw data is already in memory.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bun.gzipSync() over zlib&lt;/strong&gt;: The compression endpoint uses Bun's native gzip instead of Node's zlib bindings. This is why compression performance is stellar — it's going through Zig's optimized zlib implementation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Minimal dependencies&lt;/strong&gt;: The only external package is a PostgreSQL client. Everything else — HTTP serving, SQLite, gzip, file reading — uses Bun built-ins. Fewer layers, fewer places for overhead to hide.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who Should Use Bun?
&lt;/h2&gt;

&lt;p&gt;Based on these numbers, Bun is a strong choice if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Your app does a mix of things&lt;/strong&gt; (JSON, compression, static files, DB queries) — Bun literally wins this category&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You want good JSON throughput&lt;/strong&gt; without leaving the JS/TS ecosystem&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You care about memory efficiency&lt;/strong&gt; — Bun consistently uses less RAM than Node/Deno at similar throughput&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You want minimal deps&lt;/strong&gt; — built-in SQLite, gzip, and HTTP server reduce your node_modules&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Think twice if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;You need HTTP/2 performance&lt;/strong&gt; — Node is 4x faster today&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You handle lots of uploads&lt;/strong&gt; — Node/Express handle it much better&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You're building pipelining-heavy internal services&lt;/strong&gt; — unlikely, but if so, Node or Deno serve this better&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;Bun isn't uniformly faster than everything — no runtime is. But it has a genuinely impressive &lt;strong&gt;performance shape&lt;/strong&gt;: it excels at the things most web apps actually do (mixed workloads, JSON, compression) while being memory-efficient. Its weaknesses (pipelining, uploads, H2) are in areas that matter less for typical web services.&lt;/p&gt;

&lt;p&gt;The real story isn't "Bun fast, Node slow." It's that Bun makes different tradeoffs. JavaScriptCore over V8. Native built-ins over npm packages. Simple multi-process over clustering. And for a lot of real-world use cases, those tradeoffs pay off beautifully.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;All data from &lt;a href="https://mda2av.github.io/HttpArena/" rel="noopener noreferrer"&gt;HttpArena&lt;/a&gt; (&lt;a href="https://github.com/MDA2AV/HttpArena" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;). Tests run on identical hardware with standardized configurations. Check the repo for methodology and raw data.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Previous deep dives: &lt;a href="https://dev.to/fbio_reis_355b87b508598e/actix-web-1-in-15-out-of-22-tests-dissecting-the-benchmark-king-httparena-deep-dive-114g"&gt;Actix-web&lt;/a&gt; | &lt;a href="https://dev.to/fbio_reis_355b87b508598e/go-fasthttp-the-go-framework-that-dominates-mixed-workloads-httparena-deep-dive-23mh"&gt;go-fasthttp&lt;/a&gt; | &lt;a href="https://dev.to/fbio_reis_355b87b508598e/drogon-the-c-framework-that-tops-http2-benchmarks-and-where-it-struggles-3d20"&gt;Drogon&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>performance</category>
      <category>benchmarks</category>
      <category>typescript</category>
    </item>
    <item>
      <title>Actix-web: #1 in 15 Out of 22 Tests — Dissecting the Benchmark King (HttpArena Deep Dive)</title>
      <dc:creator>Benny</dc:creator>
      <pubDate>Wed, 25 Mar 2026 14:22:15 +0000</pubDate>
      <link>https://dev.to/fbio_reis_355b87b508598e/actix-web-1-in-15-out-of-22-tests-dissecting-the-benchmark-king-httparena-deep-dive-114g</link>
      <guid>https://dev.to/fbio_reis_355b87b508598e/actix-web-1-in-15-out-of-22-tests-dissecting-the-benchmark-king-httparena-deep-dive-114g</guid>
      <description>&lt;p&gt;There's a framework that keeps showing up at the top of benchmark charts, and it's not written in C.&lt;/p&gt;

&lt;p&gt;Actix-web, Rust's battle-tested async web framework, just put up numbers in &lt;a href="https://mda2av.github.io/HttpArena/" rel="noopener noreferrer"&gt;HttpArena&lt;/a&gt; that are genuinely hard to argue with. We're talking &lt;strong&gt;#1 overall in 15 out of the 22 test profiles it competed in&lt;/strong&gt;, across 47 frameworks. Not #1 among Rust frameworks. #1 overall.&lt;/p&gt;

&lt;p&gt;Let's dig into what's going on.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Actix-web?
&lt;/h2&gt;

&lt;p&gt;Actix-web is a Rust web framework built on top of the Tokio async runtime. It's been around since ~2017, making it one of the more mature options in the Rust ecosystem. Version 4 (the one tested here) dropped the actor model dependency that gave it its name — now it's just a really fast, really ergonomic async web framework.&lt;/p&gt;

&lt;p&gt;It uses rustls for TLS (no OpenSSL dependency), compiles with thin LTO and &lt;code&gt;-O3&lt;/code&gt;, and targets native CPU instructions. The HttpArena implementation runs one worker per CPU core with a backlog of 4096.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Headline Numbers
&lt;/h2&gt;

&lt;p&gt;Let's start with where actix absolutely dominates.&lt;/p&gt;

&lt;h3&gt;
  
  
  Baseline (Plain HTTP/1.1)
&lt;/h3&gt;

&lt;p&gt;At 4,096 connections, actix hits &lt;strong&gt;2.61M requests/sec&lt;/strong&gt; with 1.57ms average latency and only 158MB of memory. For context, that puts it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;#6 overall&lt;/strong&gt; out of 47 frameworks (behind ringzero, h2o, nginx, blitz, and hyper)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;#2 among Rust frameworks&lt;/strong&gt; (hyper edges it out at 2.76M rps)&lt;/li&gt;
&lt;li&gt;Ahead of bun (1.56M), drogon (1.69M), and every Go framework&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But here's the thing — at 512 connections, actix still posts &lt;strong&gt;2.49M rps&lt;/strong&gt; with a tiny 205μs average latency and 93MB RAM. The consistency across connection counts is impressive.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pipelined Requests — Where Actix Gets Scary
&lt;/h3&gt;

&lt;p&gt;This is where things get wild. With HTTP pipelining (16 requests per connection), actix hits:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Connections&lt;/th&gt;
&lt;th&gt;Requests/sec&lt;/th&gt;
&lt;th&gt;Latency&lt;/th&gt;
&lt;th&gt;Memory&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;512&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;20.4M&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;400μs&lt;/td&gt;
&lt;td&gt;123MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4,096&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;23.0M&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2.84ms&lt;/td&gt;
&lt;td&gt;220MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16,384&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;21.4M&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;10.7ms&lt;/td&gt;
&lt;td&gt;689MB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That's &lt;strong&gt;23 million requests per second&lt;/strong&gt; at peak. #3 overall, behind only ringzero (46.8M, written in C) and blitz (39.5M, written in Zig). Actix beats hyper (16.3M), go-fasthttp (17.8M), and the entire JVM ecosystem in this test.&lt;/p&gt;

&lt;p&gt;For a framework that gives you routing, middleware, and a full request/response abstraction — doing 23M rps in pipelining is absurd.&lt;/p&gt;

&lt;h3&gt;
  
  
  JSON Serialization — The Practical Test
&lt;/h3&gt;

&lt;p&gt;The JSON test serializes a dataset, computes derived fields, and sends it back. This is closer to what a real API does.&lt;/p&gt;

&lt;p&gt;At 4,096 connections: &lt;strong&gt;1.13M rps&lt;/strong&gt;, pushing &lt;strong&gt;8.92 GB/s of bandwidth&lt;/strong&gt;. That's #3 overall, right behind hyper (1.17M) and nginx (1.14M). Actix is neck-and-neck with its own underlying HTTP library here.&lt;/p&gt;

&lt;p&gt;Interesting detail: actix uses &lt;code&gt;serde_json&lt;/code&gt; (Rust's standard JSON library) — no exotic SIMD JSON tricks. And it still hangs with nginx, which uses a highly optimized C JSON implementation. Rust's zero-cost abstractions are doing real work here.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mixed Workload — The Real World Simulation
&lt;/h3&gt;

&lt;p&gt;The mixed test combines baseline requests, JSON serialization, database queries, file uploads, and compression — all hitting the server simultaneously. This is the closest thing to a production workload in HttpArena.&lt;/p&gt;

&lt;p&gt;At 4,096 connections:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;#2 overall&lt;/strong&gt;: 75,948 rps (52ms avg latency, 2.1GB RAM)&lt;/li&gt;
&lt;li&gt;Behind only go-fasthttp at 87,964 rps (but fasthttp uses 10.2GB RAM — &lt;strong&gt;5x more memory&lt;/strong&gt;)&lt;/li&gt;
&lt;li&gt;Ahead of salvo (73.5K), bun (70.7K), and ultimate-express (63K)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At 16,384 connections, actix takes &lt;strong&gt;#1&lt;/strong&gt;: 157,549 rps. Go-fasthttp can't keep up at this connection count.&lt;/p&gt;

&lt;p&gt;The memory efficiency here is the real story. Actix handles a brutal mixed workload with 2.1GB while go-fasthttp needs 10.2GB and bun needs 5.4GB.&lt;/p&gt;

&lt;h3&gt;
  
  
  HTTP/2 Baseline
&lt;/h3&gt;

&lt;p&gt;Actix uses rustls for HTTP/2. At 256 connections: &lt;strong&gt;3.05M rps&lt;/strong&gt;, ranking #8 out of 21 HTTP/2-capable frameworks. h2o (C) leads at 14.1M, and hyper takes #2 at 8.15M.&lt;/p&gt;

&lt;p&gt;This is one of actix's weaker areas relatively speaking — the rustls + actix HTTP/2 implementation doesn't match h2o's purpose-built HTTP/2 stack. But 3M rps for HTTP/2 is still excellent in absolute terms.&lt;/p&gt;

&lt;h3&gt;
  
  
  Noisy Neighbor — Handling Bad Traffic
&lt;/h3&gt;

&lt;p&gt;The noisy test throws malformed requests, connection resets, and garbage traffic at the server alongside legitimate requests. It's a resilience test.&lt;/p&gt;

&lt;p&gt;Actix handles it beautifully: &lt;strong&gt;2.43M rps&lt;/strong&gt; at 4,096 connections (#5 overall), correctly returning 4xx for bad requests while maintaining throughput. Only the C trio (ringzero, h2o, nginx) and hyper beat it.&lt;/p&gt;

&lt;p&gt;Zero 5xx errors. Zero crashes. That's Rust's memory safety paying dividends — no segfaults from malformed input, no buffer overflows from garbage data.&lt;/p&gt;

&lt;h3&gt;
  
  
  Limited Connections — Efficiency Under Constraints
&lt;/h3&gt;

&lt;p&gt;With connection reuse disabled (every request opens a new TCP connection), actix hits &lt;strong&gt;1.07M rps&lt;/strong&gt; at 512 connections, ranking #8 overall. The connection setup overhead is real, but actix handles it gracefully with only 128MB of memory.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Actix Struggles
&lt;/h2&gt;

&lt;p&gt;No framework is perfect, and actix has clear weak spots.&lt;/p&gt;

&lt;h3&gt;
  
  
  Compression — Room to Grow
&lt;/h3&gt;

&lt;p&gt;At 4,096 connections: 14,220 rps, &lt;strong&gt;#8 overall&lt;/strong&gt;. Not bad, but blitz (89K rps) is 6x faster, and even deno (17.7K) and bun (15.8K) outpace it.&lt;/p&gt;

&lt;p&gt;The culprit is likely the compression middleware implementation. Actix uses &lt;code&gt;flate2&lt;/code&gt; through its &lt;code&gt;compress-gzip&lt;/code&gt; feature — solid but not cutting-edge. The 5.7GB memory usage at 4K connections also suggests the compression pipeline could be more efficient.&lt;/p&gt;

&lt;h3&gt;
  
  
  Uploads — The Weak Spot
&lt;/h3&gt;

&lt;p&gt;File uploads reveal actix's biggest weakness. At 256 connections: &lt;strong&gt;616 rps, #15 overall&lt;/strong&gt;. At 512 connections: 559 rps, #16. Spring JVM leads at 1,265 rps — more than double.&lt;/p&gt;

&lt;p&gt;The upload handler in the HttpArena implementation is simple (&lt;code&gt;web::Bytes&lt;/code&gt; → count length → respond), so this isn't a code issue. Actix's body parsing pipeline likely has overhead for large payloads that frameworks like Spring and nginx handle more efficiently.&lt;/p&gt;

&lt;h3&gt;
  
  
  H2 Static Files at Scale
&lt;/h3&gt;

&lt;p&gt;At 1,024 HTTP/2 connections serving static files: &lt;strong&gt;946K rps, #6&lt;/strong&gt;. Nginx (1.80M) and hyper (1.66M) are significantly faster. At lower connection counts actix does better (#2 at 64 connections with 1.35M rps), but it doesn't scale as well under HTTP/2 pressure.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Rust Showdown
&lt;/h2&gt;

&lt;p&gt;How does actix stack up against its Rust siblings?&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Test&lt;/th&gt;
&lt;th&gt;hyper&lt;/th&gt;
&lt;th&gt;actix&lt;/th&gt;
&lt;th&gt;salvo&lt;/th&gt;
&lt;th&gt;rocket&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Baseline 4K&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2.76M&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2.61M&lt;/td&gt;
&lt;td&gt;1.26M&lt;/td&gt;
&lt;td&gt;86K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pipelined 4K&lt;/td&gt;
&lt;td&gt;16.3M&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;23.0M&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3.3M&lt;/td&gt;
&lt;td&gt;176K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;JSON 4K&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1.17M&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1.13M&lt;/td&gt;
&lt;td&gt;781K&lt;/td&gt;
&lt;td&gt;44K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mixed 4K&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;75.9K&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;73.5K&lt;/td&gt;
&lt;td&gt;34.7K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Compression 4K&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;14.2K&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;15.3K&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;10.1K&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The pattern is clear:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;hyper&lt;/strong&gt; wins raw throughput (it's the HTTP library actix is compared against, not built on — actix has its own HTTP implementation)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;actix&lt;/strong&gt; wins pipelining and mixed workloads by a huge margin&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;salvo&lt;/strong&gt; is competitive in practical tests and wins compression&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;rocket&lt;/strong&gt; is... in a different league (trading performance for developer ergonomics)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Actix vs hyper is the most interesting comparison. Hyper is a lower-level HTTP library — less abstraction, less overhead. The fact that actix, with its routing, middleware stack, and request extraction pipeline, comes within 5-10% of hyper in most tests is remarkable. And in pipelining, actix actually crushes hyper by 41%.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reading the Implementation
&lt;/h2&gt;

&lt;p&gt;Looking at the &lt;a href="https://github.com/MDA2AV/HttpArena/tree/main/frameworks/actix" rel="noopener noreferrer"&gt;actual HttpArena implementation&lt;/a&gt;, a few things stand out:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Smart caching&lt;/strong&gt;: The JSON large dataset is pre-serialized at startup (&lt;code&gt;build_json_cache&lt;/code&gt;) and served as raw bytes for the compression test. No re-serialization per request.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Per-worker database connections&lt;/strong&gt;: Each actix worker gets its own SQLite connection with &lt;code&gt;PRAGMA mmap_size=268435456&lt;/code&gt; (256MB memory-mapped I/O). No connection pooling overhead, no cross-thread synchronization on DB access.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Static header values&lt;/strong&gt;: The &lt;code&gt;SERVER&lt;/code&gt; header is a &lt;code&gt;static HeaderValue&lt;/code&gt; — allocated once, cloned cheaply. Small thing, but at 23M rps, small things matter.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Compile-time optimization&lt;/strong&gt;: &lt;code&gt;codegen-units = 1&lt;/code&gt; + thin LTO + &lt;code&gt;target-cpu=native&lt;/code&gt; + panic=abort. This squeezes every last drop out of the compiler. The Dockerfile even uses &lt;code&gt;RUSTFLAGS="-C target-cpu=native"&lt;/code&gt; for native SIMD instructions.&lt;/p&gt;
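&lt;p&gt;If you want to replicate that setup, it corresponds to a Cargo release profile roughly like this (a sketch based on the flags listed above, not a copy of the repo's actual file):&lt;/p&gt;

```toml
# Illustrative release profile; the exact HttpArena settings live in the repo.
[profile.release]
codegen-units = 1   # one codegen unit lets LLVM optimize across the whole crate
lto = "thin"        # thin link-time optimization
panic = "abort"     # skip the unwinding machinery entirely
# target-cpu=native is passed via RUSTFLAGS in the Dockerfile, not in this file
```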

&lt;p&gt;&lt;strong&gt;Middleware approach&lt;/strong&gt;: Compression uses &lt;code&gt;actix_web::middleware::Compress::default()&lt;/code&gt; — it's applied globally, so the compression endpoint benefits from the framework's built-in gzip handling rather than manual compression.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who Should Use Actix-web?
&lt;/h2&gt;

&lt;p&gt;If you're building a Rust web service and care about performance, actix-web is the obvious choice. The numbers speak for themselves, but more importantly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;It's mature&lt;/strong&gt;: Version 4 has been stable for years. The ecosystem (middleware, extractors, websockets) is deep.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It's ergonomic&lt;/strong&gt;: Compared to hyper (which requires you to handle everything manually), actix gives you routing, middleware, typed extractors, and a clean API.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory efficiency&lt;/strong&gt;: Consistently low memory usage across all tests. Where go-fasthttp needs 10 GB for a mixed workload, actix does the same in 2 GB.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Battle-tested&lt;/strong&gt;: Powers production systems at scale. Microsoft, for example, uses actix-web internally.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The main trade-off is Rust itself — compile times, borrow checker learning curve, and a smaller hiring pool. But if you've already committed to Rust, actix-web should be your default choice for web APIs.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;Actix-web is the most complete high-performance web framework in the HttpArena benchmark. It doesn't always take #1 (the C frameworks and hyper beat it in raw throughput), but no other framework combines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Top-6 baseline performance&lt;/li&gt;
&lt;li&gt;Top-3 pipelined throughput (23M rps!)&lt;/li&gt;
&lt;li&gt;Top-3 JSON serialization&lt;/li&gt;
&lt;li&gt;Top-2 mixed workload handling&lt;/li&gt;
&lt;li&gt;Excellent memory efficiency&lt;/li&gt;
&lt;li&gt;Full framework features (routing, middleware, extractors)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The only frameworks that consistently outperform it are either bare HTTP libraries (hyper, h2o) or purpose-built C/Zig systems (ringzero, blitz, nginx) that sacrifice developer ergonomics for raw speed.&lt;/p&gt;

&lt;p&gt;For a framework that gives you &lt;code&gt;#[get("/api/items")]&lt;/code&gt; syntax and middleware stacks, doing 23 million pipelined requests per second is not normal. Actix makes it look easy.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;All benchmarks from &lt;a href="https://mda2av.github.io/HttpArena/" rel="noopener noreferrer"&gt;HttpArena&lt;/a&gt; — an open-source HTTP framework benchmark suite. Full results, methodology, and source code on &lt;a href="https://github.com/MDA2AV/HttpArena" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Got questions about the data or want to see another framework deep dive? Drop a comment!&lt;/em&gt;&lt;/p&gt;

</description>
      <category>rust</category>
      <category>webdev</category>
      <category>performance</category>
      <category>benchmarks</category>
    </item>
    <item>
      <title>go-fasthttp: The Go Framework That Dominates Mixed Workloads (HttpArena Deep Dive)</title>
      <dc:creator>Benny</dc:creator>
      <pubDate>Fri, 20 Mar 2026 17:18:40 +0000</pubDate>
      <link>https://dev.to/fbio_reis_355b87b508598e/go-fasthttp-the-go-framework-that-dominates-mixed-workloads-httparena-deep-dive-23mh</link>
      <guid>https://dev.to/fbio_reis_355b87b508598e/go-fasthttp-the-go-framework-that-dominates-mixed-workloads-httparena-deep-dive-23mh</guid>
      <description>&lt;p&gt;If you've spent any time looking at Go HTTP performance, you've probably heard of &lt;a href="https://github.com/valyala/fasthttp" rel="noopener noreferrer"&gt;fasthttp&lt;/a&gt;. It's been around for years, and its pitch is simple: it's &lt;em&gt;way&lt;/em&gt; faster than net/http. But how does it actually stack up against everything else — not just Go frameworks, but Rust, C, Zig, and the whole zoo?&lt;/p&gt;

&lt;p&gt;I ran go-fasthttp through &lt;a href="https://mda2av.github.io/HttpArena/" rel="noopener noreferrer"&gt;HttpArena&lt;/a&gt;, an open benchmark suite that tests frameworks across a bunch of realistic scenarios. Here's what I found.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is fasthttp?
&lt;/h2&gt;

&lt;p&gt;fasthttp is a high-performance HTTP library for Go built by &lt;a href="https://github.com/valyala" rel="noopener noreferrer"&gt;Aliaksandr Valialkin&lt;/a&gt;. Unlike Go's standard &lt;code&gt;net/http&lt;/code&gt;, it avoids allocations wherever possible. Instead of creating a new request/response object per request, it pools and reuses them. It's basically Go's answer to "what if we cared about garbage collection pressure?"&lt;/p&gt;

&lt;p&gt;The HttpArena implementation uses &lt;code&gt;reuseport&lt;/code&gt; to spin up one listener per CPU core, which is a neat trick — each goroutine gets its own socket listener, reducing lock contention.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Numbers
&lt;/h2&gt;

&lt;p&gt;Let's get into it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Baseline (Plain Text Responses)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Connections&lt;/th&gt;
&lt;th&gt;RPS&lt;/th&gt;
&lt;th&gt;Avg Latency&lt;/th&gt;
&lt;th&gt;Rank&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;512&lt;/td&gt;
&lt;td&gt;1,337,651&lt;/td&gt;
&lt;td&gt;382μs&lt;/td&gt;
&lt;td&gt;#14/30&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4,096&lt;/td&gt;
&lt;td&gt;1,478,446&lt;/td&gt;
&lt;td&gt;2.76ms&lt;/td&gt;
&lt;td&gt;#13/30&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16,384&lt;/td&gt;
&lt;td&gt;1,310,081&lt;/td&gt;
&lt;td&gt;10.63ms&lt;/td&gt;
&lt;td&gt;#13/30&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Middle of the pack. Honestly, for baseline text responses, 1.3-1.5M RPS is solid — but ringzero (C) is hitting 3.4M at the top. The Rust and C frameworks dominate here. For plain "return a string" work, fasthttp lands in the top half but doesn't lead.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Compared to other Go frameworks:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;go-fasthttp: &lt;strong&gt;1,478,446 rps&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;caddy: 582,248 rps&lt;/li&gt;
&lt;li&gt;echo: 456,383 rps&lt;/li&gt;
&lt;li&gt;gin: 446,160 rps&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's about &lt;strong&gt;3x faster&lt;/strong&gt; than standard Go HTTP frameworks. That's the fasthttp promise delivering.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pipelined Requests — Where fasthttp Shines
&lt;/h3&gt;

&lt;p&gt;This is where things get interesting.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Connections&lt;/th&gt;
&lt;th&gt;RPS&lt;/th&gt;
&lt;th&gt;Rank&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;512&lt;/td&gt;
&lt;td&gt;16,786,953&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;#4/30&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4,096&lt;/td&gt;
&lt;td&gt;17,808,031&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;#4/30&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16,384&lt;/td&gt;
&lt;td&gt;16,403,972&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;#4/30&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;17.8 million requests per second.&lt;/strong&gt; That's not a typo. fasthttp handles HTTP pipelining exceptionally well. It sits right behind ringzero (C, 46.8M), blitz (Zig, 39.5M), and actix (Rust, 23M).&lt;/p&gt;

&lt;p&gt;Top 5 at 4,096 connections:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;ringzero (C) — 46,803,504 rps&lt;/li&gt;
&lt;li&gt;blitz (Zig) — 39,534,054 rps&lt;/li&gt;
&lt;li&gt;actix (Rust) — 23,001,200 rps&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;go-fasthttp (Go) — 17,808,031 rps&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;hyper (Rust) — 16,273,142 rps&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Being #4 overall and beating hyper (Rust!) at pipelining is impressive. The zero-allocation design really pays off when you're hammering the same connection with sequential requests. Meanwhile, gin and echo are down at ~1M rps — a &lt;strong&gt;17x&lt;/strong&gt; gap.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mixed Workload — The Crown Jewel
&lt;/h3&gt;

&lt;p&gt;Here's the headline result:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Connections&lt;/th&gt;
&lt;th&gt;RPS&lt;/th&gt;
&lt;th&gt;Rank&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;4,096&lt;/td&gt;
&lt;td&gt;87,964&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;#1/27&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16,384&lt;/td&gt;
&lt;td&gt;164,178&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;#1/27&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;go-fasthttp wins the mixed workload test.&lt;/strong&gt; Both connection levels. Beating actix, beating nginx, beating everything.&lt;/p&gt;

&lt;p&gt;Top 5 at 16,384 connections:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;go-fasthttp (Go) — 164,178 rps&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;actix (Rust) — 157,549 rps&lt;/li&gt;
&lt;li&gt;salvo (Rust) — 67,520 rps&lt;/li&gt;
&lt;li&gt;bun (TS) — 64,614 rps&lt;/li&gt;
&lt;li&gt;node (JS) — 55,988 rps&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The mixed workload combines baseline requests, JSON processing, compression, uploads, and database queries into a single test. It's the closest thing to a "real-world" scenario in the suite. And fasthttp tops it — barely edging out actix at 16K connections, and more convincingly at 4K (87,964 vs 75,948 rps).&lt;/p&gt;

&lt;p&gt;This tells us something important: fasthttp's architecture handles diverse workloads better than almost anything else. It's not just fast at one thing.&lt;/p&gt;

&lt;h3&gt;
  
  
  JSON Processing
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Connections&lt;/th&gt;
&lt;th&gt;RPS&lt;/th&gt;
&lt;th&gt;Rank&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;4,096&lt;/td&gt;
&lt;td&gt;311,030&lt;/td&gt;
&lt;td&gt;#18/29&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16,384&lt;/td&gt;
&lt;td&gt;265,535&lt;/td&gt;
&lt;td&gt;#22/29&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;32,768&lt;/td&gt;
&lt;td&gt;149,159&lt;/td&gt;
&lt;td&gt;#10/14&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This is the weak spot. Bottom half in JSON serialization. Go's &lt;code&gt;encoding/json&lt;/code&gt; is famously not great, and it shows here. nginx leads at 1.18M rps (it's serving pre-computed static JSON), and actix with serde is at ~1M.&lt;/p&gt;

&lt;p&gt;For the Go comparison: fasthttp still beats caddy (308K), gin (169K), and echo (158K) — but only by a modest margin on JSON. The serialization bottleneck is the equalizer.&lt;/p&gt;

&lt;h3&gt;
  
  
  Compression
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Connections&lt;/th&gt;
&lt;th&gt;RPS&lt;/th&gt;
&lt;th&gt;Rank&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;4,096&lt;/td&gt;
&lt;td&gt;14,771&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;#5/28&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16,384&lt;/td&gt;
&lt;td&gt;13,736&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;#5/28&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Solid #5 finish. The implementation uses &lt;code&gt;compress/flate&lt;/code&gt; with &lt;code&gt;BestSpeed&lt;/code&gt; level — trading compression ratio for throughput. blitz (Zig) leads at 89K rps, then actix, salvo, and h2o. But fasthttp holds its own at nearly 15K rps, which is almost 2x the caddy result (8,147 rps).&lt;/p&gt;

&lt;h3&gt;
  
  
  Upload Handling
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Connections&lt;/th&gt;
&lt;th&gt;RPS&lt;/th&gt;
&lt;th&gt;Rank&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;256&lt;/td&gt;
&lt;td&gt;910&lt;/td&gt;
&lt;td&gt;#6/29&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;512&lt;/td&gt;
&lt;td&gt;842&lt;/td&gt;
&lt;td&gt;mid-table&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Mid-table for uploads. spring-jvm actually wins this category at 1,294 rps — the JVM's buffered I/O handling pays off for large body ingestion.&lt;/p&gt;

&lt;h3&gt;
  
  
  Limited Connections — The Achilles' Heel
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Connections&lt;/th&gt;
&lt;th&gt;RPS&lt;/th&gt;
&lt;th&gt;Rank&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;512&lt;/td&gt;
&lt;td&gt;147,847&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;#26/30&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4,096&lt;/td&gt;
&lt;td&gt;636,185&lt;/td&gt;
&lt;td&gt;#13/30&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Oof. At 512 connections, fasthttp drops to #26/30. h2o leads at 1.6M rps, and most frameworks do significantly better. This is surprising — you'd expect a fast framework to shine with fewer connections too.&lt;/p&gt;

&lt;p&gt;The likely culprit? fasthttp's architecture is optimized for high concurrency. The &lt;code&gt;reuseport&lt;/code&gt; multi-listener design and goroutine-per-connection model have overhead that doesn't amortize well with limited connections. At 4,096 connections it recovers to #13, which matches baseline performance. This framework wants a lot of concurrent work to hit its stride.&lt;/p&gt;




&lt;h2&gt;
  
  
  Reading the Source Code
&lt;/h2&gt;

&lt;p&gt;The HttpArena implementation reveals some interesting architectural choices:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SO_REUSEPORT with per-CPU listeners:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;numCPU&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;ln&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;reuseport&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Listen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"tcp4"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;":8080"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;fasthttp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Server&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;Handler&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Serve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ln&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}()&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each CPU core gets its own listener socket. The kernel distributes incoming connections across them. This avoids the thundering herd problem and reduces lock contention — a big deal at high concurrency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Manual path-based routing:&lt;/strong&gt; No router library. Just a switch on &lt;code&gt;ctx.Path()&lt;/code&gt;. Zero overhead from regex matching or tree traversal. For a benchmark this makes sense, but it also shows the performance ceiling when you strip away routing abstractions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pre-computed large JSON responses:&lt;/strong&gt; The compression endpoint pre-marshals the JSON during startup and serves the cached bytes. Smart — you avoid re-serializing on every request.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Memory usage:&lt;/strong&gt; 188 MiB for baseline, 716 MiB for JSON, but &lt;strong&gt;10.2 GiB for mixed workloads&lt;/strong&gt;. The garbage collector is working hard under mixed load. That's the Go trade-off — the runtime manages memory for you, but at scale it adds up.&lt;/p&gt;




&lt;h2&gt;
  
  
  Who Should Use fasthttp?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Great fit if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You're building Go services that need raw throughput&lt;/li&gt;
&lt;li&gt;Your workload involves lots of concurrent connections&lt;/li&gt;
&lt;li&gt;You need pipelining support (API gateways, proxies)&lt;/li&gt;
&lt;li&gt;You want to stay in the Go ecosystem but need better performance than net/http&lt;/li&gt;
&lt;li&gt;Mixed workloads (the typical real-world pattern)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Maybe look elsewhere if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You have few connections and need maximum per-connection throughput&lt;/li&gt;
&lt;li&gt;JSON serialization is your bottleneck (consider Rust frameworks or use a faster JSON lib like sonic/jsoniter)&lt;/li&gt;
&lt;li&gt;You need HTTP/2 or HTTP/3 (fasthttp is HTTP/1.1 only)&lt;/li&gt;
&lt;li&gt;You want a batteries-included framework with routing, middleware, etc. (that's echo/gin territory)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Verdict
&lt;/h2&gt;

&lt;p&gt;go-fasthttp is a fascinating framework. It's not the fastest at any single micro-benchmark (C and Zig own that space), but it's &lt;strong&gt;the most versatile performer&lt;/strong&gt; in the entire HttpArena suite. Winning the mixed workload test — beating actix, nginx, and h2o — is a big deal because that's the test that most resembles real production traffic.&lt;/p&gt;

&lt;p&gt;The pipelining numbers (17.8M rps, #4 overall) show its architecture is fundamentally sound. The limited-connection weakness is real but only matters in specific scenarios. And being 3-17x faster than standard Go frameworks (gin, echo, caddy) makes it the clear choice for performance-sensitive Go work.&lt;/p&gt;

&lt;p&gt;Just watch out for that &lt;code&gt;encoding/json&lt;/code&gt; bottleneck. Pair fasthttp with a faster serializer and you might climb even higher.&lt;/p&gt;

&lt;p&gt;All data from &lt;a href="https://mda2av.github.io/HttpArena/" rel="noopener noreferrer"&gt;HttpArena&lt;/a&gt; (&lt;a href="https://github.com/MDA2AV/HttpArena" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;). Check the full results if you want to compare frameworks yourself.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;What framework should I deep-dive next? Drop a comment!&lt;/em&gt;&lt;/p&gt;

</description>
      <category>go</category>
      <category>webdev</category>
      <category>performance</category>
      <category>benchmarks</category>
    </item>
    <item>
      <title>Drogon: The C++ Framework That Tops HTTP/2 Benchmarks (And Where It Struggles)</title>
      <dc:creator>Benny</dc:creator>
      <pubDate>Tue, 17 Mar 2026 14:26:37 +0000</pubDate>
      <link>https://dev.to/fbio_reis_355b87b508598e/drogon-the-c-framework-that-tops-http2-benchmarks-and-where-it-struggles-3d20</link>
      <guid>https://dev.to/fbio_reis_355b87b508598e/drogon-the-c-framework-that-tops-http2-benchmarks-and-where-it-struggles-3d20</guid>
      <description>&lt;p&gt;I've been digging through &lt;a href="https://mda2av.github.io/HttpArena/" rel="noopener noreferrer"&gt;HttpArena&lt;/a&gt; benchmark data lately — it's an open benchmark suite that tests HTTP frameworks across a bunch of realistic scenarios — and Drogon caught my eye. It's quietly one of the most interesting performers in the entire dataset.&lt;/p&gt;

&lt;p&gt;Let me walk you through what I found.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Drogon?
&lt;/h2&gt;

&lt;p&gt;Drogon is a C++ web framework built on top of its own async networking library (Trantor). It's been around since 2018, and it's designed for high-performance HTTP services. Think of it as what you'd reach for if you need raw C++ speed but don't want to hand-roll everything from scratch.&lt;/p&gt;

&lt;p&gt;The HttpArena implementation uses Drogon v1.9.10, compiled with &lt;code&gt;-O3 -flto&lt;/code&gt; (link-time optimization), running on Ubuntu 24.04. C++17, CMake build, nothing exotic.&lt;/p&gt;
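&lt;p&gt;For reference, a minimal CMake setup with those flags might look like this (illustrative only; the repo's actual build files and Dockerfile are the source of truth):&lt;/p&gt;

```cmake
# Sketch of the build settings described above, not the HttpArena file itself.
cmake_minimum_required(VERSION 3.16)
project(drogon_bench CXX)
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_FLAGS_RELEASE "-O3")
set(CMAKE_INTERPROCEDURAL_OPTIMIZATION ON)  # enables link-time optimization (-flto)
```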

&lt;h2&gt;
  
  
  The Numbers
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Baseline (Plain Text Response)
&lt;/h3&gt;

&lt;p&gt;In the standard baseline test, Drogon lands &lt;strong&gt;#7 out of 30 frameworks&lt;/strong&gt; across all connection levels:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Connections&lt;/th&gt;
&lt;th&gt;RPS&lt;/th&gt;
&lt;th&gt;Avg Latency&lt;/th&gt;
&lt;th&gt;P99 Latency&lt;/th&gt;
&lt;th&gt;Memory&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;512&lt;/td&gt;
&lt;td&gt;1,928,561&lt;/td&gt;
&lt;td&gt;264μs&lt;/td&gt;
&lt;td&gt;1.61ms&lt;/td&gt;
&lt;td&gt;81.7 MiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4,096&lt;/td&gt;
&lt;td&gt;2,249,513&lt;/td&gt;
&lt;td&gt;1.82ms&lt;/td&gt;
&lt;td&gt;9.67ms&lt;/td&gt;
&lt;td&gt;129.4 MiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16,384&lt;/td&gt;
&lt;td&gt;2,087,751&lt;/td&gt;
&lt;td&gt;7.57ms&lt;/td&gt;
&lt;td&gt;42.80ms&lt;/td&gt;
&lt;td&gt;314.6 MiB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For context, the top 10 at 4,096 connections looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;ringzero (C) — 3,452,370&lt;/li&gt;
&lt;li&gt;h2o (C) — 3,162,875&lt;/li&gt;
&lt;li&gt;blitz (Zig) — 3,071,375&lt;/li&gt;
&lt;li&gt;nginx (C) — 3,028,812&lt;/li&gt;
&lt;li&gt;hyper (Rust) — 2,942,685&lt;/li&gt;
&lt;li&gt;actix (Rust) — 2,711,945&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;drogon (C++) — 2,249,513&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;kemal (Crystal) — 2,154,014&lt;/li&gt;
&lt;li&gt;quarkus-jvm (Java) — 2,102,344&lt;/li&gt;
&lt;li&gt;bun (TS) — 1,956,298&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Solid company. Drogon is the only C++ framework in the benchmark and it's hanging with the Rust and C heavyweights.&lt;/p&gt;

&lt;h3&gt;
  
  
  HTTP/2 — Where Drogon Shines ✨
&lt;/h3&gt;

&lt;p&gt;Here's where things get really interesting. In the HTTP/2 baseline tests:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Connections&lt;/th&gt;
&lt;th&gt;RPS&lt;/th&gt;
&lt;th&gt;Rank&lt;/th&gt;
&lt;th&gt;Memory&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;64&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;10,631,440&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;#1/14&lt;/strong&gt; 🏆&lt;/td&gt;
&lt;td&gt;98.6 MiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;256&lt;/td&gt;
&lt;td&gt;6,725,340&lt;/td&gt;
&lt;td&gt;#3/16&lt;/td&gt;
&lt;td&gt;155.7 MiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1,024&lt;/td&gt;
&lt;td&gt;6,859,540&lt;/td&gt;
&lt;td&gt;#3/16&lt;/td&gt;
&lt;td&gt;357.0 MiB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Drogon takes first place in the HTTP/2 baseline with 64 connections&lt;/strong&gt;, pushing over 10.6 million requests per second. At that concurrency level, it beats hyper (6.88M), h2o (which dominates at higher concurrency), and everything else. It's also serving at 1.48 GB/s bandwidth while using under 100 MiB of memory.&lt;/p&gt;

&lt;p&gt;Even at higher concurrency where h2o takes the lead (14M+ RPS), Drogon stays comfortably in the top 3.&lt;/p&gt;

&lt;h3&gt;
  
  
  Static File Serving over HTTP/2
&lt;/h3&gt;

&lt;p&gt;Drogon's HTTP/2 dominance extends to static files too:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Connections&lt;/th&gt;
&lt;th&gt;RPS&lt;/th&gt;
&lt;th&gt;Rank&lt;/th&gt;
&lt;th&gt;Bandwidth&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;64&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1,813,238&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;#1/14&lt;/strong&gt; 🏆&lt;/td&gt;
&lt;td&gt;27.77 GB/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;256&lt;/td&gt;
&lt;td&gt;1,546,328&lt;/td&gt;
&lt;td&gt;#2/16&lt;/td&gt;
&lt;td&gt;23.66 GB/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1,024&lt;/td&gt;
&lt;td&gt;1,018,221&lt;/td&gt;
&lt;td&gt;#5/16&lt;/td&gt;
&lt;td&gt;15.57 GB/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Another first-place finish at 64 connections, beating actix by a significant margin (1.81M vs 1.35M). The bandwidth numbers are massive — nearly 28 GB/s of static content over HTTP/2.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pipelined Requests
&lt;/h3&gt;

&lt;p&gt;Pipelining shows Drogon's solid but not spectacular HTTP/1.1 parsing:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Connections&lt;/th&gt;
&lt;th&gt;RPS&lt;/th&gt;
&lt;th&gt;Rank&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;512&lt;/td&gt;
&lt;td&gt;7,828,214&lt;/td&gt;
&lt;td&gt;#9/30&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4,096&lt;/td&gt;
&lt;td&gt;7,612,822&lt;/td&gt;
&lt;td&gt;#9/30&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16,384&lt;/td&gt;
&lt;td&gt;7,260,243&lt;/td&gt;
&lt;td&gt;#9/30&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Consistently 9th place across all concurrency levels. The gap to the top is real though — ringzero hits 47M RPS pipelined, roughly 6x what Drogon manages. But 7.8M pipelined RPS is nothing to sneeze at for a full-featured framework.&lt;/p&gt;

&lt;h3&gt;
  
  
  JSON Serialization — The Plot Twist
&lt;/h3&gt;

&lt;p&gt;Okay, this is where the story gets complicated. In the JSON test (serialize a 50-item dataset):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Connections&lt;/th&gt;
&lt;th&gt;RPS&lt;/th&gt;
&lt;th&gt;Rank&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;4,096&lt;/td&gt;
&lt;td&gt;128,946&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;#26/29&lt;/strong&gt; 😬&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16,384&lt;/td&gt;
&lt;td&gt;124,793&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;#26/29&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That's... not great. Drogon drops to near the bottom of the pack for JSON serialization. For context, nginx (using its native JSON module) hits 1.18M RPS in the same test. Even Flask manages 107K — Drogon is barely ahead of it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;But here's the twist.&lt;/strong&gt; At 32,768 connections, Drogon jumps to &lt;strong&gt;#3 out of 14&lt;/strong&gt; with 933,156 RPS. The framework seems to have a specific performance cliff at moderate connection counts for JSON workloads, then recovers dramatically at very high concurrency.&lt;/p&gt;

&lt;p&gt;Looking at the implementation, the likely culprit is jsoncpp. Drogon uses jsoncpp for JSON serialization, which is known to be one of the slower JSON libraries in C++. The code builds each JSON response by constructing &lt;code&gt;Json::Value&lt;/code&gt; objects field by field, then serializes with &lt;code&gt;Json::StreamWriterBuilder&lt;/code&gt;. At lower concurrency where the CPU isn't fully utilized across all event loop threads, the per-request serialization overhead dominates.&lt;/p&gt;

&lt;h3&gt;
  
  
  Compression
&lt;/h3&gt;

&lt;p&gt;This is Drogon's worst showing:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Connections&lt;/th&gt;
&lt;th&gt;RPS&lt;/th&gt;
&lt;th&gt;Rank&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;4,096&lt;/td&gt;
&lt;td&gt;4,348&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;#24/28&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16,384&lt;/td&gt;
&lt;td&gt;4,173&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;#23/28&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Only 4K RPS with gzip compression enabled. The memory usage spikes to 556 MiB and CPU pegs at 12,153%. Drogon uses zlib for compression, and compressing the large JSON response on every request absolutely tanks throughput. The top performer here (blitz) manages 89K RPS — over 20x more.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mixed Workload
&lt;/h3&gt;

&lt;p&gt;The mixed test hits multiple endpoints in a realistic traffic pattern:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Connections&lt;/th&gt;
&lt;th&gt;RPS&lt;/th&gt;
&lt;th&gt;Rank&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;512&lt;/td&gt;
&lt;td&gt;21,593&lt;/td&gt;
&lt;td&gt;#16/17&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4,096&lt;/td&gt;
&lt;td&gt;22,858&lt;/td&gt;
&lt;td&gt;#20/27&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16,384&lt;/td&gt;
&lt;td&gt;22,100&lt;/td&gt;
&lt;td&gt;#20/27&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Bottom half of the field. The mixed workload combines plain text, JSON, compression, static files, and database queries — and the JSON/compression weakness drags the composite score down significantly. go-fasthttp leads here with 87K RPS at 4,096 connections.&lt;/p&gt;

&lt;h3&gt;
  
  
  Limited Connections &amp;amp; Noisy Neighbor
&lt;/h3&gt;

&lt;p&gt;Drogon recovers nicely in constrained scenarios:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Limited connections (512):&lt;/strong&gt; #6/30 with 1,251,259 RPS&lt;br&gt;
&lt;strong&gt;Limited connections (4096):&lt;/strong&gt; #4/30 with 1,646,234 RPS&lt;br&gt;
&lt;strong&gt;Noisy neighbor (4096):&lt;/strong&gt; #6/30 with 1,965,305 RPS&lt;/p&gt;

&lt;p&gt;When the playing field is leveled by connection limits or background noise, Drogon's efficient event loop and low per-connection overhead keep it competitive.&lt;/p&gt;
&lt;h2&gt;
  
  
  Architecture Deep Dive
&lt;/h2&gt;

&lt;p&gt;Looking at the &lt;a href="https://github.com/MDA2AV/HttpArena/tree/main/frameworks/drogon" rel="noopener noreferrer"&gt;HttpArena implementation&lt;/a&gt;, a few things stand out:&lt;/p&gt;
&lt;h3&gt;
  
  
  Thread-Local SQLite
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;thread_local&lt;/span&gt; &lt;span class="n"&gt;sqlite3&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;tl_db&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;nullptr&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;thread_local&lt;/span&gt; &lt;span class="n"&gt;sqlite3_stmt&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;tl_stmt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;nullptr&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Each event loop thread gets its own SQLite connection with a pre-prepared statement. No mutex contention, no connection pooling overhead. The &lt;code&gt;PRAGMA mmap_size=268435456&lt;/code&gt; enables memory-mapped I/O for the database file. Clean approach.&lt;/p&gt;
&lt;h3&gt;
  
  
  Pre-loaded Everything
&lt;/h3&gt;

&lt;p&gt;Datasets and static files are loaded entirely into memory at startup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;DataItem&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt; &lt;span class="n"&gt;json_large_response&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;unordered_map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;StaticFile&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;static_files&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The large JSON response is pre-serialized once and served as a raw string. Static files sit in an &lt;code&gt;unordered_map&lt;/code&gt; for O(1) lookups. This is why the static file serving numbers are so good — there's zero disk I/O.&lt;/p&gt;

&lt;h3&gt;
  
  
  Async Callback Pattern
&lt;/h3&gt;

&lt;p&gt;Drogon uses the classic async callback style:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;HttpRequestPtr&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;HttpResponsePtr&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;callback&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;HttpResponse&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;newHttpResponse&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;setBody&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"ok"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;setContentTypeCode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CT_TEXT_PLAIN&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;callback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No coroutines, no futures — just raw callbacks. This keeps the overhead minimal but makes complex async chains harder to write. Drogon does support coroutines in newer versions, but this benchmark sticks with callbacks.&lt;/p&gt;

&lt;h3&gt;
  
  
  Build Optimization
&lt;/h3&gt;

&lt;p&gt;The Dockerfile shows Drogon built from source with LTO enabled, and the app compiled with &lt;code&gt;-O3 -flto&lt;/code&gt;. Notably, Drogon itself is built with &lt;code&gt;-DBUILD_ORM=OFF -DBUILD_BROTLI=OFF&lt;/code&gt; — stripping out unused features for a leaner binary.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who Should Use Drogon?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Good fit if you:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Already have a C++ codebase and need HTTP endpoints&lt;/li&gt;
&lt;li&gt;Need excellent HTTP/2 performance (seriously, those numbers are elite)&lt;/li&gt;
&lt;li&gt;Want a mature, feature-complete framework (ORM, WebSocket, middleware, etc.)&lt;/li&gt;
&lt;li&gt;Need low memory usage under moderate load (~80-130 MiB)&lt;/li&gt;
&lt;li&gt;Serve mostly static or pre-computed content&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Maybe look elsewhere if you:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Need fast JSON serialization (consider Rust/actix or use a faster JSON lib)&lt;/li&gt;
&lt;li&gt;Need strong gzip compression throughput&lt;/li&gt;
&lt;li&gt;Prefer modern async patterns over callbacks&lt;/li&gt;
&lt;li&gt;Want a large ecosystem and community (Drogon's is growing but still niche)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Verdict
&lt;/h2&gt;

&lt;p&gt;Drogon is a framework of extremes. Its HTTP/2 performance is genuinely best-in-class — taking #1 in both baseline and static file tests at moderate concurrency. The plain HTTP/1.1 baseline numbers are consistently top-10 across all concurrency levels. Memory efficiency is excellent.&lt;/p&gt;

&lt;p&gt;But the JSON serialization bottleneck is real and dramatic. Dropping from top-7 in baseline to #26 in JSON tests is a stark reminder that framework performance isn't one-dimensional. The jsoncpp dependency is the obvious weak link — swapping it for simdjson, rapidjson, or even nlohmann/json could dramatically change those numbers.&lt;/p&gt;

&lt;p&gt;If you're building an HTTP/2 service that mostly serves pre-computed or static content, Drogon might be the fastest option available. If you're building a JSON API that serializes data on every request... you might want to benchmark carefully first.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;All benchmark data from &lt;a href="https://mda2av.github.io/HttpArena/" rel="noopener noreferrer"&gt;HttpArena&lt;/a&gt; (&lt;a href="https://github.com/MDA2AV/HttpArena" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;). Tests run under controlled conditions with consistent hardware across all frameworks.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>performance</category>
      <category>benchmarks</category>
      <category>cpp</category>
    </item>
    <item>
      <title>Why We Built HttpArena — A Better Way to Benchmark HTTP Frameworks</title>
      <dc:creator>Benny</dc:creator>
      <pubDate>Sun, 15 Mar 2026 14:06:39 +0000</pubDate>
      <link>https://dev.to/fbio_reis_355b87b508598e/why-we-built-httparena-a-better-way-to-benchmark-http-frameworks-j94</link>
      <guid>https://dev.to/fbio_reis_355b87b508598e/why-we-built-httparena-a-better-way-to-benchmark-http-frameworks-j94</guid>
      <description>&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;Every framework claims to be fast. Blog posts benchmark X vs Y with a single plaintext endpoint, one concurrency level, one metric. The results are interesting for five minutes, then a new version ships and everything changes.&lt;/p&gt;

&lt;p&gt;The real question developers face is harder: &lt;strong&gt;which framework performs best for &lt;em&gt;my&lt;/em&gt; workload?&lt;/strong&gt; And nobody can answer that, because most benchmarks don't test real workloads.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is HttpArena?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://mda2av.github.io/HttpArena/" rel="noopener noreferrer"&gt;HttpArena&lt;/a&gt; is an open-source benchmarking platform that tests HTTP frameworks across &lt;strong&gt;16 different test profiles&lt;/strong&gt; on dedicated, reproducible hardware. No cloud VMs. No noisy neighbors. Same machine, same load generator, same conditions for every framework.&lt;/p&gt;

&lt;p&gt;The source is on &lt;a href="https://github.com/MDA2AV/HttpArena" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; and the results are live at &lt;a href="https://mda2av.github.io/HttpArena/" rel="noopener noreferrer"&gt;mda2av.github.io/HttpArena&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why 16 Test Profiles?
&lt;/h2&gt;

&lt;p&gt;This is the core idea. A single "requests per second" number is almost meaningless without context. HttpArena tests frameworks across a range of realistic scenarios:&lt;/p&gt;

&lt;h3&gt;
  
  
  Connection Behavior
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Baseline&lt;/strong&gt; at 512, 4K, 16K, and 32K concurrent connections — how does performance scale as you push connection counts higher?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pipelined&lt;/strong&gt; — HTTP pipelining with 16 requests per connection&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Limited connections&lt;/strong&gt; — connection reuse under constrained pools&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Real Workloads
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;JSON processing&lt;/strong&gt; — parse a dataset, compute derived fields, serialize the response&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compression&lt;/strong&gt; — gzip a large payload on the fly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Upload&lt;/strong&gt; — handle incoming request bodies of varying sizes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Database&lt;/strong&gt; — SQLite queries under concurrent load&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Resilience
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Noisy&lt;/strong&gt; — a mix of valid requests, bad methods, and nonexistent paths. Does the server stay stable?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mixed&lt;/strong&gt; — all endpoint types hit concurrently. This is closest to real-world traffic.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Protocols
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;HTTP/2 and HTTP/3&lt;/strong&gt; — for frameworks that support them&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Static file serving&lt;/strong&gt; over H2&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;gRPC&lt;/strong&gt; — unary calls with and without TLS&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;WebSocket&lt;/strong&gt; — echo server performance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A framework that dominates at plaintext might fall apart under JSON serialization. One that handles 512 connections beautifully might choke at 32K. One that aces every individual test might have contention issues when all endpoints are hit simultaneously. &lt;strong&gt;You don't see any of this with single-profile benchmarks.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What Makes It Different
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Reproducibility
&lt;/h3&gt;

&lt;p&gt;Every framework runs in a Docker container on the same dedicated hardware. The Dockerfiles, source code, and test configurations are all in the repo. Anyone can clone it and reproduce the results.&lt;/p&gt;

&lt;h3&gt;
  
  
  Correctness First
&lt;/h3&gt;

&lt;p&gt;Before any performance testing happens, every framework goes through an &lt;strong&gt;18-point validation suite&lt;/strong&gt; that checks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Correct arithmetic on query params and request bodies&lt;/li&gt;
&lt;li&gt;Anti-cheat with randomized inputs (no hardcoded responses)&lt;/li&gt;
&lt;li&gt;Proper HTTP status codes (404 for missing routes, 4xx for bad methods)&lt;/li&gt;
&lt;li&gt;Correct Content-Type headers&lt;/li&gt;
&lt;li&gt;Valid JSON processing with computed fields&lt;/li&gt;
&lt;li&gt;Gzip compression that actually compresses&lt;/li&gt;
&lt;li&gt;Resilience under malformed requests&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your framework doesn't pass validation, it doesn't get benchmarked. Performance numbers are useless if the server isn't doing the work correctly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Apples to Apples
&lt;/h3&gt;

&lt;p&gt;Every framework implements the same endpoints with the same behavior. The JSON endpoint processes the same dataset. The compression endpoint gzips the same payload. The database endpoint runs the same queries. The only variable is the framework itself.&lt;/p&gt;

&lt;h3&gt;
  
  
  Growing Framework List
&lt;/h3&gt;

&lt;p&gt;We currently test &lt;strong&gt;35+ frameworks&lt;/strong&gt; across languages including Rust, Go, C, C++, Java, C#, JavaScript (Node, Bun, Deno), Python, Ruby, Lua, and more. New frameworks are being added regularly — recent additions include Crystal, Zig, Nim, Swift, and Gleam.&lt;/p&gt;

&lt;h2&gt;
  
  
  Add Your Framework
&lt;/h2&gt;

&lt;p&gt;Adding a framework is straightforward:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Create a Dockerfile&lt;/strong&gt; — multi-stage build, minimal runtime image&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implement the endpoints&lt;/strong&gt; — &lt;code&gt;/baseline11&lt;/code&gt;, &lt;code&gt;/pipeline&lt;/code&gt;, &lt;code&gt;/json&lt;/code&gt;, &lt;code&gt;/compression&lt;/code&gt;, &lt;code&gt;/upload&lt;/code&gt;, and optionally &lt;code&gt;/db&lt;/code&gt;, &lt;code&gt;/baseline2&lt;/code&gt; (H2), &lt;code&gt;/static/*&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add a &lt;code&gt;meta.json&lt;/code&gt;&lt;/strong&gt; — declare which test profiles your framework subscribes to&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open a PR&lt;/strong&gt; — validation runs automatically in CI&lt;/li&gt;
&lt;/ol&gt;
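As a rough sketch, a `meta.json` might look something like this (the field names here are guesses, not the repo's actual schema; copy a real one from `frameworks/` instead):

```json
{
  "name": "myframework",
  "language": "cpp",
  "profiles": ["baseline11", "pipeline", "json", "compression", "upload", "db"]
}
```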

&lt;p&gt;Look at any existing framework in the &lt;code&gt;frameworks/&lt;/code&gt; directory for a working example. The whole process takes about an hour if you know your framework well.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who Is This For?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Developers choosing a framework&lt;/strong&gt; — see how candidates perform across diverse workloads, not just plaintext&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Framework authors&lt;/strong&gt; — get multi-dimensional performance data and a standardized way to compare against the ecosystem&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance engineers&lt;/strong&gt; — reproducible, open-source benchmarks you can run on your own hardware&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The curious&lt;/strong&gt; — sometimes you just want to know how fast things can go&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Check It Out
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Live results:&lt;/strong&gt; &lt;a href="https://mda2av.github.io/HttpArena/" rel="noopener noreferrer"&gt;mda2av.github.io/HttpArena&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Source &amp;amp; contribute:&lt;/strong&gt; &lt;a href="https://github.com/MDA2AV/HttpArena" rel="noopener noreferrer"&gt;github.com/MDA2AV/HttpArena&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We're building this in the open and actively welcoming contributions. If your favorite framework isn't represented yet, come add it. If you think our methodology could be better, open an issue. The goal is to give the community the most useful, honest benchmark data possible.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>performance</category>
      <category>opensource</category>
      <category>http</category>
    </item>
  </channel>
</rss>
