<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Benny</title>
    <description>The latest articles on DEV Community by Benny (@fbio_reis_355b87b508598e).</description>
    <link>https://dev.to/fbio_reis_355b87b508598e</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3673917%2Fad6b20f4-3d45-4db6-b007-55d1f1c0da16.jpeg</url>
      <title>DEV Community: Benny</title>
      <link>https://dev.to/fbio_reis_355b87b508598e</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/fbio_reis_355b87b508598e"/>
    <language>en</language>
    <item>
      <title>FastPySGI-WSGI: How a Libuv-Powered Python Server Hits 7.5 Million Requests Per Second</title>
      <dc:creator>Benny</dc:creator>
      <pubDate>Tue, 31 Mar 2026 12:46:24 +0000</pubDate>
      <link>https://dev.to/fbio_reis_355b87b508598e/fastpysgi-wsgi-how-a-libuv-powered-python-server-hits-75-million-requests-per-second-fgd</link>
      <guid>https://dev.to/fbio_reis_355b87b508598e/fastpysgi-wsgi-how-a-libuv-powered-python-server-hits-75-million-requests-per-second-fgd</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;When most developers think of Python web performance, they think "slow." Frameworks like Flask and Django are beloved for developer experience, but rarely win benchmarking contests. FastPySGI-WSGI challenges that assumption entirely.&lt;/p&gt;

&lt;p&gt;In the HttpArena benchmark suite -- a standardized HTTP framework benchmark platform running on dedicated 64-core hardware with 18 test profiles -- FastPySGI-WSGI delivers numbers that rival Rust and Go implementations. We're talking &lt;strong&gt;1.3 million RPS on baseline tests&lt;/strong&gt; and &lt;strong&gt;707K RPS while processing JSON&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Let's break down how it works, why it's fast, and what lessons we can take away.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is FastPySGI?
&lt;/h2&gt;

&lt;p&gt;FastPySGI is an ultra-fast WSGI/ASGI server for Python built on top of &lt;strong&gt;libuv&lt;/strong&gt; -- the same C-based event loop that powers Node.js. Unlike traditional Python servers (Gunicorn, Uvicorn), FastPySGI bypasses Python's asyncio entirely and handles networking at the C level.&lt;/p&gt;

&lt;p&gt;The "WSGI" variant specifically uses the standard WSGI interface, meaning it's synchronous Python code running on an asynchronous C event loop. This is a critical architectural choice: you get libuv's raw networking speed without requiring async/await in your application code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Repository:&lt;/strong&gt; &lt;a href="https://github.com/remittor/fastpysgi" rel="noopener noreferrer"&gt;https://github.com/remittor/fastpysgi&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Architecture: Fewer Layers, More Speed
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Minimal Dependencies
&lt;/h3&gt;

&lt;p&gt;The entire dependency list fits on a sticky note:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fastpysgi==0.4
orjson==3.10.15
psycopg[binary]==3.2.4
psycopg_pool==3.2.6
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Four packages. Compare that to FastAPI's 20+ transitive dependencies or Django's sprawling ecosystem. Every layer you remove is latency you eliminate.&lt;/p&gt;

&lt;h3&gt;
  
  
  Single-File Application
&lt;/h3&gt;

&lt;p&gt;The entire benchmark implementation is a single 349-line Python file. No framework overhead, no middleware chains, no decorator magic. Just a WSGI callable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;app&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;start_response&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;method&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;REQUEST_METHOD&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PATH_INFO&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;method&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GET&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;POST&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;respond_405&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;start_response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/pipeline&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;respond_ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;start_response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/baseline11&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;handle_baseline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;start_response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# ... more routes
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Routing is a simple if/elif chain. No regex compilation, no route tree traversal, no parameter extraction framework. For a benchmark, this is the right call -- every nanosecond in routing overhead gets multiplied by millions of requests.&lt;/p&gt;

&lt;h3&gt;
  
  
  Multi-Worker Model
&lt;/h3&gt;

&lt;p&gt;FastPySGI spawns one worker per available CPU core, capped at 128:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;WRK_COUNT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sched_getaffinity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each worker runs its own libuv event loop, and the OS distributes connections across them. On the benchmark's 64-core machine, that's 64 workers hammering through requests in parallel.&lt;/p&gt;
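&lt;p&gt;The same worker-count logic is easy to reproduce with the standard library. &lt;code&gt;os.sched_getaffinity&lt;/code&gt; is Linux-only, so this portable sketch (mine, not the repository's code) falls back to &lt;code&gt;os.cpu_count&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import os

def worker_count(cap=128):
    # Prefer the CPUs this process is actually allowed to run on (Linux);
    # fall back to the machine's core count elsewhere.
    if hasattr(os, "sched_getaffinity"):
        cpus = len(os.sched_getaffinity(0))
    else:
        cpus = os.cpu_count() or 1
    return min(cpus, cap)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;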




&lt;h2&gt;
  
  
  Performance-Critical Design Choices
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Pre-Loaded Static Files
&lt;/h3&gt;

&lt;p&gt;Static files aren't read from disk on each request. They're loaded entirely into memory at startup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;STATIC_DIR&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/data/static&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;static_files&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;fname&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;listdir&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;STATIC_DIR&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;fpath&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;STATIC_DIR&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fname&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fpath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;mime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_mime_type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fname&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;static_files&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;fname&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mime&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When a request hits &lt;code&gt;/static/main.css&lt;/code&gt;, it's a dictionary lookup and a pointer return. Zero disk I/O, zero syscalls.&lt;/p&gt;
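&lt;p&gt;The serving side is equally small. This sketch (mine, following the pattern above) handles both a hit and a miss against the in-memory dict:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Serving from the in-memory dict: one lookup, no disk I/O.
# static_files maps a request path to a (bytes, mime) tuple.
static_files = {"/main.css": (b"body { margin: 0 }", "text/css")}

def serve_static(path, start_response):
    entry = static_files.get(path)
    if entry is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    content, mime = entry
    start_response("200 OK", [
        ("Content-Type", mime),
        ("Content-Length", str(len(content))),
    ])
    return [content]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;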

&lt;h3&gt;
  
  
  2. Fast JSON with orjson
&lt;/h3&gt;

&lt;p&gt;The standard library &lt;code&gt;json&lt;/code&gt; module is comparatively slow even with its C accelerator. FastPySGI uses &lt;strong&gt;orjson&lt;/strong&gt;, a Rust-based JSON serializer that's typically 3-10x faster:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;orjson&lt;/span&gt;

&lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;orjson&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For the JSON benchmark profile, this choice alone could account for hundreds of thousands of additional RPS.&lt;/p&gt;
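&lt;p&gt;One practical detail: &lt;code&gt;orjson.dumps&lt;/code&gt; returns &lt;code&gt;bytes&lt;/code&gt; directly, skipping the separate UTF-8 encode step that &lt;code&gt;json.dumps&lt;/code&gt; needs before the body can hit the socket. A hedged sketch with a stdlib fallback for machines without orjson installed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json

try:
    import orjson
    def dumps_bytes(obj):
        return orjson.dumps(obj)          # bytes straight from Rust
except ImportError:
    def dumps_bytes(obj):
        return json.dumps(obj).encode()   # stdlib fallback

body = dumps_bytes({"message": "Hello, World!"})
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;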

&lt;h3&gt;
  
  
  3. Pre-Compressed Responses
&lt;/h3&gt;

&lt;p&gt;For the compression test, the large JSON dataset is compressed once at startup, not on every request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;large_buf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;orjson&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;large_dataset&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;compressed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;zlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compress&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;large_buf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;level&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every request gets the cached compressed buffer. No CPU is burned on repeated compression passes (note that &lt;code&gt;zlib.compress&lt;/code&gt; produces zlib/deflate output, not gzip).&lt;/p&gt;
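&lt;p&gt;The whole pattern fits in a few lines. This sketch (mine, with a made-up payload) compresses once at import time and serves the cached buffer on every request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import zlib

# Done once at startup; level=1 trades compression ratio for speed.
PAYLOAD = b'{"items": [' + b", ".join([b"1"] * 10000) + b"]}"
COMPRESSED = zlib.compress(PAYLOAD, level=1)

def handle_compressed(start_response):
    # Per-request cost is a header list and a cached buffer.
    start_response("200 OK", [
        ("Content-Type", "application/json"),
        ("Content-Encoding", "deflate"),
        ("Content-Length", str(len(COMPRESSED))),
    ])
    return [COMPRESSED]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;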

&lt;h3&gt;
  
  
  4. Tuned Server Parameters
&lt;/h3&gt;

&lt;p&gt;Socket backlog and read buffers are explicitly tuned for high throughput:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;fastpysgi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;backlog&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;      &lt;span class="c1"&gt;# 16K pending connections
&lt;/span&gt;&lt;span class="n"&gt;fastpysgi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_buffer_size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;256000&lt;/span&gt;  &lt;span class="c1"&gt;# 256KB read buffer
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These aren't arbitrary numbers -- they're sized for the benchmark's connection patterns (up to 16,384 concurrent connections).&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Thread-Local SQLite with MMAP
&lt;/h3&gt;

&lt;p&gt;Database tests use thread-local SQLite connections with memory-mapped I/O:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;db_local&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;threading&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;local&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_db&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="nf"&gt;hasattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;db_local&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;conn&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sqlite3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/data/benchmark.db&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PRAGMA mmap_size=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="mi"&gt;268&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# 268MB
&lt;/span&gt;        &lt;span class="n"&gt;db_local&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;db_local&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Memory-mapped I/O lets SQLite read database pages straight out of the OS page cache, skipping &lt;code&gt;read()&lt;/code&gt; syscalls and an extra buffer copy -- dramatically reducing query latency.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. PostgreSQL Connection Pooling
&lt;/h3&gt;

&lt;p&gt;For the async database (PostgreSQL) profile, a bounded connection pool prevents connection-storm overhead:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;psycopg_pool&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ConnectionPool&lt;/span&gt;

&lt;span class="n"&gt;pool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ConnectionPool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;conninfo&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;host=...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;min_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The Benchmark Numbers
&lt;/h2&gt;

&lt;p&gt;All benchmarks run on identical 64-core dedicated hardware via Docker containers, using h2load as the load generator with 64 threads. Duration: 5 seconds per run, best of 3 kept.&lt;/p&gt;

&lt;h3&gt;
  
  
  Baseline (Simple Response)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Connections&lt;/th&gt;
&lt;th&gt;RPS&lt;/th&gt;
&lt;th&gt;Avg Latency&lt;/th&gt;
&lt;th&gt;P99 Latency&lt;/th&gt;
&lt;th&gt;Memory&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;512&lt;/td&gt;
&lt;td&gt;1,301,932&lt;/td&gt;
&lt;td&gt;392us&lt;/td&gt;
&lt;td&gt;2.00ms&lt;/td&gt;
&lt;td&gt;408 MiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4,096&lt;/td&gt;
&lt;td&gt;1,371,836&lt;/td&gt;
&lt;td&gt;2.99ms&lt;/td&gt;
&lt;td&gt;33.80ms&lt;/td&gt;
&lt;td&gt;922 MiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16,384&lt;/td&gt;
&lt;td&gt;1,324,561&lt;/td&gt;
&lt;td&gt;11.93ms&lt;/td&gt;
&lt;td&gt;60.10ms&lt;/td&gt;
&lt;td&gt;2.5 GiB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Over &lt;strong&gt;1.3 million requests per second&lt;/strong&gt; on a simple response. Latency stays sub-millisecond at 512 connections.&lt;/p&gt;

&lt;h3&gt;
  
  
  JSON Processing
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Connections&lt;/th&gt;
&lt;th&gt;RPS&lt;/th&gt;
&lt;th&gt;Avg Latency&lt;/th&gt;
&lt;th&gt;P99 Latency&lt;/th&gt;
&lt;th&gt;Bandwidth&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;4,096&lt;/td&gt;
&lt;td&gt;707,282&lt;/td&gt;
&lt;td&gt;4.56ms&lt;/td&gt;
&lt;td&gt;17.20ms&lt;/td&gt;
&lt;td&gt;5.63 GB/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16,384&lt;/td&gt;
&lt;td&gt;670,914&lt;/td&gt;
&lt;td&gt;21.88ms&lt;/td&gt;
&lt;td&gt;67.70ms&lt;/td&gt;
&lt;td&gt;5.34 GB/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;707K RPS&lt;/strong&gt; while parsing and serializing JSON, pushing &lt;strong&gt;5.6 GB/s&lt;/strong&gt; of bandwidth. The orjson investment pays off massively here.&lt;/p&gt;

&lt;h3&gt;
  
  
  Static File Serving
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Connections&lt;/th&gt;
&lt;th&gt;RPS&lt;/th&gt;
&lt;th&gt;Avg Latency&lt;/th&gt;
&lt;th&gt;Bandwidth&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;4,096&lt;/td&gt;
&lt;td&gt;724,526&lt;/td&gt;
&lt;td&gt;3.00ms&lt;/td&gt;
&lt;td&gt;10.91 GB/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Nearly &lt;strong&gt;11 GB/s&lt;/strong&gt; of throughput from pre-loaded static files. Memory-resident serving eliminates disk I/O entirely.&lt;/p&gt;

&lt;h3&gt;
  
  
  Async Database (PostgreSQL)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Connections&lt;/th&gt;
&lt;th&gt;RPS&lt;/th&gt;
&lt;th&gt;Avg Latency&lt;/th&gt;
&lt;th&gt;P99 Latency&lt;/th&gt;
&lt;th&gt;Memory&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1,024&lt;/td&gt;
&lt;td&gt;79,200&lt;/td&gt;
&lt;td&gt;12.16ms&lt;/td&gt;
&lt;td&gt;31.10ms&lt;/td&gt;
&lt;td&gt;1.0 GiB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Even with real PostgreSQL queries over the network, it sustains &lt;strong&gt;79K RPS&lt;/strong&gt; with reasonable latency.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mixed Workload (Realistic Traffic)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Connections&lt;/th&gt;
&lt;th&gt;RPS&lt;/th&gt;
&lt;th&gt;Avg Latency&lt;/th&gt;
&lt;th&gt;P99 Latency&lt;/th&gt;
&lt;th&gt;Bandwidth&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;4,096&lt;/td&gt;
&lt;td&gt;53,005&lt;/td&gt;
&lt;td&gt;72.87ms&lt;/td&gt;
&lt;td&gt;658.60ms&lt;/td&gt;
&lt;td&gt;1.70 GB/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16,384&lt;/td&gt;
&lt;td&gt;48,546&lt;/td&gt;
&lt;td&gt;312.08ms&lt;/td&gt;
&lt;td&gt;2.09s&lt;/td&gt;
&lt;td&gt;1.56 GB/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The mixed test blends baseline, JSON, database, upload, and compression requests -- a more realistic workload. Still delivers &lt;strong&gt;53K RPS&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  How Does It Compare to Other Python Frameworks?
&lt;/h2&gt;

&lt;p&gt;While exact apples-to-apples comparisons depend on the specific benchmark run, the architectural differences are telling:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;FastPySGI-WSGI&lt;/th&gt;
&lt;th&gt;FastAPI&lt;/th&gt;
&lt;th&gt;Flask&lt;/th&gt;
&lt;th&gt;Django&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Server&lt;/td&gt;
&lt;td&gt;Built-in (libuv)&lt;/td&gt;
&lt;td&gt;Uvicorn (asyncio)&lt;/td&gt;
&lt;td&gt;Gunicorn (prefork)&lt;/td&gt;
&lt;td&gt;Gunicorn (prefork)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Event Loop&lt;/td&gt;
&lt;td&gt;libuv (C)&lt;/td&gt;
&lt;td&gt;uvloop (libuv via asyncio)&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dependencies&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;20+&lt;/td&gt;
&lt;td&gt;8+&lt;/td&gt;
&lt;td&gt;15+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Routing&lt;/td&gt;
&lt;td&gt;if/elif chain&lt;/td&gt;
&lt;td&gt;Decorator + Starlette&lt;/td&gt;
&lt;td&gt;Decorator + Werkzeug&lt;/td&gt;
&lt;td&gt;URL patterns + ORM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;JSON&lt;/td&gt;
&lt;td&gt;orjson (Rust)&lt;/td&gt;
&lt;td&gt;stdlib json&lt;/td&gt;
&lt;td&gt;stdlib json&lt;/td&gt;
&lt;td&gt;stdlib json&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The key insight: FastPySGI removes Python from the hot path of networking. The event loop, connection handling, and buffer management all happen in C (libuv). Python only runs for application logic -- routing, data processing, response building.&lt;/p&gt;




&lt;h2&gt;
  
  
  Lessons for Your Own Projects
&lt;/h2&gt;

&lt;p&gt;You probably shouldn't rewrite your production Flask app as a raw WSGI handler. But there are transferable lessons:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Know Your Bottleneck
&lt;/h3&gt;

&lt;p&gt;FastPySGI is strong evidence that Python application code usually isn't the bottleneck -- it's the layers between the OS and your code. If you're I/O bound, the event loop implementation matters more than your language choice.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Pre-compute What You Can
&lt;/h3&gt;

&lt;p&gt;Pre-loading static files, pre-compressing responses, and pre-serializing datasets at startup are techniques that work in any framework. If data doesn't change per-request, don't process it per-request.&lt;/p&gt;
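&lt;p&gt;The same idea transfers directly: hoist the work to import time and let the handler pick a cached buffer. A framework-free sketch (mine, with hypothetical catalog data):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
import zlib

# Computed once at startup, not per request.
CATALOG = [{"id": i, "name": "item-" + str(i)} for i in range(1000)]
CATALOG_JSON = json.dumps(CATALOG).encode()
CATALOG_DEFLATE = zlib.compress(CATALOG_JSON, level=1)

def catalog_response(accepts_deflate=False):
    # Per-request work is just choosing the right cached buffer.
    if accepts_deflate:
        return CATALOG_DEFLATE, "deflate"
    return CATALOG_JSON, "identity"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;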

&lt;h3&gt;
  
  
  3. Choose Your Serializer Wisely
&lt;/h3&gt;

&lt;p&gt;Swapping &lt;code&gt;json&lt;/code&gt; for &lt;code&gt;orjson&lt;/code&gt; is a one-line change in most Python projects and can yield 3-10x faster serialization. For API-heavy services, this is low-hanging fruit.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Tune Your Server Parameters
&lt;/h3&gt;

&lt;p&gt;Most developers never touch socket backlog, buffer sizes, or connection pool bounds. The defaults are conservative. If you know your traffic patterns, tuning these can unlock significant performance.&lt;/p&gt;
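&lt;p&gt;As one concrete example, Gunicorn exposes several of these knobs in its config file. The values below are illustrative, not recommendations (Gunicorn's default &lt;code&gt;backlog&lt;/code&gt; is 2048):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# gunicorn.conf.py -- illustrative values; tune to your own traffic.
import multiprocessing

workers = multiprocessing.cpu_count()  # one worker per core
backlog = 16384            # pending-connection queue (default 2048)
worker_connections = 4096  # per-worker cap for async worker classes
keepalive = 5              # seconds to hold idle keep-alive connections
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;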

&lt;h3&gt;
  
  
  5. Fewer Dependencies = Fewer Layers
&lt;/h3&gt;

&lt;p&gt;Every middleware, every abstraction, every framework feature adds overhead. When performance matters, audit your dependency tree and question whether each layer is earning its keep.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;FastPySGI-WSGI demonstrates that Python can compete at the highest levels of HTTP performance when you strip away the abstractions and let C do what C does best. By building on libuv, minimizing dependencies, and making smart caching decisions, it achieves numbers that most developers would associate with Rust or Go.&lt;/p&gt;

&lt;p&gt;The HttpArena project (&lt;a href="https://www.http-arena.com/" rel="noopener noreferrer"&gt;https://www.http-arena.com/&lt;/a&gt;) provides a fascinating lens into how different frameworks and languages approach the same problems. FastPySGI-WSGI stands out not because it reinvents Python, but because it strategically removes Python from the parts of the stack where it's slowest.&lt;/p&gt;

&lt;p&gt;Whether you're building the next high-performance Python server or just optimizing your existing API, the principles behind FastPySGI's design are worth studying.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;All benchmark data from &lt;a href="https://www.http-arena.com/" rel="noopener noreferrer"&gt;HttpArena&lt;/a&gt;, run on dedicated 64-core hardware with standardized Docker containers. Results reflect framework performance under controlled conditions.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Check out the HttpArena repository on GitHub to explore how 78+ frameworks compare: &lt;a href="https://github.com/MDA2AV/HttpArena" rel="noopener noreferrer"&gt;https://github.com/MDA2AV/HttpArena&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>backend</category>
      <category>performance</category>
      <category>python</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Fiber: Built on fasthttp, But 28x Slower at Pipelining — What Happened? (HttpArena Deep Dive)</title>
      <dc:creator>Benny</dc:creator>
      <pubDate>Sun, 29 Mar 2026 14:22:36 +0000</pubDate>
      <link>https://dev.to/fbio_reis_355b87b508598e/fiber-built-on-fasthttp-but-28x-slower-at-pipelining-what-happened-httparena-deep-dive-1o9c</link>
      <guid>https://dev.to/fbio_reis_355b87b508598e/fiber-built-on-fasthttp-but-28x-slower-at-pipelining-what-happened-httparena-deep-dive-1o9c</guid>
      <description>&lt;p&gt;Fiber is one of the most popular Go web frameworks on GitHub. 34K+ stars. Express-inspired API. And it's built on top of fasthttp — the same engine that crushes most benchmarks.&lt;/p&gt;

&lt;p&gt;So you'd expect Fiber to be &lt;em&gt;fast&lt;/em&gt;, right? Maybe not quite as fast as raw fasthttp, but close?&lt;/p&gt;

&lt;p&gt;I dug into &lt;a href="https://mda2av.github.io/HttpArena/" rel="noopener noreferrer"&gt;HttpArena's benchmark data&lt;/a&gt; to find out. The results surprised me.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Quick Summary
&lt;/h2&gt;

&lt;p&gt;Fiber is the &lt;strong&gt;most memory-efficient Go framework&lt;/strong&gt; in almost every test. It's also &lt;strong&gt;last place among Go frameworks&lt;/strong&gt; in most throughput tests. And in pipelining, it's not just slower than fasthttp — it's &lt;strong&gt;28x slower&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;But the story is more nuanced than "Fiber is slow." Let's dig in.&lt;/p&gt;

&lt;h2&gt;
  
  
  Baseline Performance: Last Among Go Peers
&lt;/h2&gt;

&lt;p&gt;In the standard baseline test at 4,096 connections:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Framework&lt;/th&gt;
&lt;th&gt;Requests/sec&lt;/th&gt;
&lt;th&gt;Memory&lt;/th&gt;
&lt;th&gt;Avg Latency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;go-fasthttp&lt;/td&gt;
&lt;td&gt;1,464,168&lt;/td&gt;
&lt;td&gt;188 MB&lt;/td&gt;
&lt;td&gt;2.79ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gin&lt;/td&gt;
&lt;td&gt;430,086&lt;/td&gt;
&lt;td&gt;375 MB&lt;/td&gt;
&lt;td&gt;9.45ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;echo&lt;/td&gt;
&lt;td&gt;424,337&lt;/td&gt;
&lt;td&gt;249 MB&lt;/td&gt;
&lt;td&gt;9.54ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;chi&lt;/td&gt;
&lt;td&gt;422,523&lt;/td&gt;
&lt;td&gt;359 MB&lt;/td&gt;
&lt;td&gt;9.62ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;fiber&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;397,172&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;144 MB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;6.47ms&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Fiber comes in 5th out of 5 Go frameworks for raw throughput, and #39 out of 51 frameworks overall. That's... not what you'd expect from something built on fasthttp.&lt;/p&gt;

&lt;p&gt;But look at that memory column. 144 MB. That's the lowest of any Go framework by a wide margin — 42% less than echo, and 62% less than gin. And the latency is actually better than gin/echo/chi despite lower throughput.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Pipelining Gap: 28x
&lt;/h2&gt;

&lt;p&gt;This is where things get wild. HTTP pipelining at 4,096 connections with 16 requests per pipeline:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Framework&lt;/th&gt;
&lt;th&gt;Requests/sec&lt;/th&gt;
&lt;th&gt;Memory&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;go-fasthttp&lt;/td&gt;
&lt;td&gt;17,808,031&lt;/td&gt;
&lt;td&gt;196 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gin&lt;/td&gt;
&lt;td&gt;1,046,933&lt;/td&gt;
&lt;td&gt;1,003 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;echo&lt;/td&gt;
&lt;td&gt;1,016,858&lt;/td&gt;
&lt;td&gt;492 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;chi&lt;/td&gt;
&lt;td&gt;937,099&lt;/td&gt;
&lt;td&gt;692 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;fiber&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;623,248&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;96 MB&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;go-fasthttp does &lt;strong&gt;17.8 million requests per second&lt;/strong&gt;. Fiber does 623K. That's a 28.6x gap.&lt;/p&gt;

&lt;p&gt;Even gin manages 1M rps in pipelining — 68% more than Fiber. And again, look at gin's memory usage: over 1 GB. Fiber? 96 MB. Sipping resources.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Such a Massive Gap?
&lt;/h3&gt;

&lt;p&gt;I read both implementations. The difference is architectural.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;go-fasthttp&lt;/strong&gt; in HttpArena uses &lt;code&gt;SO_REUSEPORT&lt;/code&gt; — it spawns one listener per CPU core, each with its own &lt;code&gt;fasthttp.Server&lt;/code&gt;. Incoming connections get distributed by the kernel. The routing is a raw &lt;code&gt;switch&lt;/code&gt; statement on &lt;code&gt;ctx.Path()&lt;/code&gt;. Zero middleware, zero overhead, zero allocations on the hot path.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// go-fasthttp: one listener per CPU core&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;numCPU&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;ln&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;reuseport&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Listen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"tcp4"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;":8080"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;fasthttp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Server&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;Handler&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Serve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ln&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}()&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Fiber&lt;/strong&gt; runs a single &lt;code&gt;app.Listen(":8080")&lt;/code&gt; with its Express-style router, middleware chain, and &lt;code&gt;compress.New()&lt;/code&gt; applied globally. Every request walks through middleware functions. The router does pattern matching instead of a switch statement.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// fiber: single listener with middleware chain&lt;/span&gt;
&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;fiber&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;New&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fiber&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Config&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Use&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;compress&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;New&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;compress&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Config&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;Level&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;compress&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;LevelBestSpeed&lt;/span&gt;&lt;span class="p"&gt;}))&lt;/span&gt;
&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/pipeline"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Listen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;":8080"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That compression middleware is applied globally — even on the &lt;code&gt;/pipeline&lt;/code&gt; endpoint that returns a 2-byte &lt;code&gt;"ok"&lt;/code&gt; response. Every baseline request pays the cost of checking Accept-Encoding headers for no reason.&lt;/p&gt;

&lt;p&gt;This is the cost of ergonomics. Fiber gives you Express-style middleware, clean routing, and a nice API. That costs CPU cycles.&lt;/p&gt;
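Some of that cost is avoidable, though: Fiber supports per-route middleware, so compression could be confined to the one endpoint that benefits from it. A hedged sketch assuming Fiber v2's API (the handler body and payload are placeholders, not the benchmark's code):

```go
package main

import (
	"log"

	"github.com/gofiber/fiber/v2"
	"github.com/gofiber/fiber/v2/middleware/compress"
)

// largeJSON stands in for a payload big enough to be worth compressing.
var largeJSON = []byte(`{"items":[]}`)

func main() {
	app := fiber.New()

	// Attach compression to just this route instead of app.Use(compress.New()),
	// so only requests here pay the Accept-Encoding inspection.
	app.Get("/compression", compress.New(compress.Config{
		Level: compress.LevelBestSpeed,
	}), func(c *fiber.Ctx) error {
		return c.Send(largeJSON)
	})

	// The 2-byte baseline route now bypasses the middleware entirely.
	app.Get("/pipeline", func(c *fiber.Ctx) error {
		return c.SendString("ok")
	})

	log.Fatal(app.Listen(":8080"))
}
```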

&lt;h2&gt;
  
  
  Where Fiber Actually Wins
&lt;/h2&gt;

&lt;p&gt;Here's the twist: there are two categories where Fiber outperforms its Go peers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Limited Connections (512)
&lt;/h3&gt;

&lt;p&gt;When connections are capped at 512 and there is constant keep-alive and reconnection churn:&lt;/p&gt;


&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Framework&lt;/th&gt;
&lt;th&gt;Requests/sec&lt;/th&gt;
&lt;th&gt;Memory&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;fiber&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;178,746&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;68 MB&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gin&lt;/td&gt;
&lt;td&gt;149,330&lt;/td&gt;
&lt;td&gt;94 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;go-fasthttp&lt;/td&gt;
&lt;td&gt;147,847&lt;/td&gt;
&lt;td&gt;100 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;chi&lt;/td&gt;
&lt;td&gt;144,893&lt;/td&gt;
&lt;td&gt;94 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;echo&lt;/td&gt;
&lt;td&gt;136,646&lt;/td&gt;
&lt;td&gt;93 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Fiber is #1 among Go frameworks here, beating even raw fasthttp by 21%. Under connection churn at lower concurrency, Fiber's lightweight connection handling shines. Fasthttp's multi-listener approach actually hurts here — distributing 512 connections across many listeners means some sit idle while others are busy.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mixed Workload
&lt;/h3&gt;

&lt;p&gt;The mixed workload test hits all endpoints (baseline, JSON, DB, uploads, compression, static files) simultaneously at 4,096 connections:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Framework&lt;/th&gt;
&lt;th&gt;Requests/sec&lt;/th&gt;
&lt;th&gt;Memory&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;go-fasthttp&lt;/td&gt;
&lt;td&gt;71,173&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;79.9 GB&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;fiber&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;58,490&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;761 MB&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;echo&lt;/td&gt;
&lt;td&gt;36,125&lt;/td&gt;
&lt;td&gt;1.7 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;chi&lt;/td&gt;
&lt;td&gt;34,365&lt;/td&gt;
&lt;td&gt;595 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gin&lt;/td&gt;
&lt;td&gt;32,477&lt;/td&gt;
&lt;td&gt;988 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Fiber is solidly #2, beating echo/chi/gin by 60-80%. And look at that memory story: go-fasthttp uses &lt;strong&gt;79.9 GB of RAM&lt;/strong&gt; to achieve 71K rps. Fiber uses 761 MB for 58.5K rps. That's 105x less memory for only 18% less throughput.&lt;/p&gt;

&lt;p&gt;On per-megabyte efficiency, Fiber is the clear winner in mixed workloads.&lt;/p&gt;

&lt;h2&gt;
  
  
  Compression: The Hidden Strength
&lt;/h2&gt;

&lt;p&gt;Fiber's global compression middleware — the same thing that hurts pipelining — actually pays off here:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Framework&lt;/th&gt;
&lt;th&gt;Requests/sec&lt;/th&gt;
&lt;th&gt;Memory&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;go-fasthttp&lt;/td&gt;
&lt;td&gt;14,771&lt;/td&gt;
&lt;td&gt;14.4 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;fiber&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;9,483&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;5.9 GB&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;chi&lt;/td&gt;
&lt;td&gt;7,602&lt;/td&gt;
&lt;td&gt;3.4 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gin&lt;/td&gt;
&lt;td&gt;7,578&lt;/td&gt;
&lt;td&gt;2.9 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;echo&lt;/td&gt;
&lt;td&gt;7,536&lt;/td&gt;
&lt;td&gt;3.1 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Second place among Go frameworks. Fiber uses &lt;code&gt;andybalholm/brotli&lt;/code&gt; and &lt;code&gt;klauspost/compress&lt;/code&gt; through its middleware — solid libraries. The 25% lead over gin/echo/chi is real.&lt;/p&gt;

&lt;h2&gt;
  
  
  JSON Serialization: The Weak Spot
&lt;/h2&gt;

&lt;p&gt;JSON processing at 4,096 connections:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Framework&lt;/th&gt;
&lt;th&gt;Requests/sec&lt;/th&gt;
&lt;th&gt;Memory&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;go-fasthttp&lt;/td&gt;
&lt;td&gt;314,945&lt;/td&gt;
&lt;td&gt;696 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gin&lt;/td&gt;
&lt;td&gt;174,851&lt;/td&gt;
&lt;td&gt;433 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;echo&lt;/td&gt;
&lt;td&gt;164,227&lt;/td&gt;
&lt;td&gt;371 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;chi&lt;/td&gt;
&lt;td&gt;158,040&lt;/td&gt;
&lt;td&gt;390 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;fiber&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;125,297&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;171 MB&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Last place again. 28% slower than gin. The implementation is interesting — Fiber's handler allocates a new &lt;code&gt;[]ProcessedItem&lt;/code&gt; slice on every request, processes the dataset, then marshals to JSON with &lt;code&gt;json.Marshal&lt;/code&gt;. The net/http-based frameworks do essentially the same thing, but they have more CPU available since they're not running through Fiber's middleware stack.&lt;/p&gt;

&lt;h2&gt;
  
  
  Database Performance: Quietly Strong
&lt;/h2&gt;

&lt;p&gt;Async database queries via PostgreSQL at 1,024 connections:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Framework&lt;/th&gt;
&lt;th&gt;Requests/sec&lt;/th&gt;
&lt;th&gt;Memory&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;go-fasthttp&lt;/td&gt;
&lt;td&gt;30,784&lt;/td&gt;
&lt;td&gt;359 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;fiber&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;19,196&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;192 MB&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gin&lt;/td&gt;
&lt;td&gt;17,660&lt;/td&gt;
&lt;td&gt;220 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;echo&lt;/td&gt;
&lt;td&gt;17,486&lt;/td&gt;
&lt;td&gt;284 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;chi&lt;/td&gt;
&lt;td&gt;17,324&lt;/td&gt;
&lt;td&gt;211 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Clear #2 among Go frameworks, and 9-11% ahead of the gin/echo/chi cluster. Both Fiber and go-fasthttp use &lt;code&gt;pgxpool&lt;/code&gt; with &lt;code&gt;NumCPU * 4&lt;/code&gt; max connections. The gap between them is mostly the overhead of Fiber's framework layer, but when the bottleneck shifts to the database, that overhead matters less.&lt;/p&gt;
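The shared pool sizing looks roughly like this, assuming pgx v5's pgxpool API (the DSN is a placeholder, not the benchmark's connection string):

```go
package main

import (
	"context"
	"fmt"
	"log"
	"runtime"

	"github.com/jackc/pgx/v5/pgxpool"
)

func main() {
	// DSN is a placeholder for illustration only.
	cfg, err := pgxpool.ParseConfig("postgres://user:pass@localhost:5432/bench")
	if err != nil {
		log.Fatal(err)
	}

	// Both the Fiber and go-fasthttp entries cap the pool at NumCPU * 4.
	cfg.MaxConns = int32(runtime.NumCPU() * 4)

	pool, err := pgxpool.NewWithConfig(context.Background(), cfg)
	if err != nil {
		log.Fatal(err)
	}
	defer pool.Close()

	fmt.Println("max conns:", cfg.MaxConns)
}
```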

&lt;h2&gt;
  
  
  The Memory Story
&lt;/h2&gt;

&lt;p&gt;Let's talk about what Fiber does really well. Across every single test, Fiber uses less memory than any other Go framework:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Baseline 4K:&lt;/strong&gt; 144 MB (vs gin's 375 MB)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pipelined 4K:&lt;/strong&gt; 96 MB (vs gin's 1 GB)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;JSON 4K:&lt;/strong&gt; 171 MB (vs gin's 433 MB)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mixed 4K:&lt;/strong&gt; 761 MB (vs go-fasthttp's 79.9 GB)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Uploads 256:&lt;/strong&gt; 296 MB (vs echo's 541 MB)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Limited Conn:&lt;/strong&gt; 68 MB (vs go-fasthttp's 100 MB)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is fasthttp's zero-allocation philosophy showing through. Even with Fiber's middleware layer on top, the underlying engine reuses buffers aggressively and avoids heap allocations. In constrained environments — containers, edge deployments, shared hosting — this matters more than raw throughput.&lt;/p&gt;

&lt;h2&gt;
  
  
  Uploads: Beating fasthttp
&lt;/h2&gt;

&lt;p&gt;Here's a fun one. File uploads at 256 connections:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Framework&lt;/th&gt;
&lt;th&gt;Requests/sec&lt;/th&gt;
&lt;th&gt;Memory&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;echo&lt;/td&gt;
&lt;td&gt;1,334&lt;/td&gt;
&lt;td&gt;541 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;chi&lt;/td&gt;
&lt;td&gt;1,326&lt;/td&gt;
&lt;td&gt;509 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gin&lt;/td&gt;
&lt;td&gt;1,320&lt;/td&gt;
&lt;td&gt;582 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;fiber&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1,222&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;296 MB&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;go-fasthttp&lt;/td&gt;
&lt;td&gt;910&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;15.5 GB&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Fiber is 4th, but go-fasthttp is &lt;em&gt;last&lt;/em&gt; — and using 15.5 GB of RAM to process uploads at only 910 rps. Fiber handles uploads with &lt;code&gt;StreamRequestBody: true&lt;/code&gt; and &lt;code&gt;c.Request().BodyStream()&lt;/code&gt;, which streams the body to &lt;code&gt;/dev/null&lt;/code&gt; efficiently. The net/http-based frameworks (gin, echo, chi) do slightly better here, but fasthttp's approach of reading the entire body into memory is catastrophic for large file uploads.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture Deep Dive
&lt;/h2&gt;

&lt;p&gt;Looking at Fiber's HttpArena implementation:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The good:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;StreamRequestBody: true&lt;/code&gt; — avoids buffering entire request bodies&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;BodyLimit: 25 * 1024 * 1024&lt;/code&gt; — explicit limits prevent OOM&lt;/li&gt;
&lt;li&gt;Pre-computed JSON for the compression endpoint (&lt;code&gt;jsonLargeResponse&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Static files loaded into memory at startup (&lt;code&gt;staticFiles&lt;/code&gt; map)&lt;/li&gt;
&lt;li&gt;Using &lt;code&gt;pgxpool&lt;/code&gt; for async DB connections, &lt;code&gt;modernc.org/sqlite&lt;/code&gt; for sync DB&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The concerning:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Global compression middleware penalizes all endpoints&lt;/li&gt;
&lt;li&gt;Single &lt;code&gt;app.Listen()&lt;/code&gt; vs go-fasthttp's per-core listeners with &lt;code&gt;SO_REUSEPORT&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;JSON endpoint allocates new slices per request (could pre-compute like the compression endpoint)&lt;/li&gt;
&lt;li&gt;Manual &lt;code&gt;json.Marshal&lt;/code&gt; instead of writing directly to the response writer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What could be improved:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Apply compression only to the &lt;code&gt;/compression&lt;/code&gt; endpoint&lt;/li&gt;
&lt;li&gt;Use Fiber's &lt;code&gt;Prefork&lt;/code&gt; mode (built-in!) to match go-fasthttp's multi-listener approach&lt;/li&gt;
&lt;li&gt;Pre-compute the JSON response like the compression response&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;sonic&lt;/code&gt; or &lt;code&gt;go-json&lt;/code&gt; instead of &lt;code&gt;encoding/json&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Fiber actually has a &lt;code&gt;Prefork: true&lt;/code&gt; config option that does &lt;code&gt;SO_REUSEPORT&lt;/code&gt; under the hood. The benchmark implementation doesn't use it. That alone could close a significant chunk of the gap with raw fasthttp.&lt;/p&gt;
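Enabling it is a one-line config change. A minimal sketch, assuming Fiber v2's API:

```go
package main

import (
	"log"

	"github.com/gofiber/fiber/v2"
)

func main() {
	// Prefork forks one child process per CPU core, each binding its own
	// listener with SO_REUSEPORT -- the same kernel-level connection
	// distribution the go-fasthttp entry wires up by hand.
	app := fiber.New(fiber.Config{Prefork: true})

	app.Get("/pipeline", func(c *fiber.Ctx) error {
		return c.SendString("ok")
	})

	log.Fatal(app.Listen(":8080"))
}
```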

&lt;h2&gt;
  
  
  The Verdict: Who Should Use Fiber?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Fiber is perfect for you if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You want Express-like ergonomics in Go&lt;/li&gt;
&lt;li&gt;Memory efficiency matters (containers, K8s with resource limits)&lt;/li&gt;
&lt;li&gt;You're building APIs that do real work (DB queries, mixed workloads) rather than pure I/O benchmarks&lt;/li&gt;
&lt;li&gt;You want a single framework that handles compression, routing, and middleware cleanly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Consider raw fasthttp if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You need maximum pipelining throughput (17.8M vs 623K rps is hard to ignore)&lt;/li&gt;
&lt;li&gt;You're building a proxy or gateway where every microsecond counts&lt;/li&gt;
&lt;li&gt;You don't mind manual routing and zero framework niceties&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Consider gin/echo if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You want net/http compatibility and the broader Go ecosystem&lt;/li&gt;
&lt;li&gt;Upload performance matters more than memory efficiency&lt;/li&gt;
&lt;li&gt;You're more comfortable with net/http patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Fiber occupies an interesting niche: it's the &lt;strong&gt;most memory-efficient Go framework&lt;/strong&gt; while being &lt;strong&gt;competitive in real-world mixed workloads&lt;/strong&gt;. It's not the throughput king, and it probably shouldn't be — that's not what frameworks are for. Frameworks trade raw speed for developer experience. Fiber makes that trade while keeping memory usage remarkably low.&lt;/p&gt;

&lt;p&gt;The 28x pipelining gap is eye-catching, but pipelining is a synthetic benchmark that few production workloads actually use. In mixed workloads — which better simulate real APIs — Fiber beats gin by 80% while using less RAM.&lt;/p&gt;

&lt;p&gt;That's a framework doing its job well.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;All data from &lt;a href="https://mda2av.github.io/HttpArena/" rel="noopener noreferrer"&gt;HttpArena&lt;/a&gt; (&lt;a href="https://github.com/MDA2AV/HttpArena" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;). Test environment: 64 threads, various connection counts. Check the site for the full leaderboards and methodology.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>go</category>
      <category>webdev</category>
      <category>performance</category>
      <category>benchmarks</category>
    </item>
    <item>
      <title>GenHTTP vs ASP.NET Minimal APIs: The C# Benchmark Showdown Nobody Expected</title>
      <dc:creator>Benny</dc:creator>
      <pubDate>Fri, 27 Mar 2026 18:46:34 +0000</pubDate>
      <link>https://dev.to/fbio_reis_355b87b508598e/genhttp-vs-aspnet-minimal-apis-the-c-benchmark-showdown-nobody-expected-1dhf</link>
      <guid>https://dev.to/fbio_reis_355b87b508598e/genhttp-vs-aspnet-minimal-apis-the-c-benchmark-showdown-nobody-expected-1dhf</guid>
      <description>&lt;p&gt;Two C# frameworks walk into a benchmark. One is the industry standard backed by Microsoft. The other is a scrappy indie framework most .NET developers have never heard of. You'd expect a blowout — and you'd be right. You just might be wrong about &lt;em&gt;who gets blown out where&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://mda2av.github.io/HttpArena/" rel="noopener noreferrer"&gt;HttpArena&lt;/a&gt; recently added both &lt;strong&gt;GenHTTP&lt;/strong&gt; and &lt;strong&gt;ASP.NET Minimal APIs&lt;/strong&gt; to their benchmark suite, and the results tell a story that's way more interesting than "Microsoft wins." Let's dig in.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Contenders
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;ASP.NET Minimal APIs&lt;/strong&gt; needs no introduction. It's Microsoft's lightweight API framework running on Kestrel, the battle-tested HTTP server that powers half the internet's .NET workloads. Minimal APIs strip away controllers and give you &lt;code&gt;app.MapGet("/route", handler)&lt;/code&gt; — clean, fast, no ceremony.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GenHTTP&lt;/strong&gt; is a modular, embeddable C# web server that runs its own HTTP engine. It layers on abstractions — layouts, services, concerns, resource methods — giving you a higher-level programming model. Think "convention over configuration" but for HTTP servers.&lt;/p&gt;

&lt;p&gt;Both target &lt;strong&gt;.NET 10&lt;/strong&gt;. Both speak C#. Same runtime, same GC, same JIT. So any performance delta comes down to the framework itself.&lt;/p&gt;

&lt;p&gt;Let the games begin.&lt;/p&gt;

&lt;h2&gt;
  
  
  Round 1: Baseline — The 10x Gap Nobody Expected
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Concurrency&lt;/th&gt;
&lt;th&gt;GenHTTP&lt;/th&gt;
&lt;th&gt;ASP.NET Minimal&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;512&lt;/td&gt;
&lt;td&gt;49,507 req/s&lt;/td&gt;
&lt;td&gt;495,831 req/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4,096&lt;/td&gt;
&lt;td&gt;48,551 req/s&lt;/td&gt;
&lt;td&gt;459,783 req/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16,384&lt;/td&gt;
&lt;td&gt;22,169 req/s&lt;/td&gt;
&lt;td&gt;353,394 req/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Yeah. Read that again. ASP.NET is doing &lt;strong&gt;10x&lt;/strong&gt; the requests per second on a simple GET-with-query-params endpoint.&lt;/p&gt;

&lt;p&gt;This is where GenHTTP pays the tax for its abstraction layer. Look at the code — ASP.NET's handler is a direct &lt;code&gt;IResult&lt;/code&gt; return. GenHTTP routes through a &lt;code&gt;LayoutBuilder&lt;/code&gt;, resolves a &lt;code&gt;ServiceResource&lt;/code&gt;, matches a &lt;code&gt;ResourceMethod&lt;/code&gt; attribute, and wraps things in a &lt;code&gt;Concern&lt;/code&gt; pipeline. Every request walks through that entire abstraction stack.&lt;/p&gt;

&lt;p&gt;Is it elegant? Sure. Is it 350,000 requests per second worth of overhead? Also yes.&lt;/p&gt;

&lt;p&gt;To be fair, ~50K req/s is still plenty fast for most real applications. But if your use case is "handle a simple request as fast as physically possible," ASP.NET Minimal isn't even trying and it's lapping GenHTTP.&lt;/p&gt;

&lt;h2&gt;
  
  
  Round 2: Pipelined — Wait, They're &lt;em&gt;Tied&lt;/em&gt;?
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Concurrency&lt;/th&gt;
&lt;th&gt;GenHTTP&lt;/th&gt;
&lt;th&gt;ASP.NET Minimal&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;512&lt;/td&gt;
&lt;td&gt;12,349,888 req/s&lt;/td&gt;
&lt;td&gt;12,790,124 req/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4,096&lt;/td&gt;
&lt;td&gt;13,148,857 req/s&lt;/td&gt;
&lt;td&gt;13,748,303 req/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16,384&lt;/td&gt;
&lt;td&gt;10,999,584 req/s&lt;/td&gt;
&lt;td&gt;12,741,100 req/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Thirteen. Million. Requests per second. From &lt;em&gt;both&lt;/em&gt; of them.&lt;/p&gt;

&lt;p&gt;The pipelined test is the great equalizer. When you shove requests down a persistent connection as fast as TCP allows, you're basically benchmarking the I/O layer, not the framework. And GenHTTP, despite its abstraction overhead, keeps pace with ASP.NET within ~15%.&lt;/p&gt;

&lt;p&gt;This tells us something important: GenHTTP has solid I/O fundamentals. The bottleneck in baseline isn't bad networking code — it's the request processing pipeline on top. Good bones, heavy coat.&lt;/p&gt;

&lt;h2&gt;
  
  
  Round 3: Upload — The 23GB Elephant in the Room
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Concurrency&lt;/th&gt;
&lt;th&gt;GenHTTP (req/s / RAM)&lt;/th&gt;
&lt;th&gt;ASP.NET Minimal (req/s / RAM)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;64&lt;/td&gt;
&lt;td&gt;609 / 429MB&lt;/td&gt;
&lt;td&gt;184 / &lt;strong&gt;23.2GB&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;256&lt;/td&gt;
&lt;td&gt;630 / 591MB&lt;/td&gt;
&lt;td&gt;187 / &lt;strong&gt;24.1GB&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;512&lt;/td&gt;
&lt;td&gt;622 / 862MB&lt;/td&gt;
&lt;td&gt;160 / &lt;strong&gt;22.7GB&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;I need you to look at this table and understand what you're seeing. ASP.NET Minimal is using &lt;strong&gt;twenty-three gigabytes of RAM&lt;/strong&gt; to handle file uploads at 160 requests per second. GenHTTP handles nearly 4x the throughput while sipping under a gig.&lt;/p&gt;

&lt;p&gt;What is happening here? Let's look at the code.&lt;/p&gt;

&lt;p&gt;ASP.NET's upload handler:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;IResult&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;Upload&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;HttpRequest&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;var&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;MemoryStream&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;CopyToAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ms&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;Results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ms&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Length&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ToString&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It copies the &lt;em&gt;entire request body into a MemoryStream&lt;/em&gt;. Every upload, fully buffered in RAM. With concurrent uploads, you're stacking multi-megabyte buffers on top of each other. The GC is crying.&lt;/p&gt;

&lt;p&gt;GenHTTP's approach:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="n"&gt;ValueTask&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;long&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;Compute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Stream&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CanSeek&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ValueTask&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;FromResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Length&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;ComputeManually&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;GenHTTP's engine provides a seekable stream, so it just reads the length directly — zero buffering, zero copying. When running on Kestrel, it falls back to reading chunks through a small buffer. Either way, it's dramatically more memory-efficient.&lt;/p&gt;

&lt;p&gt;Now, is this an ASP.NET framework problem? Not exactly — it's a handler implementation choice. You &lt;em&gt;could&lt;/em&gt; write a streaming handler in ASP.NET. But the fact that the "obvious" way to handle uploads in Minimal APIs leads to 23GB of RAM usage is... not great. GenHTTP's abstraction actually &lt;em&gt;protects&lt;/em&gt; you from this footgun by giving you a smarter stream interface out of the box.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GenHTTP wins this round by a country mile&lt;/strong&gt;, and it's not even close.&lt;/p&gt;

&lt;h2&gt;
  
  
  Round 4: JSON Serialization — The Underdog Bites
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Concurrency&lt;/th&gt;
&lt;th&gt;GenHTTP&lt;/th&gt;
&lt;th&gt;ASP.NET Minimal&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;4,096&lt;/td&gt;
&lt;td&gt;582,510 req/s&lt;/td&gt;
&lt;td&gt;515,135 req/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16,384&lt;/td&gt;
&lt;td&gt;561,453 req/s&lt;/td&gt;
&lt;td&gt;440,034 req/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Wait. The framework that was 10x slower on baseline is now &lt;strong&gt;beating&lt;/strong&gt; ASP.NET on JSON serialization? By 13% at 4K connections and 28% at 16K?&lt;/p&gt;

&lt;p&gt;The implementations are virtually identical — both deserialize a dataset, transform it, and serialize the response. Same &lt;code&gt;System.Text.Json&lt;/code&gt; under the hood. The difference likely comes down to how each framework handles response writing and buffering for larger payloads. GenHTTP's response pipeline may have less overhead when serializing structured data compared to ASP.NET's &lt;code&gt;Results.Json()&lt;/code&gt; wrapper.&lt;/p&gt;

&lt;p&gt;This is where the "faster framework" narrative breaks down. Performance isn't one number. It's a &lt;em&gt;profile&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Round 5: Compression — GenHTTP Keeps Winning
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Concurrency&lt;/th&gt;
&lt;th&gt;GenHTTP&lt;/th&gt;
&lt;th&gt;ASP.NET Minimal&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;4,096&lt;/td&gt;
&lt;td&gt;8,565 req/s&lt;/td&gt;
&lt;td&gt;7,183 req/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16,384&lt;/td&gt;
&lt;td&gt;7,572 req/s&lt;/td&gt;
&lt;td&gt;5,949 req/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;GenHTTP leads by 19-27% on compressed responses. Both use gzip at the fastest compression level, so the delta is in how they pipe compressed bytes to the wire. Another quiet win for the underdog.&lt;/p&gt;

&lt;h2&gt;
  
  
  Round 6: Noisy Neighbors — Dead Heat
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Concurrency&lt;/th&gt;
&lt;th&gt;GenHTTP&lt;/th&gt;
&lt;th&gt;ASP.NET Minimal&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;512&lt;/td&gt;
&lt;td&gt;1,080,435 req/s&lt;/td&gt;
&lt;td&gt;1,110,339 req/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4,096&lt;/td&gt;
&lt;td&gt;1,218,298 req/s&lt;/td&gt;
&lt;td&gt;1,210,574 req/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16,384&lt;/td&gt;
&lt;td&gt;1,017,992 req/s&lt;/td&gt;
&lt;td&gt;1,007,141 req/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Under CPU contention with background noise, they trade punches within 3%. At a million requests per second each, both frameworks handle stress gracefully. No complaints from either corner.&lt;/p&gt;

&lt;h2&gt;
  
  
  Round 7: Mixed Workload &amp;amp; Database — GenHTTP's Sweet Spot
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Test&lt;/th&gt;
&lt;th&gt;GenHTTP&lt;/th&gt;
&lt;th&gt;ASP.NET Minimal&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Mixed 4,096c&lt;/td&gt;
&lt;td&gt;7,962 req/s&lt;/td&gt;
&lt;td&gt;6,650 req/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mixed 16,384c&lt;/td&gt;
&lt;td&gt;9,269 req/s&lt;/td&gt;
&lt;td&gt;6,240 req/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Async DB 512c&lt;/td&gt;
&lt;td&gt;138,179 req/s&lt;/td&gt;
&lt;td&gt;122,812 req/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Async DB 1,024c&lt;/td&gt;
&lt;td&gt;161,488 req/s&lt;/td&gt;
&lt;td&gt;147,924 req/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;GenHTTP pulls ahead by 20-49% on mixed workloads and 9-12% on async database queries. For realistic API workloads that juggle multiple endpoint types and database access — you know, &lt;em&gt;actual applications&lt;/em&gt; — GenHTTP holds its own and then some.&lt;/p&gt;

&lt;h2&gt;
  
  
  Round 8: HTTP/2 and HTTP/3 — GenHTTP Sits This One Out
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Test&lt;/th&gt;
&lt;th&gt;ASP.NET Minimal&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;H2 Baseline 256c&lt;/td&gt;
&lt;td&gt;254,371 req/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;H2 Baseline 1,024c&lt;/td&gt;
&lt;td&gt;203,325 req/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;H3 Baseline 256c&lt;/td&gt;
&lt;td&gt;41,922 req/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;ASP.NET supports HTTP/2 and HTTP/3 out of the box via Kestrel's QUIC integration. GenHTTP doesn't participate in these tests — it's HTTP/1.1 only for now. If you need modern protocol support, that's a clear win for ASP.NET.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Verdict: It's Complicated (Obviously)
&lt;/h2&gt;

&lt;p&gt;If you looked at the baseline numbers alone, you'd conclude ASP.NET Minimal is in a different league. And for raw request routing throughput, it is.&lt;/p&gt;

&lt;p&gt;But real applications don't just route requests. They serialize JSON, compress responses, handle file uploads, query databases, and juggle mixed workloads. And in those scenarios, GenHTTP frequently &lt;strong&gt;wins&lt;/strong&gt; — sometimes by significant margins.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose ASP.NET Minimal if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Raw request throughput is critical&lt;/li&gt;
&lt;li&gt;You need HTTP/2 or HTTP/3 support&lt;/li&gt;
&lt;li&gt;You want the massive .NET ecosystem and tooling&lt;/li&gt;
&lt;li&gt;You're building at scale and need Microsoft's backing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Choose GenHTTP if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You want a lightweight, embeddable server&lt;/li&gt;
&lt;li&gt;Memory efficiency matters (looking at you, upload numbers)&lt;/li&gt;
&lt;li&gt;Your workload is JSON-heavy, database-driven, or mixed&lt;/li&gt;
&lt;li&gt;You appreciate a higher-level API that handles footguns for you&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The most interesting takeaway? A small indie framework with a handful of contributors is genuinely competitive with — and sometimes faster than — one of the most optimized web stacks on the planet. That's impressive engineering from the GenHTTP team.&lt;/p&gt;

&lt;p&gt;Performance isn't a single number. It's a conversation. And this one just got a lot more interesting.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;All benchmarks from &lt;a href="https://mda2av.github.io/HttpArena/" rel="noopener noreferrer"&gt;HttpArena&lt;/a&gt; — an open-source, reproducible HTTP benchmark suite. Full source code and methodology available on &lt;a href="https://github.com/MDA2AV/HttpArena" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>csharp</category>
      <category>dotnet</category>
      <category>performance</category>
      <category>benchmarks</category>
    </item>
    <item>
      <title>Bun HTTP Server: #1 in Mixed Workloads, #41 in Pipelining — The Full Picture (HttpArena Deep Dive)</title>
      <dc:creator>Benny</dc:creator>
      <pubDate>Fri, 27 Mar 2026 18:34:40 +0000</pubDate>
      <link>https://dev.to/fbio_reis_355b87b508598e/bun-http-server-1-in-mixed-workloads-41-in-pipelining-the-full-picture-httparena-deep-dive-4h6e</link>
      <guid>https://dev.to/fbio_reis_355b87b508598e/bun-http-server-1-in-mixed-workloads-41-in-pipelining-the-full-picture-httparena-deep-dive-4h6e</guid>
      <description>&lt;p&gt;Every few months, someone posts "Bun is fast" on Twitter and the replies turn into a warzone. Node fans say it doesn't matter. Deno fans say their runtime is better. Rust folks just post flamegraphs.&lt;/p&gt;

&lt;p&gt;So let's look at actual numbers. I ran Bun through &lt;a href="https://mda2av.github.io/HttpArena/" rel="noopener noreferrer"&gt;HttpArena&lt;/a&gt;, an open-source benchmark suite that tests HTTP frameworks across a bunch of real-world-ish scenarios — not just "hello world" in a loop. We're talking baseline throughput, pipelining, JSON serialization, compression, mixed workloads, uploads, noisy neighbor tolerance, and more.&lt;/p&gt;

&lt;p&gt;The results are... honestly fascinating. Bun is a study in contrasts.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Bun?
&lt;/h2&gt;

&lt;p&gt;If you've been living under a rock: &lt;a href="https://github.com/oven-sh/bun" rel="noopener noreferrer"&gt;Bun&lt;/a&gt; is a JavaScript/TypeScript runtime built from scratch using &lt;strong&gt;JavaScriptCore&lt;/strong&gt; (Safari's engine) instead of V8. It's written in Zig and aims to be a drop-in replacement for Node.js — but faster at everything.&lt;/p&gt;

&lt;p&gt;Its built-in HTTP server (&lt;code&gt;Bun.serve()&lt;/code&gt;) skips the Node.js &lt;code&gt;http&lt;/code&gt; module entirely and goes straight to optimized native code. In the HttpArena benchmark, the implementation uses &lt;code&gt;reusePort&lt;/code&gt; and spawns one Bun process per CPU core — simple multiprocess scaling with no clustering library needed.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Headline Numbers
&lt;/h2&gt;

&lt;p&gt;Let me just lay out where Bun landed across the key tests (at 4,096 connections unless noted):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Test&lt;/th&gt;
&lt;th&gt;Rank&lt;/th&gt;
&lt;th&gt;RPS&lt;/th&gt;
&lt;th&gt;Latency (avg)&lt;/th&gt;
&lt;th&gt;Memory&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Baseline&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;#13/51&lt;/td&gt;
&lt;td&gt;1,557,305&lt;/td&gt;
&lt;td&gt;2.62ms&lt;/td&gt;
&lt;td&gt;2.2 GiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pipelined&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;#41/51&lt;/td&gt;
&lt;td&gt;491,345&lt;/td&gt;
&lt;td&gt;106.30ms&lt;/td&gt;
&lt;td&gt;2.0 GiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;JSON&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;#9/50&lt;/td&gt;
&lt;td&gt;708,960&lt;/td&gt;
&lt;td&gt;4.58ms&lt;/td&gt;
&lt;td&gt;2.9 GiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Compression&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;#2/49&lt;/td&gt;
&lt;td&gt;15,804&lt;/td&gt;
&lt;td&gt;251.28ms&lt;/td&gt;
&lt;td&gt;3.3 GiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Mixed workload&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;#1/47&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;52,274&lt;/td&gt;
&lt;td&gt;72.41ms&lt;/td&gt;
&lt;td&gt;6.1 GiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Noisy neighbor&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;#11/47&lt;/td&gt;
&lt;td&gt;1,939,652&lt;/td&gt;
&lt;td&gt;1.25ms&lt;/td&gt;
&lt;td&gt;2.3 GiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Limited conn&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;#6/51&lt;/td&gt;
&lt;td&gt;1,388,768&lt;/td&gt;
&lt;td&gt;2.85ms&lt;/td&gt;
&lt;td&gt;2.4 GiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Upload (256 conn)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;#42/48&lt;/td&gt;
&lt;td&gt;264&lt;/td&gt;
&lt;td&gt;866.85ms&lt;/td&gt;
&lt;td&gt;10.3 GiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;H2 baseline (256)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;#18/21&lt;/td&gt;
&lt;td&gt;378,032&lt;/td&gt;
&lt;td&gt;72.87ms&lt;/td&gt;
&lt;td&gt;2.2 GiB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Read that again. &lt;strong&gt;#1 in mixed workloads. #2 in compression. But #41 in pipelining and #42 in uploads.&lt;/strong&gt; That's a wild range for a single runtime.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Bun Dominates
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Mixed Workload: The Overall Champion
&lt;/h3&gt;

&lt;p&gt;The mixed workload test is the closest thing to a "real app" benchmark — it combines baseline requests, JSON serialization, compression, static file serving, and database queries all in one stream. And Bun sits at #1.&lt;/p&gt;

&lt;p&gt;Bun beats go-fasthttp, which is usually a throughput monster. And it does it with &lt;strong&gt;6.1 GiB of RAM&lt;/strong&gt; vs go-fasthttp's absurd &lt;strong&gt;80.7 GiB&lt;/strong&gt;. That's over 13x more memory-efficient.&lt;/p&gt;

&lt;p&gt;Three of the top 5 run on the Bun runtime (bun, Elysia, Hono). The Bun ecosystem basically owns this test.&lt;/p&gt;

&lt;p&gt;Why? My theory: Bun's built-in gzip (&lt;code&gt;Bun.gzipSync()&lt;/code&gt;), native JSON handling, and pre-loaded static files all contribute. When you mix these operations together, Bun's "everything is native" approach pays off versus frameworks that rely on separate npm packages for each concern.&lt;/p&gt;

&lt;h3&gt;
  
  
  Compression: Natively Fast Gzip
&lt;/h3&gt;

&lt;p&gt;Bun's &lt;code&gt;Bun.gzipSync()&lt;/code&gt; at compression level 1 keeps it competitive with Rust's salvo and ahead of nearly everything else. Deno edges it out here (probably because Deno's compression pipeline is also quite optimized), but check the memory: Bun uses &lt;strong&gt;3.3 GiB&lt;/strong&gt; vs Deno's &lt;strong&gt;12.8 GiB&lt;/strong&gt;. Nearly 4x more efficient.&lt;/p&gt;

&lt;p&gt;The implementation is elegant too: pre-compute the JSON buffer once at startup, then compress it per request.&lt;/p&gt;

&lt;h3&gt;
  
  
  JSON Serialization: Top 10 Overall
&lt;/h3&gt;

&lt;p&gt;At #9/50 with &lt;strong&gt;708,960 rps&lt;/strong&gt;, Bun is the fastest JS/TS runtime for JSON workloads (tied with Elysia, which itself runs on Bun).&lt;/p&gt;

&lt;p&gt;Bun is &lt;strong&gt;~20% faster than Node&lt;/strong&gt; and &lt;strong&gt;~23% faster than Fastify&lt;/strong&gt; at JSON. Not the 10x improvement some marketing suggests, but a solid, consistent edge.&lt;/p&gt;

&lt;h3&gt;
  
  
  Limited Connections &amp;amp; Noisy Neighbors
&lt;/h3&gt;

&lt;p&gt;Bun handles constrained scenarios well. At limited connections (#6/51, 1,388,768 rps), it beats everything in the JS/TS world by a comfortable margin — Elysia is next at #12. Under noisy neighbor conditions (#11/47, ~1.94M rps), Bun stays stable and leads the JS/TS pack again.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Bun Struggles
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Pipelining: The Big Miss
&lt;/h3&gt;

&lt;p&gt;This is the elephant in the room. &lt;strong&gt;#41 out of 51 frameworks&lt;/strong&gt; in pipelining at 4,096 connections, with only 491,345 rps and 106ms average latency.&lt;/p&gt;

&lt;p&gt;Node.js is nearly &lt;strong&gt;5x faster&lt;/strong&gt; than Bun at pipelining. Even Express — &lt;em&gt;Express!&lt;/em&gt; — only drops to #42. The entire Bun ecosystem (bun, Elysia, Hono) clusters at the bottom.&lt;/p&gt;

&lt;p&gt;What's happening? HTTP pipelining sends multiple requests over a single TCP connection without waiting for responses. Bun's &lt;code&gt;Bun.serve()&lt;/code&gt; likely processes requests one at a time per connection rather than batching pipelined requests. Node's &lt;code&gt;http&lt;/code&gt; module has years of pipelining optimization baked in.&lt;/p&gt;
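&lt;p&gt;To make the mechanism concrete, here's what pipelining means on the wire (an illustrative sketch, not HttpArena's actual load generator; the depth of 16 matches the benchmark's pipelining profile):&lt;/p&gt;

```typescript
// A pipelining client writes several complete requests back to back on one
// connection before reading any response. A pipelining-friendly server
// parses the whole buffer and answers in order; a server that handles one
// request per read serializes them and latency balloons.
function buildPipelinedBatch(path: string, count: number): string {
  const single = "GET " + path + " HTTP/1.1\r\nHost: localhost\r\n\r\n";
  return single.repeat(count); // sent as a single write
}

const batch = buildPipelinedBatch("/", 16);
console.log(batch.split("\r\n\r\n").length - 1); // 16 complete requests
```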

&lt;p&gt;Is this a dealbreaker? Honestly, &lt;strong&gt;not for most real apps&lt;/strong&gt;. HTTP pipelining is rarely used in production (browsers don't even support it over HTTP/1.1 anymore). But if you're building an internal service-to-service API where clients pipeline aggressively, this matters.&lt;/p&gt;

&lt;h3&gt;
  
  
  Uploads: Surprisingly Weak
&lt;/h3&gt;

&lt;p&gt;At 256 concurrent connections uploading data, Bun lands at &lt;strong&gt;#42/48&lt;/strong&gt; with only 264 rps and 867ms latency, while using 10.3 GiB of memory.&lt;/p&gt;

&lt;p&gt;Node is &lt;strong&gt;3.5x faster&lt;/strong&gt; at uploads. Express — the framework everyone loves to call slow — handles uploads 3.6x faster than Bun. This suggests Bun's request body reading has significant overhead for large payloads, or that there's a memory management issue when buffering upload data.&lt;/p&gt;

&lt;h3&gt;
  
  
  HTTP/2: Not There Yet
&lt;/h3&gt;

&lt;p&gt;Bun's H2 support exists but isn't competitive: at 256 connections it manages 378,032 rps, ranking #18 of 21.&lt;/p&gt;

&lt;p&gt;Node.js beats Bun by almost &lt;strong&gt;4x&lt;/strong&gt; in HTTP/2. Even at higher connection counts (1,024), Bun only reaches 558,342 rps. If you're doing H2-heavy work, Node is the better runtime right now.&lt;/p&gt;

&lt;h2&gt;
  
  
  The "Bun Ecosystem" Effect
&lt;/h2&gt;

&lt;p&gt;One of the coolest things the data reveals: frameworks running &lt;strong&gt;on&lt;/strong&gt; Bun tend to perform very similarly to bare Bun. At 4,096 connections:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Framework&lt;/th&gt;
&lt;th&gt;Baseline RPS&lt;/th&gt;
&lt;th&gt;JSON RPS&lt;/th&gt;
&lt;th&gt;Mixed RPS&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;bun (bare)&lt;/td&gt;
&lt;td&gt;1,557,305&lt;/td&gt;
&lt;td&gt;708,960&lt;/td&gt;
&lt;td&gt;52,274&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Elysia&lt;/td&gt;
&lt;td&gt;1,458,341&lt;/td&gt;
&lt;td&gt;722,557&lt;/td&gt;
&lt;td&gt;51,251&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hono (Bun)&lt;/td&gt;
&lt;td&gt;1,242,917&lt;/td&gt;
&lt;td&gt;662,019&lt;/td&gt;
&lt;td&gt;49,378&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The abstraction cost of a framework on top of Bun is remarkably small — maybe 5-20%. Compare that to Node.js, where Express is &lt;strong&gt;76% slower&lt;/strong&gt; than bare Node in baseline. Bun's &lt;code&gt;Bun.serve()&lt;/code&gt; API is apparently so close to what frameworks need that there's minimal overhead in wrapping it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture: What Makes It Tick
&lt;/h2&gt;

&lt;p&gt;Looking at the &lt;a href="https://github.com/MDA2AV/HttpArena/tree/main/frameworks/bun" rel="noopener noreferrer"&gt;HttpArena implementation&lt;/a&gt;, a few things stand out:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multi-process via reusePort&lt;/strong&gt;: The entrypoint script spawns one Bun process per CPU core, each calling &lt;code&gt;Bun.serve()&lt;/code&gt; with &lt;code&gt;reusePort&lt;/code&gt; enabled. The kernel load-balances connections across processes. Simple, effective, no IPC overhead.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Everything pre-loaded&lt;/strong&gt;: Static files, datasets, and the SQLite database are all loaded at startup. The JSON endpoint re-processes the dataset per request (as the benchmark requires), but the raw data is already in memory.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bun.gzipSync() over zlib&lt;/strong&gt;: The compression endpoint uses Bun's native gzip instead of Node's zlib bindings. This is why compression performance is stellar — it's going through Zig's optimized zlib implementation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Minimal dependencies&lt;/strong&gt;: The only external package is a PostgreSQL client. Everything else — HTTP serving, SQLite, gzip, file reading — uses Bun built-ins. Fewer layers, fewer places for overhead to hide.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who Should Use Bun?
&lt;/h2&gt;

&lt;p&gt;Based on these numbers, Bun is a strong choice if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Your app does a mix of things&lt;/strong&gt; (JSON, compression, static files, DB queries) — Bun literally wins this category&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You want good JSON throughput&lt;/strong&gt; without leaving the JS/TS ecosystem&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You care about memory efficiency&lt;/strong&gt; — Bun consistently uses less RAM than Node/Deno at similar throughput&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You want minimal deps&lt;/strong&gt; — built-in SQLite, gzip, and HTTP server reduce your node_modules&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Think twice if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;You need HTTP/2 performance&lt;/strong&gt; — Node is 4x faster today&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You handle lots of uploads&lt;/strong&gt; — Node/Express handle it much better&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You're building pipelining-heavy internal services&lt;/strong&gt; — unlikely, but if so, Node or Deno serve this better&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;Bun isn't uniformly faster than everything — no runtime is. But it has a genuinely impressive &lt;strong&gt;performance shape&lt;/strong&gt;: it excels at the things most web apps actually do (mixed workloads, JSON, compression) while being memory-efficient. Its weaknesses (pipelining, uploads, H2) are in areas that matter less for typical web services.&lt;/p&gt;

&lt;p&gt;The real story isn't "Bun fast, Node slow." It's that Bun makes different tradeoffs. JavaScriptCore over V8. Native built-ins over npm packages. Simple multi-process over clustering. And for a lot of real-world use cases, those tradeoffs pay off beautifully.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;All data from &lt;a href="https://mda2av.github.io/HttpArena/" rel="noopener noreferrer"&gt;HttpArena&lt;/a&gt; (&lt;a href="https://github.com/MDA2AV/HttpArena" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;). Tests run on identical hardware with standardized configurations. Check the repo for methodology and raw data.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Previous deep dives: &lt;a href="https://dev.to/fbio_reis_355b87b508598e/actix-web-1-in-15-out-of-22-tests-dissecting-the-benchmark-king-httparena-deep-dive-114g"&gt;Actix-web&lt;/a&gt; | &lt;a href="https://dev.to/fbio_reis_355b87b508598e/go-fasthttp-the-go-framework-that-dominates-mixed-workloads-httparena-deep-dive-23mh"&gt;go-fasthttp&lt;/a&gt; | &lt;a href="https://dev.to/fbio_reis_355b87b508598e/drogon-the-c-framework-that-tops-http2-benchmarks-and-where-it-struggles-3d20"&gt;Drogon&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>performance</category>
      <category>benchmarks</category>
      <category>typescript</category>
    </item>
    <item>
      <title>Actix-web: #1 in 15 Out of 22 Tests — Dissecting the Benchmark King (HttpArena Deep Dive)</title>
      <dc:creator>Benny</dc:creator>
      <pubDate>Wed, 25 Mar 2026 14:22:15 +0000</pubDate>
      <link>https://dev.to/fbio_reis_355b87b508598e/actix-web-1-in-15-out-of-22-tests-dissecting-the-benchmark-king-httparena-deep-dive-114g</link>
      <guid>https://dev.to/fbio_reis_355b87b508598e/actix-web-1-in-15-out-of-22-tests-dissecting-the-benchmark-king-httparena-deep-dive-114g</guid>
      <description>&lt;p&gt;There's a framework that keeps showing up at the top of benchmark charts, and it's not written in C.&lt;/p&gt;

&lt;p&gt;Actix-web, Rust's battle-tested async web framework, just put up numbers in &lt;a href="https://mda2av.github.io/HttpArena/" rel="noopener noreferrer"&gt;HttpArena&lt;/a&gt; that are genuinely hard to argue with. We're talking &lt;strong&gt;#1 overall in 15 out of the 22 test profiles it competed in&lt;/strong&gt;, across 47 frameworks. Not #1 among Rust frameworks. #1 overall.&lt;/p&gt;

&lt;p&gt;Let's dig into what's going on.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Actix-web?
&lt;/h2&gt;

&lt;p&gt;Actix-web is a Rust web framework built on top of the Tokio async runtime. It's been around since ~2017, making it one of the more mature options in the Rust ecosystem. Version 4 (the one tested here) dropped the actor model dependency that gave it its name — now it's just a really fast, really ergonomic async web framework.&lt;/p&gt;

&lt;p&gt;It uses rustls for TLS (no OpenSSL dependency), compiles with thin LTO and &lt;code&gt;-O3&lt;/code&gt;, and targets native CPU instructions. The HttpArena implementation runs one worker per CPU core with a backlog of 4096.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Headline Numbers
&lt;/h2&gt;

&lt;p&gt;Let's start with where actix absolutely dominates.&lt;/p&gt;

&lt;h3&gt;
  
  
  Baseline (Plain HTTP/1.1)
&lt;/h3&gt;

&lt;p&gt;At 4,096 connections, actix hits &lt;strong&gt;2.61M requests/sec&lt;/strong&gt; with 1.57ms average latency and only 158MB of memory. For context, that puts it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;#6 overall&lt;/strong&gt; out of 47 frameworks (behind ringzero, h2o, nginx, blitz, and hyper)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;#2 among Rust frameworks&lt;/strong&gt; (hyper edges it out at 2.76M rps)&lt;/li&gt;
&lt;li&gt;Ahead of bun (1.56M), drogon (1.69M), and every Go framework&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But here's the thing — at 512 connections, actix still posts &lt;strong&gt;2.49M rps&lt;/strong&gt; with a tiny 205μs average latency and 93MB RAM. The consistency across connection counts is impressive.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pipelined Requests — Where Actix Gets Scary
&lt;/h3&gt;

&lt;p&gt;This is where things get wild. With HTTP pipelining (16 requests per connection), actix hits:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Connections&lt;/th&gt;
&lt;th&gt;Requests/sec&lt;/th&gt;
&lt;th&gt;Latency&lt;/th&gt;
&lt;th&gt;Memory&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;512&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;20.4M&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;400μs&lt;/td&gt;
&lt;td&gt;123MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4,096&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;23.0M&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2.84ms&lt;/td&gt;
&lt;td&gt;220MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16,384&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;21.4M&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;10.7ms&lt;/td&gt;
&lt;td&gt;689MB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That's &lt;strong&gt;23 million requests per second&lt;/strong&gt; at peak. #3 overall, behind only ringzero (46.8M, written in C) and blitz (39.5M, written in Zig). Actix beats hyper (16.3M), go-fasthttp (17.8M), and the entire JVM ecosystem in this test.&lt;/p&gt;

&lt;p&gt;For a framework that gives you routing, middleware, and a full request/response abstraction — doing 23M rps in pipelining is absurd.&lt;/p&gt;

&lt;h3&gt;
  
  
  JSON Serialization — The Practical Test
&lt;/h3&gt;

&lt;p&gt;The JSON test serializes a dataset, computes derived fields, and sends it back. This is closer to what a real API does.&lt;/p&gt;

&lt;p&gt;At 4,096 connections: &lt;strong&gt;1.13M rps&lt;/strong&gt;, pushing &lt;strong&gt;8.92 GB/s of bandwidth&lt;/strong&gt;. That's #3 overall, right behind hyper (1.17M) and nginx (1.14M). Actix is neck-and-neck with its own underlying HTTP library here.&lt;/p&gt;

&lt;p&gt;Interesting detail: actix uses &lt;code&gt;serde_json&lt;/code&gt; (Rust's standard JSON library) — no exotic SIMD JSON tricks. And it still hangs with nginx, which uses a highly optimized C JSON implementation. Rust's zero-cost abstractions are doing real work here.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mixed Workload — The Real World Simulation
&lt;/h3&gt;

&lt;p&gt;The mixed test combines baseline requests, JSON serialization, database queries, file uploads, and compression — all hitting the server simultaneously. This is the closest thing to a production workload in HttpArena.&lt;/p&gt;

&lt;p&gt;At 4,096 connections:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;#2 overall&lt;/strong&gt;: 75,948 rps (52ms avg latency, 2.1GB RAM)&lt;/li&gt;
&lt;li&gt;Behind only go-fasthttp at 87,964 rps (but fasthttp uses 10.2GB RAM — &lt;strong&gt;5x more memory&lt;/strong&gt;)&lt;/li&gt;
&lt;li&gt;Ahead of salvo (73.5K), bun (70.7K), and ultimate-express (63K)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At 16,384 connections, actix takes &lt;strong&gt;#1&lt;/strong&gt;: 157,549 rps. Go-fasthttp can't keep up at this connection count.&lt;/p&gt;

&lt;p&gt;The memory efficiency here is the real story. Actix handles a brutal mixed workload with 2.1GB while go-fasthttp needs 10.2GB and bun needs 5.4GB.&lt;/p&gt;

&lt;h3&gt;
  
  
  HTTP/2 Baseline
&lt;/h3&gt;

&lt;p&gt;Actix uses rustls for HTTP/2. At 256 connections: &lt;strong&gt;3.05M rps&lt;/strong&gt;, ranking #8 out of 21 HTTP/2-capable frameworks. h2o (C) leads at 14.1M, and hyper takes #2 at 8.15M.&lt;/p&gt;

&lt;p&gt;This is one of actix's weaker areas relatively speaking — the rustls + actix HTTP/2 implementation doesn't match h2o's purpose-built HTTP/2 stack. But 3M rps for HTTP/2 is still excellent in absolute terms.&lt;/p&gt;

&lt;h3&gt;
  
  
  Noisy Neighbor — Handling Bad Traffic
&lt;/h3&gt;

&lt;p&gt;The noisy test throws malformed requests, connection resets, and garbage traffic at the server alongside legitimate requests. It's a resilience test.&lt;/p&gt;

&lt;p&gt;Actix handles it beautifully: &lt;strong&gt;2.43M rps&lt;/strong&gt; at 4,096 connections (#5 overall), correctly returning 4xx for bad requests while maintaining throughput. Only the C trio (ringzero, h2o, nginx) and hyper beat it.&lt;/p&gt;

&lt;p&gt;Zero 5xx errors. Zero crashes. That's Rust's memory safety paying dividends — no segfaults from malformed input, no buffer overflows from garbage data.&lt;/p&gt;

&lt;h3&gt;
  
  
  Limited Connections — Efficiency Under Constraints
&lt;/h3&gt;

&lt;p&gt;With connection reuse disabled (every request opens a new TCP connection), actix hits &lt;strong&gt;1.07M rps&lt;/strong&gt; at 512 connections, ranking #8 overall. The connection setup overhead is real, but actix handles it gracefully with only 128MB of memory.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Actix Struggles
&lt;/h2&gt;

&lt;p&gt;No framework is perfect, and actix has clear weak spots.&lt;/p&gt;

&lt;h3&gt;
  
  
  Compression — Room to Grow
&lt;/h3&gt;

&lt;p&gt;At 4,096 connections: 14,220 rps, &lt;strong&gt;#8 overall&lt;/strong&gt;. Not bad, but blitz (89K rps) is 6x faster, and even deno (17.7K) and bun (15.8K) outpace it.&lt;/p&gt;

&lt;p&gt;The culprit is likely the compression middleware implementation. Actix uses &lt;code&gt;flate2&lt;/code&gt; through its &lt;code&gt;compress-gzip&lt;/code&gt; feature — solid but not cutting-edge. The 5.7GB memory usage at 4K connections also suggests the compression pipeline could be more efficient.&lt;/p&gt;

&lt;h3&gt;
  
  
  Uploads — The Weak Spot
&lt;/h3&gt;

&lt;p&gt;File uploads reveal actix's biggest weakness. At 256 connections: &lt;strong&gt;616 rps, #15 overall&lt;/strong&gt;. At 512 connections: 559 rps, #16. Spring JVM leads at 1,265 rps — more than double.&lt;/p&gt;

&lt;p&gt;The upload handler in the HttpArena implementation is simple (&lt;code&gt;web::Bytes&lt;/code&gt; → count length → respond), so this isn't a code issue. Actix's body parsing pipeline likely has overhead for large payloads that frameworks like Spring and nginx handle more efficiently.&lt;/p&gt;

&lt;h3&gt;
  
  
  H2 Static Files at Scale
&lt;/h3&gt;

&lt;p&gt;At 1,024 HTTP/2 connections serving static files: &lt;strong&gt;946K rps, #6&lt;/strong&gt;. Nginx (1.80M) and hyper (1.66M) are significantly faster. At lower connection counts actix does better (#2 at 64 connections with 1.35M rps), but it doesn't scale as well under HTTP/2 pressure.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Rust Showdown
&lt;/h2&gt;

&lt;p&gt;How does actix stack up against its Rust siblings?&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Test&lt;/th&gt;
&lt;th&gt;hyper&lt;/th&gt;
&lt;th&gt;actix&lt;/th&gt;
&lt;th&gt;salvo&lt;/th&gt;
&lt;th&gt;rocket&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Baseline 4K&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2.76M&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2.61M&lt;/td&gt;
&lt;td&gt;1.26M&lt;/td&gt;
&lt;td&gt;86K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pipelined 4K&lt;/td&gt;
&lt;td&gt;16.3M&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;23.0M&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3.3M&lt;/td&gt;
&lt;td&gt;176K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;JSON 4K&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1.17M&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1.13M&lt;/td&gt;
&lt;td&gt;781K&lt;/td&gt;
&lt;td&gt;44K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mixed 4K&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;75.9K&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;73.5K&lt;/td&gt;
&lt;td&gt;34.7K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Compression 4K&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;14.2K&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;15.3K&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;10.1K&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The pattern is clear:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;hyper&lt;/strong&gt; wins raw throughput (it's the HTTP library actix is compared against, not built on — actix has its own HTTP implementation)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;actix&lt;/strong&gt; wins pipelining and mixed workloads by a huge margin&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;salvo&lt;/strong&gt; is competitive in practical tests and wins compression&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;rocket&lt;/strong&gt; is... in a different league (trading performance for developer ergonomics)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Actix vs hyper is the most interesting comparison. Hyper is a lower-level HTTP library — less abstraction, less overhead. The fact that actix, with its routing, middleware stack, and request extraction pipeline, comes within 5-10% of hyper in most tests is remarkable. And in pipelining, actix actually crushes hyper by 41%.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reading the Implementation
&lt;/h2&gt;

&lt;p&gt;Looking at the &lt;a href="https://github.com/MDA2AV/HttpArena/tree/main/frameworks/actix" rel="noopener noreferrer"&gt;actual HttpArena implementation&lt;/a&gt;, a few things stand out:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Smart caching&lt;/strong&gt;: The JSON large dataset is pre-serialized at startup (&lt;code&gt;build_json_cache&lt;/code&gt;) and served as raw bytes for the compression test. No re-serialization per request.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Per-worker database connections&lt;/strong&gt;: Each actix worker gets its own SQLite connection with &lt;code&gt;PRAGMA mmap_size=268435456&lt;/code&gt; (256MB memory-mapped I/O). No connection pooling overhead, no cross-thread synchronization on DB access.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Static header values&lt;/strong&gt;: The &lt;code&gt;SERVER&lt;/code&gt; header is a &lt;code&gt;static HeaderValue&lt;/code&gt; — allocated once, cloned cheaply. Small thing, but at 23M rps, small things matter.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Compile-time optimization&lt;/strong&gt;: &lt;code&gt;codegen-units = 1&lt;/code&gt; + thin LTO + &lt;code&gt;target-cpu=native&lt;/code&gt; + panic=abort. This squeezes every last drop out of the compiler. The Dockerfile even uses &lt;code&gt;RUSTFLAGS="-C target-cpu=native"&lt;/code&gt; for native SIMD instructions.&lt;/p&gt;
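&lt;p&gt;If you want to replicate that setup, it corresponds to a Cargo release profile roughly like this (a sketch based on the flags listed above, not a copy of the repo's actual file):&lt;/p&gt;

```toml
# Illustrative release profile; the exact HttpArena settings live in the repo.
[profile.release]
codegen-units = 1   # one codegen unit lets LLVM optimize across the whole crate
lto = "thin"        # thin link-time optimization
panic = "abort"     # skip the unwinding machinery entirely
# target-cpu=native is passed via RUSTFLAGS in the Dockerfile, not in this file
```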

&lt;p&gt;&lt;strong&gt;Middleware approach&lt;/strong&gt;: Compression uses &lt;code&gt;actix_web::middleware::Compress::default()&lt;/code&gt; — it's applied globally, so the compression endpoint benefits from the framework's built-in gzip handling rather than manual compression.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who Should Use Actix-web?
&lt;/h2&gt;

&lt;p&gt;If you're building a Rust web service and care about performance, actix-web is the obvious choice. The numbers speak for themselves, but more importantly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;It's mature&lt;/strong&gt;: Version 4 has been stable for years. The ecosystem (middleware, extractors, websockets) is deep.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It's ergonomic&lt;/strong&gt;: Compared to hyper (which requires you to handle everything manually), actix gives you routing, middleware, typed extractors, and a clean API.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory efficiency&lt;/strong&gt;: Consistently low memory usage across all tests. Where go-fasthttp needs 10 GB for a mixed workload, actix does the same in 2 GB.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Battle-tested&lt;/strong&gt;: Powers production systems at scale. Microsoft, for example, uses actix-web internally.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The main trade-off is Rust itself — compile times, borrow checker learning curve, and a smaller hiring pool. But if you've already committed to Rust, actix-web should be your default choice for web APIs.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;Actix-web is the most complete high-performance web framework in the HttpArena benchmark. It doesn't always take #1 (the C frameworks and hyper beat it in raw throughput), but no other framework combines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Top-6 baseline performance&lt;/li&gt;
&lt;li&gt;Top-3 pipelined throughput (23M rps!)&lt;/li&gt;
&lt;li&gt;Top-3 JSON serialization&lt;/li&gt;
&lt;li&gt;Top-2 mixed workload handling&lt;/li&gt;
&lt;li&gt;Excellent memory efficiency&lt;/li&gt;
&lt;li&gt;Full framework features (routing, middleware, extractors)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The only frameworks that consistently outperform it are either bare HTTP libraries (hyper, h2o) or purpose-built C/Zig systems (ringzero, blitz, nginx) that sacrifice developer ergonomics for raw speed.&lt;/p&gt;

&lt;p&gt;For a framework that gives you &lt;code&gt;#[get("/api/items")]&lt;/code&gt; syntax and middleware stacks, doing 23 million pipelined requests per second is not normal. Actix makes it look easy.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;All benchmarks from &lt;a href="https://mda2av.github.io/HttpArena/" rel="noopener noreferrer"&gt;HttpArena&lt;/a&gt; — an open-source HTTP framework benchmark suite. Full results, methodology, and source code on &lt;a href="https://github.com/MDA2AV/HttpArena" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Got questions about the data or want to see another framework deep dive? Drop a comment!&lt;/em&gt;&lt;/p&gt;

</description>
      <category>rust</category>
      <category>webdev</category>
      <category>performance</category>
      <category>benchmarks</category>
    </item>
    <item>
      <title>go-fasthttp: The Go Framework That Dominates Mixed Workloads (HttpArena Deep Dive)</title>
      <dc:creator>Benny</dc:creator>
      <pubDate>Fri, 20 Mar 2026 17:18:40 +0000</pubDate>
      <link>https://dev.to/fbio_reis_355b87b508598e/go-fasthttp-the-go-framework-that-dominates-mixed-workloads-httparena-deep-dive-23mh</link>
      <guid>https://dev.to/fbio_reis_355b87b508598e/go-fasthttp-the-go-framework-that-dominates-mixed-workloads-httparena-deep-dive-23mh</guid>
      <description>&lt;p&gt;If you've spent any time looking at Go HTTP performance, you've probably heard of &lt;a href="https://github.com/valyala/fasthttp" rel="noopener noreferrer"&gt;fasthttp&lt;/a&gt;. It's been around for years, and its pitch is simple: it's &lt;em&gt;way&lt;/em&gt; faster than net/http. But how does it actually stack up against everything else — not just Go frameworks, but Rust, C, Zig, and the whole zoo?&lt;/p&gt;

&lt;p&gt;I ran go-fasthttp through &lt;a href="https://mda2av.github.io/HttpArena/" rel="noopener noreferrer"&gt;HttpArena&lt;/a&gt;, an open benchmark suite that tests frameworks across a bunch of realistic scenarios. Here's what I found.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is fasthttp?
&lt;/h2&gt;

&lt;p&gt;fasthttp is a high-performance HTTP library for Go built by &lt;a href="https://github.com/valyala" rel="noopener noreferrer"&gt;Aliaksandr Valialkin&lt;/a&gt;. Unlike Go's standard &lt;code&gt;net/http&lt;/code&gt;, it avoids allocations wherever possible. Instead of creating a new request/response object per request, it pools and reuses them. It's basically Go's answer to "what if we cared about garbage collection pressure?"&lt;/p&gt;

&lt;p&gt;The HttpArena implementation uses &lt;code&gt;reuseport&lt;/code&gt; to spin up one listener per CPU core, which is a neat trick — each goroutine gets its own socket listener, reducing lock contention.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Numbers
&lt;/h2&gt;

&lt;p&gt;Let's get into it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Baseline (Plain Text Responses)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Connections&lt;/th&gt;
&lt;th&gt;RPS&lt;/th&gt;
&lt;th&gt;Avg Latency&lt;/th&gt;
&lt;th&gt;Rank&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;512&lt;/td&gt;
&lt;td&gt;1,337,651&lt;/td&gt;
&lt;td&gt;382μs&lt;/td&gt;
&lt;td&gt;#14/30&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4,096&lt;/td&gt;
&lt;td&gt;1,478,446&lt;/td&gt;
&lt;td&gt;2.76ms&lt;/td&gt;
&lt;td&gt;#13/30&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16,384&lt;/td&gt;
&lt;td&gt;1,310,081&lt;/td&gt;
&lt;td&gt;10.63ms&lt;/td&gt;
&lt;td&gt;#13/30&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Middle of the pack. Honestly, for baseline text responses, 1.3-1.5M RPS is solid — but ringzero (C) is hitting 3.4M at the top. The Rust and C frameworks dominate here. For plain "return a string" work, fasthttp lands in the top half but doesn't lead.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Compared to other Go frameworks:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;go-fasthttp: &lt;strong&gt;1,478,446 rps&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;caddy: 582,248 rps&lt;/li&gt;
&lt;li&gt;echo: 456,383 rps&lt;/li&gt;
&lt;li&gt;gin: 446,160 rps&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's about &lt;strong&gt;3x faster&lt;/strong&gt; than standard Go HTTP frameworks. That's the fasthttp promise delivering.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pipelined Requests — Where fasthttp Shines
&lt;/h3&gt;

&lt;p&gt;This is where things get interesting.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Connections&lt;/th&gt;
&lt;th&gt;RPS&lt;/th&gt;
&lt;th&gt;Rank&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;512&lt;/td&gt;
&lt;td&gt;16,786,953&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;#4/30&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4,096&lt;/td&gt;
&lt;td&gt;17,808,031&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;#4/30&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16,384&lt;/td&gt;
&lt;td&gt;16,403,972&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;#4/30&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;17.8 million requests per second.&lt;/strong&gt; That's not a typo. fasthttp handles HTTP pipelining exceptionally well. It sits right behind ringzero (C, 46.8M), blitz (Zig, 39.5M), and actix (Rust, 23M).&lt;/p&gt;

&lt;p&gt;Top 5 at 4,096 connections:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;ringzero (C) — 46,803,504 rps&lt;/li&gt;
&lt;li&gt;blitz (Zig) — 39,534,054 rps&lt;/li&gt;
&lt;li&gt;actix (Rust) — 23,001,200 rps&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;go-fasthttp (Go) — 17,808,031 rps&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;hyper (Rust) — 16,273,142 rps&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Being #4 overall and beating hyper (Rust!) at pipelining is impressive. The zero-allocation design really pays off when you're hammering the same connection with sequential requests. Meanwhile, gin and echo are down at ~1M rps — a &lt;strong&gt;17x&lt;/strong&gt; gap.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mixed Workload — The Crown Jewel
&lt;/h3&gt;

&lt;p&gt;Here's the headline result:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Connections&lt;/th&gt;
&lt;th&gt;RPS&lt;/th&gt;
&lt;th&gt;Rank&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;4,096&lt;/td&gt;
&lt;td&gt;87,964&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;#1/27&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16,384&lt;/td&gt;
&lt;td&gt;164,178&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;#1/27&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;go-fasthttp wins the mixed workload test.&lt;/strong&gt; Both connection levels. Beating actix, beating nginx, beating everything.&lt;/p&gt;

&lt;p&gt;Top 5 at 16,384 connections:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;go-fasthttp (Go) — 164,178 rps&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;actix (Rust) — 157,549 rps&lt;/li&gt;
&lt;li&gt;salvo (Rust) — 67,520 rps&lt;/li&gt;
&lt;li&gt;bun (TS) — 64,614 rps&lt;/li&gt;
&lt;li&gt;node (JS) — 55,988 rps&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The mixed workload combines baseline requests, JSON processing, compression, uploads, and database queries into a single test. It's the closest thing to a "real-world" scenario in the suite. And fasthttp tops it — barely edging out actix at 16K connections, and more convincingly at 4K (87,964 vs 75,948 rps).&lt;/p&gt;

&lt;p&gt;This tells us something important: fasthttp's architecture handles diverse workloads better than almost anything else. It's not just fast at one thing.&lt;/p&gt;

&lt;h3&gt;
  
  
  JSON Processing
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Connections&lt;/th&gt;
&lt;th&gt;RPS&lt;/th&gt;
&lt;th&gt;Rank&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;4,096&lt;/td&gt;
&lt;td&gt;311,030&lt;/td&gt;
&lt;td&gt;#18/29&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16,384&lt;/td&gt;
&lt;td&gt;265,535&lt;/td&gt;
&lt;td&gt;#22/29&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;32,768&lt;/td&gt;
&lt;td&gt;149,159&lt;/td&gt;
&lt;td&gt;#10/14&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This is the weak spot. Bottom half in JSON serialization. Go's &lt;code&gt;encoding/json&lt;/code&gt; is famously not great, and it shows here. nginx leads at 1.18M rps (it's serving pre-computed static JSON), and actix with serde is at ~1M.&lt;/p&gt;

&lt;p&gt;For the Go comparison: fasthttp still beats caddy (308K), gin (169K), and echo (158K) — but only by a modest margin on JSON. The serialization bottleneck is the equalizer.&lt;/p&gt;

&lt;h3&gt;
  
  
  Compression
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Connections&lt;/th&gt;
&lt;th&gt;RPS&lt;/th&gt;
&lt;th&gt;Rank&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;4,096&lt;/td&gt;
&lt;td&gt;14,771&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;#5/28&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16,384&lt;/td&gt;
&lt;td&gt;13,736&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;#5/28&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Solid #5 finish. The implementation uses &lt;code&gt;compress/flate&lt;/code&gt; with &lt;code&gt;BestSpeed&lt;/code&gt; level — trading compression ratio for throughput. blitz (Zig) leads at 89K rps, then actix, salvo, and h2o. But fasthttp holds its own at nearly 15K rps, which is almost 2x the caddy result (8,147 rps).&lt;/p&gt;

&lt;h3&gt;
  
  
  Upload Handling
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Connections&lt;/th&gt;
&lt;th&gt;RPS&lt;/th&gt;
&lt;th&gt;Rank&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;256&lt;/td&gt;
&lt;td&gt;910&lt;/td&gt;
&lt;td&gt;#6/29&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;512&lt;/td&gt;
&lt;td&gt;842&lt;/td&gt;
&lt;td&gt;mid-table&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Mid-table for uploads. spring-jvm actually wins this category at 1,294 rps — the JVM's buffered I/O handling pays off for large body ingestion.&lt;/p&gt;

&lt;h3&gt;
  
  
  Limited Connections — The Achilles' Heel
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Connections&lt;/th&gt;
&lt;th&gt;RPS&lt;/th&gt;
&lt;th&gt;Rank&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;512&lt;/td&gt;
&lt;td&gt;147,847&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;#26/30&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4,096&lt;/td&gt;
&lt;td&gt;636,185&lt;/td&gt;
&lt;td&gt;#13/30&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Oof. At 512 connections, fasthttp drops to #26/30. h2o leads at 1.6M rps, and most frameworks do significantly better. This is surprising — you'd expect a fast framework to shine with fewer connections too.&lt;/p&gt;

&lt;p&gt;The likely culprit? fasthttp's architecture is optimized for high concurrency. The &lt;code&gt;reuseport&lt;/code&gt; multi-listener design and goroutine-per-connection model have overhead that doesn't amortize well with limited connections. At 4,096 connections it recovers to #13, which matches baseline performance. This framework wants a lot of concurrent work to hit its stride.&lt;/p&gt;




&lt;h2&gt;
  
  
  Reading the Source Code
&lt;/h2&gt;

&lt;p&gt;The HttpArena implementation reveals some interesting architectural choices:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SO_REUSEPORT with per-CPU listeners:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;numCPU&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;ln&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;reuseport&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Listen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"tcp4"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;":8080"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;fasthttp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Server&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;Handler&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Serve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ln&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}()&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each CPU core gets its own listener socket. The kernel distributes incoming connections across them. This avoids the thundering herd problem and reduces lock contention — a big deal at high concurrency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Manual path-based routing:&lt;/strong&gt; No router library. Just a switch on &lt;code&gt;ctx.Path()&lt;/code&gt;. Zero overhead from regex matching or tree traversal. For a benchmark this makes sense, but it also shows the performance ceiling when you strip away routing abstractions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pre-computed large JSON responses:&lt;/strong&gt; The compression endpoint pre-marshals the JSON during startup and serves the cached bytes. Smart — you avoid re-serializing on every request.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Memory usage:&lt;/strong&gt; 188 MiB for baseline, 716 MiB for JSON, but &lt;strong&gt;10.2 GiB for mixed workloads&lt;/strong&gt;. The garbage collector is working hard under mixed load. That's the Go trade-off — the runtime manages memory for you, but at scale it adds up.&lt;/p&gt;




&lt;h2&gt;
  
  
  Who Should Use fasthttp?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Great fit if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You're building Go services that need raw throughput&lt;/li&gt;
&lt;li&gt;Your workload involves lots of concurrent connections&lt;/li&gt;
&lt;li&gt;You need pipelining support (API gateways, proxies)&lt;/li&gt;
&lt;li&gt;You want to stay in the Go ecosystem but need better performance than net/http&lt;/li&gt;
&lt;li&gt;Mixed workloads (the typical real-world pattern)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Maybe look elsewhere if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You have few connections and need maximum per-connection throughput&lt;/li&gt;
&lt;li&gt;JSON serialization is your bottleneck (consider Rust frameworks or use a faster JSON lib like sonic/jsoniter)&lt;/li&gt;
&lt;li&gt;You need HTTP/2 or HTTP/3 (fasthttp is HTTP/1.1 only)&lt;/li&gt;
&lt;li&gt;You want a batteries-included framework with routing, middleware, etc. (that's echo/gin territory)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Verdict
&lt;/h2&gt;

&lt;p&gt;go-fasthttp is a fascinating framework. It's not the fastest at any single micro-benchmark (C and Zig own that space), but it's &lt;strong&gt;the most versatile performer&lt;/strong&gt; in the entire HttpArena suite. Winning the mixed workload test — beating actix, nginx, and h2o — is a big deal because that's the test that most resembles real production traffic.&lt;/p&gt;

&lt;p&gt;The pipelining numbers (17.8M rps, #4 overall) show its architecture is fundamentally sound. The limited-connection weakness is real but only matters in specific scenarios. And being 3-17x faster than standard Go frameworks (gin, echo, caddy) makes it the clear choice for performance-sensitive Go work.&lt;/p&gt;

&lt;p&gt;Just watch out for that &lt;code&gt;encoding/json&lt;/code&gt; bottleneck. Pair fasthttp with a faster serializer and you might climb even higher.&lt;/p&gt;

&lt;p&gt;All data from &lt;a href="https://mda2av.github.io/HttpArena/" rel="noopener noreferrer"&gt;HttpArena&lt;/a&gt; (&lt;a href="https://github.com/MDA2AV/HttpArena" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;). Check the full results if you want to compare frameworks yourself.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;What framework should I deep-dive next? Drop a comment!&lt;/em&gt;&lt;/p&gt;

</description>
      <category>go</category>
      <category>webdev</category>
      <category>performance</category>
      <category>benchmarks</category>
    </item>
    <item>
      <title>Drogon: The C++ Framework That Tops HTTP/2 Benchmarks (And Where It Struggles)</title>
      <dc:creator>Benny</dc:creator>
      <pubDate>Tue, 17 Mar 2026 14:26:37 +0000</pubDate>
      <link>https://dev.to/fbio_reis_355b87b508598e/drogon-the-c-framework-that-tops-http2-benchmarks-and-where-it-struggles-3d20</link>
      <guid>https://dev.to/fbio_reis_355b87b508598e/drogon-the-c-framework-that-tops-http2-benchmarks-and-where-it-struggles-3d20</guid>
      <description>&lt;p&gt;I've been digging through &lt;a href="https://mda2av.github.io/HttpArena/" rel="noopener noreferrer"&gt;HttpArena&lt;/a&gt; benchmark data lately — it's an open benchmark suite that tests HTTP frameworks across a bunch of realistic scenarios — and Drogon caught my eye. It's quietly one of the most interesting performers in the entire dataset.&lt;/p&gt;

&lt;p&gt;Let me walk you through what I found.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Drogon?
&lt;/h2&gt;

&lt;p&gt;Drogon is a C++ web framework built on top of its own async networking library (Trantor). It's been around since 2018, and it's designed for high-performance HTTP services. Think of it as what you'd reach for if you need raw C++ speed but don't want to hand-roll everything from scratch.&lt;/p&gt;

&lt;p&gt;The HttpArena implementation uses Drogon v1.9.10, compiled with &lt;code&gt;-O3 -flto&lt;/code&gt; (link-time optimization), running on Ubuntu 24.04. C++17, CMake build, nothing exotic.&lt;/p&gt;
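&lt;p&gt;For reference, a minimal CMake setup with those flags might look like this (illustrative only; the repo's actual build files and Dockerfile are the source of truth):&lt;/p&gt;

```cmake
# Sketch of the build settings described above, not the HttpArena file itself.
cmake_minimum_required(VERSION 3.16)
project(drogon_bench CXX)
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_FLAGS_RELEASE "-O3")
set(CMAKE_INTERPROCEDURAL_OPTIMIZATION ON)  # enables link-time optimization (-flto)
```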

&lt;h2&gt;
  
  
  The Numbers
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Baseline (Plain Text Response)
&lt;/h3&gt;

&lt;p&gt;In the standard baseline test, Drogon lands &lt;strong&gt;#7 out of 30 frameworks&lt;/strong&gt; across all connection levels:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Connections&lt;/th&gt;
&lt;th&gt;RPS&lt;/th&gt;
&lt;th&gt;Avg Latency&lt;/th&gt;
&lt;th&gt;P99 Latency&lt;/th&gt;
&lt;th&gt;Memory&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;512&lt;/td&gt;
&lt;td&gt;1,928,561&lt;/td&gt;
&lt;td&gt;264μs&lt;/td&gt;
&lt;td&gt;1.61ms&lt;/td&gt;
&lt;td&gt;81.7 MiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4,096&lt;/td&gt;
&lt;td&gt;2,249,513&lt;/td&gt;
&lt;td&gt;1.82ms&lt;/td&gt;
&lt;td&gt;9.67ms&lt;/td&gt;
&lt;td&gt;129.4 MiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16,384&lt;/td&gt;
&lt;td&gt;2,087,751&lt;/td&gt;
&lt;td&gt;7.57ms&lt;/td&gt;
&lt;td&gt;42.80ms&lt;/td&gt;
&lt;td&gt;314.6 MiB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For context, the top 10 at 4,096 connections looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;ringzero (C) — 3,452,370&lt;/li&gt;
&lt;li&gt;h2o (C) — 3,162,875&lt;/li&gt;
&lt;li&gt;blitz (Zig) — 3,071,375&lt;/li&gt;
&lt;li&gt;nginx (C) — 3,028,812&lt;/li&gt;
&lt;li&gt;hyper (Rust) — 2,942,685&lt;/li&gt;
&lt;li&gt;actix (Rust) — 2,711,945&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;drogon (C++) — 2,249,513&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;kemal (Crystal) — 2,154,014&lt;/li&gt;
&lt;li&gt;quarkus-jvm (Java) — 2,102,344&lt;/li&gt;
&lt;li&gt;bun (TS) — 1,956,298&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Solid company. Drogon is the only C++ framework in the benchmark and it's hanging with the Rust and C heavyweights.&lt;/p&gt;

&lt;h3&gt;
  
  
  HTTP/2 — Where Drogon Shines ✨
&lt;/h3&gt;

&lt;p&gt;Here's where things get really interesting. In the HTTP/2 baseline tests:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Connections&lt;/th&gt;
&lt;th&gt;RPS&lt;/th&gt;
&lt;th&gt;Rank&lt;/th&gt;
&lt;th&gt;Memory&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;64&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;10,631,440&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;#1/14&lt;/strong&gt; 🏆&lt;/td&gt;
&lt;td&gt;98.6 MiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;256&lt;/td&gt;
&lt;td&gt;6,725,340&lt;/td&gt;
&lt;td&gt;#3/16&lt;/td&gt;
&lt;td&gt;155.7 MiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1,024&lt;/td&gt;
&lt;td&gt;6,859,540&lt;/td&gt;
&lt;td&gt;#3/16&lt;/td&gt;
&lt;td&gt;357.0 MiB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Drogon takes first place in the HTTP/2 baseline with 64 connections&lt;/strong&gt;, pushing over 10.6 million requests per second. At that concurrency level, it beats hyper (6.88M), h2o (which dominates at higher concurrency), and everything else. It's also serving at 1.48 GB/s bandwidth while using under 100 MiB of memory.&lt;/p&gt;

&lt;p&gt;Even at higher concurrency where h2o takes the lead (14M+ RPS), Drogon stays comfortably in the top 3.&lt;/p&gt;

&lt;h3&gt;
  
  
  Static File Serving over HTTP/2
&lt;/h3&gt;

&lt;p&gt;Drogon's HTTP/2 dominance extends to static files too:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Connections&lt;/th&gt;
&lt;th&gt;RPS&lt;/th&gt;
&lt;th&gt;Rank&lt;/th&gt;
&lt;th&gt;Bandwidth&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;64&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1,813,238&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;#1/14&lt;/strong&gt; 🏆&lt;/td&gt;
&lt;td&gt;27.77 GB/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;256&lt;/td&gt;
&lt;td&gt;1,546,328&lt;/td&gt;
&lt;td&gt;#2/16&lt;/td&gt;
&lt;td&gt;23.66 GB/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1,024&lt;/td&gt;
&lt;td&gt;1,018,221&lt;/td&gt;
&lt;td&gt;#5/16&lt;/td&gt;
&lt;td&gt;15.57 GB/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Another first-place finish at 64 connections, beating actix by a significant margin (1.81M vs 1.35M). The bandwidth numbers are massive — nearly 28 GB/s of static content over HTTP/2.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pipelined Requests
&lt;/h3&gt;

&lt;p&gt;Pipelining shows Drogon's solid but not spectacular HTTP/1.1 parsing:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Connections&lt;/th&gt;
&lt;th&gt;RPS&lt;/th&gt;
&lt;th&gt;Rank&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;512&lt;/td&gt;
&lt;td&gt;7,828,214&lt;/td&gt;
&lt;td&gt;#9/30&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4,096&lt;/td&gt;
&lt;td&gt;7,612,822&lt;/td&gt;
&lt;td&gt;#9/30&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16,384&lt;/td&gt;
&lt;td&gt;7,260,243&lt;/td&gt;
&lt;td&gt;#9/30&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Consistently 9th place across all concurrency levels. The gap to the top is real though — ringzero hits 47M RPS pipelined, roughly 6x what Drogon manages. But 7.8M pipelined RPS is nothing to sneeze at for a full-featured framework.&lt;/p&gt;

&lt;h3&gt;
  
  
  JSON Serialization — The Plot Twist
&lt;/h3&gt;

&lt;p&gt;Okay, this is where the story gets complicated. In the JSON test (serialize a 50-item dataset):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Connections&lt;/th&gt;
&lt;th&gt;RPS&lt;/th&gt;
&lt;th&gt;Rank&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;4,096&lt;/td&gt;
&lt;td&gt;128,946&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;#26/29&lt;/strong&gt; 😬&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16,384&lt;/td&gt;
&lt;td&gt;124,793&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;#26/29&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That's... not great. Drogon drops to near the bottom of the pack for JSON serialization. For context, nginx (using its native JSON module) hits 1.18M RPS in the same test. Even Flask manages 107K — Drogon is barely ahead of it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;But here's the twist.&lt;/strong&gt; At 32,768 connections, Drogon jumps to &lt;strong&gt;#3 out of 14&lt;/strong&gt; with 933,156 RPS. The framework seems to have a specific performance cliff at moderate connection counts for JSON workloads, then recovers dramatically at very high concurrency.&lt;/p&gt;

&lt;p&gt;Looking at the implementation, the likely culprit is jsoncpp. Drogon uses jsoncpp for JSON serialization, which is known to be one of the slower JSON libraries in C++. The code builds each JSON response by constructing &lt;code&gt;Json::Value&lt;/code&gt; objects field by field, then serializes with &lt;code&gt;Json::StreamWriterBuilder&lt;/code&gt;. At lower concurrency where the CPU isn't fully utilized across all event loop threads, the per-request serialization overhead dominates.&lt;/p&gt;

&lt;h3&gt;
  
  
  Compression
&lt;/h3&gt;

&lt;p&gt;This is Drogon's worst showing:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Connections&lt;/th&gt;
&lt;th&gt;RPS&lt;/th&gt;
&lt;th&gt;Rank&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;4,096&lt;/td&gt;
&lt;td&gt;4,348&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;#24/28&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16,384&lt;/td&gt;
&lt;td&gt;4,173&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;#23/28&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Only 4K RPS with gzip compression enabled. The memory usage spikes to 556 MiB and CPU pegs at 12,153%. Drogon uses zlib for compression, and compressing the large JSON response on every request absolutely tanks throughput. The top performer here (blitz) manages 89K RPS — over 20x more.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mixed Workload
&lt;/h3&gt;

&lt;p&gt;The mixed test hits multiple endpoints in a realistic traffic pattern:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Connections&lt;/th&gt;
&lt;th&gt;RPS&lt;/th&gt;
&lt;th&gt;Rank&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;512&lt;/td&gt;
&lt;td&gt;21,593&lt;/td&gt;
&lt;td&gt;#16/17&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4,096&lt;/td&gt;
&lt;td&gt;22,858&lt;/td&gt;
&lt;td&gt;#20/27&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16,384&lt;/td&gt;
&lt;td&gt;22,100&lt;/td&gt;
&lt;td&gt;#20/27&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Bottom half of the field. The mixed workload combines plain text, JSON, compression, static files, and database queries — and the JSON/compression weakness drags the composite score down significantly. go-fasthttp leads here with 87K RPS at 4,096 connections.&lt;/p&gt;

&lt;h3&gt;
  
  
  Limited Connections &amp;amp; Noisy Neighbor
&lt;/h3&gt;

&lt;p&gt;Drogon recovers nicely in constrained scenarios:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Limited connections (512):&lt;/strong&gt; #6/30 with 1,251,259 RPS&lt;br&gt;
&lt;strong&gt;Limited connections (4096):&lt;/strong&gt; #4/30 with 1,646,234 RPS&lt;br&gt;
&lt;strong&gt;Noisy neighbor (4096):&lt;/strong&gt; #6/30 with 1,965,305 RPS&lt;/p&gt;

&lt;p&gt;When the playing field is leveled by connection limits or background noise, Drogon's efficient event loop and low per-connection overhead keep it competitive.&lt;/p&gt;
&lt;h2&gt;
  
  
  Architecture Deep Dive
&lt;/h2&gt;

&lt;p&gt;Looking at the &lt;a href="https://github.com/MDA2AV/HttpArena/tree/main/frameworks/drogon" rel="noopener noreferrer"&gt;HttpArena implementation&lt;/a&gt;, a few things stand out:&lt;/p&gt;
&lt;h3&gt;
  
  
  Thread-Local SQLite
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;thread_local&lt;/span&gt; &lt;span class="n"&gt;sqlite3&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;tl_db&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;nullptr&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;thread_local&lt;/span&gt; &lt;span class="n"&gt;sqlite3_stmt&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;tl_stmt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;nullptr&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Each event loop thread gets its own SQLite connection with a pre-prepared statement. No mutex contention, no connection pooling overhead. The &lt;code&gt;PRAGMA mmap_size=268435456&lt;/code&gt; enables memory-mapped I/O for the database file. Clean approach.&lt;/p&gt;
&lt;h3&gt;
  
  
  Pre-loaded Everything
&lt;/h3&gt;

&lt;p&gt;Datasets and static files are loaded entirely into memory at startup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;DataItem&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt; &lt;span class="n"&gt;json_large_response&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;unordered_map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;StaticFile&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;static_files&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The large JSON response is pre-serialized once and served as a raw string. Static files sit in an &lt;code&gt;unordered_map&lt;/code&gt; for O(1) lookups. This is why the static file serving numbers are so good — there's zero disk I/O.&lt;/p&gt;

&lt;h3&gt;
  
  
  Async Callback Pattern
&lt;/h3&gt;

&lt;p&gt;Drogon uses the classic async callback style:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;HttpRequestPtr&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;HttpResponsePtr&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;callback&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;HttpResponse&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;newHttpResponse&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;setBody&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"ok"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;setContentTypeCode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CT_TEXT_PLAIN&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;callback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No coroutines, no futures — just raw callbacks. This keeps the overhead minimal but makes complex async chains harder to write. Drogon does support coroutines in newer versions, but this benchmark sticks with callbacks.&lt;/p&gt;

&lt;h3&gt;
  
  
  Build Optimization
&lt;/h3&gt;

&lt;p&gt;The Dockerfile shows Drogon built from source with LTO enabled, and the app compiled with &lt;code&gt;-O3 -flto&lt;/code&gt;. Notably, Drogon itself is built with &lt;code&gt;-DBUILD_ORM=OFF -DBUILD_BROTLI=OFF&lt;/code&gt; — stripping out unused features for a leaner binary.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who Should Use Drogon?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Good fit if you:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Already have a C++ codebase and need HTTP endpoints&lt;/li&gt;
&lt;li&gt;Need excellent HTTP/2 performance (seriously, those numbers are elite)&lt;/li&gt;
&lt;li&gt;Want a mature, feature-complete framework (ORM, WebSocket, middleware, etc.)&lt;/li&gt;
&lt;li&gt;Need low memory usage under moderate load (~80-130 MiB)&lt;/li&gt;
&lt;li&gt;Serve mostly static or pre-computed content&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Maybe look elsewhere if you:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Need fast JSON serialization (consider Rust/actix or use a faster JSON lib)&lt;/li&gt;
&lt;li&gt;Need strong gzip compression throughput&lt;/li&gt;
&lt;li&gt;Prefer modern async patterns over callbacks&lt;/li&gt;
&lt;li&gt;Want a large ecosystem and community (Drogon's is growing but still niche)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Verdict
&lt;/h2&gt;

&lt;p&gt;Drogon is a framework of extremes. Its HTTP/2 performance is genuinely best-in-class — taking #1 in both baseline and static file tests at moderate concurrency. The plain HTTP/1.1 baseline numbers are consistently top-10 across all concurrency levels. Memory efficiency is excellent.&lt;/p&gt;

&lt;p&gt;But the JSON serialization bottleneck is real and dramatic. Dropping from top-7 in baseline to #26 in JSON tests is a stark reminder that framework performance isn't one-dimensional. The jsoncpp dependency is the obvious weak link — swapping it for simdjson, rapidjson, or even nlohmann/json could dramatically change those numbers.&lt;/p&gt;

&lt;p&gt;If you're building an HTTP/2 service that mostly serves pre-computed or static content, Drogon might be the fastest option available. If you're building a JSON API that serializes data on every request... you might want to benchmark carefully first.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;All benchmark data from &lt;a href="https://mda2av.github.io/HttpArena/" rel="noopener noreferrer"&gt;HttpArena&lt;/a&gt; (&lt;a href="https://github.com/MDA2AV/HttpArena" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;). Tests run under controlled conditions with consistent hardware across all frameworks.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>performance</category>
      <category>benchmarks</category>
      <category>cpp</category>
    </item>
    <item>
      <title>Why We Built HttpArena — A Better Way to Benchmark HTTP Frameworks</title>
      <dc:creator>Benny</dc:creator>
      <pubDate>Sun, 15 Mar 2026 14:06:39 +0000</pubDate>
      <link>https://dev.to/fbio_reis_355b87b508598e/why-we-built-httparena-a-better-way-to-benchmark-http-frameworks-j94</link>
      <guid>https://dev.to/fbio_reis_355b87b508598e/why-we-built-httparena-a-better-way-to-benchmark-http-frameworks-j94</guid>
      <description>&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;Every framework claims to be fast. Blog posts benchmark X vs Y with a single plaintext endpoint, one concurrency level, one metric. The results are interesting for five minutes, then a new version ships and everything changes.&lt;/p&gt;

&lt;p&gt;The real question developers face is harder: &lt;strong&gt;which framework performs best for &lt;em&gt;my&lt;/em&gt; workload?&lt;/strong&gt; And nobody can answer that, because most benchmarks don't test real workloads.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is HttpArena?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://mda2av.github.io/HttpArena/" rel="noopener noreferrer"&gt;HttpArena&lt;/a&gt; is an open-source benchmarking platform that tests HTTP frameworks across &lt;strong&gt;16 different test profiles&lt;/strong&gt; on dedicated, reproducible hardware. No cloud VMs. No noisy neighbors. Same machine, same load generator, same conditions for every framework.&lt;/p&gt;

&lt;p&gt;The source is on &lt;a href="https://github.com/MDA2AV/HttpArena" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; and the results are live at &lt;a href="https://mda2av.github.io/HttpArena/" rel="noopener noreferrer"&gt;mda2av.github.io/HttpArena&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why 16 Test Profiles?
&lt;/h2&gt;

&lt;p&gt;This is the core idea. A single "requests per second" number is almost meaningless without context. HttpArena tests frameworks across a range of realistic scenarios:&lt;/p&gt;

&lt;h3&gt;
  
  
  Connection Behavior
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Baseline&lt;/strong&gt; at 512, 4K, 16K, and 32K concurrent connections — how does performance scale as you push connection counts higher?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pipelined&lt;/strong&gt; — HTTP pipelining with 16 requests per connection&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Limited connections&lt;/strong&gt; — connection reuse under constrained pools&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Real Workloads
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;JSON processing&lt;/strong&gt; — parse a dataset, compute derived fields, serialize the response&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compression&lt;/strong&gt; — gzip a large payload on the fly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Upload&lt;/strong&gt; — handle incoming request bodies of varying sizes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Database&lt;/strong&gt; — SQLite queries under concurrent load&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Resilience
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Noisy&lt;/strong&gt; — a mix of valid requests, bad methods, and nonexistent paths. Does the server stay stable?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mixed&lt;/strong&gt; — all endpoint types hit concurrently. This is closest to real-world traffic.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Protocols
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;HTTP/2 and HTTP/3&lt;/strong&gt; — for frameworks that support them&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Static file serving&lt;/strong&gt; over H2&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;gRPC&lt;/strong&gt; — unary calls with and without TLS&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;WebSocket&lt;/strong&gt; — echo server performance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A framework that dominates at plaintext might fall apart under JSON serialization. One that handles 512 connections beautifully might choke at 32K. One that aces every individual test might have contention issues when all endpoints are hit simultaneously. &lt;strong&gt;You don't see any of this with single-profile benchmarks.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What Makes It Different
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Reproducibility
&lt;/h3&gt;

&lt;p&gt;Every framework runs in a Docker container on the same dedicated hardware. The Dockerfiles, source code, and test configurations are all in the repo. Anyone can clone it and reproduce the results.&lt;/p&gt;

&lt;h3&gt;
  
  
  Correctness First
&lt;/h3&gt;

&lt;p&gt;Before any performance testing happens, every framework goes through an &lt;strong&gt;18-point validation suite&lt;/strong&gt; that checks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Correct arithmetic on query params and request bodies&lt;/li&gt;
&lt;li&gt;Anti-cheat with randomized inputs (no hardcoded responses)&lt;/li&gt;
&lt;li&gt;Proper HTTP status codes (404 for missing routes, 4xx for bad methods)&lt;/li&gt;
&lt;li&gt;Correct Content-Type headers&lt;/li&gt;
&lt;li&gt;Valid JSON processing with computed fields&lt;/li&gt;
&lt;li&gt;Gzip compression that actually compresses&lt;/li&gt;
&lt;li&gt;Resilience under malformed requests&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your framework doesn't pass validation, it doesn't get benchmarked. Performance numbers are useless if the server isn't doing the work correctly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Apples to Apples
&lt;/h3&gt;

&lt;p&gt;Every framework implements the same endpoints with the same behavior. The JSON endpoint processes the same dataset. The compression endpoint gzips the same payload. The database endpoint runs the same queries. The only variable is the framework itself.&lt;/p&gt;

&lt;h3&gt;
  
  
  Growing Framework List
&lt;/h3&gt;

&lt;p&gt;We currently test &lt;strong&gt;35+ frameworks&lt;/strong&gt; across languages including Rust, Go, C, C++, Java, C#, JavaScript (Node, Bun, Deno), Python, Ruby, Lua, and more. New frameworks are being added regularly — recent additions include Crystal, Zig, Nim, Swift, and Gleam.&lt;/p&gt;

&lt;h2&gt;
  
  
  Add Your Framework
&lt;/h2&gt;

&lt;p&gt;Adding a framework is straightforward:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Create a Dockerfile&lt;/strong&gt; — multi-stage build, minimal runtime image&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implement the endpoints&lt;/strong&gt; — &lt;code&gt;/baseline11&lt;/code&gt;, &lt;code&gt;/pipeline&lt;/code&gt;, &lt;code&gt;/json&lt;/code&gt;, &lt;code&gt;/compression&lt;/code&gt;, &lt;code&gt;/upload&lt;/code&gt;, and optionally &lt;code&gt;/db&lt;/code&gt;, &lt;code&gt;/baseline2&lt;/code&gt; (H2), &lt;code&gt;/static/*&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add a &lt;code&gt;meta.json&lt;/code&gt;&lt;/strong&gt; — declare which test profiles your framework subscribes to&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open a PR&lt;/strong&gt; — validation runs automatically in CI&lt;/li&gt;
&lt;/ol&gt;
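As a rough sketch, a `meta.json` might look something like this (the field names here are guesses, not the repo's actual schema; copy a real one from `frameworks/` instead):

```json
{
  "name": "myframework",
  "language": "cpp",
  "profiles": ["baseline11", "pipeline", "json", "compression", "upload", "db"]
}
```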

&lt;p&gt;Look at any existing framework in the &lt;code&gt;frameworks/&lt;/code&gt; directory for a working example. The whole process takes about an hour if you know your framework well.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who Is This For?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Developers choosing a framework&lt;/strong&gt; — see how candidates perform across diverse workloads, not just plaintext&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Framework authors&lt;/strong&gt; — get multi-dimensional performance data and a standardized way to compare against the ecosystem&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance engineers&lt;/strong&gt; — reproducible, open-source benchmarks you can run on your own hardware&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The curious&lt;/strong&gt; — sometimes you just want to know how fast things can go&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Check It Out
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Live results:&lt;/strong&gt; &lt;a href="https://mda2av.github.io/HttpArena/" rel="noopener noreferrer"&gt;mda2av.github.io/HttpArena&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Source &amp;amp; contribute:&lt;/strong&gt; &lt;a href="https://github.com/MDA2AV/HttpArena" rel="noopener noreferrer"&gt;github.com/MDA2AV/HttpArena&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We're building this in the open and actively welcoming contributions. If your favorite framework isn't represented yet, come add it. If you think our methodology could be better, open an issue. The goal is to give the community the most useful, honest benchmark data possible.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>performance</category>
      <category>opensource</category>
      <category>http</category>
    </item>
  </channel>
</rss>
