Over the last few days I’ve been stress-testing PAGI::Server, an async HTTP server built on IO::Async, Future::IO, and an EV event loop. The goal wasn’t to chase synthetic “requests per second” numbers, but to answer harder questions:
- Does streaming actually behave correctly?
- What happens under sustained concurrency?
- Do slow clients cause buffer bloat?
- Does backpressure really work?
- How does it fail under overload?
This post summarizes what I learned by pushing the server well past “normal” workloads and watching it misbehave — or not.
Architecture in one paragraph
PAGI::Server is:
- single-process, event-driven by default
- optionally multi-process via workers
- fully async end-to-end (no threads)
- streaming-first (responses are not buffered by default)
- designed to support ASGI-like semantics in Perl
The server accepts connections in a main event loop and routes work through IO::Async::Stream objects. Application code produces responses via a $send callback that emits protocol events (http.response.start, http.response.body, etc.).
That last part matters, because it’s where most async servers quietly cheat.
An important note: although PAGI::Server is built on IO::Async, that is an implementation detail, not a core requirement of the PAGI specification. That said, I built PAGI::Server on IO::Async because it is mature, battle-tested, and fully featured.
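For readers who haven't seen one, here is a rough sketch of the shape of a PAGI application. The ASGI-like ($scope, $receive, $send) signature, the header layout, and the assumption that $send returns a Future are my reading of the description above, not an excerpt from the spec:

```perl
# Rough sketch of a minimal PAGI-style app (signature and keys assumed, ASGI-like).
use strict;
use warnings;
use Future::AsyncAwait;

my $app = async sub {
    my ($scope, $receive, $send) = @_;

    # Describe the response head as a protocol event.
    await $send->({
        type    => 'http.response.start',
        status  => 200,
        headers => [ [ 'content-type' => 'text/plain' ] ],
    });

    # Emit the body; more_body => 0 signals the end of the response.
    await $send->({
        type      => 'http.response.body',
        body      => "hello\n",
        more_body => 0,
    });
};
```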
Baseline: fast responses
Before touching streaming, I tested a trivial HTTP handler.
On an older 16-core MacBook Pro:
- ~22–23k requests/sec
- p50 latency ~20ms
- p99 < 50ms
- CPU mostly idle
Nothing surprising here — just confirming there’s no accidental serialization or blocking in the accept loop.
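For reference, a baseline run like this can be driven with a load generator such as hey (the same tool used later in the post). The concurrency, duration, and URL below are illustrative, not the exact settings from my runs:

hey -z 30s -c 100 http://localhost:5000/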
To set a comparison point, I also tested a similar, simple PSGI application under Starman, one of the most popular choices for serving PSGI apps. It topped out at around 16k requests/sec, with a significantly wider latency spread and dropped connections.
Streaming test: 2-second responses, 500 concurrent clients
Next, I switched to a streaming handler that sends chunks over ~2 seconds.
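The handler was along these lines: a loop of body events with a short pause between chunks, stretched over roughly two seconds. This is a sketch, not the exact test code; the chunk count, delay, and the use of Future::IO->sleep are stand-ins (and under PAGI::Server the IO::Async implementation of Future::IO is assumed to be loaded):

```perl
# Sketch of the streaming handler: ~2 seconds of chunked output per request.
use strict;
use warnings;
use Future::AsyncAwait;
use Future::IO;

my $streaming_app = async sub {
    my ($scope, $receive, $send) = @_;

    await $send->({
        type    => 'http.response.start',
        status  => 200,
        headers => [ [ 'content-type' => 'text/plain' ] ],
    });

    # 20 chunks x 100ms ≈ 2 seconds of streaming.
    for my $i (1 .. 20) {
        await $send->({
            type      => 'http.response.body',
            body      => "chunk $i\n",
            more_body => 1,
        });
        await Future::IO->sleep(0.1);
    }

    # Final empty body event closes the response.
    await $send->({
        type      => 'http.response.body',
        body      => '',
        more_body => 0,
    });
};
```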
With 500 concurrent connections, single process:
- ~242–245 requests/sec
- p50 ≈ 2.00s
- p99 ≈ 2.3–2.4s
- CPU ~10–20% busy
- memory stable
This matters because the math checks out:
500 concurrent / 2 seconds ≈ 250 rps theoretical max
Hitting ~97–98% of theoretical max strongly suggests:
- no head-of-line blocking
- no per-connection threads
- no accidental buffering
The server is doing what an async server should do: waiting, not working.
A gotcha: TTY logging will ruin your benchmark
One early run looked worse than expected. The cause was embarrassingly simple:
Access logging was printing to a terminal (TTY).
At ~20k+ log lines/sec, the terminal became the bottleneck.
Once logging was disabled or redirected:
- CPU usage dropped dramatically
- latency tails tightened
- throughput returned to theoretical limits
Lesson: never benchmark with synchronous TTY logging enabled.
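In practice that means disabling access logging for the run, or redirecting it away from the terminal, for example (the launch command here is a placeholder for however you start your server):

perl my-server.pl > access.log 2>&1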
Overload behavior: what happens at 2500 concurrent streams?
With ulimit -n = 2048, single-process PAGI::Server started rejecting connections around ~2000 open sockets.
That rejection path returned:
503 Service Unavailable
Retry-After: 5
Connection: close
This is good behavior.
Instead of accepting everything and melting down, the server applied admission control based on available file descriptors.
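I'm not reproducing the server's actual implementation here, but the general shape of that kind of admission control looks something like the sketch below. The connection limit, response text, and bookkeeping variable are all illustrative:

```perl
# Illustrative sketch of admission control on accept (not the actual
# PAGI::Server code): reject with 503 + Retry-After once we approach
# the file-descriptor limit, instead of accepting and melting down.
use strict;
use warnings;

my $max_connections = 2000;   # derived from ulimit -n minus headroom (assumed)
my $active          = 0;

sub on_accept {
    my ($socket) = @_;

    if ($active >= $max_connections) {
        # Fail fast: tell the client to back off, then close the socket.
        $socket->syswrite(
            "HTTP/1.1 503 Service Unavailable\r\n"
          . "Retry-After: 5\r\n"
          . "Connection: close\r\n"
          . "Content-Length: 0\r\n\r\n"
        );
        $socket->close;
        return;
    }

    $active++;
    # ... hand the socket to an IO::Async::Stream and decrement
    # $active when that stream is eventually closed ...
}
```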
The client (hey) complained with “unsolicited response” warnings — a known artifact when clients aggressively reuse keep-alive connections under churn. The server was behaving correctly.
Scaling out: 16 workers
With 16 worker processes, the same workload:
- sustained ~1200 requests/sec at 2500 concurrency
- p50 ≈ 2.00s
- p99 ≈ 3.0s
- all responses returned HTTP 200, no rejections
Again, the math checks:
2500 concurrent / 2 seconds ≈ 1250 rps theoretical
Result: ~96% of theoretical max.
This validated that PAGI’s worker model increases capacity cleanly without breaking streaming semantics.
The real test: slow clients and backpressure
Throughput benchmarks are easy. Backpressure is not.
To test this, I used a stress app that streams tens of megabytes as fast as possible, then ran clients with artificial throttling.
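The stress endpoint is essentially the streaming handler from earlier with the delay removed and much larger chunks. Again a sketch: the chunk size and count are hard-coded for illustration, whereas the real /stream/50 path presumably selects the size:

```perl
# Sketch of the stress endpoint: blast ~50 MB as fast as the loop allows.
use strict;
use warnings;
use Future::AsyncAwait;

my $blast_app = async sub {
    my ($scope, $receive, $send) = @_;

    await $send->({
        type    => 'http.response.start',
        status  => 200,
        headers => [ [ 'content-type' => 'application/octet-stream' ] ],
    });

    my $chunk = 'x' x 65536;              # 64 KB per write
    for (1 .. 800) {                      # 800 x 64 KB = 50 MB total
        await $send->({ type => 'http.response.body', body => $chunk, more_body => 1 });
    }
    await $send->({ type => 'http.response.body', body => '', more_body => 0 });
};
```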
Slow client test
curl --limit-rate 1M http://localhost:5000/stream/50
Observed:
- ~50MB transferred in ~49 seconds
- download rate stayed near 1MB/s
- server did not buffer the entire response
- no memory blow-up
This is the critical signal.
If backpressure were broken, the server would buffer 50MB instantly and curl would “catch up” later. That did not happen.
High-throughput clients
Unthrottled clients on localhost achieved:
- ~75–80 MB/s per connection
- linear scaling until loopback bandwidth became the limit
Again, no red flags.
Concurrency + large bodies
Using hey with 50 concurrent clients streaming 10–50MB bodies:
- high aggregate throughput
- wide latency distribution
- no errors
- no collapse
At this point, the bottleneck was clearly:
- kernel socket buffers
- memory copy bandwidth
- client tooling limits
Not the server itself.
The subtle risk we identified (and addressed)
While reviewing the server code, one important detail surfaced:
- writes use IO::Async::Stream->write
- writes enqueue data into an outgoing buffer
- there is no implicit send-side backpressure unless implemented explicitly
This is fine for polite producers (like most apps), but dangerous if:
- an app writes huge chunks rapidly
- the client reads slowly
To address this, we added send-side backpressure:
- track queued outgoing bytes
- define high / low watermarks
- pause $send->(...) futures when buffers exceed the high watermark
- resume when the buffer drains
This change makes slow-client behavior safe by default.
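For the curious, the idea is simple even if the real code differs in detail. The sketch below shows the shape of it; the watermark values and bookkeeping are illustrative, while IO::Async::Stream->write returning a flush Future is real behaviour of that module:

```perl
# Sketch of send-side backpressure with high/low watermarks.
use strict;
use warnings;
use Future;

my $HIGH_WATERMARK = 1 * 1024 * 1024;   # pause producers above 1 MB queued
my $LOW_WATERMARK  = 256 * 1024;        # resume once drained below 256 KB

my $queued  = 0;      # bytes handed to the stream but not yet flushed
my @waiters;          # pending futures for paused $send calls

sub send_body {
    my ($stream, $data) = @_;

    $queued += length $data;

    # IO::Async::Stream->write returns a Future that completes once the
    # data has actually been flushed; use it to track drain progress.
    $stream->write($data)->on_done(sub {
        $queued -= length $data;
        if ($queued <= $LOW_WATERMARK and @waiters) {
            # Buffer drained: let paused producers continue.
            $_->done for splice @waiters;
        }
    });

    # Below the high watermark the producer may continue immediately;
    # above it, hand back a pending future so the app awaits the drain.
    return Future->done if $queued < $HIGH_WATERMARK;

    push @waiters, my $f = Future->new;
    return $f;
}
```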
What I did not see (important)
Across all tests, I did not observe:
- unbounded RSS growth
- runaway CPU
- increasing latency under steady load
- dropped connections without errors
- event-loop starvation
Those are the usual failure modes of async servers. None appeared.
Final assessment
Based on these tests:
- PAGI::Server handles streaming correctly
- backpressure works (and is now explicit)
- overload fails fast and cleanly
- worker scaling behaves predictably
- performance matches theoretical limits
There’s still more to do — especially long-duration soak tests and production-grade observability — but the fundamentals are solid.
Most importantly, the server behaves honestly. It doesn’t fake async by buffering everything and hoping for the best.