Over the last few days I’ve been stress-testing PAGI::Server, an async HTTP server built on IO::Async, Future::IO, and an EV event loop. The goal wasn’t to chase synthetic “requests per second” numbers, but to answer harder questions:
- Does streaming actually behave correctly?
- What happens under sustained concurrency?
- Do slow clients cause buffer bloat?
- Does backpressure really work?
- How does it fail under overload?
This post summarizes what I learned by pushing the server well past “normal” workloads and watching it misbehave — or not.
Architecture in one paragraph
PAGI::Server is:
- single-process, event-driven by default
- optionally multi-process via workers
- fully async end-to-end (no threads)
- streaming-first (responses are not buffered by default)
- designed to support ASGI-like semantics in Perl
The server accepts connections in a main event loop and routes work through IO::Async::Stream objects. Application code produces responses via a $send callback that emits protocol events (http.response.start, http.response.body, etc.).
That last part matters, because it’s where most async servers quietly cheat.
An important note: although PAGI::Server is built on IO::Async, that is an implementation detail, not a core requirement of the PAGI specification. That said, I built PAGI::Server on IO::Async because it's mature, battle-tested, and fully featured.
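To make the $send interface concrete, here's a rough sketch of what a trivial handler looks like in this model. The exact app signature and header format are my assumptions based on the ASGI-style events above, not quotes from the PAGI spec:
```perl
use strict;
use warnings;
use Future::AsyncAwait;   # async/await sugar over Future

# Rough sketch of an ASGI-style PAGI handler (signature and header format
# assumed, not quoted from the PAGI spec): the app receives a request
# $scope plus $receive/$send callbacks and emits protocol events.
my $app = async sub {
    my ($scope, $receive, $send) = @_;

    await $send->({
        type    => 'http.response.start',
        status  => 200,
        headers => [ [ 'content-type' => 'text/plain' ] ],
    });

    await $send->({
        type => 'http.response.body',
        body => "Hello from PAGI::Server\n",
    });
};
```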
Baseline: fast responses
Before touching streaming, I tested a trivial HTTP handler.
On an older 16-core MacBook Pro:
- ~22–23k requests/sec
- p50 latency ~20ms
- p99 < 50ms
- CPU mostly idle
Nothing surprising here — just confirming there’s no accidental serialization or blocking in the accept loop.
To set a baseline for comparison, I tested a similar, simple PSGI application under Starman, one of the most popular choices for serving PSGI apps. It topped out at around 16k requests/sec, with a significantly wider latency spread and some dropped connections.
Streaming test: 2-second responses, 500 concurrent clients
Next, I switched to a streaming handler that sends chunks over ~2 seconds.
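The handler was along these lines. This is a sketch of the shape, not the exact test code: the chunk count, chunk size, and sleep interval are illustrative, and the more_body flag is my assumption for an ASGI-style "more to come" marker.
```perl
use strict;
use warnings;
use Future::AsyncAwait;
use Future::IO;   # awaitable timers (Future::IO->sleep)

# Sketch of the streaming handler: ten chunks spread over roughly two
# seconds. Chunk count, sizes, and the sleep interval are illustrative.
my $streaming_app = async sub {
    my ($scope, $receive, $send) = @_;

    await $send->({
        type    => 'http.response.start',
        status  => 200,
        headers => [ [ 'content-type' => 'text/plain' ] ],
    });

    for my $i (1 .. 10) {
        await $send->({
            type      => 'http.response.body',
            body      => "chunk $i\n",
            more_body => 1,   # assumed ASGI-style "more to come" flag
        });
        await Future::IO->sleep(0.2);
    }

    # Final empty body event marks the end of the response.
    await $send->({ type => 'http.response.body', body => '', more_body => 0 });
};
```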
With 500 concurrent connections, single process:
- ~242–245 requests/sec
- p50 ≈ 2.00s
- p99 ≈ 2.3–2.4s
- CPU ~10–20% busy
- memory stable
This matters because the math checks out:
500 concurrent / 2 seconds ≈ 250 rps theoretical max
Hitting ~97–98% of theoretical max strongly suggests:
- no head-of-line blocking
- no per-connection threads
- no accidental buffering
The server is doing what an async server should do: waiting, not working.
A gotcha: TTY logging will ruin your benchmark
One early run looked worse than expected. The cause was embarrassingly simple:
Access logging was printing to a terminal (TTY).
At ~20k+ log lines/sec, the terminal became the bottleneck.
Once logging was disabled or redirected:
- CPU usage dropped dramatically
- latency tails tightened
- throughput returned to theoretical limits
Lesson: never benchmark with synchronous TTY logging enabled.
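One cheap guard, assuming you control the logger, is to detect the TTY case up front. This is illustrative, not PAGI::Server's actual logging code:
```perl
use strict;
use warnings;

# Illustrative guard, not PAGI::Server's actual logging code: warn when the
# access-log handle is a terminal, so synchronous TTY writes don't silently
# dominate a benchmark run.
my $log_fh = \*STDERR;    # wherever the access log is written
if ( -t $log_fh ) {
    warn "access log is attached to a TTY; redirect or disable it before benchmarking\n";
}
```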
Overload behavior: what happens at 2500 concurrent streams?
With ulimit -n = 2048, single-process PAGI::Server started rejecting connections around ~2000 open sockets.
That rejection path returned:
503 Service Unavailable
Retry-After: 5
Connection: close
This is good behavior.
Instead of accepting everything and melting down, the server applied admission control based on available file descriptors.
The client (hey) complained with “unsolicited response” warnings — a known artifact when clients aggressively reuse keep-alive connections under churn. The server was behaving correctly.
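The general shape of that kind of admission control is simple. This is a hedged sketch, not PAGI::Server's actual code; the connection limit and the helper name are illustrative, while the IO::Async::Stream write and close_when_empty calls are real API.
```perl
use strict;
use warnings;

# Sketch of fd-based admission control (not PAGI::Server's actual code):
# track active connections, and once the count nears the process's open-file
# limit, answer new clients with a 503 and close instead of queueing them.
my $max_conns    = 2000;   # headroom under `ulimit -n`, illustrative
my $active_conns = 0;

sub maybe_reject {
    my ($stream) = @_;     # the accepted IO::Async::Stream for this client
    return 0 if $active_conns < $max_conns;

    $stream->write(
        "HTTP/1.1 503 Service Unavailable\r\n"
      . "Retry-After: 5\r\n"
      . "Connection: close\r\n"
      . "Content-Length: 0\r\n\r\n"
    );
    $stream->close_when_empty;
    return 1;
}
```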
Scaling out: 16 workers
With 16 worker processes, the same workload:
- sustained ~1200 requests/sec at 2500 concurrency
- p50 ≈ 2.00s
- p99 ≈ 3.0s
- all responses were 200s, no rejections
Again, the math checks:
2500 concurrent / 2 seconds ≈ 1250 rps theoretical
Result: ~96% of theoretical max.
This validated that PAGI’s worker model increases capacity cleanly without breaking streaming semantics.
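For readers unfamiliar with the pattern, the generic shape of a prefork worker model looks like the sketch below. This is not PAGI::Server's actual worker code; the port, worker count, and helper name are illustrative.
```perl
use strict;
use warnings;
use IO::Socket::IP;
use POSIX ();

# Generic shape of a prefork worker model (not PAGI::Server's actual worker
# code): open one listening socket, fork N children, and let each child run
# its own event loop accepting from the shared socket.
my $listener = IO::Socket::IP->new(
    LocalPort => 5000,
    Listen    => 1024,
    ReuseAddr => 1,
) or die "listen: $!";

my $workers = 16;
my @pids;

for (1 .. $workers) {
    my $pid = fork() // die "fork: $!";
    if ($pid == 0) {
        run_worker_loop($listener);   # child: accept + serve on the inherited fd
        POSIX::_exit(0);
    }
    push @pids, $pid;
}

waitpid $_, 0 for @pids;

sub run_worker_loop {
    my ($listener) = @_;
    # In the real server an IO::Async loop would accept connections here
    # and dispatch them to the PAGI app; elided in this sketch.
    ...;
}
```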
The real test: slow clients and backpressure
Throughput benchmarks are easy. Backpressure is not.
To test this, I used a stress app that streams tens of megabytes as fast as possible, then ran clients with artificial throttling.
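The stress app itself is just a tight send loop. A sketch under the same assumed app signature; the 1MB chunk size is illustrative, and reading the megabyte count from the request path is my interpretation of the /stream/50 URL.
```perl
use strict;
use warnings;
use Future::AsyncAwait;

# Sketch of the stress handler: stream a fixed number of megabytes as fast
# as the $send interface allows. Signature and event fields assumed as above.
my $stream_app = async sub {
    my ($scope, $receive, $send) = @_;
    my $megabytes = 50;                 # e.g. taken from the request path
    my $chunk     = 'x' x (1024 * 1024);

    await $send->({
        type    => 'http.response.start',
        status  => 200,
        headers => [ [ 'content-type' => 'application/octet-stream' ] ],
    });

    for (1 .. $megabytes) {
        await $send->({
            type      => 'http.response.body',
            body      => $chunk,
            more_body => 1,
        });
    }

    await $send->({ type => 'http.response.body', body => '', more_body => 0 });
};
```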
Slow client test
curl --limit-rate 1M http://localhost:5000/stream/50
Observed:
- ~50MB transferred in ~49 seconds
- download rate stayed near 1MB/s
- server did not buffer the entire response
- no memory blow-up
This is the critical signal.
If backpressure were broken, the server would buffer 50MB instantly and curl would “catch up” later. That did not happen.
High-throughput clients
Unthrottled clients on localhost achieved:
- ~75–80 MB/s per connection
- linear scaling until loopback bandwidth became the limit
Again, no red flags.
Concurrency + large bodies
Using hey with 50 concurrent clients streaming 10–50MB bodies:
- high aggregate throughput
- wide latency distribution
- no errors
- no collapse
At this point, the bottleneck was clearly:
- kernel socket buffers
- memory copy bandwidth
- client tooling limits
Not the server itself.
The subtle risk we identified (and addressed)
While reviewing the server code, one important detail surfaced:
- writes use IO::Async::Stream->write
- writes enqueue data into an outgoing buffer
- there is no implicit send-side backpressure unless implemented explicitly
This is fine for polite producers (like most apps), but dangerous if:
- an app writes huge chunks rapidly
- the client reads slowly
To address this, we added send-side backpressure:
- track queued outgoing bytes
- define high / low watermarks
- pause $send->(...) futures when buffers exceed the high watermark
- resume when the buffer drains
This change makes slow-client behavior safe by default.
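The mechanics look roughly like this. It's a simplified sketch of the watermark idea, not the actual patch: the flush Future returned by IO::Async::Stream->write is real API, while the watermark values and the bookkeeping around them are illustrative.
```perl
use strict;
use warnings;
use Future;

# Simplified sketch of send-side backpressure (not the actual patch):
# track bytes queued on the IO::Async::Stream, and once the high watermark
# is crossed, hand producers a pending Future that only completes after the
# outgoing buffer drains back below the low watermark.
my $HIGH_WATERMARK = 1 * 1024 * 1024;   # 1MB, illustrative
my $LOW_WATERMARK  = 256 * 1024;        # 256KB, illustrative

sub make_sender {
    my ($stream) = @_;                  # an IO::Async::Stream
    my $queued  = 0;
    my $drained = Future->done;         # already complete while under the limit

    return sub {
        my ($bytes) = @_;
        $queued += length $bytes;

        # ->write in non-void context returns a Future that completes
        # once the data has been flushed to the socket.
        my $flushed = $stream->write($bytes);
        $flushed->on_done(sub {
            $queued -= length $bytes;
            if ( $queued <= $LOW_WATERMARK and not $drained->is_ready ) {
                $drained->done;         # let paused producers resume
            }
        });

        if ( $queued >= $HIGH_WATERMARK and $drained->is_ready ) {
            $drained = Future->new;     # pause: producers now get a pending future
        }

        # Producers await this: immediate when under the watermark,
        # pending while the outgoing buffer is too full.
        return $drained;
    };
}
```
The key property is that a fast producer now awaits a pending future instead of piling data into the outgoing buffer, which is exactly the slow-client scenario the curl test exercised.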
What I did not see (important)
Across all tests, I did not observe:
- unbounded RSS growth
- runaway CPU
- increasing latency under steady load
- dropped connections without errors
- event-loop starvation
Those are the usual failure modes of async servers. None appeared.
Final assessment
Based on these tests:
- PAGI::Server handles streaming correctly
- backpressure works (and is now explicit)
- overload fails fast and cleanly
- worker scaling behaves predictably
- performance matches theoretical limits
There’s still more to do — especially long-duration soak tests and production-grade observability — but the fundamentals are solid.
Most importantly, the server behaves honestly. It doesn’t fake async by buffering everything and hoping for the best.