John Napiorkowski
PAGI::Server: Stress-Testing an Async Perl HTTP Server the Hard Way

Over the last few days I’ve been stress-testing PAGI::Server, an async HTTP server built on IO::Async, Future::IO, and an EV event loop. The goal wasn’t to chase synthetic “requests per second” numbers, but to answer harder questions:

  • Does streaming actually behave correctly?
  • What happens under sustained concurrency?
  • Do slow clients cause buffer bloat?
  • Does backpressure really work?
  • How does it fail under overload?

This post summarizes what I learned by pushing the server well past “normal” workloads and watching it misbehave — or not.


Architecture in one paragraph

PAGI::Server is:

  • single-process, event-driven by default
  • optionally multi-process via workers
  • fully async end-to-end (no threads)
  • streaming-first (responses are not buffered by default)
  • designed to support ASGI-like semantics in Perl

The server accepts connections in a main event loop and routes work through IO::Async::Stream objects. Application code produces responses via a $send callback that emits protocol events (http.response.start, http.response.body, etc.).

That last part matters, because it’s where most async servers quietly cheat.
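
To make that concrete, here is a minimal handler in roughly the shape described above. The ($scope, $receive, $send) signature and the event hash keys are written by analogy with ASGI and the event names mentioned here; treat it as a sketch rather than the canonical PAGI interface.

use Future::AsyncAwait;

# Sketch of an ASGI-style handler: signature and event keys are
# assumed for illustration; check the PAGI spec for the real API.
my $app = async sub {
    my ($scope, $receive, $send) = @_;

    # Status and headers go out immediately, before any body bytes.
    await $send->({
        type    => 'http.response.start',
        status  => 200,
        headers => [ [ 'content-type' => 'text/plain' ] ],
    });

    # The body is emitted as a series of events, so nothing forces the
    # server to buffer the whole response up front.
    await $send->({
        type      => 'http.response.body',
        body      => "hello, world\n",
        more_body => 0,    # false marks the final chunk
    });
};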

An important note: although PAGI::Server is built on IO::Async, that is an implementation detail, not a core requirement of the PAGI specification. That said, I built PAGI::Server on IO::Async because it's mature, battle-tested, and fully featured.

Baseline: fast responses

Before touching streaming, I tested a trivial HTTP handler.

On an older 16-core MacBook Pro:

  • ~22–23k requests/sec
  • p50 latency ~20ms
  • p99 < 50ms
  • CPU mostly idle

Nothing surprising here — just confirming there’s no accidental serialization or blocking in the accept loop.

For comparison, I also tested a similar PSGI application under Starman, one of the most popular choices for serving PSGI apps. It topped out at around 16k requests/sec, with a significantly wider latency spread and some dropped connections.


Streaming test: 2-second responses, 500 concurrent clients

Next, I switched to a streaming handler that sends chunks over ~2 seconds.
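
One way to write such a handler, reusing the assumed interface from the earlier sketch and Future::IO->sleep for the pacing:

use Future::AsyncAwait;
use Future::IO;

# Sketch: ten chunks spread over ~2 seconds. The awaited sleep yields
# back to the event loop, so hundreds of these can run concurrently.
my $slow_app = async sub {
    my ($scope, $receive, $send) = @_;
    await $send->({ type => 'http.response.start', status => 200,
                    headers => [ [ 'content-type' => 'text/plain' ] ] });
    for my $i (1 .. 10) {
        await $send->({ type => 'http.response.body',
                        body => "chunk $i\n", more_body => 1 });
        await Future::IO->sleep(0.2);    # ~200ms between chunks
    }
    await $send->({ type => 'http.response.body', body => '', more_body => 0 });
};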

With 500 concurrent connections, single process:

  • ~242–245 requests/sec
  • p50 ≈ 2.00s
  • p99 ≈ 2.3–2.4s
  • CPU ~10–20% busy
  • memory stable

This matters because the math checks out:

500 concurrent / 2 seconds ≈ 250 rps theoretical max

Hitting ~97–98% of theoretical max strongly suggests:

  • no head-of-line blocking
  • no per-connection threads
  • no accidental buffering

The server is doing what an async server should do: waiting, not working.


A gotcha: TTY logging will ruin your benchmark

One early run looked worse than expected. The cause was embarrassingly simple:

Access logging was printing to a terminal (TTY).

At ~20k+ log lines/sec, the terminal became the bottleneck.

Once logging was disabled or redirected:

  • CPU usage dropped dramatically
  • latency tails tightened
  • throughput returned to theoretical limits

Lesson: never benchmark with synchronous TTY logging enabled.
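
If you can't turn access logging off for a run, redirecting it away from the terminal is usually enough. A minimal sketch, assuming the log is printed to STDOUT:

# Assumption: the access log goes to STDOUT. Reopening STDOUT onto a
# file before the benchmark keeps terminal rendering out of the
# measurement; a shell-level redirect (> access.log) does the same.
open STDOUT, '>>', '/tmp/access.log' or die "cannot redirect STDOUT: $!";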


Overload behavior: what happens at 2500 concurrent streams?

With ulimit -n = 2048, single-process PAGI::Server started rejecting connections around ~2000 open sockets.

That rejection path returned:

  • 503 Service Unavailable
  • Retry-After: 5
  • Connection: close

This is good behavior.

Instead of accepting everything and melting down, the server applied admission control based on available file descriptors.

The client (hey) complained with “unsolicited response” warnings — a known artifact when clients aggressively reuse keep-alive connections under churn. The server was behaving correctly.
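
For illustration, fd-based admission control can be as simple as the sketch below. This shows the idea, not PAGI::Server's actual code: derive a soft connection ceiling from the process fd limit and fail fast with a retryable 503 once it is reached.

use POSIX ();

my $max_fds    = POSIX::sysconf(POSIX::_SC_OPEN_MAX()) // 1024;
my $soft_cap   = int($max_fds * 0.9);   # leave headroom for files, pipes, etc.
my $open_conns = 0;

sub on_new_connection {
    my ($stream) = @_;    # an IO::Async::Stream for the accepted socket
    if ($open_conns >= $soft_cap) {
        # Fail fast: tell the client to retry later and close the socket.
        $stream->write(
            "HTTP/1.1 503 Service Unavailable\r\n"
          . "Retry-After: 5\r\n"
          . "Connection: close\r\n"
          . "Content-Length: 0\r\n\r\n",
            on_flush => sub { $stream->close },
        );
        return;
    }
    $open_conns++;
    # ... hand the stream to the normal request path, and decrement
    #     $open_conns when the connection closes ...
}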


Scaling out: 16 workers

With 16 worker processes, the same workload:

  • sustained ~1200 requests/sec at 2500 concurrency
  • p50 ≈ 2.00s
  • p99 ≈ 3.0s
  • all responses were 200s, no rejects

Again, the math checks:

2500 concurrent / 2 seconds ≈ 1250 rps theoretical

Result: ~96% of theoretical max.

This validated that PAGI’s worker model increases capacity cleanly without breaking streaming semantics.


The real test: slow clients and backpressure

Throughput benchmarks are easy. Backpressure is not.

To test this, I used a stress app that streams tens of megabytes as fast as possible, then ran clients with artificial throttling.
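
The stress app is essentially a tight loop of large chunks, something like the sketch below (same assumed interface as earlier, with the size in MB taken from the request path). If the Future returned by $send reflects the state of the socket buffer, a slow reader naturally slows this loop down.

use Future::AsyncAwait;

# Sketch of a /stream/50-style endpoint: write 1 MiB chunks as fast as
# the $send callback allows. The 'path' scope key is assumed here.
my $stream_app = async sub {
    my ($scope, $receive, $send) = @_;
    my ($mb) = ($scope->{path} // '') =~ m{/stream/(\d+)};
    $mb ||= 10;
    await $send->({ type => 'http.response.start', status => 200,
                    headers => [ [ 'content-type' => 'application/octet-stream' ] ] });
    my $chunk = 'x' x (1024 * 1024);    # 1 MiB of payload per event
    for (1 .. $mb) {
        await $send->({ type => 'http.response.body',
                        body => $chunk, more_body => 1 });
    }
    await $send->({ type => 'http.response.body', body => '', more_body => 0 });
};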

Slow client test

curl --limit-rate 1M http://localhost:5000/stream/50

Observed:

  • ~50MB transferred in ~49 seconds
  • download rate stayed near 1MB/s
  • server did not buffer the entire response
  • no memory blow-up

This is the critical signal.

If backpressure were broken, the server would buffer 50MB instantly and curl would “catch up” later. That did not happen.


High-throughput clients

Unthrottled clients on localhost achieved:

  • ~75–80 MB/s per connection
  • linear scaling until loopback bandwidth became the limit

Again, no red flags.


Concurrency + large bodies

Using hey with 50 concurrent clients streaming 10–50MB bodies:

  • high aggregate throughput
  • wide latency distribution
  • no errors
  • no collapse

At this point, the bottleneck was clearly:

  • kernel socket buffers
  • memory copy bandwidth
  • client tooling limits

Not the server itself.


The subtle risk we identified (and addressed)

While reviewing the server code, one important detail surfaced:

  • writes use IO::Async::Stream->write
  • writes enqueue data into an outgoing buffer
  • there is no implicit send-side backpressure unless implemented explicitly

This is fine for polite producers (like most apps), but dangerous if:

  • an app writes huge chunks rapidly
  • the client reads slowly

To address this, we added send-side backpressure:

  • track queued outgoing bytes
  • define high / low watermarks
  • pause $send->(...) futures when buffers exceed the high watermark
  • resume when the buffer drains

This change makes slow-client behavior safe by default.
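
A stripped-down sketch of that mechanism (illustrative only; the real implementation lives inside the server's connection handling). The key point is that the Future handed back to the producer resolves immediately only while the queue is under the high watermark; the watermark values here are placeholders.

use Future;
use constant {
    HIGH_WATERMARK => 1024 * 1024,   # 1 MiB queued: start pausing writers
    LOW_WATERMARK  =>  256 * 1024,   # 256 KiB queued: let writers resume
};

my $queued = 0;    # bytes handed to the stream but not yet flushed
my @waiters;       # pending Futures for producers paused at the high mark

# Body events from $send->(...) end up here; the caller awaits the
# returned Future, so a pending Future is what "pausing" means.
sub backpressured_write {
    my ($stream, $data) = @_;
    $queued += length $data;
    $stream->write($data, on_flush => sub {
        $queued -= length $data;
        # Drained below the low watermark: wake every paused producer.
        if ($queued <= LOW_WATERMARK) {
            (shift @waiters)->done while @waiters;
        }
    });
    return Future->done if $queued < HIGH_WATERMARK;

    # Over the high watermark: hand back a pending Future so the
    # producer's `await` yields until the buffer drains.
    push @waiters, my $f = $stream->loop->new_future;
    return $f;
}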


What I did not see (important)

Across all tests, I did not observe:

  • unbounded RSS growth
  • runaway CPU
  • increasing latency under steady load
  • dropped connections without errors
  • event-loop starvation

Those are the usual failure modes of async servers. None appeared.


Final assessment

Based on these tests:

  • PAGI::Server handles streaming correctly
  • backpressure works (and is now explicit)
  • overload fails fast and cleanly
  • worker scaling behaves predictably
  • performance matches theoretical limits

There’s still more to do — especially long-duration soak tests and production-grade observability — but the fundamentals are solid.

Most importantly, the server behaves honestly. It doesn’t fake async by buffering everything and hoping for the best.
