
John Napiorkowski

PAGI::Server: Stress-Testing an Async Perl HTTP Server the Hard Way

Over the last few days I’ve been stress-testing PAGI::Server, an async HTTP server built on IO::Async, Future::IO, and an EV event loop. The goal wasn’t to chase synthetic “requests per second” numbers, but to answer harder questions:

  • Does streaming actually behave correctly?
  • What happens under sustained concurrency?
  • Do slow clients cause buffer bloat?
  • Does backpressure really work?
  • How does it fail under overload?

This post summarizes what I learned by pushing the server well past “normal” workloads and watching it misbehave — or not.


Architecture in one paragraph

PAGI::Server is:

  • single-process, event-driven by default
  • optionally multi-process via workers
  • fully async end-to-end (no threads)
  • streaming-first (responses are not buffered by default)
  • designed to support ASGI-like semantics in Perl

The server accepts connections in a main event loop and routes work through IO::Async::Stream objects. Application code produces responses via a $send callback that emits protocol events (http.response.start, http.response.body, etc.).

That last part matters, because it’s where most async servers quietly cheat.
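
To make that concrete, here is a minimal handler in this style. It's a sketch: the event types are the ones named above, but the surrounding field names (status, headers) and the async/await plumbing via Future::AsyncAwait are my assumptions, modeled on ASGI.

use Future::AsyncAwait;

# Minimal PAGI-style handler (sketch). Event types match the ones named
# above; the status/headers field names are assumed, modeled on ASGI.
async sub app {
    my ($scope, $receive, $send) = @_;

    await $send->({
        type    => 'http.response.start',
        status  => 200,
        headers => [ [ 'content-type' => 'text/plain' ] ],
    });
    await $send->({
        type => 'http.response.body',
        body => "hello\n",
    });
}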

An important note: although PAGI::Server is built on IO::Async, that is an implementation detail, not a core requirement of the PAGI specification. That said, I built PAGI::Server on IO::Async because it's mature, battle-tested, and fully featured.

Baseline: fast responses

Before touching streaming, I tested a trivial HTTP handler.

On an older 16-core MacBook Pro:

  • ~22–23k requests/sec
  • p50 latency ~20ms
  • p99 < 50ms
  • CPU mostly idle

Nothing surprising here — just confirming there’s no accidental serialization or blocking in the accept loop.
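
These throughput numbers came from hey. A typical invocation looked something like this (duration, concurrency, and port are representative, not a transcript of the exact runs):

hey -z 30s -c 200 http://localhost:5000/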

To set a comparative baseline, I also ran a similar minimal PSGI application under Starman, one of the most popular servers for PSGI apps. It topped out at around 16k requests/sec, with a significantly wider latency spread and dropped connections.


Streaming test: 2-second responses, 500 concurrent clients

Next, I switched to a streaming handler that sends chunks over ~2 seconds.
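
The handler looked roughly like this (a sketch, reusing the assumed event shape from earlier; Future::IO->sleep supplies the non-blocking delay):

use Future::AsyncAwait;
use Future::IO;

# Streaming sketch: 20 chunks spaced ~0.1s apart, about 2 seconds total.
# Future::IO->sleep returns a Future, so the delay never blocks the loop.
async sub stream_2s {
    my ($scope, $receive, $send) = @_;

    await $send->({ type => 'http.response.start', status => 200 });
    for my $i (1 .. 20) {
        await Future::IO->sleep(0.1);
        await $send->({ type => 'http.response.body',
                        body => "chunk $i\n", more_body => 1 });
    }
    await $send->({ type => 'http.response.body',
                    body => '', more_body => 0 });
}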

With 500 concurrent connections, single process:

  • ~242–245 requests/sec
  • p50 ≈ 2.00s
  • p99 ≈ 2.3–2.4s
  • CPU ~10–20% busy
  • memory stable

This matters because the math checks out:

500 concurrent / 2 seconds ≈ 250 rps theoretical max

Hitting ~97–98% of theoretical max strongly suggests:

  • no head-of-line blocking
  • no per-connection threads
  • no accidental buffering

The server is doing what an async server should do: waiting, not working.


A gotcha: TTY logging will ruin your benchmark

One early run looked worse than expected. The cause was embarrassingly simple:

Access logging was printing to a terminal (TTY).

At ~20k+ log lines/sec, the terminal became the bottleneck.

Once logging was disabled or redirected:

  • CPU usage dropped dramatically
  • latency tails tightened
  • throughput returned to theoretical limits

Lesson: never benchmark with synchronous TTY logging enabled.
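
If you do want an access log during a run, point it at a file (or the bit bucket) rather than the terminal. A generic invocation; the real server command line will differ:

perl server.pl >> access.log 2>&1    # or: > /dev/null for pure benchmarks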


Overload behavior: what happens at 2500 concurrent streams?

With ulimit -n = 2048, single-process PAGI::Server started rejecting connections at roughly 2000 open sockets.

That rejection path returned:

  • 503 Service Unavailable
  • Retry-After: 5
  • Connection: close

This is good behavior.

Instead of accepting everything and melting down, the server applied admission control based on available file descriptors.
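
Mechanically, that rejection path looks something like the following sketch (the idea, not PAGI::Server's actual code; $MAX_CONNS and on_accept are illustrative names):

# FD-based admission control (sketch): when active connections approach
# the process fd limit, answer with a canned 503 and close immediately
# instead of accepting work the server cannot finish.
my $MAX_CONNS = 2000;   # derived from ulimit -n, minus some headroom
my $active    = 0;

sub on_accept {
    my ($stream) = @_;   # an IO::Async::Stream for the new connection

    if ($active >= $MAX_CONNS) {
        $stream->write(
            "HTTP/1.1 503 Service Unavailable\r\n"
          . "Retry-After: 5\r\n"
          . "Connection: close\r\n"
          . "Content-Length: 0\r\n\r\n",
            on_flush => sub { $stream->close },
        );
        return;
    }

    $active++;
    # ... normal request handling; $active-- when the connection closes ...
}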

The client (hey) complained with “unsolicited response” warnings — a known artifact when clients aggressively reuse keep-alive connections under churn. The server was behaving correctly.


Scaling out: 16 workers

With 16 worker processes, the same workload:

  • sustained ~1200 requests/sec at 2500 concurrency
  • p50 ≈ 2.00s
  • p99 ≈ 3.0s
  • all responses were 200s, no rejects

Again, the math checks:

2500 concurrent / 2 seconds ≈ 1250 rps theoretical

Result: ~96% of theoretical max.

This validated that PAGI’s worker model increases capacity cleanly without breaking streaming semantics.
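
For readers unfamiliar with the pattern, the general shape of a pre-fork worker model on IO::Async is sketched below. This is the standard technique, not PAGI::Server's exact worker code:

use IO::Async::Loop;
use IO::Async::Listener;
use IO::Socket::INET;

# Pre-fork sketch: open one listening socket, then fork N workers that
# each run their own event loop and accept from the shared socket.
my $sock = IO::Socket::INET->new(
    LocalPort => 5000,
    Listen    => 1024,
    ReuseAddr => 1,
) or die "listen: $!";

for (1 .. 16) {
    my $pid = fork // die "fork: $!";
    next if $pid;                       # parent keeps forking

    my $loop     = IO::Async::Loop->new;
    my $listener = IO::Async::Listener->new(
        handle    => $sock,             # shared, already-listening socket
        on_stream => sub {
            my (undef, $stream) = @_;
            # ... configure $stream (on_read, etc.) and add it to $loop ...
        },
    );
    $loop->add($listener);
    $loop->run;                         # worker never returns
}
# parent: wait for children, respawn on exit, etc. (omitted)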


The real test: slow clients and backpressure

Throughput benchmarks are easy. Backpressure is not.

To test this, I used a stress app that streams tens of megabytes as fast as possible, then ran clients with artificial throttling.
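
The stress endpoint itself is conceptually simple. Here is a sketch with the same assumed event shape as earlier (/stream/50 meaning "stream 50 MB"); note that the app awaits every $send, which is exactly the hook the send-side backpressure discussed below relies on:

use Future::AsyncAwait;

# Stress sketch: push $mb megabytes in 64 KiB chunks as fast as the
# connection allows, awaiting each $send along the way.
async sub stream_mb {
    my ($scope, $receive, $send) = @_;
    my $mb    = 50;
    my $chunk = 'x' x 65536;            # 64 KiB

    await $send->({ type => 'http.response.start', status => 200 });
    for (1 .. $mb * 16) {               # 16 x 64 KiB = 1 MiB
        await $send->({ type => 'http.response.body',
                        body => $chunk, more_body => 1 });
    }
    await $send->({ type => 'http.response.body',
                    body => '', more_body => 0 });
}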

Slow client test

curl --limit-rate 1M http://localhost:5000/stream/50

Observed:

  • ~50MB transferred in ~49 seconds
  • download rate stayed near 1MB/s
  • server did not buffer the entire response
  • no memory blow-up

This is the critical signal.

If backpressure were broken, the server would buffer 50MB instantly and curl would “catch up” later. That did not happen.


High-throughput clients

Unthrottled clients on localhost achieved:

  • ~75–80 MB/s per connection
  • linear scaling until loopback bandwidth became the limit

Again, no red flags.


Concurrency + large bodies

Using hey with 50 concurrent clients streaming 10–50MB bodies (a representative invocation follows the results):

  • high aggregate throughput
  • wide latency distribution
  • no errors
  • no collapse
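
A representative invocation (flags illustrative, not a transcript):

hey -c 50 -n 1000 http://localhost:5000/stream/10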

At this point, the bottleneck was clearly:

  • kernel socket buffers
  • memory copy bandwidth
  • client tooling limits

Not the server itself.


The subtle risk we identified (and addressed)

While reviewing the server code, one important detail surfaced:

  • writes use IO::Async::Stream->write
  • writes enqueue data into an outgoing buffer
  • there is no implicit send-side backpressure unless implemented explicitly

This is fine for polite producers (like most apps), but dangerous if:

  • an app writes huge chunks rapidly
  • the client reads slowly

To address this, we added send-side backpressure:

  • track queued outgoing bytes
  • define high / low watermarks
  • pause $send->(...) futures when buffers exceed the high watermark
  • resume when the buffer drains

This change makes slow-client behavior safe by default.
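
Sketched in code, the mechanism looks roughly like this. The watermark values, the send_chunk helper, and the per-connection fields are illustrative, not PAGI::Server's actual internals:

use Future;

use constant HIGH_WATERMARK => 1_048_576;   # pause producers above 1 MiB queued
use constant LOW_WATERMARK  =>   262_144;   # resume once drained below 256 KiB

# $conn->{stream} is the connection's IO::Async::Stream;
# $conn->{queued} counts bytes written but not yet flushed to the socket.
sub send_chunk {
    my ($conn, $data) = @_;
    $conn->{queued} += length $data;

    $conn->{stream}->write($data, on_flush => sub {
        $conn->{queued} -= length $data;
        # Drained below the low watermark: release a paused producer.
        if ($conn->{paused} && $conn->{queued} <= LOW_WATERMARK) {
            (delete $conn->{paused})->done;
        }
    });

    # Over the high watermark: hand back a pending Future so the app's
    # `await $send->(...)` stalls without blocking the event loop.
    return $conn->{paused} //= $conn->{stream}->loop->new_future
        if $conn->{queued} > HIGH_WATERMARK;

    return Future->done;
}

Because $send already hands the application a Future to await, pausing is just a matter of returning one that stays pending until the drain callback fires.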


What I did not see (important)

Across all tests, I did not observe:

  • unbounded RSS growth
  • runaway CPU
  • increasing latency under steady load
  • dropped connections without errors
  • event-loop starvation

Those are the usual failure modes of async servers. None appeared.


Final assessment

Based on these tests:

  • PAGI::Server handles streaming correctly
  • backpressure works (and is now explicit)
  • overload fails fast and cleanly
  • worker scaling behaves predictably
  • performance matches theoretical limits

There’s still more to do — especially long-duration soak tests and production-grade observability — but the fundamentals are solid.

Most importantly, the server behaves honestly. It doesn’t fake async by buffering everything and hoping for the best.
