Backpressure, Explained Through a Queue That Won't Fall Over


A queue feels like the safe answer. Producer writes fast, consumer reads slow, so you drop a buffer between them and assume the buffer absorbs the difference. It does — right up until the producer is faster than the consumer for long enough that the buffer is no longer a buffer. It's a backlog. And a backlog with no ceiling is a memory leak that takes a while to show up in your graphs.

Backpressure is the mechanism that stops that. It's the signal that travels backward — from the slow consumer to the fast producer — saying "slow down, I can't keep up." Without it, the producer keeps shoving work into a queue that grows until the process is killed by the OOM killer or the latency on every queued item climbs past the point where the result still matters.

The unbounded queue is the bug, not the fix

Here's the version most of us write first:

queue = []  # no max size

def produce(item):
    queue.append(item)   # never blocks, never fails

def consume():
    while queue:
        handle(queue.pop(0))

This works in every test you run, because in a test the producer stops. In production the producer doesn't stop. If handle() takes 50ms and items arrive every 10ms, five items arrive for every one handled, so the queue grows by 4 items per 50ms cycle, forever. Memory climbs linearly. The 10,000th item arrives 100 seconds in and isn't picked up until roughly the 500-second mark, a wait of about 400 seconds. By the time you see the memory alert, the queued work is already stale.
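The arithmetic can be checked with a few lines. This is a sketch under the paragraph's own assumptions (one item every 10 ms, a single consumer spending 50 ms per item, an initially empty queue); the function names are made up for illustration.

```python
def wait_seconds(n, arrival_ms=10, service_ms=50):
    """Queueing delay of the n-th item (1-indexed) with a single consumer.

    Assumes item k arrives at k * arrival_ms and the consumer is never idle
    after the first item, which holds whenever service_ms >= arrival_ms.
    """
    arrives_at = n * arrival_ms
    # The consumer starts item 1 on arrival and then works back to back.
    starts_at = arrival_ms + (n - 1) * service_ms
    return (starts_at - arrives_at) / 1000


def backlog_after(seconds, arrival_ms=10, service_ms=50):
    """Approximate number of items still queued after `seconds` of sustained load."""
    arrived = seconds * 1000 // arrival_ms
    handled = seconds * 1000 // service_ms
    return arrived - handled
```

wait_seconds(10_000) comes out just under 400 seconds: the item is handled around the 500-second mark, but it only arrived at the 100-second mark. backlog_after(60) shows how much work is already piled up after a single minute.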

The fix is not a bigger queue. A bigger queue just moves the cliff further out and makes the fall taller. The fix is a bounded queue plus a decision about what happens when it's full. That decision is backpressure.

An unbounded queue doesn't remove the limit — it relocates it from your code (where you can handle it) to the kernel's OOM killer (where you can't). "It hasn't fallen over yet" usually means "the producer hasn't sustained peak rate long enough yet."

Four things a full queue can do

Once the queue has a maximum size, produce() has to answer one question: what do I do when there's no room? There are exactly four honest answers, and picking the wrong one for your workload is how systems fail in surprising ways.

Block the producer. The producer waits until a slot frees up. This is the cleanest form of backpressure — the slowness propagates all the way up the chain, and an upstream HTTP server starts returning slower, which makes its clients slow down. Go channels with a fixed capacity do this by default: a send on a full channel blocks. The risk is that blocking can cascade into a deadlock if the producer holds a lock the consumer needs.
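A minimal Python sketch of the blocking strategy, using the standard library's queue.Queue with a maxsize. The queue size, the sleep standing in for handle(), and the None sentinel are illustrative choices, not part of any particular system.

```python
import queue
import threading
import time

work = queue.Queue(maxsize=2)   # bounded: at most 2 items waiting
handled = []

def consumer():
    while True:
        item = work.get()
        if item is None:        # sentinel: producer is done
            break
        time.sleep(0.01)        # stand-in for a slow handle()
        handled.append(item)

t = threading.Thread(target=consumer)
t.start()

for i in range(10):
    work.put(i)                 # blocks while the queue is full
work.put(None)
t.join()
```

The producer loop cannot run ahead of the consumer by more than two items. If waiting forever is not acceptable, put() also accepts a timeout and raises queue.Full when it expires, which turns blocking into one of the other strategies.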

Drop the new item. Reject what just arrived. Sensible when any single item is expendable and what's already queued is just as good: a metrics pipeline sampling 1-in-N under load loses precision, not correctness. You must surface the drop as a counter, or you've built silent data loss.
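One way this could look in Python, as a sketch: a thin wrapper around queue.Queue that rejects on full and counts every rejection. The class and method names (DropNewestQueue, offer, take) are hypothetical.

```python
import queue

class DropNewestQueue:
    """Hypothetical wrapper: rejects new items when full and counts the rejections."""

    def __init__(self, maxsize):
        self._q = queue.Queue(maxsize=maxsize)
        self.dropped = 0

    def offer(self, item):
        try:
            self._q.put_nowait(item)
            return True
        except queue.Full:
            self.dropped += 1   # export this as a metric
            return False

    def take(self):
        return self._q.get_nowait()


q = DropNewestQueue(maxsize=3)
results = [q.offer(i) for i in range(5)]   # the last two offers are rejected
```

The counter is the important part. With several producer threads the increment would need a lock or an atomic counter, which this sketch leaves out.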

Drop the oldest item. Evict the head to make room for the tail. Right when the newest data is the most valuable: live sensor readings, the current price, the latest frame. A ring buffer does this for free.
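In Python, collections.deque with a maxlen behaves as this kind of ring buffer. A small sketch with made-up readings follows; the eviction counter is an addition, since the deque itself drops silently.

```python
from collections import deque

latest = deque(maxlen=3)    # ring buffer: holds the 3 most recent readings
evicted = 0

for reading in [10, 11, 12, 13, 14]:
    if len(latest) == latest.maxlen:
        evicted += 1        # deque drops silently, so count it ourselves
    latest.append(reading)  # when full, the oldest item falls off the left
```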

Fail fast. Return an error to the caller immediately — HTTP 503, a rejected future. This is what a bounded thread pool's rejection policy does, and it's the foundation of load shedding: better to cleanly reject 10% of requests than to slowly degrade 100% of them into timeouts.
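A sketch of fail-fast in Python. The Overloaded exception, submit(), and handle_request() are hypothetical names, and the (status, body) tuple stands in for whatever response type a real framework uses.

```python
import queue

class Overloaded(Exception):
    """Hypothetical error a request handler could translate into HTTP 503."""

pending = queue.Queue(maxsize=2)

def submit(job):
    try:
        pending.put_nowait(job)
    except queue.Full:
        raise Overloaded("queue full, retry later") from None

def handle_request(job):
    # Hypothetical handler returning (status_code, body).
    try:
        submit(job)
    except Overloaded as exc:
        return 503, str(exc)
    return 202, "accepted"

statuses = [handle_request(n)[0] for n in range(4)]
```

The difference from dropping the new item is who finds out: here the caller gets an immediate, explicit answer and can retry, back off, or use its fallback.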

The four strategies map to a single question: what is the cost of being late versus the cost of being wrong? If late data is worthless, drop oldest. If extra data is worthless, drop newest. If every item must be processed, block. If the caller has a fallback, fail fast.

Why "just add a queue" hides the real number

The queue length you can tolerate is determined by Little's Law: the average number of items in the system equals the arrival rate times the average time each item spends inside. Flip it around and a full queue tells you your worst-case latency. A 1,000-slot queue draining at 200 items/second means a freshly queued item waits up to 5 seconds. If your SLA is 1 second, your queue is already five times too deep (200 slots is the most that budget allows), and no amount of buffering fixes that, because the buffer is the latency.
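The same sizing arithmetic as two small helpers. This is a sketch that assumes a steady drain rate; the function names are illustrative.

```python
import math

def worst_case_wait(queue_slots, drain_per_sec):
    """Seconds a newly queued item waits if the queue is full ahead of it."""
    return queue_slots / drain_per_sec

def max_slots_for(latency_budget_sec, drain_per_sec):
    """Largest queue depth that keeps the worst-case wait within the budget."""
    return math.floor(latency_budget_sec * drain_per_sec)
```

With the figures above, worst_case_wait(1000, 200) is 5 seconds and max_slots_for(1, 200) is 200 slots. Working backward from the latency budget to the queue size is usually the more useful direction.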

This is the part people skip. A queue doesn't make a slow consumer fast. It converts a throughput problem into a latency problem and hides it inside a data structure. Backpressure forces the throughput problem back into the open where you can either scale the consumer, shed load, or tell the producer the truth.

Reactive libraries make this explicit. In Reactive Streams (the contract behind Project Reactor, RxJava, and Akka Streams), the subscriber calls request(n) on its subscription to ask for a specific number of items, and the publisher is contractually forbidden from sending more than were requested. The demand signal is the backpressure. Node.js streams do the same with a lower-level handshake: writable.write() returns false once the internal buffer has reached its highWaterMark, and a well-behaved producer pauses until the 'drain' event fires.

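function pump(source, dest) {
  // Assumes source is a Readable and dest is a Writable stream.
  source.on('data', (chunk) => {
    const ok = dest.write(chunk);
    if (!ok) {
      // buffer is full: stop reading until it drains
      source.pause();
      dest.once('drain', () => source.resume());
    }
  });
  // Close the destination once the source has nothing left.
  source.on('end', () => dest.end());
}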

That if (!ok) is the whole idea. The plumbing exists in your runtime already. The bug is ignoring the return value — calling write() in a tight loop without checking it is the Node equivalent of the unbounded queue.append() above.
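The request(n) contract mentioned earlier can be sketched in the same spirit. What follows is a simplified, synchronous Python stand-in, not the Reactive Streams API itself: the real specification is asynchronous and defines separate publisher, subscriber, and subscription roles, and the Subscription class here is hypothetical.

```python
class Subscription:
    """Simplified, synchronous stand-in for a Reactive Streams subscription."""

    def __init__(self, source, on_next):
        self._items = iter(source)
        self._on_next = on_next
        self.completed = False

    def request(self, n):
        # Deliver at most n items: the producer never exceeds the stated demand.
        for _ in range(n):
            try:
                item = next(self._items)
            except StopIteration:
                self.completed = True
                return
            self._on_next(item)


received = []
sub = Subscription(range(10), received.append)
sub.request(3)   # consumer asks for three items and gets exactly three
first_batch = list(received)
sub.request(2)   # more demand, more items
```

Nothing is produced until the consumer asks, so there is no queue to overflow. The consumer's rate of calling request() is the backpressure signal.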

When you're tracing a backpressure path through an unfamiliar codebase — finding every place a producer ignores the consumer's signal — an editor with whole-repo context speeds up the read considerably. You're looking for the inverse of a pattern (writes with no corresponding check), and that's exactly the kind of structural search an AI-assisted editor handles better than grep.

The mental model to keep: a queue is a shock absorber for bursts, not a fix for a sustained rate mismatch. Size it for the burst you expect, bound it hard, and decide — explicitly, in code — what happens at the boundary. A queue that won't fall over is just a queue whose full case you actually wrote.

Originally published at pickuma.com.