This post is about the physical laws of backpressure in software systems, latency death spirals, and why unbounded queues are a bug.
If you have ever been on call or in charge of ops and system reliability during a massive traffic spike, you know the feeling. The alerts fire, the CPU goes to 100%, memory usage goes through the roof, and the server either crashes or your software becomes unresponsive.
The team’s immediate reaction is almost always the same: we need a bigger buffer. You increase the size of your Go channels or raise your Redis buffer limits. Perhaps you don't yet have a queue in your system, so you decide to add Kafka or RabbitMQ — problem solved! You can now handle such a traffic spike without crashing... until your business hits a bigger milestone and you're back to the same routine.
This is the most intuitive response in software engineering today. Not only are queues used as a band-aid solution, they are added upfront as a way to decouple systems. Yet an overload on one system cascades to others.
But a queue behaves exactly like a bathtub or a sink, and just like bathtubs, software is bound by physics. If water is coming out of the tap (faucet) faster than it can go down the drain, making the bathtub bigger does not prevent a flood. It just delays it.
I'll publish a follow-up article about delays, how they affect your system, and how to engineer feedback loops for a reliable system. For now, let's focus on the physics of queues and why they don't fix overload.
As Fred Hebert famously wrote: "Queues don’t fix overload." They are only useful when the rate of work arriving is occasionally and temporarily higher than the rate of work finishing. They absorb variance, not sustained load.
When you rely on an unbounded queue to handle a traffic spike, you aren’t fixing the problem. You are guaranteeing that the failure will be catastrophic and likely unrecoverable.
When arrival rate (the faucet) exceeds processing rate (the narrow pipe), the buffer (the bathtub) eventually overflows. A larger bathtub only delays the inevitable.
The Latency Death Spiral (and Little's Law)
Let's look at the physics of a queue. Queueing theory is the mathematical study of waiting lines, and one of its most fundamental results is Little’s Law.
Little’s Law states that the number of items in a system ($L$) is equal to the rate they arrive ($\lambda$) multiplied by the average time it takes to process them ($W$).
$$L = \lambda W$$
Imagine your service processes 1,000 requests per second. Suddenly, 5,000 requests per second arrive. You now have a 4,000 request-per-second deficit.
Because you use unbounded mailboxes or queues, the service doesn't crash immediately. It obediently accepts every incoming request and stacks them in memory. As the arrival rate ($\lambda$) exceeds your processing capacity, Little's Law dictates that the number of items in your queue ($L$) must grow toward infinity.
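A quick back-of-the-envelope sketch in Python makes the arithmetic concrete, using the hypothetical numbers above (5,000 req/s arriving, 1,000 req/s served):

```python
def backlog_after(seconds, arrival_rate, service_rate):
    """Queue length after `seconds` of sustained overload on an unbounded queue."""
    deficit = arrival_rate - service_rate  # requests/s that pile up
    return max(deficit, 0) * seconds

def wait_for_newest(backlog, service_rate):
    """Little's Law rearranged (W = L / lambda): how long the newest
    request sits in the queue before a worker reaches it."""
    return backlog / service_rate

L = backlog_after(10, arrival_rate=5000, service_rate=1000)
print(L)                         # 40000 requests queued after just 10 seconds
print(wait_for_newest(L, 1000))  # 40.0 seconds — far past any client timeout
```

Ten seconds of sustained overload already puts the newest request 40 seconds away from being served, which is the seed of the death spiral described next.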
When the queue hits 10,000, then 50,000, then 100,000, what happens to the 100,000th request sitting in that massive queue?
By the time your worker process finally reaches it, the client who sent it has most likely timed out. Or they watched a spinning loading wheel, got frustrated, and refreshed the page. That refresh just added another request to the back of the queue, increasing the arrival rate ($\lambda$) even further.
Your server is now spending expensive CPU cycles processing dead requests for users who have already left. Because you are processing dead requests, your effective processing time ($W$) worsens. The queue grows faster because of the retries. The memory pressure triggers garbage collection pauses, which slows down your processing rate even further.
The drain gets clogged while the faucet opens wider.
This is the latency death spiral — a cascading system failure where increased response times lead to request queuing, causing further delays and eventual total system saturation. That is what happens with an unbounded queue.
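The feedback loop can be modeled with a toy simulation, under the deliberately simplified assumption that every unserved request produces exactly one retry in the next round:

```python
def death_spiral(base_rate, capacity, retry_fraction, rounds):
    """Each round, requests beyond capacity go unserved; a fraction of those
    users refresh, and their retries are added to the next round's load."""
    load = base_rate
    history = []
    for _ in range(rounds):
        unserved = max(load - capacity, 0)
        history.append(load)
        load = base_rate + retry_fraction * unserved  # retries pile on top
    return history

# 5,000 req/s arriving, 1,000 req/s capacity, one retry per unserved request.
print(death_spiral(5000, 1000, 1.0, 5))  # load grows every single round
```

Even in this gentle model the load never stabilizes; with users mashing refresh (a retry fraction above 1.0), the growth becomes superlinear.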
The Hard Truth: You Must Shed Load
You cannot process data faster than your hardware allows. If the data transfer rate of your network, the CPU processing speed, or the disk I/O throughput is maxed out, you cannot magically make it faster with software. Therefore, you must drop data. In a networking context, for example, you drop packets.
The only engineering choice you have is how and where you drop it.
Most frameworks choose to drop data implicitly. The queue grows until the server runs out of memory, the OS kills the process, and you drop everything. All in-flight requests are lost, and the service is completely down until it restarts.
The alternative is to drop data explicitly through Load Shedding and Backpressure.
If the system is at capacity, it must reject new work immediately. It must look the sender in the eye and say, "I am full. Go away." The sender must be told instantly so it can make a policy decision: should I drop this request, retry it later, or show the user a degraded experience? This is not a failure of the system. This is the system successfully defending itself with proper feedback loops.
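The mechanics of "I am full. Go away." can be sketched with Python's standard `queue` module — an illustrative analogy, not any particular framework's API. A strictly bounded queue plus a non-blocking send gives the sender an immediate, synchronous rejection:

```python
import queue

inbox = queue.Queue(maxsize=256)  # strictly bounded mailbox

def try_send(q, msg):
    """Non-blocking send: reject immediately instead of buffering forever."""
    try:
        q.put_nowait(msg)       # raises queue.Full when at capacity
        return "ok"
    except queue.Full:
        return "mailbox_full"   # the sender decides: drop, retry, or degrade

for i in range(300):
    if try_send(inbox, i) == "mailbox_full":
        print(f"shedding request {i}")  # first rejection happens at i == 256
        break
```

The crucial property is that the failure surfaces at the sender, synchronously, while there is still time to make a policy decision.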
Shed Load at the Source
In Tina, I engineered the framework around a single, immutable law: when capacity is exceeded, excess is shed immediately.
Tina is a high-throughput thread-per-core concurrency framework I built in Odin. It does not queue to fix overload. Every resource in the system — mailboxes, message pools, cross-shard channels — is strictly bounded and pre-allocated at boot time. There is no dynamic allocation (or malloc) during operation, and there is no "unbounded" mode for a mailbox. A standard Isolate mailbox holds exactly 256 messages by default (a configurable value).
Isolate is Tina's unit of concurrency. It is a single-threaded, message-passing actor that processes messages sequentially from its mailbox.
When an Isolate attempts to send a message to a full mailbox, the system does not allocate a hidden buffer. It does not pause the sender. It rejects the message in O(1) time and returns the failure to the sender immediately.
[ Sender Isolate ] [ Target Isolate Mailbox ]
│ ┌───┬───┬───┬───┐
send(target, msg) ─────(Full!)───────X │msg│msg│...│msg│ (256/256)
│ └───┴───┴───┴───┘
▼
Returns .mailbox_full
(Zero allocation, O(1) fast rejection)
By returning the failure synchronously, the framework forces the application to make a policy decision at the exact moment the system reaches capacity, rather than deferring the failure to a downstream timeout minutes later.
Here is what that looks like in code. In Tina, sending a message is a synchronous ctx_send call that returns a Send_Result:
result := tina.ctx_send(ctx, destination_handle, TAG_DATA, &payload)
#partial switch result {
case .ok:
// Message successfully enqueued.
return tina.Effect_Receive{}
case .mailbox_full:
// The destination is overwhelmed. We must shed load.
tina.ctx_log(ctx, .WARN, TAG_OVERLOAD, "Destination overloaded, dropping request.")
// We explicitly drop the work and wait for the next message.
return tina.Effect_Receive{}
case .pool_exhausted:
// The Shard's memory pool is fully saturated. Let it crash.
return tina.Effect_Crash{reason = .system_saturated}
}
If you are building a telemetry or metrics emitter where dropping data is acceptable, you explicitly silence the result. In Tina, the underscore documents your architectural decision to ignore backpressure:
// Fire-and-forget. If the metrics service is overloaded, drop the metric.
_ = tina.ctx_send(ctx, metrics_handle, TAG_METRIC, &payload)
Bounded Reliability via Timeouts
ctx_send acts like UDP. It is fast, best-effort, and provides immediate feedback. But what if you are writing a billing service and you need a guaranteed response?
If you need reliability, you use the .call pattern. Instead of a synchronous function call, you return an Effect_Call to the scheduler.
// Send a request and park the Isolate until a reply arrives.
return tina.Effect_Call{
to = billing_handle,
message = transform_request_to_message(request),
timeout = 5000, // Mandatory timeout in milliseconds
}
In an overloaded system, your .call request might be dropped because the billing mailbox is full. If that happens, you do not wait forever. The mandatory timeout fires, the scheduler wakes your Isolate with a TAG_CALL_TIMEOUT message, and you handle the failure.
You cannot tell whether the billing service is dead, the message was dropped, or the billing service is just slow. And from a systems engineering perspective, you shouldn't care. The response is exactly the same: the system failed to service your request within the required SLA. You must retry, escalate, or fail the client request.
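Outside Tina, the same bounded-wait pattern can be approximated with Python's `queue` module (an illustrative sketch, not Tina's API): a request carries its own reply channel, and the caller waits with a mandatory timeout so that dead, dropped, and slow all collapse into one failure path.

```python
import queue

def call(request_q, timeout_s):
    """Request/reply with a mandatory timeout. Dead, dropped, and slow
    all look the same to the caller: no reply within the SLA."""
    reply_q = queue.Queue(maxsize=1)
    try:
        request_q.put_nowait(("charge", reply_q))
    except queue.Full:
        return "timeout_or_overload"   # mailbox full: shed at the source
    try:
        return reply_q.get(timeout=timeout_s)
    except queue.Empty:
        return "timeout_or_overload"   # SLA blown: retry, escalate, or fail

# An already-full billing mailbox: the call fails fast instead of waiting.
billing = queue.Queue(maxsize=1)
billing.put_nowait("backlog")
print(call(billing, timeout_s=0.01))   # timeout_or_overload
```

Note that the caller's code path is identical whether the send was rejected or the reply never arrived, which is exactly the indistinguishability the paragraph above describes.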
Predictability Beats Brevity
Forcing the developer to handle a Send_Result on every message send is more verbose than dumping a message into an unbounded Erlang mailbox or a Go channel whose buffer you keep enlarging.
I accepted this tradeoff because predictability and crash-safety are more important than typing fewer characters.
By strictly bounding queues and forcing timeouts, we eliminate the latency death spiral structurally. The system degrades gracefully. It sheds excess load instantly, and it recovers the moment traffic normalizes because it isn't spending the next twenty minutes processing dead requests.
If you are interested in high-throughput thread-per-core architectures, zero-allocation state machines, or deterministic simulation testing, I'd love your architectural critique on my project — Tina. You can find the source code, documentation, and examples on GitHub.