How Kafka Handles Backpressure: Producer Buffers, Broker Quotas, and Consumer Flow Control

#architecture #backend #distributedsystems #systemdesign

Backpressure is the pressure that builds in a pipeline when the slow end cannot absorb what the fast end produces. In a reactive stream, the solution is explicit: the subscriber signals its demand upstream, and the publisher sends only as many items as it requested. Kafka takes a different approach. There is no protocol-level demand signal flowing from consumers back to producers. Instead, Kafka distributes the problem across three separate layers, each with its own configuration surface, and leaves the integration of those layers to you.

Understanding which layer to reach for, and when, is the practical skill this article develops.

The Producer Side: A Bounded Buffer That Blocks

When your application calls producer.send(), the record does not go directly to the broker. It lands in the RecordAccumulator, an in-memory buffer the producer client maintains internally. The accumulator organizes records into per-partition deques of ProducerBatch objects. A background Sender thread drains ready batches to the appropriate brokers over the network.

This separation is where producer-side backpressure lives. The accumulator has a finite size, controlled by buffer.memory (default: 32 MB). As long as the Sender drains batches faster than your application produces records, everything stays well under that limit. When it does not, because the broker is slow, the network is saturated, or you are simply writing more than the brokers can accept, the buffer fills up.

At that point, send() blocks. The calling thread waits inside BufferPool.allocate() until the Sender frees enough memory or max.block.ms (default: 60 seconds) expires. If the timeout fires first, the client throws a BufferExhaustedException.

The blocking behavior is the implicit backpressure signal on the producer side. Your application slows down because send() will not return until there is room. The 60-second default means a slow broker can cause your producers to stall quietly for up to a minute before surfacing an error. This is worth knowing when diagnosing latency spikes that seem to appear without warning.

In practice, lowering max.block.ms to 5-10 seconds and handling the resulting exception explicitly gives you faster feedback and a clear place to shed load, rather than waiting for Kafka to silently unblock.

How Batching Softens Burst Pressure

The RecordAccumulator does not forward records to the Sender one at a time. It accumulates them into batches, flushing a batch when either it reaches batch.size bytes or linger.ms milliseconds have passed since the first record arrived in that batch, whichever comes first.

batch.size defaults to 16 KB and linger.ms defaults to 5 ms as of Kafka 4.0 (it was 0 in earlier versions, which caused the producer to send records immediately). A linger.ms of 0 is right for low-latency use cases but leaves throughput on the table for anything bulk-oriented. A linger.ms of 5-50 ms allows the accumulator to fill batches more completely, reducing the number of network round trips the Sender has to make and easing the per-request load on brokers during traffic spikes.

Compression works at the batch level, so the fuller the batch, the better the ratio. Enabling compression.type (lz4 or zstd being the practical choices) on top of sensible batching settings can meaningfully reduce the bytes-per-second your producers push to the network, leaving more headroom before the buffer fills.

Broker Quota Throttling

Producer-side buffering protects individual producers from overwhelming themselves. Broker quotas protect the cluster from being overwhelmed by any single client.

Kafka supports three quota types, each applied per user or client ID on a per-broker basis:

Quota type	What it limits	Available since
Produce quota	Bytes per second written	Kafka 0.9
Fetch quota	Bytes per second read	Kafka 0.9
Request quota	Percentage of broker request-handler CPU time	Kafka 0.11

When a client exceeds its quota, the broker does not reject the request. Instead, it computes a throttle delay based on how much the client overshot its limit across a sliding window of 30 one-second buckets. For a produce quota of 10 MB/s, a client that sends 15 MB in one second receives a throttle delay of roughly 500 ms.

For produce requests, the broker returns the response immediately but includes a non-zero ThrottleTimeMs field. The client SDK reads this value and pauses outgoing requests for that duration. For fetch requests, the response contains no data and the client backs off before its next poll. The broker also mutes the client's network channel during the delay window, so clients that ignore ThrottleTimeMs still cannot push more requests through.

You configure quotas at runtime with no broker restart required:

# Set a produce quota of 10 MB/s for a specific client ID
kafka-configs.sh --bootstrap-server localhost:9092 \
  --alter \
  --add-config 'producer_byte_rate=10485760' \
  --entity-type clients \
  --entity-name my-producer

One thing to account for in quota math: quotas are per broker, not per cluster. A quota of 50 MB/s on a 6-broker cluster allows up to 300 MB/s of aggregate throughput for that client if its writes are spread evenly across brokers.

Consumer Flow Control: Pause, Resume, and Poll Sizing

On the consumption side, Kafka's pull-based model is itself a form of flow control. Your consumer calls poll() at its own pace; the broker sends back available records up to fetch.max.bytes (default: 50 MB) and max.partition.fetch.bytes (default: 1 MB per partition). You control throughput entirely by controlling when and how often you call poll().

Two configs bound how much work each poll cycle returns:

max.poll.records (default: 500): the maximum number of records returned per poll() call
max.poll.interval.ms (default: 5 minutes): the maximum time the broker allows between consecutive polls before treating the consumer as dead and triggering a rebalance

If your downstream processing is slow, max.poll.records is the first lever to reach for. Dropping it from 500 to 50 means each poll cycle hands you less work, you return to calling poll() faster, and you stay well inside the max.poll.interval.ms window. This alone prevents a large class of rebalance storms that masquerade as "Kafka being slow."

For more dynamic control, KafkaConsumer exposes pause(Collection<TopicPartition>) and resume(Collection<TopicPartition>). When you pause a partition, subsequent poll() calls return no records from that partition. The consumer remains in the group and continues sending heartbeats; it just stops fetching. This is the right tool when your processing is async (thread pools, HTTP calls, database writes) and you need to prevent a downstream queue from growing without bound.

A practical pattern:

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));

    for (ConsumerRecord<String, String> record : records) {
        processingQueue.put(record); // blocks naturally when the queue is full
    }

    if (processingQueue.size() > HIGH_WATERMARK) {
        consumer.pause(consumer.assignment());
    } else if (processingQueue.size() < LOW_WATERMARK) {
        consumer.resume(consumer.assignment());
    }
}

The one thing to watch: paused partitions still count against your session. If the processing thread takes longer than max.poll.interval.ms to drain below LOW_WATERMARK, the broker removes the consumer from the group. Set HIGH_WATERMARK generously enough that draining happens well within that window.

Kafka Streams: Backpressure for Free

Kafka Streams applications are simultaneously consumers and producers: they read from input topics, transform records, and write results to output topics. Because they use the standard consumer client under the hood, backpressure emerges naturally from the pull-based poll rhythm.

If processing slows, the Streams thread takes longer per poll cycle. This means fewer records flow through per unit of time, without any explicit pause/resume logic on your part. The application never accumulates more records in memory than one poll cycle returns.

You still care about max.poll.records and commit.interval.ms. A slow processing stage increases the uncommitted offset window; if a Streams application restarts, it replays all uncommitted records. Shorter commit intervals reduce that replay window, at the cost of more offset commits to the __consumer_offsets topic.

Techniques at a Glance

Mechanism	Where it lives	What it protects	Key config
Producer buffer blocking	Producer client	Slows the caller when broker is slow	`buffer.memory`, `max.block.ms`
Batch accumulation	Producer client	Smooths throughput spikes	`batch.size`, `linger.ms`
Compression	Producer client	Reduces wire bytes, extends buffer headroom	`compression.type`
Broker quotas	Broker	Prevents noisy clients from monopolizing resources	`producer_byte_rate`, `consumer_byte_rate`
Poll sizing	Consumer client	Bounds records per processing cycle	`max.poll.records`
Pause/Resume	Consumer client	Dynamic per-partition flow control	`KafkaConsumer.pause()`
Kafka Streams pull model	Streams runtime	Implicit throttling via poll cadence	`max.poll.records`, `commit.interval.ms`

Where to Start

Begin on the consumer side. Measure your per-message processing time, then set max.poll.records so one poll cycle's work fits inside max.poll.interval.ms with room to spare. This one change resolves most consumer-lag issues before you need to touch anything else.

Add pause()/resume() if your processing is async and you need to protect a downstream queue from growing without bound. Keep your watermarks far enough apart that the processing thread can drain between poll cycles.

On the producer side, decide how you want failures to surface. The 60-second max.block.ms default means a stalled producer is invisible for up to a minute. Lowering it to 5-10 seconds and treating BufferExhaustedException as a load signal gives you faster, actionable feedback. Pair this with appropriate linger.ms and a compression codec to increase throughput before the buffer becomes the constraint.

Broker quotas earn their configuration cost in multi-tenant clusters or anywhere you have a mix of batch and interactive workloads sharing the same brokers. A runaway batch ingest job should not be able to saturate the cluster and introduce latency spikes for latency-sensitive consumers.

Consumer lag, visible through kafka-consumer-groups.sh --describe or the JMX metric records-lag-max, is the single health signal that ties all these mechanisms together. Sustained, growing lag tells you the consumers are falling behind before the system enters a failure mode. The root cause might be anywhere in this stack, but the lag metric is where you start the investigation.