
Aleksandr

Posted on • Originally published at petrolmuffin.github.io

Message Broker Throughput: RabbitMQ vs Kafka vs NATS

I started using NATS in one of my projects and was generally happy with it, but I wanted to verify the performance claims for myself. Is it really as fast as people say, or is that just marketing and cherry-picked benchmarks? The best way to find out was to write my own tests and compare NATS against the two most common alternatives: RabbitMQ and Kafka.

This post covers throughput testing of all three brokers on two messaging patterns: async producer-consumer queue, and request-reply. Request-reply is not the typical use case for message brokers, but NATS supports it natively, so it was worth measuring how the others perform when forced into that pattern.

Interactive results page: https://petrolmuffin.github.io/BrokersPerformance/
GitHub: https://github.com/PetrolMuffin/BrokersPerformance

Test Environment

All three brokers ran in Docker containers on the same host. No custom tuning was applied to any broker: default configurations only.

  • CPU: AMD Ryzen 7 8845HS, 8 cores / 16 threads, 3.80 GHz
  • OS: Windows 11 (10.0.26200)
  • Runtime: .NET 10.0.4, RyuJIT x86-64-v4
  • Benchmarking framework: BenchmarkDotNet v0.15.8

Broker Versions and Configuration

| Broker | Docker Image | Client Version | Configuration |
| --- | --- | --- | --- |
| RabbitMQ | rabbitmq:4.2-management | RabbitMQ.Client v7.2.1 | Default settings, AMQP 0.9.1, guest/guest |
| Kafka | apache/kafka:4.2.0 | Confluent.Kafka v2.13.2 | KRaft mode (no ZooKeeper), single node, 1 partition, replication factor = 1 |
| NATS | nats:2.12-alpine | NATS.Net v2.7.3 | JetStream enabled (-js flag) |

Idle RAM Consumption (Docker, Cold Start)

Measured via docker stats on freshly started containers with no accumulated data or active connections:

| Broker | RAM |
| --- | --- |
| NATS | 6 MiB |
| RabbitMQ | 122 MiB |
| Kafka | 327 MiB |

Kafka's JVM-based architecture is immediately visible: 54x the memory of NATS and 2.7x that of RabbitMQ on cold start. NATS is the lightest at 6 MiB.

BenchmarkDotNet Configuration

  • 3 warmup iterations, 10 measured iterations per scenario
  • InvocationCount = 1, UnrollFactor = 1 (each iteration is a single benchmark call)
  • RunStrategy = Monitoring
  • GC: non-concurrent, forced collections, non-server mode
  • ThreadPool: minimum 100 worker + 100 I/O completion port threads
  • Reported metrics: Mean, StdDev, P95, Op/s, Allocated memory

Note on metric choice: all result tables below use P95 (95th percentile) rather than Mean. P95 better represents worst-case performance a system will realistically encounter, filtering out warm-up noise while capturing tail latency.
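
To make the metric concrete, here is a minimal nearest-rank P95 in plain Python (BenchmarkDotNet's own percentile estimator may interpolate differently; the sample values below are invented for illustration):

```python
import math

def p95(samples):
    """Nearest-rank 95th percentile: the smallest value that is
    greater than or equal to 95% of the samples."""
    ordered = sorted(samples)
    rank = math.ceil(0.95 * len(ordered))  # 1-based rank
    return ordered[rank - 1]

# 10 measured iterations, as in the benchmark config; one slow outlier
times_ms = [941, 944, 939, 950, 1102, 943, 940, 948, 945, 947]
print(p95(times_ms))  # → 1102 (with 10 samples, the nearest rank is the maximum)
```

Note that with only 10 iterations, nearest-rank P95 degenerates to the worst observed iteration, which is exactly why it captures tail behavior that the mean smooths over.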

Test Parameters

The message counts and payload sizes were chosen to cover two dimensions: the number of concurrent messages the broker must route, and the size of individual payloads. Counts are inversely proportional to payload size to keep total benchmark runtime within a few minutes per scenario while still loading the broker enough to reveal its throughput characteristics.

Async queue (250 concurrent publishers, 1 consumer):

Messages Payload Total Volume
50,000 256 B 12.8 MB
25,000 1 KB 25 MB
10,000 4 KB 40 MB
5,000 64 KB 327 MB
2,500 128 KB 335 MB

Request-reply (150 concurrent publishers):

Messages Payload Total Volume
25,000 256 B 6.4 MB
10,000 1 KB 10 MB
5,000 4 KB 20 MB

The async pattern uses more publishers (250 vs 150) and reaches larger payloads because bulk throughput is the primary concern. Request-reply uses fewer messages and smaller payloads, reflecting the typical RPC use case where latency matters more than volume.

Implementation Details

Async Queue (Producer-Consumer)

All three implementations follow the same structure: N publishers concurrently push messages into a queue/topic/stream, one consumer reads everything. The benchmark measures wall-clock time from the first publish to the last received message.
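
The harness shape can be sketched like this (an in-memory asyncio.Queue stands in for the broker, so the timing here is meaningless on its own; the real benchmarks go through the actual broker clients in C#):

```python
import asyncio, time

async def run_benchmark(n_publishers=250, msgs_per_publisher=200, payload=b"x" * 256):
    queue = asyncio.Queue()          # stands in for the broker queue/topic/stream
    total = n_publishers * msgs_per_publisher
    done = asyncio.Event()
    received = 0

    async def publisher():
        for _ in range(msgs_per_publisher):
            await queue.put(payload)

    async def consumer():
        nonlocal received
        while received < total:
            await queue.get()
            received += 1
        done.set()

    start = time.perf_counter()      # first publish is about to happen
    consumer_task = asyncio.create_task(consumer())
    await asyncio.gather(*(publisher() for _ in range(n_publishers)))
    await done.wait()                # last message has been received
    elapsed = time.perf_counter() - start
    await consumer_task
    return total, elapsed

total, elapsed = asyncio.run(run_benchmark())
print(f"{total} msgs in {elapsed * 1000:.1f} ms -> {total / elapsed:,.0f} msg/s")
```

The defaults mirror the 50K × 256 B scenario: 250 concurrent publishers, one consumer, wall clock measured from first publish to last receive.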

RabbitMQ:

  • Persistent messages (DeliveryMode = Persistent)
  • QoS: prefetch count = 100
  • Manual ACK
  • Separate IConnection for publisher and consumer
  • Completion tracked via CounterCompletionSource (atomic increment + TaskCompletionSource)
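
The CounterCompletionSource mentioned above pairs a counter with a completion primitive that fires once the expected total is reached. An asyncio analog (class and method names are mine, not from the repo; asyncio is single-threaded, so a plain int suffices where the C# original needs Interlocked.Increment):

```python
import asyncio

class CounterCompletionSource:
    """Completes once signal() has been called `expected` times."""
    def __init__(self, expected: int):
        self._expected = expected
        self._count = 0
        self._event = asyncio.Event()

    def signal(self) -> None:
        self._count += 1
        if self._count >= self._expected:
            self._event.set()   # analog of TaskCompletionSource.SetResult

    async def wait(self) -> None:
        await self._event.wait()

async def demo():
    ccs = CounterCompletionSource(expected=3)
    for _ in range(3):
        ccs.signal()            # called from the consumer's delivery callback
    await ccs.wait()            # returns only after all messages are accounted for
    return ccs._count

print(asyncio.run(demo()))  # → 3
```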

Kafka:

  • Idempotent producer (EnableIdempotence = true)
  • Write buffer: QueueBufferingMaxKbytes = 1 GB, QueueBufferingMaxMessages = 1M
  • Manual offset commit, AutoOffsetReset = Earliest
  • Single partition, consumer group ID randomized per iteration
  • Background consumer task with manual message counting
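
The same settings, expressed as the librdkafka-style configuration keys that the C# Confluent.Kafka options map to (the broker address is an assumption for a local single-node setup):

```python
import uuid

# Producer: idempotence plus an oversized write buffer, mirroring the benchmark.
producer_config = {
    "bootstrap.servers": "localhost:9092",      # assumed local single-node broker
    "enable.idempotence": True,                 # EnableIdempotence = true
    "queue.buffering.max.kbytes": 1_048_576,    # QueueBufferingMaxKbytes = 1 GB
    "queue.buffering.max.messages": 1_000_000,  # QueueBufferingMaxMessages = 1M
}

# Consumer: manual commits, read from the beginning, fresh group per iteration.
consumer_config = {
    "bootstrap.servers": "localhost:9092",
    "group.id": f"bench-{uuid.uuid4()}",        # group ID randomized per iteration
    "auto.offset.reset": "earliest",            # AutoOffsetReset = Earliest
    "enable.auto.commit": False,                # manual offset commit
}
```

Randomizing the group ID per iteration forces each run to start from offset zero, which is what makes AutoOffsetReset = Earliest matter here.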

NATS JetStream:

  • File-backed stream, retention = Workqueue
  • Async persistence (StorageType = File)
  • Explicit ACK, MaxDeliver = 10
  • Deduplication window: 1 minute
  • WriterBufferSize = 1 GB

Request-Reply

NATS has native request-reply: RequestAsync sends a message and returns a response in a single call. The broker handles response routing internally.

RabbitMQ and Kafka lack this primitive. For both, request-reply was implemented via correlation IDs:

  1. Requester generates a UUID, attaches it to the message, stores a TaskCompletionSource in a ConcurrentDictionary<string, TaskCompletionSource>
  2. Responder receives the message, echoes the correlation ID back on a dedicated reply queue/topic
  3. Requester's reply listener matches the ID and completes the corresponding TaskCompletionSource

This means each "request" in RabbitMQ/Kafka involves 4 broker operations (publish request → consume request → publish reply → consume reply) vs 1 round-trip in NATS.
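
Steps 1-3 can be sketched with an in-memory "broker" (two asyncio queues standing in for the request and reply queues/topics; the real implementations route through RabbitMQ/Kafka, but the correlation mechanics are the same):

```python
import asyncio, uuid

async def demo():
    requests, replies = asyncio.Queue(), asyncio.Queue()   # broker stand-ins
    pending: dict[str, asyncio.Future] = {}                # ConcurrentDictionary analog

    async def responder():
        while True:
            corr_id, body = await requests.get()           # consume request
            await replies.put((corr_id, body.upper()))     # publish reply, echo corr_id

    async def reply_listener():
        while True:
            corr_id, body = await replies.get()            # consume reply
            if corr_id in pending:
                pending.pop(corr_id).set_result(body)      # complete matching future

    async def request(body: str) -> str:
        corr_id = str(uuid.uuid4())                        # step 1: unique correlation ID
        fut = asyncio.get_running_loop().create_future()
        pending[corr_id] = fut
        await requests.put((corr_id, body))                # publish request
        return await fut                                   # step 3: await the matched reply

    workers = [asyncio.create_task(responder()), asyncio.create_task(reply_listener())]
    result = await request("ping")
    for w in workers:
        w.cancel()
    return result

print(asyncio.run(demo()))  # → PING
```

Even in-memory, each round trip crosses two queues twice; against a real broker, each of those four hops adds a network round-trip plus an ACK or offset commit.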

RabbitMQ:

  • Separate request/reply queues
  • Correlation-ID in AMQP properties
  • Persistent messages (DeliveryMode = Persistent)
  • QoS: prefetch count = 100
  • Manual ACK

Kafka:

  • Separate request/reply topics
  • Correlation-ID in Kafka headers
  • Idempotent producer (EnableIdempotence = true)
  • Write buffer: QueueBufferingMaxKbytes = 1 GB, QueueBufferingMaxMessages = 1M
  • Manual offset commit, AutoOffsetReset = Earliest

NATS:

  • Built-in RequestAsync/ReplyAsync
  • WriterBufferSize = 1 GB
  • RequestTimeout = 10 min
  • CommandTimeout = 5 min

Results: Async Queue

All values are P95 (95th percentile) completion time in milliseconds. Lower is better. Ratio columns show time relative to NATS JetStream (baseline).

P95 Completion Time

| Scenario | RabbitMQ | Kafka | NATS JetStream | RabbitMQ / NATS | Kafka / NATS |
| --- | --- | --- | --- | --- | --- |
| 50K × 256 B | 1,521 ms | 35,856 ms | 944 ms | 1.61 | 38.98 |
| 25K × 1 KB | 905 ms | 18,629 ms | 511 ms | 1.77 | 36.46 |
| 10K × 4 KB | 442 ms | 8,329 ms | 256 ms | 1.73 | 32.54 |
| 5K × 64 KB | 534 ms | 7,496 ms | 878 ms | 0.61 | 8.54 |
| 2.5K × 128 KB | 690 ms | 7,162 ms | 735 ms | 0.94 | 9.74 |

Messages per Second (at P95)

| Scenario | RabbitMQ | Kafka | NATS JetStream |
| --- | --- | --- | --- |
| 50K × 256 B | 32,873 msg/s | 1,394 msg/s | 52,966 msg/s |
| 25K × 1 KB | 27,624 msg/s | 1,342 msg/s | 48,924 msg/s |
| 10K × 4 KB | 22,624 msg/s | 1,201 msg/s | 39,063 msg/s |
| 5K × 64 KB | 9,363 msg/s | 667 msg/s | 5,695 msg/s |
| 2.5K × 128 KB | 3,623 msg/s | 349 msg/s | 3,401 msg/s |
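
The msg/s figures follow directly from the P95 times (count divided by seconds); for example, for the 50K × 256 B row:

```python
def throughput(messages: int, p95_ms: float) -> int:
    """Messages per second implied by a P95 completion time."""
    return round(messages / (p95_ms / 1000))

print(throughput(50_000, 944))     # NATS:     → 52966 msg/s
print(throughput(50_000, 1_521))   # RabbitMQ: → 32873 msg/s
print(throughput(50_000, 35_856))  # Kafka:    → 1394 msg/s
```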

On small to medium payloads (up to 4 KB), NATS JetStream processes messages 1.6-1.8x faster than RabbitMQ at P95. The gap is consistent, suggesting protocol-level overhead in AMQP relative to NATS's binary protocol.

On large payloads (64 KB+), RabbitMQ takes the lead: at 64 KB it completes 1.6x faster than NATS, and 1.1x faster at 128 KB. RabbitMQ allocates 7-12 MB of managed memory in these scenarios, while NATS allocates 368-401 MB. AMQP framing appears more efficient for large contiguous payloads.

Kafka is 9-38x slower than NATS at P95. This is expected: Kafka's commit-log architecture, partition leadership, and replication protocol add overhead that only pays off with horizontal scaling across multiple partitions and nodes.

Memory Allocation (Managed Heap)

| Scenario | RabbitMQ | Kafka | NATS JetStream |
| --- | --- | --- | --- |
| 50K × 256 B | 106 MB | 115 MB | 678 MB |
| 25K × 1 KB | 54 MB | 76 MB | 342 MB |
| 10K × 4 KB | 22 MB | 60 MB | 205 MB |
| 5K × 64 KB | 12 MB | 323 MB | 401 MB |
| 2.5K × 128 KB | 7 MB | 318 MB | 368 MB |

RabbitMQ consistently uses the least managed memory. NATS allocates significantly more due to the 1 GB writer buffer configuration. Kafka's allocations spike with large payloads (318-323 MB) due to its own producer buffer configuration (QueueBufferingMaxKbytes = 1 GB).

Results: Request-Reply

All values are P95 completion time. Ratio columns show time relative to NATS (baseline).

| Scenario | RabbitMQ | Kafka | NATS | RabbitMQ / NATS | Kafka / NATS |
| --- | --- | --- | --- | --- | --- |
| 25K × 256 B | 41,450 ms | 36,572 ms | 397 ms | 104.41 | 92.12 |
| 10K × 1 KB | 21,434 ms | 15,113 ms | 226 ms | 94.84 | 66.87 |
| 5K × 4 KB | 12,231 ms | 7,339 ms | 159 ms | 76.92 | 46.16 |

Messages per second (at P95):

| Scenario | RabbitMQ | Kafka | NATS |
| --- | --- | --- | --- |
| 25K × 256 B | 603 msg/s | 684 msg/s | 62,972 msg/s |
| 10K × 1 KB | 467 msg/s | 662 msg/s | 44,248 msg/s |
| 5K × 4 KB | 409 msg/s | 681 msg/s | 31,447 msg/s |

NATS is 46-92x faster than Kafka and 77-104x faster than RabbitMQ at P95. This is the difference between a native protocol primitive (one network round-trip) and an application-level emulation (four broker operations per request).

RabbitMQ is the slowest in all request-reply scenarios, with P95 growing roughly in proportion to message count: 12.2s for 5K messages, 21.4s for 10K, 41.4s for 25K. The per-request overhead works out to roughly 1.7-2.5 ms, dominated by the ACK cycle on both request and reply queues.
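
Dividing P95 by the message count gives the per-request cost for each RabbitMQ scenario (it actually improves slightly with scale):

```python
# (messages, P95 in ms) for the RabbitMQ request-reply scenarios
scenarios = [(5_000, 12_231), (10_000, 21_434), (25_000, 41_450)]
for n, p95_ms in scenarios:
    print(f"{n:>6} msgs: {p95_ms / n:.2f} ms per request")
```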

Kafka also shows high tail latency: P95 reaches 36.6s on the 25K scenario (Mean is 23.3s), indicating consumer group coordination and offset management overhead amplified in what is effectively a synchronous request pattern.

Broker Comparison

RabbitMQ 4.2

Strengths:

  • Mature AMQP implementation with 15+ years of production usage
  • Rich routing model: direct, topic, fanout, and headers exchanges with flexible bindings
  • Management UI included (port 15672), exposing queue depths, message rates, connection counts, and consumer status
  • Lowest managed memory allocation in benchmarks, particularly with large payloads
  • Multi-protocol support: AMQP 0.9.1, AMQP 1.0, MQTT 3.1.1/5.0, STOMP
  • Plugin ecosystem: delayed message exchange, federation, shovel, consistent hash exchange
  • Broad client library coverage across all major languages
  • Quorum queues and streams for HA and replay scenarios

Weaknesses:

  • 1.6-1.8x slower than NATS on small-message async throughput
  • No native request-reply, must be implemented via correlation IDs
  • Classic mirrored queues are deprecated; quorum queues improve HA but add latency
  • Erlang runtime limits low-level troubleshooting and custom extensions
  • Clustering can exhibit split-brain under network partitions (mitigated by peer discovery plugins)

Apache Kafka 4.2

Strengths:

  • Distributed commit log with configurable retention, allowing consumers to replay from any offset
  • Horizontal throughput scaling via partition-based parallelism
  • Exactly-once semantics with idempotent producers and transactional API
  • Extensive ecosystem: Kafka Connect (200+ connectors), Kafka Streams, ksqlDB, Schema Registry
  • Standard for event sourcing, CDC (Debezium), and data pipeline architectures
  • KRaft mode (used here) removes ZooKeeper dependency

Weaknesses:

  • Slowest in every scenario in this benchmark (single-node, single-partition is its worst case)
  • 327 MiB RAM on cold start (JVM heap), 54x NATS
  • High operational complexity: partitions, ISR, consumer group rebalancing, offset management
  • Consumer group rebalancing causes consumption pauses (mitigated by cooperative-sticky assignor)
  • Optimized for batched throughput rather than per-message latency
  • No native request-reply

NATS 2.12 with JetStream

Strengths:

  • Fastest in 3 out of 5 async scenarios, and all 3 request-reply scenarios
  • Native request-reply at the protocol level, no application-level workarounds needed
  • Operationally minimal: single binary, single flag (-js) enables persistence
  • 6 MiB RAM on cold start
  • JetStream provides persistence, replay, exactly-once delivery, de-duplication, and consumer acknowledgement
  • Subject-based routing with hierarchical wildcards (>, *)
  • Built-in key-value store and object store
  • Service discovery via micro package
  • Leafnode and gateway topologies for multi-cluster deployments

Weaknesses:

  • Higher managed memory allocation
  • Slower than RabbitMQ on large payloads (64 KB+)
  • Smaller community and fewer production war stories compared to RabbitMQ/Kafka
  • JetStream is younger than Kafka Streams; less battle-tested for event streaming at extreme scale
  • Monitoring/observability tooling is less mature (no equivalent to Kafka Connect ecosystem)

Conclusion

For new projects that need a general-purpose message broker, NATS is the most practical starting point.

It provides a feature set comparable to Kafka: persistence with replay, exactly-once delivery, stream processing primitives, key-value and object stores. At the same time, its throughput on small-to-medium payloads matches or exceeds RabbitMQ, and it handles request-reply 46-104x faster than either alternative at P95 thanks to native protocol support.

The operational cost is also lower. A single binary with one flag gives you a persistent, JetStream-enabled broker consuming 6 MiB of RAM on cold start. Compare that to Kafka's 327 MiB.

RabbitMQ remains a strong choice when the workload is primarily large payloads (64 KB+) or when the team has deep AMQP expertise. Kafka is still the right tool for large-scale event streaming, CDC pipelines, and scenarios where partition-based parallelism and the Connect/Streams ecosystem matter.

But as a default choice for a new distributed system? NATS delivers Kafka-class features at RabbitMQ-class speed, with less operational overhead than either.


Benchmarked with BenchmarkDotNet v0.15.8 on .NET 10.0.4. All brokers ran in Docker on the same machine with default configurations. Single-node results. Production numbers will differ.
