
Aleksandr

Posted on • Originally published at petrolmuffin.github.io

Message Broker Throughput: RabbitMQ vs Kafka vs NATS

I started using NATS in one of my projects and was generally happy with it, but I wanted to verify the performance claims for myself. Is it really as fast as people say, or is that just marketing and cherry-picked benchmarks? The best way to find out was to write my own tests and compare NATS against the two most common alternatives: RabbitMQ and Kafka.

This post covers throughput testing of all three brokers on two messaging patterns: async producer-consumer queue, and request-reply. Request-reply is not the typical use case for message brokers, but NATS supports it natively, so it was worth measuring how the others perform when forced into that pattern.

Interactive results page: https://petrolmuffin.github.io/BrokersPerformance/
GitHub: https://github.com/PetrolMuffin/BrokersPerformance

Test Environment

All three brokers ran in Docker containers on the same host. No custom tuning was applied to any broker: default configurations only.

  • CPU: AMD Ryzen 7 8845HS, 8 cores / 16 threads, 3.80 GHz
  • OS: Windows 11 (10.0.26200)
  • Runtime: .NET 10.0.4, RyuJIT x86-64-v4
  • Benchmarking framework: BenchmarkDotNet v0.15.8

Broker Versions and Configuration

| Broker | Docker Image | Client Version | Configuration |
| --- | --- | --- | --- |
| RabbitMQ | rabbitmq:4.2-management | RabbitMQ.Client v7.2.1 | Default settings, AMQP 0.9.1, guest/guest |
| Kafka | apache/kafka:4.2.0 | Confluent.Kafka v2.13.2 | KRaft mode (no ZooKeeper), single node, 1 partition, replication factor = 1 |
| NATS | nats:2.12-alpine | NATS.Net v2.7.3 | JetStream enabled (-js flag) |

Idle RAM Consumption (Docker, Cold Start)

Measured via docker stats on freshly started containers with no accumulated data or active connections:

| Broker | RAM |
| --- | --- |
| NATS | 6 MiB |
| RabbitMQ | 122 MiB |
| Kafka | 327 MiB |

Kafka's JVM-based architecture is immediately visible: 54x the memory of NATS and 2.7x that of RabbitMQ on cold start. NATS is the lightest at 6 MiB.

BenchmarkDotNet Configuration

  • 3 warmup iterations, 10 measured iterations per scenario
  • InvocationCount = 1, UnrollFactor = 1 (each iteration is a single benchmark call)
  • RunStrategy = Monitoring
  • GC: non-concurrent, forced collections, non-server mode
  • ThreadPool: minimum 100 worker + 100 I/O completion port threads
  • Reported metrics: Mean, StdDev, P95, Op/s, Allocated memory

Note on metric choice: all result tables below use P95 (95th percentile) rather than Mean. P95 better represents worst-case performance a system will realistically encounter, filtering out warm-up noise while capturing tail latency.
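
To make the metric concrete, here is a minimal nearest-rank P95 in plain Python (BenchmarkDotNet's own percentile estimator may interpolate differently; the sample values below are invented for illustration):

```python
import math

def p95(samples):
    """Nearest-rank 95th percentile: the smallest value that is
    greater than or equal to 95% of the samples."""
    ordered = sorted(samples)
    rank = math.ceil(0.95 * len(ordered))  # 1-based rank
    return ordered[rank - 1]

# 10 measured iterations, as in the benchmark config; one slow outlier
times_ms = [941, 944, 939, 950, 1102, 943, 940, 948, 945, 947]
print(p95(times_ms))  # → 1102 (with 10 samples, the nearest rank is the maximum)
```

Note that with only 10 iterations, nearest-rank P95 degenerates to the worst observed iteration, which is exactly why it captures tail behavior that the mean smooths over.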

Test Parameters

The message counts and payload sizes were chosen to cover two dimensions: the number of concurrent messages the broker must route, and the size of individual payloads. Counts are inversely proportional to payload size to keep total benchmark runtime within a few minutes per scenario while still loading the broker enough to reveal its throughput characteristics.

Async queue (250 concurrent publishers, 1 consumer):

Messages Payload Total Volume
50,000 256 B 12.8 MB
25,000 1 KB 25 MB
10,000 4 KB 40 MB
5,000 64 KB 327 MB
2,500 128 KB 335 MB

Request-reply (150 concurrent publishers):

Messages Payload Total Volume
25,000 256 B 6.4 MB
10,000 1 KB 10 MB
5,000 4 KB 20 MB

The async pattern uses more publishers (250 vs 150) and reaches larger payloads because bulk throughput is the primary concern. Request-reply uses fewer messages and smaller payloads, reflecting the typical RPC use case where latency matters more than volume.

Implementation Details

Async Queue (Producer-Consumer)

All three implementations follow the same structure: N publishers concurrently push messages into a queue/topic/stream, one consumer reads everything. The benchmark measures wall-clock time from the first publish to the last received message.
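
The harness shape can be sketched like this (an in-memory asyncio.Queue stands in for the broker, so the timing here is meaningless on its own; the real benchmarks go through the actual broker clients in C#):

```python
import asyncio, time

async def run_benchmark(n_publishers=250, msgs_per_publisher=200, payload=b"x" * 256):
    queue = asyncio.Queue()          # stands in for the broker queue/topic/stream
    total = n_publishers * msgs_per_publisher
    done = asyncio.Event()
    received = 0

    async def publisher():
        for _ in range(msgs_per_publisher):
            await queue.put(payload)

    async def consumer():
        nonlocal received
        while received < total:
            await queue.get()
            received += 1
        done.set()

    start = time.perf_counter()      # first publish is about to happen
    consumer_task = asyncio.create_task(consumer())
    await asyncio.gather(*(publisher() for _ in range(n_publishers)))
    await done.wait()                # last message has been received
    elapsed = time.perf_counter() - start
    await consumer_task
    return total, elapsed

total, elapsed = asyncio.run(run_benchmark())
print(f"{total} msgs in {elapsed * 1000:.1f} ms -> {total / elapsed:,.0f} msg/s")
```

The defaults mirror the 50K × 256 B scenario: 250 concurrent publishers, one consumer, wall clock measured from first publish to last receive.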

RabbitMQ:

  • Persistent messages (DeliveryMode = Persistent)
  • QoS: prefetch count = 100
  • Manual ACK
  • Separate IConnection for publisher and consumer
  • Completion tracked via CounterCompletionSource (atomic increment + TaskCompletionSource)
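
The CounterCompletionSource mentioned above pairs a counter with a completion primitive that fires once the expected total is reached. An asyncio analog (class and method names are mine, not from the repo; asyncio is single-threaded, so a plain int suffices where the C# original needs Interlocked.Increment):

```python
import asyncio

class CounterCompletionSource:
    """Completes once signal() has been called `expected` times."""
    def __init__(self, expected: int):
        self._expected = expected
        self._count = 0
        self._event = asyncio.Event()

    def signal(self) -> None:
        self._count += 1
        if self._count >= self._expected:
            self._event.set()   # analog of TaskCompletionSource.SetResult

    async def wait(self) -> None:
        await self._event.wait()

async def demo():
    ccs = CounterCompletionSource(expected=3)
    for _ in range(3):
        ccs.signal()            # called from the consumer's delivery callback
    await ccs.wait()            # returns only after all messages are accounted for
    return ccs._count

print(asyncio.run(demo()))  # → 3
```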

Kafka:

  • Idempotent producer (EnableIdempotence = true)
  • Write buffer: QueueBufferingMaxKbytes = 1 GB, QueueBufferingMaxMessages = 1M
  • Manual offset commit, AutoOffsetReset = Earliest
  • Single partition, consumer group ID randomized per iteration
  • Background consumer task with manual message counting
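
The same settings, expressed as the librdkafka-style configuration keys that the C# Confluent.Kafka options map to (the broker address is an assumption for a local single-node setup):

```python
import uuid

# Producer: idempotence plus an oversized write buffer, mirroring the benchmark.
producer_config = {
    "bootstrap.servers": "localhost:9092",      # assumed local single-node broker
    "enable.idempotence": True,                 # EnableIdempotence = true
    "queue.buffering.max.kbytes": 1_048_576,    # QueueBufferingMaxKbytes = 1 GB
    "queue.buffering.max.messages": 1_000_000,  # QueueBufferingMaxMessages = 1M
}

# Consumer: manual commits, read from the beginning, fresh group per iteration.
consumer_config = {
    "bootstrap.servers": "localhost:9092",
    "group.id": f"bench-{uuid.uuid4()}",        # group ID randomized per iteration
    "auto.offset.reset": "earliest",            # AutoOffsetReset = Earliest
    "enable.auto.commit": False,                # manual offset commit
}
```

Randomizing the group ID per iteration forces each run to start from offset zero, which is what makes AutoOffsetReset = Earliest matter here.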

NATS JetStream:

  • File-backed stream, retention = Workqueue
  • Async persistence (StorageType = File)
  • Explicit ACK, MaxDeliver = 10
  • Deduplication window: 1 minute
  • WriterBufferSize = 1 GB

Request-Reply

NATS has native request-reply: RequestAsync sends a message and returns a response in a single call. The broker handles response routing internally.

RabbitMQ and Kafka lack this primitive. For both, request-reply was implemented via correlation IDs:

  1. Requester generates a UUID, attaches it to the message, stores a TaskCompletionSource in a ConcurrentDictionary<string, TaskCompletionSource>
  2. Responder receives the message, echoes the correlation ID back on a dedicated reply queue/topic
  3. Requester's reply listener matches the ID and completes the corresponding TaskCompletionSource

This means each "request" in RabbitMQ/Kafka involves 4 broker operations (publish request → consume request → publish reply → consume reply) vs 1 round-trip in NATS.
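
Steps 1-3 can be sketched with an in-memory "broker" (two asyncio queues standing in for the request and reply queues/topics; the real implementations route through RabbitMQ/Kafka, but the correlation mechanics are the same):

```python
import asyncio, uuid

async def demo():
    requests, replies = asyncio.Queue(), asyncio.Queue()   # broker stand-ins
    pending: dict[str, asyncio.Future] = {}                # ConcurrentDictionary analog

    async def responder():
        while True:
            corr_id, body = await requests.get()           # consume request
            await replies.put((corr_id, body.upper()))     # publish reply, echo corr_id

    async def reply_listener():
        while True:
            corr_id, body = await replies.get()            # consume reply
            if corr_id in pending:
                pending.pop(corr_id).set_result(body)      # complete matching future

    async def request(body: str) -> str:
        corr_id = str(uuid.uuid4())                        # step 1: unique correlation ID
        fut = asyncio.get_running_loop().create_future()
        pending[corr_id] = fut
        await requests.put((corr_id, body))                # publish request
        return await fut                                   # step 3: await the matched reply

    workers = [asyncio.create_task(responder()), asyncio.create_task(reply_listener())]
    result = await request("ping")
    for w in workers:
        w.cancel()
    return result

print(asyncio.run(demo()))  # → PING
```

Even in-memory, each round trip crosses two queues twice; against a real broker, each of those four hops adds a network round-trip plus an ACK or offset commit.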

RabbitMQ:

  • Separate request/reply queues
  • Correlation-ID in AMQP properties
  • Persistent messages (DeliveryMode = Persistent)
  • QoS: prefetch count = 100
  • Manual ACK

Kafka:

  • Separate request/reply topics
  • Correlation-ID in Kafka headers
  • Idempotent producer (EnableIdempotence = true)
  • Write buffer: QueueBufferingMaxKbytes = 1 GB, QueueBufferingMaxMessages = 1M
  • Manual offset commit, AutoOffsetReset = Earliest

NATS:

  • Built-in RequestAsync/ReplyAsync
  • WriterBufferSize = 1 GB
  • RequestTimeout = 10 min
  • CommandTimeout = 5 min

Results: Async Queue

All values are P95 (95th percentile) completion time in milliseconds. Lower is better. Ratio columns show time relative to NATS JetStream (baseline).

P95 Completion Time

| Scenario | RabbitMQ | Kafka | NATS JetStream | RabbitMQ / NATS | Kafka / NATS |
| --- | --- | --- | --- | --- | --- |
| 50K × 256 B | 1,521 ms | 35,856 ms | 944 ms | 1.61 | 38.98 |
| 25K × 1 KB | 905 ms | 18,629 ms | 511 ms | 1.77 | 36.46 |
| 10K × 4 KB | 442 ms | 8,329 ms | 256 ms | 1.73 | 32.54 |
| 5K × 64 KB | 534 ms | 7,496 ms | 878 ms | 0.61 | 8.54 |
| 2.5K × 128 KB | 690 ms | 7,162 ms | 735 ms | 0.94 | 9.74 |

Messages per Second (at P95)

| Scenario | RabbitMQ | Kafka | NATS JetStream |
| --- | --- | --- | --- |
| 50K × 256 B | 32,873 msg/s | 1,394 msg/s | 52,966 msg/s |
| 25K × 1 KB | 27,624 msg/s | 1,342 msg/s | 48,924 msg/s |
| 10K × 4 KB | 22,624 msg/s | 1,201 msg/s | 39,063 msg/s |
| 5K × 64 KB | 9,363 msg/s | 667 msg/s | 5,695 msg/s |
| 2.5K × 128 KB | 3,623 msg/s | 349 msg/s | 3,401 msg/s |
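
The msg/s figures follow directly from the P95 times (count divided by seconds); for example, for the 50K × 256 B row:

```python
def throughput(messages: int, p95_ms: float) -> int:
    """Messages per second implied by a P95 completion time."""
    return round(messages / (p95_ms / 1000))

print(throughput(50_000, 944))     # NATS:     → 52966 msg/s
print(throughput(50_000, 1_521))   # RabbitMQ: → 32873 msg/s
print(throughput(50_000, 35_856))  # Kafka:    → 1394 msg/s
```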

On small to medium payloads (up to 4 KB), NATS JetStream processes messages 1.6-1.8x faster than RabbitMQ at P95. The gap is consistent, suggesting protocol-level overhead in AMQP relative to NATS's binary protocol.

On large payloads (64 KB+), RabbitMQ takes the lead: at 64 KB it completes 1.6x faster than NATS, and 1.1x faster at 128 KB. RabbitMQ allocates 7-12 MB of managed memory in these scenarios, while NATS allocates 368-401 MB. AMQP framing appears more efficient for large contiguous payloads.

Kafka is 9-38x slower than NATS at P95. This is expected: Kafka's commit-log architecture, partition leadership, and replication protocol add overhead that only pays off with horizontal scaling across multiple partitions and nodes.

Memory Allocation (Managed Heap)

| Scenario | RabbitMQ | Kafka | NATS JetStream |
| --- | --- | --- | --- |
| 50K × 256 B | 106 MB | 115 MB | 678 MB |
| 25K × 1 KB | 54 MB | 76 MB | 342 MB |
| 10K × 4 KB | 22 MB | 60 MB | 205 MB |
| 5K × 64 KB | 12 MB | 323 MB | 401 MB |
| 2.5K × 128 KB | 7 MB | 318 MB | 368 MB |

RabbitMQ consistently uses the least managed memory. NATS allocates significantly more due to the 1 GB writer buffer configuration. Kafka's allocations spike with large payloads (318-323 MB) due to its own producer buffer configuration (QueueBufferingMaxKbytes = 1 GB).

Results: Request-Reply

All values are P95 completion time. Ratio columns show time relative to NATS (baseline).

| Scenario | RabbitMQ | Kafka | NATS | RabbitMQ / NATS | Kafka / NATS |
| --- | --- | --- | --- | --- | --- |
| 25K × 256 B | 41,450 ms | 36,572 ms | 397 ms | 104.41 | 92.12 |
| 10K × 1 KB | 21,434 ms | 15,113 ms | 226 ms | 94.84 | 66.87 |
| 5K × 4 KB | 12,231 ms | 7,339 ms | 159 ms | 76.92 | 46.16 |

Messages per second (at P95):

| Scenario | RabbitMQ | Kafka | NATS |
| --- | --- | --- | --- |
| 25K × 256 B | 603 msg/s | 684 msg/s | 62,972 msg/s |
| 10K × 1 KB | 467 msg/s | 662 msg/s | 44,248 msg/s |
| 5K × 4 KB | 409 msg/s | 681 msg/s | 31,447 msg/s |

NATS is 46-92x faster than Kafka and 77-104x faster than RabbitMQ at P95. This is the difference between a native protocol primitive (one network round-trip) and an application-level emulation (four broker operations per request).

RabbitMQ is the slowest in all request-reply scenarios, with P95 growing roughly in proportion to message count: 12.2s for 5K messages, 21.4s for 10K, 41.4s for 25K. The per-request overhead works out to roughly 1.7-2.5 ms, dominated by the ACK cycle on both request and reply queues.
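
Dividing P95 by the message count gives the per-request cost for each RabbitMQ scenario (it actually improves slightly with scale):

```python
# (messages, P95 in ms) for the RabbitMQ request-reply scenarios
scenarios = [(5_000, 12_231), (10_000, 21_434), (25_000, 41_450)]
for n, p95_ms in scenarios:
    print(f"{n:>6} msgs: {p95_ms / n:.2f} ms per request")
```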

Kafka also shows high tail latency: P95 reaches 36.6s on the 25K scenario (Mean is 23.3s), indicating consumer group coordination and offset management overhead amplified in what is effectively a synchronous request pattern.

Broker Comparison

RabbitMQ 4.2

Strengths:

  • Mature AMQP implementation with 15+ years of production usage
  • Rich routing model: direct, topic, fanout, and headers exchanges with flexible bindings
  • Management UI included (port 15672), exposing queue depths, message rates, connection counts, and consumer status
  • Lowest managed memory allocation in benchmarks, particularly with large payloads
  • Multi-protocol support: AMQP 0.9.1, AMQP 1.0, MQTT 3.1.1/5.0, STOMP
  • Plugin ecosystem: delayed message exchange, federation, shovel, consistent hash exchange
  • Broad client library coverage across all major languages
  • Quorum queues and streams for HA and replay scenarios

Weaknesses:

  • 1.6-1.8x slower than NATS on small-message async throughput
  • No native request-reply, must be implemented via correlation IDs
  • Classic mirrored queues are deprecated; quorum queues improve HA but add latency
  • Erlang runtime limits low-level troubleshooting and custom extensions
  • Clustering can exhibit split-brain under network partitions (mitigated by peer discovery plugins)

Apache Kafka 4.2

Strengths:

  • Distributed commit log with configurable retention, allowing consumers to replay from any offset
  • Horizontal throughput scaling via partition-based parallelism
  • Exactly-once semantics with idempotent producers and transactional API
  • Extensive ecosystem: Kafka Connect (200+ connectors), Kafka Streams, ksqlDB, Schema Registry
  • Standard for event sourcing, CDC (Debezium), and data pipeline architectures
  • KRaft mode (used here) removes ZooKeeper dependency

Weaknesses:

  • Slowest in every scenario in this benchmark (single-node, single-partition is its worst case)
  • 327 MiB RAM on cold start (JVM heap), 54x NATS
  • High operational complexity: partitions, ISR, consumer group rebalancing, offset management
  • Consumer group rebalancing causes consumption pauses (mitigated by cooperative-sticky assignor)
  • Optimized for batched throughput rather than per-message latency
  • No native request-reply

NATS 2.12 with JetStream

Strengths:

  • Fastest in 3 out of 5 async scenarios, and all 3 request-reply scenarios
  • Native request-reply at the protocol level, no application-level workarounds needed
  • Operationally minimal: single binary, single flag (-js) enables persistence
  • 6 MiB RAM on cold start
  • JetStream provides persistence, replay, exactly-once delivery, de-duplication, and consumer acknowledgement
  • Subject-based routing with hierarchical wildcards (>, *)
  • Built-in key-value store and object store
  • Service discovery via micro package
  • Leafnode and gateway topologies for multi-cluster deployments

Weaknesses:

  • Higher managed memory allocation
  • Slower than RabbitMQ on large payloads (64 KB+)
  • Smaller community and fewer production war stories compared to RabbitMQ/Kafka
  • JetStream is younger than Kafka Streams; less battle-tested for event streaming at extreme scale
  • Monitoring/observability tooling is less mature (no equivalent to Kafka Connect ecosystem)

Conclusion

For new projects that need a general-purpose message broker, NATS is the most practical starting point.

It provides a feature set comparable to Kafka: persistence with replay, exactly-once delivery, stream processing primitives, key-value and object stores. At the same time, its throughput on small-to-medium payloads matches or exceeds RabbitMQ, and it handles request-reply 46-104x faster than either alternative at P95 thanks to native protocol support.

The operational cost is also lower. A single binary with one flag gives you a persistent, JetStream-enabled broker consuming 6 MiB of RAM on cold start. Compare that to Kafka's 327 MiB.

RabbitMQ remains a strong choice when the workload is primarily large payloads (64 KB+) or when the team has deep AMQP expertise. Kafka is still the right tool for large-scale event streaming, CDC pipelines, and scenarios where partition-based parallelism and the Connect/Streams ecosystem matter.

But as a default choice for a new distributed system? NATS delivers Kafka-class features at RabbitMQ-class speed, with less operational overhead than either.


Benchmarked with BenchmarkDotNet v0.15.8 on .NET 10.0.4. All brokers ran in Docker on the same machine with default configurations. Single-node results. Production numbers will differ.
