Maciej Wakuła

Posted on Oct 29, 2022

Kafka vs RabbitMQ

#kafka #rabbitmq

This is a response to an article with the same name.

Introduction

Both Kafka and RabbitMQ relay messages between message producers and message consumers. RabbitMQ can handle a large number of messages (dozens up to thousands per second), but Kafka can handle much higher traffic. Unless you are working with a very high volume of messages, Kafka would be overkill.

Distribution

RabbitMQ (in the "new" version 3.7 or newer) consists of "durable" instances (working and saving data) and "working" instances (not saving data). It works well with 1+ instances. It works with exchanges and queues.

Kafka spreads all messages across the instances (it works well with 3+ instances, preferably 5+). It can work with a single instance (for development/test purposes). You must set a "partitioning key" to guarantee that messages are processed in order (for this key). It is not a queue.
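To make the partitioning key idea concrete, here is a minimal sketch in plain Python (no Kafka client involved). It assumes a hypothetical topic with 3 partitions and uses CRC32 as the hash; Kafka's own default partitioner hashes the key differently (murmur2), so the partition numbers here are illustrative only. The names `partition_for` and `route` are made up for this example.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical partition count for this sketch


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition deterministically (CRC32 is an assumption here)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(messages):
    """Append (key, value) pairs to per-partition lists, preserving arrival order."""
    partitions = defaultdict(list)
    for key, value in messages:
        partitions[partition_for(key)].append((key, value))
    return partitions


events = [
    ("order-1", "created"),
    ("order-2", "created"),
    ("order-1", "paid"),
    ("order-1", "shipped"),
]
partitions = route(events)
```

Every message with the key `order-1` lands in the same partition, in the order it was produced. Messages with different keys may land in different partitions, and no order is guaranteed between them.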

High availability

RabbitMQ works well, but you should not assume that it is 100% available. Under some rare circumstances messages could be lost.

Kafka uses multiple servers and a quorum. Much has changed between versions 1, 2 and 3. Internally Kafka 2 and 3 are fully transactional to ensure that no messages are lost. Its "exactly once delivery" should be read with care: a message is delivered at most once (to the library, not necessarily processed) OR at least once (be prepared for the same message being delivered multiple times).
Kafka uses either ZooKeeper or its own KRaft protocol (available since Kafka 2.8.0).
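Because the same message can be delivered more than once, consumers are usually written so that a duplicate does no harm. A minimal sketch of such an idempotent consumer follows. It assumes every message carries a unique ID (a hypothetical `message_id`), and it keeps the seen IDs in memory, which a real service would replace with durable storage.

```python
class IdempotentConsumer:
    """Processes each message ID at most once, even if it is delivered repeatedly."""

    def __init__(self):
        self.seen_ids = set()   # in-memory only; an assumption for this sketch
        self.processed = []     # stands in for the real side effect

    def handle(self, message_id, payload):
        """Return True if the payload was processed, False if it was a duplicate."""
        if message_id in self.seen_ids:
            return False
        self.processed.append(payload)
        self.seen_ids.add(message_id)
        return True


consumer = IdempotentConsumer()
deliveries = [(1, "a"), (2, "b"), (1, "a"), (3, "c"), (2, "b")]
results = [consumer.handle(mid, payload) for mid, payload in deliveries]
```

With this in place, "at least once" delivery behaves like "processed once" from the point of view of the business logic, as long as the ID store survives restarts.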

Performance

Kafka scales much further; use it for REALLY high message volumes.

RabbitMQ works well with high load, but expect to hit a dead end when dealing with HUGE load.

Replication

In RabbitMQ messages get copied, consumed or dequeued (when their time to live is reached). Or lost (the worst nightmare, as you have no idea what happened).

In Kafka messages are stored for some time and never lost. Even after they are consumed you can still access them to see what happened.
You can replicate all the messages to a secondary data center (and delay is not an issue).

Multi subscriber

In RabbitMQ you must control who should receive the messages and handle the case when a message is not consumed in time.

Kafka is just a journal of messages. The same messages can be consumed by many consumer groups, and you can always re-consume them (unless they have timed out).
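The journal model can be sketched in a few lines of plain Python. This is a toy model, not the Kafka API: the `Journal` class and its `poll` and `seek` methods are hypothetical names, and a single list stands in for one partition. The point is that reading does not remove anything; each consumer group only moves its own offset.

```python
class Journal:
    """Append-only log where each consumer group tracks its own read offset."""

    def __init__(self):
        self.records = []
        self.offsets = {}  # group name -> next offset to read

    def append(self, record):
        self.records.append(record)

    def poll(self, group, max_records=10):
        """Return the next batch for this group and advance only its offset."""
        start = self.offsets.get(group, 0)
        batch = self.records[start:start + max_records]
        self.offsets[group] = start + len(batch)
        return batch

    def seek(self, group, offset):
        """Rewind (or skip ahead) so the group can re-consume records."""
        self.offsets[group] = offset


journal = Journal()
for record in ["m1", "m2", "m3"]:
    journal.append(record)

billing_batch = journal.poll("billing")      # reads m1..m3
audit_batch = journal.poll("audit")          # independently reads m1..m3
journal.seek("billing", 0)
billing_replay = journal.poll("billing")     # reads m1..m3 again
```

Both groups see every message, and rewinding the offset replays them, which is what makes re-consuming possible.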

Message Protocols

RabbitMQ is a "standard" in terms of protocol, but it is quite complex (and the examples and libraries are not easy to understand). If you dig into the libraries, they are all barely more than proofs of concept.

At enterprise level Kafka is easier to get a feel for, but you must also get a feel for asynchronous message processing. RabbitMQ works well for synchronous processing.

Message Ordering

RabbitMQ is an advanced queue. Message ordering is "guaranteed" (but messages that are not processed are re-queued, which means they could end up at the end of the queue).

Kafka is NOT a queue, but you can use a partitioning key to ensure that messages are processed in order. Expect issues when your consumer is not responding and the same messages are passed to another consumer. Expect issues when a single partitioning key contains a VERY HIGH number of messages (it could block a partition).

Message lifetime

In RabbitMQ a consumed message is gone.

In Kafka a message stays until it is removed.
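In practice "until it is removed" usually means "until the retention period expires", regardless of whether anyone has consumed the message. A rough sketch of time-based retention, using hypothetical `(timestamp, value)` records and a made-up `prune_expired` helper (Kafka itself removes whole log segments according to topic settings such as `retention.ms`, so this is a simplification):

```python
def prune_expired(log, now, retention_seconds):
    """Keep only (timestamp, value) records younger than the retention window."""
    return [(ts, value) for ts, value in log if now - ts < retention_seconds]


log = [(100, "old"), (500, "recent"), (900, "new")]
kept = prune_expired(log, now=1000, retention_seconds=600)
```

Here the record written at time 100 is 900 seconds old and is dropped; the other two are still inside the 600 second window and stay available to every consumer group.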

Architecture

RabbitMQ handles message delivery and the logic of "what to do if delivery fails". This pushes responsibility onto the message producer (sender). Your service must handle multiple delivery failure scenarios.

Kafka is rather "fire and forget". You produce a message and don't care whether it is consumed by anyone. This helps in decoupling services. You must handle problems in the consumers, so producers are simpler.

Use Cases

Kafka is meant for asynchronous message delivery (fire and forget).

RabbitMQ is meant for sending a message and handling delivery failures. You have better control over the delivery, but you must handle all the scenarios.

Transactions

Kafka has transactions. RabbitMQ offers only limited, channel-scoped AMQP transactions. If you want reliable transactions, the more complex Kafka can provide them.

Language

RabbitMQ is written in Erlang (older versions use Erlang config files).
Kafka is written in Scala and Java (it runs on the JVM).

Routing Support

Both options are OK but handle routing differently (Kafka pushes it to the consumers while RabbitMQ depends heavily on the producer).

Developer Experience

Rabbit seems simple at first glance but fails when issues appear.

Kafka is much more challenging at first but does not fail once you understand how it works. It requires much more knowledge and experience.

Kafka can be scaled to much higher volume.
Kafka requires more knowledge.
Kafka provides better disaster recovery.

Disk space

I decided to add this extra section because it is important but not straightforward to notice or understand.
Kafka keeps messages for a while; each message is stored only once but can be consumed multiple times. It uses a lot of disk space because you keep messages for a while in case a consumer is unavailable for some time. Messages should be small (preferably kilobytes). A consumer outage is not that scary anymore. You often want to outsource Kafka management to a company focused on that (one with the know-how and well-established tools).
RabbitMQ can handle larger messages but copies them for every subscriber. Once consumed, the message is gone. As a result only a few messages are kept and you need less disk space. If a message is not consumed in time, it goes to a dead letter queue. A consumer outage is a pain and often forces someone to handle it manually. Your nightmare is a message that gets consumed but not processed, as you end up digging through the logs. System management is often tightly bound to domain knowledge, so outsourcing might be difficult.
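To get a feeling for why Kafka retention uses so much disk, a back-of-the-envelope estimate helps. The sketch below multiplies message rate, average size, retention time and replication factor. All the numbers are hypothetical, and the estimate ignores compression and index overhead, so treat it as a rough order of magnitude only.

```python
def kafka_disk_bytes(messages_per_second, avg_message_bytes,
                     retention_seconds, replication_factor=3):
    """Rough upper bound on the disk needed to retain messages (no compression)."""
    return (messages_per_second * avg_message_bytes
            * retention_seconds * replication_factor)


SEVEN_DAYS = 7 * 24 * 60 * 60  # 604800 seconds

# Hypothetical load: 1000 messages/s of 2 KiB each, kept 7 days, 3 replicas.
estimate = kafka_disk_bytes(1000, 2048, SEVEN_DAYS, replication_factor=3)
estimate_tb = estimate / 1e12  # roughly 3.7 TB
```

Even a moderate stream of small messages adds up to terabytes once retention and replication are taken into account, while a RabbitMQ queue that is drained promptly holds only the messages currently in flight.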

How this document was created

It was created in response to https://dev.to/rakeshkr2/kafka-vs-rabbitmq-4ioj
