ChunTing Wu

Posted on Dec 23, 2021

Message Queue in Redis

#redis #eventdriven #architecture #design

My organization have done the technical selection recently, and we want to build an event-driven system. However, the budget is limited, we cannot choose a classic queuing service like RabbitMQ or streaming process like Kafka. We have to find an affordable solution which can meet our needs. Currently, all we have is Redis, therefore, we decide to build a message queue in Redis.

In this article, I will introduce some properties of a message queue and describe how Redis be a message queue.

Message Queue

There are many aspects to consider when choosing a message queue, such as propagation, delivery, persistence and consumer groups. In this section, I will explain them briefly.

Propagation

The propagation means how messages be transferred by the message queue. There are 2 types of the propagation,

1-to-1
1-to-many (fan-out)

One-to-one is quite simple. The producer sends a message to queue, and this message is received by only one consumer. On the other hand, one-to-many is a message can be delivered to multiple consumers. It’s worth mentioning that the producer just sent a message, but the message can be transferred to many receivers. Such behavior is also called fan-out.

Delivery

Delivery is interesting. Most queuing systems have their delivery guarantees. There are three common guarantees,

At-most-once
At-least-once
Exactly-once

At-most-once is relatively easy to achieve. It can be said that all queuing systems have this guarantee. The consumer can receive a sent message or nothing. This may happen in several situations. Firstly, the message is lost whereas a networking problem occurred. Secondly, although the consumer received it, but he did not handle it well like crashed. The message disappears if it is gone, and it is impossible to retrieve the message again.

At-least-once is a guarantee often used by some well-known systems such as RabbitMQ, Kafka, etc. Compared to at-most-once, at-least-once has a stronger guarantee. It can make sure the message must be processed. However, the message may be processed many times. For instance, a consumer does not acknowledge the queue that the message is handled, thus the queue sends a message to that consumer again.

Exactly-once is the strictest guarantee. It ensure the message must be handle once. Even the popular systems can’t do this well, e.g. RabbitMQ. Nevertheless, the correct use and configuration of Kafka can still be achieved. The price is to sacrifice some performance.

Persistence

Persistence means whether the message will disappear after it is sent to the system. There are also three types of persistence,

In-memory
In-disk
Hybrid

We all know what they means. But the interesting thing is, is it slower to persist messages in disk? No, not really. It depends on how persistence is implemented. Kafka uses LSM-tree to achieve a lot of throughput; in addition, it is better than RabbitMQ who uses memory. There is another example in Cassandra, Cassandra has very fast writing speed and uses LSM-tree as well.

Hybrid is a special case combined with in-memory and in-disk. In order to improve the writing performance, the queuing system writes to the memory first, and then flush into the disk. RabbitMQ is a typical example in hybrid. However, RabbitMQ is also able to be configured as in-disk.

Consumer Group

In my opinion, consumer group is the most important feature in a queuing system. Processing a message usually takes time, so that we have to use more consumers to deal with messages, aka scale-out. In consumer group scenes, both the target of one-to-one and one-to-many become a group of consumers instead of a single consumer.

Redis Queue

After talking about the properties in a queuing system, let's talk about how Redis be a message queue. There are 3 ways to do it,

Pub/Sub
List
Stream

We will introduce one by one, and then give a comprehensive summary.

Pub/Sub

Pub/Sub is a widely known solution for notifying, this feature was born almost at the same time as Redis. The consumer SUBSCRIBE a topic, aka a key, and then receive the data after a client PUBLISH messages to the same topic. As a traditional Pub/Sub feature, it also can fan-out a message to multiple consumers. Moreover, A certain degree of messaging routing can also be achieved through PSUBSCRIBE.

But Pub/Sub in Redis are not popular for most use cases. The biggest problem is the message will delivery at most once. When a message is published, if the consumer doesn’t receive it right now, the message disappears. Furthermore, Redis doesn't persist messages. All messages are gone away if Redis is shutdown.

Let's summarize Pub/Sub:

1-to-1 and 1-to-many are fine
at-most-once
no persistence
no consumer group

List

List is a useful data structure in Redis, and we can accomplish a FIFO queue easily by using it. The trick is we can use BLPOP to wait for a message in blocking mode. However, adding a timeout is recommended.

According to the figure, we can see if there are multiple consumers wait for the same list, they are become a consumer group. Without configuring anything, the consumer group can be spontaneously formed by consumers. On the other hand, list cannot fan-out a message. If a message is BLPOP by a consumer, others can not be retrieve this message anymore, even the message is lost in that consumer.

Nevertheless, Redis list can persist messages in memory. In addition, if you are enabling AOF or RDB, messages can be backed up into the disk. I have to say, following my previous article, this approach is not entirely data persistence.

To sum up,

1-to-1 is okay, but no 1-to-many
at-most-once
persist in-memory, and backup in-disk
consumer group works

Stream

After introducing the Pub/Sub and List, we notice that neither of these two methods is very good. They have their own drawbacks; therefore, Stream has come to solve those issues since Redis 5.0.

Because Stream is much more complicated, let’s first look at what benefits Stream brings.

1-to-1 and 1-to-many are fine
at-least-once 👍
persist in-memory, and backup in-disk
consumer group works

As a result, Stream solves all issues in Pub/Sub and List and enhances a lot, e.g., at-least-once delivery.

The diagram is like Pub/Sub, but the workflow is closer to List. The producer can generate messages at any time, and then XADD to Redis Stream. You can consider Stream as a list maintains all incoming messages. Consumers can also retrieve messages at any time via XREAD. The identifier in XREAD command represents where you want to read the message from.

$: No matter what messages are in Stream before, I don’t care, only retrieve from now on.
0-0: Always read from the head.
<id>: Start from the specific message id.

As well as supporting one-to-one mapping, Stream supports consumer group as follows:

In order to achieve at-least-once guarantee, like most queuing systems, the consumer must acknowledge Stream after processing a message by using XACK.

The use of the special identifier, <, here is to start reading from a position that no one has read in the group.

After the above explanation, I provide a real example to show a consumer's bootstrap in Node.js.



let lastid = "0-0";
let checkBacklog = true
while (true) {
    const myid = checkBacklog ? lastid : ">";
    const items = await redis.xreadgroup('GROUP',GroupName,ConsumerName,'BLOCK','2000','COUNT','10','STREAMS',StreamName,myid);

    if (!items) continue;

    checkBacklog = !(items[0][1].length == 0);

    items[0][1].forEach(elem => {
        const [id, fields] = elem;
        await processMessage(id,fields);
        await redis.xack(StreamName,GroupName,id);
        lastid = id;
    });
}

Please note that every consumer has his own name, ConsumerName. First, the consumer read from the beginning to determine its last position. The response will be an empty array with no length, so that the consumer can get the correct lastid. Then, the consumer reads from the lastid and processes those messages. Finally, acknowledge Stream with finished id.

Stream Consumer Failover

In the distributed system, we can not name a consumer easily. For example, the consumer is run in a container within K8s, how do I maintain names to every pod? Even if we lock everyone’s name, how do we face the scale-out and scale-in scene? Therefore, keeping the name in the distributed system is impractical.

In spite of the fact that we cannot name a consumer in uuid and forget the name after the consumer is down. Because Redis Stream maintains a table of names against last positions, if we generate a random name every time, the mapping table will become larger and larger. Worst of all, those messages that have been received but not acknowledged will never be processed.

Fortunately, Redis Stream provides a method to claim those pending messages. The workflow is like this:

Find out all pending message ids.
Claim those ids to transfer the ownership.

Therefore, the completed workflow in a consumer bootstrap is:

XPENDING StreamName GroupName
XCLAIM StreamName GroupName <ConsumerName in uuid> <min-idle-time> <ID-1> <ID-2> ... <ID-N>
The above script

The min-idle-time is a very useful approach. By using min-idle-time, we can avoid multiple consumers claim the same messages at the same time. The first consumer claims some messages, and such messages will be no longer idle. Hence, other consumers cannot claim those messages again.

Redis Stream Persistence

According to my previous article, Redis does not guarantee that the data will not be lost at all, even if the strictest setting is turned on. If we use Redis as a message queue, we must take additional measures to ensure persistence. The most common way is event-sourcing; before publishing a message, we write this message into a durable storage like MySQL. Our consumers can work generally, however, if there is any error occurred, we still can leverage the durable messages in MySQL to recover our works.

Besides, if Stream persists more and more messages, the memory usage of Redis would be a disaster. If we are looking at the Redis manual, we can find a command, XDEL. However, XDEL does not delete the messages, it only marks those messages as unused, and the messages are still there.

How can we prevent the memory leakage in Redis Stream? We can use MAXLEN whereas XADD is invoking. The command line is:

XADD StreamName MAXLEN 1000 * foo bar

But there is one thing you have to know, MAXLEN affects performance of Redis very much. It would be block the main process for a while, and there is no command can be executed during that period. If there are many incoming messages and the amount of queued messages is the maximum, then Stream will be very busy to maintain MAXLEN.

An alternative approach can be adopted. Instead of fixing the hard limit, we can give Redis the right to choose a comfortable length at its free time. Hence, the command will be:

XADD StreamName MAXLEN ~ 1000 * foo bar

The ~ sign means the maximum length is about 1000, it might be 900 even 1300. Redis will pick a good time to strip a good size for it.

Conclusion

Let me summarize these three approaches.

	Pub/Sub	List	Stream
Fan-out	V		V
Delivery	At-most-once	At-most-once	At-least-once
Scale-out		V	V
Persistence		AOF, RDB	AOF, RDB
Complexity	low	low	high

There is an unfamiliar property, complexity, which means not only the complexity of a technology but also the complexity of implementing a consumer.

From my point of view, these three approaches have their pros and cons and also have their own applicable scenarios.

Pub/Sub: Best-effort notification.
List: Tolerate message queues with some data loss.
Stream: Loose streaming process.

It is worth mentioning that why Redis Stream is loose streaming process? Because, the consumer group in Redis Stream is not like Kafka, it can not preserve the message ordering. In a high-volume traffic environment, consumers within the same group cannot be scaled out successfully.

In the end, we chose List as the message queue. Our use cases are simple, we just want to throttle the notifications in the broadcast scene. The broadcast notification can tolerate the message loss, it’s good enough that most users can receive the message. In addition, the implementation effort is very low in Node.js, we can finish it as soon as possible. Although it is not the best solution, it is good enough for our organization.

DEV Community