## What is Total order?

The way to determine if a given element is in total order is simple. That is, "Can you put those elements in a row?".

For example, what if the natural numbers 7, 8, 1, 4, and 5 are given? We can then serialize 1<4<5<7<8. In other words, the natural numbers are total order.

What about the sets {b, d}, {d,d} {z, b} next? They cannot be serialized. In other words, these sets are not in total order.

State machine replication requires the total order of operations.

## How to Total order operations?

### 1. Logical clock

The idea behind the logical clock is simple. For each operation, a counter is incremented and its number is included in the operation information. Each replica applies operations in the ascending order of that counter.

For example, suppose that the operations are issued in the order op_A, op_B, and op_C. Then cnt (op_A) = 0, cnt (op_B) = 1, and cnt (op_C) = 2 are given. The replica applies the operations according to that number in the order op_A, op_B, and op_C.

This can be accomplished by installing a leader replica. All operations are communicated to the leader. This allows the leader to assign a counter number to the operation. All replicas can then apply the operations according to that number to guarantee total order.

There are situations, however, in which it is not desirable to implement a single leader. How can we ensure total order then?

### 2. Lamport timestamp

This is explained in detail in another article by me. As explained in this article, the total order guaranteed by the Lamport timestamp may not be sufficient. For example, suppose that operations are issued to create identical usernames on different nodes. However, the usernames must be unique. In this case, one might think that the one with the smaller timestamp (the write created first) should win. But is that instantly possible? We don't know if there is another account with the same username being created in parallel on another node, whose timestamp may or may not be larger than the timestamp we are trying to win. In other words, the full order of operations will only be determined after we have information on all the nodes.

Therefore, to implement the unique constraint, the total order of operations is not sufficient, and the total order broadcast described below is needed to know the definite time of the total order.

## Total order broadcast

There are two safety concerns that total order broadcasts must satisfy.

- Messages are not lost.
- Messages received by any node are always received by all nodes.
- Messages must be conveyed to all nodes in the same order.

Different order is not allowed as shown in the following figure.

The following image shows an example of a successful total order broad.

### What makes it difficult?

At first glance, a total order broadcast seems simple. However, it becomes difficult when using an asynchronous communication model. See my article on various communication models for distributed systems.

The asynchronous communication model has the following characteristics

- Messages are eventually delivered.
- Messages can be delayed with no upper limit.
- Message arrival order not guaranteed.

In other words, using the asynchronous communication model causes the problems shown in the following figure.

In other words, the order of arrival of messages may change if nothing is done under a delay environment. This does not adhere to the principle of total order broadcast.

### Solution

The solution is simple. First, the replica that wants to deliver a message broadcasts that message. In addition, the replica that receives the message broadcasts an "ack". If all replicas agree, the message can be logged locally. This is illustrated in the following figure. The order relationship between m1 and m2 is m1 < m2.

The wait in the figure above indicates that it is waiting for m1 to agree. This is because if m2 is logged before m1, the order relationship is broken.

## Comment

This article introduced a simple all-order broadcast. However, frequent broadcasts are undesirable from the standpoint of communication volume. Therefore, new protocols with reduced communication costs, such as FastPaxos, have been developed.

Thank you for reading to the end!

## Top comments (0)