DEV Community



Failure Models of Distributed Systems


The previous article explained the communication timing models for the consensus problem. In this article, we introduce four common failure models that describe how the nodes in a consensus problem can fail.

Crash-stop faults

This model assumes only that a node may fail by crashing, that is, by suddenly stopping. A node that has stopped in this model never comes back.
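
As a rough illustration, here is a minimal Python sketch of a node under the crash-stop model. The `CrashStopNode` class and its methods are hypothetical names chosen for this example. The key property is that once `crash()` is called, the node never replies again, and there is deliberately no way to bring it back.

```python
from typing import Optional


class CrashStopNode:
    """Hypothetical node that can only fail by crashing permanently."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.crashed = False

    def crash(self) -> None:
        # Crash-stop: the node halts for good; there is no recover() method.
        self.crashed = True

    def handle(self, message: str) -> Optional[str]:
        # A crashed node stays silent forever.
        if self.crashed:
            return None
        return f"{self.name} ack {message}"


node = CrashStopNode("A")
print(node.handle("ping"))  # A ack ping
node.crash()
print(node.handle("ping"))  # None
```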

Omission faults

This model assumes both crash-stop faults and omission faults. A node with an omission fault may fail to send or receive some messages, so it may or may not reply to any given message. Other nodes cannot tell whether such silence comes from a crash-stop fault or an omission fault.
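
The following sketch shows why other nodes cannot tell these two faults apart. It assumes a hypothetical `Node` class and `observe()` helper, and uses an assumed `drop_rate` parameter to model omissions. From the caller's side, a crashed node and a node that omitted its reply produce exactly the same observation: "no reply".

```python
import random
from typing import Optional


class Node:
    """Hypothetical node that may suffer crash-stop or omission faults."""

    def __init__(self, name: str, drop_rate: float = 0.0, seed: int = 0) -> None:
        self.name = name
        self.crashed = False
        self.drop_rate = drop_rate        # assumed: probability of omitting a reply
        self.rng = random.Random(seed)    # seeded so runs are reproducible

    def handle(self, message: str) -> Optional[str]:
        if self.crashed:
            return None                   # crash-stop: silent forever
        if self.rng.random() < self.drop_rate:
            return None                   # omission: only this reply is lost
        return f"{self.name} ack {message}"


def observe(node: Node, message: str) -> str:
    # The caller only sees whether a reply arrived, never why it is missing.
    return "reply" if node.handle(message) is not None else "no reply"


crashed = Node("A")
crashed.crashed = True
lossy = Node("B", drop_rate=1.0)  # random() < 1.0 always holds, so every reply is dropped

print(observe(crashed, "ping"))  # no reply
print(observe(lossy, "ping"))    # no reply
```

With a `drop_rate` between 0 and 1, the lossy node would reply to some messages and not others, which matches the "may or may not reply" behavior described above.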

Crash-recovery faults

This model assumes crash-stop faults, omission faults, and crash-recovery faults. A crash-recovery fault means that a node may crash at any time and may later resume responding at any time. The model also assumes that a crash may cause the node to lose part of its data. If a node does not respond in this model, other nodes cannot determine whether it has suffered a crash-stop fault, an omission fault, or a crash-recovery fault.
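
A minimal sketch of the crash-recovery model follows. For illustration it assumes that a node keeps some data in durable storage (the hypothetical `stable` dict) that survives a crash, and some data in memory (`volatile`) that is lost. This is one simple way to model partial data loss, not the only one.

```python
from typing import Dict, Optional


class CrashRecoveryNode:
    """Hypothetical node that may crash and later come back."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.up = True
        self.stable: Dict[str, str] = {}    # assumed durable storage: survives crashes
        self.volatile: Dict[str, str] = {}  # assumed in-memory state: lost on crash

    def write(self, key: str, value: str, durable: bool) -> None:
        target = self.stable if durable else self.volatile
        target[key] = value

    def crash(self) -> None:
        self.up = False
        self.volatile.clear()  # partial data loss: in-memory state is gone

    def recover(self) -> None:
        # Unlike crash-stop, the node may resume responding at any time.
        self.up = True

    def read(self, key: str) -> Optional[str]:
        if not self.up:
            return None  # while down, it looks like a crashed or omitting node
        if key in self.stable:
            return self.stable[key]
        return self.volatile.get(key)


node = CrashRecoveryNode("A")
node.write("log", "entry-1", durable=True)
node.write("cache", "tmp", durable=False)
node.crash()
print(node.read("log"))    # None (node is down)
node.recover()
print(node.read("log"))    # entry-1
print(node.read("cache"))  # None (lost in the crash)
```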

Byzantine faults

This model assumes crash-stop faults, omission faults, crash-recovery faults, and Byzantine faults. A node with a Byzantine fault may behave arbitrarily. For example, it can ignore messages, pretend to have failed, send fabricated messages, or collude with other Byzantine nodes to act maliciously.
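
One way a Byzantine node can send fabricated messages is by equivocating, that is, telling different peers different things. The sketch below uses hypothetical function names to contrast an honest broadcast with a Byzantine one. It only shows how peers that compare what they received could notice an inconsistency; it is not a Byzantine fault-tolerant protocol.

```python
from typing import Dict, List


def honest_broadcast(value: str, peers: List[str]) -> Dict[str, str]:
    # An honest node sends the same value to every peer.
    return {peer: value for peer in peers}


def byzantine_broadcast(peers: List[str]) -> Dict[str, str]:
    # A Byzantine node may equivocate. As an illustrative assumption,
    # this one alternates between two conflicting values by peer index.
    return {peer: ("commit" if i % 2 == 0 else "abort") for i, peer in enumerate(peers)}


def consistent(received: Dict[str, str]) -> bool:
    # If peers pool what they received, conflicting values expose the sender.
    return len(set(received.values())) <= 1


peers = ["B", "C", "D"]
print(consistent(honest_broadcast("commit", peers)))  # True
print(consistent(byzantine_broadcast(peers)))         # False
```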


The difficulty of dealing with each failure model is illustrated in the figure below.


Because each model includes all the faults of the models before it, the Byzantine failure model is the most difficult assumption to deal with.
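
Since each model admits every fault of the previous one plus its own, the four models form a strict hierarchy. A small sketch, using a hypothetical `FailureModel` enum and `admitted_faults()` helper, makes that ordering explicit:

```python
from enum import IntEnum
from typing import List


class FailureModel(IntEnum):
    # Ordered from easiest to hardest to handle; each model admits
    # every fault of the models before it, plus its own.
    CRASH_STOP = 1
    OMISSION = 2
    CRASH_RECOVERY = 3
    BYZANTINE = 4


def admitted_faults(model: FailureModel) -> List[str]:
    # All fault types a node may exhibit under `model`.
    return [m.name for m in FailureModel if m <= model]


print(admitted_faults(FailureModel.CRASH_RECOVERY))
# ['CRASH_STOP', 'OMISSION', 'CRASH_RECOVERY']
```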

Top comments (3)

Phil Ashby

Out of order message arrival? 😁

skoya76

Interesting joke.
That's the delay caused by asynchronous communication.