About: Chandra–Toueg consensus algorithm is a research topic. Over the lifetime, 153 publications have been published within this topic receiving 14654 citations.
TL;DR: In this paper, it is shown that every protocol for this problem has the possibility of nontermination, even with only one faulty process.
Abstract: The consensus problem involves an asynchronous system of processes, some of which may be unreliable The problem is for the reliable processes to agree on a binary value In this paper, it is shown that every protocol for this problem has the possibility of nontermination, even with only one faulty process By way of contrast, solutions are known for the synchronous case, the “Byzantine Generals” problem
TL;DR: It is proved that Consensus and Atomic Broadcast are reducible to each other in asynchronous systems with crash failures; thus, the above results also apply to Atomic Broadcast.
Abstract: We introduce the concept of unreliable failure detectors and study how they can be used to solve Consensus in asynchronous systems with crash failures. We characterise unreliable failure detectors in terms of two properties—completeness and accuracy. We show that Consensus can be solved even with unreliable failure detectors that make an infinite number of mistakes, and determine which ones can be used to solve Consensus despite any number of crashes, and which ones require a majority of correct processes. We prove that Consensus and Atomic Broadcast are reducible to each other in asynchronous systems with crash failures; thus, the above results also apply to Atomic Broadcast. A companion paper shows that one of the failure detectors introduced here is the weakest failure detector for solving Consensus [Chandra et al. 1992].
TL;DR: It is proved that to solve Consensus, any failure detector has to provide at least as much information as Diamond W, which is indeed the weakest failure detector for solving Consensus in asynchronous systems with a majority of correct processes.
Abstract: We determine what information about failures is necessary and sufficient to solve Consensus in asynchronous distributed systems subject to crash failures. In [CT91], we proved that $\Diamond\cal W$, a failure detector that provides surprisingly little information about which processes have crashed, is sufficient to solve Consensus in asynchronous systems with a majority of correct processes. In this paper, we prove that to solve Consensus, any failure detector has to provide at least as much information as $\Diamond\cal W$. Thus, $\Diamond\cal W$ is indeed the weakest failure detector for solving Consensus in asynchronous systems with a majority of correct processes.
TL;DR: It is proved that to solve Consensus, any failure detector has to provide at least as much information as W, and W is indeed the weakest failure detector for solving Consensus in asynchronous systems with a majority of correct processes.
Abstract: We determine what information about failures is necessary and sufficient to solve Consensus in asynchronous distributed systems subject to crash failures. In Chandra and Toueg [1996], it is shown that W, a failure detector that provides surprisingly little information about which processes have crashed, is sufficient to solve Consensus in asynchronous systems with a majority of correct processes. In this paper, we prove that to solve Consensus, any failure detector has to provide at least as much information as W. Thus, W is indeed the weakest failure detector for solving Consensus in asynchronous systems with a majority of correct processes.
TL;DR: The proofs expose general heuristic principles that explain why consensus is possible in certain models but not possible in others, and several critical system parameters, including various synchronicity conditions, are identified.
Abstract: Reaching agreement is a primitive of distributed computing. Whereas this poses no problem in an ideal, failure-free environment, it imposes certain constraints on the capabilities of an actual system: A system is viable only if it permits the existence of consensus protocols tolerant to some number of failures. Fischer et al. have shown that in a completely asynchronous model, even one failure cannot be tolerated. In this paper their work is extended: Several critical system parameters, including various synchrony conditions, are identified and how varying these affects the number of faults that can be tolerated is examined. The proofs expose general heuristic principles that explain why consensus is possible in certain models but not possible in others.