TL;DR: An algorithm by which a process in a distributed system determines a global state of the system during a computation, which helps to solve an important class of problems: stable property detection.
Abstract: This paper presents an algorithm by which a process in a distributed system determines a global state of the system during a computation. Many problems in distributed systems can be cast in terms of the problem of detecting global states. For instance, the global state detection algorithm helps to solve an important class of problems: stable property detection. A stable property is one that persists: once a stable property becomes true it remains true thereafter. Examples of stable properties are “computation has terminated,” “the system is deadlocked” and “all tokens in a token ring have disappeared.” The stable property detection problem is that of devising algorithms to detect a given stable property. Global state detection can also be used for checkpointing.
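The abstract does not spell out how the global state is recorded. The sketch below assumes the marker-passing scheme usually associated with this kind of snapshot algorithm, over reliable FIFO channels. The class `Snapshot`, its methods, and the money-transfer workload are hypothetical names chosen for illustration. It shows that the recorded local states plus the recorded in-flight messages form a consistent global state in which the total amount is conserved.

```python
from collections import deque

MARKER = "MARKER"  # control message that separates pre- and post-snapshot traffic


class Snapshot:
    """Toy marker-based snapshot over FIFO channels (illustrative assumption)."""

    def __init__(self, balances):
        self.balance = dict(balances)
        names = list(balances)
        # One FIFO channel per ordered pair of distinct processes.
        self.chan = {(s, d): deque() for s in names for d in names if s != d}
        self.saved = {}      # process -> recorded local state
        self.in_flight = {}  # (src, dst) -> messages recorded as channel state
        self.open = set()    # incoming channels still being recorded

    def send(self, src, dst, amount):
        self.balance[src] -= amount
        self.chan[(src, dst)].append(amount)

    def record(self, p):
        # Record the local state, send a marker on every outgoing channel,
        # and start recording every incoming channel.
        self.saved[p] = self.balance[p]
        for (s, d), queue in self.chan.items():
            if s == p:
                queue.append(MARKER)
            elif d == p:
                self.in_flight[(s, d)] = []
                self.open.add((s, d))

    def deliver(self, src, dst):
        msg = self.chan[(src, dst)].popleft()
        if msg == MARKER:
            if dst not in self.saved:
                self.record(dst)           # first marker seen: record now
            self.open.discard((src, dst))  # stop recording this channel
        else:
            self.balance[dst] += msg
            if (src, dst) in self.open:
                self.in_flight[(src, dst)].append(msg)

    def recorded_total(self):
        return sum(self.saved.values()) + sum(
            sum(msgs) for msgs in self.in_flight.values()
        )


net = Snapshot({"p": 10, "q": 5})
net.send("p", "q", 3)  # in transit when the snapshot starts
net.send("q", "p", 2)  # in transit when the snapshot starts
net.record("p")        # p initiates the snapshot
net.deliver("q", "p")  # the 2 arrives after p recorded: captured as channel state
net.deliver("p", "q")  # the 3 arrives before q has recorded
net.deliver("p", "q")  # marker reaches q, which records its state
net.deliver("q", "p")  # marker reaches p, closing the last open channel
print(net.saved, net.in_flight, net.recorded_total())
```

A stable property such as "no money is in transit and no process will send more" could then be evaluated on the recorded state rather than on the live system.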
TL;DR: In this article, the authors consider the problem of bringing a distributed system to a consistent state after transient failures, and propose a distributed algorithm to create consistent checkpoints, as well as a rollback-recovery algorithm to recover the system from transient failures.
Abstract: We consider the problem of bringing a distributed system to a consistent state after transient failures. We address the two components of this problem by describing a distributed algorithm to create consistent checkpoints, as well as a rollback-recovery algorithm to recover the system to a consistent state. In contrast to previous algorithms, our algorithms tolerate failures that occur during their execution. Furthermore, when a process takes a checkpoint, a minimal number of additional processes are forced to take checkpoints. Similarly, when a process rolls back and restarts after a failure, a minimal number of additional processes are forced to roll back with it. Our algorithms require each process to store at most two checkpoints in stable storage. This storage requirement is shown to be minimal under general assumptions.
TL;DR: It is concluded that diskless checkpointing is a desirable alternative to disk-based checkpointing that can improve the performance of distributed applications in the face of failures.
Abstract: Diskless Checkpointing is a technique for checkpointing the state of a long-running computation on a distributed system without relying on stable storage. As such, it eliminates the performance bottleneck of traditional checkpointing on distributed systems. In this paper, we motivate diskless checkpointing and present the basic diskless checkpointing scheme along with several variants for improved performance. The performance of the basic scheme and its variants is evaluated on a high-performance network of workstations and compared to traditional disk-based checkpointing. We conclude that diskless checkpointing is a desirable alternative to disk-based checkpointing that can improve the performance of distributed applications in the face of failures.
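The abstract does not say how checkpoints are protected without stable storage. One common approach, assumed here purely for illustration, is to keep each checkpoint in memory and store an XOR parity of all checkpoints on a spare node, so that the checkpoint of any single failed node can be rebuilt. The function names below are hypothetical, and the sketch assumes all checkpoints have equal length.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length buffers."""
    if len(a) != len(b):
        raise ValueError("checkpoints must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def parity(checkpoints: list) -> bytes:
    """Parity block that a spare node would hold in memory."""
    return reduce(xor_bytes, checkpoints)


def recover(survivors: list, parity_block: bytes) -> bytes:
    """Rebuild the one missing checkpoint from the survivors and the parity."""
    return reduce(xor_bytes, survivors, parity_block)


checkpoints = [b"abcd", b"wxyz", b"1234"]  # in-memory checkpoints of three nodes
p = parity(checkpoints)
# Suppose the second node fails and loses its in-memory checkpoint.
rebuilt = recover([checkpoints[0], checkpoints[2]], p)
print(rebuilt == checkpoints[1])
```

Single parity tolerates only one simultaneous failure; tolerating more would need a stronger code, which this sketch does not attempt.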
TL;DR: A distributed electronic mail system with a methodology for distributed message storage and processing is described, in which message data is split into two parts that are stored separately: a mutable metadata portion and an immutable portion.
Abstract: An electronic mail system with a methodology providing distributed message storage and processing is described. In particular, this methodology breaks up how the individual components of message data are stored. Message data itself is broken up into two parts: a metadata (mutable) portion, and an immutable portion. The metadata portion represents that part of the message data that may change over time. This includes message status flags (e.g., the IMAP “message deleted” flag) and the message's position within a particular message folder, among other information. The immutable portion, which comprises the bulk of electronic mail data (namely, the message itself), is never edited once stored. Immutable data is written f+1 times on as many unique servers, to tolerate f server failures using Lampson's stable storage algorithm. The metadata portion is stored 2f+1 times on as many unique servers to tolerate f server failures using quorum voting. Once a message has been stored, its location is passed around by reference instead of the message being copied. The system utilizes a two-tier architecture. One tier consists of the servers that store message metadata and immutable data, the Data Servers; the other consists of the servers that operate upon those data, the Access Servers. Message store integrity is maintained in the event of server failure and as the set of Data Servers changes. In the latter case, I/O and storage workloads are dynamically redistributed across Data Servers in an efficient way.
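The replication counts above can be made concrete with a small sketch. The f+1 and 2f+1 figures come from the abstract. The majority quorum size and the versioned read below are assumptions about how quorum voting might be realized, and the function names are hypothetical.

```python
def replicas_needed(f: int) -> dict:
    """Copies per data class needed to tolerate f server failures."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return {"immutable": f + 1, "metadata": 2 * f + 1}


def quorum_size(f: int) -> int:
    """Majority of 2f+1 replicas (assumed quorum rule): any two quorums overlap."""
    return f + 1


def read_metadata(replies: list, f: int):
    """Pick the newest value among (version, value) replies from a quorum.

    Assumes every write reached a majority, so at least one reply in any
    majority carries the latest version.
    """
    if len(replies) < quorum_size(f):
        raise RuntimeError("not enough replies to form a quorum")
    return max(replies, key=lambda reply: reply[0])[1]


f = 2
print(replicas_needed(f))  # copies required for each data class
print(read_metadata([(1, "unread"), (3, "deleted"), (2, "read")], f))
```

Immutable data needs only f+1 copies because any surviving copy is correct by construction, whereas mutable metadata needs the larger count so that readers can tell which copy is current.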
TL;DR: A database computer system and a method for making applications recoverable from system crashes are described, where the application state (i.e., address space) is treated as a single object which can be atomically flushed in a manner akin to flushing individual pages in database recovery techniques.
Abstract: This invention concerns a database computer system and method for making applications recoverable from system crashes. The application state (i.e., address space) is treated as a single object which can be atomically flushed in a manner akin to flushing individual pages in database recovery techniques. To enable this monolithic treatment of the application, executions performed by the application are mapped to logical loggable operations which can be posted to the stable log. Any modifications to the application state are accumulated and the application state is periodically flushed to stable storage using an atomic procedure. The application recovery integrates with database recovery, and effectively eliminates or at least substantially reduces the need for checkpointing applications. In addition, optimization techniques are described to make the read, write, and recovery phases more efficient.
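A heavily simplified sketch of the log-then-flush idea follows. It is not the patented method: the class `RecoverableApp`, the JSON file layout, and the use of a temporary file plus `os.replace` as the atomic flush are all assumptions made for illustration. Operations are appended to a log before they are applied, the whole state is periodically replaced atomically, and recovery reloads the last flushed state and replays the logged operations that came after it.

```python
import json
import os
import tempfile


class RecoverableApp:
    """Toy application whose whole state is flushed as a single object."""

    def __init__(self, directory: str):
        self.state_path = os.path.join(directory, "state.json")
        self.log_path = os.path.join(directory, "ops.log")
        self.state = {"counter": 0, "applied": 0}
        self.recover()

    def apply(self, op: dict) -> None:
        self.state["counter"] += op["add"]
        self.state["applied"] += 1  # number of logged operations reflected

    def execute(self, op: dict) -> None:
        # Post the logical operation to the log before touching the state.
        with open(self.log_path, "a") as log:
            log.write(json.dumps(op) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.apply(op)

    def flush(self) -> None:
        # Write a complete copy, then swap it in with one atomic rename.
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(self.state_path))
        with os.fdopen(fd, "w") as f:
            json.dump(self.state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.state_path)

    def recover(self) -> None:
        if os.path.exists(self.state_path):
            with open(self.state_path) as f:
                self.state = json.load(f)
        if os.path.exists(self.log_path):
            with open(self.log_path) as log:
                ops = [json.loads(line) for line in log if line.strip()]
            # Replay only the operations newer than the flushed state.
            for op in ops[self.state["applied"]:]:
                self.apply(op)


with tempfile.TemporaryDirectory() as d:
    app = RecoverableApp(d)
    app.execute({"add": 5})
    app.flush()
    app.execute({"add": 2})       # logged but never flushed
    restarted = RecoverableApp(d)  # simulates a restart after a crash
    print(restarted.state)
```

Because the flushed state records how many logged operations it reflects, replay after a crash neither skips nor repeats an operation in this toy setting.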