Journal Article · DOI: 10.1109/32.31353
Recovery point selection on a reverse binary tree task model
TL;DR: The complexity of placing recovery points is analyzed for computations modeled as a reverse binary tree task model, and algorithms are devised for solving the recovery point placement problem.
Abstract: The complexity of placing recovery points is analyzed for computations modeled as a reverse binary tree task model. The objective is to minimize the expected computation time of a program in the presence of faults. The method can be extended to an arbitrary reverse tree model. For uniprocessor systems, an optimal placement algorithm is proposed. For multiprocessor systems, a procedure for computing the performance of such systems is described. Since no closed-form solution is available, an alternative measure that does have a closed-form formula is proposed. On the basis of this formula, algorithms are devised for solving the recovery point placement problem. The estimated formula can be extended to include communication delays, and the devised algorithms still apply.
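The abstract does not spell out the uniprocessor placement algorithm, so the following Python sketch only illustrates the general shape of the problem under assumptions that are ours, not the paper's: faults arrive as a Poisson process with rate λ (`lam`), each recovery point costs a fixed `ckpt`, each rollback costs a fixed `rollback`, no fault strikes during rollback, and the tree is executed in a fixed post-order sequence (children before parent, which respects the reverse-tree precedence). Under these assumptions a segment of w time units between recovery points takes (e^{λw} - 1)(1/λ + rollback) in expectation, and placement reduces to a standard O(n²) dynamic program over cut points. All names (`Task`, `post_order`, `segment_time`, `place_recovery_points`) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Task:
    """Node of a reverse binary tree: both children must finish before it runs (hypothetical model)."""
    name: str
    work: float                       # fault-free execution time of this task
    left: Optional["Task"] = None
    right: Optional["Task"] = None


def post_order(root: Task) -> List[Task]:
    """Linearize the tree so every task runs after all of its predecessors."""
    order: List[Task] = []

    def visit(node: Optional[Task]) -> None:
        if node is None:
            return
        visit(node.left)
        visit(node.right)
        order.append(node)

    visit(root)
    return order


def segment_time(work: float, lam: float, rollback: float) -> float:
    """Expected time to complete `work` units between two recovery points.

    Assumes Poisson faults (rate `lam` > 0), a fixed `rollback` cost per fault,
    and no faults during rollback: E[T] = (exp(lam * work) - 1) * (1 / lam + rollback).
    """
    return math.expm1(lam * work) * (1.0 / lam + rollback)


def place_recovery_points(tasks: List[Task], lam: float, ckpt: float,
                          rollback: float) -> Tuple[float, List[str]]:
    """Choose cut points over a FIXED execution order to minimize expected time.

    best[j] is the minimal expected time for the first j tasks with a recovery
    point right after task j; the last task is assumed to need no recovery point.
    """
    n = len(tasks)
    prefix = [0.0]
    for t in tasks:
        prefix.append(prefix[-1] + t.work)

    best = [0.0] + [math.inf] * n
    cut = [0] * (n + 1)               # cut[j]: position of the previous recovery point
    for j in range(1, n + 1):
        ckpt_cost = 0.0 if j == n else ckpt
        for i in range(j):            # tasks i+1..j form one segment
            w = prefix[j] - prefix[i] + ckpt_cost
            cand = best[i] + segment_time(w, lam, rollback)
            if cand < best[j]:
                best[j], cut[j] = cand, i

    # Walk the cut chain backwards to list the tasks followed by a recovery point.
    points: List[str] = []
    j = cut[n]
    while j > 0:
        points.append(tasks[j - 1].name)
        j = cut[j]
    return best[n], list(reversed(points))


if __name__ == "__main__":
    a, b, c, d = Task("A", 2.0), Task("B", 3.0), Task("C", 1.0), Task("D", 4.0)
    root = Task("G", 2.0, Task("E", 2.0, a, b), Task("F", 1.0, c, d))
    order = post_order(root)          # A, B, E, C, D, F, G
    expected, points = place_recovery_points(order, lam=0.05, ckpt=0.5, rollback=0.5)
    print(f"expected time {expected:.3f}, recovery points after {points}")
```

Fixing the execution order is a simplification: choosing among the valid orderings of the subtrees, and the multiprocessor case discussed in the abstract, are not explored here.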
Citations
A survey of checkpointing algorithms for parallel and distributed computers
S. Kalaiselvi, V. Rajaraman +1 more
TL;DR: This paper surveys the algorithms reported in the literature for checkpointing parallel/distributed systems and concludes that, in the development of parallel programs, the user has to do a fair amount of work distributing tasks, and this information can be used effectively to simplify checkpointing and rollback recovery.
System Structure for Software Fault Tolerance
Brian Randell
- 01 Jan 1975
TL;DR: The aim is to facilitate the provision of dependable error detection and recovery facilities which can cope with errors caused by residual design inadequacies, particularly in the system software, rather than merely the occasional malfunctioning of hardware components.
Efficient algorithms for selection of recovery points in tree task models
TL;DR: An algorithm that minimizes the expected computation time of the task system in a uniprocessor environment has been developed for the binary tree model.
Dissertation
Checkpointing Algorithms for Parallel Computers
S Kalaiselvi
- 01 Feb 1997
TL;DR: Dedicated to my beloved Parents and my dear Uncle.
References
System structure for software fault tolerance
Brian Randell
- 01 Jan 1975
TL;DR: In this article, the authors present a method for structuring complex computing systems by the use of what they term "recovery blocks", "conversations", and "fault-tolerant interfaces".
A first order approximation to the optimum checkpoint interval
TL;DR: It is standard practice to periodically save enough information to allow a job to be restarted from the most recent point at which information was saved; saving such information at these points is called checkpointing.
Reliable Computer Systems
Santosh K. Shrivastava
- 01 Oct 1985
TL;DR: The terms fault, error and failure are carefully defined and distinguished in the hope that an agreed terminology will emerge in the fault tolerance community.