Proceedings Article, DOI: 10.1145/503956.504005
The Bugnet distributed debugging system
Larry D. Wittie
- 08 Sep 1986
- pp 1-3
8
TL;DR: The sequence of events that leads to an error in a distributed program can be quite complex to model and difficult to capture and recreate using a serial debugger; a true distributed debugger should handle this easily.
Abstract: Distributed debugging via serial debuggers suffers from two problems: (1) serial debuggers are made for one process running on a single processor, whereas distributed programs contain many processes running on different processors in a network; (2) the sequence of events that leads to an error in a distributed program can be quite complex to model and difficult to capture and recreate using a serial debugger. A true distributed debugger should handle both problems easily.
Citations
•Posted Content
Efficient System-Enforced Deterministic Parallelism
TL;DR: This work introduces a new parallel programming model addressing issues of misbehaved software, and uses Determinator, a proof-of-concept OS, to demonstrate the model's practicality.
175
Efficient system-enforced deterministic parallelism
Amittai Aviram, Shu-Chun Weng, Sen Hu, Bryan Ford
- 04 Oct 2010
TL;DR: Introduces a new parallel programming model that keeps misbehaved software from defeating repeatability, avoids the introduction of read/write data races, and converts write/write races into reliably detected conflicts.
Macrodebugging: global views of distributed program execution
Tamim Sookoor, Timothy W. Hnat, Pieter Hooimeijer, Westley Weimer, Kamin Whitehouse
- 04 Nov 2009
TL;DR: MDB is presented, the first system to support the debugging of macroprograms, and it is shown that macrodebugging is both easy and efficient: MDB consumes few system resources and requires few user commands to find the cause of bugs.
56
Coordinated Checkpoint/Restart Process Fault Tolerance for MPI Applications on HPC Systems
Andrew Lumsdaine, Joshua Hursey
- 01 Jan 2010
TL;DR: This thesis identifies a complete set of capabilities that compose to form a coordinated C/R infrastructure for MPI applications running on HPC systems that provide applications with transparent, yet optionally application configurable, fault tolerance.
39
Scalable Replay with Partial-Order Dependencies for Message-Logging Fault Tolerance
Jonathan Lifflander, Esteban Meneses, Harshitha Menon, Phil Miller, Sriram Krishnamoorthy, Laxmikant V. Kale
- 01 Dec 2014
TL;DR: A novel algebraic framework for reasoning about the minimum dependencies required to represent the partial order for different orderings and interleavings is presented and an existing scalable message-logging fault tolerance scheme that uses a total order is improved on.
References
Virtual time
TL;DR: Virtual time is a new paradigm for organizing and synchronizing distributed systems which can be applied to such problems as distributed discrete event simulation and distributed database concurrency control.
2.3K
A distributed programs monitor for Berkeley UNIX
TL;DR: In this article, the authors use a model of distributed computation and measurement to implement a program monitoring system for programs running on the Berkeley UNIX 4.2BSD operating system, which describes the activities of the processes within a distributed program in terms of computation and communication.
77
TEMPO: Time Services for the Berkeley Local Network
Riccardo Gusella, Stefano Zatti
- 01 Jan 1983
TL;DR: Several experiments show that a total quasi ordering can be based on the unique network timing maintained by the TEMPO service, and a protocol based on it can adjust the clocks.
12
•Proceedings Article
BUGNET: A Debugging system for parallel programming environments.
Ronald Curtis, Larry D. Wittie
- 01 Jan 1982
•Book
CLU: Reference Manual
Barbara Liskov, E Moss, A Snyder, R Atkinson, J C. Schaffert, Toby Bloom, Robert W. Scheifler
- 01 Jun 1981
TL;DR: This document serves both as an introduction to CLU and as a language reference manual that describes each aspect of CLU in detail, and discusses the proper use of various features.