Visualizing Distributed System Executions

doi:10.1145/3375633

•Journal Article•10.1145/3375633

Ivan Beschastnikh, +5 more

- 04 Mar 2020

- ACM Transactions on Software Engineering...

- Vol. 29, Iss: 2, pp 1-38

51

TL;DR: This article presents a novel approach for tackling three tasks frequently performed during analysis of distributed system executions: understanding the relative ordering of events, searching for specific patterns of interaction between hosts, and identifying structural similarities and differences between pairs of executions.

Citations

•Journal Article•10.1109/tdsc.2020.3025289

Fault Injection Analytics: A Novel Approach to Discover Failure Modes in Cloud-Computing Systems

01 May 2022

- IEEE Transactions on Dependable and Secu...

TL;DR: The authors apply unsupervised machine learning to execution traces of the injected system to ease the discovery and interpretation of failure modes, and evaluate the approach in fault injection experiments on the OpenStack cloud computing platform.

26

Theia: Visual Signatures for Problem Diagnosis in Large Hadoop Clusters.

Elmer Garduno, +4 more

- 01 Jan 2013

TL;DR: Theia analyzes application-level logs in a Hadoop cluster and generates visual signatures of each job's performance, providing compact representations of task durations, task status, and data consumption by jobs.

23

•Journal Article•10.1109/TDSC.2020.3025289

Fault Injection Analytics: A Novel Approach to Discover Failure Modes in Cloud-Computing Systems

Domenico Cotroneo, +3 more

- 30 Sep 2020

- arXiv: Software Engineering

TL;DR: A new paradigm (fault injection analytics) that applies unsupervised machine learning on execution traces of the injected system, to ease the discovery and interpretation of failure modes with a low computational cost is introduced.

16

•Proceedings Article•10.1109/ICSE-SEIP52600.2021.00015

An Interview Study of how Developers use Execution Logs in Embedded Software Engineering

Nan Yang, +4 more

- 05 Jan 2021

TL;DR: The authors explore the types of logs developers analyze, the purposes for which they analyze them, the information they need from logs, and their expectations of tool support. The main finding is that lack of domain knowledge, lack of familiarity with the code base and software design, and the presence of concurrency raise major challenges in log analysis for complex and multidisciplinary systems.

15

•Proceedings Article•10.1145/3575693.3575695

Compiling Distributed System Models with PGo

Finn Hackett, +4 more

- 27 Jan 2023

TL;DR: Modular PlusCal (MPCal) is a language that extends PlusCal by cleanly separating the model of a system from a model of its environment. The paper presents a compiler toolchain called PGo that automatically translates MPCal models to TLA+ for model checking and also compiles MPCal models to runnable Go code.

12

References

•Journal Article•10.1145/359545.359563

Time, clocks, and the ordering of events in a distributed system

Leslie Lamport

- 01 Jul 1978

- Communications of The ACM

TL;DR: In this article, the concept of one event happening before another in a distributed system is examined, and a distributed algorithm is given for synchronizing a system of logical clocks which can be used to totally order the events.

9.4K

•Book Chapter•10.1145/3335772.3335934

Time, clocks, and the ordering of events in a distributed system

Leslie Lamport

- 04 Oct 2019

- Concurrency and Computation: Practice an...

TL;DR: In this paper, the concept of one event happening before another in a distributed system is examined, and a distributed algorithm is given for synchronizing a system of logical clocks which can be used to totally order the events.

8.4K

•Book Chapter•10.1007/3-540-45518-3_18

Pastry: Scalable, Decentralized Object Location, and Routing for Large-Scale Peer-to-Peer Systems

Antony Rowstron, +1 more

- 12 Nov 2001

- Lecture Notes in Computer Science

TL;DR: Pastry is a scalable, distributed object location and routing substrate for wide-area peer-to-peer applications. It performs application-level routing and object location in a potentially very large overlay network of nodes connected via the Internet.

8K

•Journal Article•10.1016/0167-6423(87)90035-9

Statecharts: A visual formalism for complex systems

David Harel

- 01 Jun 1987

- Science of Computer Programming

TL;DR: It is intended to demonstrate here that statecharts counter many of the objections raised against conventional state diagrams, and thus appear to render specification by diagrams an attractive and plausible approach.

7.5K

•Book

Concurrency Control and Recovery in Database Systems

Philip A. Bernstein, +2 more

- 01 Feb 1987

TL;DR: This book describes the design and implementation of concurrency control and recovery mechanisms for transaction management in centralized and distributed database systems. But this can lead to interference between queries and updates.

4.2K