Stack Trace Analysis for Large Scale Debugging

doi:10.1109/IPDPS.2007.370254

Open Access · Proceedings Article

Dorian Arnold, +5 more

- 26 Mar 2007

- pp 1-10

172

TL;DR: This paper presents the Stack Trace Analysis Tool (STAT), which aids in debugging extreme-scale applications. STAT leverages MRNet, an infrastructure for tool control and data analyses, to overcome the scalability barriers faced by heavyweight debuggers.
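
One way to picture how stack traces from many processes can be analyzed together, rather than one process at a time, is to merge them into a call-prefix tree whose nodes record which ranks pass through each frame. This mirrors the call-graph prefix tree idea associated with STAT, but the code below is only a minimal sketch, not STAT's implementation. It assumes each trace is a list of function names with the outermost frame first. Because `merge_trees` is associative, it could in principle run at each internal node of a tree-based overlay network such as MRNet; that reduction framing, and every name here (`Node`, `build_tree`, `merge_trees`, `equivalence_classes`, and the sample frame names), are hypothetical illustrations.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical prefix-tree node: one stack frame plus every rank whose trace passes through it."""
    ranks: set[int] = field(default_factory=set)
    children: dict[str, Node] = field(default_factory=dict)


def build_tree(traces: dict[int, list[str]]) -> Node:
    """Merge per-rank stack traces (outermost frame first) into one prefix tree."""
    root = Node()
    for rank, frames in traces.items():
        node = root
        node.ranks.add(rank)
        for frame in frames:
            node = node.children.setdefault(frame, Node())
            node.ranks.add(rank)
    return root


def merge_trees(a: Node, b: Node) -> Node:
    """Combine two prefix trees without mutating either input.

    The operation is associative, so partial trees could be merged level by
    level in a reduction tree (an assumption about deployment, not STAT's code).
    """
    merged = Node(ranks=a.ranks | b.ranks)
    # Visit a's children first, then b's extra ones, so child order is deterministic.
    names = list(a.children) + [n for n in b.children if n not in a.children]
    for name in names:
        if name in a.children and name in b.children:
            merged.children[name] = merge_trees(a.children[name], b.children[name])
        else:
            src = a.children[name] if name in a.children else b.children[name]
            merged.children[name] = copy.deepcopy(src)
    return merged


def equivalence_classes(root: Node) -> dict[tuple[str, ...], set[int]]:
    """Group ranks by the full call path at which their stack trace ends."""
    classes: dict[tuple[str, ...], set[int]] = {}

    def walk(node: Node, path: tuple[str, ...]) -> None:
        # Ranks present here but in no child stopped at this frame.
        below = set().union(*(c.ranks for c in node.children.values()))
        ended_here = node.ranks - below
        if ended_here:
            classes[path] = ended_here
        for name, child in node.children.items():
            walk(child, path + (name,))

    walk(root, ())
    return classes


if __name__ == "__main__":
    # Hypothetical snapshot: two ranks wait in a barrier, two are still computing.
    traces = {
        0: ["main", "MPI_Barrier"],
        1: ["main", "compute", "solve"],
        2: ["main", "MPI_Barrier"],
        3: ["main", "compute", "solve"],
    }
    # Simulate two leaves of a reduction tree, each holding half the ranks.
    left = build_tree({r: traces[r] for r in (0, 1)})
    right = build_tree({r: traces[r] for r in (2, 3)})
    print(equivalence_classes(merge_trees(left, right)))
```

In this example, ranks 0 and 2 fall into the `main → MPI_Barrier` class and ranks 1 and 3 into `main → compute → solve`, so a user inspects two representative groups instead of four processes. Merging the two halves yields the same classes as building the tree from all traces at once, which is the property that makes hierarchical merging possible.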

Citations

Proceedings Article · doi:10.1109/CLUSTR.2009.5289150

24/7 Characterization of petascale I/O workloads

Philip Carns, +5 more

- 16 Oct 2009

TL;DR: Darshan is shown to characterize the I/O behavior of four scientific applications while inducing negligible overhead for I/O-intensive jobs with as many as 65,536 processes.

240

Proceedings Article · doi:10.1145/2931037.2931047

Binary code is not easy

Xiaozhu Meng, +1 more

- 18 Jul 2016

TL;DR: New code-parsing algorithms in the open-source Dyninst toolkit are presented, including a new model for describing jump tables that more precisely determines control-flow targets, a new interprocedural analysis that determines when a function is non-returning, and techniques for handling tail calls.

147

Journal Article · doi:10.1016/J.JPDC.2008.09.001

ScalaTrace: Scalable compression and replay of communication traces for high-performance computing

Michael Noeth, +4 more

- 01 Aug 2009

TL;DR: The authors contribute an approach that produces communication traces that are orders of magnitude smaller, if not near-constant in size, regardless of the number of nodes, while preserving structural information.

144

ScalaTrace: Scalable Compression and Replay of Communication Traces for High Performance Computing

Michael Noeth, +4 more

- 16 May 2008

TL;DR: In this article, the authors introduce intra- and inter-node compression techniques for MPI events that can extract an application's communication structure. They also present a replay mechanism for the traces generated by their approach and discuss results of their implementation on BlueGene/L.

116

Proceedings Article · doi:10.1145/2155620.2155655

Hardware transactional memory for GPU architectures

Wilson W. L. Fung, +3 more

- 03 Dec 2011

TL;DR: KILO TM, a novel hardware transactional memory (TM) design for GPUs, is proposed. It scales to thousands of concurrent transactions and uses word-level, value-based conflict detection to avoid broadcast communication and reduce on-chip storage overhead.

114

...

References

Proceedings Article · doi:10.1109/DSN.2002.1029005

Pinpoint: problem determination in large, dynamic Internet services

Mike Y. Chen, +4 more

- 23 Jun 2002

TL;DR: This work presents a dynamic analysis methodology that automates problem determination in large, dynamic Internet services. It applies coarse-grained tags to numerous real client requests as they travel through the system, then uses data mining techniques to correlate the believed failures and successes of these requests and determine which components are most likely at fault.

969

Journal Article · doi:10.1177/109434200001400404

An API for Runtime Code Patching

Bryan R. Buck, +1 more

- 01 Nov 2000

TL;DR: The authors present a postcompiler program manipulation tool called Dyninst, which provides a C++ class library for program instrumentation that permits machine-independent binary instrumentation programs to be written.

696

Journal Article · doi:10.1175/1520-0477(2001)082<2357:TCCSM>2.3.CO;2

The Community Climate System Model

Maurice L. Blackmon, +25 more

- 01 Nov 2001

- Bulletin of the American Meteorological ...

TL;DR: The history of the CCSM, its current capabilities, and plans for its future development and applications are outlined, with the goal of providing a summary useful to present and future users.

486

Proceedings Article · doi:10.1145/1217935.1217972

Automated known problem diagnosis with event traces

Chun Yuan, +6 more

- 18 Apr 2006

TL;DR: This work proposes using system behavior information, such as system event traces, to build correlations with solved problems, rather than relying only on vague text descriptions as existing practices do. This enables automatic identification of a problem's root cause when the problem is a known one, which can further lead to its resolution.

200

Proceedings Article · doi:10.1145/1145319.1145342

Automated, scalable debugging of MPI programs with Intel® Message Checker

Jayant DeSouza, +5 more

- 15 May 2005

TL;DR: The paper describes how automated tools can detect MPI usage errors, and how Intel Message Checker (IMC) automatically detects several kinds of them, including various types of mismatches, race conditions, deadlocks and potential deadlocks, and resource misuse.

110