John Hammond
University of Texas at Austin
8 Papers
55 Citations
John Hammond is an academic researcher at the University of Texas at Austin. The author has contributed to research on the topics Event (computing) and Data mapping. The author has an h-index of 8 and has co-authored 8 publications.
Papers
Diagnosing the root-causes of failures from cluster log files
Edward Chuah, Shyh-hao Kuo, Paul Hiew, William-Chandra Tjhi, Gary Lee, John Hammond, Marek T. Michalewicz, Terence Hung, James C. Browne +8 more
- 01 Dec 2010
TL;DR: A diagnostics tool, FDiag, is developed that extracts log entries as structured message templates and uses statistical correlation analysis to establish probable cause-and-effect relationships for the fault being analyzed.
66
Linking Resource Usage Anomalies with System Failures from Cluster Log Data
Edward Chuah, Arshad Jhumka, Sai Narasimhamurthy, John Hammond, James C. Browne, Bill Barth +5 more
- 30 Sep 2013
TL;DR: In this paper, the authors present the ANCOR diagnostics system, which applies TACC_Stats data to identify resource use anomalies and applies log analysis to link those anomalies with system failures.
60
End-to-end framework for fault management for open source clusters: Ranger
John Hammond, Tommy Minyard, Jim Browne +2 more
- 02 Aug 2010
TL;DR: A framework for end-to-end fault management for open source clusters is presented, together with a rationalized system logging stack for Linux, low-overhead log and status monitoring, and a multilevel suite of diagnostic analyses. The framework is being developed on Ranger but targets general open source software based clusters.
25
Enabling comprehensive data-driven system management for large computational facilities
James C. Browne, Robert L. DeLeon, Charng-Da Lu, Matthew D. Jones, Steven M. Gallo, Amin Ghadersohi, Abani Patra, William L. Barth, John Hammond, Thomas R. Furlani, Robert McLay +10 more
- 17 Nov 2013
TL;DR: A tool chain is presented for systematic and comprehensive job-level resource use measurement on large cluster computers, based on the open source tool TACC_Stats, along with its incorporation into XDMoD, a reporting and analytics framework for resource management that aims to meet the information needs of users, application developers, systems administrators, systems management, and funding managers.
20
Comprehensive job level resource usage measurement and analysis for XSEDE HPC systems
Charng-Da Lu, James C. Browne, Robert L. DeLeon, John Hammond, William L. Barth, Thomas R. Furlani, Steven M. Gallo, Matthew D. Jones, Abani Patra +8 more
- 22 Jul 2013
TL;DR: A methodology for comprehensive job-level resource use measurement and analysis is presented, along with applications of the analyses to planning for HPC systems and a case study applying the methodology to the XSEDE Ranger and Lonestar4 systems at the University of Texas.
17