Elisabeth Baseman
Los Alamos National Laboratory
14 Papers
47 Citations
Elisabeth Baseman is an academic researcher from Los Alamos National Laboratory. The author has contributed to research in the topics Troubleshooting and Computer science. The author has an h-index of 7 and has co-authored 14 publications.
Papers
Lessons learned from memory errors observed over the lifetime of Cielo
Scott Levy,Kurt B. Ferreira,Nathan DeBardeleben,Taniya Siddiqua,Vilas Sridharan,Elisabeth Baseman +5 more
- 11 Nov 2018
TL;DR: A corpus of empirical failure data collected over the entire five-year lifetime of Cielo, a leadership-class HPC system, provides critical analysis of, and guidance for, the deployment of extreme-scale systems.
35
Relational Synthesis of Text and Numeric Data for Anomaly Detection on Computing System Logs
Elisabeth Baseman,Sean Blanchard,Zongze Li,Song Fu +3 more
- 01 Dec 2016
TL;DR: This work presents an anomaly detection framework that combines graph analysis, relational learning, and kernel density estimation to detect unusual syslog messages, and shows that it retrieves anomalous behaviors inserted into syslog files from a virtual machine.
25
Design, Use and Evaluation of P-FSEFI: A Parallel Soft Error Fault Injection Framework for Emulating Soft Errors in Parallel Applications
Qiang Guan,Nathan DeBardeleben,Panruo Wu,Stephan Eidenbenz,Sean Blanchard,Laura Monroe,Elisabeth Baseman,Li Tan +7 more
- 22 Aug 2016
TL;DR: With a sufficiently sophisticated software fault injection framework, an application can be studied to see how it would handle many of the errors that manifest at the application level, and a developer can progressively improve resilience at the targeted locations they believe are important for their target hardware.
23
Improving DRAM Fault Characterization through Machine Learning
Elisabeth Baseman,Nathan DeBardeleben,Kurt B. Ferreira,Scott Levy,Steven Raasch,Vilas Sridharan,Taniya Siddiqua,Qiang Guan +7 more
- 01 Jun 2016
TL;DR: This work explores the predictive performance of an online machine learning-based approach to classifying DRAM fault modes from two leadership-class supercomputing facilities, and provides a critical analysis of this online learning technique that can help system designers establish best practices for dealing with reliability on future systems.
17
Markov Chain Modeling for Anomaly Detection in High Performance Computing System Logs
Abida Haque,Alexandra DeLucia,Elisabeth Baseman +2 more
- 12 Nov 2017
TL;DR: This work learns a Markov chain model from average-case system logs, uses it to generate synthetic system log data, and explores the learned model's ability to identify anomalous behavior by evaluating how well it catches inserted and missing log messages.
15