Elisabeth Baseman

Los Alamos National Laboratory

14 Papers

47 Citations

Elisabeth Baseman is an academic researcher at Los Alamos National Laboratory. The author has contributed to research on topics including troubleshooting and computer science. The author has an h-index of 7 and has co-authored 14 publications.

Papers

Proceedings Article•10.1109/SC.2018.00046

Lessons learned from memory errors observed over the lifetime of Cielo

Scott Levy, +5 more

- 11 Nov 2018

TL;DR: A corpus of empirical failure data collected over the entire five-year lifetime of Cielo, a leadership-class HPC system, provides critical analysis of, and guidance for, the deployment of extreme-scale systems.

Proceedings Article•10.1109/ICMLA.2016.0158

Relational Synthesis of Text and Numeric Data for Anomaly Detection on Computing System Logs

Elisabeth Baseman, +3 more

- 01 Dec 2016

TL;DR: This work presents an anomaly detection framework that combines graph analysis, relational learning, and kernel density estimation to detect unusual syslog messages, and shows that it retrieves anomalous behaviors inserted into syslog files from a virtual machine.

Proceedings Article•10.5555/3021426.3021429

Design, Use and Evaluation of P-FSEFI: A Parallel Soft Error Fault Injection Framework for Emulating Soft Errors in Parallel Applications

Qiang Guan, +7 more

- 22 Aug 2016

TL;DR: With a sufficiently sophisticated software fault injection framework, an application can be studied to see how it would handle many of the errors that manifest at the application level, and a developer can progressively improve resilience at the locations they believe are important for their target hardware.

Proceedings Article•10.1109/DSN-W.2016.13

Improving DRAM Fault Characterization through Machine Learning

Elisabeth Baseman, +7 more

- 01 Jun 2016

TL;DR: This work explores the predictive performance of an online machine learning-based approach to classifying DRAM fault modes from two leadership-class supercomputing facilities, and provides a critical analysis of this online learning technique that can help system designers establish best practices for handling reliability on future systems.

Proceedings Article•10.1145/3152493.3152559

Markov Chain Modeling for Anomaly Detection in High Performance Computing System Logs

Abida Haque, +2 more

- 12 Nov 2017

TL;DR: This work learns a Markov chain model from average-case system logs, uses it to generate synthetic system log data, and explores the learned model's ability to identify anomalous behavior by evaluating how well it catches inserted and missing log messages.
