Open Access Dissertation
SoftECC: A System for Software Memory Integrity Checking
Dave Dopson
- 01 Jan 2005
TL;DR: Preliminary measurements with an implementation of SoftECC in the JOS kernel on the x86 architecture show that SoftECC can halve the number of undetectable soft errors using minimal compute time.
Abstract: SoftECC is a software memory integrity checking agent. SoftECC repeatedly computes page-level checksums as an efficient means of verifying that a page’s contents have not changed. A memory error that occurs between two checksum computations causes the two checksum values to disagree. Legitimate memory writes also change the checksum value, so a page can be protected only while it is not being written. Preliminary measurements with an implementation of SoftECC in the JOS kernel on the x86 architecture show that SoftECC can halve the number of undetectable soft errors using minimal compute time.
Thesis Supervisor: Frans Kaashoek, Professor
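The checking cycle the abstract describes can be illustrated with a minimal sketch. This is not the SoftECC implementation: the choice of CRC-32 as the checksum, the 4096-byte page size, and the names `PageGuard`, `protect`, `unprotect`, and `verify` are all assumptions made for illustration. A page is checksummed when it enters a write-free period, the checksum is discarded before a legitimate write, and any later recomputation that disagrees with the stored value indicates that the contents changed unexpectedly.

```python
import zlib

PAGE_SIZE = 4096  # assumed page size in bytes (typical for x86)


def page_checksum(page) -> int:
    """Checksum of one page; CRC-32 is an assumed stand-in, not the thesis's choice."""
    return zlib.crc32(page)


class PageGuard:
    """Hypothetical tracker of checksums for pages that are not being written."""

    def __init__(self):
        self.checksums = {}  # page number -> checksum taken when protection began

    def protect(self, page_no, page):
        # Start a protected period: record the checksum of the current contents.
        self.checksums[page_no] = page_checksum(page)

    def unprotect(self, page_no):
        # A legitimate write is about to happen, so the stored checksum is stale.
        self.checksums.pop(page_no, None)

    def verify(self, page_no, page) -> bool:
        # Unprotected pages cannot be checked, so no error is reported for them.
        expected = self.checksums.get(page_no)
        return expected is None or page_checksum(page) == expected


# Usage: protect a page, simulate a soft error as a single bit flip, then verify.
memory = bytearray(PAGE_SIZE)
guard = PageGuard()
guard.protect(0, memory)
ok_before = guard.verify(0, memory)   # contents unchanged since protect()
memory[100] ^= 0x04                   # flip one bit without calling unprotect()
ok_after = guard.verify(0, memory)    # checksums now disagree
```

The sketch also shows the limitation the abstract points out: once `unprotect` is called for a write, the page has no stored checksum, so an error occurring during that window goes unnoticed.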