About: Hard disk drive failure is a research topic. Over its lifetime, 17 publications have been published within this topic, receiving 196 citations. The topic is also known as: hard drive failure and HDD failure.
TL;DR: Evaluate and compare the performance of 21 machine learning algorithms by using them for proactive hard disk drive failure detection and show that different algorithms are suitable for different applications based on the desired prediction quality and the tolerated training and prediction time.
Abstract: Failures or unexpected events are inevitable in critical and complex systems. Proactive failure detection is an approach that aims to detect such events in advance so that preventative or recovery measures can be planned, thus improving system availability. Machine learning techniques have been successfully applied to learn patterns from available datasets and to classify or predict to which class a new instance of data belongs. In this paper, we evaluate and compare the performance of 21 machine learning algorithms by using them for proactive hard disk drive failure detection. For this comparison, we use WEKA as an experimentation platform and benchmark publicly available datasets of hard disk drives that are used to predict imminent failures before the actual failures occur. The results show that different algorithms are suitable for different applications based on the desired prediction quality and the tolerated training and prediction time.
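The comparison described above weighs prediction quality against training and prediction time. A minimal Python sketch of such a harness follows. It is not the paper's WEKA setup and does not reproduce any of its 21 algorithms: the two toy classifiers, the synthetic two-feature data, and all names (`make_drives`, `FixedThreshold`, `NearestCentroid`, `evaluate`) are assumptions made purely for illustration.

```python
import random
import time


def make_drives(n, seed):
    """Synthetic stand-in data: ([feature_a, feature_b], label), label 1 = failing."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = 1 if rng.random() < 0.2 else 0
        center = 50.0 if label else 10.0
        rows.append(([rng.gauss(center, 8.0), rng.gauss(center, 8.0)], label))
    return rows


class FixedThreshold:
    """Flags a drive when the first feature exceeds a fixed cut-off (no learning)."""

    def __init__(self, cutoff=30.0):
        self.cutoff = cutoff

    def fit(self, rows):
        pass  # nothing to train

    def predict(self, features):
        return 1 if features[0] > self.cutoff else 0


class NearestCentroid:
    """Assigns the class whose mean feature vector is closest."""

    def fit(self, rows):
        self.centroids = {}
        for label in (0, 1):
            members = [f for f, y in rows if y == label]
            dims = len(members[0])
            self.centroids[label] = [
                sum(m[d] for m in members) / len(members) for d in range(dims)
            ]

    def predict(self, features):
        def dist2(centroid):
            return sum((a - b) ** 2 for a, b in zip(features, centroid))

        return min(self.centroids, key=lambda label: dist2(self.centroids[label]))


def evaluate(model, train, test):
    """Return detection rate, false alarm rate, and wall-clock train/predict times."""
    start = time.perf_counter()
    model.fit(train)
    train_seconds = time.perf_counter() - start

    start = time.perf_counter()
    predictions = [model.predict(f) for f, _ in test]
    predict_seconds = time.perf_counter() - start

    failing = [p for p, (_, y) in zip(predictions, test) if y == 1]
    healthy = [p for p, (_, y) in zip(predictions, test) if y == 0]
    return {
        "detection_rate": sum(failing) / len(failing) if failing else 0.0,
        "false_alarm_rate": sum(healthy) / len(healthy) if healthy else 0.0,
        "train_seconds": train_seconds,
        "predict_seconds": predict_seconds,
    }


train, test = make_drives(1000, seed=1), make_drives(300, seed=2)
for model in (FixedThreshold(), NearestCentroid()):
    print(type(model).__name__, evaluate(model, train, test))
```

Reporting the timing figures next to the quality figures lets a reader pick a model by whichever constraint matters most in a given deployment.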
TL;DR: The disclosure presents systems and methods that facilitate the prevention of hard disk drive failure by having the host operating system take corrective actions that alleviate or mitigate hostile conditions (e.g., excessive vibration, heat, humidity and the like) prior to disk failure.
Abstract: The subject disclosure pertains to systems and methods that facilitate prevention of hard disk drive failure. Sophisticated drives are likely to be capable of detecting many of the conditions that either cause or precede hard disk drive failure. Such information can be provided to a host operating system, analyzed and used to correct errors associated with the hard disk drive to reduce the probability of failure and/or damage of the hard disk drive. The host operating system can respond by taking corrective actions that alleviate or mitigate hostile conditions (e.g., excessive vibration, heat, humidity and the like) prior to disk failure.
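The idea of a host mapping drive-reported conditions to corrective actions can be sketched as follows. The abstract does not specify any limits, data format, or concrete actions, so the `DriveReport` fields, the numeric limits, and the action names below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DriveReport:
    """Hypothetical condition report a drive might send to the host."""
    temperature_c: float
    vibration_g: float
    humidity_pct: float


# Hypothetical limits and responses; real values would come from the drive vendor.
RULES = [
    ("temperature_c", 55.0, "increase_fan_speed"),
    ("vibration_g", 0.5, "throttle_io_and_park_heads"),
    ("humidity_pct", 80.0, "alert_operator"),
]


def corrective_actions(report):
    """Return the action for every condition whose reading exceeds its limit."""
    return [
        action
        for field, limit, action in RULES
        if getattr(report, field) > limit
    ]


print(corrective_actions(DriveReport(temperature_c=60.0, vibration_g=0.1, humidity_pct=30.0)))
```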
TL;DR: In this article, a technique for reporting a hard disk drive failure in a computer system includes detecting a failure of one of a plurality of hard disk drives and reporting the failure to a CIMOM (Common Information Model Object Manager), which in turn forwards a message by an LRA (Local Response Agent) to a PMP (Platform Management Provider).
Abstract: A technique for reporting a hard disk drive failure in a computer system includes detecting a failure of one of a plurality of hard disk drives and reporting the failure to a CIMOM (Common Information Model Object Manager), which in turn forwards a message by an LRA (Local Response Agent) to a PMP (Platform Management Provider), which in turn forwards a command to an SMC (Server Management Controller), which forwards the command to an HSC (Hot-Swap Controller) to activate a display, the display reporting the failure of a particular one of the hard disk drives to a user.
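The reporting path above is a linear chain of hand-offs ending at a display. A minimal sketch of that chain follows, assuming each component simply forwards the payload to the next one. The `Stage` and `Display` classes, the payload shape, and the treatment of the LRA as its own hop are illustrative assumptions, not the patented implementation.

```python
class Stage:
    """One hop in the reporting chain; logs its name and passes the payload on."""

    def __init__(self, name, next_stage=None):
        self.name = name
        self.next_stage = next_stage

    def handle(self, payload, trace):
        trace.append(self.name)
        if self.next_stage is not None:
            self.next_stage.handle(payload, trace)


class Display(Stage):
    """Final hop: records which drive should be shown as failed."""

    def __init__(self):
        super().__init__("display")
        self.failed_drives = set()

    def handle(self, payload, trace):
        trace.append(self.name)
        self.failed_drives.add(payload["drive"])


def build_chain():
    """Wire CIMOM -> LRA -> PMP -> SMC -> HSC -> display."""
    display = Display()
    stage = display
    for name in ("HSC", "SMC", "PMP", "LRA"):
        stage = Stage(name, stage)
    return Stage("CIMOM", stage), display


cimom, display = build_chain()
trace = []
cimom.handle({"drive": 3}, trace)
print(" -> ".join(trace), "| failed drives:", sorted(display.failed_drives))
```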
TL;DR: The authors present a system that detects the onset of hard disk drive failure by measuring vibrations from the hard disk drive and comparing the resulting vibration signature with a reference vibration signature.
Abstract: A system that detects the onset of hard disk drive failure. During operation, the system measures vibrations from the hard disk drive to produce one or more vibration signals. Next, the system generates a vibration signature for the hard disk drive from the measured vibration signals. The system then determines if the vibration signature indicates the onset of hard disk failure by comparing the vibration signature with a reference vibration signature for the hard disk drive. If so, the system generates a warning or takes a remedial action.
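The measure, build-signature, compare-to-reference steps above can be sketched in Python. The abstract does not say how a signature is built or compared, so this sketch assumes a signature made of low-frequency DFT magnitudes and a relative Euclidean distance test. The function names, the number of bins, and the tolerance are hypothetical.

```python
import cmath
import math


def signature(samples, bins=8):
    """Magnitudes of DFT coefficients 1..bins (DC skipped), normalised by length."""
    n = len(samples)
    return [
        abs(sum(s * cmath.exp(-2j * math.pi * k * t / n) for t, s in enumerate(samples))) / n
        for k in range(1, bins + 1)
    ]


def deviates(sig, reference, tolerance=0.25):
    """True if the distance between signatures exceeds tolerance * |reference|."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(sig, reference)))
    norm = math.sqrt(sum(b * b for b in reference))
    return dist > tolerance * norm


def tone(freq_bin, amplitude, n=64):
    """Synthetic vibration signal: a pure sine at the given DFT bin."""
    return [amplitude * math.sin(2 * math.pi * freq_bin * t / n) for t in range(n)]


# Reference taken from a hypothetical healthy drive vibrating at one frequency.
reference = signature(tone(2, 1.0))

# A second frequency component stands in for a developing mechanical fault.
worn = [a + b for a, b in zip(tone(2, 1.0), tone(5, 0.8))]

if deviates(signature(worn), reference):
    print("warning: vibration signature deviates from reference")
```

A real system would likely average several measurement windows before comparing, so that a single transient does not trigger a warning.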
TL;DR: The proposed method, named Bayesian network based Method for Failure prediction in HDDs (BNFH) uses a subset of the SMART attributes and a set of SMART trend related attributes to provide remaining life estimates of HDDs.
Abstract: The ability to predict failures in Hard Disk Drives (HDDs) is a major objective of HDD manufacturers, since avoiding unexpected failures may prevent data loss. As a consequence, failure prediction in HDDs has become a topic that has attracted much attention in recent years. Nowadays, most HDDs are equipped with a threshold-based monitoring system named Self-Monitoring, Analysis and Reporting Technology (SMART). The system collects several performance parameters and detects anomalies that may indicate incipient failures. Although the SMART system is very popular, it achieves failure detection rates of only 3% to 10%. Moreover, SMART works as an incipient failure detection method and does not provide an estimate of the remaining life of the HDD. In this paper, we propose a failure prediction method using SMART attributes and a Bayesian Network. The proposed method, named Bayesian network based Method for Failure prediction in HDDs (BNFH), uses a subset of the SMART attributes and a set of SMART trend related attributes to provide remaining life estimates of HDDs. To demonstrate practical usefulness, this method was applied to a dataset consisting of 49,056 hard drives from Backblaze's data centers.
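To make the contrast between a threshold check and a trend-related attribute concrete, here is a small Python sketch. It is not BNFH and contains no Bayesian network. The abstract does not define its trend attributes, so the least-squares slope used here is an assumed stand-in, and the sample readings are invented.

```python
def threshold_alarm(normalized_value, threshold):
    """SMART-style check: flag when the normalized value is at or below its threshold."""
    return normalized_value <= threshold


def trend_slope(values):
    """Least-squares slope of the readings against their index (change per sample)."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


# Invented daily raw readings of a hypothetical reallocated-sector counter.
daily_raw = [0, 2, 4, 6]

print("threshold alarm:", threshold_alarm(normalized_value=100, threshold=36))
print("trend (sectors/day):", trend_slope(daily_raw))
```

In this invented example the threshold check stays silent while the slope already shows steady growth, which is the kind of signal a trend attribute can feed into a predictive model.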