Reliability for Networked Storage Nodes

doi:10.1109/DSN.2006.61

Proceedings Article

KK Rao and two co-authors

- 25 Jun 2006

- pp 237-248

Cited by 56

TL;DR: This paper presents alternatives for distributing redundancy across networked storage nodes, develops models to determine the reliability of such systems, and performs sensitivity analyses in which selected parameters are varied to observe their effect on reliability.
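The TL;DR mentions reliability models and sensitivity analyses without giving their form. As a generic illustration only (this is not the paper's model), the classic Markov-derived approximation for the mean time to data loss (MTTDL) of a group of N devices that tolerates one failure is MTTDL ~ MTTF^2 / (N(N-1) MTTR). The minimal sketch below, assuming independent exponential failures and MTTR much smaller than MTTF, varies one parameter (repair time) to show what a sensitivity sweep looks like; the function names and all input values are hypothetical.

```python
"""Hypothetical sketch: MTTDL sensitivity for a single-fault-tolerant group.

Uses the standard approximation MTTDL ~= MTTF^2 / (N * (N - 1) * MTTR),
which assumes independent, exponentially distributed failures and MTTR << MTTF.
Parameter values below are illustrative assumptions, not figures from the paper.
"""

HOURS_PER_YEAR = 24 * 365


def mttdl_single_fault_tolerant(n_devices: int, mttf_hours: float, mttr_hours: float) -> float:
    """Approximate mean time to data loss (hours) for a group that survives one failure."""
    if n_devices < 2:
        raise ValueError("need at least two devices for redundancy")
    return mttf_hours ** 2 / (n_devices * (n_devices - 1) * mttr_hours)


def sensitivity(n_devices: int, mttf_hours: float, mttr_values: list[float]) -> dict[float, float]:
    """Vary one parameter (repair time) and record the resulting MTTDL in years."""
    return {
        mttr: mttdl_single_fault_tolerant(n_devices, mttf_hours, mttr) / HOURS_PER_YEAR
        for mttr in mttr_values
    }


if __name__ == "__main__":
    # Assumed inputs: 8 devices, 1,000,000-hour device MTTF, repair times of 12 to 96 hours.
    for mttr, years in sensitivity(8, 1_000_000, [12, 24, 48, 96]).items():
        print(f"MTTR {mttr:>3} h -> MTTDL ~ {years:,.0f} years")
```

In this approximation MTTDL is inversely proportional to repair time, so halving MTTR doubles MTTDL; that is why repair (rebuild) time is a natural parameter to vary in such a study.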

Citations

Proceedings Article

Mean time to meaningless: MTTDL, Markov models, and storage system reliability

Kevin M. Greenan, +2 more

- 22 Jun 2010

TL;DR: The storage community needs to replace MTTDL with a metric that can be used to accurately compare the reliability of systems in a way that reflects the impact of data loss in the real world.

Cited by 106

Proceedings Article, doi:10.1109/MSST.2010.5496983

Flat XOR-based erasure codes in storage systems: Constructions, efficient recovery, and tradeoffs

Kevin M. Greenan, +2 more

- 03 May 2010

TL;DR: This paper describes constructions of two novel flat XOR-codes, Stepped Combination and HD-Combination codes, and describes an algorithm for flat XOR-codes that enumerates recovery equations, i.e., sets of disks that can recover a failed disk.

Cited by 103

Journal Article, doi:10.1145/1629075.1629076

Higher reliability redundant disk arrays: Organization, operation, and coding

Alexander Thomasian, +1 more

- 30 Nov 2009

- ACM Transactions on Storage

TL;DR: Variations to RAID5 and RAID6 organizations are described, including clustered RAID, different methods to update parities, rebuild processing, disk scrubbing to eliminate sector errors, and the intra-disk redundancy (IDR) method to deal with sector errors.

Cited by 74

Guest paper: Failure trends in a large disk drive population

Eduardo Pinheiro, +2 more

- 23 Apr 2007

TL;DR: It is found that temperature and activity levels were much less correlated with drive failures than previously reported, and models based on SMART parameters alone are unlikely to be useful for predicting individual drive failures.

Cited by 46

...

References

Book

Probability and Statistics With Reliability, Queuing and Computer Science Applications

Kishor S. Trivedi

- 01 Jan 1982

TL;DR: Probability and Statistics with Reliability, Queuing and Computer Science Applications, Second Edition, is a comprehensive introduction to probability, stochastic processes, and statistics for students of computer science, electrical and computer engineering, and applied mathematics.

Cited by 2.6K

Proceedings Article

Disk failures in the real world: what does an MTTF of 1,000,000 hours mean to you?

Bianca Schroeder, +1 more

- 13 Feb 2007

TL;DR: The authors present and analyze field-gathered disk replacement data from a number of large production systems, including high-performance computing and internet services sites, and find that annual disk replacement rates in the field typically exceed 1%, with 2-4% common and up to 13% observed on some systems.

Cited by 920

Journal Article, doi:10.1109/12.364531

EVENODD: an efficient scheme for tolerating double disk failures in RAID architectures

Mario Blaum, +3 more

- 01 Feb 1995

- IEEE Transactions on Computers

TL;DR: A novel method for tolerating up to two disk failures in RAID architectures that uses only two redundant disks and simple exclusive-OR computations rather than Reed-Solomon arithmetic; it can be used in any system requiring large symbols and relatively short codes, for instance, in multitrack magnetic recording.

Cited by 814

Proceedings Article

Row-diagonal parity for double disk failure correction (awarded Best Paper)

Peter F. Corbett, +6 more

- 31 Mar 2004

TL;DR: Implementation results show that RDP performance can be made nearly equal to single parity RAID-4 and RAID-5 performance.

Cited by 551

...

Related Papers (5)

A case for redundant arrays of inexpensive disks (RAID)

EVENODD: an efficient scheme for tolerating double disk failures in RAID architectures

Disk failures in the real world: what does an MTTF of 1,000,000 hours mean to you?

Row-diagonal parity for double disk failure correction

WEAVER codes: highly fault tolerant erasure codes for storage systems