About: Data striping is a research topic. Over its lifetime, 1058 publications have been published on this topic, receiving 31497 citations. The topic is also known as: striping and striped.
TL;DR: In this article, the data is divided into segments and each segment is distributed randomly on one of several storage units, independent of the storage units on which other segments of the media data are stored.
Abstract: Multiple applications request data from multiple storage units over a computer network. The data is divided into segments and each segment is distributed randomly on one of several storage units, independent of the storage units on which other segments of the media data are stored. Redundancy information corresponding to each segment is also distributed randomly over the storage units. The redundancy information for a segment may be a copy of the segment, such that each segment is stored on at least two storage units. The redundancy information may also be based on two or more segments. This random distribution of segments of data and corresponding redundancy information improves both scalability and reliability. When a storage unit fails, its load is distributed evenly over the remaining storage units and its lost data may be recovered because of the redundancy information. When an application requests a selected segment of data, the request may be processed by the storage unit with the shortest queue of requests. Random fluctuations in the load applied by multiple applications on multiple storage units are balanced nearly equally over all of the storage units. Small data files may also be stored on storage units that combine small files into larger segments of data using a log-structured file system. This combination of techniques results in a system that can transfer both multiple, independent high-bandwidth streams of data and small data files in a scalable manner in both directions between multiple applications and multiple storage units.
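The placement and request-routing ideas in this abstract can be illustrated with a minimal Python sketch. It assumes the simplest redundancy option the abstract mentions (one extra copy per segment), and the names `place_segments` and `choose_unit` are hypothetical, not taken from the source. Queue lengths are modeled as plain counters that only grow, which a real system would not do.

```python
import random

def place_segments(num_segments, num_units, rng):
    """Pick two distinct units at random for each segment (primary and copy)."""
    placement = {}
    for seg in range(num_segments):
        primary, copy = rng.sample(range(num_units), 2)
        placement[seg] = (primary, copy)
    return placement

def choose_unit(segment, placement, queue_lengths, failed=frozenset()):
    """Serve a read from the live replica with the shortest request queue."""
    candidates = [u for u in placement[segment] if u not in failed]
    if not candidates:
        raise RuntimeError("both copies of the segment are unavailable")
    return min(candidates, key=lambda u: queue_lengths[u])

rng = random.Random(0)  # fixed seed so the sketch is repeatable
placement = place_segments(num_segments=1000, num_units=8, rng=rng)

queues = [0] * 8
for seg in range(1000):
    unit = choose_unit(seg, placement, queues)
    queues[unit] += 1  # assumption: requests are never drained in this toy model
print(queues)
```

Passing `failed={unit}` to `choose_unit` shows the failure case: every segment that lived on the failed unit is served from its copy, and because copies were placed at random, that extra load is spread over the remaining units rather than landing on a single mirror.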
TL;DR: In this article, a data layout mechanism for allocating metadata within a storage system employing data striping is described, where metadata associated with at least two data stripe units of the same stripe is stored within a single metadata chunk, and metadata chunks are evenly distributed across the storage devices.
Abstract: A data layout mechanism is described for allocating metadata within a storage system employing data striping. The data layout mechanism includes a number of storage devices, each of the storage devices having storage spaces allocated to store individual data stripe units associated with a number of stripes. The data layout mechanism further includes a plurality of metadata chunks allocated within the storage devices such that (1) metadata associated with at least two data stripe units of the same stripe is stored within a single metadata chunk, and (2) the metadata chunks are evenly distributed across the storage devices.
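One way to satisfy both stated conditions is sketched below in Python. The abstract does not give a placement rule, so the rotation used here (`stripe % num_devices`) is purely an assumption, as are the names `metadata_device` and `layout`. Each stripe gets one metadata chunk covering all of its stripe units, and the chunks rotate across devices so that each device holds the same number.

```python
from collections import Counter

def metadata_device(stripe, num_devices):
    """Hypothetical rule: rotate each stripe's metadata chunk across devices."""
    return stripe % num_devices

def layout(num_stripes, num_devices):
    """Map each stripe to its metadata chunk's device and the units it covers."""
    table = {}
    for stripe in range(num_stripes):
        # One data stripe unit per device, identified as (stripe, device).
        units = [(stripe, dev) for dev in range(num_devices)]
        table[stripe] = {
            "metadata_on": metadata_device(stripe, num_devices),
            "covers": units,
        }
    return table

table = layout(num_stripes=12, num_devices=4)
per_device = Counter(entry["metadata_on"] for entry in table.values())
print(dict(per_device))  # {0: 3, 1: 3, 2: 3, 3: 3}
```

Grouping the metadata of a whole stripe into one chunk means that an update touching several units of that stripe needs only one metadata write, while the rotation keeps metadata traffic from concentrating on one device.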
TL;DR: In this article, a system and method for dynamic data recovery is described that operates on a computer storage system with a plurality of disk drives for storing parity groups. Each parity group includes storage blocks, and each storage block is stored on a separate disk drive, such that no two storage blocks from a given parity group reside on the same disk drive.
Abstract: A system and method for dynamic data recovery is described. The system and method for dynamic data recovery operates on a computer storage system that includes a plurality of disk drives for storing parity groups. Each parity group includes storage blocks. Each of the storage blocks is stored on a separate disk drive such that no two storage blocks from a given parity set reside on the same disk drive. The computer storage system further includes a recovery module to dynamically recover data that is lost when at least a portion of one disk drive in the plurality of disk drives becomes unavailable. The recovery module is configured to produce a reconstructed block by using information in the remaining storage blocks of a parity set that corresponds to an unavailable storage block. The recovery module is further configured to split the parity group corresponding to an unavailable storage block into two parity groups if the parity group corresponding to the unavailable storage block spanned all of the drives in the plurality of disk drives.
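The recovery and split steps can be sketched in Python under one loudly stated assumption: the abstract does not name the parity function, so byte-wise XOR (the usual single-parity scheme) is used here. The helper names `xor_blocks`, `reconstruct`, and `split_group` are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def reconstruct(surviving_blocks):
    """Rebuild the one missing block of a group from all the survivors."""
    return xor_blocks(surviving_blocks)

def split_group(data_blocks):
    """Split one parity group into two smaller groups, each with its own parity."""
    mid = len(data_blocks) // 2
    halves = [data_blocks[:mid], data_blocks[mid:]]
    return [(half, xor_blocks(half)) for half in halves]

# A parity group of four data blocks plus one parity block (five drives).
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = xor_blocks(data)

# The drive holding data[2] becomes unavailable.
survivors = data[:2] + data[3:] + [parity]
rebuilt = reconstruct(survivors)
print(rebuilt == data[2])  # True

# The group spanned every drive, so it is split into two smaller groups.
groups = split_group(data[:2] + [rebuilt] + data[3:])
```

The split matters because a group that spans every drive has no spare drive left for the rebuilt block once one drive is gone. Two smaller groups each need fewer drives, so the rule that no two blocks of a group share a drive can still be honored.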
TL;DR: In this paper, the authors describe a fibre channel storage area network in which, in the event of a failure involving a controller or controller interface, the virtual disks that are accessed via the affected interfaces are re-mapped to another interface in order to continue to provide high data availability.
Abstract: A fibre channel storage area network (SAN) provides virtualized storage space for a number of servers to a number of virtual disks implemented on various virtual redundant array of inexpensive disks (RAID) devices striped across a plurality of physical disk drives. The SAN includes plural controllers and communication paths to allow for fail-safe and fail-over operation. The plural controllers can be loosely coupled to provide n-way redundancy and have more than one independent channel for communicating with one another. In the event of a failure involving a controller or controller interface, the virtual disks that are accessed via the affected interfaces are re-mapped to another interface in order to continue to provide high data availability. In particular, a common memory storage device is connected to the back-ends of every controller to provide a storage area. In this manner, the common memory storage device can be accessed via operations similar to those a controller already uses to access the physical disks that are connected to the back-end of the controllers.
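The re-mapping step can be pictured with a small Python sketch. Everything specific in it is an assumption made for illustration: the interface names, the `remap_on_failure` helper, and the policy of moving each affected virtual disk to the least-loaded surviving interface. The abstract only says that affected disks are re-mapped to another interface.

```python
def remap_on_failure(disk_to_interface, interfaces, failed_interface):
    """Return a new mapping with disks on the failed interface moved elsewhere."""
    survivors = [i for i in interfaces if i != failed_interface]
    if not survivors:
        raise RuntimeError("no healthy interface left to take over")

    # Count how many virtual disks each surviving interface already serves.
    load = {i: 0 for i in survivors}
    for iface in disk_to_interface.values():
        if iface in load:
            load[iface] += 1

    new_map = dict(disk_to_interface)
    for disk, iface in disk_to_interface.items():
        if iface == failed_interface:
            # Assumed policy: pick the surviving interface with the fewest disks.
            target = min(survivors, key=lambda i: load[i])
            new_map[disk] = target
            load[target] += 1
    return new_map

mapping = {"vd0": "ctrlA:0", "vd1": "ctrlA:1", "vd2": "ctrlB:0", "vd3": "ctrlA:0"}
interfaces = ["ctrlA:0", "ctrlA:1", "ctrlB:0"]
print(remap_on_failure(mapping, interfaces, "ctrlA:0"))
```

Disks that were not on the failed interface keep their mapping, so servers using them see no change.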
TL;DR: A case is made for applying RAID-like techniques used by disks and file systems, but at the cloud storage level, to allow customers to avoid vendor lock-in, reduce the cost of switching providers, and better tolerate provider outages or failures.
Abstract: The increasing popularity of cloud storage is leading organizations to consider moving data out of their own data centers and into the cloud. However, success for cloud storage providers can present a significant risk to customers; namely, it becomes very expensive to switch storage providers. In this paper, we make a case for applying RAID-like techniques used by disks and file systems, but at the cloud storage level. We argue that striping user data across multiple providers can allow customers to avoid vendor lock-in, reduce the cost of switching providers, and better tolerate provider outages or failures. We introduce RACS, a proxy that transparently spreads the storage load over many providers. We evaluate a prototype of our system and estimate the costs incurred and benefits reaped. Finally, we use trace-driven simulations to demonstrate how RACS can reduce the cost of switching storage vendors for a large organization such as the Internet Archive by seven-fold or more by varying erasure-coding parameters.
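The idea of striping an object across providers with redundancy can be shown with a deliberately simplified Python sketch. It is not the RACS implementation: it replaces general erasure coding with a single XOR parity share (k data shares plus one parity share, one share per provider), and the `encode` and `decode` helpers are hypothetical. Providers are stood in for by list positions, with `None` marking a provider that is unavailable.

```python
from functools import reduce

def xor_all(shares):
    """Byte-wise XOR of equally sized shares."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*shares))

def encode(data, k):
    """Split data into k equal shares (zero padded) plus one XOR parity share."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    shares = [padded[i * size:(i + 1) * size] for i in range(k)]
    return shares + [xor_all(shares)]

def decode(shares, original_length):
    """Rebuild data from k+1 shares where at most one entry is None."""
    missing = [i for i, s in enumerate(shares) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity tolerates only one missing share")
    shares = list(shares)
    if missing:
        present = [s for s in shares if s is not None]
        shares[missing[0]] = xor_all(present)
    return b"".join(shares[:-1])[:original_length]

obj = b"striped across several cloud providers"
shares = encode(obj, k=3)  # four shares, one per provider
shares[1] = None           # one provider is down or has been abandoned
print(decode(shares, len(obj)) == obj)  # True
```

With k data shares and one parity share, total storage is (k + 1) / k times the object size, and each provider holds only 1 / k of the object's bytes. That is the lever behind the switching-cost argument: leaving one provider requires moving only that provider's share, not the whole object.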