About: Data cluster is a research topic. Over the lifetime, 282 publications have been published within this topic receiving 6856 citations. The topic is also known as: allocation unit.
TL;DR: Cluster analysis is the formal study of algorithms and methods for grouping objects according to measured or perceived intrinsic characteristics; organizing data into such groupings is one of the most fundamental modes of understanding and learning.
Abstract: The practice of classifying objects according to perceived similarities is the basis for much of science. Organizing data into sensible groupings is one of the most fundamental modes of understanding and learning. As an example, a common scheme of scientific classification puts organisms into taxonomic ranks (domain, kingdom, phylum, class, etc.). Cluster analysis is the formal study of algorithms and methods for grouping objects according to measured or perceived intrinsic characteristics. Cluster analysis does not use category labels that tag objects with prior identifiers, i.e., class labels. The absence of category information distinguishes cluster analysis (unsupervised learning) from discriminant analysis (supervised learning). The objective of cluster analysis is simply to find a convenient and valid organization of the data, not to establish rules for separating future data into categories.
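As a rough illustration of grouping without class labels, the sketch below runs k-means on one-dimensional points. k-means is chosen here only as a familiar example; the abstract does not name a specific algorithm, and the function name `kmeans_1d`, the sample points, and the starting centers are all assumptions made for this sketch.

```python
def kmeans_1d(points, centers, iterations=10):
    """Group 1-D points around the nearest center; no class labels are used."""
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: each point joins the group of its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            groups[nearest].append(p)
        # Update step: each center moves to the mean of its group.
        # A center with an empty group stays where it is.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


# Hypothetical data: two visibly separated clumps of measurements.
points = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
centers, groups = kmeans_1d(points, centers=[0.0, 5.0])
print(groups)
```

The result is an organization of the given data only; consistent with the abstract, nothing here learns a rule for classifying future points.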
TL;DR: A semiconductor non-volatile mass storage memory is partitioned into user files and system files; the system files partition is subdivided into clusters of multiple sectors, and each cluster stores the system file for a single predetermined LBA.
Abstract: A semiconductor non-volatile mass storage memory is partitioned into user files and system files. The system files partition is further subdivided into clusters, each cluster having a plurality of sectors. Each cluster stores the system file for a single predetermined LBA. As the information within the LBA is changed, the new information is written into an empty sector within the cluster. Once the cluster is filled, the system either erases the cluster for recycling or, preferably, locates an empty cluster and repeats the process with that new cluster. Once all the clusters are filled, all clusters containing old data are erased for recycling.
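The update scheme described above can be sketched as a small in-memory model. This is a toy illustration under stated assumptions, not the patented implementation: the class name `SystemFileStore`, its methods, and the choice to follow the "locate an empty cluster" path are all hypothetical.

```python
class SystemFileStore:
    """Toy model: each cluster holds successive versions of one LBA's data."""

    def __init__(self, num_clusters, sectors_per_cluster):
        self.sectors_per_cluster = sectors_per_cluster
        # A cluster is a list of written sectors; an empty list is an erased cluster.
        self.clusters = [[] for _ in range(num_clusters)]
        self.active = {}    # LBA -> cluster currently receiving its updates
        self.stale = set()  # filled clusters that now hold only old data

    def _find_empty_cluster(self):
        in_use = set(self.active.values()) | self.stale
        for i, sectors in enumerate(self.clusters):
            if i not in in_use and not sectors:
                return i
        return None

    def _recycle(self):
        # Erase every cluster that contains only old data.
        for i in self.stale:
            self.clusters[i] = []
        self.stale.clear()

    def write(self, lba, data):
        idx = self.active.get(lba)
        if idx is not None and len(self.clusters[idx]) == self.sectors_per_cluster:
            # The cluster is full: retire it and look for a fresh one.
            self.stale.add(idx)
            del self.active[lba]
            idx = None
        if idx is None:
            idx = self._find_empty_cluster()
            if idx is None:
                self._recycle()
                idx = self._find_empty_cluster()
            if idx is None:
                raise RuntimeError("no cluster available")
            self.active[lba] = idx
        # New information goes into the next empty sector of the cluster.
        self.clusters[idx].append(data)

    def read(self, lba):
        # The most recently written sector is the current version.
        return self.clusters[self.active[lba]][-1]


store = SystemFileStore(num_clusters=2, sectors_per_cluster=2)
for version in ["a", "b", "c", "d", "e"]:
    store.write(7, version)
print(store.read(7))  # prints "e"
```

In this model, erasure is deferred until no empty cluster remains, which mirrors the abstract's preference for moving to an empty cluster over erasing immediately.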
TL;DR: This paper presents a novel disk storage architecture called DCD, Disk Caching Disk, for the purpose of optimizing I/O performance, which can be applied directly to current file systems without the need of changing the operating system.
Abstract: This paper presents a novel disk storage architecture called DCD, Disk Caching Disk, for the purpose of optimizing I/O performance. The main idea of the DCD is to use a small log disk, referred to as cache-disk, as a secondary disk cache to optimize write performance. While the cache-disk and the normal data disk have the same physical properties, the access speed of the former differs dramatically from the latter because of different data units and different ways in which data are accessed. Our objective is to exploit this speed difference by using the log disk as a cache to build a reliable and smooth disk hierarchy. A small RAM buffer is used to collect small write requests to form a log which is transferred onto the cache-disk whenever the cache-disk is idle. Because of the temporal locality that exists in office/engineering work-load environments, the DCD system shows write performance close to the same size RAM (i.e. solid-state disk) for the cost of a disk. Moreover, the cache-disk can also be implemented as a logical disk in which case a small portion of the normal data disk is used as the log disk. Trace-driven simulation experiments are carried out to evaluate the performance of the proposed disk architecture. Under the office/engineering work-load environment, the DCD shows superb disk performance for writes as compared to existing disk systems. Performance improvements of up to two orders of magnitude are observed in terms of average response time for write operations. Furthermore, DCD is very reliable and works at the device or device driver level. As a result, it can be applied directly to current file systems without the need of changing the operating system.
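The write path described above (small writes collected in RAM, then moved to the cache-disk as one log while it is idle) can be sketched as follows. This is a simplified model under stated assumptions, not the authors' implementation: the class `DiskCachingDisk`, its method names, and the dictionary-based "disks" are hypothetical, and moving data from the cache-disk to the data disk is left out.

```python
class DiskCachingDisk:
    """Toy model of the DCD write path: RAM buffer -> cache-disk log."""

    def __init__(self):
        self.ram_buffer = {}      # block number -> data, pending small writes
        self.cache_disk_log = []  # each entry is one log written in a single transfer
        self.data_disk = {}       # normal data disk (not updated in this sketch)

    def write(self, block, data):
        # Small writes complete as soon as they are in the RAM buffer.
        self.ram_buffer[block] = data

    def on_cache_disk_idle(self):
        # Transfer all buffered writes to the cache-disk as one log.
        if self.ram_buffer:
            self.cache_disk_log.append(dict(self.ram_buffer))
            self.ram_buffer.clear()

    def read(self, block):
        # Newest copy wins: RAM buffer, then logs from newest to oldest, then data disk.
        if block in self.ram_buffer:
            return self.ram_buffer[block]
        for log in reversed(self.cache_disk_log):
            if block in log:
                return log[block]
        return self.data_disk.get(block)


dcd = DiskCachingDisk()
dcd.write(10, "v1")
dcd.write(11, "x")
dcd.on_cache_disk_idle()   # both writes go to the cache-disk in one log
dcd.write(10, "v2")        # a later overwrite sits in RAM until the next idle period
print(dcd.read(10), dcd.read(11))  # prints "v2 x"
```

Batching many small writes into one log is what lets the cache-disk be written in large units, which is the speed difference the abstract aims to exploit.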
TL;DR: This paper proposes Adaptive Unsupervised Multi-view Feature Selection (AUMFS), an unsupervised learning method that integrates data cluster label prediction and adaptive multi-view visual similar graph learning into a unified framework, together with a simple iterative method for solving its objective function.
Abstract: To reveal and leverage the correlated and complementary information between different views, a large number of multi-view learning algorithms have been proposed in recent years. However, unsupervised feature selection in multi-view learning is still a challenge due to the lack of data labels that could be utilized to select the discriminative features. Moreover, most of the traditional feature selection methods are developed for single-view data and are not directly applicable to multi-view data. Therefore, we propose an unsupervised learning method called Adaptive Unsupervised Multi-view Feature Selection (AUMFS) in this paper. AUMFS attempts to jointly utilize three kinds of vital information contained in the original data for feature selection, i.e., data cluster structure, data similarity and the correlations between different views. To achieve this goal, a robust sparse regression model with the l2,1-norm penalty is introduced to predict data cluster labels, and at the same time, multiple view-dependent visual similar graphs are constructed to flexibly model the visual similarity in each view. Then, AUMFS integrates data cluster label prediction and adaptive multi-view visual similar graph learning into a unified framework. To solve the objective function of AUMFS, a simple yet efficient iterative method is proposed. We apply AUMFS to three visual concept recognition applications (i.e., social image concept recognition, object recognition and video-based human action recognition) on four benchmark datasets. Experimental results show that the proposed method significantly outperforms several state-of-the-art feature selection methods. More importantly, our method is not very sensitive to the parameters, and the optimization method converges very fast.
TL;DR: The authors propose a method of distributing a set of data among a plurality of disks that provides load balancing in the event of a disk failure: data block accesses to the failed disk are redirected to a disk in the other cluster that holds a copy of the data block, and further accesses to the disks that remain operational are rebalanced.
Abstract: A method of distributing a set of data among a plurality of disks, which provides for load balancing in the event of a disk failure. In accordance with the method, the disks in an array are divided into a number of clusters. The blocks of data are then stored in each cluster such that each cluster contains a complete set of the data and such that data block placement in each cluster is a unique permutation of the data block placement in the other clusters. In the event of a disk failure, data block accesses to the failed disk are redirected to a disk in the other cluster having a copy of the data block, and further accesses to the disks that remain operational are rebalanced.
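One way to picture the placement idea is the sketch below, which stores every block in two clusters using two different arrangements and redirects reads away from a failed disk. The specific layout rules (`b % d` and `(b // d) % d`) and the names `build_layout` and `locate` are assumptions chosen for illustration; the abstract does not specify the permutation, and the rebalancing of accesses across surviving disks is not modelled here.

```python
def build_layout(num_blocks, disks_per_cluster):
    """Return one block -> disk map per cluster; each cluster holds every block."""
    d = disks_per_cluster
    # Cluster 0: simple round-robin striping.
    cluster0 = {b: b % d for b in range(num_blocks)}
    # Cluster 1: a different arrangement, so blocks that share a disk in
    # cluster 0 are spread over several disks in cluster 1.
    cluster1 = {b: (b // d) % d for b in range(num_blocks)}
    return [cluster0, cluster1]


def locate(block, layout, failed=frozenset(), preferred=0):
    """Pick a (cluster, disk) pair that can serve the block, skipping failed disks."""
    order = [preferred] + [c for c in range(len(layout)) if c != preferred]
    for c in order:
        disk = layout[c][block]
        if (c, disk) not in failed:
            return c, disk
    raise RuntimeError("all copies of the block are on failed disks")


layout = build_layout(num_blocks=9, disks_per_cluster=3)
failed = {(0, 1)}  # disk 1 of cluster 0 has failed
# Blocks 1, 4 and 7 lived on the failed disk; their reads are redirected.
print([locate(b, layout, failed) for b in (1, 4, 7)])
# prints [(1, 0), (1, 1), (1, 2)]
```

Because the two arrangements differ, the redirected reads in this example land on three different disks of the other cluster instead of overloading a single mirror disk.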