Data cluster

Papers published on a yearly basis

Papers

Book Chapter • 10.1007/978-3-540-87479-9_3

Data Clustering: 50 Years Beyond K-means

Anil K. Jain¹•Institutions (1)

Michigan State University¹

15 Sep 2008

TL;DR: Cluster analysis, as described in this paper, is the formal study of algorithms and methods for grouping objects according to measured or perceived intrinsic characteristics; organizing data into sensible groupings is one of the most fundamental modes of understanding and learning.

Abstract: The practice of classifying objects according to perceived similarities is the basis for much of science. Organizing data into sensible groupings is one of the most fundamental modes of understanding and learning. As an example, a common scheme of scientific classification puts organisms into taxonomic ranks (domain, kingdom, phylum, class, etc.). Cluster analysis is the formal study of algorithms and methods for grouping objects according to measured or perceived intrinsic characteristics. Cluster analysis does not use category labels that tag objects with prior identifiers, i.e., class labels. The absence of category information distinguishes cluster analysis (unsupervised learning) from discriminant analysis (supervised learning). The objective of cluster analysis is simply to find a convenient and valid organization of the data, not to establish rules for separating future data into categories.

6,706 citations

Patent

Method of and architecture for controlling system data with automatic wear leveling in a semiconductor non-volatile mass storage memory

Petro Estakhri, Mahmud Assar, Robert Alan Reid, Berhanu Iman

13 Sep 1995

TL;DR: A semiconductor non-volatile mass storage memory is partitioned into user files and system files; the system files partition is subdivided into clusters, each having a plurality of sectors, and each cluster stores the system file for a single predetermined LBA.

Abstract: A semiconductor non-volatile mass storage memory is partitioned into user files and system files. The system files partition is further subdivided into clusters, each cluster having a plurality of sectors. Each cluster stores the system file for a single predetermined LBA. As the information within the LBA is changed, the new information is written into an empty sector within the cluster. Once the cluster is filled, the system either erases for recycling the cluster or preferably locates an empty cluster and repeats the process with that new cluster. Once all the clusters are filled, all clusters containing old data are erased for recycling.
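
The cluster recycling scheme in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the class names `Cluster` and `SystemFileStore`, the in-memory lists standing in for flash sectors, and the `erase_count` counter are all assumptions made for illustration. The sketch models one LBA only and follows the "preferably locate an empty cluster" path, erasing the clusters that hold old data once every cluster has been filled.

```python
class Cluster:
    """A cluster of sectors; each write goes to the next empty sector."""

    def __init__(self, sectors_per_cluster):
        self.sectors = [None] * sectors_per_cluster  # None marks an erased sector
        self.next_free = 0
        self.erase_count = 0  # hypothetical wear counter, not from the abstract

    def is_full(self):
        return self.next_free == len(self.sectors)

    def is_empty(self):
        return self.next_free == 0

    def write(self, data):
        self.sectors[self.next_free] = data
        self.next_free += 1

    def latest(self):
        # The most recently written sector holds the current version.
        return self.sectors[self.next_free - 1] if self.next_free else None

    def erase(self):
        self.sectors = [None] * len(self.sectors)
        self.next_free = 0
        self.erase_count += 1


class SystemFileStore:
    """System data for a single LBA, spread over several clusters (assumed >= 2)."""

    def __init__(self, num_clusters, sectors_per_cluster):
        if num_clusters < 2:
            raise ValueError("this sketch assumes at least two clusters")
        self.clusters = [Cluster(sectors_per_cluster) for _ in range(num_clusters)]
        self.active = 0

    def _find_empty(self):
        return next((i for i, c in enumerate(self.clusters) if c.is_empty()), None)

    def update(self, data):
        if self.clusters[self.active].is_full():
            target = self._find_empty()
            if target is None:
                # All clusters are filled: erase those containing old data.
                for i, cluster in enumerate(self.clusters):
                    if i != self.active:
                        cluster.erase()
                target = self._find_empty()
            self.active = target
        self.clusters[self.active].write(data)

    def read(self):
        return self.clusters[self.active].latest()


store = SystemFileStore(num_clusters=2, sectors_per_cluster=2)
for version in ["a", "b", "c", "d", "e"]:
    store.update(version)
```

Because updates rotate through the clusters before any erase happens, erase cycles are spread across the clusters instead of hitting one sector repeatedly, which is the wear leveling effect the title refers to.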

248 citations

Proceedings Article • 10.1145/232973.232991

DCD --- Disk Caching Disk: A New Approach for Boosting I/O Performance

Yiming Hu¹, Qing Yang¹•Institutions (1)

University of Rhode Island¹

1 May 1996

TL;DR: This paper presents a novel disk storage architecture called DCD, Disk Caching Disk, for the purpose of optimizing I/O performance, which can be applied directly to current file systems without the need of changing the operating system.

Abstract: This paper presents a novel disk storage architecture called DCD, Disk Caching Disk, for the purpose of optimizing I/O performance. The main idea of the DCD is to use a small log disk, referred to as cache-disk, as a secondary disk cache to optimize write performance. While the cache-disk and the normal data disk have the same physical properties, the access speed of the former differs dramatically from the latter because of different data units and different ways in which data are accessed. Our objective is to exploit this speed difference by using the log disk as a cache to build a reliable and smooth disk hierarchy. A small RAM buffer is used to collect small write requests to form a log which is transferred onto the cache-disk whenever the cache-disk is idle. Because of the temporal locality that exists in office/engineering work-load environments, the DCD system shows write performance close to the same size RAM (i.e. solid-state disk) for the cost of a disk. Moreover, the cache-disk can also be implemented as a logical disk in which case a small portion of the normal data disk is used as the log disk. Trace-driven simulation experiments are carried out to evaluate the performance of the proposed disk architecture. Under the office/engineering work-load environment, the DCD shows superb disk performance for writes as compared to existing disk systems. Performance improvements of up to two orders of magnitude are observed in terms of average response time for write operations. Furthermore, DCD is very reliable and works at the device or device driver level. As a result, it can be applied directly to current file systems without the need of changing the operating system.
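
The write path described in this abstract (small writes collected in a RAM buffer, then moved as one log to the cache-disk when it is idle) can be sketched roughly as follows. This is a toy model under stated assumptions, not the authors' design: the class name `DiskCachingDisk`, the dictionary-based disks, the flush-when-full rule, and the `destage` step that later copies log contents to the data disk are all hypothetical additions for illustration.

```python
class DiskCachingDisk:
    """Toy model of a RAM buffer -> cache-disk log -> data disk hierarchy."""

    def __init__(self, buffer_capacity):
        self.buffer_capacity = buffer_capacity
        self.ram_buffer = {}       # block number -> data, pending small writes
        self.cache_disk_log = []   # list of log segments, oldest first
        self.data_disk = {}        # block number -> data

    def write(self, block, data):
        # Small writes are absorbed by the RAM buffer.
        self.ram_buffer[block] = data
        # Assumption: a full buffer forces a flush even if the disk is busy.
        if len(self.ram_buffer) >= self.buffer_capacity:
            self._flush_log()

    def on_cache_disk_idle(self):
        # The abstract's trigger: transfer the log whenever the cache-disk is idle.
        self._flush_log()

    def _flush_log(self):
        if self.ram_buffer:
            # One large sequential write instead of many small random ones.
            self.cache_disk_log.append(dict(self.ram_buffer))
            self.ram_buffer.clear()

    def destage(self):
        # Assumption: log segments are later applied to the data disk in order.
        for segment in self.cache_disk_log:
            self.data_disk.update(segment)
        self.cache_disk_log.clear()

    def read(self, block):
        # Newest copy wins: RAM buffer, then newest log segment, then data disk.
        if block in self.ram_buffer:
            return self.ram_buffer[block]
        for segment in reversed(self.cache_disk_log):
            if block in segment:
                return segment[block]
        return self.data_disk.get(block)


dcd = DiskCachingDisk(buffer_capacity=2)
dcd.write(1, "a")
dcd.write(2, "b")          # buffer reaches capacity and is flushed as one log
dcd.write(1, "c")
dcd.on_cache_disk_idle()   # idle period flushes the remaining write
```

The point of the sketch is the batching: each flush turns several small writes into a single log segment, which is where the abstract locates the speed difference between the cache-disk and the data disk.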

159 citations

Book Chapter • 10.1007/978-3-642-37331-2_26

Adaptive unsupervised multi-view feature selection for visual concept recognition

Yinfu Feng¹, Jun Xiao¹, Yueting Zhuang¹, Xiaoming Liu²•Institutions (2)

Zhejiang University¹, Michigan State University²

5 Nov 2012

TL;DR: This paper proposes an unsupervised learning method called Adaptive Unsupervised Multi-view Feature Selection (AUMFS), which integrates data cluster label prediction and adaptive multi-view visual similarity graph learning into a unified framework, and solves the resulting objective function with a simple iterative method.

Abstract: To reveal and leverage the correlated and complemental information between different views, a great amount of multi-view learning algorithms have been proposed in recent years. However, unsupervised feature selection in multi-view learning is still a challenge due to lack of data labels that could be utilized to select the discriminative features. Moreover, most of the traditional feature selection methods are developed for the single-view data, and are not directly applicable to the multi-view data. Therefore, we propose an unsupervised learning method called Adaptive Unsupervised Multi-view Feature Selection (AUMFS) in this paper. AUMFS attempts to jointly utilize three kinds of vital information, i.e., data cluster structure, data similarity and the correlations between different views, contained in the original data together for feature selection. To achieve this goal, a robust sparse regression model with the l2,1-norm penalty is introduced to predict data cluster labels, and at the same time, multiple view-dependent visual similar graphs are constructed to flexibly model the visual similarity in each view. Then, AUMFS integrates data cluster labels prediction and adaptive multi-view visual similar graph learning into a unified framework. To solve the objective function of AUMFS, a simple yet efficient iterative method is proposed. We apply AUMFS to three visual concept recognition applications (i.e., social image concept recognition, object recognition and video-based human action recognition) on four benchmark datasets. Experimental results show the proposed method significantly outperforms several state-of-the-art feature selection methods. More importantly, our method is not very sensitive to the parameters and the optimization method converges very fast.
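
For reference, the l2,1-norm named in this abstract is usually defined as the sum of the Euclidean norms of a matrix's rows. The abstract gives no notation, so the symbols below are assumptions: W is a d-by-c regression weight matrix with one row per feature and one column per cluster label, and w^i is its i-th row.

```latex
\|W\|_{2,1} \;=\; \sum_{i=1}^{d} \sqrt{\sum_{j=1}^{c} W_{ij}^{2}} \;=\; \sum_{i=1}^{d} \|w^{i}\|_{2}
```

Penalizing this quantity tends to drive entire rows of W to zero, so the features whose rows stay non-zero are the ones selected.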

131 citations

Patent

HMC: A hybrid mirror-and-chained data replication method to support high data availability for disk arrays

Ming-Syan Chen¹, Hui-I Hsiao¹, Chung-Shen Li¹, Philip S. Yu¹•Institutions (1)

IBM¹

18 Aug 1994

TL;DR: This patent describes a method of distributing a set of data among a plurality of disks divided into clusters, which provides for load balancing in the event of a disk failure: data block accesses to the failed disk are redirected to a disk in the other cluster having a copy of the data block, and further accesses to the disks that remain operational are rebalanced.

Abstract: A method of distributing a set of data among a plurality of disks, which provides for load balancing in the event of a disk failure. In accordance with the method the total number of the disks in an array are divided into a number of clusters. The blocks of data are then stored in each cluster such that each cluster contains a complete set of the data and such that data block placement in each cluster is a unique permutation of the data block placement in the other clusters. In the event of a disk failure, data block accesses to the failed disk are redirected to a disk in the other cluster having a copy of the data block and further access to the disks that remain operational are rebalanced.
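
The placement idea in this abstract can be illustrated with a small Python sketch for two clusters. The abstract does not state which permutations are used, so the two layout functions here are assumptions chosen for illustration: cluster 0 stripes blocks round-robin, and cluster 1 shifts each stripe by its stripe index. The function names are hypothetical, and the rebalancing of accesses to the surviving disks is not modelled.

```python
def primary_disk(block, disks_per_cluster):
    """Assumed layout for cluster 0: plain round-robin striping."""
    return block % disks_per_cluster


def mirror_disk(block, disks_per_cluster):
    """Assumed layout for cluster 1: each stripe is rotated by its stripe index."""
    stripe = block // disks_per_cluster
    return (block % disks_per_cluster + stripe) % disks_per_cluster


def locate(block, disks_per_cluster, failed=None):
    """Return (cluster, disk) to read a block from, given an optional failed disk.

    `failed` is a (cluster, disk) pair in cluster 0, or None when all disks work.
    """
    target = (0, primary_disk(block, disks_per_cluster))
    if target == failed:
        # Redirect to the copy held by the other cluster.
        return (1, mirror_disk(block, disks_per_cluster))
    return target


DISKS = 4
BLOCKS = 16
failed_disk = (0, 1)

# Blocks that lived on the failed disk, and where their reads are redirected.
affected = [b for b in range(BLOCKS) if primary_disk(b, DISKS) == 1]
redirected = [locate(b, DISKS, failed=failed_disk) for b in affected]
```

With these assumed layouts, the four blocks on the failed disk are served by four different disks in the other cluster, so no single surviving disk absorbs the whole extra load. A plain mirror (the same layout in both clusters) would send all of them to one disk.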

129 citations

Year	Papers
2021	16
2020	21
2019	35
2018	21
2017	17
2016	12

Related Topics (5)

Performance Metrics