Consensus clustering

Topic Tools

Papers published on a yearly basis

1 / 2

Papers

Journal Article•10.1145/331499.331504•

Data clustering: a review

[...]

Anil K. Jain¹, M. N. Murty², Patrick J. Flynn³•Institutions (3)

Michigan State University¹, Indian Institute of Science², Ohio State University³

01 Sep 1999-ACM Computing Surveys

TL;DR: An overview of pattern clustering methods from a statistical pattern recognition perspective is presented, with a goal of providing useful advice and references to fundamental concepts accessible to the broad community of clustering practitioners.

...read moreread less

Abstract: Clustering is the unsupervised classification of patterns (observations, data items, or feature vectors) into groups (clusters). The clustering problem has been addressed in many contexts and by researchers in many disciplines; this reflects its broad appeal and usefulness as one of the steps in exploratory data analysis. However, clustering is a difficult problem combinatorially, and differences in assumptions and contexts in different communities has made the transfer of useful generic concepts and methodologies slow to occur. This paper presents an overview of pattern clustering methods from a statistical pattern recognition perspective, with a goal of providing useful advice and references to fundamental concepts accessible to the broad community of clustering practitioners. We present a taxonomy of clustering techniques, and identify cross-cutting themes and recent advances. We also describe some important applications of clustering algorithms such as image segmentation, object recognition, and information retrieval.

...read moreread less

15,162 citations

Journal Article•10.1016/J.PATREC.2009.09.011•

Data clustering: 50 years beyond K-means

[...]

Anil K. Jain¹•Institutions (1)

Michigan State University¹

1 Jun 2010

TL;DR: A brief overview of clustering is provided, well known clustering methods are summarized, the major challenges and key issues in designing clustering algorithms are discussed, and some of the emerging and useful research directions are pointed out.

...read moreread less

Abstract: Organizing data into sensible groupings is one of the most fundamental modes of understanding and learning. As an example, a common scheme of scientific classification puts organisms into a system of ranked taxa: domain, kingdom, phylum, class, etc. Cluster analysis is the formal study of methods and algorithms for grouping, or clustering, objects according to measured or perceived intrinsic characteristics or similarity. Cluster analysis does not use category labels that tag objects with prior identifiers, i.e., class labels. The absence of category information distinguishes data clustering (unsupervised learning) from classification or discriminant analysis (supervised learning). The aim of clustering is to find structure in data and is therefore exploratory in nature. Clustering has a long and rich history in a variety of scientific fields. One of the most popular and simple clustering algorithms, K-means, was first published in 1955. In spite of the fact that K-means was proposed over 50 years ago and thousands of clustering algorithms have been published since then, K-means is still widely used. This speaks to the difficulty in designing a general purpose clustering algorithm and the ill-posed problem of clustering. We provide a brief overview of clustering, summarize well known clustering methods, discuss the major challenges and key issues in designing clustering algorithms, and point out some of the emerging and useful research directions, including semi-supervised clustering, ensemble clustering, simultaneous feature selection during data clustering, and large scale data clustering.

...read moreread less

8,460 citations

Journal Article•10.1007/BF02289588•

Hierarchical clustering schemes

[...]

S. C. Johnson¹•Institutions (1)

Bell Labs¹

01 Sep 1967-Psychometrika

TL;DR: A useful correspondence is developed between any hierarchical system of such clusters, and a particular type of distance measure, that gives rise to two methods of clustering that are computationally rapid and invariant under monotonic transformations of the data.

...read moreread less

Abstract: Techniques for partitioning objects into optimally homogeneous groups on the basis of empirical measures of similarity among those objects have received increasing attention in several different fields. This paper develops a useful correspondence between any hierarchical system of such clusters, and a particular type of distance measure. The correspondence gives rise to two methods of clustering that are computationally rapid and invariant under monotonic transformations of the data. In an explicitly defined sense, one method forms clusters that are optimally “connected,” while the other forms clusters that are optimally “compact.”

...read moreread less

5,113 citations

Journal Article•10.1093/BIOINFORMATICS/BTQ170•

ConsensusClusterPlus: a class discovery tool with confidence assessments and item tracking

[...]

Matthew D. Wilkerson¹, D. Neil Hayes¹•Institutions (1)

University of North Carolina at Chapel Hill¹

15 Jun 2010-Bioinformatics

TL;DR: The consensus clustering (CC) method provides quantitative and visual stability evidence for estimating the number of unsupervised classes in a dataset and ConsensusClusterPlus implements the CC method in R and extends it with new functionality and visualizations including item tracking, item- Consensus and cluster-consensus plots.

...read moreread less

Abstract: Summary: Unsupervised class discovery is a highly useful technique in cancer research, where intrinsic groups sharing biological characteristics may exist but are unknown. The consensus clustering (CC) method provides quantitative and visual stability evidence for estimating the number of unsupervised classes in a dataset. ConsensusClusterPlus implements the CC method in R and extends it with new functionality and visualizations including item tracking, item-consensus and cluster-consensus plots. These new features provide users with detailed information that enable more specific decisions in unsupervised class discovery. Availability: ConsensusClusterPlus is open source software, written in R, under GPL-2, and available through the Bioconductor project (http://www.bioconductor.org/). Contact: ude.cnu.dem@srekliwm Supplementary Information: Supplementary data are available at Bioinformatics online.

...read moreread less

5,038 citations

Journal Article•10.1162/153244303321897735•

Cluster ensembles --- a knowledge reuse framework for combining multiple partitions

[...]

Alexander Strehl¹, Joydeep Ghosh¹•Institutions (1)

University of Texas at Austin¹

01 Mar 2003-Journal of Machine Learning Research

TL;DR: This paper introduces the problem of combining multiple partitionings of a set of objects into a single consolidated clustering without accessing the features or algorithms that determined these partitionings and proposes three effective and efficient techniques for obtaining high-quality combiners (consensus functions).

...read moreread less

Abstract: This paper introduces the problem of combining multiple partitionings of a set of objects into a single consolidated clustering without accessing the features or algorithms that determined these partitionings. We first identify several application scenarios for the resultant 'knowledge reuse' framework that we call cluster ensembles. The cluster ensemble problem is then formalized as a combinatorial optimization problem in terms of shared mutual information. In addition to a direct maximization approach, we propose three effective and efficient techniques for obtaining high-quality combiners (consensus functions). The first combiner induces a similarity measure from the partitionings and then reclusters the objects. The second combiner is based on hypergraph partitioning. The third one collapses groups of clusters into meta-clusters which then compete for each object to determine the combined clustering. Due to the low computational costs of our techniques, it is quite feasible to use a supra-consensus function that evaluates all three approaches against the objective function and picks the best solution for a given situation. We evaluate the effectiveness of cluster ensembles in three qualitatively different application scenarios: (i) where the original clusters were formed based on non-identical sets of features, (ii) where the original clustering algorithms worked on non-identical sets of objects, and (iii) where a common data-set is used and the main purpose of combining multiple clusterings is to improve the quality and robustness of the solution. Promising results are obtained in all three situations for synthetic as well as real data-sets.

...read moreread less

4,938 citations

...

Expand

Year	Papers
2025	3
2024	19
2023	50
2022	65
2021	53
2020	61

Topic Tools

Papers published on a yearly basis

Papers

Data clustering: a review

Data clustering: 50 years beyond K-means

Hierarchical clustering schemes

ConsensusClusterPlus: a class discovery tool with confidence assessments and item tracking

Cluster ensembles --- a knowledge reuse framework for combining multiple partitions

Related Topics (5)

Performance Metrics