An alternative extension of the k-means algorithm for clustering categorical data

Open Access Journal Article

Ohn Mar San, +2 more

- 01 Jan 2004

- International Journal of Applied Mathema...

- Vol. 14, Iss: 2, pp 241-247

192

TL;DR: This paper shows how to apply the notion of “cluster centers” to a dataset of categorical objects and how to use this notion to formulate the problem of clustering categorical objects as a partitioning problem.
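To make the idea of a "cluster center" for categorical objects concrete, here is a minimal sketch. It assumes one common formulation that this page does not spell out: a center stores, for each attribute, the relative frequency of every category in the cluster, and an object's dissimilarity to a center is the sum over attributes of one minus the frequency of the object's own category. The function names (`cluster_center`, `dissimilarity`, `assign`) are hypothetical and are not taken from the paper.

```python
from collections import Counter


def cluster_center(objects):
    """Assumed center: per-attribute relative frequency of each category."""
    n = len(objects)
    n_attrs = len(objects[0])
    return [
        {cat: count / n
         for cat, count in Counter(obj[j] for obj in objects).items()}
        for j in range(n_attrs)
    ]


def dissimilarity(obj, center):
    """Assumed measure: sum over attributes of (1 - frequency of obj's category)."""
    return sum(1.0 - freqs.get(value, 0.0) for value, freqs in zip(obj, center))


def assign(objects, centers):
    """Partitioning step: index of the least dissimilar center for each object."""
    return [
        min(range(len(centers)), key=lambda k: dissimilarity(obj, centers[k]))
        for obj in objects
    ]
```

With centers defined this way, a k-means-style loop can alternate between `assign` and recomputing `cluster_center` for each group, which is what turns the clustering task into a partitioning problem. Whether the paper uses exactly this center and dissimilarity is not confirmed by this page.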

Citations

Journal Article•10.1158/1078-0432.CCR-17-2484

Classifying colorectal cancer by tumor location rather than sidedness highlights a continuum in mutation profiles and consensus molecular subtypes

Jonathan M. Loree, +19 more

- 27 Nov 2017

- Clinical Cancer Research

TL;DR: Current right/left classifications may not fully recapitulate regional variations in tumor biology; the sigmoid-rectal region appears unique, and the transverse colon is distinct from other right-sided locations.

278

Journal Article•10.1016/J.PATCOG.2017.01.016

Subspace clustering guided unsupervised feature selection

Pengfei Zhu, +4 more

- 01 Jun 2017

- Pattern Recognition

TL;DR: Experimental results on benchmark datasets for unsupervised feature selection show that SCUFS outperforms the state-of-the-art UFS methods and can uncover the underlying multi-subspace structure of data.

241

Proceedings Article•10.1109/ICCIMA.2007.127

Clustering Categorical Data Using Silhouette Coefficient as a Relocating Measure

S. Aranganayagi, +1 more

- 13 Dec 2007

TL;DR: The proposed method groups categorical objects into clusters based on the minimum dissimilarity value, using the silhouette coefficient as the relocating measure; experimental results show that it is efficient.

183

Book Chapter•10.1007/978-981-15-1209-4_1

Estimating the Optimal Number of Clusters in Categorical Data Clustering by Silhouette Coefficient

Duy-Tai Dinh, +2 more

- 29 Nov 2019

TL;DR: This paper proposes an algorithm named k-SCC to estimate the optimal k in categorical data clustering, which outperforms the compared algorithms in determining the number of clusters for each dataset.

148

Journal Article•10.1109/TNNLS.2015.2451151

Space Structure and Clustering of Categorical Data

Yuhua Qian, +4 more

- 01 Oct 2016

- IEEE Transactions on Neural Networks

TL;DR: A novel data-representation scheme for categorical data, which maps a set of categorical objects into a Euclidean space, is designed, along with a general framework for space structure based categorical clustering algorithms (SBC).

100

References

Journal Article•10.1016/S0306-4379(00)00022-3

ROCK: a robust clustering algorithm for categorical attributes

Sudipto Guha, +2 more

- 01 Jul 2000

- Information Systems

TL;DR: This paper develops a robust hierarchical clustering algorithm ROCK that employs links and not distances when merging clusters, and indicates that ROCK not only generates better quality clusters than traditional algorithms, but it also exhibits good scalability properties.

1.9K

Journal Article•10.1016/S0019-9958(69)90591-9

A new approach to clustering

Enrique H. Ruspini

- 01 Jul 1969

- Information & Computation

TL;DR: A new method of representation of the reduced data, based on the idea of “fuzzy sets,” is proposed to avoid some of the problems of current clustering procedures and to provide better insight into the structure of the original data.

1.6K

Journal Article•10.1109/TPAMI.1984.4767478

K-Means-Type Algorithms: A Generalized Convergence Theorem and Characterization of Local Optimality

Shokri Z. Selim, +1 more

- 01 Jan 1984

- IEEE Transactions on Pattern Analysis an...

TL;DR: It is shown that under certain conditions the K-means algorithm may fail to converge to a local minimum, and that it converges under differentiability conditions to a Kuhn-Tucker point.

1.3K

Journal Article•10.1002/BS.3830120210

A clustering technique for summarizing multivariate data.

Geoffrey H. Ball, +1 more

- 01 Mar 1967

- Systems Research and Behavioral Science

TL;DR: A practical computing method termed ISODATA, which finds the cluster structure of such data, is described and provides a fit to the data of a set of cluster centers that tends to minimize the sum of the squared distances of each data point from its closest cluster center.

960

Journal Article•10.1016/0031-3203(90)90087-2

A new approach to clustering

Roland Wilson, +1 more

- 01 Nov 1990

- Pattern Recognition

TL;DR: Estimation theory is used to derive a new approach to the clustering problem: a unification of centroid and mode estimation, achieved by considering the effect of spatial scale on the estimator. The result is a multiresolution method that spans a range of spatial scales.

548

Related Papers (5)

Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values

Some methods for classification and analysis of multivariate observations

Data clustering: a review

A Fast Clustering Algorithm to Cluster Very Large Categorical Data Sets in Data Mining.

ROCK: a robust clustering algorithm for categorical attributes