Open Access Journal Article
An alternative extension of the k-means algorithm for clustering categorical data
TL;DR: This paper shows how to apply the notion of “cluster centers” to a dataset of categorical objects and how to use this notion to formulate the clustering of categorical objects as a partitioning problem.
Abstract: Most earlier work on clustering has focused on numerical data, whose inherent geometric properties can be exploited to naturally define distance functions between data points. Recently, the problem of clustering categorical data has started drawing interest. However, the computational cost makes most of the previous algorithms unacceptable for clustering very large databases. The k-means algorithm is well known for its efficiency in this respect. At the same time, it works only on numerical data, which prevents it from being used for clustering categorical data. The main contribution of this paper is to show how to apply the notion of “cluster centers” to a dataset of categorical objects and how to use this notion to formulate the clustering of categorical objects as a partitioning problem. Finally, a k-means-like algorithm for clustering categorical data is introduced. The clustering performance of the algorithm is demonstrated on two well-known data sets, namely the soybean disease and nursery databases.
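The abstract does not spell out how a categorical "cluster center" or the object-to-center dissimilarity is defined, so the following Python sketch is only one plausible reading. It assumes that a center stores, for each attribute, the relative frequency of every category seen in the cluster, and that the dissimilarity of an object to a center is the sum over attributes of one minus the relative frequency of the object's category. The names `make_center`, `dissimilarity`, and `cluster_categorical` are hypothetical and do not come from the paper.

```python
from collections import Counter
import random


def make_center(objects):
    """Assumed center: per-attribute relative frequencies of the categories."""
    n = len(objects)
    n_attrs = len(objects[0])
    return [
        {cat: count / n
         for cat, count in Counter(obj[j] for obj in objects).items()}
        for j in range(n_attrs)
    ]


def dissimilarity(obj, center):
    """Assumed measure: sum over attributes of 1 - frequency of obj's category."""
    return sum(1.0 - freqs.get(value, 0.0) for value, freqs in zip(obj, center))


def cluster_categorical(data, k, max_iter=100, seed=0):
    """k-means-like loop: assign to nearest center, then recompute centers."""
    rng = random.Random(seed)
    # Seed the centers with k distinct objects (objects must be hashable tuples).
    distinct = list(dict.fromkeys(data))
    centers = [make_center([obj]) for obj in rng.sample(distinct, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: dissimilarity(obj, centers[c]))
            for obj in data
        ]
        if new_labels == labels:  # assignments are stable, so stop
            break
        labels = new_labels
        for c in range(k):
            members = [obj for obj, lab in zip(data, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = make_center(members)
    return labels, centers


if __name__ == "__main__":
    toy = [
        ("red", "small"), ("red", "small"), ("red", "medium"),
        ("blue", "large"), ("blue", "large"), ("green", "large"),
    ]
    labels, centers = cluster_categorical(toy, k=2)
    print(labels)
```

As with numerical k-means, the result of such a loop depends on the initial centers, so in practice it would be run from several seeds and the partition with the lowest total dissimilarity kept.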
Citations
Classifying colorectal cancer by tumor location rather than sidedness highlights a continuum in mutation profiles and consensus molecular subtypes
Jonathan M. Loree, Allan Al Pereira, Michael Lam, Alexandra N. Willauer, Kanwal Pratap Singh Raghav, Arvind Dasari, Van K. Morris, Shailesh Advani, David G. Menter, Cathy Eng, Kenna R. Mills Shaw, Russell Broaddus, Mark J. Routbort, Yusha Liu, Jeffrey S. Morris, Rajyalakshmi Luthra, Funda Meric-Bernstam, Michael J. Overman, Dipen M. Maru, Scott Kopetz +19 more
TL;DR: Current right/left classifications may not fully recapitulate regional variations in tumor biology: the sigmoid-rectal region appears unique, and the transverse colon is distinct from other right-sided locations.
Subspace clustering guided unsupervised feature selection
TL;DR: Experimental results on benchmark datasets for unsupervised feature selection show that SCUFS outperforms the state-of-the-art UFS methods and can uncover the underlying multi-subspace structure of data.
241
Clustering Categorical Data Using Silhouette Coefficient as a Relocating Measure
S. Aranganayagi, K. Thangavel +1 more
- 13 Dec 2007
TL;DR: Experimental results show that the proposed method for clustering categorical data is efficient; objects are grouped into clusters based on the minimum dissimilarity value, using the silhouette coefficient.
183
Estimating the Optimal Number of Clusters in Categorical Data Clustering by Silhouette Coefficient
Duy-Tai Dinh, Tsutomu Fujinami, Van-Nam Huynh +2 more
- 29 Nov 2019
TL;DR: This paper proposes an algorithm named k-SCC to estimate the optimal k in categorical data clustering, which outperforms the compared algorithms in determining the number of clusters for each dataset.
148
Space Structure and Clustering of Categorical Data
TL;DR: A novel data-representation scheme for categorical data, which maps a set of categorical objects into a Euclidean space, is designed, along with a general framework for space structure based categorical clustering algorithms (SBC).
100
References
ROCK: a robust clustering algorithm for categorical attributes
TL;DR: This paper develops a robust hierarchical clustering algorithm ROCK that employs links and not distances when merging clusters, and indicates that ROCK not only generates better quality clusters than traditional algorithms, but it also exhibits good scalability properties.
1.9K
A new approach to clustering
TL;DR: A new method of representation of the reduced data, based on the idea of “fuzzy sets,” is proposed to avoid some of the problems of current clustering procedures and to provide better insight into the structure of the original data.
1.6K
K-Means-Type Algorithms: A Generalized Convergence Theorem and Characterization of Local Optimality
TL;DR: It is shown that under certain conditions the K-means algorithm may fail to converge to a local minimum, and that it converges under differentiability conditions to a Kuhn-Tucker point.
1.3K
A clustering technique for summarizing multivariate data.
Geoffrey H. Ball, David J. Hall +1 more
TL;DR: A practical computing method termed ISODATA, which finds the cluster structure of such data, is described and provides a fit to the data of a set of cluster centers that tends to minimize the sum of the squared distances of each data point from its closest cluster center.
960
A new approach to clustering
Roland Wilson, Michael Spann +1 more
TL;DR: Estimation theory is used to derive a new approach to the clustering problem, a unification of centroid and mode estimation, achieved by considering the effect of spatial scale on the estimator, which is a multiresolution method which spans a range of spatial scales.
548