Initializing k-means clustering algorithm using statistical information
TL;DR: This work proposes an enhancement to the initialization process of k-means, which depends on using statistical information from the data set to initialize the prototypes, and shows that the algorithm gives valid clusters, and that it decreases error and time.
Abstract: The k-means clustering algorithm is one of the best-known clustering algorithms; nevertheless, it has several disadvantages, chief among them that it may converge to a local optimum depending on the random initialization of its prototypes. We propose an enhancement to the initialization process of k-means that uses statistical information from the data set to initialize the prototypes. We show that our algorithm gives valid clusters and that it decreases both error and running time.
General Terms: Data Mining, Unsupervised Learning, Data Clustering.
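The abstract does not say which statistics the authors use, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes, purely for illustration, that the prototypes are spread evenly between mean - std and mean + std of each feature; the function names `random_init`, `stats_init`, and `kmeans` are hypothetical. It contrasts this deterministic start with the usual random choice of prototypes.

```python
import random
import statistics


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def random_init(points, k, rng):
    """Baseline: pick k distinct data points at random as prototypes."""
    return [list(p) for p in rng.sample(points, k)]


def stats_init(points, k):
    """Hypothetical statistics-based start (an assumption, not the paper's
    rule): spread k prototypes evenly between mean - std and mean + std
    of every feature."""
    dims = list(zip(*points))
    means = [statistics.fmean(d) for d in dims]
    stds = [statistics.pstdev(d) for d in dims]
    protos = []
    for i in range(k):
        # t runs from -1 to +1 across the k prototypes (0 when k == 1).
        t = 0.0 if k == 1 else -1.0 + 2.0 * i / (k - 1)
        protos.append([m + t * s for m, s in zip(means, stds)])
    return protos


def kmeans(points, prototypes, max_iter=100):
    """Lloyd-style iterations; returns (prototypes, labels, sse, iterations)."""
    protos = [list(p) for p in prototypes]
    labels = None
    iteration = 0
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest prototype.
        new_labels = [
            min(range(len(protos)), key=lambda j: sq_dist(p, protos[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each prototype to the mean of its members.
        for j in range(len(protos)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its old prototype
                protos[j] = [statistics.fmean(d) for d in zip(*members)]
    sse = sum(sq_dist(p, protos[lab]) for p, lab in zip(points, labels))
    return protos, labels, sse, iteration


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic data: three loose groups along the diagonal.
    data = [(c + rng.gauss(0, 0.5), c + rng.gauss(0, 0.5))
            for c in (0.0, 5.0, 10.0) for _ in range(30)]
    for name, init in (("random", random_init(data, 3, rng)),
                       ("stats", stats_init(data, 3))):
        _, _, sse, iters = kmeans(data, init)
        print(name, "SSE:", round(sse, 3), "iterations:", iters)
```

Comparing the sum of squared errors and the iteration count of the two runs mirrors the "error and time" criteria named in the abstract. A deterministic start also removes run-to-run variation, although this particular placement rule is only one of many possible choices.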
Citations
Integration K-Means Clustering Method and Elbow Method For Identification of The Best Customer Profile Cluster
Muhammad Ali Syakur, Bain Khusnul Khotimah, Eka Mala Sari Rochman, Budi Dwi Satoto
- 01 Apr 2018
TL;DR: The researchers combine the K-Means method with the elbow method to make k-means more efficient and effective when processing large amounts of data.
1K
A machine learning approach to cluster destination image on Instagram
TL;DR: This study constructed a novel methodological framework by evaluating different machine learning models to group textual information based on pictorial content to uncover the destination image based on Instagram photographs.
81
The K-Means Algorithm Evolution
Joaquín Pérez-Ortega, Nelva Nely Almanza-Ortega, Andrea Vega-Villalobos, Rodolfo A. Pazos-Rangel, Crispín Zavala-Díaz, Alicia Martínez-Rebollar
- 03 Apr 2019
TL;DR: It is remarkable that some of the most successful algorithm variants were found and it is considered that the main improvements may inspire the development of new heuristics for K-means or other clustering algorithms.
K-Means Clustering With Natural Density Peaks for Discovering Arbitrary-Shaped Clusters.
TL;DR: Zhang et al. propose a novel K-means algorithm for identifying arbitrary-shaped clusters, called NDP-Kmeans, which defines a neighbor-based distance between natural density peaks (NDPs) and uses that distance to compute the graph distance between them.
29
References
Some methods for classification and analysis of multivariate observations
James B. MacQueen
- 01 Jan 1967
TL;DR: The k-means algorithm partitions an N-dimensional population into k sets on the basis of a sample. The procedure is a generalization of the ordinary sample mean and is shown to give partitions that are reasonably efficient in the sense of within-class variance.
•Proceedings Article
A density-based algorithm for discovering clusters in large spatial databases with noise
Martin Ester, Hans-Peter Kriegel, Jörg Sander, Xiaowei Xu
- 02 Aug 1996
TL;DR: This paper proposes a density-based notion of clusters designed to discover clusters of arbitrary shape. It can be used for class identification in large spatial databases and is shown to be more efficient than the well-known algorithm CLARANS.
20.3K
•Proceedings Article
A density-based algorithm for discovering clusters in large spatial Databases with Noise
Martin Ester, Hans-Peter Kriegel, Jörg Sander, Xiaowei Xu
- 01 Jan 1996
TL;DR: The paper presents DBSCAN, a new clustering algorithm that relies on a density-based notion of clusters and is designed to discover clusters of arbitrary shape. It requires only one input parameter and supports the user in determining an appropriate value for it.
•Book
Vector Quantization and Signal Compression
Allen Gersho, Robert M. Gray
- 01 Jan 1991
TL;DR: The author explains the design and implementation of the Levinson-Durbin Algorithm, which automates the very labor-intensive and therefore time-heavy and expensive process of designing and implementing a Quantizer.
8K