Open Access
An expectation-maximization algorithm working on data summary
Huidong Jin, Kwong-Sak Leung, Man Leung Wong
- 01 Jan 2002
- pp 221-226
TL;DR: The proposed EMACF (Expectation-Maximization Algorithm on Clustering Features) algorithm explicitly employs data summary features, including weight, mean, and variance, and is proved to converge to a local maximum likelihood value.
Abstract: Scalable cluster analysis addresses the problem of processing large data sets with limited resources, e.g., memory and computation time. A data summarization or sampling procedure is an essential step of most scalable algorithms. It forms a compact representation of the data, on which traditional clustering algorithms can process large data sets efficiently. However, there is little work on how to effectively perform cluster analysis on data summaries. Based on the principle of the general expectation-maximization algorithm, we propose in this paper a model-based clustering algorithm that makes better use of these data summaries. The proposed EMACF (Expectation-Maximization Algorithm on Clustering Features) algorithm employs data summary features including weight, mean, and variance explicitly. We prove that EMACF converges to a local maximum likelihood value. The computation time of EMACF is linear in the number of data summaries rather than the number of data items, so EMACF can be integrated with any efficient data summarization procedure to construct a scalable clustering algorithm.
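To make the idea of clustering on data summaries concrete, here is a minimal Python sketch. It is not the EMACF update rule from the paper, which is not reproduced on this page. It assumes 1-D data, a hypothetical grid-based summarizer (`summarize`), and a simplified weighted EM (`em_on_summaries`) that evaluates each component density at the summary mean and uses the within-summary variance only in the M-step. All function and parameter names are illustrative.

```python
import math
import random
from collections import defaultdict


def summarize(points, cell_width):
    """Group 1-D points into grid cells; return (weight, mean, variance) per cell.

    Assumption: a fixed-width grid stands in for any summarization procedure.
    """
    cells = defaultdict(list)
    for x in points:
        cells[math.floor(x / cell_width)].append(x)
    summaries = []
    for members in cells.values():
        n = len(members)
        mean = sum(members) / n
        var = sum((x - mean) ** 2 for x in members) / n
        summaries.append((n, mean, var))
    return summaries


def gaussian_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_on_summaries(summaries, k, iters=50, min_var=1e-6):
    """Simplified weighted EM for a 1-D Gaussian mixture over data summaries.

    Each iteration loops over the summaries only, so its cost grows with the
    number of summaries, not with the number of original data items.
    """
    total = sum(n for n, _, _ in summaries)
    ordered = sorted(summaries, key=lambda s: s[1])
    # Deterministic start: spread the initial means over the sorted summary means.
    means = [ordered[(2 * j + 1) * len(ordered) // (2 * k)][1] for j in range(k)]
    grand_mean = sum(n * mu for n, mu, _ in summaries) / total
    grand_var = sum(n * (var + (mu - grand_mean) ** 2) for n, mu, var in summaries) / total
    variances = [max(grand_var, min_var)] * k
    weights = [1.0 / k] * k

    for _ in range(iters):
        # E-step (simplification): responsibilities evaluated at each summary mean.
        rows = []
        for _, mu, _ in summaries:
            dens = [weights[j] * gaussian_pdf(mu, means[j], variances[j]) for j in range(k)]
            norm = sum(dens)
            rows.append([d / norm for d in dens] if norm > 0 else [1.0 / k] * k)
        # M-step: each summary counts n times; its own variance is added back in.
        for j in range(k):
            nj = sum(n * r[j] for (n, _, _), r in zip(summaries, rows))
            if nj <= 0:
                continue  # keep the previous parameters for an empty component
            means[j] = sum(n * r[j] * mu for (n, mu, _), r in zip(summaries, rows)) / nj
            spread = sum(
                n * r[j] * (var + (mu - means[j]) ** 2)
                for (n, mu, var), r in zip(summaries, rows)
            )
            variances[j] = max(spread / nj, min_var)
            weights[j] = nj / total
    return weights, means, variances


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.gauss(-5.0, 1.0) for _ in range(2000)]
    data += [rng.gauss(5.0, 1.0) for _ in range(2000)]
    cf = summarize(data, cell_width=0.5)
    print(len(data), "points reduced to", len(cf), "summaries")
    print(em_on_summaries(cf, k=2))
```

Adding the within-summary variance back in the M-step keeps the component variances from being underestimated when many points are collapsed into a single mean; the paper's algorithm uses the variance feature explicitly, but its exact formulation should be taken from the paper itself.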
Citations
Scaling-Up Model-Based Clustering Algorithm by Working on Clustering Features
Huidong Jin, Kwong-Sak Leung, Man Leung Wong
- 12 Aug 2002
TL;DR: The experimental results show that gEMACF can generate more accurate results than other scalable clustering algorithms and can run two orders of magnitude faster than the traditional expectation-maximization algorithm with little loss of accuracy.
References
•Book
The EM algorithm and extensions
Geoffrey J. McLachlan, Thriyambakam Krishnan
- 15 Nov 1996
TL;DR: The EM Algorithm and Extensions describes the formulation of the EM algorithm, details its methodology, discusses its implementation, and illustrates applications in many statistical contexts, opening the door to the tremendous potential of this remarkably versatile statistical tool.
BIRCH: an efficient data clustering method for very large databases
Tian Zhang, Raghu Ramakrishnan, Miron Livny
- 01 Jun 1996
TL;DR: Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) is a data clustering method that is especially suitable for very large databases.
Accelerating EM for Large Databases
TL;DR: Two approaches are presented that significantly reduce the computational cost of applying the EM algorithm to databases with a large number of cases, including databases with large dimensionality.
Visualization of navigation patterns on a Web site using model-based clustering
Igor V. Cadez, David Heckerman, Christopher Meek, Padhraic Smyth, Steven White
- 01 Aug 2000
TL;DR: A new methodology for visualizing navigation patterns on a Web site that clusters users according to the order in which they request Web pages, using a mixture of first-order Markov models fitted with the Expectation-Maximization algorithm.