Variable selection for model-based clustering using the integrated complete-data likelihood
Matthieu Marbac, Mohammed Sedki
TL;DR: This article proposes a new information criterion, based on the integrated complete-data likelihood, that performs variable selection in Gaussian mixture models without any parameter estimation; parameter inference is needed only for the single selected model.
Abstract: Variable selection in cluster analysis is important yet challenging. It can be achieved by regularization methods, which trade off clustering accuracy against the number of selected variables by using a lasso-type penalty. However, the calibration of the penalty term is open to criticism. Model selection methods are an efficient alternative, yet they require the difficult optimization of an information criterion, which involves combinatorial problems. First, most of these optimization algorithms rely on a suboptimal procedure (e.g. a stepwise method). Second, the algorithms are often computationally greedy because they require multiple calls to the EM algorithm. Here we propose a new information criterion based on the integrated complete-data likelihood. It does not require any parameter estimate, and its maximization is simple and computationally efficient. The original contribution of our approach is to perform model selection without requiring any parameter estimation; parameter inference is then needed only for the single selected model. This approach is applied to variable selection in a Gaussian mixture model under a conditional independence assumption. Numerical experiments on simulated and benchmark datasets show that the proposed method often outperforms two classical approaches to variable selection.
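To make the idea of selecting variables without estimating parameters concrete, here is a minimal Python sketch, not the authors' implementation. It assumes a conjugate normal-inverse-gamma prior on each variable's mean and variance (the hyperparameters `mu0`, `kappa0`, `alpha0`, `beta0` are illustrative choices, not taken from the paper) and a fixed partition `z`, whereas the abstract describes maximizing the criterion itself. Under conditional independence the integrated complete-data likelihood factorizes over variables, so each variable can be judged on its own: it is kept when integrating the parameters out per cluster gives a higher value than integrating them out over the pooled sample. The function names are hypothetical.

```python
import math


def log_marginal_gaussian(xs, mu0=0.0, kappa0=1.0, alpha0=1.0, beta0=1.0):
    """Log marginal likelihood of a univariate Gaussian sample.

    The mean and variance are integrated out under an assumed
    normal-inverse-gamma prior, so no parameter is ever estimated.
    """
    n = len(xs)
    if n == 0:
        return 0.0  # an empty cluster contributes nothing
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)
    kappa_n = kappa0 + n
    alpha_n = alpha0 + n / 2.0
    beta_n = beta0 + 0.5 * ss + kappa0 * n * (mean - mu0) ** 2 / (2.0 * kappa_n)
    return (
        math.lgamma(alpha_n) - math.lgamma(alpha0)
        + alpha0 * math.log(beta0) - alpha_n * math.log(beta_n)
        + 0.5 * (math.log(kappa0) - math.log(kappa_n))
        - 0.5 * n * math.log(2.0 * math.pi)
    )


def select_variables(data, z):
    """Flag each variable as relevant (True) or irrelevant (False) given z.

    data is a list of rows and z a list of cluster labels, one per row.
    The term for the partition itself does not depend on which variables
    are selected, so it is omitted from the comparison.
    """
    labels = sorted(set(z))
    relevant = []
    for j in range(len(data[0])):
        column = [row[j] for row in data]
        pooled = log_marginal_gaussian(column)
        split = sum(
            log_marginal_gaussian([x for x, k in zip(column, z) if k == g])
            for g in labels
        )
        relevant.append(split > pooled)
    return relevant


# Toy data: variable 0 separates the two clusters, variable 1 does not.
base = [-1.0 + 2.0 * i / 49 for i in range(50)]
data = [[b - 5.0, b] for b in base] + [[b + 5.0, b] for b in base]
z = [0] * 50 + [1] * 50
print(select_variables(data, z))
```

In a full procedure the partition would not be given; one plausible scheme alternates between updating the partition and re-running this per-variable comparison until the criterion stops improving.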
Citations
Phenomapping of patients with heart failure with preserved ejection fraction using machine learning-based unsupervised cluster analysis
Matthew W. Segar, Kershaw V. Patel, Colby Ayers, Mujeeb A. Basit, W.H. Wilson Tang, Duwayne L Willett, Jarett D. Berry, Justin L. Grodin, Ambarish Pandey
TL;DR: This study aims to identify distinct phenotypic subgroups in a high-dimensional, mixed-data cohort of individuals with heart failure with preserved ejection fraction (HFpEF) using unsupervised clustering analysis.
233
A survey of feature selection methods for Gaussian mixture models and hidden Markov models
Stephen Adams, Peter A. Beling
TL;DR: A review of the literature on feature selection techniques specifically designed for Gaussian mixture models (GMMs) and hidden Markov models (HMMs), two common parametric latent variable models, concludes that further research into unsupervised feature selection methods for HMMs is required and that established methods for GMMs could be adapted to HMMs.
63
Adaptive servo ventilation for sleep apnoea in heart failure: the FACE study 3-month data.
Renaud Tamisier, Thibaud Damy, Sébastien Bailly, Jean-Marc Davy, Johan Verbraecken, Florent Lavergne, Alain Palot, Frederic Goutorbe, Marie-Pia d'Ortho, Jean-Louis Pépin
TL;DR: FACE, a European, multicentre, prospective, observational cohort trial, evaluated the effects of adaptive servo ventilation (ASV) therapy on morbidity and mortality in patients with systolic heart failure (HF) who have a left ventricular ejection fraction below 45% and predominant central sleep apnoea (CSA).
32
Distance-based clustering challenges for unbiased benchmarking studies.
TL;DR: This work shows that parameter optimization on datasets without distance-based clusters, algorithm selection by unsupervised quality measures on biomedical data, and benchmarking of clustering algorithms with first-order statistics, box plots, or a small number of trials are biased and often not recommended.
Development and validation of optimal phenomapping methods to estimate long-term atherosclerotic cardiovascular disease risk in patients with type 2 diabetes
Matthew W. Segar, Kershaw V. Patel, Muthiah Vaduganathan, Melissa C. Caughey, Byron C. Jaeger, Mujeeb A. Basit, Duwayne L Willett, Javed Butler, Partho P. Sengupta, Thomas J. Wang, Darren K. McGuire, Ambarish Pandey
TL;DR: In this paper, the authors evaluated four phenomapping strategies and their ability to stratify CVD risk in individuals with type 2 diabetes and to identify subgroups who may benefit from specific therapies.
16
References
Estimating the Dimension of a Model
TL;DR: In this paper, the problem of selecting one of a number of models of different dimensions is treated by finding its Bayes solution, and evaluating the leading terms of its asymptotic expansion.
Molecular classification of cancer: class discovery and class prediction by gene expression monitoring.
Todd R. Golub, Donna K. Slonim, Pablo Tamayo, Christine Huard, Michelle Gaasenbeek, Jill P. Mesirov, Hilary A. Coller, Mignon L. Loh, James R. Downing, Michael A. Caligiuri, Clara D. Bloomfield, Eric S. Lander
TL;DR: A generic approach to cancer classification based on gene expression monitoring by DNA microarrays is described and applied to human acute leukemias as a test case and suggests a general strategy for discovering and predicting cancer classes for other types of cancer, independent of previous biological knowledge.
The Bayesian Choice: From Decision Theoretic Foundations to Computational Implementation
TL;DR: This graduate-level textbook, winner of the 2004 DeGroot Prize, introduces Bayesian statistics and decision theory, covering both the basic ideas of statistical theory and more modern and advanced topics of Bayesian statistics, such as complete class theorems, the Stein effect, Bayesian model choice, hierarchical and empirical Bayes modeling, Monte Carlo integration including Gibbs sampling, and other MCMC techniques.
895
A framework for feature selection in clustering
Daniela Witten, Robert Tibshirani
TL;DR: A novel framework for sparse clustering is proposed, in which the observations are clustered using an adaptively chosen subset of the features, selected by means of a lasso-type penalty.
792