Journal Article, DOI: 10.1007/S11047-014-9446-5
Bayesian versus data driven model selection for microarray data
TL;DR: The results show that, although Bayesian methods give good results in some cases, they cannot compete with the data-driven methods in predicting the correct number of clusters in a dataset.
Abstract: Clustering is one of the best-known activities in scientific investigation and the object of research in many disciplines, ranging from Statistics to Computer Science. In this beautiful area, one of the most difficult challenges is a particular instance of the model selection problem, i.e., the identification of the correct number of clusters in a dataset. In what follows, for ease of reference, we still refer to that instance as model selection. It is an important part of any statistical analysis. The techniques used for solving it are mainly either Bayesian or data-driven, and both are based on internal knowledge. That is, they use information obtained by processing the input data. Although both techniques have been evaluated in the realm of microarray data analysis, their merits relative to each other have not been assessed. Here we fill this gap in the literature by comparing three Bayesian methods against several state-of-the-art data-driven model selection methods. Our results show that, although Bayesian methods give good results in some cases, they cannot compete with the data-driven methods in predicting the correct number of clusters in a dataset.
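The abstract does not name the individual methods, so the following is only a minimal sketch of what choosing the number of clusters from "internal knowledge" can look like. It assumes a generic internal index (the mean silhouette width) and a toy one-dimensional clustering rule (cutting the widest gaps between sorted values). Both are illustrative choices and are not claimed to be among the methods the paper compares. The function names and the data are hypothetical.

```python
from statistics import mean


def cut_into_clusters(values, k):
    """Split 1-D values into k groups by cutting the k-1 widest gaps (toy rule)."""
    xs = sorted(values)
    # Indices i where a cut is placed between xs[i] and xs[i + 1].
    by_gap = sorted(range(len(xs) - 1), key=lambda i: xs[i + 1] - xs[i], reverse=True)
    clusters, start = [], 0
    for i in sorted(by_gap[:k - 1]):
        clusters.append(xs[start:i + 1])
        start = i + 1
    clusters.append(xs[start:])
    return clusters


def mean_silhouette(clusters):
    """Mean silhouette width; points in singleton clusters score 0 by convention."""
    scores = []
    for ci, cluster in enumerate(clusters):
        for pi, x in enumerate(cluster):
            if len(cluster) == 1:
                scores.append(0.0)
                continue
            # a: mean distance to the other points of the same cluster.
            a = mean(abs(x - y) for pj, y in enumerate(cluster) if pj != pi)
            # b: mean distance to the nearest other cluster.
            b = min(mean(abs(x - y) for y in other)
                    for cj, other in enumerate(clusters) if cj != ci)
            denom = max(a, b)
            scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)


def select_k(values, max_k):
    """Score every candidate k >= 2 using only the input data; return the best k."""
    scores = {k: mean_silhouette(cut_into_clusters(values, k))
              for k in range(2, max_k + 1)}
    return max(scores, key=scores.get), scores


# Hypothetical toy data with three visibly separated groups.
data = [1.0, 1.2, 0.8, 1.1, 4.0, 4.3, 3.9, 4.1, 9.0, 9.2, 8.8, 9.1]
best_k, scores = select_k(data, max_k=6)
print(best_k)
```

On this toy input the index peaks at three groups. Real microarray data are high-dimensional and noisy, so neither the clustering rule nor the index should be read as a recommendation.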
Citations
PTP1B phosphatase as a novel target of oleuropein activity in MCF-7 breast cancer model.
Paulina Przychodzen, Alicja Kuban-Jankowska, Roksana Wyszkowska, Giampaolo Barone, Giosuè Lo Bosco, Fabrizio Lo Celso, Anna Kamm, Agnieszka Daca, Tomasz Kostrzewa, Magdalena Gorska-Ponikowska +10 more
TL;DR: It is evidenced that the reduced activity of phosphatase PTP1B after treatment with oleuropein is strictly correlated with decreased MCF-7 cellular viability and cell cycle arrest.
ValWorkBench: An open source Java library for cluster validation, with applications to microarray data analysis
TL;DR: The main objective of this paper is to provide the interested researcher with the full software documentation of an open source cluster validation platform having the main features of being easily extendible in a homogeneous way and of offering software components that can be readily re-used.
Analysis of Feature Selection Algorithms and a Comparative study on Heterogeneous Classifier for High Dimensional Data survey
TL;DR: The application of best feature selection techniques to improve learning algorithm predictive accuracy in microarray dataset and KDD (Knowledge Discovery and Data Mining Tools Conference) Cup 99 dataset with respective classification and feature selection algorithms.
A Novel CCT5 Missense Variant Associated with Early Onset Motor Neuropathy.
Vincenzo Antona, Federica Scalia, Elisa Giorgio, Francesca Clementina Radio, Alfredo Brusco, Massimiliano Oliveri, Giovanni Corsello, Fabrizio Lo Celso, Maria Vadalà, Everly Conway de Macario, Alberto J.L. Macario, Francesco Cappello, Mario Giuffrè +13 more
TL;DR: Noteworthy is the striking difference between the phenotypes putatively linked to mutations in the same CCT subunit but located in different structural domains, offering a unique opportunity for elucidating their distinctive roles in health and disease.