Missing value estimation for DNA microarray gene expression data: local least squares imputation
TL;DR: Imputation methods based on the least squares formulation are proposed to estimate missing values in gene expression data; they exploit local similarity structures in the data as well as a least squares optimization process.
Abstract: Motivation: Gene expression data often contain missing expression values. Effective missing value estimation methods are needed because many algorithms for gene expression data analysis require a complete matrix of gene array values. In this paper, imputation methods based on the least squares formulation are proposed to estimate missing values in gene expression data; they exploit local similarity structures in the data as well as a least squares optimization process.
Results: The proposed local least squares imputation method (LLSimpute) represents a target gene that has missing values as a linear combination of similar genes. The similar genes are chosen either as the k nearest neighbors or as the k coherent genes that have large absolute values of Pearson correlation coefficients. A non-parametric missing value estimation method based on LLSimpute is designed by introducing an automatic k-value estimator. In our experiments, the proposed LLSimpute method shows competitive results when compared with other imputation methods for missing value estimation across various datasets and percentages of missing values in the data.
Availability: The software is available at http://www.cs.umn.edu/~hskim/tools.html
Contact: hpark@cs.umn.edu
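The idea summarized in the abstract, representing a target gene as a linear combination of its k most similar genes, can be illustrated with a minimal sketch. This is not the authors' software: the function name `lls_impute_gene` and its parameters are hypothetical, neighbors are chosen by Euclidean distance over the target's observed columns (the Pearson-correlation variant is not shown), candidate neighbors are assumed to have no missing values, and k is passed in by hand rather than chosen by an automatic estimator.

```python
import numpy as np

def lls_impute_gene(target, complete_genes, k):
    """Fill NaN entries of one gene from its k nearest complete genes.

    Hypothetical helper for illustration only.
    target:         1-D array (one gene across arrays), NaN marks missing values
    complete_genes: 2-D array (genes x arrays) with no missing values (assumption)
    k:              number of similar genes to use (fixed by the caller here)
    """
    missing = np.isnan(target)
    observed = ~missing

    # Similarity is measured only on the columns where the target is observed.
    dists = np.linalg.norm(
        complete_genes[:, observed] - target[observed], axis=1
    )
    nearest = np.argsort(dists)[:k]

    A = complete_genes[nearest][:, observed]  # k x (number of observed columns)
    B = complete_genes[nearest][:, missing]   # k x (number of missing columns)

    # Least squares: find coefficients x minimizing ||A^T x - target[observed]||_2.
    x, *_ = np.linalg.lstsq(A.T, target[observed], rcond=None)

    # Apply the same linear combination to the neighbors' values
    # in the columns where the target is missing.
    filled = target.copy()
    filled[missing] = B.T @ x
    return filled

# Toy data: the target equals 2 * gene 0 + 1 * gene 1 in the observed columns.
genes = np.array([
    [1.0, 0.0, 0.0, 2.0],
    [0.0, 1.0, 0.0, 3.0],
    [0.0, 0.0, 1.0, 4.0],
])
target = np.array([2.0, 1.0, 0.0, np.nan])
estimate = lls_impute_gene(target, genes, k=3)
```

In this toy case the observed part of the target is reproduced exactly by the neighbors, so the missing entry is estimated as the same combination of their last-column values. With real data the fit is only approximate and the result depends on the choice of k.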