Missing value imputation for epistatic MAPs
TL;DR: It is shown that strongly alleviating interactions are significantly more difficult to predict than strongly aggravating interactions. The authors therefore suggest using symmetric nearest-neighbor-based approaches, which offer consistently accurate imputations across multiple datasets in a tractable manner.
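The nearest-neighbor idea recommended above can be illustrated with a minimal sketch. It assumes missing scores are stored as NaN, that two genes are compared by the root-mean-square difference over the partners observed for both, and that the two one-sided estimates for (i, j) and (j, i) are averaged so the imputed matrix stays symmetric. This is a hypothetical variant written for illustration, not the authors' exact method; the names symmetric_knn_impute and k are assumptions.

```python
import math
from typing import List, Optional

Matrix = List[List[float]]  # symmetric interaction scores, NaN = missing


def _rms_distance(m: Matrix, a: int, b: int) -> Optional[float]:
    """Root-mean-square difference between the profiles of genes a and b,
    using only partners observed for both genes (a and b themselves excluded)."""
    total, count = 0.0, 0
    for c in range(len(m)):
        if c in (a, b):
            continue
        x, y = m[a][c], m[b][c]
        if not (math.isnan(x) or math.isnan(y)):
            total += (x - y) ** 2
            count += 1
    return math.sqrt(total / count) if count else None


def _one_sided_estimate(m: Matrix, i: int, j: int, k: int) -> Optional[float]:
    """Average score with partner j over the k genes most similar to gene i."""
    candidates = []
    for r in range(len(m)):
        if r in (i, j) or math.isnan(m[r][j]):
            continue
        d = _rms_distance(m, i, r)
        if d is not None:
            candidates.append((d, r))
    if not candidates:
        return None
    nearest = sorted(candidates)[:k]  # ties broken by gene index
    return sum(m[r][j] for _, r in nearest) / len(nearest)


def symmetric_knn_impute(m: Matrix, k: int = 3) -> Matrix:
    """Fill missing off-diagonal entries; returns a new matrix, input unchanged."""
    n = len(m)
    out = [row[:] for row in m]
    for i in range(n):
        for j in range(i + 1, n):
            if not math.isnan(m[i][j]):
                continue
            # Estimate (i, j) from gene i's neighbours and (j, i) from gene j's
            # neighbours, then average so that the imputed matrix stays symmetric.
            estimates = [e for e in (_one_sided_estimate(m, i, j, k),
                                     _one_sided_estimate(m, j, i, k))
                         if e is not None]
            if estimates:
                out[i][j] = out[j][i] = sum(estimates) / len(estimates)
    return out
```

In this sketch every estimate is computed from the originally observed scores only (it reads from `m`, never from `out`), so imputed values do not feed back into later imputations, and the diagonal is left untouched.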
Abstract: Background
Epistatic miniarray profiling (E-MAP) is a high-throughput approach capable of quantifying aggravating or alleviating genetic interactions between gene pairs. The datasets resulting from E-MAP experiments typically take the form of a symmetric pairwise matrix of interaction scores. These datasets contain a significant number of missing values (up to 35%), which can reduce the effectiveness of some data analysis techniques and prevent the use of others. An effective method for imputing interactions would therefore increase the range of possible analyses, as well as the potential to identify novel functional interactions between gene pairs. Several methods have been developed to handle missing values in microarray data, but it is unclear how applicable they are to E-MAP data, given its pairwise nature and its significantly larger proportion of missing values. Here we evaluate four alternative imputation strategies, three local (nearest-neighbor-based) and one global (PCA-based), that have been modified to work with symmetric pairwise data.
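For contrast with the local methods, a global low-rank strategy can be sketched as iterative reconstruction: fill the gaps, approximate the whole symmetric matrix by its leading eigencomponents (for a symmetric matrix these play the role of the principal components), overwrite only the missing cells with that approximation, and repeat. This is a hypothetical illustration, not the authors' PCA-based algorithm; it assumes NaN marks missing scores, uses zero as the starting value (a neutral choice if scores are centred near zero), and finds eigenpairs by power iteration with deflation to stay dependency-free. The names symmetric_pca_impute, rank and rounds are assumptions, and any missing diagonal cells would be filled too.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]  # symmetric interaction scores, NaN = missing


def _matvec(a: Matrix, x: List[float]) -> List[float]:
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def _leading_eigenpairs(a: Matrix, rank: int,
                        iters: int = 200) -> List[Tuple[float, List[float]]]:
    """Eigenpairs of a symmetric matrix with the largest |eigenvalue|, found by
    power iteration with deflation (assumes a clear gap between eigenvalues)."""
    n = len(a)
    work = [row[:] for row in a]
    rng = random.Random(0)  # fixed seed so results are reproducible
    pairs = []
    for _ in range(rank):
        v = [rng.random() + 0.1 for _ in range(n)]
        scale = math.sqrt(sum(x * x for x in v))
        v = [x / scale for x in v]
        for _ in range(iters):
            w = _matvec(work, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(vi * wi for vi, wi in zip(v, _matvec(work, v)))  # Rayleigh quotient
        pairs.append((lam, v))
        # Deflate: remove this component before searching for the next one.
        for i in range(n):
            for j in range(n):
                work[i][j] -= lam * v[i] * v[j]
    return pairs


def symmetric_pca_impute(m: Matrix, rank: int = 1, rounds: int = 100) -> Matrix:
    """Iteratively replace missing cells with a rank-`rank` reconstruction."""
    n = len(m)
    missing = [(i, j) for i in range(n) for j in range(n) if math.isnan(m[i][j])]
    filled = [[0.0 if math.isnan(x) else x for x in row] for row in m]
    for _ in range(rounds):
        pairs = _leading_eigenpairs(filled, rank)
        # Overwrite only the missing cells; observed scores are never changed.
        for i, j in missing:
            filled[i][j] = sum(lam * v[i] * v[j] for lam, v in pairs)
    return filled
```

Because the reconstruction is built from outer products v·vᵀ, it is symmetric by construction, so no separate averaging step is needed, unlike in the local sketch.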