A benchmark for statistical microarray data analysis that preserves actual biological and technical variance.
Benoit De Hertogh, Bertrand De Meulder, Fabrice Berger, Michael Pierre, Eric Bareke, Anthoula Gaigneaux, Eric Depiereux
TL;DR: A novel method ranks the probesets from a dataset composed of publicly available biological microarray data and extracts subset matrices with precise information/noise ratios, which can be used to determine how well different statistical methods estimate variance for a given number of replicates.
Abstract: Recent reanalysis of spike-in datasets underscored the need for new and more accurate benchmark datasets for statistical microarray analysis. We present here a new method that uses biologically relevant data to evaluate the performance of statistical methods. Our method ranks the probesets from a dataset composed of publicly available biological microarray data and extracts subset matrices with precise information/noise ratios. It can be used to determine how well different statistical methods estimate variance for a given number of replicates. The mean-variance and mean-fold change relationships of the matrices revealed a closer approximation of biological reality. Performance analysis refined the results of previously published benchmarks. We show that the Shrinkage t test (close to Limma) was the best of the methods tested, except when two replicates were examined, where the Regularized t test and the Window t test performed slightly better. The R scripts used for the analysis are available at http://urbm-cluster.urbm.fundp.ac.be/~bdemeulder/.
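The abstract does not spell out how methods are scored on the extracted matrices. As a rough illustration of the general idea of benchmarking a test statistic on a matrix whose informative probesets are known in advance, the sketch below ranks probesets by the absolute value of an ordinary pooled-variance two-sample t statistic and reports the fraction of known informative probesets among the top N. The matrix layout (probesets in rows, replicates of each condition in columns), the simulated data, the function names `two_sample_t` and `precision_at_n`, and the choice of statistic are all hypothetical; this is not the authors' R code or their exact performance measure.

```python
import numpy as np


def two_sample_t(x, y):
    """Row-wise pooled-variance two-sample t statistic.

    x, y: arrays of shape (n_probesets, n_replicates) for the two conditions.
    """
    n1, n2 = x.shape[1], y.shape[1]
    diff = x.mean(axis=1) - y.mean(axis=1)
    # Pooled within-condition variance, one value per probeset
    pooled = ((n1 - 1) * x.var(axis=1, ddof=1)
              + (n2 - 1) * y.var(axis=1, ddof=1)) / (n1 + n2 - 2)
    return diff / np.sqrt(pooled * (1.0 / n1 + 1.0 / n2))


def precision_at_n(scores, is_informative, n):
    """Fraction of the n top-ranked probesets (by |score|) that are truly informative."""
    top = np.argsort(-np.abs(scores))[:n]
    return float(np.mean(is_informative[top]))


# Hypothetical toy matrix: 1000 probesets, 3 replicates per condition,
# of which the first 50 carry a true shift between conditions.
rng = np.random.default_rng(0)
n_probesets, n_rep, n_informative = 1000, 3, 50
x = rng.normal(0.0, 1.0, size=(n_probesets, n_rep))
y = rng.normal(0.0, 1.0, size=(n_probesets, n_rep))
y[:n_informative] += 2.0
is_informative = np.zeros(n_probesets, dtype=bool)
is_informative[:n_informative] = True

t = two_sample_t(x, y)
print(precision_at_n(t, is_informative, n_informative))
```

Swapping `two_sample_t` for another statistic and repeating the evaluation for different numbers of replicates is one way such a comparison could be organized.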
Citations
How many biological replicates are needed in an RNA-seq experiment and which differential expression tool should you use?
Nicholas J. Schurch, Pieta Schofield, Marek Gierlinski, Christian Cole, Alexander Sherstnev, Vijender Singh, Nicola Wrobel, Karim Gharbi, Gordon G. Simpson, Tom Owen-Hughes, Mark Blaxter, Geoffrey J. Barton
TL;DR: For future RNA-seq experiments, results suggest that at least six biological replicates should be used, rising to at least 12 when it is important to identify SDE genes for all fold changes, and if fewer than 12 replicates are used, a superior combination of true positive and false positive performances makes edgeR and DESeq2 the leading tools.
Evaluation of tools for differential gene expression analysis by RNA-seq on a 48 biological replicate experiment
Nicholas J. Schurch, Pieta Schofield, Marek Gierlinski, Christian Cole, Alexander Sherstnev, Vijender Singh, Nicola Wrobel, Karim Gharbi, Gordon G. Simpson, Tom Owen-Hughes, Mark Blaxter, Geoffrey J. Barton
TL;DR: In this article, an RNA-seq experiment with 48 biological replicates in each of 2 conditions was performed to determine the number of biological replicates required and to identify the most effective statistical analysis tools for detecting differential gene expression (DGE).
A global approach to analysis and interpretation of metabolic data for plant natural product discovery
Manhoi Hur, Alexis Ann Campbell, Marcia Almeida-De-Macedo, Ling Li, Nick Ransom, Adarsh Jose, Matthew C. Crispin, Basil J. Nikolau, Eve Syrkin Wurtele
TL;DR: A public database repository for metabolomics, tools and approaches for statistical analysis of metabolomics data, and methods for integrating these datasets with transcriptomic data to create hypotheses concerning specialized metabolisms that generate the diversity in natural product chemistry are detailed.
Benchmarking quantitative label-free LC–MS data processing workflows using a complex spiked proteomic standard dataset
Claire Ramus, Agnès Hovasse, Marlène Marcellin, Anne-Marie Hesse, Emmanuelle Mouton-Barbosa, David Bouyssié, Sebastian Vaca, Christine Carapito, Karima Chaoui, Christophe Bruley, Jérôme Garin, Sarah Cianférani, Myriam Ferro, Alain Van Dorsselaer, Odile Burlet-Schiltz, Christine Schaeffer, Yohann Couté, Anne Gonzalez de Peredo
TL;DR: A controlled standard dataset was provided and used to evaluate the performances of several label-free bioinformatics tools in different workflows, for detection of variant proteins with different absolute expression levels and fold change values and for evaluation of downstream statistical methods.
References
Linear Models and Empirical Bayes Methods for Assessing Differential Expression in Microarray Experiments
TL;DR: The hierarchical model of Lönnstedt and Speed (2002) is developed into a practical approach for general microarray experiments with arbitrary numbers of treatments and RNA samples, and the moderated t-statistic is shown to follow a t-distribution with augmented degrees of freedom.
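The moderated t-statistic summarized in this entry replaces each gene's sample variance s_g² with a posterior value that borrows strength from a prior variance s_0² carrying d_0 prior degrees of freedom: s̃_g² = (d_0 s_0² + d_g s_g²) / (d_0 + d_g), then t̃_g = β̂_g / (s̃_g √v_g) on d_0 + d_g degrees of freedom. A minimal sketch of that formula follows, assuming d_0 and s_0² are already known (limma estimates them from the variances of all genes, which is omitted here) and assuming a simple two-group comparison, where the unscaled variance factor v_g is 1/n₁ + 1/n₂. The function name `moderated_t` and the example values are hypothetical.

```python
import numpy as np


def moderated_t(beta_hat, s2, d, unscaled_var, d0, s0_sq):
    """Moderated t-statistic from an empirical-Bayes posterior variance.

    beta_hat: estimated log-fold changes, one per gene
    s2: residual sample variances s_g^2, one per gene
    d: residual degrees of freedom d_g
    unscaled_var: v_g, so that Var(beta_hat_g) = v_g * sigma_g^2
    d0, s0_sq: prior degrees of freedom and prior variance (assumed known here)
    Returns the moderated t values and their degrees of freedom d0 + d.
    """
    s2 = np.asarray(s2, dtype=float)
    # Posterior variance: degrees-of-freedom-weighted average of prior and sample variances
    s2_post = (d0 * s0_sq + d * s2) / (d0 + d)
    t = np.asarray(beta_hat, dtype=float) / np.sqrt(s2_post * unscaled_var)
    return t, d0 + d


# Two hypothetical genes with the same log-fold change; 3 vs 3 replicates give
# d = 4 residual degrees of freedom and v = 1/3 + 1/3.
t, df = moderated_t([1.0, 1.0], [0.01, 1.0], d=4, unscaled_var=2.0 / 3.0, d0=4, s0_sq=0.25)
print(t, df)  # first t is about 3.40 (ordinary t would be about 12.25); df = 8
```

With d0 = 0 the posterior variance equals the sample variance and the ordinary t-statistic is recovered; larger d0 pulls very small or very large sample variances toward s0_sq, which matters most when there are few replicates.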
Significance analysis of microarrays applied to the ionizing radiation response
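TL;DR: A method, Significance Analysis of Microarrays (SAM), is described that assigns a score to each gene on the basis of change in gene expression relative to the standard deviation of repeated measurements; applied to the transcriptional response of human cells to ionizing radiation, it identified induced nucleotide excision repair genes, suggesting that this repair pathway for UV-damaged DNA might play a previously unrecognized role in repairing DNA damaged by ionizing radiation.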
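The SAM score can be written as a relative difference d_i = (x̄_i − ȳ_i) / (s_i + s_0), where s_i is the pooled standard error of the difference in means for gene i and s_0 is a small positive constant that keeps genes with near-zero scatter from dominating the ranking. The sketch below computes that score row by row, assuming s_0 is supplied by the caller; in the original method s_0 is chosen from the data, and the permutation-based false discovery rate estimate is a separate step not shown. The function name `sam_relative_difference` and the example matrices are hypothetical.

```python
import numpy as np


def sam_relative_difference(x, y, s0):
    """SAM-style score d_i = (mean_x - mean_y) / (s_i + s0) for each row (gene).

    x, y: arrays of shape (n_genes, n_replicates) for the two states.
    s0: small positive constant added to the gene-specific scatter s_i (assumed given).
    """
    n1, n2 = x.shape[1], y.shape[1]
    diff = x.mean(axis=1) - y.mean(axis=1)
    # Gene-specific scatter: pooled standard error of the difference in means
    a = (1.0 / n1 + 1.0 / n2) / (n1 + n2 - 2)
    ss = (((x - x.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
          + ((y - y.mean(axis=1, keepdims=True)) ** 2).sum(axis=1))
    s = np.sqrt(a * ss)
    return diff / (s + s0)


# Hypothetical example: the second gene has identical replicates (zero scatter),
# so without s0 its score would involve a division by zero.
x = np.array([[1.0, 2.0, 3.0], [1.0, 1.0, 1.0]])
y = np.array([[4.0, 5.0, 6.0], [2.0, 2.0, 2.0]])
print(sam_relative_difference(x, y, s0=0.5))
```

With s0 = 0 the score reduces to the pooled-variance two-sample t statistic; a positive s0 damps the scores of genes whose apparent significance comes mostly from an unusually small standard deviation.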
A Multiple Comparison Procedure for Comparing Several Treatments with a Control
TL;DR: In this article, a multiple comparison procedure for comparing several treatments with a control is presented.
The probable error of a mean
TL;DR: In this article, any experiment is regarded as one individual of a population of experiments that might be performed under the same conditions, and a series of experiments as a sample drawn from this population.
The Analysis of Variance
A. P. Dempster, Henry Scheffé
Abstract: Originally published in 1959, this classic volume has had a major impact on generations of statisticians. Newly issued in the Wiley Classics Series, the book examines the basic theory of analysis of variance by considering several different mathematical models. Part I looks at the theory of fixed-effects models with independent observations of equal variance, while Part II begins to explore the analysis of variance in the case of other models.