A computationally efficient modular optimal discovery procedure
TL;DR: The authors propose a new estimate of the optimal discovery procedure (ODP), called the modular ODP (mODP), which is relatively insensitive to the choice of the number of modules but dramatically reduces the computational complexity from quadratic to linear in the number of genes.
Abstract: Motivation: It is well known that patterns of differential gene expression across biological conditions are often shared by many genes, particularly those within functional groups. Taking advantage of these patterns can lead to increased statistical power and biological clarity when testing for differential expression in a microarray experiment. The optimal discovery procedure (ODP), which maximizes the expected number of true positives for each fixed number of expected false positives, is a framework aimed at this goal. Storey et al. introduced an estimator of the ODP for identifying differentially expressed genes. However, their ODP estimator grows quadratically in computational time with respect to the number of genes. Reducing this computational burden is a key step in making the ODP practical for usage in a variety of high-throughput problems.
Results: Here, we propose a new estimate of the ODP called the modular ODP (mODP). The existing ‘full ODP’ requires that the likelihood function for each gene be evaluated according to the parameter estimates for all genes. The mODP assigns genes to modules according to a Kullback–Leibler distance, and then evaluates the statistic only at the module-averaged parameter estimates. We show that the mODP is relatively insensitive to the choice of the number of modules, but dramatically reduces the computational complexity from quadratic to linear in the number of genes. We compare the full ODP algorithm and mODP on simulated data and gene expression data from a recent study of Moroccan Amazighs. The mODP and full ODP algorithm perform very similarly across a range of comparisons.
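The difference between the two estimators can be illustrated with a minimal sketch. It is not the EDGE implementation, and it rests on several assumptions that are not in the abstract: a one-sample normal model with null mean zero, equal weights on all genes in the numerator and denominator, and a simple k-means-style loop that assigns genes to modules by the Kullback–Leibler divergence between fitted normals. All function names (`fit`, `odp_stats`, `modules`, and so on) are hypothetical.

```python
import math
import random

def loglik(x, mu, var):
    """Log-likelihood of the observations x under N(mu, var)."""
    ss = sum((xi - mu) ** 2 for xi in x)
    return -0.5 * len(x) * math.log(2 * math.pi * var) - ss / (2 * var)

def logsumexp(vals, weights):
    """log(sum_k weights[k] * exp(vals[k])), computed stably."""
    top = max(vals)
    return top + math.log(sum(w * math.exp(v - top) for v, w in zip(vals, weights)))

def fit(x):
    """Per-gene estimates: (mean, variance about the mean, variance about 0)."""
    n = len(x)
    mu = sum(x) / n
    var_alt = max(sum((xi - mu) ** 2 for xi in x) / n, 1e-8)
    var_null = max(sum(xi ** 2 for xi in x) / n, 1e-8)
    return mu, var_alt, var_null

def odp_stats(data, params, weights):
    """Log of (weighted sum of alternative likelihoods) over
    (weighted sum of null likelihoods), evaluated at each parameter set."""
    out = []
    for x in data:
        alt = [loglik(x, mu, va) for mu, va, _ in params]
        null = [loglik(x, 0.0, vn) for _, _, vn in params]
        out.append(logsumexp(alt, weights) - logsumexp(null, weights))
    return out

def kl_normal(p, q):
    """KL(N(mu_p, var_p) || N(mu_q, var_q)) for univariate normals."""
    (mp, vp), (mq, vq) = p, q
    return 0.5 * (math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)

def modules(params, k, iters=10, seed=0):
    """Assumed clustering step: assign each gene to the nearest module
    centre in KL divergence, then average the parameters per module."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(params, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in params:
            j = min(range(k), key=lambda c: kl_normal(p[:2], centers[c][:2]))
            groups[j].append(p)
        for c, g in enumerate(groups):
            if g:
                centers[c] = [sum(col) / len(g) for col in zip(*g)]
    kept = [(centers[c], float(len(g))) for c, g in enumerate(groups) if g]
    return [c for c, _ in kept], [s for _, s in kept]

# Simulated data (illustrative only): 200 genes, 8 samples, 40 genes shifted.
rng = random.Random(1)
m, n, k = 200, 8, 10
data = [[rng.gauss(2.0 if i < 40 else 0.0, 1.0) for _ in range(n)] for i in range(m)]
params = [fit(x) for x in data]

full = odp_stats(data, params, [1.0] * m)   # m * m likelihood evaluations
centers, sizes = modules(params, k)
mod = odp_stats(data, centers, sizes)       # m * k likelihood evaluations
```

In this sketch the full statistic evaluates each gene's likelihood at all `m` parameter sets, while the modular version evaluates it only at the (at most) `k` module averages, weighted by module size. With `k` held fixed, the cost of the scoring step therefore grows linearly rather than quadratically in `m`.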
Availability: The mODP methodology has been implemented into EDGE, a comprehensive gene expression analysis software package in R, available at http://genomine.org/edge/.
Contact: jstorey@princeton.edu
Supplementary information: Supplementary data are available at Bioinformatics online.