A computationally efficient modular optimal discovery procedure

doi:10.1093/BIOINFORMATICS/BTQ701

Open Access, Journal Article

Sangsoon Woo, +2 more

- 01 Feb 2011

- Bioinformatics

- Vol. 27, Iss: 4, pp 509-515

TL;DR: Proposes a new estimate of the optimal discovery procedure (ODP), the modular ODP (mODP), which is relatively insensitive to the choice of the number of modules but dramatically reduces the computational complexity from quadratic to linear in the number of genes.

Abstract:

Motivation: It is well known that patterns of differential gene expression across biological conditions are often shared by many genes, particularly those within functional groups. Taking advantage of these patterns can lead to increased statistical power and biological clarity when testing for differential expression in a microarray experiment. The optimal discovery procedure (ODP), which maximizes the expected number of true positives for each fixed number of expected false positives, is a framework aimed at this goal. Storey et al. introduced an estimator of the ODP for identifying differentially expressed genes. However, the computational time of their ODP estimator grows quadratically with the number of genes. Reducing this computational burden is a key step in making the ODP practical for use in a variety of high-throughput problems.

Results: Here, we propose a new estimate of the ODP called the modular ODP (mODP). The existing ‘full ODP’ requires that the likelihood function for each gene be evaluated according to the parameter estimates for all genes. The mODP assigns genes to modules according to a Kullback–Leibler distance, and then evaluates the statistic only at the module-averaged parameter estimates. We show that the mODP is relatively insensitive to the choice of the number of modules, but dramatically reduces the computational complexity from quadratic to linear in the number of genes. We compare the full ODP algorithm and mODP on simulated data and gene expression data from a recent study of Moroccan Amazighs. The mODP and full ODP algorithm perform very similarly across a range of comparisons.

Availability: The mODP methodology has been implemented into EDGE, a comprehensive gene expression analysis software package in R, available at http://genomine.org/edge/. Contact: jstorey@princeton.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
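To make the quadratic-versus-linear claim concrete, here is a minimal NumPy sketch contrasting a full ODP-style statistic with a modular one. It is not the EDGE implementation and not necessarily the exact estimator of the paper: it assumes a two-group design with independent normal measurements, per-gene maximum-likelihood estimates, a statistic of the form log Σ_i w_i f_alt,i(x_j) − log Σ_i w_i f_null,i(x_j), and a simple Lloyd-style clustering of the fitted alternative models under a symmetric Kullback–Leibler divergence. All function names (fit_params, kl_modules, module_average, full_odp, modular_odp) are hypothetical.

```python
# Hypothetical sketch of full ODP vs. modular ODP; NOT the EDGE/mODP reference code.
# Assumptions: two groups, iid normal data per gene, per-gene MLEs, equal weights.
import numpy as np

LOG2PI = np.log(2.0 * np.pi)


def sq_dists(a, b):
    """Squared Euclidean distances between rows of a (p, n) and rows of b (q, n)."""
    d = (a ** 2).sum(1)[:, None] - 2.0 * a @ b.T + (b ** 2).sum(1)[None, :]
    return np.maximum(d, 0.0)  # clip tiny negative round-off


def fit_params(x, groups):
    """Per-gene MLEs: alternative (one mean per group) and null (one common mean)."""
    mu_alt = np.empty_like(x)
    for g in np.unique(groups):
        cols = groups == g
        mu_alt[:, cols] = x[:, cols].mean(axis=1, keepdims=True)
    mu_null = np.broadcast_to(x.mean(axis=1, keepdims=True), x.shape).copy()
    sd_alt = np.sqrt(((x - mu_alt) ** 2).mean(axis=1))
    sd_null = np.sqrt(((x - mu_null) ** 2).mean(axis=1))
    return mu_alt, sd_alt, mu_null, sd_null


def log_lik(x, mu, sd):
    """L[j, i] = log-likelihood of gene j's data under parameter set i."""
    n = x.shape[1]
    return (-n * np.log(sd)[None, :] - 0.5 * n * LOG2PI
            - sq_dists(x, mu) / (2.0 * sd ** 2)[None, :])


def odp_stat(x, mu_alt, sd_alt, mu_null, sd_null, weights):
    """log sum_i w_i f_alt,i(x_j) - log sum_i w_i f_null,i(x_j), computed in log space."""
    logw = np.log(weights)[None, :]
    num = np.logaddexp.reduce(log_lik(x, mu_alt, sd_alt) + logw, axis=1)
    den = np.logaddexp.reduce(log_lik(x, mu_null, sd_null) + logw, axis=1)
    return num - den


def full_odp(x, groups):
    """Full version: each gene is evaluated at all m genes' estimates (m x m matrix)."""
    params = fit_params(x, groups)
    return odp_stat(x, *params, weights=np.ones(x.shape[0]))


def kl_modules(mu, sd, n_modules, n_iter=20, seed=0):
    """Lloyd-style clustering of per-gene normal models by symmetric KL divergence."""
    rng = np.random.default_rng(seed)
    m, n = mu.shape
    idx = rng.choice(m, size=n_modules, replace=False)
    c_mu, c_sd = mu[idx].copy(), sd[idx].copy()
    labels = np.zeros(m, dtype=int)
    for _ in range(n_iter):
        d = sq_dists(mu, c_mu)  # (m, K)
        s2, c2 = sd[:, None] ** 2, c_sd[None, :] ** 2
        # 0.5 * [KL(gene||centre) + KL(centre||gene)]; the log-variance terms cancel
        skl = 0.5 * ((n * s2 + d) / (2 * c2) + (n * c2 + d) / (2 * s2) - n)
        labels = skl.argmin(axis=1)
        for k in range(n_modules):
            members = labels == k
            if members.any():  # keep the old centre if a module empties out
                c_mu[k] = mu[members].mean(axis=0)
                c_sd[k] = np.sqrt((sd[members] ** 2).mean())
    return labels


def module_average(params, labels):
    """Average estimates within each module; returns averaged params and module sizes."""
    mu_alt, sd_alt, mu_null, sd_null = params
    _, inv, sizes = np.unique(labels, return_inverse=True, return_counts=True)
    k = sizes.size

    def avg(a):
        out = np.zeros((k,) + a.shape[1:])
        np.add.at(out, inv, a)  # sum rows belonging to each module
        return out / sizes.reshape((k,) + (1,) * (a.ndim - 1))

    return (avg(mu_alt), np.sqrt(avg(sd_alt ** 2)),
            avg(mu_null), np.sqrt(avg(sd_null ** 2)), sizes)


def modular_odp(x, groups, n_modules, seed=0):
    """Modular version: each gene is evaluated only at K module averages (m x K matrix)."""
    params = fit_params(x, groups)
    labels = kl_modules(params[0], params[1], n_modules, seed=seed)
    *avg, sizes = module_average(params, labels)
    return odp_stat(x, *avg, weights=sizes.astype(float))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    m, n = 500, 10
    groups = np.repeat([0, 1], n // 2)
    x = rng.normal(size=(m, n))
    x[:50, groups == 1] += 2.0  # simulated: first 50 genes shifted in group 1
    full, mod = full_odp(x, groups), modular_odp(x, groups, n_modules=10)
    print("correlation(full, modular):", np.corrcoef(full, mod)[0, 1])
```

The full statistic builds an m × m log-likelihood matrix while the modular one builds an m × K matrix, which is where the reduction from quadratic to linear cost in m (for fixed K) comes from in this sketch. Putting every gene in its own module reproduces the full statistic, a convenient sanity check for the averaging step.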

Citations

Journal Article, doi:10.1038/NRG3244

Studying and modelling dynamic biological processes using time-series gene expression data

Ziv Bar-Joseph, +2 more

- 01 Aug 2012

- Nature Reviews Genetics

TL;DR: Discusses the basic patterns observed in time-series experiments, how these patterns combine to form expression programs, and the computational analysis, visualization and integration of these data to infer models of dynamic biological systems.

Journal Article, doi:10.1128/MBIO.00338-12

Toward an Understanding of Changes in Diversity Associated with Fecal Microbiome Transplantation Based on 16S rRNA Gene Deep Sequencing

Dea Shahinas, +10 more

- 01 Nov 2012

- mBio

TL;DR: Deep sequencing of the 16S rRNA gene is used to explore the microbial diversity of pre- and post-transplant stool specimens from patients with Clostridium difficile infection (CDI) who underwent fecal transplantation (FT).

Journal Article, doi:10.1073/PNAS.1714813115

Mistimed food intake and sleep alters 24-hour time-of-day patterns of the human plasma proteome

Christopher M. Depner, +4 more

- 05 Jun 2018

- Proceedings of the National Academy of S...

TL;DR: The circadian clock, the behavioral wake–sleep/food intake–fasting cycle, and interactions between these processes regulate 24-h time-of-day patterns of human plasma proteins and help identify mechanisms of circadian misalignment that may contribute to metabolic dysregulation.

Journal Article, doi:10.1021/AC504472S

Tile-Based Fisher Ratio Analysis of Comprehensive Two-Dimensional Gas Chromatography Time-of-Flight Mass Spectrometry (GC × GC–TOFMS) Data Using a Null Distribution Approach

Brendon A. Parsons, +5 more

- 26 Mar 2015

- Analytical Chemistry

TL;DR: Tile-based F-ratio analysis is applied to diesel fuel spiked with four non-native analytes at several concentrations ranging from 0 to 100 ppm; the spiked analytes are found at concentrations as low as ∼1 to ∼10 ppm, depending on the degree of mass spectral selectivity and 2D chromatographic resolution, with minimal occurrence of false positives.

Journal Article, doi:10.1186/S12864-015-1475-7

The transcriptome of a complete episode of acute otitis media

Michelle L. Hernandez, +10 more

- 03 Apr 2015

- BMC Genomics

TL;DR: The results characterize the global gene response during otitis media and identify key signaling and transcription factor networks that control the defense of the middle ear against infection.

References

Journal Article, doi:10.1214/AOMS/1177729694

On Information and Sufficiency

Solomon Kullback, +1 more

- 01 Mar 1951

- Annals of Mathematical Statistics

Journal Article, doi:10.2202/1544-6115.1027

Linear Models and Empirical Bayes Methods for Assessing Differential Expression in Microarray Experiments

Gordon K. Smyth

- 12 Feb 2004

- Statistical Applications in Genetics and...

TL;DR: The hierarchical model of Lönnstedt and Speed (2002) is developed into a practical approach for general microarray experiments with arbitrary numbers of treatments and RNA samples, and the moderated t-statistic is shown to follow a t-distribution with augmented degrees of freedom.

Journal Article, doi:10.1073/PNAS.091062498

Significance analysis of microarrays applied to the ionizing radiation response

Virginia Goss Tusher, +2 more

- 24 Apr 2001

- Proceedings of the National Academy of S...

TL;DR: Describes a method that assigns a score to each gene on the basis of its change in expression relative to the standard deviation of repeated measurements; applied to the ionizing radiation response, the results suggest that a repair pathway for UV-damaged DNA might play a previously unrecognized role in repairing DNA damaged by ionizing radiation.

Journal Article, doi:10.1073/PNAS.1530509100

Statistical significance for genomewide studies

John D. Storey, +1 more

- 05 Aug 2003

- Proceedings of the National Academy of S...

TL;DR: This work proposes an approach to measuring statistical significance in genomewide studies based on the concept of the false discovery rate, which offers a sensible balance between the number of true and false positives that is automatically calibrated and easily interpreted.

Journal Article, doi:10.2202/1544-6115.1128

A General Framework for Weighted Gene Co-Expression Network Analysis

Bin Zhang, +1 more

- 12 Aug 2005

- Statistical Applications in Genetics and...

TL;DR: Describes a general framework for 'soft' thresholding that assigns a connection weight to each gene pair, introduces several node connectivity measures, and provides empirical evidence that they can be important for predicting the biological significance of a gene.