Background rareness-based iterative multiple sequence alignment algorithm for regulatory element detection.

doi:10.1093/BIOINFORMATICS/BTG266

Open Access•Journal Article

Chandrasegaran Narasimhan, +2 more

- 12 Oct 2003

- Bioinformatics

- Vol. 19, Iss: 15, pp 1952-1963

22

TL;DR: This work presents a new deterministic iterative algorithm for regulatory element detection based on a Markov chain background that alleviates the drawbacks of MAP (maximum a posteriori log likelihood) scores.

Abstract:

Motivation: Experimental methods capable of generating sets of co-regulated genes have become commonplace; however, recognizing the regulatory motifs responsible for this regulation remains difficult. As a result, computational detection of transcription factor binding sites in such data sets has been an active area of research. Most approaches have used either Gibbs sampling or greedy strategies to identify such elements in sets of sequences. These existing methods have varying degrees of success depending on the strength and length of the signals and the number of available sequences. We present a new deterministic iterative algorithm for regulatory element detection based on a Markov chain background. As in other methods, sequences in the entire genome and the training set are taken into account in order to discriminate against commonly occurring signals and produce patterns that are significant in the training set.

Results: The results of the algorithm compare favorably with existing tools on previously known and newly compiled data sets. The iteration-based search appears rather rigorous, not only finding the binding sites but also showing how the binding site stands out from the genomic background. The approach used to score the results is critical, and a discussion of various scoring schemes and options is also presented. Benchmarking of several methods shows that while most tools are good at detecting strong signals, Gibbs sampling algorithms give inconsistent results when the regulatory element signal becomes weak. A Markov chain-based background model alleviates the drawbacks of MAP (maximum a posteriori log likelihood) scores.

Availability: Available on request from the authors.

Contact: uberbacherec@ornl.gov

Supplementary information: Data and the results presented in this paper are available on the web at http://compbio.ornl.gov/mira/index.html
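To make the idea of a Markov chain background more concrete, the following is a minimal Python sketch, not the authors' algorithm. It assumes a first-order chain with add-one pseudocounts, and the function names `train_markov_background` and `background_log_prob` are hypothetical. It only illustrates how a candidate site that is rare under the background model receives a lower log-probability than a commonly occurring one.

```python
import math

ALPHABET = "ACGT"


def train_markov_background(sequences, pseudocount=1.0):
    """Estimate a first-order Markov chain from background DNA.

    Returns (init, trans): init[b] is the probability of base b,
    trans[a][b] is the probability of b following a.
    Pseudocounts are an assumption made here to avoid log(0).
    """
    base_counts = {b: pseudocount for b in ALPHABET}
    pair_counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for seq in sequences:
        seq = seq.upper()
        for i, base in enumerate(seq):
            if base not in base_counts:
                continue  # skip ambiguous symbols such as N
            base_counts[base] += 1
            if i > 0 and seq[i - 1] in pair_counts:
                pair_counts[seq[i - 1]][base] += 1
    total = sum(base_counts.values())
    init = {b: c / total for b, c in base_counts.items()}
    trans = {}
    for a, row in pair_counts.items():
        row_total = sum(row.values())
        trans[a] = {b: c / row_total for b, c in row.items()}
    return init, trans


def background_log_prob(site, init, trans):
    """Log-probability of a candidate site under the background chain."""
    site = site.upper()
    logp = math.log(init[site[0]])
    for prev, cur in zip(site, site[1:]):
        logp += math.log(trans[prev][cur])
    return logp


if __name__ == "__main__":
    # Toy AT-rich "genome"; real use would train on genomic sequence.
    background = ["ATATATATATATATATATAT", "TATATATATA"]
    init, trans = train_markov_background(background)
    for candidate in ("ATAT", "GCGC"):
        print(candidate, round(background_log_prob(candidate, init, trans), 3))
```

In this toy setting the AT-rich candidate scores as common and the GC-rich candidate as rare, which is the kind of contrast a background-aware score can use to discount signals that occur frequently throughout the genome.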

Citations

•Journal Article•10.1093/NAR/GKH465

Weeder Web: discovery of transcription factor binding sites in a set of sequences from co-regulated genes

Giulio Pavesi, +3 more

- 01 Jul 2004

- Nucleic Acids Research

TL;DR: Weeder Web is a web interface to Weeder, an algorithm for the automatic discovery of conserved motifs in a set of related regulatory DNA sequences, likely to be instances of binding sites for some transcription factor.

551

•Journal Article•10.1186/GB-2009-10-10-R108

Genomic DNA k-mer spectra: models and modalities

Benny Chor, +4 more

- 08 Oct 2009

- Genome Biology

TL;DR: Multimodal spectra are characterized by specific ranges of values of C+G content and of CpG dinucleotide suppression, a range that encompasses all tetrapods analyzed, and are found to capture low-order Markov models fairly well.

252

•Journal Article•10.1093/BIB/BBS016

Motif discovery and transcription factor binding sites before and after the next-generation sequencing era

Federico Zambelli, +2 more

- 01 Mar 2013

- Briefings in Bioinformatics

TL;DR: ChIP, applied to transcription factors and coupled with genome tiling arrays or next-generation sequencing technologies (ChIP-Seq) has opened new avenues in research, as well as posed new challenges to bioinformaticians developing algorithms and methods for motif discovery.

165

•Journal Article•10.1093/BIB/5.3.217

In silico representation and discovery of transcription factor binding sites

Giulio Pavesi, +2 more

- 01 Sep 2004

- Briefings in Bioinformatics

TL;DR: A survey of existing methods proposed for the identification of transcription factor binding sites in the regulatory regions of co-expressed genes, focusing both on the ideas underlying them and their availability to the scientific community is provided.

107

•Journal Article•10.1093/NAR/GKL285

MoD Tools: regulatory motif discovery in nucleotide sequences from co-regulated or homologous genes

Giulio Pavesi, +5 more

- 01 Jul 2006

- Nucleic Acids Research

TL;DR: The MoD (MOtif Discovery) Tools web server comprises a set of tools for the discovery of novel conserved sequence and structure motifs in nucleotide sequences, motifs that in turn are good candidates for regulatory activity.

87

References

•Journal Article•10.1214/AOMS/1177729694

On Information and Sufficiency

Solomon Kullback, +1 more

- 01 Mar 1951

- Annals of Mathematical Statistics

19.8K

•Posted Content

On Information and Sufficiency

Huaiyu Zhu

- 01 Feb 1997

- Research Papers in Economics

TL;DR: The information deviation between any two finite measures cannot be increased by any statistical operations (Markov morphisms) and is invariant if and only if the morphism is sufficient for these two measures.

7.3K

•Proceedings Article

Fitting a mixture model by expectation maximization to discover motifs in biopolymers.

Timothy L. Bailey, +1 more

- 01 Jan 1994

TL;DR: The algorithm described in this paper discovers one or more motifs in a collection of DNA or protein sequences by using the technique of expectation maximization to fit a two-component finite mixture model to the set of sequences.

5.5K

•Journal Article•10.1091/MBC.9.12.3273

Comprehensive Identification of Cell Cycle–regulated Genes of the Yeast Saccharomyces cerevisiae by Microarray Hybridization

Paul T. Spellman, +10 more

- 01 Dec 1998

- Molecular Biology of the Cell

TL;DR: A comprehensive catalog of yeast genes whose transcript levels vary periodically within the cell cycle is created, and it is found that the mRNA levels of more than half of these 800 genes respond to one or both of these cyclins.

5.4K

•Journal Article•10.1126/SCIENCE.278.5338.680

Exploring the Metabolic and Genetic Control of Gene Expression on a Genomic Scale

Joseph L. DeRisi, +2 more

- 24 Oct 1997

- Science

TL;DR: DNA microarrays containing virtually every gene of Saccharomyces cerevisiae were used to carry out a comprehensive investigation of the temporal program of gene expression accompanying the metabolic shift from fermentation to respiration, and the expression patterns of many previously uncharacterized genes provided clues to their possible functions.

4.9K