Background rareness-based iterative multiple sequence alignment algorithm for regulatory element detection.
TL;DR: This work presents a new deterministic iterative algorithm for regulatory element detection based on a Markov chain background that alleviates the drawbacks of MAP (maximum a posteriori log likelihood) scores.
Abstract:
Motivation: Experimental methods capable of generating sets of co-regulated genes have become commonplace; however, recognizing the regulatory motifs responsible for this regulation remains difficult. As a result, computational detection of transcription factor binding sites in such data sets has been an active area of research. Most approaches have used either Gibbs sampling or greedy strategies to identify such elements in sets of sequences. These existing methods have varying degrees of success depending on the strength and length of the signals and the number of available sequences. We present a new deterministic iterative algorithm for regulatory element detection based on a Markov chain background. As in other methods, sequences in the entire genome and the training set are taken into account in order to discriminate against commonly occurring signals and produce patterns that are significant in the training set.
Results: The results of the algorithm compare favorably with existing tools on previously known and newly compiled data sets. The iteration-based search appears rather rigorous, not only finding the binding sites but also showing how the binding site stands out from the genomic background. The approach used to score the results is critical, and a discussion of various scoring schemes and options is also presented. Benchmarking of several methods shows that while most tools are good at detecting strong signals, Gibbs sampling algorithms give inconsistent results when the regulatory element signal becomes weak. A Markov chain-based background model alleviates the drawbacks of MAP (maximum a posteriori log likelihood) scores.
Availability: Available on request from the authors.
Contact: uberbacherec@ornl.gov
Supplementary information: Data and the results presented in this paper are available on the web at http://compbio.ornl.gov/mira/index.html
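The abstract does not state the order of the Markov chain or the exact scoring formula, so the following is only a minimal sketch of the general idea of a Markov chain background: estimate transition probabilities from background (e.g. genomic) sequence, then measure how rare a candidate word is under that model. The first-order chain, the add-one pseudocount, the uniform treatment of the leading bases, and the function names `train_markov_background` and `background_log_prob` are all assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"

def train_markov_background(sequences, order=1, pseudocount=1.0):
    """Estimate P(base | previous `order` bases) from background sequences.

    Returns a function prob(context, base). Pseudocounts keep unseen
    transitions from getting zero probability (an assumption of this sketch).
    """
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        seq = seq.upper()
        for i in range(order, len(seq)):
            context, base = seq[i - order:i], seq[i]
            # Skip windows containing ambiguous symbols such as N.
            if base in ALPHABET and all(c in ALPHABET for c in context):
                counts[context][base] += 1.0

    def prob(context, base):
        row = counts.get(context, {})
        total = sum(row.values()) + pseudocount * len(ALPHABET)
        return (row.get(base, 0.0) + pseudocount) / total

    return prob

def background_log_prob(word, prob, order=1):
    """Log-probability of `word` under the background model.

    The first `order` bases are scored as uniform (0.25 each) for simplicity.
    A lower value means the word is rarer in the background.
    """
    word = word.upper()
    logp = min(order, len(word)) * math.log(0.25)
    for i in range(order, len(word)):
        logp += math.log(prob(word[i - order:i], word[i]))
    return logp

# Toy background dominated by AT repeats (made-up data).
prob = train_markov_background(["AT" * 50], order=1)
common = background_log_prob("ATAT", prob)
rare = background_log_prob("GCGC", prob)
print(f"ATAT: {common:.3f}  GCGC: {rare:.3f}")
```

In this toy setting the word that resembles the background receives the higher log-probability, so a motif scorer built on such a model would treat it as less surprising than the word the background rarely produces.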
Citations
Weeder Web: discovery of transcription factor binding sites in a set of sequences from co-regulated genes
TL;DR: Weeder Web is a web interface to Weeder, an algorithm for the automatic discovery of conserved motifs in a set of related regulatory DNA sequences, likely to be instances of binding sites for some transcription factor.
Genomic DNA k-mer spectra: models and modalities
TL;DR: Multimodal spectra are characterized by specific ranges of values of C+G content and of CpG dinucleotide suppression, a range that encompasses all tetrapods analyzed, and are found to capture low-order Markov models fairly well.
Motif discovery and transcription factor binding sites before and after the next-generation sequencing era
TL;DR: ChIP, applied to transcription factors and coupled with genome tiling arrays or next-generation sequencing technologies (ChIP-Seq) has opened new avenues in research, as well as posed new challenges to bioinformaticians developing algorithms and methods for motif discovery.
In silico representation and discovery of transcription factor binding sites
TL;DR: A survey of existing methods proposed for the identification of transcription factor binding sites in the regulatory regions of co-expressed genes, focusing both on the ideas underlying them and their availability to the scientific community is provided.
MoD Tools: regulatory motif discovery in nucleotide sequences from co-regulated or homologous genes
Giulio Pavesi, Paolo Mereghetti, Federico Zambelli, Marco Stefani, Giancarlo Mauri, Graziano Pesole
TL;DR: The MoD (MOtif Discovery) Tools web server comprises a set of tools for the discovery of novel conserved sequence and structure motifs in nucleotide sequences, motifs that in turn are good candidates for regulatory activity.
References
On Information and Sufficiency
TL;DR: The information deviation between any two finite measures cannot be increased by any statistical operations (Markov morphisms) and is invariant if and only if the morphism is sufficient for these two measures.
Fitting a mixture model by expectation maximization to discover motifs in biopolymers.
Timothy L. Bailey, Charles Elkan (proceedings article, 01 Jan 1994)
TL;DR: The algorithm described in this paper discovers one or more motifs in a collection of DNA or protein sequences by using the technique of expectation maximization to fit a two-component finite mixture model to the set of sequences.
Comprehensive Identification of Cell Cycle–regulated Genes of the Yeast Saccharomyces cerevisiae by Microarray Hybridization
Paul T. Spellman, Gavin Sherlock, Michael Q. Zhang, Vishwanath R. Iyer, Kirk R. Anders, Michael B. Eisen, Patrick O. Brown, David Botstein, Bruce Futcher
TL;DR: A comprehensive catalog of yeast genes whose transcript levels vary periodically within the cell cycle is created, and it is found that the mRNA levels of more than half of these 800 genes respond to one or both of these cyclins.
Exploring the Metabolic and Genetic Control of Gene Expression on a Genomic Scale
TL;DR: DNA microarrays containing virtually every gene of Saccharomyces cerevisiae were used to carry out a comprehensive investigation of the temporal program of gene expression accompanying the metabolic shift from fermentation to respiration, and the expression patterns of many previously uncharacterized genes provided clues to their possible functions.