About: Conserved non-coding sequence is a research topic. Over its lifetime, 366 publications have appeared within this topic, receiving 28,959 citations.
TL;DR: A comprehensive search for conserved elements in vertebrate genomes is conducted using genome-wide multiple alignments of five vertebrate species (human, mouse, rat, chicken, and Fugu rubripes) and a two-state phylogenetic hidden Markov model (phylo-HMM).
Abstract: We have conducted a comprehensive search for conserved elements in vertebrate genomes, using genome-wide multiple alignments of five vertebrate species (human, mouse, rat, chicken, and Fugu rubripes). Parallel searches have been performed with multiple alignments of four insect species (three species of Drosophila and Anopheles gambiae), two species of Caenorhabditis, and seven species of Saccharomyces. Conserved elements were identified with a computer program called phastCons, which is based on a two-state phylogenetic hidden Markov model (phylo-HMM). PhastCons works by fitting a phylo-HMM to the data by maximum likelihood, subject to constraints designed to calibrate the model across species groups, and then predicting conserved elements based on this model. The predicted elements cover roughly 3%-8% of the human genome (depending on the details of the calibration procedure) and substantially higher fractions of the more compact Drosophila melanogaster (37%-53%), Caenorhabditis elegans (18%-37%), and Saccharomyces cerevisiae (47%-68%) genomes. From yeasts to vertebrates, in order of increasing genome size and general biological complexity, increasing fractions of conserved bases are found to lie outside of the exons of known protein-coding genes. In all groups, the most highly conserved elements (HCEs), by log-odds score, are hundreds or thousands of bases long. These elements share certain properties with ultraconserved elements, but they tend to be longer and less perfectly conserved, and they overlap genes of somewhat different functional categories. In vertebrates, HCEs are associated with the 3' UTRs of regulatory genes, stable gene deserts, and megabase-sized regions rich in moderately conserved noncoding sequences. Noncoding HCEs also show strong statistical evidence of an enrichment for RNA secondary structure.
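The two-state idea in this abstract can be illustrated with a heavily simplified sketch. This is not phastCons: a real phylo-HMM scores each alignment column with a phylogenetic substitution model and fits its parameters by maximum likelihood. The sketch below assumes instead that each column is reduced to a single boolean (identical across species or not), that the emission and transition probabilities are hand-picked illustrative values, and that segments are decoded with the Viterbi algorithm. The function name `viterbi_two_state` and all parameter values are hypothetical.

```python
import math

def viterbi_two_state(obs, p_match=(0.9, 0.3), stay=(0.95, 0.95), start=(0.5, 0.5)):
    """Decode a conserved (0) / non-conserved (1) state for each alignment column.

    obs     : sequence of bools, True if the column is identical in all species
    p_match : assumed P(column identical | state), for states 0 and 1
    stay    : assumed probability of remaining in the same state
    start   : assumed initial state probabilities
    """
    if not obs:
        return []

    def emit(state, identical):
        # Bernoulli emission: a stand-in for a phylogenetic column likelihood.
        p = p_match[state] if identical else 1.0 - p_match[state]
        return math.log(p)

    trans = [
        [math.log(stay[0]), math.log(1.0 - stay[0])],
        [math.log(1.0 - stay[1]), math.log(stay[1])],
    ]

    # Log-probability of the best path ending in each state at the first column.
    score = [math.log(start[s]) + emit(s, obs[0]) for s in (0, 1)]
    back = []  # back[t][s] = best previous state for state s at column t + 1

    for identical in obs[1:]:
        new_score, pointers = [], []
        for s in (0, 1):
            candidates = [score[r] + trans[r][s] for r in (0, 1)]
            best = 0 if candidates[0] >= candidates[1] else 1
            new_score.append(candidates[best] + emit(s, identical))
            pointers.append(best)
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

With these assumed parameters, a long run of identical columns followed by a long run of mismatched columns is split into one conserved and one non-conserved segment, while a single mismatch inside a conserved run does not break the segment. That tolerance is the practical difference between an HMM-based element and a strict 100%-identity criterion.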
TL;DR: There are 481 segments longer than 200 base pairs that are absolutely conserved between orthologous regions of the human, rat, and mouse genomes, which represent a class of genetic elements whose functions and evolutionary origins are yet to be determined, but which are more highly conserved between these species than are proteins.
Abstract: There are 481 segments longer than 200 base pairs (bp) that are absolutely conserved (100% identity with no insertions or deletions) between orthologous regions of the human, rat, and mouse genomes. Nearly all of these segments are also conserved in the chicken and dog genomes, with an average of 95 and 99% identity, respectively. Many are also significantly conserved in fish. These ultraconserved elements of the human genome are most often located either overlapping exons in genes involved in RNA processing or in introns or nearby genes involved in the regulation of transcription and development. Along with more than 5000 sequences of over 100 bp that are absolutely conserved among the three sequenced mammals, these represent a class of genetic elements whose functions and evolutionary origins are yet to be determined, but which are more highly conserved between these species than are proteins and appear to be essential for the ontogeny of mammals and other vertebrates.
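The "absolutely conserved" criterion (100% identity with no insertions or deletions) is simple to state operationally. The following is a minimal sketch, assuming the orthologous regions are already available as equal-length rows of a multiple alignment with `-` marking gaps; it is not the authors' pipeline, and the function name `ultraconserved_runs` is hypothetical. The strictly-longer-than comparison follows the abstract's wording ("longer than 200 base pairs").

```python
def ultraconserved_runs(aligned, min_len=200):
    """Return (start, end) half-open column ranges, each longer than min_len,
    in which every row carries the same base and no row has a gap.

    aligned : list of equal-length strings, one per species, '-' for gaps
    """
    if not aligned:
        return []
    width = len(aligned[0])
    if any(len(row) != width for row in aligned):
        raise ValueError("all aligned rows must have the same length")

    runs = []
    start = None
    # Iterate one column past the end so a run touching the last column is closed.
    for col in range(width + 1):
        conserved = False
        if col < width:
            bases = {row[col].upper() for row in aligned}
            conserved = len(bases) == 1 and "-" not in bases
        if conserved:
            if start is None:
                start = col
        else:
            if start is not None and col - start > min_len:
                runs.append((start, col))
            start = None
    return runs
```

For example, `ultraconserved_runs(["ACGTACGT-A", "ACGTACGTTA", "ACGAACGTTA"], min_len=2)` reports the runs at columns 0-3 and 4-8: the mismatch at column 3 and the gap at column 8 each end a run. A production version would also need to treat ambiguous bases such as `N` as non-conserved.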
TL;DR: The conserved noncoding elements identified in TFCONES represent a catalog of highly prioritized putative cis-regulatory elements of TF-encoding genes and are candidates for functional assay.
Abstract: Background
Transcription factors (TFs) regulate gene transcription and play pivotal roles in various biological processes such as development, cell cycle progression, cell differentiation and tumor suppression. Identifying cis-regulatory elements associated with TF-encoding genes is a crucial step in understanding gene regulatory networks. To this end, we have used a comparative genomics approach to identify putative cis-regulatory elements associated with TF-encoding genes in vertebrates.
TL;DR: The definitive features of these novel elements are (i) that they include site‐specific integration functions (the integrase and the insertion site); (ii) that they are able to acquire various gene units and act as an expression cassette by supplying the promoter for the inserted genes.
Abstract: A family of novel mobile DNA elements is described, examples of which are found at several independent locations and encode a variety of antibiotic resistance genes. The complete elements consist of two conserved segments separated by a segment of variable length and sequence which includes inserted antibiotic resistance genes. The conserved segment located 3' to the inserted resistance genes was sequenced from Tn21 and R46, and the sequences are identical over a region of 2026 bases, which includes the sulphonamide resistance gene sulI, and two further open reading frames of unknown function. The complete sequences of both the 3' and 5' conserved regions of the DNA element have been determined. A 59-base sequence element, found at the junctions of inserted DNA sequences and the conserved 3' segment, is also present at this location in the R46 sequence. A copy of one half of this 59-base element is found at the end of the sulI gene, suggesting that sulI, though part of the conserved region, was also originally inserted into an ancestral element by site-specific integration. Inverted or direct terminal repeats or short target site duplications, both of which are characteristics of class I and class II transposons, are not found at the outer boundaries of the elements described here. Furthermore, the conserved regions do not encode any proteins related to known transposition proteins, except the DNA integrase encoded by the 5' conserved region which is implicated in the gene insertion process. Mobilization of this element has not been observed experimentally; mobility is implied from the identification of the element in at least four independent locations, in Tn21, R46 (IncN), R388 (IncW) and Tn1696.
The definitive features of these novel elements are (i) that they include site-specific integration functions (the integrase and the insertion site); (ii) that they are able to acquire various gene units and act as an expression cassette by supplying the promoter for the inserted genes. As a consequence of acquiring different inserted genes, the element exists in a variety of forms which differ in the number and nature of the inserted genes. This family of elements appears formally distinct from other known mobile DNA elements and we propose the name DNA integration elements, or integrons.
TL;DR: BioProspector, a C program using a Gibbs sampling strategy, examines the upstream region of genes in the same gene expression pattern group and looks for regulatory sequence motifs, showing preliminary success in finding the binding motifs for Saccharomyces cerevisiae RAP1, Bacillus subtilis RNA polymerase, and Escherichia coli CRP.
Abstract: The development of genome sequencing and DNA microarray analysis of gene expression gives rise to the demand for data-mining tools. BioProspector, a C program using a Gibbs sampling strategy, examines the upstream region of genes in the same gene expression pattern group and looks for regulatory sequence motifs. BioProspector uses zero to third-order Markov background models whose parameters are either given by the user or estimated from a specified sequence file. The significance of each motif found is judged based on a motif score distribution estimated by a Monte Carlo method. In addition, BioProspector modifies the motif model used in the earlier Gibbs samplers to allow for the modeling of gapped motifs and motifs with palindromic patterns. All these modifications greatly improve the performance of the program. Although testing and development are still in progress, the program has shown preliminary success in finding the binding motifs for Saccharomyces cerevisiae RAP1, Bacillus subtilis RNA polymerase, and Escherichia coli CRP. We are currently working on combining BioProspector with a clustering program to explore gene expression networks and regulatory mechanisms.
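The background models mentioned in this abstract (zero- to third-order Markov models estimated from a sequence file) can be sketched as follows. This is an illustration of a k-th order Markov background only, not BioProspector's C implementation and not its Gibbs sampler. The function names `train_background` and `log_prob`, the add-one pseudocount, and the choice to skip the first `order` bases when scoring are all assumptions of this sketch.

```python
import math
from collections import defaultdict

def train_background(seqs, order, alphabet="ACGT", pseudo=1.0):
    """Estimate P(base | previous `order` bases) from sequences.

    Returns a function prob(context, base). Pseudocount smoothing (an
    assumption here) keeps unseen contexts from producing zero probabilities.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for seq in seqs:
        seq = seq.upper()
        for i in range(order, len(seq)):
            context, base = seq[i - order:i], seq[i]
            # Skip positions whose window contains a non-alphabet symbol such as N.
            if base in alphabet and all(c in alphabet for c in context):
                counts[context][base] += 1.0

    def prob(context, base):
        row = counts.get(context, {})
        total = sum(row.values()) + pseudo * len(alphabet)
        return (row.get(base, 0.0) + pseudo) / total

    return prob

def log_prob(seq, prob, order):
    """Log-probability of seq under the background model.

    The first `order` bases have no full context and are skipped in this sketch.
    """
    seq = seq.upper()
    return sum(math.log(prob(seq[i - order:i], seq[i])) for i in range(order, len(seq)))
```

A motif finder can compare the likelihood of a candidate site under its motif model with `log_prob` under the background; higher-order backgrounds capture local composition such as dinucleotide bias, at the cost of needing more training sequence per context.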