TL;DR: The nucleotide sequences of all coding regions and a significant part of the flanking regions of the chicken c-src gene, a cellular homolog of the v-src gene of Rous sarcoma virus, are determined. It is suggested that the c-src sequence was captured by a virus through recombination on both sides of the c-src gene, and that the recombinations occurred at the level of proviral DNA.
TL;DR: A computational technique is presented that discovers large conserved gene motifs covering all the samples and classes in the data and constructs XMOTIFs that distinguish between the various classes.
Abstract: We propose a representation for gene expression data called conserved gene expression motifs or XMOTIFs. A gene's expression level is conserved across a set of samples if the gene is expressed with the same abundance in all the samples. A conserved gene expression motif is a subset of genes that is simultaneously conserved across a subset of samples. We present a computational technique to discover large conserved gene motifs that cover all the samples and classes in the data. When applied to published data sets representing different cancers or disease outcomes, our algorithm constructs XMOTIFs that distinguish between the various classes.
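The definition of a conserved motif can be made concrete with a small sketch. This is not the authors' discovery algorithm. It only checks the conservation property itself, and it assumes that expression levels have already been discretized into states such as "up" and "down" (the abstract speaks only of "the same abundance"). The function names and the toy data are hypothetical.

```python
from typing import Dict, List, Sequence, Set


def conserved_genes(states: Dict[str, Sequence[str]],
                    samples: Sequence[int]) -> Set[str]:
    """Return genes whose discretized state is identical in every chosen sample."""
    conserved = set()
    for gene, row in states.items():
        observed = {row[s] for s in samples}
        if len(observed) == 1:  # exactly one state seen across the samples
            conserved.add(gene)
    return conserved


def matching_samples(states: Dict[str, Sequence[str]],
                     genes: Set[str],
                     seed: int) -> List[int]:
    """Return all samples that agree with the seed sample on every gene in `genes`."""
    n_samples = len(next(iter(states.values())))
    return [
        s for s in range(n_samples)
        if all(states[g][s] == states[g][seed] for g in genes)
    ]


# Toy data (assumed): rows are genes, columns are samples 0..3.
states = {
    "geneA": ["up", "up", "up", "down"],
    "geneB": ["down", "down", "down", "down"],
    "geneC": ["up", "down", "up", "up"],
}

# Genes conserved across samples 0, 1 and 2, then the samples that share that pattern.
motif_genes = conserved_genes(states, samples=[0, 1, 2])
motif_samples = matching_samples(states, motif_genes, seed=0)
```

Here the gene set and the sample set together form one candidate motif: every gene in `motif_genes` has the same state in every sample in `motif_samples`.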
TL;DR: A fluorescence in situ hybridization study mapped the human genes coding for prostaglandin-endoperoxide synthase 1 (PTGS1) and prostaglandin-endoperoxide synthase 2 (PTGS2) to distinct chromosomal regions, 9q32-q33.3 and 1q25.2-q25.3, respectively, indicating that these genes are not genetically linked.
Abstract: The human gene (PTGS2) encoding an inducible isozyme of prostaglandin-endoperoxide synthase (prostaglandin-endoperoxide synthase 2) that is distinct from the well-characterized and constitutive isozyme (prostaglandin-endoperoxide synthase 1), was isolated using a polymerase-chain reaction-generated cDNA fragment probe for human prostaglandin-endoperoxide synthase 2. Nucleotide sequence analysis of the entire human prostaglandin-endoperoxide-synthase-2 gene demonstrated that it is more than 8.3 kb in size and consists of ten exons; this gene is very similar to the murine and chicken prostaglandin-endoperoxide-synthase-2 genes. The structures of exons in the human prostaglandin-endoperoxide-synthase-2 gene were also similar to those of the human prostaglandin-endoperoxide-synthase-1 gene (PTGS1). However, the sizes of introns in the human prostaglandin-endoperoxide-synthase-2 gene were generally smaller than those of the human prostaglandin-endoperoxide-synthase-1 gene. Primer-extension analysis indicated that the transcriptional-start site is 134 bases upstream of the translational-initiation site. The sequence of the 1.69-kb region of nucleotides preceding the transcriptional-start site and the first 0.8-kb intron contained a canonical TATA box and various transcriptional-regulatory elements (CArG box, NF-IL6, PEA-1, myb, GATA-1, xenobiotic-response element, cAMP-response element, NF-kappa B, PEA-3, Sp-1 and 12-O-tetradecanoyl-phorbol-13-acetate-response element). The nucleotide sequence of the 5'-flanking region (275 bp) of the human prostaglandin-endoperoxide-synthase-2 gene showed 63% similarity to the sequence of murine prostaglandin-endoperoxide-synthase-2/TIS10 gene, but essentially no homology to the chicken prostaglandin-endoperoxide-synthase-2 gene, and human and murine prostaglandin-endoperoxide-synthase-1 genes. 
A fluorescence in situ hybridization study showed that the human genes coding for prostaglandin-endoperoxide synthase 1 (PTGS1) and prostaglandin-endoperoxide synthase 2 (PTGS2) were mapped to distinct chromosomal regions, 9q32-q33.3 and 1q25.2-q25.3, respectively, indicating that these genes are not genetically linked.
TL;DR: Identification of so many unusual gene models in Drosophila suggests that some mechanisms for gene regulation are more prevalent than previously believed, and underscores the complex challenges of eukaryotic gene prediction.
Abstract: Background: The recent completion of the Drosophila melanogaster genomic sequence to high quality and the availability of a greatly expanded set of Drosophila cDNA sequences, aligning to 78% of the predicted euchromatic genes, afforded FlyBase the opportunity to significantly improve genomic annotations. We made the annotation process more rigorous by inspecting each gene visually, utilizing a comprehensive set of curation rules, requiring traceable evidence for each gene model, and comparing each predicted peptide to SWISS-PROT and TrEMBL sequences. Results: Although the number of predicted protein-coding genes in Drosophila remains essentially unchanged, the revised annotation significantly improves gene models, resulting in structural changes to 85% of the transcripts and 45% of the predicted proteins. We annotated transposable elements and non-protein-coding RNAs as new features, and extended the annotation of untranslated (UTR) sequences and alternative transcripts to include more than 70% and 20% of genes, respectively. Finally, cDNA sequence provided evidence for dicistronic transcripts, neighboring genes with overlapping UTRs on the same DNA sequence strand, alternatively spliced genes that encode distinct, non-overlapping peptides, and numerous nested genes. Conclusions: Identification of so many unusual gene models not only suggests that some mechanisms for gene regulation are more prevalent than previously believed, but also underscores the complex challenges of eukaryotic gene prediction. At present, experimental data and human curation remain essential to generate high-quality genome annotations.
TL;DR: This paper describes how the effects that coding for a protein has on a DNA sequence can be measured and used to detect protein-coding regions.
Abstract: Protein genes can be found either by searching the DNA sequence for signals such as ribosome binding sites or by looking for the effects that coding for a protein has on the coding sequence. This paper describes how these coding effects can be measured and used to detect protein coding regions.
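As an illustration of what a "coding effect" measurement might look like, the sketch below scores how unevenly each base is distributed over the three codon positions in a window of sequence. This is one simple statistic chosen for illustration and is not necessarily the measure used in the paper. The function names, window size and step are assumptions. Because the score only compares the three positions with each other, it indicates whether a window looks coding but not which reading frame is used.

```python
from collections import Counter
from typing import List, Tuple

BASES = "ACGT"


def positional_unevenness(seq: str) -> float:
    """Score how unevenly each base is spread over the three codon positions.

    Returns 0.0 when every base occurs equally often at positions 1, 2 and 3;
    larger values mean a stronger three-base periodicity.
    """
    seq = seq.upper()
    counts = [Counter() for _ in range(3)]
    for i, base in enumerate(seq):
        if base in BASES:  # skip ambiguity codes such as N
            counts[i % 3][base] += 1
    total = sum(sum(c.values()) for c in counts)
    if total == 0:
        return 0.0
    score = 0.0
    for base in BASES:
        per_pos = [counts[p][base] for p in range(3)]
        mean = sum(per_pos) / 3
        score += sum(abs(n - mean) for n in per_pos)
    return score / total


def window_scores(seq: str, window: int = 60, step: int = 30) -> List[Tuple[int, float]]:
    """Slide a window along the sequence and score each position.

    `window` and `step` should be multiples of 3 so that windows stay in phase.
    """
    last_start = max(len(seq) - window + 1, 1)
    return [
        (start, positional_unevenness(seq[start:start + window]))
        for start in range(0, last_start, step)
    ]
```

Windows whose score stands well above the background of the surrounding sequence would be candidates for protein-coding regions. A realistic detector would also need a threshold calibrated on known coding and non-coding sequence.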