TL;DR: A new algorithm for finding tandem repeats which works without the need to specify either the pattern or pattern size is presented and its ability to detect tandem repeats that have undergone extensive mutational change is demonstrated.
Abstract: A tandem repeat in DNA is two or more contiguous, approximate copies of a pattern of nucleotides. Tandem repeats have been shown to cause human disease, may play a variety of regulatory and evolutionary roles and are important laboratory and analytic tools. Extensive knowledge about pattern size, copy number, mutational history, etc. for tandem repeats has been limited by the inability to easily detect them in genomic sequence data. In this paper, we present a new algorithm for finding tandem repeats which works without the need to specify either the pattern or pattern size. We model tandem repeats by percent identity and frequency of indels between adjacent pattern copies and use statistically based recognition criteria. We demonstrate the algorithm’s speed and its ability to detect tandem repeats that have undergone extensive mutational change by analyzing four sequences: the human frataxin gene, the human β T cell receptor locus sequence and two yeast chromosomes. These sequences range in size from 3 kb up to 700 kb. A World Wide Web server interface at c3.biomath.mssm.edu/trf.html has been established for automated use of the program.
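The idea of scoring a repeat by percent identity between adjacent pattern copies can be illustrated with a minimal sketch. This is not the paper's algorithm: it tries each candidate period in turn, counts only substitutions (no indels), and applies no statistical criteria. The function names `adjacent_copy_identity` and `best_period` are hypothetical, introduced only for illustration.

```python
from typing import Tuple


def adjacent_copy_identity(seq: str, period: int) -> float:
    """Fraction of positions i where seq[i] equals seq[i + period].

    A perfect tandem repeat of the given period scores 1.0.
    Indels are ignored (assumption: substitutions only).
    """
    if period <= 0 or period >= len(seq):
        raise ValueError("period must be between 1 and len(seq) - 1")
    compared = len(seq) - period
    matches = sum(1 for i in range(compared) if seq[i] == seq[i + period])
    return matches / compared


def best_period(seq: str, max_period: int) -> Tuple[int, float]:
    """Return the (period, identity) pair with the highest identity.

    Ties go to the shorter period, so a perfect 3-mer repeat is
    reported as period 3 rather than 6.
    """
    best = (1, adjacent_copy_identity(seq, 1))
    for p in range(2, min(max_period, len(seq) - 1) + 1):
        ident = adjacent_copy_identity(seq, p)
        if ident > best[1]:
            best = (p, ident)
    return best


# Toy input: six exact copies of a trinucleotide.
print(best_period("GAA" * 6, max_period=6))  # (3, 1.0)
```

Scanning every period is quadratic in the worst case and is shown only to make the percent-identity model concrete; a practical detector would also need to handle indels between copies.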
TL;DR: Two cassettes with tetracycline-resistance (TcR) and kanamycin-resistance (KmR) determinants have been developed for the construction of insertion and deletion mutants of cloned genes in Escherichia coli.
TL;DR: A novel family of repetitive DNA sequences that is present among both domains of the prokaryotes but absent from eukaryotes or viruses is studied, characterized by direct repeats, varying in size from 21 to 37 bp, interspaced by similarly sized non‐repetitive sequences.
Abstract: Using in silico analysis we studied a novel family of repetitive DNA sequences that is present among both domains of the prokaryotes (Archaea and Bacteria), but absent from eukaryotes or viruses. This family is characterized by direct repeats, varying in size from 21 to 37 bp, interspaced by similarly sized non-repetitive sequences. To appreciate their characteristic structure, we will refer to this family as the clustered regularly interspaced short palindromic repeats (CRISPR). In most species with two or more CRISPR loci, these loci were flanked on one side by a common leader sequence of 300-500 b. The direct repeats and the leader sequences were conserved within a species, but dissimilar between species. The presence of multiple chromosomal CRISPR loci suggests that CRISPRs are mobile elements. Four CRISPR-associated (cas) genes were identified in CRISPR-containing prokaryotes that were absent from CRISPR-negative prokaryotes. The cas genes were invariably located adjacent to a CRISPR locus, indicating that the cas genes and CRISPR loci have a functional relationship. The cas3 gene showed motifs characteristic for helicases of the superfamily 2, and the cas4 gene showed motifs of the RecB family of exonucleases, suggesting that these genes are involved in DNA metabolism or gene expression. The spatial coherence of CRISPR and cas genes may stimulate new research on the genesis and biological role of these repeats and genes.
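The repeat-spacer layout described above (direct repeats of 21 to 37 bp separated by similarly sized unique sequences) can be sketched as a simple search. The abstract does not describe the authors' in silico method, so this is only an illustration under stated assumptions: repeats must match exactly, the repeat length is supplied by the caller, and the function name `find_spaced_repeats` and its parameters are hypothetical.

```python
from typing import List, Tuple


def find_spaced_repeats(
    seq: str,
    repeat_len: int,
    min_spacer: int = 21,
    max_spacer: int = 37,
    min_copies: int = 3,
) -> List[Tuple[str, List[int]]]:
    """Find exact repeats of repeat_len separated by bounded spacers.

    Returns (repeat, start_positions) pairs. Assumption: copies are
    identical; real repeats may differ at a few positions.
    """
    arrays = []
    i = 0
    while i + repeat_len <= len(seq):
        unit = seq[i:i + repeat_len]
        positions = [i]
        pos = i
        while True:
            # The next copy must start after a spacer of allowed length.
            lo = pos + repeat_len + min_spacer
            hi = pos + repeat_len + max_spacer
            nxt = seq.find(unit, lo, hi + repeat_len)
            if nxt == -1:
                break
            positions.append(nxt)
            pos = nxt
        if len(positions) >= min_copies:
            arrays.append((unit, positions))
            i = positions[-1] + repeat_len  # resume after the array
        else:
            i += 1
    return arrays
```

A call such as `find_spaced_repeats(genome, repeat_len=29)` would return every array of at least three exact 29-bp copies; in practice one would loop over repeat lengths from 21 to 37 and tolerate mismatches.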
TL;DR: The cag region may encode a novel H. pylori secretion system for the export of virulence determinants, and transposon inactivation of several of the cagI genes abolishes induction of IL-8 expression in gastric epithelial cell lines.
Abstract: cagA, a gene that codes for an immunodominant antigen, is present only in Helicobacter pylori strains that are associated with severe forms of gastroduodenal disease (type I strains). We found that the genetic locus that contains cagA (cag) is part of a 40-kb DNA insertion that likely was acquired horizontally and integrated into the chromosomal glutamate racemase gene. This pathogenicity island is flanked by direct repeats of 31 bp. In some strains, cag is split into a right segment (cagI) and a left segment (cagII) by a novel insertion sequence (IS605). In a minority of H. pylori strains, cagI and cagII are separated by an intervening chromosomal sequence. Nucleotide sequencing of the 23,508 base pairs that form the cagI region and the extreme 3' end of the cagII region reveals the presence of 19 ORFs that code for proteins predicted to be mostly membrane associated with one gene (cagE), which is similar to the toxin-secretion gene of Bordetella pertussis, ptlC, and the transport systems required for plasmid transfer, including the virB4 gene of Agrobacterium tumefaciens. Transposon inactivation of several of the cagI genes abolishes induction of IL-8 expression in gastric epithelial cell lines. Thus, we believe the cag region may encode a novel H. pylori secretion system for the export of virulence determinants.
TL;DR: The locus that contains cagA (cag) is part of a 40-kb DNA insertion that likely was acquired horizontally and integrated into the chromosomal glutamate racemase gene.
Abstract: cagA, a gene that codes for an immunodominant antigen, is present only in Helicobacter pylori strains that are associated with severe forms of gastroduodenal disease (type I strains). We found that the genetic locus that contains cagA (cag) is part of a 40-kb DNA insertion that likely was acquired horizontally and integrated into the chromosomal glutamate racemase gene. This pathogenicity island is flanked by direct repeats of 31 bp. In some strains, cag is split into a right segment (cagI) and a left segment (cagII) by a novel insertion sequence (IS605). In a minority of H. pylori strains, cagI and cagII are separated by an intervening chromosomal sequence. Nucleotide sequencing of the 23,508 base pairs that form the cagI region and the extreme 3' end of the cagII region reveals the presence of 19 ORFs that code for proteins predicted to be mostly membrane associated with one gene (cagE), which is similar to the toxin-secretion gene of Bordetella pertussis, ptlC, and the transport systems required for plasmid transfer, including the virB4 gene of Agrobacterium tumefaciens. Transposon inactivation of several of the cagI genes abolishes induction of IL-8 expression in gastric epithelial cell lines. Thus, we believe the cag region may encode a novel H. pylori secretion system for the export of virulence determinants.