ESTprep: preprocessing cDNA sequence reads.
Todd E. Scheetz, Nishank Trivedi, Chad A. Roberts, Tamara A. Kucaba, Brian Berger, Natalie L. Robinson, Clayton L. Birkett, Allen J. Gavin, Brian O'Leary, Terry A. Braun, Maria de Fatima Bonaldo, John Robinson, Val C. Sheffield, Marcelo B. Soares, Thomas L. Casavant
TL;DR: ESTprep preprocesses expressed sequence tag (EST) sequences: it identifies the location of features present in ESTs and lets a sequence pass only if it meets various quality criteria.
Abstract: Motivation: High accuracy of data always governs the large-scale gene discovery projects. The data should not only be trustworthy but should be correctly annotated for various features it contains. Sequence errors are inherent in single-pass sequences such as ESTs obtained from automated sequencing. These errors further complicate the automated identification of EST-related sequencing. A tool is required to prepare the data prior to advanced annotation processing and submission to public databases. Results: This paper describes ESTprep, a program designed to preprocess expressed sequence tag (EST) sequences. It identifies the location of features present in ESTs and allows the sequence to pass only if it meets various quality criteria. Use of ESTprep has resulted in substantial improvement in accurate EST feature identification and fidelity of results submitted to GenBank. Availability: The program is freely available for download from http://genome.uiowa.edu/pubsoft/software.html
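The abstract describes a two-step idea: locate features in an EST, then let the read pass only if it meets quality criteria. The following is a minimal sketch of that idea, not ESTprep's actual algorithm. It assumes a single feature (a poly(A) tail on the sense strand) and two quality checks. Every name and threshold in it (`preprocess_est`, `MIN_INSERT_LENGTH`, `MIN_POLYA_RUN`, `MAX_N_FRACTION`) is hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds for illustration; not ESTprep's real parameters.
MIN_INSERT_LENGTH = 100   # minimum usable bases before the poly(A) tail
MIN_POLYA_RUN = 12        # consecutive A's that count as a poly(A) tail
MAX_N_FRACTION = 0.05     # maximum fraction of ambiguous base calls


@dataclass
class EstReport:
    polya_start: Optional[int]  # index where the poly(A) run begins, if found
    insert: str                 # sequence retained after trimming the tail
    passed: bool                # True if every quality check succeeded
    reason: str                 # "ok" or the first failed check


def find_polya_tail(seq: str, min_run: int = MIN_POLYA_RUN) -> Optional[int]:
    """Return the start index of the first run of >= min_run A's, or None."""
    match = re.search("A{%d,}" % min_run, seq.upper())
    return match.start() if match else None


def preprocess_est(seq: str) -> EstReport:
    """Locate the poly(A) feature, trim it, then apply the quality checks."""
    seq = seq.upper()
    polya_start = find_polya_tail(seq)
    insert = seq if polya_start is None else seq[:polya_start]

    if len(insert) < MIN_INSERT_LENGTH:
        return EstReport(polya_start, insert, False, "insert too short")

    n_fraction = insert.count("N") / len(insert)
    if n_fraction > MAX_N_FRACTION:
        return EstReport(polya_start, insert, False, "too many ambiguous bases")

    return EstReport(polya_start, insert, True, "ok")
```

A read such as `"ACGT" * 30 + "A" * 20` would be trimmed at the start of the A run and accepted, while a read with only 40 bases before the tail would be rejected. A real pipeline would also need to handle vector and adapter sequence, reads from the opposite strand (where the tail appears as a leading poly(T) run), and per-base quality scores, none of which this sketch attempts.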
Citations
Correlation of mRNA and protein levels: Cell type-specific gene expression of cluster designation antigens in the prostate
Laura E. Pascal, Lawrence D. True, David S. Campbell, Eric W. Deutsch, Michael C. Risk, Ilsa Coleman, Lillian J Eichner, Peter S. Nelson, Alvin Y. Liu
TL;DR: Divergence between the two data types was most frequently seen for genes whose array signals exceeded background but lacked immunoreactivity by immunostaining, and agreement between these two very different methodologies has great implications for their respective use in both molecular studies and clinical trials employing molecular biomarkers.
Genome Organization of More Than 300 Defensin-Like Genes in Arabidopsis
TL;DR: Evidence is provided of a large DEFL superfamily present in expressed tissues of all sequenced plants, including four of the largest clusters of DEFLs, and the first evidence of expression, most frequently in floral tissues.
268
Increasing ecological inference from high throughput sequencing of fungi in the environment through a tagging approach
D. Lee Taylor, Michael G. Booth, Jack W. McFarland, Ian C. Herriott, Niall Lennon, Chad Nusbaum, Thomas G. Marr
TL;DR: A primer‐tagging approach that allows pooling and subsequent sorting of numerous samples is applied to amplification of a region spanning the nuclear ribosomal internal transcribed spacers and partial large subunit from fungi in environmental samples; the results suggest that tagged primers can be used to increase ecological inference in high throughput sequencing projects on fungi.
63
EST-based gene discovery in pig: virtual expression patterns and comparative mapping to human.
Christopher K. Tuggle, Jon A. Green, Carolyn Fitzsimmons, Rami J. Woods, Randall S. Prather, Sergei Malchenko, Bento Soares, Tamara A. Kucaba, Keith Crouch, Christina C. Smith, Dylan Tack, Natalie L. Robinson, Brian O'Leary, Todd E. Scheetz, Thomas L. Casavant, Daniel Pomp, Brad J. Edeal, Yuandan Zhang, Max F. Rothschild, Kevin Garwood, William D. Beavis
TL;DR: Computer software is developed to identify sequence similarity of these pig genes with their human counterparts and to extract the mapping information of these human homologues from genome databases; the utility of this software for comparative mapping is demonstrated by localizing 61 genes on the porcine physical map for Chromosomes 5, 10, and 14.
63
Construction of a medicinal leech transcriptome database and its application to the identification of leech homologs of neural and innate immune genes
Eduardo R. Macagno, Terry Gaasterland, Lee Edsall, Vineet Bafna, Marcelo B. Soares, Todd E. Scheetz, Thomas L. Casavant, Corinne Da Silva, Patrick Wincker, Aurélie Tasiemski, Michel Salzet
TL;DR: The sequences obtained represent the first major database of genes expressed in this important model system, and show a strong resemblance to the corresponding mammalian genes, indicating that this important physiological response may have older origins than what has been previously proposed.