A structure-based method for protein sequence alignment
Maricel G. Kann, Paul A. Thiessen, Anna R. Panchenko, Alejandro A. Schäffer, Stephen F. Altschul, Stephen H. Bryant
TL;DR: SALTO aligns protein query sequences to position-specific scoring matrices (PSSMs) using rules for placing and scoring gaps that are consistent with the conserved regions of domain alignments from NCBI's Conserved Domain Database.
Abstract: Motivation: With the continuing rapid growth of protein sequence data, protein sequence comparison methods have become the most widely used tools of bioinformatics. Among these methods are those that use position-specific scoring matrices (PSSMs) to describe protein families. PSSMs can capture information about conserved patterns within families, which can be used to increase the sensitivity of searches for related sequences. Certain types of structural information, however, are not generally captured by PSSM search methods. Here we introduce a program, Structure-based ALignment TOol (SALTO), that aligns protein query sequences to PSSMs using rules for placing and scoring gaps that are consistent with the conserved regions of domain alignments from NCBI's Conserved Domain Database.
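The idea of restricting gaps to the regions between conserved blocks can be illustrated with a small dynamic-programming sketch. This is not SALTO's actual rule set or scoring scheme. It assumes a simplified model in which each PSSM column is a dictionary of residue scores, `block_id` marks which columns belong to a conserved block (`None` for loop columns), gaps carry a linear penalty, and no gap may open inside a block. The function name, the toy matrix, and all parameter values are hypothetical.

```python
from typing import Dict, List, Optional

NEG_INF = float("-inf")


def align_to_pssm(query: str,
                  pssm: List[Dict[str, float]],
                  block_id: List[Optional[int]],
                  gap: float = 2.0,
                  mismatch: float = -1.0,
                  respect_blocks: bool = True) -> float:
    """Global alignment score of `query` against `pssm` (illustrative only).

    block_id[c] is an integer for columns in a conserved block, None for
    loop columns. With respect_blocks=True, a block column can never be
    skipped and no query residue can be inserted between two columns of
    the same block.
    """
    n, m = len(pssm), len(query)

    def inside_block(i: int) -> bool:
        # True if the boundary after column i-1 separates two columns
        # that belong to the same conserved block.
        return (0 < i < n and block_id[i - 1] is not None
                and block_id[i - 1] == block_id[i])

    # score[i][j]: best score for the first i columns and first j residues.
    score = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    score[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best = NEG_INF
            if i > 0 and j > 0:
                # Align residue j-1 to column i-1.
                col_score = pssm[i - 1].get(query[j - 1], mismatch)
                best = max(best, score[i - 1][j - 1] + col_score)
            if i > 0 and (not respect_blocks or block_id[i - 1] is None):
                # Skip column i-1 (gap in the query).
                best = max(best, score[i - 1][j] - gap)
            if j > 0 and (not respect_blocks or not inside_block(i)):
                # Insert residue j-1 after column i-1 (gap in the PSSM).
                best = max(best, score[i][j - 1] - gap)
            score[i][j] = best
    return score[n][m]


# Toy PSSM: block 0 = columns 0-1, one loop column, block 1 = column 3.
toy_pssm = [{"A": 3.0}, {"C": 3.0}, {"G": 3.0}, {"T": 3.0}]
toy_blocks = [0, 0, None, 1]

if __name__ == "__main__":
    for q in ("ACGT", "ACT", "AGCT"):
        print(q, align_to_pssm(q, toy_pssm, toy_blocks),
              align_to_pssm(q, toy_pssm, toy_blocks, respect_blocks=False))
```

For the query `AGCT`, the unconstrained run can place an insertion between the two columns of block 0, whereas the constrained run must leave that block ungapped and therefore scores lower. That difference is what a family-specific gapping model is meant to encode.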
Results: In most cases, the alignment scores obtained using the local alignment version follow an extreme value distribution. SALTO's performance in finding related sequences and producing accurate alignments is similar to or better than that of IMPALA; one advantage of SALTO is that it imposes an explicit gapping model on each protein family.
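When local alignment scores follow an extreme value distribution, a raw score can be converted into an expected number of chance hits (E-value) and a P-value. The sketch below uses the standard Gumbel form from BLAST-style statistics, E = K·m·n·exp(−λS) and P = 1 − exp(−E). The abstract does not say how SALTO estimates or applies these parameters, so the function names and the values of `lam` and `k` are placeholders, not fitted SALTO values.

```python
import math


def evalue(score: float, query_len: int, db_len: int,
           lam: float, k: float) -> float:
    """Expected number of chance alignments scoring at least `score`."""
    return k * query_len * db_len * math.exp(-lam * score)


def pvalue(score: float, query_len: int, db_len: int,
           lam: float, k: float) -> float:
    """Probability of at least one chance alignment scoring >= `score`."""
    e = evalue(score, query_len, db_len, lam, k)
    # -expm1(-e) equals 1 - exp(-e) but stays accurate when e is tiny.
    return -math.expm1(-e)


if __name__ == "__main__":
    # Placeholder parameters chosen only for illustration.
    lam, k = 0.3, 0.1
    for s in (30, 50, 70):
        print(s, evalue(s, 250, 100_000, lam, k),
              pvalue(s, 250, 100_000, lam, k))
```

For small E-values the P-value and E-value are nearly equal, which is why the two are often used interchangeably for strong hits.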
Availability: A stand-alone version of the program that can generate global or local alignments is available by FTP (ftp://ftp.ncbi.nih.gov/pub/SALTO/) and has been incorporated into the Cn3D structure/alignment viewer.
Contact: bryant@ncbi.nlm.nih.gov