Accelerated Profile HMM Searches

Open AccessBook

Accelerated Profile HMM Searches

- 01 May 2015

6.2K

TL;DR: An acceleration heuristic for profile HMMs, the “multiple segment Viterbi” (MSV) algorithm, which computes an optimal sum of multiple ungapped local alignment segments using a striped vector-parallel approach previously described for fast Smith/Waterman alignment.

Abstract: Profile hidden Markov models (profile HMMs) and probabilistic inference methods have made important contributions to the theory of sequence database homology search. However, practical use of profile HMM methods has been hindered by the computational expense of existing software implementations. Here I describe an acceleration heuristic for profile HMMs, the "multiple segment Viterbi" (MSV) algorithm. The MSV algorithm computes an optimal sum of multiple ungapped local alignment segments using a striped vector-parallel approach previously described for fast Smith/Waterman alignment. MSV scores follow the same statistical distribution as gapped optimal local alignment scores, allowing rapid evaluation of significance of an MSV score and thus facilitating its use as a heuristic filter. I also describe a 20-fold acceleration of the standard profile HMM Forward/Backward algorithms using a method I call "sparse rescaling". These methods are assembled in a pipeline in which high-scoring MSV hits are passed on for reanalysis with the full HMM Forward/Backward algorithm. This accelerated pipeline is implemented in the freely available HMMER3 software package. Performance benchmarks show that the use of the heuristic MSV filter sacrifices negligible sensitivity compared to unaccelerated profile HMM searches. HMMER3 is substantially more sensitive and 100- to 1000-fold faster than HMMER2. HMMER3 is now about as fast as BLAST for protein searches.

Chat with Paper

AI Agents for this Paper

Find similar papers on Google Scholar, PubMed and Arxiv
Write a critical review of this paper
Analyze citations of this paper to find unaddressed research gaps

Citations

•Journal Article•10.1093/BIOINFORMATICS/BTS635

STAR: ultrafast universal RNA-seq aligner

Alexander Dobin, +8 more

- 01 Jan 2013

- Bioinformatics

TL;DR: The Spliced Transcripts Alignment to a Reference (STAR) software based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by seed clustering and stitching procedure outperforms other aligners by a factor of >50 in mapping speed.

...read moreread less

45K

•Journal Article•10.1038/S41586-021-03819-2

Highly accurate protein structure prediction with AlphaFold

John M. Jumper, +33 more

- 15 Jul 2021

- Nature

TL;DR: For example, AlphaFold as mentioned in this paper predicts protein structures with an accuracy competitive with experimental structures in the majority of cases using a novel deep learning architecture. But the accuracy is limited by the fact that no homologous structure is available.

...read moreread less

28.2K

•Journal Article•10.1093/BIOINFORMATICS/BTU153

Prokka: Rapid Prokaryotic Genome Annotation

Torsten Seemann

- 15 Jul 2014

- Bioinformatics

TL;DR: Prokka is introduced, a command line software tool to fully annotate a draft bacterial genome in about 10 min on a typical desktop computer, and produces standards-compliant output files for further analysis or viewing in genome browsers.

...read moreread less

16K

•Journal Article•10.1093/NAR/GKR1065

The Pfam protein families database

Marco Punta, +15 more

- 01 Jan 2000

- Nucleic Acids Research

TL;DR: The definition and use of family-specific, manually curated gathering thresholds are explained and some of the features of domains of unknown function (also known as DUFs) are discussed, which constitute a rapidly growing class of families within Pfam.

...read moreread less

14.8K

•Journal Article•10.1093/NAR/GKT1223

Pfam: the protein families database.

Robert D. Finn, +12 more

- 01 Jan 2014

- Nucleic Acids Research

TL;DR: Pfam as discussed by the authors is a widely used database of protein families, containing 14 831 manually curated entries in the current version, version 27.0, and has been updated several times since 2012.

...read moreread less

14.1K

...

Expand

References

Journal Article•10.1016/S0022-2836(05)80360-2

Basic Local Alignment Search Tool

Stephen F. Altschul, +4 more

- 01 Oct 1990

- Journal of Molecular Biology

TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

...read moreread less

98.8K

•Journal Article•10.1093/NAR/25.17.3389

Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.

Stephen F. Altschul, +6 more

- 01 Sep 1997

- Nucleic Acids Research

TL;DR: A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original.

...read moreread less

76.5K

Journal Article•10.1109/5.18626

A tutorial on hidden Markov models and selected applications in speech recognition

Lawrence R. Rabiner

- 01 Feb 1989

TL;DR: In this paper, the authors provide an overview of the basic theory of hidden Markov models (HMMs) as originated by L.E. Baum and T. Petrie (1966) and give practical details on methods of implementation of the theory along with a description of selected applications of HMMs to distinct problems in speech recognition.

...read moreread less

24.3K

•Journal Article•10.1186/1471-2105-10-421

BLAST+: architecture and applications.

Christiam Camacho, +6 more

- 15 Dec 2009

- BMC Bioinformatics

TL;DR: The new BLAST command-line applications, compared to the current BLAST tools, demonstrate substantial speed improvements for long queries as well as chromosome length database sequences.

...read moreread less

18.8K

•Journal Article•10.1093/NAR/GKR1065

The Pfam protein families database

Marco Punta, +15 more

- 01 Jan 2000

- Nucleic Acids Research

TL;DR: The definition and use of family-specific, manually curated gathering thresholds are explained and some of the features of domains of unknown function (also known as DUFs) are discussed, which constitute a rapidly growing class of families within Pfam.

...read moreread less

14.8K