Optimizing reduced-space sequence analysis.

doi:10.1093/BIOINFORMATICS/16.12.1082

Open Access • Journal Article

Raymond Wheeler, +1 more

- 01 Dec 2000

- Bioinformatics

- Vol. 16, Iss: 12, pp 1082-1090

22

TL;DR: Analyzing optimal checkpoint placement improves both checkpoint algorithms: the improved row checkpoint algorithm performs up to one half the computation of the original, and the improved diagonal checkpoint algorithm performs up to 35% fewer computational steps than the original.

Abstract:

Motivation: Dynamic programming is the core algorithm of sequence comparison, alignment and linear hidden Markov model (HMM) training. For a pair of sequence lengths m and n, the problem can be solved readily in O(mn) time and O(mn) space. The checkpoint algorithm introduced by Grice et al. (CABIOS, 13, 45–53, 1997) runs in O(Lmn) time and $O(Lm\sqrt[L]{n})$ space, where L is a positive integer determined by m, n, and the amount of available workspace. The algorithm is appropriate for many string comparison problems, including all-paths and single-best-path hidden Markov model training, and is readily parallelizable. The checkpoint algorithm has a diagonal version that can solve the single-best-path alignment problem in O(mn) time and O(m + n) space.

Results: In this work, we improve performance by analyzing optimal checkpoint placement. The improved row checkpoint algorithm performs up to one half the computation of the original algorithm. The improved diagonal checkpoint algorithm performs up to 35% fewer computational steps than the original. We modified the SAM hidden Markov modeling package to use the improved row checkpoint algorithm. For a fixed sequence length, the new version is up to 33% faster for all-paths and 56% faster for single-best-path HMM training, depending on sequence length and allocated memory. Over a typical set of protein sequence lengths, the improvement is ∼10%.

Availability: The SAM hidden Markov modeling package is freely available for academic use from http://www.cse.ucsc.edu/research/compbio/sam.html. The C++ code used to find optimal checkpoint placements is available from http://www.cse.ucsc.edu/research/kestrel.
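To make the row checkpoint idea in the abstract concrete, the following is a minimal sketch of the two-level case (L = 2), using edit distance as a stand-in for the dynamic programming recurrence. It is not the paper's algorithm or the SAM implementation: the checkpoints are simply spaced evenly every ⌊√n⌋ rows rather than placed optimally, and the names `next_row` and `checkpoint_align` are hypothetical. The forward pass keeps only the checkpoint rows; the backward pass recomputes one block of rows at a time from its checkpoint and traces the path back through it.

```python
import math


def next_row(prev, ai, b):
    """Compute one edit-distance DP row from the previous row."""
    row = [prev[0] + 1]
    for j in range(1, len(b) + 1):
        row.append(min(prev[j] + 1,                      # delete ai
                       row[j - 1] + 1,                   # insert b[j-1]
                       prev[j - 1] + (ai != b[j - 1])))  # match or substitute
    return row


def checkpoint_align(a, b):
    """Return (distance, path) using evenly spaced row checkpoints.

    path lists the DP cells (i, j) visited from (0, 0) to (len(a), len(b)).
    Assumption: block height k = floor(sqrt(n)), not an optimal placement.
    """
    n = len(a)
    k = max(1, math.isqrt(n))

    # Forward pass: keep only rows 0, k, 2k, ... as checkpoints.
    row = list(range(len(b) + 1))
    checkpoints = {0: row}
    for i in range(1, n + 1):
        row = next_row(row, a[i - 1], b)
        if i % k == 0:
            checkpoints[i] = row
    distance = row[-1]

    # Backward pass: rebuild one block at a time and trace back through it.
    i, j = n, len(b)
    path = [(i, j)]
    while i > 0:
        start = ((i - 1) // k) * k          # nearest checkpoint below row i
        block = [checkpoints[start]]
        for r in range(start + 1, i + 1):   # recompute rows start+1 .. i
            block.append(next_row(block[-1], a[r - 1], b))
        while i > start:
            cur, prev = block[i - start], block[i - start - 1]
            if j > 0 and cur[j] == prev[j - 1] + (a[i - 1] != b[j - 1]):
                i, j = i - 1, j - 1         # diagonal move
            elif cur[j] == prev[j] + 1:
                i -= 1                      # vertical move
            else:
                j -= 1                      # horizontal move
            path.append((i, j))
    while j > 0:                            # finish along row 0
        j -= 1
        path.append((i, j))
    path.reverse()
    return distance, path
```

In this sketch each row is computed at most twice (once in the forward pass, once when its block is rebuilt), and at most about 2√n rows of length m + 1 are held at any time, which matches the O(Lmn) time and $O(Lm\sqrt[L]{n})$ space pattern for L = 2. Calling `checkpoint_align("kitten", "sitting")` returns the distance together with one optimal path through the table.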

Citations

•Journal Article•10.1002/GEPI.20533

MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes

Yun Li, +4 more

- 01 Dec 2010

- Genetic Epidemiology

TL;DR: It is shown that genotype imputation of common variants using HapMap haplotypes as a reference is very accurate using either genome‐wide SNP data or smaller amounts of data typical in fine‐mapping studies, and it is illustrated how association analyses of unobserved variants will benefit from ongoing advances such as larger HapMap reference panels and whole genome shotgun sequencing technologies.

2.1K

•Journal Article•10.1016/J.AJHG.2009.01.005

A Unified Approach to Genotype Imputation and Haplotype-Phase Inference for Large Data Sets of Trios and Unrelated Individuals

Brian L. Browning, +1 more

- 13 Feb 2009

- American Journal of Human Genetics

TL;DR: It is demonstrated that substantial gains in imputation accuracy accrue with increasingly large reference panel sizes, particularly when imputing low-frequency variants, and that unphased reference panels can provide highly accurate genotype imputation.

1.7K

•Journal Article•10.1016/J.AJHG.2015.11.020

Genotype Imputation with Millions of Reference Samples

Brian L. Browning, +1 more

- 07 Jan 2016

- American Journal of Human Genetics

TL;DR: A genotype imputation method that scales to millions of reference samples and achieves fast, accurate, and memory-efficient genotype imputation by restricting the probability model to markers that are genotyped in the target samples and by performing linear interpolation to impute ungenotyped variants.

1.1K

•Journal Article•10.1186/1471-2105-9-224

Implementing EM and Viterbi algorithms for Hidden Markov Model in linear memory

Alexander G. Churbanov, +1 more

- 30 Apr 2008

- BMC Bioinformatics

TL;DR: A memory sparse version of the Baum-Welch algorithm with modifications to the original probabilistic table topologies to make memory use independent of sequence length (and linearly dependent on state number) and a linear memory implementation of the Viterbi decoding algorithm.

56

•Journal Article•10.1186/1471-2105-6-231

A linear memory algorithm for Baum-Welch training.

István Miklós, +1 more

- 19 Sep 2005

- BMC Bioinformatics

TL;DR: This article proposes the first linear-space algorithm for Baum-Welch training. For a hidden Markov model with M states, T free transition and E free emission parameters, and an input sequence of length L, it requires O(M) memory and O(LM T_max (T + E)) time per Baum-Welch iteration, where T_max is the maximum number of states to which any state is connected.

37

References

•Journal Article•10.1016/0022-2836(81)90087-5

Identification of common molecular subsequences.

Temple F. Smith, +1 more

- 25 Mar 1981

- Journal of Molecular Biology

TL;DR: This letter extends the heuristic homology algorithm of Needleman & Wunsch (1970) to find a pair of segments, one from each of two long sequences, such that there is no other pair of segments with greater similarity (homology).

11.3K

•Book

Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids

Richard Durbin, +3 more

- 01 Feb 2005

TL;DR: This book gives a unified, up-to-date and self-contained account, with a Bayesian slant, of such methods, and more generally to probabilistic methods of sequence analysis.

4.5K

•Journal Article•10.1093/NAR/27.1.49

The SWISS-PROT protein sequence data bank and its supplement TrEMBL in 1999.

Amos Marc Bairoch, +1 more

- 01 Jan 1998

- Nucleic Acids Research

TL;DR: The Human Proteomics Initiative (HPI), a major project to annotate all known human sequences according to the quality standards of SWISS-PROT, is described.

3.6K

•Journal Article•10.1145/321796.321811

The String-to-String Correction Problem

Robert A. Wagner, +1 more

- 01 Jan 1974

- Journal of the ACM

TL;DR: An algorithm is presented which solves the string-to-string correction problem in time proportional to the product of the lengths of the two strings.

3.5K

•Journal Article•10.1006/JMBI.1994.1104

Hidden Markov models in computational biology: applications to protein modeling

Anders Krogh, +4 more

- 01 Aug 1993

- Journal of Molecular Biology

TL;DR: The results suggest the presence of an EF-hand calcium binding motif in a highly conserved and evolutionary preserved putative intracellular region of 155 residues in the alpha-1 subunit of L-type calcium channels which play an important role in excitation-contraction coupling.

2.1K
