Efficient Distributed Smith-Waterman Algorithm Based on Apache Spark

doi:10.1109/CLOUD.2017.83

•Proceedings Article•10.1109/CLOUD.2017.83


Bo Xu, +5 more

- 25 Jun 2017

- pp 608-615

17

TL;DR: Presents CloudSW, an efficient distributed Smith-Waterman algorithm that uses Apache Spark and SIMD instructions for acceleration. It scales well and achieves up to 529 giga cell updates per second (GCUPS) in protein database search with 50 nodes on Aliyun Cloud.
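
For readers unfamiliar with the terms in the summary, the following is a minimal single-machine sketch of the Smith-Waterman local alignment score and of how a cell-updates-per-second figure is computed. It is not the CloudSW implementation, which distributes the work with Apache Spark and vectorizes it with SIMD. The function names, the match/mismatch scores, and the linear gap penalty are illustrative assumptions; protein search in practice typically uses a substitution matrix and affine gap penalties.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-2):
    """Best local alignment score, using a linear gap penalty (assumed scoring)."""
    prev = [0] * (len(target) + 1)  # previous row of the DP matrix
    best = 0
    for q in query:
        curr = [0]  # column 0 is always 0 for local alignment
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            # Clamping at 0 lets an alignment restart anywhere (local alignment).
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def gcups(query_len, database_residues, seconds):
    """Giga cell updates per second: DP cells filled, divided by time and 1e9."""
    return query_len * database_residues / seconds / 1e9
```

Each query residue is compared against each database residue once, so a search fills `query_len * database_residues` cells; dividing by the wall-clock time gives the throughput figure quoted in GCUPS.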


Citations

•Journal Article•10.3390/APP9214502

Development of Scalable On-Line Anomaly Detection System for Autonomous and Adaptive Manufacturing Processes

Seunghyun Choi, +2 more

- 24 Oct 2019

- Applied Sciences

TL;DR: Proposes an architecture framework and implementation method for the Scalable On-line Anomaly Detection System (SOADS), which detects process anomalies through real-time processing and analyzes large amounts of process execution data in autonomous and adaptive manufacturing processes. The system succeeded in large-scale data processing and analysis.


11

•Dissertation

Identifying Polymorphic Malware Variants Using Biosequence Analysis Techniques

Vijay Naidu

- 01 Jan 2018



6

•Journal Article•10.1007/S11227-020-03338-3

High throughput BLAST algorithm using spark and cassandra

Fernando Cores, +2 more

- 01 Feb 2021

- The Journal of Supercomputing

TL;DR: A new implementation of the Basic Local Alignment Search Tool algorithm is presented, named Sparky-Blast, which is capable of using the distributed resources of a Big-Data Cluster to process queries in parallel, improving both the response time and the system throughput.


5

•Proceedings Article•10.1109/CCGRID51090.2021.00034

Comparing SARS-CoV-2 Sequences using a Commercial Cloud with a Spot Instance Based Dynamic Scheduler

Luan Teylo, +5 more

- 10 May 2021

TL;DR: In this paper, the authors compared SARS-CoV-2 sequences with MASA-OpenMP in the Amazon Elastic Compute Cloud (Amazon EC2), using both spot and on-demand instances.


5

•Proceedings Article•10.1109/CLUSTER49012.2020.00044

Efficient Execution of Dynamic Programming Algorithms on Apache Spark

Mohammad Mahdi Javanmard, +5 more

- 01 Sep 2020

TL;DR: This work designs and implements well-decomposable and tunable dynamic programming algorithms from the Gaussian Elimination Paradigm, such as Floyd-Warshall's all-pairs shortest path and Gaussian elimination without pivoting, for execution on Apache Spark based on parametric multi-way recursive divide-&-conquer algorithms.


5


References

•Journal Article•10.1093/BIOINFORMATICS/BTP352

The Sequence Alignment/Map format and SAMtools

Heng Li, +8 more

- 01 Aug 2009

- Bioinformatics

TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant calling, and alignment viewing, and thus provides universal tools for processing read alignments.


60.7K

•Journal Article•10.1093/BIOINFORMATICS/BTP324

Fast and accurate short read alignment with Burrows–Wheeler transform

Heng Li, +1 more

- 01 Jul 2009

- Bioinformatics

TL;DR: Introduces the Burrows-Wheeler Alignment tool (BWA), a new read alignment package based on backward search with the Burrows–Wheeler Transform (BWT), which efficiently aligns short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps.


55.5K

•Journal Article•10.1038/NMETH.1923

Fast gapped-read alignment with Bowtie 2

Ben Langmead, +3 more

- 01 Apr 2012

- Nature Methods

TL;DR: Bowtie 2 combines the strengths of the full-text minute index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy.


52.8K

•Journal Article•10.1038/NMETH.3176

Fast and sensitive protein alignment using DIAMOND

Benjamin Buchfink, +2 more

- 01 Jan 2015

- Nature Methods

TL;DR: DIAMOND is introduced, an open-source algorithm based on double indexing that is 20,000 times faster than BLASTX on short reads and has a similar degree of sensitivity.


11.6K

•Journal Article•10.1016/0022-2836(81)90087-5

Identification of common molecular subsequences.

Temple F. Smith, +1 more

- 25 Mar 1981

- Journal of Molecular Biology

TL;DR: This letter extends the heuristic homology algorithm of Needleman & Wunsch (1970) to find a pair of segments, one from each of two long sequences, such that there is no other pair of segments with greater similarity (homology).


11.3K