A Framework for Designing Efficient Deep Learning-Based Genomic Basecallers
Gagandeep Singh, Mohammed Alser, Ali Khodamoradi, Kristof Denolf, Can Fırtına, Meryem Banu Cavlak, Henk Corporaal, Onur Mutlu
TL;DR: This paper introduces RUBICON, a framework for developing hardware-optimized basecallers. Its quantization-aware basecalling neural architecture search (QABAS) component finds the best bit-width precision for each neural network layer.
Abstract: Nanopore sequencing is a widely used high-throughput genome sequencing technology that can sequence long fragments of a genome. Nanopore sequencing generates noisy electrical signals that need to be converted into a standard string of DNA nucleotide bases using a computational step called basecalling. The performance of basecalling has critical implications for all later steps in genome analysis. Many researchers adopt complex deep learning-based models from the speech recognition domain to perform basecalling without considering the compute demands of such models, which leads to slow, inefficient, and memory-hungry basecallers. Therefore, there is a need to reduce the computation and memory cost of basecalling while maintaining accuracy. However, developing a very fast basecaller that can provide high accuracy requires a deep understanding of genome sequencing, machine learning, and hardware design. Our goal is to develop a comprehensive framework for creating deep learning-based basecallers that provide high efficiency and performance. We introduce RUBICON, a framework to develop hardware-optimized basecallers. RUBICON consists of two novel machine-learning techniques that are specifically designed for basecalling. First, we introduce the first quantization-aware basecalling neural architecture search (QABAS) framework to specialize the basecalling neural network architecture for a given hardware acceleration platform while jointly exploring and finding the best bit-width precision for each neural network layer. Second, we develop SkipClip, the first technique to remove the skip connections present in modern basecallers to greatly reduce resource and storage requirements without any loss in basecalling accuracy. We demonstrate the benefits of RUBICON by developing RUBICALL, the first hardware-optimized basecaller that performs fast and accurate basecalling.
Our experimental results on state-of-the-art computing systems show that RUBICALL is a fast, memory-efficient, and hardware-friendly basecaller. Compared to the fastest state-of-the-art basecaller, RUBICALL provides a 3.96× speedup with 2.97% higher accuracy. Compared to an expert-designed basecaller, RUBICALL provides a 141.15× speedup without losing accuracy while also achieving a 6.88× and 2.94× reduction in neural network model size and the number of parameters, respectively. We show that RUBICON helps researchers develop hardware-optimized basecallers that are superior to expert-designed models and can inspire independent future ideas.
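To make the idea of a per-layer bit-width concrete, the following is a minimal sketch of symmetric uniform quantization together with a mixed-precision model-size estimate. It is not the QABAS search procedure and is not taken from the paper: the function names (`quantize`, `model_size_bits`), the layer sizes, and the bit-width choices are all hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: this is NOT the QABAS search from the paper.
# Function names, layer sizes, and bit-widths are hypothetical.

def quantize(weights, bits):
    """Symmetric uniform quantization of a list of floats to `bits` bits.

    Values are snapped to a signed integer grid and mapped back to floats
    (often called "fake quantization").
    """
    if bits < 2:
        raise ValueError("need at least 2 bits for a signed symmetric grid")
    qmax = 2 ** (bits - 1) - 1               # 127 for 8 bits, 7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)                  # nothing to scale
    scale = max_abs / qmax                    # step between grid points
    quantized = []
    for w in weights:
        q = round(w / scale)                  # nearest integer level
        q = max(-qmax, min(qmax, q))          # clamp to representable range
        quantized.append(q * scale)           # back to a float value
    return quantized


def model_size_bits(layer_param_counts, layer_bits):
    """Total weight storage in bits when each layer has its own bit-width."""
    if len(layer_param_counts) != len(layer_bits):
        raise ValueError("one bit-width is required per layer")
    return sum(n * b for n, b in zip(layer_param_counts, layer_bits))


if __name__ == "__main__":
    # Hypothetical two-layer model: 100 and 200 parameters.
    params = [100, 200]
    full_precision = model_size_bits(params, [32, 32])
    mixed_precision = model_size_bits(params, [8, 4])
    print("full precision bits:", full_precision)
    print("mixed precision bits:", mixed_precision)
    print("4-bit weights:", quantize([0.9, -0.35, 0.1, 0.0], 4))
```

In this toy setting, storing the two layers at 8 and 4 bits takes 1600 bits instead of 9600 bits at 32 bits per weight. These numbers are unrelated to the reductions reported for RUBICALL; the sketch only shows why choosing a bit-width per layer changes both the representable weight values and the model size.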
Citations
RawHash: enabling fast and accurate real-time analysis of raw nanopore signals for large genomes
Can Fırtına, Nika Mansouri Ghiasi, Joel Lindegger, Gagandeep Singh, Meryem Banu Cavlak, Haiyu Mao, Onur Mutlu
TL;DR: The authors propose RawHash, which uses a hash-based similarity search to perform real-time analysis of raw nanopore signals for large genomes. Similar signals map to the same hash value, regardless of the slight variations in these signals.
SPARTA: Spatial Acceleration for Efficient and Scalable Horizontal Diffusion Weather Stencil Computation
Gagandeep Singh, Ali Khodamoradi, Kristof Denolf, Jack Lo, Juan Gómez-Luna, Joseph Melber, Andra Bisca, Henk Corporaal, Onur Mutlu
- 06 Mar 2023
TL;DR: SPARTA uses the MLIR (Multi-Level Intermediate Representation) compiler framework to accelerate the horizontal diffusion stencil by designing the first scaled-out spatial accelerator.
TargetCall: Eliminating the Wasted Computation in Basecalling via Pre-Basecalling Filtering
Meryem Banu Cavlak, Gagandeep Singh, Mohammed Alser, Can Fırtına, Joel Lindegger, Mohammad Sadrosadati, Nika Mansouri Ghiasi, Can Alkan, Onur Mutlu
TL;DR: TargetCall discards reads that will not match the target reference (i.e., off-target reads) prior to basecalling. It is the first fast and widely applicable pre-basecalling filter to eliminate the wasted computation in basecalling.
Swordfish: A Framework for Evaluating Deep Neural Network-based Basecalling using Computation-In-Memory with Non-Ideal Memristors
Taha Shahroodi, Gagandeep Singh, Mahdi Zahedi, Haiyu Mao, Joel Lindegger, Can Fırtına, Stephan Wong, Onur Mutlu, Said Hamdioui
TL;DR: This paper proposes Swordfish, a novel hardware/software co-design framework for evaluating deep neural network-based basecalling on computation-in-memory with non-ideal memristors. It leverages various hardware/software co-design solutions to mitigate the basecalling accuracy loss due to such non-idealities.
RUBICON: a framework for designing efficient deep learning-based genomic basecallers
Gagandeep Singh, Mohammed Alser, Kristof Denolf, Can Fırtına, Ali Khodamoradi, Meryem Banu Cavlak, Henk Corporaal, Onur Mutlu
TL;DR: This paper presents RUBICON, a framework to develop efficient hardware-optimized basecallers, and uses it to develop RUBICALL, the first hardware-optimized mixed-precision basecaller that performs efficient basecalling, outperforming the state-of-the-art basecallers.