Performance of neural network basecalling tools for Oxford Nanopore sequencing.
TL;DR: The current version of ONT’s Guppy basecaller performs well overall, with good accuracy and fast performance; if higher accuracy is required, users should consider producing a custom model using a larger neural network and/or training data from the same species.
Abstract: Basecalling, the computational process of translating raw electrical signal to nucleotide sequence, is of critical importance to the sequencing platforms produced by Oxford Nanopore Technologies (ONT). Here, we examine the performance of different basecalling tools, looking at accuracy at the level of bases within individual reads and at majority-rule consensus basecalls in an assembly. We also investigate some additional aspects of basecalling: training using a taxon-specific dataset, using a larger neural network model and improving consensus basecalls in an assembly by additional signal-level analysis with Nanopolish. Training basecallers on taxon-specific data results in a significant boost in consensus accuracy, mostly due to the reduction of errors in methylation motifs. A larger neural network is able to improve both read and consensus accuracy, but at a cost to speed. Improving consensus sequences (‘polishing’) with Nanopolish somewhat negates the accuracy differences in basecallers, but pre-polish accuracy does have an effect on post-polish accuracy. Basecalling accuracy has seen significant improvements over the last 2 years. The current version of ONT’s Guppy basecaller performs well overall, with good accuracy and fast performance. If higher accuracy is required, users should consider producing a custom model using a larger neural network and/or training data from the same species.
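The two accuracy levels named in the abstract, per-read accuracy and majority-rule consensus accuracy, can be illustrated with a toy sketch. This is not the paper's evaluation pipeline: it assumes the reads are already aligned to a common coordinate system (equal-length strings with `-` for gaps), and the function names `identity` and `majority_consensus` are hypothetical helpers introduced only for this example.

```python
from collections import Counter


def identity(aligned_a: str, aligned_b: str) -> float:
    """Fraction of alignment columns in which the two sequences agree."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b)
    return matches / len(aligned_a)


def majority_consensus(aligned_reads: list[str]) -> str:
    """Majority-rule consensus over pre-aligned reads ('-' marks a gap).

    Ties are broken by whichever symbol appears first in the column,
    which is an arbitrary choice made for this sketch.
    """
    if not aligned_reads:
        raise ValueError("need at least one read")
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("all aligned reads must have equal length")
    consensus = []
    for column in zip(*aligned_reads):
        symbol, _count = Counter(column).most_common(1)[0]
        if symbol != "-":  # a majority gap means no base is emitted here
            consensus.append(symbol)
    return "".join(consensus)


# Toy data (made up for illustration): one reference and three aligned reads.
reference = "ACGTACGT"
reads = [
    "ACGTACGT",  # error-free read
    "ACGAACGT",  # one substitution
    "ACGTAC-T",  # one deletion
]

for read in reads:
    print(f"read identity: {identity(read, reference):.3f}")

consensus = majority_consensus(reads)
print("consensus:", consensus)
print(f"consensus identity: {identity(consensus, reference):.3f}")
```

In this toy case each erroneous read is outvoted by the other two, so the consensus matches the reference even though individual reads do not. An error shared by most reads at the same position would survive the vote, which is one way to read the abstract's observation that reducing errors in methylation motifs improves consensus accuracy.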
Citations
High-resolution molecular atlas of a lung tumor in 3D
Tancredi Massimo Pentimalli,Simon Schallenberg,Daniel León-Periñán,Ivano Legnini,Gwendolin Thomas,Anastasiya Boltengagen,Jose Nimo,Lukas Ruff,Gabriel Dernbach,Philipp Jurmeister,Sarah Murphy,Mark Gregory,Yang Xiang,Michelangelo Cordenonsi,Stefano Piccolo,Fabian Coscia,Andrew Woehler,Nikos Karaiskos,Frederick Klauschen,Nikolaus Rajewsky +19 more
TL;DR: This paper presents a 3D spatial atlas of a routine clinical sample, an aggressive human lung carcinoma, built by combining in situ quantification of 960 cancer-related genes across ∼340,000 cells with measurements of tissue-mechanical components.
A Framework for Designing Efficient Deep Learning-Based Genomic Basecallers
Gagandeep Singh,Mohammed Alser,Ali Khodamoradi,Kristof Denolf,Can Fırtına,Meryem Banu Cavlak,Henk Corporaal,Onur Mutlu +7 more
TL;DR: The authors propose a quantization-aware basecalling neural architecture search (QABAS) framework to find the best bit-width precision for each neural network layer.
ANI, Mash and Dashing equally differentiate between Klebsiella species
TL;DR: In this paper, the authors compared 982 Klebsiella genomes using different species-delimiting measures: Average Nucleotide Identity (ANI), as well as Mash, Dashing, and DNA compositional signatures, which can be run in a fraction of the time required to run ANI.
Long-read-based genome assembly reveals numerous endogenous viral elements in the green algal bacterivore Cymbomonas tetramitiformis.
Yangtsho Gyaltshen,Andrey Rozenberg,A. Paasch,John A Burns,Sally Warring,Raegan Larson,Xyrus X. Maurer-Alcalá,Joel Dacks,Apurva Narechania,Eunsoo Kim +9 more
TL;DR: Past (and possibly ongoing) multiple alga–virus interactions that accompanied the genome evolution of C. tetramitiformis are illustrated.
Metagenomic Analysis at the Edge with Jetson Xavier NX.
Piotr Grzesik,Dariusz Mrozek +1 more
- 16 Jun 2021
TL;DR: In this paper, the authors present a study on using Edge devices such as Jetson Xavier NX as a platform for running real-time analysis and evaluate it both from a performance and energy efficiency standpoint.