Performance of neural network basecalling tools for Oxford Nanopore sequencing.
TL;DR: The current version of ONT’s Guppy basecaller performs well overall, with good accuracy and fast performance, and users should consider producing a custom model using a larger neural network and/or training data from the same species.
read more
Abstract: Basecalling, the computational process of translating raw electrical signal to nucleotide sequence, is of critical importance to the sequencing platforms produced by Oxford Nanopore Technologies (ONT). Here, we examine the performance of different basecalling tools, looking at accuracy at the level of bases within individual reads and at majority-rule consensus basecalls in an assembly. We also investigate some additional aspects of basecalling: training using a taxon-specific dataset, using a larger neural network model and improving consensus basecalls in an assembly by additional signal-level analysis with Nanopolish. Training basecallers on taxon-specific data results in a significant boost in consensus accuracy, mostly due to the reduction of errors in methylation motifs. A larger neural network is able to improve both read and consensus accuracy, but at a cost to speed. Improving consensus sequences (‘polishing’) with Nanopolish somewhat negates the accuracy differences in basecallers, but pre-polish accuracy does have an effect on post-polish accuracy. Basecalling accuracy has seen significant improvements over the last 2 years. The current version of ONT’s Guppy basecaller performs well overall, with good accuracy and fast performance. If higher accuracy is required, users should consider producing a custom model using a larger neural network and/or training data from the same species.
read more
Chat with Paper
AI Agents for this Paper
Find similar papers on Google Scholar, PubMed and Arxiv
Write a critical review of this paper
Analyze citations of this paper to find unaddressed research gaps
Citations
Machine learning and computation-enabled intelligent sensor design
TL;DR: A new generation of computational sensing systems that reduce the data burden while also improving sensing capabilities are envisioned, enabling low-cost and compact sensor implementations engineered through iterative analysis of data-driven sensing outcomes.
Host and pathogen response to bacteriophage engineered against Mycobacterium abscessus lung infection
Jerry A. Nick,Rebekah M. Dedrick,Alice L. Gray,E. Vladar,Bailey E. Smith,K. Freeman,Kenneth C. Malcolm,L. Elaine Epperson,Nabeeh A. Hasan,Jordi Hendrix,Kimberly Callahan,Kendra N. Walton,Brian Vestal,Emily C. Wheeler,Noel Rysavy,Katie R. Poch,Silvia M. Caceres,V. Lovell,Katherine B. Hisert,Vinicius Calado Nogueira de Moura,Delphi Chatterjee,Prithwiraj De,Natalia Weakly,Stacey L. Martiniano,David A. Lynch,Charles L. Daley,Michael J. Strong,Fan Jia,Graham F. Hatfull,Rebecca M. Davidson +29 more
TL;DR: The course and associated markers of a successful phage treatment of M. abscessus in advanced lung disease are described and evidence of phage-induced lysis was observed using molecular and metabolic assays combined with clinical assessments.
162
scIGANs: single-cell RNA-seq imputation using generative adversarial networks
TL;DR: This work demonstrated in many ways with compelling evidence that scIGANs is not only an application of GANs in omics data but also represents a competing imputation method for the scRNA-seq data.
Analysis of short tandem repeat expansions and their methylation state with nanopore sequencing
Pay Giesselmann,Björn Brändl,Etienne Raimondeau,Rebecca Bowen,Christian Rohrandt,Rashmi Tandon,Helene Kretzmer,Günter Assum,Christina Galonska,Reiner Siebert,Ole Ammerpohl,Andrew John Heron,Susanne A. Schneider,Julia Ladewig,Philipp Koch,Bernhard Schuldt,James E. Graham,Alexander Meissner,Alexander Meissner,Franz-Josef Müller +19 more
TL;DR: This work demonstrates the precise quantification of repeat numbers in conjunction with the determination of CpG methylation states in the repeat expansion and in adjacent regions at the single-molecule level without amplification.
MARS: discovering novel cell types across heterogeneous single-cell experiments.
Maria Brbic,Marinka Zitnik,Sheng Wang,Angela Oliveira Pisco,Russ B. Altman,Spyros Darmanis,Jure Leskovec +6 more
TL;DR: The method has a unique ability to discover cell types that have never been seen before and annotate experiments that are as yet unannotated and overcomes the heterogeneity of cell types by transferring latent cell representations across multiple datasets.
References
Minimap2: pairwise alignment for nucleotide sequences
TL;DR: Minimap2 is a general-purpose alignment program to map DNA or long mRNA sequences against a large reference database and is 3-4 times as fast as mainstream short-read mappers at comparable accuracy, and is ≥30 times faster than long-read genomic or cDNA mapper at higher accuracy, surpassing most aligners specialized in one type of alignment.
Initial sequencing and comparative analysis of the mouse genome.
Robert H. Waterston,Kerstin Lindblad-Toh,Ewan Birney,Jane Rogers,Josep F. Abril,Pankaj K. Agarwal,Richa Agarwala,Rachel Ainscough,Marina Alexandersson,Peter An,Stylianos E. Antonarakis,John Attwood,Robert Baertsch,J Bailey,K F Barlow,Stephan Beck,Eric Berry,Bruce W. Birren,Toby Bloom,Peer Bork,Marc Botcherby,Nicolas Bray,Michael R. Brent,Daniel G. Brown,Daniel G. Brown,Stephen D. Brown,Carol J. Bult,John Burton,Jonathan Butler,R. D. Campbell,Piero Carninci,Simon Cawley,Francesca Chiaromonte,Asif T. Chinwalla,Deanna M. Church,Michele Clamp,C M Clee,Francis S. Collins,Lisa Cook,Richard R. Copley,Alan Coulson,Olivier Couronne,James Cuff,Val Curwen,Tim Cutts,Mark J. Daly,Robert David,Joy Davies,Kimberly D. Delehaunty,Justin Deri,Emmanouil T. Dermitzakis,Colin N. Dewey,Nicholas J. Dickens,Mark Diekhans,Sheila Dodge,Inna Dubchak,Diane M. Dunn,Sean R. Eddy,Laura Elnitski,Richard D. Emes,Pallavi Eswara,Eduardo Eyras,Adam Felsenfeld,Ginger A. Fewell,Paul Flicek,Karen Foley,Wayne N. Frankel,Lucinda Fulton,Robert S. Fulton,Terrence S. Furey,Diane Gage,Richard A. Gibbs,Gustavo Glusman,Sante Gnerre,Nick Goldman,Leo Goodstadt,Darren Grafham,Tina Graves,Eric D. Green,Simon G. Gregory,Roderic Guigó,Mark S. Guyer,Ross C. Hardison,David Haussler,Yoshihide Hayashizaki,Deana W. LaHillier,Angela S. Hinrichs,Wratko Hlavina,Timothy Holzer,Fan Hsu,Axin Hua,Tim Hubbard,Adrienne Hunt,Ian J. Jackson,David B. Jaffe,L. Steven Johnson,Matthew Jones,Thomas A. Jones,A Joy,Michael Kamal,Elinor K. Karlsson,Donna Karolchik,Arkadiusz Kasprzyk,Jun Kawai,Evan Keibler,Cristyn Kells,W. James Kent,Andrew Kirby,Diana L. Kolbe,Ian F Korf,Raju Kucherlapati,Edward J. Kulbokas,David Kulp,Tom Landers,J. P. Leger,Steven Leonard,Ivica Letunic,Rosie Levine,Jia Li,Ming Li,Christine Lloyd,Susan Lucas,Bin Ma,Donna Maglott,Elaine R. Mardis,Lucy Matthews,Evan Mauceli,John Mayer,Megan McCarthy,W. Richard McCombie,Stuart McLaren,Kirsten McLay,John Douglas Mcpherson,James Meldrim,Beverley Meredith,Jill P. Mesirov,Webb Miller,Tracie L. Miner,Emmanuel Mongin,Kate Montgomery,Michael J. Morgan,Richard Mott,James C. Mullikin,Donna M. Muzny,William E. Nash,Joanne O. Nelson,Michael N. Nhan,Robert Nicol,Zemin Ning,Chad Nusbaum,Michael J. O’Connor,Yasushi Okazaki,Karen Oliver,Emma Overton-Larty,Lior Pachter,Genís Parra,Kymberlie H. Pepin,Jane Peterson,Pavel A. Pevzner,Robert W. Plumb,Craig Pohl,Alex Poliakov,Tracy C. Ponce,Chris P. Ponting,Simon C. Potter,Michael A. Quail,Alexandre Reymond,Bruce A. Roe,Krishna M. Roskin,Edward M. Rubin,Alistair G. Rust,Ralph Santos,Victor Sapojnikov,Brian Schultz,Jörg Schultz,Matthias S. Schwartz,Scott Schwartz,Carol Scott,Steven Seaman,Steve Searle,Ted Sharpe,Andrew Sheridan,Ratna Shownkeen,Sarah Sims,Jonathan Singer,Guy Slater,Arian F.A. Smit,Douglas Smith,Brian Spencer,Arne Stabenau,Nicole Stange-Thomann,Charles W. Sugnet,Mikita Suyama,Glenn Tesler,Johanna Thompson,David Torrents,Evanne Trevaskis,John Tromp,Catherine Ucla,Abel Ureta-Vidal,Jade P. Vinson,Andrew von Niederhausern,Claire M. Wade,Melanie M. Wall,R. J. Weber,Robert B. Weiss,Michael C. Wendl,Anthony P. West,Kris A. Wetterstrand,Raymond Wheeler,Simon Whelan,Jamey Wierzbowski,David Willey,Sophie Williams,Richard K. Wilson,Eitan E. Winter,Kim C. Worley,Dudley Wyman,Shan Yang,Shiaw Pyng Yang,Evgeny M. Zdobnov,Michael C. Zody,Eric S. Lander +222 more
TL;DR: The results of an international collaboration to produce a high-quality draft sequence of the mouse genome are reported and an initial comparative analysis of the Mouse and human genomes is presented, describing some of the insights that can be gleaned from the two sequences.
Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks
Alex Graves,Santiago Fernández,Faustino Gomez,Jürgen Schmidhuber +3 more
- 25 Jun 2006
TL;DR: This paper presents a novel method for training RNNs to label unsegmented sequences directly, thereby solving both problems of sequence learning and post-processing.
6.8K
Versatile and open software for comparing large genomes
Stefan Kurtz,Adam M. Phillippy,Arthur L. Delcher,Michael E. Smoot,Martin Shumway,Corina Antonescu,Steven L. Salzberg +6 more
TL;DR: The newest version of MUMmer easily handles comparisons of large eukaryotic genomes at varying evolutionary distances, as demonstrated by applications to multiple genomes.
Fast and accurate de novo genome assembly from long uncorrected reads
TL;DR: It is shown that the error-correction step can be omitted and that high-quality consensus sequences can be generated efficiently with a SIMD-accelerated, partial-order alignment-based, stand-alone consensus module called Racon.