BlobToolKit - Interactive Quality Assessment of Genome Assemblies.
Richard Challis, Edward Richards, Jeena Rajan, Guy Cochrane, Mark Blaxter
1.4K
TL;DR: BlobToolKit is a software suite that helps researchers identify and isolate non-target data in draft and publicly available genome assemblies. Its results are embedded in assembly records at the European Nucleotide Archive as an indication of assembly quality, with links out to full exploration in the browser-based Viewer.
Abstract: Reconstruction of target genomes from sequence data produced by instruments that are agnostic as to the species-of-origin may be confounded by contaminant DNA. Whether introduced during sample processing or through co-extraction alongside the target DNA, if insufficient care is taken during the assembly process, the final assembled genome may be a mixture of data from several species. Such assemblies can confound sequence-based biological inference and, when deposited in public databases, may be included in downstream analyses by users unaware of underlying problems. We present BlobToolKit, a software suite to aid researchers in identifying and isolating non-target data in draft and publicly available genome assemblies. BlobToolKit can be used to process assembly, read and analysis files for fully reproducible interactive exploration in the browser-based Viewer. BlobToolKit can be used during assembly to filter non-target DNA, helping researchers produce assemblies with high biological credibility. We have been running an automated BlobToolKit pipeline on eukaryotic assemblies publicly available in the International Nucleotide Sequence Database Collaboration and are making the results available through a public instance of the Viewer at https://blobtoolkit.genomehubs.org/view. We aim to complete analysis of all publicly available genomes and then maintain currency with the flow of new genomes. We have worked to embed these views into the presentation of genome assemblies at the European Nucleotide Archive, providing an indication of assembly quality alongside the public record with links out to allow full exploration in the Viewer.
Citations
Significantly improving the quality of genome assemblies through curation
Kerstin Howe,William Chow,Joanna Collins,Sarah Pelan,Damon-Lee Pointon,Ying Sims,James Torrance,Alan Tracey,Jonathan Wood +8 more
TL;DR: This paper describes a tried and tested approach to genome curation using gEVAL, the genome evaluation browser, and gives recommendations for assembly curation in a gEVAL-independent context to facilitate the uptake of genome curation in the wider community.
1.1K
Significantly improving the quality of genome assemblies through curation
Kerstin Howe,William Chow,Joanna Collins,Sarah Pelan,Damon-Lee Pointon,Ying Sims,James Torrance,Alan Tracey,Jonathan Wood +8 more
TL;DR: This work describes a tried and tested approach to assembly curation using gEVAL, the genome evaluation browser, outlines the procedures applied to genome curation with gEVAL, and gives recommendations for assembly curation in a gEVAL-independent context to facilitate the uptake of genome curation in the wider community.
941
Sequence locally, think globally: The Darwin Tree of Life Project
TL;DR: The Earth BioGenome Project (EBP) is a global effort to sequence the genomes of all eukaryotic life on Earth, a goal as daunting as it is ambitious.
292
Genomes on a Tree (GoaT): A versatile, scalable search engine for genomic and sequencing project metadata across the eukaryotic tree of life
TL;DR: Genomes on a Tree (GoaT) is an Elasticsearch-powered datastore and search index for genome-relevant metadata and sequencing project plans and statuses.
186
Massive gene presence-absence variation shapes an open pan-genome in the Mediterranean mussel.
Marco Gerdol,Rebeca Moreira,Fernando Cruz,Jèssica Gómez-Garrido,Anna Vlasova,Umberto Rosani,Paola Venier,Miguel A. Naranjo-Ortiz,Maria Murgarella,Samuele Greco,Pablo Balseiro,André Corvelo,Leonor Frias,Marta Gut,Toni Gabaldón,Alberto Pallavicini,Carlos Canchaya,Beatriz Novoa,Tyler Alioto,David Posada,Antonio Figueras +20 more
TL;DR: This is the first study to report the widespread occurrence of gene presence-absence variation at a whole-genome scale in the animal kingdom. It indicates that dispensable genes usually belong to young and recently expanded gene families enriched in survival functions, which might be key to explaining the resilience and invasiveness of this species.