Journal Article, DOI: 10.1101/GR.264879.120
Optimized sample selection for cost-efficient long-read population sequencing.
Timothy Rhyker Ranallo-Benavidez,Zachary H. Lemmon,Sebastian Soyk,Sergey Aganezov,William J Salerno,Rajiv C. McCoy,Zachary B. Lippman,Michael C. Schatz,Fritz J. Sedlazeck +8 more
6
TL;DR: SVCollector identifies the optimal subset of individuals for resequencing by analyzing population-level VCF files from low-resolution genotyping studies, computing a ranked list of samples that maximizes the total number of variants present within a subset of a given size.
Abstract: An increasingly important scenario in population genetics is when a large cohort has been genotyped using a low-resolution approach (e.g., microarrays, exome capture, short-read WGS), from which a few individuals are resequenced using a more comprehensive approach, especially long-read sequencing. The subset of individuals selected should ensure that the captured genetic diversity is fully representative and includes variants across all subpopulations. For example, studies of human variation have historically focused on individuals with European ancestry, but this represents a small fraction of the overall diversity. Addressing this, SVCollector identifies the optimal subset of individuals for resequencing by analyzing population-level VCF files from low-resolution genotyping studies. It then computes a ranked list of samples that maximizes the total number of variants present within a subset of a given size. To solve this optimization problem, SVCollector implements a fast, greedy heuristic and an exact algorithm using integer linear programming. We apply SVCollector to simulated data, 2504 human genomes from the 1000 Genomes Project, and 3024 genomes from the 3000 Rice Genomes Project, and show that the rankings it computes are more representative than alternative naive strategies. When selecting an optimal subset of 100 samples in these cohorts, SVCollector identifies individuals from every subpopulation, whereas naive methods yield an unbalanced selection. Finally, we show that the number of variants present in cohorts selected using this approach follows a power-law distribution that is naturally related to the population genetic concept of the allele frequency spectrum, allowing us to estimate the diversity present with increasing numbers of samples.
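The greedy heuristic mentioned in the abstract can be illustrated with a minimal sketch. This is not SVCollector's actual implementation: the function name `greedy_rank`, the dictionary input (sample name mapped to a set of variant IDs), and the alphabetical tie-breaking rule are all assumptions made for illustration. At each step the sketch picks the sample that adds the most variants not yet covered by previously chosen samples.

```python
from typing import Dict, List, Set, Tuple


def greedy_rank(sample_variants: Dict[str, Set[str]], k: int) -> List[Tuple[str, int]]:
    """Rank up to k samples greedily by the number of new variants each adds.

    Returns (sample, cumulative_variant_count) pairs in selection order.
    Hypothetical helper for illustration; not part of SVCollector.
    """
    covered: Set[str] = set()          # variants carried by already-selected samples
    remaining = dict(sample_variants)  # samples not yet selected
    ranking: List[Tuple[str, int]] = []
    while remaining and len(ranking) < k:
        # Choose the sample with the largest marginal gain; iterating over
        # sorted names makes ties resolve alphabetically (an assumption).
        best = max(sorted(remaining), key=lambda s: len(remaining[s] - covered))
        covered |= remaining.pop(best)
        ranking.append((best, len(covered)))
    return ranking


# Toy cohort: four samples carrying five distinct variants in total.
toy = {
    "S1": {"v1", "v2", "v3"},
    "S2": {"v3", "v4"},
    "S3": {"v1", "v2"},
    "S4": {"v5"},
}
ranking = greedy_rank(toy, 3)
for sample, total in ranking:
    print(sample, total)
```

In this toy cohort, S3 is never selected because every variant it carries is already covered by S1, which is the behavior that distinguishes a coverage-based ranking from simply picking the samples with the most variants.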
Citations
Towards population-scale long-read sequencing.
TL;DR: This review surveys recent developments in population-scale long-read sequencing, highlights potential challenges of a scaled-up approach, and provides guidance regarding experimental design.
Jasmine and Iris: population-scale structural variant comparison and analysis
Melanie Kirsche,Gautam Prabhu,Rachel M. Sherman,Bohan Ni,Alexis Battle,Sergey Aganezov,Michael C. Schatz +6 more
TL;DR: An optimized pipeline for improved inference and analysis of structural variants (SVs) has been developed, which uses Iris for refining breakpoints and sequences, and Jasmine for comparing SV calls at population scale, and reveals a set of high-confidence de novo SVs confirmed by multiple technologies.
95
Inverting the model of genomics data sharing with the NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space
Michael C. Schatz,Anthony Philippakis,Enis Afgan,Eric Banks,Vincent J. Carey,Robert J. Carroll,Alessandro Culotti,Kyle Ellrott,Jeremy Goecks,Robert L. Grossman,Ira M. Hall,Kasper D. Hansen,Jonathan Lawson,Jeffrey T. Leek,Anne O'Donnell‐Luria,Stephen Mosher,Martin Morgan,Anton Nekrutenko,Brian D. O’Connor,Kevin Osborn,Benedict Paten,Candace Patterson,Frederick J. Tan,Casey Overby Taylor,Jennifer Vessio,Levi Waldron,Ting Wang,Kristin Wuichet,Alexander Baumann,Andrew Rula,Anton Kovalsy,C. Bernard,Derek Caetano-Anollés,Géraldine A. Van der Auwera,Justin Canas,K. Ümit Yüksel,Kate Herman,Megan Taylor,Marianie Simeon,Michaël Baumann,Qi Wang,Robert Title,Ruchi Munshi,Sushma Chaluvadi,Valerie B Reeves,William Disman,Salin Thomas,Allie Hajian,Elizabeth Kiernan,Namrata Gupta,Trish Vosburg,Ludwig Geistlinger,Marcel Ramos,Sehyun Oh,Dave Rogers,Frances McDade,Mim Hastie,Nitesh Turaga,Alexander Ostrovsky,Alexandru Mahmoud,Dannon Baker,D. L. Clements,Katherine E.L. Cox,Keith Suderman,Nataliya Kucher,Sergey Golitsynskiy,Samantha Zarate,Sarah J. Wheelan,Kai Kammers,Ana Stevens,Carolyn M. Hutter,Christopher Wellington,Elena M. Ghanaim,Ken Wiley,Shurjo K. Sen,Valentina Di Francesco,Denis Yuen,Brian Walsh,Luke Sargent,Vahid Jalili,John Chilton,Lori Shepherd,Benjamin J. Stubbs,Ash O’Farrell,Benton A. Vizzier,Charles Overbeck,Charles Reid,David Steinberg,Elizabeth A. Sheets,Julian K. Lucas,Lon Blauvelt,Louise Cabansay,Noah Warren,Brian Hannafious,Tim Harris,Radhika Reddy,Eric S. Torstenson,M. Katie Banasiewicz,Haley Abel,Jason Walker +99 more
TL;DR: AnVIL is a federated cloud platform designed to manage and store genomics data, enable population-scale analysis, and facilitate collaboration through data sharing. It eliminates the need for data movement while adding security measures and providing scalable, shared computing resources.
89
Jasmine: Population-scale structural variant comparison and analysis
TL;DR: Jasmine is a fast and accurate method for structural variant refinement, comparison, and population analysis using an SV proximity graph. It outperforms five widely used comparison methods, including reducing the rate of Mendelian discordance in three datasets by more than fivefold, and reveals a set of high-confidence de novo SVs confirmed by multiple long-read technologies.
Plant pangenomes for crop improvement, biodiversity and evolution.
Mona Schreiber,Murukarthick Jayakodi,Nils Stein,Martin Mascher +3 more
TL;DR: Pangenomes are also applicable in ecological and evolutionary studies, as they help classify and monitor biodiversity across the tree of life, deepen the understanding of how plant species diverged, and show how plants adapt to changing environments or new selection pressures exerted by humans.
31