Using deep mutational scanning to benchmark variant effect predictors and identify disease mutations.
TL;DR: DeepSequence clearly stood out, showing both the strongest correlations with DMS data and having the best ability to predict pathogenic mutations, which is especially remarkable given that it is an unsupervised method.
Abstract: To deal with the huge number of novel protein-coding variants identified by genome and exome sequencing studies, many computational variant effect predictors (VEPs) have been developed. Such predictors are often trained and evaluated using different variant data sets, making a direct comparison between VEPs difficult. In this study, we use 31 previously published deep mutational scanning (DMS) experiments, which provide quantitative, independent phenotypic measurements for large numbers of single amino acid substitutions, in order to benchmark and compare 46 different VEPs. We also evaluate the ability of DMS measurements and VEPs to discriminate between pathogenic and benign missense variants. We find that DMS experiments tend to be superior to the top-ranking predictors, demonstrating the tremendous potential of DMS for identifying novel human disease mutations. Among the VEPs, DeepSequence clearly stood out, showing both the strongest correlations with DMS data and having the best ability to predict pathogenic mutations, which is especially remarkable given that it is an unsupervised method. We further recommend SNAP2, DEOGEN2, SNPs&GO, SuSPect and REVEL based upon their performance in these analyses.
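The two evaluations described in the abstract, correlation of predictor scores with DMS measurements and discrimination between pathogenic and benign variants, can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline. The abstract does not name the correlation statistic or the classification metric, so the use of Spearman's rank correlation and the area under the ROC curve (AUROC) is an assumption. All variable names and the toy numbers are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def auroc(scores, is_pathogenic):
    """Chance that a random pathogenic variant outscores a random benign one
    (rank-sum formulation; higher score is taken to mean more damaging)."""
    ranks = average_ranks(scores)
    n_pos = sum(is_pathogenic)
    n_neg = len(scores) - n_pos
    rank_sum = sum(r for r, p in zip(ranks, is_pathogenic) if p)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Hypothetical toy data for six substitutions in one protein.
dms_fitness = [1.0, 0.9, 0.2, 0.1, 0.6, 0.05]   # assumed: higher = more functional
vep_damage = [0.1, 0.3, 0.8, 0.7, 0.75, 0.95]   # assumed: higher = more damaging
pathogenic = [False, False, True, True, False, True]

# Agreement between the predictor and the experiment. The sign is negative
# here only because the two toy scales point in opposite directions.
rho = spearman(dms_fitness, vep_damage)

# Pathogenic-versus-benign discrimination for the predictor and for the DMS
# data itself (fitness is negated so that higher means more damaging).
vep_auc = auroc(vep_damage, pathogenic)
dms_auc = auroc([-f for f in dms_fitness], pathogenic)

print(f"Spearman rho = {rho:.3f}, VEP AUROC = {vep_auc:.3f}, DMS AUROC = {dms_auc:.3f}")
```

Rank-based statistics are a natural fit for this kind of comparison because DMS readouts and predictor scores live on different, often nonlinear scales, so only the ordering of variants is compared.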