The EVcouplings Python framework for coevolutionary sequence analysis
Thomas A. Hopf, Anna G. Green, Benjamin Schubert, Sophia Mersmann, Charlotta P. I. Schärfe, John Ingraham, Agnes Toth-Petroczy, Kelly P. Brock, Adam J. Riesselman, Chan Kang, Christian Dallago, Chris Sander, Debora S. Marks
TL;DR: This paper presents the EVcouplings framework, a fully integrated open-source application and Python package for coevolutionary analysis that generates sequence alignments, calculates and evaluates evolutionary couplings, and predicts structure and mutation effects de novo.
Abstract: Summary: Coevolutionary sequence analysis has become a commonly used technique for de novo prediction of the structure and function of proteins, RNA, and protein complexes. The approach requires complex computational pipelines that integrate multiple tools and databases with extensive data processing. We present the EVcouplings framework, a fully integrated open-source application and Python package for coevolutionary analysis. The framework enables generation of sequence alignments, calculation and evaluation of evolutionary couplings (ECs), and de novo prediction of structure and mutation effects. The application has an easy-to-use command-line interface for running workflows with user control over all analysis parameters, while the underlying modular Python package allows interactive data analysis and rapid development of new workflows. Through this multi-layered design, the EVcouplings framework makes the full power of coevolutionary analysis available to both entry-level and advanced users.
Availability: https://github.com/debbiemarkslab/evcouplings
Contact: sander.research@gmail.com, debbie@hms.harvard.edu
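At the core of any coevolutionary analysis is a score for how strongly two alignment columns vary together. As a rough illustration only, the sketch below scores column pairs with mutual information (MI) minus the average product correction (APC). This is a deliberately simpler, swapped-in statistic. EVcouplings instead infers a global pairwise (Potts) model of the alignment and derives EC scores from its couplings, which better separates direct from indirect (transitive) correlations. The function names, the toy alignment, and the treatment of gaps as ordinary symbols are assumptions for illustration, and the sequence reweighting used in practice is omitted.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(col_i, col_j):
    """Mutual information (natural log) between two alignment columns.

    Assumption of this sketch: gaps count as an ordinary symbol and all
    sequences get equal weight (no redundancy reweighting).
    """
    n = len(col_i)
    f_i = Counter(col_i)               # symbol counts at column i
    f_j = Counter(col_j)               # symbol counts at column j
    f_ij = Counter(zip(col_i, col_j))  # symbol-pair counts for (i, j)
    mi = 0.0
    for (a, b), count in f_ij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((f_i[a] / n) * (f_j[b] / n)))
    return mi


def apc_corrected_scores(alignment):
    """Return {(i, j): MI_ij - APC_ij} for all column pairs with i < j."""
    length = len(alignment[0])
    if length < 2 or any(len(seq) != length for seq in alignment):
        raise ValueError("need aligned sequences of equal length >= 2")
    columns = [[seq[k] for seq in alignment] for k in range(length)]

    mi = {(i, j): mutual_information(columns[i], columns[j])
          for i, j in combinations(range(length), 2)}

    # Mean MI of each column with every other column (each column is in length - 1 pairs).
    col_mean = [
        sum(v for (i, j), v in mi.items() if k in (i, j)) / (length - 1)
        for k in range(length)
    ]
    overall_mean = sum(mi.values()) / len(mi)

    scores = {}
    for (i, j), value in mi.items():
        # Average product correction: subtract the background MI expected from
        # columns i and j individually; skipped when every MI value is zero.
        apc = col_mean[i] * col_mean[j] / overall_mean if overall_mean > 0 else 0.0
        scores[(i, j)] = value - apc
    return scores


if __name__ == "__main__":
    # Hypothetical toy alignment: columns 0 and 1 covary (A<->C, G<->T),
    # column 2 varies independently of them, column 3 is fully conserved.
    msa = ["ACDK", "ACEK", "GTDK", "GTEK"]
    ranked = sorted(apc_corrected_scores(msa).items(),
                    key=lambda kv: kv[1], reverse=True)
    for (i, j), score in ranked:
        print(f"columns {i}-{j}: {score:.3f}")
```

On this toy input the covarying pair (0, 1) ranks first, while pairs involving the independent column 2 or the conserved column 3 score zero. On real alignments, local statistics like MI are noisier than couplings from a global model, which is why the framework's EC calculation relies on the latter.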