Sequence space (evolution)

Papers

Journal Article•10.1038/S41592-018-0138-4•

Deep generative models of genetic variation capture the effects of mutations.

Adam J. Riesselman¹, John Ingraham¹, Debora S. Marks¹•Institutions (1)

Harvard University¹

24 Sep 2018-Nature Methods

TL;DR: DeepSequence is an unsupervised deep latent-variable model that predicts the effects of mutations on the basis of evolutionary sequence information that is grounded with biologically motivated priors, reveals the latent organization of sequence families, and can be used to explore new parts of sequence space.

Abstract: The functions of proteins and RNAs are defined by the collective interactions of many residues, and yet most statistical models of biological sequences consider sites nearly independently. Recent approaches have demonstrated benefits of including interactions to capture pairwise covariation, but leave higher-order dependencies out of reach. Here we show how it is possible to capture higher-order, context-dependent constraints in biological sequences via latent variable models with nonlinear dependencies. We found that DeepSequence (https://github.com/debbiemarkslab/DeepSequence), a probabilistic model for sequence families, predicted the effects of mutations across a variety of deep mutational scanning experiments substantially better than existing methods based on the same evolutionary data. The model, learned in an unsupervised manner solely on the basis of sequence information, is grounded with biologically motivated priors, reveals the latent organization of sequence families, and can be used to explore new parts of sequence space.

639 citations
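
The abstract above contrasts DeepSequence with models that "consider sites nearly independently". As a point of reference, the following is a minimal sketch of such a site-independent baseline, not of DeepSequence itself: it scores a substitution by the log-ratio of smoothed column frequencies in an alignment. The function names, the pseudocount, and the toy alignment are illustrative assumptions.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def site_frequencies(alignment, pseudocount=1.0):
    """Per-column amino-acid frequencies with additive smoothing.

    Gaps and non-standard symbols are skipped. The pseudocount is an
    illustrative choice, not a value taken from the paper.
    """
    freqs = []
    for col in range(len(alignment[0])):
        counts = Counter(seq[col] for seq in alignment if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        freqs.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return freqs

def mutation_score(freqs, position, wild_type, mutant):
    """Log-odds of mutant versus wild type at one column (0-based position).

    Each site is treated independently, so no interaction between
    residues is captured; negative scores mean the mutant is rarer.
    """
    return math.log(freqs[position][mutant] / freqs[position][wild_type])

# Toy alignment (hypothetical sequences, for illustration only).
alignment = ["MKVLA", "MKVIA", "MRVLA", "MKVLG"]
freqs = site_frequencies(alignment)
conservative = mutation_score(freqs, 3, "L", "I")  # I is observed at this column
disruptive = mutation_score(freqs, 3, "L", "W")    # W is never observed here
```

Because every column is scored in isolation, this baseline cannot express the pairwise or higher-order dependencies that the abstract identifies as the motivation for a latent-variable model.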

Journal Article•10.1002/IJCH.201200096•

ConSurf: Using Evolutionary Data to Raise Testable Hypotheses about Protein Function

Gershon Celniker¹, Guy Nimrod¹, Haim Ashkenazy¹, Fabian Glaser², Eric Martz³, Itay Mayrose¹, Tal Pupko¹, Nir Ben-Tal¹•Institutions (3)

Tel Aviv University¹, Technion – Israel Institute of Technology², University of Massachusetts Amherst³

01 Apr 2013-Israel Journal of Chemistry

TL;DR: The ConSurf-DB, a new release of which is presented here, provides precalculated ConSurf conservation analysis of nearly all available structures in the Protein Data Bank (PDB); the review also covers the use of ConSurf for individual proteins and mutations and for a range of large-scale, genome-wide applications.

Abstract: Many mutations disappear from the population because they impair protein function and/or stability. Thus, amino acid positions that are essential for proper function evolve more slowly than others, or in other words, the slow evolutionary rate of a position reflects its importance. ConSurf (http://consurf.tau.ac.il), reviewed in this manuscript, exploits this to reveal key amino acid positions that are important for maintaining the native conformation(s) of the protein and its function, be it binding, catalysis, transport, etc. Given the sequence or 3D structure of the query protein as input, a search for similar sequences is conducted and the sequences are aligned. The multiple sequence alignment is subsequently used to calculate the evolutionary rates of each amino acid site, using Bayesian or maximum-likelihood algorithms. Both algorithms take into account the evolutionary relationships between the sequences, reflected in phylogenetic trees, to alleviate problems due to uneven (biased) sampling in sequence space. This is particularly important when the number of sequences is low. The ConSurf-DB, a new release of which is presented here, provides precalculated ConSurf conservation analysis of nearly all available structures in the Protein Data Bank (PDB). The usefulness of ConSurf for the study of individual proteins and mutations, as well as a range of large-scale, genome-wide applications, is reviewed.

600 citations
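
The ConSurf abstract above describes scoring each alignment column by its evolutionary rate. The sketch below uses a deliberately simpler stand-in, per-column Shannon entropy, to illustrate the idea that low variability marks a conserved site. It is not the ConSurf algorithm: it ignores the phylogenetic tree, so it is exposed to exactly the sampling bias the abstract says the Bayesian and maximum-likelihood methods are designed to alleviate. The function names and the toy alignment are assumptions.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy in bits of one alignment column, ignoring gaps.

    Returns NaN for an all-gap column so that it is never mistaken
    for a perfectly conserved one.
    """
    residues = [c for c in column if c != "-"]
    if not residues:
        return math.nan
    n = len(residues)
    counts = Counter(residues)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

def conservation_profile(alignment):
    """Entropy for every column; lower values mean stronger conservation."""
    return [column_entropy(col) for col in zip(*alignment)]

# Toy alignment (hypothetical sequences, for illustration only).
alignment = ["MKVLA", "MKVIA", "MRVLA", "MKV-G"]
profile = conservation_profile(alignment)
most_conserved = [i for i, h in enumerate(profile) if h == 0.0]
```

In this toy case the invariant columns are the first and third. With a real, unevenly sampled family, many near-identical sequences would dominate the counts, which is why a tree-aware rate estimate is preferred when the data allow it.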

Journal Article•10.1073/PNAS.1901979116•

Machine learning-assisted directed protein evolution with combinatorial libraries.

Zachary Wu¹, S. B. Jennifer Kan¹, Russell D. Lewis¹, Bruce J. Wittmann¹, Frances H. Arnold¹•Institutions (1)

California Institute of Technology¹

30 Apr 2019-Proceedings of the National Academy of Sciences of the United States of America

TL;DR: It is proposed that the expense of experimentally testing a large number of protein variants can be decreased and the outcome can be improved by incorporating machine learning with directed evolution, and that machine learning-guided directed evolution finds variants with higher fitness than those found by other directed evolution approaches.

Abstract: To reduce experimental effort associated with directed protein evolution and to explore the sequence space encoded by mutating multiple positions simultaneously, we incorporate machine learning into the directed evolution workflow. Combinatorial sequence space can be quite expensive to sample experimentally, but machine-learning models trained on tested variants provide a fast method for testing sequence space computationally. We validated this approach on a large published empirical fitness landscape for human GB1 binding protein, demonstrating that machine learning-guided directed evolution finds variants with higher fitness than those found by other directed evolution approaches. We then provide an example application in evolving an enzyme to produce each of the two possible product enantiomers (i.e., stereodivergence) of a new-to-nature carbene Si–H insertion reaction. The approach predicted libraries enriched in functional enzymes and fixed seven mutations in two rounds of evolution to identify variants for selective catalysis with 93% and 79% ee (enantiomeric excess). By greatly increasing throughput with in silico modeling, machine learning enhances the quality and diversity of sequence solutions for a protein engineering problem.

532 citations
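
The workflow in the abstract above (train a model on tested variants, then use it to screen the combinatorial sequence space in silico) can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' pipeline: the model here is a simple additive per-site estimator chosen so the example needs only the standard library, and the fitness values, variant strings, and function names are all hypothetical.

```python
from collections import defaultdict
from itertools import product

def fit_additive_model(tested):
    """Estimate each (position, residue) effect as the mean deviation from
    the overall mean fitness among tested variants carrying that residue."""
    mean = sum(tested.values()) / len(tested)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for variant, fitness in tested.items():
        for pos, aa in enumerate(variant):
            sums[(pos, aa)] += fitness - mean
            counts[(pos, aa)] += 1
    effects = {key: sums[key] / counts[key] for key in sums}
    return mean, effects

def predict(model, variant):
    """Predicted fitness; residues never seen at a position add nothing
    (an assumption of this sketch)."""
    mean, effects = model
    return mean + sum(effects.get((pos, aa), 0.0) for pos, aa in enumerate(variant))

def propose(model, alphabet_per_site, tested, top_k=3):
    """Enumerate the combinatorial library and rank the untested variants."""
    candidates = ("".join(combo) for combo in product(*alphabet_per_site))
    untested = [v for v in candidates if v not in tested]
    return sorted(untested, key=lambda v: predict(model, v), reverse=True)[:top_k]

# Hypothetical measurements for variants mutated at two positions.
tested = {"AA": 1.0, "AV": 2.0, "LA": 3.0, "GA": 0.5}
model = fit_additive_model(tested)
next_round = propose(model, ["ALG", "AV"], tested, top_k=2)
```

An additive model cannot represent interactions between the mutated positions, so in practice a more expressive regressor would usually replace `fit_additive_model`; the enumerate-predict-rank loop stays the same.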

Journal Article•10.1006/JMBI.2000.4474•

ConSurf: an algorithmic tool for the identification of functional regions in proteins by surface mapping of phylogenetic information.

Aharon Armon¹, Dan Graur¹, Nir Ben-Tal¹•Institutions (1)

Tel Aviv University¹

16 Mar 2001-Journal of Molecular Biology

TL;DR: In this paper, the authors present an alternative approach involving the use of evolutionary data in the form of multiple-sequence alignment for a protein family to identify hot spots and surface patches that are likely to be in contact with other proteins, domains, peptides, DNA, RNA or ligands.

521 citations

Journal Article•10.1038/NSB0295-171•

A method to predict functional residues in proteins

Georg Casari¹, Chris Sander¹, Alfonso Valencia¹•Institutions (1)

European Bioinformatics Institute¹

01 Feb 1995-Nature Structural & Molecular Biology

TL;DR: A novel method is presented that exploits conservation patterns for the prediction of functional residues in SH2 domains and in the conserved box of cyclins, using a simple but powerful representation of entire proteins, as well as sequence residues as vectors in a generalised ‘sequence space’.

Abstract: The biological activity of a protein typically depends on the presence of a small number of functional residues. Identifying these residues from the amino acid sequences alone would be useful. Classically, strictly conserved residues are predicted to be functional but often conservation patterns are more complicated. Here, we present a novel method that exploits such patterns for the prediction of functional residues. The method uses a simple but powerful representation of entire proteins, as well as sequence residues as vectors in a generalised 'sequence space'. Projection of these vectors onto a lower-dimensional space reveals groups of residues specific for particular subfamilies that are predicted to be directly involved in protein function. Based on the method we present testable predictions for sets of functional residues in SH2 domains and in the conserved box of cyclins.

460 citations
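
The abstract above represents proteins as vectors in a "sequence space" and projects them onto a lower-dimensional space to reveal subfamilies. The sketch below illustrates that idea for whole sequences only (the paper also places individual residues in the same space, which is not shown). The one-hot encoding, the use of the leading principal axis as the projection, and the toy sequences are assumptions made for illustration.

```python
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

def one_hot(seq):
    """Encode an aligned, gap-free sequence as a 20 * len(seq) binary vector."""
    vec = []
    for aa in seq:
        vec.extend(1.0 if aa == letter else 0.0 for letter in ALPHABET)
    return vec

def center(rows):
    """Subtract the mean vector so the projection captures variation only."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def first_component_scores(rows, iterations=200):
    """Coordinate of each sequence along the leading principal axis, found
    by power iteration on the Gram matrix of the centered vectors.

    Assumes the sequences are not all identical (otherwise the Gram
    matrix is zero and the normalization would divide by zero).
    """
    x = center(rows)
    gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in x] for r1 in x]
    # Non-constant start: a constant vector is annihilated by the centered Gram matrix.
    u = [float(i + 1) for i in range(len(x))]
    for _ in range(iterations):
        u = [sum(g * v for g, v in zip(row, u)) for row in gram]
        norm = sum(v * v for v in u) ** 0.5
        u = [v / norm for v in u]
    return u

# Two hypothetical subfamilies that differ at the second and third positions.
sequences = ["AKL", "AKL", "AKM", "ARV", "ARV", "ARI"]
scores = first_component_scores([one_hot(s) for s in sequences])
```

In this toy case the first three sequences receive scores of one sign and the last three the opposite sign, so a single axis separates the two subfamilies. The positions that drive that separation are the kind of subfamily-specific residues the abstract proposes as candidates for functional importance.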

Performance Metrics

No. of papers in the topic in previous years
Year	Papers
2021	25
2020	16
2019	17
2018	13
2017	17
2016	12