Using multiple interdependency to separate functional from phylogenetic correlations in protein alignments.
TL;DR: A procedure is developed to detect statistical correlations stemming from functional interaction by removing the strong phylogenetic signal that leads to the correlations of each site with many others in the sequence.
Abstract: Motivation: Multiple sequence alignments of homologous proteins are useful for inferring their phylogenetic history and for revealing functionally important regions in the proteins. Functional constraints may lead to co-variation of two or more amino acids in the sequence, such that a substitution at one site is accompanied by compensatory substitutions at another site. It is not sufficient to find the statistical correlations between sites in the alignment because these may be the result of several undetermined causes. In particular, phylogenetic clustering will lead to many strong correlations. Result: A procedure is developed to detect statistical correlations stemming from functional interaction by removing the strong phylogenetic signal that leads to the correlations of each site with many others in the sequence. Our method relies upon the accuracy of the alignment but it does not require any assumptions about the phylogeny or the substitution process. The effectiveness of the method was verified using computer simulations, and the method was then applied to predict functional interactions between amino acids in the Pfam database of alignments. Availability: The program and supplementary figures and tables are available from the site http://www.uhnres.utoronto.
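The raw site-to-site correlation that the abstract describes as insufficient on its own can be illustrated with mutual information between alignment columns. The sketch below is not the authors' procedure: mutual information is an assumed choice of statistic, and the toy alignment and the function names (`column`, `entropy`, `mutual_information`) are hypothetical. It only shows the uncorrected pairwise signal from which a phylogenetic component would still have to be removed.

```python
import math
from collections import Counter


def column(alignment, i):
    """Return the residues found at position i of every aligned sequence."""
    return [seq[i] for seq in alignment]


def entropy(symbols):
    """Shannon entropy (in bits) of the empirical symbol distribution."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())


def mutual_information(alignment, i, j):
    """Uncorrected mutual information between alignment columns i and j."""
    a, b = column(alignment, i), column(alignment, j)
    # MI(i, j) = H(i) + H(j) - H(i, j), using the joint distribution of pairs.
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))


# Hypothetical toy alignment: columns 0 and 1 co-vary perfectly,
# while column 2 varies almost independently of column 0.
toy_alignment = [
    "AKL",
    "AKL",
    "DEL",
    "DEV",
]

mi_01 = mutual_information(toy_alignment, 0, 1)
mi_02 = mutual_information(toy_alignment, 0, 2)
```

In this toy case the perfectly co-varying pair scores higher than the weakly related pair. In a real alignment, sequences that cluster phylogenetically would inflate such scores for many column pairs at once, which is the background signal the abstract says must be separated from functional correlation.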
Citations
Direct-coupling analysis of residue coevolution captures native contacts across many protein families
Faruck Morcos, Andrea Pagnani, Bryan Lunt, Arianna Bertolino, Debora S. Marks, Chris Sander, Riccardo Zecchina, José N. Onuchic, Terence Hwa, Martin Weigt
TL;DR: The findings suggest that contacts predicted by DCA can be used as a reliable guide to facilitate computational predictions of alternative protein conformations, protein complex formation, and even the de novo prediction of protein domain structures, contingent on the existence of a large number of homologous sequences which are being rapidly made available due to advances in genome sequencing.
Emerging methods in protein co-evolution.
TL;DR: This work reviews the main co-evolution-based computational approaches, their theoretical basis, potential applications and foreseeable developments, and describes the current state of the art in these areas.
Mutual information without the influence of phylogeny or entropy dramatically improves residue contact prediction
TL;DR: A rapid, simple and general method based on information theory that accurately estimates the level of background mutual information for each pair of positions in a given protein family, and correctly identifies substantially more coevolving positions in protein families than any existing method.
Using information theory to search for co-evolving residues in proteins
TL;DR: The performance of various normalizations of MI in enhancing detection of co-evolving positions was assessed and it was found that normalization by the pair entropy was optimal.
Deciphering a global network of functionally associated post‐translational modifications
Pablo Minguez, Luca Parca, Francesca Diella, Daniel R. Mende, Runjun D. Kumar, Manuela Helmer-Citterich, Anne-Claude Gavin, Vera van Noort, Peer Bork
TL;DR: It is found that PTM types are vastly interconnected, forming a global network that comprises, in human alone, >50 000 residues in about 6000 proteins, and is likely to regulate multiple functional states of many if not all eukaryotic proteins.
References
The Protein Data Bank
Helen M. Berman, John D. Westbrook, Zukang Feng, Gary L. Gilliland, Talapady N. Bhat, Helge Weissig, Ilya N. Shindyalov, Philip E. Bourne
TL;DR: The goals of the PDB, the systems in place for data deposition and access, how to obtain further information, and plans for the future development of the resource are described.
The Pfam protein families database
Marco Punta, Penny Coggill, Ruth Y. Eberhardt, Jaina Mistry, John Tate, Chris Boursnell, Ningze Pang, Kristoffer Forslund, Goran Ceric, Jody Clements, Andreas Heger, Liisa Holm, Erik L. L. Sonnhammer, Sean R. Eddy, Alex Bateman, Robert D. Finn
TL;DR: The definition and use of family-specific, manually curated gathering thresholds are explained and some of the features of domains of unknown function (also known as DUFs) are discussed, which constitute a rapidly growing class of families within Pfam.
Pfam: the protein families database.
Robert D. Finn, Alex Bateman, Jody Clements, Penelope Coggill, Ruth Y. Eberhardt, Sean R. Eddy, Andreas Heger, Kirstie Hetherington, Liisa Holm, Jaina Mistry, Erik L. L. Sonnhammer, John Tate, Marco Punta
TL;DR: Pfam as discussed by the authors is a widely used database of protein families, containing 14 831 manually curated entries in the current version, version 27.0, and has been updated several times since 2012.