TL;DR: A set of simple and physically motivated criteria for secondary structure, programmed as a pattern-recognition process of hydrogen-bonded and geometrical features extracted from x-ray coordinates, is developed.
Abstract: For a successful analysis of the relation between amino acid sequence and protein structure, an unambiguous and physically meaningful definition of secondary structure is essential. We have developed a set of simple and physically motivated criteria for secondary structure, programmed as a pattern-recognition process of hydrogen-bonded and geometrical features extracted from x-ray coordinates. Cooperative secondary structure is recognized as repeats of the elementary hydrogen-bonding patterns “turn” and “bridge.” Repeating turns are “helices,” repeating bridges are “ladders,” connected ladders are “sheets.” Geometric structure is defined in terms of the concepts torsion and curvature of differential geometry. Local chain “chirality” is the torsional handedness of four consecutive Cα positions and is positive for right-handed helices and negative for ideal twisted β-sheets. Curved pieces are defined as “bends.” Solvent “exposure” is given as the number of water molecules in possible contact with a residue. The end result is a compilation of the primary structure, including SS bonds, secondary structure, and solvent exposure of 62 different globular proteins. The presentation is in linear form: strip graphs for an overall view and strip tables for the details of each of 10,925 residues. The dictionary is also available in computer-readable form for protein structure prediction work.
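The chirality measure described above (the torsional handedness of four consecutive Cα positions) can be illustrated with a short sketch. This is not the authors' program: the function names are hypothetical, and the ideal-helix parameters (2.3 Å radius, 1.5 Å rise, 100° per residue) are illustrative assumptions used only to generate test coordinates.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def ca_dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four consecutive C-alpha positions."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    # atan2 form of the dihedral; a positive value means right-handed torsion.
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def ideal_helix(n, handedness=1, radius=2.3, rise=1.5, turn_deg=100.0):
    """C-alpha trace of an idealized helix (assumed parameters, for illustration only)."""
    points = []
    for i in range(n):
        t = math.radians(handedness * turn_deg * i)
        points.append((radius * math.cos(t), radius * math.sin(t), rise * i))
    return points

right = ideal_helix(4, handedness=1)
left = ideal_helix(4, handedness=-1)
print(ca_dihedral(*right) > 0, ca_dihedral(*left) < 0)
```

Under these assumptions, the right-handed trace yields a positive angle and its mirror image the opposite sign, matching the sign convention stated in the abstract for right-handed helices.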
TL;DR: A stand-alone I-TASSER Suite for off-line protein structure and function prediction is developed, together with three complementary algorithms (COFACTOR, TM-SITE and S-SITE) that enhance function inference; their consensus is derived by COACH (ref. 4) using support vector machines.
Abstract: The lowest free-energy conformations are identified by structure clustering. A second round of assembly simulation is conducted, starting from the centroid models, to remove steric clashes and refine global topology. Final atomic structure models are constructed from the low-energy conformations by a two-step atomic-level energy minimization approach. The correctness of the global model is assessed by the confidence score, which is based on the significance of threading alignments and the density of structure clustering; the residue-level local quality of the structural models and B factor of the target protein are evaluated by a newly developed method, ResQ, built on the variation of modeling simulations and the uncertainty of homologous alignments through support vector regression training. For function annotation, the structure models with the highest confidence scores are matched against the BioLiP (ref. 5) database of ligand-protein interactions to detect homologous function templates. Functional insights on ligand-binding site (LBS), Enzyme Commission (EC) and Gene Ontology (GO) are deduced from the functional templates. We developed three complementary algorithms (COFACTOR, TM-SITE and S-SITE) to enhance function inferences, the consensus of which is derived by COACH (ref. 4) using support vector machines. Detailed instructions for installation, implementation and result interpretation of the Suite can be found in the Supplementary Methods and Supplementary Tables 1 and 2. The I-TASSER Suite pipeline was tested in recent communitywide structure and function prediction experiments, including CASP10 (ref. 1) and CAMEO (ref. 2). Overall, I-TASSER generated the correct fold with a template modeling score (TM-score) >0.5 for 10 out of 36 “New Fold” (NF) targets in CASP10, which have no homologous templates in the Protein Data Bank (PDB). 
Of the 110 template-based modeling targets, 92 had a TM-score >0.5, and 89 had the templates drawn closer to the native with an average r.m.s. deviation improvement of 1.05 Å in the same threading-aligned regions (ref. 6). In CAMEO, COACH generated LBS predictions for 4,271 targets with an average accuracy of 0.86, which was 20% higher than that of the second-best method in the experiment. Here we illustrate I-TASSER Suite–based structure and function modeling using six examples (Fig. 1b–g) from the communitywide blind tests (refs. 1, 2). R0006 and R0007 are two NF targets from CASP10, and I-TASSER constructed models of correct fold with a TM-score of 0.62 for both targets (Fig. 1b,c). An illustration of local quality estimation by ResQ is shown for T0652, which has an average error of 0.75 Å compared to the actual deviation of the model from the native (Fig. 1h). The four LBS prediction examples (Fig. 1d–g) are from CASP10 (ref. 1) and CAMEO (ref. 2); COACH generated ligand models all with a ligand r.m.s. deviation below 2 Å. COACH also correctly assigned the three- and four-digit EC numbers to the enzyme targets C0050 and C0046 (Supplementary Table 3). In summary, we developed a stand-alone I-TASSER Suite that can be used for off-line protein structure and function prediction.
Title: The I-TASSER Suite: protein structure and function prediction
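The TM-score threshold of 0.5 quoted above can be made concrete with a small sketch. The abstract does not give the formula; the code below uses the commonly cited definition, TM = (1/L) Σ 1/(1 + (d_i/d0)²) with d0 = 1.24 (L − 15)^(1/3) − 1.8, evaluated for one fixed superposition. The real score takes the maximum over superpositions, which is not attempted here. The function names and the lower clamp of 0.5 Å on d0 are assumptions.

```python
def tm_score_d0(target_length):
    """Length-dependent distance scale d0 in Angstrom (clamp at 0.5 is an assumption)."""
    if target_length <= 15:
        return 0.5
    return max(1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8, 0.5)

def tm_score_fixed_superposition(distances, target_length):
    """TM-score-like sum for one given superposition.

    distances: C-alpha distances (Angstrom) for the aligned residue pairs.
    target_length: number of residues in the target (native) structure.
    """
    d0 = tm_score_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length

# A perfect model scores 1.0; aligning only half of the residues perfectly scores 0.5.
print(tm_score_fixed_superposition([0.0] * 100, 100))
print(tm_score_fixed_superposition([0.0] * 50, 100))
```

Because the sum is normalized by the target length rather than by the number of aligned pairs, unaligned residues lower the score, which is why a value above 0.5 is commonly read as a correct fold.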
TL;DR: To facilitate its use in various applications, such as model assessment, loop modeling, and fitting into cryo‐electron microscopy mass density maps combined with comparative protein structure modeling, DOPE was incorporated into the modeling package MODELLER‐8.
Abstract: Protein structures in the Protein Data Bank provide a wealth of data about the interactions that determine the native states of proteins. Using probability theory, we derive an atomic distance-dependent statistical potential from a sample of native structures that does not depend on any adjustable parameters (Discrete Optimized Protein Energy, or DOPE). DOPE is based on an improved reference state that corresponds to noninteracting atoms in a homogeneous sphere with the radius dependent on a sample native structure; it thus accounts for the finite and spherical shape of the native structures. The DOPE potential was extracted from a nonredundant set of 1472 crystallographic structures. We tested DOPE and five other scoring functions by the detection of the native state among six multiple target decoy sets, the correlation between the score and model error, and the identification of the most accurate non-native structure in the decoy set. For all decoy sets, DOPE is the best performing function in terms of all criteria, except for a tie in one criterion for one decoy set. To facilitate its use in various applications, such as model assessment, loop modeling, and fitting into cryo-electron microscopy mass density maps combined with comparative protein structure modeling, DOPE was incorporated into the modeling package MODELLER-8.
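As a rough illustration of what a distance-dependent statistical potential looks like, the sketch below applies the generic inverse-Boltzmann form, E(r) = −kT ln(p_obs(r) / p_ref(r)), to binned pair-distance counts. This is not the DOPE implementation: DOPE's specific reference state (noninteracting atoms in a homogeneous sphere) is not reproduced, and the reference counts here are simply supplied by the caller. The function name, the pseudocount, and the example counts are all hypothetical.

```python
import math

def statistical_potential(observed_counts, reference_counts, kT=1.0, pseudocount=1e-6):
    """Per-bin energies from observed and reference distance histograms.

    Both histograms must use the same distance bins. The pseudocount is an
    assumption added to avoid log(0) for empty bins.
    """
    obs_total = float(sum(observed_counts))
    ref_total = float(sum(reference_counts))
    energies = []
    for n_obs, n_ref in zip(observed_counts, reference_counts):
        p_obs = n_obs / obs_total + pseudocount
        p_ref = n_ref / ref_total + pseudocount
        energies.append(-kT * math.log(p_obs / p_ref))
    return energies

# Made-up counts for three distance bins: depleted, neutral, enriched.
energies = statistical_potential([10, 30, 60], [20, 30, 50])
print(energies)
```

Bins that are populated more often than the reference state predicts receive negative (favorable) energies, and depleted bins receive positive ones; the choice of reference state is precisely what the abstract identifies as DOPE's improvement.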
TL;DR: This chapter elaborates protein structure prediction using Rosetta, where short fragments of known proteins are assembled by a Monte Carlo strategy to yield native-like protein conformations.
Abstract: Publisher Summary: This chapter elaborates protein structure prediction using Rosetta. Double-blind assessments of protein structure prediction methods have indicated that the Rosetta algorithm is perhaps the most successful current method for de novo protein structure prediction. In the Rosetta method, short fragments of known proteins are assembled by a Monte Carlo strategy to yield native-like protein conformations. Using only sequence information, successful Rosetta predictions yield models with typical accuracies of 3–6 Å Cα root mean square deviation (RMSD) from the experimentally determined structures for contiguous segments of 60 or more residues. For each structure prediction, many short simulations starting from different random seeds are carried out to generate an ensemble of decoy structures that have both favorable local interactions and protein-like global properties. This set is then clustered by structural similarity to identify the broadest free energy minima. The effectiveness of conformation modification operators for energy function optimization is also described in this chapter.
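The fragment-assembly idea can be sketched with a toy model. The code below is not Rosetta: it represents a chain as a list of backbone angles, inserts fragments at random positions, and accepts or rejects each move with a Metropolis criterion, which is an assumption about the form of the Monte Carlo step. The energy function, fragment library, and all names are hypothetical placeholders.

```python
import math
import random

def toy_energy(angles):
    """Hypothetical score: mean squared deviation from a target angle of -60 degrees."""
    return sum((a + 60.0) ** 2 for a in angles) / len(angles)

def fragment_assembly(n_res, fragments, energy, n_steps=2000, kT=1.0, seed=0):
    """Toy fragment-insertion Monte Carlo; returns the best conformation seen.

    fragments: list of equal-length angle lists (a stand-in for a fragment library).
    """
    rng = random.Random(seed)
    frag_len = len(fragments[0])
    current = [180.0] * n_res              # start from an extended chain
    e_cur = energy(current)
    best, e_best = list(current), e_cur
    for _ in range(n_steps):
        frag = rng.choice(fragments)
        start = rng.randrange(n_res - frag_len + 1)
        trial = list(current)
        trial[start:start + frag_len] = frag   # fragment insertion move
        e_try = energy(trial)
        # Metropolis acceptance: always accept downhill, sometimes accept uphill.
        if e_try <= e_cur or rng.random() < math.exp(-(e_try - e_cur) / kT):
            current, e_cur = trial, e_try
            if e_cur < e_best:
                best, e_best = list(current), e_cur
    return best, e_best

library = [[-60.0] * 3, [-120.0] * 3, [180.0] * 3]
best, e_best = fragment_assembly(12, library, toy_energy)
print(e_best)
```

Running many such short trajectories with different seeds would produce the kind of decoy ensemble the summary describes, which could then be clustered by structural similarity.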
TL;DR: In this paper, effective interresidue contact energies are estimated from the numbers of residue-residue contacts observed in a large number of protein crystal structures by means of the quasi-chemical approximation with an approximate treatment of the effects of chain connectivity.
Abstract: Effective interresidue contact energies for proteins in solution are estimated from the numbers of residue-residue contacts observed in crystal structures of globular proteins by means of the quasi-chemical approximation with an approximate treatment of the effects of chain connectivity. Employing a lattice model, each residue of a protein is assumed to occupy a site in a lattice and vacant sites are regarded as occupied by an effective solvent molecule whose size is equal to the average size of a residue. A basic assumption is that the average characteristics of residue-residue contacts formed in a large number of protein crystal structures reflect actual differences of interactions among residues, as if there were no significant contribution from the specific amino acid sequence in each protein as well as intraresidue and short-range interactions. Then, taking account of the effects of the chain connectivity only as imposing a limit to the size of the system, i.e., the number of lattice sites or the number of effective solvent molecules in the system, the system is regarded as a mixture of unconnected residues and effective solvent molecules. The quasi-chemical approximation, that contact pair formation resembles a chemical reaction, is applied to this system to obtain formulas that relate the statistical averages of the numbers of contacts to the contact energies. The number of effective solvent molecules for each protein is chosen to yield the total number of residue-residue contacts equal to its expected value for the hypothetical case of hard sphere interactions among residues and effective solvent molecules; the expected number of residue-residue contacts at this condition has been crudely estimated by means of a freely jointed chain distribution and an expansion originating in hard sphere interactions. 
Each residue is represented by the center of its side chain atom positions, and contacts among residues and effective solvent molecules are defined to be those pairs within 6.5 Å, a distance that has been chosen on the basis of the observed radial distribution of residues; nearest-neighbor pairs along a chain are explicitly excluded in counting contacts. Coordination numbers, for each type of residue as well as for solvent molecules, are estimated from the mean volume of each type of residue and used to evaluate the numbers of residue-solvent and solvent-solvent contacts from the numbers of residue-residue contacts. The estimated values of contact energies have reasonable residue-type dependences, reflecting residue distributions in protein crystals; nonpolar-residue-in and polar-residue-out are seen as well as the segregation of those residue groups. In addition, there is a linear relationship between the average contact energies for nonpolar residues and their hydrophobicities reported by Nozaki and Tanford; however, the magnitudes on average are about twice as large. The relevance of results to protein folding and other applications is discussed.
Introduction: A complete treatment of protein conformations in solution requires inclusion of solvent effects. Solvent molecules not only interact with atoms in proteins through short-range interactions such as hydrogen-bond formation and van der Waals interactions but also modify electrostatic interactions between protein atoms. Also, the entropy of water molecules around protein molecules differs from that of bulk water because they form more ordered cagelike structures or bind to specific sites. As originally pointed out by Kauzmann (ref. 1), hydrophobic interactions, which would occur explicitly because of the nonspecific solvent effects, might be a principal force in leading to a collapsed protein molecule. 
Hydrophobic energies have been evaluated, among other ways, as the free energy changes of transfer of amino acids from ethanol or dioxane to water (ref. 2) and of liquid hydrocarbons into water (ref. 3). Chothia (refs. 7 ff.) evaluated the contributions of hydrophobic energy to the formation of secondary, tertiary, and quaternary structures by employing the estimates in ref. 2 quoted above for values of the hydrophobic energy of interfacial areas exposed to water. His estimates and those of others (ref. 12) indicate that the hydrophobic energies, or the solvent effects, are a major contributor to the energetics of protein folding, essentially because large surface areas of protein molecules become buried in the interior upon folding. However, there is the fundamental question of whether liquid hydrocarbons and the organic solvents can completely represent a protein interior (ref. 13). Lee (refs. 14, 15) has pointed out on the basis of a scaled particle theory that thermodynamic properties such as the partial molecular volume of the solute in dilute binary solutions (ref. 14) and the change in the Ben-Naim local standard chemical potential of a solute molecule upon transferring it from the gas phase to a liquid phase (ref. 15) depend significantly on both the packing density of pure solvent and the ratio of the size of the solvent molecule to that of the solute molecule. He has then claimed that the obvious major difference between protein interiors, with their high packing density and solidlike rigidity, and small nonpolar solvents or even simple polymers makes it difficult to justify using the transfer data generally in quantitative studies of protein folding. Thus, estimates of hydrophobic interactions which are specific to protein molecules would be desirable. Protein folding processes include a wide range of protein conformations from denatured to native states. The conformational freedom of a protein is vast. 
This makes it difficult to simulate the whole process of protein folding if all atoms of a protein and solvent molecules are to be included in a detailed energy calculation. The geometry of molecules and interaction potentials require some simplification. The principal purpose of the present work is to include solvent effects in effective interresidue contact energies, which can then provide a crude estimate of the long-range component of conformational energies. Tanaka and Scheraga (ref. 16) estimated contact energies by a method which may appear to be similar but ignores solvent and is different in essence from the present one; incidentally, their method yielded extremely large magnitudes for contact energies. Here the effective contact energies between residues in proteins will be estimated directly from the numbers of residue-residue contacts observed in protein crystal structures by regarding them as statistical averages in the quasi-chemical approximation with an approximate treatment of the effects of chain connectivity. Estimated contact energies will be compared with experimental values of hydrophobic energies. Also, the relevance of results to protein folding and other applications will be discussed.
This article not subject to U.S. Copyright. Published 1985 by the American Chemical Society. Macromolecules, Vol. 18, No. 3, 1985.
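The contact definition given in the abstract (residues represented by side-chain centers, pairs within 6.5 Å counted as contacts, nearest neighbors along the chain excluded) can be sketched as follows. This is a minimal illustration for a single chain, not the authors' code; the function name and the example coordinates are hypothetical, and treating a distance of exactly 6.5 Å as a contact is an assumption.

```python
import math

def residue_contacts(side_chain_centers, cutoff=6.5):
    """Return index pairs (i, j), i < j, of residues in contact.

    side_chain_centers: list of (x, y, z) side-chain centers in Angstrom,
    ordered along a single chain. Pairs with j == i + 1 are skipped because
    nearest neighbors along the chain are excluded from contact counting.
    """
    contacts = []
    n = len(side_chain_centers)
    for i in range(n):
        for j in range(i + 2, n):
            if math.dist(side_chain_centers[i], side_chain_centers[j]) <= cutoff:
                contacts.append((i, j))
    return contacts

# Four made-up centers spaced 3 Angstrom apart along a line.
centers = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0), (9.0, 0.0, 0.0)]
print(residue_contacts(centers))  # [(0, 2), (1, 3)]
```

Tallying such pairs by residue type over many structures would give the contact numbers that the quasi-chemical approximation then relates to contact energies.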