TL;DR: A new approach to fold recognition, whereby sequences are fitted directly onto the backbone coordinates of known protein structures, using a given sequence as a guide for the matching of sequences to backbone coordinates.
Abstract: THE prediction of protein tertiary structure from sequence using molecular energy calculations has not yet been successful; an alternative strategy of recognizing known motifs1 or folds2–4 in sequences looks more promising. We present here a new approach to fold recognition, whereby sequences are fitted directly onto the backbone coordinates of known protein structures. Our method for protein fold recognition involves automatic modelling of protein structures using a given sequence, and is based on the frameworks of known protein folds. The plausibility of each model, and hence the degree of compatibility between the sequence and the proposed structure, is evaluated by means of a set of empirical potentials derived from proteins of known structure. The novel aspect of our approach is that the matching of sequences to backbone coordinates is performed in full three-dimensional space, incorporating specific pair interactions explicitly.
TL;DR: The method has been applied to the genome of Mycoplasma genitalium, and analysis of the results shows that as many as 46 % of the proteins derived from the predicted protein coding regions have a significant relationship to a protein of known structure.
TL;DR: This work investigates the suitability of a fully automated, simple, objective, quantitative and reproducible method that can be used in the automatic assessment of models in the upcoming CAFASP2 experiment and concludes that MaxSub is a suitable method for the automatic evaluation of models.
Abstract: Motivation: Evaluating the accuracy of predicted models is critical for assessing structure prediction methods. Because this problem is not trivial, a large number of different assessment measures have been proposed by various authors, and it has already become an active subfield of research (Moultet al., 1999). The CASP (Moult et al., 1997, 1999) and CAFASP (Fischeret al., 1999) prediction experiments have demonstrated that it has been difficult to choose one single, ‘best’ method to be used in the evaluation. Consequently, the CASP3 evaluation was carried out using an extensive set of especially developed numerical measures, coupled with humanexpert intervention. As part of our efforts towards a higher level of automation in the structure prediction field, here we investigate the suitability of a fully automated, simple, objective, quantitative and reproducible method that can be used in the automatic assessment of models in the upcoming CAFASP2 experiment. Such a method should (a) produce one single number that measures the quality of a predicted model and (b) perform similarly to humanexpert evaluations. Results: MaxSub is a new and independently developed method that further builds and extends some of the evaluation methods introduced at CASP3. MaxSub aims at identifying the largest subset of Cα atoms of a model that superimpose ‘well’ over the experimental structure, and produces a single normalized score that represents the quality of the model. Because there exists no evaluation method for assessment measures of predicted models, it is not easy to evaluate how good our new measure is. Even though an exact comparison of MaxSub and the CASP3 assessment is not straightforward, here we use a test
TL;DR: A neural‐network‐based consensus predictor, Pcons, is presented that attempts to select the best model out of those produced by six prediction servers, each using different methods.
Abstract: During recent years many protein fold recognition methods have been developed, based on different algorithms and using various kinds of information. To examine the performance of these methods several evaluation experiments have been conducted. These include blind tests in CASP/CAFASP, large scale benchmarks, and long-term, continuous assessment with newly solved protein structures. These studies confirm the expectation that for different targets different methods produce the best predictions, and the final prediction accuracy could be improved if the available methods were combined in a perfect manner. In this article a neural-network-based consensus predictor, Pcons, is presented that attempts this task. Pcons attempts to select the best model out of those produced by six prediction servers, each using different methods. Pcons translates the confidence scores reported by each server into uniformly scaled values corresponding to the expected accuracy of each model. The translated scores as well as the similarity between models produced by different servers is used in the final selection. According to the analysis based on two unrelated sets of newly solved proteins, Pcons outperforms any single server by generating ∼8%–10% more correct predictions. Furthermore, the specificity of Pcons is significantly higher than for any individual server. From analyzing different input data to Pcons it can be shown that the improvement is mainly attributable to measurement of the similarity between the different models. Pcons is freely accessible for the academic community through the protein structure-prediction metaserver at http://bioinfo.pl/meta/.
TL;DR: The Robetta server produced quite reasonable predictions for targets in the recent CASP‐5 and CAFASP‐3 experiments, some of which were at the level of the best human predictions.
Abstract: Robetta is a fully automated protein structure prediction server that uses the Rosetta fragment-insertion method. It combines template-based and de novo structure prediction methods in an attempt to produce high quality models that cover every residue of a submitted sequence. The first step in the procedure is the automatic detection of the locations of domains and selection of the appropriate modeling protocol for each domain. For domains matched to a homolog with an experimentally characterized structure by PSI-BLAST or Pcons2, Robetta uses a new alignment method, called K*Sync, to align the query sequence onto the parent structure. It then models the variable regions by allowing them to explore conformational space with fragments in fashion similar to the de novo protocol, but in the context of the template. When no structural homolog is available, domains are modeled with the Rosetta de novo protocol, which allows the full length of the domain to explore conformational space via fragment-insertion, producing a large decoy ensemble from which the final models are selected. The Robetta server produced quite reasonable predictions for targets in the recent CASP-5 and CAFASP-3 experiments, some of which were at the level of the best human predictions.