TL;DR: A new computer program, called SHIFTX2, is described which is capable of rapidly and accurately calculating diamagnetic 1H, 13C and 15N chemical shifts from protein coordinate data and will open the door to many long-anticipated applications of chemical shift prediction to protein structure determination, refinement and validation.
Abstract: A new computer program, called SHIFTX2, is described which is capable of rapidly and accurately calculating diamagnetic 1H, 13C and 15N chemical shifts from protein coordinate data. Compared to its predecessor (SHIFTX) and to other existing protein chemical shift prediction programs, SHIFTX2 is substantially more accurate (up to 26% better by correlation coefficient with an RMS error that is up to 3.3× smaller) than the next best performing program. It also provides significantly more coverage (up to 10% more), is significantly faster (up to 8.5×) and capable of calculating a wider variety of backbone and side chain chemical shifts (up to 6×) than many other shift predictors. In particular, SHIFTX2 is able to attain correlation coefficients between experimentally observed and predicted backbone chemical shifts of 0.9800 (15N), 0.9959 (13Cα), 0.9992 (13Cβ), 0.9676 (13C′), 0.9714 (1HN), 0.9744 (1Hα) and RMS errors of 1.1169, 0.4412, 0.5163, 0.5330, 0.1711, and 0.1231 ppm, respectively. The correlation between SHIFTX2’s predicted and observed side chain chemical shifts is 0.9787 (13C) and 0.9482 (1H) with RMS errors of 0.9754 and 0.1723 ppm, respectively. SHIFTX2 is able to achieve such a high level of accuracy by using a large, high quality database of training proteins (>190), by utilizing advanced machine learning techniques, by incorporating many more features (χ2 and χ3 angles, solvent accessibility, H-bond geometry, pH, temperature), and by combining sequence-based with structure-based chemical shift prediction techniques. With this substantial improvement in accuracy we believe that SHIFTX2 will open the door to many long-anticipated applications of chemical shift prediction to protein structure determination, refinement and validation. SHIFTX2 is available both as a standalone program and as a web server (http://www.shiftx2.ca).
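The two accuracy measures quoted above, the correlation coefficient and the RMS error between observed and predicted shifts, can be sketched as follows. This is a minimal illustration, not code from SHIFTX2: the function names `pearson_r` and `rms_error` and the sample shift values are hypothetical, and it assumes the reported correlation is a standard Pearson coefficient.

```python
import math

def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equal-length lists of shifts (ppm)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)

def rms_error(observed, predicted):
    """Root-mean-square difference between observed and predicted shifts (ppm)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

# Hypothetical 13C-alpha shifts (ppm), invented purely for illustration.
observed_ca = [58.2, 55.1, 62.4, 53.9, 56.7]
predicted_ca = [57.9, 55.6, 61.8, 54.3, 56.5]

r = pearson_r(observed_ca, predicted_ca)
rmse = rms_error(observed_ca, predicted_ca)
print(f"r = {r:.4f}, RMS error = {rmse:.4f} ppm")
```

In practice these statistics would be computed per nucleus type (15N, 13Cα, 1HN and so on), since the shift ranges and typical errors differ widely between nuclei.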
TL;DR: It is suggested that if NMR-derived structures could be refined using heteronuclear chemical shifts calculated by SHIFTX, their precision could approach that of the highest resolution X-ray structures.
Abstract: A computer program (SHIFTX) is described which rapidly and accurately calculates the diamagnetic 1H, 13C and 15N chemical shifts of both backbone and sidechain atoms in proteins. The program uses a hybrid predictive approach that employs pre-calculated, empirically derived chemical shift hypersurfaces in combination with classical or semi-classical equations (for ring current, electric field, hydrogen bond and solvent effects) to calculate 1H, 13C and 15N chemical shifts from atomic coordinates. The chemical shift hypersurfaces capture dihedral angle, sidechain orientation, secondary structure and nearest neighbor effects that cannot easily be translated to analytical formulae or predicted via classical means. The chemical shift hypersurfaces were generated using a database of IUPAC-referenced protein chemical shifts – RefDB (Zhang et al., 2003), and a corresponding set of high resolution (<2.1 Å) X-ray structures. Data mining techniques were used to extract the largest pairwise contributors (from a list of ∼20 derived geometric, sequential and structural parameters) to generate the necessary hypersurfaces. SHIFTX is rapid (< 1 CPU second for a complete shift calculation of 100 residues) and accurate. Overall, the program was able to attain a correlation coefficient (r) between observed and calculated shifts of 0.911 (1Hα), 0.980 (13Cα), 0.996 (13Cβ), 0.863 (13CO), 0.909 (15N), 0.741 (1HN), and 0.907 (sidechain 1H) with RMS errors of 0.23, 0.98, 1.10, 1.16, 2.43, 0.49, and 0.30 ppm, respectively, on test data sets. We further show that the agreement between observed and SHIFTX calculated chemical shifts can be an extremely sensitive measure of the quality of protein structures. Our results suggest that if NMR-derived structures could be refined using heteronuclear chemical shifts calculated by SHIFTX, their precision could approach that of the highest resolution X-ray structures. SHIFTX is freely available as a web server at http://redpoll.pharmacy.ualberta.ca.
TL;DR: A new chemical shift prediction program, SPARTA+, is described, based on artificial neural networking, which appears to be approaching the limit at which empirical approaches can predict chemical shifts.
Abstract: NMR chemical shifts provide important local structural information for proteins and are key in recently described protein structure generation protocols. We describe a new chemical shift prediction program, SPARTA+, which is based on artificial neural networking. The neural network is trained on a large carefully pruned database, containing 580 proteins for which high-resolution X-ray structures and nearly complete backbone and 13Cβ chemical shifts are available. The neural network is trained to establish quantitative relations between chemical shifts and protein structures, including backbone and side-chain conformation, H-bonding, electric fields and ring-current effects. The trained neural network yields rapid chemical shift prediction for backbone and 13Cβ atoms, with standard deviations of 2.45, 1.09, 0.94, 1.14, 0.25 and 0.49 ppm for δ15N, δ13C’, δ13Cα, δ13Cβ, δ1Hα and δ1HN, respectively, between the SPARTA+ predicted and experimental shifts for a set of eleven validation proteins. These results represent a modest but consistent improvement (2–10%) over the best programs available to date, and appear to be approaching the limit at which empirical approaches can predict chemical shifts.
TL;DR: It is evident that protein NMR spectroscopists are increasingly adhering to recommended IUPAC 13C and 15N chemical shift referencing conventions; however, approximately 20% of newly deposited protein entries in the BMRB are still incorrectly referenced, which is cause for some concern.
Abstract: RefDB is a secondary database of reference-corrected protein chemical shifts derived from the BioMagResBank (BMRB). The database was assembled by using a recently developed program (SHIFTX) to predict protein 1H, 13C and 15N chemical shifts from X-ray or NMR coordinate data of previously assigned proteins. The predicted shifts were then compared with the corresponding observed shifts and a variety of statistical evaluations were performed. In this way, potential mis-assignments, typographical errors and chemical referencing errors could be identified and, in many cases, corrected. This approach allows for an unbiased, instrument-independent solution to the problem of retrospectively re-referencing published protein chemical shifts. Results from this study indicate that nearly 25% of BMRB entries with 13C protein assignments and 27% of BMRB entries with 15N protein assignments required significant chemical shift reference readjustments. Additionally, nearly 40% of protein entries deposited in the BioMagResBank appear to have at least one assignment error. From this study it is evident that protein NMR spectroscopists are increasingly adhering to recommended IUPAC 13C and 15N chemical shift referencing conventions; however, approximately 20% of newly deposited protein entries in the BMRB are still being incorrectly referenced. This is cause for some concern. However, the utilization of RefDB and its companion programs may help mitigate this ongoing problem. RefDB is updated weekly and the database, along with its associated software, is freely available at http://redpoll.pharmacy.ualberta.ca and the BMRB website.
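The idea of comparing predicted with observed shifts to detect a referencing error can be illustrated with a small sketch. This is not RefDB's actual procedure, which the abstract describes only as "a variety of statistical evaluations". It assumes that a referencing error shows up as a roughly constant offset across all shifts of one nucleus, estimates that offset as the median of the observed-minus-predicted differences, and applies a correction only when the offset exceeds a threshold. The function names, the 0.5 ppm threshold and the sample values are all hypothetical.

```python
from statistics import median

def estimate_reference_offset(observed, predicted):
    """Median of (observed - predicted) differences in ppm.

    The median is used here (an assumption of this sketch) so that a few
    mis-assigned residues do not dominate the offset estimate.
    """
    return median([o - p for o, p in zip(observed, predicted)])

def rereference(observed, predicted, threshold=0.5):
    """Return (corrected_shifts, applied_offset).

    The threshold (ppm) is a hypothetical cutoff below which the entry is
    treated as correctly referenced and returned unchanged.
    """
    offset = estimate_reference_offset(observed, predicted)
    if abs(offset) < threshold:
        return list(observed), 0.0
    return [o - offset for o in observed], offset

# Hypothetical 15N shifts (ppm): observed values sit about 2 ppm above prediction.
predicted_n = [120.0, 115.5, 118.2]
observed_n = [122.1, 117.4, 120.2]

corrected_n, applied = rereference(observed_n, predicted_n)
print(f"applied offset: {applied:.2f} ppm")
```

A single outlier with a large difference would leave the median-based offset nearly unchanged, which is why such a residue would be a candidate for a mis-assignment check rather than a referencing correction.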