TL;DR: PhosphoSitePlus is an open, comprehensive, manually curated and interactive resource for studying experimentally observed post-translational modifications, primarily of human and mouse proteins.
Abstract: PhosphoSitePlus (http://www.phosphosite.org) is an open, comprehensive, manually curated and interactive resource for studying experimentally observed post-translational modifications, primarily of human and mouse proteins. It encompasses 130,000 non-redundant modification sites, primarily phosphorylation, ubiquitinylation and acetylation. The interface is designed for clarity and ease of navigation. From the home page, users can launch simple or complex searches and browse high-throughput data sets by disease, tissue or cell line. Searches can be restricted by specific treatments, protein types, domains, cellular components, disease, cell types, cell lines, tissue and sequences or motifs. A few clicks of the mouse will take users to substrate pages or protein pages with sites, sequences, domain diagrams and molecular visualization of side-chains known to be modified; to site pages with information about how the modified site relates to the functions of specific proteins and cellular processes and to curated information pages summarizing the details from one record. PyMOL and Chimera scripts that colorize reactive groups on residues that are modified can be downloaded. Features designed to facilitate proteomic analyses include downloads of modification sites, kinase-substrate data sets, sequence logo generators, a Cytoscape plugin and BioPAX download to enable pathway visualization of the kinase-substrate interactions in PhosphoSitePlus®.
TL;DR: The in-depth phosphoproteomic study represents a significant contribution to C-HPP and identifies 3,033 "missing proteins", i.e., proteins that currently lack evidence by mass spectrometry, in the neXtProt database and 12,852 unknown phosphorylation sites not registered in the PhosphoSitePlus database.
Abstract: The Chromosome-Centric Human Proteome Project (C-HPP) is an international effort for creating an annotated proteomic catalog for each chromosome. The first step of the C-HPP project is to find evidence of expression of all proteins encoded on each chromosome. C-HPP also prioritizes particular protein subsets, such as those with post-translational modifications (PTMs) and those found in low abundance. As participants in C-HPP, we integrated proteomic and phosphoproteomic analysis results from chromosome-independent biomarker discovery research to create a chromosome-based list of proteins and phosphorylation sites. Data were integrated from five independent colorectal cancer (CRC) samples (three types of clinical tissue and two types of cell lines) and led to the identification of 11,278 proteins, including 8,305 phosphoproteins and 28,205 phosphorylation sites; all of these were categorized on a chromosome-by-chromosome basis. In total, 3,033 "missing proteins", i.e., proteins that currently lack evidence by mass spectrometry, in the neXtProt database and 12,852 unknown phosphorylation sites not registered in the PhosphoSitePlus database were identified. Our in-depth phosphoproteomic study represents a significant contribution to C-HPP. The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium with the data set identifier PXD000089.
TL;DR: The data suggest that phosphorylation may affect drug binding and efficacy for a significant fraction of drug target proteins.
Abstract: While it is currently estimated that 40–50% of eukaryotic proteins are phosphorylated, little is known about the frequency and local effects of phosphorylation near pharmaceutical inhibitor binding sites. In this study, we investigated how frequently phosphorylation may affect the binding of drug inhibitors to target proteins. We examined the 453 non-redundant structures of soluble mammalian drug target proteins bound to inhibitors currently available in the Protein Data Bank (PDB). We cross-referenced these structures with phosphorylation data available from the PhosphoSitePlus database. 322/453 (71%) of drug targets have evidence of phosphorylation that has been validated by multiple methods or labs. For 132/453 (29%) of those, the phosphorylation site is within 12 Å of the small molecule-binding site, where it would likely alter small molecule binding affinity. We propose a framework for distinguishing between drug-phosphorylation site interactions that are likely to alter the efficacy of drugs vs. those that are not. In addition we highlight examples of well-established drug targets, such as estrogen receptor alpha, for which phosphorylation may affect drug affinity and clinical efficacy. Our data suggest that phosphorylation may affect drug binding and efficacy for a significant fraction of drug target proteins.
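The 12 Å proximity criterion and the reported percentages can be illustrated with a minimal Python sketch. The abstract does not say how the distance was measured, so the minimum atom-to-atom distance used here is an assumption, and the function names and coordinates are hypothetical.

```python
import math

CUTOFF_ANGSTROM = 12.0  # distance threshold quoted in the abstract


def min_distance(site_atoms, ligand_atoms):
    """Smallest Euclidean distance between any site atom and any ligand atom.

    Assumption: proximity is judged by the closest atom pair; the abstract
    does not specify the exact measurement.
    """
    return min(math.dist(a, b) for a in site_atoms for b in ligand_atoms)


def is_near_ligand(site_atoms, ligand_atoms, cutoff=CUTOFF_ANGSTROM):
    """True if the phosphosite lies within the cutoff of the bound ligand."""
    return min_distance(site_atoms, ligand_atoms) <= cutoff


# Hypothetical (x, y, z) coordinates in angstroms, for illustration only.
site = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
ligand = [(9.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
near = is_near_ligand(site, ligand)

# Percentages as reported in the abstract, recomputed from the raw counts.
pct_phosphorylated = round(100 * 322 / 453)  # 71
pct_near_ligand = round(100 * 132 / 453)     # 29
```

In practice the coordinates would be read from PDB files and the site positions taken from PhosphoSitePlus; that parsing step is omitted here.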
TL;DR: PhoSigNet was designed to store and display human phosphorylation-mediated signal transduction networks, with additional information related to cancer, and is expected to be a useful database and analysis platform benefiting both proteomics and cancer studies.
Abstract: Protein phosphorylation is the most abundant reversible covalent modification. Human protein kinases participate in almost all biological pathways, and approximately half of the kinases are associated with disease. PhoSigNet was designed to store and display human phosphorylation-mediated signal transduction networks, with additional information related to cancer. It contains 11 976 experimentally validated directed edges and 216 871 phosphorylation sites. Moreover, 3491 differentially expressed proteins in human cancer from dbDEPC, 18 907 human cancer variation sites from CanProVar, and 388 hyperphosphorylation sites from PhosphoSitePlus were collected as annotation information. Compared with other phosphorylation-related databases, PhoSigNet not only takes the kinase-substrate regulatory relationship pairs into account, but also extends regulatory relationships up- and downstream (e.g., from ligand to receptor, from G protein to kinase, and from transcription factor to targets). Furthermore, PhoSigNet allows the user to investigate the impact of phosphorylation modifications on cancer. By using one set of in-house time series phosphoproteomics data, the reconstruction of a conditional and dynamic phosphorylation-mediated signaling network was exemplified. We expect PhoSigNet to be a useful database and analysis platform benefiting both proteomics and cancer studies.
TL;DR: These models are the first to predict whether PTMs are located inside or outside of PPIRs, as demonstrated by their high predictive performance.
Abstract: One very important functional domain of proteins is the protein-protein interacting region (PPIR), which forms the binding interface between interacting polypeptide chains. Post-translational modifications (PTMs) that occur in the PPIR can either interfere with or facilitate the interaction between proteins. The ability to predict whether sites of protein modifications are inside or outside of PPIRs would be useful in further elucidating the regulatory mechanisms by which modifications of specific proteins regulate their cellular functions. Using two of the comprehensive databases for protein-protein interaction and protein modification site data (PDB and PhosphoSitePlus, respectively), we created new databases that map PTMs to their locations inside or outside of PPIRs. The mapped PTMs represented only 5 % of all known PTMs. Thus, in order to predict localization within or outside of PPIRs for the vast majority of PTMs, a machine learning strategy was used to generate predictive models from these mapped databases. For the three mapped PTM databases which had sufficient numbers of modification sites for generating models (acetylation, phosphorylation, and ubiquitylation), the resulting models yielded high overall predictive performance as judged by a combined performance score (CPS). Among the multiple properties of amino acids that were used in the classification tasks, hydrophobicity was found to contribute substantially to the performance of the final predictive models. Compared to the other classifiers we also evaluated, the SVM provided the best performance overall. These models are the first to predict whether PTMs are located inside or outside of PPIRs, as demonstrated by their high predictive performance. The models and data presented here should be useful in prioritizing both known and newly identified PTMs for further studies to determine the functional relationship between specific PTMs and protein-protein interactions. 
The implemented R package is available online (http://sysbio.chula.ac.th/PtmPPIR).
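The abstract reports that hydrophobicity contributed substantially to the classifiers. One way such a feature might be computed is sketched below in Python, as the mean hydrophobicity of a sequence window centred on the modified residue. The Kyte-Doolittle scale, the window size and the function name are assumptions for illustration; the abstract does not state which scale or window the authors used, and the resulting value would be only one input to a classifier such as an SVM.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not specified in the abstract).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_hydrophobicity(sequence, site_index, flank=3):
    """Mean hydropathy of the residues within `flank` positions of the site.

    The window is truncated at the sequence ends rather than padded.
    `site_index` is 0-based.
    """
    start = max(0, site_index - flank)
    end = min(len(sequence), site_index + flank + 1)
    window = sequence[start:end]
    return sum(KYTE_DOOLITTLE[aa] for aa in window) / len(window)


# Hypothetical peptide with a serine at position 3 (0-based).
feature = window_hydrophobicity("AAASAAA", 3)
```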