TL;DR: The analyses identify several candidate biomarkers of cellular senescence that overlap with aging markers in human plasma, including Growth/differentiation factor 15 (GDF15), stanniocalcin 1 (STC1), and serine protease inhibitors (SERPINs), which significantly correlated with age in plasma from a human cohort, the Baltimore Longitudinal Study of Aging (BLSA).
Abstract: The senescence-associated secretory phenotype (SASP) has recently emerged as a driver of and promising therapeutic target for multiple age-related conditions, ranging from neurodegeneration to cancer. The complexity of the SASP, typically assessed by a few dozen secreted proteins, has been greatly underestimated, and a small set of factors cannot explain the diverse phenotypes it produces in vivo. Here, we present the "SASP Atlas," a comprehensive proteomic database of soluble proteins and exosomal cargo SASP factors originating from multiple senescence inducers and cell types. Each profile consists of hundreds of largely distinct proteins but also includes a subset of proteins elevated in all SASPs. Our analyses identify several candidate biomarkers of cellular senescence that overlap with aging markers in human plasma, including Growth/differentiation factor 15 (GDF15), stanniocalcin 1 (STC1), and serine protease inhibitors (SERPINs), which significantly correlated with age in plasma from a human cohort, the Baltimore Longitudinal Study of Aging (BLSA). Our findings will facilitate the identification of proteins characteristic of senescence-associated phenotypes and catalog potential senescence biomarkers to assess the burden, originating stimulus, and tissue of origin of senescent cells in vivo.
TL;DR: This work predicts that use of (single-cell) transcriptomics, genetic screens, genetic engineering of cellular glycosylation capacities and custom design of glycoprotein therapeutics are advancements that will ignite wider integration of glycosylation in general cell biology.
Abstract: Glycosylation is the most abundant and diverse form of post-translational modification of proteins that is common to all eukaryotic cells. Enzymatic glycosylation of proteins involves a complex metabolic network and different types of glycosylation pathways that orchestrate enormous amplification of the proteome in producing diversity of proteoforms and its biological functions. The tremendous structural diversity of glycans attached to proteins poses analytical challenges that limit exploration of specific functions of glycosylation. Major advances in quantitative transcriptomics, proteomics and nuclease-based gene editing are now opening new global ways to explore protein glycosylation through analysing and targeting enzymes involved in glycosylation processes. In silico models predicting cellular glycosylation capacities and glycosylation outcomes are emerging, and refined maps of the glycosylation pathways facilitate genetic approaches to address functions of the vast glycoproteome. These approaches apply commonly available cell biology tools, and we predict that use of (single-cell) transcriptomics, genetic screens, genetic engineering of cellular glycosylation capacities and custom design of glycoprotein therapeutics are advancements that will ignite wider integration of glycosylation in general cell biology.
TL;DR: The utility of large-scale mapping of the genetics of the proteome is demonstrated and pQTLs are provided as a resource for future precision studies of circulating proteins in human health.
Abstract: Circulating proteins are vital in human health and disease and are frequently used as biomarkers for clinical decision-making or as targets for pharmacological intervention. Here, we map and replicate protein quantitative trait loci (pQTL) for 90 cardiovascular proteins in over 30,000 individuals, resulting in 451 pQTLs for 85 proteins. For each protein, we further perform pathway mapping to obtain trans-pQTL gene and regulatory designations. We substantiate these regulatory findings with orthogonal evidence for trans-pQTLs using mouse knockdown experiments (ABCA1 and TRIB1) and clinical trial results (chemokine receptors CCR2 and CCR5), with consistent regulation. Finally, we evaluate known drug targets, and suggest new target candidates or repositioning opportunities using Mendelian randomization. This identifies 11 proteins with causal evidence of involvement in human disease that have not previously been targeted, including EGF, IL-16, PAPPA, SPON1, F3, ADM, CASP-8, CHI3L1, CXCL16, GDF15 and MMP-12. Taken together, these findings demonstrate the utility of large-scale mapping of the genetics of the proteome and provide a resource for future precision studies of circulating proteins in human health.
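The abstract does not say which Mendelian randomization estimator was used. As a minimal sketch, one common single-variant estimator is the Wald ratio: the variant's effect on the outcome divided by its effect on the exposure (here, a circulating protein level). The function name and the numbers below are hypothetical, and the standard error uses a first-order approximation that ignores uncertainty in the exposure effect.

```python
def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-instrument Mendelian randomization estimate (Wald ratio).

    beta_exposure: effect of the variant (e.g. a pQTL) on the protein level
    beta_outcome:  effect of the same variant on the disease outcome
    se_outcome:    standard error of beta_outcome
    Returns (causal_estimate, approximate_standard_error).
    """
    if beta_exposure == 0:
        raise ValueError("variant has no effect on the exposure; ratio undefined")
    estimate = beta_outcome / beta_exposure
    # First-order approximation: treats beta_exposure as known without error.
    standard_error = abs(se_outcome / beta_exposure)
    return estimate, standard_error


# Made-up summary statistics, for illustration only.
estimate, standard_error = wald_ratio(beta_exposure=0.5, beta_outcome=0.1, se_outcome=0.02)
```

With several independent variants, per-variant ratios are usually combined (for example by inverse-variance weighting) rather than read individually.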
TL;DR: How large-scale comparative studies are characterizing the degree to which mRNA and protein levels correlate is discussed, and how transcriptomics and proteomics provide useful non-redundant readouts of gene expression is described.
Abstract: Gene expression involves transcription, translation and the turnover of mRNAs and proteins. The degree to which protein abundances scale with mRNA levels and the implications in cases where this dependency breaks down remain an intensely debated topic. Here we review recent mRNA-protein correlation studies in the light of the quantitative parameters of the gene expression pathway, contextual confounders and buffering mechanisms. Although protein and mRNA levels typically show reasonable correlation, we describe how transcriptomics and proteomics provide useful non-redundant readouts. Integrating both types of data can reveal exciting biology and is an essential step in refining our understanding of the principles of gene expression control.
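As a rough illustration of how an mRNA-protein correlation might be computed for a set of genes, the sketch below implements a rank-based (Spearman) correlation in plain Python. The choice of Spearman rather than Pearson, the function names, and the toy values are assumptions made for illustration; the review itself does not prescribe a specific statistic.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy abundances for five genes (hypothetical values).
mrna = [1.0, 2.0, 3.0, 4.0, 5.0]
protein = [10.0, 30.0, 20.0, 50.0, 40.0]
rho = spearman(mrna, protein)
```

A rank-based measure is less sensitive to the wide dynamic range of abundance data than a correlation on raw values, which is one reason it is often preferred in this setting.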
TL;DR: 5xFAD shows a proteomic signature similar to symptomatic AD but exhibits activation of autophagy and interferon response and lacks human-specific deleterious events, such as downregulation of neurotrophic factors and synaptic proteins.
TL;DR: The TGF-β pathway, known for its involvement in tissue fibrosis, was specifically dysregulated by SARS-CoV-2 ORF8 and autophagy by SARS-CoVs ORF3. These pathways were identified as hotspots that can be targeted by existing drugs and that can guide rational design of virus- and host-directed therapies.
Abstract: The sudden global emergence of SARS-CoV-2 urgently requires an in-depth understanding of molecular functions of viral proteins and their interactions with the host proteome. Several omics studies have extended our knowledge of COVID-19 pathophysiology, including some focused on proteomic aspects [1–3]. To understand how SARS-CoV-2 and related coronaviruses manipulate the host, we here characterized interactome, proteome and signaling processes in a systems-wide manner. This identified connections between the corresponding cellular events, revealed functional effects of the individual viral proteins and put these findings into the context of host signaling pathways. We investigated the closely related SARS-CoV-2 and SARS-CoV viruses as well as the influence of SARS-CoV-2 on transcriptome, proteome, ubiquitinome and phosphoproteome of a lung-derived human cell line. Projecting these data onto the global network of cellular interactions revealed relationships between the perturbations taking place upon SARS-CoV-2 infection at different layers and identified unique and common molecular mechanisms of SARS coronaviruses. The results highlight the functionality of individual proteins as well as vulnerability hotspots of SARS-CoV-2, which we targeted with clinically approved drugs. We exemplify this by identification of kinase inhibitors as well as MMPase inhibitors with significant antiviral effects against SARS-CoV-2.
TL;DR: A quantitative atlas of the transcriptomes, proteomes and phosphoproteomes of 30 tissues of the model plant Arabidopsis thaliana provides a valuable resource for plant research.
Abstract: Plants are essential for life and are extremely diverse organisms with unique molecular capabilities [1]. Here we present a quantitative atlas of the transcriptomes, proteomes and phosphoproteomes of 30 tissues of the model plant Arabidopsis thaliana. Our analysis provides initial answers to how many genes exist as proteins (more than 18,000), where they are expressed, in which approximate quantities (a dynamic range of more than six orders of magnitude) and to what extent they are phosphorylated (over 43,000 sites). We present examples of how the data may be used, such as to discover proteins that are translated from short open-reading frames, to uncover sequence motifs that are involved in the regulation of protein production, and to identify tissue-specific protein complexes or phosphorylation-mediated signalling events. Interactive access to this resource for the plant community is provided by the ProteomicsDB and ATHENA databases, which include powerful bioinformatics tools to explore and characterize Arabidopsis proteins, their modifications and interactions.
TL;DR: Integrating mass spectrometry-based proteomics with next-generation DNA and RNA sequencing profiles tumors more comprehensively, underscoring the potential of proteogenomics for clinical investigation of breast cancer through more accurate annotation of targetable pathways and biological features of this remarkably heterogeneous malignancy.
TL;DR: A highly parallel protein quantitation platform integrating nanoparticle (NP) protein coronas with liquid chromatography-mass spectrometry for efficient proteomic profiling and biomarker discovery is developed.
Abstract: Large-scale, unbiased proteomics studies are constrained by the complexity of the plasma proteome. Here we report a highly parallel protein quantitation platform integrating nanoparticle (NP) protein coronas with liquid chromatography-mass spectrometry for efficient proteomic profiling. A protein corona is a protein layer adsorbed onto NPs upon contact with biofluids. Varying the physicochemical properties of engineered NPs translates to distinct protein corona patterns enabling differential and reproducible interrogation of biological samples, including deep sampling of the plasma proteome. Spike experiments confirm a linear signal response. The median coefficient of variation was 22%. We screened 43 NPs and selected a panel of 5, which detect more than 2,000 proteins from 141 plasma samples using a 96-well automated workflow in a pilot non-small cell lung cancer classification study. Our streamlined workflow combines depth of coverage and throughput with precise quantification based on unique interactions between proteins and NPs engineered for deep and scalable quantitative proteomic studies. Large-scale, unbiased proteomics studies of biological samples like plasma are constrained by the complexity of the proteome. Herein, the authors develop a highly parallel protein quantitation platform leveraging multi nanoparticle protein coronas for deep proteome sampling and biomarker discovery.
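The abstract reports a median coefficient of variation (CV) without describing the calculation. A minimal sketch of one plausible way to obtain such a figure is shown below, assuming the CV is computed per protein across replicate measurements on linear-scale intensities and then summarized by the median. The protein names and intensities are made up, and `median_cv` is a hypothetical helper, not part of the authors' workflow.

```python
from statistics import mean, median, stdev


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean."""
    return 100.0 * stdev(values) / mean(values)


def median_cv(intensities_by_protein):
    """Median of per-protein CVs across replicate intensity measurements.

    Proteins with fewer than two replicates or a non-positive mean are skipped,
    because the CV is undefined or meaningless for them.
    """
    cvs = [
        coefficient_of_variation(values)
        for values in intensities_by_protein.values()
        if len(values) >= 2 and mean(values) > 0
    ]
    return median(cvs)


# Hypothetical replicate intensities for three proteins.
replicates = {
    "PROT_A": [100.0, 110.0, 90.0],   # CV = 10%
    "PROT_B": [50.0, 60.0, 40.0],     # CV = 20%
    "PROT_C": [200.0, 200.0, 200.0],  # CV = 0%
}
result = median_cv(replicates)
```

Using the median rather than the mean keeps a handful of very noisy, low-abundance proteins from dominating the summary.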
TL;DR: A survey illuminates alternatives for sequencing proteins that have the brightest prospects for displacing mass spectrometry, promise to be scalable, and seem adaptable to bioinformatics tools for calling the sequence of amino acids that constitute a protein.
Abstract: Proteins can be the root cause of a disease, and they can be used to cure it. The need to identify these critical actors was recognized early (1951) by Sanger; the first biopolymer sequenced was a peptide, insulin. With the advent of scalable, single-molecule DNA sequencing, genomics and transcriptomics have since propelled medicine through improved sensitivity and lower costs, but proteomics has lagged behind. Currently, proteomics relies mainly on mass spectrometry (MS), but instead of truly sequencing, it classifies a protein and typically requires about a billion copies of a protein to do it. Here, we offer a survey that illuminates a few alternatives with the brightest prospects for identifying whole proteins and displacing MS for sequencing them. These alternatives all boast sensitivity superior to MS and promise to be scalable and seem to be adaptable to bioinformatics tools for calling the sequence of amino acids that constitute a protein.
TL;DR: This is the first large-scale proteogenomics analysis across traditional histological boundaries to uncover foundational pediatric brain tumor biology and inform rational treatment selection.
TL;DR: This work aims to describe a bottom-up proteomics workflow from sample preparation to data analysis, including all of its benefits and pitfalls, and describes potential improvements in this type of proteomics workflow for the future.
Abstract: Proteomics is the field of study that includes the analysis of proteins, from either a basic science perspective or a clinical one. Proteins can be investigated for their abundance, variety of proteoforms due to post-translational modifications (PTMs), and their stable or transient protein–protein interactions. This can be especially beneficial in the clinical setting when studying proteins involved in different diseases and conditions. Here, we aim to describe a bottom-up proteomics workflow from sample preparation to data analysis, including all of its benefits and pitfalls. We also describe potential improvements in this type of proteomics workflow for the future.
TL;DR: Improvements in MS-based clinical proteomics not only solidify their integral position in cancer research, but also accelerate the shift towards becoming a regular component of routine analysis and clinical practice.
Abstract: Cancer biomarkers have transformed current practices in the oncology clinic. Continued discovery and validation are crucial for improving early diagnosis, risk stratification, and monitoring patient response to treatment. Profiling of the tumour genome and transcriptome are now established tools for the discovery of novel biomarkers, but alterations in proteome expression are more likely to reflect changes in tumour pathophysiology. In the past, clinical diagnostics have strongly relied on antibody-based detection strategies, but these methods carry certain limitations. Mass spectrometry (MS) is a powerful method that enables increasingly comprehensive insights into changes of the proteome to advance personalized medicine. In this review, recent improvements in MS-based clinical proteomics are highlighted with a focus on oncology. We will provide a detailed overview of clinically relevant sample types, as well as considerations for sample preparation methods, protein quantitation strategies, MS configurations, and data analysis pipelines currently available to researchers. Critical consideration of each step is necessary to address the pressing clinical questions that advance cancer patient diagnosis and prognosis. While the majority of studies focus on the discovery of clinically-relevant biomarkers, there is a growing demand for rigorous biomarker validation. These studies focus on high-throughput targeted MS assays and multi-centre studies with standardized protocols. Additionally, improvements in MS sensitivity are opening the door to new classes of tumour-specific proteoforms including post-translational modifications and variants originating from genomic aberrations. Overlaying proteomic data to complement genomic and transcriptomic datasets forges the growing field of proteogenomics, which shows great potential to improve our understanding of cancer biology.
Overall, these advancements not only solidify MS-based clinical proteomics’ integral position in cancer research, but also accelerate the shift towards becoming a regular component of routine analysis and clinical practice.
TL;DR: The phosphorylated tau interactome was enriched in proteins involved in the protein ubiquitination pathway and phagosome maturation, and the specific proteins that phosphorylated tau interacts with in these pathways were pinpointed for the first time, providing novel potential pathogenic mechanisms that can be explored in future studies.
Abstract: Accumulation of phosphorylated tau is a key pathological feature of Alzheimer's disease. Phosphorylated tau accumulation causes synaptic impairment, neuronal dysfunction and formation of neurofibrillary tangles. The pathological actions of phosphorylated tau are mediated by surrounding neuronal proteins; however, a comprehensive understanding of the proteins that phosphorylated tau interacts with in Alzheimer's disease is surprisingly limited. Therefore, the aim of this study was to determine the phosphorylated tau interactome. To this end, we used two complementary proteomics approaches: (i) quantitative proteomics was performed on neurofibrillary tangles microdissected from patients with advanced Alzheimer's disease; and (ii) affinity purification-mass spectrometry was used to identify which of these proteins specifically bound to phosphorylated tau. We identified 542 proteins in neurofibrillary tangles. This included the abundant detection of many proteins known to be present in neurofibrillary tangles such as tau, ubiquitin, neurofilament proteins and apolipoprotein E. Affinity purification-mass spectrometry confirmed that 75 proteins present in neurofibrillary tangles interacted with PHF1-immunoreactive phosphorylated tau. Twenty-nine of these proteins have been previously associated with phosphorylated tau, therefore validating our proteomic approach. More importantly, 34 proteins had previously been associated with total tau, but not yet linked directly to phosphorylated tau (e.g. synaptic protein VAMP2, vacuolar-ATPase subunit ATP6V0D1); therefore, we provide new evidence that they directly interact with phosphorylated tau in Alzheimer's disease. In addition, we also identified 12 novel proteins, not previously known to be physiologically or pathologically associated with tau (e.g. RNA binding protein HNRNPA1). 
Network analysis showed that the phosphorylated tau interactome was enriched in proteins involved in the protein ubiquitination pathway and phagosome maturation. Importantly, we were able to pinpoint specific proteins that phosphorylated tau interacts with in these pathways for the first time, therefore providing novel potential pathogenic mechanisms that can be explored in future studies. Combined, our results reveal new potential drug targets for the treatment of tauopathies and provide insight into how phosphorylated tau mediates its toxicity in Alzheimer's disease.
TL;DR: Thermal proteome profiling provides a unique insight into protein state and interactions in their native context and at a proteome-wide level, allowing the study of basic biological processes and their underlying mechanisms.
Abstract: Thermal proteome profiling (TPP) is based on the principle that, when subjected to heat, proteins denature and become insoluble. Proteins can change their thermal stability upon interactions with small molecules (such as drugs or metabolites), nucleic acids or other proteins, or upon post-translational modifications. TPP uses multiplexed quantitative mass spectrometry-based proteomics to monitor the melting profile of thousands of expressed proteins. Importantly, this approach can be performed in vitro, in situ, or in vivo. It has been successfully applied to identify targets and off-targets of drugs, or to study protein-metabolite and protein-protein interactions. Therefore, TPP provides a unique insight into protein state and interactions in their native context and at a proteome-wide level, allowing the study of basic biological processes and their underlying mechanisms.
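To make the idea of a melting profile concrete, the sketch below estimates a melting temperature as the point where a protein's soluble fraction falls to 0.5, and compares a treated sample with a vehicle control. It is a simplification under stated assumptions: linear interpolation stands in for the sigmoidal curve fitting that TPP analyses typically use, and the temperatures, fractions, and function name are invented for illustration.

```python
def melting_temperature(temps, fractions, threshold=0.5):
    """Interpolate the temperature at which the soluble fraction drops to threshold.

    temps:     increasing temperatures (degrees Celsius)
    fractions: soluble fraction at each temperature, relative to the lowest one
    Returns None if the curve never crosses the threshold.
    """
    points = list(zip(temps, fractions))
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if f0 >= threshold > f1:
            # Linear interpolation between the two bracketing measurements.
            return t0 + (f0 - threshold) * (t1 - t0) / (f0 - f1)
    return None


# Hypothetical melting profiles of one protein.
temps = [37, 41, 45, 49, 53, 57, 61]
vehicle = [1.00, 0.98, 0.90, 0.60, 0.20, 0.05, 0.01]
treated = [1.00, 0.99, 0.95, 0.85, 0.60, 0.20, 0.03]

tm_vehicle = melting_temperature(temps, vehicle)
tm_treated = melting_temperature(temps, treated)
shift = tm_treated - tm_vehicle  # positive shift suggests stabilization
```

A positive shift in the treated sample is the kind of signal that would flag a protein as a candidate interaction partner of the added compound, subject to replication and statistical testing.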
TL;DR: This review introduces bioinformatics software and tools designed for mass spectrometry-based protein identification and quantification, and then reviews the different statistical and machine learning methods that have been developed to perform comprehensive analysis in proteomics studies.
Abstract: Recent advances in mass spectrometry (MS)-based proteomics have enabled tremendous progress in the understanding of cellular mechanisms, disease progression, and the relationship between genotype and phenotype. Though many popular bioinformatics methods in proteomics are derived from other omics studies, novel analysis strategies are required to deal with the unique characteristics of proteomics data. In this review, we discuss the current developments in the bioinformatics methods used in proteomics and how they facilitate the mechanistic understanding of biological processes. We first introduce bioinformatics software and tools designed for mass spectrometry-based protein identification and quantification, and then we review the different statistical and machine learning methods that have been developed to perform comprehensive analysis in proteomics studies. We conclude with a discussion of how quantitative protein data can be used to reconstruct protein interactions and signaling networks.
TL;DR: DQMS takes into account the inherent dependence of protein variance on the number of PSMs or peptides used for quantification, thereby providing a more accurate variance estimation and achieving better accuracy and statistical power in quantitative proteomics.
TL;DR: The structural diversity and mechanisms of the two main classes of PDB enzymes, the biotin protein ligases (BioID) and the peroxidases (APEX), are reviewed, and the engineering of these enzymes for PDB is described along with emerging applications, including the development of PDB for coincidence detection (split-PDB).
TL;DR: An advanced proteomics workflow is used to identify 340,000 proteins from 100 taxonomically diverse species, providing a comparative view of proteomes across the evolutionary range and a large-scale case study for sequence-based machine learning.
Abstract: Proteins carry out the vast majority of functions in all biological domains, but for technological reasons their large-scale investigation has lagged behind the study of genomes. Since the first essentially complete eukaryotic proteome was reported [1], advances in mass-spectrometry-based proteomics [2] have enabled increasingly comprehensive identification and quantification of the human proteome [3-6]. However, there have been few comparisons across species [7,8], in stark contrast with genomics initiatives [9]. Here we use an advanced proteomics workflow, in which the peptide separation step is performed by a microstructured and extremely reproducible chromatographic system, for the in-depth study of 100 taxonomically diverse organisms. With two million peptide and 340,000 stringent protein identifications obtained in a standardized manner, we double the number of proteins with solid experimental evidence known to the scientific community. The data also provide a large-scale case study for sequence-based machine learning, as we demonstrate by experimentally confirming the predicted properties of peptides from Bacteroides uniformis. Our results offer a comparative view of the functional organization of organisms across the entire evolutionary range. A remarkably high fraction of the total proteome mass in all kingdoms is dedicated to protein homeostasis and folding, highlighting the biological challenge of maintaining protein structure in all branches of life. Likewise, a universally high fraction is involved in supplying energy resources, although these pathways range from photosynthesis through iron sulfur metabolism to carbohydrate metabolism. Generally, however, proteins and proteomes are remarkably diverse between organisms, and they can readily be explored and functionally compared at www.proteomesoflife.org.
TL;DR: Improved single-cell proteome coverage is reported through the combination of the previously developed Nanodroplet Processing in One Pot for Trace Samples (nanoPOTS) platform with further miniaturization of liquid chromatography (LC) separations and implementation of an ultrasensitive latest-generation mass spectrometer.
Abstract: Single-cell proteomics can provide unique insights into biological processes by resolving heterogeneity that is obscured by bulk measurements. Gains in the overall sensitivity and proteome coverage...
TL;DR: An integrative database named DrLLPS for proteins involved in liquid–liquid phase separation (LLPS), which is a ubiquitous and crucial mechanism for spatiotemporal organization of various biochemical reactions, by creating membraneless organelles in eukaryotic cells is presented.
Abstract: Here, we present an integrative database named DrLLPS (http://llps.biocuckoo.cn/) for proteins involved in liquid-liquid phase separation (LLPS), which is a ubiquitous and crucial mechanism for spatiotemporal organization of various biochemical reactions, by creating membraneless organelles (MLOs) in eukaryotic cells. From the literature, we manually collected 150 scaffold proteins that are drivers of LLPS, 987 regulators that contribute to modulating LLPS, and 8148 potential client proteins that might be dispensable for the formation of MLOs, which were then categorized into 40 biomolecular condensates. We searched potential orthologs of these known proteins, and in total DrLLPS contained 437 887 known and potential LLPS-associated proteins in 164 eukaryotes. Furthermore, we carefully annotated LLPS-associated proteins in eight model organisms, by using the knowledge integrated from 110 widely used resources that covered 16 aspects, including protein disordered regions, domain annotations, post-translational modifications (PTMs), genetic variations, cancer mutations, molecular interactions, disease-associated information, drug-target relations, physicochemical property, protein functional annotations, protein expressions/proteomics, protein 3D structures, subcellular localizations, mRNA expressions, DNA & RNA elements, and DNA methylations. We anticipate DrLLPS can serve as a helpful resource for further analysis of LLPS.
TL;DR: The results demonstrate that novel AD biomarker candidates are identified and confirmed by proteomic studies of brain tissue and biofluids, providing a rich resource for large-scale biomarker validation for the AD community.
Abstract: Based on amyloid cascade and tau hypotheses, protein biomarkers of different Aβ and tau species in cerebrospinal fluid (CSF) and blood/plasma/serum have been examined to correlate with brain pathology. Recently, unbiased proteomic profiling of these human samples has been initiated to identify a large number of novel AD biomarker candidates, but it is challenging to define reliable candidates for subsequent large-scale validation. We present a comprehensive strategy to identify biomarker candidates of high confidence by integrating multiple proteomes in AD, including cortex, CSF and serum. The proteomes were analyzed by the multiplexed tandem-mass-tag (TMT) method, extensive liquid chromatography (LC) fractionation and high-resolution tandem mass spectrometry (MS/MS) for ultra-deep coverage. A systems biology approach was used to prioritize the most promising AD signature proteins from all proteomic datasets. Finally, candidate biomarkers identified by the MS discovery were validated by the enzyme-linked immunosorbent (ELISA) and TOMAHAQ targeted MS assays. We quantified 13,833, 5941, and 4826 proteins from human cortex, CSF and serum, respectively. Compared to other studies, we analyzed a total of 10 proteomic datasets, covering 17,541 proteins (13,216 genes) in 365 AD, mild cognitive impairment (MCI) and control cases. Our ultra-deep CSF profiling of 20 cases uncovered the majority of previously reported AD biomarker candidates, most of which, however, displayed no statistical significance except SMOC1 and TGFB2. Interestingly, the AD CSF showed evident decrease of a large number of mitochondria proteins that were only detectable in our ultra-deep analysis. Further integration of 4 cortex and 4 CSF cohort proteomes highlighted 6 CSF biomarkers (SMOC1, C1QTNF5, OLFML3, SLIT2, SPON1, and GPNMB) that were consistently identified in at least 2 independent datasets. 
We also profiled CSF in the 5xFAD mouse model to validate amyloidosis-induced changes, and found consistent mitochondrial decreases (SOD2, PRDX3, ALDH6A1, ETFB, HADHA, and CYB5R3) in both human and mouse samples. In addition, comparison of cortex and serum led to an AD-correlated protein panel of CTHRC1, GFAP and OLFM3. In summary, 37 proteins emerged as potential AD signatures across cortex, CSF and serum, and strikingly, 59% of these were mitochondria proteins, emphasizing mitochondrial dysfunction in AD. Selected biomarker candidates were further validated by ELISA and TOMAHAQ assays. Finally, we prioritized the most promising AD signature proteins including SMOC1, TAU, GFAP, SUCLG2, PRDX3, and NTN1 by integrating all proteomic datasets. Our results demonstrate that novel AD biomarker candidates are identified and confirmed by proteomic studies of brain tissue and biofluids, providing a rich resource for large-scale biomarker validation for the AD community.
TL;DR: More work is needed to understand the longitudinal trajectory of select protein and metabolite markers, perform transomics analyses within merged datasets, and incorporate more kidney tissue-based investigation.
Abstract: In this review of the application of proteomics and metabolomics to kidney disease research, we review key concepts, highlight illustrative examples, and outline future directions. The proteome and metabolome reflect the influence of environmental exposures in addition to genetic coding. Circulating levels of proteins and metabolites are dynamic and modifiable, and thus amenable to therapeutic targeting. Design and analytic considerations in proteomics and metabolomics studies should be tailored to the investigator’s goals. For the identification of clinical biomarkers, adjustment for all potential confounding variables, particularly GFR, and strict significance thresholds are warranted. However, this approach has the potential to obscure biologic signals and can be overly conservative given the high degree of intercorrelation within the proteome and metabolome. Mass spectrometry, often coupled to up-front chromatographic separation techniques, is a major workhorse in both proteomics and metabolomics. High-throughput antibody- and aptamer-based proteomic platforms have emerged as additional, powerful approaches to assay the proteome. As the breadth of coverage for these methodologies continues to expand, machine learning tools and pathway analyses can help select the molecules of greatest interest and categorize them in distinct biologic themes. Studies to date have already made a substantial effect, for example elucidating target antigens in membranous nephropathy, identifying a signature of urinary peptides that adds prognostic information to urinary albumin in CKD, implicating circulating inflammatory proteins as potential mediators of diabetic nephropathy, demonstrating the key role of the microbiome in the uremic milieu, and highlighting kidney bioenergetics as a modifiable factor in AKI. Additional studies are required to replicate and expand on these findings in independent cohorts. 
Further, more work is needed to understand the longitudinal trajectory of select protein and metabolite markers, perform transomics analyses within merged datasets, and incorporate more kidney tissue–based investigation.
TL;DR: A comprehensive overview of deep learning applications in proteomics, including retention time prediction, MS/MS spectrum prediction, de novo peptide sequencing, PTM prediction, major histocompatibility complex‐peptide binding prediction, and protein structure prediction, is provided.
Abstract: Proteomics, the study of all the proteins in biological systems, is becoming a data-rich science. Protein sequences and structures are comprehensively catalogued in online databases. With recent advancements in tandem mass spectrometry (MS) technology, protein expression and post-translational modifications (PTMs) can be studied in a variety of biological systems at the global scale. Sophisticated computational algorithms are needed to translate the vast amount of data into novel biological insights. Deep learning automatically extracts data representations at high levels of abstraction from data, and it thrives in data-rich scientific research domains. Here, a comprehensive overview of deep learning applications in proteomics, including retention time prediction, MS/MS spectrum prediction, de novo peptide sequencing, PTM prediction, major histocompatibility complex-peptide binding prediction, and protein structure prediction, is provided. Limitations and the future directions of deep learning in proteomics are also discussed. This review will provide readers an overview of deep learning and how it can be used to analyze proteomics data.
TL;DR: LiP-Quant is developed, a drug target deconvolution pipeline based on limited proteolysis coupled with mass spectrometry that works across species, including in human cells, and demonstrates drug target identification across compound classes, including drugs targeting kinases, phosphatases and membrane proteins.
Abstract: Chemoproteomics is a key technology to characterize the mode of action of drugs, as it directly identifies the protein targets of bioactive compounds and aids in the development of optimized small-molecule compounds. Current approaches cannot identify the protein targets of a compound and also detect the interaction surfaces between ligands and protein targets without prior labeling or modification. To address this limitation, we here develop LiP-Quant, a drug target deconvolution pipeline based on limited proteolysis coupled with mass spectrometry that works across species, including in human cells. We use machine learning to discern features indicative of drug binding and integrate them into a single score to identify protein targets of small molecules and approximate their binding sites. We demonstrate drug target identification across compound classes, including drugs targeting kinases, phosphatases and membrane proteins. LiP-Quant estimates the half maximal effective concentration of compound binding sites in whole cell lysates, correctly discriminating drug binding to homologous proteins and identifying the so far unknown targets of a fungicide research compound. Proteomics is often used to map protein-drug interactions but identifying a drug’s protein targets along with the binding interfaces has not been achieved yet. Here, the authors integrate limited proteolysis and machine learning for the proteome-wide mapping of drug protein targets and binding sites.
TL;DR: The chief contributions of these techniques in understanding, diagnosing and treating NAFLD are summarized herein.
Abstract: Non-alcoholic fatty liver disease (NAFLD) is a multifaceted metabolic disorder, whose spectrum covers clinical, histological and pathophysiological developments ranging from simple steatosis to non-alcoholic steatohepatitis (NASH) and liver fibrosis, potentially evolving into cirrhosis, hepatocellular carcinoma and liver failure. Liver biopsy remains the gold standard for diagnosing NAFLD, while there are no specific treatments. An ever-increasing number of high-throughput Omics investigations on the molecular pathobiology of NAFLD at the cellular, tissue and system levels produce comprehensive biochemical patient snapshots. In the clinical setting, these applications are considerably enhancing our efforts towards obtaining a holistic insight into NAFLD pathophysiology. Omics are also generating non-invasive diagnostic modalities for the distinct stages of NAFLD, although these remain to be validated in multiple large, heterogeneous and independent cohorts, both cross-sectionally and prospectively. Finally, they aid in developing novel therapies. By tracing the flow of information from genomics to epigenomics, transcriptomics, proteomics, metabolomics, lipidomics and glycomics, the chief contributions of these techniques in understanding, diagnosing and treating NAFLD are summarized herein.
TL;DR: ChromID, a method for identifying the chromatin-dependent protein interactome on the basis of proximity biotinylation, is established and applied to distinct chromatin modifications in mouse stem cells, highlighting the ability of ChromID to obtain a detailed view of protein interaction networks on chromatin.
Abstract: Chromatin modifications regulate genome function by recruiting proteins to the genome. However, the protein composition at distinct chromatin modifications has yet to be fully characterized. In this study, we used natural protein domains as modular building blocks to develop engineered chromatin readers (eCRs) selective for DNA methylation and histone tri-methylation at H3K4, H3K9 and H3K27 residues. We first demonstrated their utility as selective chromatin binders in living cells by stably expressing eCRs in mouse embryonic stem cells and measuring their subnuclear localization, genomic distribution and histone-modification-binding preference. By fusing eCRs to the biotin ligase BASU, we established ChromID, a method for identifying the chromatin-dependent protein interactome on the basis of proximity biotinylation, and applied it to distinct chromatin modifications in mouse stem cells. Using a synthetic dual-modification reader, we also uncovered the protein composition at bivalently modified promoters marked by H3K4me3 and H3K27me3. These results highlight the ability of ChromID to obtain a detailed view of protein interaction networks on chromatin. The protein complexes associated with specific chromatin marks in living cells are identified using engineered binding proteins.
TL;DR: A novel proteomic aging clock comprised of proteins that were reported to change with age in plasma in three or more different studies is proposed and it is demonstrated that this clock is able to accurately predict human age.
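Note: The selection rule stated in this TL;DR (keep proteins reported to change with age in three or more studies) can be sketched as a simple counting step. This is only an illustration of that rule, not the authors' pipeline; the function name, study labels and protein names below are hypothetical placeholders, and the step that fits the clock to predict age is not shown.

```python
from collections import Counter


def select_clock_proteins(study_hits, min_studies=3):
    """Return proteins reported as age-associated in at least min_studies studies.

    study_hits maps a study label to the proteins that study reported.
    """
    counts = Counter()
    for proteins in study_hits.values():
        # Use a set so a protein listed twice in one study is counted once.
        counts.update(set(proteins))
    return sorted(p for p, n in counts.items() if n >= min_studies)


# Placeholder inputs (not real study results).
study_hits = {
    "study_1": ["PROT_A", "PROT_B", "PROT_C"],
    "study_2": ["PROT_A", "PROT_B"],
    "study_3": ["PROT_A", "PROT_C", "PROT_D"],
    "study_4": ["PROT_A", "PROT_B", "PROT_D"],
}
clock_candidates = select_clock_proteins(study_hits)
print(clock_candidates)
```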
TL;DR: Insight is provided into the potential roles of glycosylation in the pathogenesis of HGSC, with the possibility of distinguishing pathological outcomes of ovarian tumors from non-tumors, as well as classifying tumor clusters.
TL;DR: Proteogenomic methods for defining cancer signaling in vivo, starting from core needle biopsies and applied to a HER2 breast cancer focused clinical trial, are developed.
Abstract: Cancer proteogenomics promises new insights into cancer biology and treatment efficacy by integrating genomics, transcriptomics and protein profiling including modifications by mass spectrometry (MS). A critical limitation is sample input requirements that exceed many sources of clinically important material. Here we report a proteogenomics approach for core biopsies using tissue-sparing specimen processing and microscaled proteomics. As a demonstration, we analyze core needle biopsies from ERBB2 positive breast cancers before and 48–72 h after initiating neoadjuvant trastuzumab-based chemotherapy. We show greater suppression of ERBB2 protein and both ERBB2 and mTOR target phosphosite levels in cases associated with pathological complete response, and identify potential causes of treatment resistance including the absence of ERBB2 amplification, insufficient ERBB2 activity for therapeutic sensitivity despite ERBB2 amplification, and candidate resistance mechanisms including androgen receptor signaling, mucin overexpression and an inactive immune microenvironment. The clinical utility and discovery potential of proteogenomics at biopsy-scale warrants further investigation. Connecting genomics and proteomics allows the development of more efficient and specific treatments for cancer. Here, the authors develop proteogenomic methods for defining cancer signaling in vivo, starting from core needle biopsies and with application to a HER2 breast cancer focused clinical trial.