TL;DR: It is concluded that the majority of sequon-containing proteins will be found to be glycosylated and that more than half of all proteins are glycoproteins.
TL;DR: Glycosylation site information on human proteins is used to illustrate the contribution of glycosylation to protein function and to assess how widespread this modification is across the human proteome.
Abstract: The addition of a carbohydrate moiety to the side chain of a residue in a protein chain influences the physicochemical properties of the protein. Glycosylation is known to alter proteolytic resistance, protein solubility, stability, local structure, lifetime in circulation and immunogenicity. Of the various forms of protein glycosylation found in eukaryotic systems, the most important types are N-linked, O-linked GalNAc (mucin-type) and O-linked GlcNAc (intracellular/nuclear) glycosylation. N-linked glycosylation is a co-translational process involving the transfer of the precursor oligosaccharide GlcNAc2Man9Glc3 to asparagine residues in the protein chain. The asparagine usually occurs in a sequon Asn-Xaa-Ser/Thr, where Xaa is not proline. This is, however, not a specific consensus, since not all such sequons are modified in the cell. O-linked glycosylation involves the post-translational transfer of an oligosaccharide to a serine or threonine residue. In this case, there is no well-defined motif for the acceptor site other than the near vicinity of proline and valine residues. We have developed glycosylation site prediction methods for these three types of glycosylation using artificial neural networks that examine correlations in the local sequence context and surface accessibility. In this paper, we have used glycosylation site information on human proteins to illustrate the contribution of glycosylation to protein function and assess how widespread this modification is across the human proteome.
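Illustration (not from the paper): the sequon rule stated in the abstract, Asn-Xaa-Ser/Thr with Xaa not proline, can be sketched as a simple pattern scan. The function name `find_sequons` and the example sequence are hypothetical. A match marks only a candidate site, since the abstract notes that not all sequons are modified in the cell; this is not the neural-network predictor the authors describe.

```python
import re

# N-X-S/T where X is any residue except proline. The lookahead makes the
# match zero-width, so overlapping sequons (e.g. "NNTS") are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")

def find_sequons(seq):
    """Return 0-based positions of Asn residues that start an N-X-S/T sequon."""
    return [m.start() for m in SEQUON.finditer(seq.upper())]

# Made-up sequence: "NPS" at position 5 is skipped because X is proline.
print(find_sequons("MNGTANPSNNTS"))
```

The scan only enumerates candidate asparagines; deciding which of them are actually occupied needs additional information such as the sequence context and surface accessibility mentioned in the abstract.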
TL;DR: It is indicated that non-glycosylated sites tend to be found more frequently towards the C termini of glycoproteins, and that proline residues in positions X and Y in the consensus Asn-X-Thr/Ser-Y strongly reduce the likelihood of N-linked glycosylation.
Abstract: In N-glycosylated glycoproteins, carbohydrate is attached to Asn in the sequence Asn-X-Ser/Thr, where X denotes any amino acid. However, the presence of this consensus peptide does not always lead to glycosylation. We have compiled an extensive collection of glycosylated and non-glycosylated Asn-X-Thr/Ser sites and present a statistical study based on this data set. Our results indicate that non-glycosylated sites tend to be found more frequently towards the C termini of glycoproteins, and that proline residues in positions X and Y in the consensus Asn-X-Thr/Ser-Y strongly reduce the likelihood of N-linked glycosylation. Beyond this, there are no obvious local sequence features that seem to correlate with the absence or presence of N-linked glycosylation. These findings are discussed in terms of the prediction and engineering of glycosylation sites in secretory proteins.
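Illustration (not from the paper): the two features highlighted above, proline at position X or Y of Asn-X-Thr/Ser-Y and location relative to the C terminus, can be extracted per candidate site as sketched below. The function name `flag_proline_sites`, the dictionary keys and the example sequence are hypothetical. Here X is allowed to be any residue, as in this abstract's definition of the consensus, so that proline-containing sites are flagged and not silently dropped.

```python
import re

# Candidate site: N, any residue X, S or T, then an optional residue Y
# (Y is missing when the sequon ends the chain). Groups inside the
# lookahead are still captured, and overlapping sites are reported.
CANDIDATE = re.compile(r"(?=N([A-Z])[ST]([A-Z]?))")

def flag_proline_sites(seq):
    """List Asn-X-Thr/Ser-Y candidates with their X/Y residues and a proline flag."""
    seq = seq.upper()
    sites = []
    for m in CANDIDATE.finditer(seq):
        x, y = m.group(1), m.group(2)
        sites.append({
            "pos": m.start(),                 # 0-based index of the Asn
            "x": x,
            "y": y,                           # "" if the chain ends after S/T
            "proline": "P" in (x, y),         # proline at X or Y
            "rel_pos": m.start() / len(seq),  # 0 = N terminus, near 1 = C terminus
        })
    return sites

for site in flag_proline_sites("ANPTGNGSPNATK"):
    print(site)
```

A flag of this kind only marks sites that the statistics suggest are less likely to be glycosylated; it is not a predictor of occupancy on its own.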
TL;DR: Statistical analysis of structural data on glycosidic linkages is extended to the glycan-protein linkage and to the peptide primary, secondary, and tertiary structures around N-glycosylation sites. Hydrophobic protein-glycan interactions and the low accessibility of glycosylated asparagine sites in folded proteins are found to be common features and may be critical in protein fold stabilization and regional quality control of protein folding.
Abstract: We recently reported statistical analysis of structural data on glycosidic linkages. Here we extend this analysis to the glycan-protein linkage, and the peptide primary, secondary, and tertiary structures around N-glycosylation sites. We surveyed 506 glycoproteins in the Protein Data Bank crystallographic database, giving 2592 glycosylation sequons (1683 occupied) and generated a database of 626 nonredundant sequons with 386 occupied. Deviations from the expected amino acid composition were seen around occupied asparagines, particularly an increased occurrence of aromatic residues before the asparagine and threonine at position +2. Glycosylation alters the asparagine side chain torsion angle distribution and reduces its flexibility. There is an elevated probability of finding glycosylation sites at points where secondary structure changes. An 11-class taxonomy was developed to describe protein surface geometry around glycosylation sites. Thirty-three percent of the occupied sites are on exposed convex surfaces, 10% in deep recesses and 20% on the edge of grooves with the glycan filling the cleft. A surprisingly large number of glycosylated asparagine residues have a low accessibility. The incidence of aromatic amino acids brought into close contact with the glycan by the folding process is higher than their normal levels on the surface or in the protein core. These data have significant implications for control of sequon occupancy and evolutionary selection of glycosylation sites and are discussed in relation to mechanisms of protein fold stabilization and regional quality control of protein folding. Hydrophobic protein-glycan interactions and the low accessibility of glycosylation sites in folded proteins are common features and may be critical in mediating these functions.
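Worked arithmetic (not from the paper): the sequon counts quoted in the abstract imply the occupancy fractions computed below. Only the four counts come from the abstract; the helper name `occupancy_percent` is hypothetical.

```python
def occupancy_percent(occupied, total):
    """Percentage of sequons that carry a glycan, rounded to one decimal place."""
    return round(100 * occupied / total, 1)

# Counts as quoted in the abstract.
all_sequons = occupancy_percent(1683, 2592)   # every sequon in the surveyed structures
nonredundant = occupancy_percent(386, 626)    # nonredundant subset

print(all_sequons)    # 64.9
print(nonredundant)   # 61.7
```

In both sets, roughly three in five sequons seen in the crystal structures are occupied.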
TL;DR: A new Web-based program developed to facilitate sequon tracking and to define patterns allowed rapid visualization of the two distinctive patterns of sequon variation (fixed and shifting) found in HIV-1, HIV-2, and SIV CPZ; in HCV, sequon variation was limited, although two shifting sites were identified.
Abstract: Human and simian immunodeficiency viruses (HIV and SIV), influenza virus, and hepatitis C virus (HCV) have heavily glycosylated, highly variable surface proteins. Here we explore N-linked glycosylation site (sequon) variation at the population level in these viruses, using a new Web-based program developed to facilitate the sequon tracking and to define patterns (www.hiv.lanl.gov). This tool allowed rapid visualization of the two distinctive patterns of sequon variation found in HIV-1, HIV-2, and SIV CPZ. The first pattern (fixed) describes readily aligned sites that are either simply present or absent. These sites tend to be occupied by high-mannose glycans. The second pattern (shifting) refers to sites embedded in regions of extreme local length variation and is characterized by shifts in terms of the relative position and local density of sequons; these sites tend to be populated by complex carbohydrates. HIV, with its extreme variation in number and precise location of sequons, does not have a net increase in the number of sites over time at the population level. Primate lentiviral lineages have host species-dependent levels of sequon shifting, with HIV-1 in humans the most extreme. HCV E1 and E2 proteins, despite evolving extremely rapidly through point mutation, show limited sequon variation, although two shifting sites were identified. Human influenza A hemagglutinin H3 HA1 is accumulating sequons over time, but this trend is not evident in any other avian or human influenza A serotypes.
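Illustration (not the program at www.hiv.lanl.gov): population-level sequon tracking can be sketched as tallying, for each alignment column, how many aligned sequences start a sequon there. The function names, the "-" gap character and the toy alignment are assumptions. A column where a sequon is simply present or absent across sequences corresponds loosely to the "fixed" pattern described above; detecting "shifting" sites in regions of length variation would need more than this per-column tally.

```python
from collections import Counter

def aligned_sequons(aligned):
    """Alignment columns of Asn residues that start N-X-S/T (X != P), skipping gaps."""
    # Keep each residue together with its alignment column, dropping gap characters.
    residues = [(col, aa) for col, aa in enumerate(aligned.upper()) if aa != "-"]
    hits = []
    for i in range(len(residues) - 2):
        (col, n), (_, x), (_, st) = residues[i:i + 3]
        if n == "N" and x != "P" and st in "ST":
            hits.append(col)
    return hits

def sequon_column_counts(alignment):
    """Count, per alignment column, how many sequences have a sequon starting there."""
    counts = Counter()
    for seq in alignment:
        counts.update(aligned_sequons(seq))
    return counts

# Toy alignment (made up): the second sequence has lost both sequons.
toy = ["MN-GTAANAS",
       "MNQGTAA-AS",
       "MN-GSAANAT"]
print(sequon_column_counts(toy))
```

Working in alignment coordinates keeps positions comparable between sequences of different lengths, which is what makes present-or-absent sites easy to read off.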