key: cord-0687085-s85wef2p
authors: Gámez, Gustavo; Hermoso, Juan A.; Carrasco-López, César; Gómez-Mejia, Alejandro; Muskus, Carlos E.; Hammerschmidt, Sven
title: Atypical N-glycosylation of SARS-CoV-2 impairs the efficient binding of Spike-RBM to the human-host receptor hACE2
date: 2021-04-10
journal: bioRxiv
DOI: 10.1101/2021.04.09.439154
sha: 44c65613399ff25f7eaba0cdbdd9394630d0ffd6
doc_id: 687085
cord_uid: s85wef2p

SARS-CoV-2 internalization by human host cells relies on the molecular binding of its spike glycoprotein (SGP) to the angiotensin-converting-enzyme-2 (hACE2) receptor. It remains unknown whether atypical N-glycosylation of SGP modulates SARS-CoV-2 tropism for infections. Here, we address this question through an extensive bioinformatics analysis of publicly available structural and genetic data. We identified two atypical sequons (N-glycosylation sequences NGV 481-483 and NGV 501-503), strategically located on the receptor-binding motif (RBM) of SGP and facing the hACE2 receptor. Interestingly, the cryo-electron microscopy structure of trimeric SGP in complex with potent neutralizing antibodies from convalescent patients revealed covalently linked N-glycans at the atypical NGV 481-483 sequon. Furthermore, the atypical NGV 501-503 sequon involves the asparagine-501 residue, whose highly transmissible mutation N501Y is present in circulating variants of major concern and affects the SGP-hACE2 binding interface through the well-known hotspot 353. These findings suggest that atypical SGP post-translational modifications modulate the SGP-hACE2 binding affinity and consequently affect SARS-CoV-2 transmission and pathogenesis.

The Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2) is the etiological agent of the coronavirus disease 2019 (COVID-19) pandemic, which is producing hundreds of millions of infected individuals and a disproportionate death toll worldwide 1,2.
SARS-CoV-2 is particularly aggressive and lethal for the elderly and patients with comorbidities, while remaining in most cases asymptomatic among children, adolescents and young adults 3, 4. Although an overwhelming research effort is ongoing to understand COVID-19 epidemiology, the molecular mechanisms explaining SARS-CoV-2 differential infections and severities among individuals, especially the etiology of its age-dependent risk profile, are still elusive. Here, through bioinformatics and structural analysis of publicly available data, we uncovered the likely correlation between the presence of N-glycans attached to atypical sequences of N-glycosylation (NGV-sequons) in the spike glycoprotein (SGP), and their potential to modulate SARS-CoV-2 tropism for infections.

The ability of SARS-CoV-2 to infect human host cells relies on the molecular interaction between its surface-exposed SGP and the dimerized receptor angiotensin-converting enzyme-2 (hACE2) [5] [6] [7]. SGP is a homo-trimer of approximately 440 kDa composed of modular protomers 16. Each monomer of 1273 aa comprises a receptor-binding motif (RBM), inserted in a receptor-binding domain (RBD) that specifically recognizes binding hotspots on hACE2, including the well-characterized hotspot 31 and hotspot 353 [17] [18] [19] [20]. The SARS-CoV-2 RBM targets its human receptor by fully exposing a compact loop to the N-terminal helix of hACE2 19, 20. These structural traits of the SARS-CoV-2 RBM contribute to a higher hACE2-binding affinity when compared to the SARS-CoV-1 RBM 19, 20. Both coronavirus SGPs and human hACE2 are glycosylated macromolecules, whose binding affinity and selectivity depend on their molecular structures and on the quantity, quality, and distribution of the oligosaccharide chains (glycans) they expose on their surfaces 21, 22. The N-glycosylation of coronaviral SGPs by infected human host cells is a dynamic post-translational process.
This process is essential for physiological functions and contributes to pathophysiological states 23. In this regard, SGP shielding with human-derived N-glycans plays a crucial role in infectivity, antibody recognition, and immune evasion for pathogenic coronaviruses, as is the case for HIV, Influenza and Lassa virus 8. However, in order to ensure a more efficient protein-protein binding with their human host receptors, coronaviruses exhibit a very limited number of typical glycans in the SGP 8, 9. Considering the well-known N-glycosylation consensus (NXT and NXS, where X ≠ P), the SARS-CoV-2 spike gene encodes 22 canonical N-glycan sequons 9, 10, 12, 24, 25. Thus, its SGP homo-trimer exhibits 66 N-glycosylation sites, allowing a surface shielding of approximately 40% with a content of 28% of oligo-mannose-type glycans 9, 11, 12. Nevertheless, none of these 22 typical N-glycosylation sites is directly positioned on the SGP-RBM 10-13. To our knowledge, the identification and validation of atypical N-glycan motifs are not straightforward. They are experimentally identified in glycoproteins on the basis of asparagine (N) deamidation, after N-glycosidase F (PNGase F) treatment, using mass spectrometry-based glycoproteomics 14. However, this method can lead to false positives, as deamidation can also be induced during sample preparation 14. Recent site-specific profiling studies of glycoproteins from human sera and from the ovarian cancer cell line (OVCAR-3) overcame this technical limitation and identified two atypical glyco-site sequons with the NXC and NXV motifs 14, 15, 26, 27. Consequently, validated evidence of atypical N-glycan occupancy has been reported for major human serum glycoproteins such as von Willebrand Factor, CD69, serotransferrin, factor XI, albumin, and α-1B-glycoprotein 15, 26, 27. Furthermore, these atypical N-glycosylations have a widespread presence in the proteome of human individuals 15, 26, 27.
Nonetheless, in the context of the COVID-19 pandemic, it remains unknown whether these atypical N-glycosylation sequons are also encoded in the genome of the original Wuhan-Hu-1 (the novel coronavirus), and how they could potentially contribute to the virus tropism of the SARS-CoV-2 variants of major concern. In this study we uncovered evidence for the existence of N-glycans attached to the atypical NGV motifs in key structural locations of the surface-exposed spike glycoprotein of SARS-CoV-2. Our findings strongly support the potential N-glycosylation of these atypical sequons by humans in the COVID-19 pandemic. Moreover, we propose that the atypical N-glycosylation of these NGV motifs could modulate the propensity of the novel coronavirus to infect human cells according to the host age, therefore highlighting potential targets for effective therapeutic and prophylactic interventions to control the COVID-19 pandemic.

Current bioinformatics resources have limited ability to predict atypical N-glycosylation sequences. Thus, we manually identified 75 genome-wide putative atypical N-glycosylation sequons (NXV and NXC) in the SARS-CoV-2 reference strain. We observed a particular abundance of these N-glycosylation motifs among protein-encoding genes such as the surface-exposed SGP and several non-exposed proteins (RdRP, nsp2, and nsp3) (Table 1). In total, we found 58 NXV motifs (77.3%) and 17 NXC motifs (22.7%). Six atypical NXV sequons have the functional NGV motif. In one NXC sequon, X corresponds to proline, a residue excluded by the convention for identifying typical N-glycosylation sites because it renders the sequon defective. It is worth noting that 34/58 (58.6%) of the NXV sequons are present in only four proteins: spike (12), nsp3 (11), RdRP (6), and nsp2 (5). Similarly, 12/17 (70.6%) of the NXC motifs were found to be distributed among the same four proteins: RdRP (4), spike (3), nsp2 (3), and nsp3 (2) (Table 1).
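The motif definitions above (canonical N-X-S/T with X ≠ P; atypical N-X-V and N-X-C) can be expressed as a short scan over a protein sequence. The following is a minimal sketch, not the authors' procedure, which was manual. The function names and the toy peptide are hypothetical, and the sketch assumes, as the counts in the text imply, that an atypical sequon with proline at X is still counted and only flagged.

```python
import re
from collections import Counter

# Zero-width lookaheads are used so that overlapping sequons are all reported.
ATYPICAL = re.compile(r"(?=(N[A-Z][VC]))")    # N-X-V or N-X-C, any residue at X
CANONICAL = re.compile(r"(?=(N[^P][ST]))")    # N-X-S or N-X-T, X is not proline


def find_sequons(seq, pattern):
    """Return (1-based start position, tripeptide) for every match in seq."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(seq.upper())]


def summarize_atypical(seq):
    """Count NXV and NXC sequons and flag NGV motifs and proline at X."""
    hits = find_sequons(seq, ATYPICAL)
    kinds = Counter("NXV" if tri[2] == "V" else "NXC" for _, tri in hits)
    return {
        "hits": hits,
        "kinds": kinds,
        "ngv_starts": [pos for pos, tri in hits if tri == "NGV"],
        "proline_at_x": [pos for pos, tri in hits if tri[1] == "P"],
    }


# Toy peptide for illustration only; it is NOT a SARS-CoV-2 sequence.
toy = "MKNGVAANPCTTNASQNVCNGT"

if __name__ == "__main__":
    print(find_sequons(toy, CANONICAL))
    print(summarize_atypical(toy))
```

The toy peptide is illustrative only; the counts reported in Table 1 come from the authors' manual inspection of the reference genome.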
These results show that the spike protein contains the highest number of putative atypical N-glycosylation sequons in the whole SARS-CoV-2 genome. Of the 15 putative atypical N-glycosylation sequons (15.8%) present in the SGP, eleven are located in the subunit S1 and four in the subunit S2. Further, the N-terminal domain (NTD) of SGP has four atypical N-glycan motifs and the RBD bears another four sequons. In line with our expectations, two of the N-glycan sequons identified at the RBD, which exhibit the NGV motif, are fully exposed on its RBM. Moreover, both atypical N-glycan sequons (RBM positions: 481-483 and 501-503) are located in the interaction interface of SGP with the human host receptor hACE2. In addition, the NGV 481-483 is structurally located in the middle of the ridge loop of the RBM, while the NGV 501-503 involves the critical residue N501 (Figure 1). This asparagine, N501, has been shown to interact with the hACE2-binding hotspot 353, and is a key position for major clinically concerning mutations present in several emerging SARS-CoV-2 variants 19, 20.

Analysis of the three-dimensional structures of SGP and RBD available in the Protein Data Bank (PDB) [...] (Figure 2B). This structure highlights the likelihood that these atypical N-glycan sequons are recognized and glycosylated by human metabolism. In contrast, no atypical N-glycans were observed for the NGV 501-503 sequon among the SGP and RBD structures available in the PDB.

Other coronaviruses isolated from diverse species also conserve some of the atypical N-glycosylation sequons found in SARS-CoV-2. We aligned and compared the SARS-CoV-2 SGP-RBM protein sequence with the SGP-RBM of SARS-CoV-1 and other SARS-like coronaviruses isolated from pangolins and bats in China and Cambodia 30, 31. The ridge loop of the Pangolin-CoV-RBM is very similar to that of SARS-CoV-2, also conserving both atypical N-glycosylation NGV sequons we identified in SARS-CoV-2 (Figure 3A).
Nevertheless, the ridge loops of two SARS-like coronaviruses isolated from bats in Cambodia in 2010 comprise the atypical N-glycan sequon NGV 501-503, while the atypical N-glycan sequon NGV 481-483 is defective 31. However, both atypical N-glycan motifs are defective and/or absent in the ridges from SARS-CoV-1 and the RaTG13 Bat-CoV (Figure 3A). In addition, the corresponding RBM positions P479, C480 and C488, which are important for the conformation of the ridge loop, remain 100% conserved in all 48 coronavirus sequences we aligned and compared (Figure 3A).

We hypothesized that SGP-RBM mutations of concern affecting the RBM-hACE2 binding interface either involve the atypical N-glycosylation sequons NGV 481-483 and NGV 501-503, or are located close to them. To explore this hypothesis, we further analyzed a total of 730,744 complete genome sequences of SARS-CoV-2 (GISAID database 32) [...]

N501Y is one of several clinically concerning mutations located at the atypical N-glycosylation sequons. This mutation has been recently reported in the three most common SARS-CoV-2 mutant strains, detected in the United Kingdom (501Y.V1), South Africa (501Y.V2) and Brazil (501Y.V3 or P.1) (Figure 3B). Six other circulating variants have been reported for the Asn501 residue: N501T (1,779x; Italy MB61-Aug), N501S (18x), N501I (14x), N501H (3x), N501K (1x) and N501E (1x). Similarly, eight other genetic changes directly affecting the atypical N-glycan sequon NGV 501-503 have been detected to date, and ten mutations have been identified for the NGV 481-483 (Figure 3B). The variants from South Africa and Brazil have an additional genetic change of major concern in the ridge loop: the E484K mutation, which is located near the atypical N-glycan sequon NGV 481-483 (Figure 3B). The reported genomes including this E484K mutation now number in the thousands (Figure 4).
Moreover, the SARS-CoV-2 variant identified in Italy also presented a non-synonymous change at position Q493, which is important for the interaction between the SGP-RBM and the hACE2-binding hotspot 31 (Figure 3B). These findings show that mutations at atypical N-glycosylation sites are present in each of the new clinically concerning variants. It is noteworthy that many of the new variants of concern harboring mutations at the atypical N-glycosylation sequons spread faster and have higher mortality rates 33, 34. [...] (Figure 4). This would suggest an important functional role of the NGV motif in the original SARS-CoV-2 spike glycoprotein, which is still maintained after genome replication as an adaptive advantage in the new SARS-CoV-2 mutants. Thus, the atypical N-glycan sequon NGV 481-483 seems to be intolerant of those loss-of-function variants, which is in agreement with the mutation-refractory behavior observed for its neighbor positions C480 and C488 (residues defining the compact structure of the RBM ridge loop, Figure 4). In contrast, we observed highly frequent genetic changes at the hACE2-binding interface affecting the atypical NGV 501-503 sequon (Figure 4). N501Y is the oldest (January 18th, 2020 in Spain) and most frequent (63.8%) SARS-CoV-2 mutation detected to date for the SGP-RBM. It has spread across continents and now accounts for more than 165,500 reported SARS-CoV-2 genomes in the GISAID database (as of March 20th, 2021), mainly from the United Kingdom 32, 43. This contrasting finding between both atypical SARS-CoV-2 sequons is in agreement with the well-established knowledge about N-glycans in humans, in which a sequon, although necessary, is not a sufficient criterion for glycosylation 23. Though we identified the atypical N-glycan substrate N501 (Table 1 and Figure 1), we failed to identify its N-glycan occupancy in any of the SGP structures reported in the PDB COVID-19/SARS-CoV-2 resources.
Moreover, another four mutations at position 501 also destroy the atypical N-glycan sequon (Figure 4), with a much less severe impact on the COVID-19 pandemic than the high transmissibility gained by SARS-CoV-2 through the N501Y, N501T and probably N501S mutations, whose hydroxyl groups seem to improve the SGP-RBM binding affinity by stabilizing the salt bridge between D38 and K353 of hACE2 at the binding hotspot 353 (Figure 3B). More worrisome, the N501Y mutation, together with E484K, both affecting the atypical N-glycan sequons, threatens the protective efficacy of current vaccines 44.

Unlike replication, transcription, and translation, N-glycosylation in humans is age-dependent and not driven by a template 48, 49. Protein N-glycoforms depend on several human-host parameters such as Golgi structure, inflammation, metabolism, glucose availability, and the expression of glycosyltransferases, glycosidases, nucleotide-sugar transporters, and nucleotide-sugar synthetic pathway enzymes 23, 50. N-glycosylation is a complex biochemical process that reflects the consequences of lifestyle influences and environmental conditions on individuals with different genetic make-up 51. Moreover, changes in N-glycosylation patterns of plasma proteins have been observed in various aging-associated diseases 50, 51. In this regard, our findings also provide hints on how this novel coronavirus exploits human glycosylation deficiencies, homeostasis disturbance, or complex related diseases, targeting the vulnerabilities of a high-risk group of individuals. Those individuals able to atypically glycosylate SARS-CoV-2 impair the efficient binding of its Spike-RBM to the host receptor hACE2, and those with clear deficiencies in these metabolic pathways are more prone to develop severe symptoms of COVID-19 (Figure 5).
Thus, depending on the atypical N-glycan antennae displayed on the NGV 481-483 sequon, steric hindrance may come into play at the SGP-hACE2 binding interface, providing the novel coronavirus with a tool for tuning its strong hACE2-binding capacity and viral entry. This age-dependent modulation of the SARS-CoV-2 pathogenic potential will also depend on the individual genetic background underlying the biochemical pathway responsible for the atypical N-glycosylation in humans. In this scenario, humans are masking key non-self coronaviral glycoproteins with their own atypical N-glycans, and this is a plausible molecular explanation for the asymptomatic phenomenon we observed in the pandemic.

Finally, beyond the contribution that our exhaustive re-analysis of large collections of SARS-CoV-2 genomes and protein structures makes to our understanding of the biological behavior of the novel coronavirus, it opens a new window into the molecular biology of this pandemic pathogen. How these atypical N-glycan sequons in the SGP-RBM (differentially glycosylated by humans) could mask immunodominant neutralizing epitopes, or how they could affect immunogenicity and lower the effectiveness of the currently available vaccines against SARS-CoV-2, are questions we need to answer in the near future. It is difficult to predict the role that NGV 481-483 will play during the massive immunization of human populations, but it has certainly been a determinant in spreading SARS-CoV-2. Thus, the atypical N-glycosylation sequons reported in this study for the SARS-CoV-2 RBM have to be considered in future vaccine designs and updates. However, these atypical N-glycan findings for the novel coronavirus need to be studied more extensively and thoroughly to ascertain their real impact on the transmission and virulence of SARS-CoV-2.
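Whether a point substitution preserves or destroys a sequon, as discussed above for the changes at N501 and for E484K, can be checked mechanically. The sketch below is illustrative only and is not part of the authors' analysis: the function names are hypothetical, the only inputs are the NGV tripeptides at 481-483 and 501-503 and the substitutions named in the text, and a sequon is treated as purely sequence-defined, which the text itself notes is necessary but not sufficient for glycosylation.

```python
import re

# Substitution notation such as "N501Y": reference residue, position, new residue.
MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def sequon_class(tri):
    """Classify a tripeptide as 'canonical' (N-X-S/T, X != P),
    'atypical' (N-X-V or N-X-C, any X) or 'none'."""
    if len(tri) != 3 or tri[0] != "N":
        return "none"
    if tri[2] in "ST" and tri[1] != "P":
        return "canonical"
    if tri[2] in "VC":
        return "atypical"
    return "none"


def mutate_sequon(start, tri, mutation):
    """Apply a substitution to a tripeptide that begins at residue `start`.
    Substitutions outside the tripeptide leave it unchanged."""
    m = MUTATION.match(mutation)
    if m is None:
        raise ValueError(f"unrecognized mutation: {mutation!r}")
    ref, pos, alt = m.group(1), int(m.group(2)), m.group(3)
    offset = pos - start
    if not 0 <= offset < 3:
        return tri
    if tri[offset] != ref:
        raise ValueError(f"{mutation}: expected {ref}, found {tri[offset]}")
    return tri[:offset] + alt + tri[offset + 1:]


def sequon_change(start, tri, mutation):
    """Return the sequon class before and after the substitution."""
    return sequon_class(tri), sequon_class(mutate_sequon(start, tri, mutation))


if __name__ == "__main__":
    for mut in ("N501Y", "N501T", "N501S", "N501I", "N501H", "N501K", "N501E"):
        print(mut, sequon_change(501, "NGV", mut))
    # E484K lies next to, not inside, the 481-483 sequon.
    print("E484K", sequon_change(481, "NGV", "E484K"))
```

Every substitution of N501 removes the leading asparagine and so abolishes the motif at the sequence level, whereas E484K falls one residue past the 481-483 tripeptide and leaves it intact.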
From the NCBI Virus genome database, we downloaded the available SARS-CoV-2 reference genome sequence (novel coronavirus isolate Wuhan-Hu-1, Accession number: [...] Table S1. From the Global Initiative on Sharing All Influenza Data (GISAID) database (https://www.gisaid.org/) we retrieved a total of 730,744 SARS-CoV-2 whole genome sequences (Supplementary Table S2). We used the following criteria to query the database: [...] genome evolves mainly by mutation rather than recombination, we used single nucleotide polymorphisms (SNPs) as the main type of genetic variation in SARS-CoV-2 SGP, rather than presence/absence patterns of SARS-CoV-2 encoded genes, for analysis of this pathogen. In addition, the SARS-CoV-2 variants of major concern were specifically classified and analyzed according to their appearance, frequency and geographic location. From the NCBI Virus genome database, we downloaded the available SARS-CoV-1, Bat-CoV and Pangolin-CoV reference genome sequences (Supplementary Table 3 [...]

The authors declare that all data supporting the findings of this study are publicly available in accessible databases and within this manuscript and its supplementary files.

Conceptualization and design of the initial study: G.G., J [...]

WHO. WHO Coronavirus Disease (COVID-19): Dashboard
A Pneumonia Outbreak Associated With a New Coronavirus of Probable Bat Origin
Clinical Characteristics of Coronavirus Disease 2019 in China
Why does COVID-19 disproportionately affect older people?
Evaluation and Treatment Coronavirus (COVID-19). StatPearls [Internet]
Receptor Recognition by the Novel Coronavirus from Wuhan: an Analysis Based on Decade-Long Structural Studies of SARS Coronavirus
Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation
Exploitation of glycosylation in enveloped virus pathobiology
Vulnerabilities in coronavirus glycan shields despite extensive glycosylation
Comprehensive characterization of N- and O-glycosylation of SARS-CoV-2 human receptor angiotensin converting enzyme 2
Analysis of the SARS-CoV-2 spike protein glycan shield reveals implications for immune recognition
Site-specific glycan analysis of the SARS-CoV-2 spike
Virus-Receptor Interactions of Glycosylated SARS-CoV-2 Spike and Human ACE2 Receptor
Identification and Validation of Atypical N-Glycosylation Sites
Site-Specific Profiling of Serum Glycoproteins Using N-Linked Glycan and Glycosite Analysis Revealing Atypical N-Glycosylation Sites on Albumin and α-1B-Glycoprotein
Variations in SARS-CoV-2 Spike Protein Cell Epitopes and Glycosylation Profiles During Global Transmission Course of COVID-19
The proximal origin of SARS-CoV-2
Structure of the SARS-CoV-2 spike receptor-binding domain bound to the ACE2 receptor
Structural basis of receptor recognition by SARS-CoV-2
Cell entry mechanisms of SARS-CoV-2
Conformational dynamics of SARS-CoV-2 trimeric spike glycoprotein in complex with receptor ACE2 revealed by cryo-EM
Structure, Function, and Antigenicity of the SARS-CoV-2
Deducing the N- and O-glycosylation profile of the spike protein of novel coronavirus SARS-CoV-2
Computational prediction of N-linked glycosylation incorporating structural properties and patterns
Identification of novel N-glycosylation sites at non-canonical protein consensus motifs
Mapping human N-linked glycoproteins and glycosylation sites using mass spectrometry
The Protein Data Bank
Potent neutralizing antibodies directed to multiple epitopes on SARS-CoV-2 spike
A Novel Bat Coronavirus Closely Related to SARS-CoV-2 Contains Natural Insertions at the S1/S2 Cleavage Site of the Spike Protein
A novel SARS-CoV-2 related coronavirus in bats from Cambodia
Global initiative on sharing all influenza data - from vision to reality
Emergence of a SARS-CoV-2 variant of concern with mutations in spike glycoprotein
Resurgence of COVID-19 in Manaus, Brazil, despite high seroprevalence
Coronavirus disease (COVID-19): Vaccines
mRNA vaccine-elicited antibodies to SARS-CoV-2 and circulating variants
RCSB - Protein Data Bank
Effect of somatic hypermutation on potential N-glycosylation sites in human immunoglobulin heavy chain variable regions
Quantitative glycan profiling of normal human plasma derived immunoglobulin and its fragments Fab and Fc
Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage responsible for the COVID-19 pandemic
Genomic determinants of pathogenicity in SARS-CoV-2 and other human coronaviruses
Bat and pangolin coronavirus spike glycoprotein structures provide insights into SARS-CoV-2 evolution
A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology
COVID-19 Vaccines vs Variants - Determining How Much Immunity Is Enough
Antibody Resistance of SARS-CoV-2 Variants B.1.351 and B.1.1.7
Sensitivity of SARS-CoV-2 B.1.1.7 to mRNA vaccine-elicited antibodies
Comprehensive mapping of mutations in the SARS-CoV-2 receptor-binding domain that affect recognition by polyclonal human plasma antibodies
Serum Proteins During Human Aging
Age-related galactosylation of the N-linked oligosaccharides of human serum IgG
Protein modification and maintenance systems as biomarkers of ageing
Immunoglobulin G glycosylation in aging and diseases
Structure of SARS Coronavirus Spike Receptor-Binding Domain Complexed with Receptor
Molecular Evolutionary Genetics Analysis across Computing Platforms

The authors declare no conflicts of interest and no competing financial interests.

Figures
Table 1