key: cord-0873741-l3qygxgi
authors: Zhang, Yong; Zhao, Wanjun; Mao, Yonghong; Chen, Yaohui; Wang, Shisheng; Zhong, Yi; Su, Tao; Gong, Meng; Du, Dan; Lu, Xiaofeng; Cheng, Jingqiu; Yang, Hao
title: Site-specific N-glycosylation Characterization of Recombinant SARS-CoV-2 Spike Proteins
date: 2021-02-11
journal: Mol Cell Proteomics
DOI: 10.1074/mcp.ra120.002295
sha: 2195a72c7f85a7e6e33d81ed555f76c67f5d5d3a
doc_id: 873741
cord_uid: l3qygxgi

The glycoprotein spike (S) on the surface of SARS-CoV-2 is a determinant for viral invasion and host immune response. Herein, we characterized the site-specific N-glycosylation of S protein at the level of intact glycopeptides. All 22 potential N-glycosites were identified in the S-protein protomer and were found to be preserved among the 753 SARS-CoV-2 genome sequences. The glycosites exhibited glycoform heterogeneity, as expected for a human cell-expressed protein subunit. We identified masses that correspond to 157 N-glycans, primarily of the complex type. In contrast, the insect cell-expressed S protein contained 38 N-glycans, completely of the high-mannose type. Our results revealed that the glycan types were largely determined by the differential processing of N-glycans in human and insect cells, regardless of the glycosites' location. Moreover, the N-glycan compositions were conserved among subunits of different sizes. Our study indicates that S protein N-glycosylation occurs regularly at each site, although the occupying N-glycans were diverse and heterogeneous. This N-glycosylation landscape and the differential N-glycan patterns among distinct host cells are expected to shed light on the infection mechanism and present a positive view for the development of vaccines and targeted drugs.

The spread of a novel severe acute respiratory syndrome coronavirus (SARS-CoV-2) has caused a pandemic of coronavirus disease 2019 (COVID-19) worldwide.
In contrast to severe acute respiratory syndrome coronavirus (SARS-CoV) and Middle East respiratory syndrome coronavirus (MERS-CoV), SARS-CoV-2 transmits more rapidly and efficiently from infected individuals, even those without symptoms, to healthy humans, frequently leading to severe or lethal respiratory symptoms (1, 2). The World Health Organization has declared the spread of SARS-CoV-2 a Public Health Emergency of International Concern. As of August 11, 2020, SARS-CoV-2 had led to over twenty million confirmed cases. From SARS-CoV to SARS-CoV-2, the periodic outbreak of highly pathogenic coronavirus infections in humans urgently calls for strong prevention and intervention measures. However, there are no approved vaccines or effective antiviral drugs for either SARS-CoV or SARS-CoV-2.

Human coronaviruses, including HCoV 229E, NL63, OC43, and HKU1, are responsible for 10-30% of all upper respiratory tract infections in adults. SARS-CoV-2 can actively replicate in the throat; however, this virus predominantly infects the lower respiratory tract via the envelope spike (S) protein (2, 3). Due to its high exposure on the viral surface, the S protein can prime a protective humoral and cellular immune response, thus commonly serving as the main target for antibodies, entry inhibitors and vaccines (4-6). Human sera from recovered COVID-19 patients have been found to effectively neutralize pseudovirions overexpressing the SARS-CoV-2 S protein (7). The clinical use of convalescent sera is undergoing comprehensive evaluation. However, passive antibody therapy with convalescent sera would be a stopgap measure and may not provide protective immunity owing to its limited cross-reactivity (7, 8). Therefore, specific neutralizing antibodies and vaccines against SARS-CoV-2 are in rapid development to provide potent and long-lasting immune protection (9-11).
A mature SARS-CoV-2 has four structural proteins: the S protein, envelope (E) protein, membrane (M) protein, and nucleocapsid (N) protein (1). Given its indispensable role in viral entry and infectivity, the S protein is probably the most promising immunogen, especially given the comprehensive understanding of its structure and function provided by recent studies (4, 12-14). The S protein comprises an ectodomain, a transmembrane anchor, and a short C-terminal intracellular tail (15). The ectodomain consists of a receptor-binding S1 subunit and a membrane-fusion S2 subunit. Following attachment to the host cell surface via S1, the S protein is cleaved at multiple sites by host cellular proteases, consequently mediating membrane fusion and making way for the viral genetic material to enter the host cell (1, 4, 6). The S protein can bind to the angiotensin-converting enzyme II (ACE2) receptor on host cells (13, 16). Recognition of the ACE2 receptor by the S protein primarily involves extensive polar residue interactions between the receptor-binding domain (RBD) and the peptidase domain of ACE2 (13, 14). The S protein RBD is located in the S1 subunit and undergoes a hinge-like dynamic movement to capture the receptor through three grouped residue clusters. Consequently, the S protein of SARS-CoV-2 displays a 10-20-fold higher affinity for the human ACE2 receptor than that of SARS-CoV, supporting the higher transmissibility of this new virus (13, 14).

Apart from the structural information at the residue level, the trimeric S protein is highly glycosylated, possessing 22 potential N-linked glycosylation motifs (N-X-S/T, X≠P) in each protomer (17-19). The N-glycans on the S protein play a pivotal role in proper protein folding and protein priming by host proteases.
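The N-X-S/T (X≠P) motif mentioned above can be located in a protein sequence with a short pattern search. The following is a minimal sketch, not the authors' pipeline: the function name `find_sequons`, the `offset` parameter, and the toy sequence are illustrative assumptions, and the real spike sequence is not reproduced here.

```python
import re

# N-X-S/T sequon, where X is any residue except proline.
# A lookahead keeps the match one residue wide, so overlapping
# sequons (for example the two Asn residues in "NNST") are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(sequence, offset=1):
    """Return positions of Asn residues that start an N-X-S/T (X != P) motif.

    `offset` is the residue number of the first character, so a fragment
    that starts at residue 16 can be passed with offset=16.
    """
    return [m.start() + offset for m in SEQUON.finditer(sequence.upper())]


if __name__ == "__main__":
    # Toy fragment for illustration only (hypothetical, not the spike protein).
    toy = "MNATGNPSANNSTK"
    print(find_sequons(toy))  # [2, 10, 11]; the Asn in "NPS" is skipped
```

A sequon only marks a potential glycosite; whether the site is actually occupied has to be established experimentally, which is the purpose of the intact glycopeptide analysis described in this paper.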
Importantly, glycosylation is an underlying mechanism by which coronaviruses evade both the innate and adaptive immune responses of their hosts, as the glycans might shield the amino acid residues of viral epitopes from cell and antibody recognition (4, 5, 20). Cryo-EM has revealed N-glycosylation on 14-16 of the 22 potential sites in the SARS-CoV-2 S protein protomer (4, 13). However, these glycosites and their glycan occupancies need to be experimentally identified in detail. Glycosylation analysis via glycopeptides can provide insight into the N-glycan microheterogeneity of a specific site (21). Therefore, further identification of site-specific N-glycosylation information for the SARS-CoV-2 S protein, including intact N-glycopeptides, glycosites, glycan compositions, and the site-specific number of glycans, could provide a deeper understanding of the mechanism of viral invasion and guide vaccine design and the development of antiviral therapeutics (4, 22). Watanabe et al. (2020) first identified the 22 potential N-glycosites and their linked N-glycans on a human cell-expressed recombinant S protein variant (17).

In this study, we characterized the site-specific N-glycosylation of both human and insect cell-produced SARS-CoV-2 S proteins by comparative analysis of the intact glycopeptides using tandem mass spectrometry (MS/MS). Based on an integrated method (23), we identified 22 potential N-glycosites and their corresponding N-glycans from the recombinant S protein. All of these glycosites were found to be highly conserved among SARS-CoV-2 genome sequences. The glycosite-specific occupancy by different glycoforms was resolved and compared among S protein subunits expressed in human cells and insect cells. These detailed glycosylation profiles decoded from MS/MS analysis are expected to facilitate the development of vaccines and therapeutic drugs against SARS-CoV-2.
Experimental design and statistical rationale-Recombinant SARS-CoV-2 spike proteins or subunits (100 µg) expressed in insect and human cells were digested using trypsin, Glu-C, or a combination of trypsin and Glu-C. Digestion products were enriched by Zic-HILIC and deglycosylated using PNGase F. Finally, the intact N-glycopeptides before enrichment, the intact N-glycopeptides after enrichment, and the deglycosylated peptides were analyzed by stepped collision energy-higher-energy collisional dissociation (SCE-HCD)-MS/MS. Data analysis was performed using the Byonic software (version 3.6.0, Protein Metrics, Inc.) and was verified manually. Three technical replicates were performed. The numbers of intact N-glycopeptides and N-glycans identified from the triplicates were compared between the two groups (before and after enrichment) using Student's t-test. Data are presented as mean ± SD. A P-value < 0.05 was considered significant.

Data Analysis-The raw data files were searched against the SARS-CoV-2 S protein sequence using Byonic software (version 3.6.0, Protein Metrics, Inc.), with the mass tolerance for precursors and fragment ions set at ±10 ppm and ±20 ppm, respectively. Two missed cleavage sites were allowed for trypsin and/or Glu-C digestion. The fixed modification was carbamidomethyl (C), and variable modifications included oxidation (M), acetyl (protein N-term), and deamidation (N). In addition, 38 insect N-glycans or 182 human N-glycans were specified as N-glycan modifications for intact N-glycopeptides before or after enrichment. We then checked the protein database options, including the decoy database. All other parameters were set at the default values, and protein groups were filtered to a 1% false discovery rate based on the number of hits obtained for searches against these databases.
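To make the ±10 ppm precursor and ±20 ppm fragment tolerances concrete, the sketch below shows how a parts-per-million mass error is usually computed and compared with a tolerance. It is a generic illustration and does not reproduce Byonic's internal matching; the function names and the example m/z values are assumptions.

```python
PRECURSOR_TOL_PPM = 10.0  # precursor tolerance stated in the text
FRAGMENT_TOL_PPM = 20.0   # fragment-ion tolerance stated in the text


def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz, theoretical_mz, tol_ppm):
    """True if the observed m/z lies within +/- tol_ppm of the theoretical m/z."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


if __name__ == "__main__":
    # Hypothetical values: a 15 ppm deviation fails the precursor window
    # but would pass the wider fragment-ion window.
    observed, theoretical = 1000.015, 1000.0
    print(within_tolerance(observed, theoretical, PRECURSOR_TOL_PPM))
    print(within_tolerance(observed, theoretical, FRAGMENT_TOL_PPM))
```

Because the tolerance is relative, the absolute window widens with m/z: 10 ppm corresponds to 0.005 at m/z 500 and 0.02 at m/z 2000.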
Stricter quality control methods for intact N-glycopeptide and peptide identification were implemented, requiring a score of no less than 200 and identification of at least six amino acids. Furthermore, all peptide-spectrum matches (PSMs) and glycopeptide-spectrum matches (GPSMs) were examined manually. Quantitative analysis of the intact N-glycopeptides was performed as described previously (24). Quantification information (MS1 peak intensity) for N-glycopeptide spectra was acquired from the "allpeptide.txt" file in MaxQuant (Max Planck Gesellschaft, Munich, Germany) based on their unique MS/MS scan numbers from the Byonic results. N-glycosite conservation analysis was performed using R software packages. Model building based on the Cryo-EM structure (PDB: 6VSB) of SARS-CoV-2 S protein was performed using PyMOL.

revealed that the glycosylated coronavirus S protein plays a critical role in the induction of neutralizing antibodies and protective immunity. However, the glycans on S protein might also surround the protein surface and form an immunologically inert "self" glycan shield for virus evasion from the immune system (5, 20, 25). Herein, S1A ). The missing potential N-glycosites were recovered by introducing the endoproteinase Glu-C (Fig. S1B). Hence, we took advantage of this complementary trypsin and Glu-C digestion approach by using either a single or dual enzyme (Fig. S1C). Meanwhile, the recombinant SARS-CoV-2 S protein S1 subunit expressed in human cells was obtained for analysis of the site-specific N-glycans, as the N-glycan compositions in insect cells would be different from those in native human host cells (26). The S1 subunit contains 681 amino acids (residues 16-685) and 13 potential N-glycosites. Trypsin alone or dual digestion could cover all potential N-glycosites (Fig. S1C).
By doing so, each N-glycosylation sequon of the different recombinant proteins was covered by glycopeptides of a suitable length for achieving good ionization and fragmentation. To raise the abundance of intact glycopeptides, zwitterionic hydrophilic interaction liquid chromatography (Zic-HILIC) materials were used to enrich glycopeptides. In addition, the enrichment of intact N-glycopeptides can reduce signal suppression from unglycosylated peptides. However, no available material can capture all glycopeptides without preference. For these reasons, site-specific glycosylation was determined based on a combined analysis of the intact N-glycopeptides before and after enrichment. Furthermore, the deglycosylated peptides following enrichment were used to confirm or retrieve the N-glycosites by removing potential interference from glycans. In brief, the integration of complementary digestion and N-glycoproteomic analysis at three levels (intact glycopeptides before enrichment, intact glycopeptides after enrichment, and deglycosylated peptides) is a promising approach to comprehensively and confidently profile the site-specific N-glycosylation of recombinant SARS-CoV-2 S proteins (Fig. 1).

contains 22 potential N-glycosites. Using our integrated analysis method, 21 glycosites were assigned unambiguously with high-quality spectral evidence (Fig. 2A and Table S1). One glycosite, N1134, was ambiguously assigned with relatively lower spectral scores (score < 200) (Fig. S2). Nevertheless, the N1134 glycosite has been observed in the Cryo-EM structure of the SARS-CoV-2 S protein (4). The relatively weak spectral evidence for this glycosite suggests that it may be glycosylated at low frequency, because our integrated methods, including glycopeptide enrichment and deglycosylation, failed to improve the spectra.
Apart from the canonical N-glycosylation sequons, three non-canonical N-glycosite motifs (N164, N334, and N536) involving N-X-C sequons were not identified as glycosylated. Before enrichment, an average of 15 N-glycosites from trypsin-digested peptides and 13 N-glycosites from Glu-C-digested peptides were assigned. In contrast, hydrophilic enrichment resulted in a significant increase of these glycosites to 18 and 16, respectively (Table S1). The representative spectra of one intact N-glycopeptide (N149) before and after enrichment are shown in Fig. S3. Complementary digestion with trypsin and Glu-C promoted the confident identification of four N-glycosites (N603, N616, N709 and N717) on two intact N-glycopeptides (Table S1 and Fig. S1C). The introduction of Glu-C digestion resulted in the production of two short intact N-glycopeptides containing 23 and 36 amino acids, respectively. These peptides are more suitable for achieving good ionization and fragmentation than the long peptides of 48 and 57 amino acids obtained from trypsin digestion (Fig. S4). Deglycosylated peptides are suitable for verifying glycosylation sites (Fig. S5). Unexpectedly, analysis of the deglycosylated peptides led to the loss of a few glycosites, presumably because of peptide loss during the deglycosylation procedures. However, almost all glycosites were confidently confirmed using trypsin and Glu-C dual digestion (Table S1). For the recombinant protein S1 subunit expressed in human cells, all 13 N-glycosites were assigned unambiguously (Table S2). Finally, we profiled all 22 potential N-glycosites of S protein (Table S3 and S4). These sites were preferentially distributed in the S1 subunit of the N-terminus and the S2 subunit of the C-terminus, including two sites in the RBD (Fig. 2A and 2B).
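The effect of complementary digestion on peptide length can be previewed in silico. The sketch below is an illustration under simplifying assumptions and is not the procedure used in the study: trypsin is modeled as cleaving C-terminal to K or R (ignoring the proline exception), Glu-C as cleaving C-terminal to E only, and the `digest` helper and toy sequence are hypothetical.

```python
TRYPSIN = "KR"   # assumed rule: cleave after Lys or Arg
GLU_C = "E"      # assumed rule: cleave after Glu only
DUAL = TRYPSIN + GLU_C


def digest(sequence, cleave_after, missed_cleavages=2):
    """Return (start_position, peptide) pairs for an in silico digest.

    Positions are 1-based. Peptides spanning up to `missed_cleavages`
    uncleaved sites are included, mirroring the two missed cleavages
    allowed in the database search.
    """
    if not sequence:
        return []
    # Collect cut points: sequence start, after every cleavable residue, sequence end.
    cuts = [0]
    for i, residue in enumerate(sequence):
        if residue in cleave_after and i + 1 < len(sequence):
            cuts.append(i + 1)
    cuts.append(len(sequence))

    peptides = []
    for i in range(len(cuts) - 1):
        last = min(i + 2 + missed_cleavages, len(cuts))
        for j in range(i + 1, last):
            peptides.append((cuts[i] + 1, sequence[cuts[i]:cuts[j]]))
    return peptides


if __name__ == "__main__":
    # Toy sequence (hypothetical): the sequon "NGT" sits in a long tryptic peptide.
    toy = "AAAAEAAAANGTAAAAEAAAAK"
    for name, rule in (("trypsin", TRYPSIN), ("Glu-C", GLU_C), ("dual", DUAL)):
        fully_cleaved = [p for _, p in digest(toy, rule, missed_cleavages=0)]
        print(name, [len(p) for p in fully_cleaved])
```

In this toy case trypsin alone leaves the sequon in a single 22-residue peptide, whereas adding Glu-C places it in a 12-residue peptide, which is the same qualitative effect reported above for the N603/N616 and N709/N717 glycopeptides.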
To visualize the N-glycosylation on the protein structure, all of the experimentally determined N-glycosites were hand-marked on the surface of the trimeric S protein following refinement of the recently reported SARS-CoV-2 S protein Cryo-EM structure (PDB: 6VSB) ( (Table S5) identified in SARS-CoV-2 S in this study were not detected in SARS-CoV S (N1140 and N1155) in previous studies (5, 28). Our results suggest that the preferential change of the glycosylation landscape of the S1 subunit tends to change the distribution of the glycan shield, especially in the N-terminal half of S1 (Fig. S6).

N-glycopeptide analysis can provide N-glycoproteomic information, including the composition and number of N-glycans decorating a specific N-glycopeptide or N-glycosite. The potential N-glycopeptides in the S protein sequence are shown in Fig. S1. Hundreds of non-redundant intact N-glycopeptides were identified from the recombinant S ectodomain (Table S3) and S1 subunit (Table S4). Representative high-quality spectra of intact N-glycopeptides are shown in Fig. S7. Following glycopeptide enrichment, the numbers of intact N-glycopeptides and N-glycans significantly increased (P<0.05) (Fig. 3A and Fig. 3B). Furthermore, the intact N-glycopeptides were quantified, and the top five N-glycan compositions on each site are shown in Fig. S8, Fig. S9 and Table S9. Although not all identified intact N-glycopeptides can be quantified, the highly N-glycosylated sites and their attached N-glycans can be highlighted. The S protein expressed in insect cells had smaller and fewer complex N-glycans attached to intact N-glycopeptides than the S1 subunit produced in human cells. Both recombinant products contained the common N-acetylglucosamine (HexNAc) as a canonical N-glycan characteristic (Fig. 3C and 3D). The S protein expressed in insect cells was decorated with 38 N-glycans, with the majority preferentially containing oligomannose (Hex) and fucose (Fuc) (Fig.
3C and Table S3). By contrast, the S1 subunit expressed in human cells was decorated with up to 157 N-glycans, mainly containing extra N-acetylglucosamine (HexNAc) and galactose (Hex), variably terminating with sialic acid (NeuAc) (Fig. 3D and Table S4). Returning to the glycosite level, most of the N-glycosites in the S protein were modified with 17-35 types of N-glycans, which were classified into a high proportion of high-mannose N-glycans (~65%) and a lower proportion (~23%) of hybrid N-glycans. Almost all N-glycosites contained no more than 10% complex N-glycans (Fig. 3E). For the S1 subunit expressed in human cells, the occupancy of N-glycans on each N-glycosite was quite nonuniform. Surprisingly, six N-glycosites (N122, N165, N282, N331, N343, and N657) were decorated with markedly heterogeneous N-glycans of up to 139 types. Averaged over all glycosites, the occupancy consisted of an overwhelming proportion (~75%) of complex N-glycans and a small proportion of hybrid (~13%) or high-mannose (~12%) N-glycans (Fig. 3F). The glycan occupancy on the two N-glycosites (N331 and N343) of the RBD was identified (Fig. 3E and 3F). The high occupancy of RBD glycosites by various N-glycan compositions implies that N-glycosylation might be associated with the recognition of the ACE2 receptor by the RBD, since the interaction between RBD and ACE2 mainly depends on polar residue interactions (14). Our results suggest that S proteins expressed in different cells display distinct N-glycosylation patterns. In particular, the glycosylation of the S protein in human cells exhibits remarkable heterogeneity on N-glycosites. However, the N-glycan types on each glycosite are primarily determined by the host cells rather than the location of the glycosites (Fig. 3E-3F, Fig. S10).
To confirm site-specific N-glycan occupancy and explore the potential impact of protein size on N-glycosylation, recombinant RBDs (residues 319-541) from both human and insect cells were further analyzed (Table S6). The representative glycan compositions and deduced structures are shown on each site (Fig. 4A). Intriguingly, the number of glycan compositions and their types on each glycosite (Fig. 4B and 4C) are very close to those found in the S ectodomain and S1 subunit (Fig. 3E and Fig. 3F). The human cell-produced RBDs displayed more N-glycan compositions and complex glycan types than the insect cell-expressed proteins (Fig. 4B and 4C). Moreover, more than 80% of glycan compositions are identical at each site among insect cell-expressed proteins of different lengths (Fig. 4D). Similarly, over 75% of the glycan compositions at each site were found to be shared by human cell-produced products (Fig. 4E). The N-glycosylation of RBDs was verified by SDS-PAGE (Fig. S11). These results suggest that the N-glycan compositions are conserved among RBD proteins of different sizes. Thus, our data reveal a regular N-glycan occupancy on S protein, despite the heterogeneity in N-glycan compositions. Intriguingly, the N-glycan types on S protein subunits are predominantly determined by host cells, regardless of the location of glycosites.

The global outbreak and rapid spread of COVID-19 caused by SARS-CoV-2 urgently call for specific prevention and intervention measures (29). The development of preventative vaccines and neutralizing antibodies remains a chief goal in the efforts to control viral spread and stockpile candidates for future use. However, this work relies heavily on understanding the antigen structure and state of glycosylation for the rational determination of accessible epitopes. The S protein is posited to be the main or even the only antigen on viral surfaces for priming the immune system to produce an effective response (20, 25).
Previous studies have revealed the structural information of the SARS-CoV-2 S protein and its coverage by N-glycans (4, 13, 17). In this study, we profiled and compared the site-specific N-glycosylation of the recombinant SARS-CoV-2 S protein expressed in insect and human cells. Glycosylation promotes proper glycoprotein folding; however, the glycans obstruct receptor binding and proteolytic processing during antigen presentation (20, 22, 30). Characterization of the landscape of N-glycans on the SARS-CoV-2 S protein is crucial for promoting immunogen design and preventing potential viral evasion of intervention measures (5, 20, 31).

Precise characterization of intact N-glycopeptides can reveal the occupancy of each glycosite by different glycoforms (32, 33). In this study, all 22 N-glycosites of the S protein were identified (Fig. 2A and 2B). By comparison, the alteration of N-glycosites between the SARS-CoV-2 and SARS-CoV S proteins is concentrated on glycosites in the S1 subunit (Fig. S6). The N-glycosites located in the S2 subunit are completely conserved, and seven out of nine sites, along with their glycans, have been disclosed by previous studies (4, 5). Moreover, we found that N-glycosites in the S protein were highly preserved among 145 SARS-CoV-2 S protein variants (Table S5), which is advantageous for circumventing potential viral immune evasion from the vaccines and neutralizing antibodies currently being developed.

Glycosylation of proteins is intricately processed by various enzymes coordinated in the endoplasmic reticulum and Golgi apparatus. The glycan composition and structure at specific sites arise in a non-templated manner governed by host cells, thus frequently resulting in a heterogeneous glycan occupancy on each glycosite (34).
Glycosylation processing in insect and human cells can yield a common intermediate N-glycan, which is further elongated in human cells but is only trimmed in insect cells to form end products (26). Consequently, the S protein expressed in human cells displayed a larger size and a much higher proportion of complex N-glycans than that expressed in insect cells, owing to the additional elongation of the glycan backbone with multiple oligosaccharides (Fig. 3C-3F, Fig. S8-9, Table S7). (17, 18). In contrast, S protein expression in insect cells led to a high ratio of high-mannose N-glycans (Fig. 3C and 3E, Fig. S8-9, Table S7), which has also been found in the insect cell-produced HCoV-NL63 S protein, despite expression in a different insect cell, the Drosophila S2 cell (20). In accordance with our study, Watanabe et al. (2020) also identified 22 N-glycosites and 110 N-glycans in human cell-produced S protein variants. However, eight sites (61, 122, 234, 603, 709, 717, 801, and 1074) out of 22 were mainly modified by oligomannose-type N-glycans, whereas the other sites were occupied by complex-type N-glycans. In this study, we found that the human cell-expressed native S1 subunit using human cell-expressed S1 and S2 subunits. These glycosites were mainly modified by both high-mannose- and complex-type N-glycans. Moreover, two O-glycosites (323 and 325) were identified and predominantly modified by Core-1 and Core-2 O-glycans (18). A detailed comparison between these studies on S protein N-glycosylation is shown in Table S8. Intriguingly, during the review of our manuscript, the native glycans were identified by analyzing a virus sample using mass spectrometry. Almost all 22 glycosites were modified with complex-type N-glycans, which is highly consistent with the findings in our study (36).
Furthermore, in another study, we have revealed that the SARS-CoV-2 S protein is also a mucin-type glycoprotein (37). Despite the predominance of highly abundant complex-type N-glycans, a low ratio (~12%) of high-mannose glycans exists ubiquitously on the human cell-expressed S protein (Fig. 3F and Fig. S9). The HIV envelope glycoprotein gp120 is heavily decorated with immature intermediate, high-mannose glycans. The high-density glycans surrounding HIV glycoproteins limit the accessibility of glycan biosynthetic processing enzymes, terminating the synthesis of more complex end products (22, 31). By contrast, the high ratio of complex N-glycans in the SARS-CoV-2 S protein indicates that the glycans were successfully processed on most glycosites by the enzymes, without extensive obstruction by the ongoing synthesis of glycan shields (Fig. 3F and Fig. 4C). Therefore, we posit that the glycan coverage on the SARS-CoV-2 S protein could leave relatively accessible antigens and epitopes, although the complex N-glycans might mask some surface immunogens. These features may provide a promising landscape on the SARS-CoV-2 S protein for immune recognition. This potential is bolstered by the finding that convalescent sera from COVID-19 patients contain antibodies against the SARS-CoV-2 S protein (7, 10). A previous study on SARS-CoV has revealed that the oligomannose on the S protein can be recognized by mannose-binding lectin (MBL), which may interfere with viral entry into host cells by inhibiting S protein function (38). Besides its direct neutralization effect, MBL, as a serum complement protein, can initiate the complement cascade. Complement hyperactivation in lung tissues of COVID-19 patients has been reported in a recent preprint study (39). However, it remains unknown whether MBL can bind to oligomannose on the SARS-CoV-2 S protein to impact viral spread or initiate complement activation in patients.
Glycans also play crucial and multifaceted roles in B cell and T cell differentiation via cell-surface or secreted proteins, including selectins, galectins, and siglecs, which can further connect SARS-CoV-2 to immune response and immune regulation (34). In particular, the complex N-glycans are ligands for galectins, which are able to engage different glycoproteins to regulate immune cell infiltration and activation upon virus infection (40, 41). These mechanisms underlying SARS-CoV-2 infection and spread are worth further clarification based on a detailed analysis of clinical characteristics in humoral and cellular immunity.

The remarkable heterogeneity of N-glycosylation in the S protein subunit expressed in human cells was revealed in our study (Fig. 3F and Fig. S9) and in other studies (17, 18). By contrast, the N-glycosylation of the S protein subunit in insect cells showed less heterogeneity and complexity than that of human cell-derived proteins (Fig. 3E). Moreover, the site-specific glycan occupancy tended to be identical in the same host cell, regardless of protein length (Fig. 4D and 4E). These results indicate that the N-glycan compositions and types on the S protein are largely attributable to the host cells and their differential glycosylation processing pathways. We can expect that the native N-glycosylation profile of the SARS-CoV-2 S protein in humans tends to be consistent with that of the recombinant protein expressed in human cells, unless the virus buds off early in the glycosylation processing pathway and produces immature glycans (26, 34, 42). Intriguingly, immature N-glycans such as high mannose are regarded as non-self-like glycans when occurring in dense patches (42, 43). Therefore, insect cell-expressed recombinant antigens predominantly decorated with high mannose may be more immunogenic than those produced in human cells (44, 45).
To prime a robust humoral immune reaction upon vaccination against SARS-CoV-2, insect cell-produced antigens could be promising candidates for vaccine development (46). Apart from amino acid epitopes, glycopeptides can be presented by the major histocompatibility complex (MHC) and recognized by a CD4+ T-cell population to help B cells produce antibodies against glycans. Glycoconjugates have been used to boost the immune response against infections (30, 31). Insect cell-produced glycopeptides used as antigens might prime an immune response against glycans in cases where immature N-glycans occur on the native envelope proteins of SARS-CoV-2, which seems to have emerged during SARS-CoV replication (42). In contrast, human cell-expressed S protein subunits used as vaccines mimic the "self" glycans in humans, which could impair the immune response to the antigens (47). However, the remaining accessible and non-glycosylated regions of the S protein can serve as antigens for vaccines produced in human cells (11, 48).

The rational design of antigens to prime potent and broad immune responses against accessible epitopes on the SARS-CoV-2 S protein is essential and promising. The RBD-containing subunit is an ideal immunogen, since antibodies against the receptor-binding motif within the RBD could directly block the engagement of the S protein with the receptor and inhibit viral infection of host cells. Vaccination with SARS-CoV-2 RBD has been demonstrated to induce protective immunity against SARS-CoV-2 (46). Meanwhile, subunit vaccines are posited to minimize the potentially undesired immunopotentiation of the full-length S protein, which might induce severe acute injury in the lungs (49). Intriguingly, SARS-CoV-2 is missing one N-glycosite in the RBD compared to SARS-CoV. The remaining two N-glycosites are outside of the motifs essential for direct interaction with the ACE2 receptor (14) (Fig. 2C).
The glycan compositions of the RBD are highly similar in the same host cell, regardless of the length of the RBD-containing proteins (Fig. 4B-4E). These features of the RBD, along with its highly exposed structure, provide more antigens and accessible epitopes for vaccine design and immune recognition. The RBD-containing proteins, especially the insect cell-expressed products, could become promising candidates for SARS-CoV-2 vaccine development. However, drug discovery related to glycosylation inhibition should be performed based on human cell-expressed products.

In this study, we decoded the site-specific profile of N-glycosylation on SARS-CoV-2 S proteins expressed in insect and human cells, revealing a regular N-glycan site occupancy on S protein, despite the heterogeneity of N-glycan compositions on each site. All glycosites were conserved among the 753 public SARS-CoV-2 genome sequences. In conclusion, our data indicate that differential N-glycan occupancies among distinct host cells might help elucidate the infection mechanism and develop an effective vaccine and targeted drugs. Nevertheless, the implication of S protein site-specific N-glycosylation in immunogenicity, receptor binding, and viral infectivity should be investigated further.

Table S1 and Table S2.
Virological assessment of hospitalized patients with COVID-2019
Coronavirus Infections-More Than Just the Common Cold
(2020) Structure, Function, and Antigenicity of the SARS-CoV-2 Spike Glycoprotein
Unexpected Receptor Functional Mimicry Elucidates Activation of Coronavirus Fusion
Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor
Characterization of spike glycoprotein of SARS-CoV-2 on virus entry and its immune cross-reactivity with SARS-CoV
The convalescent sera option for containing COVID-19
Potent Neutralizing Antibodies against SARS-CoV-2 Identified by High-Throughput Single-Cell Sequencing of Convalescent Patients' B Cells
(2020) Safety, tolerability, and immunogenicity of a recombinant adenovirus type-5 vectored COVID-19 vaccine: a dose-escalation, open-label, non-randomised
Characterization of the receptor-binding domain (RBD) of 2019 novel coronavirus: implication for development of RBD protein as a viral attachment inhibitor and vaccine
Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation
Structural basis for the recognition of SARS-CoV-2 by full-length human ACE2
Structure, Function, and Evolution of Coronavirus Spike Proteins
Angiotensin-converting enzyme 2 is a functional receptor for the SARS coronavirus
Site-specific glycan analysis of the SARS-CoV-2 spike
Deducing the N- and O-glycosylation profile of the spike protein of novel coronavirus SARS-CoV-2
Comprehensive characterization of N- and O-glycosylation of SARS-CoV-2 human receptor angiotensin converting enzyme 2
Glycan shield and epitope masking of a coronavirus spike protein observed by cryo-electron microscopy
Human plasma protein N-glycosylation
Why Glycosylation Matters in Building a Better Flu Vaccine
Comparative Glycoproteomic Profiling of Human Body Fluid between Healthy Controls and Patients with Papillary Thyroid Carcinoma
Characterization of N-linked intact glycopeptide signatures of plasma IgGs from patients with prostate carcinoma and benign prostatic hyperplasia for diagnosis pre-stratification
The spike protein of SARS-CoV--a target for vaccine and therapeutic development
Baculovirus as versatile vectors for protein expression in insect and mammalian cells
A new coronavirus associated with human respiratory disease in China
Cryo-EM structures of MERS-CoV and SARS-CoV spike glycoproteins reveal the dynamic receptor binding domains
A precision medicine approach to managing 2019 novel coronavirus pneumonia
Adaptive immune activation: glycosylation does matter
Protein and Glycan Mimicry in HIV Vaccine Design
Capturing site-specific heterogeneity with large-scale N-glycoproteome analysis
Site-Specific Profiling of Serum Glycoproteins Using N-Linked Glycan and Glycosite Analysis Revealing Atypical N-Glycosylation Sites on Albumin and alpha-1B-Glycoprotein
Glycosylation in health and disease
Identification of N-linked carbohydrates from severe acute respiratory syndrome (SARS) spike glycoprotein
Mucin-type O-glycosylation Landscapes of SARS-CoV-2 Spike Proteins. bioRxiv
A single asparagine-linked glycosylation site of the severe acute respiratory syndrome coronavirus spike glycoprotein facilitates inhibition by mannose-binding lectin through multiple mechanisms
The Sweet-Side of Leukocytes: Galectins as Master Regulators of Neutrophil Function. Front Immunol
The role of galectins in virus infection - A systemic literature review
Exploitation of glycosylation in enveloped virus pathobiology
Glycan clustering stabilizes the mannose patch of HIV-1 and preserves vulnerability to broadly neutralizing antibodies
Altered Glycosylation Patterns Increase Immunogenicity of a Subunit Hepatitis C Virus Vaccine, Inducing Neutralizing Antibodies Which Confer Protection in Mice
Antigenicity and Immunogenicity of Differentially Glycosylated Hepatitis C Virus E2 Envelope Proteins Expressed in Mammalian and Insect Cells
Host protein glycosylation in nucleic acid vaccines as a potential hurdle in vaccine design for nonviral pathogens
Anti-spike IgG causes severe acute lung injury by skewing macrophage responses during acute SARS-CoV infection

Chengdu Science and Technology Department Foundation (grant number 2020-YF05-00240-SN), and the Science and Technology Department of
directed and designed research; Y.Z. and W.Z. directed and performed analyses of mass spectrometry data
coordinated acquisition, distribution and quality evaluation of samples