key: cord-0853618-un1yl499
authors: Hiono, Takahiro; Tomioka, Azusa; Kaji, Hiroyuki; Sasaki, Michihito; Orba, Yasuko; Sawa, Hirofumi; Kuno, Atsushi
title: Combinatorial approach with mass spectrometry and lectin microarray dissected glycoproteomic features of virion-derived spike protein of SARS-CoV-2
date: 2021-04-12
journal: bioRxiv
DOI: 10.1101/2021.04.10.439300
sha: 3d204c52a1bbf4a20e7721da0f212472b920d7b6
doc_id: 853618
cord_uid: un1yl499

The COVID-19 pandemic caused by the novel coronavirus, SARS-CoV-2, has a global impact on public health. Since glycosylation of the viral envelope glycoproteins is known to be deeply associated with their immunogenicity, intensive studies on the glycans of its major glycoprotein, S protein, have been conducted. Nevertheless, the detailed site-specific glycan compositions of virion-associated S protein have not yet been clarified. Here, we conducted intensive glycoproteomic analyses of SARS-CoV-2 S protein using a combinatorial approach with two different technologies: mass spectrometry (MS) and lectin microarray. Using our unique MS1-based glycoproteomic technique, Glyco-RIDGE, in addition to MS2-based Byonic search, we identified 1,759 site-specific glycan compositions. The most frequent was HexNAc:Hex:Fuc:NeuAc:NeuGc = 6:6:1:0:0, suggesting a tri-antennary N-glycan terminating with LacNAc and having bisecting GlcNAc and a core fucose, which was found in 20 of 22 glycosylated sites. The subsequent lectin microarray analysis emphasized intensive outer arm fucosylation of glycans, which efficiently complemented the glycoproteomic features. The present results illustrate the high-resolution glycoproteomic features of SARS-CoV-2 S protein and significantly contribute to vaccine design, as well as the understanding of viral protein synthesis.

The COVID-19 pandemic caused by the novel coronavirus, SARS-CoV-2, has a global impact on public health 1.
As a countermeasure against the COVID-19 pandemic, great efforts have been made to establish vaccines targeting the spike (S) protein of SARS-CoV-2 2. The coronavirus S protein is a glycoprotein required for virus attachment to host cells and is mainly targeted by the host immune system. Since glycosylation of the viral envelope glycoproteins is known to be strongly associated with their immunogenicity 3, intensive studies on the glycans of SARS-CoV-2 S protein have been [...]

To date, analyses of the glycoforms of viral proteins have been mainly conducted using mass spectrometry (MS). [...] different taxonomic groups, including human immunodeficiency virus 9,10, influenza virus 11-13, and SARS coronavirus 14. MS-based physical analyses are beneficial in that they provide detailed site-specific glycan compositions. We previously established a liquid chromatography (LC)/MS-based glycoproteomic approach 15,16. In general, analyses of site-specific glycoforms of glycopeptides require tandem MS-based approaches. Tandem MS analyses of glycopeptides often showed [...]

[...] LC/MS, and the data were searched using Mascot to identify the spike protein and confirm peptides with the N-glycosylation potential consensus sequence, N-!P-(S|T) (!P = any amino acid except proline). In tryptic and chymotryptic digests, no consensus sequence-containing peptide was identified by Mascot search even considering semi-digestion, suggesting high glycan-occupancy for all 22 potential sites (Supplementary Tables S2-4 [...] Table S5). In addition, 16 and 11 sites were identified in the tryptic digest and α-Lytic protease digest, respectively (Supplementary Tables S6-7). Therefore, all 22 potential sites were identified as previously glycosylated sites.

Next, using LC/MS2 analysis results of amide-bound glycopeptides of each digest, a Byonic search was carried out to identify glycopeptide forms using an MS/MS spectrum-based approach.
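The consensus sequence N-!P-(S|T) mentioned above can be located with a short regular expression. The following is a minimal sketch, not the search performed by Mascot; the function name `find_sequons` and the `start` offset parameter are illustrative assumptions. The test fragment is residues 1132-1191 of the SARS-CoV-2/WHU-01 S protein as given in the sequence alignment of this document.

```python
import re

# N-glycosylation consensus sequence N-!P-(S|T).
# The lookahead keeps the match at the Asn only, so overlapping sequons
# (e.g. "NNTS") are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(seq: str, start: int = 1) -> list[int]:
    """Return residue numbers of Asn in N-!P-(S|T) sequons.

    `start` is the residue number of the first character of `seq`
    (hypothetical convenience parameter for sequence fragments).
    """
    return [start + m.start() for m in SEQUON.finditer(seq)]


# Residues 1132-1191 of SARS-CoV-2/WHU-01 S protein (from the alignment).
FRAGMENT = "IVNNTVYDPLQPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNEVAK"

if __name__ == "__main__":
    print(find_sequons(FRAGMENT, start=1132))  # [1134, 1158, 1173]
```

The three positions found in this fragment correspond to Asn-1134, Asn-1158, and Asn-1173, which are among the 22 potential sites discussed in the text.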
The identified glycopeptides are listed in Supplementary Table S8. A total of 919 site-specific glycans were identified for all 22 potential sites (Supplementary Table S9). Finally, using the same LC/MS analysis data, Glyco-RIDGE analyses were performed to identify glycopeptide forms (site-specific glycan compositions). All matched results are shown in Supplementary [...] HexNAc:Hex:Fuc:NeuAc:NeuGc = 6:6:1:0:0, which appeared at 20 out of 22 glycosylated sites. The most plausible structure predicted from this composition is a tri-antennary glycan having a bisecting GlcNAc and a fucose, probably a core fucose. Please note that, in the following text, Figures, and Tables, "N:N:N:N:N" means the actual glycan composition of HexNAc:Hex:Fuc:NeuAc:NeuGc. To present glycan compositions in a way that indicates their structure intuitively, we used the number of saccharides added to the trimannosyl core (Man(3)GlcNAc(2)) and showed the numbers in the order of Hex-HexNAc-Fuc-NeuAc-NeuGc, e.g., glycan id 20000 for 2:5:0:0:0, or 22100 for 4:5:1:0:0. The top 21 most frequent compositions contained three high-mannose compositions (N0000) and complex-type compositions, including one or more fucoses and no sialic acid (except one, 32010). The emerging frequency of glycan compositions shows that glycans on the S protein were relatively well processed, i.e., highly branched and fucosylated, moderately sialylated, and with low extension (polyLacNAc). Fig. 1 lists the number and rate of compositions at each site. Except for three C-terminal sites, high-mannose compositions were found at each site. To estimate the abundance of the high-mannose compositions, extracted ion chromatograms of the high-mannose composition (20000, 30000, 40000) and the top three compositions of each site were obtained for the common core peptide (data not shown).
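The conversion from a composition (HexNAc:Hex:Fuc:NeuAc:NeuGc) to the five-digit glycan id described above can be sketched as follows. This is an illustration of the notation only; the function name `glycan_id` is hypothetical, and the sketch assumes every count after subtracting the trimannosyl core fits in a single digit.

```python
# Trimannosyl core Man(3)GlcNAc(2): 3 Hex and 2 HexNAc are subtracted.
CORE_HEX = 3
CORE_HEXNAC = 2


def glycan_id(composition: str) -> str:
    """Convert 'HexNAc:Hex:Fuc:NeuAc:NeuGc' to a Hex-HexNAc-Fuc-NeuAc-NeuGc id.

    Assumes each resulting count is between 0 and 9 (one digit per field).
    """
    hexnac, hex_, fuc, neuac, neugc = (int(x) for x in composition.split(":"))
    counts = (hex_ - CORE_HEX, hexnac - CORE_HEXNAC, fuc, neuac, neugc)
    if any(c < 0 or c > 9 for c in counts):
        raise ValueError(f"composition not representable as a 5-digit id: {composition}")
    return "".join(str(c) for c in counts)


if __name__ == "__main__":
    for comp in ("2:5:0:0:0", "4:5:1:0:0", "6:6:1:0:0"):
        print(comp, "->", glycan_id(comp))
```

Under this reading, the two examples given in the text are reproduced (20000 and 22100), and the most frequent composition, 6:6:1:0:0, maps to 34100.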
According to the results, the content of high-mannose composition was classified into three categories: rare (or not detected), moderate, and abundant (Fig. 1, Supplementary Fig. S1). Eight sites were found to contain abundant high-mannose glycans: Asn-61, 122, 234, 603, 709, 717, 801, and 1074. These sites also contained a relatively higher content of hybrid compositions (#HexNAc=1, over 15%). Conversely, other sites tend to show higher rates of branching, fucosylation, and sialylation.

Glycan modification susceptibility at each site was estimated by counting the compositions containing the focused motif or their rate. For example, when the fucosylation rates of sites X and Y are compared, the ratio of the number of fucosylated compositions to the total number of compositions at each site is compared. On the other hand, since high-mannose type glycans are limited in number, i.e., there are only five compositions for high-mannose glycans (20000-60000), we compared the rate of high-mannose type glycans using both the count of the high-mannose type compositions and the relative intensities of major high-mannose glycans to those of the top three glycopeptides of each site, using extracted ion chromatograms of the same core glycopeptides. For Asn-61 (NVT), peak intensities of 20000, 30000, and 40000 of glycopeptide positions 59-65 were compared with those of 22200, 21100, and 31100 of the same core glycopeptides. Because the intensity of 30000 was significantly high compared to the others, the high-mannose rate of the site was classified as "abundant." Conversely, for Asn-74 (NGT), the intensities of 20000, 30000, 40000, 23300, 13300, and 22300 glycopeptides (core positions 66-79) were compared. Only slight peaks were observed for high-mannose compositions; thus, the rate is "rare (or not detected)."
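The count-based rate described above (number of compositions carrying a motif divided by the total number of compositions at a site) can be sketched as follows. The helper names are hypothetical, and the motif tests are assumptions based on the five-digit id notation of this paper: a non-zero third digit for fucosylation, the N0000 pattern for high-mannose, and a second digit of 1 for hybrid (#HexNAc=1). The input list reuses the six ids compared for Asn-61 purely as illustrative data; it is not the full set of compositions identified at that site.

```python
from typing import Callable, Sequence


def motif_rate(ids: Sequence[str], has_motif: Callable[[str], bool]) -> float:
    """Fraction of glycan ids (Hex-HexNAc-Fuc-NeuAc-NeuGc digits) with a motif."""
    if not ids:
        raise ValueError("no compositions assigned to this site")
    return sum(1 for g in ids if has_motif(g)) / len(ids)


def is_fucosylated(g: str) -> bool:
    return int(g[2]) > 0        # third digit: number of Fuc


def is_high_mannose(g: str) -> bool:
    return g[1:] == "0000"      # N0000: only extra Hex on the core


def is_hybrid(g: str) -> bool:
    return g[1] == "1"          # assumption: #HexNAc beyond the core equals 1


# Illustrative input only: the six ids compared for Asn-61 in the text.
example_ids = ["20000", "30000", "40000", "22200", "21100", "31100"]

if __name__ == "__main__":
    print("fucosylation rate:", motif_rate(example_ids, is_fucosylated))
    print("high-mannose rate:", motif_rate(example_ids, is_high_mannose))
    print("hybrid rate:", motif_rate(example_ids, is_hybrid))
```

Note that this counts distinct compositions, not abundances; the intensity-based classification of high-mannose content (rare, moderate, abundant) relies on extracted ion chromatograms and is not reproduced here.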
By comparison, eight sites were found to be high-mannose abundant: 61, 122, 234, 603, 709, 717, 801, and 1074 (Fig. 1).

Glycan stem distribution

Next, we moved our attention to the glycan modification pathway from high-mannose to highly modified complex-type glycans in Vero/TMPRSS2 cells. Glycan modifications can be divided into two categories: extension or excision of the glycan stem, such as the addition of GlcNAc, Gal, and GalNAc; and end-capping (leaves) with fucose and sialic acid, where polysialylation is not considered. From the identified compositions, it is clear that fucosylation actively occurs in Vero/TMPRSS2 cells; therefore, the formation of a glycan stem will be illustrated on a #Hex-#HexNAc matrix considering the biosynthetic pathway. The N-glycan precursor is attached to the Asn side-chain en bloc as Glc(3)Man(9)GlcNAc(2), so [...]

[...] glycan information with N-glycan suggested the presence of disialyl T-, sialyl T-, and T-antigens indicated by signals in jacalin, ACA, MPA, and MAH. This was confirmed by the enhanced signals of HPA, a Tn-antigen-specific lectin, after β-galactosidase digestion. These O-glycan signals derived from the interaction-based assay were relatively weak compared to those of N-glycan signals, reflecting the difference in density and accessibility between N- and O-glycans against the surrounding molecules. [...] further acquisition of potential glycosylation in the SARS-CoV-2 S protein during evolution should be monitored.

In the present study, we utilized two approaches, MS and lectin microarray (LMA), for glycoproteomic analyses of the SARS-CoV-2 S protein. MS-based analyses reveal site-specific glycans with accurate glycan composition (Supplementary Table S9, 11-12). This approach is especially useful for differentiating complex or high-mannose types or for analyzing the site occupancy of glycans at each site.
In contrast, the glycoforms obtained in the present study and the other study were based [...]

For in-depth analysis, we further assigned the site-specific glycan compositions using two mass spectrometric approaches. One is the MS2-based approach using the Byonic database search engine, and the other is the MS1-based Glyco-RIDGE approach. The results identified by the two MS approaches differ from each other and increase in number when combined (Supplementary Table S14); however, it is difficult to estimate the relative quantity between the identified members, especially when they were identified by different approaches. Similarly, when the compositions were identified from different core peptide sequences, the abundance ratio between the members was unclear. Thus, we compared glycan compositions between sites by counting the compositions containing the focused motif or their rate (Fig. 1).

Our results with MS-based glycoproteomics are in good agreement with the results of Yao et al. for the analysis of intact viral particles and of Watanabe et al. for the recombinant protein 5,8. The major reason for the retention of high-mannose glycans at specific sites is thought to be the low accessibility of glycan-modifying enzymes (mannosidases, GlcNAc transferases, sialyl transferases, etc.) to the glycan. The fact that the rate of hybrid-type glycans (#HexNAc = #GlcNAc = 1) for all compositions is higher at the high-mannose-abundant sites supports this observation. In addition, these sites also tend to have low rates of branching (#HN ≥ 4) and sialylation, e.g., at 122, 234, 603, 709, and 801. This low accessibility appears to be caused by steric hindrance of the lipid bilayer (ER or Golgi membrane) around the site or by burial of the glycan in the protein cleft or the interface between subunits.
In fact, sites 61, 122, and 234 were located in the cleft of the protein of the 3D model of the S protein trimer (Supplementary Fig. S1). In contrast, sites 709, 801, and 1074 were located on the surface of the protein. These sites are located on the bottom surface of the S protein, which is distant from the head domain. On the other hand, the C-terminal Asn-1158, 1173, and 1194 were exclusively decorated with complex-type glycans. Since maturation of glycan and core-protein should proceed simultaneously, the higher frequency of high-mannose glycans at sites 709, 801, and 1074 may reflect unknown intracellular events (e.g., host protein interactions or conformational [...]

It is important to note that, at these sites, the high-mannose type is major but not exclusive, as indicated in soluble recombinant protein-based analyses by Watanabe et al. 5. On the eight sites, we also found many branched, highly fucosylated, or sialylated glycan compositions, as did Yao et al., who analyzed parental Vero cell-derived viral particles 8. The reason for this difference is unclear; however, the sensitivity of the detection (depth/coverage of the glycome of each site) and molecular state (viral particle or over-expressed protein) may influence the distribution. In contrast, at the low high-mannose sites, the rate of HexNAc ≥ 3 (suggesting branching, including addition of bisecting GlcNAc) is relatively high, especially at Asn-1158, 1173, and 1194 (Fig. 1). Similarly, Asn-17, 149, 331, 657, 1158, and 1173 showed high fucosylation rates (> 80%). As described above, although there is a bias in the glycan composition for each site, there is no strict exclusion, and many complex-type glycans have been found at all sites.
This may be partly because S proteins are not in exactly the same three-dimensional structural environment (symmetry) in the trimer of S [...]

It is of interest that the glycoform of SARS-CoV-2 S protein was quite different from the previously reported glycoform of the SARS virus S protein 14. Most of the complex-type N-glycans attached to the SARS virus S protein were represented in the agalactosyl form, suggesting incomplete maturation. In contrast, complex-type N-glycans of the SARS-CoV-2 S protein were terminated with either galactose or sialic acid. The difference in the glycoforms of S proteins of SARS virus and SARS-CoV-2 should arise from either a differential impact on the host glycosylation machinery or a differential route of the S protein synthesis pathway. It should also be noted that a murine coronavirus, mouse hepatitis virus, buds from the intermediate compartment between the endoplasmic reticulum and Golgi complex 29. Virion formation of coronavirus during the early to mid-stage of N-glycan maturation is closely related to the glycoform of coronavirus S proteins. On the other hand, the Golgi localization of mannosyl (alpha-1,3-)-glycoprotein beta-1,2-N-acetylglucosaminyltransferase (MGAT1) 30, which is the initiating enzyme for the synthesis of complex-type N-glycan, contradicts the fact that coronavirus S proteins are decorated with complex-type N-glycans. How the coronavirus S protein encounters MGAT1 is one of the key questions in understanding their post-translational modifications. Perhaps, analyzing the factors associated with the differential glycoforms of SARS and SARS-CoV-2 S proteins may unveil the largely unknown late stage of the coronavirus lifecycle.

The glycosylation of viral proteins is strongly associated with their immunogenicity 3. Our previous study demonstrated that virions propagated in different cell lines showed different glycan profiles 20.
Accordingly, optimizing the glycoform of vaccine antigens by selecting the appropriate host cell line is one of the candidate strategies for developing better vaccines. Our approach, a combination of MS-based and lectin interaction-based glycoproteomics, provided a highly accurate glycan profile of SARS-CoV-2 S protein using a relatively low amount of viral antigen, enabling the exploration of meta-heterogeneity, a higher level of glycan regulation that describes the variation in glycosylation across multiple sites of SARS-CoV-2 S protein. The platform also enables the quick provision of glycoform data of vaccine antigens. In summary, our new concept for glycoproteomic analyses of viral proteins should significantly contribute to establishing effective countermeasures against COVID-19, as well as future viral pandemics.

Funding

The present work was supported in part by the Japan Society for the Promotion of Science (JSPS) KAKENHI Grant Number 18K15176 to TH, and the Japan Agency for Medical Research and Development (AMED) Grant Numbers 16809263 and 20he0522002j0001. The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

The authors declare that there are no conflicts of interest.

The present work does not contain any studies with human participants or animals performed by any of the authors.

Rates of glycan types such as high-mannose type, hybrid type, and branched type are estimated based on the number of glycan compositions assigned to each site. Cells in which the number or rate is larger than the criteria presented in the lowest cells are colored. The rate of HM type is categorized from the extracted ion chromatograms (XIC) as described in the main text. Numbers and rates of terminal modifications such as fucosylation and sialylation are obtained in the same way as for the type analyses.
Cells with values below the criteria are colored.

In the antibody-overlay method, immunoprecipitated viral antigens were detected by overlaying an anti-S specific monoclonal antibody and fluorescent labeled streptavidin. In the direct labeling method, immunoprecipitated viral antigens were directly labeled with fluorophores and S proteins were further purified by second-round immunoprecipitation.

[Lectin microarray figure, lectin labels: -I AOL AAL MAL-I SNA SSA TJA-I PHA-L ECA RCA120 PHA-E DSA GSL-II NPA ConA GNA HHL ACG TxLC-I BPL TJA-II EEL ABA LEL STL UDA PWM Jacalin PNA WFA ACA MPA HPA VVA DBA SBA Calsepa PTL-I MAH WGA GSL-I-]

[Sequence alignment figure, S2 (continued): S protein sequences of SARS-CoV-2/WHU-01 (rows between residues 772 and 1251), hCoV-19/England/MILK-9E05B3/2020, and Rc-o319]

We thank Ms. Misugi Nagai, Ms. Kaori Ohki, and Ms. Masako Sukegawa for their technical help. We thank the National Institute of Infectious Diseases for providing SARS-CoV-2, 2019-nCoV/Japan/TY/WK-521/2020 strain. We would like to thank all researchers who kindly deposited and shared genomic data on GISAID.