key: cord-0904948-cvoev4hu
authors: Roberts, David S.; Mann, Morgan W.; Melby, Jake A.; Larson, Eli J.; Zhu, Yanlong; Brasier, Allan R.; Jin, Song; Ge, Ying
title: Structural O-Glycoform Heterogeneity of the SARS-CoV-2 Spike Protein Receptor-Binding Domain Revealed by Native Top-Down Mass Spectrometry
date: 2021-03-01
journal: bioRxiv
DOI: 10.1101/2021.02.28.433291
sha: fef2f4f3e748a91ca4a5de88c22ebe68140cd7cc
doc_id: 904948
cord_uid: cvoev4hu

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) utilizes an extensively glycosylated surface spike (S) protein to mediate host cell entry, and S protein glycosylation is strongly implicated in altering viral binding/function and infectivity. However, the structures and relative abundance of the new O-glycans found on the S protein receptor-binding domain (S-RBD) remain cryptic because of the challenges in intact glycoform analysis. Here, we report the complete structural characterization of intact O-glycan proteoforms using native top-down mass spectrometry (MS). By combining trapped ion mobility spectrometry (TIMS), which can separate the protein conformers of S-RBD and analyze their gas-phase structural variants, with ultrahigh-resolution Fourier transform ion cyclotron resonance (FTICR) MS analysis, the O-glycoforms of the S-RBD are comprehensively characterized, and seven O-glycoforms and their relative molecular abundance are structurally elucidated for the first time. These findings demonstrate that native top-down MS can provide a high-resolution proteoform-resolved mapping of diverse O-glycoforms of the S glycoprotein, which lays a strong molecular foundation to uncover the functional roles of their O-glycans. This proteoform-resolved approach can be applied to reveal the structural O-glycoform heterogeneity of emergent SARS-CoV-2 S-RBD variants, as well as other O-glycoproteins in general.
The novel zoonotic 2019 coronavirus disease (COVID-19) global pandemic 1 has led to more than 102 million reported cases and over 2 million deaths as of February 2021. 2 The causative pathogen of COVID-19, severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2), utilizes an extensively glycosylated spike (S) protein that protrudes from the viral surface to bind receptor angiotensin-converting enzyme 2 (ACE2) for cell entry. 3, 4 The evolution of the surface S protein is particularly important because the protein initiates pathogenesis 5 and is the main target of vaccination 6, 7 and antibody therapeutic design, 8, 9 and because a substantial number of predicted mutations in the S protein receptor-binding domain (RBD) can potentially enhance ACE2 binding. 10 Moreover, the more than 636,000 viral genomic sequences generated from global molecular epidemiology studies (as of February 28, 2021, www.gisaid.org) 11 reveal the emergence of a number of variants in the S protein, indicating that SARS-CoV-2 may be evolving to acquire selective replication advantages, drug insensitivity, and immunological resistance, thereby underscoring the need for new technologies capable of rapid and deep structural profiling of the virus. 12-14

Importantly, glycans flank the polybasic furin cleavage site of the S protein necessary for S stability during biogenesis and influence conformational dynamics by masking protein regions from cleavage during ACE2 binding. 3, 15, 16 How these different sites are glycosylated and how they influence ACE2 binding are likely to affect cell infectivity and could shield certain epitopes from antibody neutralization. 17-20 The SARS-CoV-2 S protein carries 22 N-glycosylation sequons per protomer, which have been characterized in detail. 21, 22 However, characterization of O-glycosylation has so far been limited, with only putative O-glycosites reported, 16, 22, 23 and the glycan microheterogeneity, including molecular compositions, structures, and relative abundance, remains cryptic due to the challenges in O-glycan analysis. 5, 17, 24-29 Importantly, individual glycosites can give rise to many glycan structural variants, i.e., glycoforms, and these differences in glycan structures, even at a single glycosite, can have critical implications for biological functions. 20, 30-32 It is crucial to decipher the intact glycoforms of the various O-glycosites recently found in the S-RBD because they are expected to play a role in viral function and viral binding with ACE2. 17, 22, 23, 33 Therefore, new methods capable of the comprehensive structural elucidation of O-glycans are essential for understanding the functional roles of O-glycans in SARS-CoV-2 pathology and for providing new avenues for rational therapeutic development. 28, 34-36

Conventional methods to analyze protein glycosylation rely on the bottom-up MS approach of analyzing glycopeptides obtained through enzymatic digestion. 31, 37, 38 Although the bottom-up MS approach is a high-throughput method for global proteome identification of glycosites and N-glycan heterogeneity from complex samples, the relative abundance of various intact glycoforms and the overall post-translational modification (PTM) compositions of different co-appearing 'proteoforms' 39 are lost. 40-42 In contrast, by combining glycoproteomics with the top-down MS approach, which preserves the intact glycoprotein and enables high-resolution proteoform-resolved analysis, 43-45 we could achieve the simultaneous characterization of the molecular structures, the site specificity, and the relative abundance of various glycoforms.
Furthermore, by integrating native MS, which has recently emerged as a powerful structural biology tool to study protein structure-function relationships, 46-52 with trapped ion mobility spectrometry (TIMS), 53-55 we can investigate the gas-phase structural variants to achieve the direct quantification of individual glycoproteoforms. Here we developed a method for the molecularly detailed analysis of intact O-glycan proteoforms of the S-RBD by top-down MS (Figure 1). For the first time, we harness the capabilities of a hybrid TIMS quadrupole time-of-flight (QTOF) mass spectrometer (timsTOF Pro) (Figure 1C), which provides high resolving power and sensitivity for both selective and comprehensive ion mobility separations of various protein structural variants, 56-59 and the ultrahigh resolution of a 12T Fourier transform ion cyclotron resonance (FTICR) MS (Figure 1D), to comprehensively characterize the O-glycoforms of the S-RBD, including the exact glycan structures and relative molecular abundance (Figure 1E). We demonstrate that this native top-down MS approach can provide a high-resolution and holistic proteoform-resolved landscape of diverse O-glycoforms to enable future structure-function studies of the S-RBD.

Native TIMS-MS analysis of the S-Protein RBD. We first prepared a native S-RBD protein expressed from HEK 293 cells to perform native top-down MS for comprehensive glycoform analysis (Figure 1). After ensuring reproducible native S-RBD protein sample preparation (Figure S1), we used the TIMS, the front-end of the timsTOF Pro (see Methods for details), to achieve high-resolution ion mobility separation of protein conformers. TIMS, similar to other quantitative ion mobility techniques, reports the rotationally averaged collision cross section (CCS) of proteins, which relates to their overall size and shape and consequently can be used to evaluate changes in three-dimensional structure. 57

The S protein is known to carry many complex-type N-glycans, with two glycosites (Asn331 and Asn343) previously reported and extensively characterized on the S-RBD. 21, 22 By contrast, the O-glycans of the S-RBD are less studied and their glycoforms poorly understood, despite the potential insights they may provide into the viral structure-function relationship. 17, 21-23 Since our focus is on O-glycans of the S-RBD, we completely removed the N-glycans from the S-RBD using a native PNGase F treatment (see Methods for details) to minimize the interference posed by the enormous N-glycan heterogeneity (Figure 3A). The removal of N-glycans constitutes >10 kDa of molecular weight loss compared to the fully glycosylated S-RBD, as expected from the multiple proteoforms detected by TIMS analysis (Figures 2A,C). Fortunately, we found that after complete removal of N-glycans, the remaining O-glycans on the S-RBD were fully resolvable by native MS (Figure 3B). Although the remaining PNGase F was found coexisting with the S-RBD, further optimization of the deglycosylation treatment

We next used TIMS to separate and analyze the various S-RBD glycoforms after the N-glycan removal by taking full advantage of the TIMS front-end of the timsTOF Pro instrument. After careful TIMS tuning and optimization of the ion mobility parameters (Figure S2), the timsTOF was sufficiently sensitive to allow high-resolution ion mobility separation of intact S-RBD O-glycoforms (Figure 4A, Figure S3), in contrast to the poorly resolved native S-RBD (Figure 2). The TIMS analysis revealed two distinct protein gas-phase conformers separated into mobility regions between 1.1 and 1.45 1/K0. By interrogating each of the resolved mobility regions, MS1 analysis revealed that the two gas-phase conformers show shifts in the relative abundance of the S-RBD charge states (Figure 4B).
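The Methods define the quantities that enter the CCS calculation (reduced mass, temperature, ionic charge, buffer gas density, and reduced mobility). As a minimal sketch of how a measured 1/K0 value and a charge state translate into a CCS, the snippet below assumes the standard Mason-Schamp form of that relation. The function name, the nitrogen buffer gas, the 305 K temperature, and the example ion (26 kDa, 12+, 1/K0 = 1.2 V·s/cm²) are illustrative assumptions, not values taken from this work.

```python
import math

# Physical constants in SI units
K_B = 1.380649e-23            # Boltzmann constant, J/K
E_CHARGE = 1.602176634e-19    # elementary charge, C
DALTON = 1.66053906660e-27    # kg per Da
N0 = 2.6867811e25             # gas number density at 273.15 K and 1 atm, m^-3


def mason_schamp_ccs(inv_k0, charge, ion_mass_da,
                     gas_mass_da=28.0134, temp_k=305.0):
    """Return the CCS in square angstroms.

    inv_k0      : inverse reduced mobility 1/K0 in V*s/cm^2
    charge      : integer charge state z
    ion_mass_da : ion mass in Da
    gas_mass_da : buffer gas mass in Da (assumed N2)
    temp_k      : drift region temperature in K (assumed value)
    """
    k0 = (1.0 / inv_k0) * 1e-4  # cm^2/(V*s) -> m^2/(V*s)
    # Reduced mass of the ion-gas pair, converted to kg
    mu = ion_mass_da * gas_mass_da / (ion_mass_da + gas_mass_da) * DALTON
    ccs_m2 = (3.0 / 16.0) * math.sqrt(2.0 * math.pi / (mu * K_B * temp_k)) \
        * charge * E_CHARGE / (N0 * k0)
    return ccs_m2 * 1e20  # m^2 -> square angstroms


# Hypothetical ion, chosen only to show the order of magnitude
ccs = mason_schamp_ccs(inv_k0=1.2, charge=12, ion_mass_da=26000.0)
print(f"CCS = {ccs:.0f} square angstroms")
```

Because the CCS scales linearly with both z and 1/K0 in this relation, a change in charge state distribution between mobility regions shifts the computed CCS even when 1/K0 changes only modestly.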
The shift in protein charging, together with the shift between mobility regions, implies possible structural changes that can influence the protein ionization between the mobility regions. This difference in protein conformers was further investigated by plotting the corresponding ion mobility spectrum of the various regions against their calculated CCS values (Figure 4C). Applying this method, we further found two distinct S-RBD O-glycoform mobility regions in mobility Region 2 (Figure 4C, lower panel). The smallest mobility region of the S-RBD O-glycoforms (Region 2a, 2511±5 Å²) is slightly higher than the theoretical value (~2200 Å²) calculated using the IMPACT method 60 from the S-RBD X-ray crystal structure without glycosylation. 61 This discrepancy is likely a result of the influence of the additional O-glycans. 62 CCS measurements of the various ion mobility regions of the S-RBD revealed progressively larger structures between the highlighted mobility regions and further illustrate the separation of the distinct gas-phase conformers (Figure 4D). Region 1 (2999±26 Å²) shows the largest calculated CCS value compared to Region 2b (2705±25 Å²) and Region 2a (2511±5 Å²), implying that the S-RBD glycoforms in Region 1 experience additional protein unfolding, which increases the overall protein conformer size and signal abundance under electrospray ionization. 63 The variations in S-RBD O-glycoform CCS values are mobility region specific and consistent with protein conformer changes, illustrating the potential of this approach for investigating the gas-phase structural variations of glycoproteins.

Although S-RBD glycoforms can be inferred from the timsTOF analysis, we cannot assign the glycan structures or occupancy from only the results presented in Figures 3 and 4 due to the mass degeneracy and microheterogeneity of O-glycans. 40, 62 To achieve in-depth glycoform analysis, we further utilized an ultrahigh-resolution 12T FTICR capable of baseline and isotopically resolving the S-RBD O-glycoforms for MS/MS analysis (see Methods for details, Figure S4). Following further MS1 analysis, individual glycoforms were directly visualized with high resolution (Figure 5A,B). Following specific quadrupole isolation centered at appropriate m/z widths to capture individual S-RBD proteoforms (Figure 5C), the resolved intact glycoforms can be characterized by MS/MS using a combination of fragmentation strategies to achieve confident protein sequence assignments (Figure 5D).

The most abundant charge state (15+) is isolated. Theoretical isotope distributions (red circles) are overlaid on the experimentally obtained mass spectrum to illustrate the high mass accuracy. All individual ion assignments are within 1 ppm from the theoretical mass. (C) Illustration of the isolation of a specific S-RBD glycoform using ultrahigh-resolution FTICR to enable glycan site and structure characterization from intact glycoforms. (D) MS/MS characterization of the S-RBD proteoform isolated from the quadrupole window centered at 1760.5 m/z. The assignments of glycan structures are marked in the spectrum with the legends shown on the right side. Glycoform characterization reveals the specific S-RBD proteoform to have a Core 2 type GalNAcGal(GalNeuAc)(GlcNAcGalFuc) glycan. Neutral loss glycan products are labeled. N-terminal acetylation (Ac) is labeled and corresponds to a +42 Da mass shift. The solid star represents the 15+ charge state precursor ion corresponding to 1760.5 m/z, and the hollow star represents the 15+ charge state precursor ion corresponding to 1757.7 m/z (-Ac). The asterisk "*" denotes an oxonium ion loss.

By applying this method to all detectable S-RBD proteoforms, a suite of MS/MS spectra was generated for sequence analysis (Figure S5).
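The relationship between the +42 Da N-terminal acetylation and the two 15+ precursor positions quoted above (1760.5 and 1757.7 m/z), as well as the meaning of a "within 1 ppm" assignment, can be checked with simple arithmetic. The following is a minimal sketch; the helper names are hypothetical, and the acetyl mass is the monoisotopic mass of C2H2O.

```python
ACETYL_MASS = 42.010565  # monoisotopic mass of an acetyl group (C2H2O), Da


def mz_shift(delta_mass_da, charge):
    """m/z shift produced by a mass change on an ion of a given charge."""
    return delta_mass_da / charge


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


# Expected spacing between acetylated and non-acetylated 15+ precursors
expected_shift = mz_shift(ACETYL_MASS, 15)

# Spacing between the two precursor positions quoted in the text
reported_shift = 1760.5 - 1757.7

print(f"expected {expected_shift:.2f} m/z, reported {reported_shift:.2f} m/z")
```

The same `mz_shift` helper applies to any modification: at high charge states the spacing between proteoforms shrinks, which is why isotopic resolution matters for separating closely related glycoforms.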
MS/MS data were output from the Bruker DataAnalysis software and analyzed in targeted protein analysis mode using MASH Explorer. 64 This approach allowed for the direct characterization of the glycan structures and their microheterogeneity to reveal multiple S-RBD glycoforms with Core 1 and Core 2 O-glycan structures (Figure 5, and full details in Figures S6-11). This high-resolution method enables the structural characterization of glycoforms that have not been previously observed. As shown for one such example in Figure 5D (highlighted in the red box on the right), neutral loss mapping with the corresponding proteoform sequence mapping confirms the glycan structure assignment with high mass accuracy (Figure S6). Additionally, the GalNAcGal(GalNeuAc)(GlcNAcGalFuc) related proteoform without N-terminal acetylation (15+ most abundant charge state, centered at 1757.7 m/z) was also characterized (Figure S6). Therefore, the high-resolution glycoform mapping presented in Figure 5 illustrates a unique strength of this native top-down MS approach to achieve isotopically resolved and high-accuracy characterization of highly heterogeneous glycoproteins, a major challenge in intact glycoprotein analysis. 40, 65

We also identified and characterized Core 1 (Galβ1-3GalNAc-Ser/Thr) O-glycan structures such as GalNAcGalNeuAc (15+ most abundant charge state, centered at 1723.8 m/z) (Figure S7) and GalNAcGal(NeuAc)2 (15+ most abundant charge state, centered at 1743.2 m/z) (Figure S8). Interestingly, these two Core 1 O-glycans were previously reported for the S-RBD as potential modifications; however, the previous studies were not able to resolve the exact glycoforms due to the challenges arising from inferring intact glycoprotein structures from peptide digests and the low signal abundance of O-glycans under conventional MS analysis. 22, 23, 37 In contrast, our native top-down MS strategy enables us to unambiguously reveal the exact glycoform corresponding to each of these Core 1 O-glycans. Two additional Core 2 glycan structures were identified: GalNAcGal(GalNeuAc)(GlcNAcGal) (15+ most abundant charge state, centered at 1748.1 m/z, with an additional N-terminal acetylated proteoform found at 1750.5 m/z) (Figures S9,10) and GalNAcGal(GalNeuAc)(GlcNAcGalNeuAc) (centered at 1767.6 m/z) (Figure S11).

This native top-down MS approach enables the proteoform-resolved identification and characterization of multiple S-RBD O-glycoforms, including a novel fucose-containing glycoform with the Core 2 GalNAcGal(GalNeuAc)(GlcNAcGalFuc) glycan structure (Figure 6A). Figure 6A shows all seven identified O-glycoforms of the S-RBD, which are also listed in the table in Figure 6B. As a unique advantage of the top-down MS approach, the molecular abundance of each intact protein glycoform can be relatively quantified. We found that the relative abundance of Core 1 to Core 2 S-RBD O-glycan structures was roughly 70:30, with the Core 1 GalNAcGal(NeuAc)2 being the most abundant O-glycoform (~67% relative abundance) (Figure 6B). This ability to unambiguously elucidate the structure of a specific S-RBD glycoform with high accuracy and quantify its relative abundance demonstrates the distinct advantages of this native top-down MS strategy over glycopeptide-based bottom-up MS approaches. 47, 62 Together, this proteoform-resolved intact glycoprotein analysis strategy enables the simultaneous characterization of the O-glycan structures of the S-RBD and their microheterogeneity, including structure and relative molecular abundance (Figure 6C). Importantly, the accurate determination of the relative abundance of intact glycoforms provides the technical foundation to understand the functional significance of these distinct glycoforms of the S protein in the future.
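The m/z positions quoted above for the 15+ glycoforms differ by amounts that can be compared against the masses of single monosaccharide residues. The sketch below performs that consistency check under stated assumptions: the residue masses are standard monoisotopic values, the quoted positions are quadrupole isolation centers rounded to 0.1 m/z (so a tolerance of 0.2 m/z is assumed), and the pairing of positions with residues is an illustrative reading of the text rather than the assignment procedure used in this work.

```python
# Monoisotopic residue masses (Da) of common monosaccharides
RESIDUE_MASS = {
    "Hex": 162.052824,     # e.g. Gal
    "HexNAc": 203.079373,  # e.g. GalNAc, GlcNAc
    "NeuAc": 291.095417,   # sialic acid
    "Fuc": 146.057909,     # fucose
}


def residue_mz_spacing(residue, charge):
    """Expected m/z spacing when one residue is added at a given charge."""
    return RESIDUE_MASS[residue] / charge


# (residue added, lower m/z, higher m/z) taken from positions quoted in the text
reported_pairs = [
    ("NeuAc", 1723.8, 1743.2),  # GalNAcGalNeuAc -> GalNAcGal(NeuAc)2
    ("NeuAc", 1748.1, 1767.6),  # ...(GlcNAcGal) -> ...(GlcNAcGalNeuAc)
    ("Fuc", 1748.1, 1757.7),    # ...(GlcNAcGal) -> ...(GlcNAcGalFuc), both -Ac
]


def check_pairs(pairs, charge=15, tol=0.2):
    """Compare observed spacings with the expected single-residue spacing."""
    results = []
    for residue, low, high in pairs:
        observed = high - low
        expected = residue_mz_spacing(residue, charge)
        results.append((residue, round(observed, 2), round(expected, 2),
                        abs(observed - expected) <= tol))
    return results


for row in check_pairs(reported_pairs):
    print(row)
```

A check of this kind only shows that the spacings are compatible with a residue difference; isomeric residues share the same mass, so the structural assignments themselves rest on the MS/MS neutral loss and sequence mapping described in the text.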
Although these identified O-glycoforms are specific to S-RBD expressed from HEK293 systems, the HEK293 cell expression system is known to reflect the glycosylation sites expected for the virion 21-23, 27 and is currently the antigen of choice for vaccine candidates and virus functional studies. 66

Samples were directly infused using the nanoElute, injecting 5 μL of native protein sample with a flow rate of 1 μL/min. For the MS inlet, the endplate offset and capillary voltage were set to 500 V and 3800 V, respectively. The nebulizer gas pressure (N2) was set to 1.5 bar, with a dry gas flow rate of 6 L/min at 180 °C. The tunnel out, tunnel in, and TOF vacuum pressures were set to 0.8584 mBar, 2.577 mBar, and 1.752E-07 mBar, respectively. To calibrate the MS and trapped ion mobility spectrometry (TIMS) device, Agilent tune mix was directly infused to provide species of known mass and reduced mobility. 37

Data Analysis. All data were processed and analyzed using Compass DataAnalysis 4.3/5.3 and MASH Explorer. 64 The Maximum Entropy algorithm (Bruker Daltonics) was used to deconvolute all mass spectra with the resolution set to 80,000 for the timsTOF Pro or with the instrument peak width set to 0.05 for the 12T FTICR. The Sophisticated Numerical Annotation Procedure (SNAP) peak-picking algorithm (quality factor: 0.4; signal-to-noise ratio (S/N): 3.0; intensity threshold: 500) was applied to determine the monoisotopic mass of all detected ions. The relative abundance for each protein isoform was determined using DataAnalysis. To quantify protein modifications, the relative abundances of specific modifications were calculated as their corresponding percentages among all the detected protein forms in the deconvoluted averaged mass. MS/MS data were output from the DataAnalysis software and analyzed using MASH Explorer 64 for proteoform identification and sequence mapping. All the program-processed data were manually validated.
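The relative quantification described above expresses each proteoform as a percentage of all detected forms in the deconvoluted spectrum. A minimal sketch of that calculation follows; the function names and the intensity values are hypothetical placeholders and do not correspond to measured data from this work.

```python
def relative_abundance(intensities):
    """Convert deconvoluted intensities into percentages of the total signal.

    intensities: dict mapping a proteoform label to its summed intensity.
    """
    total = sum(intensities.values())
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return {name: 100.0 * value / total for name, value in intensities.items()}


def summed_share(percentages, names):
    """Combined percentage of a group of proteoforms (e.g. one core type)."""
    return sum(percentages[name] for name in names)


# Hypothetical intensities for three placeholder proteoforms
example = {"form_a": 600.0, "form_b": 300.0, "form_c": 100.0}
percent = relative_abundance(example)
group_share = summed_share(percent, ["form_a", "form_c"])
print(percent, group_share)
```

Grouping with `summed_share` mirrors how a ratio between two glycan core types can be reported from per-glycoform percentages.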
For peak picking, the Sophisticated Numerical Annotation Procedure (SNAP) algorithm from Bruker DataAnalysis 5.3 was used with a quality threshold of 0.5 and an S/N lower threshold of 3. All fragment ions were manually validated using MASH Explorer. Peak extraction was performed using a signal-to-noise ratio of 3 and a minimum fit of 60%, and all peaks were subjected to manual validation. All identifications were made with satisfactory numbers of assigned fragments (>10), and a 25-ppm mass tolerance was used to match the experimental fragment ions to the calculated fragment ions based on amino acid sequence.

CCS values were calculated from the measured reduced mobility using the Mason-Schamp equation (Equation 1):

$$\mathrm{CCS} = \frac{3}{16}\sqrt{\frac{2\pi}{\mu k_b T}}\,\frac{ze}{N_0 K_0} \qquad (1)$$

where $\mu$ is the reduced mass of the ion-gas pair ($\mu = \frac{mM}{m+M}$, where $m$ and $M$ are the ion and gas particle masses), $k_b$ is Boltzmann's constant, $T$ is the drift region temperature, $z$ is the ionic charge, $e$ is the charge of an electron, $N_0$ is the buffer gas density, and $K_0$ is the reduced mobility. Equation 1 was selected to agree with previously published CCS calculations. 57, 58, 68 Theoretical CCS values were calculated using the IMPACT method. 60

The Supporting Information is available free of charge on the ACS Publications website.

A new coronavirus associated with human respiratory disease in China
World Health Organization. Coronavirus Disease (COVID-19) Weekly Epidemiological Update for
Structure, Function, and Antigenicity of the SARS-CoV-2 Spike Glycoprotein
Structural basis for the recognition of SARS-CoV-2 by full-length human ACE2
Research and Development on Therapeutic Agents and Vaccines for COVID-19 and Related Human Coronavirus Diseases
Safety and immunogenicity of the ChAdOx1 nCoV-19 vaccine against SARS-CoV-2: a preliminary report of a phase 1/2, single-blind, randomised controlled trial
An mRNA Vaccine against SARS-CoV-2 - Preliminary Report
SARS-CoV-2 Neutralizing Antibody LY-CoV555 in Outpatients with Covid-19
Studies in humanized mice and convalescent humans yield a SARS-CoV-2 antibody cocktail
Deep Mutational Scanning of SARS-CoV-2 Receptor Binding Domain Reveals Constraints on Folding and ACE2 Binding
Global initiative on sharing all influenza data - from vision to reality
Rapid implementation of SARS-CoV-2 sequencing to investigate cases of health-care associated COVID-19: a prospective genomic surveillance study. The Lancet Infectious Diseases
Circulating SARS-CoV-2 spike N439K variants maintain fitness while evading antibody-mediated immunity
Tracking Changes in SARS-CoV-2 Spike: Evidence that D614G Increases Infectivity of the COVID-19 Virus
Unexpected Receptor Functional Mimicry Elucidates Activation of Coronavirus Fusion
The proximal origin of SARS-CoV-2
Beyond Shielding: The Roles of Glycans in the SARS-CoV-2 Spike Protein
Glycan Shield and Fusion Activation of a Deltacoronavirus Spike Glycoprotein Fine-Tuned for Enteric Infections
Glycan shield and epitope masking of a coronavirus spike protein observed by cryo-electron microscopy
Analysis of the SARS-CoV-2 spike protein glycan shield reveals implications for immune recognition
Site-specific glycan analysis of the SARS-CoV-2 spike
Deducing the N- and O-glycosylation profile of the spike protein of novel coronavirus SARS-CoV-2
N- and O-Glycosylation of the SARS-CoV-2 Spike Protein
Mucin-type O-glycosylation - putting the pieces together
Analysis of Mammalian O-Glycopeptides - We Have Made a Good Start, but There is a Long Way to Go
Global aspects of viral glycosylation
Biological roles of glycans
Molecular Basis of SARS-CoV-2 Infection and Rational Design of Potential Antiviral Agents: Modeling and Simulation Approaches
N-glycan microheterogeneity regulates interactions of plasma proteins
Capturing site-specific heterogeneity with large-scale N-glycoproteome analysis
Integrated Omics and Computational Glycobiology Reveal Structural Basis for Influenza A Virus Glycan Microheterogeneity and Host Interactions
The Answer Lies in the Energy: How Simple Atomistic Molecular Dynamics Simulations May Hold the Key to Epitope Prediction on the Fully Glycosylated SARS-CoV-2 Spike Protein
Unravelling the Role of O-glycans in Influenza A Virus Infection
Sialic Acids on Varicella-Zoster Virus Glycoprotein B Are Required for Cell-Cell Fusion
Hepatitis C Virus Envelope Glycoprotein E2 Glycans Modulate Entry, CD81 Binding, and Neutralization
A Pragmatic Guide to Enrichment Strategies for Mass Spectrometry-Based Glycoproteomics
Current Methods for the Characterization of O-Glycans
Proteoform: a single term describing protein complexity
Enhancing Accuracy in Molecular Weight Determination of Highly Heterogeneously Glycosylated Proteins by Native Tandem Mass Spectrometry
Direct Monitoring of Protein O-GlcNAcylation by High-Resolution Native Mass Spectrometry
Hybrid mass spectrometry approaches in glycoprotein analysis and their usage in scoring biosimilarity
Nanoproteomics enables proteoform-resolved analysis of low-abundance proteins in human serum
Top-down Proteomics: Challenges, Innovations, and Applications in Basic and Clinical Research
Top-Down Proteomics: Ready for Prime Time?
Comparative Structural Analysis of 20S Proteasome Ortholog Protein Complexes by Native Mass Spectrometry
Higher-order structural characterisation of native proteins and complexes by top-down mass spectrometry
An integrated native mass spectrometry and top-down proteomics method that connects sequence to structure and function of macromolecular complexes
Native mass spectrometry combined with enzymatic dissection unravels glycoform heterogeneity of biopharmaceuticals
Collision induced unfolding and dissociation differentiates ATP-competitive from allosteric protein tyrosine kinase inhibitors
Native Mass Spectrometry: What is in the Name?
Determining the stoichiometry and interactions of macromolecular assemblies from mass spectrometry
Structural Analysis of the Glycoprotein Complex Avidin by Tandem-Trapped Ion Mobility Spectrometry-Mass Spectrometry (Tandem-TIMS/MS)
Accurate Identification of Isomeric Glycans by Trapped Ion Mobility Spectrometry-Electronic Excitation Dissociation Tandem Mass Spectrometry
Fundamentals of Trapped Ion Mobility Spectrometry
Trapped Ion Mobility Spectrometry of Native Macromolecular Assemblies
Ion Mobility Spectrometry: Fundamental Concepts, Instrumentation, Applications, and the Road Ahead
On the Preservation of Noncovalent Peptide Assemblies in a Tandem-Trapped Ion Mobility Spectrometer-Mass Spectrometer (TIMS-TIMS-MS)
Towards the analysis of high molecular weight proteins and protein complexes using TIMS-MS
Collision Cross Sections for Structural Proteomics
Structure of the SARS-CoV-2 spike receptor-binding domain bound to the ACE2 receptor
Correlating Glycoforms of DC-SIGN with Stability Using a Combination of Enzymatic Digestion and Ion Mobility Mass Spectrometry
Signal Response of Coexisting Protein Conformers in Electrospray Mass Spectrometry
MASH Explorer: A Universal Software Environment for Top-Down Proteomics
Enhancing Top-Down Proteomics Data Analysis by Combining Deconvolution Results through a Machine Learning Strategy
A vaccine targeting the RBD of the S protein of SARS-CoV-2 induces protective immunity
SARS-CoV-2 induces robust germinal center CD4 T follicular helper cell responses in rhesus macaques
Theory of plasma chromatography/gaseous electrophoresis

This research is supported by NIH R01 GM117058 (to S.J. and Y.G.). Y.G. would like to acknowledge NIH R01 GM125085, R01 HL096971, and S10 OD018475. We would like to thank Guillaume Tremintin, Yue Ju, Conor Mullins, Michael Greig, Gary Kruppa, Paul Speir, and Rohan Thakur of Bruker Daltonics for their helpful discussion and provision of the Bruker timsTOF Pro used in this work.
The authors declare no competing financial interest.