key: cord-0692460-4d4fewsa
authors: Espinosa, Luis Ariel; Ramos, Yassel; Andújar, Ivan; Torres, Enso Onill; Cabrera, Gleysin; Martín, Alejandro; González, Diamilé; Chinea, Glay; Becquet, Mónica; González, Isabel; Canaán-Haden, Camila; Nelson, Elías; Rojas, Gertrudis; Pérez-Massón, Beatriz; Pérez-Martínez, Dayana; Boggiano, Tamy; Palacio, Julio; Lozada-Chang, Sum Lai; Hernández, Lourdes; de la Luz Hernández, Kathya Rashida; Markku, Saloheimo; Marika, Vitikainen; Valdés-Balbín, Yury; Santana-Medero, Darielys; Rivera, Daniel G.; Vérez-Bencomo, Vicente; Emalfarb, Mark; Tchelet, Ronen; Guillén, Gerardo; Limonta, Miladys; Pimentel, Eulogio; Ayala, Marta; Besada, Vladimir; González, Luis Javier
title: In-solution buffer-free digestion for the analysis of SARS-CoV-2 RBD proteins allows a full sequence coverage and detection of post-translational modifications in a single ESI-MS spectrum
date: 2021-05-10
journal: bioRxiv
DOI: 10.1101/2021.05.10.443404
sha: bec0fe6113eff9fe06caeee38f5764eebb4d6e3d
doc_id: 692460
cord_uid: 4d4fewsa

Subunit vaccines based on the receptor-binding domain (RBD) of the spike protein of SARS-CoV-2 are among the most promising strategies to fight the COVID-19 pandemic. Detailed characterization of the protein primary structure by mass spectrometry (MS) is mandatory, as described in the ICH Q6B guidelines. In this work, several recombinant RBD proteins produced in five expression systems were characterized using a non-conventional protocol known as in-solution buffer-free digestion (BFD). In a single ESI-MS spectrum, BFD yielded very high sequence coverage (≥ 99 %) and detected highly hydrophilic regions, including very short hydrophilic peptides (2-8 amino acids) and the His6-tagged C-terminal peptide carrying several post-translational modifications at Cys538, such as cysteinylation, glutathionylation and cyanylation, among others.
The conventional digestion protocol yielded lower sequence coverage (80-90 %) and did not detect peptides carrying some of the above-mentioned post-translational modifications. The two C-terminal peptides of a dimer [RBD(319-541)-(His)6]2, linked by an intermolecular disulfide bond (Cys538-Cys538) and carrying twelve histidine residues, were detected only by BFD. This protocol allows the detection of the four disulfide bonds present in the native RBD and, if present, of low-abundance scrambling variants, free cysteine residues, O-glycoforms and incomplete processing of the N-terminal end. Artifacts that might be generated by the in-solution BFD protocol were also characterized. BFD can be easily implemented, and we foresee that it can also be helpful for the characterization of mutated RBDs.

The new coronavirus SARS-CoV-2, first reported in Wuhan, China (1), has rapidly spread all over the world, infecting more than 159 million people and causing more than 3.3 million deaths by May 2021 (2). This pandemic currently represents a major threat to health, the economy and society as a whole. The development of effective vaccines, as well as universal access for their massive introduction, is urgently needed to control the COVID-19 pandemic worldwide (3). Several vaccine platforms are currently being evaluated according to the draft landscape published by the World Health Organization (4), including inactivated and live attenuated viruses, non-replicating and replicating viral vectors, DNA and mRNA vaccines, virus-like particles, and protein subunit vaccines. Some of them have already been approved by the WHO and regulatory authorities and introduced in the clinic with favorable results (5). SARS-CoV-2 uses the receptor-binding domain (RBD) of the spike (S) protein for entry into host cells through a high-affinity interaction with its cell-surface receptor, angiotensin-converting enzyme 2 (ACE2) (6, 7).
The RBD has been proposed for the rational development of protective vaccines against SARS-CoV-2 (8, 9), and subunit vaccines are now well represented among the candidates investigated in preclinical studies and clinical trials, according to a recent WHO report (4). For a successful introduction of vaccines into the clinic, the immunogens need to be produced at scale and at prices affordable for all, including middle- and low-income countries (3). This is probably one of the reasons why the RBD of SARS-CoV-2, besides being produced in mammalian cells (10), has also been produced in several other systems, such as insect cells (11), yeast (12, 13), C1 Thermothelomyces heterothallica (formerly Myceliophthora thermophila) (14, 15), the baculovirus-silkworm system (16) and bacteria (17), despite the challenge of expressing a non-globular protein with four disulfide bonds that requires N-glycosylation for its proper expression and folding (12). The ICH Q6B guidelines (18), harmonized among the most important regulatory agencies worldwide, state the key role of mass spectrometry (MS) in developing well-characterized products in the biotechnology industry. MS is the analytical tool of choice for verifying the amino acid sequence, demonstrating the integrity of the N- and C-terminal ends, and detecting post-translational modifications (PTMs) in natural and recombinant proteins. PTMs may modify the physicochemical and immunological properties of proteins. In particular, a disulfide bond arrangement identical to that of the native protein is mandatory for biotherapeutics, as well as for vaccines in which the antigen must be well folded to raise neutralizing antibodies against conformational and topological epitopes (19, 20). Sample processing prior to MS analysis also plays a decisive role in the quality of the results.
In the characterization of recombinant proteins, efficient proteolytic digestion and recovery of the proteolytic peptides are essential to obtain the highest sequence coverage, including the identification of PTMs. In particular, if electrospray ionization mass spectrometry (ESI-MS) is used, a desalting step is needed for the proteolytic peptides to ionize properly. This step, although necessary, often compromises the recovery of highly hydrophilic and hydrophobic peptides when reverse-phase micro-columns, either commercial or prepared in-house, are used. were not verified by mass spectrometry because the reduction and S-alkylation steps included in the sample processing, probably introduced to make tryptic digestion more efficient, impaired their detection (12). The detection of undesired free cysteine residues in the molecule, even as low-abundance species, is also of great importance because they may promote disulfide exchange and generate scrambling variants (22). In our laboratory, we initially demonstrated that proteins separated by SDS-PAGE can be efficiently desalted in-gel and digested with trypsin in water, in the absence of traditional salt buffers (23). This procedure avoids a desalting step for the proteolytic peptides and allows their direct analysis by ESI-MS (23). Although this non-conventional procedure departs from the conventional canons of biochemistry, it avoids the loss of hydrophobic tryptic peptides and yields higher sequence coverage than the traditional in-gel digestion protocol. Using this in-gel procedure, we achieved full-sequence coverage of ATPase subunit 9, the most hydrophobic protein of the S. cerevisiae proteome (23).
In the biotechnology industry, in-solution protein digestion is used more frequently than in-gel digestion for the quality control of recombinant proteins, owing to its greater efficiency and simplicity and because sample amount is generally not a concern. Recently, the principles of the in-gel buffer-free digestion protocol (23) were extended to in-solution buffer-free digestion (BFD) of other proteins (24). The in-solution BFD protocol improved the sequence coverage of protein regions represented by short hydrophilic peptides, including some N-glycopeptides, short peptides linked by disulfide bonds, and hydrophilic C-terminal peptides of proteins that contain a tandem repeat of six histidine residues (24). Such a tandem repeat of six histidine residues is frequently introduced at the N- or C-terminal end of recombinant proteins to facilitate their purification by immobilized metal affinity chromatography. Although efficient, this strategy turns the N- or C-terminal peptides into more hydrophilic species and makes their recovery from the desalting step prior to ESI-MS analysis more difficult. The ICH Q6B guidelines request verification of the integrity of the N- and C-terminal ends because these ends are prone to proteolytic processing by host-cell proteases and may also carry PTMs (18). Considering that subunit vaccines based on the recombinant RBD are very well represented among the strategies for developing vaccines against COVID-19, in this work we adapted the in-solution BFD protocol (24) to the analysis of the products of six gene constructs containing the RBD sequence of SARS-CoV-2, obtained in five different expression systems: mammalian cells (CHO and HEK-293T), yeast (P. pastoris), bacteria (E. coli) and fungus (Thermothelomyces heterothallica).
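The claim that a His6 tag makes the terminal peptide more hydrophilic can be made concrete with a rough hydropathy estimate. The sketch below computes the grand average of hydropathy (GRAVY, Kyte-Doolittle scale) for the C-terminal peptide of RBD(319-541)-HEK_A3 (CVNF-AAAHHHHHH, as given in the Results) with and without its tag. GRAVY is used here only as an assumed, coarse proxy for retention on a C18 micro-column; it is not part of the authors' method, and the function name is hypothetical.

```python
# Kyte-Doolittle hydropathy values per residue (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(peptide: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value over all residues."""
    if not peptide:
        raise ValueError("empty peptide")
    return sum(KYTE_DOOLITTLE[aa] for aa in peptide.upper()) / len(peptide)


if __name__ == "__main__":
    # Untagged C-terminus, His6-tagged, and His6-tagged with the three-Ala spacer.
    for pep in ("CVNF", "CVNFHHHHHH", "CVNFAAAHHHHHH"):
        print(f"{pep:14s} GRAVY = {gravy(pep):+.2f}")
```

The untagged tetrapeptide scores clearly positive, while both tagged versions score negative. This only indicates the direction of the effect described above; it does not by itself predict whether a peptide is lost on a ZipTip.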
Unlike the standard protocol, which uses salt buffers and desalting through reverse-phase micro-columns, the implemented BFD method avoids buffers and performs desalting by precipitation, allowing very high sequence coverage (≥ 99 %) of these RBDs and the detection of PTMs, including those located at the N- and C-terminal ends. In a single mass spectrum, the BFD protocol allowed the identification of the four native disulfide bonds as well as scrambled disulfide bonds, free cysteine residues, N- and O-glycosylation, and other PTMs of known and unknown nature linked to an unpaired cysteine residue present in some of the analyzed RBD molecules.

Seven recombinant RBD proteins, produced at laboratory scale in a wide range of host cells, were used as model antigens to develop and refine suitable analytical methods for RBD characterization. Table 1 summarizes their sequences. The procedures for cloning, expression and purification are described below.

RBD(319-541)-HEK_A3 and RBD(319-541)-HEK from transient transfection of HEK-293T cells

The coding sequence for RBD 319-541 was optimized for mammalian cell expression (hamster, Cricetulus griseus) using the online gene optimization tools provided by Eurofins (Germany). The optimized nucleotide sequence was assembled and amplified by PCR using gene fragments synthesized by Eurofins and oligonucleotides synthesized at CIGB (Cuba), and cloned into the pCMX/His vector through the BssHII and NotI restriction sites. The use of the NotI site resulted in the introduction of three alanine residues between the RBD and the His6 tag in the encoded protein. A second version of the genetic construct (coding for a recombinant protein without the additional Ala residues) was obtained by assembling the RBD 319-541 sequence directly followed by the His6 tag gene and a stop codon, and cloning that insert into the same expression vector.
HEK-293T cells adapted to grow in suspension in Freestyle F-17 medium were transfected with the resulting genetic constructs using linear polyethyleneimine as the transfection agent. Transfected cells were fed with fresh medium plus tryptone (0.5% final concentration) 48 h post-transfection. The supernatant was harvested five days after feeding, and the recombinant protein was purified from the cell culture supernatant by IMAC on Ni-NTA Sepharose. Proteins were buffer-exchanged into PBS pH 7.4 using PD-10 columns (GE Healthcare).

The whole expression cassette including the gene coding for RBD 319-541 (from the CMV promoter to the His6 tag and stop codon) was amplified by PCR from pCMX/His (see the previous section) and re-cloned into the lentiviral vector pL6WBlast (CIGB, Cuba) with the restriction enzymes XhoI and EcoRV. Lentiviral particles containing the gene of interest were produced and used for stable transduction of CHO-K1 cells. Cells producing the recombinant protein were grown with shaking in protein-free medium (a mixture of a proprietary Center for Molecular Immunology medium with PFHMII). Recombinant RBD 319-541 was purified from the cell culture supernatant by IMAC on Ni-IDA Sepharose, followed by SP cation-exchange chromatography. Dimeric and monomeric fractions were subsequently isolated by size-exclusion chromatography on Superdex 200 in PBS pH 7.4.

A sequence coding for residues 331-530 of the Spike protein from SARS-CoV-2 was obtained by reverse transcription of RNA from nasopharyngeal swabs of COVID-19 patients followed by PCR amplification, adding an NheI site at the 5' end and a stop codon and a SalI site at the 3' end. After digestion with these enzymes, the resulting amplicon was cloned in-frame into NheI/SalI-digested pET28a+ (Novagen, USA).
Transformation of the resulting construct into NiCo21(DE3) cells (New England Biolabs, USA) and culture in ZY5052 medium (25) resulted in the expression of an N-terminally His-tagged RBD, which was purified from inclusion bodies by immobilized metal affinity chromatography (IMAC) under denaturing conditions. The protein was eluted with 16 mM phosphate buffer, 300 mM NaCl, 150 mM imidazole, pH 6.8. Fractions were pooled and buffer-exchanged into sodium carbonate/hydrogen carbonate buffer pH 10 using a Sephadex G-25 column (GE Healthcare, UK). The recombinant protein was stored at -20 °C until required. An aliquot of 200 µg was buffer-exchanged into PBS (12 mM Na2HPO4·2H2O, 3 mM NaH2PO4·2H2O, 150 mM NaCl, pH 7.4) using Sephadex PD-10 columns (Pharmacia).

A sequence coding for a C1 endogenous signal sequence, residues 333-527 of the Spike protein from SARS-CoV-2 (RBD), a GlySer linker and the C-tag, flanked by PmeI sites, was amplified by PCR from a synthetic fragment provided by GenScript (USA), codon-optimized for expression in a low-protease-background strain of C1, Thermothelomyces heterothallica (formerly M. thermophila). The fragment was cloned into a C1 expression vector under a C1 endogenous promoter. The resulting expression vector and a mock vector partner needed to complete the marker gene were digested with PmeI and co-transformed into a low-protease-background C1 strain. Transformation was performed by the protoplast/PEG method (14), and transformants were selected for the nia1+ phenotype and hygromycin resistance. Transformants were streaked once onto selective medium and screened by Western blotting of culture supernatant samples from 24-well plate cultures. C1 transformants producing RBD-C-tag were purified by single-colony plating and analyzed by PCR for correct integration of the expression cassette and clone purity.
A purified clone was then selected for cultivation in a 1 L bioreactor in a fed-batch process with a medium containing glucose as the carbon source and (NH4)2SO4 and yeast extract as the nitrogen sources. The fermentation was carried out for 5 days at pH 6.8 and 38 °C. RBD-C-tag was purified from the 116 h time point of the fermentation by C-tag affinity chromatography with CaptureSelect C-tagXL resin (Thermo Fisher Scientific) according to the manufacturer's protocols. The eluted product was dialyzed against PBS.

A sequence coding for residues 331-530 of the Spike protein from SARS-CoV-2, flanked by EcoRI and XbaI sites, was amplified by PCR from a synthetic fragment provided by Eurofins (Germany), codon-optimized for expression in Saccharomyces cerevisiae. After digestion with these enzymes, the fragment was cloned in-frame with the S. cerevisiae alpha-factor pre-pro peptide into EcoRI/XbaI-digested pPICK-αA (a pPICZ-αA derivative bearing a G418 resistance selection marker). Transformation of the resulting construct into P. pastoris GS115, selection of clones resistant to high G418 concentrations, and culture of one representative clone in BMGY/BMMY (Invitrogen, USA) resulted in the secretion into the supernatant of an RBD variant C-terminally tagged with c-Myc and His6 sequences, which was further purified by IMAC on Ni-NTA agarose and gel filtration.

Seven micrograms of the proteins deglycosylated with PNGase F as described above and of the N- RBD(319-541)-HEK_A3, (RBD(319-541)-CHO)2 and RBD(331-529)-Ec proteins were separated by SDS-PAGE as described by Laemmli (26), under reducing and non-reducing conditions. Two micrograms of N-glycosylated and deglycosylated proteins were loaded onto a 12.5%T, 3%C acrylamide-bisacrylamide separating gel and run at 30 mA/gel until the tracking dye left the gel.
Proteins were detected by silver staining (27) or Coomassie Brilliant Blue G-250; gel images were analyzed with a calibrated GS-900 imaging densitometer (Bio-Rad) and processed with Image Lab v6.0 software (Bio-Rad).

The N-glycosylation profile was determined by the procedure described by Guile et al. (28). Briefly, the N-glycans released by PNGase F treatment were derivatized with 2-aminobenzamide (2AB) by reductive amination. Chromatographic separation was carried out on a Prominence HPLC system (Shimadzu, Japan) using a linear gradient from 20% to 53% of solution A (50 mM ammonium formate, pH 4.4) against solution B (pure acetonitrile). The 2AB-labeled N-glycans were separated on an Amide-80 column (TSKgel, 250 × 4.6 mm, 5 µm, Tosohaas, Japan), and the derivatized oligosaccharides were detected online by fluorescence using excitation and emission wavelengths of 330 nm and 420 nm, respectively. Structures were assigned by comparing the experimental GU values with the GlycoStore database (https://glycostore.org/). GU values were calculated from the retention time of each peak, using as reference an HPLC separation, run under similar conditions, of the 2AB derivatives of a dextran ladder generated by partial acid hydrolysis. Glycan structures are represented according to GlycoStore nomenclature.

Both the SD (Fig. 1a) and the in-solution BFD (24) (Fig. 1b) protocols start with the S-alkylation of free cysteine residues by adding an excess of N-ethylmaleimide (NEM) or iodoacetamide (IAA). This step blocks free thiol groups, which may be present either because the RBD contains an odd number of cysteine residues or because some cysteines did not pair quantitatively owing to incorrect folding. At the same time, adding the alkylating agent at the beginning of the protocol prevents artifacts due to disulfide bond exchange during the subsequent steps (22).
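The conversion of retention times into GU values described above can be sketched as interpolation against the dextran ladder. This example assumes simple linear interpolation between the two ladder peaks that bracket each glycan peak; polynomial or spline fits are also common, and the text does not state which calibration model was used. The ladder retention times and the function name are invented for illustration.

```python
from bisect import bisect_right


def glucose_units(rt: float, ladder: list[tuple[int, float]]) -> float:
    """GU value of a peak by linear interpolation on a dextran-ladder calibration.

    ladder: (glucose units, retention time) pairs for the 2AB-labelled dextran
    oligomers, sorted by increasing retention time.
    """
    times = [t for _, t in ladder]
    if any(b <= a for a, b in zip(times, times[1:])):
        raise ValueError("ladder retention times must be strictly increasing")
    if not times[0] <= rt <= times[-1]:
        raise ValueError("retention time outside the calibrated range")
    i = bisect_right(times, rt) - 1
    if i == len(ladder) - 1:  # peak coincides with the last ladder point
        return float(ladder[-1][0])
    (gu_lo, t_lo), (gu_hi, t_hi) = ladder[i], ladder[i + 1]
    return gu_lo + (gu_hi - gu_lo) * (rt - t_lo) / (t_hi - t_lo)


if __name__ == "__main__":
    # Hypothetical ladder retention times (min), for illustration only.
    ladder = [(4, 20.0), (5, 26.0), (6, 31.0), (7, 35.5), (8, 39.5)]
    print(f"GU = {glucose_units(28.5, ladder):.2f}")
```

The resulting GU value would then be compared with database entries (here, GlycoStore) within some tolerance to propose a structure.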
This could be more critical in the conventional protocol, which uses a basic pH during tryptic digestion (29, 30). The slightly acidic pH used for trypsin digestion in BFD (pH 5.5-6.0) minimizes artificial modifications introduced during sample preparation, such as scrambling caused by free Cys in the analyzed protein. The S-alkylating agent introduces an artificial mass tag that facilitates the assignment when any Cys is partially free and distinguishes these species from those modified with natural thiol-blocking groups originating from species present in the culture media (31). In the in-solution SD protocol (Fig. 1a), the solution is adjusted to basic pH and the deglycosylated RBD is digested with trypsin for 16 hours to guarantee efficient digestion. Note that other authors have also digested this protein overnight, even after disulfide reduction (12, 32). Finally, the digestion is quenched by adding formic acid, and the resulting tryptic peptides are desalted on C18 ZipTips and eluted in a solution compatible with ESI-MS analysis. In the in-solution BFD protocol (Fig. 1b), desalting is achieved at the protein level by conventional precipitation with either cold acetone (33) or ethanol (34). Washing steps are included here to minimize inorganic ions that may cause adduct signals in the mass spectra. Protein resuspension is ensured by vigorous vortexing and sonication in an ultrasonic bath in 20% acetonitrile before adding trypsin previously dissolved in water. The two workflows (Fig. 1a and 1b) do not differ appreciably in processing time before MS analysis.

Table 1) expressed in the HEK-293T mammalian cell line has four disulfide bonds and a free cysteine residue (Cys538) located towards the C-terminal region of the protein.
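Because each thiol-blocking group adds a characteristic monoisotopic mass, the artificial tag introduced by NEM or IAA can be told apart from the natural caps discussed in this work by the difference between the observed mass and that of the unmodified Cys-containing peptide. The sketch below matches such a difference against a small, non-exhaustive table of monoisotopic shifts computed from elemental compositions; the 0.02 Da tolerance and the function name are assumptions.

```python
# Monoisotopic mass shifts (Da) on a cysteine thiol, from elemental compositions.
CYS_MASS_SHIFTS = {
    "NEM alkylation (artificial tag)": 125.047679,              # + C6H7NO2
    "carbamidomethylation, IAA (artificial tag)": 57.021464,    # + C2H3NO
    "cyanylation": 24.995249,                                   # + CN - H
    "cysteinylation": 119.004099,                               # + C3H5NO2S
    "Cys-Gly (truncated glutathione)": 176.025564,              # + C5H8N2O3S
    "glutathionylation": 305.068156,                            # + C10H15N3O6S
}


def assign_cys_shift(delta: float, tol: float = 0.02) -> list[str]:
    """Names of all modifications whose shift lies within tol (Da) of delta."""
    return [name for name, shift in CYS_MASS_SHIFTS.items() if abs(delta - shift) <= tol]


if __name__ == "__main__":
    # +125.05 Da points to the NEM tag; +32.0 Da (a shift of unknown nature in
    # this work) is not explained by any entry in the table.
    for observed in (125.05, 119.00, 32.0):
        print(observed, assign_cys_shift(observed) or ["unassigned"])
```

Since no two entries lie within 0.02 Da of each other, each observed shift maps to at most one candidate here; a larger table would need a narrower tolerance or MS/MS confirmation, as done in the text.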
The high reactivity of Cys538 can be exploited for site-directed chemical conjugation to highly immunogenic carrier proteins such as tetanus toxoid (35). (Fig. 2a, lane 2), showing an intense, diffuse band at 33.3 kDa corresponding to the monomer and reflecting the heterogeneity of its N-glycosylation. A band detected at 59.7 kDa, representing approximately 13% of the protein, was assigned to the dimer. After treatment with PNGase F and analysis under non-reducing conditions, these bands migrated at 29.3 and 43.9 kDa (Fig. 2a, Table 2). Other groups that also expressed RBD molecules with an odd number of cysteine residues in HEK-293 cells reported cysteinylation (32, 35). O-glycosylation has been reported for the native RBD of SARS-CoV-2 (21, 36) as well as for several RBDs expressed in mammalian cells (32, 35). In addition, other signals observed in Fig. 2c (see inset) and summarized in Table 2 suggest the presence of other modified species of the N-deglycosylated RBD(319-541)-HEK_A3. Separately, the N-deglycosylated protein was digested in solution with trypsin using the SD (Fig. 1a) and BFD (Fig. 1b) protocols, and the resulting ESI-MS spectra are shown in Fig. 2d and Fig. 2e, respectively. The sequence assignments, based on the agreement between the expected and experimental m/z of the tryptic peptides, are summarized in Table 3. The four disulfide bonds present in the native RBD of the S protein of SARS-CoV-2 were identified by both protocols (Fig. 2d and 2e) and confirmed by MS/MS analysis (Fig. S1a-S1d). With the SD protocol, only the N-terminal peptide R319-R328 carrying HexNAc:Hex:NeuAc2 was detected (Fig. 2d, Table 3), presumably linked to either Thr323 or Ser325 according to previous reports (32, 36). In contrast, with the in-solution BFD protocol, the peptides R319-R328 and V320-R328 linked to HexNAc; HexNAc:Hex; HexNAc:Hex:NeuAc; HexNAc:Hex:NeuAc2; HexNAc2:Hex2:NeuAc and HexNAc2:Hex2:NeuAc2 were detected (Fig.
2e, Table 3). Full-sequence coverage of RBD(319-541)-HEK_A3 was verified with the in-solution BFD protocol, whereas the SD protocol achieved 82% sequence coverage (Table 2). Several signals in the low-mass region (m/z 200-700) were detected exclusively when RBD(319-541)-HEK_A3 was analyzed by the BFD protocol, and they were assigned to short and hydrophilic were only detected in the RBD expressed in HEK-293T, probably included in the sequences of more hydrophobic peptides containing missed cleavage sites. The C-terminal peptide with C538 alkylated with NEM (538CVNF541-AAAHHHHHH, m/z 548.24, 3+; Fig. 2g and Table 3) was detected by both protocols (Fig. 1a-b), confirming that a fraction of this RBD contains an unpaired free C538 residue. However, the low intensity of the signal assigned to the NEM-alkylated C-terminal peptide (m/z 548.27, 3+; Fig. 2g and Table 3) when the BFD protocol was applied suggested that Cys538 should be modified with other 3a-3c). A signal detected at m/z 565.26, 3+ was also observed only when RBD(319-541)-HEK_A3 was analyzed by BFD (Table 3). MS/MS analysis demonstrated that it corresponded to the same C-terminal peptide, (C538+CG)3+, with C538 linked to a truncated variant of glutathione (Fig. S3a). The NEM alkylation included in our protocols (Fig. 1a, b) transformed the hydrophilic C-terminal peptide (containing the unpaired C538) into a more hydrophobic species, which was therefore detected even with the SD protocol. In contrast, the remaining Cys-capping modifications (31) mentioned above (cyanylation, glutathionylation, cysteinylation and truncated glutathionylation) did not increase the hydrophobicity of the C-terminal peptide enough for it to be retained on ZipTip-C18, and these species were detected exclusively when in-solution BFD was applied. Signals detected at experimental m/z 517.24, 3+ (Fig. 2f) and m/z 541.91, 3+ (Fig.
2g) were assigned as (C538+32 Da)3+ and (C538+106 Da)3+, corresponding to the C-terminal peptide with C538 linked to modifying groups of unknown chemical nature. These signals were detected only when in-solution BFD was applied to the characterization of RBD(319-541)-HEK_A3. We also found thirteen other variants of the C-terminal peptide (confirmed by MS/MS, see Fig. S3) that could not be assigned to a defined chemical structure at Cys538 (see Table 3). Opposed to the thesis proposing that oxidoreductase-mediated protein disulfide bonding with free cysteine or glutathione in the lumen of the endoplasmic reticulum (38-40) Interestingly, the authors also proposed adding chemically modified drugs to the culture medium to obtain homogeneous drug conjugates in a single step, avoiding the need for an uncapping step prior to conjugation (31). Cysteinylation at Cys538 has been reported by other authors (31, 34), but to our knowledge the other modifying groups (Table 3) have not previously been reported for RBD. The species with Cys538 modifications and the O-glycoforms detected at the protein level (Table 2) were further confirmed at the tryptic peptide level by in-solution BFD (Table 3). The use of culture media with a defined composition and a well-characterized downstream process would avoid unexpected modifications of free cysteine residues (41, 42); otherwise, to our knowledge, the structural assignment of these PTMs would be more difficult (43). Although Cys-capping modifications protect the molecule from aggregation and scrambling mediated by inter- and intramolecular disulfide bonds, respectively, they need to be addressed if the final goal is to use the unpaired Cys for further modification, for example in a drug conjugation process (44, 45). Another issue to be addressed is the potential protein heterogeneity if the final intention is to use the disulfide-linked dimer (38, 39, 46, 47).
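The assignments of the C-terminal peptide variants can be cross-checked by computing theoretical m/z values. The sketch below sums standard monoisotopic residue masses for 538CVNF541-AAAHHHHHH, adds water, an optional Cys mass shift and z protons. With the NEM tag it gives m/z ≈ 548.24 (3+), and with the Cys-Gly cap m/z ≈ 565.24 (3+), close to the experimental 548.24 and 565.26 reported above. The function name is hypothetical, and the residue table covers only the residues needed here.

```python
# Monoisotopic residue masses (Da), limited to the residues of this peptide.
RESIDUE_MASS = {
    "A": 71.03711, "C": 103.00919, "F": 147.06841,
    "H": 137.05891, "N": 114.04293, "V": 99.06841,
}
WATER = 18.010565   # added once per linear peptide (free N- and C-termini)
PROTON = 1.007276


def peptide_mz(sequence: str, charge: int, shift: float = 0.0) -> float:
    """Theoretical m/z of [M + zH]z+ for a linear peptide carrying a total mass shift."""
    neutral = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER + shift
    return (neutral + charge * PROTON) / charge


if __name__ == "__main__":
    c_term = "CVNFAAAHHHHHH"  # C-terminal peptide of RBD(319-541)-HEK_A3
    shifts = {
        "NEM": 125.047679, "Cys-Gly": 176.025564, "cysteinylation": 119.004099,
        "glutathionylation": 305.068156, "cyanylation": 24.995249,
    }
    for name, delta in shifts.items():
        print(f"{name:18s} m/z (3+) = {peptide_mz(c_term, 3, delta):.2f}")
```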
A low-intensity signal at experimental m/z 607.28, 5+, assigned to (S-S538-538)5+ in Fig. 2h, was detected exclusively when the in-solution BFD protocol was applied. It suggests that a minor fraction of this molecule is a dimer mediated by an intermolecular disulfide bond between two Cys538 residues. MS/MS of this signal confirmed this assignment (Fig. 3d). This result matches the SDS-PAGE of RBD(319-541)-HEK_A3 run under reducing and non-reducing conditions, which shows that a dimer of this molecule represents 13% of this preparation (Fig. 2a). The presence of three low-abundance scrambling variants (C538-C379, C538-C391, C538-C432) and of the homodimer (C538-C538) agrees with the presence of a free Cys538 in this preparation (Table 3). All these scrambled species were detected with the in-solution BFD protocol, whereas only the C538-C391 scrambling variant was detected with the SD protocol. In addition, a low-abundance population of the protein with free C336, C391, C432 and C538 was detected by both protocols. All the above-mentioned assignments of scrambled and free Cys variants were confirmed by MS/MS spectra (Fig. S4-S5). Unpaired Cys residues may also promote disulfide exchange (22) and consequently generate low-abundance scrambling variants of the desired molecule. Our results indicate that reducing and S-alkylating the RBD protein before MS analysis is not advisable, as important information is lost. The most striking results obtained with the BFD protocol are the detection of the disulfide-containing peptides (including low-abundance scrambled variants) and of several modifications linked to free cysteines, some of which would probably be missed if disulfides were reduced during sample preparation.
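For disulfide-linked species, the neutral mass is the sum of the linked peptides minus two hydrogen atoms per disulfide bond. A minimal sketch under the same assumptions as a plain peptide-mass calculation (standard monoisotopic residue masses, [M + zH]z+ ions; the function name is hypothetical) gives m/z ≈ 607.27 (5+) for two copies of CVNFAAAHHHHHH joined through Cys538-Cys538, in line with the experimental m/z 607.28 above.

```python
# Monoisotopic residue masses (Da), limited to the residues used below.
RESIDUE_MASS = {
    "A": 71.03711, "C": 103.00919, "F": 147.06841,
    "H": 137.05891, "N": 114.04293, "V": 99.06841,
}
WATER = 18.010565     # one per linear peptide chain
HYDROGEN = 1.007825   # H atom; two are lost per disulfide bond
PROTON = 1.007276


def linked_mz(peptides: list[str], n_disulfides: int, charge: int) -> float:
    """m/z of [M + zH]z+ for peptide chains joined by n_disulfides disulfide bonds."""
    neutral = sum(sum(RESIDUE_MASS[aa] for aa in p) + WATER for p in peptides)
    neutral -= 2 * HYDROGEN * n_disulfides
    return (neutral + charge * PROTON) / charge


if __name__ == "__main__":
    c_term = "CVNFAAAHHHHHH"
    for z in (3, 4, 5, 6):
        print(f"(S-S 538-538) {z}+ m/z = {linked_mz([c_term, c_term], 1, z):.2f}")
```

The same function applies to the scrambled variants (for example C538-C391) once the sequences of both tryptic peptides are supplied and the residue table is extended accordingly.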
Analysis by the in-solution SD and BFD protocols (Fig. S6, Tables 2 and S1) of RBD(319-541)-HEK, the same protein expressed in HEK-293T without the C-terminal spacer arm of three alanines (Table 1), yielded results similar to those described here for RBD(319-541)-HEK_A3 at the protein and peptide levels (Fig. 2, Tables 2 and 3). Full-sequence coverage of RBD(319-541)-HEK was achieved with the in-solution BFD protocol, whereas the SD protocol achieved 85% (Table 2). The C-terminal peptide containing the six-histidine tail was detected only with the BFD protocol (data not shown).

The RBD dimer (RBD(319-541)-CHO)2, resulting from an intermolecular Cys538-Cys538 disulfide bond, was originally obtained as a by-product during the attempt to obtain RBD(319-541)-CHO. The unpaired Cys538 was introduced to be used for site-selective conjugation to tetanus toxoid (35). The increased immunogenicity of the RBD dimer (9, 48) promoted its use in at least two vaccines currently in clinical trials. The (RBD(319-541)-CHO)2 protein non-treated (lane 2, Fig. 4a) and treated (lane 3, Fig. 4b) with The ESI-MS spectrum of the PNGase F-deglycosylated dimer showed the typical multiply charged ions and suggested some heterogeneity in the molecule (Fig. 4b). This heterogeneity was more easily appreciated after deconvolution (Fig. 4c), owing to three major signals corresponding to the three combinations of two short O-glycan chains linked to the dimer, as indicated in Fig. 4c (21). The good agreement between the expected and experimental masses for all these O-glycoforms of (RBD(319-541)-CHO)2 is summarized in Table 2. ESI-MS analysis of (RBD(319-541)-CHO)2 (Fig. 4c) suggests that this molecule is more homogeneous than the monomer RBD(319-541)-HEK_A3 Cys538 is fully engaged in the intermolecular disulfide bond and is not linked to other blocking groups present in the culture media.
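Sequence coverage figures such as 85% or 100% express the fraction of residues in the construct that fall within at least one assigned peptide. A minimal sketch, assuming that assigned peptides are given as inclusive (start, end) residue numbers in the protein's own numbering; the function name and the peptide spans in the usage example are hypothetical and do not reproduce the values in Table 2.

```python
def sequence_coverage(spans: list[tuple[int, int]], first: int, last: int) -> float:
    """Percentage of residues first..last covered by at least one (start, end) span.

    Overlapping peptides (missed cleavages, ragged N-termini such as R319-R328
    and V320-R328) are counted only once.
    """
    if last < first:
        raise ValueError("empty protein range")
    covered: set[int] = set()
    for start, end in spans:
        if start > end:
            raise ValueError(f"invalid span {start}-{end}")
        covered.update(range(max(start, first), min(end, last) + 1))
    return 100.0 * len(covered) / (last - first + 1)


if __name__ == "__main__":
    # Hypothetical assignments over residues 319-547 (RBD 319-541 plus His6).
    spans = [(319, 328), (320, 328), (329, 346), (500, 547)]
    print(f"coverage = {sequence_coverage(spans, 319, 547):.1f}%")
```

Counting residues in a set, rather than summing peptide lengths, avoids inflating the value when overlapping peptides are assigned.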
The N-deglycosylated protein was digested with trypsin using the in-solution SD and BFD protocols, and the resulting ESI-MS spectra are shown in Fig. 4d and 4e, respectively. Full-sequence coverage was achieved with the in-solution BFD protocol, whereas only 80.6% of the sequence was verified with the SD protocol (Table 2). The assignments for all tryptic peptides generated by both protocols are summarized in Table S2. The four disulfide bonds present in the native RBD of SARS-CoV-2 were detected with both protocols (Fig. 4d and Fig. 4e). O-glycosylated N-terminal peptides (R319-R328 and V320-R328) with O-glycosylation sites located at Thr323/Ser325 residues (21) 4d and Fig. 4e). The mass shift caused by these O-glycans in the N-deglycosylated protein (Fig. 4c) agreed with that observed at the peptide level (Fig. 4d and 4e). Additionally, two low-intensity signals at experimental m/z 616.33, 2+ and 697.36, 2+, assigned to peptide V320-R328 linked to HexNAc and HexNAc:Hex, were detected only with the in-solution BFD protocol (Table S2). The most striking differences between both ESI-MS spectra (Fig. 4d and Fig. 4e) Fig. 4f and Fig. 4g). These signals, which also enabled verification of the C-terminal end of this molecule, were detected exclusively with the in-solution BFD protocol. Probably, the presence of twelve histidine residues in the structure of [C538-H547]-S-S-[C538-H547] (assigned as S538-S538 5+ and S538-S538 4+ in Fig. 4e and 4f, respectively) hinders its retention on the C18 ZipTip during the desalting step of the in-solution SD protocol. Verification of the C-terminal end of proteins is a very important aspect included in the ICH Q6B guidelines and requested by regulatory authorities (18). The Table S3). The MS/MS spectra that confirmed these assignments are shown in Fig. S11. Table S3 summarizes the assignment of all signals observed in the ESI-MS spectrum of Fig. 5c.
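The O-glycopeptide assignments can be checked the same way: the neutral mass is the peptide mass plus the sum of the monosaccharide residue masses in the composition. The sketch below parses compositions written as in the text (for example HexNAc:Hex:NeuAc2) and, for V320-R328 (VQPTESIVR in the SARS-CoV-2 spike sequence), gives m/z ≈ 616.33 (2+) with HexNAc and m/z ≈ 697.36 (2+) with HexNAc:Hex, matching the signals reported above. Function names are hypothetical; masses are standard monoisotopic residue values.

```python
import re

# Monoisotopic residue masses (Da), limited to the residues of V320-R328.
RESIDUE_MASS = {
    "E": 129.04259, "I": 113.08406, "P": 97.05276, "Q": 128.05858,
    "R": 156.10111, "S": 87.03203, "T": 101.04768, "V": 99.06841,
}
# Monoisotopic monosaccharide residue masses (Da).
GLYCAN_MASS = {"HexNAc": 203.079373, "Hex": 162.052824, "NeuAc": 291.095417}
WATER = 18.010565
PROTON = 1.007276


def glycan_shift(composition: str) -> float:
    """Mass added by a composition such as 'HexNAc2:Hex2:NeuAc' (missing count = 1)."""
    total = 0.0
    for token in composition.split(":"):
        # HexNAc is listed before Hex so that 'HexNAc2' is not read as 'Hex' + junk.
        match = re.fullmatch(r"(HexNAc|Hex|NeuAc)(\d*)", token.strip())
        if match is None:
            raise ValueError(f"unrecognised monosaccharide token: {token!r}")
        name, count = match.groups()
        total += GLYCAN_MASS[name] * int(count or 1)
    return total


def glycopeptide_mz(sequence: str, composition: str, charge: int) -> float:
    """m/z of [M + zH]z+ for a peptide carrying the given glycan composition."""
    neutral = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER + glycan_shift(composition)
    return (neutral + charge * PROTON) / charge


if __name__ == "__main__":
    for comp in ("HexNAc", "HexNAc:Hex", "HexNAc:Hex:NeuAc", "HexNAc:Hex:NeuAc2"):
        print(f"V320-R328 + {comp:18s} m/z (2+) = {glycopeptide_mz('VQPTESIVR', comp, 2):.2f}")
```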
In Fig. 5c, every signal assigned to peptides linked by scrambled disulfide bonds had a charge state (z) of 3+ or higher (S-S, in Fig. 5c). This agrees with previous observations (49, 50) that cross-linked peptides generated by tryptic digestion preferentially ionize with z ≥ 3+ in ESI-MS analysis. Intermolecular disulfide-linked peptides are a particular case of cross-linked peptides. These results demonstrate that the in-solution BFD protocol (24), combined with ESI-MS analysis of RBD, detects three features in a single mass spectrum: the native disulfide bonds, their scrambled variants, and free cysteine residues that might promote disulfide exchange and protein aggregation (22). This example will be important for a future validation of this technique according to the ICHQ2R1 guidelines (51). These capabilities meet the requirements of regulatory agencies for the development of a well-characterized product, given the importance of this PTM in the ICHQ6B guidelines (18). A sequence coverage of 99% was achieved for RBD(331-529)-Ec with the in-solution BFD protocol. […] Thermothelomyces heterothallica was engineered as an industrial protein production host. It gives high yields (> 10 g/L) of several recombinant proteins, including mAbs, Fc-fusion proteins and virus-like particles (14). This engineered T. heterothallica C1 host has a markedly reduced protease load, which minimizes unwanted degradation during fermentation (14). For these reasons, RBD(333-527)-C1 (see Table 1) was expressed in the T. heterothallica C1 host and characterized using the in-solution BFD protocol. Unlike the other proteins characterized in this work, RBD(333-527)-C1 has only one N-glycosylation site, at Asn343.
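The disulfide-linked peptides discussed for Fig. 4 and Fig. 5c follow a simple mass rule. A disulfide bond joins two cysteine thiols with the loss of two hydrogen atoms, so two disulfide-linked peptides weigh the sum of their masses minus about 2.016 Da. The sketch below applies this rule to the [C538-H547]-S-S-[C538-H547] peptide of (RBD(319-541)-CHO)2. It assumes the sequence CVNFHHHHHH (residues 538-541 followed by the His6 tag) and lists the m/z values from 3+ upward. The default lower bound of 3+ only mirrors the empirical trend cited above; it is not a rule.

```python
# Monoisotopic residue masses (Da) for the residues used below.
RESIDUE = {
    "C": 103.009185, "V": 99.068414, "N": 114.042927,
    "F": 147.068414, "H": 137.058912,
}
WATER = 18.010565
PROTON = 1.007276
HYDROGEN = 1.007825  # H atom; two are lost when a disulfide bond forms


def peptide_mass(seq: str) -> float:
    """Monoisotopic neutral mass of a linear peptide with free thiols."""
    return sum(RESIDUE[aa] for aa in seq) + WATER


def disulfide_linked_mass(seq_a: str, seq_b: str) -> float:
    """Neutral mass of two peptides joined by one intermolecular disulfide bond."""
    return peptide_mass(seq_a) + peptide_mass(seq_b) - 2 * HYDROGEN


def charge_ladder(mass: float, z_min: int = 3, z_max: int = 6) -> dict[int, float]:
    """m/z of [M + zH]z+ for each charge state from z_min to z_max."""
    return {z: (mass + z * PROTON) / z for z in range(z_min, z_max + 1)}


c_term = "CVNFHHHHHH"  # assumed sequence of C538-H547
dimer = disulfide_linked_mass(c_term, c_term)
print(f"M = {dimer:.2f} Da")
for z, value in charge_ladder(dimer).items():
    print(f"{z}+  m/z {value:.2f}")
```

These values are calculated, not the experimental m/z values of Fig. 4. The 4+ and 5+ entries correspond to the charge states assigned in the text.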
The NP-HPLC profile (Fig. 6a) shows the structural assignment, based on GU indexes, of the individual N-glycans released with PNGase F and labeled with 2-AB. The ESI-MS spectrum of intact RBD(333-527)-C1, deconvoluted with MaxEnt1, confirmed the typical heterogeneity of N-glycoproteins and indicated several non-fucosylated glycoforms (Fig. 6b). The experimental and expected molecular masses agreed very well (Fig. 6b, Table 2). In both analyses, the three predominant glycoforms were Man4, Man4A1 and Man5A1, and Man5A1 was the most abundant. After N-deglycosylation with PNGase F, the ESI-MS spectrum of this protein (Fig. S8a) showed several multiply charged signals (z = 8+ to 15+). Once deconvoluted with MaxEnt1 (Fig. S8b), these signals yielded a single intense signal with an experimental molecular mass of 22590.33 Da […]. The N-deglycosylated protein was digested with trypsin by the in-solution BFD protocol (Fig. 6c) and analyzed by ESI-MS, which gave full sequence coverage (Table 2) […] Table S4). Very low-abundance signals (Fig. 6c) […] Table S4) that were assigned […]. However, all cysteine-containing peptides were detected when the N-glycosylated RBD(333-527)-C1 was reduced, S-alkylated with iodoacetamide and digested with the in-solution BFD protocol (Fig. 6d, Table S4). These included the N-terminal peptide T333-R346 containing Cys336 and several glycoforms, as shown in the inset of Fig. 6d. MS/MS spectra supporting these assignments are shown in Fig. S10. The in-solution BFD protocol achieved 100% sequence coverage (Table 2) under both conditions: after N-deglycosylation under non-reducing conditions (Fig. 6c), and after reduction and carbamidomethylation of the N-glycoprotein (Fig. 6d). The RBD of SARS-CoV-2 was also expressed in P. pastoris for analytical purposes, with a tandem repeat of six histidine residues and the c-Myc tag fused at the C-terminal end (RBD(331-530)-Cmyc-Pp, see Table 1).
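MaxEnt1 performs maximum-entropy deconvolution. The sketch below does not implement that algorithm. It is a much simpler charge-state calculation showing how a series of multiply charged ions, such as the 8+ to 15+ series of N-deglycosylated RBD(333-527)-C1, maps to one neutral mass. It assumes protonated [M+zH]z+ ions, infers z from two adjacent peaks, and uses ideal m/z values back-calculated from the 22590.33 Da mass rather than measured peaks.

```python
PROTON = 1.007276  # Da


def charge_from_adjacent(mz_z: float, mz_z_plus_1: float) -> int:
    """Charge z of the first peak, given adjacent peaks at charges z and z + 1.

    Since M = z * (m1 - P) = (z + 1) * (m2 - P), it follows that
    z = (m2 - P) / (m1 - m2), where m1 = mz_z > m2 = mz_z_plus_1.
    """
    return round((mz_z_plus_1 - PROTON) / (mz_z - mz_z_plus_1))


def neutral_mass(mz: float, z: int) -> float:
    """Neutral mass M from an [M + zH]z+ ion."""
    return z * (mz - PROTON)


def average_mass(peaks: list[tuple[float, int]]) -> float:
    """Average neutral mass over (m/z, z) pairs of one charge-state series."""
    masses = [neutral_mass(mz, z) for mz, z in peaks]
    return sum(masses) / len(masses)


# Ideal, noise-free m/z values for M = 22590.33 Da at z = 8+ to 15+.
M = 22590.33
series = [((M + z * PROTON) / z, z) for z in range(8, 16)]
print(charge_from_adjacent(series[0][0], series[1][0]))  # 8
print(f"{average_mass(series):.2f} Da")
```

Real spectra are far less clean than this ideal series, because of peak picking, adduct ions and overlapping glycoforms. Dedicated deconvolution software is used for that reason.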
The ESI-MS spectrum of RBD(331-530)-Cmyc-Pp deglycosylated with PNGase F (Fig. 7a) showed charge states from 9+ to 17+. Deconvolution of this spectrum (Fig. 7b) yielded an intense signal at 25835.29 Da. This mass is 400.88 Da higher than expected (25434.41 Da) from the sequence of RBD(331-530)-Cmyc-Pp in Table 1, assuming four disulfide bonds and two N-deglycosylated sites. The N-deglycosylated protein was digested with trypsin by the in-solution BFD protocol (Fig. 1b). The resulting ESI-MS spectrum (Fig. 7c) showed an unexpected signal of appreciable intensity at experimental m/z 1219.32 (4+). The MS/MS spectrum of this signal (Fig. 7d) […] Fig. 7b). […] were detected, and a sequence coverage of 99% (Table 2) was achieved with the in-solution BFD protocol. The α-mating factor prepro-peptide secretion signal from Saccharomyces cerevisiae is still the most commonly used signal sequence for recombinant proteins expressed in P. pastoris (53). Processing of the α-mating factor should occur in three steps. The last step involves the Ste13 protein, which cleaves the Glu-Ala repeats in the Golgi (54). We therefore speculate that this step was somehow interrupted, because all the purified protein was detected exclusively with EAEA linked to the N-terminal end. The high expression level of this protein (40 g/L) probably impaired complete processing of the signal peptide. In-solution BFD detected this incomplete processing of the signal peptide of RBD expressed in P. pastoris. P. pastoris is an expression system of choice for vaccine development: it is easy to manipulate genetically, it can perform complex post-translational modifications, and it gives high expression yields (12). Characterization of the N-terminal end is also one of the aspects requested by the ICHQ6B guidelines (18). For the characterization of all RBDs with the in-solution BFD protocol, we initially used acetone for protein precipitation (Fig. 1b).
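The 400.88 Da shift of RBD(331-530)-Cmyc-Pp can be compared with the N-terminal extensions that incomplete α-mating factor processing would leave. The sketch below is a hypothetical illustration. It uses commonly tabulated average residue masses, since intact-protein masses are normally average masses. It considers only EA and EAEA as candidates and simply ranks them by closeness. It does not replace the peptide-level MS/MS confirmation described in the text.

```python
# Commonly tabulated average residue masses (Da).
AVG_RESIDUE = {"E": 129.1155, "A": 71.0788}


def extension_mass(seq: str) -> float:
    """Mass added by an N-terminal peptide extension (sum of residue masses)."""
    return sum(AVG_RESIDUE[aa] for aa in seq)


observed = 25835.29  # deconvoluted mass, Da
expected = 25434.41  # mass calculated from the Table 1 sequence, Da
delta = observed - expected

# Hypothetical candidates: Glu-Ala repeats that Ste13 would normally remove.
candidates = ["EA", "EAEA"]
best = min(candidates, key=lambda s: abs(extension_mass(s) - delta))
print(f"shift {delta:.2f} Da; closest candidate {best} ({extension_mass(best):.2f} Da)")
```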
In the ESI-MS spectra, we noticed an unexpected doubly charged signal of variable intensity at experimental m/z 629.81 (Fig. 8a). This signal was not detected when RBDs were processed with the in-solution SD protocol (Fig. 2d, 4d and S6c). It was also absent when the protein precipitation step (Fig. 1b) was carried out with cold ethanol (Fig. 8b) instead of acetone (Fig. 8a). Also, in the same MS/MS spectra (Fig. 8d) […] (Fig. 8d). No structure for this modification has yet been proposed in the literature. However, a previous work noticed that it is specific to peptides with Gly at position n+2 derived from tryptic digests of proteins previously precipitated with acetone (55). […] Fig. 8a) and the C-terminal peptide (538CVNF541-AAAHHHHHH) carrying a +374 Da modification at Cys538 (Fig. 8b) partially overlapped, which impaired detection of the modified C-terminal peptide. This modification at Cys538 was detected only when RBD(319-541)-HEK_A3 was precipitated with ethanol and analyzed by in-solution BFD (Fig. 8b). Sample processing also produced a second artifact: partial addition of NEM to the N-terminal end of the RBD proteins. This occurred even though maleimides are about 1000-fold more selective for thiols than for amine groups at neutral pH (56). The addition of NEM was verified by ESI-MS analysis of the RBD deglycosylated with PNGase F (Table 3) (Fig. 8e). Several fragment ions arising from asymmetric cleavage of the disulfide bond (1a2, 1b2, 1b4, 1b2 b3SH) confirmed that the N-terminal end of 538CVNF541-AAAHHHHHH is free. Two other fragment ions (2 […] and 2 […]) confirmed that NEM is located at the N-terminal end of the cysteine residue linked by an intermolecular disulfide bond to Cys538. Using the in-solution BFD protocol (Fig. 1b) […] and Lys529 modified with NEM (+125 Da) were detected using the in-solution BFD protocol and confirmed by MS/MS (Fig. S12).
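Both sample-processing artifacts can be checked with simple mass arithmetic. The sketch below uses standard monoisotopic residue masses and two stated assumptions. First, because no structure has been proposed, the nominal +40 Da acetone artifact is treated as C3H4 (acetone minus water, 40.0313 Da) for calculation only. Second, NEM adds C6H7NO2 (125.0477 Da), and thiosuccinimide ring hydrolysis adds a further H2O. With these values, V445-R454 (VGGNYNYLYR) carrying the +40 Da shift gives a calculated [M+2H]2+ of 629.81, matching the unexpected signal. The NEM rows only illustrate where such forms would appear at 2+; the text does not report NEM on this peptide.

```python
# Monoisotopic residue masses (Da) for the residues used below.
RESIDUE = {
    "V": 99.068414, "G": 57.021464, "N": 114.042927,
    "Y": 163.063329, "L": 113.084064, "R": 156.101111,
}
WATER = 18.010565
PROTON = 1.007276
HYDROGEN = 1.007825

# Assumed mass shifts (Da) for the sample-processing artifacts.
SHIFTS = {
    "acetone (+40)": 3 * 12.0 + 4 * HYDROGEN,  # assumption: C3H4
    "NEM (+125)": 125.047679,                  # C6H7NO2
    "NEM + H2O (+143)": 125.047679 + WATER,    # hydrolyzed thiosuccinimide
}


def peptide_mass(seq: str) -> float:
    """Monoisotopic neutral mass of an unmodified linear peptide."""
    return sum(RESIDUE[aa] for aa in seq) + WATER


def mz(neutral_mass: float, z: int) -> float:
    """m/z of the [M + zH]z+ ion; a shift of d Da moves it by d / z."""
    return (neutral_mass + z * PROTON) / z


pep = peptide_mass("VGGNYNYLYR")  # V445-R454
print(f"{'unmodified':<18}2+ m/z {mz(pep, 2):.2f}")
for name, shift in SHIFTS.items():
    print(f"{name:<18}2+ m/z {mz(pep + shift, 2):.2f}")
```

At 2+, the 18 Da hydrolysis step appears as a spacing of about 9.01 m/z rather than 18.01. This is worth keeping in mind when searching multiply charged peptides for it.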
The SD protocol behaves differently. When the RBD is digested in solution with the SD protocol and NEM is present throughout sample processing, even at a very low concentration (≤ 5 mM), NEM is added to the N-terminal end of most internal tryptic peptides (data not shown). NEM is added in excess at 5 mM. It remains present during the N-deglycosylation step (2 h at 37 °C), at a pH slightly above neutral (7.2-7.4). These conditions seem to favor this side reaction at a considerable fraction of the N-terminal ends of the deglycosylated RBD, as well as at the cysteine linked by a disulfide bond to Cys538. To a minor extent, the ε-amino groups of Lys residues were also modified. Partial addition of NEM at the N-terminal end of the protein is therefore a side reaction to be considered when in-solution BFD is used. Hydrolysis of the thiosuccinimide ring formed after NEM addition to free Cys residues was also observed (57). It was probably more appreciable when the RBD was digested with the SD protocol at basic pH (Fig. 1a). This side reaction increased the molecular masses of peptides by 18 Da. In a single ESI-MS spectrum, in-solution BFD achieved full sequence coverage for most of the recombinant RBDs characterized in this work, outperforming the in-solution SD protocol in this respect. Hydrophobic and hydrophilic peptides were detected simultaneously and efficiently in the same ESI-MS spectrum only with the BFD protocol. Combined with ESI-MS analysis, the in-solution BFD protocol proved sensitive for detecting PTMs (N- and O-glycosylation, and several modifications of the free Cys538). These PTMs were present in recombinant RBDs produced in different expression systems (mammalian cells, fungus, yeast and bacteria). In an RBD with the unpaired Cys538, several PTMs were detected only when in-solution BFD was applied.
In particular, the in-solution BFD protocol always identified the C-terminal end of proteins containing a tandem repeat of six histidine residues, an important aspect requested in the ICHQ6B guidelines. With SD sample processing, this end was not identified in all proteins. The in-solution BFD protocol introduced two artifacts: the one caused by acetone (+40 Da) and the N-terminal modification with NEM (+125 Da). Both were well characterized, and their sources were readily identified. In particular, the +40 Da artifact in an internal peptide of RBD (445VG*GNYNYLYR454) can be eliminated by using ethanol instead of acetone for precipitation. Other S-alkylating agents might also be used, depending on the preferences of each laboratory, but this was not explored here. We believe that the results of this study justify validating the in-solution BFD protocol combined with ESI-MS for the characterization of RBDs used as active pharmaceutical ingredients of SARS-CoV-2 subunit vaccines. The protocol is simple and easy to implement because it essentially uses reagents frequently employed for the characterization of proteins by MS. In principle, MALDI-MS can also be combined with in-solution BFD. However, matrix interference might make the detection of some low-molecular-mass peptides troublesome, and acquiring blank mass spectra might help to obtain reliable results. We foresee that the current BFD protocol will also be applicable to the analysis of active pharmaceutical ingredients of other recombinant RBD-based subunit vaccines customized for SARS-CoV-2 variants with point mutations (58).
Table 3 […]. A detailed assignment of all tryptic peptides in this figure is summarized in Table S2. Signals assigned as (S-S)n+ correspond to the peptides containing disulfide bonds between the cysteines described.
The signals labeled with (Nt-His6)n+ correspond to the N-terminal peptide containing the tandem repeat of six histidine residues in its amino acid sequence. A detailed assignment for all tryptic peptides in this figure is summarized in Table S3. Table S4.
A novel coronavirus outbreak of global health concern. The Lancet.
Challenges in ensuring global access to COVID-19 vaccines: production, affordability, allocation, and deployment. The Lancet.
WHO. Target product profiles for COVID-19 vaccines.
Genetic and biological characterization of a Cuban tick strain from Rhipicephalus sanguineus complex and its sensitivity to different chemical acaricides.
A pneumonia outbreak associated with a new coronavirus of probable bat origin.
Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation.
A vaccine targeting the RBD of the S protein of SARS-CoV-2 induces protective immunity.
Molecular Aspects Concerning the Use of the SARS-CoV-2 Receptor Binding Domain as a Target for Preventive Vaccines.
Structural basis of receptor recognition by SARS-CoV-2.
SARS-CoV-2 spike produced in insect cells elicits high neutralization titres in non-human primates. Emerging Microbes & Infections.
Structural and functional comparison of SARS-CoV-2-spike receptor binding domain produced in Pichia pastoris and mammalian cells.
SARS-CoV-2 RBD219-N1C1: A yeast-expressed SARS-CoV-2 recombinant receptor-binding domain candidate vaccine stimulates virus neutralizing antibodies and T-cell immunity in mice.
Development of a mature fungal technology and production platform for industrial enzymes based on a Myceliophthora thermophila isolate, previously known as Chrysosporium lucknowense C1.
On the reproducibility of label-free quantitative cross-linking/mass spectrometry.
Efficient production of recombinant SARS-CoV-2 spike protein using the baculovirus-silkworm system.
Bacterial expression and purification of functional recombinant SARS-CoV-2 spike receptor binding domain.
Linear epitopes of SARS-CoV-2 spike protein elicit neutralizing antibodies in COVID-19 patients.
Two linear epitopes on the SARS-CoV-2 spike protein that elicit neutralising antibodies in COVID-19 patients.
N- and O-glycosylation of the SARS-CoV-2 spike protein.
Recent mass spectrometry-based techniques and considerations for disulfide bond characterization in proteins.
An in-gel digestion procedure that facilitates the identification of highly hydrophobic proteins by electrospray ionization-mass spectrometry analysis.
Targeting the hydrophilic regions of recombinant proteins by MS via in-solution buffer-free trypsin digestion.
Protein production by auto-induction in high density shaking cultures.
Cleavage of structural proteins during the assembly of the head of bacteriophage T4.
Characterization of a solvent system for separation of water-insoluble poliovirus proteins by reversed-phase high-performance liquid chromatography.
A rapid high-resolution high-performance liquid chromatographic method for separating glycan mixtures and analyzing oligosaccharide profiles.
Steric effects in peptide and protein exchange with activated disulfides.
Effect of pH and temperature on protein unfolding and thiol/disulfide interchange reactions during heat-induced gelation of whey proteins.
Mechanistic understanding of the cysteine capping modifications of antibodies enables selective chemical engineering in live mammalian cells.
Structural and functional characterization of SARS-CoV-2 RBD domains produced in mammalian cells. bioRxiv. 2021.
Crowell AM, Wall MJ, Doucette AA. Maximizing recovery of water-soluble proteins through acetone precipitation.
Comparison of ethanol plasma-protein precipitation with plasma ultrafiltration and trichloroacetic acid protein precipitation for the measurement of unbound platinum concentrations.
SARS-CoV-2 RBD-Tetanus toxoid conjugate vaccine induces a strong neutralizing immunity in preclinical studies.
Deducing the N- and O-glycosylation profile of the spike protein of novel coronavirus SARS-CoV-2.
Use of CID/ETD mass spectrometry to analyze glycopeptides.
Identification of cysteinylation of a free cysteine in the Fab region of a recombinant monoclonal IgG1 antibody using Lys-C limited proteolysis coupled with LC/MS analysis.
Engineering a therapeutic IgG molecule to address cysteinylation, aggregation and enhance thermal stability and expression.
Removal of cysteinylation from an unpaired sulfhydryl in the variable region of a recombinant monoclonal IgG1 antibody improves homogeneity, stability, and biological activity.
Abundant cysteine side reactions in traditional buffers interfere with the analysis of posttranslational modifications and protein quantification-how to compromise.
ROSics: chemistry and proteomics of cysteine modifications in redox biology.
Isolation and characterization of modified species of a mutated (Cys125-Ala) recombinant human interleukin-2.
Rapid identification of reactive cysteine residues for site-specific labeling of antibody-Fabs.
Site-specific conjugation on serine→cysteine variant monoclonal antibodies.
Charge-based analysis of antibodies with engineered cysteines: from multiple peaks to a single main peak.
Experimental Assignment of Disulfide-Bonds in Purified Proteins. Current Protocols in Protein Science.
A universal design of betacoronavirus vaccines against COVID-19, MERS, and SARS.
A study into the collision-induced dissociation (CID) behavior of cross-linked peptides.
Architecture of the RNA polymerase II-TFIIF complex revealed by cross-linking and mass spectrometry.
Q2(R1) validation of analytical procedures. ICH Quality Guidelines.
Structure of a yeast pheromone gene (MFα): a putative α-factor precursor contains four tandem copies of mature α-factor.
The effect of α-mating factor secretion signal mutations on recombinant protein expression in Pichia pastoris.
Alpha-factor-directed synthesis and secretion of mature foreign proteins in Saccharomyces cerevisiae.
Acetone precipitation of proteins and the modification of peptides.
Straightforward thiol-mediated protein labelling with DTPA: Synthesis of a highly active 111In-annexin A5-DTPA tracer.
Limiting the Hydrolysis and Oxidation of Maleimide-Peptide Adducts Improves Detection of Protein Thiol Oxidation.
Escape of SARS-CoV-2 501Y.V2 from neutralization by convalescent plasma.
Genomic recombination events may reveal the evolution of coronavirus and the origin of SARS-CoV-2.
Symbol nomenclature for graphical representations of glycans.
Fragmentation of intra-peptide and inter-peptide disulfide bonds of proteolytic peptides by nanoESI collision-induced dissociation.