Mass spectrometry-based sequencing of the anti-FLAG-M2 antibody using multiple proteases and a dual fragmentation scheme

Authors:
Weiwei Peng1#, Matti F. Pronker1#, Joost Snijder1*

#equal contribution
*corresponding author: j.snijder@uu.nl

Affiliation:
1 Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute of Pharmaceutical Sciences, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands

Keywords:
mass spectrometry, antibody, de novo sequencing, EThcD, stepped HCD, Herceptin, FLAG tag, anti-FLAG-M2.

bioRxiv preprint doi: https://doi.org/10.1101/2021.01.07.425675; this version posted January 8, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license (http://creativecommons.org/licenses/by/4.0/).

Abstract:
Antibody sequence information is crucial to understanding the structural basis for antigen binding and enables the use of antibodies as therapeutics and research tools. Here we demonstrate a method for direct de novo sequencing of monoclonal IgG from the purified antibody products. The method uses a panel of multiple complementary proteases to generate suitable peptides for de novo sequencing by LC-MS/MS in a bottom-up fashion. Furthermore, we apply a dual fragmentation scheme, using both stepped high-energy collision dissociation (stepped HCD) and electron transfer high-energy collision dissociation (EThcD) on all peptide precursors. The method achieves full sequence coverage of the monoclonal antibody Herceptin, with an accuracy of 99% in the variable regions.
We applied the method to sequence the widely used anti-FLAG-M2 mouse monoclonal antibody, and validated the resulting sequence by remodeling a high-resolution crystal structure of the Fab and by demonstrating binding to a FLAG-tagged target protein in Western blot analysis. The method thus offers robust and reliable sequences of monoclonal antibodies.

Introduction
Antibodies can bind a great molecular diversity of antigens, owing to the high degree of sequence diversity that is available through somatic recombination, hypermutation, and heavy-light chain pairings [1-2]. Sequence information on antibodies is therefore crucial to understanding the structural basis of antigen binding, how somatic hypermutation governs affinity maturation, and, by mapping out the antibody repertoire, the adaptive immune response in health and disease. Moreover, antibodies have become invaluable research tools in the life sciences and are ever more widely developed as therapeutic agents [3-4]. In this context, sequence information is crucial for the use, production and validation of these important research tools and biopharmaceutical agents [5-6].

Antibody sequences are typically obtained through cloning and sequencing of the coding mRNAs of the paired heavy and light chains [7-9]. These sequencing workflows rely on isolation of the antibody-producing cells from peripheral blood mononuclear cells, or spleen and bone marrow tissues.
These antibody-producing cells are not always readily available, however, and cloning/sequencing of the paired heavy and light chains is a non-trivial task with a limited success rate [7-9]. Moreover, antibodies are secreted in bodily fluids and mucus. Antibodies are thereby in large part functionally disconnected from their producing B-cell, which raises questions on how the secreted antibody pool relates quantitatively to the underlying B-cell population and whether there are potential sampling biases in current antibody sequencing strategies.

Direct mass spectrometry (MS)-based sequencing of the secreted antibody products is a useful complementary tool that can address some of the challenges faced by conventional sequencing strategies relying on cloning/sequencing of the coding mRNAs [10-17]. MS-based methods do not rely on the availability of the antibody-producing cells, but rather target the polypeptide products directly, offering the prospect of a next generation of serology, in which secreted antibody sequences might be obtained from any bodily fluid. Whereas MS-based de novo sequencing still has a long way to go towards this goal, owing to limitations in sample requirements, sequencing accuracy, read length and sequence assembly, MS has been successfully used to profile the antibody repertoire and obtain (partial) antibody sequences beyond those available from conventional sequencing strategies based on cloning/sequencing of the coding mRNAs [10-17].

Most MS-based strategies for antibody sequencing rely on a proteomics-type bottom-up LC-MS/MS workflow, in which the antibody product is digested into smaller peptides for MS analysis [14, 18-23].
Available germline antibody sequences are then often used either as a template to guide assembly of de novo peptide reads (such as in PEAKS Ab) [23], or as a starting point to iteratively identify somatic mutations to arrive at the mature antibody sequence (such as in Supernovo) [21]. To maximize sequence coverage and aid read assembly, these MS-based workflows typically use a combination of complementary proteases and non-specific digestion to generate overlapping peptides. The most straightforward application of these MS-based sequencing workflows is the successful sequencing of monoclonal antibodies from (lost) hybridoma cell lines, but it also forms the basis of more advanced and challenging applications to characterize polyclonal antibody mixtures and profile the full antibody repertoire from serum.

Here we describe an efficient protocol for MS-based sequencing of monoclonal antibodies. The protocol requires approximately 200 picomol of the antibody product and sample preparation can be completed within one working day. We selected a panel of 9 proteases with complementary specificities, which are active in the same buffer conditions for parallel digestion of the antibodies. We developed a dual fragmentation strategy for MS/MS analysis of the resulting peptides to yield rich sequence information from the fragmentation spectra of the peptides.
The protocol yields full and deep sequence coverage of the variable domains of both heavy and light chains, as demonstrated on the monoclonal antibody Herceptin. As a test case, we used our protocol to sequence the widely used anti-FLAG-M2 mouse monoclonal antibody, for which no sequence was publicly available despite its described use in 5000+ peer-reviewed publications [24-25]. The protocol achieved full sequence coverage of the variable domains of both heavy and light chains, including all complementarity determining regions (CDRs). The obtained sequence was successfully validated by remodeling the published crystal structure of the anti-FLAG-M2 Fab and by demonstrating binding of the synthetic recombinant antibody, produced following the experimental sequence, to a FLAG-tagged protein in Western blot analysis. The protocol developed here thus offers robust and reliable sequencing of monoclonal antibodies with prospective applications for sequencing secreted antibodies from bodily fluids.

Results
We used an in-solution digestion protocol, with sodium deoxycholate (SDC) as the denaturing agent, to generate peptides from the antibodies for LC-MS/MS analysis. Following heat denaturation and disulfide bond reduction, we used iodoacetic acid as the alkylating agent to cap free cysteines. Note that conventional alkylating agents like iodo-/chloroacetamide generate +57 Da mass differences on cysteines and primary amines, which may lead to spurious assignments as glycine residues in de novo sequencing.
The +58 Da mass difference generated by alkylation with iodoacetic acid circumvents this potential pitfall.

We chose a panel of 9 proteases with activity at pH 7.5-8.5, so that the denatured, reduced and alkylated antibodies could be easily split for parallel digestion under the same buffer conditions. These proteases (with indicated cleavage specificities) included: trypsin (C-terminal of R/K), chymotrypsin (C-terminal of F/Y/W/M/L), α-lytic protease (C-terminal of T/A/S/V), elastase (non-specific), thermolysin (non-specific), lysN (N-terminal of K), lysC (C-terminal of K), aspN (N-terminal of D/E), and gluC (C-terminal of D/E). Correct placement or assembly of peptide reads is a common challenge in de novo sequencing, which can be facilitated by sufficient overlap between the peptide reads. Because such overlap is promoted by missed cleavages and longer reads, we opted to perform a brief 4-hour digestion. Following digestion, SDC is removed by precipitation and the peptide supernatant is desalted, ready for LC-MS/MS analysis. The resulting raw data were used for automated de novo sequencing with the Supernovo software package.

As peptide fragmentation is dependent on many factors like length, charge state, composition and sequence [26], we needed a versatile fragmentation strategy to accommodate the diversity of antibody-derived peptides generated by the 9 proteases. We opted for a dual fragmentation scheme that applies both stepped high-energy collision dissociation (stepped HCD) and electron transfer high-energy collision dissociation (EThcD) on all peptide precursors [27-29]. The stepped HCD fragmentation includes three collision energies to cover multiple dissociation regimes, and the EThcD fragmentation works especially well for higher charge states, also adding complementary c/z ions for maximum sequence coverage.
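The alkylation argument made earlier in this section can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of the authors' workflow: it computes the monoisotopic mass added by each reagent from its net elemental composition and compares it with the mass of a glycine residue. The helper names and the 1500 Da peptide mass are assumptions chosen for the example; the atomic masses are standard monoisotopic values.

```python
# Monoisotopic atomic masses in Da (standard values).
ATOM = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}


def mono_mass(formula):
    """Monoisotopic mass of an elemental composition given as {element: count}."""
    return sum(ATOM[element] * count for element, count in formula.items())


# Net composition added to a cysteine thiol or a primary amine by each reagent.
CARBAMIDOMETHYL = mono_mass({"C": 2, "H": 3, "N": 1, "O": 1})  # iodo-/chloroacetamide
CARBOXYMETHYL = mono_mass({"C": 2, "H": 2, "O": 2})            # iodoacetic acid
# A glycine residue inside a peptide chain has the same composition as carbamidomethyl.
GLYCINE_RESIDUE = mono_mass({"C": 2, "H": 3, "N": 1, "O": 1})


def ppm_error(observed, expected, peptide_mass):
    """Absolute mass difference expressed in ppm of the peptide mass."""
    return abs(observed - expected) / peptide_mass * 1e6


peptide_mass = 1500.0  # hypothetical peptide mass in Da, for illustration only
for name, shift in [("carbamidomethyl", CARBAMIDOMETHYL),
                    ("carboxymethyl", CARBOXYMETHYL)]:
    error = ppm_error(shift, GLYCINE_RESIDUE, peptide_mass)
    print(f"{name}: +{shift:.4f} Da, {error:.1f} ppm away from an extra glycine")
```

Carbamidomethylation is exactly isobaric with an added glycine, so no mass tolerance can separate the two, whereas carboxymethylation differs from glycine by roughly 0.98 Da, far outside the ppm-level precursor tolerances typical of high-resolution instruments.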
We used the monoclonal antibody Herceptin (also known as Trastuzumab) as a benchmark to test our protocol [30-31]. From the total dataset of 9 proteases, we collected 4408 peptide reads (defined as peptides with score >=500, see Methods for details), 2866 of which with superior stepped HCD fragmentation, and 1542 with superior EThcD fragmentation (see Table S1). Sequence coverage was 100% in both heavy and light chains across the variable and constant domains (see Figures S1 and S2). The median depth of coverage was 148 overall and slightly higher in the light chain (see Table S1 and Figures S1-S2). The median depth of coverage in the CDRs of both chains ranged from 42 to 210.

Figure 1. Mass spectrometry-based de novo sequencing of the monoclonal antibody Herceptin. The variable regions of the Heavy (A) and Light Chains (B) are shown. The MS-based sequence is shown alongside the known Herceptin sequence, with differences highlighted by asterisks (*). Exemplary MS/MS spectra supporting the assigned sequences of the Heavy and Light Chain CDRs are shown below the alignments. Peptide sequence and fragment coverage are indicated on top of the spectra, with b/c ions indicated in blue and y/z ions in red. The same coloring is used to annotate peaks in the spectra, with additional peaks such as intact/charge reduced precursors, neutral losses and immonium ions indicated in green. Note that to prevent overlapping peak labels, only a subset of successfully matched peaks is annotated.
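For readers who want to reproduce coverage statistics of the kind quoted above from their own peptide lists, the following is a minimal sketch, assuming reads are available as (peptide, score) pairs and that exact string matching against the final sequence is good enough. It is not the Supernovo procedure: the function name, the toy sequence and the toy reads are hypothetical, and real data would additionally need handling of I/L ambiguity and modifications.

```python
from statistics import median


def depth_of_coverage(sequence, reads, min_score=500):
    """Per-residue count of reads (score >= min_score) that exactly match the sequence."""
    depth = [0] * len(sequence)
    for peptide, score in reads:
        if score < min_score:
            continue  # apply the same score cut-off used to define a peptide read
        start = sequence.find(peptide)
        while start != -1:  # count every exact occurrence of the read
            for i in range(start, start + len(peptide)):
                depth[i] += 1
            start = sequence.find(peptide, start + 1)
    return depth


# Toy data: an arbitrary sequence and made-up reads, not taken from the paper.
sequence = "ACDEFGHIKLMNPQR"
reads = [("ACDEF", 720), ("DEFGHIK", 650), ("GHIKLMN", 510),
         ("KLMNPQR", 905), ("EFGH", 500), ("PQR", 300)]

depth = depth_of_coverage(sequence, reads)
coverage = sum(d > 0 for d in depth) / len(depth)
print(f"coverage: {coverage:.0%}, median depth: {median(depth)}")
```

Restricting the depth list to the index range of a CDR gives the per-region medians; overlapping reads from proteases with different specificities are what raise the depth at each position.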
The experimentally determined de novo sequence is shown alongside the known Herceptin sequence for the variable domains of both chains in Figure 1, with exemplary MS/MS spectra for the CDRs. We achieved an overall sequence accuracy of 99% with the automated sequencing procedure of Supernovo, with 3 incorrect assignments in the light chain. In framework 3 of the light chain, I75 was incorrectly assigned as the isomer Leucine (L), a common MS-based sequencing error. In CDRL3, an additional misassignment was made for the dipeptide H91/Y92, which was incorrectly assigned as W91/N92. The dipeptides HY and WN have identical masses, and the misassignment of W91/N92 (especially W91) was poorly supported by the fragmentation spectra, in contrast to the correct H91/Y92 assignment (see c6/c7 in fragmentation spectra, Figure 1). Overall, the protocol yielded highly accurate sequences at a combined 230/233 positions of the variable domains in Herceptin.

Figure 2. Mass spectrometry-based de novo sequence of the mouse monoclonal anti-FLAG-M2 antibody. The variable regions of the Heavy (A) and Light Chains (B) are shown. The MS-based sequence is shown alongside the previously published sequence in the crystal structure of the Fab (PDB ID: 2G60), and the germline sequence (IMGT/DomainGapAlign; IGHV1-04/IGHJ2; IGKV1-117/IGKJ1). Differential residues are highlighted by asterisks (*).
Exemplary MS/MS spectra in support of the assigned sequences are shown below the alignments. Peptide sequence and fragment coverage are indicated on top of the spectra, with b/c ions indicated in blue and y/z ions in red. The same coloring is used to annotate peaks in the spectra, with additional peaks such as intact/charge reduced precursors, neutral losses and immonium ions indicated in green. Note that to prevent overlapping peak labels, only a subset of successfully matched peaks is annotated.

We next applied our sequencing protocol to the mouse monoclonal anti-FLAG-M2 antibody as a test case [24]. Despite the widespread use of anti-FLAG-M2 to detect and purify FLAG-tagged proteins [32], the only publicly available sequences can be found in the crystal structure of the Fab [33]. The modelled sequence of the original crystal structure had to be inferred from germline sequences that could match the experimental electron density and also includes many placeholder Alanines at positions that could not be straightforwardly interpreted. The full anti-FLAG-M2 dataset from the 9 proteases included 3371 peptide reads (with scores >=500); 1983 with superior stepped HCD fragmentation spectra, and 1388 with superior EThcD spectra. We achieved full sequence coverage of the variable regions of both heavy and light chains, with a median depth of coverage in the CDRs ranging from 32 to 192 (see Table S1).
As for Herceptin, the depth of coverage was better in the light chain compared to the heavy chain (see Figures S1-S2). The full MS-based anti-FLAG-M2 sequences can be found in FASTA format in the supplementary information.

Figure 3. Validation of the MS-based anti-FLAG-M2 sequence. A) The previously published crystal structure of the anti-FLAG-M2 Fab was remodeled with the experimentally determined sequence, shown in surface rendering with CDRs and differential residues highlighted in colors. B) 2Fo-Fc electron density of the new refined map contoured at 1 RMSD is shown in blue and Fo-Fc positive difference density of the original deposited map contoured at 1.7 RMSD in green around the CDR loops of the heavy and light chains. Differential residues between the published crystal structure and the model based on our antibody sequencing are indicated in purple. C) Western blot validation of the synthetic recombinant anti-FLAG-M2 antibody produced with the experimentally determined sequence demonstrates equivalent FLAG-tag binding compared to commercial anti-FLAG-M2 (see also Figure S3).

The MS-based sequences of anti-FLAG-M2 are shown alongside the crystal structure sequences and the inferred germline precursors with exemplary MS/MS spectra for the CDRs in Figure 2. The experimentally determined sequence reveals that anti-FLAG-M2 is a mouse IgG1, with an IGHV1-04/IGHJ2 heavy chain and IGKV1-117/IGKJ1 kappa light chain.
The experimentally determined sequence differs from the sequence in the Fab crystal structure at 34 and 9 positions in the heavy and light chain, respectively. To validate the experimentally determined sequences, we remodeled the crystal structure using the MS-based heavy and light chains, resulting in much improved model statistics (see Figure 3 and Table S2). The experimental electron densities show excellent support of the MS-based sequence (as shown for the CDRs in Figure 3B). A notable exception is L51 in CDRH2 of the heavy chain. The MS-based sequence was assigned as Leucine, but the experimental electron density supports assignment of the isomer Isoleucine instead (see Figure S3). In contrast to the original model, our new MS-based model reveals a predominantly positively charged paratope (see Figure S4), which potentially complements the -3 net charge of the FLAG tag epitope (DYKDDDDK) to mediate binding. The experimentally determined anti-FLAG-M2 sequence, with the L51I correction, was further validated by testing binding of the synthetic recombinant antibody to a purified FLAG-tagged protein in Western blot analysis (see Figure 3C and S5). The synthetic recombinant antibody showed equivalent binding compared to the original antibody sample used for sequencing, confirming that the experimentally determined sequence is reliable to obtain the recombinant antibody product with the desired functional profile.

Discussion
There are four other monoclonal antibody sequences against the FLAG tag publicly available through the ABCD (AntiBodies Chemically Defined) database [34-36]. Comparison of the CDRs of anti-FLAG-M2 with these additional four monoclonal antibodies reveals a few common motifs that may determine FLAG-tag binding specificity (see Table S3). In the heavy chain, the only common motif between all five monoclonals is that the first three residues of CDRH1 follow a GXS sequence.
In addition, the last three residues of CDRH3 of anti-FLAG-M2 are YDY, similar to MDY in H28, and YDF in EEh13.6 (and EEh14.3 also ends CDRH3 with an aromatic F residue). In contrast to the heavy chain, the CDRs of the light chain are almost completely conserved in 4/5 monoclonals with only minimal differences compared to germline. The anti-FLAG-M2 and H28 monoclonals were specifically raised in mice against the FLAG-tag epitope [24, 35], whereas the computationally designed EEh13.6 and EEh14.3 monoclonals contain the same light chain from an EE-dipeptide tag directed antibody [34]. This suggests that the IGKV1-117/IGKJ1 light chain may be a common determinant of binding to a small negatively charged peptide epitope like the FLAG-tag and is readily available as a hardcoded germline sequence in the mouse antibody repertoire.

The availability of the anti-FLAG-M2 sequences may contribute to the wider use of this important research tool, as well as the development and engineering of better FLAG-tag directed antibodies. This example illustrates that our MS-based sequencing protocol yields robust and reliable monoclonal antibody sequences. The protocol described here also formed the basis of a recent application where we sequenced an antibody directly from patient-derived serum, in combination with top-down fragmentation of the isolated Fab fragment [37].
The dual fragmentation strategy yields high-quality spectra suitable for de novo sequencing and may further contribute to the exciting prospect of a new era of serology in which antibody sequences can be directly obtained from bodily fluids.

Methods
Sample preparation
Anti-FLAG-M2 antibody was purchased from Sigma (catalogue number F1804). Herceptin was provided by Roche (Penzberg, Germany). 27 µg of each sample was denatured in 2% sodium deoxycholate (SDC), 200 mM Tris-HCl, 10 mM tris(2-carboxyethyl)phosphine (TCEP), pH 8.0 at 95°C for 10 min, followed by 30 min incubation at 37°C for reduction. The sample was then alkylated by adding iodoacetic acid to a final concentration of 40 mM and incubated in the dark at room temperature for 45 min. For each digestion, 3 µg of sample was then digested by one of the following proteases: trypsin, chymotrypsin, lysN, lysC, gluC, aspN, α-lytic protease (aLP), thermolysin and elastase in a 1:50 ratio (w:w) in a total volume of 100 µL of 50 mM ammonium bicarbonate at 37°C for 4 h. After digestion, SDC was removed by adding 2 µL formic acid (FA) and centrifugation at 14000 g for 20 min. Following centrifugation, the supernatant containing the peptides was collected for desalting on a 30 µm Oasis HLB 96-well plate (Waters). The Oasis HLB sorbent was activated with 100% acetonitrile and subsequently equilibrated with 10% formic acid in water. Next, peptides were bound to the sorbent, washed twice with 10% formic acid in water and eluted with 100 µL of 50% acetonitrile/5% formic acid in water (v/v). The eluted peptides were vacuum-dried and reconstituted in 100 µL 2% FA.

Mass Spectrometry
The digested peptides (single injection of 0.2 µg) were separated by online reversed phase chromatography on an Agilent 1290 UHPLC (column packed with Poroshell 120 EC C18; dimensions 50 cm x 75 µm, 2.7 µm, Agilent Technologies) coupled to a Thermo Scientific Orbitrap Fusion mass spectrometer. Samples were eluted over a 90 min gradient from 0% to 35% acetonitrile at a flow rate of 0.3 µL/min. Peptides were analyzed with a resolution setting of 60000 in MS1. MS1 scans were obtained with standard AGC target, maximum injection time of 50 ms, and scan range 350-2000. The precursors were selected with a 3 m/z window and fragmented by stepped HCD as well as EThcD. The stepped HCD fragmentation included steps of 25%, 35% and 50% NCE. EThcD fragmentation was performed with calibrated charge-dependent ETD parameters and 27% NCE supplemental activation. For both fragmentation types, MS2 scans were acquired at 30000 resolution, 800% Normalized AGC target, 250 ms maximum injection time, scan range 120-3500.

MS Data Analysis
Automated de novo sequencing was performed with Supernovo (version 3.10, Protein Metrics Inc.). Custom parameters were used as follows: non-specific digestion; precursor and product mass tolerance were set to 12 ppm and 0.02 Da, respectively; carboxymethylation (+58.005479) on cysteine was set as fixed modification; oxidation on methionine and tryptophan was set as variable common 1 modification; carboxymethylation on the N-terminus, pyroglutamic acid conversion of glutamine and glutamic acid on the N-terminus, and deamidation on asparagine/glutamine were set as variable rare 1 modifications. Peptides were filtered for score >=500 for the final evaluation of spectrum quality and (depth of) coverage.
Supernovo generates peptide groups for redundant MS/MS spectra, including also when stepped HCD and EThcD fragmentation on the same precursor both generate good peptide-spectrum matches. In these cases only the best-matched spectrum is counted as representative for that group. This criterion was used in counting the number of peptide reads reported in Table S1. Germline sequences and CDR boundaries were inferred using IMGT/DomainGapAlign [38-39].

Revision of the anti-FLAG-M2 Fab crystal structure model
As a starting point for model building, the reflection file and coordinates of the published anti-FLAG-M2 Fab crystal structure were used (PDB ID: 2G60) [33]. Care was taken to use the original Rfree labels of the deposited reflection file for refinement, so as not to introduce extra model bias. Differential residues between this structure and our mass spectrometry-derived anti-FLAG sequence were manually mutated and fitted in the density using Coot [40]. Many spurious water molecules that caused severe steric clashes in the original model were also manually removed in Coot. Densities for two sulfate ions and one chloride ion were identified and built into the model. The original crystallization solution contained 0.1 M ammonium sulfate. Iterative cycles of model geometry optimization in real space in Coot and reciprocal space refinement by Phenix were used to generate the final model, which was validated with Molprobity [41-42].

Cloning and expression of synthetic recombinant anti-FLAG-M2
To recombinantly express full-length anti-FLAG-M2, the proteomic sequences of both the light and heavy chains were reverse-translated and codon optimized for expression in human cells using the Integrated DNA Technologies (IDT) web tool (http://www.idtdna.com/CodonOpt) [43]. For the linker and Fc region of the heavy chain, the standard mouse Ig gamma-1 (IGHG1) amino acid sequence (UniProt P01868.1) was used. An N-terminal secretion signal peptide derived from human IgG light chain (MEAPAQLLFLLLLWLPDTTG) was added to the N-termini of both heavy and light chains. BamHI and NotI restriction sites were added to the 5' and 3' ends of the coding regions, respectively. Only for the light chain, a double stop codon was introduced at the 3' site before the NotI restriction site. The coding regions were subcloned using BamHI and NotI restriction-ligation into a pRK5 expression vector with a C-terminal octahistidine tag between the NotI site and a double stop codon 3' of the insert, so that only the heavy chain has a C-terminal AAAHHHHHHHH sequence for nickel-affinity purification (the triple alanine resulting from the NotI site). The L51I correction in the heavy chain was introduced later (after observing it in the crystal structure) by IVA cloning [44]. Expression plasmids for the heavy and light chain were mixed in a 1:1 (w/w) ratio for transient transfection in HEK293 cells with polyethylenimine, following standard procedures. Medium was collected 6 days after transfection and cells were spun down by 10 minutes of centrifugation at 1000 g.
Antibody was directly purified from the supernatant using Ni-sepharose excel resin (Cytiva Life Sciences), washing with 500 mM NaCl, 2 mM CaCl2, 15 mM imidazole, 20 mM HEPES pH 7.8 and eluting with 500 mM NaCl, 2 mM CaCl2, 200 mM imidazole, 20 mM HEPES pH 7.8.

Western blot validation of anti-FLAG-M2 binding
To test binding of our recombinant anti-FLAG-M2 to the FLAG-tag epitope, compared to the commercially available anti-FLAG-M2 (Sigma), we used both antibodies to probe Western blots of a FLAG-tagged protein in parallel. Purified Rabies virus glycoprotein ectodomain (SAD B19 strain, UniProt residues 20-450) with or without a C-terminal FLAG-tag followed by a foldon trimerization domain and an octahistidine tag was heated to 95 °C in XT sample buffer (Biorad) for 5 minutes. Samples were run in duplicate on a Criterion XT 4-12% polyacrylamide gel (Biorad) in MES XT buffer (Biorad) before Western blot transfer to a nitrocellulose membrane in tris-glycine buffer (Biorad) with 20% methanol. The membrane was blocked with 5% (w/v) dry non-fat milk in phosphate-buffered saline (PBS) overnight at 4 °C. The membrane was cut in two and each half was probed with either commercial (Sigma) or recombinant anti-FLAG-M2 at 1 µg/mL in PBS for 45 minutes.
After washing three times with PBST (PBS with 0.1% v/v Tween 20), polyclonal goat anti-mouse antibody fused to horseradish peroxidase (HRP) was used to detect binding of anti-FLAG-M2 to the FLAG-tagged protein on both membranes. The membranes were washed three more times with PBST before applying enhanced chemiluminescence (ECL; Pierce) reagent to image the blots in parallel.

Data Availability
The raw LC-MS/MS data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier PXD023419. The coordinates and reflection file with phases for the remodeled crystal structure of the anti-FLAG-M2 Fab have been deposited in the Protein Data Bank under accession code 7BG1.

Acknowledgements
Herceptin was a kind gift from Roche (Penzberg, Germany). We would like to acknowledge support by Protein Metrics Inc. through access to Supernovo software and helpful discussion on de novo antibody sequencing. We would like to thank everyone in the Biomolecular Mass Spectrometry and Proteomics group at Utrecht University for support and helpful discussions. This research was funded by the Dutch Research Council NWO Gravitation 2013 BOO, Institute for Chemical Immunology (ICI; 024.002.009).

Author Contributions
WP and JS conceived of the project. WP carried out the MS experiments. WP and JS analyzed the MS data. MFP remodeled the crystal structure.
MFP cloned and produced the synthetic recombinant antibody and carried out Western blotting. JS supervised the project. JS wrote the first draft and all authors contributed to preparing the final version of the manuscript.

Competing Interests
The authors declare no competing interests.

References
1. Tonegawa, S., Somatic generation of antibody diversity. Nature 1983, 302 (5909), 575-581.
2. Watson, C. T.; Glanville, J.; Marasco, W. A., The individual and population genetics of antibody immunity. Trends in immunology 2017, 38 (7), 459-470.
3. Carter, P. J.; Lazar, G. A., Next generation antibody drugs: pursuit of the 'high-hanging fruit'. Nature Reviews Drug Discovery 2018, 17 (3), 197.
4. Grilo, A. L.; Mantalaris, A., The increasingly human and profitable monoclonal antibody market. Trends in biotechnology 2019, 37 (1), 9-16.
5. Baker, M., Blame it on the antibodies. Nature 2015, 521 (7552), 274.
6. Uhlen, M.; Bandrowski, A.; Carr, S.; Edwards, A.; Ellenberg, J.; Lundberg, E.; Rimm, D. L.; Rodriguez, H.; Hiltke, T.; Snyder, M., A proposal for validation of antibodies. Nature methods 2016, 13 (10), 823-827.
7. Fischer, N. In Sequencing antibody repertoires: the next generation, MAbs, Taylor & Francis: 2011; pp 17-20.
8. Georgiou, G.; Ippolito, G. C.; Beausang, J.; Busse, C. E.; Wardemann, H.; Quake, S. R., The promise and challenge of high-throughput sequencing of the antibody repertoire. Nature biotechnology 2014, 32 (2), 158-168.
9. Robinson, W.
H., Sequencing the functional antibody repertoire—diagnostic and therapeutic discovery. Nature Reviews Rheumatology 2015, 11 (3), 171.
10. Boutz, D. R.; Horton, A. P.; Wine, Y.; Lavinder, J. J.; Georgiou, G.; Marcotte, E. M., Proteomic identification of monoclonal antibodies from serum. Analytical chemistry 2014, 86 (10), 4758-4766.
11. Castellana, N. E.; McCutcheon, K.; Pham, V. C.; Harden, K.; Nguyen, A.; Young, J.; Adams, C.; Schroeder, K.; Arnott, D.; Bafna, V., Resurrection of a clinical antibody: Template proteogenomic de novo proteomic sequencing and reverse engineering of an anti-lymphotoxin-α antibody. Proteomics 2011, 11 (3), 395-405.
12. Chen, J.; Zheng, Q.; Hammers, C. M.; Ellebrecht, C. T.; Mukherjee, E. M.; Tang, H.-Y.; Lin, C.; Yuan, H.; Pan, M.; Langenhan, J., Proteomic analysis of pemphigus autoantibodies indicates a larger, more diverse, and more dynamic repertoire than determined by B cell genetics. Cell reports 2017, 18 (1), 237-247.
13. Cheung, W. C.; Beausoleil, S. A.; Zhang, X.; Sato, S.; Schieferl, S. M.; Wieler, J. S.; Beaudet, J. G.; Ramenani, R. K.; Popova, L.; Comb, M. J., A proteomics approach for the identification and cloning of monoclonal antibodies from serum. Nature biotechnology 2012, 30 (5), 447-452.
14. Guthals, A.; Gan, Y.; Murray, L.; Chen, Y.; Stinson, J.; Nakamura, G.; Lill, J. R.; Sandoval, W.; Bandeira, N., De novo MS/MS sequencing of native human antibodies. Journal of proteome research 2017, 16 (1), 45-54.
15. Lee, J.; Boutz, D. R.; Chromikova, V.; Joyce, M. G.; Vollmers, C.; Leung, K.; Horton, A. P.; DeKosky, B. J.; Lee, C.-H.; Lavinder, J. J., Molecular-level analysis of the serum antibody repertoire in young adults before and after seasonal influenza vaccination. Nature medicine 2016, 22 (12), 1456-1464.
16. Lee, J.; Paparoditis, P.; Horton, A. P.; Frühwirth, A.; McDaniel, J. R.; Jung, J.; Boutz, D. R.; Hussein, D.
A.; Tanno, Y.; Pappas, L., Persistent antibody clonotypes dominate the serum response to influenza over multiple years and repeated vaccinations. Cell host & microbe 2019, 25 (3), 367-376.e5.
17. Lindesmith, L. C.; McDaniel, J. R.; Changela, A.; Verardi, R.; Kerr, S. A.; Costantini, V.; Brewer-Jensen, P. D.; Mallory, M. L.; Voss, W. N.; Boutz, D. R., Sera antibody repertoire analyses reveal mechanisms of broad and pandemic strain neutralizing responses after human norovirus vaccination. Immunity 2019, 50 (6), 1530-1541.e8.
18. Bandeira, N.; Pham, V.; Pevzner, P.; Arnott, D.; Lill, J. R., Automated de novo protein sequencing of monoclonal antibodies. Nature biotechnology 2008, 26 (12), 1336-1338.
19. Rickert, K. W.; Grinberg, L.; Woods, R. M.; Wilson, S.; Bowen, M. A.; Baca, M. In Combining phage display with de novo protein sequencing for reverse engineering of monoclonal antibodies, MAbs, Taylor & Francis: 2016; pp 501-512.
20. Savidor, A.; Barzilay, R.; Elinger, D.; Yarden, Y.; Lindzen, M.; Gabashvili, A.; Tal, O. A.; Levin, Y., Database-Independent Protein Sequencing (DiPS) Enables Full-Length de Novo Protein and Antibody Sequence Determination. Molecular & Cellular Proteomics 2017, 16 (6), 1151-1161.
21. Sen, K. I.; Tang, W. H.; Nayak, S.; Kil, Y. J.; Bern, M.; Ozoglu, B.; Ueberheide, B.; Davis, D.; Becker, C., Automated antibody de novo sequencing and its utility in biopharmaceutical discovery. Journal of The American Society for Mass Spectrometry 2017, 28 (5), 803-810.
22. Sousa, E.; Olland, S.; Shih, H. H.; Marquette, K.; Martone, R.; Lu, Z.; Paulsen, J.; Gill, D.; He, T., Primary sequence determination of a monoclonal antibody against α-synuclein using a novel mass spectrometry-based approach. International Journal of Mass Spectrometry 2012, 312, 61-69.
23. Tran, N. H.; Rahman, M. Z.; He, L.; Xin, L.; Shan, B.; Li, M., Complete de novo assembly of monoclonal antibody sequences. Scientific reports 2016, 6 (1), 1-10.
24. Brizzard, B. L.; Chubet, R. G.; Vizard, D., Immunoaffinity purification of FLAG epitope-tagged bacterial alkaline phosphatase using a novel monoclonal antibody and peptide elution. Biotechniques 1994, 16 (4), 730-735.
25. Sigma-Aldrich anti-FLAG-M2 F1804 product page. https://www.sigmaaldrich.com/catalog/product/sigma/f1804?lang=en&region=NL (accessed 05-01-2021).
26. Paizs, B.; Suhai, S., Fragmentation pathways of protonated peptides. Mass spectrometry reviews 2005, 24 (4), 508-548.
27. Diedrich, J. K.; Pinto, A. F.; Yates III, J. R., Energy dependence of HCD on peptide fragmentation: stepped collisional energy finds the sweet spot. Journal of the American Society for Mass Spectrometry 2013, 24 (11), 1690-1699.
28. Frese, C. K.; Altelaar, A. M.; van den Toorn, H.; Nolting, D.; Griep-Raming, J.; Heck, A. J.; Mohammed, S., Toward full peptide sequence coverage by dual fragmentation combining electron-transfer and higher-energy collision dissociation tandem mass spectrometry. Analytical chemistry 2012, 84 (22), 9668-9673.
29. Frese, C. K.; Zhou, H.; Taus, T.; Altelaar, A. M.; Mechtler, K.; Heck, A. J.; Mohammed, S., Unambiguous phosphosite localization using electron-transfer/higher-energy collision dissociation (EThcD). Journal of proteome research 2013, 12 (3), 1520-1525.
30. Carter, P.; Presta, L.; Gorman, C. M.; Ridgway, J.; Henner, D.; Wong, W.; Rowland, A. M.; Kotts, C.; Carver, M. E.; Shepard, H.
M., Humanization of an anti-p185HER2 antibody for human cancer therapy. Proceedings of the National Academy of Sciences 1992, 89 (10), 4285-4289.
31. Slamon, D. J.; Leyland-Jones, B.; Shak, S.; Fuchs, H.; Paton, V.; Bajamonde, A.; Fleming, T.; Eiermann, W.; Wolter, J.; Pegram, M., Use of chemotherapy plus a monoclonal antibody against HER2 for metastatic breast cancer that overexpresses HER2. New England journal of medicine 2001, 344 (11), 783-792.
32. Einhauer, A.; Jungbauer, A., The FLAG™ peptide, a versatile fusion tag for the purification of recombinant proteins. Journal of biochemical and biophysical methods 2001, 49 (1-3), 455-465.
33. Roosild, T. P.; Castronovo, S.; Choe, S., Structure of anti-FLAG M2 Fab domain and its use in the stabilization of engineered membrane proteins. Acta Crystallographica Section F: Structural Biology and Crystallization Communications 2006, 62 (9), 835-839.
34. Entzminger, K. C.; Hyun, J.-m.; Pantazes, R. J.; Patterson-Orazem, A. C.; Qerqez, A. N.; Frye, Z. P.; Hughes, R. A.; Ellington, A. D.; Lieberman, R. L.; Maranas, C. D., De novo design of antibody complementarity determining regions binding a FLAG tetra-peptide. Scientific reports 2017, 7 (1), 1-11.
35. Ikeda, K.; Koga, T.; Sasaki, F.; Ueno, A.; Saeki, K.; Okuno, T.; Yokomizo, T., Generation and characterization of a human-mouse chimeric high-affinity antibody that detects the DYKDDDDK FLAG peptide. Biochemical and Biophysical Research Communications 2017, 486 (4), 1077-1082.
36. Lima, W. C.; Gasteiger, E.; Marcatili, P.; Duek, P.; Bairoch, A.; Cosson, P., The ABCD database: a repository for chemically defined antibodies. Nucleic acids research 2020, 48 (D1), D261-D264.
37. Bondt, A.; Hoek, M.; Tamara, S.; de Graaf, B.; Peng, W.; Schulte, D.; den Boer, M. A.; Greisch, J.-F.; Varkila, M. R.; Snijder, J., Human Plasma IgG1 Repertoires are Simple, Unique, and Dynamic. SSRN 2020.
38. Ehrenmann, F.; Kaas, Q.; Lefranc, M.-P., IMGT/3Dstructure-DB and IMGT/DomainGapAlign: a database and a tool for immunoglobulins or antibodies, T cell receptors, MHC, IgSF and MhcSF. Nucleic acids research 2010, 38 (suppl_1), D301-D307.
39. Ehrenmann, F.; Lefranc, M.-P., IMGT/DomainGapAlign: IMGT standardized analysis of amino acid sequences of variable, constant, and groove domains (IG, TR, MH, IgSF, MhSF). Cold Spring Harbor Protocols 2011, 2011 (6), pdb.prot5636.
40. Emsley, P.; Cowtan, K., Coot: model-building tools for molecular graphics. Acta Crystallographica Section D: Biological Crystallography 2004, 60 (12), 2126-2132.
41. Afonine, P. V.; Grosse-Kunstleve, R. W.; Echols, N.; Headd, J. J.; Moriarty, N. W.; Mustyakimov, M.; Terwilliger, T. C.; Urzhumtsev, A.; Zwart, P. H.; Adams, P. D., Towards automated crystallographic structure refinement with phenix.refine. Acta Crystallographica Section D: Biological Crystallography 2012, 68 (4), 352-367.
42. Chen, V. B.; Arendall, W. B.; Headd, J. J.; Keedy, D. A.; Immormino, R. M.; Kapral, G. J.; Murray, L. W.; Richardson, J. S.; Richardson, D. C., MolProbity: all-atom structure validation for macromolecular crystallography. Acta Crystallographica Section D: Biological Crystallography 2010, 66 (1), 12-21.
43. Fuglsang, A., Codon optimizer: a freeware tool for codon optimization. Protein expression and purification 2003, 31 (2), 247-249.
44. García-Nafría, J.; Watson, J. F.; Greger, I.
H., IVA cloning: a single-tube universal cloning system exploiting bacterial in vivo assembly. Scientific reports 2016, 6, 27459.