key: cord-0025001-d1ptk3py
authors: Bondt, Albert; Hoek, Max; Tamara, Sem; de Graaf, Bastiaan; Peng, Weiwei; Schulte, Douwe; van Rijswijck, Danique M.H.; den Boer, Maurits A.; Greisch, Jean-François; Varkila, Meri R.J.; Snijder, Joost; Cremer, Olaf L.; Bonten, Marc J.M.; Heck, Albert J.R.
title: Human plasma IgG1 repertoires are simple, unique, and dynamic
date: 2021-12-15
journal: Cell Syst
DOI: 10.1016/j.cels.2021.08.008
sha: d632b08551d5c99bf957a7de5a94db01a8f45bd4
doc_id: 25001
cord_uid: d1ptk3py

Although humans can produce billions of IgG1 variants through recombination and hypermutation, the diversity of IgG1 clones circulating in human blood plasma has largely eluded direct characterization. Here, we combined several mass-spectrometry-based approaches to reveal that the circulating IgG1 repertoire in human plasma is dominated by a limited number of clones in healthy donors and septic patients. We observe that each individual donor exhibits a unique serological IgG1 repertoire, which remains stable over time but can adapt rapidly to changes in physiology. We introduce an integrative protein- and peptide-centric approach to obtain and validate a full sequence of an individual plasma IgG1 clone de novo. This IgG1 clone emerged at the onset of a septic episode and exhibited a high mutation rate (13%) compared with the closest matching germline DNA sequence, highlighting the importance of de novo sequencing at the protein level. A record of this paper’s transparent peer review process is included in the .

Correspondence a.j.r.heck@uu.nl

The human body produces immunoglobulins (Igs) to help combat pathogens. The number of distinct IgG molecules our body can produce exceeds several billions. In contrast to this near-infinite theoretical number, we reveal here, as monitored directly by LC-MS, that only a few dozen distinct clones dominate in abundance the total spectrum of plasma IgG1s of both healthy and diseased donors. Our data indicate that each donor's IgG1 repertoire is distinctively unique. We longitudinally profile an individual's IgG1 repertoire and observe the occurrence or disappearance of specific IgGs over time in response to changes in physiology, e.g., during a septic episode. As a proof of concept, we show that individual plasma IgG1 clones can be quantified and fully sequenced and identified by using a combination of top-down and bottom-up proteomics.

The human immune system protects us not only from threats posed by pathogens but also cancer and various other diseases. The immune response in health and disease is crucially dependent on each person's repertoire of immune cells, antibodies, and other circulating plasma proteins. A detailed molecular view of these plasma components is crucial to understanding how they affect each individual's immune response. Immunoglobulins (Igs) represent some of the most important molecules in the human immune system. Ig molecules consist of two identical heavy chains and two identical light chains, held together by a network of disulfide bridges. The heavy chains possess three (IgG, IgA, and IgD) to four (IgM and IgE) immunoglobulin domains with large, conserved regions, which play a role in receptor binding and complement activation. Similar to the heavy chain, the C-terminal domain of the light chain is constant. On the other hand, for both heavy and light chain, the sequence of the N-terminal Ig domains is hypervariable and contains the recognitiondetermining parts, better known as complementarity-determining regions (CDRs), of the molecule. They are enclosed in the two fragment antigen-binding (Fab) arms of the antibody, consisting of the light chain and the N-terminal parts of the heavy chain (Fd).

The variable regions of the antibody, in particular the CDRs, are optimized to recognize antigens by a process known as affinity maturation. The best antigen binders, modified through somatic recombination and hypermutation of numerous coding gene segment variants, give rise to the mature IgG secreting plasma B cells that produce the antibodies that end up in our circulation. The circulating antibodies, thus, consist of the fully matured heavy-and light-chain variable domain sequences that harbor the CDRs, joined by generally less sequence-variable framework regions (FR) . Each unique combination of mature chains is called an Ig clone.

Considering the genes encoding the variable domain sections and the known genomic rearrangements, somatic hypermutations, and post-transcriptional processes that join these sections-resulting into the ultimate protein products-it has been estimated that in humans the theoretical molecular Ig diversity may extend beyond 10 15 (Schroeder, 2006) . Not all theoretically possible Ig clones will be expressed in the human body, since the number of B cells in a human body is several orders of magnitude lower (1-2 3 10 11 ) (Apostoaei and Trabalka, 2012) . Nevertheless, it has been assumed that the actual repertoire of circulating Igs is extremely large and diverse (Briney et al., 2019; Soto et al., 2019) .

Recombinantly expressed clones (mainly IgG) have become a major class of therapeutics, used to fight multiple types of pathologies such as cancers and various infectious diseases. Recent developments have moved the field toward using therapeutic monoclonal antibody (mAb) sequences derived from human subjects instead of laboratory animals; this trend is exemplified by successful new treatments for Ebola (Corti et al., 2016; Dyer, 2019; Mulangu et al., 2019) and COVID-19 (Jones et al., 2021) . These therapeutic antibody sequences are inferred from genetic material recovered from patients that successfully overcame the disease. The ability to detect and identify individual mature IgG protein sequences directly from donor specimens would aid such efforts.

To experimentally determine the Ig repertoire, attempts have been made to sequence Ig nucleic acids from bulk B cell populations or B cell subsets from single donors. These Igs are analyzed with high-throughput sequencing (Ig-seq or Repseq) at the DNA or RNA level (Georgiou et al., 2014) , resulting in datasets of tens to hundreds of thousands of unique reads of variable abundance (Vollmers et al., 2013) . Unfortunately, these analyses at the level of DNA and RNA do not measure the actual antibodies of interest, and the presence of a cognate BCR sequence in the B cell population provides no information regarding abundance levels of the antibodies that end up in circulation.

Alternatively, the challenge could be approached from the protein level, analyzing the Ig repertoire present in circulation. The most abundant Ig in human plasma is IgG, at a concentration of approximately 10 mg/mL during health (Cassidy and Nordby, 1975; Gonzalez-Quintela et al., 2008) . Of the four IgG subclasses, IgG1 is the most abundant, accounting for more than 50% of all IgGs (van der Giessen et al., 1975) in most people. Given the extremely high theoretical limits on Ig diversity and the large number of experimentally determined variants, most researchers have refrained from analyzing plasma antibodies directly at the protein level. It has been mostly assumed that it is impossible to detect any single Ig clone against the expected background of thousands to millions of other clones.

However, in recent years, several attempts have been made to identify plasma Igs directly. G. Georgiou and colleagues should be considered pioneers. In their method (also referred to as Igseq, but protein-based instead of solely gene-based), IgGs are purified from plasma, and tryptic Ig peptides are characterized using liquid chromatography coupled online to tandem mass spectrometry (LC-MS/MS), focusing on the detection of IgG heavy-chain CDR3 peptides Lavinder et al., 2014; Lee et al., 2016 Lee et al., , 2019 Wine et al., 2013) . Another recently developed proteomics approach uses LC-MS to profile intact light chains purified from plasma (Barnidge et al., 2014a (Barnidge et al., , 2014b He et al., 2019 He et al., , 2017 Mills et al., 2015; Sharpley et al., 2019) . In another report, profiling and sequencing of intact Fabs was attempted, but the sensitivity was too low, and only a few sequence tags could be derived from separated light chains (Wang et al., 2019) .

In these approaches, information about the heavy-and lightchain pairing is often lost, which is unfortunate as only the com-bined CDRs from both heavy and light chain provide the full complementarity against an antigen. Of note, in nearly all these studies, only a subset of plasma immunoglobulins recognizing and binding a specific antigen was targeted, or a mAb spiked into plasma was used as a model. Nevertheless, the remarkable observation was made that a person's plasma Ig repertoire could be several orders less variable than assumed based on the available B cell repertoire and likely dominated by only a limited number of clones. Whether this observed number of IgGs is a consequence of the targeted analysis of antigen-specific IgGs or diseases with monoclonal IgG overexpression (gammopathy) remained unclear.

Here, we introduce a sensitive and efficient approach for quantitative plasma IgG1 clone profiling. The method was applied to a sample set of two healthy control donors, as well as eight critically ill patients, from which sequential plasma samples were retrieved while developing nosocomial sepsis, experiencing a dramatic immunological change in a relatively short time span ( Figure 1A ). The application of our method revealed several important properties of the human plasma IgG1 repertoire: (1) the total IgG1 repertoire is dominated by a few dozen clones, but (2) is unique for each individual, and (3) single clones are differentially affected by physiological changes. Next, as proof of concept, we de novo sequenced a single plasma clone-that appeared at sepsis onset-directly from human plasma, only made possible by using iteratively a combination of protein-centric and peptide-centric proteomics. To validate the correctness of the derived de novo IgG1 sequence, we produced its recombinant mAb equivalent and compared its key structural features with the plasma clone. We foresee that this approach will unlock the potential of mass spectrometric analysis and identification of disease-responsive IgGs that may be directly evaluated and used as therapeutic agents, as they do already represent fully matured, fully human Abs.

Mass-spectrometry-based Fab profiling of the human plasma repertoire To chart and monitor the nature of the plasma IgG1 repertoire, we started our analysis with 10 mL of plasma, derived from a single donor. From such a sample, we first captured all intact IgGs using affinity beads. Subsequently, the captured IgG molecules were digested using the highly specific Ig degrading enzyme (IgdE), cleaving specifically IgG1's at a defined site in the upper hinge region, resulting in the segregated Fc (that remains bound to the affinity beads) and two identical Fabs ( Figure 1B ) (Spoerry et al., 2016) . We focused on the Fab fragments derived from the intact IgGs because this (1) concentrates the clonal signal since each IgG1 provides two identical Fab molecules, (2) results in more homogeneous mass profiles by removing the Fc portions that harbor two heterogeneous N-glycosylation sites, and (3) retains all hypervariable CDRs, which define the unique identity and antigen recognition of each clone.

Following elution of the IgG1 Fabs, all these intact 45-53-kDa Fab molecules were subjected to reversed-phase LC-MS. All individual Fab fragments were subsequently characterized by their distinctive mass and chromatography retention time. In our analyses, we spiked-in two monoclonal IgG1 antibodies (Table S1 , mAbs #1 and #2) of known sequence at a defined concentration. These mAbs were used as internal standards for mass and retention time calibration and quantification of all the other distinctive plasma IgG1 clones. This also allowed us to calculate the precision and accuracy of retention time, mass, and quantification in our measurements as illustrated in Figure S1A .

Using a mixture of six monoclonal antibodies (Table S1 ) spiked into a single-donor plasma background, we furthermore observed linear relationship between (1) the quantity of mAbs that were spiked into the plasma sample, and (2) the quantity that is observed (R 2 = 0.99, Figure S1B ), using a dilution/titration with 4,000, 800, 200, and 20 ng per mAb. Of note, no discrepancy was observed for the Fab glycosylated mAb as compared to the other mAbs. To evaluate the repeatability (technical replicate and sample preparation replicate), the 100 most abundant plasma-derived clones in this sample were quantified in multiple replicates ( Figure S1C ). Finally, one of the samples was injected three times to serve as injection replicates (Figure S1C, #1). From all these validation experiments, we concluded that, by using the approach depicted here (Figure 1 ), the repertoire of Fab clones could be accurately and reproducibly determined from as little as 10 mL of plasma obtained from a single donor at a single time point.

Plasma IgG1 repertoires are dominated by a few clones Next, we analyzed in parallel a set of 32 plasma samples obtained from eight patients of the Dutch molecular diagnosis and risk stratification of sepsis (MARS) cohort ( Figure 1A ; Table  S2 ). All these patients underwent major gastrointestinal surgery and subsequently developed an infectious complication (i.e., anastomotic leakage or pneumonia) resulting in sepsis. Plasma samples were obtained from the patients at four different stages: within 24 h of surgery when no signs of sepsis were present (sample T1; between t = À19 and t = À2 days), on two consecutive days after onset of sepsis (samples T2 and T3; t = 0 and t = 1 day), and upon intensive care unit (ICU) discharge, when the sepsis had been resolved (sample T4; between t = 3 and t = 61 days) ( Figure 1A ). In addition, to monitor the nature of the plasma IgG1 repertoire in healthy donors, we performed an identical analysis on two sets of three sequential healthy donor plasma samples, each collected roughly 1 month apart.

In marked contrast to expectations of extensive IgG1 diversity, we observed that all the LC-MS profiles of IgG1 Fab molecules were dominated by just a few dozen peaks, both in the 32 sepsis plasma samples as well as in the six plasma samples of the healthy donors (Data S1 and S2). In each of the LC-MS runs, we could pick up distinctive IgG1 signals of between 35 and (B) The experimental approach taken involves IgG capturing from 10-100 mL of serum, followed by the specific enzymatic digestion of the IgG1 molecules in their hinge region, generating two identical Fab portions. All generated Fabs are collected and subsequently subjected to LC-MS analysis. The clonal repertoire is profiled, whereby each identified clone is characterized by its unique mass and retention time. A single post-sepsis clone from one of the patients (F59) was selected for de novo sequencing, combining protein-and peptide-centric mass-spectrometry-based sequencing. The extracted full sequence of the plasma IgG1 was validated by analyzing, in a similar manner, a recombinant IgG1 analog of the plasma clone. Figure 2 . Monitoring personalized plasma Fab repertoires reveals not only their simplicity and extreme donor uniqueness but also longitudinal clonal variations within a single donor (A) Heatmap illustrating the degree of overlap between the detected IgG1 repertoires in all analyzed sepsis patient plasma samples. For each of the eight donors, four sampling times were available, and Fab profiles were measured by LC-MS analysis. Each LC-MS peak, exhibiting a unique mass and retention time pair, was considered a unique clone and annotated as RT # mass . The amount of Fab molecules, based on intensity that is persistent, is quantified and shown as a (legend continued on next page) ll OPEN ACCESS Article 543 in abundance dominant clones (median 196; Data S1) that we distinguished by their masses in Dalton and retention times (RT) in minutes. Each detected clone was given a unique identifier: RT # mass .

We found that the summed concentrations of the 30 most abundant IgG1 clones account for more than two-thirds of all IgG1 molecules detected from plasma (median 71.8%, range 47.3%-98.3%, Data S1). The full lists of detected clones are provided in Data S1. In addition, the deconvoluted mass plots (similar to Figure 2C ) and the raw chromatograms supported by extracted chromatograms for all identified Fabs obtained in the LC-MS measurements are provided in Data S2, and one example of the actual raw mass spectrometry data of Fabs that we have analyzed is provided in Data S3.

Next, we looked at the cumulative mass distribution of all detected IgG1 Fabs in the plasma samples from all donors and at all time points. This cumulative mass distribution-representing more than 5,500 clones experimentally identified-resembled the expected mass distribution of over 130 million IgG1 Fabs constructed from the sequences in the ImMunoGeneTics information system (IMGT) (Lefranc and Lefranc, 2001 ) database ( Figure S1D ), revealing that we profiled a representative IgG1 repertoire.

As can be seen in Figure S1D , most Fab fragments exhibit masses between 46 and 49.5 kDa. However, we also did detect some higher Fab masses, which may be indicative of Fab glycosylation. The average mass of Fab glycans is estimated to be around 2,300 Da (Bondt et al., 2016; Hafkenscheid et al., 2017) . In two of our donors, annotated M66 and M77, we did detect relatively high levels of Fab glycosylation as shown for M66, time point 3, in Figure S2A , with the annotation of the putative Fab glycosylation annotated in Figure S2B . Still the Fab glycosylated clones represented just a few percent of the total abundance (2%-6% for donor M66 and M77). The fractional abundance of glycosylated Fabs in the other patients was between 0% and 1.86% (with a median of 0.295%) ( Figure S2C ). Also, in the two healthy donors, one displayed a fractional abundance of glycosylated Fabs of about 3% (F66H), whereas in the other donor this remained around 0.5% at all sampling time points. This fractional abundance is substantially lower than would be predicted based on the IMGT database ($11% of these 130 million sequences carry at least one consensus N-glycosylation site) and lower than the $17% described in literature (Bondt et al., 2016; Hafkenscheid et al., 2017) . However, our data reveal that the fractional abundance of glycosylated Fabs is also donor-dependent.

Plasma IgG1 repertoires are unique for each donor Next, we compared the IgG1 Fab profiles between time points not only within a single donor but also between different donors. Interindividual analyses showed that virtually none of the Fab IDs overlapped between individuals (Figures 2A, S3 , S4A, and S4B). Also, hierarchical clustering based on clone IDs clusters each donor distinctively ( Figure 2B ). Thus, each donor has its own simple albeit unique IgG1 repertoire. However, within each individual, overlap between the measured IgG1 repertoires measured across time was found to be very high, even when the time span largely exceeded the average half-life of IgG1s (Figures 2A-2D and S4) . A large portion of the most abundant IgG1s remains present throughout the sampling window of up to 2 months, although we also observe a response in the IgG1 profile due to changes in the patient's physiology (discussed below). To exclude whether these findings were due to the severe physiological state of the eight septic patient donors, we performed a similar analysis on plasma of two healthy donors. In the absence of a dramatic immunological challenge, the IgG1 profiles, as obtained from the two healthy donors, show (1) a very high stability over time within individuals and (2) an interindividual overlap in uniquely RT-and mass-identified IgG1 clones near to zero (Figures S4A and S4B) .

Longitudinal quantitative monitoring of single IgG1 clones By spiking in two recombinant IgG1 mAbs at a known concentration to the plasma samples prior to sample preparation, we could provide additional absolute quantification for the abundance of the detected IgG1s. The concentrations of the LC-MS-detected endogenous IgG1 clones present in plasma ranged from less than 0.05 up to >400 mg/mL (<300 pM up to >2.5 mM, median $6.25 nM; Data S1). Monitoring serological IgG1 repertoires over time in patients who had undergone a septic episode, we observed several distinct quantitative patterns. The most recognizable patterns are highlighted in Figures 2C-2E. There are IgG1 clones that become lower in concentration over time blue boxes) . Another category of IgG1 clones was undetectable in the plasma until post-sepsis but became abundantly present at T4 ( Figures 2C-2E , red boxes). Yet, another group of IgG1 clones was found to be rather persistent in concentration over all sampling moments ( Figures 2C-2E , green boxes). In healthy donors, the majority of clones were more persistent in concentration, although some subtle changes could be observed for some clones. percentage as indicated by the color bar. In between donors, the overlap is found to be on average 3%, whereas within a single donor at different time points the overlap was found to be in between 26% and 98%. (B) Hierarchical clustering of the Fab clonal repertoires based on correlation distance. The branch lengths depict the distance between the repertoires. Donors are colored as in Figure 1A . Full de novo sequencing of an individual plasma IgG1 clone From the data presented earlier, we can conclude that the plasma IgG1 repertoire of individual healthy and diseased donors is unique and dominated by a few dozen abundant clones. Next, we sought to identify the exact sequences of these clones to obtain further insight into their function and origin. Complete de novo sequencing of serological IgGs is notably difficult for several reasons. First, the inherent sequence hypervariability has so far proven to be highly challenging even when (personalized) genome-based sequence templates are available. Second, de novo sequencing of antibodies at the protein level by MS is hampered by the complex nature of IgG molecules, stemming from their multichain structure and the intricate network of disulfide bridges. Finally, although shotgun proteomics can be used to obtain (partial) sequences of purified mAbs (Guthals et al., 2017; Sen et al., 2017; Tran et al., 2016) , this becomes several orders of magnitude more difficult in a plasma background containing many IgG molecules of closely homologous sequences.

To tackle this challenge, we explored a hybrid and iterative approach combining state-of-the-art peptide-centric (i.e., bottom-up) and protein-centric (i.e., middle-down) mass-spectrometric sequencing methods, using dedicated algorithms to mix-and-match the extracted proteomics-based sequencing data. As proof of concept, we attempted to fully sequence the light and heavy chain of a Fab derived from a single highly abundant IgG1 clone observed in donor F59. This donor showed a plasma IgG1 repertoire dominated by two clones in particular: 24.4 1 47,359.4 (average mass 47,359.4 Da, retention time 24.4 min) and 20.6 2 47,025.7 ( Figure 3A ). We focused on the 24.4 1 47,359.4 clone, as this clone appeared exclusively after the onset of sepsis.

Following fractionation and selection of the 24.4 1 47,359.4 clone, we subjected this Fab to mass-spectrometry-based de novo sequencing, combining data from middle-down and bottom-up proteomics ( Figure 1B) . The de novo sequence information from both approaches was used to first select several closely matching light-and heavy-chain templates from the publicly available IMGT database of IgG germline sequences ( Figure S5 ; Tables S3 and S4 ). Subsequently, the bottom-up and middledown sequencing data and the measured intact accurate masses of the Fab, light chain, and Fd were used to refine the selected template sequences and ultimately determine the mature sequence present in the donor, revealing discrepancies between the germline and mature sequences.

In the protein-centric approach, we performed electron transfer dissociation (ETD) on the intact Fab, as well as the light chain and Fd separately, obtained by reduction of the Fab molecule ( Figure 3A , bottom trace). Several fragmentation scans obtained for the intact Fab, the separated light chain, and Fd were grouped and combined based on their unique precursor mass and retention time ( Figure 3B ). The ETD mass spectra of the intact Fab yielded accurate masses of the light chain and Fd by cleavage of the interchain disulfide bond, thus providing direct information about the light-chain-heavy-chain pairing (Figure 3C, top spectra) . In addition, these ETD spectra yielded extended sequence tags, covering informative parts of the CDR3 and framework (FR) 4 regions of both Fab chains, in a similar manner as previously reported by performing ETD or ECD of intact IgG molecules (den Boer et al., 2020; Fornelli et al., 2017; Shaw et al., 2020) . Complementary, ETD spectra of the separated Fab chains yielded partial sequence information for the FR1, CDR1, FR2, CDR2, and the constant region of the selected clone ( Figure 3C , bottom spectra and Figure S6 ). Although our middle-down MS data provided valuable information about the clone of interest, they did not fully cover the sequence, primarily due to incomplete fragment formation and ambiguous sequence information obtained from the larger fragments.

To further extend our sequencing attempt, we subjected the fraction containing the targeted IgG1 clone to enzymatic digestion, using in parallel four proteases: trypsin, chymotrypsin, thermolysin, and pepsin. The resulting peptides were analyzed by a bottom-up approach, using de novo sequencing algorithms for sequence annotation (Peng et al., 2021) . Although fractionated and enriched for the desired 24.4 1 47,359.4 clone, the bottom-up MS data also contained numerous peptides originating from co-isolated plasma clones ( Figure S7 ), which made it impossible to determine the correct sequence by solely using the bottom-up MS data. Nevertheless, by iteratively extending the sequence information from the middle-down MS approach with the de novo peptides from the bottom-up MS approach, we ultimately were able to extract the most likely germline precursor of the targeted clone and, notably, its mature sequence by implementing various single-amino-acid mutations not present in the IMGT database ( Figure 4) .

In more detail, by using the IMGT database of germline sequences as input, the cumulative MS evidence revealed that the analyzed IgG1 Fab carried a lambda light chain. This light chain originated from a combination of the immunoglobulin lambda (IGL) variable (V) 2-14*01 (IMGT/LIGM-DB: Z73664), IGL joining (J) 2*01 (IMGT/LIGM-DB: M15641), and IGL constant (C) 2*01 (IMGT/LIGM-DB: J00253) alleles. For the heavy-chain Fd portion, we determined that it was constructed from the immunoglobulin heavy (IGH) V3-9*01 (IMGT/LIGM-DB: M99651), IGHJ5*01 (IMGT/LIGM-DB: J00256), and IGHG1*03 (IMGT/LIGM-DB: Y14737) alleles and a diversity (D)-region, which substantially deviated from any reported germline D-region. Although initial identification resulted in just a partial sequence coverage, we could fill the gaps in the germline sequences using sequence tags from the middle-down MS and the de novo peptides from the bottom-up MS ( Figures S8  and S9 ). Eventually, our approach resulted in a complete and exact precursor mass match for the light and heavy chains, 100% sequence coverage in bottom-up MS, and near-complete annotation of all available fragments in the middle-down MS data. In this process, numerous mutations had to be incorporated when comparing our data with the germline template sequences ( Figures 4B and 4C , in red letters), revealing somatic hypermutation (SHM) of around 13% and 16% for the V gene of the light chain and the heavy chain, respectively. The level of confidence in each identified mutation site is based on several criteria, including support of a mutation by consecutive mass peaks in the middle-down MS retrieved sequence tags, the peptide scores and coverage depth in the bottomup MS data as well as the frequency of amino acid occurrence at a given position in a pool of experimental and the germline IgG1 sequences (Table S5 ; Figure S10 ). Together this provides ll OPEN ACCESS Article proof of concept that it is possible to de novo sequence IgG1s present in plasma.

The definitive sequence assignment benefited largely from gathering multiple pieces of experimental evidence, notably (1) the accurate mass of the Fab, (2) the highly accurate masses of the two individual chains comprising the Fab, (3) the de novo identified amino acid sequence reads, retrieved from the middle-down fragmentation of intact chains and intact Fab molecule, and (4) the de novo identified amino acid reads from the-multiple proteasesbased-peptide-centric bottom-up approach. (C) Deconvoluted ETD-MS/MS spectra of the intact Fab (top spectra) and reduced LC and Fd fragments thereof (mirrored spectra) with the c/z-fragment ions annotated for the LC (left) and Fd (right). The isotopic envelopes of the most abundant charge states of the LC and Fd fragments released from the Fab upon ETD are depicted in the insets with theoretical isotope distributions of the corresponding chain sequences overlaid as black circles. Masses of the LC and Fd fragments and the cumulative mass of the Fab are indicated above the spectra. See also Figure S6 for more detail on the fragment ions identified in these middledown MS spectra.

To validate the accuracy of the full de novo sequence of the 24.4 1 47,359.4 clone from donor F59, we generated a synthetic recombinant IgG1 clone based on the experimentally determined sequence. We used exactly the same procedures to sequence the recombinant mAb as applied to the plasma-obtained clone, including all the peptide-and protein-centric approaches. Since CDRs are the most critical and hypervariable regions of the antibody, we set out to find peptides in the two datasets covering these regions, so that we could directly compare their fragmentation spectra. A direct comparison of tandem MS spectra of the CDR-spanning peptides from the donor clone and the recombinant mAb are presented in Figures 5A and 5B , covering parts of the light chain and Fd portion, respectively. Above the graphs, the de-novo-obtained sequence is shown with the annotated CDRs, whereby the purple lines indicate the selected peptides. In each panel, the MS/MS spectra obtained from peptides derived from the plasma clone of donor F59 and the recombinant mAb are shown, with the donor spectrum on top and the recombinant mAb spectrum mirrored below. Through visual comparison and as evidenced by the high correlation scores ranging between 0.91 and 0.98, the spectra obtained for peptides originating from the recombinant clone were highly similar to the MS/ MS spectra from the peptides derived from the plasma clone 24.4 1 47,359.4 . The observed high similarity was not restricted to the m/z positions but was found to be also reflected into fragment ion intensities, which are quite sequence specific, thus presenting an additional layer of confidence. (A) Data analysis pipeline displaying the key steps in the de novo sequencing, namely, filtering of the germline database of light-and heavy-chain sequences, assembling of selected allelic variants with mass constraints, scoring of the assembled sequences by using middle-down MS data, iterative refining of the best scoring templates by using peptides in bottom-up MS, and benchmarking of the optimized mature sequences using data from both middle-down and bottom-up MS analysis. (B) Alignment of the best matching germline IGLV amino acid sequence from the IMGT database (IGLV2-14*01) with the mature sequence that was determined for the light chain of the dominant clone (top box), the fragments from middle-down MS (middle box), and the peptides from bottom-up MS. (C) Alignment of the best matching germline IGHV amino acid sequence from the IMGT database (IGHV3-9*01) with the mature sequence that was determined for the Fd of donor F59's clone 24.4 1 47,359.4 (top box), the fragments from middle-down MS (middle box), and the peptides from bottom-up MS. CDR regions in top panels of (B) and (C) were annotated with reference to the closest matching IMGT sequence. Amino acids that were determined to be different in the mature 24.4 1 47,359.4 sequence are highlighted in red.

Such a direct comparison of spectral features was extended to the middle-down analysis, used for obtaining sequence tags of the intact Fab ( Figure 5C ). The intact recombinant Fab displayed a nearly identical retention time profile when compared with the plasma clone 24.4 1 47,359.4 Fab (Figure 5C, left panel) . Furthermore, both clone 24.4 1 47,359.4 and the recombinant mAb emerged with nearly identical charge distributions ( Figure 5C, middle panel) , whereby the slight dif-ferences in the distribution are likely due to the underlying background of co-eluting Fabs in the plasma-derived sample. Nevertheless, the masses detected for the two Fabs were identical, i.e., within a 20-ppm mass error. Moreover, when the intact Fabs were subjected to ETD, alike fragment masses and retention times for both the light chain and Fd were observed, comparing the recombinant mAb with the plasmaderived clone. Finally, the generated lower mass fragment ions used for sequence-tag generation were also very similar ( Figure 5C, right panel) . Likewise, the in-solution reduction of the Fabs revealed that there were no mass differences between the donor clone and recombinant mAb (light chain and Fd mass within 10 ppm).

Based on all these data, we can conclusively state that the sequence of the plasma-derived clone 24.4 1 47,359.4 is identical to that of the recombinant mAb, with practically identical data observed at every step of our integrative de novo sequencing approach. This not only validates the accuracy of the IgG1 sequence that we obtained for clone 24.4 1 47,359.4 but also reinforces that the methodology presented here can be used to derive the correct full sequences from individual clones even when they are in a background of other plasma (highly sequence-homologous) IgG1 clones. Although this whole analysis pipeline is still quite arduous, requiring manual validation throughout the process, we consider this proof of concept a major step forward and expect that further fine-tuning of the algorithms will enhance the throughput in the future.

The human body can make billions of different antibodies, stemming from the versatile and complex recombination process, accompanied by additional somatic hypermutations, helping us to adapt to a life-long exposure to various pathogens. Here, we demonstrate that it is feasible to profile the IgG1 repertoire of individual donors qualitatively and quantitatively by LC-MS, following the capturing of IgGs from plasma and analyzing the generated IgG1 Fab fragments. From this technical advance, one of the key observations we make is that in all studied donors at all time points, only a limited number of IgG1 clones dominate an individual's repertoire. In all donors, the 30 most abundant clones make up two-thirds of all detected circulating IgG1 molecules; in one donor just two clones contributed $50% to the detected serum population of IgG1 molecules. The IgG1 clonal profiles are found to be unique for each donor. Within a donor, the profiles are highly similar across time, but they also adapt to physiological changes (e.g., sepsis). The mass-spectrometry-based approach requires only minute amounts of plasma ($10 mL) and does not involve labor-intensive enrichment protocols. We further show that specific IgG clones can be extracted from the plasma and analyzed in depth, ultimately leading to the mass-spectrometry-based de novo sequencing of the whole Fab molecule. Therefore, one of the holy grails in proteomics, de novo sequencing of antibodies directly extracted from plasma, seems to be within reach.

The ultimate mature sequence of 24.4 1 47,359.4 clone we sequenced here revealed that around 13% and 16% of the amino acids of the V-regions of light and heavy chains were different when compared with the closest germline sequence match within the IMGT database. This number of mutations is higher than the reported average (7%) for IgG1 heavy-chain variable regions, as determined from RNA sequences (Kitaura et al., 2017) . This suggests that DNA/RNA templates of the IgG sequences can be helpful, but for obtaining the correct sequence of the circulating clone, analyzing sequences at the level of the proteins will be essential.

The ability to de novo sequence the whole Fab molecule is the result of combining, iteratively, middle-down and bottom-up proteomics data. An alternative strategy employed is to combine bottom-up proteomics data with BCR sequences using RNA sequencing of one donor to generate a database to match this donor's Ig bottom-up proteomics data against (Lee et al., 2016) . Although also very powerful, a recent application of this approach highlighted further the relevance of antibody sequencing at the level of proteins, when it was shown that for the six potent anti-HIV1 antibodies found by antigen-specific single-B-cell sequencing, only three could also be detected in circulation as IgG protein products (Williams et al., 2017) . All these issues highlight the necessity of direct analysis of the serum Ig repertoire at the protein level, as we now demonstrated here to be feasible.

Longitudinal quantitative clonal profiling, as presented here, opens a myriad of future prospects, both fundamental and applied. It allows to advance our understanding of B cell biology and antibody dynamics. Historically, general observations have been made about antibody half-lives using a single dose of labeled antibodies (Morell et al., 1970) or by determining the restoration of normal IgG levels following high-dose administrations of intravenous IgG (Melamed et al., 2018) . Through the method presented here, we can monitor the longitudinal abundance of each single clone in the circulation and monitor how it responds to changes in the donor's physiology. Given the approximately 20day IgG antibody half-life, it is expected that in the time span studied here (10-63 days), a decay in the concentration of clones would be detected. Indeed, we do observe several diminishing clones as depicted in Figures 2 and S4 . However, other patterns are also observed, indicative of continuous production of the clone, even over a time window of two months. This is in line with previous reports on the presence of and production of antibodies by long-lived plasma cells, both in mice and humans (Bernasconi et al., 2002; Manz et al., 1997; Slifka et al., 1998) . In addition, persistence of autoreactivity has been reported before (Tebani et al., 2020) , as well as persistence of antibody clonotypes detected by CDR-H3 proteomics (Lee et al., 2019) . From our data, we cannot derive information regarding (auto)reactivity, but the data do suggest that long-term stability is not exclusive to autoantibodies and is instead a common phenomenon.

In summary, the de novo sequencing at the protein level of IgG clones circulating in plasma is shown here to be feasible. We demonstrate the synergistic power of combining iteratively peptide-and protein-centric-based sequencing, which is capable of not only identifying the distinct alleles from which a clone originates but also its entire mature sequence. The work presented here is still quite a laborious proof of concept. Our aim for the future is quantitative monitoring and mass-spectrometry-based sequencing of multiple serum immunoglobulins directly at the protein level, i.e., as they appear in circulation and function in the human immune response.

Detailed methods are provided in the online version of this paper and include the following: 

The authors declare no competing interests. Any additional information required to reproduce this work is available from the Lead Contact.

We obtained longitudinal plasma samples from the Molecular Diagnosis and Risk Stratification of Sepsis (MARS) biorepository (ClinicalTrials.gov identifier NCT01905033), for which subjects had been included in the mixed ICU of a tertiary teaching hospital in the Netherlands (University Medical Centre Utrecht, Utrecht) since January 2011. Donors were enrolled via an opt-out consent method approved by the institutional review board of the UMC Utrecht (IRB no. 10-1056C). Daily leftover EDTA plasma (obtained from blood drawn during routine care) was stored at -80 C until use.

For the current study, we included eight patients with esophageal or gastroesophageal junction cancer who underwent an elective esophagectomy and gastric pull-through procedure and had subsequently developed an infectious complication (i.e., anastomotic leakage or pneumonia). These patients were all admitted to the Intensive Care Unit on two occasions. The first admission concerned routine observation after elective resection followed by an uncomplicated ICU stay of fewer than 2 days. All patients were subsequently readmitted to the ICU due to sepsis. For all episodes of sepsis, microbiological cultures were obtained either before or during ICU readmission, and clinical infection was adjudicated highly plausible according to pre-defined criteria (Klein Klouwenberg et al., 2013) . Furthermore, all infectious episodes met SIRS criteria and had a Sequential Organ Failure Assessment (SOFA) score R 2, thus fulfilling current Sepsis-3 definitions (Singer et al., 2016) . All patients were ultimately discharged from ICU in a clinically stable condition. We analyzed plasma samples obtained at four well-defined time points: within 24 hours of surgery when no signs of sepsis were present (sample T1), on two consecutive days after onset of sepsis (samples T2 and T3), and upon ICU discharge following resolution of sepsis (sample T4). Patient characteristics are summarized in Table S2 . All patients had been treated with neoadjuvant chemoradiotherapy prior to surgery. None of the patients received treatment with immunoglobulins or monoclonal antibodies either prior to or during ICU admission.

In addition, longitudinal EDTA plasma samples from two healthy Caucasian donors were purchased from Precision Med (Solana Beach, CA, US). The samples were part of the 'Normal Control Collections', protocol number 7005-8200. These donors were selected having similar characteristics as the sepsis donors regarding age and gender.

The IgG purification and generation of Fabs was adapted from an earlier published protocol (Bondt et al., 2014) . The FcXL affinity matrix used in the workflow, which binds to the CH3 domain of the IgG constant region, has recently been shown not to provide a bias regarding analysis of the Fc glycosylation residing in the CH2 domain (Amez Martín et al., 2021) . Mobicol spin filters were assembled according to the manufacturer's instructions and placed in 2 mL Eppendorf tubes. Then 20 mL FcXL affinity matrix slurry was added to the spin filter, followed by three washing steps with 150 mL PBS, in which the liquid was removed by centrifugation for 1 min at 1000 3 g. Two additional washing steps with 150 mL were performed. After washing, the 2 mL tube was replaced by a 1.5 mL tube. The affinity matrix was resuspended in 150 mL PBS, and 10 mL plasma was added. Furthermore, 1 mL of a solution containing two known mAbs at 200 mg/mL each was added, corresponding to 20 mg/mL when calculated to the plasma concentration. The samples were then incubated, under shaking conditions for one hour at room temperature. After the incubation, the flow-through was collected, and the affinity matrix with bound IgGs was washed four times with 150 mL PBS. Finally, 50 mL PBS containing 100 U of the IgdE (FabALACTICA; Genovis AB, Lund, Sweden) protease was added before incubating on a thermal shaker at 37 C for >16 hours. After the incubation, 10 mL of Ni-NTA beads were added to bind and remove the His-tagged protease, whereafter the samples were incubated for an additional 30 minutes. The flow through after centrifugation contained the Fab fragments generated from IgG1.

In an array of experiments, we optimized and validated the robustness and accuracy of our IgG1 capturing approach, the generation of the Fab fragments and the analysis of the Fab LC-MS profiles. Therefore, we prepared a mixture of six IgG1 monoclonal antibodies, including the two spiked into every plasma sample subsequently analyzed. For the method optimization the 6 recombinant mAbs were added to the plasma of a single donor in different quantities: 4,000, 800, 200, 20, 2, or 0.5 ng per mAb. The mAbs used were trastuzumab (Roche, Penzberg, Germany), alemtuzumab (Genmab, Utrecht, The Netherlands), the Fab glycosylated cetuximab, rituximab, bevacizumab, and infliximab (Evidentic, Berlin, Germany). Fab sequences and theoretical masses of these mAbs, including the most abundant cetuximab glycoforms, are shown in Table S1 . All these samples were injected once, except for the 200 ng sample which was injected three times to provide additional injection replicates.

Reversed-phase liquid chromatography was performed by using a Thermo Scientific Vanquish Flex UHPLC instrument, equipped with a 1 mm x 150 mm MAbPac RP analytical column, directly coupled to an Orbitrap Fusion Lumos Tribrid (Thermo Scientific, San Jose, CA, USA) or Q Exactive HF-X mass spectrometer (Thermo Fisher Scientific, Bremen, Germany). The column preheater, as well as the analytical column chamber, were heated to 80 C during chromatographic separation. Both samples, either containing intact Fabs or separate Fab chains, were separated over 62 min at a 150 mL/min flow rate. Gradient elution was achieved by using two mobile phases A (0.1% HCOOH in Milli-Q) and B (0.1% HCOOH in CH 3 CN) and ramping up B from 10 to 25% over one minute, from 25 to 40% over 55 min, and from 40 to 95% over one minute. MS data were collected with the instrument operating in Intact Protein and Low Pressure mode. Spray voltage was set at 3.3 kV, capillary temperature 350 C, probe heater temperature 100 C, sheath gas flow 35, auxiliary gas flow 10, and source-induced dissociation was set at 15 V. The electrospray voltage was applied after 2 min to prevent the salts in the sample from entering the MS. Intact Fabs were recorded with a resolution setting of 7,500 (@ m/z 200) in MS1, which allows for better detection of charge distributions of the large proteins (> 30 kDa) (van de Waterbeemd et al., 2018) . Separate Fab chains were analyzed with a resolution setting of 120,000 (@ m/z 200) in MS1, which allows for more accurate mass detection of smaller proteins (< 30 kDa). MS1 scans were acquired in a range of 500-3,000 Th with the 250% AGC target and a maximum injection time set to either 50 ms for the 7,500 resolution or 250 ms for the 120,000 resolution. In MS1, 2 mscans were recorded for the 7,500 resolution and 5 mscans for the 120,000 resolution per scan. Data-dependent mode was defined by the number of scans: single scan for intact Fabs and two scans for separate Fab chains. In both cases, MS/MS scans were acquired with a resolution of 120,000, a maximum injection time of 500 ms, a 1,000% AGC target, and 5 mscans averaged and recorded per scan. The ions of interest were mass-selected by quadrupole in a 4 Th isolation window and accumulated to the AGC target prior to fragmentation. The electrontransfer dissociation (ETD) was performed using the following settings: 16 ms reaction time, a maximum injection time of 200 ms, and the AGC target of 1e6 for the ETD reagent. For the data-dependent MS/MS acquisition strategy, the intensity threshold was set to 2e5 of minimum precursor intensity. MS/MS scans were recorded in the range of m/z = 350-5,000 Th using high mass range quadrupole isolation.

Clonal profiling data analysis Masses were retrieved from the generated RAW files using BioPharmaFinder 3.2 (Thermo Scientific). Deconvolution was performed using the ReSpect algorithm between 5 and 57 min using 0.1 min sliding window with 50% offset and a merge tolerance of 50 ppm, with noise rejection set at 95%. The output mass range was set at 10,000 to 100,000 with a target mass of 48,000 and mass tolerance of 30 ppm. Charge states between 10 and 60 were included, and the Intact Protein peak model was selected. Further data analysis was performed using Python 3.8.10 (with libraries: Pandas 1.2.4 (McKinney, 2010), Numpy 1.20.2 (van der Walt et al., 2011), Scipy 1.6.2 (Virtanen et al., 2020) , Matplotlib 3.3.4 (Hunter, 2007) and Seaborn 0.11.1). Masses of the BioPharmaFinder identifications (components) were recalculated using an intensity weighted mean considering only the most intense peaks comprising 90% of the total intensity. Furthermore, using the data of two spiked-in mAbs (trastuzumab and alemtuzumab) a mass correction was applied based on the difference between the calculated and observed mAb masses, and similarly, a retention time alignment was applied to minimize deviation between runs.

Components between 45,000 and 53,000 kDa with the most intense charge state above m/z 1,000 and score R40 were considered Fab portions of IgG clones. The clones were matched between runs using average linkage (unweighted pair group method with arithmetic mean UPGMA) L N distance hierarchical clustering. Flat clusters were formed based on a cophenetic distance constraint derived from the mass and retention time tolerance. These tolerances were defined as three times the standard deviation of the mAb standards, which were 1.4 Da and 0.8 min, respectively. Clones within a flat cluster were considered identical between runs.

Peptide-centric (bottom-up) de novo sequencing Clones of interest were captured through fraction collection using the same chromatography setup used for LC-MS profiling. Samples were dried under vacuum and resuspended in a 50 mM ammonium bicarbonate buffer. To boost signal intensity, the fractions were pooled across the time points. Samples were equally split for digestion with four proteases.

For digestion with trypsin, chymotrypsin and thermolysin, a sodium deoxycholate (SDC) buffer was added to a total volume of 80 mL, 200 mM Tris pH 8.5, 10 mM tris(2-carboxyethyl)phosphine (TCEP), 2% (w/v) SDC final concentration. For pepsin digestion, a urea buffer was added to a total volume of 80 mL, 2M Urea, 10 mM TCEP. Samples were denatured for 10 min at 95 C followed by reduction for 20 min at 37 C. Next, iodoacetic acid was added to a final concentration of 40 mM and incubated in the dark for 45 min at room temperature for alkylation of free cysteines. Then for trypsin, chymotrypsin and thermolysin 50 mM ammonium bicarbonate buffer was added to a total volume of 100 mL. For pepsin 1 M HCl was added to a final concentration of 0.04 M. 0.1 mg of each protease was added and incubated for 4 hours at 37 C. After digestion 2 mL HCOOH was added to precipitate the SDC. SDC was removed by centrifugation for 20 min at max speed (20,817 3 g) after which the supernatant was moved to a new tube.

Samples were desalted by Oasis HLB (Waters Corporation, Millford, MA, USA) following a 5-step protocol. 1) Sorbent was wetted using 2x 200 mL CH 3 CN, 2) followed by equilibration with 2x 200 mL water/10% HCOOH. 3) The sample was loaded and 4) washed with 2x 200 mL water/10% HCOOH. 5) Finally, the sample was eluted using 2x 50 mL water/50% CH 3 CN /10% HCOOH and dried down by vacuum centrifuge. Prior to MS analysis samples were reconstituted in 2% HCOOH. LC-MS/MS Data acquisition was performed on the Orbitrap Fusion Tribrid Mass Spectrometer (Thermo Scientific, San Jose, CA, USA) coupled to UHPLC 1290 system (Agilent Technologies, Santa Clara, CA, USA). Peptides were trapped (Dr. Maisch Reprosil C18, 3 mm, 2 cm 3 100 mm) prior to separation (Agilent Poroshell EC-C18, 2.7 mm, 500 mm 3 75 mm). Trapping was performed for 10 min in solvent A (0.1% HCOOH in Milli-Q), and the gradient was as follows: 0 -13% solvent B (0.1% HCOOH in 80% CH3CN) over 5 min, 13 -44% solvent B over 65 min, 44 -100% solvent B over 4 min, and 100% B for 4 min (flow was split to achieve the final flowrate of approximately 200 nL/min). Mass spectrometry data was collected in a data-dependent fashion with survey scans ranging from 350-2,000 Th (resolution of 60,000 @ m/z 200), and up to 3 sec for precursor selection and fragmentation with either stepped higher-energy collisional dissociation (HCD) set to [25%, 35%, 50%] or electron transfer dissociation (ETD), used with charge-normalized settings and supplemental activation of 27%. The MS2 spectra were recorded at a resolution of 30,000 (@ m/z 200). The AGC targets for both MS and MS2 scans were set to standard within a maximum injection time of 50 and 250 ms, respectively. Data analysis Raw LC-MS/MS data were processed using PEAKS X software (Bioinformatics Solutions Inc., Waterloo, ON, Canada) for de novo sequencing of peptides. The following parameters were used for de novo sequencing: parent mass error tolerance -12 ppm, fragment mass error tolerance -0.02 Da, max number of variable PTMs per peptide -3. Fixed modifications: Carboxymethyl; variable modifications: Oxidation (HW), Oxidation (M), Pyro-glu from E, Pyro-glu from Q, Carboxymethyl (KW, X@N-term), and Carbamylation. Resulting de novo peptide tables were exported as.csv files and used for filtering of the IMGT database and determination of the mature 24.4 1 47359.4 clone sequences.

Protein-centric (middle-down) de novo sequencing Fab samples were prepared without treatment as well as under denaturing and reducing conditions for analysis of intact Fab and separate Fab chains, respectively. The latter were denatured and reduced in 10 mM TCEP at 60 C for 30 min prior to LC-MS/MS analysis. Approximately 2-5 mg of each sample was injected for a single middle-down LC-MS(/MS) experiment using the parameters described above. Data analysis Full middle-down MS spectra were deconvoluted with either Xtract (Zabrouskov et al., 2005) or ReSpect (Thermo Fisher Scientific, Bremen, Germany) for isotopically-resolved (separate Fab chains) or unresolved (intact Fabs) data, respectively. Middle-down LC-MS/MS data were charge-deconvoluted and deisotoped into singly-charged mass spectra using the 'Parallel Xtract' node and converted to mascot generic format (mgf) files in Thermo Proteome Discoverer (version 2.3.0.523; Thermo Fisher Scientific, Bremen, Germany) . Deconvolution parameters were set as follows: ReSpect: precursor m/z tolerance -0.2 Th; relative abundance threshold -0 %; precursor mass range from 3 to 100 kDa; precursor mass tolerance of 30 ppm; charge states between 3 and 100. Xtract: signal/noise threshold of 3; m/z range -500-3,000 Th.

For the analysis of the final mature Fab chains of the most abundant clone in the plasma of patient F59, we used an integrative approach that utilizes bottom-up and middle-down data and the international ImMunoGeneTics information system (IMGT) database ( Figure 3B ). First, the replicate middle-down MS/MS spectra were grouped per deconvoluted mass feature in the LC-MS-only runs by using a 3 Da mass window and a 3 min retention time window. The resulting grouped spectra were merged into a single spectrum, whereby peaks' intensities were combined when they coincided within a 2 ppm window. The identity of the constant domain (C-) gene was determined by matching the fragments in these combined spectra to a database of all functional, open reading frame, and in-frame pseudogene alleles for C-genes retrieved from IMGT/Gene-DB (Lefranc et al., 1999) . Next, bottom-up LC-MS/MS spectra of the fractionated 24.4 1 47359.4 clone were screened against a database of all functional, open reading frame, and in-frame pseudogene alleles for the variable domain (V-) genes retrieved from the IMGT/Gene-DB (Lefranc et al., 1999) , using local Smith-Waterman alignment with the BLOSUM62 matrix in which the common de novo sequencing errors I/L, Q/E and N/D were modified to neutral substitutions (Smith and Waterman, 1981) . From any gene regions with confident peptide matches (FDR < 1%), we then took the FR1, 2 and 3 regions and subjected them to an in-house scoring algorithm to score their agreement with our middle-down data ( Figure S5 ). In short: the algorithm searches for peak patterns that would occur as a result of fragmentation of the provided sequence regions, disregarding preceding and succeeding parts of the initial sequence. Ranking the gene regions by a resulting composite score enabled us to select top-scoring templates as a starting point for our sequencing efforts as well as discard lowscoring regions from further analyses. The remaining gene regions were then used to in silico generate a database of germline light and heavy chains.

Then, using a custom implementation of the DirecTag algorithm (Tabb et al., 2008) , all possible sequence tags were detected and annotated in the combined middle-down spectra. These sequence tags were used to search the filtered germline light and heavy IgG chains. For the best scoring germline sequences, consistent sequence tags with a length of more than 4 amino acids were searched against de novo predicted peptides originating from the bottom-up peptide-centric MS data. In an iterative manner, the matching peptides were used to modify the best scoring selected germline sequences until the mass of the final sequence matched the precursor masses determined by middle-down MS (Figures S8 and S9) . In more detail, the gaps between the consecutive sequence tags extracted from the middle-down MS data were first filled with amino acids from the best matching germline sequence. Then, the filled gaps were compared to the highest scoring peptides retrieved from the bottom-up MS data, aligned to the region of interest using Clustal Omega algorithm. When aligned peptides showed discrepancies from the germline sequence the amino acid residues in the gaps were altered and the theoretical mass of the gap was compared to the experimental mass, defined by the mass difference between consecutive sequence tags. Finally, the modified sequences were rescored by spectral alignment, sequence-tag detection, and a bottom-up database search, providing the final mature Fab sequences. The final predicted sequences -and more specifically the identified mutations when compared to the most closely related gene regions -were additionally compared to the frequency of amino acid occurrence at their specific positions (as numbered by IMGT) in both the AbYsis database (Swindells et al., 2017) and the recombined full IMGT database (Lefranc et al., 1999) . This screening yielded an estimate of how likely the mutations were to occur. While some of the predictions are rather rare, none of them are impossible as reported by AbYsis (Table S5) .

For quantification of LC-MS profiling data, the intensity values of the two mAb standards (trastuzumab and alemtuzumab) were averaged in each run and set to 20 mg/mL. The intensity values of all other detected Fabs were normalized to these values in order to determine the concentration of each individual clone. For the quantification of mAbs in the validation experiment, a slightly different normalization was used. The intensity values of the detected mAbs in all runs were normalized to the intensity values of trastuzumab and alemtuzumab as measured in the first 200 ng replicate.

Statistical values in figures depicted as lower-case letter r indicate Pearson correlation coefficients. Distances between samples as shown in Figure 2B were determined by distance correlation. Linear regression for validation of quantification Figure S1B was determined by ordinary least squares regression with the coefficient of determination given as uncentered R 2 . The error-bars in the figure represent the standard error of the mean (SEM). Figure S1 | Performance evaluation of plasma Fab profiling approach using various experimental controls. A) Accuracy and precision in mass, retention time and abundance of spiked-in monoclonal antibody controls. The boxplots show aggregated data from the mAb controls over all plasma measurements. The box indicates median and inter quartile ranges (IQRs), and the whiskers span 1.5 times the IQR. Values outside this range (fliers) are marked with diamonds. From left to right, the panels show observed mass error of these mAbs, observed retention time, and detected intensity. B) Linearity of detection. For these experiments six monoclonal antibodies (Trastuzumab, Cetuximab, Rituximab, Campath, Bevacizumab and Infliximab) were added at 20, 200, 800 and 4000 ng in a plasma background. The detected response of all of these mAbs was compared to the expected response visualized as scatterplot. The error bars depict the standard error, and the dotted line shows an ordinary least squares (OLS) linear regression accompanied by a R 2 . C) Reproducibility of quantitation. The reproducibility of the top 100 most intense clones in a plasma were measured over several replicates and visualized as boxplots. The values are shown as fold change of the concentration compared to the first replicate measurement. The first two boxplots depict injection replicates, i.e. replicates from multiple injections of the same sample. The other boxplots show technical replicates, which constitute the entire sample preparation procedure starting from the plasma. The boxes are constructed using the same method as the boxplots in panel (A). D) Distributions of detected Fab masses compared to the expected mass distribution. Kernel density estimation of all Fabs detected in all sepsis donors, at all analyzed time points, compared against an in silico generated distribution of Fabs from the IMGT database. The number of Fabs used to generate each distribution is shown in the figure legend. Both distribution histograms use a bin size of 100 Da. The Pearson correlation coefficient (r) was calculated between both kernel density estimations. (B) . B) Zoomed-in mass profile with annotation of glycan-related masses. Monosaccharides mass differences between peaks are annotated as follows: blue square = GlcNAc (203 Da), magenta diamond = sialic acid (291 Da). For annotation of the glycosylation a mass tolerance of 1 Da and a retention time tolerance of 0.6 min was used. C) Estimated percentages of plasma Fab molecules being glycosylated in all samples measured. For this, Fab clones with a mass >49,500 Da were assumed to carry one or more Fab glycans. This value was chosen because the in silico Fab distribution generated from the IMGT database (shown in Figure S1D ) extends up to 49,500 Da, the majority of Fabs has a mass between 47,000 and 48,000, and the average literature described Fab glycan has a mass of approximately 2,300 Da. The validity of this assumption is illustrated for M66 -T3 in panels (A) with the glycosylated Fabs being in mass quite separated from the other clones. The percentage of plasma Fab molecules being glycosylated was calculated by taking the sum of Fab concentrations above 49,500 Da and dividing these by the total detected concentration in each sample. On the left in C) are shown the % Fab glycosylation in the plasmas of the septic patients, on the right the % observed in two healthy donors. In general, we observe that the % Fab glycosylation is < 1%, although in some donors it is substantially higher, i.e. M66. M77 and F66H. The filtering of IMGT database and scoring of the germline IGXV and IGXC-alleles was performed by using iteratively bottom-up (BU) and middle-down (MD) proteomics data. A) Filtering of germline IGL and IGK alleles with BU and MD mass spectrometry (MS) reduces the number of possible germline light chain sequences from 3,577 to 6 candidate sequences (~600-fold reduction). B) Filtering of germline IGH alleles with BU MS and MD MS reduces the number of possible germline heavy chain sequences from 42,840 to 8 candidate sequences (~5,000-fold reduction). C) Fragment matching scores for the germline C-gene alleles of the light (left) and heavy (right) chain of the Fab clone 24.4 1 47359.4 using the middle-down MS data. D) Fragment matching scores for the Framework Regions 1, 2, and 3 of the germline V-gene alleles of light (left) and heavy (right) chains of IgG1 determined by using the middle-down MS data. On the left, alleles from the largest IGHV3 family are displayed as filled circles; alleles of other IGHV families are shown as empty squares. On the right, alleles from larger and more homologous IGKV families are shown as empty squares, while filled circles display alleles of IGLV families. Germline V-gene sequences were downloaded from IMGT. B) Correlation matrix displaying sequence similarity among all germline V-gene sequences of the Fab heavy (left) and light (right) chain. Normalized cumulative PSM scores are shown below the correlation maps. Some of the top-scoring V-gene sequences are indicated with black arrows. The V-genes ultimately determined for clone 24.4 1 47359.4 by the integrative de novo bottom-up and middle-down sequencing are highlighted in green. Figure S8 | Refining of the sequence of clone 24.4 1 47359.4 light chain germline IGLV2-14*01-IGLJ2*01, based on the iterative integration of middle-down and bottom-up proteomics data. First, the sequence tags detected in the middledown MS data were used as arrays of consecutive fragment peaks, which directly hinted at the presence of 11 mutations (M49L, Y51S, Y51S, N55D, N62S, A85S, D86M, Y88F, S95D, S96L, S97T, T98S, and L99F). Next, these tags were aligned to the de novo sequenced peptide sequences obtained by bottom-up MS, revealing 2 additional mutations. The highestscoring aligned peptides were used to extend the initial sequence tags, and then these steps were iteratively repeated. At each step of tag extension, the mass offsets were calculated by comparing a mass gap between two consecutive tags to the mass of amino acid residues in the corresponding gap in the germline sequence. Iteratively, middle-down tags were extended with aligning peptides until all (if possible) mass offsets become equal to 0 Da. Eventually, 13 mutations and one modified residue (Pyro-Q) were determined for the 24.4 1 47359.4 light chain sequence. De-charged isotopic distributions of the fragments involved in each sequence tag are displayed as red peaks in the corresponding insets with the theoretical isotopic distributions for these fragments displayed underneath each fragment. Fragmentation spectra of the peptides used in this refining process for the CDRs are shown in Figure 5 . See also Supplemental Table 5 for an overview of the evidence supporting each detected amino acid mutation. Figure S9 | Refining of the sequence of clone 24.4 1 47359.4 heavy chain germline IGHV3-9*01-IGHJ5*01, based on the iterative integration of middle-down and bottom-up proteomics data. First, sequence tags were detected in the middledown MS data as arrays of consecutive fragment peaks similar to refining of the light chain sequence. Next, these tags were aligned to the de novo sequenced peptides from bottom-up MS. The highest-scoring aligned peptides were used to extend the initial tags, and then this step was repeated. At each step of tag extension, the mass offsets were calculated by comparing a mass gap between two consecutive tags to the mass of amino acid residues in the corresponding gap in the germline sequence. Iteratively, tags were extended with aligning peptides until all (if possible) mass offsets become equal to 0 Da. Eventually, more than 20 mutations were determined for the N-terminal portion of the heavy chain for clone 24.4 1 47359.4 . De-charged isotopic distributions of the fragments involved in each sequence tag are displayed as red peaks in the corresponding insets with the theoretical isotopic distributions for these fragments displayed underneath each fragment. Fragmentation spectra of the peptides used in this refining process for the CDRs are shown in Figure 5 . 

Serum and plasma immunoglobulin G Fc N-glycosylation is stable during storage

Review, synthesis, and application of information on the human lymphatic system to radiation dosimetry for chronic lymphocytic leukemia

Using mass spectrometry to monitor monoclonal immunoglobulins in patients with a monoclonal gammopathy

Phenotyping polyclonal kappa and lambda light chain molecular mass distributions in patient serum using mass spectrometry

Maintenance of serological memory by polyclonal activation of human memory B cells

Immunoglobulin G (IgG) fab glycosylation analysis using a new mass spectrometric highthroughput profiling method reveals pregnancy-associated changes

Fab glycosylation of immunoglobulin G does not associate with improvement of rheumatoid arthritis during pregnancy

Commonality despite exceptional diversity in the baseline human antibody repertoire

Human serum immunoglobulin concentrations: prevalence of immunoglobulin deficiencies

Proteomic analysis of pemphigus autoantibodies indicates a larger, more diverse, and more dynamic repertoire than determined by B cell genetics

Protective monotherapy against lethal Ebola virus infection by a potently neutralizing antibody

Selectivity over coverage in de novo sequencing of IgGs

Two Ebola treatments halve deaths in trial in DRC outbreak

Top-down analysis of immunoglobulin G isotypes 1 and 2 with electron transfer dissociation on a high-field Orbitrap mass spectrometer

The promise and challenge of high-throughput sequencing of the antibody repertoire

Serum levels of immunoglobulins (IgG, IgA, IgM) in a general adult population and their relationship with alcohol consumption, smoking and common metabolic abnormalities

De novo MS/MS sequencing of native human antibodies

Structural analysis of variable domain glycosylation of anti-citrullinated protein antibodies in rheumatoid arthritis reveals the presence of highly sialylated glycans

Classification of plasma cell disorders by 21 tesla Fourier transform ion cyclotron resonance top-down and middle-down MS/MS analysis of monoclonal immunoglobulin light chains in human serum

Analysis of monoclonal antibodies in human serum as a model for clinical monoclonal gammopathy by use of 21 tesla FT-ICR topdown and middle-down MS/MS

Matplotlib: A 2D graphics environment

The neutralizing antibody, LY-CoV555, protects against SARS-CoV-2 infection in nonhuman primates

Different somatic hypermutation levels among antibody subclasses disclosed by a new next-generation sequencing-based antibody repertoire analysis

Interobserver agreement of Centers for Disease Control and Prevention criteria for classifying infections in critically ill patients

Identification and characterization of the constituent human serum antibodies elicited by vaccination

Molecular-level analysis of the serum antibody repertoire in young adults before and after seasonal influenza vaccination

Persistent antibody clonotypes dominate the serum response to influenza over multiple years and repeated vaccinations

IMGT, the international ImMunoGeneTics database

The Immunoglobulin Factsbook

Lifetime of plasma cells in the bone marrow

Data structures for statistical computing in python

Pharmacokinetics of a novel human intravenous immunoglobulin 10% in patients with primary immunodeficiency diseases: analysis of a phase III, multicentre, prospective, open-label study

Detecting monoclonal immunoglobulins in human serum using mass spectrometry

Metabolic properties of IgG subclasses in man

A randomized, controlled trial of Ebola virus disease therapeutics

Mass spectrometry-based de novo sequencing of monoclonal antibodies using multiple proteases and a dual fragmentation scheme

Similarity and divergence in the development and expression of the mouse and human antibody repertoires

Automated antibody de novo sequencing and its utility in biopharmaceutical discovery

A novel mass spectrometry method to identify the serum monoclonal light chain component in systemic light chain amyloidosis

Direct determination of antibody chain pairing by top-down and middle-down mass spectrometry using electron capture dissociation and ultraviolet photodissociation

The third international consensus definitions for sepsis and septic shock (Sepsis-3)

Humoral immunity due to long-lived plasma cells

Identification of common molecular subsequences

High frequency of shared clonotypes in human B cell receptor repertoires

Novel IgG-degrading enzymes of the IgdE protease family link substrate specificity to host tropism of Streptococcus species

abYsis: integrated antibody sequence and structure-management, analysis, and prediction

DirecTag: accurate sequence tags from peptide MS/MS through statistical scoring

Integration of molecular profiles in a longitudinal wellness profiling cohort

Complete de novo assembly of monoclonal antibody sequences

Dissecting ribosomal particles throughout the kingdoms of life using advanced hybrid mass spectrometry methods

Quantification of IgG subclasses in sera of normal adults and healthy children between 4 and 12 years of age

The NumPy array: A structure for efficient numerical computation

SciPy 1.0: fundamental algorithms for scientific computing in Python

Genetic measurement of memory B-cell recall using antibody repertoire sequencing

Topdown mass spectrometry analysis of human serum autoantibody antigenbinding fragments

Potent and broad HIV-neutralizing antibodies in memory B cells and plasma

Molecular deconvolution of the monoclonal antibodies that comprise the polyclonal serum response

New and automated MSn approaches for top-down identification of modified proteins

Human plasma IgG1 repertoires are simple, unique, and dynamic