key: cord-0281696-79p1ly2i
authors: Swanson, Olivia; Beem, Joshua M.; Rhodes, Brianna; Wang, Avivah; Barr, Maggie; Chen, Haiyan; Parks, Robert; Saunders, Kevin O.; Haynes, Barton F.; Wiehe, Kevin; Azoitei, Mihai L.
title: Identification of CDRH3 Loops in the Naive B Cell Receptor Repertoire that Can Be Engaged by Candidate Immunogens
date: 2022-01-16
journal: bioRxiv
DOI: 10.1101/2021.12.18.473225
sha: d16b94a8f74e3ecd16de811aea3711aff3dc9526
doc_id: 281696
cord_uid: 79p1ly2i

B cell lineages that are the current focus of vaccine development efforts against HIV-1, influenza or coronaviruses often contain rare features, such as long heavy chain complementarity determining region 3 (CDRH3) loops. These unusual characteristics may limit the number of available B cells in the natural immunoglobulin repertoire that can respond to vaccination against these pathogens. To measure the ability of a given immunogen to engage naturally occurring B cell receptors of interest, here we describe a mixed experimental and bioinformatic approach for determining the frequency and sequence of CDRH3 loops in the immune repertoire that can be recognized by a vaccine candidate. By combining deep mutational scanning and B cell receptor database analysis, CDRH3 loops were found that can be engaged by two HIV-1 germline-targeting immunogens, thus illustrating how the methods described here can be used to evaluate candidate immunogens based on their ability to engage diverse B cell lineage precursors.

[...] fewer than six different amino acids. In contrast, the composition of the D-gene and N-addition encoded residues was more restricted, with an average of 3.5 alternative amino acids that maintained immunogen binding identified at each site. These results indicate that CH505.M5.G458Y engagement of BCRs upon vaccination will not be restricted by the amino acid composition of their CDRH3 loops, since this immunogen can bind CH235 UCA variants with highly divergent sequences in this region. In contrast, our data suggest that 10.17DT will only bind natural BCRs that have CDRH3 loops with highly conserved sequence identity to DH270 UCA.

To validate the results of scFv library screening, a subset of DH270 UCA mutants were expressed as recombinant IgG proteins and tested for binding to 10.17DT by surface plasmon resonance (SPR) and ELISA (Figure 2G). The four mutations tested at position W101 confirmed the substitution profile data, with three changes that significantly affected DH270 UCA binding (W101R: log2 enrichment=-2.6, % binding of WT DH270 UCA maintained=0.1%; [...]

[...] and only one contained just one mismatch (Supplementary Figure 5). Therefore, the predicted frequency of BCRs expected to be bound strongly by 10.17DT is below the 1 in 85 million limit of detection set by the size of the analyzed database.
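The 1 in 85 million limit of detection can be cross-checked against the counts reported for the CH235 UCA search (9,623,490 length-matched CDRH3s, stated to be 11.3% of database sequences). The short calculation below is only an illustrative sanity check; the variable names are ours, and the percentage is rounded in the text, so the recovered database size is approximate.

```python
# Figures quoted in the text for the CH235 UCA length-matched search.
length_matched_cdrh3s = 9_623_490      # CDRH3s with a length of 15 amino acids
fraction_of_database = 0.113           # "11.3% of database sequences" (rounded)

# Approximate total number of sequences in the analyzed BCR database.
database_size = length_matched_cdrh3s / fraction_of_database

# With zero hits, the observed frequency can only be bounded by 1 / database_size.
detection_limit = 1.0 / database_size

print(f"database size ~ {database_size / 1e6:.1f} million sequences")
print(f"limit of detection ~ 1 in {database_size / 1e6:.0f} million")
```

The result, roughly 85 million sequences, agrees with the limit of detection stated above.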
Next, we searched the BCR database for CDRH3s related to CH235 UCA that can be expected to be recognized by the target immunogen CH505.M5.G458Y. In contrast to DH270 UCA, the composition of the CDRH3 sites encoded by the D gene and recognized by CH505.M5.G458Y was not restricted in CH235 UCA (Figure 2E). Therefore, we did not limit the BCR database search using any D gene position or composition criteria as we did for DH270 UCA. The database search yielded 9,623,490 CDRH3s that matched the CH235 UCA CDRH3 length of 15 amino acids (11.3% of database sequences). CH505.M5.G458Y binding to CH235 UCA was not affected by the majority of the single amino acid substitutions in the CDRH3 loop (Figure 2C). Accordingly, we found that CDRH3 loops predicted to be bound by CH505.M5.G458Y (0 mismatches according to the substitution profile) occurred with a high frequency of 1 in 100 sequences from the human BCR database (Supplementary Figure 5). These data suggested that CDRH3s compatible with CH505.M5.G458Y binding should be readily available in the naïve BCR pool.

By considering different log enrichment values at which a CDRH3 substitution is deemed acceptable for immunogen binding, various sets of BCRs can be identified from the database, and the associated frequency of B cells in the immune repertoire that can be engaged by the target immunogen can be computed. While log2 enrichment values higher than -0.2 were considered in the analysis above, this threshold value can be raised or lowered in order to identify BCRs that are, respectively, more or less likely to be recognized by a target immunogen. This analysis can provide a measure of the sequence "distance" and the relative abundance of CDRH3 loops from the natural immune repertoire in relation to the ideal CDRH3 loop sequence recognized by the target immunogen.
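The threshold-based matching described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the substitution profile is assumed to be stored as one dictionary per CDRH3 position mapping each amino acid to its log2 enrichment, the function names are hypothetical, and a residue is accepted when its enrichment is at or above the threshold (the text uses "higher than -0.2" and "0 or higher", which differ only at exact equality). The toy profile and sequences are invented for the example.

```python
from typing import Dict, Iterable, List

# Assumed representation: profile[i][aa] = log2 enrichment of amino acid `aa`
# at CDRH3 position i, as measured by deep mutational scanning.
Profile = List[Dict[str, float]]


def count_mismatches(cdrh3: str, profile: Profile, threshold: float = -0.2) -> int:
    """Count positions whose residue falls below the enrichment threshold.

    Residues absent from the profile are treated as mismatches.
    """
    if len(cdrh3) != len(profile):
        raise ValueError("CDRH3 length must equal the profile length")
    return sum(
        1
        for residue, enrichment in zip(cdrh3, profile)
        if enrichment.get(residue, float("-inf")) < threshold
    )


def compatible_frequency(
    cdrh3s: Iterable[str],
    profile: Profile,
    threshold: float = -0.2,
    max_mismatches: int = 0,
) -> float:
    """Fraction of all database sequences predicted to be bound.

    Sequences of a different length than the profile count toward the
    denominator but can never be hits.
    """
    total = 0
    hits = 0
    for seq in cdrh3s:
        total += 1
        if len(seq) != len(profile):
            continue
        if count_mismatches(seq, profile, threshold) <= max_mismatches:
            hits += 1
    return hits / total if total else 0.0


# Toy three-residue profile and database, invented for illustration only.
toy_profile: Profile = [
    {"A": 0.0, "G": -0.1, "S": -1.5},
    {"R": 0.0, "K": -0.3},
    {"Y": 0.0, "F": 0.2, "W": -0.15},
]
toy_database = ["ARY", "GRF", "SRY", "AKW", "ARYY", "GKY"]

for cutoff in (-0.2, 0.0):
    freq = compatible_frequency(toy_database, toy_profile, threshold=cutoff)
    print(f"threshold {cutoff:+.1f}: frequency {freq:.3f}")
```

Raising the threshold shrinks the set of accepted residues at each position, so the computed frequency can only stay the same or fall; sweeping the threshold in this way corresponds to the analysis of sequence "distance" versus abundance described in the text.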
CH505.M5.G458Y recognition of natural CDRH3s was predicted to remain high even when using stringent thresholds of acceptable amino acid mismatches. For example, if only amino acids with log2 enrichment of 0 or higher (corresponding to an estimate of no reduction in binding) are considered acceptable, the frequency of CH505.M5.G458Y compatible CDRH3s is ~1 in 10,000 (Figure 3B). In contrast, CDRH3s compatible with 10.17DT binding were only found when thresholds were lowered substantially, with estimated precursor frequencies of ~1 in 10 million only found when single substitutions predicted to reduce 10.17DT binding by 40% were accepted as matches (Figure 3A).

[...] containing natural CDRH3 loops, with binding levels less than 10% of that measured for the unmutated DH270 UCA mAb (Figure 4). These results were expected, since all the CDRH3 loops tested experimentally contain at least two amino acids predicted to greatly reduce 10.17DT binding based on the substitution profile.

Overall, these data revealed that the naive B cell repertoire contains a high number of BCRs with CDRH3 loop sequences that permit engagement by CH505.M5.G458Y, while B cell activation by the 10.17DT immunogen will be significantly limited by the lack of BCRs containing CDRH3 loops expected to be bound by this immunogen. By analyzing a large collection of BCRs either sequenced from the naïve B cell repertoire or generated computationally, these methods can rapidly estimate the frequency of characteristic B cells that can be engaged by an immunogen upon vaccination.

[...] subsequently identified and characterized. This approach is laborious and expensive, may be biased by the particular cell labeling and sorting strategy, and requires access to a large number of isolated B cells, especially when the target BCRs are expected to be present at low frequency naturally.
In comparison, the approach described here relies on bioinformatic analysis of human BCR sequences, either taken from publicly available databases or simulated from models of VDJ recombination, using sequence identification criteria determined experimentally. For a given antibody-antigen pair, we first employed deep mutational scanning to determine the single amino acid variants in the CDRH3 loop that maintained antigen binding. Based on the resulting data, we then found human BCRs containing CDRH3 loops that are related to the target antibody and are expected to be recognized by a particular immunogen. This method allows the identification of CDRH3 loops that differ significantly in sequence from that of the target antibody, yet are still expected to be bound by the given immunogen. In order to sample the immune repertoire at sequence depths beyond those available in databases of isolated human BCRs, we demonstrated that IGoR simulations can accurately model natural CDRH3 loop sequence diversity. These synthetic sequences can then be analyzed to determine the frequency of rare BCRs that may be engaged by a target immunogen. With this approach, we found that BCRs containing CDRH3s related to that of CH235 UCA, and that are predicted to be engaged by the targeting immunogen CH505.M5.G458Y, are abundant in the natural repertoire. Therefore, we expect that vaccination with CH505.M5.G458Y will engage and activate CH235.12 precursors with diverse CDRH3 loops. In contrast, 10.17DT recognition of DH270.6 precursors is likely restricted to antibodies with CDRH3 loops that contain the same D gene and [...]

The CDRH3-centric approach used in this study has some limitations in estimating the frequency of B cells targeted by a given immunogen.
It is possible that some CDRH3 loops identified as favorable by our method could be part of BCRs that contain other molecular features that prevent immunogen binding, such as incompatible VH or VL gene segments. In addition, the deep scanning mutagenesis approach evaluates the effect on immunogen binding of only one amino acid change at a time in the context of the WT CDRH3 of the target antibodies. However, as shown in this study, many of the BCRs identified contain multiple amino acid changes relative to the target CDRH3. These loops may contain novel structural and molecular features where the additive effect of individual mutations may no longer predict the overall binding propensity to the target immunogen. Nevertheless, in this study we were able to identify and validate experimentally CDRH3 loops that showed binding to 10.17DT, yet differed from the DH270 UCA CDRH3 at up to 45% of their amino acids. On the other hand, it is also possible that our approach may omit BCRs that contain CDRH3 loops with slightly different immunogenetics than those of the target antibody, but that could still be activated by the target immunogens and subsequently mature into antibodies with the desired reactivity. For example, it is possible that 10.17DT may recognize BCRs that contain CDRH3 loops of different lengths than that of DH270 UCA and that these BCRs could affinity mature into antibodies with similar function to those from the DH270 lineage either by acquiring indels or by as yet unknown evolutionary pathways.

[...] starting at a concentration of 100 µg/mL.
Binding was detected by incubation with biotinylated antibody PGT151, which is specific for Env trimers, followed by addition of streptavidin-HRP. The logarithm of the area under the curve (LogAUC) was calculated using Prism 8.

[...] CDRH3s in the BCR database. Immunogens that can bind and tolerate alternative high frequency amino acids at position 5 (e.g. Gly, Tyr, and Ser), at position 16 (e.g. Tyr, Gly, and Asp), and at position 17 (e.g. Tyr, Gly, and Ala) would greatly improve the overall frequency of [...]

[...] were generated without using the IGoR error model and thus represent unmutated sequences.
As sequences were generated, we stored only CDRH3s that matched the criteria for being DH270 UCA-like: same length, same D gene, same D gene reading frame and same D gene position. This resulted in 5×10^12 total generated sequences, with 3.3×10^11 CDRH3 sequences of the same length as DH270 UCA and 1.15×10^9 CDRH3 sequences with the same length, same D gene, same D gene reading frame and same D gene position.
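The four storage criteria and the resulting yields can be illustrated with a short sketch. The record type, field names and placeholder values below are hypothetical (they are not IGoR output fields, and the toy target is not the real DH270 UCA); only the three sequence counts are taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recombination:
    """Hypothetical record for one simulated heavy chain rearrangement."""
    cdrh3: str      # CDRH3 amino acid sequence
    d_gene: str     # name of the D gene used
    d_frame: int    # reading frame of the D gene
    d_start: int    # index of the first D-encoded residue within the CDRH3


def is_target_like(rec: Recombination, target: Recombination) -> bool:
    """Same length, same D gene, same D reading frame and same D position."""
    return (
        len(rec.cdrh3) == len(target.cdrh3)
        and rec.d_gene == target.d_gene
        and rec.d_frame == target.d_frame
        and rec.d_start == target.d_start
    )


# Placeholder target and candidates; values are invented for illustration.
target = Recombination(cdrh3="AAAAAAAA", d_gene="D_EXAMPLE", d_frame=2, d_start=3)
kept = Recombination(cdrh3="AGSYWAAA", d_gene="D_EXAMPLE", d_frame=2, d_start=3)
dropped = Recombination(cdrh3="AGSYWAAA", d_gene="D_EXAMPLE", d_frame=1, d_start=3)

# Counts reported in the text.
total_generated = 5e12
same_length = 3.3e11
full_match = 1.15e9

frac_same_length = same_length / total_generated   # share with the target length
frac_full_match = full_match / total_generated     # share passing all four criteria

print(f"same length: {frac_same_length:.1%} of generated sequences")
print(f"all criteria: about 1 in {total_generated / full_match:,.0f}")
```

On these figures, about 6.6% of simulated sequences share the DH270 UCA CDRH3 length, and roughly 1 in 4,300 satisfy all four criteria, which is why filtering during generation keeps storage manageable at this sampling depth.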