key: cord-0957102-xyfvayfz
authors: Waltari, Eric; Nafees, Saba; Wong, Joan; McCutcheon, Krista M.; Pak, John E.
title: AIRRscape: an interactive tool for exploring B-cell receptor repertoires and antibody responses
date: 2022-03-27
journal: bioRxiv
DOI: 10.1101/2022.03.24.485594
sha: f6a9493f55aae46322d3afc27b0668924f3d066b
doc_id: 957102
cord_uid: xyfvayfz

The sequencing of antibody repertoires of B-cells at increasing coverage and depth has led to the identification of vast numbers of immunoglobulin heavy and light chains. However, the size and complexity of these Adaptive Immune Receptor Repertoire sequencing (AIRR-seq) datasets make it difficult to perform exploratory analyses. To aid in data exploration, we have developed AIRRscape, an R Shiny-based interactive web browser application that enables B-cell receptor (BCR) and antibody feature discovery through comparisons among multiple repertoires. Using AIRR-seq data as input, AIRRscape starts by aggregating and sorting repertoires into interactive and explorable bins of germline V-gene, germline J-gene, and CDR3 length, providing a high-level view of the entire repertoire. Interesting subsets of repertoires can be quickly identified and selected, and then network topologies of CDR3 motifs can be generated for further exploration. Here we demonstrate AIRRscape using patient BCR repertoires and sequences of published monoclonal antibodies to investigate patterns of humoral immunity to three viral pathogens: SARS-CoV-2, HIV-1, and DENV (dengue virus). AIRRscape reveals convergent antibody sequences among datasets for all three pathogens, although HIV-1 antibody datasets display limited convergence and idiosyncratic responses. We have made AIRRscape available on GitHub to encourage its open development and use by immuno-informaticians, virologists, immunologists, vaccine developers, and other scientists who are interested in exploring and comparing multiple immune receptor repertoires.
Author Summary

Technological advances in next generation sequencing have allowed for broad experimental sampling of immune repertoires, providing insight into how our immune system responds to infection, vaccination, autoimmunity, and cancer. The scale of these “big data”, however, makes it difficult to bioinformatically extract the key sequence features that are shared across multiple repertoires. With AIRRscape, we enable large-scale immune repertoire visualization and analysis that requires no knowledge of the command line or advanced programming. By providing the community with an open-source, interactive, and user-friendly interface, we reduce the barriers to exploring immune repertoires at scale. We demonstrate the use of AIRRscape to characterize features of immune responses to viral infection that are shared across multiple repertoire datasets.

Individual B-cell receptor (BCR) repertoires have exceptional diversity, estimated to be greater than 10^9 in a single adult [1, 2].
Improvements in sequencing capability over the past twenty years have allowed a wider sampling of BCR sequences to be uncovered [e.g. 3-5]. These Adaptive Immune Receptor Repertoire sequencing (AIRR-seq) datasets can offer a complex and information-rich glimpse into B-cell immune responses to vaccination and infection. For example, with analyses of AIRR-seq data comes an increasing recognition that convergence among BCR repertoires, sometimes termed 'public' clonotypes [1, 2, 6, 7], is important for understanding the humoral immune response to natural infections and for improving vaccine design.

The size of AIRR-seq datasets can make it challenging to visualize the BCR sequence features of individual repertoires, let alone of multiple repertoires concurrently. Several groups have developed metrics to assess similarity among entire BCR repertoires [8, 9], but few open-source tools enable facile, multi-dimensional visualization and exploration of these repertoires. Notable examples of higher-level BCR repertoire visualizations (e.g. for features such as V-gene and J-gene usage, CDR3 length, and CDR3 amino acid sequence motifs) include Circos plots [10], radial phylogenies [11], and clouds summarizing clonotype networks [12]. While powerful in offering a global overview of AIRR-seq data, these methods are not amenable to interactive exploration, in particular to uncover and display antibody convergence.

To enable simultaneous interactive visualization of multiple BCR repertoires and in-depth data exploration, we have developed an open-source tool called AIRRscape. Analysis using AIRRscape begins with the generation of sequence feature heatmaps, which can span either individual or combined AIRR-seq datasets.
Visual comparison of these heatmaps in their entirety provides a simple and intuitive global overview of differences in three coupled AIRR-seq dataset features: 1) V- and J-gene usage; 2) CDR3 length; and 3) either somatic hypermutation (SHM)

AIRRscape is developed as an interactive web application using R Shiny. We use AIRRscape to visualize the main features of BCR repertoire datasets from processed flat files. To begin, researchers can explore a generated heatmap with either multiple panels of individual BCR repertoires (Fig 2A) or a single panel combining multiple repertoires (Fig 2B). The x-axis of the heatmap shows antibodies binned according to their assigned V-gene family + J-gene germline, while the y-axis shows CDR3 length. For this visualization, the user selects a dataset along with one of three parameters for coloring the bins: percent of overall total antibody sequences, average SHM of the bin, or maximum SHM of the bin. The user can then hover over the paneled heatmaps to view an interactive popup displaying a bin's attributes (Fig 2B). Clicking on a single bin or selecting multiple bins using a drawn rectangle produces a table of antibodies and their features (Fig 3).

Below the table of selected antibodies, AIRRscape displays interactive topologies of CDR3 amino acid sequence similarity (Fig 3). Binning antibodies both by germline assignment and by CDR3 length enables examination of these CDR3 motifs in a phylogenetic framework, since the CDR3 motifs will be 'aligned' when constrained to bins with a given CDR3 length. The labels of the topology tips display antibody names, assigned V-genes, and CDR3 amino acid sequences. The major advantage of this approach is that thousands of antibody CDR3 sequences can be visualized via a web browser for quick and easy exploration by researchers without command-line expertise.
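The heatmap binning described in this section can be illustrated with a short sketch. This is not AIRRscape's R implementation; it is a minimal Python example that assumes each antibody is a record with hypothetical keys `v_family`, `j_gene`, `cdr3_aa`, and `shm` (percent divergence from germline). It groups records by V-gene family + J-gene and CDR3 length, then computes the three bin-coloring statistics named above.

```python
from collections import defaultdict


def bin_repertoire(records):
    """Summarize records into (V family + J gene, CDR3 length) bins.

    The record keys used here are hypothetical names chosen for this
    sketch, not column names taken from AIRRscape.
    """
    shm_by_bin = defaultdict(list)
    for rec in records:
        # x-axis: V-gene family + J-gene; y-axis: CDR3 amino acid length
        key = (f"{rec['v_family']}+{rec['j_gene']}", len(rec["cdr3_aa"]))
        shm_by_bin[key].append(rec["shm"])

    total = len(records)
    summary = {}
    for key, shm_values in shm_by_bin.items():
        summary[key] = {
            "count": len(shm_values),
            "percent_of_total": 100.0 * len(shm_values) / total,
            "mean_shm": sum(shm_values) / len(shm_values),
            "max_shm": max(shm_values),
        }
    return summary


# Toy input; the sequences and SHM values are invented for illustration.
records = [
    {"v_family": "IGHV3", "j_gene": "IGHJ4", "cdr3_aa": "ARDYGDYFDY", "shm": 2.0},
    {"v_family": "IGHV3", "j_gene": "IGHJ4", "cdr3_aa": "ARDLGDYFDY", "shm": 4.0},
    {"v_family": "IGHV1", "j_gene": "IGHJ6", "cdr3_aa": "ARGGYYYYGMDV", "shm": 1.0},
    {"v_family": "IGHV4", "j_gene": "IGHJ4", "cdr3_aa": "ARDYGDYFDY", "shm": 6.0},
]
summary = bin_repertoire(records)
for key in sorted(summary):
    print(key, summary[key])
```

Because every sequence in a bin shares one CDR3 length, the CDR3 strings within a bin can be compared position by position without a separate alignment step, which is the sense in which the motifs are 'aligned'.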
AIRRscape allows for multiple options to create CDR3 topologies. First, a user may select a set of antibodies from the

We first asked how representative the anti-SARS-CoV-2 antibodies are relative to patient repertoires and relative to a healthy control repertoire, based on the heatmap visualizations in AIRRscape. This analysis revealed that the overall pattern of heavy chain V-gene + J-gene usage versus CDRH3 length distribution does not differ among the CoV-AbDab dataset, the four patient repertoires, and the healthy control repertoire (Fig 4). The most common V+J gene family assignments in all datasets were IGHV3+IGHJ4, IGHV3+IGHJ6, IGHV4+IGHJ4, IGHV4+IGHJ6, IGHV1+IGHJ4 and IGHV1+IGHJ6. A notable visual difference between the datasets was the presence or absence of sequences assigned to IGHV7 (Fig 4), which is not unexpected given that this gene 'family' consists of a single functional V-gene that does not occur in all individuals [1]. SHM levels among the anti-SARS-CoV-2 antibodies (mean 2.3%) were lower than those of the four patient repertoires (overall mean 3.7%) and the healthy control repertoire (mean 3.3%), although the patient from the Galson et al. study had noticeably higher SHM levels (mean 5.5%). These data suggest that a majority of neutralizing antibodies against SARS-CoV-2 do not diverge greatly from germline and therefore would be more easily and broadly elicited among patients.

Next we used AIRRscape to visualize convergent clonotypes among anti-SARS- across all four bins was observed (Fig 5; S1-3 Figs), with each containing 12-30 CDRH3 motifs from the CoV-AbDab dataset; these motifs were found across seven or more unique studies. Three of the bins show convergent motifs among multiple patient repertoires (Fig 5; S1 and S3 Fig), with two also showing convergence among three of the four patient repertoires.
Furthermore, two of the bins show convergence to the healthy control repertoire (Fig 5; S1 Fig). The three convergent motifs seen in multiple COVID-19 patient repertoires were also found in the dengue patient repertoires (see 3.3), although convergence across both CDRH3 and exact V-gene usage was only observed for one motif. While SHM is not included in the CDRH3 topologies, examination of the convergent sequences indicated that they all show low SHM (0-4%

We collated anti-HIV-1 monoclonal antibodies from two databases and one study [

The heatmap visualizations in AIRRscape were used to compare published anti-HIV-1 antibodies to anti-SARS-CoV-2 antibodies and to two HIV-1 patient repertoires. While the number of anti-HIV-1 antibodies was sparse relative to that of anti-SARS-CoV-2 antibodies, they appeared to have similar heavy chain V-gene + J-gene usage (Fig 2A)

Four of the bins showed limited convergence among HIV-1 datasets (Fig 6; S5 Fig) Fig 4) Fig 7). We also found a previously unreported instance of convergence in the Colombian dataset. CDRH3 motifs from the CF6 clonal family identified in patient d20 were also found in the patient d13 bulk BCR repertoire, as well as in two patient repertoires from the Nicaraguan bulk dataset (

The six common CDRH3 motifs reported in the Nicaraguan dengue patients were also explored using AIRRscape. These six motifs group into two clusters by amino acid similarity. We found that the first cluster is common among the Nicaraguan dengue cohort, occurring in 17 patients, but not seen in the Colombian dataset. Notably, we found that the second cluster was convergent with the CF1 clonal family [19; S9 Fig], demonstrating its frequent occurrence. These data demonstrate how repertoire convergence in populations can occur at different geographic levels, some more restricted than others.
This could be due to regional differences or similarities in factors such as infection or vaccine history, genetics, and environment.

The importance of understanding BCR repertoires has gained increasing appreciation, especially as a result of the current COVID-19 pandemic.

repertoires, particularly among individuals infected with a common antigen [2]. Such convergence has great implications for understanding immune responses to antigens and informing vaccine design. Therefore, a major aim of AIRRscape is to visualize related antibody sequences, particularly from different repertoires, using the commonly accepted characteristics of convergence: V-gene assignment, J-gene assignment, and CDR3 motif. We use phylogenetic methods to more easily visualize and understand differences among the motifs, keeping in mind that the visualizations are topologies of CDR3 motifs and not necessarily phylogenetic trees indicating common ancestry, e.g. when multiple individual repertoires are compared. The characteristics used to find convergence are the same as those used to define clonotypes, or clonal clusters of antibodies within individuals. Given that no consensus on clonotype definitions has been reached, with debate largely centered around thresholds of sequence identity in the CDR3 motif, AIRRscape creates topologies based on CDR3 amino acid sequences while being agnostic with respect to thresholds. Users may currently select one of four amino acid identity thresholds, three of which are used in recent large-scale studies of BCR repertoires [1, 6, 7]. AIRRscape also does not require a priori clustering of clonotypes in the input datasets. Users with pre-defined clonal families can employ another tool (Olmsted: https://olmstedviz.org) to generate such visualizations.
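To make the convergence criteria concrete, the following Python sketch searches for sequences that share V-gene family, J-gene, and CDR3 length with a query antibody and that meet a chosen CDR3 amino acid identity threshold. It is an illustration under stated assumptions, not AIRRscape's code: identity is taken here as the fraction of matching positions between equal-length CDR3 strings, which may differ from the distance AIRRscape computes, and the record keys (`name`, `v_family`, `j_gene`, `cdr3_aa`) and toy sequences are hypothetical.

```python
def cdr3_identity(a, b):
    """Fraction of identical positions between two equal-length CDR3 strings."""
    if len(a) != len(b):
        raise ValueError("CDR3 sequences must have equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def find_convergent(query, candidates, threshold=0.8):
    """Return candidates in the query's bin with identity >= threshold.

    A candidate is in the same bin when V family, J gene, and CDR3
    length all match; only then is the CDR3 identity evaluated.
    """
    hits = []
    for cand in candidates:
        same_bin = (
            cand["v_family"] == query["v_family"]
            and cand["j_gene"] == query["j_gene"]
            and len(cand["cdr3_aa"]) == len(query["cdr3_aa"])
        )
        if same_bin and cdr3_identity(query["cdr3_aa"], cand["cdr3_aa"]) >= threshold:
            hits.append(cand)
    return hits


# Toy data invented for illustration.
query = {"name": "mAb-query", "v_family": "IGHV3", "j_gene": "IGHJ4",
         "cdr3_aa": "ARDYGDYFDY"}
candidates = [
    {"name": "seqA", "v_family": "IGHV3", "j_gene": "IGHJ4", "cdr3_aa": "ARDLGDYFDY"},
    {"name": "seqB", "v_family": "IGHV3", "j_gene": "IGHJ4", "cdr3_aa": "ARGLGSYFDY"},
    {"name": "seqC", "v_family": "IGHV1", "j_gene": "IGHJ4", "cdr3_aa": "ARDYGDYFDY"},
    {"name": "seqD", "v_family": "IGHV3", "j_gene": "IGHJ4", "cdr3_aa": "ARDYGDYYFDY"},
]

hits_80 = [c["name"] for c in find_convergent(query, candidates, threshold=0.8)]
hits_70 = [c["name"] for c in find_convergent(query, candidates, threshold=0.7)]
print(hits_80)
print(hits_70)
```

Lowering the threshold admits more distant motifs: in this toy example `seqB` matches the query at 7 of 10 positions, so it is excluded at 80% but included at 70%. `seqC` and `seqD` are never compared because they fall outside the query's bin.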
We validated the utility of AIRRscape by exploring datasets of both antibodies and patient bulk BCR repertoires for three viral pathogens: SARS-CoV-2, HIV-1, and DENV. Among the COVID-19 datasets, we first visually confirmed that the set of known anti-SARS-CoV-2 antibodies is broadly similar both to a healthy BCR repertoire and to a collection of COVID-19 patient repertoires, as measured by heavy chain V-gene + J-gene usage, CDRH3 length distribution, and relatively low levels of SHM from germline. This is consistent with studies made in the first months of the COVID-19 pandemic, which concluded that anti-SARS-CoV-2 antibodies could likely be induced by vaccines [41]. We then used AIRRscape to examine convergence among anti-SARS-CoV-2 antibodies and COVID-19 patient BCR repertoires as reported by multiple studies [14-16]. With an amino acid sequence identity threshold of 80%, we find that the COVID-19 convergent motifs are indeed present in currently known anti-SARS-CoV-2 antibodies as well as in four COVID-19 patient bulk BCR repertoires. COVID-19 convergent motifs were also identified among repertoires of healthy controls and dengue patients, which would not be expected to contain antibodies against SARS-CoV-2; their presence could be explained by the relative proximity of the motifs to germline sequences and/or by infection with other coronaviruses. Convergence of neutralizing antibody sequences among multiple COVID-19 repertoires is a strong indicator of similarity in SARS-CoV-2 immune responses and suggests that vaccines eliciting these antibodies will be broadly effective [15, 42]. The discovery that these antibody sequences are public (i.e. also identified in our dengue and healthy control samples) suggests that there is a general pre-existing foundation in populations towards COVID-19 protection. Such a primed immune repertoire bodes well for robust vaccine responses.
Repertoires of patients infected by different SARS-CoV-2 variants could be explored using AIRRscape to find potential convergence among these datasets. Such convergence would indicate antibody motifs with potential to neutralize the range of known SARS-CoV-2 variants, which could be prioritized for reverse vaccinology research or development.

In contrast to the COVID-19 datasets, the HIV-1 datasets are mostly idiosyncratic but do show limited convergence. Using AIRRscape, we first visualized characteristics of anti-HIV-1 antibodies and eight HIV-1 patient bulk BCR repertoires. The heatmaps show high SHM, a common feature of neutralizing antibodies against HIV-1. Notably, this collection of antibodies displays much higher SHM levels than anti-SARS-CoV-2 antibodies. We then searched for convergence in CDRH3 motifs between anti-HIV-1 antibodies and eight HIV-1 patient bulk BCR repertoires, using a more permissive threshold of 70% CDR3 sequence identity.

Data availability statement

All AIRRscape code and data loaded for visualization, as well as code for processing similar datasets, are available on a GitHub repository at https://github.com/czbiohub/AIRRscape.

Figures

Fig 1. Workflow of repertoire data retrieval and processing.

An 80% identity threshold is used to calculate convergence. Tips are colored by dataset source. Purple tips are published anti-COVID-19 antibodies from 7 different studies, dark gray tips are antibody sequences from a healthy donor BCR repertoire, and orange through brown shaded tips are antibody sequences from COVID-19 patient BCR repertoires.

Fig 4). (A) Convergent clonotypes to mAb 02-o in the 1_4_13 bin. (B) Convergent clonotypes to mAb 02-s in the 1_4_14 bin. (C) Convergent clonotypes to mAb HK20 in the 1_3_15 bin. A 70% identity threshold is used to calculate convergence. Tips are colored by dataset source.
Purple tips are published anti-HIV-1 antibodies, while green shaded tips are antibody sequences from HIV-1 patient BCR repertoires.

). An 80% identity threshold is used to calculate convergence. Tips are colored by dataset source. Purple tips are plasmablast sequences reported by Zanini (2018) isolated from two Colombian patients (d13 and d20), blue tips are antibody sequences from the BCR repertoire of patient d13, and gold tips are antibody sequences from a cohort of Nicaraguan patient BCR repertoires.

Commonality despite exceptional diversity in the baseline human antibody repertoire
Current strategies for detecting functional convergence across B-cell receptor repertoires
Analyzing Immunoglobulin Repertoires. Frontiers in Immunology
Immunoglobulin gene analysis as a tool for investigating human immune responses
Novel Approaches to Analyze Immunoglobulin Repertoires
High frequency of shared clonotypes in human B cell receptor repertoires
Longitudinal Antibody Repertoire Sequencing Reveals the Existence of Public Antibody Clonotypes in HIV-1 Infection
Computational Strategies for Dissecting the High-Dimensional Complexity of Adaptive Immune Repertoires
Individual heritable differences result in unique cell lymphocyte receptor repertoires of naïve and antigen-experienced cells
The presence of CLL-associated stereotypic B cell receptors in the normal BCR repertoire from healthy individuals increases with age
Generation Sequencing of T and B Cell Receptor Repertoires from COVID-19 Patients Showed Signatures Associated with Severity of Disease
Lineage tracing of human B cells reveals the in vivo landscape of human antibody class switching.
Neher RA, editor
shiny: Web Application Framework for R
Deep sequencing of B cell receptor repertoires from COVID-19 patients reveals strong convergent immune signatures
Convergent antibody responses to SARS-CoV-2 in convalescent individuals
Structural basis of a shared antibody response to SARS-CoV-2
Broadly neutralizing human antibodies against dengue virus identified by single B cell transcriptomics. eLife
Antibody Signatures in Human Dengue
Virus-inclusive single-cell RNA sequencing reveals the molecular signature of progression to severe dengue
Reproducibility and Reuse of Adaptive Immune Receptor Repertoire Data. Frontiers in Immunology
Adaptive Immune Receptor Repertoire Community recommendations for sharing immune-repertoire sequencing data
A platform for querying and analyzing antibody/B-cell and T-cell receptor repertoire data across federated repositories
Observed Antibody Space: A Resource for Data Mining Next-Generation Sequencing of Antibody Repertoires
Observed Antibody Space: A diverse database of cleaned, annotated, and translated unpaired and paired antibody sequences
Comprehensive mapping of immune perturbations associated with severe COVID-19
Human B Cell Clonal Expansion and Convergent Antibody Responses to SARS-CoV-2. Cell Host & Microbe
5′ Rapid Amplification of cDNA Ends and Illumina MiSeq Reveals B Cell Receptor Features in Healthy Adults, Adults With Chronic HIV-1 Infection, Cord Blood, and Humanized Mice. Frontiers in Immunology
Functional enrichment and analysis of Antigen-Specific memory B cell antibody repertoires in PBMCs.
Frontiers in immunology
Maturation and Diversity of the VRC01-Antibody Lineage over 15 Years of Chronic HIV-1 Infection
CoV-AbDab: the coronavirus antibody database
Antibodies and T Cell Receptors in the Immune Epitope Database
CATNAP: a tool to compile, analyze and tally neutralizing antibody panels
Allelic Frequency and CDRH3 Region Limit the Engagement of HIV Env Immunogens by Putative VRC01 Neutralizing Antibody Precursors
pRESTO: a toolkit for processing high-throughput sequencing raw reads of lymphocyte receptor repertoires
Change-O: a toolkit for analyzing large-scale B cell immunoglobulin repertoire sequencing data
Cutting Edge: Ig H Chains Are Sufficient to Determine Most B Cell Clonal Relationships
Diversity in the CDR3 Region of VH Is Sufficient for Most Antibody
Phylogeny estimation: traditional and Bayesian approaches
Broad diversity of neutralizing antibodies isolated from memory B cells in HIV
Recent progress in broadly neutralizing antibodies to HIV
Longitudinal Isolation of Potent Near-Germline SARS-CoV-2-Neutralizing Antibodies from
Exploiting B Cell Receptor Analyses to Inform on HIV-1 Vaccination Strategies
Multidonor Analysis Reveals Structural Elements, Genetic Determinants, and Maturation Pathway for HIV-1 Neutralization by VRC01-Class Antibodies
Structural basis for germ-line gene usage of a potent class of antibodies targeting the CD4-binding site of HIV
Antigenic landscape of the HIV-1 envelope and new immunological concepts defined by HIV-1 broadly neutralizing antibodies.
Current Opinion in Immunology
The Repertoire Dissimilarity Index as a method to compare lymphocyte receptor repertoires
sumrep: A Summary Statistic Framework for Immune Receptor Repertoire Comparison and Model Validation
CoV-2 neutralizing antibody structures inform therapeutic strategies
Structures of Human Antibodies Bound to SARS-CoV-2 Spike Reveal Common Epitopes and Recurrent Features of Antibodies
A bioinformatic framework for immune repertoire diversity profiling enables detection of immunological status