key: cord-0262145-vw5vtwf3 authors: Safonova, Yana; Shin, Sung Bong; Kramer, Luke; Reecy, James; Watson, Corey T.; Smith, Timothy P.L.; Pevzner, Pavel A. title: Revealing how variations in antibody repertoires correlate with vaccine responses date: 2021-08-07 journal: bioRxiv DOI: 10.1101/2021.08.06.454618 sha: f838a76a0d2abd7566b0be3e5d9230db512f00cf doc_id: 262145 cord_uid: vw5vtwf3 An important challenge in vaccine development is to figure out why a vaccine succeeds in some individuals and fails in others. Although antibody repertoires hold a key to answering this question, there have been very few personalized immunogenomics studies so far aimed at revealing how variations in immunoglobulin genes affect a vaccine response. We conducted an immunosequencing study of 204 calves vaccinated against bovine respiratory disease (BRD) with the goal to reveal variations in immunoglobulin genes and somatic hypermutations that impact the efficacy of vaccine response. Our study represents the largest longitudinal personalized immunogenomics study reported to date across all species, including humans. To analyze the generated dataset, we developed an algorithm for identifying variations of the immunoglobulin genes (as well as frequent somatic hypermutations) that affect various features of the antibody repertoire and titers of neutralizing antibodies. In contrast to relatively short human antibodies, cattle have a large fraction of ultralong antibodies that have opened new therapeutic opportunities. Our study revealed that ultralong antibodies are a key component of the immune response against the costliest disease of beef cattle in North America. The detected variants of the cattle immunoglobulin genes, which are implicated in the success/failure of the BRD vaccine, have the potential to direct the selection of individual cattle for ongoing breeding programs. The challenge of identifying variations in immunoglobulin genes that affect vaccine response. 28 Although vaccination is a primary tool to control the spread of viral and bacterial diseases, the success of 29 vaccines at the population level does not always translate to protection at the individual level. Antibodies are not encoded in the germline genome but rather result from somatic genomic recombinations 37 called VDJ recombinations (Tonegawa, 1983) . This process affects an IG locus containing the families of 38 the variable (V), diversity (D), and joining (J) genes (referred to as IG genes) by selecting one V, one D 39 gene, and one J gene, and concatenating them together to generate one of the antibody chains. Further 40 diversity of antibodies is generated by the class-switch recombination and somatic hypermutations (SHMs) 41 (Dudley et al. 2005 ). There are three types of IG loci in mammalian species (including cows): heavy chain, 42 kappa light chain, or lambda light chain. In this work, we focus on the heavy chain (IGH) locus only. 43 The expression quantitative trait loci (eQTL) analysis links variation in gene expression to genotypes. 45 Although eQTL analysis has greatly contributed to the dissection of the genetic basis of disease and vaccine 46 response (Franco et al., 2013 , Bhalala et al., 2018 , the IG loci remain virtually untouched by eQTL studies 3 (Watson et al., 2017) . eQTL studies usually start from generating an n×m genotype matrix that contains 48 information about each of m markers (e.g., SNPs) in each of n individuals and an n×k phenotype 49 (expression) matrix that contains information about expression levels of each of k genes in each of n 50 individuals. Generating analogs of the genotype and phenotype matrices in immunogenomics studies is a 51 more complex task than in traditional eQTL studies. 52 53 First, while the set of genes in eQTL studies is fixed and shared by all individuals, the antibody repertoire 54 is composed of a virtually unlimited set of proteins, and there are typically few antibodies shared between 55 any two individuals. Thus, given an antibody repertoire represented as a repertoire sequencing (Rep-seq) 56 dataset, it is not clear how to define the phenotype matrix. One possibility is to consider each germline gene 57 in the IG locus (e.g., a V gene, a D gene, or a J gene) and to define the usage of this gene as the fraction of 58 antibodies that originated from this gene among all antibodies in the Rep-Seq dataset. Our goal is to identify 59 Analysis of Rep-seq datasets collected at the time point "-3" revealed that each V gene but IGHV1-7 has 148 positive or negative correlation of usages with other IGHV genes. Figure 1C shows that V genes form three 149 groups based on their usage, consisting of seven positively correlated genes: G1=(IGHV1-10, IGHV1-14), 150 G2=(IGHV1-17, IGHV1-20, IGHV1-30), and G3=(IGHV1-21, IGHV1-27). IGHV1-7 is the only gene with 151 an independent usage profile. These correlations are consistent across time points "0", "+3", and "+6". We 152 conjecture that this is explained by an association of IGHV1-7 with ultralong CDR3s and their special role 153 in the adaptive immune response. Figure 1C illustrates that the usages in groups G1, G2, and G3 154 anticorrelate. 155 156 BRD vaccination triggers the increased production of ultralong antibodies. Figure 1D shows that for 157 all high-usage cattle V genes, except for IGHV1-7, the mean CDR3 length does not exceed 75 nt. It also 158 illustrates that, unlike other V genes, IGHV1-7 has a bimodal distribution of CDR3 lengths, and the second CDR3s in all CDR3s (across all V genes) at time points "+3" and "+6" (P=3.61×10 -107 ). Figure 1F shows 166 that the fraction of ultralong CDR3s in all CDR3s derived from IGHV1-7 also increases after vaccinations 167 (P=1.19×10 -119 ). These observations suggest that the vaccination triggers the production of antibodies 168 derived from IGHV1-7. Antibody titers correlate with fractions of ultralong CDR3s. Some calves have pre-existing immunity 182 because they either were previously exposed to the BRD-causing virus or have maternal antibodies specific 183 to BRD. This pre-existing immunity (as well as cross-reactivity of antibodies) may affect titers at the initial 184 time point "-3". Downey et al., 2013 demonstrated that the decay rate of maternal antibodies is rather low 185 and that there is a threshold effect: the calves do not respond to the vaccine if the level of maternal antibodies 186 exceeds a threshold and only respond when this level drops. Also, the impact of calf age on antibody 187 response (titers) to BRD was previously shown to be insignificant (Kramer et al., 2017) . 188 189 Figure 2A shows that, on average, the booster vaccination increased neutralizing antibody titers. Figure 2B 190 shows the Pearson correlation r between antibody titers at four time points and illustrates that they correlate 191 at points "-3" and "0" (r=0.78, P=5.83×10 -43 ), "-3" and "+3" (r=0.43, P=1.53×10 -10 ), and "0" and "+3" 192 (r=0.43, P=2.29×10 -10 ). In contrast, antibody titers from time points "-3" and "0" anticorrelate with final 193 titers at the time point "+6" (r=-0.42, P=3.34×10 -10 and r=-0.4, P=1.78×10 -9 , respectively). This suggests 194 that pre-existing immunity to BRD antigens may be suboptimal, preventing development of a successful 195 immune response to the BRD vaccine. Impacts of suboptimal antibody responses caused by pre-existing 196 immunity were reviewed by Zimmermann and Curtis, 2019 (for various antigens) and Iwasaki and Yang, 197 10 2020 (for SARS-CoV-2) and discussed in Supplemental Note "The relations between pre-and post-198 vaccination immunity to the BRD vaccine in calves". 199 200 We have not found any statistically significant correlations between the titers and the usages of all high-201 usage V genes, except for IGHV1-7. Both the usage of IGHV1-7 ( Figure 2C ) and the fraction of ultralong 202 CDR3s ( Figure 2D • Finding germline and somatic variations (GSVs). 220 • Generating and clustering a genotype matrix to reveal subjects with common genotypes. 221 • Finding statistically significant genotype-phenotype associations. 222 • Identifying the most consequential GSVs with respect to phenotypes. 223 Below we applied IgQTL to reveal the genotype-phenotype associations in the cattle immunosequencing 224 study. • if a subject is homozygous by N2 (i.e., N1 is substituted by N2 in the germline), we expect that 254 fN1~0 and fN2~1. 255 • if a subject is heterozygous by N1/N2, we expect that fN1~0.5 and fN2~0.5. 256 • if the germline nucleotide N1 is replaced by a frequent SHM represented by N2 (with frequency at 257 least 50%), we expect that fN1 ≤ freq. 258 We classified a position P in a gene G as a GSV if fN1 ≤ freq for at least one subject. 259 260 Each GSV (represented by a position P in a gene G, and nucleotides N1 and N2) was encoded as (P, G, 261 N1/N2). Figure 3B Generating the genotype matrix. For each GSV (P, G, N1/N2) in each animal, IgQTL computes the R-286 ratio as R = fN1 / (fN1 + fN2). The R-ratio represents a more flexible and expressive alternative to the 287 conventional binary description of SNP states (e.g., A/A or A/C) because it enables description of SHMs 288 and their relative abundance. The R-ratios also distinguish subjects that are heterozygous by the same pair 289 of alleles but have different expression profiles for these alleles (e.g., 80%-20% vs 50%-50%). to the IGenotype matrix. Iterative k-means clustering of the first two principal components with k from 2 299 to 10 followed by the elbow method (Thorndike, 1953) reveals that k=3 provides the optimal decomposition 300 of 204 animals ( Figure 4B , Figure S2 ). Although decompositions into more clusters resulted in similar 301 values of inertia ( Figure S2 ), we focused our analysis on k=3 because it simplified further statistical analysis 302 and allowed us to apply popular statistical methods such as the Kruskal-Wallis test (Kruskal and Wallis, 303 1952). 304 305 We say that the computed clusters are associated with the R-ratios (or usages/titers/fractions of ultralong 306 CDR3s) if the differences between distributions of R-ratios (or usages/titers/fractions of ultralong CDR3s) 307 across these clusters are statistically significant. The computed clusters C1, C2, and C3 are associated with 308 the R-ratios for 47 out of 52 GSVs, including 16 out of 17 known germline variations ( Figure S3 ). We thus 309 16 conclude that the decomposition of 204 calves into clusters C1-C3 is driven by multiple linked GSVs 310 (GSVs with correlated R-ratios) that represent common genotypes of V genes. 311 312 GSVs explain variance in usages of V genes and antibody titers. Clusters C1-C3 are associated with 313 usages of all highly-used V genes except for IGHV1-7 ( Figure 4C, Figure S4 ). Figure 4D shows that the 314 clusters are also associated with antibody titers collected at time point "+6": cluster C1 has higher antibody 315 titers compared to clusters C2 and C3 with P=0.017 according to the Kruskal-Wallis test. This observation 316 suggests that IGenotypes of V genes are associated with the response to the BRD vaccination. Antibody 317 titers collected at three other time points do not have statistically significant associations with clusters C1-318 C3. 319 320 Since the fractions of ultralong antibodies are not associated with clusters C1-C3 ( Figure S5 ), we 321 hypothesize that generation of ultralong antibodies is not specific to genotypes described by the revealed 322 clusters but rather is a general feature of cattle antibody repertoires. However, our analysis revealed subtle 323 correlations between genotypes and some features of ultralong CDR3s. Figure 4E shows that clusters C1-324 C3 partially explain the variance in Figure 2D : the fraction of ultralong CDR3s among all CDR3s at the 325 time point "+3" positively correlates with titers at the time point "+6" only for animals from the cluster C1 326 (r=0.32, P=0.0072). Similar correlations do not exist and are not statistically significant for clusters C2 and 327 C3 (r=0.12 and r=0.10, respectively). We thus assume that ultralong CDR3s from the cluster C1 work better 328 in response to the BRD vaccine. 329 330 Figure 4F and G show that animals from the cluster C1 are characterized by a lower initial fraction of 331 ultralong CDR3s (in all CDR3s derived from IGHV1-7) and a lower fraction of ultralong CDR3s with six 332 cysteines (important for knob formation) as compared to animals from clusters C2 and C3 (P-values for the 333 cluster variable are 3.96×10 -4 and 2.63×10 -5 , respectively). Since the initial number of cysteines in ultralong 334 CDR3s is four (Wang et al., 2013) , a higher number of cysteines suggests that ultralong CDR3s of animals 335 from clusters C2 and C3 underwent more extensive affinity maturation before the BRD vaccination 336 compared to animals from the cluster C1. Since the cluster C1 is associated with higher titers after the 337 second BRD vaccination, we extend the hypothesis about pre-existing immunity and suggest that it might 338 partially consist of mature ultralong CDR3s (with six cysteines) generated before the vaccinations in 339 animals from clusters C2 and C3. However, since titers at time points "-3" and "+6" are anticorrelated in 340 all clusters ( Figure S6 ), we also suggest that mature ultralong antibodies might be not the only component 341 of the pre-existing immunity. Further exploration of cattle antibody repertoires would help to understand 342 the origin of the pre-existing immunity (e.g., maternal antibodies or microbiota) and its impact on the BRD 343 vaccination. GSV with the next most significant association was GSV (144, IGHV1-17, C/T), but this significance was 369 many orders of magnitude lower (P=0.0016). The closest IGHV1-7 containing GSV in terms of significance 370 was GSV (71, IGHV1-7, C/T) which also has association that is orders of magnitude lower (P=0.0002). 371 Thus, GSV (148, IGHV1-7, A/G) is unique since its association P-values are many orders of magnitude 372 lower than association P-values of all other GSVs. 373 374 This GSV (that we refer to as G148A for brevity) also has the most significant association with the usage 375 of IGHV1-7 in the combined dataset (P=6.11×10 -17 , the linear regression model): the higher is the fraction 376 of nucleotide A at position 148 of IGHV1-7, the higher is the usage of IGHV1-7 ( Figure 5C ). Figure 5D 377 shows that R-ratios of the GSV G148A grow after both vaccinations (P=3.36×10 -204 ). This position was not 378 previously identified as a germline variation ( Figure 3B ), the known alleles of IGHV1-7 have a nucleotide 379 G at position 148 classified as the second popular nucleotide N2 by our analysis (Table S2 ). The R-ratios 380 20 of GSV G148A are not associated with clusters C1-C3, suggesting that the GSV is not linked with 47 GSVs 381 which R-ratios are associated by clusters C1-C3 ( Figure S3 ). The R-ratios for this GSV varies from 0.31 to 382 0.87, indicating that both nucleotides A and G are always present in Rep-seq reads in each animal (Figure 383 S9A ). Figure S9B also shows that R-ratios of the GSV grow similarly for clusters C1-C3. We thus assume 384 that GSV G148A is a frequent SHM that is often selected after vaccinations and is important for generating 385 ultralong antibodies. IGHV1-69 polymorphism modulates anti-influenza antibody repertoires, correlates with IGHV utilization 579 shifts and varies by ethnicity The protein 581 data bank Identification of expression 583 quantitative trait loci associated with schizophrenia and affective disorders in normal brain tissue Automated analysis of immunosequencing 586 datasets reveals novel immunoglobulin D genes across diverse species Broadly Neutralizing Bovine Antibodies: Highly Effective New Tools against 588 Evasive Pathogens? Viruses I-Mutant2. 0: predicting stability changes upon mutation from the protein 590 sequence or structure Production of individualized V gene databases reveals high levels of immunoglobulin genetic diversity Immunogenetic factors driving formation of ultralong VH CDR3 in Bos taurus antibodies Structural Diversity of Ultralong CDRH3s in Seven Bovine 598 An evaluation 600 of circulating bovine viral diarrhea virus type 2 maternal antibody level and response to vaccination in Angus calves Mechanism and control of V(D)J recombination versus class 603 switch recombination: similarities and differences Department of Agriculture. Cattle & Beef: Sector at a Glance A defective Vkappa A2 allele in Navajos which may 607 play a role in increased susceptibility to haemophilus influenzae type b disease Integrative genomic analysis of the human immune 610 response to influenza vaccination Automated analysis of high-throughput B-cell sequencing 612 data reveals a high frequency of novel immunoglobulin V gene segment alleles Kleinstein 615 SH. 2019. Identification of Subject-Specific Immunoglobulin Alleles From Expressed Repertoire Sequencing Data Diversity in the cow ultralong CDR H3 antibody repertoire Understanding research methods and statistics: An integrated introduction for psychology The potential danger of suboptimal antibody responses in COVID-19 Sequences of Immunoglobulin Chains: Tabulation Analysis of Amino Acid 624 Sequences of Precursors, V-regions, C-regions Evaluation of responses to vaccination of Angus cattle for four 626 viruses that contribute to bovine respiratory disease complex Use of Ranks in One-Criterion Variance Analysis Paratome: an online tool for systematic identification of antigen-binding 629 regions in antibodies based on sequence or structure Application of circular consensus sequencing and network analysis to characterize 631 the bovine IgG repertoire Vaccine genetics 633 of IGHV1-2 VRC01-class broadly neutralizing antibody precursor naïve human B cells IMGT 635 unique numbering for immunoglobulin and T cell receptor variable domains and Ig superfamily V-like domains IMGT, the international ImMunoGeneTics information system Structural 641 and genetic basis for development of broadly neutralizing influenza antibodies Internal 643 Duplications of DH, JH, and C Region Genes Create an Unusual IgH Gene Locus in Cattle Handbook of biological statistics Immunoglobulin germline gene variation and its impact on human disease Distinct antibody species: structural differences creating therapeutic 650 opportunities Association between a common immunoglobulin heavy chain allele and rheumatic 654 heart disease risk in Oceania IgRepertoireConstructor: a novel algorithm for antibody repertoire construction and 657 immunoproteogenomics analysis De novo inference of diversity genes and analysis of non-canonical V (DD) J 659 recombination in immunoglobulins Reconstructing Antibody 661 Repertoires from Error-Prone Immunosequencing Reads Rapid elicitation of broadly neutralizing antibodies to HIV by immunization in cows Conservation and diversity in the ultralong third heavy-chain 666 complementarity-determining region of bovine antibodies The epidemiology of bovine respiratory disease: 668 what is the evidence for preventive measures? Germline V-genes sculpt the binding 670 site of a family of antibodies neutralizing human cytomegalovirus Who belongs in the family? Somatic generation of antibody diversity Exceptionally long CDR3H are not isotype restricted in bovine 674 immunoglobulins Reshaping antibody diversity The individual and population genetics of antibody immunity Immunologic basis for long HCDR3s in broadly neutralizing antibodies against HIV-1. Front 680 Immunol Factors that influence the immune response to vaccination The role of the GSV G148A in ultralong CDR3s. The most abundant nucleotide N1=A of the GSV 392 G148A replaces the germline amino acid Gly (encoded by the codon GGT) with the amino acid Ser 393 (encoded by codon AGT) at amino acid position 50. This position represents the last amino acid of the 394 second framework region (FR2) according to the IMGT notation (Lefranc et al., 2003) but is classified as 395 a CDR2 position according to Kabat (Kabat et al, 1979) and Paratome (Kunik et al., 2012) Figure S10 ). We further applied the I-Mutant2.0 tool (Capriotti et al., 2005) to analyze the 407 effect of substitutions at this position on the stability of the analyzed antibodies. I-Mutant2.0 generated 408 prediction for only 10 out of 13 analyzed antibodies (three structures were processed with errors) and it 409 turned out that substitution of Ser by the germline amino acid Gly decreases antibody stability for all ten of 410 them ( Figure 6B ). We thus assume that amino acid Ser at position 50 is critically important for maintaining 411 the structure of ultralong antibodies. viruses associated with bovine respiratory disease as described (Kramer et al., 2017) . A second booster 515 vaccination was applied three weeks later. Bovine whole blood was collected and stored in Tempus TM 516 blood RNA tubes (Applied Biosystems, Foster City, CA). Collections occurred at four-time points; three 517 weeks prior to vaccination, at vaccination, three weeks post-vaccination (at booster vaccination), and 3 518 weeks after booster vaccination. RNA was isolated using the Tempus TM spin RNA isolation kit as 519 recommended by the manufacturer (Applied Biosystems, Foster City, CA). Antibody titers were quantified 520 as previously describe (Kramer et al., 2017) . 521 522 Repertoire sequencing. The RNA was converted to cDNA with the Clontech SMARTer kit (Takara Bio 523 USA; Mountain View, CA) using the modified (Switching Mechanism At 5' End of RNA Transcript) PCR 524 cDNA synthesis protocol (Clontech) and oligonucleotides as described below (Table S3) Competing Interest Statement. The authors declare that the research was conducted in the absence of any 564 commercial or financial relationships that could be construed as a potential conflict of interest. 565 566 Acknowledgements. We are grateful to Vaughn Smider for fruitful discussions and thoughtful comments. 567We also thank Kristen Kuhn, Jacky Carnahan, and William Thompson for technical support. Mention of 568 trade names or commercial products in this publication is solely for the purpose of providing specific 569 information and does not imply recommendation or endorsement by the U.S. Department of Agriculture. 570 USDA is an equal opportunity provider and employer. 571