key: cord-0427188-anx0z8rz authors: Venkataraman, Thiagarajan; Valencia, Cristian; Mangino, Massimo; Morgenlander, William; Clipman, Steven J.; Liechti, Thomas; Valencia, Ana; Christofidou, Paraskevi; Spector, Tim; Roederer, Mario; Duggal, Priya; Larman, H. Benjamin title: Antiviral Antibody Epitope Selection is a Heritable Trait date: 2021-03-26 journal: bioRxiv DOI: 10.1101/2021.03.25.436790 sha: 3bcf635c8d977011b061c787bcfdb5b494d98eaa doc_id: 427188 cord_uid: anx0z8rz There is enormous variability in human immune responses to viral infections. However, the genetic factors that underlie this variability are not well characterized. We used VirScan, a high-throughput viral epitope scanning technology, to analyze the antibody binding specificities of twins and SNP-genotyped individuals. These data were used to estimate the heritability and identify genomic loci associated with antibody epitope selection, response breadth, and the control of Epstein-Barr Virus (EBV) viral load. We identified 4 epitopes of EBV that were heritably targeted, and at least two EBNA-2 binding specificities that were associated with variants in the MHC class-II locus. We identified an EBV serosignature that predicted viral load in white blood cells and was associated with genetic variants in the MHC class-I locus. Our study provides a new framework for identifying genes important for pathogen immunity, with specific implications for the genetic architecture of EBV humoral responses and the control of viral load. Antiviral antibody responses can last decades after an infection or immunization. 1,2 They serve as protection from re-infection and can document the exposure history of an individual or population. It has been known for over 50 years that the composition of circulating immunoglobulin (Ig) is influenced by host genetics. [3] [4] [5] [6] Twin, family, and population-based studies have provided examples of heritable contributions to antiviral immune responses; candidate gene and genome-wide association studies (GWAS) have identified genomic loci that influence specific immune traits. However, few studies have examined the heritability of adaptive immune responses broadly across different types of viruses. [7] [8] [9] [10] [11] A recent study investigating the genetic determinants of anti-viral antibody responses (measured by a multiplex serological assay) to 16 common viruses identified strong associations in the HLA locus and in 7 loci outside the HLA. 12 However, to our knowledge there have been no genetic studies of antibody epitope selection. Antibody fine specificity and breadth (polyclonality) can influence pathogen clearance and protection from re-infection. Genetic variation affecting the expression or function of viral sensing, innate immune signaling, antigen processing and presentation, immune cell function or variation in the antibody locus itself, could all impact the breadth and specificity of an anti-viral antibody response. seroprevalence in most adult populations. [13] [14] [15] The EBV genome is relatively large (172 kb), encoding >85 proteins. 16, 17 Primary EBV infections are usually asymptomatic in healthy children and result in a brief episode of infectious mononucleosis in adolescents and older individuals. 18 Once the primary infection resolves, EBV establishes a lifelong latent infection residing in circulating memory B-cells. 14 There is variability in circulating viral load in healthy individuals 19, 20 and the host genes involved in viral load control have not been elucidated. Moreover, Epstein-Barr nuclear antigen 1 (EBNA-1) is an important marker of EBV latency and detection of anti-EBNA-1 antibodies by enzyme-linked immunosorbent assay (ELISA) is often used as an indication of EBV infection. Titers of anti-EBNA-1 antibody in serum of healthy individuals have previously been linked to the HLA-DRB1 and HLA-DQB1 genes from the HLA locus (class II region). 8 However, the genetic architecture of antibody reactivity to the remainder of the EBV proteome has not be elucidated. EBV infection can cause nasopharyngeal cancer, Burkitt's lymphoma, Hodgkin's lymphoma, and gastric adenocarcinoma, with elevated risk linked to the HLA locus (class I region). 21, 22 Infection with EBV is also believed to play a role in the development of lupus, [23] [24] [25] [26] [27] multiple sclerosis (MS), 28,29 and other autoimmune diseases, with variants in the HLA locus modulating the risk. 30 We therefore assessed the genetics of antibody responses against EBV using Phage ImmunoPrecipitation Sequencing ("PhIP-Seq") with the complete human virome ("VirScan") 31-33 , in combination with the Anti-Viral Antibody Response Deconvolution Algorithm (AVARDA) 34 . In brief, library scale oligonucleotide synthesis was used to express 106,678 overlapping 56-amino acid peptides spanning all known human viral proteins, in a phage display format, covering ~400 viral species and strains. Phage clones immunoprecipitated by serum antibodies were quantified via Illumina sequencing. We and others have used the VirScan library to characterize the antibody repertoires of preterm neonates 35 , assess viral antibodies after solid organ transplant 36 34 , which calculates likelihoods of viral infections by considering potential cross-reactivities among the peptides and between viruses, and the amount of independent evidence supporting an antibody response to each virus represented in the library. In this study we find that antibody epitope selection is largely a heritable trait, and that class II major histocompatibility (MHC) molecules shape antibody recognition of EBV. We also identified an EBV serosignature that predicts EBV viral load in the periphery and associates with the MHC class I region of HLA. Anti-viral antibody breadth and epitope reactivities are heritable. We used a twins study design to characterize the heritability of VirScan antibody binding specificities. We profiled sera of 494 twins (332 DZ and 162 MZ) from the TwinsUK cohort against the VirScan library. Peptide reactivity scores (z-scores) were calculated by comparing each sample to a set of negative control mock immunoprecipitations (IPs), which did not include serum, and were included on the same plate (Fig. 1a) . 33, 42 For each virus, we used AVARDA 34 to calculate the antibody response breadth (polyclonality), and identified seropositive concordant twin pairs. To quantify the contribution of genetic and environmental effects we determined the additive genetic (A), common (C), and unique environmental effects (E) using SEM analysis applied to 446 immunodominant peptide reactivities and the overall breadth of the response to 43 viruses from 9 genera. For each peptide, an individual with a z-score greater than or equal to 7 was classified as a responder and less than or equal to 3 as a non-responder. All values between 3 and 7 were treated as missing data. Only dominant peptides (defined here as reactive in >20% of AVARDA-positive individuals) were included in the analysis; data are presented at the genus level to avoid ambiguity associated with cross-reactivity among viral species with high levels of sequence homology ( Table 1) . The mean breadth values ranged from ~10 peptides up to ~44 peptides with heritability estimates ranging from 5% up to 57%. The virus with the greatest number of dominant peptides, Epstein-Barr Virus (EBV, lymphocryptovirus genus), exhibited heritability ranging from 25% to 89% across 107 peptides. Figure 1 . EBV epitope selection and antibody response breadth are heritable traits. a, Schematic depicting the heritability analyses. VirScan of the TwinsUK cohort (n = 494; MZ = 81 and DZ = 166 pairs; A upper panel) was used to generate a matrix of peptide reactivity scores. The enrichment scores were used to calculate the Jaccard index and response breadth using AVARDA. b, The Jaccard similarity indices of DZ twin pairs (red circles), MZ twin pairs (green circles) or random pairings (blue circles) are shown above. The size of each circle is proportional to the total number of EBV peptides that both twins recognize. cd, Dot plots showing the correlation of EBV response breadth between MZ and DZ twin pairs. Square of the Pearson's correlation (R 2 ) value is provided for each group. EBV breadth and epitope selection are heritable traits. We also performed VirScan analysis of serum from a cohort comprised of 506 community volunteers who were also SNP genotyped, of which 388 were of European ancestry (EUR). Fig. S1 and S2 compare the antibody reactivities of the TwinsUK and the VRC cohorts against the 2,180 EBV peptides in the VirScan library. Immunodominant reactivities were largely restricted to specific regions of the EBV genome ( Fig. S1a-b) and were mostly localized to the EBNA family of proteins, the transcriptional activator BZLF1 and the envelope glycoprotein BLLF1 (Fig. S1c) . About 2,000 sub-dominant responses (present in less than 20% of the cohort) were distributed across the EBV genome ( Fig. S2a-b) , with the greatest number of reactivities against the EBNA proteins, BZLF1, BLLF1, the lytic factor LF3 and latent membrane protein 1 (LMP1) (Fig. S2c) . These results illustrate that while there are shared features of the anti-EBV antibody response between individuals, there is also great inter-individual variability. In EBV seropositive individuals, we observed an average of 129 reactive peptides (~6% of the 2,180 EBV peptides in the VirScan library), with a range of 12 to 324. Among these reactive peptides, we examined the similarity of the anti-EBV profiles between twin pairs using the Jaccard index (Fig. 1b) . The similarity of the response among MZ twin pairs was significantly higher compared to their DZ counterpart (p=4.2x10 -6 , much higher than random pairings, p=3.6x10 -14 ), illustrating that host genetics sculpts the repertoire of anti-EBV antibodies. EBV breadth values among twin pairs, estimated by AVARDA, are shown in Fig. 1c Genome-wide association studies of EBV peptide reactivities. Single-variant association analyses were performed on the 57 selected peptides and the p-value threshold for significance was adjusted by the Sidak-Nyholt method to account for multiple hypothesis testing (Methods). The Human Leukocyte Antigen (HLA) locus on chromosome 6 was associated (p ≤ 1x10 -9 ) with four EBNA2 peptides (Fig. 3a) . A meta-analysis including the VRC/EUR, VRC/AFR and the TwinsUK cohorts also confirmed the associations in this locus ( Table 2) . These four peptides are in the C-terminal transactivating domain of EBNA-2 ( search of the GWAS catalog showed that the same potentially causal variants identified in this study have also been associated with several different diseases and phenotypes, some with a known or suspected role for EBV infection, including Hodgkin's lymphoma 43 and ulcerative colitis 44 . A summary of these associations is provided in Table S1 . Antibody correlates of EBV viral load. Despite significant inter-individual variability, cell-associated EBV genomic copy number is considered to be relatively stable over time, reported at 1-30 genomes per million PBMCs. 19, 20 In an effort to understand antibody and host genetic factors that control the EBV copy number set point, we measured PBMC-associated genome copies by qPCR (Fig. 4a) . This assay, established by Tsai et al, is sensitive enough to detect a single copy of EBV genome in 10 5 PBMCs. 45 Only about one third of the VRC cohort had a measurable EBV viral load, with a majority of positives having 5 copies or less per 10 5 PBMCs (Fig. S5a) . We found no correlation between EBV viral load and other covariates such as age, ancestry, and sex ( Fig. S5b-c) . We performed genome-wide association using viral load data both as a continuous trait and a dichotomous variable (detected or undetected) to identify host genetic factors that control viral load, but no significant associations were identified. This lack of genetic association is likely due to a combination of a lack of statistical power to detect small genetic effects and an inability to detect cell-associated EBV genomes in all but the highest titer individuals. We next examined antibody correlates of viral load. There was no correlation between viral load and overall EBV antibody breadth (Fig. 4b) . We posited that specific antibody reactivities may be associated with viral load for three reasons: (i) they may directly limit viral replication, (ii) high viral load may stimulate antibody production, and/or (iii) antibodies may correlate with innate immune responses or cell-mediated immunity. We therefore tested individual peptides for their association with EBV viral load (Fig. 4c) . A single peptide derived from the large tegument protein BPLF1, showed significant association with EBV viral load (Sidak-Nyholt adjusted p-value≤4.8x10 -4 ). While not statistically significant, peptides derived from the EBNA family of proteins tended to be negatively associated with viral load and peptides derived from structural proteins from the tegument, capsid and envelope were positively correlated with viral load. The four EBNA-2 C-TAD peptides that associated with variants in the HLA locus (shown as green dots in Fig. 4c ) were not associated with viral load or sex (green dots in Fig. S5 d-f ). . EBV viral load in circulating PBMCs does not correlate with breadth of a response but correlates with responses against specific peptides. a, A schematic outlining the approach for EBV copy number measurements. 350ng of DNA (corresponds to roughly 10 5 cells) extracted from PBMCs was used to detect the presence of EBV genomes by qPCR. Copy numbers were calculated using a standard curve. b, EBV copy numbers showed no correlation with breadth of antibody response as calculated by AVARDA. c, A volcano plot of significant association between 112 immunodominant EBV peptides (>20% in cohort) and EBV viral load in the VRC cohort. One peptide that showed a significant association (adjusted p-value of 4.8 x 10 -4 ) is marked in red. d, An LZ plot showing variants in the MHC class-I region on chromosome 6, associated with predicted EBV viral load. In an effort to increase the number of individuals with 'detectable' levels of EBV, and to perform analyses in the TwinsUK cohort for which EBV copy number was not available, we used gradient boosting to developed a multi-peptide 'sero-signature' predictive of EBV viral load. The 112 peptides reactive in >20% of the VRC cohort were used for model building. ROC analysis was used to identify an appropriate threshold for predicting whether a sample should be considered EBV positive (high copy) or negative (low copy). We performed a GWAS for predicted EBV copy (high versus low) in both the VRC and the TwinsUK cohorts. Meta-analysis revealed a strong association with variants in the MHC class-I locus of the HLA region (Fig. 4d , Table 3 ). Our results support a role for CD8+ cytotoxic T cells, possibly in conjunction with antibody epitope selection, in controlling PBMC-associated EBV copy number set point. GWAS identified MHC class-II associations for four peptides from the C-terminus of EBNA-2. This is similar to previously reported EBNA-1 antibody titer associations with HLA-DRB1 and HLA-DQB1. 48, 49 In our study, reactivity to several immunodominant EBNA-1 peptides were strongly heritable but we did not detect any significant genome wide associations, most likely due to insufficient power. Variation in the immunoglobulin genes is expected to influence the quality of an antibody response. However, we found no variants in this region associated with EBV epitope selection. The lack of association at the IGH locus may reflect the complexity of genetic variation in this region -including structural rearrangements and copy number variation that are not well captured via SNP based arrays. 50 However, two reactivities were strongly associated with MHC class-II alleles, which are the major determinants of CD4+ helper T cell epitope selection. Increasing the size of the study population and use of sequencing-based genetic analyses will likely reveal a role for the immunoglobulin loci in epitope selection. We previously developed a summary statistic to capture the clonality of the antibody response ("breadth") to a protein or virus. This metric was calculated individually for each virus in the library. The breadth of an antiviral response is likely to correlate with greater protection from re-infection, including potentially heterologous protection from similar viruses. 51 Using TwinsUK VirScan data, we estimated the genetic heritability of the EBV antibody response breadth to be 39%. However, we detected no significantly associated loci using GWAS. This may be because the breadth of an antibody response is a complex trait, which could not be deconvoluted due to the size of the cohorts in this study. A GWAS of HIV viral load set point identified roles for the MHC class-I locus and a candidate gene study identified CCR5. 52 A similar study on EBV viral load has not been performed. We find that PBMC-associated EBV viral load significantly associates with antibody reactivities against a peptide from EBV tegument protein deneddylase, BPLF1, but this should be replicated in an independent cohort. Antibody reactivity against EBV structural proteins tended to be positively correlated with viral load, whereas reactivity against EBV nuclear antigens tended to be negatively correlated with viral load. This could be due to the different spectrum of proteins presented to the immune system in individuals with high versus low viral load, and supported the use of a multi-peptide serosignatures as a surrogate for EBV viral load. We therefore employed machine learning to predict EBV viral load using antibody reactivity of 112 peptides. GWAS linked this EBV serosignature with the MHC class-I locus. Cytotoxic CD8+ T cells play an important role in controlling EBV infection by targeting infected B cells. 53, 54 Specific MHC class I variants could enhance or suppress this activity. An important caveat to this analysis, however, is that hidden variables associated with the serosignature (e.g. a coinfection) may underlie an indirect association with EBV viral load. There are notable limitations of this study. First, the VirScan library is composed of 56-aa peptides displayed on T7 phage. Conformational, discontinuous, and post-translationally modified epitopes are therefore absent from this study. This limitation is may disproportionately impact surface exposed epitopes. Second, the relatively small size of our cohorts limited the power of the GWAS analyses to detect associations with subdominant epitopes, complex genetic interactions, and antibody responses to less prevalent viruses. In summary, antibody epitope profiling is a powerful approach for characterizing the 13, 1958-1978 (2018) . PhIP-Seq/VirScan. PhIP-Seq and VirScan have been previously described in detail. 33, 55 Briefly, ELISA was performed to measure total IgG in serum samples and input volume was adjusted to 2 g of IgG input per IP. VirScan library was mixed with diluted serum at 10 5 -fold coverage (about 9.6x10 9 pfu for the 56-mer virome library). The library/serum mixture was allowed to rotate overnight at 4 C, followed by a 4-hour IP with protein A and protein G coated magnetic beads. PCR was performed with primers that flank the displayed peptide inserts. A second round of PCR was performed to add adapters and indexes for Illumina sequencing. Fastq files were aligned to obtain read count values for each peptide in the library, followed by calculation of z-scores as previously described. 42 Cohorts Jaccard index. Jaccard index calculations were performed by transforming individuals with z-score >= 7 as 'responders' and those lesser than that as 'non-responders' for each peptide. The total number of EBV peptides where both twins were responders were counted and the where A is the set of peptides that twin1 responded to and B is a set of peptides for the corresponding co-twin (twin2). Binarization of data. The z-score values from PhIP-Seq were transformed to binarized (response = "1", non-response = "0" and indeterminate values = "NA") using a threshold of >=7 as a "1", <= 3 as a "0" and values between 3 and 7 as "NA". Testing significance of association between peptide responses and EBV copy number. Fisher's exact test was used to calculated p-values of association between specific anti-peptide responses and EBV copy numbers measured by real time PCR. Peptide selection. We developed a set of criteria to select peptides for GWAS, provided in the flowchart of Fig. S3 . The VirScan library is composed of 106,678 56-aa peptide sequences representing all known human viruses (~400 viral species and strains, Fig. S3, box 1) . EBV is represented by 2180 peptides in the VirScan library (Fig. S3, box 2) . After binarization of the data, we remove peptides where responder proportions were < 20% and > 80% independently in the TwinsUK cohort (Fig. S3, box 3 ) . A total of 144 peptides in TwinsUK were selected for heritability analysis by SEM, resulting in 107 peptides with > = 20% heritability (Fig. S3, box 4 ). Of these peptides, we retained those where the responder proportion in the VRC cohort was between 20% and 80%, resulting in 57 peptides that were used for GWAS (Fig. S3, box 5 ). Where "A" represents the additive genetic influence, "C" the common or shared environment between the twin pair and "E" represents the non-shared environment ("E" also includes measurement error). 56 To estimate the heritability, we used Structural Equation Models (SEM), which utilize observed covariances from both MZ and DZ pairs to establish a causal relationship among the covariances and the latent parameters. We investigated ACE, AE and CE models and used the Akaike's information criterion (AIC) to select the best fitting model. Genotyping and Imputation. The VRC cohort was genotyped using the Illumina Human Genotyping of the TwinsUK cohort has been described previously in detail. 63 Genetic associations of peptide reactivities. We performed single-variant association analysis using dichotomized peptide reactivity data, treating seronegative individuals and borderline reactivities as missing data. Peptides were considered sufficiently powered for analysis if they were reactive in at least 20% but no greater than 80% of the study population and were also heritable in the TwinsUK study (n=107 of 144 peptides; Fig. S3 64, 65 , accounting for the number of independent traits (n=48) resulting in a genome-wide significance for EBV of p-value ≤ 1.04x10 -9 . The meta-analysis p-value was set to 0.01 in the replication cohorts, or a joint association p-value less than the VCR European only value. In the VRC cohort, we interrogated 7,637,921 variants for an association with each of the viral peptides using the penalized quasi-likelihood (PQL) approximation to the GLMM For each GWAS in the discovery group (VRC/EUR) we selected the associated variants included within a credible set and report the disease associations presented in the catalog in Table S1 . Multi-peptide serosignature for EBV viral load prediction. To predict the presence or absence of EBV based on VirScan data, we employed the XGBoost software. (https://xgboost.readthedocs.io/en/latest/) Specifically, we sought to classify samples based on presence or absence of EBV (> 0 copies is present and = 0 copies is absent) based on reactivity to 112 immunodominant EBV peptides. XGBoost leverages ensemble learning and gradient tree boosting to perform regression and/or classification. Machine learning with XGBoost relies on multiple hyperparameters, including size of terminal nodes in regression/classification trees, subset of samples used per round, subset of features used per round, number of trees, learning rate, regularization term, and others. To create the prediction model, we first subset the VRC samples into a 90% training set and a 10% validation set. We performed a grid search with 10-fold cross-validation on the 90% validation set to tune model hyperparameters; final hyperparameters were those that maximized the cross-validation AUC and were: max_depth = 2, eta = 0.001, gamma = 2, colsample_bytree = 0.5, min_child_weight = 1, subsample = 0.5. We then applied this gradient boost model to the entire 90% training set, and the performance of the model was determined with the 10% validation set. To prevent overfitting, the iteration with the best cross-validation AUC was used (iteration = 45; Fig. S7a ). This final model was then used to make predictions of expected likelihood of EBV presence in the TwinsUK cohort. Feature importance (Gain) was calculated using XGBoost. Receiver operating characteristics (ROC) analysis of viral load prediction models. Supplemental Figure 5 . EBV viral load is not correlated with ancestry, age or gender. a, A histogram showing the distribution of the EBV genomic copies detected in 10 5 PBMCs of individuals in the VRC cohort. b, Scatter plot of EBV viral load with age, grouped by EUR or AFR ancestry and by gender. c, Histogram of EBV positive and negative individuals in the EUR and AFR sub-groups grouped by gender. Tables below the panels provide the number of individuals in each group shown. d, A comparison of each peptide specific reactivity between male and female sub-groups in the VRC cohort shows no significant differences. e-f, Associations between peptide reactivities and EBV viral load show no differences for men (e) or women (f). Supplemental Figure 6 . Quantitative PCR detection of EBV EBNA-1 was highly sensitive. The graph shows qPCR performed on a dilution series of EBNA-1 DNA fragment. The assay had a linear range spanning 7 decades and was sensitive to 4 copies per reaction. Heterogeneity and longevity of antibody memory to viruses and vaccines Immunological memory in humans Heritability estimates and genetic and environmental correlations for the human immunoglobulins G, M, and A Plasma immunoglobulin concentrations in twins Serum immunoglobulin levels in twins The influence of heredity and environment on human immunoglobulin levels Identification of sequence variants influencing immunoglobulin levels Genome-wide genetic investigation of serological measures of common infections Human genetic variants and age are the strongest predictors of humoral immune responses to common pathogens and vaccines Genetic factors influence serological measures of common infections A Viral Exposure Signature Defines Early Onset of Hepatocellular Carcinoma The landscape of host genetic factors involved in immune response to common viral infections Virus Latency: Current and Future Perspectives Epstein-barr virus sequence variation-biology and disease EBV gene expression and regulation -Human Herpesviruses -NCBI Bookshelf To Be or Not IIb: A Multi-Step Process for Epstein-Barr Virus Latency Establishment and Consequences for B Cell Tumorigenesis Epstein-barr virus-infected resting memory B cells, not proliferating lymphoblasts, accumulate in the peripheral blood of immunosuppressed patients Molecular Parameters for Precise Diagnosis of Asymptomatic Epstein-Barr Virus Reactivation in Healthy Carriers Human leukocyte antigens and genetic susceptibility to lymphoma Human Leukocyte Antigens and Epstein-Barr Virus-Associated Nasopharyngeal Carcinoma: Old Associations Offer New Clues into the Role of Immunity in Infection-Associated Cancers Deconvoluting Virome-Wide Antiviral Antibody Profiling Data The repertoire of maternal anti-viral antibodies in human newborns Temporal virus serological profiling of kidney graft recipients using VirScan Ontogeny of recognition specificity and functionality for the broadly neutralizing anti-HIV antibody 4E10 Pan-viral serology implicates enteroviruses in acute flaccid myelitis Viral epitope profiling of COVID-19 patients reveals cross-reactivity and correlates of severity Comprehensive Profiling of HIV Antibody Evolution Measles virus infection diminishes preexisting antibodies that offer protection from other pathogens Improved Analysis of Phage ImmunoPrecipitation Sequencing (PhIP-Seq) Data Using a Z-score Algorithm Genome-wide association study of classical Hodgkin lymphoma and Epstein-Barr virus status-defined subgroups Evidence of Epstein-Barr virus infection in ulcerative colitis EBV PCR in the Diagnosis and Monitoring of Posttransplant Lymphoproliferative Disorder: Results of a Two-Arm Prospective Trial The Individual and Population Genetics of Antibody Immunity Global analyses of human immune variation reveal baseline predictors of postvaccination responses A Genome-Wide Integrative Genomic Study Localizes Genetic Factors Influencing Antibodies against Epstein-Barr Virus Nuclear Antigen 1 (EBNA-1) Combined linkage and association studies show that HLA class II variants control levels of antibodies against Epstein-Barr virus antigens The immunoglobulin heavy chain locus: genetic variation, missing data, and implications for human disease Antibody responses to endemic coronaviruses modulate COVID-19 convalescent plasma functionality Genome-Wide Association of Non-Progressive HIV and Viral Load Control Role of cytotoxic T lymphocytes in Epstein-Barr virus-associated diseases Cellular responses to viral infection in humans: lessons from Epstein-Barr virus Viral immunology. Comprehensive serological profiling of human populations using a synthetic human virome Analytic approaches to twin data using structural equation models Estimating heritability for cause specific mortality based on twin studies Second-generation PLINK: Rising to the challenge of larger and richer datasets Principal components analysis corrects for stratification in genome-wide association studies Next-generation genotype imputation service and methods ANNOVAR: Functional annotation of genetic variants from highthroughput sequencing data The genetic architecture of the human immune system: a bioresource for autoimmunity and disease pathogenesis Rectangular Confidence Regions for the Means of Multivariate Normal Distributions A Simple Correction for Multiple Testing for Single-Nucleotide Polymorphisms in Linkage Disequilibrium with Each Other Robust inference of population structure for ancestry prediction and correction of stratification in the presence of relatedness Model-free Estimation of Recent Genetic Relatedness below are EBV negative. The model performed at 70% sensitivity and 70% specificity (AUC = 0.776) for the training data and at 60% sensitivity and 60% specificity (AUC = 0.655) for the validation data (Fig. S7c-d right panels) Features of sub-dominant anti-EBV antibody responses. a, Sub-dominant anti-EBV responses (less than 20% of the cohort) were seen in peptides that map to most positions on the EBV reference genome b, Number of peptides (1997, VRC and 1918 TwinsUK) and c, grouping of sub Flow chart of EBV peptide selection for GWAS. Peptide reactivity scores were binarized (Responders: Z-score>=7, non-responders: Z-score <=3) and filtered based on responder proportion (>=20% and <= 80%) in TwinsUK. Of 144 peptides, 107 have estimated heritability >= 20%. A total of 57 EBV peptides were seroprevalent (Responders/non-responders >=20% and <= 80%) in VRC. These 57 peptides were selected for genome The magnitude of antibody response positively correlates with the number of effect alleles present. a-b, Distribution of z-scores in the VRC/EUR sub-cohort (a) and the complete VRC cohort (b) shows a positive correlation between magnitude of antibody response (higher z-scores equals stronger response) and number of effect alleles Supplemental Table 1. A list of diseases and phenotypes associated with credible-set variants identified for the two C-terminal EBNA-2 peptides Generation of a prediction model by gradient boosting a, Training of the model. b, ROC analysis of training cohort (n = 512, 90%). c, The 10 most important features of the prediction model