key: cord-0850406-5yceql4k authors: Nguyen, A.; Yusufali, T.; Hollenbach, J.; Nellore, A.; Thompson, R. F. title: Minimal observed impact of HLA genotype on hospitalization and severity of SARS-CoV-2 infection date: 2021-12-22 journal: nan DOI: 10.1101/2021.12.22.21268062 sha: b1a258f67cd848d5095e34f48505cad02fbb2ae1 doc_id: 850406 cord_uid: 5yceql4k
HLA is a critical component of the viral antigen presentation pathway. We investigated the relationship between the severity of SARS-CoV-2 disease and HLA type in 3,235 individuals with confirmed SARS-CoV-2 infection. Only the DPB1 locus was associated with the binary outcome of whether an individual developed any COVID-19 symptoms. The number of peptides predicted to bind to an individual's HLA alleles had no significant relationship with disease severity, either when individuals were stratified by ancestry or age or in a pooled analysis. Age, BMI, asthma status, and autoimmune disorder status were predictive of severity across multiple age and ancestry stratifications. Overall, at the population level, we found HLA type to be significantly less predictive of COVID-19 disease severity than certain demographic factors and clinical comorbidities.
In this study, we investigated the specific relationship between HLA type and COVID-19 severity in a cohort of 3,235 individuals from AncestryDNA (10, 19) with confirmed SARS-CoV-2 infection. We extracted basic demographic and clinical data for these individuals, all of whom had a positive SARS-CoV-2 nasal swab, and classified the severity of their COVID-19 disease according to their survey responses (Table 1). We next assessed the extent to which these demographic and clinical features predicted COVID-19 severity, and we identified comorbidities that contribute to a linear model predicting hospitalization (Supplementary Table 1).
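The hospitalization model referred to here is a linear model. As a minimal sketch, assuming ordinary least squares with an intercept (the estimator behind R's lm), the fit can be written with NumPy as below. The covariates (age, BMI, asthma) and all toy data are hypothetical placeholders, not the AncestryDNA data or the authors' code; the toy outcome is generated without noise only so that the recovered coefficients can be checked. With a 0/1 hospitalization endpoint, the same fit is a linear probability model.

```python
import numpy as np

def fit_linear_model(X, y):
    """Ordinary least squares with an intercept (same estimator as R's lm).

    X: (n_samples, n_features) covariates; y: (n_samples,) endpoint.
    Returns [intercept, beta_1, ..., beta_p].
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # prepend the intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Hypothetical toy covariates: age (years), BMI, asthma status (0/1).
rng = np.random.default_rng(0)
n = 200
age = rng.uniform(18, 90, n)
bmi = rng.uniform(18, 45, n)
asthma = rng.integers(0, 2, n)
X = np.column_stack([age, bmi, asthma])

# Noiseless toy outcome built from known coefficients, so OLS should recover them.
true_coef = np.array([-0.5, 0.005, 0.01, 0.1])
y = true_coef[0] + X @ true_coef[1:]

coef = fit_linear_model(X, y)
print(np.round(coef, 4))
```

In practice, significance testing of each coefficient (as reported in the tables) requires standard errors, which lm provides and this sketch omits.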
To assess each individual's capacity to present SARS-CoV-2 peptides, we computed HLA-specific MHC binding affinities for all k-mers of length 8 to 12 (inclusive) from the SARS-CoV-2 proteome (n=48,395 unique peptides) that passed a proteasomal cleavage propensity filter. We used two predictive tools: netMHCpan and HLAthena (21, 22). In agreement with our prior work (11), we found wide variation in putative peptide presentation capacity across HLA types (Supplementary Figure 1). We next developed a pooled multivariate model of severity score that accounted for comorbidities as well as putative viral peptide presentation as a function of HLA type. Age (p < 0.01), BMI (p < …), asthma status (p < 0.01), diabetes status (p < 0.01), and other lung conditions (p < 0.05) were all predictive of severity (Figure 1; Supplementary Table 1-Stratified_LM_models_corrected.csv). There was no association between the number of putatively presented class I peptides and COVID-19 severity. The significance of the BMI association diminished among individuals older than 60 years. This is consistent with CDC reports identifying obesity as a risk factor for hospitalization and death specifically among individuals younger than 65 years (3).
The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license. This version posted December 22, 2021. doi: https://doi.org/10.1101/2021.12.22.21268062 (medRxiv preprint)
HLA genes are generally considered important for the host response to novel infectious diseases. In this study, we found that age, BMI, and other comorbidities predicted clinical outcome across 3,235 individuals, as described in the literature (1-3, 13), and to a far greater degree than an individual's HLA-specific capacity to present SARS-CoV-2 peptides.
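The peptide-capacity metrics used here can be sketched as follows: enumerate every 8- to 12-mer of a protein sequence, keep the peptides whose predicted affinity for an allele is below 500 nM (the netMHCpan cutoff given in the Methods), then count an "overall" capacity (per-allele binder counts summed) and a "unique" capacity (distinct peptides bound by any allele). This is a minimal sketch under stated assumptions: the NetChop cleavage filter and the real netMHCpan/HLAthena predictions are not reproduced, and the function names, the 20-residue toy sequence, and the affinity values are all hypothetical. Real individuals carry three to six distinct class I alleles; the toy uses two for brevity.

```python
from typing import Dict, Iterable, List, Set, Tuple

def kmerize(protein: str, min_len: int = 8, max_len: int = 12) -> List[str]:
    """Every contiguous peptide of length min_len..max_len (inclusive), shortest first."""
    return [protein[i:i + k]
            for k in range(min_len, max_len + 1)
            for i in range(len(protein) - k + 1)]

def binders(affinities_nm: Dict[str, float], cutoff_nm: float = 500.0) -> Set[str]:
    """Peptides predicted to bind one allele: affinity strictly below the nM cutoff."""
    return {pep for pep, ic50 in affinities_nm.items() if ic50 < cutoff_nm}

def presentation_capacity(allele_binders: Iterable[Set[str]]) -> Tuple[int, int]:
    """(overall, unique) capacity for one individual.

    overall: per-allele binder counts summed, so a peptide bound by two alleles counts twice.
    unique:  distinct peptides bound by at least one of the individual's alleles.
    """
    sets = list(allele_binders)
    overall = sum(len(s) for s in sets)
    unique = len(set().union(*sets))
    return overall, unique

# Toy example: a made-up 20-residue sequence and made-up affinities for two alleles.
peptides = kmerize("ACDEFGHIKLMNPQRSTVWY")
allele_a = {peptides[0]: 45.0, peptides[1]: 800.0, peptides[2]: 120.0}
allele_b = {peptides[0]: 300.0, peptides[3]: 20.0}
overall, unique = presentation_capacity([binders(allele_a), binders(allele_b)])
print(len(peptides), overall, unique)  # 55 4 3
```

Here peptides[0] binds both alleles, so it contributes twice to the overall count but once to the unique count, which is exactly the distinction the two metrics are meant to capture.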
Although we previously explored the potential of HLA-peptide binding to predict COVID-19 severity (11), we do not see evidence for this phenomenon in the large real-world clinical cohort examined here. While the majority of individuals were imputed to be of European ancestry, there were sizable numbers of individuals of Amerindian, Asian, and African descent. Roberts et al. (10) performed a stratified GWAS using this same dataset, with binary endpoints of hospitalization and whether an individual developed any COVID-19 symptoms, but they did not specifically explore the role of HLA, whose high variability reduces the power to detect differences between populations. Further, we investigated SARS-CoV-2-specific peptide presentation as a nonlinear function of HLA type, since some HLA types may be more similar to each other in the number of predicted peptides they can bind than their canonical HLA supergroups would suggest. We note several limitations to our work. First, the SARS-CoV-2 peptides that we tested were generated through whole-peptidome in silico analysis of SARS-CoV-2. These may not be representative of the peptides actually presented in a given individual, whether because of biological factors such as viral variation or methodological factors such as inaccuracies in peptide-MHC binding affinity predictions. Second, individuals who suffered debilitating infections may have been less likely to participate in the survey, and individuals who died of COVID-19 could not participate at all, potentially resulting in undercounting of the most severe phenotypes. Further, the cohort was primarily European, with much smaller sample sizes of African, Asian, and Amerindian ancestry. Lastly, the data came entirely from unvaccinated individuals, as this population was tested and surveyed before the release of SARS-CoV-2 vaccines.
A number of other studies (15-18, 23-25) have examined the relationship between HLA alleles and COVID-19 severity, and few have found alleles significantly associated with severity. In the majority of these studies, the large number of possible alleles reduced the statistical power to identify significant alleles after multiple testing correction. Further, a number of studies reporting statistically significant associations between severity and HLA type were regional; they tended to have more ethnically and geographically homogeneous cohorts, likely resulting in overrepresentation of some alleles. Taken together with our analysis of the AncestryDNA dataset, this suggests that the literature does not reliably support a role for HLA type in modifying real-world COVID-19 disease severity across a population. There are multiple potential explanations for this, including that the data and analyses to date do not accurately reflect the true disease-modifying effects of HLA genes. On an individual basis, HLA type may indeed influence the severity of COVID-19 disease; however, this hypothesis is not borne out at the population level. Multiple demographic features and clinical comorbidities are significantly more predictive of disease severity in a population. Future work should take a critical and individualized approach to evaluating any connections between HLA variation and differences in COVID-19 disease severity.
Genetic ancestry was determined using plinkQC v1.9 (26) to combine genotypes of the cohort with genotypes of a reference dataset (27).
HLA class I/II alleles were obtained using HIBAG v1.3 (28), an HLA imputation method that combines attribute bagging with large training sets of known HLA and SNP genotypes. Ancestry-specific pre-fit models available within HIBAG for European, Asian, African, and Amerindian populations were applied to the corresponding ancestry subgroups within the AncestryDNA cohort.
We collapsed the 10-point WHO COVID-19 Ordinal Scale of disease severity (29) into a 7-point scale to accommodate the phenotype information available in the AncestryDNA COVID-19 study. The possible symptoms in the AncestryDNA cohort are fever, shortness of breath, dry cough, body aches, abdominal pain, cough producing phlegm, and nausea, each reported at one of three levels: normal, severe, or very severe. We defined severe symptoms (Severity Score 3) as any of the listed symptoms at the severe or very severe level. In models using hospitalization as an endpoint, we encoded hospitalization as a binary variable, with severity scores ≥4 considered hospitalized.
We obtained SARS-CoV-2 peptide sequences by k-merizing FASTA protein sequences from the NCBI RefSeq database (NC_045512.2 and NC_004718.3) into 8- to 12-mers. These k-mers were filtered by NetChop v3.1 using default settings with a cutoff of 0.1. MHC class I binding affinity predictions were performed using netMHCpan v4.0, with the '-BA' option to include binding affinity predictions and the '-l' option to specify peptides 8 to 12 amino acids in length. Additional MHC class I binding affinity predictions were performed using HLAthena. For predicted peptide binding, we used a cutoff of <500 nM for netMHCpan v4.0 predictions and a probability score cutoff of >0.5 for HLAthena predictions.
Nearly all individuals carry two HLA-A/B/C haplotypes comprising between three and six distinct alleles, and a single peptide may be predicted to bind to more than one of an individual's HLA alleles. Although there is no definitive evidence that a peptide is more likely to be presented when predicted to bind to more than one allele, we captured this possibility using two metrics: an overall predicted peptide count and a unique predicted peptide count. For each individual, we calculated the overall capacity to bind SARS-CoV-2 peptides by summing the number of predicted binding peptides across that individual's alleles (minimum 3, maximum 6). For the unique-peptide capacity, duplicate peptides were removed after pooling across alleles.
We tested for associations between HLA type and hospitalization using the Bridging ImmunoGenomic Data-Analysis Workflow Gaps (BIGDAWG) pipeline, and we performed a comprehensive SARS-CoV-2 peptide-genotype binding analysis for all individuals in our dataset. All statistical analyses were performed using R version 4.0.3. Each statistical test was performed both pooled and stratified by ancestry. For multivariate linear modeling, we used the R function lm with severity index, hospitalization status, or asymptomatic/symptomatic status as the endpoint. Tests of Hardy-Weinberg equilibrium using chi-squared tests for haplotypes, loci, and HLA amino acid positions were performed using the BIGDAWG v1.3.4 R package.
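The Benjamini-Hochberg procedure used to correct the reported p-values can be sketched in a few lines. This is a minimal illustration, assuming the standard step-up definition implemented by R's p.adjust(method = "BH"); the source does not state which implementation (for example, within BIGDAWG or via p.adjust) produced the reported values, and the p-values in the example are made up.

```python
from typing import List, Sequence

def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (step-up FDR), returned in input order.

    For sorted p-values p_(1) <= ... <= p_(m):
        adj_(i) = min over j >= i of (m / j) * p_(j), capped at 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # also enforces the cap at 1
    # Walk from the largest p-value down so each adjusted value is the
    # minimum of (m / rank) * p over all ranks at or above it.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Made-up example with four tests.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))
```

The running minimum is what keeps adjusted values monotone: in the example, 0.03 at rank 2 would naively become 0.06, but it is pulled down to the rank-3 value of about 0.0533.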
Note that all reported p-values have been corrected for multiple hypothesis testing, where relevant, using Benjamini-Hochberg correction.
1. Risk factors for severity and mortality in adult COVID-19 inpatients in Wuhan.
2. Risk factors for Covid-19 severity and fatality: a structured literature review.
3. Body Mass Index and Risk for COVID-19-Related Hospitalization, Intensive Care Unit Admission, Invasive Mechanical Ventilation, and Death - United States.
4. Natural HLA Class I Polymorphism Controls the Pathway of Antigen Presentation and Susceptibility to Viral Evasion.
5. HLA and Infectious Diseases.
6. HLA-Class II Artificial Antigen Presenting Cells in CD4+ T Cell-Based Immunotherapy.
7. Human Leukocyte Antigen (HLA) and Immune Regulation: How Do Classical and Non-Classical HLA Alleles Modulate Immune Response to Human Immunodeficiency Virus and Hepatitis C Virus Infections? Front Immunol.
8. Major histocompatibility complex: Antigen processing and presentation.
9. HLA alleles measured from COVID-19 patient transcriptomes reveal associations with disease prognosis in a New York cohort.
10. Host Genetic Study Identifies Three Novel Loci.
11. Human Leukocyte Antigen Susceptibility Map for Severe Acute Respiratory Syndrome Coronavirus 2. Gallagher T, editor.
12. Genome-wide association study of COVID-19 severity among the Chinese population.
13. Genomewide Association Study of Severe Covid-19 with Respiratory Failure.
14. Host genetic factors determining COVID-19 susceptibility and severity.
15. The influence of HLA genotype on the severity of COVID-19 infection.
16. Correlation of the two most frequent HLA haplotypes in the Italian population to the differential regional incidence of Covid-19.
17. Association between HLA genotypes and COVID-19 susceptibility, severity and progression: a comprehensive review of the literature.
18. Possible role of HLA class-I genotype in SARS-CoV-2 infection and progression: A pilot study in a cohort of Covid-19 Spanish patients.
19. Ancestry COVID-19 study - EGA European Genome-Phenome Archive.
20. Bridging ImmunoGenomic Data Analysis Workflow Gaps (BIGDAWG): An integrated case-control analysis pipeline.
21. NetMHCpan-4.1 and NetMHCIIpan-4.0: improved predictions of MHC antigen presentation by concurrent motif deconvolution and integration of MS MHC eluted ligand data.
22. A large peptidome dataset improves HLA class I epitope prediction across most of the human population.
23. Association of HLA Class I Genotypes With Severity of Coronavirus Disease-19.
24. Human Leukocyte Antigen (HLA) Class I Susceptible Alleles Against COVID-19 Increase Both Infection and Severity Rate. Cureus.
25. SARS-CoV-2 genomic variations associated with mortality rate of COVID-19.
26. Data quality control in genetic case-control association studies.
27. A global reference for