key: cord-1004504-gqnz8kf5
authors: Weiner, January; Suwalski, Phillip; Holtgrewe, Manuel; Rakitko, Alexander; Thibeault, Charlotte; Müller, Melina; Patriki, Dimitri; Quedenau, Claudia; Krüger, Ulrike; Ilinsky, Valery; Popov, Iaroslav; Balnis, Joseph; Jaitovich, Ariel; Helbig, Elisa T; Lippert, Lena J; Stubbemann, Paula; Real, Luis M; Macías, Juan; Pineda, Juan A; Fernandez-Fuertes, Marta; Wang, Xiaomin; Karadeniz, Zehra; Saccomanno, Jacopo; Doehn, Jan-Moritz; Hübner, Ralf-Harto; Hinzmann, Bernd; Salvo, Mauricio; Blueher, Anja; Siemann, Sandra; Jurisic, Stjepan; Beer, Juerg H.; Rutishauser, Jonas; Wiggli, Benedikt; Schmid, Hansruedi; Danninger, Kathrin; Binder, Ronald; Corman, Victor M; Mühlemann, Barbara; Arjun Arkal, Rao; Fragiadakis, Gabriela K.; Mick, Eran; COMET, Consortium; Calfee, Carolyn S.; Erle, David J.; Hendrickson, Carolyn M.; Kangelaris, Kirsten N.; Krummel, Matthew F.; Woodruff, Prescott G.; Langelier, Charles R.; Venkataramani, Urmila; García, Federico; Zyla, Joanna; Drosten, Christian; Alice, Braun; Jones, Terry C; Suttorp, Norbert; Witzenrath, Martin; Hippenstiel, Stefan; Zemojtel, Tomasz; Skurk, Carsten; Wolfgang, Poller; Borodina, Tatiana; Pa-COVID, Study Group; Ripke, Stephan; Sander, Leif E; Beule, Dieter; Landmesser, Ulf; Guettouche, Toumy; Kurth, Florian; Heidecker, Bettina
title: Increased risk of severe clinical course of COVID-19 in carriers of HLA-C*04:01
date: 2021-09-02
journal: EClinicalMedicine
DOI: 10.1016/j.eclinm.2021.101099
sha: 93018f1100193e672cd002a62b3d2cdbd0cea23d
doc_id: 1004504
cord_uid: gqnz8kf5

BACKGROUND: Since the beginning of the coronavirus disease 2019 (COVID-19) pandemic, there has been increasing urgency to identify pathophysiological characteristics leading to severe clinical course in patients infected with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Human leukocyte antigen alleles (HLA) have been suggested as potential genetic host factors that affect individual immune response to SARS-CoV-2. We sought to evaluate this hypothesis by conducting a multicenter study using HLA sequencing. METHODS: We analyzed the association between COVID-19 severity and HLAs in 435 individuals from Germany (n = 135), Spain (n = 133), Switzerland (n = 20) and the United States (n = 147), who had been enrolled from March 2020 to August 2020. This study included patients older than 18 years, diagnosed with COVID-19 and representing the full spectrum of the disease. Finally, we tested our results by meta-analysing data from prior genome-wide association studies (GWAS). FINDINGS: We describe a potential association of HLA-C*04:01 with severe clinical course of COVID-19. Carriers of HLA-C*04:01 had twice the risk of intubation when infected with SARS-CoV-2 (risk ratio 1.5 [95% CI 1.1–2.1], odds ratio 3.5 [95% CI 1.9–6.6], adjusted p-value = 0.0074). These findings are based on data from four countries and corroborated by independent results from GWAS. Our findings are biologically plausible, as HLA-C*04:01 has fewer predicted bindings sites for relevant SARS-CoV-2 peptides compared to other HLA alleles. INTERPRETATION: HLA-C*04:01 carrier state is associated with severe clinical course in SARS-CoV-2. Our findings suggest that HLA class I alleles have a relevant role in immune defense against SARS-CoV-2. FUNDING: Funded by Roche Sequencing Solutions, Inc.

Evidence before this study

We have searched Pubmed for publications containing keywords "HLA", "COVID", "association" and "severity" published before June, 17th 2021 (exact search phrase: (covid) AND (severity) AND (hla) AND (association) AND (("2019/12/31 00 [Date -Completion]: "2021/06/17 00 [Date -Completion]). Among the 69 query results, there were several studies, which involved Human leukocyte antigen alleles typing and COVID-19. None of the analyses performed included a multi-center, multi-cohort study and none included additional covariates, which can potentially distort the observed association, such as sex, age or genetic background. Furthermore, none of these analyses attempted to compare the HLA typing results with the results from genome-wide association studies. All previously published analyses were based on small sample sizes.

We have performed a systematic analysis of the association of type I and type II HLA alleles in over 400 patients from several countries. Furthermore, we have replicated the HLA association analysis with an analysis of GWAS data (7796 cases and 875,694 controls) and combined our analysis with results of in silico epitope modeling. Our models included covariates aimed at excluding potential confounding effects (such as alleles which are associated with age or sex rather than severity).

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) was first reported in Wuhan, China, at the end of 2019 [1, 2] . It rapidly spread to Europe, the United States (US), and the rest of the world, manifesting as coronavirus disease 2019 [3, 4] .

Since the beginning of the pandemic, a striking clinical spectrum has been described among patients with COVID-19. Some patients remain asymptomatic, others may become "super spreaders" or "super emitters" [5] , while yet others have a severe clinical course leading to respiratory or multiorgan failure with potentially lethal outcome [6] . Various clinical risk factors for severe clinical course have been described such as age, diabetes, hypertension, and obesity [4, 6] . At the genomic level, blood groups [7, 8] , as well as genes involved in antiviral defense mechanisms and inflammatory organ damage, may affect clinical outcomes [9] .

The HLA system plays a crucial role in immune response [10, 11] . Genetic polymorphisms of HLA have been reported to affect the clinical course of patients infected with various RNA viruses (e.g. influenza virus H1N1 [12] , Hantaan virus [13] , and SARS-CoV-1 [14] ). HLA-B* 46:01 has been shown to be associated with a severe clinical course in patients infected with SARS-CoV-1 [14] . Furthermore, a recent in silico analysis of HLA data from an international database suggests that HLA-B* 46:01 predisposes patients to a more severe clinical course of SARS-CoV-2 [15] , similar to that found in patients with SARS-CoV-1 [14] . While HLA-B* 46:01 frequency is 17% in Asia, where these data were obtained, the allele frequency is 0.05% in Germany and <0.01% in Spain, where the largest part of our study was conducted (http://www.allelefrequencies.net) [16] . Therefore, we suspected that if an HLA risk allele was identified in our data, it would differ from the one discovered during the SARS-CoV-1 pandemic in Asia.

Our central hypothesis was that specific HLA alleles predispose patients to a severe clinical course in COVID-19. Severe clinical course was defined as intubation, intensive care requirement, or death. We also sought to evaluate if specific HLA alleles predispose patients to acute cardiac injury, which is another surrogate of severe illness.

To test our hypothesis, we collected blood samples from patients infected with SARS-CoV-2 for HLA sequencing during the first peak of the European COVID-19 pandemic. Enrollment centers included hospitals in Germany, Switzerland, and Spain. Furthermore, we leveraged existing RNA-Seq data obtained in the US to validate our findings. Then, we tested our findings through meta-analysis of data from several, large, recent GWAS studies to evaluate the reproducibility of the identified HLA risk allele in patients with severe clinical course in COVID-19 and to investigate possible evidence of increased susceptibility to COVID-19 in carriers of this risk allele.

We analyzed the associations between all HLA class I and important HLA class II loci and disease severity in 435 patients with mild to severe symptoms of COVID-19 for HLA sequencing analysis.

All patients of the Charit e Universit€ atsmedizin Berlin are included in data set 1 (DS1) and patients from Switzerland and Spain in data set 2 (DS2) according to the sequence of their enrollment into the study. In addition to the 288 patients recruited for this study (DS1 and DS2), we used data from a public data set containing RNA sequencing data for a total of 99 patients with COVID-19 containing phenotypic information on requirement for intensive care or intubation (https://www.ncbi.nlm.nih.gov/geo; GSE157103 [17] , GSE174818 [18] ) and for 48 patients with COVID-19 enrolled in the UCSF COVID-19 Multiphenotyping for Effective Therapies (COMET) Cohort with available RNA sequencing data (data set 3, DS3; Supplemental Table  1 ). Through inclusion of multiple international data sets we tested for broad applicability of our results.

Participants in our cohort were representative of the full spectrum of clinical presentation of COVID-19 from asymptomatic to multi organ failure and death. Diagnosis was made based on detection of SARS-CoV-2 viral RNA using a reverse transcription-polymerase chain reaction (RT-PCR) test from nasopharyngeal swabs and/or detection of SARS-CoV-2 specific antibodies in the blood through enzyme-linked immunosorbent assay (ELISA). Sample collection took place during the first peak of the pandemic at six European Hospitals: Germany: Charite Universitaetsmedizin Berlin (3 sites, total 135 patients, DS1) [19, 20] ; Spain: Hospital Universitario de Valme, Sevilla (total 115 patients, first part of DS2), Hospital Universitario San Cecilio, Granada (total 18 patients, second part of DS2) and Switzerland: Kantonsspital Baden AG, Baden (total 20 patients, third part of DS2). All patients with COVID-19, who provided informed consent and were 18 years or older were included. Whole-blood samples, buffy coats, or peripheral blood mononuclear cells (PBMCs) were collected from all patients for deoxyribonucleic acid (DNA) extraction during routine diagnostic venipuncture. In addition to clinical and laboratory data, we had access to first-measured viral load data for the cohort from Germany (based on RT-PCR cycle threshold) [21] .

We used requirement for intensive care unit (ICU) admission and intubation as categories for HLA analysis and standard reference range for troponin T hs (<14 ng/l) to define troponin elevation. Acute cardiac injury was defined as elevation of troponin T hs (>14 ng/l) without obvious signs for acute coronary syndrome. We directly used the troponin T hs levels to test for association. Troponin T hs levels were available for 66 patients in whom the value was determined as part of clinical routine, when indicated. Finally, we tested if the identified risk allele was associated with viral load. Dr. habil. January Weiner, PD Dr. med. Bettina Heidecker and Mr. Phillip Suwalski had full access to the data.

Our protocol included blood samples obtained during routine, clinically indicated venipuncture, which were then frozen at À80°C for subsequent DNA extraction and sequencing. A minimal set of deidentified clinical data was collected to protect data privacy. At Charite Universitaetsmedizin Berlin procedures were performed within the framework of the Pa-COVID-19 protocol [19] .

Given the exceptional situation of the pandemic and the international character of this study, there were differences amongst local ethics committees in consent procedures [8] . At each center, written informed consent was obtained from patients or representatives when possible, sometimes retrospectively if the patient was not able to give consent at the moment of sample collection.

This study was approved by the ethics committee of Charite Aliquots of patient genomic DNA samples were diluted to the same concentrations in 96-well plates. To meet the challenges of a high-throughput project, further pipetting steps were performed with multichannel pipettes or on the epMotion 5075 pipetting robot (Eppendorf).

For each patient sample, eleven HLA genes were amplified using NGSgo Ò -AmpX v2 HLAGeneSuite kit (#7371,662, GenDx). According to the design of this protocol, amplicons cover all class I HLA genes (HLA-A, -B, -C) and the most relevant parts of the class II HLA genes (HLA-DRB* 1, -DRB* 3/4/5, -DQB* 1, -DPA* 1, -DPB* 1, -DQA* 1), which have very long introns.

Amplification was performed according to the manufacturer's instructions (https://www.gendx.com/product_line/ngsgo-ampx-v2). To improve PCR efficiency, the starting amount of template DNA was increased from 45 ng to 80 ng per reaction for loci HLA-DRB* 1, -DQB* 1, -DPA* 1, -DPB* 1 and ÀDQA* 1. In addition, elongation time for HLA-DQA* 1 was extended to 5 min. Nevertheless, assay performance differed between the loci. Amplification was typically unproblematic for HLA-A, -B, -C, and DQA1 genes and less consistent in terms of output for other loci.

Amplicons were purified with Agencourt AMPure XP beads (#A63881, Beckman Coulter). Quality and quantity of purified PCR amplicons were checked using D5000 Screen Tape assay (#5067À5593, Agilent) on TapeStation (Agilent) and Qubit dsDNA assay kit (#33,120, Thermofisher) on FLUOStar Omega microplate reader (BMG Labtech), respectively. Confirmed amplicons were equimolar combined into twoÀ for HLA class I and HLA class II À pools per patient.

Single molecule real-time (SMRT) sequencing libraries were prepared according to the manufacturer's protocol (Procedure & Checklist -Preparing SMRTbell TM Libraries using PacBio Ò Barcoded Adapters for Multiplex SMRT Ò Sequencing PacBio Ò ). Barcoded Adapters were used to label patient amplicon pools. Up to 72 amplicon pools were combined per HLA class I and HLA class II library. Libraries were validated using 12,000 DNA Assay (#50,697À1508, Agilent) on Agilent Bioanalyzer and quantified using Qubit dsDNA assay on Qubit Fluorometer.

Libraries were loaded on SMRT cells and sequenced on a Sequel instrument (Pacific Biosciences) using 10 h / SMRT Cell 1 M sequencing runtime. Loading setup and run design was planned with SMRTLink v9.0 (https://www.pacb.com/wp-content/uploads/Quick-Reference-Card-Loading-and-Pre-Extension-Recommendations-forthe-Sequel-System.pdf). In total, HLA genes amplicons for 229 patients were sequenced on 12 SMRT cells.

Circular consensus sequence (CCS) data analysis and demultiplexing was performed within SMRT Link 9.0.0.92188 GUI.

HLA typing of RNA-Seq data (data set 3, DS 3)

Publicly available data were downloaded from the Gene Expression Omnibus (GEO identifier GSE157103). All RNA-Seq data were analysed using the arcasHLA package, version 0.2.0 [22] .

Finally, correlation analysis was performed to compare HLA allele frequencies between all three data sets (DS1, DS2, DS3).

Genomic DNAs were quantified by Quant-iT TM dsDNA Assay Kit, broad range (ThermoFisher Scientific) with a FLUOStar Omega microplate reader (BMG Labtech). Illumina-compatible libraries were prepared on a Bravo NGS Workstation Option B according to the manual KAPA HyperPrep/HyperPlus Automated Workflow for Agilent Bravo B NGS Instructions for Use, v3.0 (Roche Sequencing) using the KAPA Hyper Plus workflow with the reagents included in the KAPA Hyper Plus kit (Roche Sequencing). In short, 100 ng of genomic DNA was fragmented at 37°C in an Eppendorf Mastercycler Plus S for 25 min with subsequent end repair. Following end repair, fragments were ligated to the KAPA Universal Adapter (Roche Sequencing) and purified by KAPA HyperPure Beads (Roche Sequencing). Purified libraries were amplified by 6 cycles of PCR utilizing the KAPA UDI Primer Mix* (Roche Sequencing) and again purified using KAPA HyperPure Beads*. Amplified sample library concentration and size distribution was determined by Qubit dsDNA HS assay kit (ThermoFisher Scientific) with a Qubit TM 3 Fluorometer (ThermoFisher Scientific) and D5000 ScreenTape assay (Agilent Technologies) with a 4200 TapeStation system (Agilent Technologies). Eight amplified sample libraries were pooled prior to enrichment. For this purpose, 187,5 ng of each library was pooled into one well and the final volume was adjusted to 45 ml with nuclease-free water as designated by the manual KAPA HyperCap Workflow v3.0 (https://pim-eservi ces.roche.com/eLD/web/pi/en/home) and all further steps were performed following the recommendations of that manual.

These included hybridization of the amplified sample library pool to KAPA HyperExome Probes (Roche Sequencing) washing and recovery of the captured multiplex sample library, as well as amplification and purification of the enriched multiplex sample library. Final library pools were quantified by Qubit dsDNA HS assay kit with a Qubit TM 3 Fluorometer and correct size distribution was validated by High Sensitivity D5000 ScreenTape assay (Agilent Technologies) with a 4200 TapeStation system. A total of 12 multiplex enriched sample libraries were pooled equimolar utilizing the NextGen HyperCap Pooling Guide and Calculator template (Roche Sequencing) and diluted to 1.3 nM. Each final pool was sequenced on two lanes of an S4 FlowCell (Illumina, Inc.) for 2 £ 151 cycles on a NovaSeq 6000 system (Illumina, Inc.) *For Research Use Only. Not for use in diagnostic procedures.

Exome data was aligned using BWA-MEM to the reference GRCh37 (hs357d.fa) [53] . Samples were identified as KIR2DS4f-positive [54, 55] by screening the GATK HaplotypeCaller v3.7 [23] variant calls for presence of an insertion of CCCGGAGCTCCTATGACATGTA in exon 4 of KIR2DS4. For a more detailed description of the analysis please see methods in the data supplement.

Principal components analysis based on SNP genotyping from the obtained exome data was performed with the R package SNPrelate v. 1.26.0 [24] as described below.

Composite disease score DS1 consisted of patients of the Charit e cohort. For patients in DS1, we defined a composite score, including troponin T hs levels, death, ICU and intubation status (binned in five categories). Then, we used multiple correspondence analysis (MCA; R package FactoMineR [25] , v. 2.3) to obtain a score derived from the selected variables, defined as the first dimension of the individual coordinates.

The dominant model of association (h/h and h/-vs -/-) with the selected response variables was tested using a generalized linear model as implemented in the HIBAG R package [26] . For DS1 (German cohort), sex, age, and reported region of origin were used as covariates; for DS2, sex and age, and for DS3, sex, age, and reported ethnicity. In an additional analysis, BMI was used as a covariate. For categorical variables such as ICU status, logistic regression was used. In the analysis, we filtered alleles by minor allele frequency (MAF), i.e. considered alleles with frequency in all data sets above 5% and present in at least five individuals. Altogether, 35 alleles from 10 loci were tested for association (detailed information about tested alleles in Supplemental Table 2 ). The response variables differed between the three data sets analysed. DS 1 (Germany) included the following continuous variables: Troponin T hs (logarithmitized), maximum WHO ordinal scale for clinical improvement score, as well as categorical: Troponin T hs elevation above 14 ng/l, ICU, and intubation status. For DS 2 and 3 (DS2: Switzerland/Spain and DS3: US, respectively) we tested only for association with ICU and intubation status. For intubation and ICU status, we tested the association with alleles in all three data sets separately and then integrated the results using meta-analysis (see below). Troponin T (as continuous or categorical variable) and WHO score were only tested in DS1 and the results were corrected for multiple testing using the Bonferroni correction.

In addition, as a follow-up analysis, we tested for an association between viral load and alleles in DS1.

The results of association tests in DS1-À3 were next subjected to a meta-analysis. Given that only intubation status and ICU status were present in all three data sets, our primary analysis was focused on these two response variables. The log-odds ratios (beta-coefficients from the logistic regression models) were combined, and p-values for each allele were obtained using a weighted Z-test [27] (weight equal to the reciprocal of standard error). Family-wise error rates (p-values corrected for multiple comparisons) were derived using the Bonferroni method over all meta-analysis results.

To test presence of genetic subpopulations in DS 1 and 3, we leveraged exome derived gene variants. We used the R package SNPrelate, version 1.26.0, to process the VCF file and perform principal components (PC) analyses. Next, we tested whether any of the PCs were associated with the presence / absence of the identified risk allele using a t-test. Since exome typing was not available for roughly half of the samples in DS1 and for no samples in DS3, we have used self-reported region of origin for DS1 and ethnicity for DS3 as covariates in the linear regression models. In addition, we tested whether removing potential outliers based on exome data-derived PC analysis and including the PCs as covariates in the models of DS1 and DS2 has a major impact on the results of the meta-analysis. To this end, we identified outliers using interquartile range method as implemented in the R function boxplot.stats and removed them (as well as all individuals with no genetic background information) from DS1 and DS2. Next, we recalculated the PC analysis using the selected samples only and repeated the association tests, but now using PC1 and PC2 in lieu of self-reported region of origin. Finally, we repeated again the metaanalysis and compared its results with the results of the original analysis based on all data.

Association of HLA-C*04:01 with susceptibility to COVID-19 in external cohort

Recent reports demonstrated that some genetic risk alleles for COVID-19 susceptibility are also associated with severity [28] . Studies that investigate susceptibility to COVID-19 by comparing participants who are COVID-19 negative vs. positive are generally much larger in sample size than studies that investigate disease severity in COVID-19. Therefore, we investigated data derived from studies of susceptibility and severity.

We hypothesized that HLA-C*04:01 might also be associated with susceptibility to COVID-19. To test this hypothesis, we used an external cohort of 12,139 clients of the genetic testing company Genotek Ltd (Moscow, Russia). Genotek Ltd is part of the COVID-19 Host Genetics Initiative (https://www.covid19hg.org/results). All participants were genotyped using Illumina Infinium Global Screening Array. The HLA alleles were imputed using HIBAG software [26] . Cases were defined as individuals with self-reported positive COVID-19 RT-PCR or antibody test. Controls were defined as individuals with self-reported COVID-19 negative status.

Phenotyping was performed during November 2020 -April 2021. Samples from participants age > 18 years, and BMI >15 and <60 kg/ m 2 were included in the final data set. The cohort consisted of 2113 cases and 10,026 controls (51.6% were females; average age 36.4 § 13.5 years). To test the association of HLA-C*04:01 with COVID-19 susceptibility a logistic regression was fitted with age, sex, and BMI as covariates.

We analyzed data from HLA-related studies and GWAS analyses that investigated COVID-19 susceptibility and severity, including studies in preprint stage. The list of studies included in the metaanalysis is provided as a table in supplemental file 2.

In most GWAS analyses, authors did not impute HLA alleles from microarray data and report association results for the individual SNPs. To show that a certain SNP is a tag-SNP for HLA-C*04:01, we calculated D', the corresponding normalized coefficient of linkage disequilibrium and the squared correlation coefficient between pairs of loci (r 2 ). These linkage disequilibrium metrics were computed using R package genetics and genotypes from the 1000 Genomes (1KG) Project.

Consortium [29] . HLA-alleles for the 1KG data set were derived from a study by Abi-Rached and colleagues [30] . To detect the possible tag-SNPs we calculated linkage disequilibrium-metrics (D',r 2 ) for all SNPs from 1KG EUR which are within 500 kb of the HLA-C gene (provided as a table in supplemental file 3). We find that rs5010528 was presented in the summary statistics and could be used as tag-SNP for HLA-C*04:01 allele (r 2 = 0.966, D' = 1 in 1KG EUR cohort). To estimate the overall effect, we performed the meta-analysis using R package meta, assuming the fixed-effect model.

This study was partially funded by Roche Sequencing Solutions, Inc., which also provided material for exome sequencing and helped with project coordination and shipment of samples.

Roche Sequencing Solutions, Inc. contributed to study design and data analysis. The other funding sources had no influence on study design, data analysis or interpretation.

Descriptive data on age, sex, intensive care management, maximum respiratory support during hospitalization and relevant comorbidities (hypertension, coronary artery disease, diabetes) of patients included in all three data sets are provided in table 1 and supplemental Table 1 .

The median age was similar in all three data sets, ranging from 57 to 60 years. The Spanish cohort consisted of more women (59%) than the cohorts from Germany (33%), Switzerland (20%) and the US (35%). In the Spanish cohort the prevalence of hypertension and diabetes was lower, and patients were less likely to have been intubated. In DS1, the majority of the patients were of European ancestry (81% of patients who reported ancestry). However, this information was not available for 43% of the patients.

In DS3, data on ethnicity was available for the majority of patients. Patients were classified as being Asian, African American, Hispanic or Caucasian.

We found that HLA allele frequencies amongst the three data sets (DS1, DS2, DS3, supplemental Figure 1 and supplemental Figure 2 ) were correlated and no major differences in the frequencies of the alleles were apparent.

The allele HLA-C*04:01 was associated with intubation (adjusted p-value = 0.0074, Table 2 , supplemental file 1) in all 3 data sets using age, sex and ethnicity as covariates. The calculated odds ratio for intubation was 3.5 [95% CI 1.9À6.6] while the risk ratio was 1.5 [95% CI 1.1À2.1]. The OR for the association between HLA-C*04:01 with intubation when no other covariates were included in the model was 2.9 [1.6À5.2], adjusted p-value = 0.02.

In the individual data sets, HLA-C*04:01 was associated with a higher percentage of intubated patients in DS1 (69% vs 35%) and in DS3 (63% vs 37%), but not in DS2 (7% vs 6%, Fig. 1 and Fig. 2) . In total, out of 127 intubated patients in this study, 38 (30%) carried the allele HLA-C*04:01, while among the 308 remaining patients, 56 (18%) were carriers (for 7 patients, the HLA-C locus could not be determined).

When BMI was used as an additional covariate, the adjusted pvalue for the association of HLA-C*04:01 with intubation remained significant (adjusted p-value = 0.025; supplemental Table 3 ). After removing the cohort from Sevilla from the main analysis, the association of HLA-C*04:01 with intubation remained significant with an adjusted p-value = 0.0035 (supplemental Table 4 ).

Association of HLA-C*04:01 with COVID-19 severity and genetic background One of the potential error sources of HLA analysis may be that an allele occurs with a higher frequency in individuals with certain genetic background and, incidentally, a different profile of the disease ARTICLE IN PRESS due to genetic or socioeconomic factors. For DS1 and DS2, partial genetic data was available from exome sequencing for 153 out of 302 patients (51%). Rather than excluding all individuals with missing genetic background information, we sought other means of testing if genetic background has impact on the association. First, we did a PC analysis based on SNP variants in the available data (Supplemental Figure 3 ). Next, we tested whether there is any significant difference in the first four components between individuals who are or are not treated at the ICU, intubated, as well as difference between carriers of the HLA C*04:01 allele and patients lacking that allele (Supplemental Table 5 ). No significant associations of genetic background were found. Then, we identified outliers in the PC analysis plot, removed them and repeated the whole procedure again to detect any further outliers (Supplemental Figure 4) . Finally, we repeated the full metaanalysis using only those samples from DS1 and DS2 for which genetic data was available and which were not outliers on the PC analysis (as determined by the IQR method of detecting outliers). Then, we calculated the associations between alleles and ICU / intubation status using PC1 and PC2 as covariates and compared the results of the meta-analysis with the results from the total data set (Supplemental Figure 5) . The calculated effect sizes were highly similar (Pearson r 2 = 0.90 for intubation status and 0.87 for ICU), indicating that the effect did not depend on the genetic background in DS1 and DS2. Moreover, the HLA-C*04:01 allele remained the allele with the highest odds ratio (OR) in both meta-analyses.

While the majority of patients in DS1 were of European ancestry (Supplemental Figure 6) , the data set from the United States (DS3) included patients from Asian, African American, Hispanic, and Caucasian ethnicities (Supplemental Figure 7) . While such designations can be used at best as proxies of a genetic background [31] , it is known that among these ethnicities, severity and incidence of COVID-19 as well as frequencies of alleles vary. Therefore, there was a possibility that the association between HLA-C*04:01 (known to differ between various populations) and COVID-19 severity is a statistical artifact in DS3. The association of HLA-C*04:01 remained significant despite including ethnicity designation as a covariate for the association test in the DS3. On the level of individual ethnicities, we found the association of HLA-C*04:01 with COVID-19 severity to be present in all groups. Due to low sample size resulting from this subgroup analysis, this association did not reach a level of significance in any ethnicity except for Caucasian. However, all patients of African American, Hispanic, and Caucasian ethnicity, who were carriers of HLA-C*04:01 were hospitalized in the ICU, and all African American and Hispanic, as well as 66% of Caucasian carriers of HLA-C*04:01 were intubated (Supplemental Figure 7) .

For DS1, first-measured viral load data were available. Notably, measured viral load was not associated with severity of symptoms such as ICU/intubation status or troponin T HS levels (Supplementary Table 6 ). There were five significant associations of initial viral load with HLA alleles at p < 0.05, none of which were significant after correction for false discovery rate, but with at least medium effect size. The HLA-C*04:01 allele was not significantly associated with viral load. Given that viral load was not associated with disease severity in DS1 and that viral load data were not available for DS2 or DS3, we did not attempt to confirm whether the lack of association between HLA and viral load in DS1 was mirrored in the two other data sets.

Meta-analysis of HLA-C*04:01 allele and COVID-19 severity / susceptibility After having identified HLA-C*04:01 as a risk allele for severe clinical course in COVID-19, we sought to test the hypothesis that it is also associated with COVID-19 susceptibility. To test this hypothesis, we analyzed data from the Genotek cohort of 2113 self-reported COVID-19 positive individuals and 10,026 controls. In the Genotek cohort, the frequency of HLA-C*04:01 was 13%. An association analysis using logistic regression with age, sex, and BMI as covariates showed that HLA-C*04:01 significantly increased the risk of infection with COVID-19 (OR = 1.16, p-value = 0.005). Next, we studied if HLA-C*04:01 was associated with COVID-19 susceptibility and severity in previous GWAS and HLA analyses reported in the literature. We conducted two meta-analyses on HLA-C*04:01 and COVID-19 severity/susceptibility (Fig. 3) . For the severity meta-analysis we included association results from four cohorts with 7796 cases and 875,694 controls in total. The overall effect for HLA-C*04:01 on COVID-19 severity was OR = 1.1 (95%-CI: 1.04À1.17, p-value = 5.8e-04). The leading variant, rs143334143 (CCHCR1), was significantly associated with COVID-19 severity (OR = 1.9, p-value = 8.8 £ 10À18). This variant is in linkage disequilibrium with HLA-C*04:01 (r 2 = 0.77, D' = 0.97) in the 1KG European cohort.

For the susceptibility meta-analysis, we included association results from four cohorts with 39,044 cases and 1,668,157 controls in total. The overall effect for HLA-C*04:01 on COVID-19 susceptibility was OR = 1.06 (95%-CI: 1.04À1.09, p-value = 1.4e-05). However, in this meta-analysis significant heterogeneity between studies was observed (I 2 = 78%, t 2 = 0.0115, p-value = 3.2e-03).

A possible explanation for the effect of an HLA allele on disease severity might be abnormal binding affinity with SARS-CoV-2 peptides. To test that hypothesis, we used a previously described data set of putative (computed in silico) binding affinities [15] . Following an approach described by Iturrieta-Zuazo and colleagues [32] , we calculated the number of SARS-CoV-2 peptides, which bind either "tightly" (at < 50 mM) or "loosely" (at < 500 mM) for each allele. Using this approach, we discovered that HLA-C*04:01 is among the ten alleles with the fewest predicted binding SARS-CoV-2 peptides (supplemental Table 7 ). Therefore, we sought to systematically test the hypothesis that the putative affinity with SARS-CoV-2 peptides is a correlate of disease severity. First, for each of the response variables (ICU and intubation status, troponin T hs levels, composite score) we tested whether the effect size is correlated with the number of potentially binding peptides. We included all alleles (not filtered for frequency) in this comparison. While in all data sets combined there were no large effects in alleles with many potential binding peptides, negative correlation was only significant in DS3 (Spearman test, p < 0.007,

Next, for each patient, we calculated how many SARS-CoV-2 peptides can the type I alleles in that patient potentially bind and tested whether there is an association between disease severity (using troponin T hs levels, intubation and ICU status and composite score as proxies) and that number. However, none of these tests showed a significant association in this subgroup analysis.

Furthermore, we tested if HLA-C*04:01 is associated with higher C-reactive protein (CRP) levels as a surrogate for systemic inflammation. The differences were significant, (Wilcoxon test, p = 0.021; Supplemental Figure 8 ; Supplemental Table 8 ) but the effect was small (Wilcoxon effect size r = 0.2). CRP was, however, significantly different between patients who were treated at the ICU and those who were not (p < 10À5, r = 0.53), as well as between those who were intubated and those who were not (p < 10À5; r = 0.6).

Since polymorphisms in the KIR2DS4 gene in combination with the HLA-C*04:01 allele have been reported to increase viral load and lead to severe clinical course in patients with human immunodeficiency virus (HIV) [33] , we evaluated if presence of the KIR2DS4f variant was associated with our outcome variables (e.g. viral load, intubation). However, no association was identified. A combination of a KIR2DS4f variant with HLA-C*04:01 was identified in only four patients, of which only one was homozygous at the KIR2DS4 locus. This patient had a severe clinical course, was intubated, and had highly elevated troponin T hs and maximum WHO ordinal scale for clinical improvement score.

In the beginning of the outbreak of the COVID-19 pandemic in Europe in spring 2020, an international collaboration of European centers was established to address the question of whether there were potential genetic host factors associated with severe clinical course of SARS-CoV-2 infection. Samples were collected in Spain, Switzerland, and Germany [19] and subsequently processed in Germany for full-length sequencing of HLA genes.

We identified HLA-C*04:01 as a potential risk allele, which was associated with a two-fold increased risk of intubation when present in the form of at least one allele. This is the first significant description of this HLA allele as a relevant allele for the clinical course.

Importantly these findings were reproduced in an independent public data set obtained from patients with COVID-19 at Albany Medical Center, in the United States using RNA sequencing [17, 18] and a data set from the University of California, San Francisco, United States. Using this approach, we demonstrated broad applicability of our findings and confirmed reproducibility in data gathered through a distinct method À the analysis of the transcriptome [34] . This is particularly relevant in the context of regional differences in allele frequencies [35, 36] . In that regard, HLA-B* 46:01 was shown to be associated with severe clinical course in SARS-CoV in Asia [14] . However, this allele is very rare in Germany, Spain, Switzerland and the United States [16] . In line with that, HLA-B* 46:01 was not represented in our European data set. In a correlation analysis we demonstrated that the frequencies of most alleles of the German cohort were similar to those in the cohorts from Switzerland, Spain, and the United States.

However, an observation that warrants mention is that the Spanish population in DS2 consisted of a larger percentage of women (63% in Spain vs 33% in Germany). Importantly, DS2 was a representative sample of patients in Sevilla, where our recruiting center collected samples consecutively at the peak of the pandemic and by now has seen a similar proportion of women (60%) in all patients, who have since been recruited (n = 842). While our analysis did not reveal any association of specific HLA alleles with sex (supplemental Table 8 ), it has been shown that female sex is associated with better outcomes [37] . This effect of sex may have masked the effect of HLA-C*04:01 validation in DS 2. A higher percentage of women in this cohort may have contributed to the low rate of patients who required treatment in the intensive care unit. Future research is necessary to investigate this possibility.

The reported OR for HLA-C*04:01 were high with very broad CI (OR in meta-analysis 3.5 [1.9À6.6]). The reason for this was inclusion In addition to the robust association between HLA-C*04:01 and intubation, previous literature suggests a potential biological role of HLA-C*04:01 in defending viral infections. Specifically, it has been shown that in patients infected with the HIV, HLA-C*04:01 in combination with polymorphisms in the KIR2DS4 gene is associated with high viral load and severe clinical course [33] . Moreover, HLA-C*04:01 has been reported to occur at a higher frequency in COVID-19 patients than in the healthy population [38] .

The allele frequency of HLA-C*04:01 is approximately 13% in Germany overall and 19% in German residents of Turkish origin [16] , suggesting that the presented findings may impact a relevant fraction of the German population. Similar numbers are found in Switzerland, where the allele frequency is about 16% and Spain, where an allele frequency of 15% has been reported [16] . Notably, the difference in HLA-C*04:01 frequency in populations with different genetic background may suggest the presence of a potential confounder such as that the observed effect may be due to a combination of differences in allele frequencies and socioeconomic factors influencing the disease severity. However, despite testing this hypothesis with multiple approaches, we were unable to find confirmation of this effect. Association between HLA-C*04:01 and disease severity did not change when genetic background or ethnicity were used as covariates, or when the effect was calculated for a genetically homogeneous population.

Since HLA-C*04:01 is associated with viral load in HIV, we sought to evaluate if this was true for SARS-CoV-2. We did not find any association of HLA-C*04:01 with first-measured viral load of patients during their hospitalization. In addition, the first-measured viral load in intubated patients did not differ from patients with mild disease. We support the common hypothesis that by the time the patients present with severe respiratory failure, the virus may have been already partially cleared from the body in some of them and the disease may be maintained through inflammatory or autoimmune mechanisms similar to other viral respiratory illnesses [39] . The dynamics of viral load in SARS-CoV 2 have been reported comprehensively in the literature [40À42] . It may be beneficial to test for HLA-C*04:01 in clinical trials to identify from which drug class this high-risk population benefits most.

It is also possible that viral load of patients with HLA-C*04:01 may have been higher during the very first days of infection [43] , but since most patients only become symptomatic after approximately a week [42, 44, 45] , this time window may have been missed in some patients of our cohort to see a potential relationship of HLA-C*04:01 and viral loads measured during early infection. In general, there was no association of any specific HLA allele with first-measured viral load in our cohort.

Exome sequencing revealed a combination of a KIR2DS4f variant with HLA-C*04:01 in only four patients. One of those patients was homozygous at the KIR2DS4 locus and had a severe clinical course. While these findings are interesting, the small number of cases preclude drawing meaningful conclusions.

Notably, HLA affinity analysis revealed that HLA-C*04:01 binding affinity with SARS-CoV-2 peptides was amongst the ten lowest HLA alleles (supplemental Table 7 ). Delayed immune response due to low HLA binding affinity may be one potential biological explanation for the severe clinical course observed in patients with HLA-C*04:01.

Alternatively, HLA-C*04:01 may have led to worse outcomes by predisposing patients to a more severe inflammatory state. To test that, we evaluated for a potential association of HLA-C*04:01 with CRP serum levels. Indeed, CRP levels were higher in carriers of HLA-C*04:01 (p = 0.02, supplemental Figure 8 ). CRP correlated with the risk of intubation, as shown in prior studies [46, 47] .

The number of samples in our study was relatively small. To avoid the detrimental effects of small sample size on the statistical power, we have limited our study only to alleles filtered by MAF, resulting in 35 alleles tested for 2 phenotypes (ICU and intubation status) in all three data sets, and three additional phenotypes in DS1 only. Furthermore, we have included several different cohorts and compared our results with publicly available data sets. Nonetheless, small sample size might result in overestimated effect sizes. This could be clearly seen when comparing models including covariates (such as sex, and ethnicity) and models without such covariates.

We found significant association of the HLA-C*04:01 allele with COVID-19 susceptibility in the independent cohort. Moreover, we conducted two meta-analyses to combine the association results of HLA-C*04:01 or linked SNPs with COVID-19 severity and susceptibility. In these analyses, HLA-C*04:01 was significantly associated with COVID-19 severity and susceptibility. It should be noted that some studies included in the meta-analyses did not detect the association between HLA-C*04:01 and COVID-19, which can be explained by several factors. First, small sample size was a major limiting factor in some studies, resulting in a design and test combination that was underpowered for detecting hypothetical effect sizes of interest. Second, there were inconsistent definitions of phenotypes between different studies. In many studies the control groups were defined as "general population" with limited details. HLA alleles have different frequencies in different populations, which could affect the power of the studies with heterogeneous cohorts (e.g. COVID-19 Host Genetics Initiative) and lead to false positive findings. Some studies did not genotype or impute HLA alleles. Therefore, we had to use the tag SNPs which are in strong linkage disequilibrium with the considered HLA allele to include these studies in the meta-analysis. However, a high variability of linkage disequilibrium between certain SNP and HLA alleles for various populations in 1KG data can be observed.

The study by Pairo-Castineira and colleagues included 2244 cases from the GenOMICC data base who had a confirmed diagnosis of COVID-19 based on clinical testing and who required continuous cardiorespiratory monitoring, and ancestry-matched (5 to 1) controls from the UK Biobank in the discovery cohort. We found that the leading variant rs143334143 (CCHCR1) was significantly associated with COVID-19 severity and that this variant is in linkage disequilibrium with HLA-C*04:01 in the 1KG European cohort. However, in the meta-analysis [48] this variant was removed from the final results due to high heterogeneity (mainly driven by the GenOMICC cohort). The possible reasons of such heterogeneity include: 1) Different definitions of the controls (population control, PCR negative controls etc.); 2) various ancestry of the cohorts; 3) small sample sizes of certain studies.

Some other SNPs which are in the same linkage disequilibriumblock with rs143334143 and HLA-C*04:01 did not demonstrate a similar effect. The different analysis method of cases and controls in this study might explain that mentioned difference and therefore we opted for not including this study into the meta-analysis.

Our study is an important addition to prior HLA research in the field of COVID-19 that provided cumulative evidence for a potential predisposing susceptibility of HLA alleles for infection with SARS-CoV-2. It has been shown that specific HLA alleles were more common in patients infected with SARS-CoV-2 as compared to healthy individuals (e.g. HLA-DRB1* 15:01, HLA-DQB1*06:02, HLA-DRB1£08, HLA-C*04:01) and it was extrapolated that these alleles may predispose to infection [38,49À51] . It is not surprising that some of these alleles differed from ours, as we identified a genetic factor specifically for severe clinical course amongst patients with COVID-19, while other groups compared healthy versus diseased. A recent GWAS study identified blood group A as a major risk factor for COVID-19, but did not detect associations of the classical HLA loci with either COVID-19 infection or disease severity [8] . That study used target short-read sequencing of HLA regions for a sub-cohort and investigated association with HLA-C:04 only on a 2-digit rather than on the 4-digit level used in our study. Both factors may contribute to a reduced power that made it impossible to detect the association that our approach finds supported in three data sets from different studies and in different populations.

In addition to host factors, it has been shown that genetic factors of the virus may affect affinity of host HLA to viral structures. In that regard, de Sousa and colleagues have shown that a single mutation in the wild type sequence of SARS-CoV-2 could affect peptide binding of the virus to HLA Class II alleles [52] .

In summary, we applied full-length HLA sequencing to patients with COVID-19 to enable identification of HLA alleles with high precision in a large international data set and validated our findings successfully in an independent cohort. We used analysis of clinical and laboratory parameters, viral load data, and whole exome sequencing to evaluate biological plausibility.

A limitation of this study was that in order to promptly respond to the COVID-19 pandemic with an international multi-center study, even patients with limited metadata were enrolled in the study. However, risk factors and clinical outcomes that we considered most relevant based on prior literature were collected and analyzed for all patients included in our cohorts.

Reproducibility of our data in large GWAS analyses supports our findings despite this limitation.

Furthermore, it warrants mention that any study recruiting patients from a given clinical center (such as our center in Berlin) will be somewhat biased based on its patient selection À the Berlin population will not be representative for all of Germany or Europe, and the DS3 populations are not likely to be representative for all of US. However, we evaluated whether allele frequencies are similar to the allele frequencies reported based on larger population studies. Furthermore, we tested our findings in multiple international data sets and were able to replicate our findings, suggesting broad applicability of our results.

Larger cohorts with consistent pheno typing are required to draw more precise estimates about effect-sizes for accurate translation into further research and clinical care. confidentiality statements approvals. Summary statistics are provided in supplemental Table 1 .

Additional encrypted / anonymized data are available upon request. Custom R scripts and functions are available from https:// github.com/bihealth/covidhla/.

China Novel Coronavirus I and Research T. a novel coronavirus from patients with Pneumonia in China

Pillai SK and Washington State -nCo VCIT. first case of 2019 novel coronavirus in the United States

Characteristics of and important lessons from the Coronavirus Disease 2019 (COVID-19) Outbreak in China: summary of a report of 72314 cases from the Chinese Center for Disease Control and Prevention

Aerosol emission and supere mission during human speech increase with voice loudness

Zhong NS and China Medical Treatment Expert Group for C. clinical characteristics of Coronavirus Disease 2019 in China

Associations between blood type and COVID-19 infection, intubation, and death

Transcriptomic biomarkers for the accurate diagnosis of myocarditis

Genomics and the multifactorial nature of human autoimmune disease

Polymorphism of HLA class I and class II alleles in influenza A(H1N1)pdm09 virus infected population of Assam, Northeast India

The genetic polymorphisms of HLA are strongly correlated with the disease severity after Hantaan virus infection in the Chinese Han population

Association of HLA class I with severe acute respiratory syndrome coronavirus infection

Human leukocyte antigen susceptibility map for SARS-CoV-2

Allele Frequency Net Database

Large-Scale Jaitovich A. Multiomic analysis of COVID-19 Severity

Blood DNA methylation and COVID-19 outcomes

Clinical and virological characteristics of hospitalised COVID-19

Virological assessment of hospitalized patients with COVID-2019

arcasHLA: high-resolution HLA typing from RNAseq

A framework for variation discovery and genotyping using next-generation DNA sequencing data

A high-performance computing toolset for relatedness and principal component analysis of SNP data

FactoMineR: an R package for multivariate analysis

HIBAGÀHLA genotype imputation with attribute bagging

Optimally weighted Z-test is a powerful method for combining probabilities in meta-analysis

Common genetic variants identify therapeutic targets for COVID-19 and individuals at high risk of severe disease

A global reference for human genetic variation

Immune diversity sheds light on missing variation in worldwide genetic diversity panels

Science and Society. Taking race out of human genetics

Possible role of HLA class-I genotype in SARS-CoV-2 infection and progression: a pilot study in a cohort of Covid-19 Spanish patients

The HLA-C*04: 01/KIR2DS4 gene combination and human leukocyte antigen alleles with high population frequency drive rate of HIV disease progression

Transcriptome and genome sequencing uncovers functional variation in humans

How selection shapes variation of the human major histocompatibility complex: a review

A genomic perspective on HLA evolution

Male sex identified by global COVID-19 meta-analysis as a risk factor for death and ITU admission

Human Leukocyte Antigen Complex and Other Immunogenetic and Clinical Factors Influence Susceptibility or Protection to SARS-CoV-2 Infection and Severity of the Disease Course

Comparative epidemiology of pandemic and seasonal influenza A in households

Team CC-I. Presymptomatic SARS-CoV-2 Infections and transmission in a skilled nursing facility

Duration of infectiousness and correlation with RT-PCR cycle threshold values in cases of COVID-19

Estimating infectiousness throughout SARS-CoV-2 infection course

Clinical progression and viral load in a community outbreak of coronavirus-associated SARS pneumonia: a prospective study

SARS-CoV-2 Viral Load in Upper Respiratory Specimens of Infected Patients

Evidence for transmission of COVID-19 prior to symptom onset

Associations of procalcitonin, C-reaction protein and neutrophil-to-lymphocyte ratio with mortality in hospitalized COVID-19 patients in China

Inflammatory biomarker trends predict respiratory decline in COVID-19 patients

Mapping the human genetic architecture of COVID-19 by worldwide meta-analysis. medRxiv

HLA-B*44 and C*01 prevalence correlates with Covid19 spreading across Italy

Regional transplant coordinating C. HLA and AB0 Polymorphisms May Influence SARS-CoV-2 Infection and COVID-19 Severity

HLA allele frequencies and susceptibility to COVID-19 in a group of 99 Italian patients

Mortality in COVID-19 disease patients: correlating the association of major histocompatibility complex (MHC) with severe acute respiratory syndrome 2 (SARS-CoV-2) variants

Aligning sequence reads

Accurate and Efficient KIR Gene and Haplotype Inference from Genome Sequencing Reads with Novel K-mer Signatures

The killer cell immunoglobulin-like receptor (KIR) genomic region: gene-order, haplotypes and allelic polymorphism

Contribution to methodologies, data curation and software conception for study data analysis: RA, IV, IP, JW, MS, JZ and SR.BH, BHi, CC, CQ, PCSG, PS, UL and WP contributed to study supervision and organization.BH, BHi, CD, CQ, JW, PS, TB, TCJ, UK, XW and ZK conducted the laboratory processing and data preparation.BH, BHi, CMH, CQ, CRL, CSC, DJE, DP, FG, IP, IV, JA, JB, JW, JZ, KNK, MFK, MH, PGW, RA, SR, UK, UV contributed to study data validation.All authors were involved in the drafting and reviewing of the manuscript and provided approval for submission. 

We thank all patients who participated in this study for giving us a chance to better understand this novel illness. We would like to express our condolences to the families and friends of patients who died from COVID-19.We thank the clinical teams at all our enrollment sites for supporting this project while fulfilling their clinical duties during this challenging time of the COVID-19 outbreak.We thank Gary R. Pasternack, MD, PhD for editorial assistance. Biobanking was supported by the Central Biobank Charite. We acknowledge support from the German Research Foundation (DFG) and the Open Access Publication Fund of Charit e À Univer-sit€ atsmedizin Berlin.

Supplementary material associated with this article can be found in the online version at doi:10.1016/j.eclinm.2021.101099.