key: cord-0335360-02ht81ca authors: Degenhardt, F.; Ellinghaus, D.; Juzenas, S.; Lerga-Jaso, J.; Wendorff, M.; Maya-Miles, D.; Uellendahl-Werth, F.; ElAbd, H.; Ruehlemann, M. C.; Arora, J.; oezer, O.; Lenning, O. B.; Myhre, R.; Vadla, M. S.; Wacker, E. M.; Wienbrandt, L.; Blandino Ortiz, A.; de Salazar, A.; Garrido Chercoles, A.; Palom, A.; Ruiz, A.; Mantovani, A.; Zanella, A.; Rygh Holten, A.; Mayer, A.; Bandera, A.; Cherubini, A.; Protti, A.; Aghemo, A.; Gerussi, A.; Popov, A.; Ramirez, A.; Braun, A.; Nebel, A.; Barreira, A.; Lleo, A.; Teles, A.; Kildal, A. B.; Biondi, A.; Ganna, A.; Gori, A.; Glueck, A.; Lind, A.; Hinney, A. title: New susceptibility loci for severe COVID-19 by detailed GWAS analysis in European populations date: 2021-07-23 journal: nan DOI: 10.1101/2021.07.21.21260624 sha: dd4d096b61fa4b2e3a918c1c74fa5b6bc4820732 doc_id: 335360 cord_uid: 02ht81ca Due to the highly variable clinical phenotype of Coronavirus disease 2019 (COVID-19), deepening the host genetic contribution to severe COVID-19 may further improve our understanding about underlying disease mechanisms. Here, we describe an extended GWAS meta-analysis of 3,260 COVID-19 patients with respiratory failure and 12,483 population controls from Italy, Spain, Norway and Germany, as well as hypothesis-driven targeted analysis of the human leukocyte antigen (HLA) region and chromosome Y haplotypes. We include detailed stratified analyses based on age, sex and disease severity. In addition to already established risk loci, our data identify and replicate two genome-wide significant loci at 17q21.31 and 19q13.33 associated with severe COVID-19 with respiratory failure. These associations implicate a highly pleiotropic ~0.9-Mb 17q21.31 inversion polymorphism, which affects lung function and immune and blood cell counts, and the NAPSA gene, involved in lung surfactant protein production, in COVID-19 pathogenesis. 
medRxiv preprint (not certified by peer review), posted July 23, 2021; doi: https://doi.org/10.1101/2021.07.21.21260624. The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Due to the highly variable clinical phenotype of Coronavirus disease 2019 (COVID-19), a deeper characterization of the host genetic contribution to severe COVID-19 may further improve our understanding of underlying disease mechanisms. Here, we describe an extended GWAS meta-analysis of 3,260 COVID-19 patients with respiratory failure and 12,483 population controls from Italy, Spain, Norway and Germany/Austria, as well as hypothesis-driven targeted analysis of the human leukocyte antigen (HLA) region and chromosome Y haplotypes. We include detailed stratified analyses based on age, sex and disease severity. In addition to already established risk loci, our data identify and replicate two genome-wide significant loci at 17q21.31 and 19q13.33 associated with severe COVID-19 with respiratory failure. These associations implicate a highly pleiotropic ~0.9-Mb 17q21.31 inversion polymorphism, which affects lung function and immune and blood cell counts, and the NAPSA gene, involved in lung surfactant protein production, in COVID-19 pathogenesis. In the past year, Coronavirus disease 2019 (COVID-19), caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has evolved into a global pandemic with more than 182 million confirmed cases and 3.9 million COVID-19 related deaths worldwide (figures reported by the World Health Organization, July 2nd, 2021). The clinical manifestations of COVID-19 are variable and range from complete absence of symptoms to severe respiratory failure and death. Severe COVID-19 requires intensive medical care with respiratory support and can result in long-term damage to the individual. The pathogenesis of severe COVID-19 is, however, still poorly understood. This condition has been associated with clinical risk factors such as old age, male sex and comorbidities such as diabetes, active cancer, hypertension and coronary artery disease, as well as solid organ transplantation or other conditions that promote immunosuppression. 
1-4 Studies by this group and others have shown that genetic predisposition plays a role in COVID-19 susceptibility and severity. [5] [6] [7] Previously, we reported significant associations of genetic variants at loci 3p21.31 (around LZTFL1) and 9q34.2 (ABO blood group locus) with severe respiratory failure and SARS-CoV-2 infection 5 , which have been replicated in subsequent studies. 7,8 While 3p21.31 was associated with disease severity, 9q34.2 was more strongly associated with disease susceptibility 5 . Analysis of ABO blood types additionally showed that individuals carrying blood type A have a higher risk of COVID-19 infection. Since then, 11 additional genome-wide significant loci associated with SARS-CoV-2 infection or COVID-19 manifestations have been reported by various studies, including the Genetics Of Mortality In Critical Care (GenOMICC) Initiative and most recently the COVID-19 Host Genetics initiative (HGI). 6, 7 Six of these loci have been linked to critical illness by COVID-19, and include loci that were previously associated with pulmonary or autoimmune and inflammatory diseases. 6 Regarding the association with blood type, Mankelow et al. 9 report a higher disease susceptibility for blood type A secretors, determined by genetic variation at the Fucosyltransferase 2 (FUT2) gene. Here, we report an extended genome-wide association study (GWAS) meta-analysis of severe COVID-19 with full and rigorously quality-controlled information on age, sex and disease severity, including 3,260 COVID-19 patients with respiratory failure. 
Respiratory failure was defined as respiratory support with supplemental oxygen [class 1], non-invasive or invasive ventilation [classes 2 and 3, respectively], or extracorporeal membrane oxygenation (ECMO) [class 4]. 5 The study also includes 12,483 population controls of unknown COVID-19 status from Italy, Spain, Norway and Germany/Austria. The availability of information on age, sex, and comorbidities such as hypertension, diabetes and coronary artery disease within each study cohort additionally allowed for in-depth and stratified analysis of, in particular, age- and sex-specific risks for these loci. The discovery study (first and second analysis) was followed by an in-silico replication analysis in up to 12,888 hospitalized cases (including 5,582 critically ill cases) and 1,295,966 population controls from the COVID-19 HGI, which allowed us to replicate previous findings and characterize in detail new candidate loci for disease severity. Genetic analysis was followed by a thorough in-silico functional characterization of new loci. In response to clear expectations for a potential role for the human leukocyte antigens (HLA) in the disease course of COVID-19 and preliminary evidence from some smaller pilot studies [10] [11] [12] , our genetic analysis includes a detailed investigation of the genetic variation in the HLA region. With male sex identified as a risk factor for severe COVID-19 and COVID-19 related death 3 , we explore possible connections between genetic variants on the Y chromosome and the risk of developing severe COVID-19 in males. Variations on the Y chromosome define so-called Y-chromosome haplogroups, labeled with letters A-Z (defined by the Y Chromosome Consortium) 13 , which follow a pattern of ancestral population migrations in Europe and on a global scale. 
HLA analysis includes classical fine-mapping of the HLA region based on local imputation of SNP, amino acid and classical allele information, as well as a broad range of other approaches, including a peptidome-wide association study (PepWAS) 14 , computational prediction of SARS-CoV-2 peptide presentation, HLA class I supertype association analysis, and tests for heterozygote advantage, divergent allele advantage and molecular mimicry. Y-chromosome analysis includes analysis of known Y-chromosome haplogroups with a focus on haplogroup R, the predominant haplogroup in Europe. Table 4). To calculate posterior probabilities of replicability (PPRs) of our genome-wide and suggestive variants across individual discovery and replication GWAS studies, we performed hierarchical mixture model analysis with MAMBA 17 (Online Methods). The analysis showed a high probability (PPR>0.8) of consistent effect sizes across all analyzed cohorts for all but the variants at 14q32.11 (TTC7B, PPR=0.005) and at 17q11.2 (Supplementary Table 5 Table 3). Subsequently, we performed a fixed-effect inverse variance meta-analysis using METAL 16 across the first analysis and COVID-19 HGI B2 statistics as well as the second analysis and Table 3). None of the novel associations at PCDH7 at 4p15.1, FREM1 at 9p22.3, OLMF4 at 13q21.1, TTC7B at 14q32.11 and CPD at 17q11.2 showed suggestive evidence after combining data with the HGI statistics. 
Since the MAMBA analysis nevertheless indicated high replicability of these variants across this study's cohorts, subsequent analyses need to show whether this is attributable to an artefact in the cohorts of this study or to high heterogeneity of the cohorts and data sets included in the COVID-19 HGI analyses. The meta-analysis of our second discovery cohort (critically ill only) with the COVID-19 HGI summary statistics revealed an additional genome-wide significant locus not previously associated with severe COVID-19 (NR1H2 at 19q13.33; Pdiscoveryreplication=3.25×10⁻⁸ for rs1405655; OR=1.09 for minor allele C; 95%CI=1.06-1.1) (regional association plot is shown in Supplementary Figure 8, Supplementary Table 3). To estimate possible hidden genetic effects from age, sex and disease severity, we performed an in-depth stratified analysis of the 3 genome-wide significant and 7 replicable suggestive loci from our first and second analyses as well as the 19q13.33 variant from the meta-analysis with COVID-19 HGI summary statistics. We additionally investigated association of these variants with known comorbidities such as hypertension, coronary artery disease and diabetes.
Fine-mapping of signals at 17q21.31 and 19q13.33
We next performed an in-depth characterization of the previously unknown genome-wide significant locus 19q13.33 (NR1H2) and the 17q21.31 locus, which contains a known ~0.9-Mb inversion polymorphism spanning several different genes, whose implication in COVID-19 has not been accurately investigated. 
Bayesian fine-mapping analysis with FINEMAP 18 (Online Methods) identified a total of 1,531 (log10(Bayes factor)=10.46) and 15 (log10(Bayes factor)=5.95) variants that belong to the 95% credible sets of variants most likely to be causal at 17q21.31 and 19q13.33, respectively (Figure 1, Supplementary Table 7). For 17q21.31, the 95% credible set included rs1819040 as the best SNP candidate with only 1.1% certainty, followed by 1,530 variants in high linkage disequilibrium (LD) (mean(LD)=0.997 and min(LD)=0.954) with certainty <0.3%, indicating that the individual SNP associations are only proxy variants for the actual causal variant (see below). For 19q13.33, the 95% credible set included rs1405655 as the best SNP candidate with 59.1% certainty, followed by rs1274514 and rs1274510 (5.3% certainty), and 12 variants with certainty <4%, so we assume that rs1405655 represents the candidate causal variant. The lead variant rs1819040 and the most strongly associated variants of the 95% credible set at this locus point directly to the common 17q21.31 inversion polymorphism, which spans several different genes and maps to two highly divergent haplotypes, H1 and H2, which are estimated to have evolved separately for >2 million years 19 (Figure 1a). The inversion haplotypes H1 and H2 were determined more accurately for COVID-19 cases with respiratory failure and controls by genotype imputation with IMPUTE2 20 , employing as reference 109 individuals from the 1000 Genomes Project for which 17q21.31 inversion genotypes were obtained experimentally by FISH and droplet digital PCR (Online Methods). 
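The notion of a 95% credible set used above can be illustrated with a minimal Python sketch. It is not the FINEMAP implementation: it assumes that per-variant posterior probabilities of causality are already available, and the function name `credible_set` and the variant labels are hypothetical. Variants are ranked by posterior probability and added until the cumulative probability reaches the requested coverage, which is why a locus with many near-equivalent proxies (as at 17q21.31) yields a very large set, while a locus with one dominant variant (as at 19q13.33) yields a small one.

```python
def credible_set(posteriors, coverage=0.95):
    """Smallest set of variants whose normalized posterior probabilities
    sum to at least `coverage`.

    `posteriors` maps a variant label to its posterior probability of
    being causal (hypothetical input; not produced by this sketch).
    """
    total = sum(posteriors.values())
    if total <= 0:
        raise ValueError("posterior probabilities must sum to a positive value")
    # Rank variants from most to least probable.
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    chosen, cumulative = [], 0.0
    for variant, prob in ranked:
        chosen.append(variant)
        cumulative += prob / total
        if cumulative >= coverage:
            break
    return chosen


# Hypothetical locus with one dominant variant: a small credible set.
peaked = {"snp_a": 0.60, "snp_b": 0.20, "snp_c": 0.10, "snp_d": 0.06, "snp_e": 0.04}
# Hypothetical locus with equivalent proxies: every variant is needed.
flat = {"v1": 0.25, "v2": 0.25, "v3": 0.25, "v4": 0.25}
print(credible_set(peaked))
print(len(credible_set(flat)))
```

In the flat case no single variant can be prioritized, which mirrors the interpretation given above that the individual SNPs at 17q21.31 act as proxies for the inversion.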
LD between the rs1819040 variant, prioritized in the meta-analysis with the COVID-19 HGI summary statistics, and the inversion in our cohorts is near perfect (r²=0.98, D'=0.99). rs8065800-G, observed in our first and second analysis as a risk allele in the Spanish population, is inherited together with the major H1 haplotype despite being in low LD with it (in our cohorts: r²=0.12-0.18; D'=0.97-1), which also points to the effect of the inversion. Genome-wide significant association with severe respiratory COVID-19 for the inversion was confirmed using logistic regression followed by meta-analysis across this study's discovery and replication panels from the COVID-19 HGI (meta-analysis first discovery panel and We next performed several follow-up analyses to better understand possible functional implications of associations at the 17q21.31 and 19q13.33 loci. A phenome-wide association study (PheWAS) for 17q21.31 and 19q13.33 using a wide range of phenotypes from the 22 and other available GWAS data revealed no known phenotypes to be linked with the rs1405655 lead SNP at 19q13.33, while 162 GWAS associations were identified for the 17q21.31 inversion in the GWAS Catalog, illustrating its pleiotropic effects. These associations included several traits potentially related to COVID-19 pathology, such as blood and immune cell composition or lung function (Figure 1c). Credible sets from Bayesian fine mapping at 17q21.31 and 19q13.33 overlap with several genes including MAPT, KANSL1, FMNL1 and CRHR1 at the inversion locus 17q21.31, and NAPSA, NR1H2, KCNC3 at locus 19q13.33 (Supplementary Figures 4, 5 and 8). 
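The combination of low r² and near-complete D' reported above for rs8065800 and the inversion can look contradictory at first. The following Python sketch, using the standard two-locus definitions, shows how it arises when one allele occurs almost exclusively on one haplotype but the two alleles have different frequencies. The function name `ld_stats` and all frequencies are hypothetical illustrations, not values from this study.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (r2, D') for two biallelic loci.

    p_ab: frequency of haplotypes carrying allele A (locus 1) and allele B (locus 2)
    p_a, p_b: frequencies of allele A and allele B
    """
    for p in (p_a, p_b):
        if not 0.0 < p < 1.0:
            raise ValueError("allele frequencies must lie strictly between 0 and 1")
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    # D is bounded by the allele frequencies; D' rescales it to [0, 1].
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return r2, d_prime


# Hypothetical case: allele A (frequency 0.5) is found only on haplotype B
# (frequency 0.8), so p_ab equals p_a. D' is complete, yet r2 is modest
# because many B haplotypes do not carry A.
r2, d_prime = ld_stats(p_ab=0.5, p_a=0.5, p_b=0.8)
print(round(r2, 3), round(d_prime, 3))
```

Under these assumed frequencies, D' is at its maximum while r² stays well below 1, which is the same qualitative pattern as an allele that always travels with H1 without tagging it.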
We performed an exploratory gene expression analysis using several publicly available datasets to: 1) identify in which tissues or cell types our candidate genes are expressed by analyzing their RNA expression at bulk and single-cell level; 2) examine the direct effect of both loci on gene expression by using expression and splicing quantitative trait loci (eQTL and sQTL); and 3) infer the possible contribution of these genes to COVID-19 pathology by looking at their expression patterns in a) monocytes exposed to different viral and non-viral immune stimulators, b) organoids infected with SARS-CoV-2 and c) single-cell RNA-seq of several tissues, including lung, from patients who died after severe SARS-CoV-2 disease (Online Methods). The potential functional role of the 17q21.31 inversion is supported by the analysis of 2,902 linked variants (r²>0.9) that are already reported as eQTL or sQTL in the GTEx Project. 23 The inversion is in LD with lead eQTLs and sQTLs (i.e. those displaying the strongest association with the target) for 24 and 7 genes, respectively, in at least one tissue (Supplementary Table 9 infection-like conditions, which appears to be related to a significant increase of coding and non-coding isoforms (Supplementary Figure 15). Similarly, in SARS-CoV-2 infected brain organoids, the expression of MAPT was significantly downregulated in premature and mature neuronal cells (Supplementary Figure 16, Supplementary Table 11). 
In the case of the 19q13.33 locus, expression of candidate genes shows high tissue specificity, with the NAPSA mRNA being specific to lung and lung parenchyma and KCNC3 being highly expressed in brain and thyroid tissue, while NR1H2 is more broadly expressed among human tissues, including many immune cell types (Figure 2b, Supplementary Figure 13). Of those candidates, expression of KCNC3 and especially NAPSA appears to be clearly affected by the rs1405655 lead SNP (Figure 2a). Single SNP Mendelian randomization analysis using The HLA fine-mapping approach yielded no association at the genome-wide or nominal (P<10⁻⁵) significance threshold, neither in the overall meta-analysis across the four cohorts nor within the separate cohorts (Supplementary Table 13, Supplementary Figure 17). Furthermore, we found no significant association for any HLA-presented viral peptide in a so-called PepWAS approach (Supplementary Table 14, Supplementary Figures 18-19), in which associations between HLA-presented peptides and disease are unravelled by integrating similarities and differences in peptide binding among HLA alleles across patients, nor robust statistical associations with any of the other tested HLA parameters (Supplementary Table 15). Results of the Y-chromosome haplogroup analysis are shown in Supplementary Table 16. We observed a significant risk association of the Y-haplogroup R (M207) in the Italian other age group or cohort. COVID-19 related mortality was significantly associated with haplogroup R1b1a2a1 (U106) (P=0.01, OR=2.8, 95%CI=1.27-6.12). This association remained after adjusting for the comorbidities hypertension, coronary artery disease and diabetes. 
COVID-19 related death remains, however, a challenging endpoint influenced by many factors, and the association did not survive correction for multiple testing. We here present a large collaborative COVID-19 genetics study of different centers from Italy, Spain, Norway and Germany/Austria. With our clearly defined phenotype of severe respiratory COVID-19, centralized genotyping and rigorous quality control, we have generated a valuable resource for further COVID-19 related genetic analyses. We identified and analyzed in detail two new loci of interest associated with severe COVID-19, the 17q21.31 inversion and the 19q13.33 locus. Furthermore, we examined for the first time the effects of age and sex on various variants identified as genome-wide significant or suggestive in this study. As for the novel association at the 19q13.33 locus, additional analyses provided first hints for a functional involvement in COVID-19 through its regulation of the NAPSA gene, a gene encoding a protease highly expressed in Type 1 (AT1) and Type 2 (AT2) alveolar cells, two cell types required for gas exchange at the lung surface and for the secretion of surfactant proteins as well as immunomodulatory factors (AT2). 26 Complementary findings were observed in another recent study dissecting the lung transcriptome of COVID-19 infected patients, in which NAPSA expression is also increased in AT1 cells. This study also linked NAPSA to the marker gene expression signature of "damage-associated transient progenitors" (DATPs), an intermediate cell state between AT1 and AT2 cells promoted by inflammation, characterized by a failure of AT2 cells to differentiate to AT1 cells. 
27 Thus, given that the NAPSA protein is involved in lung surfactant production, which is dysregulated in COVID-19 28 , the fact that the COVID-19 risk allele is associated with decreased NAPSA expression, while increased expression of NAPSA is a protective factor for severe COVID-19, suggests a potential role of NAPSA in susceptibility to severe COVID-19. Moreover, we detected a clear association of the well-known and pleiotropic 17q21.31 inversion polymorphism, which is linked to many traits potentially relevant to COVID-19 outcome. For instance, the inverted haplotype H2 was previously associated with higher numbers of red blood cells and higher hemoglobin levels, whereas each haplotype correlates with different proportions of lymphocytes and granulocytes, which could potentially modulate the immune response during SARS-CoV-2 infection 29 . In addition, the H2 protective allele is also associated with decreased lung function and increased risk of chronic obstructive pulmonary disease but protects against development of pulmonary fibrosis 30 , and is associated with higher ventilatory response to corticosteroids in individuals with asthma 31 , showing potential trade-offs and shared pathways that may be important in lung health. This variant has been proposed to be under positive selection in Europeans through its effect on fertility 19 . Our results point to a role of this polymorphism in immunity and virus infection defense. Interestingly, inversion effects were found to be stronger in the younger age group in both severity classes, which could explain the weaker association in the more severe HGI A2 phenotype, likely due to a larger proportion of older individuals 32 . 
The inversion probably affects COVID-19 disease course through its large effects on gene expression shown by us and others 6 . Although the functions of many of the affected genes are not well known, the inversion acts as an eQTL and sQTL of several interesting candidate genes for severe COVID-19. In particular, there are several genes potentially associated with immune function and immune response. For example, KANSL1, involved in histone acetylation, is broadly expressed in many types of immune cells in upper airways and lung tissue (Figure 2) and has been proposed to play a role in the macrophage transition to an anti-inflammatory phenotype in mice 33, 34 . Here, we have found that the decrease of its expression in infection-like stimulated monocytes is partially compensated in homozygotes for the H2 inversion haplotype. CRHR1, which shows higher expression on the H2 haplotype in several tissues (Figure 2), encodes a receptor that binds to corticotropin-releasing hormones, which are major regulators of the hypothalamic-pituitary-adrenal axis, and regulates immune and inflammatory responses 35 . the so-far largest, HLA analysis from Shachar et al. 40 Interestingly, we observed statistical association of the Y-haplogroup R with COVID-19 disease and mortality; however, none of the results remained significant after correction for multiple testing. To gain more knowledge regarding the potential role of the Y-chromosome haplogroups in the COVID-19 pandemic, larger study samples are necessary, as well as studies following the pandemic over time and investigating whether the associations weaken or strengthen for different haplogroups. 
In summary, our findings add to the number of genome-wide significant hits for COVID-19, now totaling around 16 independent loci, and provide new insights into the molecular basis of COVID-19 severity that could potentially trigger subsequent and more targeted experiments to develop therapies for severe COVID-19. We recruited 5,228 patients with mild to severe COVID-19, which was defined as The project protocol involved the rapid recruitment of patient-participants and no additional project-related procedures (we primarily used material from clinically indicated venipunctures) and afforded anonymity, owing to the minimal dataset collected. Differences in recruitment and consent procedures among the centers arose because some centers integrated the project into larger COVID-19 biobanking efforts, whereas other centers did not, and because there were differences in how local ethics committees provided guidance on the handling of anonymization or deidentification of data as well as consent procedures. Written informed consent was obtained, sometimes in a delayed fashion, from the study patients at each center when possible. In some instances, informed consent was provided verbally or by the next of kin, depending on local ethics committee regulations and special policies issued for COVID-19 research. For some severely ill patients, an exemption from informed consent was obtained from a local ethics committee or according to local regulations to allow the use of completely anonymized surplus material from diagnostic venipuncture. 
Detailed description on sample processing, genotyping, genotype quality control and genotype To take imputation uncertainty into account, we tested for phenotypic associations with allele dosage data separately for the Italian, Spanish, German, and Norwegian case-control data. We carried out a logistic regression analysis corrected for potential population stratification, age and sex bias using the SAIGE software 43 We additionally performed age- and sex-stratified as well as severity analyses on candidate SNPs from our first and second discovery cohorts, as well as candidate SNPs from the COVID-19 HGI analysis 6 . We carried out a logistic regression analysis corrected for potential population stratification, age and sex bias using the software R version 3.6.2 (Supplementary Methods). The inverse-variance weighted fixed-effects meta-analysis was conducted using the R package metafor 44 , including only statistics from cohorts with NCase and NControl > 50. Subanalyses in age groups of 20-40, 41-60, 61-80 and > 80 years were carried out; because sample numbers and statistical power were highest in the age groups of 41-60 and 61-80 years, only these are reported (Supplementary Table 1e). 
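The inverse-variance weighted fixed-effects meta-analysis described above combines per-cohort effect estimates by weighting each with the inverse of its squared standard error. A minimal Python sketch of that calculation follows. It is an illustration only and does not reproduce METAL or metafor; the function names and the per-cohort numbers are hypothetical, and a normal approximation is assumed for the two-sided P value and the 95% confidence interval.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted fixed-effects estimate.

    betas: per-cohort log odds ratios; ses: their standard errors.
    Returns (pooled beta, pooled standard error, two-sided P value).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("need one standard error per effect estimate")
    weights = [1.0 / se ** 2 for se in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    z = beta / se
    # Two-sided P value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, p_value


def odds_ratio_ci(beta, se, z=1.96):
    """Convert a log odds ratio to (OR, lower 95% bound, upper 95% bound)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical per-cohort estimates for a single variant (not study data).
cohort_betas = [0.10, 0.07, 0.12, 0.09]
cohort_ses = [0.04, 0.05, 0.08, 0.06]
beta, se, p = fixed_effect_meta(cohort_betas, cohort_ses)
print(odds_ratio_ci(beta, se), p)
```

A cohort filter such as the NCase and NControl > 50 criterion mentioned above would be applied before calling the function, by dropping the corresponding entries from both lists.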
Supplementary Methods). 21 The inversion was coded as 0 for the major allele H1 and 1 for the minor allele H2. All association analyses were carried out as described above on the minor allele H2. Phenotype associations with the different variants were obtained from the NHGRI-EBI GWAS Catalog. 47 Similarly, tissue-specific expression or splicing effects were obtained by searching for SNPs in high LD (r²>0.9) that have already been identified as expression quantitative trait loci (eQTLs) or splicing quantitative trait loci (sQTLs) in cis by the GTEx Project (GTEx Analysis Release v8). 22 Candidate coding genes were selected based on their inclusion in the GWAS credible sets and/or if any of the variants had been identified as lead eQTL or sQTL. Exploratory gene expression analysis of selected candidates was performed on publicly available pre-processed RNA-seq datasets generated from organ tissues (GTEx Analysis Release v8), immune cell types (BLUEPRINT 23 ), as well as respiratory tract 48 and brain cells 49 (COVID-19 Cell Atlas) 50 . Differential expression of candidate genes in COVID-19 infected lung cells was obtained from the pseudo-bulk differential expression analysis performed by Delorey et al. 51 Since several candidate genes (including MAPT, CRHR1 and KANSL1) were highly expressed in the neural system, differential gene expression was also analyzed on a single-cell RNA-seq dataset of COVID-19 infected brain organoid cells from Song et al. 24 (obtained upon request). The analysis was carried out using hurdle modeling, implemented in the R package MAST. 52 Finally, to check the effect of the 17q21.31 inversion on monocytes stimulated by infection-like conditions, we also performed a differential expression analysis in the RNA-seq data from Quach et al. 53 Detailed descriptions of this analysis can be found in the Supplementary Methods. In brief, quality-controlled genotypes at the HLA region (chr6:29-34Mb) were extracted. 
HLA allele, amino acid, and SNP imputation was performed using the random-forest based HLA genotype imputation with attribute bagging (HIBAG) and applying specially tailored as well as publicly available reference panels. 5, 55, 56 The resulting data were used as a basis for several subsequent analyses, including: 1) basic association analysis (fine mapping) as described in the section Statistical Analysis and the Supplementary Methods, 2) a peptidome-wide association study 14 (pepWAS), to screen for disease-relevant peptides from SARS-CoV-2 that may present a possible functional link between severe COVID-19 and variation at classical HLA loci, 3) quantitative HLA analyses directed at the number of peptides bound by an HLA allele, as well as 4) an analysis of HLA-presentation of shared peptides ('molecular mimicry'). ABO blood group typing was performed as described by Ellinghaus et al. 5 . Briefly, genotypes of the SNPs rs8176747, rs41302905 and rs8176719 were extracted from the imputed data (R²=1 for all SNPs and cohorts) and used to infer the A, B and O blood types. The ABO-"secretor" status was inferred from the genotypes of the rs601338 SNP (G>A) at the FUT2 gene, located at 19q13.33, extracted from the imputed data (R²=0.98-0.99 for all cohorts). Individuals carrying genotypes GA or GG were assigned secretor status and individuals carrying genotype AA were assigned non-secretor status based on the genotype dosages, ranging from 0 to 2, retrieved from the imputed data. Individuals with allelic dosages of 1.3-1.7 were assigned "no call", individuals with dosages ≤1.3 were called "secretors" and individuals with dosages ≥1.7 were called "non-secretors". 
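The dosage-based secretor call described above can be written as a small Python function. This is a sketch under stated assumptions, not the authors' code: the function name `call_secretor` is hypothetical, the dosage is assumed to count the A allele of rs601338 (so that AA corresponds to a dosage near 2), and, because the stated thresholds share their boundary values, dosages strictly between 1.3 and 1.7 are treated as "no call" here.

```python
def call_secretor(dosage_a, low=1.3, high=1.7):
    """Classify secretor status from an imputed rs601338 A-allele dosage.

    dosage_a: expected number of A alleles, between 0 and 2 (assumption:
    the dosage counts the A allele, so AA is close to 2).
    """
    if not 0.0 <= dosage_a <= 2.0:
        raise ValueError("dosage must lie between 0 and 2")
    if dosage_a <= low:
        return "secretor"       # genotypes GG or GA
    if dosage_a >= high:
        return "non-secretor"   # genotype AA
    return "no call"            # ambiguous imputation, strictly between thresholds


# Hypothetical dosages for four individuals.
for dosage in (0.02, 1.00, 1.50, 1.98):
    print(dosage, call_secretor(dosage))
```

Keeping the thresholds as parameters makes it easy to check how sensitive downstream association results are to the width of the "no call" window.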
Association analysis was carried out as described in the section Statistical Analysis - Association analysis of candidate SNPs, on blood type or blood type secretor status coded as absent (0) or present (1). Blood types A and AB were also analyzed combined as "A and AB".

Genome-wide summary statistics of our analyses will be made available upon reasonable request to the corresponding author.

Tables

Table 1. Overview of patients included in the genome-wide discovery analysis. Overview of patients included in our first analysis (3,260 patients) and second analysis (1,911 patients). Individuals of the Italian, Spanish, Norwegian and German cohorts were recruited at five, seven, eight and ten different hospitals/centers, respectively. Shown are respiratory support status groups 1-4, age and median age across all individuals as well as within each respiratory support group, percentage of females within each cohort, as well as percentage of individuals affected by known comorbidities of COVID-19, factors related to lung health and mortality.
[Figure residue: panel c; axis −log10(P); legend "Direction of effect": decrease, increase, unknown.]

medRxiv preprint doi: https://doi.org/10.1101/2021.07.21.21260624; this version posted July 23, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

non-financial support from MSD, outside the submitted work. Dr. Blasi reports grants and personal fees from AstraZeneca, grants and personal fees from Chiesi, grants and personal fees from GSK, personal fees from Grifols, personal fees from Guidotti, personal fees from Insmed, grants and personal fees from Menarini, personal fees from Novartis, grants and personal fees from Pfizer, personal fees from Zambon, personal fees from Vertex, and personal fees from Viatris outside the submitted work. Christoph Lange reports personal fees from Chiesi, Gilead, Janssen, Novartis, Oxford Immunotec and Insmed outside the submitted work. Jan Heyckendorf reports personal fees from Chiesi, Gilead, and Janssen outside the submitted work. Stefano Duga received funding from the Banca Intesa San Paolo.
Alberto Mantovani has received research funding from or reports personal fees from Dolce&Gabbana Fashion Firm Enzo Life (ex Alexis Corp.), Affymetrix, BMS, Johnson&Johnson; in addition, he has a patent WO2019057780 "Anti-human migration stimulating factor (MSF) and uses thereof" issued, a patent WO2019081591 "NK or T cells and uses thereof" issued, a patent WO2020127471 "Use of SAP for the treatment of Euromycetes fungi infections [...] Medimmune and Prosceinto; and has received research grants from Abbvie, Gilead Sciences and Intercept. Jan Cato Holter received a philanthropic donation from Vivaldi Invest A/S owned by Jon Stephenson von Tetzchner during the conduct of this study. Christoph Spinner reports grants, personal fees, and non-financial support from AbbVie BBraun, grants from Cepheid, personal fees from Formycon, grants, personal fees, and non-financial support from Gilead Sciences; grants and personal fees from Eli Lilly; grants, personal fees, and non-financial support from Janssen-Cilag; grants, personal fees, and non-financial support from GSK/ViiV Healthcare; grants [...]

[...] Anja Tanck and Xiaoli Yi for performing DNA extraction and genotyping at the Institute of [...] Genotyping of the German control dataset was performed at the Regeneron Genetics Center. Genotyping of the COMPRI study was performed by the Genotyping laboratory of Institute for Molecular Medicine Finland FIMM Technology Centre, University of Helsinki. We are very grateful to Professor Akiko Iwasaki and Eric Song for generously sharing pre-processed single-cell RNA-seq data generated from SARS-CoV-2 [...]

David Ellinghaus was supported by the German Federal Ministry of Education and Research (BMBF) within the framework of the Computational Life Sciences funding concept (CompLS grant 031L0165). David Ellinghaus, Karina Banasik and Søren Brunak acknowledge the Novo Nordisk Foundation (grants NNF14CC0001 and NNF17OC0027594). Tobias L.
Lenz, Ana Teles and Onur Özer were funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), project numbers 279645989; 433116033; 437857095. Mareike Wendorff and Hesham ElAbd are supported by the German Research Foundation (DFG) through the Research Training Group 1743 [...] Luca Valenti received funding from: Ricerca Finalizzata Ministero della Salute RF-2016-02364358, Italian Ministry of Health Programme Horizon 2020 (under grant agreement No. 777377) for the project LITMUS [...] and for the project "Liver-BIBLE" (PR-0391), Fondazione IRCCS Ca' Granda [...] This study makes use of data generated by the GCAT-Genomes for Life. Cohort study of the Genomes of Catalonia, Fundacio IGTP. IGTP is part of the CERCA Program / Generalitat de Catalunya. GCAT is supported by Acción de Dinamización del ISCIII-MINECO and the Ministry of Health of the Generalitat of Catalunya (ADE 10/00026); the Agència de Gestió d'Ajuts Universitaris i de [...] Marta Marquié received research funding from grant PI19/00335 Acción Estratégica en Salud, integrated in the Spanish National RDI Plan and financed by ISCIII-Subdirección General de Evaluación and the Fondo Europeo de Desarrollo Regional (FEDER, "Una manera de hacer Europa"). Beatriz Cortes is supported by national grants PI18/01512. Xavier Farre is supported by VEIS project [...] "A way to build Europe"). Additional data included in this study was obtained in part by the COVICAT Study Group [...] Antonio Julià was also supported by the national grant PI17/00019 from the Acción Estratégica en Salud (ISCIII) and the FEDER. The Basque Biobank is a hospital-related platform that also involves all Osakidetza health centres, the Basque government's Department of Health and Onkologikoa, and is operated by the Basque Foundation for Health Innovation and Research-BIOEF.
Mario Cáceres received grants BFU2016-77244-R and PID2019-107836RB-I00 funded by the Agencia Estatal de Investigación (AEI, Spain) and the European Regional Development Fund (FEDER, EU). Genotyping was performed by the Genotyping laboratory of Institute for Molecular Medicine Finland FIMM Technology Centre, University of Helsinki. This work was supported by grants of the Rolf M. Schwiete Stiftung, the Saarland University, BMBF and The States of Saarland and Lower Saxony. Kerstin U. Ludwig is supported by the German Research Foundation (DFG, LU-1944/3-1). Genotyping for the BoSCO study is funded by the Institute of Human Genetics, University Hospital Bonn. Frank Hanses was supported by the Bavarian State Ministry for Science and Arts. Part of the genotyping was supported by a grant to Alfredo Ramirez from the German Federal Ministry of Education and Research (BMBF, grant: 01ED1619A, European Alzheimer DNA BioBank, EADB) within the context of the EU Joint Programme - Neurodegenerative Disease Research (JPND).

Population risk factors for severe disease and mortality in COVID-19: A global systematic review and meta-analysis
Clinical presentations, laboratory and radiological findings, and treatments for 11,028 COVID-19 patients: a systematic review and meta-analysis
Male sex identified by global COVID-19 meta-analysis as a risk factor for death and ITU admission
Prevalence and severity of corona virus disease 2019 (COVID-19): A systematic review and meta-analysis
Genomewide association study of severe Covid-19 with respiratory failure
Mapping the human genetic architecture of COVID-19 by worldwide meta-analysis. medRxiv
Genetic mechanisms of critical illness in COVID-19
Trans-ancestry analysis reveals genetic and nongenetic associations with COVID-19 susceptibility and severity
Blood group type A secretors are associated with a higher risk of COVID-19 cardiovascular disease complications
Possible role of HLA class-I genotype in SARS-CoV-2 infection and progression: A pilot study in a cohort of Covid-19 Spanish patients
HLA allele frequencies and susceptibility to COVID-19 in a group of 99 Italian patients
HLA studies in the context of coronavirus outbreaks
A nomenclature system for the tree of human Y-Chromosomal binary haplogroups
HIV peptidome-wide association study reveals patient-specific epitope repertoires associated with HIV control
Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies
METAL: Fast and efficient meta-analysis of genomewide association scans
Model-based assessment of replicability for genome-wide association meta-analysis
FINEMAP: Efficient variable selection using summary data from genome-wide association studies
A common inversion under selection in Europeans
Genotype imputation with thousands of genomes
Fast and accurate P-value Imputation for genome-wide association study
The new NHGRI-EBI Catalog of published genome-wide association studies (GWAS Catalog)
The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science
COVID-19 tissue atlases reveal SARS-CoV-2 pathology and cellular targets
Genetic Adaptation and Neandertal Admixture Shaped the Immune System of [...]
26. A molecular single-cell lung atlas of lethal COVID-19
Lung transcriptome of a COVID-19 patient and systems biology predictions suggest impaired surfactant production which may be druggable by surfactant therapy
The Allelic Landscape of Human Blood Cell Trait Variation and Links to Common Complex Disease
Genetic landscape of chronic obstructive pulmonary disease identifies heterogeneous cell-type and phenotype associations
Association of a large inversion polymorphism with corticosteroid response in asthma
Host-microbe interactions have shaped the genetic architecture of inflammatory bowel disease
Common inversion polymorphism at 17q21.31 affects expression of multiple genes in tissue-specific manner
Determining the impact of uncharacterized inversions in the human genome by droplet digital PCR
TNF-α regulates diabetic macrophage function through the histone acetyltransferase MOF
A CRHR1 antagonist prevents synaptic loss and memory deficits in a trauma-induced delirium-like syndrome
Formin-like 1 mediates effector T cell trafficking to inflammatory sites to [...]
38. Virus pathogen Database and Analysis Resource (ViPR): A comprehensive bioinformatics Database and Analysis Resource for the Coronavirus research community
The role of MAPT sequence variation in mechanisms of disease susceptibility
MHC haplotyping of SARS-CoV-2 patients: HLA subtypes are not associated with the presence and severity of Covid-19 in the Israeli population
How and why chromosome inversions evolve
Eco-Evolutionary Genomics of Chromosomal Inversions
A global reference for human genetic variation
Conducting meta-analyses in R with the metafor [...]
Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program
Next-generation genotype imputation service and methods
Uncovering the total heritability explained by all true susceptibility variants in a genome-wide association study
A cellular census of human lungs identifies novel cell states in health and in asthma
Massively parallel single-nucleus RNA-seq with DroNc-seq
SARS-CoV-2 entry factors are highly expressed in nasal epithelial cells together with innate immune genes
The International Human Epigenome Consortium: A Blueprint for Scientific Collaboration and Discovery
Neuroinvasion of SARS-CoV-2 in human and mouse brain
MAST: A flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data
A complete tool set for molecular QTL discovery and analysis
Construction and benchmarking of a multi-ethnic reference panel for the imputation of HLA class I and II alleles
Trans-ethnic analysis of the human leukocyte antigen region for ulcerative [...]

We would like to thank the COVID-19 Human Genetics Initiative (HGI) for collecting and openly sharing extensive summary statistics with the community.