key: cord-0885779-zxlr9y73 authors: Niemi, Mari E. K.; Daly, Mark J.; Ganna, Andrea title: The human genetic epidemiology of COVID-19 date: 2022-05-02 journal: Nat Rev Genet DOI: 10.1038/s41576-022-00478-5 sha: a22d474ee1d13b2303eefb5dcc920073c871edd3 doc_id: 885779 cord_uid: zxlr9y73 Human genetics can inform the biology and epidemiology of coronavirus disease 2019 (COVID-19) by pinpointing causal mechanisms that explain why some individuals become more severely affected by the disease upon infection by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus. Large-scale genetic association studies, encompassing both rare and common genetic variants, have used different study designs and multiple disease phenotype definitions to identify several genomic regions associated with COVID-19. Along with a multitude of follow-up studies, these findings have increased our understanding of disease aetiology and provided routes for management of COVID-19. Important emergent opportunities include the clinical translatability of genetic risk prediction, the repurposing of existing drugs, exploration of variable host effects of different viral strains, study of inter-individual variability in vaccination response and understanding the long-term consequences of SARS-CoV-2 infection. Beyond the current pandemic, these transferrable opportunities are likely to affect the study of many infectious diseases. The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus emerged at the end of 2019 and spread rapidly across the world, with the WHO announcing a global pandemic on 11 March 2020. This new betacoronavirus had not been seen before, but it is related to the severe acute respiratory syndrome (SARS) and Middle East respiratory syndrome (MERS) coronaviruses 1 . 
We now know that SARS-CoV-2 uses the human ACE2 receptor for viral entry 2 , initially infecting and replicating in epithelial cells in the nasopharynx and subsequently gaining access to the distal alveolar space 3, 4 . The virus is recognized by immune cells through pattern-recognition receptors, prominently by members of the Toll-like receptor group such as TLR3 and TLR7, which promote the synthesis of type I interferons [5] [6] [7] , and by cytoplasmic RNA sensors retinoic acid-inducible gene I (RIGI; also known as DDX58) and interferon-induced helicase C domain-containing protein 1 (IFIH1; also known as MDA5) inducing type I/III interferon responses 8, 9 . Secreted type I interferons signal via interferon receptors (IFNARs) to switch on Janus kinase 1 (JAK1) and tyrosine kinase 2 (TYK2) and, consequently, promote the expression of interferon-stimulated genes such as oligoadenylate synthetase 1 (OAS1), OAS2 and OAS3 (ref. 10 ). Severe forms of coronavirus disease 2019 (COVID-19) involve a dysregulation of the immune response that results in insufficient or delayed type I interferon response 11, 12 . Eventually, sustained hyperinflammation results in increased immune infiltration in the lungs, reduction in alveolar lacunar space, cell death by apoptosis and lung fibrosis 13, 14 . COVID-19 manifests with a wide range of symptoms and degrees of severity. Although most cases are now known to be asymptomatic or mild, some patients develop a severe form of the disease that results in acute respiratory distress syndrome and consequent multi-organ complications 15, 16 . Disease severity is correlated with several risk characteristics including older age, male sex and smoking, clinical comorbidities such as obesity or being immunocompromised 17 , and clinical biomarkers such as autoantibodies to type I interferons, cytokines and inflammation markers 18 . 
In the early days of the pandemic, it was already noted that these clinical factors did not fully explain the variability in COVID-19 disease severity between individuals, and severe cases were observed among young individuals without apparent pre-existing conditions, sometimes clustering in families 19 , suggesting a role for human genetics as a risk factor. Finding host genetic factors for infection susceptibility and disease severity is important, because it leads to a better understanding of the viral infection and of the pathophysiological changes that occur owing to disease, and to the discovery of potential drug targets. It can also shed light on the causal relationships between risk factors, biomarkers and disease outcomes, and can inform prevention strategies. Well-known examples of successful human genetic studies of infectious diseases include identification of the CCR5Δ32 mutation for protection against HIV infection 20, 21 , and the protection against Plasmodium falciparum infection (malaria disease) in individuals who are heterozygous carriers of a sickle cell allele of the haemoglobin-β (HBB) gene [22] [23] [24] . We refer to the Review by Kwok et al. 25 for a broader overview of human genetic influences on infectious diseases. Compared with other common complex diseases, studying the human genetics of infectious disease poses additional challenges including uneven exposure to the virus within a population, the differential treatment of patients with severe disease under a pandemic emergency and the implementation and uptake of vaccination programmes. Nonetheless, the existing worldwide expertise in generation and analysis of human genetic data has allowed for rapid large-scale studies in host genetics of COVID-19. 
In this Review we provide an overview of current study designs enabling discovery of human genetic variation associated with COVID-19, with a focus on large-scale population-based association studies, the genetic discoveries made so far and what we have learnt in terms of biology and public health impact. Finally, we provide some of the key challenges ahead for the field in this moving pandemic and beyond. Many types of study have contributed to host genetic investigations for COVID-19 during the pandemic. Clinical studies. Clinical studies collect deep and disease-relevant phenotypic information and typically focus on patients with severe COVID-19 (refs 26-29 ). Most are of small to medium size with up to a few thousand patients and were initiated after the emergence of SARS-CoV-2 specifically to study COVID-19. However, one of the largest clinical studies, GenOMICC/ISARIC 28 , predated the pandemic by already studying the genetics of critical illness due to infection. These researchers were able to rapidly harness existing clinical study and recruitment frameworks for the study of COVID-19. Clinical studies are well positioned to study disease severity, once appropriate controls are also collected and can be used to investigate how genetic risk factors affect a patient's clinical trajectories after infection. To investigate the genetic bases of COVID-19, these studies generally invest in whole-exome sequencing (WES) and/or whole-genome sequencing (WGS) data generation and analysis. Biobank and cohort studies. Existing biobank and cohort studies can be used to study COVID-19 given a large enough sample size and sufficient infection rate within the population. These studies typically identify COVID-19-positive cases through linkage with electronic health records or questionnaires. Individuals who are not COVID-19 positive or who tested negative can be used as controls. 
These studies can provide a more representative sample of patients with COVID-19 than clinical studies, although participants enrolling in biobank and cohort studies are often not fully representative of the general population. For some of the established epidemiological cohorts, participants have been extensively recontacted for the collection of longitudinal information about COVID-19 symptoms 30 . With few exceptions (for example, the UK Biobank and DiscovEHR collaboration 31 ) most of these studies use genotyping microarrays and are not well suited to study variants with population frequency below 0.1%. Direct-to-consumer genetic companies have engaged in COVID-19 research to an unprecedented extent. For example, 23andMe 32 and AncestryDNA 33, 34 , two of the largest companies in this space, have designed surveys allowing collection of detailed self-reported information. Given the large number of customers, these companies were well powered to identify new common genetic variants associated with various COVID-19 phenotypes, including vaccination side effects 35 and specific COVID-19 symptoms 36 . The disadvantage of such studies is that COVID-19-positive status was self-reported and severe cases are under-represented, although SARS-CoV-2 PCR test results and hospitalization due to COVID-19 are presumed to be quite reliably self-reportable. Most of the host genetic studies for COVID-19 have focused on identifying variation in the genome that is associated with susceptibility to infection, disease severity and disease-related symptoms. Susceptibility to infection is typically defined as being COVID-19 positive given exposure to the virus. This is the most challenging phenotype to collect because viral exposure is difficult to trace. Roberts and colleagues from the AncestryDNA Science Team 37 have best attempted to capture susceptibility by comparing COVID-19 negative and positive individuals who had a housemate with a confirmed COVID-19 diagnosis. 
The COVID-19 Host Genetics Initiative (HGI) 38 used a simpler approach, comparing individuals who are COVID-19 positive versus population controls and named this phenotype 'reported SARS-CoV-2 infection' . Despite the suboptimal choice of the control group, probably including controls who had not been exposed to the virus, the results overlapped with those from AncestryDNA. Disease severity and progression. Disease severity is often captured by comparing individuals who are COVID-19 positive who have been hospitalized or who have been admitted to an intensive care unit (ICU) with those who have less severe disease or are asymptomatic but still positive for the virus. Hospitalization, admission to an ICU and requirement for respiratory support represent ad hoc definitions of severity that are robust enough to be captured across studies with heterogeneous designs. The COVID-19 HGI 38 and the GenOMICC/ISARIC study 28 , in their main analyses, used population controls instead of individuals who are COVID-19 positive with non-severe disease. This can result in case misclassification because some controls might turn out to be cases if exposed to the virus. Nonetheless, this approach is more powerful than using individuals who are COVID-19 positive with non-severe disease as controls because of the large availability of population controls, especially within biobank studies 38 . In support of the usefulness of population controls, the results have been shown to be robust once a more appropriate control definition is used 38 . Disease-related symptoms. Some genetic studies have focused on a single symptom (for example, loss of taste and smell 36 ) or on a combination of symptoms that can be used to detect undiagnosed COVID-19 cases 39 . Such study designs were particularly valuable in the absence of widespread testing, as at the beginning of the pandemic. Complexity in the phenotype definitions. 
In addition to some of the limitations described above, there are several layers of complexity when studying infectious diseases such as COVID-19 ( fig. 1 ). First of all, although SARS-CoV-2 has spread rapidly, not all individuals in any population have been exposed at the time of study recruitment. Furthermore, this level of exposure is clearly time dependent throughout the pandemic. There are also large differences in socio-economic and demographic factors that contribute to viral exposure, such as ethnicity, job and age. When the whole population has not yet been exposed to the virus, those identified as cases or controls are not a random sample owing to the selection biases currently present in the population in question 40 . Ongoing vaccination programmes are also shifting the rates and demographics of infection, and there are large differences in epidemic management and inequalities between vaccination programmes across countries. The severity of the disease, as captured by hospitalization or ICU admission, is also dependent on the health practice in different countries, which might have also varied in different phases of the pandemic. Finally, different viral strains can affect infection susceptibility and COVID-19 disease severity. Host genetics can influence all of these stages from the socio-economic factors contributing to the chance of exposure, through infection and the development of initial symptoms, to progression to severe disease. Genetic association studies can identify genomic regions linked to infection susceptibility and disease, but these studies are also susceptible to various biases that may arise during sample collection, data generation and processing. Furthermore, such findings require additional analyses and functional follow-up to pinpoint the specific variants and genes that directly affect the observed phenotypes. 
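The misclassification trade-off noted earlier for population controls can be made concrete with a small calculation. The sketch below is illustrative only and is not taken from any of the cited studies: it assumes a simple allelic model, a hypothetical true odds ratio and risk-allele frequency, and a hypothetical fraction of population controls who would be cases if exposed. The function names are invented for this example. It shows how the observed odds ratio shrinks toward 1 as that fraction grows; it does not model the gain in power from the larger number of controls.

```python
def case_allele_freq(noncase_freq: float, odds_ratio: float) -> float:
    """Risk-allele frequency in true cases implied by an allelic odds ratio."""
    odds = odds_ratio * noncase_freq / (1.0 - noncase_freq)
    return odds / (1.0 + odds)


def observed_odds_ratio(true_or: float, noncase_freq: float,
                        latent_case_fraction: float) -> float:
    """Allelic odds ratio seen when a fraction of controls are latent cases.

    latent_case_fraction is the (hypothetical) share of population controls
    who would be classified as cases had they been exposed to the virus.
    """
    p_case = case_allele_freq(noncase_freq, true_or)
    # Population controls are a mixture of true non-cases and latent cases.
    p_ctrl = ((1.0 - latent_case_fraction) * noncase_freq
              + latent_case_fraction * p_case)
    return (p_case / (1.0 - p_case)) / (p_ctrl / (1.0 - p_ctrl))


if __name__ == "__main__":
    # Hypothetical inputs: true OR of 1.5, risk-allele frequency of 10%.
    for fraction in (0.0, 0.01, 0.05, 0.20):
        print(fraction, round(observed_odds_ratio(1.5, 0.10, fraction), 3))
```

Under these assumptions the attenuation is modest when only a few percent of controls are latent cases, which is in line with the observation that results remain robust when a stricter control definition is applied.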
Fig. 1. ... and the decreasing cylinder sizes indicate that only a subset of individuals at each stage progress to more-advanced disease states. The true stages of the disease do not always correspond to what is captured in most COVID-19 studies. For example, many asymptomatic individuals are not captured. Thus, the dashed ellipses represent 'checkpoints' that one needs to cross to be identified with a certain COVID-19-related phenotype and be included in most COVID-19 studies. Environmental and external factors (shown above the cylinders) influence not only the checkpoints but also the underlying chance and speed of transition between various stages of the disease. Each factor can influence various stages of disease progression, and some (for example, socio-demographic factors) affect each step in the progression from severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) exposure to death. On the bottom, we represent the impact of the host genome. The host genome affects each phase of disease progression either by acting directly on infection susceptibility and disease severity or via environmental factors.

We next discuss the current findings primarily from the largest genetic studies for SARS-CoV-2 infection and COVID-19 disease. In Table 1 we summarize the key evidence for some of the most robust and interpretable associations and report our confidence for the suspected causal gene. There is an extensive literature on rare variants that cause inborn errors of innate immunity that can result in severe, idiosyncratic outcomes from common infectious diseases. We refer readers to the work by Casanova and Abel 41 for further details on the topic. These rare variants have been typically discovered by studying small family pedigrees and individuals with extreme phenotypic manifestations. 
By contrast, well-powered population-based WES and WGS studies have been lacking, and more widely available genotyping microarray data are not as useful for this purpose (for further information, see the sections covering common variants), as such variants can be extremely rare and specific to individual families. Sequence data have the advantage of capturing variants that have usually occurred in relatively recent generations or de novo and may have large effects on a disease outcome. Typically, variants with large effects remain at low frequency in the population or are purged owing to selective pressure. Rare non-synonymous coding variants are of particular interest because they can readily point to the causal gene and, thus, reveal potential therapeutic targets. Van der Made and colleagues 42 published one of the first studies on rare variants in the context of COVID-19 severity. They searched for rare non-synonymous and possibly damaging variants in a group of genes with known associations with immunodeficiencies. Their analyses on data from two families with affected males (brother pairs) pointed to X chromosome variants in the TLR7 gene, which is involved in the pathogen recognition pathway and innate and adaptive immunity. This finding has been replicated by Fallerini et al. 7 in 561 individuals and Asano et al. 5 in a larger sample of 1,533 individuals. Both studies and a further follow-up ... Zhang et al. 44 compared exome sequence data from 659 patients with life-threatening COVID-19, including children, with data from 534 individuals with mild or asymptomatic COVID-19. They focused on 13 candidate genes previously associated with monogenic immunological disorders or that are involved in these pathways and concluded that at least 3.5% of patients with life-threatening COVID-19 pneumonia had genetic defects in some of these genes implicated in the type I interferon pathway. 
Because the aforementioned studies focus on candidate genes instead of using a hypothesis-free genome-wide approach that requires a larger sample size and a more stringent significance threshold, the results need to be carefully scrutinized and replicated. Only the TLR7 association reached exome-wide significance in unpublished work by the COVID-19 HGI WES/WGS working group, which now includes up to 23,000 cases and 500,000 controls (G. Butler-Laporte, personal communication). A smaller study of 7,491 patients who were critically ill and 48,400 controls did not identify any significant rare variant associations 45 . This and two other studies 31, 46 have not been able to replicate the rare variant associations with the 13 immune genes reported by Zhang et al. 44 , despite substantially larger sample sizes. These differences may be partially due to different definitions of COVID-19 severity, age distribution and in silico versus experimental validation of non-synonymous variants 47, 48 . The power of rare variant discovery in COVID-19 will be improved by increased sample sizes of WES and WGS data sets, which in time may provide conclusive associations. To summarize, TLR7 is currently the only gene consistently replicated for association of rare non-synonymous variants with severe COVID-19, although it is expected that more findings will be confirmed as studies increase in power. Although rare variant studies for COVID-19 are still in their nascent phase, there is now robust, replicated evidence for multiple loci harbouring common variants associated with infection susceptibility and disease severity. These studies have mainly used microarray-based genotyping technology, which is scalable and cost-effective. 
Genotyping microarrays are designed to capture the more common variation across the genome using a sparse number of genetic markers in coding and non-coding regions, followed by statistical imputation of the remaining known sites of genetic variation, both common and rare. Genome-wide association studies (GWAS) using genotype data are powerful for capturing associations for variants with population frequency >0.1% that typically have mild to moderate effects on the phenotype 49, 50 . Owing to the relatively quick and cheap generation of genotype data, GWAS have proved an important starting point for distinguishing between the genetic variants that affect susceptibility to SARS-CoV-2 infection and those increasing the risk of developing a severe form of COVID-19 disease once infected. We have previously mentioned how current genetic studies can only imprecisely capture susceptibility to SARS-CoV-2 infection. Nonetheless, well-powered analyses clearly point to a group of loci that are associated with COVID-19 disease, but are not specific to disease severity. The COVID-19 HGI has recently formalized this observation, by developing a Bayesian framework to assign posterior probability for a variant to belong to either disease severity or susceptibility to infection 51 . Briefly, by contrasting effect sizes in severe COVID-19 with those seen in COVID-19 populations with severe cases removed, one can analytically distinguish those variants involved in susceptibility to infection (equal in the two groups when compared with controls) and those specifically involved in severe progressions that manifest uniquely or much more substantially in the severe group. The strongest signal within the susceptibility group of loci is the ABO (histo-blood group ABO system transferase) gene, which was initially identified by the Severe Covid-19 GWAS group 26 . 
The ABO alleles determine an individual's blood group by enzymatically catalysing the production of A and B antigens in human cells. There is now robust evidence that ABO is associated with susceptibility to SARS-CoV-2 infection, with both Shelton et al. 32 and HGI 38 reporting similar effect sizes for the infection susceptibility and disease severity phenotypes. The data suggest that individuals with O blood group, who have neither A nor B antigens, are protected against the viral infection (odds ratio (OR) ≈0.90). This result is consistent with several observational studies that found that blood group A was associated with infection susceptibility 52 . The exact mechanism is, however, unclear. It has been suggested that this association can be attributed to protective effects exerted by anti-A IgG antibodies and not the blood group itself 53 . Others have shown that the ABO variant associates with higher levels of CD209 protein, which has been shown to directly interact with the spike protein of SARS-CoV-2 (ref. 54 ). Nonetheless, the association between ABO and susceptibility to infection adds to an extensive list of evidence linking blood type with infectious diseases 55 , including the recent observation by Shelton et al. 32 that blood group O appeared to be a risk-increasing factor for influenza symptoms in the years before the COVID-19 pandemic. A second infection susceptibility locus is ACE2, which is worth mentioning because the gene encodes a key protein involved in the viral entry pathway of SARS viruses [2] [3] [4] . GWAS by Horowitz et al. 56 and COVID-19 HGI 51 point to a protective variant (rs190509934) 60 bp upstream of the ACE2 gene. This variant, which is rare among individuals of European ancestry (0.2% in the Genome Aggregation Database (gnomAD)), but more common in South Asians (2.7%) was associated with a 39% reduction in ACE2 expression in liver tissues. 
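The logic of separating susceptibility loci from severity loci by contrasting effect sizes (similar effects for infection and severe disease, as reported for ABO, versus effects seen only in severe cases) can be sketched as a two-model Bayesian comparison. The code below is a minimal illustration under stated assumptions and is not the COVID-19 HGI implementation: it treats the two effect estimates as independent (ignoring shared controls), places a zero-mean normal prior with a hypothetical standard deviation of 0.2 on the true log odds ratio, and gives the two models equal prior weight. All function and parameter names are invented for this example.

```python
import math


def _log_bvn_zero_mean(x1: float, x2: float,
                       var1: float, var2: float, cov: float) -> float:
    """Log density of a zero-mean bivariate normal at (x1, x2)."""
    det = var1 * var2 - cov * cov
    quad = (var2 * x1 * x1 - 2.0 * cov * x1 * x2 + var1 * x2 * x2) / det
    return -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * quad


def posterior_susceptibility(b_severe: float, se_severe: float,
                             b_nonsevere: float, se_nonsevere: float,
                             prior_sd: float = 0.2,
                             prior_susceptibility: float = 0.5) -> float:
    """Posterior probability that a variant acts on susceptibility.

    Susceptibility model: one shared true effect in severe and non-severe
    cases (each compared with controls). Severity model: a true effect in
    severe cases only. Both estimates are log odds ratios.
    """
    tau2 = prior_sd ** 2
    # Shared effect: both estimates inherit the prior variance and covary.
    ll_susc = _log_bvn_zero_mean(b_severe, b_nonsevere,
                                 se_severe ** 2 + tau2,
                                 se_nonsevere ** 2 + tau2, tau2)
    # Severity-specific: the non-severe estimate is pure sampling noise.
    ll_sev = _log_bvn_zero_mean(b_severe, b_nonsevere,
                                se_severe ** 2 + tau2,
                                se_nonsevere ** 2, 0.0)
    log_odds = (ll_susc + math.log(prior_susceptibility)
                - ll_sev - math.log(1.0 - prior_susceptibility))
    # Numerically stable logistic transform.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


if __name__ == "__main__":
    # Hypothetical variants: equal effects versus a severe-only effect.
    print(posterior_susceptibility(-0.10, 0.02, -0.10, 0.02))
    print(posterior_susceptibility(0.50, 0.05, 0.00, 0.05))
```

A variant with matching effects in both comparisons is assigned to the susceptibility model, whereas one with an effect confined to the severe group is assigned to the severity model. Intermediate cases receive intermediate probabilities that depend on the assumed prior.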
A third infection susceptibility signal lies in the 3p21.31 locus and it is independent of the largest signal for severe COVID-19 disease, which is also in the same region ( fig. 2 ). This rather surprising proximity has caused this signal for susceptibility to be overlooked in some studies. Roberts et al. 34 were the first to highlight the presence of a susceptibility signal in 3p21.31, and later the COVID-19 HGI 38 showed that there are several independent signals (r² ≈ 0) associated with SARS-CoV-2 infection susceptibility, all located within the gene body of SLC6A20, which encodes an amino acid transporter protein that is known to functionally interact with the SARS-CoV-2 receptor ACE2 (ref. 57 ). We discuss some of the functional work that has been done to decipher this locus in more detail in box 1. In addition to the three loci highlighted above, there are additional loci that can be linked to SARS-CoV-2 infection susceptibility and for which we describe the potential causal genes in Table 1 . GWAS of disease severity have identified more loci than GWAS of infection susceptibility. The largest signal in the 3p21.31 locus was described by the Severe Covid-19 GWAS Group only 3 months after the declaration of the pandemic and posted as a preprint in June 2020 (ref. 26 ). Given the relevance of this locus to severe COVID-19 disease we provide more detailed insights in box 1 and show the associations graphically in fig. 2 . The next leap in the discovery of new severity-associated loci came by the combined effort of the GenOMICC/ISARIC 28 and COVID-19 HGI studies 38, 58 . The GenOMICC/ISARIC study 28 included just over 2,000 patients who were critically ill with COVID-19 from ICUs across the UK, and their strategy of enrichment for very severe cases resulted in improved power and discovery of eight new loci. The COVID-19 HGI 38 provided replication for the GenOMICC/ISARIC study, and independent results were released online. 
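The r² ≈ 0 criterion used above to call the 3p21.31 signals independent is the squared correlation between alleles at two loci. As a brief illustration, the sketch below computes r² from haplotype and allele frequencies using the standard definition; the function name and the example frequencies are hypothetical and are not taken from the cited studies.

```python
def ld_r2(hap_freq_ab: float, freq_a: float, freq_b: float) -> float:
    """Squared allelic correlation (r^2) between two biallelic loci.

    hap_freq_ab: frequency of the haplotype carrying allele A and allele B.
    freq_a, freq_b: marginal frequencies of allele A and allele B.
    """
    for f in (freq_a, freq_b):
        if not 0.0 < f < 1.0:
            raise ValueError("allele frequencies must lie strictly in (0, 1)")
    # D measures the departure from independent segregation of the alleles.
    d = hap_freq_ab - freq_a * freq_b
    return d * d / (freq_a * (1.0 - freq_a) * freq_b * (1.0 - freq_b))


if __name__ == "__main__":
    # Hypothetical frequencies: independent alleles versus perfectly linked ones.
    print(ld_r2(0.06, 0.2, 0.3))
    print(ld_r2(0.3, 0.3, 0.3))
```

When the haplotype frequency equals the product of the allele frequencies, r² is zero and the two signals carry no information about each other, so association at one cannot be explained by the other.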
Box 1. ... disease severity across every genome-wide association study (GWAS). The 3p21.31 locus contains many potential gene targets for severe COVID-19 risk that have plausible biology, albeit some are better characterized than others. The most recent evidence from multi-omic analyses 129 indicated LZTFL1 as the candidate for the association with severe disease. The authors showed that a lead variant from the early studies of respiratory failure due to COVID-19 (refs 26,28 ), rs17713054, is a gain-of-function enhancer motif variant that leads to increased expression of LZTFL1 and SLC6A20 (ref. 129 ). However, LZTFL1 is expressed in lung epithelial cells whereas SLC6A20 is not. In the context of COVID-19, the lung epithelium is of interest for understanding mechanisms of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection, and these cells showed signs of activation of an immune response mechanism termed epithelial-mesenchymal transition (EMT) 129 . The EMT response potentially acts as an acute pathway to hinder infection efficiency by downregulating known host entry receptors 2, 130 in the respiratory tract and to eventually allow for repair of the affected tissue. Increased expression of LZTFL1 is known to downregulate the EMT pathway, potentially explaining the association of the enhancer variant with worse outcome and indicating the relevant cell type for the effect 129 . As mentioned in the main text, SLC6A20 also has plausible involvement in disease susceptibility owing to its functional interaction with the SARS-CoV-2 receptor, ACE2 (ref. 57 ). Additionally, the 3p21.31 locus harbours several important chemokine receptor genes: CCR9, CXCR6 and XCR1. In particular, CXCR6 recruits CD8-resident memory T cells in the respiratory tract to combat respiratory pathogens 131 . The involvement of CXCR6 (and also CCR9) has been supported by transcriptome-wide association analysis 28 .

Hypomorphic. Describes an allele that confers reduced signalling or expression of the gene product.

Mendelian randomization (MR). A statistical method that can use genetic variants as instruments to study the causal relationship between risk factors and a disease outcome.

The main findings from these severity analyses directly indicate that both genes involved in immune response and others involved in lung disease pathology are central to severe COVID-19 progression. First, we highlight three instances in which genes modulating the immune response to viral infection are plausibly implicated. TYK2 has been extensively explored in the human genetic literature owing to its relevance as a potential therapeutic target for autoimmune diseases and cancer. Individuals with complete loss of TYK2 function present with immunodeficiencies 59,60 , whereas individuals heterozygous or homozygous for low-frequency hypomorphic variants that cause lowered TYK2 signalling (via decreased phosphorylated STAT) have a more complex presentation. Although these individuals do not seem to be impacted in health measures or mortality in a large cohort study and are in fact protected from common autoimmune diseases 61 , they are susceptible to tuberculosis infection owing to impaired immune signalling [62] [63] [64] . By current understanding, TYK2 is involved in balancing the cytokine response and is therefore an interesting target for drug development. Of note, the missense variant (rs34536443:G>C or p.Pro1104Ala), previously associated with protection from certain autoimmune diseases, increases the risk for severe COVID-19. The second locus points to IFNAR2 (IFNα and IFNβ receptor subunit 2/3), which has been replicated in multiple studies 28, 38, 56 and proposed as a druggable target through Mendelian randomization (MR) studies 65 . However, we note the close proximity between IFNAR2, IL10RB and IFNAR1, and it is not yet fully established that IFNAR2 is the only relevant gene in this locus. Patients with severe COVID-19 show evidence of a dysregulated type I interferon response to the SARS-CoV-2 virus [66] [67] [68] , and drugs inducing the interferon pathway in the early stages of infection have also been shown to be beneficial 69 . This could imply that the timing of either stimulation or down-regulation of the interferon pathway during the course of infection could affect the outcome in patients 66 . The third locus overlaps the OAS gene cluster, which encodes proteins involved in viral clearance. Several lines of evidence point to OAS1 as the causal gene 4,70,71 : genetically predicted higher levels of circulating OAS1 are protective against severe COVID-19 (ref. 4 ), and the causal haplotype is associated with decreased nonsense-mediated decay of OAS1 transcripts, and thereby potentially faster initial responses to viral infections and viral clearance 71 . Through a detailed functional study, Wickenhagen and colleagues 72 showed that SARS-CoV-2 was inhibited by the action of OAS1 interacting with several regions of the SARS-CoV-2 genome, with the most prominent sites mapping to the first 54 nucleotides of the 5′ untranslated region, which is present in all SARS-CoV-2 positive-sense viral RNAs. These findings are interesting in the light of COVID-19 treatment, as OAS1-activating drugs already exist. Additionally, a recent targeted fine-mapping study identified a candidate causal splice variant, leading to a more active OAS1 enzyme and downstream antiviral activity 73 . The other major insight gained from the human genetic findings comes from the overlap between genetic signals for COVID-19 severity and lung diseases. This overlap is consistent with the epidemiological evidence associating pre-existing lung conditions with COVID-19 severity 74, 75 and respiratory failure being the major cause of death among hospitalized patients with COVID-19 (ref. 69 ). 
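Statements such as "genetically predicted higher levels of circulating OAS1 are protective" rest on Mendelian randomization. The sketch below illustrates the two simplest estimators, the single-variant Wald ratio and the fixed-effect inverse-variance-weighted (IVW) combination across variants. It is a generic illustration and not the analysis performed in the cited studies: the summary statistics are invented, the variants are assumed to be independent and valid instruments, and uncertainty in the variant-exposure effects is ignored.

```python
import math
from typing import Sequence, Tuple


def wald_ratio(beta_exposure: float, beta_outcome: float) -> float:
    """Causal estimate from one variant: outcome effect per unit of exposure."""
    return beta_outcome / beta_exposure


def ivw_estimate(beta_exposure: Sequence[float],
                 beta_outcome: Sequence[float],
                 se_outcome: Sequence[float]) -> Tuple[float, float]:
    """Fixed-effect IVW estimate and its standard error.

    Equivalent to a weighted regression of outcome effects on exposure
    effects through the origin, with weights 1 / se_outcome**2.
    """
    numerator = sum(bx * by / (se * se)
                    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx * bx / (se * se)
                      for bx, se in zip(beta_exposure, se_outcome))
    return numerator / denominator, math.sqrt(1.0 / denominator)


if __name__ == "__main__":
    # Hypothetical summary statistics for three independent variants:
    # effects on a protein level (exposure) and on log odds of severe disease.
    bx = [0.20, 0.40, 0.10]
    by = [-0.06, -0.12, -0.03]
    se = [0.02, 0.03, 0.02]
    estimate, std_err = ivw_estimate(bx, by, se)
    print(round(estimate, 3), round(std_err, 3))
```

A negative estimate in this setting would be read as higher genetically predicted exposure lowering the risk of the outcome, provided the instrument assumptions hold.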
At least four loci associated with COVID-19 severity have been previously linked to interstitial lung disease, lung fibrosis, lung carcinomas and/or decreased lung function 28, 38 . Genes harboured within these published loci include dipeptidyl peptidase 9 (DPP9), Forkhead box protein P4 (FOXP4), surfactant protein D (SFTPD) and mucin 5B (MUC5B) 51 . The lead variant at the MUC5B locus (rs35705950-T) is associated with increased MUC5B expression in lung tissue 76 , which has been associated with muco-ciliary dysfunction and increased bleomycin-induced fibrosis in mice 77 . This specific variant is protective against severe COVID-19 but is the strongest known association for substantially increased risk of idiopathic pulmonary fibrosis (IPF) 76 . This opposite direction of effect is intriguing given the concordant direction observed for two other genome-wide significant loci and the overall positive genetic correlation between IPF and COVID-19 (ref. 78 ). Nonetheless, this result is also consistent with the MUC5B promoter variant being associated with twofold improved survival among patients with IPF 79 . For FOXP4, a promoter region signal is associated with increased COVID-19 severity 38, 51 and is also associated with increased expression of FOXP4. This specific variant is infrequent in samples with European ancestry and much more common in East and South Asia and in admixed Hispanic-Latino samples of the Americas 80 , underscoring the importance of taking a global approach for more comprehensive and equitable gene discovery. Importantly, this same association has been previously noted in lung cancer 81, 82 and in interstitial lung diseases 83 , all in a concordant direction, suggesting another potential therapeutic target. For SFTPD, the missense variant identified by the HGI 51 is consistent with emerging results pointing to the involvement of surfactant proteins in severe COVID-19 risk. 
Surfactant proteins are secreted by alveolar cells in the lung, and maintain healthy lung function and facilitate pathogen clearance 84 . SFTPD is involved in the immune response pathway and the SFTPD missense variant has been linked to reduced lung function and severe COVID-19 (ref. 85 ). Together with the other findings, these paint an overall picture in which variants in genes involved in upkeep of healthy lung tissue and maintenance of the immune system and its regulation upon viral exposure can affect the course of the disease in an individual. The human leukocyte antigen (HLA) system orchestrates immune regulation, and the largest GWAS of common infections have implicated HLA in 13 of them 86 . Thus, it was thought that this region would have a prominent role in explaining variability in COVID-19 severity and infection susceptibility, yet the region is far from being the strongest signal in GWAS. However, associations for HLA class II have now been detected by GenOMICC/ISARIC 28 and COVID-19 HGI 51 . Additionally, smaller targeted studies that were able to impute the HLA genotypes and thus gain better resolution of the region have also implicated HLA class I genotypes 87, 88 . What is still needed are definitive large-scale studies that properly account for the complexity in linkage disequilibrium (LD) and ancestry differences in the region. Therefore, the lack of HLA associations from some GWAS of COVID-19 severity might partially reflect limitations of the study designs rather than a genuine lack of biological association. The recent availability of multi-ancestry HLA imputation panels 89 and integration with imputation servers might facilitate this much-needed activity. Overall, what has perhaps come as a surprise from GWAS of COVID-19 is how relatively many loci point to plausible biology, compared with other complex traits and considering the challenges in defining a reliable and consistent phenotype during an ongoing pandemic. 
Nonetheless, these results have been mainly used to confirm existing biological hypotheses and have not yet provided profoundly novel insights into COVID-19 disease, thus highlighting the challenges in rapidly connecting variants to function. The genetic architecture of complex disease is not fixed, and genetics tends to have a larger proportional contribution to disease burden in younger age groups 90 . Given the extreme importance of age as a risk factor for severe COVID-19 (refs 26,91 ), age should be considered in genetic analyses. Some evidence is emerging for age-specific effects at candidate rare variant loci 7, 44 and one common risk locus 29 . Large meta-analyses with access to detailed individual-level data will be needed to better understand the relationship of age and severe disease, particularly for individuals with rare variants.
Haplotype. The combination of alleles on a chromosome.
Lead variant. Typically the variant in a locus with the strongest statistical association with the trait in a genome-wide association study (GWAS). The distinction between lead and causal variant is that the causal variant reflects underlying biological cause, whereas in a test for association, factors such as allele frequency and linkage disequilibrium (LD) patterns can lead to another variant having a stronger association test statistic. Identifying the causal variants requires downstream fine-mapping of GWAS results and functional follow-up.
Linkage disequilibrium (LD). When two alleles from different loci segregate together non-randomly in a population. Physical proximity, recombination rate and evolutionary time shape the patterns of LD across a chromosome.
Effect of sex. Male sex is one of the most impactful epidemiological risk factors for hospitalization and severe respiratory syndrome due to COVID-19, but initially large-scale genetic studies did not report sex-specific effects for infection susceptibility or severe disease. 
However, some reports of sex-specific effects are starting to emerge for loci containing immune-related genes 34, 92 . Moreover, the rare variants in the chromosome X gene TLR7 affect males and are associated with severe COVID-19 outcomes 93 . Overall, genetics is unlikely to explain much of the increased COVID-19 severity among men. The general lack of sex-specific factors is not totally surprising as the genetics of numerous, well-studied immune-mediated diseases that significantly differ in their prevalence between sexes have not demonstrated a significant contribution of sex-specific genetic factors to such differences. Epidemiological studies have shown that people from non-white ethnic backgrounds are more at risk of infection and of severe COVID-19 (refs 17,94,95 ), raising questions about whether human genetics can explain some of these differences. Generally, non-genetic factors are much more relevant than genetic factors in explaining health disparities. However, the scale and diversity of participants in the COVID-19 HGI provide an opportunity to determine whether any of this difference might be explained by genetic variants that are risk factors for COVID-19 having higher frequencies in certain ancestries, and/or genetic variants having similar frequencies, but different magnitude of effects, across ancestries or environments. Heterogeneity of variant effects across populations has been compared in several studies. Shelton et al. 32 showed no significant difference in effect across several genetically defined ancestry groups at the most prominent risk loci, the 3p21.31 and ABO loci. However, with increasing sample sizes and improved representation of non-European ancestry groups, the COVID-19 HGI has recently reported a significantly different effect between ancestry groups for the FOXP4 locus 51 . 
Apart from this locus, the authors suggest that the observed heterogeneity at the remaining loci is more likely to be due to differences in study inclusion criteria (for example, variable definition of COVID-19 severity owing to different thresholds for testing, hospitalization and patient recruitment). Additionally, a smaller study by Parikh et al. 96 used admixture mapping (a method of gene mapping that uses differential risk by ancestry to identify ancestry-specific effects) and identified two genomic regions associated within local ancestries, suggesting that some ancestry-specific effects might exist. Whereas the magnitudes of effect at currently established loci seem to be consistent across ancestry groups, lead variants at several loci show substantial frequency differences across populations (see the example of the 3p21.31 locus in Box 1). Some of the differences can be explained by negative selection as in the case of TYK2 (ref. 64 ). However, for other loci such as the 3p21.31 locus and the OAS gene cluster in which variants originated from Neanderthal introgression 70, 97 , it is as yet unknown whether the introgression drove selection or whether (as for other loci) the allele frequency differences might simply be consistent with genetic drift. Overall, we do not observe any specific ancestry group with consistently higher or lower frequencies at established COVID-19-associated variants. However, in-depth analysis of this issue has not been conducted, and existing analysis reporting that signatures of adaptation might be linked to an ancient epidemic in East Asian populations did not use GWAS-associated loci 98 . Furthermore, as we do not know the exact causal variants for COVID-19 severity and susceptibility, it is difficult to draw conclusions even from accurate comparisons of ancestry-specific effect sizes. 
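As a rough illustration of how heterogeneity of a variant's effect across ancestry groups can be quantified, the sketch below computes an inverse-variance-weighted pooled effect, Cochran's Q and the I² statistic from per-group effect sizes and standard errors. This is a generic textbook calculation and not necessarily the procedure used by the COVID-19 HGI or by Shelton et al.; the function name `heterogeneity` and all input numbers are hypothetical.

```python
def heterogeneity(betas, ses):
    """Return (pooled_beta, Q, df, I2) for per-group effect estimates.

    betas: log odds ratios for the same variant, one per ancestry group.
    ses:   the corresponding standard errors.
    """
    if len(betas) != len(ses) or len(betas) < 2:
        raise ValueError("need matching lists with at least two groups")
    weights = [1.0 / se ** 2 for se in ses]  # inverse-variance weights
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled estimate
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # I2: share of total variation attributed to heterogeneity rather than chance
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, q, df, i2


# Made-up log odds ratios for one variant in three ancestry groups.
pooled, q, df, i2 = heterogeneity([0.10, 0.12, 0.45], [0.05, 0.06, 0.08])
print(f"pooled={pooled:.3f} Q={q:.2f} df={df} I2={i2:.2f}")
```

Under the null hypothesis of a shared effect, Q is compared against a chi-squared distribution with `df` degrees of freedom. A large Q can reflect differences in study design as much as true ancestry-specific effects, which is the caveat raised in the text.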
Beyond answering some key population genetics questions, more samples from diverse ancestries are needed to build a more comprehensive map of the effects of host genetics and to improve the statistical refinement of functional underpinnings of the loci associated with COVID-19, by, for example, co-localization and fine-mapping. Overall, current evidence does not suggest that human genetics has a major role in explaining differences in COVID-19 severity and infection susceptibility across different ancestry groups. Thus, the most likely explanation is that, like most health disparities, differences observed between ancestry groups are likely to be due to differences in environmental and socio-economic factors that impact an individual's chance of contracting COVID-19 and/or obtaining rapid and effective health-care interventions upon infection. Larger sample sizes in continental ancestry groups other than Europeans will allow further investigation of these questions. Genetics can be used to identify risk factors and biomarkers that correlate with COVID-19 and to support causal relationships with new or established risk factors [99] [100] [101] . For example, large-scale genetic studies can identify shared genetic effects between COVID-19 and other traits. This is typically achieved using genetic correlations 100 . The main advantage of genetic correlations compared with phenotypic correlations is that risk factors and COVID-19 phenotypes do not need to be measured on the same set of individuals. Genetic correlations for genetic liability to SARS-CoV-2 infection or more severe disease have recapitulated most of the established phenotypic (clinical) correlations with severe COVID-19 (for example, increased body mass index (BMI), smoking, diabetes, ischaemic stroke and educational attainment) 28, 38 . 
However, these results alone need to be interpreted with caution as they are subject to the same set of biases and confounders as standard epidemiological analyses, with the additional caveat that genetic studies are normally conducted on non-representative populations. Genetic correlations can be combined with MR studies, which aim to identify causal associations between exposures and outcomes 101, 102 . This MR approach can reveal which risk factors might be causal for COVID-19 severity and which might be merely comorbid. For example, the HGI used MR to show that type 2 diabetes (T2D) was not a causal risk factor for severe COVID-19, but instead the association might be mediated by increased BMI. However, the most valuable application of MR studies in the context of COVID-19 is to evaluate the causal relationship with protein products that are targets of currently licensed drugs (drug repurposing) or drugs in clinical development. Specifically, if a putative drug target can be shown to have a causal effect on COVID-19 severity, then there can be more confidence that targeting that protein might be able to modify the disease course. An important consideration when homing in on potential drug targets, though, is their potential pleiotropic effects; a drug target with specific downstream effects may be more desirable than modifying the function of a target that is involved in multiple pathways or biological processes. We note here that although MR analyses can pinpoint interesting candidates for follow-up, various in silico analyses and in vitro and in vivo models have a crucial role in preclinical target identification. MR studies on COVID-19 have now suggested several proteins as potential drug targets, some of which are already targeted by existing drugs. For example, Gaziano et al. 65 found the best potential for druggable COVID-19 targets to be IFNAR2 and ACE2, which are known players in immune response and SARS-CoV-2 entry, respectively. 
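To make the MR logic concrete, the following minimal sketch shows two standard summary-data estimators: the single-variant Wald ratio and the fixed-effect inverse-variance-weighted (IVW) estimate across independent variants. It assumes per-variant effects on the exposure (for example, a protein level) and on the outcome (for example, log odds of severe COVID-19) are already available; it is not the pipeline of any study cited here, and the function names and numbers are hypothetical.

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Causal estimate from one variant, with a first-order standard error."""
    return beta_outcome / beta_exposure, abs(se_outcome / beta_exposure)


def ivw(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate across independent genetic instruments."""
    num = sum(bx * by / se ** 2
              for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / se ** 2
              for bx, se in zip(beta_exposure, se_outcome))
    return num / den, math.sqrt(1.0 / den)


# Made-up instruments: effects on the exposure, on the outcome, and outcome SEs.
bx = [0.10, 0.20, 0.15]
by = [-0.04, -0.09, -0.05]
se = [0.02, 0.03, 0.02]
estimate, std_err = ivw(bx, by, se)
print(f"IVW estimate={estimate:.3f} (SE {std_err:.3f})")
```

A negative estimate in this toy example would correspond to higher genetically predicted exposure levels lowering risk. The estimators are only valid under the usual instrumental-variable assumptions, and pleiotropic variants, as discussed above, can violate them.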
The GenOMICC/ISARIC study 28 also performed MR for an a priori list of candidate genes, which were targets of drugs that at the time had been proposed as potentially effective treatments for COVID-19. Their analysis for causal associations with the risk of developing severe COVID-19 prioritized IFNAR2 and TYK2, which were previously implicated by GWAS. Another GWAS-implicated gene, OAS1, has also been supported by a study from Zhou et al. 4 who investigated the levels of hundreds of circulating proteins in individuals (non-infectious state) and identified a causal relationship between higher plasma OAS1 levels and reduced COVID-19 severity. Perhaps the clearest example of where MR supports clinical findings is the IL-6 receptor (IL-6R). During the early pandemic, IL-6R inhibition was proposed as a potentially effective mechanism for treating severe COVID-19 (refs 103,104 ). Elevated levels of IL-6, which is a known immune-stimulating cytokine, have been regarded as a biomarker of severe COVID-19 in hospitalized patients who have elevated or dysregulated immune responses 15 . An MR analysis by Bovijn et al. 105 found a significant causal relationship between IL-6R genetic variants that resulted in reduced levels of the receptor and improved outcome in patients with COVID-19. Indeed, a recent meta-analysis of 27 randomized trials showed that administration of IL-6 antagonists, compared with usual care or placebo, was associated with lower 28-day all-cause mortality in patients hospitalized for COVID-19 (ref. 106 ), supporting the results of the MR analysis. Some debate exists on the similarities of the mechanism of action between the naturally occurring variants and the molecular inhibitors, as Garbers and Rose-John 107 have suggested that IL-6R inhibitors block both soluble and cell-bound IL-6R, thus eliminating the IL-6 signalling pathway, but functional genetic variants in the IL6R gene might instead affect the proportion of soluble to membrane-bound protein. 
Nevertheless, as the treatment has been shown to be beneficial, understanding the specific mechanisms of natural versus pharmacological modulation of the protein is likely to be of academic interest but will not affect the introduction of these drugs into clinical use in patients with COVID-19. A polygenic score (PS; also known as polygenic risk score (PRS)) summarizes the measurable individual genetic risk for a chosen trait or disease based on the genotypes at several loci from GWAS. These scores are typically constructed either from variants in loci that are statistically significantly associated with the trait or by also including variants across loci that did not reach genome-wide statistical significance. At a population level, PS alone or in combination with other risk factors can be used to assign an estimate of risk to each individual 108, 109 . A few studies have now tried to calculate PSs for COVID-19, but these have so far been generally weakly powered, and most variation in the phenotype explained by PS is due to the inclusion of a few of the most significant signals, for example, the 3p21.31 locus 29, 31, 56 . A clinical application for PS of SARS-CoV-2 infection susceptibility or severity is unlikely in the short term. First, in a clinical setting, genetic information is not routinely collected at scale or available for consultation by clinicians. Second, although many risk prediction tools for COVID-19 have been developed [110] [111] [112] , to our knowledge none has been used in clinical practice. Thus, it would be unlikely for a COVID-19 PS to be widely adopted. However, there might be some value for PS in identifying individuals who are at higher risk of developing severe COVID-19 symptoms amongst younger individuals without pre-existing risk factors. A study by Nakanishi et al. 
29 showed that in COVID-19-positive individuals younger than 60 years, a single genetic risk factor (the 3p21.31 locus) can be as predictive of death and respiratory failure as some established comorbidities such as T2D. Nonetheless, more research is needed not only to evaluate more powerful PSs, but also to address inherent limitations such as the lack of PS transferability across ancestry groups. Research applications of PS are nonetheless valuable. PS can be used to summarize our current knowledge on the genetic risk factors that underlie infection susceptibility and COVID-19 severity. For example, are individuals at higher genetic risk more likely to develop vaccine breakthrough infection, to experience more severe side effects or to develop post-COVID syndrome? In conclusion, GWAS results can be used to construct PSs that are valuable for research purposes, but are unlikely to have a clinical value in the short term.
Pleiotropic. Pertains to pleiotropy, which is when a genetic variant influences more than one phenotype. The effect (trait risk increasing or decreasing) across the phenotypes can be in the same direction or opposing directions.
Genetic association studies have been exceptionally fast in delivering new genetic signals underlying COVID-19 severity and infection susceptibility. On a sobering note, these discoveries have had a limited impact on the management of the COVID-19 pandemic thus far, and it is our hope that the next phase of the pandemic will see more application of human genetics results and better functional insights. Here, we provide some perspective on the key opportunities ahead for the field, while taking for granted that increased sample size will fuel new discoveries. As reviewed here, most genetic studies of COVID-19 to date have focused on pinpointing factors that make some individuals more susceptible to SARS-CoV-2 infection and explaining why others develop severe symptoms. 
However, with ever-expanding understanding of the disease and the data collected, future genetic studies may expand to investigating, at scale, particular symptoms associated with the infection or severe comorbid conditions such as multisystem inflammatory syndromes [113] [114] [115] . Furthermore, some individuals who have contracted COVID-19 experience long-term symptoms that may result in a considerable health burden in the years to come 116, 117 . There is large variability in the symptoms experienced by those affected by post (long) COVID-19 (refs 116,117 ). Human genetics can be helpful in this context because some of the post-COVID-19 symptoms have directly or indirectly been studied by GWAS. For example, one might test the hypothesis that COVID-19 accelerates existing genetic predispositions to some of the symptoms. Together with observational epidemiological analysis, MR can be used as an additional pillar to triangulate evidence of causal relationship between COVID-19 and downstream consequences. Global networks such as the COVID-19 HGI can play a key part in such undertakings because they bring together studies with different designs, including biobank studies with longitudinal medical information pre- and post-infection and direct-to-consumer studies that can capture self-reported symptoms on a large number of individuals. The interaction between host and viral genomes is surprisingly understudied, partially reflecting the lack of interaction between the corresponding scientific communities, but, most importantly, the lack of studies in which both types of information have been collected at scale 118, 119 . A recent report 120 showed that the protective effect of the sickle cell allele of host HBB against severe malaria is not detected in the presence of certain alleles in the parasite's genome. 
These parasite alleles are particularly common in strains found in Africa, illustrating the importance of host-pathogen interaction analyses for understanding regional disease epidemiology and selective pressures in infectious disease. Variability in symptoms and resulting disease severity have also been observed across SARS-CoV-2 strains 121,122 , but it is not clear whether the underlying host genetic factors are the same. Parikh et al. 96 have conducted an initial study combining viral and human genetic data information, but they did not find significant results from the phylogenetic information constructed from the viral RNA. To overcome the lack of large samples, one might perform targeted studies focusing on genome-wide significant loci or PSs. Additionally, with recent temporal waves of disease dominated by delta and then omicron variants, the time and location of infection could potentially be used to infer a proxy for the likely variant. Rollout of vaccines brings challenges and opportunities to the study of the human genetic epidemiology of COVID-19. On one hand, the different strategies employed by countries can shape the epidemic differently in different parts of the world, inevitably changing the major demographic groups who become infected or severely affected by the disease, and can ultimately challenge the interpretation of genetic discoveries. On the other hand, widespread vaccination opens the possibility to study vaccination side effects and breakthrough infections. Bolze et al. 35 have reported that individuals who carry the HLA-A*03:01 allele were more likely to experience severe difficulties with daily routine after vaccination. For other more severe and rare side effects, it will be of paramount importance to leverage existing international collaboration to obtain robust and replicable results. 
Although this pandemic has shown the importance of rapid data sharing, open methodological reporting and academic-commercial partnership science, the sharing of individual-level data is still far from being a reality. Widespread, yet safe, access to individual-level data can foster discoveries and methodological developments beyond what is currently possible with sharing of summary statistics. Yet despite repeated evidence showing that study participants endorse data sharing [123] [124] [125] , legal and data protection challenges have hindered these efforts within and beyond the human genetics community 126 . Consortia such as the COVID-19 HGI 38, 51, 58 have clearly demonstrated the impact of transparent science: despite the challenges of the pandemic, they set common goals early on and prioritized the sharing of resources and data, and the result was one of the largest genetic studies ever performed so far with representation from almost every continent. These types of effort should be considered as a roadmap to future collaborative initiatives. Currently, with the exception of the UK Biobank and a small subset of the HGI initiative (EGAC00001002188), there is no large data set with human genetic and COVID-19 disease information that is accessible to the entire scientific community via established repositories. We hope the next phase of the pandemic will see a shift in the attitude towards sharing of individual-level data. Outlook for COVID-19 host genetics. Continued investigations into host genetic factors that contribute to severe COVID-19 and susceptibility to SARS-CoV-2 viral infection will be essential to maximize the chances of finding new therapeutic avenues to treating the disease, whether it be through drug repurposing or the longer-term endeavour of new drug development. These findings should be integrated with multi-omics results to provide clearer biological insights. 
As for any other complex disease, genetic risk prediction is likely to add value to clinical risk prediction in a hospital setting for identification of patients who are more likely to develop further severe symptoms, and thus continued efforts on the identification of risk factors and the development of predictive biomarkers are warranted. Host genetics is not the sole key to cracking the code to successful and effective treatment of COVID-19, but with continuation of open science and partnerships between academia, industry, health-care providers and policy-makers, we will hopefully see large leaps towards that goal in the near future.
Summary statistics. Results summarizing the findings from the study population without disclosing individual-level data. Typically for a genome-wide association study (GWAS), these results will include all the variants tested, the estimated effect size and direction of effect, and the outcome of a test for statistical significance (for example, P value). Summary statistics can be meta-analysed with other studies, or used for other types of analysis, such as causal inference testing or calculation of PSs in a new study population. 
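As noted in the definition of summary statistics, published effect sizes can be reused to calculate a PS in a new study population. The sketch below is a minimal illustration of that idea, assuming the score is a simple weighted sum of effect-allele dosages; it omits steps that real analyses need (allele harmonization, LD-aware variant selection, ancestry matching). The function name and variant identifiers are hypothetical, and the weights are made-up numbers.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages for one individual.

    dosages: dict mapping variant ID to effect-allele count (0, 1 or 2,
             or a fractional imputed dosage).
    weights: dict mapping variant ID to the per-allele effect size
             (for example, a log odds ratio from GWAS summary statistics).
    Returns the score and the number of variants that could be matched.
    """
    matched = [v for v in weights if v in dosages]
    score = sum(weights[v] * dosages[v] for v in matched)
    return score, len(matched)


# Hypothetical summary-statistic weights and one individual's genotypes.
weights = {"var_a": 0.50, "var_b": -0.20, "var_c": 0.10}
dosages = {"var_a": 2, "var_b": 1}  # var_c was not genotyped
score, n_used = polygenic_score(dosages, weights)
print(f"score={score:.2f} from {n_used} variants")
```

Reporting how many variants were matched is a deliberate choice: a score built from only a subset of the intended variants is not directly comparable across individuals or cohorts.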
Characteristics of SARS-CoV-2 and COVID-19 SARS-CoV-2 cell entry depends on ACE2 and TMPRSS2 and is blocked by a clinically proven protease inhibitor Circuits between infected macrophages and T cells in SARS-CoV-2 pneumonia A Neanderthal OAS1 isoform protects individuals of European ancestry against COVID-19 susceptibility and severity X-linked recessive TLR7 deficiency in ~1% of men under 60 years old with life-threatening COVID-19 Innate immune sensing of coronavirus and viral evasion strategies Association of Toll-like receptor 7 variants with life-threatening COVID-19 disease in males: findings from a nested case-control study SARS-CoV-2 sensing by RIG-I and MDA5 links epithelial infection to macrophage inflammation RIG-I triggers a signaling-abortive anti-SARS-CoV-2 defense in human lung cells Interferon-stimulated genes: a complex web of host defenses COVID-19 and the human innate immune system The COVID-19 puzzle: deciphering pathophysiology and phenotypes of a new disease entity The spatial landscape of lung pathology during COVID-19 progression Pathological inflammation in patients with COVID-19: a key role for monocytes and macrophages Clinical features of patients infected with 2019 novel coronavirus in Wuhan Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study Factors associated with COVID-19-related death using OpenSAFELY Autoantibodies neutralizing type I IFNs are present in ~4% of uninfected individuals over 70 years old and account for ~20% of COVID-19 deaths Case report: death due to COVID-19 in three brothers Resistance to HIV-1 infection in caucasian individuals bearing mutant alleles of the CCR-5 chemokine receptor gene Homozygous defect in HIV-1 coreceptor accounts for resistance of some multiply-exposed individuals to HIV-1 infection Protection afforded by sickle-cell trait against subtertian malareal infection Protective effects of the sickle 
cell gene against malaria morbidity and mortality Sickle cell trait and the risk of Plasmodium falciparum malaria and other childhood diseases Host genetics and infectious disease: new tools, insights and translational opportunities The first GWAS of COVID-19 severity describing the ABO signal and the strongest signal for COVID-19 severity on chromosome 3 COVID Human Genetic Effort. A global effort to define the human genetics of protective immunity to SARS-CoV-2 Infection Genetic mechanisms of critical illness in COVID-19 A large GWAS of COVID-19 focusing on individuals with a critical illness Age-dependent impact of the major common genetic risk factor for COVID-19 on severity and mortality Lifelines COVID-19 cohort: investigating COVID-19 infection and its health and societal impacts in a Dutch population-based cohort Pan-ancestry exome-wide association analyses of COVID-19 outcomes in 586,157 individuals Trans-ancestry analysis reveals genetic and nongenetic associations with COVID-19 susceptibility and severity COVID-19 susceptibility and severity risks in a survey of over 500,000 individuals AncestryDNA COVID-19 host genetic study identifies three novel loci HLA-A*03:01 is associated with increased risk of fever, chills, and stronger side effects from Pfizer-BioNTech COVID-19 vaccination Implicates HLA in the severity of side-effects experienced after vaccination The UGT2A1/UGT2A2 locus is associated with COVID-19-related loss of smell or taste Novel COVID-19 phenotype definitions reveal phenotypically distinct patterns of genetic association and protective effects COVID-19 Host Genetics Initiative Mapping the human genetic architecture of COVID-19 Using symptom-based case predictions to identify host genetic factors that contribute to COVID-19 susceptibility This study describes how collider bias challenges the interpretation of many COVID-19 observational studies Lethal infectious diseases as inborn errors of immunity: toward a synthesis of the germ and 
genetic theories Presence of genetic variants among young men with severe COVID-19 Rare variants in Toll-like receptor 7 results in functional impairment and downregulation of cytokine-mediated signaling in COVID-19 patients Inborn errors of type I IFN immunity in patients with life-threatening COVID-19 Whole genome sequencing reveals host factors underlying critical Covid-19 Rare loss-of-function variants in type I IFN immunity genes are not associated with severe COVID-19 Association of rare predicted loss-of-function variants of influenza-related type I IFN genes with critical COVID-19 pneumonia Association of rare predicted loss-of-function variants of influenza-related type I IFN genes with critical COVID-19 pneumonia The impact of rare and low-frequency genetic variants in common disease Genome-wide association studies Mapping the human genetic architecture of COVID-19: an update The impact of ABO blood group on COVID-19 infection risk and mortality: a systematic review and meta-analysis COVID-19 and ABO blood group: another viewpoint A proteome-wide genetic investigation identifies several SARS-CoV-2-exploited host targets of clinical relevance The relationship between blood groups and disease Genome-wide analysis provides genetic evidence that ACE2 influences COVID-19 risk and yields risk scores associated with severe disease Human intestine luminal ACE2 and amino acid transporter expression increased by ACE-inhibitors COVID-19 Host Genetics Initiative. 
The COVID-19 Host Genetics Initiative, a global initiative to elucidate the role of host genetic factors in susceptibility and severity of the SARS-CoV-2 virus pandemic Human tyrosine kinase 2 deficiency reveals its requisite roles in multiple cytokine signals involved in innate and acquired immunity Human TYK2 deficiency: mycobacterial and viral infections without hyper-IgE syndrome Resolving TYK2 locus genotype-to-phenotype differences in autoimmunity Tuberculosis and impaired IL-23-dependent IFN-γ immunity in humans homozygous for a common TYK2 missense variant Homozygosity for TYK2 P1104A underlies tuberculosis in about 1% of patients in a cohort of European ancestry Human ancient DNA analyses reveal the high burden of tuberculosis in Europeans over the last 2,000 years Actionable druggable genome-wide Mendelian randomization identifies repurposing opportunities for COVID-19 The type I interferon response in COVID-19: implications for treatment Dysregulation of type I interferon responses in COVID-19 Impaired local intrinsic immunity to SARS-CoV-2 infection in severe COVID-19 Retrospective multicenter cohort study shows early interferon therapy is associated with favorable clinical responses in COVID-19 A genomic region associated with protection against severe COVID-19 is inherited from Neandertals Genetic regulation of OAS1 nonsense-mediated decay underlies association with risk of severe COVID-19 This study links a prenylated OAS1 haplotype, which is common among humans and also present in horseshoe bats, with COVID severity Multi-ancestry fine mapping implicates OAS1 splicing in risk of severe COVID-19 This paper identifies the causal variant for the OAS1 locus associated with COVID-19 severity Risk factors associated with mortality among patients with COVID-19 in Intensive Care Units in Lombardy Comorbidity and its impact on 1590 patients with COVID-19 in China: a nationwide analysis A common MUC5B promoter polymorphism and pulmonary 
fibrosis Muc5b overexpression causes mucociliary dysfunction and enhances lung fibrosis in mice Shared genetic etiology between idiopathic pulmonary fibrosis and COVID-19 severity Association between the MUC5B promoter polymorphism and survival in patients with idiopathic pulmonary fibrosis The mutational constraint spectrum quantified from variation in 141,456 humans Meta-analysis of genome-wide association studies identifies multiple lung cancer susceptibility loci in never-smoking Asian women Identification of risk loci and a polygenic risk score for lung cancer: a large-scale prospective cohort study in Chinese populations Genome-wide association study of subclinical interstitial lung disease in MESA Immunoregulatory functions of surfactant proteins Human surfactant protein D binds spike protein and acts as an entry inhibitor of SARS-CoV-2 pseudotyped viral particles Genome-wide association and HLA region fine-mapping studies identify susceptibility loci for multiple common infections Association of HLA class I genotypes with severity of coronavirus disease-19 Current HLA investigations on SARS-CoV-2 and perspectives A high-resolution HLA reference panel capturing global population diversity enables multi-ethnic fine-mapping in HIV host response Findings and insights from the genetic investigation of age of first reported occurrence for complex disorders in the UK Biobank and FinnGen Features of 20 133 UK patients in hospital with covid-19 using the ISARIC WHO Clinical Characterisation Protocol: prospective observational cohort study Spanish COalition to Unlock Research on host GEnetics on COVID-19 (SCOURGE). 
A genome-wide association study of COVID-19 related hospitalization in Spain reveals genetic disparities among sexes COVID Human Genetic Effort Human genetic and immunological determinants of critical COVID-19 pneumonia Hospitalization and mortality among black patients and white patients with Covid-19 Assessing differential impacts of COVID-19 on black communities Deconvoluting complex correlates of COVID19 severity with local ancestry inference and viral phylodynamics: Results of a multiomic pandemic tracking strategy Description of the haplotype structure of the strongest common signal for COVID-19 risk and how this is linked with Neanderthal introgression An ancient viral epidemic involving host coronavirus interacting genes more than 20,000 years ago in East Asia Inferring causal relationships between risk factors and outcomes from genome-wide association study data Genetic correlations of polygenic disease traits: from theory to practice Using genetic data to strengthen causal inference in observational research Mendelian Randomization: Genetic Variants as Instruments for Strengthening Causal Inference in Observational Studies Potential role of anti-interleukin (IL)-6 drugs in the treatment of COVID-19: rationale, clinical evidence and risks Is IL-6 a key cytokine target for therapy in COVID-19? Genetic variants mimicking therapeutic inhibition of IL-6 receptor signaling and risk of COVID-19 WHO Rapid Evidence Appraisal for COVID-19 Therapies (REACT) Working Group. et al. 
Association between administration of IL-6 antagonists and mortality among patients hospitalized for COVID-19: a meta-analysis Genetic IL-6R variants and therapeutic inhibition of IL-6 receptor signalling in COVID-19 Developing and evaluating polygenic risk prediction models for stratified disease prevention The personal and clinical utility of polygenic risk scores Prediction models for diagnosis and prognosis of covid-19: systematic review and critical appraisal Development and validation of a clinical and genetic model for predicting risk of severe COVID-19 An integrated clinical and genetic model for predicting risk of severe COVID-19: A population-based case-control study Autoimmune and inflammatory diseases following COVID-19 Mechanisms underlying genetic susceptibility to multisystem inflammatory syndrome in children (MIS-C) An outbreak of severe Kawasaki-like disease at the Italian epicentre of the SARS-CoV-2 epidemic: an observational cohort study Burden of post-COVID-19 syndrome and implications for healthcare service planning: A population-based cohort study Prevalence of ongoing symptoms following coronavirus (COVID-19) infection in the UK. Office for National Statistics h tt ps :/ /w ww .o ns .g ov .u k/ pe op le po pu la ti on andc om mu ni ty /h ea lt ha nd so ci al ca re /c on di ti on sa nd di seas es /b ul le ti ns The coronavirus is mutating -does it matter? Viral and host heterogeneity and their effects on the viral life cycle Malaria protection due to sickle haemoglobin depends on parasite genotype Hospital admission and emergency care attendance risk for SARS-CoV-2 delta (B.1.617.2) compared with alpha (B.1.1.7) variants of concern: a cohort study Risk of hospitalisation associated with infection with SARS-CoV-2 lineage B.1.1.7 in Denmark: an observational cohort study Clinical trial participants' views of the risks and benefits of data sharing Patient views on research use of clinical data without consent: Legal, but also acceptable? 
The authors thank M. Kanai for providing Fig. 2 and the accompanying figure legend. The authors also thank G. Butler-Laporte, A. Renieri and the COVID-19 Host Genetics Initiative for personal communication regarding TLR7 variants in sequencing studies, and G. Butler-Laporte, Mangul and the Saudi Human Genome Program for discussions regarding the effect of age in individuals with rare variants. A.G. was supported by the Academy of Finland (grant nos 323116, 340539, 340541) and by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant no. 945733). A.G. has also received funding from the European Union's Horizon

M.E.K.N. and A.G. researched data for and wrote the article. All authors substantially contributed to the discussion of content and reviewing/editing the manuscript before submission.

M.E.K.N. is an employee of Novartis. M.J.D. and A.G. declare no competing interests.

Nature Reviews Genetics thanks Yukinori Okada and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.