key: cord-0960432-itwtu6oi
authors: McLaren, Paul J.; Fellay, Jacques
title: HIV-1 and human genetic variation
date: 2021-06-24
journal: Nat Rev Genet
DOI: 10.1038/s41576-021-00378-0
sha: 00c8ab76c22e69b787787842fcb0c5c4474e72ae
doc_id: 960432
cord_uid: itwtu6oi

Over the past four decades, research on the natural history of HIV infection has described how HIV wreaks havoc on human immunity and causes AIDS. HIV host genomic research, which aims to understand how human genetic variation affects our response to HIV infection, has progressed from early candidate gene studies to recent multi-omic efforts, benefiting from spectacular advances in sequencing technology and data science. In addition to invading cells and co-opting the host machinery for replication, HIV also stably integrates into our own genome. The study of the complex interactions between the human and retroviral genomes has improved our understanding of pathogenic mechanisms and suggested novel preventive and therapeutic approaches against HIV infection.

HIV-1 is the human retrovirus responsible for the HIV/AIDS pandemic, which has claimed more than 30 million lives over the past four decades. HIV infection continues to be a major global public health issue, with currently around 40 million people living with HIV (PLWH). Lifelong antiretroviral therapy (ART) has transformed the disease into a manageable chronic health condition. When available, ART enables PLWH to lead long and healthy lives but there is still no effective vaccine and no cure.

Early in the pandemic, it became clear that the risk of HIV acquisition is highly variable across humans. Socioeconomic and behavioural factors played a central role in this variability with some risk groups, such as intravenous drug users and men who have sex with men (MSM), being disproportionately affected 1. Still, even among the most highly exposed individuals, a fraction remained HIV negative 2,3.
Similarly, important differences in the natural course of HIV infection (such as time from infection to AIDS diagnosis and the occurrence of opportunistic infections or malignancies) were only partially explained by known variables such as age and comorbidities 4. Taken together, these clinical and epidemiological observations suggested a role for additional factors in the modulation of the individual response to HIV, including inherited variation in the genes and pathways involved in the retroviral life cycle and in innate or adaptive immunity against the infection.

HIV enters its main target cell, the CD4+ T lymphocyte, by binding to its receptor CD4 and to the co-receptor CC-chemokine receptor 5 (CCR5) 5. This binding event triggers the fusion of the viral and human cell membranes, initiating a complex intracellular life cycle that will lead to the production of new viruses (Fig. 1). The natural immune response against HIV infection relies mostly on CD8+ T cells, also called cytotoxic T lymphocytes (CTLs). Upon primary infection, intense HIV replication results in a very high plasma viral load, measured as copies of the HIV RNA genome per millilitre of plasma, which is then partly controlled by the specific CD8+ T cell response. The very diverse human leucocyte antigen (HLA) class I molecules play a central role in this immune response by presenting small viral fragments, called epitopes, at the surface of infected cells. The recognition of these epitopes by CTLs leads to the elimination of HIV-infected cells. A more efficient immune response is linked to a lower viral load during the chronic phase of an untreated infection and to slower disease progression, though it is unable to eliminate the virus 6.

As a retrovirus, HIV can be described as a genomic pathogen.
Indeed, it not only uses the molecular machinery of the infected cell for replication and dissemination but it also has the remarkable capacity to integrate a DNA copy of its RNA genome into a host cell chromosome. By becoming part of the human genome, HIV can persist in long-term cellular reservoirs for decades, making it extremely challenging to develop therapeutic strategies resulting in complete eradication 7.

To better fight HIV infection, we must once again consider the old Delphic maxim: 'know thyself'. Because HIV is an expert at hijacking human cells and immunity, we have no choice but to improve our understanding of our inner machinery, starting with the most fundamental layer of biological information: the human genome. The exploration of human diversity at the DNA level, long hampered by technological limitations, has been fuelled by the development of new and more powerful tools over the past decades 8. Thanks to progress in our understanding of human genetic diversity, in genotyping and sequencing technology, as well as in bioinformatics and data science, it became possible to search for genetic factors that modulate the individual response to HIV, including the resistance and susceptibility to infection and the natural history of the disease in PLWH 9.

In this Review, we first present an overview of the technological and conceptual developments that have fuelled HIV host genomic research. We then describe the major genetic factors modulating the natural history of HIV infection, in the HLA class I region and the CCR5 locus. Next, we highlight the recent convergence of human and HIV genomics, which allows longitudinal analyses of host-pathogen genetic interactions.
Finally, we explain how genomic knowledge is poised to have a positive impact on PLWH, notably through pharmacogenomic interventions and stratification of care based on polygenic risk scores (PRS), before discussing the short-term and long-term perspectives for translational research and clinical applications of human genomics in the field of HIV.

The search for human genetic differences that have an impact on HIV-related outcomes was first motivated by clinical observations, namely the striking variability in individual trajectories of patients in the absence of treatment. It was further propelled by a desire to uncover fundamental physiopathological mechanisms by the careful exploration of genomic variants and their impact on host and viral molecular processes.

Candidate gene studies. In the candidate gene approach, population-level associations are sought between HIV-related phenotypes and specific genetic variants in genes that have been selected based on previous biological knowledge or functional work. The selected variants are typically typed using targeted genotyping assays or Sanger sequencing of the region of interest. This framework was first applied to HIV host genetics in the early 1990s in analyses of allelic variants in genes known or suspected to play a role in HIV pathogenesis or in the antiretroviral immune response. Accordingly, genetic associations were reported in two broad categories: genes coding for proteins involved in the HIV life cycle (such as PPIA 10 and TLR9 (ref. 13)) as well as in specific antiretroviral defence mechanisms (such as APOBEC3G 14 and TRIM5 (ref. 15)). Dozens of genes were tested in multiple cohorts. Unfortunately, as has been the case in the broader field of human genetics, most reported associations turned out to be false positives, notably owing to the small sizes of the studied cohorts, population stratification and the lack of correction for multiple testing.
Replication attempts in larger cohorts, where these factors could be better controlled, showed no association for the vast majority of variants 16-18. In fact, only two major discoveries remain from the candidate gene era: the protective effect of a homozygous 32 bp deletion in CCR5 (CCR5Δ32) against HIV acquisition 19-21 and the effect of HLA class I variation on HIV disease progression.

Genome-wide association studies. Advances in genotyping and sequencing technologies progressively transformed human genetic analyses during the first decade of this century. In particular, the commercial availability of genome-wide genotyping arrays marked the beginning of the era of genome-wide association studies (GWAS). The principle of a GWAS is to simultaneously test very large numbers of genetic variants throughout the genome for potential associations with a phenotype of interest 23. This truly agnostic approach finally allowed for a more comprehensive exploration of the human genome. To date, most GWAS have been based on the genotyping of single nucleotide polymorphisms (SNPs) followed by imputation, a process that leverages the linkage disequilibrium property of the human genome to statistically infer the genotypes that are not directly measured. This approach allows near-comprehensive testing of common variants (that is, variants with a minor allele frequency of >1%) in most human populations 24.

The first GWAS of any infectious disease focused on the level of detectable viral genetic material in the blood of untreated, chronically infected individuals during the period of HIV latency 25. This phenotype, known as set point viral load (spVL), was selected because of its relative ease of measurement and its known correlation to the rate of progression to AIDS 26 and transmission potential 27.
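As a rough illustration of the GWAS principle described above, the following Python sketch tests each variant in turn by regressing a quantitative phenotype (for example, spVL in log10 copies/ml) on allele dosage and compares each p-value to a Bonferroni threshold. The function names, the toy data and the normal approximation to the test statistic are assumptions made for illustration only; published GWAS rely on dedicated software, adjust for covariates such as ancestry, and conventionally use a genome-wide significance threshold of 5e-8.

```python
import math

def dosage_association(dosages, phenotypes):
    """Regress a phenotype on allele dosage (0, 1 or 2 copies).

    Returns the least-squares slope and a two-sided p-value based on a
    normal approximation to the t statistic (an assumption that is only
    reasonable for large samples). Returns None for monomorphic variants.
    """
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        return None  # every individual has the same genotype: nothing to test
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotypes))
    beta = sxy / sxx
    rss = sum((y - mean_y - beta * (x - mean_x)) ** 2
              for x, y in zip(dosages, phenotypes))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        return beta, 0.0  # perfect fit in toy data
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return beta, p_value

def scan(genotypes, phenotypes, alpha=0.05):
    """Test every variant and flag those passing a Bonferroni threshold."""
    threshold = alpha / len(genotypes)
    results = {}
    for variant, dosages in genotypes.items():
        outcome = dosage_association(dosages, phenotypes)
        if outcome is not None:
            beta, p_value = outcome
            results[variant] = (beta, p_value, p_value < threshold)
    return results

# Hypothetical toy data: six individuals, two variants.
spvl = [5.0, 5.2, 4.5, 4.7, 4.0, 4.2]
genotypes = {
    "variant_A": [0, 0, 1, 1, 2, 2],  # each extra copy lowers spVL by 0.5
    "variant_B": [1, 1, 0, 2, 1, 1],  # no clear relationship with spVL
}
results = scan(genotypes, spvl)
for variant, (beta, p_value, significant) in results.items():
    print(variant, round(beta, 3), p_value, significant)
```

With real data the number of tests runs into the millions, which is why the per-variant threshold becomes so stringent.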
The spectrum of interrogated variants was limited by early DNA genotyping arrays, yet genome-wide significant associations were identified in the HLA class I region, the most polymorphic locus in the human genome, known to have a crucial role in the modulation of T cell immunity (see 'HLA variation in HIV control' , below). These findings were soon validated and expanded by other GWAS performed in independent cohorts, which demonstrated that the genetic architecture of HIV spVL is comparable between the general population of PLWH 16,28-30 and a particular group of individuals able to maintain low viral loads for prolonged periods of time in the absence of ART, the so-called HIV controllers 17, 31 . The absence of specific genetic factors explaining the HIV controller phenotype was a disappointment in terms of potential therapeutic development. However, it is consistent with what has been found for many complex human traits and diseases -that individuals at the extremes of the phenotypic distribution are more likely to carry multiple common variants with weak effects rather than rare, high-impact variants 32 . Beyond genotyping, a single exome sequencing study has been published so far in the HIV field 33 also indicating that rare coding variants with large effect sizes are unlikely to make a major contribution to host control of HIV infection. GWAS were less successful in the search for determinants of HIV resistance, with no definitive evidence found of human genetic polymorphisms conferring an altered susceptibility to HIV apart from CCR5 variation 18, 34 . However, recent genome sequencing studies of extreme exposure phenotypes 35 have shown promising associations in CD101, a gene encoding an immunoglobulin superfamily member implicated in regulatory T cell function 36 , and in UBE2V1, which encodes a ubiquitin-conjugating enzyme involved in pro-inflammatory cytokine expression 37,38 that associates with the HIV restriction factor TRIM5α 38 . 
Although both CD101 and UBE2V1 are plausible candidates, further functional studies are required to validate their role in HIV susceptibility. Finally, analyses of GWAS data provide evidence for residual heritability owing to additive genetic effects beyond CCR5 (ref. 18) and genetic overlap with behavioural and socioeconomic traits 39. These results suggest that larger genomic studies of HIV acquisition may identify additional loci that impact susceptibility and warrant further investigation, potentially in large biobanks.

Several intrinsic limitations make it difficult to investigate the genetic mechanisms potentially involved in HIV resistance. For example, sample sizes are usually small (in the tens or hundreds) because studies need to be performed on highly exposed yet uninfected individuals such as patients with haemophilia exposed to HIV through contaminated blood products 34, sex workers in hyper-endemic areas 40 or serodiscordant couples (stable heterosexual couples where one partner has HIV infection and the other is seronegative for HIV at enrolment) 29. Frailty (or survival) bias is a limitation in cross-sectional studies of HIV cohorts with long-term follow-up, as these cohorts are enriched for genetic factors protecting against HIV disease progression. Another limitation is misclassification bias in studies comparing the genomes of patients with HIV infection to unselected controls from the general population, in which most individuals are in fact susceptible to HIV infection 18. The identification of additional genetic determinants of individual susceptibility to HIV infection will require increased sample sizes (ideally in the thousands) as well as the use of sequencing approaches to characterize the rare functional variants that are not interrogated in studies based on genotyping arrays.

The HLA locus in infectious diseases.
The human major histocompatibility complex (MHC) located on chromosome 6 is one of the most genetically diverse loci in the genome 41. The extended MHC occupies ~7.6 Mb of the human genome 42 and encodes more than 400 genes, many of which are key mediators of the innate and adaptive immune responses.

Human leucocyte antigen (HLA). A protein, encoded by one of a group of HLA genes, that presents antigens that train the adaptive immune response. HLA genes are highly variable and allelic variants encode proteins that are differentially able to present antigens based on the amino acid sequences in the peptide-binding grooves.
Epitopes. Parts of an antigen that make contact with a particular antibody or T cell receptor and are thus capable of stimulating an immune response.
Polygenic risk scores (PRS). Statistics that are calculated by enumerating the number of risk alleles associated with a particular phenotype (often weighted by their population-level effect sizes) that are present in a single individual and comparing the individual's score to the distribution of risk scores in the population.
Population stratification. Presence of systematic differences in allele frequencies between population subgroups owing to systematic differences in ancestry.
Natural history. The natural disease course of HIV infection in untreated individuals, characterized by an acute phase, a chronic phase and the development of AIDS. The rate of HIV progression varies dramatically in the infected population.
HIV latency. The long-term persistence of HIV in an integrated but transcriptionally inactive form in the host genome. Because latent HIV resides in memory T cells, it persists indefinitely even in patients on suppressive antiretroviral therapy. This latent reservoir is a major barrier to curing HIV infection.
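The glossary definition of polygenic risk scores given above can be made concrete with a short Python sketch: the score is a weighted count of risk alleles, which is then placed within the distribution of scores in a reference population. The variant names, effect sizes and genotypes below are hypothetical, and the z-score is just one simple way of comparing an individual to the population.

```python
from statistics import mean, pstdev

def polygenic_risk_score(dosages, weights):
    """Weighted count of risk alleles carried by one individual.

    dosages maps variant name -> number of risk alleles (0, 1 or 2);
    weights maps variant name -> per-allele effect size.
    """
    return sum(weight * dosages.get(variant, 0)
               for variant, weight in weights.items())

def standardized_score(individual_score, population_scores):
    """Position of one score within the population distribution (z-score)."""
    return (individual_score - mean(population_scores)) / pstdev(population_scores)

# Hypothetical per-allele effect sizes for three variants.
weights = {"snp1": 0.2, "snp2": 0.5, "snp3": 0.1}

# Hypothetical reference population of four individuals.
population = [
    {"snp1": 0, "snp2": 0, "snp3": 1},
    {"snp1": 1, "snp2": 0, "snp3": 2},
    {"snp1": 2, "snp2": 1, "snp3": 0},
    {"snp1": 1, "snp2": 2, "snp3": 1},
]
population_scores = [polygenic_risk_score(person, weights) for person in population]

individual = {"snp1": 2, "snp2": 2, "snp3": 2}
score = polygenic_risk_score(individual, weights)
z = standardized_score(score, population_scores)
print(round(score, 2), round(z, 2))
```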
Within this locus, alleles at the classical class I (HLA-A, HLA-B, HLA-C) and class II (HLA-DR, HLA-DQ, HLA-DP) genes have been associated with numerous autoimmune, inflammatory and infectious diseases (reviewed in refs 43,44) with recent preprints demonstrating extensive disease associations in large biobanks from multiple populations 45,46. In the context of infectious disease, class I HLA proteins present endogenous peptides on the surface of infected cells for recognition by CTLs, triggering the development of an adaptive response. As discussed below, the variability in epitope specificity of HLA proteins and expression levels of HLA class I alleles has a dramatic impact on the progression of HIV disease.

An individual's genotype at class I HLA genes has been consistently demonstrated to be the major host genetic determinant of HIV spVL and rate of disease progression across geographic contexts and ancestries 17,22,47-50. This observation was put in the genome-wide context by the first GWAS of HIV spVL 25 and HIV controllers 17 that exclusively identified SNPs in strong linkage disequilibrium with classical HLA-B alleles. Although array-based techniques for the genotyping of DNA samples do not allow for the direct resolution of classical HLA alleles, computational methods leveraging linkage disequilibrium structure between SNPs and sequence-based HLA types in reference populations allow for accurate imputation of classical HLA types from GWAS data 51. The application of this technique to a sample of >6,000 PLWH of European ancestry underscored the dramatic effect of HLA-B*57:01 on reducing viral load, which was, on average, ~0.8 log10 RNA copies/ml lower in individuals carrying this allele 52.
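Because viral load is reported on a log10 scale, a difference of ~0.8 log10 copies/ml corresponds to a multiplicative change on the raw scale. The short Python sketch below converts the log10 difference into a fold change; the non-carrier value of 4.5 log10 copies/ml is a hypothetical figure chosen only to make the arithmetic concrete.

```python
def fold_change_from_log10_difference(delta_log10):
    """Convert a difference on the log10 scale into a fold change."""
    return 10 ** delta_log10

fold = fold_change_from_log10_difference(0.8)

# Hypothetical example: a non-carrier with an spVL of 4.5 log10 copies/ml.
non_carrier_log10 = 4.5
carrier_log10 = non_carrier_log10 - 0.8
non_carrier_copies = 10 ** non_carrier_log10
carrier_copies = 10 ** carrier_log10

print(round(fold, 2))              # roughly a 6.3-fold difference
print(round(non_carrier_copies))   # copies/ml on the raw scale
print(round(carrier_copies))
```

In other words, an average reduction of 0.8 log10 means that carriers have roughly one sixth of the plasma viral RNA of non-carriers.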
This study also demonstrated strong associations at multiple other classical class I HLA alleles that had a range of effects, from decreasing spVL (B*57:01, B*27:05, B*13:02, B*14:02, C*06:02, C*08:02, C*12:02) to increasing spVL (B*07:02, B*08:01, C*07:01, C*07:02, C*04:01). To better understand how functional variation in HLA class I proteins can impact HIV spVL, recent studies have tested variable amino acid positions within these proteins to fine-map the classical allele associations. In a GWAS performed by the International HIV Controllers study, this technique was applied to demonstrate that previously identified associations between HIV control and classical HLA alleles such as B*57:01 could be explained by variability across a small number of amino acid positions within the HLA-B protein 17 . The strongest effect was observed at position 97 of the protein, which accommodates six alternative amino acids, including valine, which is unique to B*57 haplotypes. A recent preprint describing the comprehensive analysis of the impact of HLA amino acid polymorphisms on spVL in a multi-ethnic sample of >12,000 PLWH identified three amino acid positions in HLA-B (positions 67, 97 and 156) and one in HLA-A (position 77) as independently associating with spVL 53 . The positions within HLA-B map to classical HLA alleles known to impact spVL, whereas the HLA-A position suggests that HLA-A functions independently of HLA-B. Interestingly, all four positions are located in the peptide-binding groove of the respective HLA protein, supporting the hypothesis that epitope presentation is key for the natural suppression of HIV replication. Furthermore, there was no substantial evidence that the effects of these polymorphic positions differed across ancestry groups, suggesting biological relevance across global contexts. Several mechanisms of action have been proposed to explain why different alleles of the same HLA gene have differential effects on HIV progression. 
Studies of epitope specificity have shown that certain protective alleles, including B*57:01 (which uniquely carries valine at position 97) and B*27:05 (which carries the protective cysteine and asparagine residues at positions 67 and 97, respectively), drive compensatory mutations in the HIV genome leading to reduced viral fitness 54-56. In addition to differential epitope specificity, the CTL effector function induced by epitope presentation has been implicated in HIV control, with CTLs in carriers of some protective HLA alleles exhibiting an enhanced proliferative capacity and more polyfunctional responses 57-59.

In addition to the impact of specific class I HLA alleles on HIV progression at the population level, the within-host diversity of HLA alleles may be important at the individual level. An early study looking at the impact of allele combinations revealed that maximum heterozygosity at HLA class I genes (that is, individuals carrying two different alleles at all three class I genes) was associated with delayed progression to AIDS 47. This observation was supported by a GWAS that showed that individuals carrying different HLA alleles at each class I gene had a significantly lower viral load than homozygous individuals, even after accounting for the additive effect at each allele 60. This heterozygote advantage likely comes from the ability to present multiple HIV epitopes, supporting the hypothesis that the breadth of presentation is beneficial in preventing HIV progression. To further test this hypothesis, a recent in silico study used novel algorithms to predict the binding affinity of all possible 9-mer peptides in the HIV proteome to HLA proteins encoded by the different class I alleles 61. Coupling these predicted affinities to clinical and genetic data demonstrated that spVL was negatively correlated with the breadth of the peptide repertoire bound by an individual's HLA protein isoforms. Moreover, HLA-B isoforms had the largest predicted breadth of epitope recognition and conferred the strongest reduction of viral load (Fig. 2a). However, the quantity of epitopes alone is unlikely to fully explain the protective capacity of an individual's HLA alleles, as subsets of epitopes that are uniquely presented by protective HLA isoforms explained more of the observed variance in spVL than the entire predicted set. This observation is further supported by an in silico and functional study that demonstrated that HIV epitopes that encode structurally important residues are preferentially targeted by protective HLA isoforms and associate with elite control of replication 62. Thus, the quantity and quality of HIV epitopes presented by combinations of HLA isoforms are the key drivers of spVL.

HLA-E interacts with the NKG2A receptor on the surface of natural killer (NK) cells and, when highly expressed, inhibits the killing of infected cells.
Genetic architecture. Underlying genetic basis of a given trait, in terms of variant number, effect size, allele frequency and interactions.
HIV controllers. A group of people living with HIV whose plasma HIV RNA load is spontaneously maintained at very low levels for several years (usually at least 3-5 years) in the absence of antiretroviral therapy.
Restriction factor. A host cellular protein that participates in antiviral defence by interfering with specific steps of the viral replication cycle.

Non-classical effects of HLA variation. In addition to the classical effects of HLA genes on peptide presentation, several studies have suggested that non-classical effects may play a part in limiting HIV replication in vivo. In particular, the variable expression levels of classical HLA-C alleles have been linked to HIV control, with those expressed at high levels conferring protection against disease progression 63.
This effect has been observed across ancestries and has been linked to the absence of a variable microRNA-148a (miR-148a) binding site in the 3′ untranslated region of HLA-C 64. The proposed model suggests that mRNA from alleles lacking the miR-148a binding site escape suppression by miR-148a; as a consequence, proteins encoded by these alleles and loaded with HIV epitopes are expressed at higher levels on infected cells, allowing for greater rates of detection by CTLs 64 (Fig. 2b). Similarly, proteins encoded by HLA-A alleles are also expressed at variable levels on the cell surface 65. However, in contrast to HLA-C, HLA-A alleles expressed at high levels associate with poorer control of viral replication and with faster disease progression 66. A combination of genetic and functional studies indicated that increased HLA-A expression levels correlated with higher viraemia in a combined cohort of more than 9,000 PLWH from sub-Saharan Africa and the United States. It was proposed that this effect may be the result of enhanced production of the HLA class I signal peptide that regulates HLA-E expression, a hypothesis that was supported by a correlation between HLA-A expression and HLA-E expression among 58 healthy donors tested 66. HLA-E is a ligand for natural killer group protein 2A (NKG2A) and their interaction results in strong inhibition of natural killer (NK) cell degranulation (Fig. 2c). Thus, the enhanced production of the HLA class I signal peptide in individuals carrying highly expressing HLA-A alleles may lead to enhanced inhibition of immune responses in infected individuals, resulting in poorer clinical outcomes.

Finally, it has also been observed that the combination of HLA genotype and the expression of particular killer cell immunoglobulin-like receptor (KIR) proteins variably modulated HIV disease course 67.
The KIR proteins are a highly variable set of cell-surface receptors expressed on NK cells (and some T cells) that, when engaged by their cognate ligands, either activate or inhibit NK cell-mediated killing (recently reviewed in ref. 68). In particular, the combination of the activating KIR3DS1 allele with a set of HLA-B alleles that carry isoleucine in the Bw4 epitope (Bw4-I80) is highly associated with HIV control 69. Taken together, these results demonstrate the complex interplay between epitope presentation, HLA protein expression and NK inhibition.

Perhaps the most highly touted example of human genetic variability restricting infectious diseases is the observation that individuals carrying two copies of a loss-of-function variant in the gene encoding the cell receptor CCR5 are highly resistant to infection by HIV. CCR5 is a chemokine receptor expressed on the surface of multiple subsets of monocytes and lymphocytes, including CD4+ T cells, the major HIV target cells. At the earliest stages of infection, the HIV envelope protein gp120 binds CD4 and CCR5 on the cell surface, resulting in fusion of the viral and host cell membranes and in the release of the viral genome into the target cell. The discovery that individuals who carry homozygous loss-of-function alleles at CCR5 are resistant to infection was first made in a group of MSM that were multiply exposed to the virus but remained uninfected 19. It was determined that these men all shared a 32-bp deletion in the CCR5 gene (the CCR5Δ32 allele) that leads to the production of a nonfunctional protein; the resulting absence of functional CCR5 on the cell surface prevents HIV from entering target cells (Fig. 3a). The CCR5Δ32 allele is observed at ~10% frequency in individuals of European ancestry (homozygosity occurs at a frequency of 1%), at a reduced frequency in southern Europeans compared to those in the north 70 and is not observed at an appreciable frequency in other continental populations.
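The two frequencies quoted above (an allele frequency of ~10% and a homozygote frequency of ~1%) are consistent with what Hardy-Weinberg proportions predict. The Python sketch below works through that arithmetic; the assumption of random mating, which the Hardy-Weinberg model requires, is ours and is not stated in the text.

```python
def hardy_weinberg(deletion_allele_frequency):
    """Expected genotype frequencies under Hardy-Weinberg proportions."""
    q = deletion_allele_frequency
    p = 1 - q
    return {
        "no_deletion": p * p,          # two functional CCR5 copies
        "heterozygous": 2 * p * q,     # one CCR5-delta32 copy
        "homozygous_deletion": q * q,  # two CCR5-delta32 copies
    }

frequencies = hardy_weinberg(0.10)
for genotype, frequency in frequencies.items():
    print(genotype, round(frequency, 2))
```

Under these assumptions about 1% of individuals would be homozygous for the deletion and about 18% would carry a single copy.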
Compound heterozygotes (that is, individuals carrying one copy of CCR5Δ32 and a second loss-of-function CCR5 variant) are also resistant to infection, although these individuals are exceedingly rare 71. The observation that individuals lacking CCR5 expression are resistant to HIV infection directly led to the development of the antiviral drug Maraviroc, a CCR5 antagonist 72, as well as to the world's first ethically fraught attempt at human embryo engineering 73. Perhaps most interestingly, bone marrow transplants between CCR5Δ32 homozygous donors and recipients with HIV infection have resulted in the only two confirmed cases of long-term HIV cure 74,75. Although promising, this effect has been difficult to replicate in engineered autologous stem cell models 76 and is unlikely to be scalable to the level necessary to stem the pandemic. Additionally, the protection is not absolute, as several confirmed cases of infection in CCR5Δ32 homozygotes have been reported (reviewed in ref. 77), presumably by viruses that utilize the minor co-receptor CXCR4 or by dual-tropic viruses.

In addition to the impact of homozygosity on preventing infection, individuals with a single CCR5Δ32 copy exhibit lower spVL and delayed disease progression compared to those with two functional copies 19,20,78, likely because the reduced levels of CCR5 protein on the cell surface lower the efficiency of HIV entry into target cells (Fig. 3a). The CCR5 locus was also identified in GWAS, first in a study of ~2,500 PLWH in Europe 16 and then in an expanded set of 6,300 individuals from across the globe 52. However, the CCR5Δ32 allele was not directly assayed on the genotyping platforms used in these studies, so only proxy SNPs were identified. In a combined analysis of GWAS data and direct CCR5Δ32 genotyping, it was observed that the CCR5Δ32 allele was not the most strongly associated variant in the region, suggesting that multiple independent genetic effects occur at this locus. Conditional analysis accounting for the effect of CCR5Δ32 showed that an additional marker, rs1015164, was also strongly associated with spVL. Functional analysis of this variant showed that it regulates the expression of an antisense long non-coding RNA called CCR5-AS, which overlaps the CCR5 gene 79. This study further showed that the increased expression of CCR5-AS resulted in increased CCR5 expression because CCR5-AS interfered with the RALY-mediated degradation of CCR5 mRNA. Moreover, the knockdown of CCR5-AS reduced the susceptibility of CD4+ T cells to HIV-1 infection ex vivo (Fig. 3b). These results demonstrate that the clinical course of untreated HIV infection is directly influenced by the innate level of CCR5 expression within the infected individual. Whether additional functional polymorphisms in CCR5 have similar effects remains an open question.

Killer immunoglobulin-like receptor (KIR). A family of highly polymorphic activating and inhibitory receptors that serve as key regulators of human natural killer cell function.
Target cells. The cells primarily infected by HIV, namely CD4+ T cells and macrophages, both of which are key components of a healthy immune system.
RALY-mediated degradation. A mechanism in which the RALY protein binds to the 3′ untranslated region of an mRNA to promote its degradation.

Most studies performed so far in the field of host genetics focused on clinically defined outcomes such as the susceptibility to infection or disease progression. However, intermediate phenotypes have been shown to be very valuable in identifying subtle genetic association signals that are not always detectable using more complex clinical outcomes. A particularly promising intermediate phenotype, unique by its nature to infectious diseases, is variation in the pathogen genome (Fig. 4). HIV is a highly variable virus that establishes a lifelong infection. Therefore, it represents an ideal model to search for the potential effects of intrahost selective pressure on a human pathogen. While some of the variants observed in the HIV genomic sequence are present in the transmitted/founder virus, another fraction is acquired during the course of the disease resulting, at least partially, from the selective pressure exerted by the host response to infection.

[Fig. 4 panel labels: control for human stratification, for example, using PC analysis; control for viral stratification, for example, using a phylogenetic-based approach.]

Signs of host-driven selection are clearly visible in the HIV genome. In particular, specific variants have been described in key viral epitopes presented by HLA class I molecules and targeted by CTL responses 80. Mutations have also been reported in regions targeted by KIR, suggesting the escape from immune pressure by NK cells 81,82. A non-negligible fraction of the HIV-1 genome (~12%) is under positive selection but only about half of the positively selected sites map to canonical CD8+ T cell epitopes 83, indicating that additional host factors could be driving evolution in non-epitope sites. Computational approaches developed over the past decade have allowed more comprehensive analyses of the reciprocal genetic signals resulting from the host-pathogen interaction 84,85.

Joint analyses of human and HIV sequence variation start with the generation of large-scale genomic data from paired samples. The retroviral genome can be isolated and sequenced either as native RNA during replicative infection or as proviral DNA, integrated into the host genome, during latent infection. Human genomic information can be obtained using genotyping or sequencing technology. The principle of genome-to-genome (G2G) studies is then to perform a systematic search for associations between human genetic polymorphisms and viral sequence variants, at the nucleotide or amino acid levels.
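The systematic search described above can be sketched in Python as a loop over every pair of human and viral variants, with a significance threshold divided by the number of pairs tested. This is a minimal illustration under stated assumptions: Fisher's exact test on a 2x2 table is used here as a simple stand-in, the variant names and paired data are hypothetical, and the sketch omits the corrections for human and viral population structure that real analyses need.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def table_probability(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = table_probability(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are at most as likely as the observed one.
    return sum(
        prob
        for prob in (table_probability(x) for x in range(lowest, highest + 1))
        if prob <= observed * (1 + 1e-9)
    )

def g2g_scan(human_variants, viral_variants, alpha=0.05):
    """Test every human variant against every viral variant."""
    threshold = alpha / (len(human_variants) * len(viral_variants))
    hits = []
    for human_name, carriers in human_variants.items():
        for viral_name, mutated in viral_variants.items():
            pairs = list(zip(carriers, mutated))
            a = sum(1 for h, v in pairs if h and v)
            b = sum(1 for h, v in pairs if h and not v)
            c = sum(1 for h, v in pairs if not h and v)
            d = sum(1 for h, v in pairs if not h and not v)
            p_value = fisher_exact_two_sided(a, b, c, d)
            if p_value < threshold:
                hits.append((human_name, viral_name, p_value))
    return hits

# Hypothetical paired data for 20 individuals (1 = allele or mutation present).
human_variants = {
    "host_allele_1": [1] * 10 + [0] * 10,
    "host_allele_2": [1, 0] * 10,
}
viral_variants = {
    "viral_site_1": [1] * 10 + [0] * 10,  # tracks host_allele_1 perfectly
    "viral_site_2": [1, 1, 0, 0] * 5,     # not linked to either host allele
}
hits = g2g_scan(human_variants, viral_variants)
for human_name, viral_name, p_value in hits:
    print(human_name, viral_name, p_value)
```

Because the number of tests is the product of the two variant counts, the threshold shrinks quickly as either genome is covered more densely.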
Because of the very large number of models run in parallel (one GWAS for each viral variant), this approach requires stringent correction for multiple testing. By mapping all interacting loci, G2G studies have the potential to uncover the most important genes and pathways involved in specific responses to infectious agents, thereby revealing novel diagnostic or therapeutic targets. In addition to identifying the sites of genetic interplay between virus and host, this study design makes it possible to assess the biological consequences of such interactions and to estimate the relative impact of human and viral genetic variation on phenotypic outcomes by testing for associations between human-driven escape at viral sites and a quantitative clinical phenotype. Despite this promise, it must be acknowledged that G2G studies have not yet led to the identification of novel HIV restriction factors in the human genome 86. Future studies will require larger sample sizes to increase power but also more diversity, with a strong focus on the inclusion of PLWH of non-European ancestries. Nevertheless, studies based on the combined analysis of host and pathogen genomic variation have already demonstrated their potential in other infections. In particular, the use of a similar study design in chronic hepatitis C virus infection highlighted the evolutionary pressure exerted by both innate (interferon-λ) and acquired (HLA class II) immune defence mechanisms 87, 88. The intra-host evolution of DNA viruses can also be investigated using a G2G approach, as shown in a recent study that revealed several associations between human and Epstein-Barr virus sequence variation in immunosuppressed PLWH 89.

[...] the UNAIDS 90-90-90 treatment target 91, where 90% of infected people know their status, 90% of those are on antiviral therapy and 90% of those are suppressing the virus below the level of detection.
This aspirational treatment target would practically mean, given currently available technologies, that more than 34 million people would be on lifelong chemotherapy. Although this treatment-as-prevention approach would undoubtedly result in decreases in transmission and dramatic increases in life expectancy for the population with HIV infection, it also requires a deeper understanding of how human genetic variation relates to variability in drug toxicity and response to long-term therapy.

HIV pharmacogenetics. In addition to affecting HIV disease progression in untreated individuals, human genetic variability has also been implicated in modifying the response to treatment. A major achievement in the fight against HIV has been the development of multiple, effective therapeutics that target several stages of the viral life cycle. These include entry inhibitors, which prevent the binding of the viral spike protein gp120 to host cell receptors and fusion of the virus with host cell membranes; nucleoside and non-nucleoside reverse transcriptase inhibitors, which prevent the reverse transcription of the viral RNA genome into DNA; integrase inhibitors, which prevent the integration of the viral DNA product into the host genome; and protease inhibitors, which prevent the cleavage of viral polyproteins into their functional subunits (Fig. 5). For several classes of anti-HIV therapy, human genetic variability is known to influence response to the drug, which in some cases leads to severe adverse events and treatment discontinuation 92. Paradoxically, the HLA-B allele B*57:01, most notably associated with the control of infection, also predisposes carriers to a severe hypersensitivity reaction to the [...] 101 ) have all been associated with slow metabolization kinetics of their cognate drugs (Table 1), in some cases leading to drug accumulation in the brain, psychiatric complications and treatment stoppage 99.
The frequency of many of these polymorphisms varies depending on ancestral background, leading to reduced drug tolerance and therefore reduced efficacy in some populations. For example, the allele CYP2B6*6 (rs3745274), which results in the slow metabolism of efavirenz and nevirapine, two non-nucleoside reverse transcriptase inhibitors recommended for first-line use by WHO until recently, has an approximately twofold higher frequency in some African populations compared to Europeans 102. This increased frequency and the resulting adverse events led to thousands of cases of treatment discontinuation in Zimbabwe when the nation adopted a single-pill efavirenz-containing regimen 103. This example highlights the need to not only tailor the therapy to the individual but, in some cases, to the population as well. Newer generations of HIV therapies, such as integrase inhibitors and advanced nucleoside reverse transcriptase inhibitors, have more favourable pharmacokinetic and safety profiles 104. However, the effects of long-term treatment with these drugs and any potential interactions with human genetic variability remain to be understood.

In addition to the direct interactions between host genotype and drug metabolism, patients on long-term HIV therapy also experience early onset of several chronic diseases, including cardiovascular disease 105-107, metabolic syndrome 108, kidney disease 109, 110 and liver fibrosis 111. These conditions are all known to have high heritability in the HIV-uninfected population, and genetic risk factors for type 2 diabetes mellitus 112 and cardiovascular disease 113 have been shown to be enhanced in PLWH on therapy. Recently, there has been a push to develop polygenic risk scores (PRS) in the general population.
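In its simplest form, a PRS is a weighted sum: for each variant, the number of effect alleles carried by an individual is multiplied by a per-allele effect size, and the products are added up. The following Python snippet is a minimal sketch of that calculation. The function name `polygenic_risk_score`, the variant identifiers, the weights and the genotypes are invented for illustration and do not come from any published score; real scores rely on effect sizes estimated in large GWAS and usually include many more variants.

```python
def polygenic_risk_score(dosages, weights):
    """Sum of per-variant effect sizes weighted by effect-allele dosage (0, 1 or 2)."""
    missing = set(weights) - set(dosages)
    if missing:
        # Refuse to score an individual with missing genotypes rather than guess.
        raise KeyError(f"missing genotypes for: {sorted(missing)}")
    return sum(weights[variant] * dosages[variant] for variant in weights)


# Hypothetical per-allele effect sizes (for example, log odds ratios).
weights = {"variant_1": 0.10, "variant_2": -0.05, "variant_3": 0.20}

# Hypothetical effect-allele dosages for three individuals.
cohort = {
    "person_A": {"variant_1": 2, "variant_2": 1, "variant_3": 0},
    "person_B": {"variant_1": 0, "variant_2": 0, "variant_3": 2},
    "person_C": {"variant_1": 1, "variant_2": 2, "variant_3": 0},
}

scores = {person: polygenic_risk_score(g, weights) for person, g in cohort.items()}

# Rank individuals from highest to lowest score, as one would for risk stratification.
ranked = sorted(scores, key=scores.get, reverse=True)
for person in ranked:
    print(f"{person}: PRS = {scores[person]:.2f}")
```

Because the weights are estimated in a particular study population, the same code applied to individuals of a different ancestry may rank risk poorly, which is one reason why transferability has to be checked empirically.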
These scores, built by summing the additive effects of dozens to thousands of genetic variants within an individual, have been shown to have a strong predictive ability for multiple metabolic, inflammatory, tumoural and cardiovascular conditions 32. Investigations of PRS in the specific context of HIV-infected individuals receiving long-term antiretroviral therapy have just begun, with the recent demonstrations that the prediction of chronic kidney disease can be improved through the addition of a PRS to the known clinical and pharmacological risk factors 114, 115 and that a PRS can be useful to stratify PLWH at a high risk of cardiometabolic diseases who may benefit from preventive therapies 116. An important caveat is that PRS are not necessarily transferable across ancestral groups and, as in all areas of genomics, attention should be paid to enhancing diversity and ensuring equity in precision medicine approaches.

Host genomic studies have advanced our understanding of HIV biology in several important ways. Firstly, the demonstration of the dominant impact of HLA variation on HIV progression in the context of the whole genome reinforced the need to focus on T cell responses in vaccine design. Moreover, the ability to accurately infer HLA allele types and protein-level variability from genotyping array data, an approach first piloted in HIV genomic studies, has greatly increased our understanding of how amino acid variability in HLA molecules contributes to multiple medically important traits. Secondly, dense genotyping and large sample sizes enabled the discovery of multiple, independent signals in the CCR5 locus, which provided a deeper understanding of how the expression of CCR5 is regulated and how it modulates HIV infection beyond the known impact of the CCR5Δ32 allele.
Finally, amassing genome-wide data for large cohorts of PLWH has enabled the validity of previous candidate gene associations to be assessed, providing a new standard for identifying novel loci of HIV restriction.

In recent years, there have been several barriers to further advancing our understanding of how host genomics affects HIV susceptibility and progression. Firstly, current studies have predominantly included individuals of European ancestry, mirroring the lack of diversity in genomics in general 117, which is particularly problematic because the vast majority of PLWH are non-White. The example of the population-specific CCR5Δ32 allele further highlights the need to stretch beyond European cohorts to determine whether other population-specific effects exist. Attaining the large sample sizes required for genomic discovery in non-European populations will require a substantial investment of resources and building of capacity in low-income and middle-income countries. Furthermore, understanding the potential function of genetic variants identified in diverse samples will require a shift towards inclusivity across genomics databases 118. Secondly, with improvements in HIV care and broad adoption of test-and-treat strategies, the focus of host genomics studies has necessarily shifted away from the natural history of infection phenotypes to intermediate phenotypes, pharmacogenomics of long-term therapy, comorbidities or vaccine response. Thirdly, other classes of genetic variation that are not well captured by genotyping arrays, for example, diversity of KIR alleles and T cell receptor usage (the other partner in the HLA interaction), should be investigated to better understand how genetic variation in key innate and adaptive immune genes impacts disease outcomes.
However, capturing these types of variation requires in-depth sequencing to resolve genetic diversity and, in the case of T cell receptor variation, targeted immune assays to capture the relevant cells. Progress on computational methods for inferring variation at some complex loci from genotyping array data 51, 119 or next-generation sequencing data 120-122 will greatly aid these efforts.

The full translational potential of host genomics discovery in HIV has yet to be realised. Although the association between HLA allele type, epitope binding and HIV control has been well established, this knowledge has yet to be translated into an effective preventative or therapeutic vaccine. As mentioned above, treatment of PLWH with CCR5-deficient cells has shown potential as an HIV cure but several technological improvements in autologous cell editing will be required before it becomes a scalable strategy. In addition to targeting host genes for editing, in vitro studies have also shown that it is feasible to directly target and excise the integrated proviral genome 123, 124. Although this is an extremely promising strategy, the delivery of the necessary machinery to latently infected cells remains a challenge.

The host genomics approach established in HIV research has since been applied to several other infectious diseases, including those posing substantial threats to human health such as hepatitis C virus 125, 126, tuberculosis 127, malaria 128 and even SARS-CoV-2 (ref. 129), among others. These studies have time and again uncovered novel therapeutic targets and mechanisms to identify the individuals who are most vulnerable to specific infections. As the world struggles with a novel pandemic-causing RNA virus, the lessons we can learn from how the human genome contributes to variability in outcome have never been more important.
Transmission of the human immunodeficiency virus
Resistance to HIV-1 infection: lessons learned from studies of highly exposed persistently seronegative (HEPS) individuals
Cohorts for the study of HIV-1-exposed but uninfected individuals: benefits and limitations
The natural history of HIV infection
Molecular mechanism of HIV-1 entry
Immunopathogenesis and immunotherapy in AIDS virus infections
The multifaceted nature of HIV latency
A brief history of human disease genetics
Host genetic variation and HIV disease: from mapping to mechanism
Regulatory polymorphisms in the cyclophilin A gene, PPIA, accelerate progression to AIDS
Consistent effects of TSG101 genetic variability on multiple outcomes of exposure to human immunodeficiency virus type 1
Polymorphisms in the MBL2 promoter correlated with risk of HIV-1 vertical transmission and AIDS progression
Polymorphisms in Toll-like receptor 9 influence the clinical course of HIV-1 infection
APOBEC3G genetic variants and their influence on the progression to AIDS
Effects of human TRIM5alpha polymorphisms on antiretroviral function and susceptibility to human immunodeficiency virus infection
Common genetic variation and the control of HIV-1 in humans
International HIV Controllers Study et al.
The major genetic determinants of HIV-1 control affect HLA class I peptide presentation
Association study of common genetic variants and HIV-1 acquisition in 6,300 infected cases and 7,200 controls
This GWAS of HIV-1 acquisition showed no evidence for association outside of CCR5 including lack of replication of 22 loci previously claimed to impact acquisition
We believe this study to be the first description of host genetic resistance to an infectious disease observed in people homozygous for a loss of function
The role of a mutant CCR5 allele in HIV-1 transmission and disease progression
Homozygous defect in HIV-1 coreceptor accounts for resistance of some multiply-exposed individuals to HIV-1 infection
Influence of combinations of human major histocompatibility complex genes on the course of HIV-1 infection
Benefits and limitations of genome-wide association studies
Genotype imputation from large reference panels
A whole-genome association study of major determinants for host control of HIV-1
Quantitation of HIV-1 RNA in plasma predicts outcome after seroconversion
Viral load and heterosexual transmission of human immunodeficiency virus type 1.
Rakai Project Study Group
Host determinants of HIV-1 control in African Americans
Genomewide association study for determinants of HIV-1 acquisition and viral set point in HIV-1 serodiscordant couples with quantified virus exposure
A genetic polymorphism of FREM1 is associated with resistance against HIV infection in the Pumwani Sex Worker Cohort
Fine-mapping classical HLA variation associated with durable host control of HIV-1 infection in African Americans
Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations
Evaluating the impact of functional genetic variation on HIV-1 control
A genome-wide association study of resistance to HIV infection in highly exposed uninfected individuals with hemophilia A
Whole genome sequencing of extreme phenotypes identifies variants in CD101 and UBE2V1 associated with increased risk of sexually acquired HIV-1
Triggering CD101 molecule on human cutaneous dendritic cells inhibits T cell proliferation via IL-10 production
Direct activation of protein kinases by unanchored polyubiquitin chains
TRIM5 is an innate immune sensor for the retrovirus capsid lattice
The behavioral, cellular and immune mediators of HIV-1 acquisition: new insights from population genetics
Common human genetic variants and HIV-1 susceptibility: a genome-wide survey in a homogeneous African population
The HLA genomic loci map: expression, interaction, diversity and disease
Gene map of the extended human MHC
Interrogating the major histocompatibility complex with high-throughput genomics
What has GWAS done for HLA and disease associations?
Pervasive additive and non-additive effects within the HLA region contribute to disease risk in the UK Biobank
A global atlas of genetic associations of 220 deep phenotypes
HLA and HIV-1: heterozygote advantage and B*35-Cw*04 disadvantage
HLA B*5701 is highly associated with restriction of virus replication in a subgroup of HIV-infected long term nonprogressors
Effect of a single amino acid change in MHC class I molecules on the rate of progression to AIDS
Dominant influence of HLA-B in mediating the potential co-evolution of HIV and HLA
Imputing amino acid polymorphisms in human leukocyte antigens
Polymorphisms of large effect explain the majority of the host genetic contribution to variation of HIV-1 virus load
A high-resolution HLA reference panel capturing global population diversity enables multiethnic fine-mapping in HIV host response
This paper presents a framework for HLA allele imputation and association testing from genome-wide SNP data and presents a detailed fine-mapping of HLA functional variation in ~12
Clustered mutations in HIV-1 gag are consistently required for escape from HLA-B27-restricted cytotoxic T lymphocyte responses
Fitness cost of escape mutations in p24 Gag in association with control of human immunodeficiency virus type 1
Escape from the dominant HLA-B27-restricted cytotoxic T-lymphocyte response in Gag is associated with a dramatic reduction in human immunodeficiency virus type 1 replication
HIV-specific CD8+ T cell proliferation is coupled to perforin expression and is maintained in nonprogressors
Preservation of T cell proliferation restricted by protective HLA alleles is critical for immune control of HIV-1 infection
Superior control of HIV-1 replication by CD8+ T cells is reflected by their avidity, polyfunctionality, and clonal turnover
HLA heterozygote advantage against HIV-1 is driven by quantitative and qualitative differences in HLA allele-specific peptide presentation
HIV peptidome-wide association study reveals
patient-specific epitope repertoires associated with HIV control
Using network analysis to quantify the structural importance of amino acids in HIV proteins, the authors show that mutations in epitopes presented by protective HLA class I alleles disproportionately impair viral replication
This paper demonstrates that HLA-C expression level rather than epitope presentation is the key mediator of its impact on host control of HIV replication
Differential microRNA regulation of HLA-C expression and its association with HIV control
Epigenetic regulation of differential HLA-A allelic expression levels
Elevated HLA-A expression impairs HIV control through inhibition of NKG2A-expressing cells
Furthering work on non-classical HLA effects in HIV control, this paper demonstrates how the expression level of the HLA-A signal peptide can regulate the HLA-E-NKG2A interaction modulating spVL
Immunogenetics of HIV disease
Killer Ig-like receptors (KIRs): their role in NK cell modulation and developments leading to their clinical exploitation
Epistatic interaction between KIR3DS1 and HLA-B delays the progression to AIDS
The geographic spread of the CCR5 Δ32 HIV-resistance allele
Combined effect of CCR5-Delta32 heterozygosity and the CCR5 promoter polymorphism -2459 A/G on CCR5 expression and resistance to human immunodeficiency virus type 1 transmission
Efficacy of short-term monotherapy with maraviroc, a new CCR5 antagonist, in patients infected with HIV-1
Gene-edited babies: what went wrong and what could go wrong
Long-term control of HIV by CCR5 Delta32/Delta32 stem-cell transplantation
This is the first description of a functional cure of HIV infection in a patient who received a stem cell transplant from a donor homozygous for the CCR5Δ32 polymorphism
Evidence for HIV-1 cure after CCR5Δ32/Δ32 allogeneic haemopoietic stem-cell transplantation 30 months post analytical treatment interruption: a case report
Gene editing of CCR5 in autologous CD4 T cells of persons infected with
HIV
HIV-1 infection in persons homozygous for CCR5-Δ32 allele: the next case and the review
The role of viral phenotype and CCR-5 gene defects in HIV-1 transmission and disease progression
CCR5AS lncRNA variation differentially regulates CCR5, influencing HIV disease outcome
HIV-1 adaptation to HLA: a window into virus-host immune interactions
HIV-1 adaptation to NK-cell-mediated immune pressure
Selection of an HLA-C*03:04-Restricted HIV-1 p24 Gag sequence variant is associated with viral escape from KIR2DL3+ natural killer cells: data from an observational cohort in South Africa
Mapping of positive selection sites in the HIV-1 genome in the context of RNA and protein structural constraints
This paper proposes the G2G method and demonstrates that viral genetic variation can be a more powerful phenotype than clinical markers for host-pathogen genomic studies
Mapping the drivers of within-host pathogen evolution using massive data sets
Exploring the interactions between the human and viral genomes
Genome-to-genome analysis highlights the effect of the human innate and adaptive immune systems on the hepatitis C virus
Applying the G2G approach to hepatitis C virus, the authors demonstrate that it can be used to detect viral evolution owing to both innate and adaptive host immune pressure
Adaptation of hepatitis C virus to interferon lambda polymorphism across multiple viral genotypes
The influence of human genetic variation on Epstein-Barr virus sequence diversity
HIV viral load and transmissibility of HIV infection: undetectable equals untransmittable
Joint United Nations Programme on HIV/AIDS.
90-90-90: an ambitious treatment target to help end the AIDS epidemic
Association of pharmacogenetic markers with premature discontinuation of first-line anti-HIV therapy: an observational cohort study
Association between presence of HLA-B*5701, HLA-DR7, and HLA-DQ3 and hypersensitivity to HIV-1 reverse-transcriptase inhibitor abacavir
Genetic variations in HLA-B region and hypersensitivity reactions to abacavir
Abacavir induces loading of novel self-peptides into HLA-B*57:01: an autoimmune model for HLA-associated drug hypersensitivity
Predictive value of known and novel alleles of CYP2B6 for efavirenz plasma concentrations in HIV-infected individuals
Pharmacogenetics-based population pharmacokinetic analysis of efavirenz in HIV-1-infected individuals
Pharmacogenetics of efavirenz and central nervous system side effects: an Adult AIDS Clinical Trials Group study
In vivo analysis of efavirenz metabolism in individuals with impaired CYP2A6 function
Pharmacogenetics-based population pharmacokinetic analysis of etravirine in HIV-1 infected individuals
ADME pharmacogenetics: investigation of the pharmacokinetics of the antiretroviral agent lopinavir coformulated with ritonavir
Pharmacogenetics of cytochrome P450 2B6 (CYP2B6): advances on polymorphisms, mechanisms, and clinical relevance
How the genomics revolution could finally help Africa
Comparative efficacy and safety of first-line antiretroviral therapy for the treatment of HIV infection: a systematic review and network meta-analysis
Coronary heart disease in HIV-infected individuals
Ischemic heart disease in HIV-infected and HIV-uninfected individuals: a population-based cohort study
Increased risk of myocardial infarction in HIV-infected patients in France, relative to the general population
Atherogenic dyslipidemia in HIV-infected individuals treated with protease inhibitors.
The Swiss HIV cohort study
The burden of dialysis-requiring acute kidney injury among hospitalized adults with HIV infection: a nationwide inpatient sample analysis
Association of tenofovir exposure with kidney disease risk in HIV infection
Combination antiretroviral therapy is associated with reduction in liver fibrosis scores in HIV-1-infected subjects
Impact of single nucleotide polymorphisms and of clinical risk factors on new-onset diabetes mellitus in HIV-infected individuals
Contribution of genetic background, traditional risk factors, and HIV-related factors to coronary artery disease events in HIV-positive persons
Contribution of genetic background and data collection on adverse events of anti-human immunodeficiency virus (HIV) drugs (D:A:D) clinical risk score to chronic kidney disease in Swiss HIV-infected persons with normal baseline estimated glomerular filtration rate
Rapid progression of kidney dysfunction in Swiss people living with HIV: contribution of polygenic risk score and D:A:D clinical risk score
Genetic architecture of cardiometabolic risks in people living with HIV
Genomics is failing on diversity
DNA databases are too white: this man aims to fix that
Imputation of KIR types from SNP variation data
HLA-HD: an accurate HLA typing algorithm for next-generation sequencing data
Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype
Accurate and efficient KIR gene and haplotype inference from genome sequencing reads with novel K-mer signatures
HIV-1 proviral DNA excision using an evolved recombinase
RNA-directed gene editing specifically eradicates latent and prevents new HIV-1 infection
Genetic variation in IL28B predicts hepatitis C treatment-induced viral clearance
Genetic variation in IL28B and spontaneous clearance of hepatitis C virus
Genome-wide association analyses identifies a susceptibility locus for tuberculosis on chromosome 18q11.2
Malaria Genomic Epidemiology Network.
A novel locus of resistance to severe malaria in a region of ancient balancing selection
Inborn errors of type I IFN immunity in patients with life-threatening COVID-19
Population structure in genetic studies: confounding factors and mixed models

The authors contributed equally to all aspects of the article.
The authors declare no competing interests.