key: cord-1002421-pxgjx7wu authors: MacGowan, Stuart A.; Barton, Michael I.; Kutuzov, Mikhail; Dushek, Omer; van der Merwe, P. Anton; Barton, Geoffrey J. title: Missense variants in human ACE2 modify binding to SARS-CoV-2 Spike date: 2021-05-21 journal: bioRxiv DOI: 10.1101/2021.05.21.445118 sha: 60236e00e3badc8c69479243e5fd98c397de4024 doc_id: 1002421 cord_uid: pxgjx7wu SARS-CoV-2 infection begins with the interaction of the SARS-CoV-2 Spike (Spike) and human angiotensin-converting enzyme 2 (ACE2). To explore whether population variants in ACE2 might influence Spike binding and hence infection, we selected 10 ACE2 variants based on affinity predictions and prevalence in gnomAD and measured their affinities for Spike receptor binding domain through surface plasmon resonance (SPR). We discovered variants that enhance and reduce binding, including two variants with distinct population distributions that enhanced affinity for Spike. ACE2 p.Ser19Pro (ΔΔG = ± 0.59 0.08 kcal mol−1) is often seen in the gnomAD African cohort (AF = 0.003) whilst p.Lys26Arg (ΔΔG = 0.26 0.09 kcal mol−1) is predominant in the Ashkenazi Jewish (AF = 0.01) and European non-Finnish (AF = 0.006) cohorts. Carriers of these alleles may be more susceptible to infection or severe disease and these variants may influence the global epidemiology of Covid-19. We also identified three rare ACE2 variants that strongly inhibited (p.Glu37Lys, ΔΔG = −1.33 ± 0.15 kcal mol−1 and p.Gly352Val, predicted ΔΔG = −1.17 kcal mol−1) or abolished (p.Asp355Asn) Spike binding. These variants may confer resistance to infection. Finally, we calibrated the mCSM-PPI2 ΔΔG prediction algorithm against our SPR data, give new predictions for all possible ACE2 missense variants at the Spike interface and estimate the overall burden of ACE2 variants on Covid-19 phenotypes. The COVID-19 pandemic is one of the greatest global health challenges of modern times. Although the disease caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is usually cleared following mild symptoms, it can progress to serious illness and death 1 . Besides the clear risks associated with age and comorbidities 2,3 , there could be a genetic component that predisposes some individuals to worse outcomes 4 . Genetic association studies have already identified several loci involved in Covid-19 risk 5 . Identifying further genetic factors of COVID-19 susceptibility has implications for clinical decision making and epidemic dynamics. Genetic variation may constitute hidden risk factors and, in some cases, explain why otherwise healthy individuals in low-risk groups experience severe disease. The identification of specific genetic variants that influence the severity and progression of COVID-19 presents the opportunity for predictive diagnostics, early intervention and personalised treatments whilst the population distribution of such variants could contribute to population specific risk. Human angiotensin-converting enzyme 2 (ACE2) is the host cell receptor that SARS-CoV-2 exploits to infect human cells 1, 6 . As this is the same receptor used by the SARS coronavirus (SARS-CoV) that caused the SARS outbreak in 2002, the detailed body of knowledge built around SARS-CoV infection is relevant to understanding SARS-CoV-2 1,6,7 . The spike glycoprotein (Spike) is the coronavirus entity that recognises and binds host ACE2. Both SARS coronavirus Spikes include an S1 domain that contains ACE2 recognition elements and an S2 domain that is responsible for membrane fusion 6 . Spike is primed for cell fusion by cleavage with host furin 8 and TMPRSS2 6 , in SARS-CoV cleavage by TMPRSS2 is thought to be promoted upon formation of the ACE2 Spike complex 9 . The S1 receptor binding domains (RBDs) from both SARS-CoV 10 and SARS-CoV-2 11 have been co-crystallised with human ACE2. The RBDs from both viruses are similar in overall architecture and interface with roughly the same surface on ACE2. Differences are apparent in the so-called receptor binding motif, which is the region of the RBD responsible for host range and specificity of coronaviruses [10] [11] [12] . The binding affinity of Spike and ACE2 is known to be correlated to the infectivity of SARS-CoV and is determined by the complementarity of the interfaces 10, 12 . However, despite its essential role in infection, risk variants in ACE2 have not been conclusively identified in genetic association studies. Missense variants located in protein-protein interaction interfaces can affect altered binding characteristics 13, 14 and in the context of virus-host interactions, have been shown to effect susceptibility 15 . This indicates the potential of missense variants in ACE2 to alter Spike binding and therefore influence a key step in SARS-CoV-2 infection. This is also suggested by the fact that the host range of coronaviruses is partly determined by the complementarity of the Spike receptor binding motif and the target hosts' ACE2 sequence 10, 12 . A few studies 4, [16] [17] [18] [19] [20] [21] have addressed this question and gave rise to conflicting conclusions regarding the effects of specific variants on the interaction affinity and their relevance to the pandemic generally. The strongest of these used data from a published deep mutagenesis binding screen 22 to assess the effect of ACE2 population variants on Spike affinity and confirmed the effects of five key variants with further biochemical assays 17 . In our own previous contribution 23 , we employed the mCSM-PPI2 protein-protein interaction affinity prediction algorithm 13 to assess the effects of ACE2 variants on the binding of SARS-CoV-2 Spike and predicted that three reported ACE2 variants would strongly inhibit or abolish binding (p.Asp355Asn, p.Glu37Lys and p.Gly352Val). Confidence was given to these predictions by the performance of mCSM-PPI2 13 in comparison to binding data for 26 ACE2 mutants in complex with SARS-CoV Spike RBD 12 . Here we report experimental binding affinities of 10 ACE2 variants for isolated SARS-CoV-2 Spike RBD determined via surface plasmon resonance (SPR). These results give better insight into the effect of ACE2 variants on SARS-CoV-2 Spike binding, revealing additional variants that enhance Spike binding and also test the quality of our predictions. The SPR data also allowed us to recalibrate the mCSM-PPI2 predictions to provide more accurate estimates of the effect of interface variants that we did not test experimentally. Figure 1 highlights the mutated residues on the structure of ACE2 11 in complex with SARS-CoV-2 Spike. We determined the binding affinity of 10 ACE2 mutants for isolated SARS-CoV-2 Spike RBD via surface plasmon resonance (SPR) to identify variants that may influence an individual's response to infection. Nine of these mutants were selected from the 241 ACE2 missense variants reported in gnomAD 24 on the basis of our previous computational predictions 23 and their reported prevalence, whilst the tenth mutant (p.Thr27Arg) was predicted to enhance Spike binding more than any other possible mutation at the interface. Table 1 presents experimentally determined ΔΔG and mCSM-PPI2 13 predictions for the 10 ACE2 mutants together with predicted data for a further three variants, alongside variant population frequencies and RBD interacting residues. Our SPR data were collected in two batches. The first batch comprised four variants, two were predicted to strongly reduce or abolish Spike binding (p.Glu37Lys and p.Asp355Asn) and two predicted to enhance Spike binding (p.Gly326Glu and Thr27Arg) 23 . The SPR measurements showed strongly reduced binding for p.Glu37Lys and the total abolition of binding for p.Asp355Asn (within the concentration range assayed), in agreement with the predictions. In contrast, p.Gly326Glu and Thr27Arg, which had been predicted to enhance binding, displayed decreased and slightly decreased binding, in disagreement with the predictions. These discrepancies motivated a second set of SPR measurements that included six of the most prevalent ACE2 variants close to the Spike binding site. Surprisingly, the two most common ACE2 variants tested bound SARS-CoV-2 Spike more strongly than reference ACE2. These were p.Lys26Arg (ΔΔG = 0.26 ± 0.09), which has the highest allele frequency of ACE2 variants near the Spike interface, and p.Ser19Pro (ΔΔG = 0.59 ± 0.09) that has the second highest frequency. Two other variants in this series also increased Spike binding (p.Phe40Leu and p.Pro389His) despite being over 8 Å from the closest Spike residue. Finally, p.Glu329Gly may cause a slight reduction in binding (ΔΔG = -0.09 ± 0.09) whilst p.Glu35Lys had an inhibitory effect (ΔΔG = -0.36 ± 0.09). These results show that ACE2 variants can both enhance and inhibit Spike binding, properties that may reasonably be associated with susceptibility and resistance phenotypes to SARS-CoV-2 infection. 11 . Spike residues: Spike residues within 5 angstrom of mutant site (or closest if no residues are in this range). mCSM-PPI2 ΔΔG: The predicted ΔΔG in kcal mol -1 for the missense mutation calculated by mCSM-PPI2 13 with PDB 6vw1 as the model structure. Recalibrated mCSM-PPI2 ΔΔG: adjusted mCSM-PPI2 ΔΔG following recalibration with SPR data ( §2.4). SPR ΔΔG: ΔΔG determined by SPR assay. Max. prevalence: The allele frequency for the gnomAD continental population with the highest frequency (see Table 4 for all population frequencies and definitions of all gnomAD cohort abbreviations). a. p.Thr27Arg was not reported in gnomAD and was selected as it had the highest predicted increased ΔΔG of any possible ACE2 mutation at the Spike interface. b. No binding was observed for ACE2 p.Asp355Asn, this value corresponds to the calculated maximum affinity that is consistent with this observation. Figure 3 illustrate the mCSM-PPI2 provided models of p.Ser19Pro and p.Lys26Arg. p.Lys26Arg was not predicted to make any new well-defined contacts with Spike residues but it is predicted to adopt a conformation that extends toward Spike residues Y473 and F456, coming within 7.8 Å and 6.1 Å of these residues' aromatic rings, respectively, slightly beyond typical amino-aromatic interaction distances 27 but potentially favourable when dynamics are considered. Similarly, p.Ser19Pro also does not introduce any new Spike contacts according to the mCSM-PPI2 model and so the enhanced affinity is difficult to explain but it is has been suggested that the Pro mutant stabilises the helix to favour Spike interaction 22 . ACE2 p.Ser19Pro is of further interest because of its proximity to Spike Ser477, which has mutated to Asn in circulating SARS-CoV-2 strains and these ACE2 and RBD variants have been found to interact 28 . The mCSM-PPI2 structural models of the p.Asp355Asn, p.Glu37Lys and p.Gly352Val were discussed in detail in our previous work 23 . In summary, p.Asp355Asn was predicted to introduce a number of steric clashes at the interface, whilst p.Glu37Lys abolished an H-bond between ACE2 Glu37 and RBD Tyr505. The enhanced binding by p.Lys26Arg and p.Ser19Pro is particularly interesting since this may effect carrier susceptibility or vulnerability toward SARS-CoV-2 infection and they have relatively high prevalence in the gnomAD 24 populations. p.Lys26Arg is the most common missense variant in the ACE2 ectodomain (Total allele count = 797, Total allele frequency = 0.004) and is predominant in the Ashkenazi Jewish cohort (ASJ AF = 0.01) and the European (non-Finnish) population (NFE, AF = 0.006). Amongst NFE sub-populations, it is most prevalent in north-western Europeans (AF = 0.007) and least prevalent in southern Europeans (AF = 0.003) and Estonians (AF = 0.003). p.Lys26Arg was also observed within this frequency range in Latino/Admixed American (AMR, AF = 0.003) samples. The variant is less frequent in Finnish (FIN, AF = 0.0005), African/African-American (AFR, AF = 0.001), South Asian (0.001) and, especially, East Asian (0.00001) samples. Interestingly, gnomAD reports a second variant at this site, p.Lys26Glu, suggesting that the position is especially tolerant to mutation. p.Ser19Pro is the next most common ACE2 missense variant in proximity to the Spike binding site (AC = 64, AF = 0.0003) and has the highest positive ΔΔG of all those tested (ΔΔG = 0.59 ± 0.03). This variant is practically unique to the African/African-American gnomAD population (AFR, AC = 63, AF = 0.003). The only other observation of this variant in gnomAD is in a single heterozygote in the labelled "Other" cohort. As a result of these distinct population distributions, it is possible that these variants could contribute to some of the observed epidemiological 30 variation between populations and ethnic groups. Given the prevalence of these variants we checked for their occurrence in recent GWAS studies on Covid-19 related phenotypes. Table 2 presents GWAS association results for ACE2 p.Lys26Arg from the Covid-19 Host Genetics Initiative (HGI) 31 . These data show some consistency with the hypothesis that p.Lys26Arg contributes additional risk for more severe Covid outcomes, whilst not affecting the likelihood of infection. In the very severe respiratory confirmed Covid vs. population contrast (study A2), a non-statistically significant increased risk was reported (β = 0.38 ± 0.24, p = 0.12, p-het = 0.74). Other Covid HGI data summaries for contrasts testing for alleles associated with the risk of SARS-CoV-2 infection suggest the variant does not play a role in infection acquisition. Covid vs. lab/self-reported negative (β = -0.18 ± 0.14, p = 0.21, p-het = 0.92); Covid vs. population (β = -0.03 ± 0.11, p = 0.81, p-het = 0.65) and predicted Covid from self-reported symptoms vs. predicted or selfreported non-Covid (β = -0.10 ± 0.24, p = 0.67, p-het = 0.08). Although none of these tests achieved genome wide significance, it would be worthwhile to reassess the data after controlling for other loci with the greatest effect sizes, sex or other appropriate stratifications. Our SPR data provide accurate readouts of the effect of ACE2 variants on Spike binding, but we could not feasibly carry out experiments for all possible ACE2 mutations at the interface. For variants we have not studied, predictions and other high-throughput datasets can be useful, so long as their applicability and limitations are well-understood. In our previous work 23 , we calibrated predictions against relative binding data for ACE2/SARS-CoV Spike, we now improve this by recalibrating the mCSM-PPI2 predictions with our ACE2/SARS-CoV-2 Spike SPR dataset. Figure 4 compares experimental and mCSM-PPI2 13 predicted ΔΔG. The eight mutants with predicted ΔΔG < 0 kcal mol -1 are linearly correlated with the experimental ΔΔG (R 2 = 0.91, p = 0.0006), whilst the two mutants with predicted ΔΔG > 1 kcal mol -1 (p.Gly326Glu and Thr27Arg) do not follow this behaviour. The absence of detectable binding for p.Asp355Asn and the strongly reduced binding observed for p.Glu37Lys, support our original determination that predicted ΔΔG < -1 kcal mol -1 was a reliable indicator of inhibitory variants. The difficulty mCSM-PPI2 has with affinity enhancing variants is not altogether surprising since our original calibration showed poorer performance for these variants 23 , which may be because there were fewer experimentally determined affinity enhancing variants in the algorithm's training data. Structurally, although the mCSM-PPI2 model of p.Gly326Glu predicted new contacts with Spike 23 , it is possible that reduced torsional flexibility at mutant Glu326 has other effects that were not recognised (a similar argument applies for p.Ser19Pro but here the error is small). The subtler affinity changes measured for the five variants with predicted ΔΔG between −0.5 and 0.0 kcal mol -1 , validate our original ambiguity towards those predictions, but the strong linear correlation within this narrow range is encouraging. Taken together, these observations suggest that predictions within the range of p.Glu36Lys (ΔΔG pred = −1.2 kcal mol -1 ) and p.Lys26Arg (ΔΔG pred = 0.0 kcal mol -1 ) may be rescaled to provide a better estimate of actual ΔΔG. Recalibrated ΔΔG yields correct predictions for all eight mutations with negative predicted ΔΔG and measurable ΔΔG from SPR (Table 1) . This lends additional confidence in our prediction that ACE2 p.Gly352Val strongly reduces Spike binding (in agreement with published deep mutagenesis experiments 22 ). Existing human variation datasets are well-powered to detect common variation (1KG was estimated to detect >99% SNPs with MAF >1% 32 and gnomAD is substantially larger) in the sampled populations but they are far from comprehensive with respect to rare variation 24 . Rare variants in ACE2 that influence Spike binding could have implications for the epidemiology of COVID-19 in addition to the consequences for affected individuals. If there were 10 such variants with an allele frequency of 1 in 50,000, their collective occurrence might be as high as 1 in 5,000 (discounting linkage) and when this is considered alongside the possibility that a high proportion of the global population will be exposed to SARS-CoV-2 it becomes clear that such effects should be investigated. These variants could even be present at significant frequencies in populations missing or underrepresented in gnomAD. Figure 5 illustrates the distribution of recalibrated mCSM-PPI2 ΔΔG predictions (ΔΔG recal ) for all 475 possible ACE2 mutations at 25 ACE2 residues close to the Spike interface and the subset that are accessible via a single base change of the ACE2 coding sequence (these are more likely to be present in human populations than those requiring multiple substitutions). Most of these mutations are predicted to have only a slight effect on Spike binding, but there is a secondary mode below −1.0 kcal mol -1 with 126 mutations predicted to lead to strongly reduced binding. Fewer variants received high positive ΔΔG recal scores: 17 had ΔΔG recal > 1.0 kcal mol -1 , a further 44 had ΔΔG recal > 0.5 kcal mol -1 and an additional 70 had ΔΔG recal > 0.2 kcal mol -1 (i.e., ΔΔG recal > p.Lys26Arg). A similar pattern is observed for the 151 mutants corresponding to 172 single nucleotide variants of the ACE2 coding sequence ( Figure 5B ). These results suggest that random novel ACE2 missense variants at these loci can inhibit or enhance Spike binding, but are slightly more likely to be inhibitory, meaning that diversity at these positions might typically be beneficial and provide some resistance to infection. Notably, these recalibrated predictions do not display the same bias toward negative ΔΔG that the raw mCSM-PPI2 ΔΔG predictions did 23 (Supplementary Figure 1) , which may be another indicator of the improved quality of these recalibrated predictions. Even though most of these variants are not reported in gnomAD, they may still occur within the populations represented, especially if they occur at frequencies that are poorly detected. With this in mind, we calculated allele frequencies for these mutations that are compatible with their absence from gnomAD (see Methods) in order to gain a better idea of how widespread the effects of these variants might be. Table 3 presents estimated allele frequencies for novel variants that are predicted to inhibit and enhance Spike binding at varying ΔΔG recal thresholds. Upper bounds for their joint frequencies assuming that they occur at lower frequencies than singleton variants suggest frequency bounds that range 1 in 1,000 to 1 in 2,000 variants per allele for inhibitory variants, and 1 in 1,500 to around 1 in 5,000 for enhancer variants. A second approach, which takes account of the empirical detection of rare variants in ACE2, yields frequencies that span from 1 in 6,250 to 1 in 12,195 for inhibitory variants, and 1 in 8,333 to 1 in 37,037 for enhancer variants. These estimates show that novel variants in ACE2 with any weak Spike affinity phenotype could plausibly be as common as 1 in 3,571 alleles (calculated as the sum of the highest inhibitor and enhancer prevalence's), but the strongest affinity phenotypes are more likely to occur in frequency ranges akin to rare genetic diseases. It should be remembered that these values are calculated to be compatible with their absence from the gnomAD dataset, but it remains possible that some or all of these variants do not exist at all or that they are very common in one or more populations not represented in gnomAD. If this applies to SARS-CoV-2, ACE2 variants that enhance Spike binding could increase viral spreading in carriers due to increased cellular tropism. Greater viral spreading is associated with clinical deterioration 34 and could cause increased infectiousness, akin to the enhanced transmissibility of SARS-CoV-2 vs. SARS-CoV, which is associated with increased viral loads in the upper respiratory tract 6, 35 and also correlates with enhanced receptor affinity 11 . The importance of tuned Spike-receptor affinities in SARS-CoV-2 may be enhanced because of the relatively low surface density of Spike on SARS-CoV-2 virions, which may lead to a weak avidity effect 36 . Anecdotal evidence is also provided by Spike RBD mutations, such as N501Y, that enhance ACE2 affinity 37 and are associated with increased transmissibility 38 . Again, infectivity studies are needed but here the prevalence of p.Lys26Arg and p.Ser19Pro may allow the observation of an effect in future Covid-19 GWAS studies. Some infectivity data for SARS-CoV-2 towards cells expressing ACE2 variants are available, including variants p.Ser19Pro and p.Lys26Arg 16 . Surprisingly, SARS-CoV-2 pseudotypes showed slightly decreased infectivity towards HEK293T cells expressing ACE2 p.Lys26Arg, and no difference compared to reference ACE2 was found for the susceptibility of cells expressing ACE2 p.Ser19Pro 16 . Although these results suggest that affinity is not directly proportional to infectivity in a single cell type, they do not exclude the possibility that these variants allow increased cellular tropism as discussed above. The same study also reported no effect on Spike affinity caused by these variants 16 , which is probably due to the low sensitivity of the cell based binding assay employed, since enhanced binding was also found in a different cell based assay 22 . Indeed, we are especially confident in the accuracy of our ΔΔG measurements since our results are consistent with published deep mutagenesis binding data 22 (Supplementary Figure 2 ). In addition, another recent report agreed with our findings that the common ACE2 variants p.Ser19Pro and p.Lys26Arg enhanced affinity for Spike whilst p.Glu37Lys inhibits binding 17 . Other considerations relevant to the effect of ACE2 Spike affinity variants on carriers include ACE2 carboxypeptidase activity 39 , which may be modulated by the specific binding affinity, the effect of hemizygosity and sex differences in Covid-19 outcomes, and the interplay of affinity variants and ACE2 expression levels. The importance of ACE2 expression was mentioned earlier but it is worth highlighting that it is known to partly determine the cellular specificity of SARS-CoV 40 and was explored as a potential factor in COVID-19 susceptibility and severity, including the interaction between ACE2 variants and ACE2 stimulating drugs 41 . Also, heterozygotes express a proportion of ACE2 alleles whilst hemizygotes carry only a single ACE2 allele so that ACE2 Spike affinity variants ought to always show greater penetrance in hemizygotes, for better or worse. In contrast and since ACE2 escapes complete X-inactivation 20 , heterozygotes have the advantage that the more resistant ACE2 allele could become dominant in infected tissues due to selection over cellular infection cycles, gradually increasing the prevalence of the more Spike resistant ACE2 allele. Besides the possible benefit against infection, there could be other implications for heterozygotes depending on the persistence X-inactivation bias and the nature of any hitchhiking alleles. Table 4 presents detailed allele frequencies from gnomAD 24 for the ACE2 variants investigated in this work. The most prevalent ACE2 variant mediated Covid-19 phenotypes are likely to arise from the two relatively common variants found to enhance Spike binding, p.Ser19Pro and p.Lys26Arg. These variants were found in 3 in 1,000 individuals in the gnomAD African/African-American (AFR) samples and 7 in 1,000 North Western non-Finnish European samples (NW-NFE), respectively. p.Lys26Arg was also observed in other populations, including 1 in 1,000 AFR samples, adding further burden to this cohort. The high affinity displayed by ACE2 p.Ser19Pro is concerning in context with the reported disproportionate impact of Covid-19 on black and minority ethnic groups 30 . Further research to identify the impact of these variants on SARS-CoV-2 pathogenesis should be prioritised with urgency and existing and future GWAS should investigate these variants more closely with appropriate stratifications. Our analysis of the effects of all possible missense mutations at the ACE2 Spike interface predicts many other potential novel variants that effect the Spike interaction, but even though a relatively high proportion of possible variants are predicted to effect Spike binding (up to 62%), their collective frequency could be as little as 2.8 in 10,000. These frequencies are comparable to those that might be expected for variants that cause rare diseases. Although these data suggest that ACE2 alleles with large positive or negative ΔΔG are extremely rare it is important to note that allele frequencies show significant variation even between large variation datasets 42 . Moreover, it is known that local population allele frequencies can vary substantially from those reported in public datasets 43 and it is therefore possible that some populations have risk or protective mutations at higher frequencies. It is also worth highlighting that the substantially increased Spike affinities of the two most common variants tested may indicate the presence of a past selective effect occurring twice independently, which may increase the possibility of other higher frequency variants amongst populations not well represented by gnomAD. These frequencies might be useful for modellers to assess the role of rare ACE2 variants with Spike binding phenotypes on the epidemiology of Covid-19. For example, whether rare ACE2 variants that inhibit Spike binding contribute to the prevalence of asymptomatic carriers or if rare ACE2 variants with enhanced Spike binding can account for instances of particularly severe disease in the absence of typical comorbidities. In summary, we report the binding affinities of 10 ACE2 variants with SARS-CoV-2 Spike RBD. We found that ACE2 p.Ser19Pro and p.Lys26Arg, two of the most common ACE2 missense variants in gnomAD 24 , substantially increase the affinity and are therefore possible risk factors for COVID-19. These variants have distinct distributions across the gnomAD cohorts and could therefore contribute population-specific risk. Two additional rare variants were identified that also enhanced Spike binding, although to a lesser extent. We confirmed our previous predictions that ACE2 p.Glu37Lys and p.Asp355Asn strongly inhibit Spike binding and are therefore potentially protective against COVID-19. We also identified two further rare variants that inhibited binding, including p.Gly326Glu, which we had previously incorrectly predicted would enhance binding. The SPR affinity data were used to recalibrate the mCSM-PPI2 13 algorithm to provide improved affinity predictions for all possible ACE2 missense variants that interact with Spike and we estimated the prevalence of novel Spike affinity variants in ACE2. A key point in these burden assessments is the separate consideration of variants predicted to inhibit and enhance Spike affinity since these may have distinct phenotypes. Overall, p.Ser19Pro and p.Lys26Arg are still expected to have the most widespread effects, with the joint prevalence of the rare affinity variants substantially lower, but in all cases the possibility of higher prevalence in local or underrepresented populations remains and the penetrance of each variant will be a key factor. This work has applications in helping to prioritise experimental work into SARS-CoV-2 Spike human ACE2 recognition; developing genetic diagnostic risk profiling for COVID-19 susceptibility and severity, improving detection and interpretation in future COVID-19 genetic association studies and in understanding SARS-CoV-2 Spike evolution and host adaptation. The ACE2 construct was kindly provided by Ray Owens at the Oxford Protein Production Facility-UK. The RBD construct was kindly provided by Quentin Sattentau at the Sir William Dunn School of Pathology. ACE2 point mutations were added using Agilent QuikChange II XL Site-Directed Mutagenesis Kit following the manufactures instructions. The primers were designed using the Agilent QuikChange primer design web program. Cells were grown in FreeStyle™ 293 Expression Medium (12338018) in a 37 °C incubator with 8% CO2 on a shaking platform at 130 rpm. Cells were passaged every 2-3 days with the suspension volume always kept below 33.3% of the total flask capacity. The cell density was kept between 0.5 and 2 million per ml. Cells were counted to check cell viability was above 95% and the density adjusted to 1. The protein was eluted from the Ni-NTA Agarose with elution buffer (50 mM NaH2PO4, 300 mM NaCl and 250 mM imidazole at pH 8). The protein was concentrated, and buffer exchanged into size exclusion buffer (25 mM NaH2PO4, 150 mM NaCl at pH 7.5) using a protein concentrator with a 10,000 molecular weight cut-off. The protein was concentrated down to less than 500 μl before loading onto a Superdex 200 10/300 GL size exclusion column. Fractions corresponding to the desired peak were pooled and frozen at -80 °C. Samples from all observed peaks were analysed on an SDS-PAGE gel (Supplementary Figure 3) . SARS-CoV-2 receptor binding domain binding to human extracellular ACE2 were analysed on a Biacore T200 instrument (GE Healthcare Life Sciences) at 37°C and a flow rate of 30 µl/min. Running buffer was HBS-EP (BR100669). Streptavidin was coupled to a CM5 sensor chip (29149603) using an amine coupling kit (BR100050) to near saturation, typically 10000-12000 response units (RU). Biotinylated ACE2 WT and variants were injected into the experimental flow cells (FC2-FC4) for different lengths of time to produce desired immobilisation levels (600-700 RU). FC1 was used as a reference and contained streptavidin only. Excess streptavidin was blocked with two 40 s injections of 250 µM biotin (Avidity). Before RBD injections, the chip surface was conditioned with 8 injections of the running buffer. A dilution series of RBD was then injected simultaneously in all FCs. Buffer was injected after every 2 or 3 RBD injections. The lowest RBD concentration was injected at the beginning and at the end of each dilution series to ensure reproducibility. The length of all injections was 30 s, and dissociation was monitored from 180-300 s. Binding measured in FC1 was subtracted from the other three FCs. Additionally, all binding and dissociation data were double referenced using the closest buffer injections 44 . In all experiments, an ACE2specific antibody (NOVUS Biologicals, AC384) was injected at 5 µg/ml for 10 minutes with the disassociation monitored for 10 minutes (Supplementary Figure 4) . Only ACE2 T27R did not bind AC384 as expected but since this mutant displays RBD binding comparable to WT ACE2, this most likely indicates direct inhibition of AC384 binding rather than the presence of unfolded protein. Double referenced binding data was plotted and fit with GraphPad Prism (Supplementary Figure 5) . To find the equilibrium KD (dissociation constant) the association phase was fit with a One−phase association model and the plateau binding measurements were extracted and plotted against the corresponding concentration of RBD. This plot was then fit with a One−site specific binding model below and the value for the equilibrium KD extracted. To convert KD values to ΔG the equation below was used. Where R is the gas constant measured in cal mol -1 K -1 and T is the temperature measured in K. A ΔΔG value could then be found for each mutant by subtracting the ΔG of the WT from the ΔG of each mutant. For ACE2 variant D355N binding was too poor to fit accurately therefore, an estimate for the lower limit for the KD was calculated using the formula below. Where the "Maximum [RBD]" is the highest concentration of RBD flown over the surface, the "Binding at KD (WT)" is the RU value at the equilibrium KD for the WT protein on the same chip and the "Binding at maximum [RBD] (variant)" is the maximum RU value for the RBD binding D355N at the highest concentration. The estimated KD lower limit could then be converted into ΔG and then ΔΔG using the same method above. The pyDRSASP suite 45 was used to integrate 3D structure, population variant and mutagenesis assay data for analysis. Population variants from gnomAD v2 24 were mapped to ACE2 with VarAlign 45 . Residue mappings were derived from the Ensembl VEP annotations present in the gnomAD VCF. In addition, we manually checked the gnomAD multi nucleotide polymorphisms (MNPs) data file and found no records for ACE2. The structure of chimeric SARS-CoV-2 Spike receptor binding domain in complex with human ACE2 (PDB ID: 6vw1) 11 was downloaded from PDBe. Residue-residue contacts were calculated with ARPEGGIO 46 . These operations were run with our ProIntVar 45 Python package, which processes all these data into conveniently accessible Pandas DataFrames. ACE2 Spike interface residues were defined as those with any interprotein interatomic contact (Supplementary Table 2 ). The mCSM-PPI2 13 web server was used to predict the effect of mutations on the SARS-CoV-2 Spike-ACE2 interface topology and binding affinity with the structure PDB ID: 6vw1 11 according to our previous protocol 23 . The SPR determined ΔΔG (Kd-plateau) were regressed against the mCSM-PPI2 prediction, restricting the regression to the variants with negative predicted ΔΔG, with the lm function in R 47 . Recalibrated scores were calculated with the predict.lm function and applied to ACE2 variants within 10 angstroms of Spike. The ACE2 gene (ENSG00000130234) was retrieved from Ensembl in Jalview 25 . Two identical CDS transcripts (ENST00000427411 and ENST00000252519) were found with the Get Cross-References command corresponding to ACE2 full-length proteins (ENSP00000389326 and ENSP00000252519). These correspond to the UniProt ACE2 sequence Q9BYF1. The CDS was saved in Fasta format and this was parsed in Python with Biopython. The CDS was broken into codons and all possible single base changes were enumerated and translated using the standard genetic code. This provided the set of amino acids accessible to each residue via a single base change. Upper bound from gnomAD singleton frequency (Minimum reportable frequency) Upper bounds for the total prevalence of potential Spike affinity variants were calculated based on the conservative assumption that novel variants must occur at lower frequencies than the minimum reportable variant frequency (i.e., minimum singleton frequency) in gnomAD. The theoretical minimum reportable frequency is the allele frequency of a singleton variant at a site where all samples have been effectively called. For alleles on the X chromosome, the proportion of XX and XY samples is important since the number of alleles sequenced is 2 #$%&'$ + %&'$ . Practically, many reported singleton frequencies are greater than this theoretical minimum because at a given loci not all samples have sufficient sequence data quality to be called. Therefore, the minimum reportable frequency varies by genomic position, as well as population. Since gnomAD reports the allele number (AN) only for variant sites, we used the maximum observed allele number in ACE2. For example, the maximum allele number in ACE2 corresponding to non-Finnish European samples is 80,119 (AN_NFE = 80,119). This yields a minimum observed variant frequency in ACE2 amongst non-Finnish Europeans in gnomAD (v2 exomes) of 1/80,119 = 1.2 × 10 -5 , or 2.5 variants per 200,000 alleles. The total prevalence is then found by multiplying this value by the number of SNPs being considered. Empirical detection rate of rare ACE2 variants and the affinity active ratio based estimate Our second approach to estimate plausible frequencies of novel variants (P) was to project the empirical detection rate of rare ACE2 variants in gnomAD (k) onto the sites that correspond to the Spike interface (n), and then adjust this by the proportion of variants that are predicted to effect Spike binding (a) so that, = a For example, there are 1,181 variant alleles in the 56,885 non-Finnish European cohort (AN_NFE = 80,119) arising from variants with AF < 0.01 in the 2,415 nucleotide ACE2 coding sequence (n.b. at this AF threshold, 90 % are missense). This corresponds to a variation rate k = 6.1 × 10 -6 variants per nucleotide per allele. Projecting rate k onto the 25 ACE2 sites considered (n = 75 nucleotides) we find kn = 4.6 × 10 -4 variants per allele, or 91.6 variants per 200,000 alleles. Note that this should be a conservative estimate of variability at these sites, since surface residues tend to be more variable than the core or other functional regions of the protein 48 , unless the site is under specific selection. The proportion of SNPs that are defined to have an affinity phenotype is dependent on the ΔΔG recal threshold. For example, there are 38 substitutions corresponding to 41 single nucleotide variants out of the a possible 225 that are predicted to inhibit binding with ΔΔG recal < −1.0 kcal mol -1 (Table 3 ) so that a = 41 / 225 = 0.18. Altogether this gives P = 8.2 10 −5 for variants with ΔΔG recal < −1.0 kcal mol -1 . This calculation could be improved (e.g., to account for missense/ synonymous ratios in a) but in its current form is suitable to provide estimates that indicate plausible orders of magnitude of these variants' prevalence as intended. Software Jalview 2.11 25 was used for interactive sequence data retrieval, sequence analysis, structure data analysis and figure generation. UCSF Chimera 26 and PyMol 29 were used for structure analysis and figure generation. The pyDRSASP 45 packages, comprising ProteoFAV, ProIntVar and VarAlign were used for data retrieval and analysis. Biopython was used to process sequence data. Data analyses were coded in R and Python Jupyter Notebooks. Numpy, Pandas and Scipy were used for data analysis. Matplotlib, Seaborn and ggplot2 were used to plot data. All code and analysis notebooks used in this study are available from the Barton Group public GitHub repository at https://github.com/bartongroup/covid19-ace2-variants. This study employed several public datasets, which are available from their original sources and all necessary identifiers and accessions are provided in the methods. Data that was compiled and calculated as part of this study are available from the BioStudies database (https://www.ebi.ac.uk/biostudies) under accession S-BSST649. Supplementary . A pneumonia outbreak associated with a new coronavirus of probable bat origin Estimates of the severity of coronavirus disease 2019: a model-based analysis Clinical Characteristics of Coronavirus Disease 2019 in China Comparative genetic analysis of the novel coronavirus (2019-nCoV/SARS-CoV-2) receptor ACE2 in different populations Genetic mechanisms of critical illness in COVID-19 SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor Receptor Recognition by the Novel Coronavirus from Wuhan: an Analysis Based on Decade-Long Structural Studies of SARS Coronavirus Loss of furin cleavage site attenuates SARS-CoV-2 pathogenesis TMPRSS2 and ADAM17 cleave ACE2 differentially and only proteolysis by TMPRSS2 augments entry driven by the severe acute respiratory syndrome coronavirus spike protein Structure of SARS coronavirus spike receptor-binding domain complexed with receptor Structural basis of receptor recognition by SARS-CoV-2 Receptor and viral determinants of SARS-coronavirus adaptation to human ACE2 mCSM-PPI2: predicting the effects of mutations on protein-protein interactions The Contribution of Missense Mutations in Core and Rim Residues of Protein-Protein Interfaces to Human Disease Human susceptibility and resistance to Norwalk virus infection Population-Specific ACE2 Single-Nucleotide Polymorphisms Have Limited Impact on SARS Human ACE2 receptor polymorphisms and altered susceptibility to SARS-CoV-2 Interaction of the spike protein RBD from SARS-CoV-2 with ACE2: Similarity with SARS-CoV, hot-spot analysis and effect of the receptor polymorphism Structural Variations in Human ACE2 may Influence its Binding with SARS-CoV-2 Spike Protein ACE2 and TMPRSS2 variants and expression as candidates to sex and country differences in COVID-19 severity in Italy ACE2 gene variants may underlie interindividual variability and susceptibility to COVID-19 in the Italian population Engineering human ACE2 to optimize binding to the spike protein of SARS coronavirus 2 Missense variants in ACE2 are predicted to encourage and inhibit interaction with SARS-CoV-2 Spike and contribute to genetic risk in COVID-19. bioRxiv Variation across 141,456 human exomes and genomes reveals the spectrum of loss-of-function intolerance across human protein-coding genes Jalview Version 2--a multiple sequence alignment editor and analysis workbench UCSF Chimera--a visualization system for exploratory research and analysis Amino-aromatic interactions in proteins Effects of common mutations in the SARS-CoV-2 Spike RBD domain and its ligand the human ACE2 receptor on binding affinity and kinetics. bioRxiv, 2021 The PyMOL Molecular Graphics System Ethnicity and covid-19 The COVID-19 Host Genetics Initiative, a global initiative to elucidate the role of host genetic factors in susceptibility and severity of the SARS-CoV-2 virus pandemic A global reference for human genetic variation Affinity thresholds for membrane fusion triggering by viral glycoproteins COVID-19: towards understanding of pathogenesis Virological assessment of hospitalized patients with COVID-2019 Structures and distributions of SARS-CoV-2 spike proteins on intact virions Reduced neutralization of SARS-CoV-2 B.1.1.7 variant by convalescent and vaccine sera Early transmissibility assessment of the N501Y mutant strains of SARS-CoV-2 in the United Kingdom High affinity binding of SARS-CoV-2 spike protein enhances ACE2 carboxypeptidase activity Cellular entry of the SARS coronavirus Are patients with hypertension and diabetes mellitus at increased risk for COVID-19 infection? Analysis of protein-coding genetic variation in 60,706 humans A geographically matched control population efficiently limits the number of candidate disease-causing variants in an unbiased whole-genome analysis Improving biosensor analysis The Dundee Resource for Sequence Analysis and Structure Prediction Arpeggio: A Web Server for Calculating and Visualising Interatomic Interactions in Protein Structures A Language and Environment for Statistical Computing. R Foundation for Statistical Computing Human Missense Variation is Constrained by Domain Structure and Highlights Functional and Pathogenic Residues. bioRxiv Supplementary Figure 2. Comparison of SPR ΔΔG with the log2 enrichment ratios from the nCoV-S-High sorts from Procko and co-workers 22 . Figure generated with R ggplot2 Supplementary Figure 3. Protein purification size exclusion chromatography and corresponding SDS-PAGE of labelled peaks C) ACE2 mutants E37K and T27R Anti ACE2 Antibody (NOVUS AC384-NBP2-80038) binding to WT and 10 ACE2 variants. A. Anti ACE2 binding to WT and four ACE2 variants: T27R, G326E, E37K and D355N. B. Binding to WT, S19P and K26R. C. Binding to WT, E35K and F40L. D. Binding to WT, E329G and P389H. Anti ACE2 was injected for 600 s association and buffer for a further 600 s for dissociation Representative binding trace for WT RBD binding immobilised WT ACE2. Injections start from lowest to highest concentration of RBD and fits are shown in blue. B. Plateau binding values from A are plotted against the concentration of RBD We thank Johannes Pettmann for help with protein expression and Anna Huhn for help with data analysis. We thank Dr Jim Procter for alerting us to the release of the structure of ACE2 in complex with SARS-CoV-2 S and his initial observations regarding variants at the ACE2-S interface. We thank the Dundee Research Computing team for supporting our IT infrastructure and remote working. This work was supported by Biotechnology and Biological Sciences Research Council Grants (BB/J019364/1 and BB/R014752/1) and Wellcome Trust Biomedical Resources Grant (101651/Z/13/Z). OD is supported by a Wellcome Trust Senior Fellowship in Basic Biomedical Sciences (207537/Z/17/Z828). SM performed the computational and population genetic analyses, compiled and analysed data, and wrote the first draft manuscript. MB conducted all wet lab experiments and compiled and analysed the associated data. GJB and SM conceived the research. All authors interpreted results and edited and reviewed the final manuscript. AvdW declares ownership of shares in BioNTech SE.