key: cord-0941910-lrw1yvhk authors: Ortuso, Francesco; Mercatelli, Daniele; Guzzi, Pietro Hiram; Giorgi, Federico Manuel title: Structural genetics of circulating variants affecting the SARS-CoV-2 spike/human ACE2 complex date: 2021-02-13 journal: Journal of biomolecular structure & dynamics DOI: 10.1080/07391102.2021.1886175 sha: 0218327e87a1dfc28e7c448bb85622cdd48bce9d doc_id: 941910 cord_uid: lrw1yvhk

SARS-CoV-2 entry into human cells is mediated by the interaction between the viral Spike protein and the human ACE2 receptor. This mechanism evolved from the ancestral bat coronavirus and is currently one of the main targets for antiviral strategies. However, several Spike protein variants currently exist in the SARS-CoV-2 population as the result of mutations, and it is unclear whether these variants exert a specific effect on the affinity with ACE2 which, in turn, is also characterized by multiple alleles in the human population. In the current study, the GBPM analysis, originally developed for highlighting host-guest interaction features, has been applied to define the key amino acids responsible for the Spike/ACE2 molecular recognition, using four different crystallographic structures. Then, we intersected these structural results with the current mutational status, based on more than 295,000 sequenced cases, in the SARS-CoV-2 population. We identified several Spike mutations interacting with ACE2 and mutated in at least 20 distinct patients: S477N, N439K, N501Y, Y453F, E484K, K417N, S477I and G476S. Among these, mutation N501Y in particular is one of the events characterizing SARS-CoV-2 lineage B.1.1.7, which has recently risen in frequency in Europe. We also identified five ACE2 rare variants that may affect interaction with Spike and susceptibility to infection: S19P, E37K, M82I, E329G and G352V. Communicated by Ramaswamy H. Sarma

The Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) emerged in late 2019 as the etiological cause of a pandemic of severe proportions dubbed Coronavirus Disease 19 (COVID-19). The disease has reached virtually every country on the globe (Hilton & Keeling, 2020), with more than 40,000,000 confirmed cases and more than 1,100,000 deaths (source: World Health Organization). SARS-CoV-2 is characterized by a 29,903-nucleotide-long single-stranded RNA genome, densely packed in 11 Open Reading Frames (ORFs); ORF1 encodes a polyprotein which is further split into 16 proteins, for a total of 26 proteins. The second ORF encodes the Spike (S) protein, which is the key protagonist in the viral entry into host cells, through its interaction with human epithelial cell receptors Angiotensin Converting Enzyme 2 (ACE2) (Tai et al., 2020), Transmembrane Serine Protease 2 (TMPRSS2) (Hoffmann et al., 2020), Furin (Xia et al., 2020) and CD147 (Ulrich & predicted events/year (Hadfield et al., 2018), the virus is in continuous evolution from the original Wuhan reference sequence (NC_045512.2) (Tang et al., 2020), and there are currently at least six major variants circulating in the population. Some of these strains are characterized by a mutation in Spike, at AA 614, where an Aspartic Acid (D) is substituted by a Glycine (G) (Sashittal et al., 2020). In fact, the Spike D614G mutation gives the name to the most frequent viral clade (G), which was first detected in Europe at the end of January 2020, and is currently present in all continents, with increasing frequency over time. D614G does not fall within the putative RBD (AA ~330-530), but some studies suggest it may have a clinically relevant role: D614G is positively correlated with increased case fatality rate (Becerra-Flores & Cardozo, 2020), and it shows increased transmissibility and infectivity compared to the reference genome (Korber, 2020).
In vitro studies show that viruses carrying the D614G Spike mutation have an increased viral load and cytopathic effect in cultured Vero cells (Tang et al., 2020). Despite these preliminary observations, there are still several doubts about the molecular effects of the D614G variant (Grubaugh et al., 2020). Other recurring Spike mutations have been observed in the population worldwide, although at frequencies of 1% or below; some of these mutations fall within the RBD and therefore may have a direct role in ACE2 interaction. On the other hand, genetic variants of ACE2 in the human population may influence susceptibility or resistance to SARS-CoV-2 infection, possibly contributing to the difference in clinical features observed in COVID-19 patients (Benetti, 2020). The ACE2 gene is located on chromosome Xp22.2 and consists of 18 exons, coding for an 805-AA-long protein exposed on the cell surface of a variety of human organs, including kidneys, heart, brain, gastrointestinal tract, and lungs (Burrell et al., 2013). It is unclear whether tissue-expression patterns of ACE2 are linked to the severity of symptoms or outcomes of SARS-CoV-2 infections; however, ACE2 levels in lungs were found to be increased in patients with comorbidities associated with severe COVID-19 clinical manifestations (Pinto, 2020), whereas polymorphisms of ACE2 have already been described to play a role in hypertension and cardiovascular diseases (Bosso et al., 2020), particularly in association with type 2 diabetes (Burrell et al., 2013), all conditions predisposing to an increased risk of dying from COVID-19 (Zheng, 2020). Despite early studies, the presence of Spike mutations potentially altering the binding with ACE2 is still largely under-investigated, as is the role of ACE2 variants in the human population in determining patient-specific molecular interactions between these two proteins.
In the present study, we aim at detecting which Spike and ACE2 AAs are the most important in determining the SARS-CoV-2 entry interaction and at analyzing which ones have already mutated in the population. The task is clinically relevant, providing a functional characterization of present and future mutations targeting the ACE2/Spike binding and detected by sequencing SARS-CoV-2 on a patient-specific basis. The variability of both proteins must be taken into consideration in the process of developing anti-COVID-19 strategies, such as the Spike-based vaccine currently deployed by the National Institute of Allergy and Infectious Diseases and Moderna (Jackson, 2020).

We set out to analyze the key AAs involved in the Spike/ACE2 interaction, in order to highlight which ones may alter the binding affinity and therefore the etiological and clinical properties of different SARS-CoV-2 variants on different patients. Following that, we determined which Spike and ACE2 AA variations relevant for this interaction have been observed in the SARS-CoV-2 and human population, respectively. We obtained structural models of the SARS-CoV-2 Spike interacting with the human ACE2 from three recent X-ray structures, deposited on the Protein Data Bank: 6LZG, 6M0J and 6VW1 (Shang et al., 2020). For 6VW1, two Spike/ACE2 complexes were available, so we report results for both separately, as 6VW1-A and 6VW1-B. All models show the core domains of interaction, located in the region of AA 330-530 for Spike and in the region AA 15-615 of ACE2. Full length proteins would be 1273 AAs (Spike only known isoform, from reference SARS-CoV-2 genome NC_045512.2) and 805 AAs (ACE2 isoform 1, UniProt id Q9BYF1-1). Selected PDB entries are wild type, and their primary sequences and higher order structures were identical. Residues 517-519 are missing in 6VW1-B.
To investigate the conformational variability, PDB complexes were aligned by backbone and the Root Mean Square deviation (RMSd) was computed on all equivalent non-hydrogen atoms. RMSd data showed some conformational flexibility, which confirmed our idea to take into account all PDB structures in the subsequent investigation (Figure 1). The GBPM method was originally developed for identifying and scoring pharmacophore and protein-protein interaction key features by combining GRID molecular interaction fields (MIFs) according to the GRAB tool algorithm (Ortuso et al., 2006). In the present study, GBPM has been applied to all selected complex models considering Spike and ACE2 either as host or guest. DRY, N1 and O GRID probes were considered for describing hydrophobic, hydrogen bond donor and hydrogen bond acceptor interactions, respectively. For each probe a cutoff, required for highlighting the most relevant MIF points, was fixed above the 30% from the corresponding global minimum interaction energy value. With respect to the known GBPM application, where pharmacophore features are used for virtual screening purposes, here these data guided us in the identification of the complex-stabilizing AAs. In fact, Spike or ACE2 residues within 3 Å from GBPM points were marked as relevant in the host-guest recognition and were qualitatively scored by assigning them the corresponding GBPM energy. If a certain residue was suggested by more than one GBPM point, its score was computed as the sum of the related GBPM point energies (Figure 2). Finally, for each selected residue, the score averaged over the four models was considered for estimating its role in complex stabilization.
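The residue scoring just described (residues within 3 Å of a GBPM point, energies summed per residue, then averaged over the models) can be illustrated with a minimal sketch. This is not the authors' GBPM code: the function names, the input layout (lists of atoms and of GBPM points with coordinates in Å), and the choice to count a residue absent from a model as zero when averaging are all assumptions made for illustration.

```python
import math
from collections import defaultdict

def score_residues(atoms, gbpm_points, cutoff=3.0):
    """Sum GBPM point energies per residue (hypothetical helper).

    atoms: list of (residue_id, (x, y, z)) for one protein.
    gbpm_points: list of ((x, y, z), energy) for one model.
    A residue receives a point's energy once if any of its atoms
    lies within `cutoff` angstroms of that point.
    """
    scores = defaultdict(float)
    for point_xyz, energy in gbpm_points:
        near = {res for res, xyz in atoms if math.dist(xyz, point_xyz) <= cutoff}
        for res in near:
            scores[res] += energy
    return dict(scores)

def average_over_models(per_model_scores):
    """Average residue scores across models.

    Assumption: a residue not selected in a model contributes 0.
    """
    residues = set().union(*per_model_scores)
    n = len(per_model_scores)
    return {r: sum(m.get(r, 0.0) for m in per_model_scores) / n for r in residues}

# Toy coordinates and energies, invented for illustration only.
toy_atoms = [("N501", (0.0, 0.0, 0.0)), ("N501", (1.0, 0.0, 0.0)), ("Q493", (10.0, 0.0, 0.0))]
toy_points = [((0.5, 0.0, 0.0), -2.0), ((10.0, 0.0, 2.0), -1.5), ((50.0, 0.0, 0.0), -9.0)]
toy_scores = score_residues(toy_atoms, toy_points)
toy_average = average_over_models([toy_scores, {"N501": -4.0}])
```

More negative values indicate stronger predicted contributions, consistent with the convention that a lower GBPM score denotes a stronger effect.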
Taking into account their average scores, Spike and ACE2 AAs were divided by quartiles to facilitate the interpretation of the results: quartile 1 (Q1) includes the strongest complex stabilization contributors; quartile 2 (Q2) contains residues less important than those reported in Q1 but more relevant than those included in quartile 3 (Q3); quartile 4 (Q4) indicates the weakest predicted interacting AAs. Such an extension of the original approach allowed us to highlight known relevant interaction residues of both Spike (Table 1) and ACE2 (Table 2). Basically, the same number of AAs was highlighted for Spike (26 AAs) and ACE2 (25 AAs). The average score was also in the same range. Spike reported a population of Q1 larger than ACE2: 12 and 7 AAs, respectively. The opposite scenario was observed in Q2, which accounted for 7 residues for Spike and 11 for ACE2. No remarkable difference emerged in the Q3 and Q4 Spike-ACE2 comparison. We reasoned that mutations and variants in Q1 residues could have a more relevant impact on the complex stability. The analysis of all designed GBPM suggested that the Spike-ACE2 molecular recognition is largely sustained by polar interactions, such as hydrogen bonds, and by very few putative hydrophobic contributions (Table 3).

We analyzed 295,507 publicly available SARS-CoV-2 full-length genome sequences collected worldwide and deposited on the GISAID database on December 30, 2020 (Shu & McCauley, 2017). From these, we obtained 257,434 samples containing at least one AA-changing mutation in the Spike protein. A total of 3314 different AA-changing mutations were detected in the 1273 AA-long Spike sequence. However, many of these are unique events (or possibly even sequencing errors), as only 2023 mutations were found in more than one sample, 788 were found in more than 10 samples, and 196 in more than 100 samples (Supplementary File 1).
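The recurrence tally reported above (how many distinct mutations appear in more than 1, 10 or 100 samples) can be sketched as follows. This is only an illustrative sketch, not the pipeline used in the study: the function name, the input layout (one list of Spike mutation labels per sample) and the toy data are assumptions.

```python
from collections import Counter

def recurrence_summary(sample_mutations, thresholds=(1, 10, 100)):
    """Count distinct mutations and how many recur in more than t samples.

    sample_mutations: iterable of per-sample mutation label lists,
    e.g. [["D614G", "S477N"], ["D614G"]] (hypothetical layout).
    """
    counts = Counter()
    for muts in sample_mutations:
        # A mutation is counted at most once per sample.
        counts.update(set(muts))
    summary = {"distinct": len(counts)}
    for t in thresholds:
        summary[f">{t}"] = sum(1 for c in counts.values() if c > t)
    return summary

# Toy input, invented for illustration only.
toy_samples = [["D614G", "S477N"], ["D614G"], ["D614G", "N439K"], ["S477N"]]
toy_summary = recurrence_summary(toy_samples, thresholds=(1, 2))
```

Mutations seen in a single sample fall out at the first threshold, which is the simplest way to set aside events that may be sequencing errors.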
We then focused on mutations located in the Spike RBD (AA 330-530) with predicted interaction contribution, as assessed by our GBPM method. The majority of mutations here are found in only a handful of samples (Table 4 and Figure 4(A)), with a few notable exceptions. The mutations S477N and N439K are the most frequent in the current population and were identified in 16,547 patients (5.60%) and 5587 patients (1.89%), respectively. These two variants are also amongst the top 20 most frequent in the population and involve two positions productively contributing to the interaction between Spike and ACE2, according to GBPM (see Table 1 and Figure 3 for locations 439 and 477). The graphical inspection of the PDB structures revealed that Spike Asparagine (N) 439, ranked at GBPM Q2, is mainly involved in intra-protein interactions. In fact, by means of its backbone sp2 oxygen atom, N439 accepts one hydrogen bond from the Spike Serine 443 side chain and, by its side chain amide group, donates one hydrogen bond to the Spike Proline 499 backbone: all these AAs are located in a random coil loop of Spike, so N439K could minimally modify the Spike-ACE2 recognition. On the other hand, after the theoretical mutation of Asparagine 439 to a Lysine, it is possible to predict a productive electrostatic interaction between the new net positively charged residue and the ACE2 Glutamate 329. Such a long-distance interaction could improve the stabilization of the complex with respect to the Spike wild type (Supporting information Figure S1). A similar effect could be attributed to the mutation at position 477. Serine (S) 477 is a weak contributor to the complex interaction. In all PDB entries we selected, Serine 477 is located in a solvent-exposed random coil loop. No interaction with ACE2 or Spike residues can be observed. Actually, the GBPM analysis included this residue in Q2.
Conversely, its mutation to Asparagine (S477N), in our in silico model, revealed the possibility of establishing a hydrogen bond with the ACE2 Serine 19, which can result in a stabilization of the complex (Supporting information Figure S2). Moreover, position 477 is also affected by three other events with lower occurrence: S477I, S477R and S477G, with 6, 2 and 2 observations (Table 4). Among all, S477R could be the most interesting one. Actually, a net positively charged residue, such as Arginine (R), can establish a weak electrostatic interaction with ACE2 Glutamate 87, as suggested by a theoretical model we built. S477I and S477G could modify the conformation of a random coil segment, an effect that does not appear very relevant. Conversely, S477N and S477R could productively contribute to the Spike-ACE2 complex stabilization. Of course, deeper theoretical and experimental investigations should be carried out to confirm this hypothesis. Unfortunately, full-scale simulations cannot be rigorously performed today because the available 3D structural models report only fragments of the complex between Spike and ACE2.

The third most common mutation, N501Y (Figure 3), targets an AA predicted to have a strong role in the interaction in all four models, sitting in the GBPM Q1. N501Y was detected in 4921 patients (1.67% of the dataset), the majority of which were located in the United Kingdom (Shu & McCauley, 2017). From a structural point of view, we predict that a substitution, at position 501, of an Asparagine (N) with a Tyrosine (Y) may have an effect: their Total Polar Surface Area (TPSA), equal to 101.29 and to 78.43 Å², respectively, is different; however, both their side chains can donate/accept a hydrogen bond. Therefore, their contribution to complex stabilization may be slightly different, also taking into account the chemical environment.
In fact, the wild type Asparagine 501 donates one hydrogen bond to ACE2 Tyrosine 41: such an interaction could also be possible for the N501Y mutant or, as we observed in our theoretical model, it could be replaced by pi-pi stacking (Supporting information Figure S3). The rapid increase in frequency of mutation N501Y has been recently observed in the United Kingdom and other countries, as it is one of the variants characterizing lineage B.1.1.7 (Preliminary genomic characterisation of an emergent SARS-CoV-2 lineage in the UK defined by a novel set of spike mutations - SARS-CoV-2 coronavirus/nCoV-2019 Genomic Epidemiology - Virological, 2021). The Asparagine/Tyrosine substitution in Spike position 501 could contribute to determining an evolutionary advantage for this lineage, based on differential affinity for the human receptor ACE2 (Fratev, 2020; Leung et al., 2020).

A less frequent mutation amongst those predicted to contribute to the ACE2/Spike interaction is G476S, detected in 43 samples (0.02%), and supported by three out of four structural models (Table 1, Figure 4(B)). Glycine (G) 476 was included by the GBPM analysis in Q2: its contribution to the complex stabilization is weak. In contrast to the other mutations described here, the replacement of Glycine 476 with a Serine (S) could have more evident effects on Spike-ACE2 molecular recognition. In fact, in all PDB entries, the alpha carbon of this Glycine is very close, about 4 Å, to the side chain amide group of the ACE2 Glutamine 24. Between these two AAs no productive interaction can be established, but the substitution of the Spike Glycine with a Serine could allow one inter-protein hydrogen bond to ACE2 Glutamine 24. Moreover, G476S could establish the same interaction with Spike Glutamine 478, which could stabilize the conformation of a random coil segment of the viral protein, resulting in a better pre-organization for ACE2 recognition (Supporting information Figure S4).

Figure 2. Summary of the pipeline adopted by GBPM to identify key residues contributing to the SARS-CoV-2 Spike/Human ACE2 interface. Spike is depicted in cyan, and ACE2 in orange, based on the 6LZG PDB model. Residues highlighted by GBPM are then tested for mutation frequency in the worldwide SARS-CoV-2 population.

Another Spike residue predicted by our analysis to play a relevant role in ACE2 recognition is Glutamine 493 (Table 1). The GISAID data revealed that this amino acid is rarely replaced by a Leucine (Q493L) or by an Arginine (Q493R). These mutations could affect the recognition of ACE2 in opposite ways. Spike Glutamine 493 is involved in a hydrogen bond with ACE2 Glutamate 35. The mutant Q493L cannot establish such a productive contribution and could only interact hydrophobically with Spike Leucine 455. Conversely, Q493R could locate its net positively charged side chain into an ACE2 pocket delimited by Aspartate 30, Histidine 34 and Glutamate 35. Such a positioning could produce a remarkable electrostatic stabilization of the complex (Supporting information Figure S5).

In general, we could observe that AAs with the strongest evidence for interaction contribution in the Spike/ACE2 interface tend not to diverge from the reference (Figure 4(B)), which may indicate a solid evolutionary constraint to maintain the interface residues unchanged. For example, one of the most relevant 1st quartile AAs in the ACE2/Spike interaction, Glutamine (Q) 493, is rarely mutated, with 12 cases of Q493L, 4 of Q493* (the substitution of Q493 with a stop codon), three of Q493K, and one each of Q493R and Q493H. One possible exception is the aforementioned Spike mutation N501Y, located in the strongest 1st quartile GBPM-predicted AA for ACE2 binding, which was found in the considerable number of 4921 different patients.

We also investigated the variants of human ACE2, since these could constitute the basis for patient-specific COVID-19 susceptibility and severity.
ACE2 protein sequence is highly conserved across vertebrates (Guzzi et al., 2020) and also within the human species (Cao et al., 2020), with the most frequent missense mutation (rs41303171, N720D) present in 1.5% of the world population (Supplementary File 2). Our analysis shows that only five variants of ACE2 detected in the human population are also located in the ACE2/Spike direct binding interface (Table 5 and Figure 5). Of these, rs73635825 (causing a S19P AA variant) is both the most frequent in the population (0.06%) and the most relevant in the interaction with the viral protein, with a GBPM score of −47.6175 (Q1) and support from all four models (Table 2). The rs73635825 SNP frequency is higher in the population of African descent (0.2%). The second SNP, rs143936283 (E329G, Table 5) is a very rare allele (0.0066%) in the European (non-Finnish) Asian population. The rs766996587 (M82I) SNP is also a very rare allele (0.0066%) found in the African population. E37K (rs146676783) is more frequent in the Finnish (0.03%) and G352V (rs370610075) in the European non-Finnish (0.007%) population. None of these five SNPs have a reported clinical significance, according to dbSNP and literature search (Sherry et al., 2001). It must be mentioned that M82I, together with S19P, has been predicted to adversely affect ACE2 stability (Hussain, 2020). M82I, together with E329G, has been simulated to increase binding affinity with Spike when compared to wild type ACE2, hypothesizing greater susceptibility to SARS-CoV-2 for patients carrying these variants. Instead, E37K and G352V (Darbani, 2020) were predicted to possess a lower affinity with Spike, suggesting lower susceptibility to the infection. However, while describing potential explanations for the existence of a possible predisposing genetic background to infection, all these studies remain inconclusive in linking allele variants to COVID-19 susceptibility.
Structurally, the S19P variant may greatly differ from the reference sequence in the interaction with Spike: Serine (S) is a polar residue, able to accept and donate, by means of its side chain alcoholic group, a hydrogen bond. Proline (P), on the other hand, cannot be involved in hydrogen bonding, and therefore should establish a weaker interaction with Spike. In fact, the ACE2 Serine 19 side chain donates a hydrogen bond to the Spike Alanine 475 backbone (Supporting information Figure S6) and potentially could establish the same interaction with Spike Glycine (G) 476, which could also be mutated (Table 4). Both Methionine (M) 82 and Glutamate (E) 329 are in Q3, minimally contributing to Spike-ACE2 recognition (Supporting information Figures S7 and S8). They are located within two alpha helices, so their mutation could modify the secondary structure of ACE2, resulting in a different affinity for Spike. Such a possibility should be more evident in the case of E329G, because the Glutamate 329 side chain is involved in a hydrogen bond with ACE2 Glutamine 325.

Table 4. Spike mutations located within the RBD (AA 330-530) with at least two cases in the population and non-zero GBPM average score in the ACE2/Spike interaction models. The asterisk (*) indicates a stop codon. A lower GBPM score indicates a stronger effect in the ACE2/Spike interaction.

SARS-CoV-2 Spike evolved through a series of adaptive mutations that increased its affinity for the human ACE2 receptor (Ortega et al., 2020). There is no reason to believe that the evolution and adaptation of the virus will stop, making continuous sequencing and mutational tracking studies of paramount importance to strategically contain COVID-19 (Meredith et al., 2020). In our study, we highlighted which specific locations of Spike can influence the ACE2 molecular recognition, required for the viral entry into the host cell (Hoffmann et al., 2020). We further showed that some mutations are already present in the SARS-CoV-2 population that may weakly affect the interaction with the human receptor, specifically Spike N439K, S477N and N501Y. These mutations are rising in the viral population (>1%) and in particular N501Y is one of the key mutations characterizing lineage B.1.1.7 (Leung et al., 2020), which has seen a recent dramatic increase in frequency in the United Kingdom (Preliminary genomic characterisation of an emergent SARS-CoV-2 lineage in the UK defined by a novel set of spike mutations - SARS-CoV-2 coronavirus/nCoV-2019 Genomic Epidemiology - Virological, 2021). Having identified this mutation shows that our combination of targeted mutation frequency and GBPM is a useful pipeline to monitor events in the key region used by SARS-CoV-2 to recognize and enter human bronchial cells. The same approach can be used to monitor, in the future, whether any of these events will increase in frequency, suggesting an adaptation to the human host leveraging a higher affinity with ACE2.

On the other hand, we studied the variants in the human ACE2 population, identifying five loci that can affect the binding with SARS-CoV-2 Spike. They are all rare variants, with the most frequent, S19P, present in 0.06% of the population, and with no known clinical significance. However, other in silico studies have predicted their role in decreasing ACE2 stability (S19P and M82I) (Hussain, 2020), and in altering the affinity with Spike (increasing it: M82I and E329G; decreasing it: E37K and G352V (Darbani, 2020)). The most common ACE2 variant, rs41303171 (N720D), is not located in the binding region, and so far its predicted effects on the etiopathology of COVID-19 are still largely conjectural and associated with neurological complications via mechanisms probably independent from direct interaction with Spike (Strafella et al., 2020).
It remains to be seen whether, in the future, the combination of Spike and ACE2 sequences will produce novel and unexpected COVID-19 specificities that will require granular efforts in developing wider-spectrum anti-SARS-CoV-2 strategies, such as vaccines or antiviral drugs. So far, our analysis has shown a location on the Spike/ACE2 complex where both proteins vary in the viral/human population, specifically on ACE2 S19 and Spike A475/G476. While, as described in our Results, these mutations on Spike are not likely to strongly affect the interaction surface, future combinations of ACE2/Spike variants may have peculiar effects that will require constant mutation monitoring. Identifying single or multiple AAs involved in this viral entry interaction will allow for personalized diagnosis and clinical prediction based on the specific combination of SARS-CoV-2 strain and ACE2 variant. Personalized COVID-19 treatment will require targeted sequencing of the patient ACE2 and Spike, to identify the combination causing the specific case. This technical obstacle can be further complicated by the intra-host genetic variability of SARS-CoV-2, which has recently been reported from RNA-Sequencing studies (Shen et al., 2020). Structural investigation will benefit, in the near future, from the availability of experimental structural models reporting the complete sequence of both Spike and ACE2, or at least Spike. This will allow more rigorous computational analyses (e.g., molecular dynamics simulation, free energy perturbation) of the effect of mutations on the Spike/ACE2 recognition. Beyond the complex investigated in this manuscript, our approach can be fully extended to any other partners in the SARS-CoV-2/human interactome, for example the recently discovered interaction between viral protease NSP5 (Gordon et al., 2020) and human histone deacetylase HDAC2 (Milazzo et al., 2020), which is indirectly responsible for the transcriptional activation of pro-inflammatory genes.
Our approach can also be extended to other viruses exploiting human receptors as an entry mechanism, such as CD4 for the Human Immunodeficiency Virus (HIV) or TIM-1 for the Ebola virus (Grove & Marsh, 2011).

The PDB (Berman et al., 2000) was searched for high-resolution Spike/ACE2 complexes. PDB entries 6LZG, 6M0J and 6VW1 (Shang et al., 2020), reporting the Spike RBD interacting with ACE2, have been retrieved and taken into account for our GBPM analysis (Ortuso et al., 2006). Such a computational approach compares GRID (Goodford, 1985) molecular interaction fields (MIFs) computed on a generic complex (A) and on its host (B) and guest (C) components, separately. Actually, MIFs describe the interaction between a certain probe and a certain target. If the target is represented by a complex, depending on the selected area, the MIF energies can be referred to the interaction between the probe and one of the complex subunits or, at the host/guest interface, with both of them. The GBPM analysis, objectively, highlights these last. Five steps are required: (1) the complex A is disassembled in its subunits B and C; (2) MIFs are computed on A, B and C by using the most appropriate GRID probes. A

Figure 5. Frequency of mutations on ACE2. X-axis indicates the AA position in isoform 1 (UniProt id Q9BYF1-1). Y-axis indicates the allele frequency in the global population according to the gnomAD v3 database. Labels indicate AA changes observed in the human population with non-zero GBPM average score in the ACE2/Spike interaction models. Vertical dashed lines indicate the crystallized region analyzed in this study (AA 15-615).

Before starting the GBPM analysis, co-crystallized water molecules were removed from PDB structures. In 6VW1, showing two Spike-ACE2 complexes, namely chains A-E and B-F, both structures have been investigated and further reported as model A and B, respectively.
All selected complexes have been conformationally compared with each other by alignment and by computing the RMSd on the cartesian coordinates of equivalent non-hydrogen atoms. DRY, N1 and O original GRID probes have been used to highlight hydrophobic, hydrogen bond donor and hydrogen bond acceptor areas. In order to identify the most relevant residues of both Spike and ACE2, we conceptually and technically extended the GBPM algorithm, originally designed for drug/target interactions (Ortuso et al., 2006). In the GBPM analysis presented here, the two interacting proteins have been considered either as host or guest units, and relevant AAs were selected if their distance from GBPM features was lower than or equal to 3 Å. For each PDB model, the selected residues were scored as the sum of the corresponding GBPM feature interaction energies. In order to prevent unrealistic distortion of the Spike-ACE2 complex, due to the usage of structures not covering the full length of the interacting proteins, the effect of mutations has been qualitatively estimated by means of the mutagenesis tool implemented in the PyMOL software (PyMOL, 2017). Wild type residues have been replaced by the mutated ones and the new side chain conformations have been optimized taking into account the neighboring AAs. The graphical analysis was carried out on the predicted most populated rotamers. On the basis of its better X-ray resolution, the 6M0J PDB structure has been selected for the above reported investigation.

SARS-CoV-2 genome sequences from human hosts, accounting for a total of 145,201 submissions, were obtained from the GISAID database on 15 October 2020 (Shu & McCauley, 2017). Low quality (with more than 5% uncharacterized nucleotides) and incomplete (<29,000 nucleotides, based on a total reference length of 29,903) sequences were removed. The resulting 135,591 genome sequences were aligned on the reference SARS-CoV-2 Wuhan genome (NCBI entry NC_045512.2) using the NUCMER algorithm (Marçais et al., 2018).
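The two sequence filters stated above (more than 5% uncharacterized nucleotides, or fewer than 29,000 nucleotides) can be expressed as a short predicate. This is a minimal sketch under stated assumptions, not the filtering code used in the study: the function name is hypothetical, "uncharacterized" is assumed to mean the character N, and the 5% fraction is assumed to be taken relative to the length of the submitted sequence.

```python
MIN_LENGTH = 29_000        # incomplete if shorter (reference length is 29,903)
MAX_UNKNOWN_FRACTION = 0.05  # low quality if more than 5% uncharacterized

def passes_quality_filter(sequence: str) -> bool:
    """Return True if a genome sequence is kept (hypothetical helper).

    Assumptions: uncharacterized nucleotides are encoded as 'N' (case
    insensitive) and the fraction is computed over the sequence length.
    """
    if len(sequence) < MIN_LENGTH:
        return False
    unknown = sequence.upper().count("N")
    return unknown / len(sequence) <= MAX_UNKNOWN_FRACTION

# Synthetic sequences, invented for illustration only.
complete_clean = "A" * 29_500
too_short = "A" * 28_000
too_many_unknown = "A" * 28_000 + "N" * 2_000   # 30,000 nt, about 6.7% N
acceptable_unknown = "A" * 29_000 + "N" * 1_000  # 30,000 nt, about 3.3% N
```

Checking the length first also avoids a division by zero on empty records.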
Position-specific nucleotide differences were merged for neighboring events and converted into protein mutations using the coronapp annotator. The results were further filtered for AA-changing mutations targeting the Spike protein. ACE2 variants in the human population were extracted from the gnomAD database, v3, 18 July 2020 (Karczewski et al., 2020). We considered only missense variants affecting specific AAs in the protein sequence, for a total of 155 entries (Supplementary File 2). Graph generation was performed with the R statistical software and the corto package v1.1.2 (Mercatelli, Lopez-Garcia et al., 2020).

her suggestions on the use of gnomAD, and Prof. Stefano Alcaro who provided the computational resources required by the GBPM analysis. Finally, we thank Mr. George Wolf for the final proofreading of the manuscript.

FMG, PHG and FO designed the study. FO designed and performed the structural analysis. FMG designed the genetics analysis. FMG and DM performed the genetics analysis. FMG financially supported the study. PHG drafted the manuscript and performed literature search. All authors contributed to the writing of the final version of the manuscript.

No potential conflict of interest was reported by the author(s).

We developed a method to identify key amino acids responsible for the initial interaction between SARS-CoV-2 (the COVID-19 virus) and human cells, through the analysis of Spike/ACE2 complexes. We further identified which of these amino acids show variants in the viral and human populations. Our results will help scientists and clinicians alike in identifying the possible role of present and future Spike and ACE2 sequence variants in cell entry and general susceptibility to infection.

SARS-CoV-2 viral spike G614 mutation exhibits higher case fatality rate
ACE2 gene variants may underlie interindividual variability and susceptibility to COVID-19 in the Italian population. Genetic and Genomic Medicine. Advance online publication
The protein data bank
The two faces of ACE2: The role of ACE2 receptor and its polymorphisms in hypertension and COVID-19
The ACE2 gene: Its potential as a functional candidate for cardiovascular disease
Comparative genetic analysis of the novel coronavirus (2019-nCoV/SARS-CoV-2) receptor ACE2 in different populations
Genomic variance of the 2019-nCoV coronavirus
The expression and polymorphism of entry machinery for COVID-19 in human: Juxtaposing population groups, gender, and different tissues
The N501Y and K417N mutations in the spike protein of SARS-CoV-2 alter the interactions with both hACE2 and human derived antibody: A Free energy of perturbation study
A computational procedure for determining energetically favorable binding sites on biologically important macromolecules
A SARS-CoV-2 protein interaction map reveals targets for drug repurposing
The cell biology of receptor-mediated virus entry
Making sense of mutation: What D614G means for the COVID-19 pandemic remains unclear
Master Regulator Analysis of the SARS-CoV-2/Human Interactome
Nextstrain: Real-time tracking of pathogen evolution
Targeting ACE2-RBD interaction as a platform for COVID19 therapeutics: Development and drug repurposing screen of an AlphaLISA proximity assay. bioRxiv. Advance online publication
Estimation of country-level basic reproductive ratios for novel Coronavirus (SARS-CoV-2/COVID-19) using synthetic contact matrices
SARS-CoV-2 cell entry depends on ACE2 and TMPRSS2 and is blocked by a clinically proven protease inhibitor
Structural variations in human ACE2 may influence its binding with SARS-CoV-2 spike protein
An mRNA vaccine against SARS-CoV-2 - Preliminary report
The mutational constraint spectrum quantified from variation in 141,456 humans
Spike mutation pipeline reveals the emergence of a more transmissible form of SARS-CoV-2. bioRxiv. Advance online publication
Structure of the SARS-CoV-2 spike receptor-binding domain bound to the ACE2 receptor
Early empirical assessment of the N501Y mutant strains of SARS-CoV-2 in the United Kingdom
MUMmer4: A fast and versatile genome alignment system
Geographic and genomic distribution of SARS-CoV-2 Mutations
corto: A lightweight R package for gene network inference and master regulator analysis
Coronapp: A web application to annotate and monitor SARS-CoV-2 mutations
Rapid implementation of SARS-CoV-2 sequencing to investigate cases of health-care associated COVID-19: A prospective genomic surveillance study. The Lancet Infectious Diseases
Histone deacetylases (HDACs): Evolution, specificity, role in transcriptional complexes, and pharmacological actionability
Role of changes in SARS-CoV-2 spike protein in the interaction with the human ACE2 receptor: An in silico analysis
GBPM: GRID-based pharmacophore model: Concept and application studies to protein-protein recognition
Emergence of RBD mutations in circulating SARS-CoV-2 strains enhancing the structural stability and human ACE2 receptor affinity of the spike protein
The inhibitory effect of a Corona virus spike protein fragment with ACE2. bioRxiv.
Advance online publication ACE2 expression is increased in the lungs of patients with comorbidities associated with severe COVID-19 Preliminary genomic characterisation of an emergent SARS-CoV-2 lineage in the UK defined by a novel set of spike mutations -SARS-CoV-2 coronavirus/nCoV-2019 Genomic Epidemiology -Virological The PyMOL Molecular Graphics System, Version 2.0 (Schr€ odinger, LLC) Characterization of SARS-CoV-2 viral diversity within and across hosts Structural basis of receptor recognition by SARS-CoV-2 Genomic diversity of severe acute respiratory syndrome -Coronavirus 2 in patients with coronavirus disease 2019 dbSNP: The NCBI database of genetic variation GISAID: Global initiative on sharing all influenza data -from vision to reality Analysis of ACE2 genetic variability among populations highlights a possible link with COVID-19-related neurological complications Characterization of the receptor-binding domain (RBD) of 2019 novel coronavirus: Implication for development of RBD protein as a viral attachment inhibitor and vaccine On the origin and continuing evolution of SARS-CoV-2 CD147 as a target for COVID-19 treatment: Suggested effects of azithromycin and stem cell engagement Molecular simulation of SARS-CoV-2 spike protein binding to pangolin ACE2 or human ACE2 natural variants reveals altered susceptibility to infection Structural and functional basis of SARS-CoV-2 entry by using human ACE2 The role of furin cleavage site in SARS-CoV-2 spike protein-mediated membrane fusion in the presence or absence of trypsin Risk factors of critical & mortal COVID-19 cases: A systematic literature review and meta-analysis A novel coronavirus from patients with pneumonia in China We thank the Italian Ministry of Education and Research for their financial support under the Montalcini initiative. We thank Prof. Giovanni Perini for his continued support and scientific enthusiasm, Prof. Massimo Battistini for his lessons on logic and writing, Prof. Elena Bacchelli for