key: cord-0822251-40xsypzt authors: Nelson-Sathi, Shijulal; Umasankar, PK; Sreekumar, E; Radhakrishnan Nair, R; Joseph, Iype; Nori, Sai Ravi Chandra; Philip, Jamiema Sara; Prasad, Roshny; Navyasree, KV; Ramesh, Shikha; Pillai, Heera; Ghosh, Sanu; Santosh Kumar, TR; Radhakrishna Pillai, M. title: Mutational landscape and in silico structure models of SARS-CoV-2 Spike Receptor Binding Domain reveal key molecular determinants for virus-host interaction date: 2020-10-01 journal: bioRxiv DOI: 10.1101/2020.05.02.071811 sha: 87de8b2eb9b428bf0ad97ad95bcd1a8b6210c6f8 doc_id: 822251 cord_uid: 40xsypzt Protein-protein interactions between virus and host are crucial for infection. SARS-CoV-2, the causative agent of COVID-19 pandemic is an RNA virus prone to mutations. Formation of a stable binding interface between the Spike (S) protein Receptor Binding Domain (RBD) of SARS-CoV-2 and Angiotensin-Converting Enzyme 2 (ACE2) of host actuates viral entry. Yet, how this binding interface evolves as virus acquires mutations during pandemic remains elusive. Here, using a high fidelity bioinformatics pipeline, we analysed 31,403 SARS-CoV-2 genomes across the globe, and identified 444 non-synonymous mutations that cause 49 distinct amino acid substitutions in the RBD. Molecular phylogenetic analysis suggested independent emergence of these RBD mutants during pandemic. In silico structure modelling of interfaces induced by mutations on residues which directly engage ACE2 or lie in the near vicinity revealed molecular rearrangements and binding energies unique to each RBD mutant. Comparative structure analysis using binding interface from mouse that prevents SARS-CoV-2 entry uncovered minimal molecular determinants in RBD necessary for the formation of stable interface. We identified that interfacial interaction involving amino acid residues N487 and G496 on either ends of the binding scaffold are indispensable to anchor RBD and are well conserved in all SARS-like corona viruses. All other interactions appear to be required to locally remodel binding interface with varying affinities and thus may decide extent of viral transmission and disease outcome. Together, our findings propose the modalities and variations in RBD-ACE2 interface formation exploited by SARS-CoV-2 for endurance. Importance COVID-19, so far the worst hit pandemic to mankind, started in January 2020 and is still prevailing globally. Our study identified key molecular arrangements in RBD-ACE2 interface that help virus to tolerate mutations and prevail. In addition, RBD mutations identified in this study can serve as a molecular directory for experimental biologists to perform functional validation experiments. The minimal molecular requirements for the formation of RBD-ACE2 interface predicted using in silico structure models may help precisely design neutralizing antibodies, vaccines and therapeutics. Our study also proposes the significance of understanding evolution of protein interfaces during pandemic. CoV-2 and related SARS-CoV provided initial clues regarding molecular architecture of the interface. RBD comprises of 223 amino acid long peptide in the S1-region of S-protein ( Table 1 ) (Walls et al., 2020) . However, ACE2 binding information is confined to a variable loop region within RBD called Receptor binding motif (RBM). These structures elucidated key interfacial interactions responsible for enhanced binding affinity of SARS-CoV-2 to ACE2 than SARS-CoV . It also suggested that few amino acid changes in RBM can remodel the interface resulting in altered binding affinities and viral transmission. However, all these studies were based on parental SARS-CoV-2 Wuhan strain and several questions remain unanswered. What are the mutations acquired on RBD during COVID-19 and what are the interfacial molecular rearrangements induced by these mutations? Can we gain valuable insights regarding RBD-ACE2 interface formation by analyzing these mutations? To address these questions we investigated the mutational landscape of SARS-CoV-2 RBD. A total of 55,485 spike proteins of SARS-CoV-2 were directly downloaded on 29 th June 2020 from the GISAID database. We removed the partial sequences, sequences greater than 1% unidentified 'X' amino acids and sequences from low quality genomes. Further, 31,403 spike protein sequences along with Wuhan reference spike protein (YP_009724390.1) were aligned using Mafft (maxiterate 1,000 and global pair-ginsi) (Katoh et al. 2002) . The alignments were visualized in Jalview (Waterhouse AM et al., 2009) and the amino acid substitutions in each position were extracted using custom python script. We ignored the substitutions that were present in only one genome and unidentified amino acid X. The mutations that are present in at least two independent genomes in a particular position were further considered. These two criteria were used to avoid mutations due to sequencing errors. The mutated amino acids were further tabulated and plotted as a matrix using R script. For the Maximum-likelihood phylogeny reconstruction, we have used the SARS-CoV-2 genomes containing RBD mutations, and 10 genomes were sampled as representatives for each known subtype with Wuhan RefSeq strain as root. Sequences were aligned using Mafft (maxiterate 1,000 and global pair-ginsi), and phylogeny was reconstructed using IQ-Tree (Nguyen et al., 2015) . The best evolutionary model (GTR+F+I+G4) was picked using the ModelFinder program (Kalyaanamoorthy et al., 2017) . The structural analysis of the mutated spike glycoprotein of SARS-CoV-2 RBD domain was done to assess the impact of interface amino acid residue mutations on binding affinity towards the human ACE2 (hACE2) receptor. The crystal structure of the SARS-CoV-2 RBD-hACE2 receptor complex was downloaded from Protein Data Bank (PDB ID:6LZG) and the mutagenesis analysis To capture changes that affect viral tropism during pandemic, we searched for non-synonymous mutations in RBD sequences from SARS-CoV-2 genomes. Using unbiased and stringent filtering criteria, we analyzed 31,403 genomes deposited in GISAID till 29th June, 2020. Altogether, 444 non-synonymous mutations in RBD were identified that belong to viral genomes from 30 countries. These mutations were found to substitute 49 amino acid residues in which 23 residues lie within RBM ( Table 1) . These residues include those that directly engage ACE2 (G446, L455, A475, G476, E484, F490 and Q493) and those that are in the near binding vicinity (Figure 1A and 1B). Hot spot mutations were also identified that caused recurrent substitutions of amino acid residues in the same position (N354, P384, Q414, I468, S477, V483, F490, A520, P521 and A522). Each RBD mutation was found to be unique to the genome; a combination of mutations was never observed in our analysis. Overall, RBD mutations accounted for ~9% of the total non-synonymous mutations in S-protein. To see the evolutionary trend in RBD mutations, we compared RBDs from SARS-CoV-2, the related SARS-CoV and the bat coronavirus RaTG13, a suspected precursor of SARS-CoV-2. SARS-CoV-2 RBD is 73.4 % identical to SARS-CoV and 90.1 % identical to RaTG13 ( Figure 1A ). We identified several RBD mutations on residues that are unique to SARS-CoV-2 (N439K, V483A/F/I, E484D, F490S/L, Q493L and S494P) or are conserved in all three viruses ( Figure 1A ). In addition, we observed micro evolutionary reversion mutations in SARS-CoV-2 that interchange residues to that in SARS-CoV or RaTG13 (R346K, N354D, N439K, L452R, E471V and S477G) ( Figure 1A) . Interestingly, most of the unique and reversion mutations were located in the RBM region and thus may have implications in viral tropism. We performed phylogenomic analysis to understand the evolutionary pattern of RBD variants during pandemic and observed an unbiased clustering of RBD variants among distinct SARS-CoV-2 subtypes likely indicating independent emergence of these mutants (Figure 1C ). RBD is divided into a structured core region comprising five antiparallel β-sheets and a variable random coil region, RBM that directly binds to ACE2 (Figure 1B) . On the contrary, binding information on ACE2 is located across long α-helices. Structurally, RBM scaffold resembles a concave arch that makes three contact points with ACE2 α-helix; Cluster-I, II and III. Cluster-I and Cluster-III are on two ends and Cluster-II is towards the middle of the interface (Figure 2A) . We analysed the effect of observed RBM mutations on the molecular interactions at RBD-ACE2 binding interface. It has been shown that differences in ACE2 residues render mouse resistant to infection from SARS-like coronaviruses (Zhao et al., 2020) . Hence, to gain insights into relevant interactions that can create stable interface in mutants, we also included RBD-mouse ACE2 interface in our analysis ( Figure 2B) . Structure models were created for all mutants based on the information from three recently reported crystal/ cryo-EM structures of SARS-CoV-2 RBD-ACE2 bound complex (Figure 2A) . Comparative analysis of structures showed key differences in all three binding clusters of SARS-CoV-2 RBD wild type and mutant interfaces with human or mouse ACE2 (Figure 2C, 2D and Table S1 ). In cluster-I, F486 of SARS-CoV-2 RBM is found buried into the hydrophobic pocket made of human ACE2 residues L79, M82 and Y83. A mutation of F>L in SARS-CoV disrupts this pocket, thus weakens the binding affinity suggesting importance of this interaction (Wan et al., 2020) . In addition, N487 of SARS-CoV-2 RBM forms hydrogen bonds with Q24 and Y83 of human ACE2. The hydrophobic pocket and N487-Y83 interactions were completely abolished in mouse interface due to natural ACE2 substitutions in L79T, M82S and Y83F. But, these interactions were retained in all RBM mutants suggesting their importance in the pandemic. Nevertheless, interactions of A475/ G476 of SARS-CoV-2 RBD with S19 of ACE2 in cluster-I which were present in human and mouse were disrupted in mutants. In addition, SARS-CoV-2 genomes containing A475V and G476S replacements were identified in our analysis suggesting these mutations can be well tolerated. An additional hydrogen bond between Y489-Y83 was seen in mutants lacking A475/G476-S19 interaction. This could possibly be a compensatory mechanism to stabilize cluster-I interactions (Figure 2C, 2D and Table S1 ). Table S1 ). A bunch of interactions in cluster-III involving G446/Y449/Q498 of RBM and Q42 of ACE2 were present in human but abolished in mouse and RBM mutants. However, additional interactions to compensate for these were not seen. A hydrogen bond formed between G496 of RBM and K353 of human ACE2 appeared significant as this was completely abolished in mouse, owing to K353H substitution, but retained in all RBM mutants. In addition, other interactions; G502-K353, Y449-D38 and T500-Y41 in the same cluster were maintained in human, mouse and mutants likely suggesting their supportive role ( Figure 2C, 2D and Table S1 ). The varying interface arrangements in mutants were consistent with the binding affinity differences (ΔΔG). Compared to wild type, ΔΔG values of mutants ranged within ~ + 1 kcal/mol, with the lowest value close to that of SARS-CoV ( Figure S1) . Sine RBM is a variable loop, mutations on any residue could impact spatial arrangements of backbone leading to altered binding affinities. Consistently, we did not find a considerable difference in binding energies between mutations on residues that are involved in ACE2 interaction or are in the near vicinity. In conclusion, we could pinpoint two interfacial interactions that remain unaffected in all mutants analysed. These are interactions mediated through RBD residues N487 and G496 and are located in cluster-I and cluster-III respectively. Based on their spatial arrangement, these residues appear critical in directly anchoring the RBM loop onto ACE2. This may help initiate interface formation that favours viral entry. Both N487 and G496 are highly conserved in all SARS-like corona viruses further reinforcing this notion. The significant remodelling in cluster-II interactions indicates they are dispensable for anchoring but might be important for stabilizing the interface. Since RBD-ACE2 interface is a direct determinant of viral infectivity, along with other factors, varying interface architecture and binding affinities in SARS-CoV-2 RBM mutants may account for global variations in COVID-19 transmission and outcome. SARS-CoV-2 S-protein is highly immunogenic, so recombinant vaccines and neutralizing antibodies that target the whole S-protein or RBD are currently being considered in clinics . Our investigations reveal key molecular determinants and their modalities for RBD-ACE2 interaction. This information might be used to design vaccines, synthetic nanobodies or small molecules that could specifically target RBM anchoring residues or their binding pockets. TePyMol Molecular Graphics System The spike protein of SARS-CoV -a target for vaccine and therapeutic development SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor ModelFinder: Fast model selection for accurate phylogenetic estimates MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform Increasing the precision of comparative models with YASARA NOVA-a selfparameterizing force feld Structure of the SARS-CoV-2 spike receptor-binding domain bound to the ACE2 receptor IQ-TREE: A fast and effective stochastic algorithm for estimating maximum likelihood phylogenies Characterization of spike glycoprotein of SARS-CoV-2 on virus entry and its immune cross-reactivity with SARS-CoV ZDOCK server: interactive docking prediction of protein-protein complexes and symmetric multimers SWISS-MODEL: an automated protein homology-modeling server PIC: Protein Interactions Calculator LIGPLOT: a program to generate schematic diagrams of protein-ligand interactions Structure, Function, and Antigenicity of the SARS-CoV-2 Spike Glycoprotein Receptor Recognition by the Novel Coronavirus from Wuhan: an Analysis Based on Decade-Long Structural Studies of SARS Coronavirus Structural and Functional Basis of SARS-CoV-2 Entry by Using Human ACE2 Enhanced receptor binding of SARS-CoV-2 through networks of hydrogen-bonding and hydrophobic interactions Jalview Version 2-a multiple sequence alignment editor and analysis workbench Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation PRODIGY: a web server for predicting the binding affinity of protein-protein complexes Mining of epitopes on spike protein of SARS-CoV-2 from COVID-19 patients Broad and Differential Animal Angiotensin-Converting Enzyme 2 Receptor Usage by SARS-CoV-2 The authors wish to acknowledge John B Johnson, Mahendran KR and Sara Jones for critical comments. The work was supported by the Department of Biotechnology, Government of India.