key: cord-1011685-hy7577ng authors: Khan, Abbas; Zia, Tauqir; Suleman, Muhammad; Khan, Taimoor; Ali, Syed Shujait; Abbasi, Aamir Ali; Mohammad, Anwar; Wei, Dong‐Qing title: Higher infectivity of the SARS‐CoV‐2 new variants is associated with K417N/T, E484K, and N501Y mutants: An insight from structural data date: 2021-03-23 journal: J Cell Physiol DOI: 10.1002/jcp.30367 sha: 92f108f00a1918c152311cdc1366a4746aafc9c3 doc_id: 1011685 cord_uid: hy7577ng The evolution of the SARS‐CoV‐2 new variants reported to be 70% more contagious than the earlier one is now spreading fast worldwide. There is an instant need to discover how the new variants interact with the host receptor (ACE2). Among the reported mutations in the Spike glycoprotein of the new variants, three are specific to the receptor‐binding domain (RBD) and required insightful scrutiny for new therapeutic options. These structural evolutions in the RBD domain may impart a critical role to the unique pathogenicity of the SARS‐CoV‐2 new variants. Herein, using structural and biophysical approaches, we explored that the specific mutations in the UK (N501Y), South African (K417N‐E484K‐N501Y), Brazilian (K417T‐E484K‐N501Y), and hypothetical (N501Y‐E484K) variants alter the binding affinity, create new inter‐protein contacts and changes the internal structural dynamics thereby increases the binding and eventually the infectivity. Our investigation highlighted that the South African (K417N‐E484K‐N501Y), Brazilian (K417T‐E484K‐N501Y) variants are more lethal than the UK variant (N501Y). The behavior of the wild type and N501Y is comparable. Free energy calculations further confirmed that increased binding of the spike RBD to the ACE2 is mainly due to the electrostatic contribution. Further, we find that the unusual virulence of this virus is potentially the consequence of Darwinian selection‐driven epistasis in protein evolution. The triple mutants (South African and Brazilian) may pose a serious threat to the efficacy of the already developed vaccine. Our analysis would help to understand the binding and structural dynamics of the new mutations in the RBD domain of the Spike protein and demand further investigation in in vitro and in vivo models to design potential therapeutics against the new variants. SARS-CoV-2 caused the coronavirus disease-19 was announced as a pandemic by the world health organization (Rothan & Byrareddy, 2020) . The SARS-CoV-2 possessed the positive-sense single-stranded 30 kb RNA genome, which codes for structural envelope (E), spike (S), nucleocapsid (N), and membrane (M), nonstructural and accessory proteins . The bat coronavirus RaTG13 is considered as the evolutionarily closest relative of SARS-CoV-2 and genomes of these two viruses that codes for structural protein depict~96% sequence similarity (Fehr & Perlman, 2015; Hussain et al., 2020) . Mechanisms of SARS-CoV-2 transmission and pathogenesis involve the binding of the virus to the host cellular angiotensin-converting enzyme (ACE2) receptors through the surface S-protein (Spike protein) which is composed of S1 and S2 subunit (Li, 2016) . The RBD (receptor binding domain) of S1 facilitates the binding of the virus to ACE2; however, the S2 subunit is responsible for membrane fusion which permits the entry of viral genome into host cytoplasm (Lan et al., 2020; Li, 2016) . The binding between viral S-protein and ACE2 receptor triggers the host antibody production against the RBD domain of S-protein, leading to host immunization (Abraham et al., 1990; Hoffmann et al., 2020) . Therefore, blocking ACE2-RBD interaction has been recognized as an essential way to inhibit SARS-CoV-2 transmission and infection (Walls et al., 2020) . Consequently, the SARS-CoV-2 RBD domain can serve as an important target for designing the anti-COVID-19 therapeutic strategies (Lan et al., 2020) Recent reports from England regarding the origin of novel contagious strain (B.1.1.7) of SARS-CoV-2 have further exacerbated the situation (Leung et al., 2021) . Novel mutations in the spike protein of B.1.1.7 (deletion 69-70, 144 and substitution K417N, K417T, E484K, N501Y, A570D, D614G, P681H, T716I, S982A, D1118H, and many others) might have altered the SARS-CoV-2 ability to transmit and infect. Consequently, the currently available vaccines against COVID-19 might not be effective against B.1.1.7 (Harrington et al., 2021) . The substitution mutations N501Y, E484K, and others) within RBD of the UK and South African SARS-CoV-2 strains are now spreading unchecked. The number of 501Y mutations associated cases has raised from 0.1 percent in October to 49.7% in the UK as estimated in late November. The mutation N501Y co-occurs with other mutations in the N, orf8, orf1a, and S glycoprotein in 501, involving two deletions Δ69 and Δ70. The new variant of South Africa carries K417N-E484K-N501Y mutations amongst others (Kirby, 2021; Koyama et al., 2020) . The new variants of both UK and South Africa seem to be more contagious; however, mutations in the UK variants are unlikely to impede the effectiveness of the developed vaccines, though the South African variants may interfere with it to some extent (Davies et al., 2020) . In this regard, the lack of empirical data is the major challenge to predict which one of the newly emerged strains of SARS-CoV-2 is more lethal. To relieve this challenge and diminish the dread of the current pandemic, many researchers worldwide with molecular and computational expertise are using different strategies and approaches like the repurposing of available antiviral drugs, vaccine designing, and mutational analysis. A more focused and detailed analysis is crucial for understanding the effect of these novel amino acid substitutions on the structure, function, and binding of the ACE2 receptor. Therefore, in the present study, we employed the protein-protein docking methods with a biophysical investigation to examine the effect of K417N-E484K-N501Y, K417T-E484K-N501Y, E484K, N501Y, and E484K-N501Y mutations on the structure and binding of the ACE2 receptor and their correlation with infectivity of newly emerged strains of SARS-CoV-2. Our analysis would help to understand the structural dynamics of the new mutations in the RBD domain of the Spike protein. 2 | METHODOLOGY Recently the novel strains of SARS-CoV-2 were reported in England, South Africa, and Brazil that has a high transmission speed and claimed to be very contagious. The predicted mutations in novel strains were found in Spike protein, which plays an indispensable role in the binding of the virus to the host cell. Keeping the importance of Spike protein in the viral infection, we retrieved the latest submitted amino acid sequence of SARS-CoV-2 Spike protein from UniProt (Magrane, 2011) available under accession number: P0DTC2 to identify the exact location of newly emerged mutations. Finally, the reported wild-type structure (6M0J) of SARS-CoV-2 (Lan et al., 2020) Spike protein was obtained from a protein data bank (Rose et al., 2010) and used for variant modeling by Chimera software (Goddard et al., 2005) . High ambiguity-driven protein-protein docking (HADDOCK) algorithm which uses biophysical and biochemical interaction data such as mutagenesis, ambiguous interaction restraints (AIRs), and chemical shift perturbation from NMR titration experiments was used to perform the protein-protein docking process (Dominguez et al., 2003) . To predict docking, we used the Guru interface, which is known to be the strongest interface between all four interfaces owned by the HADDOCK server. Approximately 500 features are used by Guru interface for protein-protein/DNA/RNA docking. Furthermore, to give a convincible insight into the K D (dissociation constant) was calculated using PROtein binDIng enerGY prediction (PRODIGY) which is an online server to compute the binding affinity and K D for different biological complexes (Xue et al., 2016) . The dynamic behavior of complex wild type, K417N-E484K-N501Y, K417T-E484K-N501Y, E484K, N501Y, and E484K-N501Y was checked by MD simulation performed on Amber20 using FF14SB force field. The system solvation was performed in a TIP3P water box, and the system was neutralized by the addition of counter ions (Price & Brooks, 2004) . Energy minimization protocol was used for the removal of the bad clashes in the system. The steepest descent algorithm (Meza, 2010) and the conjugate gradient algorithm were used for 6000 and 3000 cycles, respectively (Watowich et al., 1988) . After heating up to 300 K the system was equilibrated at constant pressure 1 atm with weak restraint and then equilibrated without any restraint. Finally, the production step was run for 100 ns. The long-range electrostatic interaction was treated with particle mesh Ewald algorithm with a cutoff distance of 10.0 Å. The SHAKE algorithm was used to treat a covalent bond (Kräutler et al., 2001) . The production step of MD simulation was performed on PMEMD.CUDA and trajectories were processed usingAmber20 CPPTRAJ package (Roe & Cheatham, 2013) . To estimate the real binding energy calculations of the wild type, K417N-E484K-N501Y, K417T-E484K-N501Y, E484K, N501Y, and E484K-N501Y, MMGBSA approach was used. This method is the best approach used by different studies to estimate the real binding energy of different biological complexes, such as Spike protein-ligand, protein-protein, and protein-DNA/RNA (Ali et al., 2019; Khan et al., 2018; Khan et al., 2019; Khan et al., 2020a; 2020c; 2020d) . The MMGBSA.py (Hou et al., 2011) script was used to estimate the total binding free energy of the top ligand complexes. Each energy term such as vdW, electrostatic, GB, and SA was calculated as a part of the total binding energy. For free energy calculation, the following equation was used: Each component of the total free energy was estimated using the following equation: Where G bond , G ele , and G vdW denote bonded, electrostatic, and van der Waals interactions, respectively. G-pol and G npol are polar and nonpolar solvated free energies. The G pol and G npol are calculated by the generalized born (GB) implicit solvent method with the solventaccessible surface area SASA term. A distressing situation has been created by the new variants of the SARS-CoV-2 that has ascended in the UK, South Africa, and Brazil. Among the evolved residues Leu455, Phe456, Phe486, Gln493, Gln498, and Asn501 (from SARs-CoV) in SARS-CoV-2 also involved Asn501 ( Figure 1b ). This residue Asn501 (SARS-CoV-2 Wuhan isolate) is changed to Tyr501 in the SARS-CoV-2 (B.1.1.7) strain, which shows that this residue is continuously subjected to positive selection pressure (Lan et al., 2020; Wang et al., 2020a) . Previously studies demonstrated that these five residues (given above) are responsible for the stronger binding of spike RBD to ACE2 in SARS-CoV-2 than the SARs-CoV (Li et al., 2005) . Looking into the significant role of these substitutions in the RBD domain of the spike glycoprotein, we generated K417N-E484K-N501Y, K417T-E484K-N501Y, E484K, N501Y, and E484K-N501Y double mutant to perform comparative structural and binding analysis. The mutants were generated by using Chimera and are given in Figure 1c Given the role of Spike protein in many important processes including attachment and pathogenesis next, we performed the binding analysis of the wild type and mutants (Spike RBD). Nearly all biological processes inside the cells are regulated by the interactions among different proteins. Variations in these interactions are liable for several disorders, making protein-protein complexes a key target for the advancement of therapeutics (Ray, 2014) . In this case, the identification of the structural determinants of these interactions and their binding energy is a crucial step towards a deeper understanding and regulation of KHAN ET AL. | 3 these processes. Notably, the binding affinity, which determines whether or not complex formation occurs under particular circumstances, holds the key to regulating molecular interactions (e.g., engineering high-affinity interactions), developing novel therapeutics (e.g., guiding rational drug design), or predicting the effect of variations on protein interfaces (Smith & Sternberg, 2002) . The binding affinity has been calculated for decades by different methodologies ranging from exact approaches (e.g., free energy perturbation), that are precise however computationally expensive compared to empirical methods (e.g., scoring functions in docking, various regression models), which are fast and accurate (Sprinzak et al., 2003) . Therefore, we used HADDOCK to perform the protein-protein docking of ACE2 with the wild spike-RBD, E484K spike- Additionally, as reported that the salt-bridges and other interactions are increased in the mutant complexes than the wild type, which is possible due to the increase in the electrostatic energy, specifically in the E484K complex. Among these, some interactions are strongly conserved between the wild type and mutant complexes. Among the key hotspots, Gln498 formed an interaction with the Lys353 of ACE2. This interaction is previously reported to be sustained during the molecular dynamics simulation (Ali & Vijayan, 2020) . Tyr449, a conserved residue, form key interaction with Glu30 is an important interaction for the ACE2 re- Gln93-Glu35 and Gln498-Glu38 (Ali & Vijayan, 2020 Additionally, to provide a convincible insight into the binding differences, we also calculated the K D (dissociation constant) of the wild and mutant complexes. K D is used to evaluate and rank order strengths of biomolecular interactions (Landry et al., 2012) . The K D kinetics is widely practiced for the antigen-antibodies binding, protein-ligand interaction, and large biological macromolecule interactions affinity prediction. The lowest K D the value, the stronger the interaction (Landry et al., 2011) . This method has been previously used to determine the K D value for different macromolecules associations (Landry et al., 2008) . To further provide a deep insight into the real-time binding of these wild and mutant complexes, we used PRODIGY (PROtein binDIng enerGY prediction), an online server to compute the binding affinity K D for different biological complexes. Herein the wild type the binding affinity ΔG was predicted to be −13.2 and the K D value for the wild complex was reported to be 5.2E −10 , for the E484K the ΔG was predicted to be −13.2 while the K D was 3.0E −10 , for the N501Y the ΔG was predicted to be −13.1 while the K D was 6.6E −10 . In case of the double mutant (E484K-N501Y), the predicted ΔG value was −12.8 kcal/mol while the K D 9.4E −10 was reported. The RMSD did not converge substantially but increased gradually. It can be seen that during the 0-8 ns, the RMSD was reported to be 2.0 Å; however, the RMSD remained higher between 9 and 38 ns. The RMSD between 9 and 38 ns was observed to be 2.2Å. Afterward in protein evolution (Hussain et al., 2020) . Next, we calculated the radius of gyration (Rg), which is an indicator of protein structure compactness during the simulation. As given in to the reported mutants. Figure 6a shows that the region between 40 and 50 in N501Y displayed considerably higher fluctuation than the others. Intriguingly, the four mutant complexes but not the wild type showed a higher fluctuation between 100 and 200. Fluctuation in this region is considerably higher than the others, which is due to the distribution of three important loops vital for the binding with ACE2 thus implicating the functional relevance of the mutant complexes for better F I G U R E 5 The figure represents the RMSDs and Rg(s) of all the complexes. The RMSD and Rg of the wild type is shown in black color while the other mutants are given in different colors. RMSDs, root mean square deviations binding and essential conformational optimization during Darwinian evolution and amino acids fixation. In addition, the higher fluctuation was observed between 300 and 400 (ACE2) essential for the binding. Overall these results show that the spike protein has passed through critical structural evolution and considerable adjustment for better binding and ultimately leads to the increased infectivity. Subsequently, we also calculated the RMSF for the Spike RBD domain of the wild and mutants apo states (Figure 6b ). Three loops in the spike RBD domain γ1 (474-485), γ2 (488-490), and γ3 (494-505) are crucial for binding with ACE2 possesses higher fluctuation in the mutant systems but not in the wild type. The residue Lys417 is an essential residue for the binding with ACE2, which showed limited fluctuation. These findings are similar to the previous study reported ACE2-RBD dynamics. Our findings also showed that region 469-505 possess higher fluctuation. In E484K, the region where the mutated residue (E484K) reside possesses higher fluctuation. In comparison, in the case of N501Y, the regions 500-505 have higher fluctuation but not others ( Figure 6C ). Moreover, no noticeable difference was observed in other regions. To further connect protein conformational changes with the binding of the receptor-binding domain (RBD), we used the MM-GBSA approach to compute the binding energy of spike RBD to ACE2 (Figure 7) Brazilian variants variant to be more contagious than the others and the already developed vaccine may not work against it. This shows that protein conformational epistasis evolution may play a significant role in the binding of ACE2 to the spike RBD in the wild type and mutants. In addition, the mutants with the hypothetical mutant (E484K-N501Y) were observed to be radical, thus implying the biological significance of the designed mutant, which may result in relatively higher infectivity than the wild type. The electrostatic energy significantly improved than the wild type. In conclusion, this study precisely explored the mechanism of the interaction of the spike RBD with the host ACE2 and revealed the differences in the binding of the reference and new variants. The systematic investigation revealed that the South African and Brazilian variants are more lethal than the others due to interprotein contacts specifically the electrostatic while the N501Y is comparable with the wild type. We hypothesized that the residue at 501Y is continuously subjected to positive selection pressure. We further demonstrated the dynamic behavior is also changed with the protein evolution. Conclusively, this study provides a strong basis for structure and rationale-based drug designing against the new variant by exploring the noticeable differences. Deduced sequence of the bovine coronavirus spike protein and identification of the internal proteolytic cleavage site Immunoinformatic and systems biology approaches to predict and validate peptide vaccines against Epstein-Barr virus (EBV) Dynamics of the ACE2-SARS-CoV-2/SARS-CoV spike protein interface reveal unique mechanisms Estimated transmissibility and severity of novel SARS-CoV-2 HADDOCK: A protein−protein docking approach based on biochemical or biophysical information Coronaviruses: An overview of their replication and pathogenesis Software extensions to UCSF chimera for interactive visualization of large molecular assemblies Confirmed Reinfection with SARS-CoV-2 Variant VOC-202012/01 A multibasic cleavage site in the spike protein of SARS-CoV-2 is essential for infection of human lung cells Assessing the performance of the MM/PBSA and MM/GBSA methods. 1. The accuracy of binding free energy calculations based on molecular dynamics simulations Clinical features of patients infected with 2019 novel coronavirus in Wuhan Evolutionary and structural analysis of SARS-CoV-2 specific evasion of host immunity Combined drug repurposing and virtual screening strategies with molecular dynamics simulation identified potent inhibitors for SARS-CoV-2 main protease (3CLpro) Computational identification, characterization and validation of potential antigenic peptide vaccines from hrHPVs E6 proteins using immunoinformatics and computational systems biology approaches Dynamics insights into the gain of flexibility by Helix-12 in ESR1 as a mechanism of resistance to drugs in breast cancer cell lines Phylogenetic analysis and structural perspectives of RNA-dependent RNA-polymerase inhibition from SARs-CoV-2 with natural products Structural Insights into the mechanism of RNA recognition by the N-terminal RNA-binding domain of the SARS-CoV-2 nucleocapsid phosphoprotein Structural and free energy landscape of novel mutations in ribosomal protein S1 (rpsA) associated with pyrazinamide resistance New variant of SARS-CoV-2 in UK causes surge of COVID-19. The Lancet Variant analysis of SARS-CoV-2 genomes A fast SHAKE algorithm to solve distance constraint equations for small molecules in molecular dynamics simulations Structure of the SARS-CoV-2 spike receptor-binding domain bound to the ACE2 receptor Protein reactions with surface-bound molecular targets detected by oblique-incidence reflectivity difference microscopes High throughput, label-free screening small molecule compound libraries for protein-ligands using combination of small molecule microarrays and a special ellipsometry-based optical scanner Simultaneous measurement of 10, 000 protein-ligand affinity constants using microarray-based kinetic constant assays Early transmissibility assessment of the N501Y mutant strains of SARS-CoV-2 in the United Kingdom Structure, function, and evolution of coronavirus spike proteins Structure of SARS coronavirus spike receptor-binding domain complexed with receptor UniProt Knowledgebase: A hub of integrated protein data Steepest descent A modified TIP3P water potential for simulation with Ewald summation. The The Cell: A molecular approach PTRAJ and CPPTRAJ: Software for processing and analysis of molecular dynamics trajectory data The RCSB protein data bank: Redesigned website and web services The epidemiology and pathogenesis of coronavirus disease (COVID-19) outbreak An overview of the Amber biomolecular simulation package Routine microsecond molecular dynamics simulations with AMBER on GPUs. 2. Explicit solvent particle mesh Ewald Prediction of protein-protein interactions by docking methods How reliable are experimental protein-protein interaction data Potent binding of 2019 novel coronavirus spike protein by a SARS coronavirus-specific human monoclonal antibody Structure, function, and antigenicity of the SARS-CoV-2 spike glycoprotein Structural and functional basis of SARS-CoV-2 entry by using human ACE2 Updated understanding of the outbreak of 2019 novel coronavirus (2019-nCoV) in Wuhan A stable, rapidly converging conjugate gradient method for energy minimization Complete genome characterisation of a novel coronavirus associated with severe human respiratory disease in Wuhan PRODIGY: A web server for predicting the binding affinity of protein-protein complexes Smell and taste dysfunction in patients with COVID-19. The Lancet Infectious Diseases Structural basis for the recognition of SARS-CoV-2 by full-length human ACE2 The authors declare that there are no conflict of interests. http://orcid.org/0000-0003-4200-7502