key: cord-0942284-z680851r
authors: Khan, Abbas; Tahir Khan, Muhammad; Saleem, Shoaib; Junaid, Muhammad; Ali, Arif; Shujait Ali, Syed; Khan, Mazhar; Wei, Dong-Qing
title: Structural Insights into the mechanism of RNA recognition by the N-terminal RNA-binding domain of the SARS-CoV-2 nucleocapsid phosphoprotein
date: 2020-08-12
journal: Comput Struct Biotechnol J
DOI: 10.1016/j.csbj.2020.08.006
sha: 039ea9cc12ba41a6a233ee921a9f2d7649408522
doc_id: 942284
cord_uid: z680851r

The recent emergence of SARS-CoV-2 has become a global health issue. This single-stranded positive-sense RNA virus is continuously spreading with increasing morbidity and mortality. The proteome of this virus contains four structural and sixteen nonstructural proteins that ensure the replication of the virus in the host cell. However, the role of the nucleocapsid phosphoprotein (N) in recognizing RNA, replicating and transcribing the viral genome, and modulating the host immune response is indispensable. Recently, the NMR structure of the N-terminal domain of the nucleocapsid phosphoprotein has been reported, but the precise structural mechanism by which ssRNA interacts with it has not yet been reported. Therefore, we used an integrated computational pipeline to identify the key residues that play an essential role in RNA recognition. We generated multiple variants using an alanine scanning strategy and performed an extensive simulation of each system to assess the role of each interfacial residue. Our analyses suggest that the substitutions T57A, H59A, S105A, R107A, F171A, and Y172A significantly affected the dynamics and binding of RNA. Furthermore, per-residue energy decomposition analysis suggests that residues T57, H59, S105 and R107 are the key hotspots for drug discovery. Thus, these residues may be useful as potential pharmacophores in drug design.

Introduction

SARS-CoV-2 belongs to the family of single-stranded positive-sense RNA viruses.
This virus family has a large genome (30 kb RNA genome) that encodes four structural proteins, namely small envelope (E), matrix (M), nucleocapsid phosphoprotein (N), and spike (S), and sixteen nonstructural proteins (nsp1-16) that together ensure replication of the virus in the host cell [1]. The non-structural proteins, mostly associated with RNA replication, carry out the enzymatic functions required for viral replication. The genome of SARS-CoV-2 also encodes nsp7, nsp8, and nsp12, which together form the RNA-dependent RNA polymerase complex; nsp10, nsp13, nsp14, and nsp16, which form the RNA capping machinery; and nsp3, 3PLpro, and nsp5, the proteases that impede innate immunity and are also essential for cleaving viral polyproteins [2, 3].

The first two-thirds (66.66%) of the SARS-CoV-2 genome is known as the ORF1a/b region and encodes the non-structural proteins, whereas the remaining one-third of the genome encodes the accessory proteins and the four structural proteins [4]. Recent antiviral drug and vaccine design studies have targeted the spike protein (S) and the proteases. However, mutations in the spike protein could help the virus evade these drugs. On the other hand, protease inhibitors can harm the homologous cellular proteases [5, 6]. Therefore, it is essential to investigate novel targets and devise comprehensive strategies to protect humans against all sorts of viral infection, including the acute respiratory infection caused by SARS-CoV-2.

In coronaviruses, the multifunctional N protein is essential for transcription as well as replication. The N protein binds to the viral genome and helps pack it into a long helical nucleocapsid structure [7-9]. Previous studies indicated that the N protein is involved in host-pathogen interactions by regulating apoptosis, actin reorganization, and host cell cycle progression [10, 11].
Its highly immunogenic nature and its abundant expression during infection make the N protein a valuable target for devising novel strategies to combat respiratory infections caused by CoVs. Recent studies suggested that the N proteins (homologous in different coronaviruses) is [...] respectively [11]. In the N-terminal region of the coronavirus N protein, several residues associated with RNA binding and infectivity have been identified [12-14]. However, the N protein of SARS-CoV-2 requires further investigation to confirm the findings previously reported for other coronaviruses. The N-terminal RNA binding domain (N-NTD) captures the RNA genome [15-17]. In contrast, the C-terminal domain (CTD) anchors the ribonucleoprotein complex to the viral membrane via its interaction with the M protein [18]. The four structural proteins, together with the viral +RNA genome and the envelope, constitute the complete virion [16, 17, 19]. Both of these domains have RNA binding affinity, while the CTD also binds the M protein, establishing the physical linkage between the envelope and the +RNA. The SARS N proteins also play regulatory roles in the viral life cycle through the host intracellular machinery. A more recent study shows that the N protein structure has a right-hand-like fold, composed of a β-sheet core with an extended central loop. The core region adopts a five-stranded U-shaped right-handed antiparallel β-sheet platform with the topology β4-β2-β3-β1-β5, flanked by two short α-helices. A prominent feature of the structure is a large extended loop between β2 and β3 that forms a long basic β-hairpin (β2' and β3') [15].

The role of the nucleocapsid phosphoprotein in recognizing RNA is crucial [9]. It binds the viral RNA genome and packs it into a ribonucleoprotein (RNP) complex. This RNP complex is critical for retaining a highly ordered RNA conformation suitable for replicating and transcribing the viral genome [3].
This complex is also required for regulating host-pathogen interactions, and the N protein is highly immunogenic and abundantly expressed during infection [8].

The NMR structure of the SARS-CoV-2 N-terminal and C-terminal domains of the nucleocapsid phosphoprotein has recently been reported, but the role of the N-terminal domain in recognizing RNA is not clear [15]. The reported N-terminal domain is a monomeric structure and does not contain the interacting RNA. Understanding the interaction mechanism is therefore important for finding a way to treat the recent pneumonia. Herein, we combined multiple computational approaches to understand how RNA interacts with this nucleocapsid phosphoprotein. We used computational docking approaches to understand the role of critical residues in the interaction with RNA. Furthermore, we used an in-silico mutagenesis strategy to determine the impact of each residue taking part in the interaction. We also performed molecular dynamics simulation, binding free energy calculations, dynamic cross-correlation analysis, principal component analysis, and free energy landscape analysis to understand in depth the mechanism of RNA recognition by the nucleocapsid phosphoprotein. The findings of this research can be useful and will provide a better basis for rapid drug design to control the global epidemic of SARS-CoV-2.

[...] [21]. The missing hydrogens were added, and partial charges were assigned. The structure was also analyzed for structural breaks and unknown residues.

Prior to docking, the 3D structure of the RNA was constructed using the sequences reported by a recent study [15]. The structure was generated and analyzed for topology defects. All the grooves were carefully examined before the docking. The NMR structure of the N-terminal nucleocapsid phosphoprotein was retrieved from the RCSB databank. For the docking, we used multiple algorithms.
Alanine scanning is a site-directed mutagenesis method used to identify whether a particular residue contributes to the stability or function of a specific protein. Alanine is used owing to its chemically inert, non-bulky methyl functional group, which nevertheless imitates the secondary structure preferences that certain other amino acids exhibit. This strategy can also be used to discern whether the side chain of a particular residue plays an important role in bioactivity [25, 26]. [...] procedure of alanine scanning mutagenesis has been given in the previous study [28]. Two parameters, dAffinity and dStability, were considered while calculating the impact of alanine substitutions. High positive dAffinity and dStability values indicate a highly significant substitution. Furthermore, we also used mCSM-NA, an online server, to determine the impact of alanine [...]

The WT and mutant complexes were subjected to molecular dynamics (MD) simulation studies using the Amber package [31]. The TIP3P water model was used, and the system was neutralized by adding Na+ counterions. The OL3 force field was used for RNA. The system was energy minimized using the steepest descent algorithm. Position-restrained simulation was employed to equilibrate the system and the solvent around the protein before the actual simulation. Ensembles with a constant number of atoms, volume or pressure, and temperature (NVT and NPT) were applied to the system for the MD simulation studies. Particle Mesh Ewald (PME) and the SHAKE algorithm were used, with SHAKE applied to bonds involving hydrogen [32]. A total of 400 ns of MD simulation was performed for each system and repeated three times. CPPTRAJ and PYTRAJ [33] were used for RMSD, RMSF, and other analyses of the MD trajectories. PyMOL was used for visualization [34]. Furthermore, we also calculated the total energies of all the systems, including the wild type and the mutants.

[...] (Figure 1 (A)).
A total binding affinity of -108.0 kcal/mol was reported for the best conformation. To understand the interaction pattern, these complexes were submitted to the DNAproDB server. This server mapped the interactions, and the results are shown in Figure 1 (B). Results from these analyses revealed that residues Thr57, His59, Lys61, Lys102, Asp103, Leu104, Ser105, Arg107, Lys169, Gly170, Phe171, Tyr172, Ala173, Gly175, Ser176, [...]

[...] Table 1, the seven substitutions that reduce the stability were selected for molecular dynamics simulation and post-simulation analysis to understand the dynamics of these substitutions.

[...] to check the stability of the MTs during the simulation period. We repeated each simulation run three times. The trajectory was analyzed, and RMSDs were calculated after 400 ns. As shown in Figure 2, the wild-type system remained stable during the course of the simulation except for a fluctuation between 150 and 160 ns. After this acceptable fluctuation, the wild-type system regained stability, and from then until 400 ns the RMSD curve is flat, which indicates stable behavior of the wild-type system. In the case of the T57A mutant, the RMSD increased for the first 80 ns but remained stable for the rest of the simulation time. On the other hand, the H59A substitution, at a residue that forms multiple interactions with the RNA molecule, significantly affected the overall stability of the system. The figure shows that major convergence occurred at different intervals: the periods 80-100 ns, 180-200 ns, and 330-380 ns showed significant deviation during the simulation. In addition, the S105A system showed a stable graph up to 180 ns, at which point a substantial convergence occurred and the RMSD increased substantially. After this increase in RMSD, no convergence was observed.
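The RMSD traces discussed here were produced with CPPTRAJ and PYTRAJ. As a minimal sketch of what such a trace measures, the following Python computes a per-frame RMSD against a reference frame. It assumes the frames have already been superimposed on the reference (no fitting step is performed here); the function names `rmsd` and `rmsd_series` and the toy coordinates are illustrative assumptions, not part of the original workflow or its data.

```python
import math


def rmsd(frame, reference):
    """Root-mean-square deviation between two conformations.

    Both arguments are equally long, non-empty sequences of (x, y, z)
    tuples. The frames are assumed to be pre-aligned; no superposition
    is done here.
    """
    if len(frame) != len(reference) or len(frame) == 0:
        raise ValueError("frames must be non-empty and contain the same number of atoms")
    squared_sum = 0.0
    for (x, y, z), (rx, ry, rz) in zip(frame, reference):
        squared_sum += (x - rx) ** 2 + (y - ry) ** 2 + (z - rz) ** 2
    return math.sqrt(squared_sum / len(frame))


def rmsd_series(trajectory, reference=None):
    """RMSD of every frame relative to a reference (default: first frame)."""
    ref = trajectory[0] if reference is None else reference
    return [rmsd(frame, ref) for frame in trajectory]


# Hypothetical two-atom trajectory with three frames (not simulation data).
toy_trajectory = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],  # reference frame
    [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)],  # both atoms shifted by 1 along z
    [(3.0, 4.0, 0.0), (1.0, 0.0, 0.0)],  # first atom moved by 5, second unchanged
]

if __name__ == "__main__":
    for index, value in enumerate(rmsd_series(toy_trajectory)):
        print(f"frame {index}: RMSD = {value:.3f}")
```

Plotting such a series against simulation time gives a curve of the kind described for Figure 2: a flat curve corresponds to a stable system, while sustained increases mark intervals in which the structure drifts from the reference.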
In the case of R107A, the system showed significant deviation during the course of the simulation. Specifically, the R107A system showed significant convergence in stability until the end of the simulation. Significant [...]

The total energies of all the mutants revealed a similar pattern, ranging from -80800 kcal/mol to -82600 kcal/mol. On the other hand, the wild type exhibited a different total energy, as shown in Figure 4.

Furthermore, to understand the impact of each residue on the binding of RNA, we calculated the energy contribution of each residue to the total energy. Our analysis suggests that, among the seven residues, T57, H59, S105 and R107 contribute most to the total energy. As shown in Figure 9, H59 contributes the most, followed by R107, S105 and T57. Hence these results confirm that these residues should be the primary targets when designing small-molecule inhibitors. We speculate that blocking these residues could help to block SARS-CoV-2 pathogenicity.

[...] anchoring the ribonucleoprotein to the viral membrane [42]. Although the previous study [15] revealed RNA binding to the N-NTD, the mechanism and the impact of mutations have not yet been investigated. In the current investigation, we performed comprehensive MD simulations to unveil the binding mechanism, the types of interactions, and the impact of mutations on the dynamic behavior of the N protein. Residues T57, H59, S105, R107, G170, F171, and Y172 were found to play a significant role in the interaction with RNA. A more recent study also reported that amino acid residues A50, T57, H59, R92, I94, S105, R107, R149, and Y172 are essential in establishing interactions with SARS-CoV-2 RNA (Dinesh et al. 2020). Understanding the molecular mechanisms by which the N protein recognizes RNA and establishes interactions will help in designing future inhibitors.
Our protein model, docking, and simulation analyses revealed that the N-NTD recognizes RNA and establishes contacts with it in a shape-specific manner. Similar results have been described earlier, where stem-loop mRNA is recognized by adenosine deaminase RNA specific 2 (ADAR2) [43]. Previous studies demonstrated that residues S105 and R107 are conserved among all SARS- [...]

[...] Y172 play a significant role in binding the RNA of SARS-CoV-2. Alanine scanning, followed by comprehensive MD simulation, further supported the role of these residues. The overall structural dynamics, including RMSD, RMSF, DCCM, and PCA, were found to be influenced by the alanine MTs. Binding free energy further supported that these residues might have a role in binding RNA. Drug development and screening against these residues may [...] CoV-2 infections. The fluctuations and changes observed in the longer and repeated simulations could provide better understanding. The observed variations in different replicas are significantly correlated and could aid the design of small-molecule inhibitors that target the N-terminal domain of the SARS-CoV-2 N protein and may halt RNA recognition to aid the treatment [...]

[...] Wei is supported by grants from the Key Research Area Grant 2016YFA0501703 of the Ministry of Science and Technology of China, the National Natural Science Foundation of [...] 61832019, 61503244), the Natural Science Foundation of Henan Province (162300410060) and Joint Research Funds for Medical [...] The computations were partially performed at the Center for High-Performance Computing [...]

Coronavirus genome structure and replication
The nonstructural proteins directing coronavirus RNA synthesis and processing
The SARS coronavirus nucleocapsid protein-forms and functions
Crystal structure of SARS-CoV-2 nucleocapsid protein RNA binding domain reveals potential unique drug targeting sites
Recent development of 3C and 3CL protease inhibitors for anti-coronavirus and anti-picornavirus drug discovery
Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation
High affinity interaction between nucleocapsid protein and leader/intergenic sequence of mouse hepatitis virus RNA
Specific interaction between coronavirus leader RNA and nucleocapsid protein
Nucleocapsid protein recruitment to replication-transcription complexes plays a crucial role in coronaviral life cycle
The nucleocapsid protein of severe acute respiratory syndrome-coronavirus inhibits the activity of cyclin-cyclin-dependent kinase complex and blocks S phase progression in mammalian cells
Assembly of severe acute respiratory syndrome coronavirus RNA packaging signal into virus-like particles is nucleocapsid dependent
Coronavirus N protein N-terminal domain (NTD) specifically binds the transcriptional regulatory sequence (TRS) and melts TRS-cTRS RNA duplexes
Functional transcriptional regulatory sequence (TRS) RNA binding and helix destabilizing determinants of murine hepatitis virus (MHV) nucleocapsid (N) protein. Journal of Biological Chemistry
Amino acid residues critical for RNA-binding in the N-terminal domain of the nucleocapsid protein are essential determinants for the infectivity of coronavirus in cultured cells
Structural basis of RNA recognition by the SARS-CoV-2 nucleocapsid phosphoprotein. bioRxiv
Structural proteins of human respiratory coronavirus OC43. Virus research
The molecular biology of coronaviruses
Origin and evolution of pathogenic coronaviruses
Crystal structure studies of RNA duplexes containing s2U:A and s2U:U base pairs
Molecular operating environment (MOE) Suite# 910, Montreal …
HADDOCK: a protein-protein docking approach based on biochemical or biophysical information
NPDock: a web server for protein-nucleic acid docking. Nucleic acids research
DNAproDB: an expanded database and web-based tool for structural analysis of DNA-protein complexes
Combinatorial alanine-scanning. Current opinion in chemical biology
Alanine scanning mutagenesis of the prototypic cyclotide reveals a cluster of residues essential for bioactivity
Computational alanine scanning of protein-protein interfaces
Structural insights into the Middle East respiratory syndrome coronavirus 4a protein and its dsRNA binding mechanism. Scientific reports
mCSM-NA: predicting the effects of mutations on protein-nucleic acids interactions. Nucleic acids research
DrugScorePPI knowledge-based potentials used as scoring and objective function in protein-protein docking
The Amber biomolecular simulation programs
Predicting crystal structures: the Parrinello-Rahman method revisited. Physical review letters
Routine microsecond molecular dynamics simulations with AMBER on [...] Explicit solvent particle mesh Ewald
Pymol: An open-source molecular graphics tool. CCP4 Newsletter on protein crystallography
Principal component analysis. Chemometrics and intelligent laboratory systems
On lines and planes of closest fit to systems of points in space. The London [...]
Principal component analysis and long time protein dynamics. The Journal of Physical Chemistry
Contact- and distance-based principal component analysis of protein dynamics
Structural dynamic analysis of apo and ATP-bound IRAK4 kinase. Scientific reports
The MM/PBSA and MM/GBSA methods to estimate ligand-binding affinities. Expert opinion on drug discovery
Assessing the Performance of MM/PBSA, MM/GBSA Approaches on Protein/Carbohydrate Complexes: Effect of Implicit Solvent Models QM Methods, and Entropic Contributions
Interactions between coronavirus nucleocapsid protein and viral RNAs: implications for viral transcription
The solution structure of the ADAR2 dsRBM-RNA complex reveals a sequence-specific readout of the minor groove
Comparing experimental and computational alanine scanning techniques for probing a prototypical protein-protein interaction
Structural and free energy landscape of novel mutations in ribosomal protein S1 (rpsA) associated with pyrazinamide resistance. Scientific reports
Dynamics Insights Into the Gain of Flexibility by Helix-12 in ESR1 as a Mechanism of Resistance to Drugs in Breast Cancer Cell Lines. Frontiers in molecular biosciences

[...] conceptualized the study and did the analysis. AA, SSA, MK wrote the manuscript. AA, SSA and MK also contributed to the methodology.