key: cord-1038218-9dbmcz9d
authors: Yashvardhini, Niti; Jha, Deepak Kumar; Bhattacharya, Saurav
title: Identification and characterization of mutations in the SARS-CoV-2 RNA-dependent RNA polymerase as a promising antiviral therapeutic target
date: 2021-08-19
journal: Arch Microbiol
DOI: 10.1007/s00203-021-02527-9
sha: 8715342a87f733f7f6871fa0267cbb41da478919
doc_id: 1038218
cord_uid: 9dbmcz9d

The causative agent of COVID-19 is a novel betacoronavirus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which has caused a pandemic of global concern. Considering its rapid transmission, the WHO declared it a pandemic on 11 March 2020. SARS-CoV-2 is a genetically diverse positive-sense RNA virus, and RNA viruses typically exhibit higher mutation rates than DNA viruses. Higher mutation rates bring greater genomic variability, which may drive viral evolution and enable viruses to evade the pre-existing immunity of the host and quickly acquire drug resistance. The objective of our study was to compare the RdRp sequences of Indian SARS-CoV-2 isolates with that of the Wuhan-type virus. A total of 384 point mutations were detected in 488 RdRp protein sequences from Indian SARS-CoV-2 genomes, of which seven were used for subsequent study. Secondary structure prediction, protein modeling and dynamics analysis revealed that these seven mutations (R118C, T148I, Y149C, E802A, Q822H, V880I and D893Y) significantly altered the stability and flexibility of the RdRp protein. The present study was therefore undertaken to analyze the variations in RdRp caused by multiple mutations that alter the structure and function of the RNA-dependent RNA polymerase, which is essential for replication/transcription of the virus and hence can be utilized as a promising therapeutic target to curb SARS-CoV-2 infections.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s00203-021-02527-9.

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), a human coronavirus, is a positive-strand RNA virus and the etiologic agent of coronavirus disease 2019 (COVID-19), which induces acute to severe respiratory distress (Lu et al. 2020). The initial outbreak was reported from a small animal market in Wuhan city, Hubei province, China (Zhu et al. 2020). As of October 9, 2020, the WHO had reported 179,686,071 confirmed cases of COVID-19 globally, including 3,899,172 deaths. The ripple effect of this infectious disease has created a humanitarian health crisis and has become a major challenge to health systems across the world. SARS-CoV-2 belongs to the family Coronaviridae and has a genome of approximately 30 kb (Su et al. 2016). The genome comprises 14 ORFs encoding 29 proteins, including four structural proteins, S (spike), E (envelope), M (membrane) and N (nucleocapsid), which are essential for assembly of the complete virion. In addition, the viral genome encodes 16 non-structural proteins (nsp) and nine accessory proteins (Gorden et al. 2020; Wu et al. 2020; Yashvardhini et al. 2021), including the protein that mediates viral replication/transcription, the RNA-dependent RNA polymerase (RdRp, also called nsp12). In the presence of divalent metal ions, RdRp (a multi-domain protein) catalyses the RNA-template-dependent formation of phosphodiester bonds between ribonucleotides. The recently resolved structure of SARS-CoV-2 RdRp exhibits three distinct domains (Gao et al. 2020). RNA viruses exhibit drastically high mutation rates, up to a million times higher than those of their hosts, and these high rates are strongly correlated with virulence and evolvability, traits considered vital for viral adaptation (Duffy 2018; Pachetti et al. 2020; Jha et al. 2021).
Furthermore, identification and characterization of viral mutations can provide valuable understanding for assessing viral drug resistance. RdRps are a crucial component of RNA viruses because of their role in viral genome replication/transcription, and the absence of a homolog in host cells makes this protein a primary target for antiviral drug development. The RdRp of SARS-CoV-2 is one of the most promising targets for antiviral inhibitors such as Favipiravir (Furuta et al. 2013), Galidesivir (Lim et al. 2017), Remdesivir (Lim et al. 2017) and Ribavirin (Morgenstern et al. 2005). These drugs are frequently used for the treatment of COVID-19. Therefore, in the present work, RdRp protein sequences of SARS-CoV-2 from Indian isolates were compared with the sequence of the virus from the Wuhan wet seafood market to identify any variations caused by mutations. For better survival and virulence in the host cell, viruses employ multiple strategies to evade host immunity (Sackman et al. 2017). RNA viruses exhibit a high frequency of mutation because of the low fidelity of their RNA polymerases, and consequently acquire genomic diversity leading to antigenic variability (Domingo and Holland 1997). Our data revealed 384 recurrent mutations in Indian isolates of SARS-CoV-2. Further, secondary structure prediction, protein modeling and dynamics studies of these mutants revealed alterations in the structure of the RdRp protein. The present work was therefore carried out to explore variations in RdRp, a potent drug target of SARS-CoV-2 that is indispensable for the replication/transcription machinery.

SARS-CoV-2 ORF1ab protein sequences submitted from India between July 2020 and May 2021 were downloaded from the NCBI Virus database.
The virus sequence first reported from the 'Wuhan wet seafood market area' (accession number YP_009724389), deposited in the NCBI Virus database, and the first sequence deposited from India (Kerala; accession number QHS34545) were used as references for mutational analysis in this study. All downloaded protein sequences were aligned using the CLUSTAL Omega program with HMM profiles (Madeira et al. 2019). The alignment file was viewed in Jalview and the sequence differences (mutations) in the RdRp region were recorded. To gain information on phylogenetic relationships, a phylogenetic tree was constructed using MEGA X (Molecular Evolutionary Genetics Analysis) with default parameters. Phylogenetic analysis was performed by the maximum-likelihood (ML) method with 1000 bootstrap replicates (Sudhir et al. 2018). The non-synonymous amino acid variants were analyzed using the Protein Variation Effect Analyzer (PROVEAN v1.1.3) with a predicted-score cut-off of −2.50 (Choi et al. 2015) to detect the effect of each mutation on the RdRp protein. CFSSP (Chou and Fasman secondary structure prediction) (Ashok Kumar 2013) was used to predict the secondary structure of the SARS-CoV-2 RdRp protein, that is, the presence of alpha helices, beta sheets and turns in the reference virus sequence as well as the mutated virus sequences. The structures of the wild-type and mutated RdRps were modeled, and Ramachandran plots were prepared, using the SWISS-MODEL homology-modeling server. To study the impact of mutation on the structure, conformation, stability and flexibility of the RdRp protein, the DynaMut software was used (Rodrigues et al. 2018). The reference structure of SARS-CoV-2 RdRp used in this analysis was downloaded from RCSB (PDB ID: 6M71; Gao et al. 2020) and uploaded to DynaMut.
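The mutation-recording step described above (comparing each aligned isolate with the reference and keeping only differences in the RdRp region) can be illustrated with a minimal sketch. It is not the authors' workflow, which used CLUSTAL Omega and Jalview interactively. The function names, the toy sequences and, in particular, the nsp12 coordinates within ORF1ab (4393 to 5324) are assumptions introduced here for illustration and should be checked against the reference annotation. The last helper applies the PROVEAN convention that a score at or below the cut-off (−2.5) is called deleterious.

```python
from typing import List, Optional

# Assumption (not stated in the paper): nsp12 (RdRp) occupies residues
# 4393-5324 of the 7096-residue ORF1ab polyprotein. Verify against the
# annotation of the reference record before relying on it.
NSP12_START, NSP12_END = 4393, 5324


def call_substitutions(ref_row: str, qry_row: str) -> List[str]:
    """List substitutions (e.g. 'K2R') between two rows of one alignment,
    numbered by ungapped position in the reference row."""
    if len(ref_row) != len(qry_row):
        raise ValueError("aligned rows must have equal length")
    calls: List[str] = []
    ref_pos = 0
    for ref_aa, qry_aa in zip(ref_row, qry_row):
        if ref_aa == "-":
            continue  # insertion in the query: no reference position consumed
        ref_pos += 1
        if qry_aa in ("-", "X") or qry_aa == ref_aa:
            continue  # deletion, ambiguous residue, or identical residue
        calls.append(f"{ref_aa}{ref_pos}{qry_aa}")
    return calls


def to_nsp12_numbering(orf1ab_call: str) -> Optional[str]:
    """Convert an ORF1ab-numbered call to nsp12 numbering, or None if outside nsp12."""
    ref_aa, pos, alt_aa = orf1ab_call[0], int(orf1ab_call[1:-1]), orf1ab_call[-1]
    if not NSP12_START <= pos <= NSP12_END:
        return None
    return f"{ref_aa}{pos - NSP12_START + 1}{alt_aa}"


def provean_call(score: float, cutoff: float = -2.5) -> str:
    """Scores at or below the cutoff are called deleterious, otherwise neutral."""
    return "deleterious" if score <= cutoff else "neutral"


# Toy aligned rows (invented for illustration, not real ORF1ab data).
print(call_substitutions("MKT-AYL", "MRTGAY-"))
print(to_nsp12_numbering("R4510C"))
```

With the toy rows, the only substitution reported is K2R: the inserted G and the trailing deletion are skipped. Under the assumed offset, a call at ORF1ab position 4510 maps to nsp12 position 118, the numbering used for R118C in this study.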
Protein stability parameters such as the change in vibrational entropy between wild-type and mutated proteins, deformation energy and atomic fluctuation were determined using the first 10 non-trivial modes of the structure. Changes in intramolecular interactions due to mutation were also estimated using DynaMut.

A total of 488 sequences submitted from India between July 2020 and May 2021 were retrieved from the NCBI Virus database. The Wuhan SARS-CoV-2 sequence and the first Indian SARS-CoV-2 isolate sequence were also retrieved for use as references; the two sequences were found to be similar. The sequences were aligned pairwise using CLUSTAL Omega to detect mutations. The alignment was visualized in Jalview to check the similarities and differences between the protein sequences. Only mutations in the RdRp (RNA-dependent RNA polymerase, nsp12) region were identified and used further in this study, as this protein plays a vital role in viral replication. A total of 384 recurrent mutations in 318 Indian isolates were detected in the RdRp (nsp12) region by comparing the sequences from India with that of Wuhan, as shown in Table S1. Of these 384 mutations, only R118C, T148I, Y149C, E802A, Q822H, V880I and D893Y were used in this study (Table 1). These seven mutations were further characterized to assess their effect on overall protein structure, conformation and dynamics. Of the seven, four were predicted to be neutral (T148I, V880I, Q822H and D893Y) and the remaining three (R118C, Y149C and E802A) deleterious for the RdRp protein at a PROVEAN score cut-off of −2.5 (Table 2).

To obtain information on the phylogenetic relationships of SARS-CoV-2 isolates from India and Wuhan, we constructed a phylogenetic tree by the maximum-likelihood (ML) method in MEGA X from an alignment of the full-length ORF1ab polyprotein (7096 amino acids). This phylogenetic analysis (Fig. 1) showed that the ORF1ab polyprotein variants from India and Wuhan formed different clusters, revealing the multi-spiked nature of the SARS-CoV-2 virus.

Secondary structure analysis was carried out to detect the formation or loss of alpha helices, beta sheets and turns. Two mutations, Y149C and V880I, did not change the secondary structure, whereas the other five showed significant changes (Fig. 2a). At position 118, substitution of arginine by cysteine resulted in loss of a turn at position 117. Arginine is a polar, positively charged, hydrophilic amino acid with a guanidino group in its structure and therefore prefers to form turns. Cysteine, an uncharged polar amino acid, prefers to form disulfide bonds and hence favors a compact protein structure, resulting in loss of turns. Our analysis also showed a point mutation at position 148, where threonine is substituted by isoleucine, which favors helix formation. Isoleucine is a non-polar amino acid and therefore a stabilizing residue for helix formation, whereas threonine, a polar uncharged amino acid, prefers to lie within beta sheets. The analysis further showed substitution of glutamic acid by alanine at position 802 of the RdRp protein. This mutation was accompanied by loss of a turn at position 802 (Fig. 2a). Alanine is a non-polar amino acid with a greater propensity to form alpha helices than turns and therefore causes loss of turns, whereas glutamic acid, with its negatively charged side chain, favors turn formation, resulting in proper folding of the protein. A point mutation was further observed at position 822, where glutamine is replaced by histidine. Histidine is a positively charged amino acid with an imidazole ring that favors neither helices nor sheets and therefore causes loss of these secondary structures upon introduction into the protein structure.
Glutamine, an uncharged amino acid, has a greater propensity to form beta sheets and turns. Therefore, the replacement of glutamine by histidine resulted in loss

To further characterize the impact of mutations on RdRp protein dynamics, the tertiary structure was built using DynaMut (Rodrigues et al. 2018). First, protein modeling was performed with SWISS-MODEL, which predicts the structure from the sequence and a template protein; the models are shown in Fig. 2b. Ramachandran plots of the template and the mutated proteins were prepared to identify the residues in the favored region (Fig. 2c). On average, more than 90% of the amino acid residues were found in the favored region of the Ramachandran plot for both the wild-type and mutated proteins, as shown in Table 3. The template structures used in modeling were 6XEZ and 6M71. DynaMut predicts the change in protein stability and dynamics upon mutation of the native structure, drawing on ENCoM, mCSM, DUET and other methods. Our analysis shows that the free energy difference between the wild-type and mutated proteins, ΔΔG, was stabilizing for all RdRp mutants. The free energy change was highest for E802A (1.725 kcal/mol), followed by T148I, Y149C and V880I, as shown in Table 4. The free energy change reflects properties such as the accessible surface area, cavity volume and packing density of the protein, and hence estimates the stability of the mutated protein relative to the wild type. In this study, all mutations showed positive ΔΔG values, predicting a stabilized mutant protein structure. Furthermore, the vibrational entropy energy (ΔΔSVib ENCoM) was computed, which gives the configurational entropy of the protein according to the energy landscape. These values provide insight into protein movements and hence conformational changes (Rodrigues et al. 2018). The ΔΔSVib ENCoM calculated for all mutations was negative, representing rigidification of the protein structure upon mutation.
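The sign conventions used in the preceding paragraph (positive ΔΔG read as stabilizing, negative ΔΔSVib ENCoM read as rigidification) can be written down as a minimal sketch. The function names are hypothetical and this is not part of the DynaMut server; the two numeric values are the ones quoted in this paper for E802A and Q822H.

```python
def stability_call(ddg: float) -> str:
    """Interpret a predicted ΔΔG (kcal/mol): non-negative values are read as
    stabilizing, negative values as destabilizing."""
    return "stabilizing" if ddg >= 0 else "destabilizing"


def flexibility_call(dds_vib: float) -> str:
    """Interpret ΔΔSVib ENCoM: negative values are read as rigidification,
    positive values as a gain in flexibility."""
    if dds_vib < 0:
        return "rigidification"
    if dds_vib > 0:
        return "gain in flexibility"
    return "no change"


# Values quoted in the text for two of the seven mutants.
print("E802A", stability_call(1.725))
print("Q822H", flexibility_call(-5.021))
```

Under these conventions E802A is read as stabilizing and Q822H as rigidifying, matching the interpretation given in the text.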
The most rigid structure was that of the Q822H mutant (−5.021 kcal/mol/K), followed by D893Y, R118C and V880I; the Y149C mutant (−3.621 kcal/mol/K) showed less rigidity and had a nearly flexible structure, as indicated in Fig. 2d. The visual representation of the flexibility analysis by DynaMut showed all RdRp mutants exhibiting a rigid structure except Y149C, which gained flexibility upon mutation (shown as the red region in Fig. 2d). Our analysis was extended with the calculation of atomic fluctuations and deformation energies. Atomic fluctuation describes the amplitude of atomic motion in the protein, whereas deformation energy measures local flexibility in the protein molecule. Atomic fluctuation was calculated over the first 10 non-trivial modes of the protein molecule. The magnitude of the fluctuations is shown by thin to thick tubes colored blue (low), white (moderate) and red (high). Similarly, the amplitude of deformation was calculated over the first 10 non-trivial modes of the molecule, and its magnitude is represented by thin to thick tubes colored blue (low), white (moderate) and red (high). In this study, visible changes were observed in the atomic fluctuations and deformation energies of the mutant RdRp proteins relative to the wild-type protein; these are marked with arrows of different colors (Fig. 2e, f). Upon mutation, blue tubes appeared where white tubes had been, indicating lower atomic fluctuation and deformation in the mutant RdRp protein than in the wild type. The protein dynamics study was further extended to the shift in intramolecular interactions caused by mutation in the RdRp protein. The DynaMut server detects all covalent and non-covalent interactions and hence predicts the intramolecular interactions. The mutations in the RdRp protein disrupted hydrophobic interactions, aromatic contacts, ionic interactions, hydrogen bonds and other metal complex interactions. The results of the present study revealed that mutation of the arginine, threonine, tyrosine, glutamic acid, glutamine, valine and aspartic acid residues affected the interactions of amino acid residues in close proximity (Fig. 3). The side-chain interactions of the wild-type residues change upon incorporation of the mutant residues. From the above analysis it can be concluded that mutation in the RdRp protein changes not only the stability and flexibility of the protein but also its intramolecular interactions with neighboring residues.

Fig. 1 Phylogenetic tree representing Indian SARS-CoV-2 isolate sequences as well as the Wuhan-type virus isolate as reference. The tree was constructed using MEGA X software with default parameters

Fig. 2 a Effect of mutation at different sites on the secondary structure of RdRp protein. A-E represent five mutations, as two occurred at the same locus, observed in Indian isolates. The first secondary structure in each of A-E represents the Wuhan-type sequence while the second represents the mutated one. The mutation location and respective secondary structures are marked with boxes. b Swiss modeling of wild-type and mutant RdRp protein. Models were prepared by the Swiss model server with 6XEZ and 6M71 as templates. c Ramachandran plot analysis of template and mutated protein. The plot shows amino acids present in the favored region as well as outliers. d Mutational effect on structural dynamics of RdRp protein. Vibrational entropy energy between wild-type and mutant RdRp, in which amino acids are colored according to the vibrational entropy change in the mutant with reference to the wild type. Blue represents rigidification, whereas red represents gain in flexibility upon mutation. e Visual representation of atomic fluctuations.
The atomic fluctuation of both wild-type and mutant RdRp proteins is shown. The magnitude of fluctuation is represented by thin to thick tubes colored blue (low), white (moderate) and red (high). The changes in fluctuation of the mutant with reference to the wild type are marked by red, green and yellow arrows. f Visual representation of deformation energies. The deformation energy of both wild-type and mutant RdRp proteins is shown. The magnitude of deformation energy is represented by thin to thick tubes colored blue (low), white (moderate) and red (high). The changes in deformation of the mutant with respect to the wild type are marked by red, green and yellow arrows

Discussion

SARS-CoV-2 emerged in Wuhan, China, and then spread all over the world. Several factors are associated with the transmissibility of SARS-CoV-2, such as population density, the health care system, and environmental and climatic variations. Positive-strand RNA viruses, including SARS-CoV-2, show enormous genetic diversity because they exhibit an extremely high frequency of mutation, substantially higher than that of DNA viruses, although the vast majority of mutations are detrimental. Moreover, mutation favors viral evolvability and enhances the ability of the virus to survive in the dynamic environment of the host. A high frequency of mutation drives viral evolution and genome variability, enabling viruses to evade host immunity and quickly develop drug resistance. RNA viruses, including SARS-CoV-2, can accumulate genomic mutations through an error-prone viral polymerase and adapt better inside the host, which creates hurdles in designing antiviral therapeutics against RNA viruses (Mishra et al. 2021). Quasispecies dynamics and high mutation rates are among the most important features of RNA viruses. These viruses can adapt to a new environment as a result of continuous genetic variation leading to selection of viral populations.
The adaptive potential of RNA viruses must be taken into account when designing antiviral therapeutics. In the present work, we attempted to identify and characterize mutations in the RdRp protein of SARS-CoV-2 from Indian isolates, as this protein is essential for replication and transcription of the coronavirus. In addition, several existing drugs target RdRp and are regarded as potent antiviral agents. Our study showed the occurrence of recurrent RdRp mutations in Indian SARS-CoV-2 isolates with reference to the Wuhan wet seafood market isolate, which provides crucial insights into the effect of variations on the activity of the RdRp protein. The phylogenetic analysis revealed that many Indian isolates of SARS-CoV-2 diverged from the Wuhan-type isolate owing to variations generated by mutation. In this investigation, mutations were found more frequently in the C-terminal region (thumb subdomain) of the RdRp protein than in the N-terminal region. Further, the analysis showed that the Y149C and V880I mutations did not alter the secondary structure, whereas the R118C, T148I, E802A, Q822H and D893Y mutations led to noticeable changes in the secondary structure of the RdRp protein (Fig. 2a). The R118C mutation lies in the NiRAN domain, an important structural block with five antiparallel beta strands and two helices, and can therefore influence the structural integrity of the virus (Gao et al. 2020). Swiss modeling predicted that most amino acid residues of the RdRp protein lie in the favored region of the Ramachandran plot. Furthermore, the protein dynamics study revealed rigidification in the C-terminal region of the RdRp protein after mutation, except for Y149C, which leads to flexibility in the N-terminal region. This structural flexibility can influence viral replication and hence can result in the emergence of fidelity variants.
These variations alter the atomic fluctuations and deformation energies of the mutants and interfere with their intramolecular interactions. We found that interactions such as hydrogen bonds, hydrophobic interactions and metal ion interactions were altered in the mutant RdRp compared with the wild type. Earlier studies have shown that drug resistance arises from multiple natural mutations in the RdRp machinery that significantly reduce drug-RdRp binding affinity (Agostini et al. 2018; Young et al. 2003; Goldhill et al. 2018). Our results are consistent with Pachetti et al. (2020), who reported that the RdRp of SARS-CoV-2 acquired drug resistance properties owing to the high frequency of mutations in the RdRp in infected populations. For instance, by one estimate, HIV-1 produces all possible single, double and triple point mutants in every infected patient every day (Perelson et al. 1997). Delang et al. (2012) also reported mutations at P495, P496 and T389 in the RdRp of hepatitis C virus that confer drug resistance. In addition, mutations in the SARS-CoV-2 RdRp affect the fidelity of the replication process; consequently, viral load and virulence may also be affected (Shannon et al. 2020). Information on the potential of SARS-CoV-2 to acquire genetic alterations (mutations), and on how this can be regulated, is still scanty (Vankadari 2020). Further in vitro and in vivo studies are therefore urgently needed to assess the possible role of mutations in affecting the fidelity of the RdRp protein. Similarly, viruses carrying a mutant RdRp might show resistance to antiviral drugs such as remdesivir, which is currently one of the most commonly used repurposed drugs for SARS-CoV-2. Nguyen et al. (2020) reported the binding affinity and mechanisms of remdesivir interactions with two main targets.
They suggested that electrostatic interaction, a type of non-covalent force, is the key determinant in stabilizing the RdRp-remdesivir complex, whereas van der Waals forces dominate the interactions in the Mpro-remdesivir case. Thus, remdesivir can target both RdRp and Mpro and seems to be an effective drug for treating COVID-19 infections. Our findings also revealed that mutations in the RdRp protein alter intramolecular interactions with adjacent residues; this information might therefore contribute to the development of antiviral therapeutic drugs. Moreover, several plant products such as flavonoids, alkaloids, lactones and terpenes have also been found to inhibit the target protein (RdRp) of coronavirus (Saha et al. 2021). These implications favor RdRp as a potential drug target for pharmacological as well as epidemiological studies. The present findings therefore further suggest that the evolvability of SARS-CoV-2 is strongly associated with the onset of novel mutations that spread to several new locations of the viral genome, and they also provide insight for developing specific and effective control strategies to combat COVID-19 infections. Further in vivo and in vitro studies are needed to understand the implications of these recurrent mutations for the functions of RdRp.

The online version contains supplementary material available at https://doi.org/10.1007/s00203-021-02527-9.

Funding Nil.

The authors declare that they have no conflict of interests.

Ethical approval The present work does not contain any animal or human subject.

Fig. 3 Interatomic interactions were altered by mutations at loci R118C, T148I, Y149C, E802A, Q822H, V880I and D893Y, as shown in the figure.
Wild type amino acid residues are colored in light green and represented as stick with the surrounding residues where any interactions exist

References

Coronavirus susceptibility to the antiviral remdesivir (GS-5734) is mediated by the viral polymerase and the proofreading exoribonuclease
CFSSP: Chou and Fasman secondary structure prediction server
PROVEAN web server: a tool to predict the functional effect of amino acid substitutions and indels
Identification of a novel resistance mutation for benzimidazole inhibitors of the HCV RNA-dependent RNA polymerase
RNA virus mutations and fitness for survival
Why are RNA virus mutation rates so damn high?
Favipiravir (T 705), a novel viral RNA polymerase inhibitor
Structure of the RNA-dependent RNA polymerase from COVID-19 virus
The mechanism of resistance to favipiravir in influenza
A SARS-CoV-2-human protein-protein interaction map reveals drug targets and potential drug-repurposing
Immunological and mutational analysis of SARS-CoV-2 structural proteins from Asian countries
Galidesivir, a direct-acting antiviral drug Rhesus Macaques challenged with zika virus. Open Forum Infect Dis
Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding
The EMBL-EBI search and sequence analysis tools APIs in 2019
Identifying the natural polyphenol catechin as a multi-targeted agent against SARS-CoV-2 for the plausible therapy of COVID-19: an integrated computational approach
Ribavirin and interferon-β synergistically inhibit SARS-associated coronavirus replication in animal and human cell lines
Remdesivir strongly binds to both RNA-dependent RNA polymerase and main protease of SARS-CoV-2: evidence from molecular simulations
Emerging SARS-CoV-2 mutation hot spots include a novel RNA-dependent-RNA polymerase variant
Dynamics of HIV-1 CD4+ lymphocytes in vivo
DynaMut: predicting the impact of mutations on protein conformation, flexibility and stability
Mutation-driven parallel evolution during viral adaptation
Discovering potential RNA dependent RNA polymerase inhibitors as prospective drugs against COVID-19: an in silico approach
Remdesivir and SARS-CoV-2: structural requirements at both nsp12 RdRp and nsp14 exonuclease active-sites
Epidemiology, genetic recombination, and pathogenesis of coronaviruses
MEGA X: molecular evolutionary genetics analysis across computing platforms
Overwhelming mutations or SNPs of SARS-CoV-2: A point of caution
Temperature significant change COVID-19 transmission in 429 cities
A new coronavirus associated with human respiratory disease in China
Immunoinformatics identification of B-and T-Cell epitopes in the RNA-dependent RNA polymerase of SARS-CoV-2. Can J Infec Diseases Med Microbiol 2021:8
Identification of a ribavirin-resistant NS5B mutation of hepatitis C virus during ribavirin monotherapy
A novel coronavirus from patients with pneumonia in China