key: cord-1019018-3pribklz authors: Chand, Gyanendra Bahadur; Banerjee, Atanu; Azad, Gajendra Kumar title: Identification of novel mutations in RNA-dependent RNA polymerases of SARS-CoV-2 and their implications on its protein structure date: 2020-05-11 journal: bioRxiv DOI: 10.1101/2020.05.05.079939 sha: 4cdbab4b46ce92f33d2749e620d36335c3549c90 doc_id: 1019018 cord_uid: 3pribklz The rapid development of SARS-CoV-2 mediated COVID-19 pandemic has been the cause of significant health concern, highlighting the immediate need for the effective antivirals. SARS-CoV-2 is an RNA virus that has an inherent high mutation rate. These mutations drive viral evolution and genome variability, thereby, facilitating viruses to have rapid antigenic shifting to evade host immunity and to develop drug resistance. Viral RNA-dependent RNA polymerases (RdRp) perform viral genome duplication and RNA synthesis. Therefore, we compared the available RdRp sequences of SARS-CoV-2 from Indian isolates and ‘Wuhan wet sea food market virus’ sequence to identify, if any, variation between them. We report seven mutations observed in Indian SARS-CoV-2 isolates and three unique mutations that showed changes in the secondary structure of the RdRp protein at region of mutation. We also studied molecular dynamics using normal mode analyses and found that these mutations alter the stability of RdRp protein. Therefore, we propose that RdRp mutations in Indian SARS-CoV-2 isolates might have functional consequences that can interfere with RdRp targeting pharmacological agents. December 2019 and it became a pandemic by spreading to almost all countries worldwide. The SARS-CoV-2 causes COVID-19 disease that has created a global public health problem. As of May 8, 2020, more than 3.9 million confirmed COVID-19 cases were reported worldwide with 0.27 million confirmed deaths. To design antiviral therapeutics/ vaccines it is important to understand the genetic sequence, structure and function of the viral proteins. When a virus tries to adapt to a new host, in a new environment at a distinct geographical location, it makes necessary changes in its genetic make-up to bring slight modifications in the nature of its proteins (1) . Such variations are known to helpviruses to exploit the host's cellular machinery that promotes its survival and proliferation. The β lymphocytes of host's adaptive immune system eventually identify the specific epitopes of the pathogenic antigen and start producing protective antibodies, which in turn results in agglutination and clearance of the pathogen (2) (3) (4) . Being an efficient unique pathogen, a virus, often mutate its proteins in a manner that it can still infect the host cells, evading the host's immune system. Even when fruitful strategies are discovered and engaged, the high rate of genetic change displayed by viruses frequently leads to drug resistance or vaccine escape (5) . The SARS-CoV-2 has a single stranded RNA genome of approximately 29.8Kb in length and accommodates 14 ORFs encoding 29 proteins that includes four structural proteins: Envelope (E), Membrane (M), Nucleocapsid (N) and Spike (S) proteins, 16 non-structural proteins (nsp) and 9 accessory proteins(6) , (7) including the RNA dependent RNA polymerase (RdRp) (also named as nsp12). RdRp is comprised of multiple distinct domains that catalyze RNA-template dependent synthesis of phosphodiester bonds between ribonucleotides. The SARS-CoV-2 RdRp is the prime constituent of the replication/transcription machinery.The structure of the SARS-CoV-2 RdRp has recently been solved (8) . This protein contains a "right hand" RdRp domain (residues S367-F920), and a nidovirus-unique N-terminal extension domain (residues D60-R249). The polymerase domain and N-terminal domain are connected by an interface domain (residues A250-R365). For RNA viruses, the RNA-dependent RNA polymerase (RdRp) presents an ideal target because of its vital role in RNA synthesis and absence of host homolog. RdRp is therefore, a primary target for antiviral inhibitors such as Remdesivir(9) that is being considered as a potential drug for the treatment of COVID-19. Since, RNA viruses constantly evolve owing to the rapid rate of mutations in their genome, we decided to analyse RdRp protein sequence of SARS-CoV-2 from different geographical region to see if RdRp also mutate. Here, in the present study, we identified and characterised three mutations in RdRp protein isolated from India against that of the 'Wuhan wet sea food market'(7) SARS-CoV-2. Altogether, our data strongly suggests the prevalence of mutations in the genome of SARS-CoV-2 needs to be considered to develop new approaches for targeting this virus. The SARS-COV-2 sequencing data was downloaded from NCBI (NCBI-Virus-SARS-CoV-2 data hub). There are 26 SARS-CoV-2 sequences available from India. We downloaded all Indian SARS-CoV-2 sequences and Wuhan SARS-CoV-2 sequence, which was deposited for the first time after COVID-19 cases started to appear in Wuhan province, China. We downloaded protein sequence of NSP12/ RdRp from the database. All these RdRp sequences were first aligned in CLUSTAL Omega to check for similarities or differences. The Clustal Omega algorithm produces a multiple sequence alignment by producing pairwise alignments. Using this algorithm, we identified seven mutations in isolates from Indian SARS-CoV-2 samples compared to Wuhan SARS-CoV-2 RdRp sequence as shown in figure 1. We considered the original Wuhan sequence as the wild type for this comparison. These mutations on RdRp are A97V, A185V, I201L, P323L, L329I, A466V and V880I respectively. Out of these, one isolate (QJQ28357) have triple mutation, two isolates (QJQ28429, QJQ28417) have double mutation, and rest have single mutations (figure 1). Next, we studied the effect of these mutations on secondary structure of RdRp. Our data revealed that mutation at four positions have no effect in secondary structure that includes I201L, L329I, A466V and V880I (figure 2 C, E, F and G). However, mutation in rest of the three sites causes change in secondary structure (A97V, To correlate the reflection of changes in secondary structure, if any, in the dynamics of the protein in its tertiary structure, we used DynaMut software (10) Figure 3B and D). However, the mutation at I201L leads to increase in flexibility ( Figure 3C ). Further, we also calculated the free energy differences, G, between wild-type and mutant. The free energy differences, Δ Δ G, caused by mutation have been correlated with the structural changes, such as changes in packing density, cavity volume and accessible surface area and therefore, it measures effect of mutation on protein stability (12) . In general, a Δ Δ G below zero means that the mutation causes destabilization and above zero represents protein stabilization. Here, our analysis showed positive Δ Δ G for A185V and P323L mutations suggesting that P323L mutation is stabilising protein structure ( Figure 3A) ; however, we observed negative Δ Δ G for I201L mutation indicating its destabilising behaviour ( Figure 3A ). We further closely analysed the changes in the intramolecular interactions due to these three mutations in RdRp. Our data showed that it is affecting the interactions of the residues which are present in the close vicinity of alanine, isoleucine and proline. The substitution of wild type residues with mutant residues alters the side chain leading to change of intramolecular bonds in the pocket, where these amino acid resides as shown in figure 4A , B and C. Therefore, it can be conclusively stated that the three mutations namely, A97V, A185V and P323L, in large, are changing the stability and intramolecular interactions in the protein that might have functional consequences. RNA viruses exploit all known mechanisms of genetic variation to ensure their survival. Distinctive features of RNA virus replication include high mutation rates, high yields, and short replication time (13) . Mutation rates determine the amount of genetic variation generated in a population, which is the material upon which natural selection that can act (14) . For this reason, a higher mutation rate correlates with a higher evolutionary rate, subject to natural selection. The mutation rates also determine the probability that a mutation conferring to drug resistance, antibody escape, or expanded host range may arise (15) . Additionally, mutation rates can decide whether a virus population will become susceptible to drug-induced lethal mutagenesis. The SARS-CoV-2 infected humans from Wuhan province in China and quickly spread to almost all countries worldwide. Since, this virus has already spread to different demographic areas having different climatic condition, temperature, humidity and seasonal variations; therefore, we can predict that this virus might be mutating to adapt to new environments. Towards this, we investigated the mutations in RdRp of SARS-CoV-2. We focused primarily on RdRp because it is an indispensable protein of this virus that helps in its replication and transcription. More importantly, there are many drugs which specifically target RdRp and are potent antivirals. Our study reports the RdRp mutations present in the Indian isolates of SARS-CoV-2 and, will help to understand the effect of variations on RdRp protein. Our data demonstrate that A97V, A185V and P323Lmutations lead to significant changes in the protein secondary structure (Figure 2 (17) . Altogether, it is conceivable that many RNA viruses acquire drug resistance through mutations in RdRp. However, the functional characterisation of RdRp mutations investigated in our study needs to be carried out to understand the exact role of these mutations. These variations might lead to virus diversification and eventual emergence of variants or antibody escape mutants. Therefore, elaborate studies on SARS-CoV-2 protein sequence variations needs to be carried out that will help scientific community in better therapeutic targeting. We downloaded all SARS-CoV2 sequences from the NCBI virus database as shown in Sequence alignments and structure: All the RdRp protein sequences were aligned by multiple sequence alignment platform of CLUSTAL Omega (18) . Clustal Omega is a multiple sequence alignment program that uses seeded guide trees and HMM profile-profile techniques to generate alignments between three or more sequences. The alignment file was carefully studied and differences in the amino acid changes were recorded. Secondary structure predictions: We used CFSSP (19) (Chou and Fasman secondary structure prediction), an online server, to predict secondary structures of SARS-CoV-2 RdRp protein. This server predicts the possibility of secondary structure such as alpha helix, beta sheet, and turns from the amino acid sequence. CFSSP uses Chou-Fasman algorithm, which is based on analyses of the relative frequencies of each amino acid in secondary structures of proteins solved with X-ray crystallography. RdRp dynamics study: To investigate the effect of mutation on the RdRp protein structural conformation, its molecular stability and flexibility, we used DynaMut software(10) (University of Melbourne, Australia). To run this software we first downloaded the known protein structure of SARS-CoV-2 RdRp from RCSB (PDB ID: 6M71 (8)) and used it for analysis. Next, the 6M71 structure was uploaded on DynaMut software and effect of mutation in various protein structure stability parameters such as vibrational entropy; the atomic fluctuations and deformation energies were determined. DynaMut, a web server implements well established normal mode approaches that is used to analyse and visualise protein dynamics. This software samples conformations and measures the impact of mutations on protein dynamics and stability resulting from vibrational entropy changes and also predicts the impact of a mutation on protein stability. We would like to thank Patna University, Patna for providing necessary infrastructural support. No funding was used to carry out this research. Authors declare no conflict of interests. Mutation-driven parallel evolution during viral adaptation Viral mechanisms of immune evasion Sensing of RNA Viruses: a Review of Innate Immune Receptors Involved in Recognizing RNA Virus Invasion Overview of the immune response Microbial and viral drug resistance mechanisms A SARS-CoV-2-Human Protein-Protein Interaction Map Reveals Drug Targets and Potential Drug-Repurposing A new coronavirus associated with human respiratory disease in China Structure of the RNAdependent RNA polymerase from COVID-19 virus Remdesivir is a direct-acting antiviral that inhibits RNA-dependent RNA polymerase from severe acute respiratory syndrome coronavirus 2 with high potency DynaMut: Predicting the impact of mutations on protein conformation, flexibility and stability Vibrational entropy of a protein: Large differences between distinct conformations Response of a protein structure to cavity-creating mutations and its relation to the hydrophobic effect RNA VIRUS MUTATIONS AND FITNESS FOR SURVIVAL Why are RNA virus mutation rates so damn high? Antiviral drug resistance as an adaptive process The mechanism of resistance to favipiravir in influenza Identification of a novel resistance mutation for benzimidazole inhibitors of the HCV RNA-dependent RNA polymerase The EMBL-EBI search and sequence analysis tools APIs in 2019 Chou and Fasman Secondary Structure Prediction server Table 1: Details of SARS-CoV-2 sequences used in the analysis S.No Accession Number Institute/ University