key: cord-0790592-2hr21cwa
authors: Kumar, Sushant; Kumari, Khushboo; Azad, Gajendra Kumar
title: Emerging genetic diversity of SARS-CoV-2 RNA dependent RNA polymerase (RdRp) alters its B-cell epitopes
date: 2021-11-17
journal: Biologicals
DOI: 10.1016/j.biologicals.2021.11.002
sha: 197c0df871a608511929d753a5dbe46b758a0641
doc_id: 790592
cord_uid: 2hr21cwa
The RNA dependent RNA polymerase (RdRp) plays a crucial role in the virus life cycle by replicating the viral genome. SARS-CoV-2 is an RNA virus that spread rapidly worldwide and acquired mutations. This study was carried out to identify mutations in RdRp as SARS-CoV-2 spread in India. We compared 50217 RdRp sequences reported from India with the first reported RdRp sequence from Wuhan, China, and identified 223 mutations acquired among Indian isolates. Our protein modelling study revealed that several mutations can potentially alter the stability and flexibility of RdRp. We predicted the potential B cell epitopes contributed by RdRp and identified thirty-six linear continuous and twenty-five discontinuous epitopes. Among the 223 RdRp mutants, 44% localise in the B cell epitope regions. Altogether, this study highlights the need to identify and characterise the variations in RdRp to understand the impact of these mutations on SARS-CoV-2.
The SARS-CoV-2 genome encodes 29 proteins, which are categorised into three groups: structural, non-structural and accessory proteins. SARS-CoV-2 has four structural proteins, namely Spike glycoprotein, Membrane protein, Envelope protein and Nucleocapsid Phosphoprotein [1]. It also encodes sixteen non-structural proteins (Nsp1-16) and nine accessory proteins. The 16 non-structural proteins are synthesised as a single polypeptide of 7096 amino acids, known as Orf1ab, that is subsequently cleaved into 16 separate proteins [2]. The RNA dependent RNA polymerase (RdRp), also known as Nsp12, is a non-structural protein that replicates the SARS-CoV-2 RNA genome [1].
It associates with Nsp7 and Nsp8 and exists as a trimeric complex inside the viral envelope structure [3]. By itself, RdRp has very weak polymerase activity; however, complex formation with Nsp7 and Nsp8 significantly increases RdRp processivity and template affinity [4]. The RdRp of SARS-CoV-2 is 932 residues in length and contains distinct polymerase and nucleotide binding domains with a central connecting domain. Structurally, RdRp comprises an N-terminal β-hairpin (residues 31-50) followed by an extended nidovirus RdRp-associated nucleotidyl-transferase domain (NiRAN, residues 115-250) [5]. Following the NiRAN domain is an interface domain (residues 251-365) connected to the RdRp domain (residues 366-920). Further, the domains of RdRp are arranged in such a way that they form a canonical right-handed cup configuration [6], with the finger subdomain (residues 397-581 and residues 621-679) forming a closed circle with the thumb subdomain (residues 819-920) [5].
Bioinformatics has enabled researchers to study a large number of epitopes and their properties without the risk of growing pathogens. It has drastically reduced the cost of such studies and delivers results faster than the conventional methods of vaccine study. Further, the amalgamation of various genome-wide studies with immunoinformatics has revolutionised the identification of epitopes contributed by a protein or virus and accelerated our understanding of vaccine design and action [7, 8]. Several studies show that SARS-CoV-2 RdRp participates in the host immune response and thus provides insights into viral pathogenesis [9] [10] [11] [12]. RdRp is considerably immunogenic owing to its lower glycosylation density as compared to the structural proteins, and several studies have revealed that RdRp induces both the innate and adaptive immune responses of the host [9] [10] [11]. Another study has revealed that RdRp suppresses host antiviral responses by inhibiting IRF3 nuclear translocation [12].
Furthermore, RdRp is one of the most conserved enzymes across several viral species, such as influenza virus, hepatitis C virus (HCV), ZIKA virus (ZIKV), and coronavirus (CoV), suggesting that its function and mechanism of action might be well conserved [13, 14]. As SARS-CoV-2 spread to new geographical areas, it started to mutate [14]. The mutations acquired by SARS-CoV-2 are retained as a consequence of natural selection if the variants are more adaptable. In order to understand the variations occurring in RdRp within the Indian geographical area, we analysed 50217 RdRp sequences reported from India and identified 223 mutations. The B cell epitopes contributed by RdRp were predicted in silico and the mutations were mapped onto them.
The sequences used in this study are available on the publicly accessible CoVal database (https://coval.ccpem.ac.uk/). A total of 50217 SARS-CoV-2 sequences reported between Jan 2020 and Sept 2021 from different geographical locations within India were used in this study. The mutations occurring in SARS-CoV-2 were obtained from the CoVal webserver. The CoVal webserver uses sequences from the GISAID repository and updates its information at frequent intervals. In order to identify the variations present in the RdRp sequences among Indian isolates of SARS-CoV-2, multiple sequence alignments (MSAs) were conducted with the Clustal Omega program [15] as described earlier [16].
The prediction of linear continuous B cell epitopes was conducted using the IEDB [17]. The IEDB webserver provides a training set for the evaluation of existing epitope prediction methods and constitutes a platform for the development of novel and better prediction algorithms. The IEDB webserver also provides tools for the prediction of linear B-cell epitopes from protein sequence, including amino acid scales and HMMs, DiscoTope, ElliPro, Paratome, and PIGS.
The IEDB contains epitopes derived from the peer-reviewed literature, patent applications, direct submissions, and other publicly available databases, for example, FIMM, the HLA Ligand database, and the MHC binding database. The IEDB prediction method known as 'Bepipred linear epitope prediction method 2.0' was used in this study, with a threshold value of 0.500. The prediction of discontinuous B cell epitopes was performed with the online tool 'DiscoTope 2.0', with the threshold value set at −3.7.
We performed protein modelling studies with the DynaMut program [18] as described earlier [19]. The DynaMut web server was used for analysing the thermodynamic stability of the different RdRp mutations observed in this study. DynaMut introduces a dynamics component into mutational analysis to predict the difference in free energy (ΔΔG) and vibrational entropy (ΔΔS). The webserver implements normal mode analysis (NMA) through two different approaches, Bio3D and ENCoM, that provide rapid and simplified access to the analysis of protein dynamics and stability resulting from vibrational entropy changes [18]. It also enables assessment of the effects of missense mutations on protein stability and provides a comprehensive suite for protein motion and flexibility analysis and visualisation (http://biosig.unimelb.edu.au/dynamut/). For this study, we used the recently reported structure of RdRp (PDB ID: 7BV1) [5]. The effect of a mutation on the protein is expressed as the difference in free energy (ΔΔG). DynaMut also provides the difference in vibrational entropy (ΔΔSVib ENCoM) between the wild type and mutant protein. We ran the DynaMut webserver to calculate ΔΔG and ΔΔSVib ENCoM, which together indicate the impact of a mutation on protein structure and stability.
In order to identify the mutations in RdRp, we used the CoVal webserver, which compares the first reported sequence of RdRp from Wuhan, China with the sequences reported from India.
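The comparison against a reference sequence described above can be illustrated with a minimal Python sketch. It is not the CoVal implementation, whose internals are not described here; it assumes the protein sequences have already been aligned to equal length, and the function name `call_substitutions` and the short toy sequences are hypothetical stand-ins rather than real RdRp data.

```python
from collections import Counter


def call_substitutions(reference: str, variant: str) -> list:
    """Return substitutions such as 'A3T' between two pre-aligned sequences.

    Positions are 1-based and counted along the reference; alignment gaps
    ('-') and ambiguous residues ('X') are skipped rather than reported.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    substitutions = []
    position = 0
    for ref_aa, var_aa in zip(reference, variant):
        if ref_aa != "-":
            position += 1  # gaps in the reference do not consume a position
        if "-" in (ref_aa, var_aa) or "X" in (ref_aa, var_aa):
            continue
        if ref_aa != var_aa:
            substitutions.append(f"{ref_aa}{position}{var_aa}")
    return substitutions


# Hypothetical toy data: one reference and four aligned isolates.
reference = "MSADAQSFLN"
isolates = ["MSADAQSFLN", "MSTDAQSFLN", "MSTDAQSFLK", "MSAD-QSFLN"]

# Count how many isolates carry each substitution.
counts = Counter(
    sub for isolate in isolates for sub in call_substitutions(reference, isolate)
)
for substitution, n in counts.most_common():
    print(substitution, n)
```

Tallying substitutions per isolate in this way yields both the list of distinct mutations and their frequency, which is the kind of information summarised for the Indian isolates in Table 1.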
Till Sept 2021, a total of 50217 sequences had been analysed by the CoVal webserver. The data revealed 223 mutations among the Indian sequences of RdRp, as shown in Table 1. The mutations are also mapped onto the schematic representation of RdRp shown in Fig. 1A. Our results show that the mutations are spread across the entire RdRp polypeptide sequence. The distribution of mutations in the different domains of RdRp is highlighted in Fig. 1A. These data strongly indicate that RdRp is one of the most frequently mutated proteins of SARS-CoV-2, because we observed 223 mutations till Sept 2021. Furthermore, we looked at the time course of the samples used for the mutational study by the CoVal webserver, which shows the monthly appearance of new mutations from India. Based on the mutational analysis of RdRp by the CoVal webserver, we observed that during the initial phase of the COVID-19 pandemic the rate of occurrence of new mutations was high, but it slowed down as time progressed (Fig. 1B), suggesting that the virus is attaining mutational stability over time.
We performed protein modelling studies using the DynaMut program to understand whether the mutations observed in RdRp can alter protein structural integrity. Our data revealed that mutations at 89 positions stabilise the protein structure (positive ΔΔG), as shown in Table 1; the maximum positive ΔΔG was obtained for Q822K (1.801 kcal/mol). Similarly, the mutations at 111 positions destabilise the protein structure (negative ΔΔG) (Table 1); the maximum negative ΔΔG was obtained for the mutant I244T (-2.233 kcal/mol). Subsequently, we measured the change in vibrational entropy energy (ΔΔSVib ENCoM) between the wild type and the mutant. Our data revealed that mutations at 89 positions increase the flexibility of the mutant protein (positive ΔΔSVib ENCoM). The maximum positive ΔΔSVib ENCoM was obtained for the T929I mutant (4.55 kcal.mol-1.K-1).
Similarly, the mutations at the remaining 111 positions cause rigidification of the protein structure (negative ΔΔSVib ENCoM) (Table 1). The maximum negative ΔΔSVib ENCoM was obtained for the Q932H mutant (-4.93 kcal.mol-1.K-1). Altogether, our data revealed that the mutations observed in RdRp affect both protein stability and flexibility.
The continuous B-cell epitopes of RdRp were predicted with the IEDB webserver tool and are shown in Fig. 2A. The yellow area of the graph corresponds to those regions of RdRp that can potentially contribute to B cell epitopes. Our data demonstrated thirty-six peptides of varying lengths that could potentially act as B cell epitopes (Fig. 2B). Among those peptides, 'peptide 18' is the largest epitope, with 44 amino acids (RdRp residues 482 to 525). In contrast, peptides 5, 19, 30, 31 and 34 each consist of a single amino acid (Fig. 2B). Subsequently, we predicted the B cell epitopes of RdRp based on its three-dimensional structure using the DiscoTope 2.0 webserver tool [20]. Our analysis revealed twenty-five high-scoring discontinuous epitopes of RdRp. The locations of these epitopes are listed in Fig. 2C along with their propensity and DiscoTope scores. Among the discontinuous epitopes, 80% (20 out of 25) reside towards the C-terminal end of RdRp (residues 800 to 932), as shown in Fig. 2C. Altogether, our data revealed the B cell epitopes contributed by RdRp.
The list shows the location and details of the RdRp mutations identified by the CoVal webserver. The RdRp sequence reported from Wuhan, China was used as the wild type sequence for this analysis. The 50217 sequences of RdRp reported from India (till Sept 2021) were used for identifying mutations and their frequency. The ΔΔG and ΔΔSVib ENCoM values were obtained by protein modelling using the DynaMut program. Positive and negative ΔΔG values represent an increase and a decrease in protein stability upon mutation, respectively.
Similarly, positive and negative ΔΔSVib ENCoM values represent an increase in flexibility and in rigidity upon mutation, respectively. Mutations that localise in the unmodelled region of RdRp were not used in the analysis of ΔΔG and ΔΔSVib ENCoM, and their values were left blank (denoted by -).
Next, we analysed and compared the RdRp mutations that reside in the linear continuous and discontinuous B cell epitopes. Our data revealed that out of the 223 mutants observed in this study, 98 reside in the B cell epitope regions of RdRp (Fig. 3A). These 98 mutants correspond to 44% of the total mutants observed among Indian isolates. The details of all 98 mutants that localise in B cell epitope regions are shown in Fig. 3B. Altogether, our data strongly suggest that several RdRp mutations localise in the B cell epitope regions.
Coronaviruses are RNA viruses, which exhibit a high rate of mutation in their genomes [21]. As these viruses spread to new locations they keep acquiring mutations, a few of which are naturally selected because of their beneficial effect on the virus. Investigating the genomic variation acquired by SARS-CoV-2 is indispensable for understanding its epidemiology and pathogenesis, and for devising preventive measures and treatment strategies against COVID-19. Earlier variation studies on SARS-CoV-2 revealed that RdRp is among the mutational hotspot proteins [14]. Along similar lines, this study was conducted with the aim of identifying mutations in RdRp from Indian isolates. Our earlier study revealed seven crucial mutations in the RdRp of SARS-CoV-2 [22] that can potentially affect the function of this protein. The present study identifies and characterises the B cell epitopes contributed by RdRp and correlates them with the observed mutants. In this study, we analysed 50217 RdRp sequences reported from India till Sept 2021 and identified 223 mutations in RdRp, which indicates that RdRp is one of the mutational hotspot proteins of SARS-CoV-2.
Furthermore, our data revealed thirty-six high-ranking linear continuous B cell epitopes as well as twenty-five discontinuous B cell epitopes. Moreover, we identified that out of the 223 mutants found among Indian isolates, 98 (44%) reside in these B cell epitope regions. We used a bioinformatics approach to identify probable epitopes, which offers various advantages over conventional approaches. However, the final selection of epitopes from the probable epitopes identified using bioinformatics is still a challenging task. The RdRp epitopes revealed in this study require validation using in vivo experiments, which is a slow and herculean task. Furthermore, the output of epitope prediction algorithms is liable to change if the criteria used during tool selection are changed. Therefore, the algorithms are constantly improved to give better output and more reliable data [7, 8]. The variations in RdRp or any other protein of SARS-CoV-2 may tell us how the virus is evolving. Earlier studies with RNA viruses have also shown that these viruses keep mutating to better adapt and survive in the host [23]. In this study, we have reported RdRp mutations and their correlation with B cell epitopes. However, future studies are warranted to understand the possible effect of these mutations on virus infectivity and life cycle.
A) Linear continuous B-cell epitopes contributed by RdRp. The Y-axis of the graph corresponds to the BepiPred score, while the X-axis depicts the RdRp residue positions in the sequence. The data were generated by the IEDB webserver using the 'Bepipred Linear Epitope Prediction 2.0' method. The chart is divided into two parts, yellow and green. The RdRp residues present in the yellow area have a higher probability of being part of a linear continuous B cell epitope. B) The details of the linear continuous B cell epitopes are listed. The sequence of each peptide is given along with its start and end points in the RdRp polypeptide sequence.
C) Prediction of discontinuous B-cell epitopes of RdRp by the DiscoTope 2.0 web tool. The position of each predicted epitope is given along with its propensity and DiscoTope score.
[1] A new coronavirus associated with human respiratory disease in China
[2] Genomic characterization of the 2019 novel human-pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting Wuhan
[3] Structural and biochemical characterization of nsp12-nsp7-nsp8 core polymerase complex from SARS-CoV-2
[4] The SARS-coronavirus nsp7+nsp8 complex is a unique multimeric RNA polymerase capable of both de novo initiation and primer extension
[5] Structural basis for inhibition of the RNA-dependent RNA polymerase from SARS-CoV-2 by remdesivir
[6] RNA synthetic mechanisms employed by diverse families of RNA viruses
[7] Towards a consensus on datasets and evaluation metrics for developing B-cell epitope prediction tools
[8] An overview of bioinformatics tools for epitope prediction: implications on vaccine development
[9] Immune response to SARS-CoV-2 and mechanisms of immunopathological changes in COVID-19
[10] Immunoinformatics identification of B- and T-cell epitopes in the RNA-dependent RNA polymerase of SARS-CoV-2
[11] Analysis of SARS-CoV-2 RNA-dependent RNA polymerase as a potential therapeutic drug target using a computational approach
[12] SARS-CoV-2 nsp12 attenuates type I interferon production by inhibiting IRF3 nuclear translocation
[13] RNA-dependent RNA polymerase of SARS-CoV-2 as a therapeutic target
[14] Emerging SARS-CoV-2 mutation hot spots include a novel RNA-dependent-RNA polymerase variant
[15] The EMBL-EBI search and sequence analysis tools APIs in 2019
[16] The molecular assessment of SARS-CoV-2 Nucleocapsid Phosphoprotein variants among Indian isolates
[17] The immune epitope database (IEDB): 2018 update
[18] DynaMut: predicting the impact of mutations on protein conformation, flexibility and stability
[19] Identification and molecular characterization of mutations in nucleocapsid phosphoprotein of SARS-CoV-2
[20] Reliable B cell epitope predictions: impacts of method development and improved benchmarking
[21] The 2019-new coronavirus epidemic: evidence for virus evolution
[22] Identification of novel mutations in RNA-dependent RNA polymerases of SARS-CoV-2 and their implications on its protein structure
[23] Mechanisms of viral mutation
We would like to acknowledge Patna University, Patna, Bihar (India) for providing infrastructural support for this study. This work has been partly funded by a project awarded to GKA by the Science and Engineering Board, Department of Science and Technology, Government of India (Project number: SRG/2020/000808).