key: cord-0741050-et2n4dnl
authors: Periwal, Neha; Rathod, Shravan B.; Pal, Ranjan; Sharma, Priya; Nebhnani, Lata; Barnwal, Ravi P.; Arora, Pooja; Srivastava, Kinshuk Raj; Sood, Vikas
title: In silico characterization of mutations circulating in SARS-CoV-2 structural proteins
date: 2021-04-02
journal: Journal of biomolecular structure & dynamics
DOI: 10.1080/07391102.2021.1908170
sha: b426523ae741080e13c922a757be93305472185b
doc_id: 741050
cord_uid: et2n4dnl

SARS-CoV-2 has recently emerged as a pandemic that has caused more than 2.4 million deaths worldwide. Since the onset of infections, several full-length sequences of viral genome have been made available which have been used to gain insights into viral dynamics. We utilised a meta-data driven comparative analysis tool for sequences (Meta-CATS) algorithm to identify mutations in 829 SARS-CoV-2 genomes from around the world. The algorithm predicted sixty-one mutations among SARS-CoV-2 genomes. We observed that most of the mutations were concentrated around three protein coding genes viz nsp3 (non-structural protein 3), RdRp (RNA-directed RNA polymerase) and Nucleocapsid (N) proteins of SARS-CoV-2. We used various computational tools including normal mode analysis (NMA), C-α discrete molecular dynamics (DMD) and all-atom molecular dynamic simulations (MD) to study the effect of mutations on functionality, stability and flexibility of SARS-CoV-2 structural proteins including envelope (E), N and spike (S) proteins. PredictSNP predictor suggested that four mutations (L37H in E, R203K and P344S in N and D614G in S) out of seven were predicted to be neutral whilst the remaining ones (P13L, S197L and G204R in N) were predicted to be deleterious in nature thereby impacting protein functionality. NMA, C-α DMD and all-atom MD suggested some mutations to have stabilizing roles (P13L, S197L and R203K in N protein) where remaining ones were predicted to destabilize mutant protein. In summary, we identified significant mutations in SARS-CoV-2 genomes as well as used computational approaches to further characterize the possible effect of highly significant mutations on SARS-CoV-2 structural proteins. Communicated by Ramaswamy H. Sarma

Coronaviruses are enveloped viruses having single-strand RNA as genetic material. First pandemic of coronaviruses was caused by SARS coronavirus in 2002 resulting in 919 deaths worldwide . Another wave of coronavirus pandemic was caused by MERS coronavirus in 2012 leading to 2499 infections and 858 deaths (Memish et al., 2020) . Notably, MERS coronavirus was more pathogenic with around 34.3% mortality as compared to SARS coronavirus with mortality ratio of around 9.6%. Currently, we are witnessing another case of pandemic caused by a novel SARS coronavirus which originated in Wuhan, China . The virus known as SARS-CoV-2 has infected millions and has caused more than 2.4 million deaths worldwide (https://www.who.int/publications/m/item/weekly-epidemiological-update-23-february-2021). Since the onset of infection, several full-length sequences of viral genome have been made available with an aim to gain insights into the viral pathogenesis. These genomic sequences have been instrumental in gaining understanding into evolution and pathogenesis of SARS-CoV-2. Recent reports suggest that SARS-CoV-2 could have originated from bats and showed around 89.1-96% similarity with other bats originated coronaviruses Zhou et al., 2020) . Further analysis identified bat coronavirus RaTG13 as the closest relative of SARS-CoV-2. Scientists have obtained several insights from SARS-CoV-2 genomes which might help explain the rapid transmission of SARS-CoV-2. Apart from information obtained from viral genomes, the pattern of mutations that virus accumulate also sheds light into the possible evolution and pathogenesis of the virus. High rate of mutations among RNA viruses further hinders the development of potential therapeutics against them (Peck & Lauring, 2018) .

As SARS-CoV-2 infections are progressing, there are chances that virus might accumulate several new mutations. Nucleotide substitution plays a major role in virus evolution and several groups are studying viral genomic sequences to identify mutations driving SARS-CoV-2 evolution and pathogenesis. One of the earlier studies identified several nucleotide substitutions and deletions among SARS-CoV-2 genomic sequences from around the world (Phan, 2020) . Data from phylogenetic analysis of SARS-CoV-2 identified three clusters within the virus circulating throughout the world. The study revealed that cluster A and C were more prevalent in Europe and America whereas cluster B was confined to East Asia (Forster et al., 2020) . Another study identified 782 variant sites in 3067 genomes from 55 countries. Out of these 782 variants, around 65% of them were non-synonymous in nature suggesting that the majority of the mutations that occur in SARS-CoV-2 can result in a non-synonymous mutation (Laamarti et al., 2020) . A recent report further aimed to characterize patient derived mutations on the pathogenesis of SARS-CoV-2 (Yao et al., 2020) . The authors characterized eleven different mutations in patient derived viral isolates. These viral isolates differed significantly towards viral load as well as their infectivity of Vero cells thereby pointing towards the crucial role of mutations in viral pathogenesis. Another group utilized computational methods to characterize D614G mutation in SARS-CoV-2 spike protein and observed that the mutation led to generation of serine protease cleavage site near S1-S2 junction (Bhattacharyya et al., 2020) . Further characterization of this mutation revealed that pseudovirus containing spike protein with D614G mutation entered Angiotensin-converting enzyme 2 (ACE2) expressing cells more efficiently as well as higher incorporation of S protein in virion particles . Further reports suggested that D614G spike mutation led to increased infectivity of the SARS-CoV-2 (Korber et al., 2020) . Since the onset of SARS-CoV-2 infections, it has accumulated several mutations and the recent emergence of new SARS-CoV-2 strain in the United Kingdom further attest to the fact. Further analysis revealed that some of the mutations in this new strain might lead to higher fitness of the virus (Singh et al., 2021) .

As SARS-CoV-2 infections are increasing rapidly with each passing day, the virus is accumulating new mutations that might help the virus infectivity. Hence, we utilized meta-analysis based approaches to identify highly significant mutations among viral genomic sequences from around the world. In order to further understand the possible role of the mutations in viral structural proteins, we utilized a suite of algorithms including normal mode analysis, discrete molecular dynamics and all-atom molecular dynamic simulations to predict the possible effect of these mutations on parent proteins. Our extensive analysis of several mutations on SARS-CoV-2 structural proteins revealed that L37H mutation in E protein, G204R and P344S in N protein and D614G in S protein were destabilizing the parent protein whereas P13L, S197L, and R203K in N protein had stabilizing effect on the parent protein. This study provides valuable insights into possible changes of SARS-CoV-2 structural proteins due to novel mutations from around the world.

SARS-CoV-2 sequences used in this study were obtained from Virus Pathogen Resource (ViPR) (Pickett et al., 2012) . A total of 829 sequences that were available to download till 18 April 2020 were used in this study. All the sequences were broadly grouped according to their geographic locations. A total of twelve groups were formed representing the six continents from where the virus was sequenced. We utilized a meta-data driven comparative analysis tool for sequences (Meta-CATS) algorithm to identify significant changes in various SARS-CoV-2 genomes (Pickett et al., 2013) . All the analysis was performed on default settings and mutations having p < 0.05 were considered to be significant.

The crystal structures of the envelope protein (PDB: 7K3G) (Mandala et al., 2020) , C-terminal domain (CTD) of nucleocapsid phosphoprotein (PDB: 6ZCO) (Zinzula et al., 2021) , and spike glycoprotein (PDB: 6VYB) (Walls et al., 2020) were retrieved from the protein data bank (PDB) (https://www. rcsb.org/). We studied single mutation, L37H, and D614G in envelope and spike proteins respectively. However, in nucleocapsid protein, we analysed five mutations, P13L, S197L, R302K, G204R, and P344S that were spanning the whole protein. While searching for N protein crystal structures through the PDB database, we found that only Pro344 residue containing protein crystal structure (CTD, PDB: 6ZCO) has been reported. Since crystal structure of N protein spanning the remaining mutations were not available, hence we used modelled structure (Code: QHD43423) (https://zhanglab. ccmb.med.umich.edu/COVID-19/) to study the effect of remaining mutations on N protein. The structures used in the study i.e. envelope, nucleocapsid C-terminal domain, nucleocapsid modelled, and spike proteins have 31 (a.a.: 8-38), 135 (a.a.: 230-364), 419 (a.a.: 1-419) and 698 (a.a.: 1-698) amino acid residues respectively. Envelope, nucleocapsid CTD, and spike have 2.1 Å (Mandala et al., 2020) , 1.36 Å (Zinzula et al., 2021) , and 3.20 Å (Walls et al., 2020) resolution respectively. The missing residues in N protein CTD and spike protein were added using ModLoop (Fiser & Sali, 2003) and SWISS-MODEL web server (Schwede et al., 2003) . To investigate the effects of mutation on the stability and flexibility of protein, mutations were inserted in all three wild-type proteins using the protein editing tool, PyMOL (Schr€ odinger LLC, 2010). Further, the modelled N protein was refined using Chiron (Ramachandran et al., 2011) web server without imposing any constraints. During the refinement, the van der Waals repulsion energy between the two residues was computed using the CHARMM19 force field (Brooks et al., 1983) . The cut-off value of repulsion energy was set to 0.3 kcal mol À1 (0.5 K B T) hence, all of the steric clashes between the residues were removed that had greater than 0.3 kcal mol À1 repulsion energy. The root mean square deviation (RMSD) between the unrefined and refined N modelled protein was found 1.28 Å. For further analysis, experimentally solved structures, E, N protein CTD, and S proteins were directly used without any refinements.

The impact of mutation on protein functions were evaluated using PredictSNP (Bendl et al., 2014) . This web server compiles six best prediction tools (MAPP, PhD-SNP, PolyPhen-1, PolyPhen-2, SIFT, and SNAP) and gives accurate prediction confidence scores for deleterious or neutral types of mutations. Embedded algorithms in PredictSNP use different approaches to predict the mutation effect. To evaluate accurate nature of the mutation, MAPP, PhD-SNP, PolyPhen-1, PolyPhen-2, SIFT, and SNAP utilize physicochemical properties and alignment score, support vector machine, expert set of empirical rules, naïve Bayes, alignment score, and neural network methods respectively (Bendl et al., 2014) . PredictSNP uses the following mathematical expression to calculate the score,

indicates the number of integrated tools, d i is an overall prediction (neutral prediction, À1; deleterious prediction, þ1) and S i represents the transformed confidence scores. PredictSNP Score is from À1 to þ1 in which, À1 to 0 is denoted for neutral and 0 to þ1 for deleterious nature of mutation.

To probe the effects of mutation on the stability and flexibility of proteins, free energy change (DDG) was calculated using the structure-based tool DynaMut (Rodrigues et al., 2018) . DynaMut employs normal mode analysis (NMA) to calculate DDG between the wild-type (WT) and mutant-type (MT) structures. Along with its own prediction, DynaMut also gives DDG prediction of NMA based elastic network contact model (ENCoM) (Frappier & Najmanovich, 2014) and structure-based predictions of mCSM (Pires et al., 2014a) , SDM (Worth et al., 2011) , and DUET (Pires et al., 2014b) . Furthermore, DynaMut predicts ENCoM based vibrational entropy difference (DDS Vib ) of WT and MT which predicts whether the mutation will increase or decrease the flexibility of the protein. ENCoM utilizes coarse-grained normal mode analysis that calculates the DDS Vib between WT and MT structures by screening the all pairwise atomic interactions. To calculate the number of total interactions in WT and MT proteins, the output protein structures from the DynaMut were uploaded to Arpeggio web server (Jubb et al., 2017) .

Additionally, to validate the predictions of DDG of WT and MT proteins calculated using DynaMut, sequence-based tools, single amino acid folding free energy changessequence (SAAFEC-SEQ) and I-Mutant2.0 (Capriotti et al., 2005) were used. SAAFEC-SEQ employs physicochemical properties at mutation position, sequence characteristics, and evolutionary data to predict DDG. However, I-Mutant2.0 employs the support vector machines (SVMs) based predictors and predicts the sign and magnitude of DDG at the different pH values. We used 7 pH to compute DDG of WT and MT structures.

In order to understand the structural dynamics of WT and MT proteins, C-a discrete molecular dynamics implemented in MDWeb was used (Hospital et al., 2012) . The DMD simulations of N and S proteins were performed for 500 ns at 300 K temperature. During the simulation, output frequency, sigma (square well amplitude for consecutive carbons), sigma_go (square well amplitude for non-consecutive carbons), and cut-off distance between pairs of atoms were kept fixed at 10 ps, 0.05, 0.1, and 8 Å respectively. Root mean square fluctuation (RMSF) and radius of gyration (Rg) were calculated and plotted for further investigation.

The WT and variants of novel SARS-CoV-2 protein were subjected to MD simulation using CHARMM36 (Huang et al., 2017) force field available with the GROMACS 2020 kit (Abraham et al., 2015; Berendsen et al., 1995) . The protein was solvated with water molecules (spc216) and placed in the canter of a dodecahedron box with a minimum of 1 nm from the box edge. Ions were added to neutralize the system by substituting water molecules. Energy minimization was performed using the steepest descent method with the Verlet cut-off scheme and particle mesh ewald (PME) longrange electrostatics. The system was then subjected to an equilibration (NVT and NPT) for 100 ps at 300 K. The final MD production runs were performed for 50 ns for each protein structure using the leapfrog integration algorithm, PME, and Verlet cut-off scheme at 300 K temperature and 1 bar pressure. Each trajectory was then subjected to further analysis of root mean square deviation (RMSD) of backbone, root mean square fluctuation (RMSF) for alpha-carbon of protein, radius of gyration (Rg) of whole protein structure, Intramolecular hydrogen bonds of protein structure, and solvent accessible surface area (SASA) of entire protein to analyse the dynamic behaviour of residues and stability of proteins using inbuilt tools in GROMACS suit. Additionally, the Arpeggio web server was used to calculate the total intramolecular interactions (Jubb et al., 2017) .

As the SARS-CoV-2 infections are increasing to new geographical locations, new mutations are arising among the viral genomes. In order to identify highly significant mutations among SARS-CoV-2 genomes, we utilized meta-analysis based approaches to identify novel mutations in 829 viral genomes from twelve countries (Supplementary Table S1 ). We observed that most of the mutations were confined to three locations in the SARS-CoV-2 genome namely nsp3, RdRp and N proteins ( Figure 1A and Supplementary Table  S2 ). However, apart from these mutations, we identified several other mutations that were randomly distributed throughout the SARS-CoV-2 genome ( Figure 1A ). Next, we studied the effect of various mutations for possible amino acid substitutions. We identified nearly thirty-seven non-synonymous mutations in the genome of SARS-CoV-2 ( Figure 1B -F, Supplementary Table S3) . Out of these, twenty non-synonymous mutations were found in ORF1ab (open reading frame 1ab), five mutations were the part of S protein, one mutation was present in E protein and five mutations were present in N protein. In order to study the frequency of these mutations among circulating SARS-CoV-2 genomes, we calculated the percentage of genomes carrying the mutations. We observed that most of these mutations were present in >30% of genomes suggesting that these mutations are part of majority SARS-CoV-2 circulating genomes ( Figure 1G ).

Amino acid variation in proteins affects protein folding (Alfalah et al., 2009; Lorch et al., 1999 Lorch et al., , 2000 , stability (Doss et al., 2012) , and functioning (Yamada et al., 2006 ) that leads to various degenerative diseases (Blocquel et al., 2019) . It has reported that around half of the total disease-related amino acid mutations arise from non-synonymous single nucleotide variants (SNVs) (Halushka et al., 1999) . It is difficult to differentiate the nature of mutation using any experimental procedure due to the meteoric growth of SNVs (Capriotti et al., 2012; Tranchevent et al., 2011) . To fill this gap, computational approaches are available to analyse the effects of various mutations on protein structure and function.

Hence, we studied the effect of mutations in circulating SARS-CoV-2 genomes and report the study of several mutations in envelope (E), nucleocapsid (N), and spike (S) structural proteins. The envelope protein of SARS-CoV-2 consists of 75 amino acids and three regions, N-terminal domain (NTD), transmembrane domain (TMD), and C-terminal domain (CTD). TMD is composed of 31 amino acids (a.a.: 8-38) (Mandala et al., 2020) . The structures of E protein are illustrated in Figure 2A -C. N protein contains disordered regions and has several domains and flexible linkers which is shown in Figure 2D . Recent report has shown that though the structure of the N protein of SARS-CoV-2 is similar to the previously reported SARS-CoV N protein, however, they differ in electrostatic properties (Kang et al., 2020) . SARS-CoV-2 N protein contains two RNA binding domains, the C-terminal domain (CTD) and N-terminal domain (NTD) (Tomaszewski et al., 2020) . These two domains are connected via a central linker. The terminal regions have been reported as intrinsically disordered regions (IDRs) (Wang et al., 2021) . The crystal structure of N protein CTD ( Figure 2E ) has been solved and deposited in the PDB database (PDB: 6ZCO). The surface of the virus is decorated by the homotrimeric spike glycoproteins which looks like a crown. Each spike protomer consists of S1 and S2 domains which are separated by furin cleavage site. S1 and S2 domains have multiple regions like N-terminal domain (NTD), NTD to residue binding domain (RBD) linker N2R, and subdomains SD1 and SD2 in S1 while, fusion peptide (FP), heptad repeat 1 (HR1), central helix (CH), connector domain (CD), heptad repeat 2 (HR2), transmembrane domain (TM), and cytoplasmic tail (CT) in S2 (Gobeil et al., 2021; Saputri et al., 2020) . The crystal structure of spike protein trimer and S1 domain are shown in Figure 2F and Figure 2G respectively. Mutation sites and WT residues of three proteins (E, N and S) are shown in Figure 1I -VI. Interestingly it was observed that spike protein of SARS-CoV-2 harboured an RBD domain which was predicted to facilitate efficient binding of viral spike protein with human ACE-2 protein which acts as a receipt for this virus. SARS-CoV-2 spike protein analysis further revealed presence of a polybasic site at S1-S2 junction, a feature absent from SARS coronaviruses (Andersen et al., 2020) .

We started with the L37H mutation that is located in the transmembrane domain (TMD) of the envelope protein. It is reported that this mutation has the potential to change relative solvent accessibility (Nguyen et al., 2021) . Literature review suggested that the nature of this mutation has not been studied yet completely (Bianchi et al., 2020; Sarkar & Saha, 2020) . In this mutation, nonpolar leucine is substituted by a polar histidine residue. This mutation was predicted to be neutral at an 83% confidence score. In N protein, it has been reported that the mutations, P13L and R203K are among the top ten high-frequency SNPs . Moreover, P13L is present in the first disordered region, R203K and G204R are in the second disordered region, while P344S is in the CTD domain (Dasgupta, 2020) . Three mutations in N protein (P13L, S197L, and G204R) were found to have a deleterious effect on protein functions whereas, the remaining two mutations, R203K and P344S were predicted to be neutral in nature. The single mutation, D614G in S protein was predicted neutral. Hence, mutations in all three proteins which are predicted to be deleterious, may alter the functions and reduce the fitness of coronavirus, while those which are found neutral do not have adverse effects on the protein function. PredictSNP results for the mutations are shown in Table 1 while the predicted results of all predictors with a confidence score of deleterious and neutral mutation for each protein are tabulated in Supplementary Table S4 .

To inspect the overall effects of missense mutation on protein stability, DDG was calculated using structure and sequence based predictors tools including DynaMut, SAAFEC-SEQ, and I-Mutant2.0.

L37H mutation is part of the terminal region in a helix of the pentameric transmembrane protein. Among seven predictors that we studied, six of them predicted that L37H mutation in E protein destabilizes the overall structure. I-Mutant2.0, SDM, and DUET also gave the significant magnitude of destabilization whereas the remaining predictors predicted minute destabilization for E protein. The majority of predictors predicted that the mutation results in destabilization of the protein. Looking at the interactions in WT and MT, it can be observed that a weak hydrogen bond of Leu37 with Ile33 in WT is missing in MT. In WT, Leu37 has 33 proximal interactions at around 4-5 Å with surrounding residues, Ala32, Ile33, Leu34, Thr35, and Ala36. However, in MT these proximal interactions are reduced to 20 only. Hence, these two major factors play a significant role in destabilizing MT. 

In N protein, the P13L mutation site is located at the centre of protein and at the loop region. DDG values confirmed that P13L mutation gives stability to the MT. Although four predictors predict stabilization whereas three predict destabilization of the N protein, data from all-atom MD simulation (50 ns simulation) suggested that MT forms more number of intramolecular contacts as compared to WT thereby leading to stabilization of the N protein. The major contribution for the stability comes from proximal polar contacts. In WT, 12 proximal polar and hydrophobic contacts with Asn11, Asn48, Asn150, and Ile15 are observed but at longer distances (5-7.5 Å) as compared to seven contacts with Asn11, Asn48, and Ile15 at shorter distances (4-6 Å) in MT. So, the stability of MT protein is mainly attributed to the proximal polar and hydrophobic interactions. Our data is in agreement with another study which suggests that the P13L mutation in NTD stabilizes the N protein .

The second mutation, S197L is situated at a peripheral position of IDR in N protein. From the DDG values as shown in Table 1 , we can observe that the majority of the predictors predict stabilization of MT and that is mainly due to MT having 41 proximal polar contacts in comparison with WT that has 35. Additionally, in MT, Leu197 forms van der Waals bond with Ser193 and weak hydrogen bonds with Ser193 and Pro199. Hence, the overall stability is gained by polar contacts. Our In silico data is further supported by the experimental evidence which suggests that the S197L mutation has mild patient outcome ranging from 76% to 1% . For the third mutation R203K, DynaMut and EnCOM based DDG values were significantly higher, 1.774 kcal mol À1 and 4.666 kcal mol À1 respectively and predicted stabilization compared to the other predictors. Further analysis suggests reduced vibrational entropy in MT (-4.166 kcal mol À1 K À1 ) thereby indicating the rigidification of structure leading to reduced flexibility of MT as compared to WT. Additionally, the radius of gyration (Rg) from MD shows that the MT structure becomes compact at the end of the simulation. Another supporting evidence is the formation of new van der Waals bonds with Ala211 and Gly212 and hydrophobic contacts with Phe307, Leu339, and Asp216 in MT for the stabilization of overall structure due to R203K mutation. MD data shows the overall contacts in MT are higher compared to WT and that can be also considered supporting evidence of stabilizing effect of R203K mutation.

In the G204R mutation, all predictors predict the destabilization of mutant N protein except for ENCoM because negative free energy change is occurring after mutation. MD results show that MT has less number of intramolecular interactions compared to WT which is a good indication of the destabilizing effect of G204R mutation on N protein. Both the mutations, R203K and G204R have inferior outcomes according to the previously reported study .

In the crystal structure of the C-terminal domain (CTD) of N protein, P344S mutation was found to have a destabilizing effect on mutant protein. Though the magnitudes of DDG are slightly negative, however, this miniscule difference leads to destabilization of CTD. The destabilizing nature of this mutation was reported elsewhere as well . Since the total number of contacts predicted by DynaMut in WT and MT are roughly similar, hence formation of contact could not be considered an evidence of destabilization. However, MD analysis revealed that the total number of interactions has been decreased in MT in comparison with WT. Furthermore, the structural features of amino acids can be considered one of the factors for the destabilization.

Here, rigid and conformationally blocked proline is replaced by flexible serine residue thereby resulting in a flexible CTD Pro344 and Ser344 both are interacting with the same residues, Lys342, Phe346, and Lys347 in WT and MT respectively.

In spike protein, D614G mutation is one of the most widely studied mutations (Fernandez, 2020; Hou et al., 2020; Korber et al., 2020; Plante et al., 2021; Volz et al., 2021; Zhang et al., 2020) . In this mutation, aspartic acid is replaced by glycine and the predicted values of free energy change show that the mutation leads to less stable structure formation. DDG values are negative in all the seven predictions thereby pointing towards the destabilizing effect of this mutation on parent protein. The higher stability of the WT structure is largely due to more numbers of polar contacts which are nearly double (contacts: 35) as compared to MT (contacts: 18) . In WT protein, Asp614 forms approximately 40 polar contacts with Asp111, Gln115, Val120, Phe596, Tyr612, Asn616, and Cys649 residues whereas, only 16 proximal contacts are observed in MT with Gly614 with Lys97, Arg102, Trp104, Leu117, Leu118, Asn121, Tyr124, and Val130 residues. Also, a positive change in entropy indicates the highly dynamic nature of MT spike protein. Table 2 shows the number and types of interaction of mutant and wild-type residues with surrounding residues while visual representation is given in Figure 3 .

To understand how the mutations affect the dynamic behaviour of various SARS-CoV-2 proteins, we further calculated vibrational entropy change between WT and MT structures of the proteins. It can be depicted from Figure 4 that mutations, L37H of E protein, P13L and P344S of N protein and D614G in S protein give flexibility to the MT structures. In E protein, L37H mutation is situated in the terminal part of the transmembrane domain and makes the whole transmembrane helix slightly flexible. P13L mutation is located at the N-terminal IDR and substitution of proline with leucine gives the highest flexibility to the NTD as compared to the CTD with DDS vib 0.857 kcal mol À1 K À1 . The second mutation, S197L is present at the central linker IDR which is the loop region of the protein. This mutation slightly increases the flexibility (DDS vib : 0.013 kcal mol À1 K À1 ) of only the IDR region. Similarly, R203K and G204R mutations are also positioned at the central linker IDR and the loop of proteins. From the DDS vib values, it was confirmed that due to these two mutations, N protein gains the highest rigidity among all mutations. Like a P13L mutation, due to proline residue substitution in P344S mutation, MT attains flexibility but less extent at the CTD region. In S protein, the D614G mutation site is located at the loop region of S1 domain. In this mutation, charged aspartic acid is substituted by nonpolar glycine. Charged amino acids contribute to protein stability by forming electrostatic contacts or hydrogen bonds (Zhou & Pang, 2018) . Hence, spike protein MT gains flexibility due to the lack of polar aspartic acid residue. Flexibility in MT is observed maximum at the CTD region and maximum at residue binding domain (RBD). The values of DDS Vib of proteins are given in Table 1 whereas visual representations are shown in Figure 4 .

The most problematic challenge in molecular dynamics (MD) and molecular mechanics (MM) is the demand for high computational power. To simplify and reduce the computational cost, Monte Carlo, Brownian dynamics, discrete molecular dynamics (DMD) and their hybrids simulation techniques have been developed to study the dynamics of biomolecules (Proctor & Dokholyan, 2016) . Dynamic behaviour of each amino acid in protein is closely related to the functionality of the protein. Mutation, which leads to decreased flexibility, may alter the activity of a protein (Boehr et al., 2013; Eisenmesser et al., 2005) . The mutations are also known to increase flexibility and make protein less rigid (Verma et al., 2012; Wieczorek & Zielenkiewicz, 2008) . Hence, to probe the impacts of mutation on flexibility, 500 ns C-a DMD was performed along with normal mode analysis (NMA). After running 500 ns DMD, we analysed trajectories and plotted RMSF and radius of gyration (Rg) for all mutations of N and a single mutation of S protein. Our results confirmed that C-a DMD results were in good agreement with the all-atom MD results.

It was observed from the RMSF plots ( Figure 5A ), all the mutations in N protein give flexibility to the protein at loop regions of NTD. However, R203K and G204R mutations bring the highest flexibility in the protein whereas residues from 225 to 315 show the rigidity in the CTD region. The NTD loop (a.a.: 95-100) has the highest fluctuations (8-10 Å) among all the residues. S197L and P13L MT structures have the lowest fluctuations which shows the gaining of rigidity compared to WT protein. The radius of gyration plots ( Figure  5D ) show that the WT and MT structures initially open and close at the end of simulation.

In CTD crystal structure, two regions, loop between two short helices (a.a.: 279-283) and beta-sheet (a.a.: 315-335) show the dynamic behaviour in WT and MT structures. In P344S MT, the values of RMSF ( Figure 5B ) are observed significantly greater (6-19 Å) than WT (6-13 Å) structure. It can be observed from the radius of gyration plots ( Figure 5E ) that the sequentially closing-opening-closing loop of MT whereas, WT closes initially and opens at the end of the simulation.

In spike protein, MT (D614G) has slightly lower values of RMSF ( Figure 5C ) compared to the WT structure. The radius of gyration plots ( Figure 5F ) indicate that the opening and closing of MT structure while WT shifts between opening and closing slightly in the middle of simulation and at the end of the simulation, it has little opening structure compared to WT structure. Residual analysis and Rg are illustrated in Figure 5 .

To investigate the effects of point mutation on the stability and dynamics of S protein and N protein, the wild-type (WT) and mutant structure of S protein (WT, and D614G), and modelled N protein (WT, P13L, S197L, R203K, and G204R) were subjected to 50 ns MD simulation. Additionally, we also simulated the available crystal structure of N protein CTD and it's mutant P344S for 50 ns for the same. Furthermore, we calculated the total number of intramolecular interactions using the Arpeggio webserver and the results are summarized in Supplementary Table S5 which will indicate the structural stability of protein variants.

To examine the change in the protein dynamics and stability of WT and mutant N protein, we have performed 50 ns simulation of models of WT and mutant of N protein. Since the partial crystal structure of P344s mutant was available, we simulated that crystal structure and compared it with the corresponding WT model created based on crystal structure. The structure of 50 ns frames of WT and mutants along with the initial structure are presented in Figure 6A , SASA and Rg plots of WT and mutant are presented in Figure 6B , and 6C respectively, RMSD, Hydrogen bonds, and RMSF are presented in Supplementary Figure S1 .

The Rg and SASA values of mutant P13L and S197L were observed to be lower as compared to WT and other mutants which indicates that the structure of these mutants are compact compared to WT and other mutants. Compaction of P13L and S197L mutant structures lead to significant increase in total number of contacts especially hydrogen bonds, polar and aromatics contacts as compared to WT and is indicated in Supplementary Table S5 . These results suggest that P13L and S197L mutation is possibly stabilizing the structure leading to compactness of protein N structure. Contrastingly, the Rg of R203K appears to have lower values compared to WT in the entire trajectory, while the SASA of mutant appears to have a lower profile than WT. The total number of intramolecular contacts is also higher in mutant R203K which suggests that at 50 ns, R203K adopts a stable structure compared to WT. However, Rg and SASA of G204R appear to have similar profiles like WT. Further the observed slight increase in hydrogen bonding and polar contacts compared to WT is compensated with slight loss in total number of aromatic and non-polar contacts compared to WT structure populated at 50 ns. This suggests that WT and G204R mutants adopt similar compact structures during simulation. We observed the RMSD of WT and all the mutants are slightly different in the first half of simulation, but in the second half of simulation all the variants and WT display similar behaviour, except R203K which continuously have slightly higher values compared to all other variants. Furthermore, the RMSF of all the mutants are similar except for R203K and G204R mutants which shows higher continuous fluctuations compared to WT and other two mutants suggesting that R203K and G204R mutations had an effect on the flexibility of the protein structure throughout the simulations. Based on these data, it appears that R203K and G204R structures are more dynamic compared to WT and other two mutants.

In summary, mutations P13L and S197L appear to cause structural stabilization leading to an increase in compactness of structure as well as increase in polar and non-polar intramolecular interactions compared to WT and other two mutants. However, G204R appears to have similar compactions and less intramolecular contacts as of WT, but is more dynamic, therefore G204R mutations could be destabilizing the protein N structure. Based on simulation data only, it is difficult to predict whether the behaviour of R203K is stabilizing or destabilizing.

A small segment (230-364 amino acids, C-terminal domain) of N protein has been experimentally crystallized (PDB: 6ZCO) recently, so we utilized that opportunity to investigate the effect of mutations P344S on the stability and dynamics of protein (Zinzula et al., 2021) . The Rg and SASA values, described ( Figure 7B and C), indicate that mutation leads to decrease in globularity of protein structure and statistically have less number of hydrogen bonds compared to WT. The mutants appear to have continuously higher fluctuations in RMSF plots (Supplementary Figure S2) , indicating that the mutations had an effect on the flexibility of protein. To further investigate the structural basis of flexibility, we overlaid the structure populated at the interval of every 10 ns during the simulation of N protein WT and mutant and we observed that the mutation of proline at 344 lead to structural fluctuations at remote site and b-sheet region (a.a.: 318-334) changes its orientations dramatically during the simulation ( Figure 7A ). The results from the present work confirm that P344S mutations cause the structural destabilization of N protein leading to loss of protein compactness.

For MD simulation, we have taken the structural component (a.a.: 1-698) from the experimentally solved S protein (PDB: 6VYB). The initial structure (0 ns) and final structure (50 ns) of S protein variants are presented in Figure 8A . The calculated SASA, Rg, and hydrogen bonds were plotted against time of simulation and are presented in Figure 8B -D. It appears that Rg and SASA of WT is lower as compared to D614G mutant whereas the total number of intramolecular hydrogen bonds, calculated for the entire trajectory, are high in number in WT as compared to mutant. These results further provide supportive evidence that the structure of WT is compact compared to D614G mutant. The observed high value of RMSD (Supplementary Figure S3A) for WT structure indicates the conformational dynamics associated with structural transition of open state (initial model) to compact state during the course of simulation. However, initial model structure for mutant protein is opening further only slightly, during the course of simulation thereby resulting in less deviation as compared to WT structure.

The dynamic behaviour of individual amino acid residues for WT and mutant was determined in terms of RMSF values, denoted by the peak elevation in Supplementary Figure S3B . RMSF plot indicated similar residue fluctuation profile for WT and mutant. We have highlighted the dynamic region in the model structure according to RMSF value (Supplementary Figure S3C ). In conclusion, WT S protein was observed to adopt a compact fold in the latter half of the trajectory via a large conformational change in a process to adopt compact state during simulation from initial structure. However, the mutant was observed to adopt more open conformation compared to WT. Calculated value of total number of intramolecular interactions (Supplementary Table S5 ), further supports destabilizing effects of mutation on protein structure, because mutant protein structure at 50 ns comprises less number of intramolecular interactions compared to WT. The present results suggest that WT is stable and mutation D614G caused a destabilization leading to a loss of protein compactness. Previous study reported that the open conformation of spike protein increases the infectivity of virus (Wrapp et al., 2020) . In our study, MD data shows that the mutant structure has open conformation at the end of simulation compared to wild-type structure and hence it was predicted destabilizing and shows the higher infectivity. 

Since the first cause reported on 1st December 2019 from China, SARS-CoV-2 has infected nearly all the countries throughout the globe. This alarming rate of spread of SARS-CoV-2 has prompted many countries to declare a complete shutdown, an event that was not witnessed till now. However, as the scientists from around the world are working hard to develop antivirals, the virus is spreading and accumulating mutations which might help it to combat host immune responses. Hence, it becomes imperative to gain understanding into viral evolution for the development of effective antivirals. In this study, we performed meta-analysis of SARS-CoV-2 genomes from around the world to identify highly significant mutations in the viral structural proteins. Our analysis identified sixty-one highly significant mutations spanning complete viral genomes. Further analysis revealed that most of these mutations were concentrated around few regions of viral genomes including NS3, RdRp and N protein.

Since integrity of structural proteins is vital for immune responses, hence we sought to study the putative effect of highly significant mutations on viral structural proteins. Meta-CATS algorithm identified one mutation in spike protein, five mutations in N protein as well as one mutation in envelope protein. We then performed molecular simulations on these mutations to predict their effect on stability of parent protein. Among seven mutations in three proteins (E, N and S), four mutations (L13H in E, R203K and P344S in N and D614G in S) were predicted neutral while, remaining three (P13L, S197L and G204R) found to have deleterious effects on protein functions. Discrete molecular dynamics and allatom molecular dynamics results revealed that three mutations (P13L, S197L and R203K) in N protein give stability to the parent proteins. However, two mutations (G204R and P344S) in N protein and D614G in S protein destabilize the parent proteins. Though our analysis of D614G mutation in Spike protein predicted destabilization of the protein, several experimental studies have shown that the mutation contributes towards enhanced infectivity of the virus. D614G mutation opens up the spike protein which though might lead to destabilization of protein but might lead to enhanced viral infections. Hence, it seems that there is a trade-off between the stabilization of protein structure versus the infectivity of the virus. Therefore, our studies provide vital clues about the effect of highly significant mutations on the structure of viral structural proteins.

wrote the first draft of the manuscript. VS, RPB, SBR, PA and KRS reviewed the data and finalized the manuscript.

No potential conflict of interest was reported by the author(s).

This work was supported by grants from Department of Science and Technology (DST), India and University Grants Commission (UGC), India. 

Gromacs: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX, 1-2, 19-25

Compound heterozygous mutations affect protein folding and function in patients with congenital sucrase-isomaltase deficiency

The proximal origin of SARS-CoV-2

PredictSNP: Robust and accurate consensus classifier for prediction of disease-related mutations

Global spread of SARS-CoV-2 subtype with spike protein mutation D614G is shaped by human genomic variations that regulate expression of TMPRSS2 and MX1 genes

Sars-CoV-2 envelope and membrane proteins: structural differences linked to virus characteristics?

CMT disease severity correlates with mutation-induced open conformation of histidyl-tRNA synthetase, not aminoacylation loss, in patient cells

A distal mutation perturbs dynamic amino acid networks in dihydrofolate reductase

CHARMM: A program for macromolecular energy, minimization, and dynamics calculations

I-Mutant2.0: Predicting stability changes upon mutation from the protein sequence or structure

Bioinformatics for personal genome interpretation

Mutations in structural proteins of SARS-CoV-2 and potential implications for the ongoing outbreak of infection in India

Screening of mutations affecting protein stability and dynamics of FGFR1-A simulation analysis

Intrinsic dynamics of an enzyme underlies catalysis

Structural impact of mutation D614G in SARS-CoV-2 spike protein: Enhanced infectivity and therapeutic opportunity

ModLoop: Automated modeling of loops in protein structures

Phylogenetic network analysis of SARS-CoV-2 genomes

A coarse-grained elastic network atom contact model and its use in the simulation of protein dynamics and the prediction of the effect of mutations

D614G mutation alters SARS-CoV-2 spike conformation and enhances protease cleavage at the S1/S2 junction

Patterns of single-nucleotide polymorphisms in candidate genes for blood-pressure homeostasis

MDWeb and MDMoby: An integrated web-based platform for molecular dynamics simulations

SARS-CoV-2 D614G variant exhibits efficient replication ex vivo and transmission in vivo

CHARMM36m: An improved force field for folded and intrinsically disordered proteins

Arpeggio: A web server for calculating and visualising interatomic interactions in protein structures

Crystal structure of SARS-CoV-2 nucleocapsid protein RNA binding domain reveals potential unique drug targeting sites

Tracking changes in SARS-CoV-2 spike: Evidence that D614G increases infectivity of the COVID-19 virus

Large scale genomic analysis of 3067 SARS-CoV-2 genomes reveals a clonal geo-distribution and a rich genetic variations of hotspots mutations

SAAFEC-SEQ: A sequence-based method for predicting the effect of single point mutations on protein thermodynamic stability

Effects of core mutations on the folding of a beta-sheet protein: Implications for backbone organization in the I-state

Effects of mutations on the thermodynamics of a protein folding reaction: Implications for the mechanism of formation of the intermediate and transition states

Structure and drug binding of the SARS-CoV-2 envelope protein transmembrane domain in lipid bilayers

Middle East respiratory syndrome

Different mutations in SARS-CoV-2 associate with severe and mild outcome

Genomic mutations and changes in protein secondary structure and solvent accessibility of SARS-CoV-2 (COVID-19 virus)

Complexities of viral mutation rates

Structures of the SARS -CoV-2 nucleocapsid and their perspectives for drug design

Genetic diversity and evolution of SARS-CoV-2. Infection

Metadata-driven comparative analysis tool for sequences (meta-CATS): An automated process for identifying significant sequence variations that correlate with virus attributes

ViPR: An open bioinformatics database and analysis resource for virology research

DUET: A server for predicting effects of mutations on protein stability using an integrated computational approach

MCSM: Predicting the effects of mutations in proteins using graph-based signatures

Spike mutation D614G alters SARS-CoV-2 fitness

Applications of discrete molecular dynamics in biology and medicine

Mutational insights into the envelope protein of SARS-CoV-2

Evolutionary dynamics of SARS-CoV-2 nucleocapsid protein and its consequences

Automated minimization of steric clashes in protein structures

DynaMut: Predicting the impact of mutations on protein conformation, flexibility and stability

Flexible, functional, and familiar: Characteristics of SARS-CoV-2 spike protein evolution

Structural insight into the role of novel SARSCoV-2 E protein: A potential target for vaccine development and other therapeutic strategies

The PyMOL molecular graphics system

SWISS-MODEL: An automated protein homology-modeling server

Mutational signatures in countries affected by SARS-CoV-2: Implications in host-pathogen interactome

Structure function investigation of a new VUI-202012/01 SARS-CoV-2 variant

New pathways of mutational change in SARS-CoV-2 proteomes involve regions of intrinsic disorder important for virus replication and release

A guide to web tools to prioritize candidate genes

Changes in lysozyme flexibility upon mutation are frequent, large and long-ranged

Evaluating the effects of SARS-CoV-2 spike mutation D614G on transmissibility and pathogenicity

Structure, function, and antigenicity of the SARS-CoV

SARS-CoV-2 nucleocapsid protein undergoes liquid -liquid phase separation into stress granules through its N-terminal intrinsically disordered region

Decoding SARS-CoV-2 transmission, evolution and ramification on COVID-19 diagnosis, vaccine, and medicine

DeltaF508 mutation increases conformational flexibility of CFTR protein

SDM -A server for predicting effects of mutations on protein stability and malfunction

Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation

A new coronavirus associated with human respiratory disease in China

Catalytic inactivation of human phospholipase D2 by a naturally occurring Gly901Asp mutation

The deadly coronaviruses: The 2003 SARS pandemic and the 2020 novel coronavirus epidemic in China

Patient-derived SARS-CoV-2 mutations impact viral replication dynamics and infectivity in vitro and with clinical implications in vivo

SARS-CoV-2 spike-protein D614G mutation increases virion spike density and infectivity

Electrostatic interactions in protein structure, folding, binding, and condensation

A pneumonia outbreak associated with a new coronavirus of probable bat origin

High-resolution structure and biophysical characterization of the nucleocapsid phosphoprotein dimerization domain from the Covid-19 severe acute respiratory syndrome coronavirus 2

NP is thankful to UGC for PhD fellowship and PS is thankful to CSIR for PhD fellowship. VS received research grant from UGC, Govt. of India. SBR is thankful to his Chemistry Department for providing computational and infrastructure facilities.