key: cord-0941761-viy4pv70
authors: Rahman, Sadniman; Shishir, Md. Asaduzzaman; Hosen, Md Ismail; Khan, Miftahul Jannat; Arefin, Ashiqul; Khandaker, Ashfaqul Muid
title: The status and analysis of common mutations found in the SARS-CoV-2 whole genome sequences from Bangladesh
date: 2022-04-04
journal: Gene Rep
DOI: 10.1016/j.genrep.2022.101608
sha: e6673e7ab940ae57b99ccb818365b649b24d8bac
doc_id: 941761
cord_uid: viy4pv70

The rapid emergence of COVID-19 variants through continuous mutation has caused successive waves of infection worldwide and, as a result, a high death toll has been recorded so far. It is, therefore, very important to investigate the diversity and nature of the mutations in the SARS-CoV-2 genomes. In this study, the common mutations that occurred in the whole genome sequences of SARS-CoV-2 variants from Bangladesh over a certain timeline were analyzed to better understand their status. Hence, a total of 78 complete genome sequences available in the NCBI database were obtained, aligned and further analyzed. Scattered single nucleotide polymorphisms (SNPs) were identified throughout the genomes of the variants, and common SNPs such as 241: C>T in the 5′UTR of Open Reading Frame 1A (ORF1A), 3037: C>T in Non-structural Protein 3 (NSP3), 14,408: C>T in ORF6, and 23,402: A>G and 23,403: A>G in the Spike protein (S) were observed, but all of them were synonymous mutations. About 97% of the studied genomes showed a block of tri-nucleotide alteration (GGG>AAC), the most common non-synonymous mutation, at positions 28,881–28,883 of the genome. This block results in two amino acid changes (203–204: RG>KR) in the SR-rich motif of the nucleocapsid (N) protein of SARS-CoV-2, introducing a lysine between serine and arginine. The N protein structure of the mutant was predicted through protein modeling. However, no observable difference was found between the mutant and the reference (Wuhan) protein.
Further, the protein stability changes upon mutation were analyzed using the I-Mutant2.0 tool. The alteration of arginine to lysine at amino acid position 203 yielded a negative predicted free energy change (ΔΔG), suggesting a possible impact on the overall stability of the N protein. The non-synonymous to synonymous substitution ratio (dN/dS) was estimated for the common mutations, and the results showed that the overall mean distance among the N-protein variants was statistically significant, supporting the non-synonymous nature of the mutations. The phylogenetic analysis of the selected 78 genomes, compared with the most common genomic variants of this virus across the globe, showed a distinct cluster for the analyzed Bangladeshi sequences. Further studies are warranted to establish any plausible association of these mutations with clinical manifestation.

SARS-CoV-2, the causative agent of COVID-19, spreads faster than the earlier SARS and MERS coronaviruses and belongs to the β-coronavirus genus (Naqvi et al., 2020). SARS-CoV-2 pathogenesis involves both the innate and the adaptive immune system (Morse et al., 2020), leading to the activation of signaling cascades that culminate in the release of cytokines and chemokines and the recruitment of immune cells to the site of infection (Fung and Liu, 2014). Dysregulation of the host's immune response leads to excessive inflammation, an altered adaptive immune response, and sometimes even death (Moens and Meyts, 2020). Furthermore, the emergence of new variants through mutation of the viral genome is facilitating newer clinical manifestations (Bakhshandeh et al., 2021). Although most mutations in the SARS-CoV-2 genome are predicted to be insignificant, a small proportion might affect functional properties and modify infectivity, disease severity or interactions with host immunity (Harvey et al., 2021).
The complete genome of SARS-CoV-2 is about 29.9 kb (Wuhan variant), with a GC content of 38%, and is composed of 12 functional open reading frames (ORFs) (Khailany et al., 2020; Naqvi et al., 2020). ORF1a and ORF1b (5′-3′) encode 16 non-structural proteins (NSP1-NSP16), i.e. polyproteins (Alanagreh et al., 2020), among which NSP3 (4955~5900 bp) and NSP5 (10055~10977 bp) encode proteases (Fig. 1) (Davies et al., 2021; Zhou et al., 2021; Sabino et al., 2021; McCallum et al., 2021; Adam 2021). The first positive cases of SARS-CoV-2 infection in Bangladesh were detected through RT-PCR assays in three Bangladeshi individuals on 7 March 2020 (Anwar et al., 2020). Since then, several reports have addressed the genome analysis of SARS-CoV-2 from Bangladesh, the physiological conditions of the patients, the association of comorbidities with severity, and the comparison of global and local mutations (Hasan et al., 2021; Mannan et al., 2021; Rahman et al., 2021). In the present study, we specifically monitored and analyzed 78 curated whole-genome sequences of SARS-CoV-2 submitted to the NCBI genome databases from Bangladesh to understand the commonly found mutations and the nature of those mutations. Understanding the nature of common mutations along a timeline will help in analyzing the diverse SARS-CoV-2 genomes in the country.

This study was conducted based on the analysis of the whole genome sequences of the [...] and per-residue model quality was assessed using the QMEAN scoring function. Further, the I-Mutant2.0 tool was applied to predict the stability changes of the N protein upon mutation (Capriotti et al., 2005). The phylogenetic analysis and the difference between the non-synonymous and synonymous distances (dN-dS) per site, averaged over all sequence pairs of each gene, were calculated using MEGA X. The dN-dS analyses were conducted using the Nei-Gojobori model.
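To make the dN-dS quantity concrete, the following is a minimal sketch of a Nei-Gojobori-style pairwise calculation. It is not the MEGA X implementation used in the study. The function names, the toy sequences, and the handling of stop codons (changes to a stop codon are simply counted as non-synonymous, and pathways through stop codons are not excluded) are assumptions made for illustration. The Jukes-Cantor correction is included, although the text does not state whether the corrected or the uncorrected (proportion) variant was used.

```python
from itertools import permutations, product
from math import log

# Standard genetic code, built from the usual TCAG-ordered compact string.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def syn_sites(codon):
    """Number of synonymous sites in one codon (non-synonymous sites = 3 - s)."""
    same = 0
    for i in range(3):
        for b in BASES:
            if b != codon[i]:
                alt = codon[:i] + b + codon[i + 1:]
                same += CODON_TABLE[alt] == CODON_TABLE[codon]
    return same / 3.0


def codon_diffs(c1, c2):
    """Synonymous and non-synonymous differences between two codons,
    averaged over every order in which the differing positions can change."""
    diff = [i for i in range(3) if c1[i] != c2[i]]
    if not diff:
        return 0.0, 0.0
    paths = list(permutations(diff))
    sd = nd = 0.0
    for path in paths:
        cur = c1
        for i in path:
            nxt = cur[:i] + c2[i] + cur[i + 1:]
            if CODON_TABLE[nxt] == CODON_TABLE[cur]:
                sd += 1
            else:
                nd += 1
            cur = nxt
    return sd / len(paths), nd / len(paths)


def jukes_cantor(p):
    """Jukes-Cantor distance from a proportion of differences (NaN if saturated)."""
    return -0.75 * log(1.0 - 4.0 * p / 3.0) if p < 0.75 else float("nan")


def nei_gojobori(seq1, seq2):
    """Pairwise S, N, Sd, Nd, dS and dN for two aligned, in-frame DNA sequences.
    Codons containing gaps or ambiguous bases are skipped. The sketch assumes
    the compared region contains at least one synonymous site."""
    S = Sd = Nd = 0.0
    n_codons = 0
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 not in CODON_TABLE or c2 not in CODON_TABLE:
            continue
        S += (syn_sites(c1) + syn_sites(c2)) / 2.0
        sd, nd = codon_diffs(c1, c2)
        Sd += sd
        Nd += nd
        n_codons += 1
    N = 3 * n_codons - S
    return {"S": S, "N": N, "Sd": Sd, "Nd": Nd,
            "dS": jukes_cantor(Sd / S), "dN": jukes_cantor(Nd / N)}


# Hypothetical toy sequences (not from the study): one Lys>Arg change.
result = nei_gojobori("ATGGCTAAACGT", "ATGGCTAGACGT")
print(result["dN"] - result["dS"])
```

A per-gene value of the kind reported in the study would then be obtained by averaging dN-dS over all sequence pairs of that gene. Very short regions, such as the toy input here, give unstable estimates and are shown only to illustrate the counting.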
The genetic relatedness of the coronavirus strains of Bangladesh to other variants was estimated using the Neighbor-Joining method (Saitou et al., 1987). The bootstrap consensus tree inferred from 1000 replicates is taken to represent the evolutionary history of the taxa analyzed (Felsenstein J., 1985). The evolutionary distances were computed using the maximum composite likelihood method (Felsenstein J., 1985). This analysis involved [...]

After analyzing all 78 complete genome sequences of SARS-CoV-2 submitted from Bangladesh, a tri-nucleotide block, GGG>AAC (triple base mutation), was the most commonly observed mutation, located at positions 28881-28883 of the genome and missense (non-synonymous) in nature (Table 1). The other mutations in the genome were single nucleotide polymorphisms (SNPs); some of them were also common but synonymous, such as 241: C>T in the 5′UTR of ORF1A, 3037: C>T in NSP3 and 14408: C>T in ORF6 (Table 1). The A>G mutations located in the Spike glycoprotein of the virus at positions 23402 and 23403 were also very frequent (98.36% and 100% respectively, Table 1). However, these were synonymous mutations with no structural implications.

Phylogenetic analysis of 71 SARS-CoV-2 whole genome sequences (61 Bangladeshi, 1 Wuhan and 9 most common variants) showed that the strains isolated from Bangladesh were more closely related to the Wuhan variant. The tree generated two main clusters, cluster 1 and cluster 2. Cluster 1 comprised the variants that emerged later, viz. the Gamma, Iota, Mu, Kappa, Beta, Delta, Alpha, Eta and Lambda variants. All the analyzed sequences from Bangladesh were in cluster 2, along with the variant from Wuhan. Cluster 2 formed five sub-clusters: sub-cluster 1 (SC1), SC2, SC3, SC4 and SC5.
Since the strains were also labeled with their sampling times, it could be speculated that the fewest mutations occurred in October 2020 compared with June, July and August (Fig. 2A). The tree generated from the estimation of genetic relatedness among the initially prevailing [...] Moreover, the strains of June 2020 could be classified into three types, which continued to circulate until January, May and July 2021. In this study we focused especially on the most prevalent non-synonymous mutation [...]

We performed dN-dS analysis to estimate the non-synonymous to synonymous substitution ratio (dN/dS) for the N, S and NSP3 genes. Our dN-dS analysis of the 61 analyzed sequences showed that the overall dN-dS p-value for all three genes was <1.00, indicating constrained selection (amino acid changes disfavored, Table 2).

Looking at the sequence surrounding these amino acids (Fig. 3B), it appears that the mutation interrupts a serine-arginine (S-R) dipeptide by introducing a lysine, a basic, polar, positively charged amino acid, between them. In general, arginine provides a protein structure with more stability than lysine (Sokalingam et al., 2012). The incorporation of lysine in the motif could therefore affect the overall distinctive properties of the protein, as reported before (Tylor et al., 2009). In particular, disruption of the serine-arginine dipeptide may hamper the phosphorylation of the SR-rich domain. This phosphorylation event is critical for cellular localization and regulation of N protein synthesis (Maitra et al., 2020). Notably, the GSK3 (glycogen synthase kinase 3) phosphorylation site at Ser202 and a CDK (cyclin-dependent kinase) phosphorylation site at Ser206 are in the vicinity of the identified block mutation. We reasoned that this interaction would contribute to a reduction of conformational entropy and might affect protein structure.
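The amino acid consequence of the GGG>AAC block can be worked through codon by codon, since the three substituted bases straddle codons 203 and 204 of the N gene. The sketch below is illustrative only. The 12-nucleotide reference window and its start coordinate (28877, the first base of the Ser202 codon) are assumptions based on the Wuhan-Hu-1 reference genome and should be checked against the reference record before reuse. The helper names are hypothetical.

```python
from itertools import product

# Standard genetic code, built from the usual TCAG-ordered compact string.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(nt):
    """Translate an in-frame DNA string into one-letter amino acids."""
    return "".join(CODON_TABLE[nt[i:i + 3]] for i in range(0, len(nt) - 2, 3))


def apply_substitutions(seq, start, subs):
    """Apply {genome_position: (ref_base, alt_base)} to a window that
    begins at 1-based genome coordinate `start`."""
    bases = list(seq)
    for pos, (ref, alt) in subs.items():
        i = pos - start
        if bases[i] != ref:
            raise ValueError(f"reference base mismatch at {pos}")
        bases[i] = alt
    return "".join(bases)


# Assumed reference window covering N codons 202-205 (Ser-Arg-Gly-Thr).
WINDOW_START = 28877
REF_WINDOW = "AGTAGGGGAACT"

# The block mutation described in the text: GGG>AAC at 28881-28883.
BLOCK = {28881: ("G", "A"), 28882: ("G", "A"), 28883: ("G", "C")}

mutant_window = apply_substitutions(REF_WINDOW, WINDOW_START, BLOCK)
ref_aa = translate(REF_WINDOW)
mut_aa = translate(mutant_window)
print(ref_aa, "->", mut_aa)  # SRGT -> SKRT
```

Under these assumptions, the first two substitutions change codon 203 from AGG (Arg) to AAA (Lys) and the third changes codon 204 from GGA (Gly) to CGA (Arg). This reproduces the RG>KR change and places the new lysine directly after Ser202.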
In this study, the change in N protein stability upon mutation at amino acid positions 203-204 (RG>KR) was predicted using the I-Mutant2.0 tool. The incorporation of lysine at amino acid position 203 gave a negative predicted free energy change (∆∆G = -2.26), indicating reduced stability (Table 3). The structure of the protein with the block mutation was predicted with SWISS-MODEL, a protein modeling tool, and compared with that of the reference sequence (Wuhan variant) (Fig. 3C). However, no observable difference was found in the block mutation (GGG>AAC) area of the predicted N protein. On the other hand, Maitra et al. (2020) found that three miRNAs binding at the mutation site 28881-28883 can regulate the mutant pathogenicity. Taking these data together, we suggest that the block mutation may regulate the stability and function of the N protein rather than its structure. [...] expression was not investigated in this study; further experiments are required for the prediction.

The GGG>AAC non-synonymous mutation remained the most frequent in the Bangladeshi population during the study period. We predicted that the mutation is responsible for the reduced stability of the N protein due to the incorporation of lysine. However, owing to the lack of experimental evidence, many questions regarding the influence of these mutations remain unanswered.

The phylogenetic tree revealed that during the period from June 2020 to October 2020, the causative strains of infection in Bangladesh were very similar to the Wuhan variant (Fig. 2A). The sequences from the years 2020 and 2021 resulted in two distinct clusters, C1 and C2 (Fig. 2B). Cluster 2 produced two separate sub-clusters, C2a and C2b.

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Mutations in SARS-CoV-2 viral RNA identified in Eastern India: Possible implications for the ongoing outbreak in India and impact on viral structure and host susceptibility
A multi-centre, cross-sectional study on coronavirus disease 2019 in Bangladesh: clinical epidemiology and short-term outcomes in recovered individuals
SARS-CoV-2 immune evasion by variant B.1.427/B.1.429
Recent human genetic errors of innate immunity leading to increased susceptibility to infection
Learning from the Past: Possible Urgent Prevention and Treatment Options for Severe Acute Respiratory Infections Caused by 2019-nCoV
Insights into SARS-CoV-2 genome, structure, evolution, pathogenesis and therapies: Structural genomics approach

Highlights:
1. The influence of mutations in the functionality of SARS-CoV-2 variants of Bangladesh (for a certain time period).
2. A common triple base mutation (block mutation) was identified, an extrapolated cause of instability of the N-protein.
3. Random other silent point mutations were also reported.