key: cord-1004707-e2jip76a
authors: Ikram, Aqsa; Naz, Anam; Awan, Faryal Mehwish; Rauff, Bisma; Obaid, Ayesha; Hakim, Mohamad S.; Malik, Arif
title: The impact of mutations on the structural and functional properties of SARS-CoV-2 proteins: A comprehensive bioinformatics analysis
date: 2021-03-01
journal: bioRxiv
DOI: 10.1101/2021.03.01.433340
sha: fe44b42cdf75d61a82ee151fec15e8cba549cf40
doc_id: 1004707
cord_uid: e2jip76a

An in-depth analysis of first-wave SARS-CoV-2 genomes is required to identify mutations that significantly affect viral fitness. In the present study, we performed a comprehensive in silico mutational analysis of the 3C-like protease (3CLpro), RNA-dependent RNA polymerase (RdRp), and spike (S) proteins, with the aim of gaining insight into first-wave virus mutations and their functional and structural impact on SARS-CoV-2 proteins. Our integrated analysis gathered 3465 SARS-CoV-2 sequences and identified 92 mutations in the S, 37 in the RdRp, and 11 in the 3CLpro regions. The impact of these mutations was investigated using various in silico approaches. Among them, 32 mutations in S, 15 in RdRp, and 3 in 3CLpro were found to be deleterious and could alter the structural and functional behavior of the encoded proteins. The D614G mutation in spike and P323L in RdRp are the globally dominant variants, occurring at high frequency. Most of the mutations were also found in the binding moieties of the viral proteins, which suggests their critical involvement in host-pathogen interactions and their relevance as drug targets. The findings of the current study may facilitate better understanding of COVID-19 diagnostics, vaccines, and therapeutics.

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the seventh known human-infecting coronavirus, is a highly transmissible and pathogenic virus [1].
For the DUET and DynaMut predictions, the 3D structures of RdRp and S were predicted using I-TASSER, while the crystal structure of 3CLpro (PDB ID 5RE5) was retrieved from the Protein Data Bank (PDB).

To summarize the predictions of the above-mentioned tools, a scoring criterion (0-6) was set. If a mutation was predicted to be "harmless" or "neutral" by all tools, it scored 0. Otherwise, its score equaled the number of tools that predicted it as a "harmful" or "pathogenic" mutation. Mutations predicted as such by four or more tools (score ≥4) were then screened for further evaluation.

(Table 1 and Figure 1). These mutations were found in a wide range of countries, including the

The 3D structure of the SARS-CoV-2 protease was retrieved from the PDB (PDB ID 5RE5). For the S and RdRp proteins, the top I-TASSER predicted models were selected on the basis of C-score. The RAMPAGE and ProSA web servers were further used to verify the reliability of the predicted models. The predicted 3D RdRp model showed 83% of residues in the favored region, 10.8% in the additional allowed region, and 6.2% in the outlier region. The tertiary structure of the S protein showed 75.2% of residues in the favored region, 14.8% in allowed regions, and 10% in outlier regions, indicating a good stereochemical quality of the predicted structures. Using these 3D structures, the COACH and CASTp servers predicted the possible ligand-binding sites of these proteins. Ligand-binding sites predicted by both servers were considered potential binding sites. It was observed that in

Emerging SARS-CoV-2 mutation hot spots include a novel RNA-dependent-RNA polymerase variant
SARS-CoV-2: an emerging coronavirus that causes a global threat
We shouldn't worry when a virus mutates during disease outbreaks
Mechanisms of viral mutation. Cellular and Molecular Life Sciences
Genomic characterization of a novel SARS-CoV-2
Nucleic Acids Research
Big data analytics for genomic medicine
A comprehensive in silico analysis on the structural and functional impact of SNPs in the congenital heart defects associated with NKX2-5 gene-A molecular dynamic simulation approach
BioEdit: an important software for molecular biology
SIFT: Predicting amino acid changes that affect protein function. Nucleic Acids Research
0: predicting stability changes upon mutation from the protein sequence or structure
Better prediction of functional effects for sequence variants
DynaMut: predicting the impact of mutations on protein conformation, flexibility and stability
DUET: a server for predicting effects of mutations on protein stability using an integrated computational approach
PhD-SNPg: a webserver and lightweight tool for scoring single nucleotide variants
Tracking changes in SARS-CoV-2 spike: Evidence that D614G increases infectivity of the COVID-19 virus
In silico analysis of missense mutations as a first step in functional studies: Examples from two sphingolipidoses
A review of SARS-CoV-2 and the ongoing clinical trials

The authors declare no conflict of interest.
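
The 0-6 consensus scoring criterion described in the methods can be illustrated with a minimal sketch. It assumes each tool's output has already been reduced to a single text label per mutation. The function names, the placeholder tool names (`tool_1` to `tool_6`), the placeholder mutation names, and the labels in the example are hypothetical and are not taken from the study's data or scripts.

```python
from typing import Dict, List

# Labels counted as a deleterious call. Assumption: each tool's raw output
# has been normalised to one of these strings or to "neutral"/"harmless".
DELETERIOUS_LABELS = {"harmful", "pathogenic"}


def consensus_score(predictions: Dict[str, str]) -> int:
    """Count the tools that call a mutation harmful or pathogenic.

    A mutation called neutral/harmless by every tool scores 0; with six
    tools the maximum score is 6.
    """
    return sum(
        1
        for label in predictions.values()
        if label.strip().lower() in DELETERIOUS_LABELS
    )


def screen_mutations(
    calls: Dict[str, Dict[str, str]], threshold: int = 4
) -> List[str]:
    """Return the mutations whose consensus score is at least `threshold`."""
    return [
        mutation
        for mutation, predictions in calls.items()
        if consensus_score(predictions) >= threshold
    ]


# Illustrative input only: mutation names, tool names, and labels are made up.
example_calls = {
    "MUT_A": {f"tool_{i}": "neutral" for i in range(1, 7)},
    "MUT_B": {
        "tool_1": "harmful", "tool_2": "Pathogenic", "tool_3": "harmful",
        "tool_4": "pathogenic", "tool_5": "neutral", "tool_6": "harmless",
    },
    "MUT_C": {
        "tool_1": "harmful", "tool_2": "neutral", "tool_3": "harmful",
        "tool_4": "pathogenic", "tool_5": "neutral", "tool_6": "neutral",
    },
}

selected = screen_mutations(example_calls)
```

With the threshold of 4 used in the text, only the placeholder mutation called deleterious by four tools is retained. The mutation with three deleterious calls and the one with none are dropped.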