key: cord-0830899-bw996apu authors: Hossain, Mohammad Uzzal; Ahammad, Ishtiaque; Hossain Emon, Md. Tabassum; Bhattacharjee, Arittra; Chowdhury, Zeshan Mahmud; Mosaib, Md. Golam; Das, Keshob Chandra; Keya, Chaman Ara; Salimullah, Md. title: Whole Genome Sequencing for Revealing the Point Mutations of SARS-CoV-2 Genome in Bangladeshi Isolates and their Structural Effects on Viral Proteins date: 2020-12-11 journal: bioRxiv DOI: 10.1101/2020.12.05.413377 sha: 9a6b9224fb9aab19c6be5d2b61e8f20f172d9eb2 doc_id: 830899 cord_uid: bw996apu Coronavirus disease-19 (COVID-19) is the recent global pandemic caused by the virus Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2). The virus has already killed more than one million people worldwide and billions are at risk of getting infected. As of now, there is neither any drug nor any vaccine in sight with conclusive scientific evidence that it can cure or provide protection against the illness. Since novel coronavirus is a new virus, mining its genome sequence is of crucial importance for drug/vaccine(s) development. Whole genome sequencing is a helpful tool in identifying genetic changes that occur in a virus when it spreads through the population. In this study, we performed complete genome sequencing of SARS-CoV-2 to unveil the genomic variation and indel, if present. We discovered thirteen (13) mutations in Orf1ab, S and N gene where seven (7) of them turned out to be novel mutations from our sequenced isolate. Besides, we found one (1) insertion and seven (7) deletions from the indel analysis among the 323 Bangladeshi isolates. However, the indel did not show any effect on proteins. Our energy minimization analysis showed both stabilizing and destabilizing impact on viral proteins depending on the mutation. Interestingly, all the variants were located in the binding site of the proteins. Furthermore, drug binding analysis revealed marked difference in interacting residues in mutants when compared to the wild type. 
Our analysis also suggested that eleven (11) mutations could exert damaging effects on their corresponding protein structures. The analysis of SARS-CoV-2 genetic variation and its impacts presented in this study might be helpful in gaining a better understanding of the pathogenesis of this deadly virus.

SARS-CoV, MERS-CoV, and SARS-CoV-2 are CoVs that spread through close contact. The basic reproduction number (R0) for person-to-person spread of SARS-CoV-2 is around 2.6, which implies that confirmed cases grow at a striking exponential rate. [62] At 26 to 32 kb, CoVs have the largest genomes among RNA viruses. [63] The SARS-CoV-2 genome shares approximately 90% identity with the essential enzymes and structural proteins of SARS-CoV. SARS-CoV-2 contains four structural proteins: the spike (S), envelope (E), membrane (M), and nucleocapsid (N) proteins. These proteins share high sequence similarity with the corresponding proteins in SARS-CoV and MERS-CoV. Hence, it is vital to scrutinize the SARS-CoV-2 genome to determine why this virus tends to be more infectious and lethal than its predecessors. Using Sanger sequencing and whole genome sequencing of SARS-CoV-2 isolates from oropharyngeal samples, we characterized two genomes alongside other Bangladeshi strains. [64] In this study, we analyzed the genomic arrangement of SARS-CoV-2 to identify the mutations within the genomes and to predict their effects on protein structure from a structural biology perspective, in order to shed light on suitable therapeutics against this deadly virus.
The patient's oropharyngeal samples SARS-CoV-2/human/BGD/NIB_01/2020 and SARS-CoV-2/human/BGD/NIB-BCSIR_02/2020 were collected using the UTM™ kit containing 1 mL of viral transport media (Copan Diagnostics Inc., Murrieta, CA, USA) on day 7 of the patient's illness, with symptoms of cough, mild fever, and throat congestion. The specimen tested positive for SARS-CoV-2 by real-time reverse transcriptase PCR (rRT-PCR). The viral RNA was then extracted directly from the patient's swab using the PureLink Viral RNA/DNA Mini Kit (Invitrogen) and converted into cDNA using the SuperScript™ VILO™ cDNA Synthesis Kit (Invitrogen) according to the manufacturer's instructions. Forty-eight (48) primer pairs were designed to cover the whole genome of the virus under two conditions: (1) their sequence is conserved among all the available SARS-CoV-2 isolates and (2) the ends of each amplicon overlap with the neighboring amplicons. The polymerase chain reaction (PCR) was performed and the 48 primers generated 47 amplicons, which were visualized by 1.5% agarose gel electrophoresis. The amplicons were further purified using the PureLink PCR purification kit (ThermoFisher Scientific, USA). The purified amplicons were sequenced by the Sanger dideoxy method on an "ABI 3500" with the BigDye Terminator version 3.1 cycle sequencing kit (Applied Biosystems, USA). The raw reads were assembled by DNA [...] The Basic Local Alignment Search Tool (BLAST) [68] was employed to identify possible mutations in the Sanger-sequenced nucleotide sequences; the nucleotide BLAST program was selected for this identification. The mapped polymorphisms were investigated for their worldwide occurrence frequency and checked for their profile at the CNCB 3 resource. Chimera was used to visualize the mapped polymorphisms. In addition, all the available Bangladeshi strains (n=323) of SARS-CoV-2 were retrieved from GISAID [69] and further explored to find the most common mutations.
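The mutation-identification step above compares each sequenced genome with a reference through a pairwise alignment. As a minimal sketch, assuming the alignment has already been exported as two gapped strings of equal length (the function name and the toy sequences are hypothetical, not the authors' pipeline), substitutions can be listed with 1-based reference coordinates like this:

```python
from typing import List, Tuple


def call_substitutions(ref_aln: str, qry_aln: str) -> List[Tuple[int, str, str]]:
    """List (1-based reference position, reference base, query base) for every
    mismatch column in a gapped pairwise alignment.

    Gap columns ('-') are skipped here; indels would be reported separately.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")

    calls = []
    ref_pos = 0  # counts reference bases only, so gaps do not shift coordinates
    for r, q in zip(ref_aln.upper(), qry_aln.upper()):
        if r != "-":
            ref_pos += 1
        if r == "-" or q == "-":
            continue  # insertion or deletion column
        if r != q and r in "ACGT" and q in "ACGT":
            calls.append((ref_pos, r, q))
    return calls


# Toy alignment (illustrative only, not data from the study).
toy_ref = "ATGC-CGTA"
toy_qry = "ATGTACG-A"
for pos, ref_base, alt_base in call_substitutions(toy_ref, toy_qry):
    print(f"{pos}: {ref_base} -> {alt_base}")
```

Counting only reference bases keeps the reported positions comparable across isolates even when the query carries insertions.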
We ran BLAST on all Bangladeshi SARS-CoV-2 genome profiles to obtain pairwise alignment comparisons. Deleted sequences were closely investigated in the Artemis Comparison Tool window. [70] Gene regions were examined separately. To observe the mutational effect of the mapped polymorphisms, three-dimensional (3D) structures were built with the Robetta prediction server. [71] The energy of both the wild type and mutant 3D structures was then calculated with GROMACS to estimate structural abnormality and change in stability. [72] The binding sites of both the wild type and mutant structures were analyzed to check whether the affected amino acid residues lie within the binding site region. We performed molecular docking using AutoDock Vina [73] to analyze the residues interacting with the druggable targets. We retrieved the structures of all the interacting drugs (.pdb files) by virtual screening of the DrugBank database. We then generated the .pdbqt files of the polymorphism targets for the docking experiments. Blind docking was performed to identify the most effective binding site of these drugs. A grid box covering the whole protein was set for all docking runs. The workflow of this manuscript is shown in Fig 1.

In case of NIB-01 isolates forty eight (48) [...] We have found thirteen (13) [...] (Table 1). However, the MT568643 whole genome showed no mutation against the reference sequence. Of these, six (6) mutations, namely 93rd; C → T, 2889th; C → T, 23255th; A → G, 28733rd; G → A, 28734th; G → A and 28735th; G → C, were found in the CNCB 3 resource, where the available mutations of SARS-CoV-2 are listed (Table 1). These mutations have already been found in different countries where SARS-CoV-2 has been sequenced, mostly in the United States of America (USA) and the United Kingdom (UK) (Table 1). The 93rd; C → T mutation is located in the 5'UTR upstream region.
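The effect calls in this part of the Results (no protein change versus a missense change) follow from translating the affected codon before and after the substitution. The sketch below uses the standard genetic code; the function name and the example codons are illustrative assumptions, since the codons of the reported positions are not given in the text, and it applies only to coding positions (a 5'UTR site has no codon).

```python
# Standard genetic code, codons enumerated in TCAG order for each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_substitution(codon: str, offset: int, alt: str) -> str:
    """Classify a single-base change inside a codon.

    codon  -- reference codon (DNA alphabet)
    offset -- 0, 1 or 2: position of the changed base within the codon
    alt    -- the substituted base
    """
    codon = codon.upper()
    mutant = codon[:offset] + alt.upper() + codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutant]
    if ref_aa == alt_aa:
        return f"synonymous ({ref_aa})"
    if alt_aa == "*":
        return f"nonsense ({ref_aa}->*)"
    return f"missense ({ref_aa}->{alt_aa})"


# Hypothetical codons chosen only to show each outcome.
print(classify_substitution("TTA", 0, "C"))  # leucine stays leucine
print(classify_substitution("GAT", 1, "G"))  # aspartate becomes glycine
print(classify_substitution("TGG", 2, "A"))  # tryptophan becomes a stop codon
```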
The 2889th; C → T mutation caused no change in the protein sequence. The other mutations, 23255th; A → G, 28733rd; G → A, 28734th; G → A and 28735th; G → C, alter the amino acid sequence and can have a missense effect on the protein (Table 1). Apart from these 6 mutations, seven (7) mutations appeared as unique variants against the reference sequence (Table 1 and Fig 2). These mutations have not been reported previously. We also analyzed our assembled genomes for insertions/deletions, but these two genomes contain none. However, we have identified one (1) insertion and seven (7) deletions [...] 3). We additionally scrutinized the deleted regions but did not find any domain or motif in them. Apart from our reported complete genomes, we also identified the most common mutations in Bangladesh from the complete Bangladeshi genomes reported in the GISAID database. These genomes showed three mutations, at positions 14408, 23403 and 28878, compared to the reference genome (Table 1). We analyzed the mutational effect of all the mutations. To this end, 3D structures were built to explore the mutational effect on protein structure (Figure 4). Two types of 3D structure were built: i) the structure with the wild type residue and ii) the structure with the mutant residue. We performed energy minimization of both the wild type and the mutants and found significant differences in structural stability upon mutation (Table 3). Mutants 479th; T → A and 1015th; A → T showed greater energy minimization than the wild type, which predicts these proteins to be more stable than the wild type. All the other mutant protein models showed less energy minimization than the wild type model. Therefore, these protein structures could be less stable upon mutation in the protein sequence.
The largest difference was observed for the 5642nd; G → T mutation (from -23276.78 kJ/mol to -22377.976 kJ/mol) (Table 3). Afterwards, the binding site was analyzed to determine whether the wild type and mutant residues fell within the ligand binding site. The binding site residues confirmed that all the mutations from the complete genome belonged to the binding site region (Fig 5). Next, we performed drug binding analysis followed by virtual drug screening on the DrugBank server. Ivermectin and Remdesivir topped the list of potential drug candidates. We then prepared the protein structures and converted them to .pdbqt format for the molecular docking experiment. We identified the binding site region of each protein and set the grid box so that the drugs could bind only to that specific region. The binding affinity analysis showed that compared to the wild type, [...] (Table 4). Notably, the interacting residues of the wild type protein differed from those of the mutant model. [...] Glu37, Glu41, Leu177, Gly180, Leu104, Val108, His110, Glu87, Leu88, Lys141, Tyr154 residues, whereas the wild type interacted with Leu18, Val28, Glu37, Glu41, His45, Leu53, Val54, Ile71, Arg73, Val86, [...], Val121, Leu122, Asp139 (Fig 4).

COVID-19 is highly contagious, and the variation in its genome could be a leading reason for this feature. Moreover, exploring the whole-genome sequencing (WGS) data of SARS-CoV-2 strains is necessary to understand the origin of the strains. [75] Insight into the mutations of SARS-CoV-2 is an important factor in developing therapeutics against the virus. [76]- [78] In this study, we investigated the variation, insertion, and deletion of the Bangladeshi [...] (Table 1). Among the thirteen (13) mutations, six (6) mutations, namely 93rd; C → T, [...] reported in different countries according to the CNCB 3 database (Table 1).
All the variations showed a missense effect upon structure except the 2889th; C → T variant (Table 1). The other seven (7) mutations, 479th; T → A, 481st; C → A, 1015th; A → T, 5098th; G → T, 5237th; C → T, 5642nd; G → T and 8023rd; G → A, presented themselves as unique mutations in the MT509958 genome (Table 1 and Figure 2). Surprisingly, we did not find any mutation in SARS-CoV-2/human/BGD/NIB-BCSIR_02/2020. We also examined the indel profile of our assembled genomes but did not find any insertion or deletion in them. We then looked for indels in the rest of the SARS-CoV-2 genomes from Bangladesh. Seven (7) [...] These variants' structures consumed more energy than the wild type structure. The other variants exhibited a decrease in stability (Table 3). The binding sites of the protein structures were analyzed to locate the relevant amino acid variants. We found that all the variants were located within the ligand binding site (Figure 5). Therefore, these residues could be considered very important for ligand/drug binding. Ivermectin and Remdesivir were selected for the drug binding analysis. We performed molecular docking of each wild type and mutant structure with these two drugs and analyzed the interactions. We observed that the interacting wild type residues were replaced with different residues in the mutants after molecular docking with the drugs (Table 4). The binding affinity also differed from that of the wild type structure. Notably, only a single amino acid change from the wild type structure was responsible for these changes. From these analyses, it can be concluded that therapeutics applied against these variants might not work effectively, owing to the alteration of the residues in the mutant proteins.
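The change in interacting residues between wild type and mutant docking poses can be summarized as set differences. The sketch below uses the two residue lists quoted in the Results; the first list is assumed to belong to the mutant model (the subject of that sentence is missing in the source), one entry of the wild type list is also missing in the source, and the helper name is hypothetical.

```python
import re


def normalize(residues):
    """Normalize labels such as 'LEU177' or 'His 45' to the form 'Leu177'."""
    out = set()
    for label in residues:
        match = re.fullmatch(r"([A-Za-z]{3})\s*(\d+)", label.strip())
        if match:
            out.add(match.group(1).capitalize() + match.group(2))
    return out


# Residue lists as quoted in the Results (assumed mutant list first).
mutant = normalize([
    "Glu37", "Glu41", "LEU177", "GLY180", "LEU104", "VAL108",
    "HIS110", "Glu87", "Leu88", "Lys141", "Tyr154",
])
wild_type = normalize([
    "Leu18", "Val28", "Glu37", "Glu41", "His 45", "Leu53", "Val54",
    "Ile71", "Arg73", "Val86", "Val121", "Leu122", "Asp139",
])

shared = wild_type & mutant   # contacts present in both poses
lost = wild_type - mutant     # contacts seen only with the wild type
gained = mutant - wild_type   # contacts seen only with the mutant

print("shared:", sorted(shared))
print("lost:  ", sorted(lost))
print("gained:", sorted(gained))
```

Normalizing the labels first matters because the source mixes upper-case and mixed-case residue names.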
To reiterate the core of our study, we have performed whole genome sequencing of SARS-CoV-2 to identify genetic variations and then analyzed their impact on the structures of their corresponding proteins. We have also identified the insertions/deletions among all the sequenced Bangladeshi SARS-CoV-2 strains. The energy minimization and the drug binding analysis suggested that the identified mutations might have significant impact on structure and function of their target proteins. Therefore, the present study might be of great interest to the researchers/companies working to develop therapeutics against SARS-CoV-2 as well as gaining fundamental insights into pathogenesis of the virus. Tables: Table 1 : NIB_01 polymorphisms against reference sequence and their mutational effect. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China Early Transmission Dynamics in Wuhan, China, of Novel Coronavirus-Infected Pneumonia On the origin and continuing evolution of SARS-CoV-2 The origin and underlying driving forces of the SARS-CoV-2 outbreak COVID-19: a novel zoonotic disease caused by a coronavirus from China: what we know and what we don't China's Response to the COVID-19 Outbreak: A Model for Epidemic Preparedness and Management Detection of SARS-CoV-2 in Different Types of Clinical Specimens The origin of SARS-CoV-2 SARS-CoV-2 Viral Load in Upper Respiratory Specimens of Infected Patients Modeling the Onset of Symptoms of COVID-19 Clinical Characteristics of Coronavirus Disease 2019 in China Clinical Presentation of COVID-19: A Systematic Review Focusing on Upper Airway Symptoms Quantifying additional COVID-19 symptoms will save lives Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study Features, Evaluation and Treatment Coronavirus (COVID-19) Age separation dramatically reduces COVID-19 mortality rate in a computational model of a large population Demographic 
perspectives on the mortality of COVID-19 and other epidemics Insights into the first wave of the COVID-19 pandemic in Bangladesh: Lessons learned from a high-risk country The Effect of Age on Mortality in Patients With COVID-19: A Meta-Analysis With 611,583 Subjects Residential context and COVID-19 mortality among adults aged 70 years and older in Stockholm: a population-based, observational study using individuallevel data COVID-19 mortality risk for older men and women Viral sepsis is a complication in patients with Novel Corona Virus Disease (COVID-19) Recovery from Severe COVID-19: Leveraging the Lessons of Survival from Sepsis Coronavirus Disease 2019 Sepsis: A Nudge Toward Antibiotic Stewardship Sepsis and Coronavirus Disease 2019: Common Features and Anti-Inflammatory Therapeutic Approaches Pathological features of COVID-19-associated myocardial injury: a multicentre cardiovascular pathology study COVID-19 and the heart: What we have learnt so far COVID-19 cardiac injury: Implications for long-term surveillance and outcomes in survivors Cardiac injuries in coronavirus disease 2019 (COVID-19) COVID-19 and cardiac injury: clinical manifestations, biomarkers, mechanisms, diagnosis, treatment, and follow up Coronavirus Disease 2019 (COVID-19) and Cardiac Injury -Reply Cardiac inflammation in COVID-19: Lessons from heart failure The Case Fatality Rate in COVID-19 Patients With Cardiovascular Disease: Global Health Challenge and Paradigm in the Current Pandemic Fatal lymphocytic cardiac damage in coronavirus disease 2019 (COVID-19): autopsy reveals a ferroptosis signature Impact of COVID-19 on chronic cardiovascular patients A current review of COVID-19 for the cardiovascular specialist Clinical Characteristics of 138 Hospitalized Patients with 2019 Novel Coronavirus-Infected Pneumonia in Wuhan, China First Case of 2019 Novel Coronavirus in the United States The effect of control strategies to reduce social mixing on outcomes of the COVID-19 epidemic in Wuhan, 
China: a modelling study China's practice to prevent and control COVID-19 in the context of large population movement Impact of the COVID-19 pandemic on commodities exports to China Lockdown timing and efficacy in controlling COVID-19 using mobile phone tracking Immediate impact of stay-at-home orders to control COVID-19 transmission on socioeconomic conditions, food insecurity, mental health, and intimate partner violence in Bangladeshi women and their families: an interrupted time series Pathological findings of COVID-19 associated with acute respiratory distress syndrome Effects of air temperature and relative humidity on coronavirus survival on surfaces Morphology, Genome Organization, Replication, and Pathogenesis of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) Sars-CoV-2 Envelope and Membrane Proteins: Structural Differences Linked to Virus Characteristics? Coronavirus envelope protein: current knowledge SARS-CoV-2 and Coronavirus Disease 2019: What We Know So Far COVID-19 and SARS-Cov-2 Infection: Pathophysiology and Clinical Effects on the Nervous System Coronaviruses disease 2019 (COVID-19): Causative agent, mental health concerns, and potential management options A close look at the biology of SARS-CoV-2, and the potential influence of weather conditions and seasons on COVID-19 case spread Coronaviruses and SARS-CoV-2: A Brief Overview SARS-CoV-2, SARS-CoV, and MERS-CoV viral load dynamics, duration of viral shedding, and infectiousness: a systematic review and meta-analysis The SARS-CoV-2 outbreak: What we know Comparative Review of SARS-CoV-2, SARS-CoV, MERS-CoV, and Influenza A Respiratory Viruses Comparison of the COVID-2019 (SARS-CoV-2) pathogenesis with SARS-CoV and MERS-CoV infections From SARS and MERS to COVID-19: A brief summary and comparison of severe acute respiratory infections caused by three highly pathogenic human coronaviruses Emerging coronaviruses: Genome structure, replication, and pathogenesis Genome Composition and 
Divergence of the Novel Coronavirus (2019-nCoV) Originating in China Coronaviruses: An overview of their replication and pathogenesis Feasibility of controlling COVID-19 outbreaks by isolation of cases and contacts Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding Coding-Complete Genome Sequence of SARS-CoV-2 Isolate from Bangladesh by Sanger Sequencing DNASTAR's Lasergene sequence analysis software Sequencing single-stranded libraries on the Illumina NextSeq 500 platform NCBI BLAST: a better web interface GISAID: Global initiative on sharing all influenza data -from vision to reality ACT: the Artemis comparison tool Protein structure prediction and analysis using the Robetta server GROMACS 3.0: A package for molecular simulation and trajectory analysis AutoDock Vina: Improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading Fragment merger: An online tool to merge overlapping long sequence fragments Comparative Genomic Study for Revealing the Complete Scenario of COVID-19 Pandemic in Bangladesh A SARS-CoV-2 vaccine candidate would likely match all currently circulating variants Designing a novel mRNA vaccine against SARS-CoV-2: An immunoinformatics approach Wild Type) -10989.878 kj /mol De-stable 8. 
5098 (Mutant) -10847.174 kj /mol 9 Wild Type) Ivermectin -8.7 kcal/mol Leu18 5098 (Mutant) Ivermectin -8.0 kcal/mol LEU133, PRO134, SER178, LEU195, ARG213, ALA214, LEU225 /mol ASP137, THR149, TYR170, VAL173, CYS186, ALA206 Ivermectin -7.1 kcal/mol LEU133, PRO134, SER178, LEU195, ARG213, ALA214, LEU225 mol HIS12, CYS23, VAL33, SER52, LYS61, TYR68, GLY73, VAL104, ILE133 Ivermectin -8.5 kcal/mol PHE69, ASN87, ILE105, LYS119, LEU135, VAL150, ALA151, GLU181, ASP251 kcal/mol PHE69, ASN87, ILE105, LYS119, LEU135, VAL150, ALA151, GLU181, ASP251 Remdisivir -8.3 kcal/mol ASN87, ILE128, ASN234, ILE235, ASP290, PHE329, PHE347, LEU368, ASN394, THR415, GLY416, ASN501