key: cord-0683267-8ow952d8 authors: Parvez, Md Sorwer Alam; Rahman, Mohammad Mahfujur; Morshed, Md Niaz; Rahman, Dolilur; Anwar, Saeed; Hosen, Mohammad Jakir title: Genetic analysis of SARS-CoV-2 isolates collected from Bangladesh: insights into the origin, mutation spectrum, and possible pathomechanism date: 2020-06-07 journal: bioRxiv DOI: 10.1101/2020.06.07.138800 sha: ac8f5326e6b72705a5c97752660cd1829bbb1882 doc_id: 683267 cord_uid: 8ow952d8

As coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), rages across the world, killing hundreds of thousands and infecting millions, researchers are racing against time to elucidate the viral genome. Several Bangladeshi institutes have joined this effort and have sequenced a few isolates of the virus collected in Bangladesh. Here, we present a genomic analysis of 14 of these isolates. The analysis revealed that the SARS-CoV-2 isolates sequenced from Dhaka and Chittagong belonged to European and Middle Eastern lineages, respectively. We identified a total of 42 mutations, including three large deletions, and half of these mutations were synonymous. Most of the missense mutations in the Bangladeshi isolates were predicted to have weak effects on pathogenesis, and some mutations may make the virus less pathogenic than isolates from other countries. Molecular docking analysis was performed to evaluate the effect of the mutations on the interaction between the viral spike protein and the human ACE2 receptor; however, no significant effect on this interaction was observed. This study provides preliminary insights into the origin of the Bangladeshi SARS-CoV-2 isolates, their mutation spectrum, and their possible pathomechanism, which may offer essential clues for designing therapeutics and managing COVID-19 in Bangladesh.

As of 20 May, the genome sequences of 16 Bangladeshi SARS-CoV-2 isolates had been deposited in GISAID; however, the genome sequences of 2 of these isolates were incomplete.
Thus, all 14 complete genome sequences of the reported SARS-CoV-2 isolates from Bangladesh were retrieved from the GISAID database (https://www.gisaid.org/). Because many Bangladeshi people returned home during the COVID-19 outbreak, mainly from China, India, Saudi Arabia, Spain, Italy, Japan, Qatar, Canada, Kuwait, USA, France, Sweden, and Switzerland, the first deposited genome sequence from each of these countries was also retrieved. The sequence of the first isolate collected in China was used as the reference for further analysis.

We performed multiple sequence alignment using Clustal Omega [15, 16], with the sequence of the China strain [EPI_ISL_402124] as the reference genome. The alignment file was analyzed using the MView program of Clustal Omega [17]. Only variations in the coding regions were analyzed in this study. FGENESV of SoftBerry (http://linux1.softberry.com/berry.phtml), a trained pattern/Markov chain-based viral gene prediction tool, was used to predict the genes and proteins of the viral genomes. Each predicted protein (for each viral genome) was identified using the [...]

The structural and functional effects of the missense variants, along with the change in stability, were analyzed using several prediction tools. I-Mutant was employed to analyze the stability change, with all parameters kept at their default values [19]. Additionally, MutPred2 was used to predict the molecular consequences and functional effects of these mutations [20].

Retrieved Genome Sequence of the SARS-CoV-2
A total of 14 complete genome sequences of SARS-CoV-2 isolates from Bangladesh and 12 genome sequences of isolates from other countries (China, India, Saudi Arabia, Spain, Italy, Japan, Qatar, Canada, Kuwait, USA, France, Sweden, and Switzerland) were retrieved from GISAID. The Wuhan strain with accession number EPI_ISL_402124 was used as the reference strain.
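The variant-identification step described above (aligning each isolate to the EPI_ISL_402124 reference and restricting attention to coding regions) can be illustrated with a minimal Python sketch. It is not the authors' pipeline, which used Clustal Omega and MView; it assumes a pairwise reference/query alignment extracted from the MSA (gap character `-`), a coding sequence (CDS) given as 0-based, end-exclusive reference coordinates, and the standard genetic code. The helper names `call_variants` and `classify_snv` are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated in
# TCAG order with the first codon position varying slowest.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def call_variants(ref_aln: str, qry_aln: str) -> list[tuple[int, str, str]]:
    """Return (ref_pos, ref_bases, alt) records from two aligned sequences.

    ref_pos is a 0-based position in the ungapped reference. Consecutive
    query gaps are merged into one deletion record with alt == "-".
    Insertions relative to the reference and ambiguous query bases are skipped.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    variants = []
    ref_pos = 0
    del_start, del_bases = None, []
    for r, q in zip(ref_aln.upper(), qry_aln.upper()):
        if r == "-":
            continue  # insertion in the query (or all-gap column): not counted
        if q == "-":
            if del_start is None:
                del_start = ref_pos
            del_bases.append(r)
        else:
            if del_start is not None:  # close a running deletion
                variants.append((del_start, "".join(del_bases), "-"))
                del_start, del_bases = None, []
            if q != r and q in "ACGT":
                variants.append((ref_pos, r, q))
        ref_pos += 1
    if del_start is not None:
        variants.append((del_start, "".join(del_bases), "-"))
    return variants


def classify_snv(ref_seq: str, cds_start: int, cds_end: int, pos: int, alt: str) -> str:
    """Classify a substitution as non-coding, synonymous, missense, or nonsense."""
    if not (cds_start <= pos < cds_end):
        return "non-coding"
    offset = pos - cds_start
    codon_start = cds_start + (offset // 3) * 3
    ref_codon = ref_seq[codon_start:codon_start + 3].upper()
    i = offset % 3
    alt_codon = ref_codon[:i] + alt + ref_codon[i + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    label = "nonsense" if alt_aa == "*" else "missense"
    # Protein position is counted from the start of the given CDS.
    return f"{label} {ref_aa}{offset // 3 + 1}{alt_aa}"


if __name__ == "__main__":
    # Toy data: ATG GAC CTG AAA TAA (M D L K *) versus a query carrying a
    # GAC->GGC change, a synonymous CTG->CTA change, and deletion of AAA.
    ref_aln = "ATGGACCTGAAATAA"
    qry_aln = "ATGGGCCTA---TAA"
    ref_seq = ref_aln.replace("-", "")
    for pos, ref, alt in call_variants(ref_aln, qry_aln):
        if alt == "-":
            print(f"deletion at {pos}: {ref}")
        else:
            print(pos, ref, alt, classify_snv(ref_seq, 0, len(ref_seq), pos, alt))
```

In the toy run, the first substitution is reported as `missense D2G`, in the same reference residue/position/alternate residue notation the text uses for D623G; a real analysis would loop over every annotated CDS of the reference genome rather than a single toy reading frame.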
Phylogenetic Tree Analysis
Phylogenetic tree analysis revealed that the selected Bangladeshi isolates could be divided into two main groups, one of which shared a common ancestor with the isolate from Saudi Arabia (Fig 1). The other group showed similarity to the strain from Switzerland and could be further subdivided into two groups. In [...] (Table 3). Additionally, three mutations occurring in the surface glycoprotein, ORF3a, and ORF6 were predicted to alter molecular consequences, including loss of sulfation in the surface glycoprotein, loss of proteolytic cleavage in ORF3a, and loss of an allosteric site in ORF6 (Table 4 and Supplementary Table 1).

In total, three models were generated using the template PDB ID: 6VSB: one model for the spike protein of the reference strain and two for two different mutant isolates from Bangladesh (Fig 3). Two types of mutations were found in the spike proteins of the Bangladeshi isolates, with most isolates carrying the substitution D623G. Only one strain, EPI_ISL_445214, was found to have [...]. The assessment scores of these three models were mostly similar to those of the template, which provided [...] the spike proteins along with the mutant models and the human ACE2 receptor. Interestingly, the molecular docking analysis revealed that the docking scores of the three models against the human ACE2 receptor were similar (-244.42), suggesting that the mutations in the spike protein do not hamper binding to the ACE2 receptor. For all three spike protein models, a domain of the spike protein rather than the whole protein, spanning amino acids 345 to 527, was involved in the interaction. This domain was conserved in all isolates, resulting in similar interactions with ACE2 (Fig 4).
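The statement that the interacting domain (amino acids 345 to 527) is conserved across isolates amounts to comparing that residue range between the reference spike protein and each mutant. The Python sketch below assumes gap-free, equal-length protein sequences numbered from the reference, so only substitutions are handled; the constants, function names, and toy sequences are hypothetical stand-ins for the real spike proteins.

```python
# Hypothetical sketch: report spike substitutions and test whether a residue
# range (1-based, inclusive) is identical between reference and query.
DOMAIN_START, DOMAIN_END = 345, 527  # interacting domain range reported in the text


def substitutions(ref: str, qry: str) -> list[str]:
    """List substitutions as e.g. 'D623G' (reference residue, 1-based
    position, query residue). Assumes gap-free sequences of equal length."""
    if len(ref) != len(qry):
        raise ValueError("sequences must have equal length")
    return [f"{r}{i}{q}" for i, (r, q) in enumerate(zip(ref, qry), start=1) if r != q]


def domain_conserved(ref: str, qry: str,
                     start: int = DOMAIN_START, end: int = DOMAIN_END) -> bool:
    """True if residues start..end (1-based, inclusive) are identical."""
    return ref[start - 1:end] == qry[start - 1:end]


if __name__ == "__main__":
    # Toy 700-residue sequences; residue 623 of the reference is D.
    ref = "A" * 622 + "D" + "A" * 77
    mut = ref[:622] + "G" + ref[623:]   # D623G lies outside residues 345-527
    print(substitutions(ref, mut), domain_conserved(ref, mut))
```

With these toy inputs, the D623G substitution is listed while the 345 to 527 range is still reported as conserved, which mirrors the situation described in the text. Real spike sequences would first need to be aligned to the reference so that positions correspond.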
Table 3: Prediction of the mutational effects on structural stability.
Table 4: Prediction of the effects of the mutations on molecular consequences.
Supplementary Table 1: MutPred score for all mutations. Scores < 0.5 indicate no effect on molecular consequences.
Table 1: Predicted number of genes and identity compared to the reference strain. (Legend: S1: EPI_ISL_437912; S2: EPI_ISL_445213; S3: EPI_ISL_445214; S4: EPI_ISL_445215; S5: EPI_ISL_445216; S6: EPI_ISL_445217; S7: EPI_ISL_445244; S8: EPI_ISL_450339; S9: EPI_ISL_450340; S10: EPI_ISL_450341; S11: EPI_ISL_450342; S12: EPI_ISL_450343; S13: EPI_ISL_450344; S14: EPI_ISL_450345; M: Missing)
No. | Protein | S1 | S2 | S3 | S4 | S5 | S6 | S7 | S8 | S9 | S10 | S11 | S12 | S13 | S14

COVID-19 and the cardiovascular system
Structural basis for inhibition of the RNA-dependent RNA polymerase from SARS-CoV-2 by remdesivir
The hallmarks of COVID-19 disease
Bangladesh Expands Covid-19 Testing