key: cord-0053246-6bx4uzok
authors: Sharif, Nadim; Dey, Shuvra K
title: Phylogenetic and whole genome analysis of first seven SARS-CoV-2 isolates in Bangladesh
date: 2020-11-26
journal: nan
DOI: 10.2217/fvl-2020-0201
sha: 8ede1137d11c18f66521dccf9fcb8bea90e93460
doc_id: 53246
cord_uid: 6bx4uzok

Aim: Whole genome and peptide mutation analysis can help identify effective vaccines and therapeutics against severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2). Materials & methods: Whole genome similarity for Bangladeshi SARS-CoV-2 was determined using ClustalW and BLASTn. Phylogenetic analysis was conducted using the neighbor-joining method. Results: All seven isolates from Bangladesh were in the G clade. We found 99.98–100% sequence similarity between Bangladeshi isolates and isolates from England, Greece, the USA, Saudi Arabia and India. Deletions of bases in the 5′ untranslated region and 3′ untranslated region were detected. Substitutions 261 (E→D) in NSP13 and 1109 (F→L) in the spike (S) protein were detected. Substitution 377 (D→G) in the nucleocapsid, together with the common substitution 614 (D→G) in S, was also detected. Conclusion: This study will provide baseline data for development of an effective vaccine or therapeutics against SARS-CoV-2.

However, none of them transmitted globally as fast as SARS-CoV-2. About 7.3 million confirmed COVID-19 cases and 413,726 fatalities were reported from more than 210 countries within a short period of 6 months. In December 2019, the first confirmed case of SARS-CoV-2 was reported from Wuhan, China. The case fatality rate increased from 2 to 7% globally [5]. Modes of transmission of SARS-CoV-2 include contact (direct and indirect), droplets and fomites [6, 7]. Primarily, SARS-CoV-2 infection causes respiratory tract illness in humans, with symptoms like those of SARS-CoV. COVID-19 patients develop significant clinical symptoms of the respiratory system [8]. The most common clinical features of COVID-19 are fever, cough, sore throat and shortness of breath [9, 10].
Among newly emerging clinical features, chills, loss of taste or smell, shaking, headache, rash and muscle pain have appeared in a significant number of patients [9–11]. About 85% of the infected patients with mild symptoms have recovered. However, acute respiratory syndrome, acute pneumonia, difficulty in breathing, heart failure, kidney failure and multiple organ failure have been detected in patients with severe COVID-19 [10–12]. SARS-CoV-2 is one of the largest enveloped RNA viruses. It is a nonsegmented, positive-sense, ssRNA virus with a genome of approximately 30,000 bases [2, 12, 13]. The genome also has a 5′ cap structure at the upstream end and a 3′ poly(A) tail at the downstream end. The most frequently reported open reading frames (ORFs) of the SARS-CoV-2 genome are 1a, 1b, 3a, 3b, 6, 7a, 7b, 8a, 8b and 9b [14]. Of note, the first two ORFs, 1a and 1b, occupy approximately 20,000 bases (two-thirds of the genome) and encode the nonstructural replicase proteins (nsps) [12, 15]. Sixteen nonstructural proteins, nsp1 to nsp16, have been identified with defined functions. Four major structural proteins, spike (S), envelope (E), membrane (M) and nucleocapsid (N), are encoded by the later ORFs (∼10,000 bases) [16]. The established genome order of coronaviruses is 5′-leader-UTR-replicase-S-E-M-N-3′ UTR-poly(A) [12, 15, 17, 18]. Variation in the whole genome of the novel coronavirus is frequent [14].

The main aim of this study is to analyze the first seven whole genomes of SARS-CoV-2 in Bangladesh and thereby build baseline data on SARS-CoV-2 molecular epidemiology in the country. Another aim is to determine the evolutionary relationship of Bangladeshi SARS-CoV-2 by phylogenetic analysis.
Mutational analysis is also conducted to identify new or previously reported mutations that may affect virus replication, pathogenesis, the proofreading mechanism, and vaccine and therapeutic effectiveness.

Data were collected from different databases. Whole genomes of SARS-CoV-2 were collected from the GISAID database (www.gisaid.org). COVID-19 case and fatality data were collected from Worldometers (www.worldometers.info/coronavirus), the Johns Hopkins University COVID-19 database (https://coronavirus.jhu.edu/), the Institute of Epidemiology, Disease Control and Research website (www.iedcr.gov.bd/website/) and the Directorate General of Health Services website (www.dghs.gov.bd/index.php/bd/) in Bangladesh. Environmental data were collected from the Bangladesh Meteorological Department (http://live4.bmd.gov.bd/satelite/v/sat_infrared/) and AccuWeather (www.accuweather.com). Each month was divided into four equal weeks (W1-W4), with the remaining days forming a fifth week (W5) that contained 3 days in January, 1 day in February, 3 days in March, 2 days in April and 3 days in May.

Appropriate institutional review board approval was obtained from the Biosafety, Biosecurity and Ethical Committee of Jahangirnagar University for this study. The approval number was BBEC, JU/M 2020/COVID-19/(10)1.

The whole genome nucleotide sequences of SARS-CoV-2 were analyzed using Chromas 2.6.5 (Technelysium, Helensvale, Australia). Sequence homology was determined using the BLASTn program. Multiple sequence alignment was conducted in BioEdit 7.2.6 using the ClustalW Multiple Alignment algorithm [19]. Mutational analysis was performed for specific positions of the SARS-CoV-2 whole genome and peptide chain.

Phylogenetic & evolutionary relationship analysis
Phylogenetic and molecular evolutionary relationship analyses of Bangladeshi SARS-CoV-2 were conducted against the whole genome sequences of the reference isolates using the MEGA-X software [20].
The phylogenetic tree was generated with 1000 bootstrap replicates of the aligned nucleotide dataset. The neighbor-joining method was used for phylogenetic and molecular evolutionary analysis [21]. The Kimura 2-parameter model was used for calculating genetic distance. Sequence IDs of the Bangladeshi SARS-CoV-2 isolates used in this study are provided in Table 1.

On 8 March 2020, the first COVID-19 case was detected in Bangladesh. By 9 June, with 17% of all tests positive, about 71,675 confirmed cases and 975 fatalities had been reported in Bangladesh. Most cases (∼35,000) were reported from Dhaka, the capital of Bangladesh, and the country averaged an increase of 5973 cases/week and 81 fatalities/week. COVID-19 is increasing relatively slowly in Dhaka, which has a population density of 121,720/mi². From 1 February 2020 to 9 June 2020, the average minimum temperature was 20 °C, the average maximum temperature was 32.5 °C and the average mean temperature was 26.5 °C in Dhaka (Figure 1). Along with environmental factors, genotype variation is the main reason for the low number of COVID-19 cases relative to the expected number in Bangladesh.

The first seven whole genomes of SARS-CoV-2 were sequenced from seven COVID-19 patients in Dhaka. Among these patients, 57% (four of seven) were male and 43% (three of seven) were female. The highest frequency (42.9%, three of seven) of SARS-CoV-2 was detected in the 21-30 years age group, followed by 28.6% in 31-40 years, 14.3% in 11-20 years and 14.3% in 41-50 years. The first seven sequenced SARS-CoV-2 isolates in Bangladesh were from the G clade. Compared with 40,000 whole genomes, Bangladeshi SARS-CoV-2 isolates were found to have 99.98–100% sequence similarity with reference sequences.
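To make the two quantities used here concrete, the following is a minimal sketch of how percent identity and the Kimura 2-parameter distance can be computed for one pair of already aligned sequences. It is not the authors' pipeline (the study used BLASTn and MEGA-X); the function name `compare_aligned` and the toy sequences are hypothetical, and sites with gaps or ambiguous bases are simply skipped, which is an assumption rather than a documented setting of either tool.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def compare_aligned(seq_a, seq_b):
    """Return (percent_identity, k2p_distance) for two aligned sequences.

    Hypothetical helper for illustration only. Columns containing a gap
    or an ambiguous base in either sequence are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    sites = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps ("-") and ambiguity codes such as "N"
        sites += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1

    if sites == 0:
        raise ValueError("no comparable sites")

    p = transitions / sites      # proportion of transitional differences
    q = transversions / sites    # proportion of transversional differences
    identity = 100.0 * (sites - transitions - transversions) / sites
    # Kimura 2-parameter distance: d = -1/2 * ln((1 - 2P - Q) * sqrt(1 - 2Q))
    distance = -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
    return identity, distance


if __name__ == "__main__":
    # Toy 10-base alignment with a single C->T transition at the last site.
    identity, distance = compare_aligned("ACGTACGTAC", "ACGTACGTAT")
    print(f"identity = {identity:.2f}%, K2P distance = {distance:.5f}")
```

For genomes of about 30,000 bases, a similarity of 99.98% corresponds to roughly six differing sites, which is why the distance and the raw proportion of differences are nearly identical at this level of divergence.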
The first sequenced whole genome in Bangladesh, Bangladesh/CHRF, had 99.99% sequence similarity with whole genomes of SARS-CoV-2 from Germany/FFM3, Sweden/20-07237, USA/NY-NYUMC623, Saudi Arabia/KAUST-Jeddah60, Latvia/011, United Arab Emirates/L0881 and Mexico/CDMX-InDRE 01 [...] (Table 3). In the amino acid sequences, significant and rare point mutations were found in Bangladeshi novel coronavirus genomes. In NSP2, 120 (I→F) was detected for the first time in a SARS-CoV-2 genome. In NSP3, 1184 (Q→H) was very rare and was detected in Bangladeshi isolates. In NSP6, 3 (V→M) was detected for the first time globally, in DNAS CPH 467/2020|EPI_ISL_445213. Furthermore, in NSP12, 323 (P→L) was found in all of the Bangladeshi isolates, whereas in NSP13, 261 (E→D) was detected in CHRF|EPI_ISL_437912. The common spike protein mutation 614 (D→G) was present in every isolate from Bangladesh. Of note, 1109 (F→L) in the S protein was detected for the first time worldwide, in DNAS CPH 471/2020|EPI_ISL_445214. In NS3, 172 (G→C) was detected in some of the Bangladeshi isolates. Furthermore, in the N protein, 203 (R→K), 204 (G→R) and the rare 377 (D→G) were detected in some isolates (Table 4).

Table 3. Mutational analysis of Bangladeshi severe acute respiratory syndrome coronavirus-2 whole genome with reference strain. Bold letter indicates first time/rare.

The novel coronavirus has triggered the ongoing COVID-19 pandemic by infecting over 7.2 million people worldwide [1]. Like other betacoronaviruses, SARS-CoV-2 is acquiring mutations in its genome and evolving rapidly [14, 22]. Whole genome analysis of SARS-CoV-2 in any region is essential for understanding the infection dynamics of COVID-19. The whole genome of Bangladeshi SARS-CoV-2 has been sequenced recently. Compared with regions with lower temperatures, both case numbers and fatalities are lower in Bangladesh. The circulating SARS-CoV-2 isolates in Bangladesh are less deadly than those of the USA and Europe [22].
The first seven isolates in Bangladesh were in the G clade. Among the first seven isolates, 57% (four of seven) were detected in male and 43% (three of seven) in female patients. Furthermore, the highest percentage (42.9%, three of seven) of SARS-CoV-2 in Bangladesh was detected in patients aged 21-30 years, followed by 28.6% in 31-40 years, 14.3% in 11-20 years and 14.3% in 41-50 years. The gender distribution was similar to that in previous studies in Europe, China and Asia, but the age distribution of COVID-19 patients in Bangladesh was unique [1, 23, 24]. The phylogenetic analysis revealed that the first sequenced SARS-CoV-2 in Bangladesh, CHRF|EPI_ISL_437912, was closely related to, and clustered with, betacoronavirus isolates from the UAE, Latvia, Saudi Arabia, Mexico and the USA. In the BLAST analysis of CHRF|EPI_ISL_437912, this study detected 99.99% [...]

Whole genome analysis of the novel coronavirus is necessary to understand its infectivity and the fatality associated with specific variants, and to predict any change in the efficacy of candidate drugs or vaccines caused by modification of the virus's target proteins [25, 26]. In whole genome analysis, we detected unique and new point mutations in Bangladeshi novel coronavirus isolates. A number of bases from the 5′ UTR (1-265) and 3′ UTR (29675-29903) regions were missing in six isolates from Bangladesh. Both the 5′ and 3′ ends of the coronavirus genome are important for its regulatory functions [27]. Deletion in the 5′ UTR (1-25) of Bangladeshi isolates confirmed the deletion of the stem loop-1 (SL1) region, which might cause defective interaction of cis-acting elements during virus replication and RNA synthesis. Besides the 5′ UTR (1-25) deletion, the Akbiomed|EPI_ISL_445244 isolate had a number of synonymous point mutations in the SL1, SL5A and SL5B regions, with one insertion of G between bases 202 and 203 in the SL5A region. Furthermore, the substitution point mutation 241 (C→T) was common to all Bangladeshi isolates.
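The deletions, insertions and substitutions described above are all read off a pairwise alignment against the reference genome. A minimal sketch of that bookkeeping is shown below, assuming the reference and the isolate have already been aligned (for example with ClustalW) and use "-" for gaps. The function name `list_variants` and the toy sequences are hypothetical and are not taken from the study; positions are counted on the ungapped reference, and ambiguous bases in the isolate are ignored.

```python
def list_variants(ref_aln, query_aln):
    """List differences of an aligned isolate relative to an aligned reference.

    Hypothetical helper for illustration only. Returns strings in the
    notation used in the text, e.g. "241 (C→T)" or "deletion (1-25)".
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("sequences must be aligned to the same length")

    variants = []
    ref_pos = 0        # 1-based coordinate on the ungapped reference
    del_start = None   # first reference position of an open deletion run

    for r, q in zip(ref_aln.upper(), query_aln.upper()):
        if r == "-":
            # Gap in the reference: the isolate carries an inserted base here.
            if q != "-":
                if del_start is not None:
                    variants.append(f"deletion ({del_start}-{ref_pos})")
                    del_start = None
                variants.append(
                    f"insertion of {q} between {ref_pos} and {ref_pos + 1}"
                )
            continue

        ref_pos += 1
        if q == "-":
            # Gap in the isolate: extend (or open) a deletion run.
            if del_start is None:
                del_start = ref_pos
            continue

        if del_start is not None:
            # The deletion run ended on the previous reference position.
            variants.append(f"deletion ({del_start}-{ref_pos - 1})")
            del_start = None

        if q in "ACGT" and q != r:
            variants.append(f"{ref_pos} ({r}→{q})")

    if del_start is not None:
        variants.append(f"deletion ({del_start}-{ref_pos})")
    return variants


if __name__ == "__main__":
    # Toy example: two bases missing at the 5' end and one C->T substitution.
    for variant in list_variants("ACGTACGTAC", "--GTATGTAC"):
        print(variant)
```

One caveat of this sketch is that missing bases at the very ends of a genome are reported as deletions, although in practice they can also reflect incomplete sequencing coverage of the termini.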
Deletion and substitution mutations in SL5A and SL5B of the 5′ UTR are involved in altering the efficiency of coronavirus replication and the infection pattern by changing the interaction of the genome with the viral nucleocapsid protein (N) and nsp1 [27]. Of note, in the 3′ UTR (29675-29903), substitution point mutations and deletions were common in SL regions in six Bangladeshi isolates. Along with the 5′ UTR stem loop structure, the 3′ UTR stem loop structure regulates viral replication as a cis-acting element [27]. Alteration or deletion of sequences in the 3′ UTR (29675-29903) in Bangladeshi isolates indicates changes in coronavirus replication strategy in these regions. In the protein-coding regions, significant substitution point mutations were detected in Bangladeshi isolates. In ORF1ab (266-21555), 1163 (A→T), 3037 (C→T) and 14408 (C→T) were frequent in Bangladeshi isolates. Substitution of isoleucine with phenylalanine at NSP2 120 (I→F) was detected in this study [14, 22]. Another substitution in the papain-like protease, NSP3 1884 (Q→H), glutamine to histidine, in Bangladeshi isolates was the first to be detected worldwide [14, 22]. Substitution of valine by methionine in the replicase nonstructural protein NSP6, 3 (V→M), was reported from DNAS CPH 467/2020|EPI_ISL_445213 in Bangladesh [14, 22]. Furthermore, substitution 323 (P→L) in NSP12, one of the common mutations in the RNA-dependent RNA polymerase (RdRp) region, was detected in all seven isolates. Of note, substitution of glutamate by aspartate in the helicase peptide region, NSP13 261 (E→D), was detected in CHRF|EPI_ISL_437912 in Bangladesh [14, 22]. Substitutions in ORF1ab have been reported from coronaviruses worldwide [14, 22].
However, the new substitutions in NSP2, NSP3 (papain-like protease), NSP6 (replicase nonstructural protein), NSP12 (RdRp) and NSP13 (helicase) in Bangladeshi isolates reported in this study are involved in altered replication efficiency, peptide processing capability, autophagy strategy and proofreading during genome duplication [14, 22, 28, 29]. Importantly, we also detected substitution mutations in the spike protein of Bangladeshi coronavirus isolates. In the spike protein, substitution of aspartate with glycine at 614 (D→G), previously reported to be associated with high case fatality in Europe [30], was detected in all Bangladeshi isolates. In DNAS CPH 471|EPI_ISL_445214, substitution of phenylalanine with leucine at 1109 (F→L) of the spike protein was detected in this study. This unique spike mutation may affect receptor binding and the binding of neutralizing antibodies to virus particles. Substitution 25609 (G→T) in ORF3a was detected in three of the seven isolates. Mutations in ORF3a are most frequent in Europe, followed by Asia, Oceania and North America [22, 24]. Alteration of bases in ORF3a is also crucial, as the peptide encoded by this region is involved in T-cell-mediated immunity [26]. In isolate DNAS CPH 467|EPI_ISL_445213, a deletion in the ORF7a region was detected, while numerous point mutations and deletions were detected in ORF7a and ORF7b in isolate DNAS CPH 436|EPI_ISL_445217. Mutation in the accessory protein ORF7a can alter several pathogenic processes, including apoptosis of the host cell and inhibition of cellular protein synthesis [30–32]. In three of the seven Bangladeshi isolates, substitution of glycine with cysteine was detected at 172 (G→C) in accessory protein NS3; this substitution was unique and had been reported for the first time in England.
Furthermore, two frequent point mutations, 203 (R→K) and 204 (G→R), and one rare point mutation, 377 (D→G), in the nucleocapsid (N) region were detected in Bangladeshi isolates. The first two N protein mutations have been reported mostly from England and other European countries, and the third has been reported only ten times, including in Bangladeshi isolates [14, 22]. Specific mutations, both deletions and substitutions, were detected at multiple sites in the genome and peptide chains of Bangladeshi coronavirus isolates. Of note, several new mutations in ORF1ab, specifically in the RdRp and its accessory proteins, will allow the virus to multiply without proofreading, which will increase the possibility of accumulating more new mutations in the genome. Furthermore, along with a previously reported mutation associated with high case fatality, a new spike protein mutation was detected that increases the virus's chance of escaping antibodies or drugs targeting the spike protein. To the best of our knowledge, this is one of the first studies to report phylogenetic and genomic analysis of the first seven sequenced novel coronavirus isolates in Bangladesh. This study reported significant new mutations at important sites in the novel coronavirus genome and antigenic peptide regions. These mutations will affect the virus replication strategy and antigenic properties, which will ultimately change the virus's capability to infect and help it escape from antibodies and drugs. This study will serve as a baseline database of coronavirus genome analysis that will help to predict effective vaccine and drug targets. In a pandemic like COVID-19, whole genome analysis of the pathogen is important for understanding the transmission and severity of the disease accurately. With limited resources, the number of whole genome analyses in Bangladesh is lower than in other developing countries.
This study investigated the total mutations, phylogeny and evolution of the first seven whole genomes of SARS-CoV-2 in Bangladesh. They were closely related to each other and to isolates from Germany, the USA, Saudi Arabia, France, Greece and India. Acquisition of unique mutations along with common mutations throughout the genome suggested rapid change of the circulating strains in Bangladesh. This study will provide a baseline for whole genome research on novel coronaviruses in Bangladesh.

• Approximately 57% (four of seven) of the severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) isolates were detected in male and 43% (three of seven) in female patients.
• All Bangladeshi SARS-CoV-2 isolates shared 99.98–100% sequence similarity with isolates from Germany, Sweden, the USA, Saudi Arabia, England, Myanmar and India.
• Deletion of bases in the 5′ untranslated region and 3′ untranslated region of Bangladeshi isolates was identified.
• The new substitution 1109 (F→L), together with the deadly substitution 614 (D→G), was detected in the spike protein.
• Mutations in the RNA-dependent RNA polymerase region may lead to accumulation of random mutations in the novel coronavirus genome in future.

N Sharif contributed to writing – original draft preparation (lead), methodology (lead), investigation (lead), conceptualization (lead), formal analysis (lead), writing – review and editing (lead), validation (lead) and data curation (lead). SK Dey was responsible for conceptualization (lead), funding acquisition (lead), project administration (lead), resources (lead) and supervision (lead).

References
1. Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study
2. Origin and evolution of pathogenic coronaviruses
3. The SARS, MERS and novel coronavirus (COVID-19) epidemics, the newest and biggest global health threats: what lessons have we learned?
4. The first complete genome sequences of clinical isolates of human coronavirus 229E
5. 2019-Novel coronavirus (2019-nCoV): estimating the case fatality rate-a word of caution
6. The origin, transmission and clinical therapies on coronavirus disease 2019 (COVID-19) outbreak-an update on the status
7. Knowledge, attitude and practice on prevention of airborne and droplet infections during the outbreak of corona virus among the College Students in University of Bisha, Saudi Arabia
8. Clinical features of patients infected with 2019 novel coronavirus in Wuhan
9. Clinical and virological data of the first cases of COVID-19 in Europe: a case series
10. Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding
11. COVID-19): a review of clinical features, diagnosis, and treatment
12. Emerging coronaviruses: genome structure, replication, and pathogenesis
13. Emerging novel coronavirus (2019-nCoV) - current scenario, evolutionary perspective based on genome analysis and recent developments
14. Genomic characterization of a novel SARS-CoV-2
15. Genome composition and divergence of the novel coronavirus (2019-nCoV) originating in China
16. Structural proteins in severe acute respiratory syndrome coronavirus-2
17. Genomic characterization of the 2019 novel human-pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting Wuhan
18. Characterization of a novel coronavirus associated with severe acute respiratory syndrome
19. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT
20. MEGA X: molecular evolutionary genetics analysis across computing platforms
21. Prospects for inferring very large phylogenies by using the neighbor-joining method
22. Emerging SARS-CoV-2 mutation hot spots include a novel RNA-dependent-RNA polymerase variant
23. Global epidemiology of coronavirus disease 2019: disease incidence, daily cumulative index, mortality, and their association with country healthcare resources and economic status
24. Whole genome and phylogenetic analysis of two SARS-CoV-2 strains isolated in Italy in
25. SARS-CoV-2 viral spike G614 mutation exhibits higher case fatality rate
26. Emergence of drift variants that may affect COVID-19 vaccine development and antibody treatment
27. The structure and functions of coronavirus genomic 3′ and 5′ ends
28. Structural and biochemical characterization of nsp12-nsp7-nsp8 core polymerase complex from SARS-CoV-2
29. Functional studies of the coronavirus nonstructural proteins
30. SARS-CoV-2 proteins (version 2020.2) in the IUPHAR/BPS guide to pharmacology database. IUPHAR/BPS Guide to Pharmacology CITE
31. SARS-CoV-2-encoded nucleocapsid protein acts as a viral suppressor of RNA interference in cells
32. The proteins of severe acute respiratory syndrome coronavirus-2 (SARS CoV-2 or n-COV19), the cause of COVID-19

The authors are thankful to the Faculty members of the Department of Microbiology, Jahangirnagar University for their support.

The authors have no relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript. This includes employment, consultancies, honoraria, stock ownership or options, expert testimony, grants or patents received or pending, or royalties.

No writing assistance was utilized in the production of this manuscript.

The authors state that they have obtained appropriate institutional review board approval or have followed the principles outlined in the Declaration of Helsinki for all human or animal experimental investigations.