key: cord-0428275-ygbku1xi authors: Zainulabid, U. A.; Mat Yassim, A. S.; Soffian, S. N.; Mohd Ibrahim, M. S.; Kamarudin, N.; Kamarulzaman, M. N.; Hin, H. S.; Ahmad, H. F. title: Whole Genome Sequencing Analysis of Spike D614G Mutation Reveals Unique SARS-CoV-2 Lineages of B.1.524 and AU.2 in Malaysia date: 2021-08-18 journal: nan DOI: 10.1101/2021.08.11.21261902 sha: 75a292e71d222a82d26f6dc312224f135f7ee992 doc_id: 428275 cord_uid: ygbku1xi

SARS-CoV-2 has spread throughout the world since its discovery in China, and Malaysia is no exception. Whole genome sequencing (WGS) has been a crucial tool for studying the evolution and genetic diversity of SARS-CoV-2 during the ongoing pandemic, and although an exceptional number of complete SARS-CoV-2 genomes have been submitted to GISAID and NCBI, data from Malaysia remain scarce. This study reports new Malaysian lineages responsible for the sustained spikes in COVID-19 cases during the third wave of the pandemic. Patients whose nasopharyngeal and oropharyngeal swabs were confirmed positive by real-time RT-PCR with a Ct value < 25 were selected for WGS. The 10 SARS-CoV-2 isolates obtained were sequenced, characterized and analyzed together with 1356 sequences of the dominant D614G lineages currently circulating throughout Malaysia. The prevalence of clades GH and G provided strong grounds for the discovery of two Malaysian lineages that caused sustained local spikes in cases. Statistical analysis of the association of gender and age group with the Malaysian lineages revealed a significant association (p < 0.05). Phylogenetic analysis revealed the dispersion of 41 lineages, of which 22 are still active. Mutational analysis identified a unique G1223C missense mutation in the transmembrane (TM) domain of the Spike protein.
This calls for large-scale WGS analysis of strains found around the world to gain a greater understanding of viral evolution and genetic diversity, especially to address the effect of deleterious substitution mutations in the transmembrane region of the Spike protein.

The emergence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus that causes COVID-19, in Wuhan, China in December 2019 resulted in an unprecedented global outbreak that has now become a major public health issue

are variants with specific genetic markers that have been associated with changes to receptor binding, reduced neutralization by antibodies generated against previous infection or vaccination, reduced efficacy of treatments, potential diagnostic impact,

A total of 1005 complete whole genome sequences of the Malaysian variant with the D614G mutation were retrieved from the GISAID database (S1 Table 1). The complete genome of Wuhan-Hu-1 (NC_045512) was downloaded from GenBank (https://www.ncbi.nlm.nih.gov/sars-cov-2/) to serve as the outgroup. Multiple sequence alignment was performed using the DECIPHER [26] and SeqinR [27] packages in R version 4.0.2 and finalized using MEGA X 11 [28].

Evolutionary history was inferred in MEGA X using the Neighbor-Joining (NJ) method [29]. The bootstrap consensus tree inferred from 500 replicates was taken to represent the evolutionary history of the taxa analyzed. Branches corresponding to partitions reproduced in less than 50% of bootstrap replicates were collapsed.

medRxiv preprint doi: https://doi.org/10.1101/2021.08.11.21261902; this version posted August 18, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

protein TM domain) was used. Both 3D structure models, of YP_009724390.1 and 7LC8, were uploaded to the mCSM-PPI2 server [30].
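The NJ step above was run in MEGA X. As a rough illustration of what that step computes, the sketch below implements the textbook Saitou-Nei agglomeration in plain Python: in each round it joins the pair of nodes minimising Q(i, j) = (n - 2)·d(i, j) - r(i) - r(j), where r(i) is the summed distance from i to all other active nodes, and then recomputes distances to the new internal node. This is not MEGA's code. The function name `neighbor_joining`, the internal-node names `node0`, `node1`, ... and the five-taxon toy matrix are illustrative assumptions, and bootstrapping, rooting on the Wuhan-Hu-1 outgroup and branch collapsing are not shown.

```python
from itertools import combinations


def neighbor_joining(labels, matrix):
    """Sketch of Saitou-Nei neighbor joining on a symmetric distance matrix.

    Returns a list of unrooted-tree edges (parent, child, branch_length).
    Internal nodes get hypothetical names "node0", "node1", ... (taxon labels
    are assumed not to clash with these names).
    """
    # Working distances as a dict of dicts so new internal nodes can be added.
    d = {a: {b: float(matrix[i][j]) for j, b in enumerate(labels)}
         for i, a in enumerate(labels)}
    active = list(labels)
    edges = []
    counter = 0
    while len(active) > 2:
        n = len(active)
        # r(i): summed distance from node i to every other active node.
        r = {i: sum(d[i][k] for k in active if k != i) for i in active}
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j);
        # ties go to the first pair in the current node order.
        best = None
        for x, y in combinations(active, 2):
            q = (n - 2) * d[x][y] - r[x] - r[y]
            if best is None or q < best[0]:
                best = (q, x, y)
        _, i, j = best
        # Branch lengths from the new internal node u to i and to j.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        u = f"node{counter}"
        counter += 1
        edges += [(u, i, li), (u, j, lj)]
        # Distance from u to each remaining node k.
        d[u] = {u: 0.0}
        for k in active:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        active = [k for k in active if k not in (i, j)] + [u]
    # The last two nodes are joined by a single edge.
    x, y = active
    edges.append((x, y, d[x][y]))
    return edges


if __name__ == "__main__":
    # Textbook five-taxon toy matrix (illustrative only, not study data).
    taxa = ["a", "b", "c", "d", "e"]
    dist = [[0, 5, 9, 9, 8],
            [5, 0, 10, 10, 9],
            [9, 10, 0, 8, 7],
            [9, 10, 8, 0, 3],
            [8, 9, 7, 3, 0]]
    for parent, child, length in neighbor_joining(taxa, dist):
        print(parent, child, round(length, 3))
```

On the toy matrix, `a` and `b` are joined first with branch lengths 2 and 3. Because ties in Q are broken by node order here, other tools may resolve tied joins differently and produce an equally valid but differently drawn tree.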
Furthermore, by splitting the genome analysis by year, we found a clear pattern of lineage distribution that shows how the major lineages dispersed throughout Malaysia in 2020 and 2021 (S1 Fig 1). While B.1.524 may have contributed heavily to the initial number of D614G lineages actively spreading locally, the data suggest that the AU.2 lineage is currently taking its place as the major D614G variant driving the spread of the disease. This raised the question of where these

To infer the origin of the D614G variant responsible for widespread COVID-19 infections in Pahang this year, we built an NJ phylogenetic tree using complete D614G variant genomes downloaded from GISAID (Fig. 2). Sample collection dates were restricted to the period from January 1, 2021 to July 2021 (n = 1005).

The effects of single-point mutations on protein-protein interaction binding affinity were predicted using mCSM-PPI2, and the results are summarized in Table 2. To do this, a 3D structural model of the wild-type Spike protein (YP_009724390.1) was first generated through SWISS-MODEL using the protein template 6XR8 (a distinct conformational state of the SARS-CoV-2 Spike protein). The generated 3D structure model of YP_009724390.1 covered only residues 14 to 1162.
As such, predictions of the effects of missense mutations on protein-protein interaction binding affinity covered only residues within this region. Of note, mCSM-PPI2 cannot predict the change in protein interaction affinity for single amino acid deletions, so L241del, L242del and A243del were not included in Table 2. To analyse the effects of the G1223C mutation in the TM region of the Spike protein, a 3D structure model of the SARS-CoV-2 Spike protein TM domain (7LC8), downloaded from the RCSB Protein Data Bank, was uploaded directly to the mCSM-PPI2 server. The effects of the G1223C mutation in the TM domain are included in Table 2. Together, the missense mutations L18F, N501Y, A701V and G1223C appear to increase the binding affinity of the Spike protein, while mutations

The first case of COVID-19 in Malaysia was detected in January 2020 and was traced back to China [34]. The Malaysian authorities quickly developed standard guidelines for the management of COVID-19, including the set-up of designated hospitals and screening centers in each state [34]. To date, more than 951,884 COVID-19 cases have been recorded, with increasing fatalities. Based on an earlier report, we found that D614G mutation samples had been

Moreover, another study reported that infection with clade G was not related to
concern, the N439K mutation promotes evasion of antibody-mediated immunity by conferring resistance against several neutralizing monoclonal antibodies and reduces the activity of some polyclonal sera from patients who recovered from infection

CoV-2 into target cells in two steps. First, the RBD binds to its receptor, human ACE2, and the Spike protein is proteolytically activated by human proteases at the S1/S2 boundary. Second, the S2 subunit, which includes the TM domain, undergoes a structural change to mediate fusion of the viral membrane with the target cell.

specifying whether virus samples were collected from asymptomatic or mildly symptomatic cases through to severe or deceased cases might help to identify the prevalence of each frequently detected major clade and lineage. We also found a plethora of unclear entries that offer very little information about the real source of a sample. All of these issues can affect the effectiveness and accuracy of association studies. We therefore advocate that SARS-CoV-2 genomic data providers be comprehensive when submitting metadata, and we encourage genomic database maintainers to be aware of potential errors in incoming samples and to actively support metadata standards. One option may be to disregard samples with suspected metadata issues entirely; however, this may result in a considerable reduction of sample size,

Implications for viral infectivity, disease severity and vaccine design. Biochem

The evolutionary history was inferred using the Neighbor-Joining (NJ) method with 500 bootstrap replicates. Branches corresponding to partitions reproduced in less than 50% of bootstrap replicates are collapsed.
The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (500 replicates) is shown next to the branches. The evolutionary distances were computed using the Kimura 2-parameter method, and rate variation among sites was modeled with a gamma distribution (shape parameter = 1). This analysis involved 1006 nucleotide sequences. Codon positions included were 1st+2nd+3rd+Noncoding. All ambiguous positions were removed for each sequence pair (pairwise deletion option). There were a total of 29672 positions in the final dataset. Evolutionary analyses were conducted in MEGA X.

:S1/S2 cleavage site; FP: fusion peptide; HR1: heptad repeat 1; HR2: heptad repeat 2; TM: transmembrane domain; CD: connector domain. Amino acid mutations were analysed using Nextclade v1.5.2 (https://clades.nextstrain.org).
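The tree legend above specifies Kimura 2-parameter (K2P) distances with gamma-distributed rate variation (shape parameter = 1) and pairwise deletion of ambiguous positions. A minimal sketch of that distance follows, assuming the standard gamma-corrected K2P formula of Jin and Nei, d = (a/2)[(1 - 2P - Q)^(-1/a) + (1/2)(1 - 2Q)^(-1/a) - 3/2], where P and Q are the proportions of transitional and transversional differences and a is the gamma shape parameter. This is not MEGA's implementation; the function name `k2p_gamma_distance` is hypothetical, and treating every character other than A, C, G and T as ambiguous is an assumption.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
UNAMBIGUOUS = PURINES | PYRIMIDINES


def k2p_gamma_distance(seq1, seq2, shape=1.0):
    """Gamma-corrected Kimura 2-parameter distance between two aligned sequences.

    Columns where either base is a gap or ambiguous (anything other than
    A, C, G, T) are dropped for this pair only, as in pairwise deletion.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x not in UNAMBIGUOUS or y not in UNAMBIGUOUS:
            continue  # pairwise deletion: skip this column for this pair
        sites += 1
        if x == y:
            continue
        # A<->G and C<->T are transitions; all other changes are transversions.
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites    # P: proportion of transitional differences
    q = transversions / sites  # Q: proportion of transversional differences
    w1 = 1.0 - 2.0 * p - q
    w2 = 1.0 - 2.0 * q
    if w1 <= 0.0 or w2 <= 0.0:
        raise ValueError("sequences too divergent for the K2P correction")
    a = shape  # gamma shape parameter
    return (a / 2.0) * (w1 ** (-1.0 / a) + 0.5 * w2 ** (-1.0 / a) - 1.5)


if __name__ == "__main__":
    s1 = "AAAAAAAAAA"
    s2 = "GAAAAAAAAA"  # one A->G transition over 10 comparable sites
    print(k2p_gamma_distance(s1, s2))         # gamma-corrected, shape = 1
    print(-0.5 * math.log(1.0 - 2.0 * 0.1))   # uncorrected K2P, for comparison
```

For two 10-base sequences differing by a single A/G transition (P = 0.1, Q = 0), the gamma-corrected distance with shape 1 is about 0.125, against about 0.112 from the uncorrected K2P formula. As the shape parameter grows large, the gamma-corrected value converges to the uncorrected one.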
SNAP: predict effect of non-synonymous polymorphisms on function
COVID-19 Outbreak in Malaysia. Osong Public Heal Res Perspect
SARS-CoV-2 lineage B.6 was the major contributor to early pandemic transmission in Malaysia
SARS-CoV-2 in Asia show highest amount of SNPs
A global analysis of replacement of genetic variants of SARS-CoV-2 in association with containment capacity and changes in disease severity
Increased household transmission of COVID-19 cases associated with SARS-CoV-2 Variant of Concern B.1.617.2: a national case-control study
Estimated transmissibility and impact of SARS-CoV-2 lineage B.1.1.7 in
Emergence and rapid spread of a new severe acute respiratory syndrome-related coronavirus 2 (SARS-CoV-2) lineage with multiple spike mutations in
Association of SARS-CoV-2 clades with clinical, inflammatory and virologic outcomes: An observational study
Patients With COVID-19: Focus on Severity and Mortality
Male sex identified by global COVID-19 meta-analysis as a risk factor for death and ITU admission
The new SARS-CoV-2 strain shows a stronger binding affinity to ACE2 due to N501Y mutant
Higher infectivity of the SARS-CoV-2 new variants is associated with K417N/T, E484K, and N501Y mutants: An insight from structural data
Molecular dynamic simulation reveals E484K mutation enhances spike RBD-ACE2 affinity and the combination of E484K, K417N and N501Y mutations (501Y.V2 variant) induces conformational change greater than N501Y mutant alone, potentially resulting in an escap
SARS-CoV-2 RBD in vitro evolution follows contagious mutation spread, yet generates an able infection inhibitor
SARS-CoV-2 variants, spike mutations and immune escape
Circulating SARS-CoV-2 spike N439K variants maintain fitness while evading antibody-mediated immunity
SARS-CoV-2 variants of concern as of 22
Map of cities with the Spike G1223C mutation
Mutagenesis of the transmembrane domain of the SARS coronavirus spike glycoprotein: refinement of the requirements for SARS coronavirus cell entry
Cell entry mechanisms of SARS-CoV-2
A Trimeric Hydrophobic Zipper Mediates the Intramembrane Assembly of SARS-CoV-2 Spike
New Zealand's science-led response to the SARS-CoV-2 pandemic
Genomics-informed responses in the elimination of COVID-19 in Victoria, Australia: an observational, genomic epidemiological study
Gozashti L, Corbett-Detig R. Shortcomings of SARS-CoV-2 genomic

500 replicates) are shown next to the branches. The evolutionary distances were computed using the Kimura 2-parameter method, and rate variation among sites was modeled with a gamma distribution (shape parameter = 1). Evolutionary analyses were conducted in MEGA X. B. A close-up view of the NJ phylogenetic tree focusing on Pahang D614G variant IIUM 5763/2021 1.466.2. C. A close-up view of the NJ phylogenetic tree focusing on Pahang D614G variants IIUM5754/2021 UMP5480/2021, IIUM-UMP5437/2021, IIUM5556/2021 and IIUM6472/2021 A close-up view of the NJ phylogenetic tree focusing on Pahang D614G variants UMP5371/2021 and IIUM5676/2021

Legends: Computational prediction was performed using three web-based tools: mCSM