key: cord-0704470-vu5ohe18 authors: Jeewandara, C.; Jayathilaka, D.; Ranasinghe, D.; Hsu, N. S.; Ariyaratne, D.; Jayadas, T. T.; Madusanka, D.; Lindsey, B. B.; Gomes, L.; Parker, M. D.; Wijewickrama, A.; Karunaratne, M.; Ogg, G.; de Silva, T.; Malavige, G. N. title: Genomic and epidemiological analysis of SARS-CoV-2 viruses in Sri Lanka date: 2021-05-10 journal: nan DOI: 10.1101/2021.05.05.21256384 sha: 76345aa99b531ba8bd64329d3a13a271521c3cd6 doc_id: 704470 cord_uid: vu5ohe18

Since identification of the first Sri Lankan individual with SARS-CoV-2 in early March 2020, the small clusters that occurred were largely contained until the current extensive outbreak, which started in early October 2020. To understand the molecular epidemiology of SARS-CoV-2 in Sri Lanka, we carried out genomic sequencing overlaid on available epidemiological data. The B.1.411 lineage was the most prevalent; it became established in Sri Lanka and caused outbreaks throughout the country. The estimated time of the most recent common ancestor of this lineage was 10th August 2020 (95% lower and upper bounds 6th July to 7th September), suggesting that cryptic transmission may have occurred prior to the large epidemic starting in October 2020. Returning travellers were identified with infections caused by lineage B.1.258, as well as the more transmissible B.1.1.7 lineage. Ongoing genomic surveillance in Sri Lanka is vital as vaccine roll-out increases.

The severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) has emerged as the leading cause of mortality in several countries in the world. As of the 20th of February 2021, 111.3 million cases and 2.5 million deaths had been reported worldwide (1). Due to the emergence of variants of concern, the World Health Organization has recommended regular and systematic whole genome sequencing of SARS-CoV-2 viruses within countries for early identification of such variants (2).
The first patient infected with SARS-CoV-2 in Sri Lanka, a foreign national, was reported on the 27th of January 2020, and the first Sri Lankan patient was reported on the 10th of March 2020 (3). In the following six months (March to September), the spread of the virus was largely contained, with only 3111 reported cases, of which 38.8% were imported (4). However, there was a surge in the number of cases following the discovery of a new cluster in early October 2020 in a clothing factory in Gampaha, the district adjacent to Colombo. This was followed by rapid spread of SARS-CoV-2 within the Colombo Municipality region (CMC), to fish markets and subsequently to the whole country. This outbreak continues to evolve, with infections now being reported in all regions of the country. As of the 4th of February 2021, a total of 66,409 cases had been reported with 332 deaths (319 within the current large outbreak) (5). We carried out SARS-CoV-2 sequencing from isolates collected throughout the different phases of the pandemic in order to determine the molecular epidemiology of SARS-CoV-2 in Sri Lanka, including current ...

The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

... and SH-aLRT branch test (-alrt 1000). Molecular clock phylogenetic analysis was undertaken using sequences from Sri Lanka. The alignment and maximum likelihood tree construction were performed using MAFFT and IQ-TREE2 as described above. TreeTime (6) was used to infer a molecular clock phylogeny using a strict evolutionary rate of 1.1 × 10^-3 substitutions/site/year (estimated by Duchene et al. (7)) and a standard deviation of 0.00004. The tree was rerooted with least-squares criteria in TreeTime.
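As a rough illustration of what the fixed clock rate implies, the sketch below converts the rate and standard deviation quoted above into expected substitutions per genome per year and per month. The genome length of 29,903 nt (the Wuhan-Hu-1 reference) and the function name are assumptions made for this example; they are not taken from the analysis itself.

```python
# Assumption: genome length of the Wuhan-Hu-1 reference, not stated in the text.
GENOME_LENGTH_NT = 29_903

CLOCK_RATE = 1.1e-3      # substitutions/site/year, as quoted in the text
CLOCK_RATE_SD = 0.00004  # standard deviation, as quoted in the text


def expected_substitutions(rate_per_site_year: float, years: float,
                           genome_length: int = GENOME_LENGTH_NT) -> float:
    """Expected number of substitutions across the genome under a strict clock."""
    return rate_per_site_year * genome_length * years


per_year = expected_substitutions(CLOCK_RATE, 1.0)
per_month = expected_substitutions(CLOCK_RATE, 1.0 / 12.0)
low = expected_substitutions(CLOCK_RATE - CLOCK_RATE_SD, 1.0)
high = expected_substitutions(CLOCK_RATE + CLOCK_RATE_SD, 1.0)

print(f"{per_year:.1f} substitutions/genome/year "
      f"(rate +/- 1 SD: {low:.1f} to {high:.1f}); "
      f"about {per_month:.1f} per month")
```

Under these assumptions, lineages separated by only a few weeks differ by very few substitutions, which is one reason a fixed rate with a narrow standard deviation can help when dating a recent, closely related set of sequences.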
Eight samples were excluded from the analysis due to their inconsistent temporal signal. Lineages were assigned using Pangolin (version v2.3.8, lineages version 2021-04-23). Phylogenetic tree visualizations were produced using R (v3.5.3), R/ape, R/ggtree, R/ggplot2, R/ggtreeExtra, R/dplyr, R/phytools and R/tidytree. A proportional symbol map of Sri Lanka was plotted with GPS coordinates of the sampling locations of B.1.411 sequences using R (v4.0.1), R/maps, R/ggplot2, R/ggrepel, R/cowplot and R/dplyr. Each sampling location was indicated by a coloured bubble proportional to the number of sequences sampled there. Colombo district was shown in a zoomed sub-map (longitude: 79.80 to 79.98, latitude: 6.80 to 6.98) in order to visualize the suburbs, as Colombo had the highest sampling density.

Of six samples collected in March 2020 from returning travellers and their contacts (period A, ...

... Period C in figure 1 was thought to be due to an outbreak initiated by the returning workforce from the Middle East. A sequence was obtained from only one virus, which belonged to lineage B. Again, due to detection of infected patients at the airport and mandatory quarantine of all individuals for at least 14 days, cases appeared not to spill over to the community. However, there was a sudden surge in the number of cases in mid-July in a drug rehabilitation centre (DRC) in the North Central Province (period D, figure 1). The origin of this outbreak was not ...
We report the first description of SARS-CoV-2 molecular epidemiology in Sri Lanka from March 2020 to early March 2021. The virus strains identified in March 2020 belonged to clades B.1, B.2, B.1.1 and B.4, demonstrating that SARS-CoV-2 strains were introduced to Sri Lanka from multiple locations (8, 9). Sri Lanka underwent a national lockdown very early in the pandemic, on the 20th of March 2020, when only 66 patients with SARS-CoV-2 had been confirmed. This lockdown, which continued until mid-May, managed to contain the outbreak and prevent community transmission, except within isolated community clusters. A further contained outbreak occurred in mid-July within a drug rehabilitation centre (DRC). Sequencing of a limited number of these samples showed that this outbreak was due to viruses belonging to lineage B.1, but which were distinct from the former B.1 samples. The outbreak in the DRC was also subsequently controlled, and Sri Lanka did not report any cases of locally acquired infection during the months of August and September. The small numbers of reported cases were from imported infections only. A large outbreak was abruptly discovered in early October after a clothing factory employee presented with pneumonia caused by SARS-CoV-2, and this was followed by the emergence of a large second wave. We report that these infections were due to a lineage first described in samples from Sri Lanka, B.1.411, that dispersed throughout the country. A molecular clock analysis revealed that this lineage most likely emerged in August; it is therefore possible that the virus was circulating in the community for several weeks before leading to the large outbreak that started in October.
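To make "several weeks" concrete, the following sketch computes the interval between the tMRCA estimate given in the abstract (10th August 2020, bounds 6th July to 7th September) and the start of the large outbreak. The text only says "early October", so the use of 1 October 2020 as the detection date is an assumption for illustration, as are the variable and function names.

```python
from datetime import date

# tMRCA estimate and 95% bounds for B.1.411, as stated in the abstract.
TMRCA_ESTIMATE = date(2020, 8, 10)
TMRCA_LOWER = date(2020, 7, 6)
TMRCA_UPPER = date(2020, 9, 7)

# Assumption: "early October 2020" is represented here by 1 October 2020.
OUTBREAK_DETECTED = date(2020, 10, 1)


def lead_time_days(tmrca: date, detected: date = OUTBREAK_DETECTED) -> int:
    """Days between an inferred common ancestor and outbreak detection."""
    return (detected - tmrca).days


for label, tmrca in (("estimate", TMRCA_ESTIMATE),
                     ("earliest bound", TMRCA_LOWER),
                     ("latest bound", TMRCA_UPPER)):
    days = lead_time_days(tmrca)
    print(f"{label}: {days} days (about {days / 7:.1f} weeks) before detection")
```

With this assumed detection date, the point estimate corresponds to roughly seven weeks of possible undetected circulation, and the bounds to between about three and twelve weeks.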
The inferred period of undetected circulation highlights the potential for cryptic community transmission leading to a national epidemic wave, even in the face of strict quarantine rules for returning travellers. The B.1.411 Sri Lankan lineage has a unique spike mutation, H1159Y, in the C-terminal region, which was seen in 185/192 viruses belonging to this lineage. The significance of this mutation is unknown. In addition, the P323L mutation in the NSP12 region, which is known to have co-evolved with the D614G mutation, was seen in 172/192 of the B.1.411 genomes (10). Although no direct correlation between the P323L mutation and infectivity has been shown, this mutation is widespread and almost 100% co-existent with D614G, and some argue that it could contribute to the enhanced viral replication and infectivity seen in D614G-dominant strains (10). Most importantly, one B.1.411 genome carrying the E484K mutation in the spike protein was detected in the community in mid-February 2021, demonstrating the potential for this lineage to evolve mutations that may evade antibody responses. Even though the E484K mutation is predominantly seen in the B.1.351 and P.1 lineages, recent evidence indicates introduction of this mutation into other lineages such as B.1.1.7 and B.1.243 (11). The other more frequent mutations were T166I in NSP2, L37F in NSP6, and T205I in the N-protein. The L37F mutation in NSP6 is thought to render the NSP6 protein less stable and therefore compromise the function of NSP6 (12). The other mutations have been frequently reported in many other SARS-CoV-2 lineages (13), while the other amino acid changes that were detected have not been associated with increased or reduced virulence.
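The B.1.411 mutation counts reported above (185/192 for H1159Y and 172/192 for P323L) can be expressed as proportions with an uncertainty range. The sketch below uses a Wilson score interval, which is a choice made for this illustration and not a method described in the text; the function name and the dictionary of counts are likewise introduced here only for the example.

```python
import math


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p = successes / total
    denom = 1.0 + z**2 / total
    centre = (p + z**2 / (2.0 * total)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / total + z**2 / (4.0 * total**2)) / denom
    return centre - half_width, centre + half_width


# Counts as reported in the text for B.1.411 genomes.
mutation_counts = {
    "S:H1159Y": (185, 192),
    "NSP12:P323L": (172, 192),
}

for name, (count, total) in mutation_counts.items():
    proportion = count / total
    lower, upper = wilson_interval(count, total)
    print(f"{name}: {count}/{total} = {proportion:.1%} "
          f"(approx. 95% interval {lower:.1%} to {upper:.1%})")
```

The Wilson interval is used here because it behaves sensibly for proportions close to 100%, where the simpler normal approximation can give an upper bound above 1.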
Following the emergence of the 'second wave' of SARS-CoV-2 infections in early October 2020, all repatriation from overseas was stopped for a few months and was restarted in December 2020. After this, viruses of many lineages were identified within the quarantine centres where overseas returnees were housed. Importantly, the B.1.1.7 variant, which has been associated with higher transmissibility (14), was initially identified within these quarantine centres, but later in the community. Four viruses carried the N439K mutation in the receptor binding motif (RBM) of the S-protein, which is known to enhance the binding affinity of the S-protein to the human ACE2 receptor and to confer resistance against several neutralizing monoclonal antibodies (15). In addition, three of those genomes showed the S:H69/V70 mutation, which often co-occurs with amino acid replacements in the RBM such as N439K (16). This has also been associated with increased infectivity. None of these viruses were identified within the community.

In summary, the viruses identified in March 2020 appear to have been introduced from multiple sources, predominantly Europe and the Middle East, and these strains were responsible for the subsequent outbreaks that were seen in Sri Lanka until July. The large ongoing outbreak that started in early October appears to be due to the spread of a single virus lineage, B.1.411, up to the limit of currently available data at the end of March 2021, with new lineages introduced recently by visitors from overseas. As SARS-CoV-2 vaccine rollout commences in Sri Lanka, ongoing genomic surveillance for variants of concern will be vital.
We thank Dr. Julian Villabonas-Arenas for sharing some scripts for viral sequence analysis. We are grateful to the World Health Organization, UK Medical Research Council and the Foreign and Commonwealth Office for support.

World Health Organization; 2021. p. 94.
3. Epidemiology unit MoH, Sri Lanka. COVID-19
COVID-19 CORONAVIRUS OUTBREAK
Sagulenko P, Puller V, Neher RA. TreeTime: Maximum-likelihood phylodynamic analysis. Virus Evol. 2018 Jan;4(1):vex042.
Temporal signal and the phylodynamic threshold of SARS-CoV-2. Virus Evol
A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology
Infectivity of SARS-CoV-2: there Is Something More than D614G?
Covid-19: The E484K mutation and the risks it poses
Decoding asymptomatic COVID-19 infection and transmission
Mutational Frequencies of SARS-CoV-2 Genome during the Beginning Months of the Outbreak in USA. Pathogens
Transmission of SARS-CoV-2 Lineage B.1.1.7 in England: Insights from linking epidemiological and genetic data