key: cord-0722487-57lnomni authors: Benslimane, F. M.; AlKhatib, H. A.; Al-Jamal, O.; Albatesh, D.; Boughattas, S.; Ahmed, A. A.; Bensaad, M.; Younuskunju, S.; Mohamoud, Y. A.; Al Badr, M.; Mohamed, A. A.; El Kahlout, R. A.; Al Hamad, T.; Elgakhlab, D.; Al-Kuwari, F. H.; Saad, C.; Jeremijenko, A.; Al-Khal, A.; Al-Maslamani, M. A.; Bertollini, R.; Al-Kuwari, E. A.; Al-Romaihi, H. E.; Al-Marri, S.; Al-Thani, M.; Badji, R. M.; Mbarek, H.; Al-Sarraj, Y.; Malek, J. A.; Ismail, S. I.; Abu-Raddad, L. J.; Coyle, P.; Althani, A. A.; YASSINE, H. M. title: One year of SARS-CoV-2: Genomic characterization of COVID-19 outbreak in Qatar date: 2021-05-20 journal: nan DOI: 10.1101/2021.05.19.21257433 sha: f6d51b2e70c4ea88e57f44c923dcd379bb9b9ac2 doc_id: 722487 cord_uid: 57lnomni
The state of Qatar has emerged as a major transit hub connecting all parts of the globe, making it a hotspot for infectious disease introduction and providing an ideal setting to monitor the emergence and spread of variants. In this study, we report on 2634 SARS-CoV-2 whole-genome sequences from infected patients in Qatar between March 2020 and March 2021, representing 1.5% of all positive cases in this period. Despite the restrictions on international travel, the viruses sampled from the populace of Qatar mirrored nearly the entire global population's genomic diversity, with nine predominant viral lineages that were sustained by local transmission chains and the emergence of mutations that are likely to have originated in Qatar. We report an increase in the number of mutations and deletions in the B.1.1.7 and B.1.351 lineages over a short period. This raises the imperative need to continue the ongoing genomic surveillance that has been an integral part of the national response to monitor the SARS-CoV-2 profile and re-emergence in Qatar.
Following its first appearance in China, the ongoing pandemic of coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has urged the global community to take steps to curb its transmission [1, 2]. Qatar, which is characterized by a diverse population of 2.8 million [3, 4], has experienced a large outbreak with more than 75,006 infections per million population, one of the highest laboratory-confirmed rates of any country. The first confirmed COVID-19 case in Qatar was reported on February 28 2020 among quarantined individuals with a history of travel from Iran. A large outbreak of over 300 infections was then identified as the first community cluster on March 6 2020 among expatriate manual and craft workers [5]. The country suspended the entry of foreign nationals on March 17 2020, allowing only exceptional entry to Qatari citizens, and imposed strict measures to control the epidemic curve. Despite these measures, Qatar witnessed a large outbreak, with the highest number of confirmed cases, 2,355 per day, reported on May 30 2020. By July 2020, cases had dropped dramatically to an average of 200 per day. As such, the country lifted the travel restriction in August 2020 and started accepting returning residents while implementing a mandatory quarantine. Restrictions inside Qatar were also eased, and the lockdown was lifted in four phases [6]. In December 2020, new variants of concern (VOC) were detected in other countries, including the UK and South Africa [7] [8] [9] [10]. These strains were associated with increased transmissibility and posed greater public health concerns. Qatar imposed strict restrictions on travelers arriving from those countries to prevent the strains' entry into the country. However, soon after their global detection, COVID-19 positive cases increased significantly, indicating that a second wave was hitting the country.
As of March 31 2021, 179,965 patients have tested positive, including 291 fatalities [6], although seroprevalence studies indicated a higher infection rate within the population [3, 11]. Throughout the epidemic in Qatar, we implemented a sequencing strategy to detect, monitor, and evaluate the spread of the virus, including the VOC. Such sequencing efforts were crucial to understanding the epidemiological and clinical significance of variants, especially for estimating reinfection rates and vaccine effectiveness [12] [13] [14] [15] [16]. Here we report on the clusters of SARS-CoV-2 infection reported in Qatar over the past year, with the aim of understanding the origin, evolution, and transmission patterns of the virus. This is the first study of its kind from the MENA region, and it adds context to the international genomic consortia and global data in combating the pandemic.
medRxiv preprint doi: https://doi.org/10.1101/2021.05.19.21257433; this version posted May 20, 2021. The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
The study was approved by the Institutional Review Boards (IRB) of Qatar University (QU-IRB 1289-EA/20) and Hamad Medical Corporation (MRC-01-20-145). Because the samples were collected retrospectively, written informed consent from participants was not required for this study, in accordance with national legislation and institutional requirements. Sample selection was based on epidemiological characteristics representing various cluster types and nationalities, drawing on investigations conducted by public health teams in the country and recorded in the national database. Data including patient demographics (age, gender, nationality) and epidemiology (date of onset and designation) were retrieved.
COVID-19-positive (CT < 35) respiratory specimens (nasopharyngeal swabs) from 2634 patients were retrieved from the virology laboratory at Hamad Medical Corporation (HMC), the main laboratory providing diagnostic testing in the nation. Samples were handled in a biosafety level 3 laboratory by trained personnel with full personal protective equipment and procedures adapted to airborne pathogens, as recommended by the World Health Organization [17]. Samples were selected from the main clusters over a period of one year from the start of local transmission, March 10 2020 to March 29 2021. RNA was extracted using NucleoSpin RNA Virus isolation kits (Macherey-Nagel) or MGI automated extraction platforms and kits (MGI, China) according to the manufacturers' protocols.
Two sequencing platforms and protocols were used: Oxford Nanopore Technologies (ONT) and Illumina MiSeq. The ARTIC Network SARS-CoV-2 sequencing protocol and V3 primer amplicon set were used to sequence the near-full SARS-CoV-2 genome on Oxford Nanopore's GridION platform (https://artic.network/ncov-2019) [18]. In brief, complementary DNA was synthesized from the extracted RNA, based on the CT values of the samples, using SuperScript IV reverse transcriptase (Thermo Fisher Scientific). A multiplex PCR was then performed using Q5® Hot Start High-Fidelity Master Mix (NEB) and the V3 primer set to generate amplicons 400 bp in length. Sequencing libraries were prepared using Nanopore's ligation kit (SQK-LSK109) and native barcode kits (EXP-NBD104 and EXP-NBD114) to multiplex up to 24 samples per sequencing run on an R9.4 flow cell (1). DNA sequencing and analysis on the Illumina MiSeq platform were carried out as previously described [14, 16].
Briefly, libraries were constructed from viral RNA using the Paragon Genomics CleanPlex SARS-CoV-2 Panel and sequenced on the Illumina MiSeq according to the manufacturer's recommended protocol.
For ONT reads, MinKNOW software was used for base calling, and reads with a minimum Q score of 7 were considered for downstream analysis. The ARTIC Network nCoV-2019 novel coronavirus bioinformatics protocol (https://artic.network/ncov-2019/ncov2019-bioinformatics-sop.html) was used for variant calling and generation of consensus sequences. In brief, guppy_barcoder was used to demultiplex the data, and only fragments with barcodes at both ends were analyzed. Reads between 400 bp and 700 bp were retained, and variant calling was performed using the Medaka tool against the SARS-CoV-2 reference genome sequence (accession MN908947). A consensus was called where coverage was at least 30x. For Illumina reads, sequences were quality- and adapter-trimmed with CUTADAPT, primer sequences were removed with FGBIO, reads were aligned with BWA-MEM, and SNPs were called with SAMTOOLS as previously described [14, 16]. SARS-CoV-2 lineages were determined based on pangolin (V.2.4.2) nomenclature [19]. For mutation analysis, the cut-off for considering a mutation was set to a prevalence of ≥ 1%.
[...] [20]. Independent maximum-likelihood (ML) phylogenies were generated for the B.1.1.7 (n=134) and B.1.351 (n=112) genomes. Global SARS-CoV-2 genomes representative of the two lineages were retrieved from GISAID (https://www.gisaid.org/) and aligned with our sequences. Global sequences were selected to include four genomes per continent per week (based on sample collection date), from December 2020 to March 2021. Models for phylogenetic tree construction were selected using IQ-TREE's ModelFinder tool [21]. Phylogenetic tree of the B. [...]
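The selection rule for global sequences (four genomes per continent per week, by collection date) can be illustrated with a minimal sketch. This is not the authors' pipeline: the record fields (`id`, `continent`, `collection_date`), the function name, the use of ISO calendar weeks, and random selection within each group are all assumptions made for the example, since the text does not state how the four genomes in each group were chosen.

```python
import random
from collections import defaultdict
from datetime import date


def subsample_per_continent_week(records, per_group=4, seed=0):
    """Keep at most `per_group` records per (continent, ISO year, ISO week).

    `records` is a list of dicts with hypothetical keys "continent" and
    "collection_date" (a datetime.date). Random choice within a group is
    an assumption; the source does not describe the selection criterion.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    groups = defaultdict(list)
    for rec in records:
        iso = rec["collection_date"].isocalendar()
        groups[(rec["continent"], iso[0], iso[1])].append(rec)

    selected = []
    for key in sorted(groups):
        members = groups[key]
        selected.extend(rng.sample(members, min(per_group, len(members))))
    return selected


# Toy input: ten European and two Asian genomes from the same week.
records = [
    {"id": f"EU-{i}", "continent": "Europe", "collection_date": date(2021, 1, 4)}
    for i in range(10)
] + [
    {"id": f"AS-{i}", "continent": "Asia", "collection_date": date(2021, 1, 5)}
    for i in range(2)
]
picked = subsample_per_continent_week(records)
print(len(picked), "genomes kept")
```

Groups with fewer than four genomes are kept whole, so sparsely sampled continents or weeks are not dropped from the tree.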
The population in Qatar is divided between manual and craft workers (~60%) and the urban population. During the first wave (April to June 2020), virus transmission was mostly sustained in the former, larger population, which reached ~65%-70% seroprevalence by September 2020, whereas during the second wave (which started in early March 2021) transmission was more confined to the urban population. As such, around 70% of the samples selected for this study were from the large cluster, while 30% were from the urban cluster. An average of 3.12±8.18% of daily COVID-19 positive samples were sequenced (Supplementary Figure).
Out of 2634 samples, 2241 (85.1%) yielded genome coverage above 60% and were used for downstream analysis. Sequence coverage was significantly inversely correlated with the CT value (R² = 0.0436, p < 0.0001). All SARS-CoV-2 genomes generated in this study were deposited in the Global Initiative on Sharing All Influenza Data (GISAID) database. Accession IDs are listed in Supplementary Data 2. The consensus sequences were analyzed using the Pangolin COVID-19 Lineage Assigner tool. Data were then manually checked to exclude any sequences that were missing a collection date, had no amino acid mutations, or had a lineage assignment inconsistent with the detected sequence mutations. This resulted in a total of 2013 sequences that were included in the variant analysis. Nine lineages were reported in at least 1% of the samples (Supplementary Figure). In-depth non-synonymous mutation analysis was performed independently for each of the highly prevalent major variants.
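The 1% prevalence threshold used to define the reported lineages can be illustrated with a short sketch. It is only an example under stated assumptions: the function name and the toy lineage labels and counts are hypothetical and are not the study data. The final line simply rechecks the reported proportion of samples passing the coverage filter (2241 of 2634).

```python
from collections import Counter


def prevalent_lineages(lineages, min_fraction=0.01):
    """Return {lineage: fraction} for lineages at or above `min_fraction`.

    `lineages` holds one lineage label per analyzed sequence
    (hypothetical input format).
    """
    total = len(lineages)
    if total == 0:
        return {}
    counts = Counter(lineages)
    return {
        name: n / total
        for name, n in counts.items()
        if n / total >= min_fraction
    }


# Toy data (not from the study): 200 sequences, one lineage below 1%.
toy = ["lineage_A"] * 150 + ["lineage_B"] * 49 + ["lineage_C"] * 1
kept = prevalent_lineages(toy)
print(sorted(kept))

# Recheck of the reported coverage-filter proportion.
coverage_pass_pct = round(100 * 2241 / 2634, 1)
print(coverage_pass_pct)
```

The same function could be applied per lineage to mutation labels instead of lineage labels to apply a within-lineage 1% cut-off.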
Mutations with a frequency of ≥1%, calculated from the total sequences detected in each variant, were considered, with a focus on mutations that were reported in at least 1% of the studied cohort.
The World Health Organization declared the COVID-19 outbreak in China a public health emergency of international concern on January 30 2020, and then a worldwide pandemic on March 11 2020 [17, 23, 24]. By that time, the virus had spread to more than 40 countries and resulted in more than 140,000 infections and more than 4000 deaths worldwide [25]. By the first week of March, the virus had been reported in many countries, including eight in the Middle East: Iran, Pakistan, Afghanistan, Kuwait, Bahrain, Iraq, Oman, and Qatar [26]. The first COVID-19 case in Qatar was reported on February 29 in a traveler returning from Iran, which was experiencing the worst outbreak in that region of the world [27]. However, the first documented case of SARS-CoV-2 community transmission was identified on March 6 among expatriate craft workers living in high-density accommodations [3, 5]. The number of cases increased dramatically thereafter, reaching 781 confirmed cases by the end of March 2020 [28]. Cases continued to increase until the peak of the first pandemic wave in mid-May 2020 [29]. Qatar reached the first pandemic peak earlier than neighboring countries such as Saudi Arabia, Iraq, Lebanon, and Jordan, but later than Asian and European countries [29]. The positivity rate started to decrease after May 22, reaching 17% on July 9 [3]. As of July 15, a total of 104,984 cases had been confirmed in Qatar, a rate of 36,439 cases per million population, one of the highest rates worldwide [3].
The first wave (March-July 2020) was characterized by the circulation of seven main SARS-CoV-2 lineages, out of more than 80 lineages identified worldwide [30, 31]. Locally, the most prevalent lineages were B.1.428 (32%), B.1 (21%), and B.1.1 (5%). Worldwide, the B.1 lineage dominated, accounting for 17% of COVID-19 cases, followed by B.1.1 (13%) [19, 30, 32]. B.1.428, on the other hand, was less prevalent globally, accounting for less than 1% of positive cases [30]. In August, the number of local cases declined dramatically, and accordingly, restrictions on socialization and travel were loosened [29, 33]. Despite the strict quarantine policy on travel, more lineages were still detected locally but had limited spread. Globally, the overall situation of the pandemic seemed to be under control during September and October, except in India, Brazil, and the USA [25]. During this period, B. [...] May 2020 to March 2021 [42]. Similarly, only sporadic B.1.427 cases were reported in the other 32 countries that reported this variant [30, 32].
The UK variant, B.1.1.7, was the second VOC to be detected in Qatar. The first ten cases were identified in late December among local cases. At that time, the variant was circulating in Europe and at lower frequency in the USA, Africa, and other Middle Eastern countries [30]. As the number of B.1.1.7 cases was increasing locally and globally, another fast-spreading variant was identified in South Africa [39]. By December 31, the South African variant, B.1.351, had been detected in at least 60 countries, including Qatar [30]. However, the first community transmission case in Qatar was detected on February 1, and the variant took hold soon afterwards. By March 15, B.1.351 accounted for 60% of cases locally and 7% globally [30]. The prevalence of B.1.1.7 was evident from S-gene target failure on the TaqPath COVID-19 Combo Kit platform (Thermo Fisher Scientific, USA) [45], and that of B.1.351 from the sequencing data presented here [15].
Qatar ranked among the top 20 countries reporting B.1.351 cases, and the highest in the MENA region, despite the late appearance of the variant compared to other countries [30, [...] with a sharp increase (doubling in less than a week) in daily cases; consequently, the Rt value increased to 1.5. Although daily cases are still lower than in the first wave, the death rate is significantly higher (average deaths of 0.50±0.57 from March 1 to 20 2021 compared to 1.73±1.01 from March 21 to 31 2021, P<0.0001). This prompted the government to reimpose strict lockdown measures to control the spread of these variants in the community [29, 33].
The CDC classified these two variants as VOC after evidence showed increased transmissibility and reduced neutralization by convalescent sera [41]. Both variants were found to increase transmissibility by at least 50% [43, 44]. The fast-spreading variants, however, exhibited differential resistance to neutralization by convalescent sera. According to some studies, neutralization against the UK variant dropped by roughly 2-fold, whereas it dropped by 6.5- to 8.5-fold against the South African variant, using convalescent and post-vaccination sera [46] [47] [48]. This raised global concerns about the effectiveness of available vaccines against the two novel variants. In a recent study from Qatar, the effectiveness of the BNT162b2 vaccine was estimated at 89.5% against the B.1.1.7 variant and 75.0% against the B.1.351 variant, but the vaccine exhibited 100% effectiveness against developing severe disease [15]. Vaccine effectiveness can be reduced by the accumulation of immune-escape mutation(s) in the spike protein, specifically those in the RBD.
As in a typical RNA virus, mutations frequently arise in the SARS-CoV-2 genome due to the error-prone replication process [49]. However, the majority of these mutations are lost through natural selection [50]. Only mutations that confer a fitness advantage (increased replication, transmissibility, or immune evasion) may ultimately become fixed in the global population of SARS-CoV-2. Here, we analyzed the mutations that emerged in each lineage over a one-year study period. Overall, sequence analysis revealed more than 99% similarity among genomes that belong to the same lineage.
In conclusion, this study identified SARS-CoV-2 lineages and their circulation pattern in Qatar, a highly diverse region that was heavily impacted by COVID-19. Regardless of the implementation of restriction measures, particularly on international movement, 61 SARS-CoV-2 lineages were detected. Nine were widely represented, including two VOCs. We identified a number of novel mutations that are likely to have originated in Qatar, including N481K located in the spike protein. We also reported an increase in the number of mutations and deletions in the backbone of the VOCs, B.1.1.7 and B.1.351, over a short period. This raises the imperative need to continue the ongoing genomic surveillance that has been an integral part of the national response to monitor any SARS-CoV-2 re-emergence in Qatar.
[...] are presented based on their prevalence. Solid lines indicate mutations that were detected at a frequency of at least 1% in the study cohort. Mutations in dotted lines did not pass the 1% cutoff in the study cohort but were present in at least 1% of the sequences in a particular lineage. The grey area represents the evolution of each lineage across the studied period. Variants of concern are identified with an asterisk.
[...] 112 local sequences, respectively. Global genomes representative of the two clades with collection dates from November 2020 to March 2021 were included in the phylogenetic trees. Global genomes are indicated in black, local genomes are colored in blue, and imported samples are colored in yellow. The first genome detected locally is colored in red.
A Novel Coronavirus from Patients with Pneumonia in China
Phylogenomic analysis of COVID-19 summer and winter outbreaks in Hong Kong: An observational study
Characterizing the Qatar advanced-phase SARS-CoV-2 epidemic. Sci Rep
Mathematical modeling of the SARS-CoV-2 epidemic in Qatar and its impact on the national response to COVID-19
Epidemiological investigation of the first 5685 cases of SARS-CoV-2 infection in Qatar
Coronavirus Disease 2019 (COVID-19)
Multiple Early Introductions of SARS-CoV-2 to Cape Town, South Africa. Viruses
SARS-CoV-2 Lineages and Sub-Lineages Circulating Worldwide: A Dynamic Overview
Genomic characterization of a novel SARS-CoV-2 lineage from Rio de Janeiro, Brazil
First detection of SARS-CoV-2 spike protein N501 mutation in Italy in August
SARS-CoV-2 seroprevalence in the urban population of Qatar: an analysis of antibody testing on a sample of 112,941 individuals
The proximal origin of SARS-CoV-2
Emerging biosensing technologies for improved diagnostics of COVID-19 and future pandemics
Within-Host Diversity of SARS-CoV-2 in COVID-19 Patients With Variable Disease Severities
Effectiveness of the BNT162b2 Covid-19 Vaccine against the B.1.1.7 and B.1.351 Variants
Assessment of the risk of SARS-CoV-2 reinfection in an intense reexposure setting
Timeline: WHO's COVID-19 response
nCoV-2019 sequencing protocol. Protocols.io
A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology
IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies
ModelFinder: fast model selection for accurate phylogenetic estimates
Exploring the temporal structure of heterochronous sequences using TempEst (formerly Path-O-Gen)
The Novel Coronavirus Originating in Wuhan, China: Challenges for Global Health Governance
Possible therapeutic role of a highly standardized mixture of active compounds derived from cultured Lentinula edodes mycelia (AHCC) in patients infected with 2019 novel coronavirus
An interactive web-based dashboard to track COVID-19 in real time
The coronavirus pandemic in five powerful charts
Qatar reports first case of coronavirus, in The Peninsula. 2020: Qatar.
28. The ministry of Public Health COVID-19)
Genomic epidemiology of novel coronavirus - Global subsampling
Nextstrain: real-time tracking of pathogen evolution
Available from: github.com/cov-lineages/pangolin.
33. The ministry of Public Health. Coronavirus disease 2019 (COVID-19)-Qatar Travel and Return Policy
CoV-2 Is Re-emerging Following the Relaxation of Lockdown Restrictions
Covid-19: Prime minister announces relaxation of England's lockdown and social distancing rules
New COVID-19 Variants
de Vasconcelos RHT, et al., Spike E484K mutation in the first SARS-CoV-2 reinfection case confirmed in Brazil
Emergence and rapid spread of a new severe acute respiratory syndrome-related coronavirus 2 (SARS-CoV-2) lineage with multiple spike mutations in South Africa
Preliminary genomic characterisation of an emergent SARS-CoV-2 lineage in the UK defined by a novel set of spike mutations. 2020.
41. Center of Disease Control (CDC)
Transmission, infectivity, and antibody neutralization of an emerging SARS-CoV-2 variant in California carrying a L452R spike protein mutation. medRxiv
Estimated transmissibility and impact of SARS-CoV-2 lineage B.1.1.7 in England
Eggo R Estimates of severity and transmissibility of novel South Africa SARS-CoV-2 variant 501Y.V2
TaqPath™ COVID-19 CE-IVD RT-PCR Kit instructions for use
Antibody resistance of SARS-CoV-2 variants B.1.351 and B.1.1.7. Nature
mRNA-1273 vaccine induces neutralizing antibodies against spike mutants from global SARS-CoV-2 variants. bioRxiv
Safety and efficacy of the ChAdOx1 nCoV-19 vaccine (AZD1222) against SARS-CoV-2: an interim analysis of four randomised controlled trials in Brazil, South Africa, and the UK. Lancet
Genetic Recombination, and Pathogenesis of Coronaviruses
Coronavirus biology and replication: implications for SARS-CoV-2
A performed the analysis that generated the graphs and drafted the manuscript
Y identified the samples for sequencing and extracted demographic information from databases. S.I.I, RB, HM, and YS, supported the idea by securing funds
Qatar Petroleum (project number: QUEX-BRC-QP-PW-18/19 and QUEX-BRC-QP-GH-18/19), and Qatar University