key: cord-0718724-c65fuqyb authors: Nasereddin, Abedelmajeed; Al-Jawabreh, Amer; Dumaidi, Kamal; Al-Jawabreh, Ahmed; Al-Jawabreh, Hanan; Ereqat, Suheir title: Tracking of SARS-CoV-2 Alpha variant (B.1.1.7) in Palestine date: 2022-04-04 journal: Infect Genet Evol DOI: 10.1016/j.meegid.2022.105279 sha: 3c5cc45b1e7125994ebf728958b2d549bce8d69c doc_id: 718724 cord_uid: c65fuqyb

As surges of the COVID-19 pandemic continue globally, including in Palestine, several new SARS-CoV-2 variants have been introduced. This expansion has impacted transmission, disease severity, virulence, diagnosis, therapy, and natural and vaccine-induced immunity. Here, 183 whole genome sequences (WGS) were analyzed, of which 129 were from Palestinian cases; 62 of these were collected in 11 Palestinian districts between October 2020 and April 2021 and sequenced completely. A dramatic shift from the wild type to the Alpha variant (B.1.1.7) was observed within a short period of time. Cluster mapping revealed statistically significant clades in two main Palestinian cities, Al-Khalil (Monte Carlo hypothesis test-Poisson model, P = 0.00000000012) and Nablus (Monte Carlo hypothesis test-Poisson model, P = 0.014 and 0.015). The phylogenetic tree showed three main clusters of SARS-CoV-2 with high bootstrap values (>90). However, population genetics analysis showed a genetically homogeneous population supported by low Wright's F-statistic values (Fst < 0.25), high gene flow (Nm > 3), and statistically insignificant Tajima's D values (Tajima's test, neutrality model prediction, P = 0.02). The Alpha variant rapidly replaced the wild type, causing a major surge that peaked in April 2021, with an increased COVID-19 mortality rate, especially in the Al-Khalil and Nablus districts. The source of introduction remains uncertain, despite the minimal genetic variation.
The study substantiates the use of WGS for SARS-CoV-2 surveillance as an early warning system to track down new variants requiring effective control.

COVID-19, a recently emerged serious respiratory illness, is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), an enveloped, positive-sense, single-stranded RNA virus belonging to the family Coronaviridae, genus Betacoronavirus. In December 2019, SARS-CoV-2 was first identified and reported in Wuhan, the capital city of Hubei province, China (Wang et al., 2020a). It has rapidly spread across the world (Vassallo et al., 2021) with a sharp increase in the number of cases (Velavan and Meyer, 2020). On January 30, 2020, the World Health Organization (WHO) declared the SARS-CoV-2 outbreak a public health emergency of international concern. On March 11, 2020, COVID-19 was declared a pandemic. More than 236 million cases and more than 4.8 million deaths had been reported up to October 6, 2021 (Worldmeter, 2021). In Palestine, including the West Bank, Gaza Strip and East Jerusalem, the first cases were reported on March 4, 2020. The number has grown since then, reaching more than 439 thousand laboratory-confirmed SARS-CoV-2 cases and almost 4400 deaths up to October 6, 2021 (http://site.moh.ps/index/covid19/LanguageVersion/1/Language/ar).

At the beginning of the pandemic, SARS-CoV-2 was classified into two lineages, A and B, both of which originated in China. Later on, lineage A spread to Asia and then to the rest of the world, whereas lineage B spread to Europe (Rambaut et al., 2020). However, although SARS-CoV-2 appears to have a relatively stable genome, its spillover into humans and the failure to contain its spread in many countries have led to the emergence of several lineages and considerable genetic diversity (Eden et al., 2020).
In April 2020, Korber et al. (2020) reported a virus carrying a spike D614G mutation at high frequency, which became the dominant type of the pandemic. In addition, they showed that the G614 variant is associated with greater infectivity, with clinical evidence associating it with higher viral loads (Korber et al., 2020). Later, an N439K mutation in the spike receptor-binding domain (RBD) was reported in some European countries and the United States. The N439K mutation was reported to confer resistance against several neutralizing monoclonal antibodies and to reduce the activity of some polyclonal sera from persons recovered from COVID-19 (Thomson et al., 2021). Furthermore, the new SARS-CoV-2 lineage B.1.1.7, recently named the Alpha variant (Konings et al., 2021) and first reported in England, had both the N501Y and the 69-70 deletion mutations in the spike region of the viral genome. This variant showed a higher transmissibility rate than the previously circulating variants, but with no evidence of increased clinical severity or vaccine escape capability. Later, Tegally et al. (2021) described a new SARS-CoV-2 variant (B.1.351-501Y.V2), first seen in South Africa (Beta variant) (Konings et al., 2021), which had eight mutations in the spike protein, three of which were located in the RBD (K417N, E484K and N501Y) and may have functional significance. The sites of two of the RBD mutations, K417N and E484K, are considered key for binding neutralizing antibodies. The primary model of this variant (501Y.V2) showed 50% higher transmissibility than previous variants (Tegally et al., 2021). Other recent variants of concern are the Gamma variants, B.1.1.28-484K.V2 and B.1.1.248, which have, in addition to the E484K mutation, several other unique mutations in the spike glycoprotein.
These two variants may enhance transmissibility similarly to both the Alpha and Beta variants, as they share a similar pattern of mutations (Toovey et al., 2021). The Delta variant, first reported in India, carried two unique mutations, L452R and T478K, in the spike region. All the aforementioned variants became a public health concern and were therefore defined as variants of concern (VOC). These variants showed significant effects on disease distribution (surges of the pandemic), severity and transmissibility, as evidenced by increased numbers of hospitalizations, a higher case fatality rate, reduced neutralizing antibody levels post-vaccination and reduced susceptibility to drugs (CDC.gov, 2021). In addition, the VOC list was extended by

Sample collection: In a descriptive cross-sectional study, nasopharyngeal swabs (Poctman, Guandong Poctman Life Technology Co., Ltd, China) were collected by convenience sampling from 62 Palestinian patients after they had been officially confirmed as COVID-19 cases with varying degrees of classical symptoms. All samples were transported at 4 °C in a sealed container for RNA extraction and processing. Samples were collected between October 2020 and April 2021, which was during the second surge of the COVID-19 pandemic. The study received ethical approval from the Research Ethics Committee of Al-Quds University under the reference number 154/REC/2020. Verbal informed consent was obtained from the patients, or from their guardians if they were younger than 18 years.

(Afgan et al., 2018). SARS-CoV-2 consensus sequences were obtained by mapping reads with BWA-MEM (map medium and long reads, > 100 bp) against the Wuhan-Hu-1 SARS-CoV-2 reference isolate (hCoV-19/Wuhan/Hu-1/2019; GenBank accession number NC_045512.0; GISAID accession ID: EPI_ISL_402125) (Gohl et al., 2020; Quick et al., 2017).
The mapping was displayed with a local Integrative Genomics Viewer (IGV), and the consensus sequence from the IGV was copied and accepted as the sequence of each sample. Three free online platforms were used for the analysis of emerging variants, PANGO lineages, and GISAID clades. The first, the GISAID CoVsurver (https://www.gisaid.org/epifluapplications/covsurver-mutations-app/), shows phenotypically or epidemiologically relevant amino acid changes compared to hCoV-19/Wuhan/WIV04/2019. The second, the Genome Detective Coronavirus typing tool version 1.132 (https://www.genomedetective.com/app/typingtool/cov/), identifies coronavirus types, genotypes and lineages of nucleotide sequences using the basic local alignment search tool (BLAST) and phylogenetic methods, and was used for this purpose on the Palestinian isolates. Lastly, Nextstrain, a website for real-time tracking of pathogen evolution, was used to identify the Palestinian SARS-CoV-2 isolates (https://nextstrain.org/ncov/asia?f_country=Palestine).

uploaded to the MEGA version X program (Kumar et al., 2018). The phylogenetic tree was inferred using the UPGMA method (Felsenstein, 1985; Kumar et al., 2018; Saitou and Nei, 1987; Tamura et al., 2004). The randomly retrieved complete SARS-CoV-2 genomes were evenly distributed across all continents. Maximum likelihood (ML) and neighbor-joining (NJ) phylogenetic trees of the complete genomes were constructed with 1000 bootstrap iterations using MEGA version X. The analysis included the Palestinian study genomes, the retrieved genomes, the Wuhan reference strain, and an out-group genome of the bat coronavirus from China (GenBank: MN996532.2).

Population genetic parameters included haplotype diversity (Hd), number of haplotypes (h), nucleotide diversity per site (π) (Saitou and Nei, 1987), total number of mutations (Eta), average number of nucleotide differences (k) (Tajima, 1983), and number of variable/segregating sites.
Eta and the neutrality estimators of mutation rate included Tajima's D, Fu and Li's D, and Fu and Li's F tests. Genetic differentiation parameters were calculated using DnaSP ver. 6.12.03 (Rozas et al., 2017). These estimators included Wright's F-statistics (Fst) as pairwise genetic distance (Wright, 1951) and the number of migrants (Nm) as a measure of gene flow and migration among populations, where Nm = (1 - Fst)/(2 Fst) for haploid and Nm = (1 - Fst)/(4 Fst) for diploid organisms (Hudson et al., 1992; Wright, 1951). Further parameters were the average number of nucleotide differences between populations 1 and 2 (Kxy), the average number of nucleotide substitutions per site between populations 1 and 2 (Dxy) (Saitou and Nei, 1987), the number of net nucleotide substitutions per site between populations 1 and 2 (Da) (Saitou and Nei, 1987), and the genetic differentiation index based on the frequency of haplotypes (Gst) (Nei, 1973). Also, the Hudson-Kreitman-Aguadé test, HKA (χ²), a neutrality test that assesses the fit of the data to a neutral evolution model assuming the same polymorphism and divergence, was performed. Cluster mapping of SARS-CoV-2 cases from Palestine with statistical inference was conducted using two software packages. (Kulldorff et al., 2005).

The study included 62 samples from Palestinian symptomatic and asymptomatic COVID-19 patients from the 11 Palestinian districts in the West Bank (Figure 1), excluding East Jerusalem. A total of 183 SARS-CoV-2 WGS were used to construct a consensus phylogenetic tree (Figure 3), of which 129 were Palestinian sequences (62 from this study and 67 from the Palestinian Ministry of Health (Qutob et al., 2021)) and 52 were international GISAID-retrieved strains from various countries on all continents, including sequences from neighboring Jordan and Egypt. All positions containing gaps and missing data were eliminated using the complete deletion option. There were a total of 23,926 positions in the final dataset (Kumar et al., 2018).
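The conversion from Fst to the number of migrants (Nm) given in the methods above can be illustrated with a short calculation. The following is a minimal Python sketch, not the DnaSP implementation; the function name `nm_from_fst` and its `ploidy` argument are hypothetical names chosen for this illustration, and the input values are the Fst range (0.12 to 0.14) reported later in this paper.

```python
def nm_from_fst(fst: float, ploidy: str = "haploid") -> float:
    """Estimate the number of migrants (Nm) from a pairwise Fst value.

    Uses the two forms quoted in the methods:
      haploid: Nm = (1 - Fst) / (2 * Fst)
      diploid: Nm = (1 - Fst) / (4 * Fst)
    The function name and arguments are illustrative assumptions.
    """
    if not 0.0 < fst < 1.0:
        raise ValueError("Fst must lie strictly between 0 and 1")
    if ploidy == "haploid":
        factor = 2.0
    elif ploidy == "diploid":
        factor = 4.0
    else:
        raise ValueError("ploidy must be 'haploid' or 'diploid'")
    return (1.0 - fst) / (factor * fst)


# Fst range reported between the three phylogenetic clusters.
for fst in (0.12, 0.14):
    print(f"Fst = {fst:.2f}: haploid Nm = {nm_from_fst(fst):.2f}")
```

For Fst values of 0.12 and 0.14, the haploid form gives Nm of roughly 3.7 and 3.1, which is consistent with the reported Nm > 3.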
The maximum likelihood phylogenetic tree (Figure 3) was inferred using the Tamura-Nei model (Saitou and Nei, 1987; Tamura et al., 2004). Bootstrap values above 80% are shown next to the branches (Felsenstein, 1985). Branches with lower support are collapsed. The codes starting with 'EPI' indicate accession IDs from GISAID, while the others are accession numbers from GenBank. Green accession IDs are Palestinian sequences, black accession IDs are international sequences, the red accession ID is the Wuhan reference, and the blue accession number is the bat coronavirus outgroup (Rhinolophus affinis).

In the second SARS-CoV-2 surge of the pandemic in Palestine, population diversity indices and neutrality tests were calculated for the 182 complete SARS-CoV-2 genomes based on the clustering shown in the maximum likelihood phylogenetic tree (Tables 1 and 2). The nucleotide diversity (π) in the three clusters was low (average of 0.664 and total of 0.669 ± 0.005), while haplotype diversity (Hd) was high (total of 1.00 ± 0.001) (Table 1). The DnaSP ver. 6.12.03 estimated the average number of nucleotide differences between any two sequences (Mor et al., 2021). This explains the rapid spread of COVID-19 in Palestine during that outbreak. No Alpha variants were reported in the previous Palestinian study (Qutob et al., 2021), indicating the need for continuous surveillance by WGS and phylogenetic analysis. These key strategies appear to be the most effective means of studying and tracking circulating viral lineages, identifying their transmission pathways, and screening for the introduction and spread of new variants of concern, especially highly virulent ones like the Alpha variant. A study conducted by Vassallo et al. (2021) showed that patients infected with the Alpha variant had a 3.8-fold higher risk of death or transfer to the intensive care unit (ICU) compared to those infected with the original wild strain.
Further studies should be

Cluster I was the largest of the three (n=96) and contained the Wuhan reference strain, indicating that the wild type was still dominant in the study area during the second surge of the pandemic, reflecting a very recent population expansion. The genetic variation seen in SARS-CoV-2 is low, as evidenced by the constant ratio of the number of genomes to the number of haplotypes (h:n) across clusters and the equal number of mutations (Eta). Moreover, the low nucleotide diversity (π) with concomitantly high haplotype diversity (Hd) is another indication of low genetic variation among the studied SARS-CoV-2 genome sequences. The Tajima's D values did not depart significantly from neutrality (Tajima's test, neutrality model prediction, P=0.02) in any of the three clusters, which supports the low genetic variation across the genomes studied (Table 1). Further evidence of the low genetic diversity among the three clusters is the low pairwise genetic distance (Fst) between them, ranging from 0.12 to 0.14 (<0.25), and the high gene flow and population migration between the clusters (Nm>3). High gene flow is expected to increase homogeneity between clusters and reduce genetic differentiation (Slatkin, 1987). The most plausible explanation is the recent expansion of SARS-CoV-2, which did not allow any population differentiation within this relatively short period of time. A second explanation, albeit a less likely one, is that the high mobility of human hosts between clusters in a short period of time, owing to extensive international travel, allowed rapid exchange of genetic material leading to homogeneity (Bolnick and Nosil, 2007; Slatkin, 1987). These results of high homogeneity at the nucleotide level are in complete congruence with other studies in the USA and China, which reported a homogeneity level of 99.99 to 100% (Kaushal et al., 2020; Wang et al., 2020b).
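The combination of high haplotype diversity (Hd) with low nucleotide diversity (π) discussed above can be made concrete with a toy calculation. The sketch below is an illustration only and is not the DnaSP implementation: it assumes Nei's estimator Hd = n/(n-1) × (1 − Σ pᵢ²) and defines π as the mean number of pairwise differences (k) divided by the alignment length. The four 20-base sequences are invented for the example and are not study data.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """Nei's haplotype diversity: n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("at least two sequences are required")
    counts = Counter(seqs)
    sum_p2 = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_p2)


def mean_pairwise_differences(seqs):
    """Average number of differing sites (k) over all pairs of aligned sequences."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)


def nucleotide_diversity(seqs):
    """Nucleotide diversity per site (pi): k divided by the alignment length."""
    return mean_pairwise_differences(seqs) / len(seqs[0])


# Invented toy alignment: four 20-base sequences, each carrying one private mutation.
toy = [
    "C" + "A" * 19,
    "A" + "C" + "A" * 18,
    "AA" + "C" + "A" * 17,
    "AAA" + "C" + "A" * 16,
]
print("Hd =", round(haplotype_diversity(toy), 3))
print("k  =", mean_pairwise_differences(toy))
print("pi =", round(nucleotide_diversity(toy), 3))
```

In this toy alignment every sequence is a distinct haplotype, so Hd reaches its maximum of 1, while each pair differs at only 2 of 20 sites, so π is 0.1. Many unique haplotypes separated by very few mutations is the pattern expected after a recent expansion.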
The other genetic differentiation parameters were almost equally low, supporting homogeneity and low genetic diversity (Table 2).

The study revealed the prompt introduction of the Alpha variant of the SARS-CoV-2 virus during the second surge of the COVID-19 pandemic, which caused serious sickness and death. Two districts in the Palestinian West Bank had the highest burden of disease, the Nablus District and the Al-Khalil District. At the time of this study, the pandemic was relatively recent and the studied sequences were still homogeneous with minimal genetic variation. The study emphasized the importance of WGS surveillance in monitoring SARS-CoV-2 in the community in terms of the spread of the disease and in pinpointing early cases caused by highly transmissible emerging variants.

ARTIC nanopore protocol for nCoV2019 novel coronavirus
Nextstrain 2021. Genomic epidemiology of novel coronavirus - Global subsampling
The Galaxy platform for accessible, reproducible and collaborative biomedical analyses
Natural selection in populations subject to a migration load
New variants of SARS-CoV-2
SARS-CoV-2 Variant Classifications and Definitions
An emergent clade of SARS-CoV-2 linked to returned travellers from Iran
Confidence Limits on Phylogenies: An Approach Using the Bootstrap
A rapid, cost-effective tailed amplicon method for sequencing SARS-CoV-2
Estimation of levels of gene flow from DNA sequence data
Mutational Frequencies of SARS-CoV-2 Genome during the Beginning Months of the Outbreak in USA
A space-time permutation scan statistic for disease outbreak detection
MEGA X: Molecular Evolutionary Genetics Analysis across Computing Platforms
The Rise and Fall of a Local SARS-CoV-2 Variant with the Spike Protein Mutation L452R
BNT162b2 vaccination effectively prevents the rapid rise of SARS-CoV-2 variant B.1.1.7 in high-risk populations in Israel
Multiplex PCR method for MinION and Illumina sequencing of Zika and other virus genomes directly from clinical samples
Genomic epidemiology of the first epidemic wave of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in Palestine
A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology
DnaSP 6: DNA Sequence Polymorphism Analysis of Large Data Sets
The neighbor-joining method: a new method for reconstructing phylogenetic trees
Gene flow and the geographic structure of natural populations
Evolutionary relationship of DNA sequences in finite populations
Prospects for inferring very large phylogenies by using the neighbor-joining method
Emergence of a new SARS-CoV-2 variant in the UK
Introduction of Brazilian SARS-CoV-2 484K.V2 related variants into the UK
Patients Admitted for Variant Alpha COVID-19 Have Poorer Outcomes than Those Infected with the Old Strain
The COVID-19 epidemic
A novel coronavirus outbreak of global health concern
The establishment of reference sequence for SARS-CoV-2 and variation analysis
COVID-19 Corona Virus Pandemic
The genetical structure of populations