key: cord-1042235-q68qbu6t
authors: Badar, Nazish; Ikram, Aamer; Salman, Muhammad; Umair, Massab; Rehman, Zaira; Ahad, Abdul; Mirza, Hamza Ahmed; Alam, Masroor; Mehmood, Nayab; Bashir, Uzma
title: Genomic characterization of SARS-CoV-2 from Islamabad, Pakistan by Rapid Nanopore sequencing
date: 2022-02-18
journal: bioRxiv
DOI: 10.1101/2022.02.17.480826
sha: 4ee71b4ea92fe9b07be8b94565a5cd13269aed34
doc_id: 1042235
cord_uid: q68qbu6t

Since the start of the COVID-19 pandemic, Pakistan has experienced four waves of infection. The fourth wave ended in October 2021, while the fifth wave started in January 2022. Data on the strains circulating in Pakistan after the fourth wave are not available. The current study explores the genomic diversity of SARS-CoV-2 after the fourth wave and before the fifth wave through whole-genome sequencing. The results showed that different strains of SARS-CoV-2 were circulating during November-December 2021. We found Omicron BA.1 (n=4), lineage A (n=2) and Delta AY.27 (n=1) variants of SARS-CoV-2 in the population of Islamabad. All the Omicron and Delta isolates harbor the characteristic mutations of their variants in the genome. The lineage A isolate harbors a nine-amino-acid (68-76) and a ten-amino-acid (679-688) deletion in the genome. The circulation of Omicron in the population before the fifth wave and the subsequent upsurge of COVID-19-positive cases in Pakistan highlight the importance of genomic surveillance.

The emergence of SARS-CoV-2 in late 2019 and of its variants has been responsible for more than 5.6 million deaths globally [1]. Over a period of two years the virus has accumulated many changes in its genome. The first dominant mutation identified relative to the original Wuhan-Hu-1 virus was D614G, which makes the virus more transmissible by increasing the binding affinity of the spike protein for the ACE2 receptor [2, 3].
During the second half of 2020 the virus acquired many new mutations, leading to the emergence of several variants of concern (Alpha, Beta, Gamma, and Delta) and variants of interest. The emergence of these variants has sustained the global transmission of SARS-CoV-2 even with rigorous vaccination drives in various countries [4]. In November 2021, a variant named Omicron emerged in South Africa and rapidly spread to more than 100 countries [5]. The Omicron variant has caused successive epidemics of SARS-CoV-2 infection and reinfection in countries with significant vaccination coverage.

Pakistan has also been affected by the COVID-19 pandemic and has witnessed repeated upsurges (waves) of COVID-19 cases. The first case of COVID-19 was reported in February 2020. Four epidemic waves followed, beginning in February 2020, October 2020, March 2021 and July 2021, respectively. By November 2021 the COVID-19 positivity rate in Pakistan had declined to 0.5% [6]. Continued genomic surveillance of the SARS-CoV-2 lineages circulating in the population is crucial for foreseeing the emergence of new variants. Genomic surveillance has revealed the dominant lineages of SARS-CoV-2 during each epidemic wave, but data on the lineages circulating after the fourth wave are not available. Hence, the current study was conducted to understand which strains were circulating in Pakistan during the period in which the pandemic curve flattened.

During November and December 2021, oropharyngeal swab samples were collected from 3500 suspected COVID-19 patients at the National Institute of Health. The National Institute of Health's Internal Review Board approved the study design, and the datasets were anonymised and free of personally identifiable information.
Viral RNA was isolated from the samples using the MagMAX™ Viral/Pathogen Nucleic Acid Isolation Kit and the KingFisher™ Flex Purification System (ThermoFisher Scientific, US). The TaqPath™ COVID-19 CE-IVD RT-PCR kit (ThermoFisher Scientific, Waltham, US), which targets three genes (ORF1ab, N, and S), was used to detect SARS-CoV-2. RNA extracted from SARS-CoV-2-positive samples (Ct value < 30) was reverse transcribed with LunaScript cDNA master mix and used as the primary input for overlapping tiling PCR reactions spanning the viral genome with New England Biolabs Q5 High-Fidelity 2X Master Mix (M0492L) (primers provided in Supplementary Table S1). The ARTIC Network amplicon sequencing protocol v2 with the V3 primer pools was used to create amplicon pools (Quick, 2020). PCR product pools were purified and then quantified using a Promega Quantus fluorometer. Libraries were prepared with the ONT ligation sequencing kit (SQK-LSK109), and the native barcoding expansion 96 kit (EXP-NBD114) was used for multiplexing. DNA repair and end-prep were performed with NEBNext Ultra II End-Repair/dA-tailing (New England Biolabs) using 1000 fmol of input cDNA, with the incubation period extended to 30 minutes at 20°C.

The BaseStack software platform [7] was used to generate consensus genomes with a modified ARTIC network pipeline v1.0.0. Read length filtering, alignment to the Wuhan-Hu-1 reference genome (GenBank accession number MN908947.3), and variant polishing were performed with Nanopolish v0.13.2, Medaka v0.11.5, and samtools 1.9. Lineages were assigned with PANGOLIN v2.1.7 [8], and the mutation profile was analysed with Nextstrain [9].

For the phylogenetic tree, an NCBI BLAST search of the study isolates was performed to retrieve closely related SARS-CoV-2 sequences. Sequences from neighboring countries were also included in the analysis. Multiple sequence alignment was performed with MAFFT.
The substitution model was selected with jModelTest, and a Maximum Likelihood (ML) phylogenetic tree was built using IQ-TREE (http://www.iqtree.org/). The tree was rooted with the Wuhan SARS-CoV-2 reference (hCoV-19/Wuhan/IME-WH01/2019) and was edited and visualized using FigTree (http://tree.bio.ed.ac.uk/software/figtree/).

Between November 1 and December 31, 2021, a total of 1500 samples tested positive for SARS-CoV-2 at the National Institute of Health's virology department using the TaqPath™ real-time RT-PCR kit (ThermoFisher Scientific, Waltham, US). Representative samples (n=10) with Ct values <30 were selected for Nanopore sequencing during the study period. Three samples failed QC after the cDNA enrichment step and were not processed further. All seven study participants were from Islamabad. They ranged in age from 3 to 35 years, with a median age of 27 years, and the male-to-female ratio was 3:4.

Phylogenetic analysis of the 7 full-length SARS-CoV-2 genomes from this study and 101 global isolates was conducted (Fig 1). The four Omicron genomes from this study formed one cluster with Omicron sequences from the USA and were closely related to sequences from Scotland, the Netherlands, and Ireland. The two lineage A genomes from this study, which carry the nine-amino-acid deletion, formed a separate cluster with lineage A sequences reported from Pakistan and were closely related to a sequence from Shanghai (Fig 1). The one Delta genome from this study clustered with Delta sequences from the USA. All seven sequenced strains were from Islamabad (S1 Table), suggesting that the viral strains circulating in the city were predominantly lineage A, Delta and Omicron. The mean pairwise genetic distance between our sequences, sequences previously deposited from Pakistan, and sequences from the USA, China, and the Netherlands was 0.00, indicating close phylogenetic relatedness between the genomes.
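As a rough illustration of how a mean pairwise genetic distance can be obtained from aligned sequences, the sketch below computes the uncorrected p-distance (the proportion of differing sites) for every pair in an alignment and averages the results. The study does not state which tool or distance model produced its value, so the choice of p-distance, the function names, the handling of gaps, and the toy sequences are all assumptions made for illustration only.

```python
from itertools import combinations

# Characters treated as missing data and skipped in comparisons (assumption).
MISSING = {"-", "N", "?"}


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in MISSING or b in MISSING:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def mean_pairwise_distance(alignment: dict[str, str]) -> float:
    """Average p-distance over all unordered pairs of sequences."""
    pairs = list(combinations(alignment.values(), 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Toy alignment (hypothetical sequences, not study data).
toy = {
    "ref": "ACGTACGTAC",
    "iso1": "ACGTACGTAC",
    "iso2": "ACGTACGTAT",
    "iso3": "ACG-ACGTAC",
}
print(round(mean_pairwise_distance(toy), 2))
```

For genomes of about 30,000 nucleotides, a few tens of differing sites per pair give a p-distance of roughly 0.001, which appears as 0.00 when rounded to two decimal places.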
The lineage diversity of SARS-CoV-2 was investigated in the current study. The study identified the Delta (n=1) and Omicron (n=4) variants of SARS-CoV-2. In addition to the Omicron and Delta variants, the study identified two lineage A isolates (Table 1). Interestingly, one lineage A isolate harbors a nine-amino-acid deletion in the spike protein spanning residues 68-76. The other lineage A isolate, in addition to the nine-amino-acid deletion near the amino terminus, carries a ten-amino-acid deletion towards the carboxy terminus spanning residues 679-688. These two isolates have only one substitution (L84S), in ORF8, while all the other proteins are conserved.

During the first year of the COVID-19 pandemic, SARS-CoV-2 accumulated few mutations, and the circulating viruses were closely related to the Wuhan-Hu-1 strain. The D614G substitution was the first SARS-CoV-2 mutation to reach worldwide prevalence and is now one of the commonest mutations, found in all circulating lineages and sub-lineages of SARS-CoV-2 [4]. This mutation was found to be associated with increased transmissibility and increased binding affinity of the virus for ACE2 compared with D614 viral strains. The dynamics of the pandemic changed globally after September 2020 with the emergence of new viral strains carrying a large number of mutations in the genome. These new variants have been designated Alpha, Beta, Gamma, and Delta. Besides these major lineages of SARS-CoV-2, many other lineages and sub-lineages have emerged and changed the global pattern of the pandemic. These variants give the virus a survival advantage by increasing transmissibility, infectivity and escape from neutralizing antibodies [10, 11]. Although restrictions on international travel and local lockdowns in key hotspot areas have been imposed from time to time, variants have continued to emerge [12].
A similar scenario was observed in Pakistan, where the first case of SARS-CoV-2 was detected in a traveler. Like all parts of the world, Pakistan was hit by the pandemic in 2020 with Wuhan-Hu-1-like strains of SARS-CoV-2. The first and second pandemic waves in Pakistan were led by Wuhan-Hu-1-like lineages of SARS-CoV-2, while the third wave was led by different variants of SARS-CoV-2, among which Alpha, Beta and Delta were dominant. After the third wave, the Alpha and Beta strains were no longer apparent, and the fourth wave in Pakistan was driven predominantly by the Delta variant [13]. [...] France. Omicron multiplies 70 times more in the lung airways than Delta but does not penetrate deep into the lungs [14]. Hence, it is less severe than Delta.

To conclude, the spread pattern of SARS-CoV-2 in Pakistan has been no different from the worldwide pattern: the first and second waves were dominated by Wuhan-Hu-1-like strains, while the third and fourth waves were led by different variants of concern that led to an increase in the number of infections and deaths. The emergence of new variants of SARS-CoV-2 calls for increased genomic surveillance in the country in order to track their emergence and spread.

The sequences generated in the study were submitted to GISAID (https://www.gisaid.org/) with accession IDs: EPI_ISL_8038546, EPI_ISL_8038547, EPI_ISL_8038550, EPI_ISL_8767205, EPI_ISL_8767206, EPI_ISL_8767207, EPI_ISL_8767208.

[...] Resources (https://www.ncbi.nlm.nih.gov/sars-cov-2/) were subjected to Multiple Sequence Alignment (MSA) using the MAFFT online server. The MSA was subsequently used to generate a Maximum Likelihood (ML) phylogenetic tree using IQ-TREE (http://www.iqtree.org/). The Wuhan SARS-CoV-2 sequence (hCoV-19/Wuhan/IME-WH01/2019) was used as the reference. The tree was edited and visualized using FigTree (http://tree.bio.ed.ac.uk/software/figtree/).
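The mutation profile of the isolates is written in a compact notation in which D614G denotes a substitution and H69- denotes a deleted residue. As a minimal sketch of how such strings can be processed, the Python below parses individual mutations and merges consecutive deletions into ranges, so that the nine per-residue entries I68- through T76- listed for a lineage A isolate collapse into a single 68-76 deletion. The function names, the `Mutation` type, and the regular expression are illustrative assumptions and are not part of the study's pipeline.

```python
import re
from typing import NamedTuple

# Reference residue, 1-based position, then new residue or "-" for a deletion.
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z]|-)$")


class Mutation(NamedTuple):
    ref: str
    pos: int
    alt: str  # "-" marks a deleted residue

    @property
    def is_deletion(self) -> bool:
        return self.alt == "-"


def parse_mutations(cell: str) -> list[Mutation]:
    """Parse a comma-separated list such as 'H69-, V70-, T95I'."""
    mutations = []
    for token in cell.split(","):
        token = token.strip()
        if not token:
            continue
        match = MUTATION_RE.match(token)
        if match is None:
            raise ValueError(f"unrecognised mutation: {token!r}")
        ref, pos, alt = match.groups()
        mutations.append(Mutation(ref, int(pos), alt))
    return mutations


def deletion_ranges(mutations: list[Mutation]) -> list[tuple[int, int]]:
    """Merge runs of consecutive deleted positions into (start, end) ranges."""
    positions = sorted({m.pos for m in mutations if m.is_deletion})
    ranges: list[tuple[int, int]] = []
    for pos in positions:
        if ranges and pos == ranges[-1][1] + 1:
            # Extend the current run by one residue.
            ranges[-1] = (ranges[-1][0], pos)
        else:
            ranges.append((pos, pos))
    return ranges


# Spike entries listed for lineage A isolate EPI_ISL_8038546.
spike = "I68-, H69-, V70-, S71-, G72-, T73-, N74-, G75-, T76-"
for start, end in deletion_ranges(parse_mutations(spike)):
    print(f"deletion {start}-{end} ({end - start + 1} residues)")
# prints "deletion 68-76 (9 residues)"
```

Applied to an Omicron spike list, the same function returns several short ranges (for example 69-70 and 142-144), which makes the difference between the scattered Omicron deletions and the single long lineage A deletion easy to see.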
Table 1

Accession ID: ...
Lineage: ...
ORF1a: ..., S2083-, L2084I, A2710T, T3255I, P3395H, L3674-, S3675-, G3676-, I3758V
ORF1b: P314L, I1566V
S: A67V, H69-, V70-, T95I, G142-, V143-, Y144-, Y145D, N211-, L212I, G339D, S371L, S373P, S375F, K417N, N440K, G446S, ...

Accession ID: ...
Lineage: ...
ORF1a: ..., S2083-, L2084I, A2710T, T3255I, P3395H, L3674-, S3675-, G3676-, I3758V
ORF1b: P314L, I1566V
S: H69-, V70-, T95I, G142-, V143-, Y144-, Y145D, N211-, L212I, G339D, S371L, S373P, S375F, K417N, N440K, G446S, S477N, T478K, E484A, Q493R, G496S, Q498R, N501Y, Y505H, T547K, D614G, H655Y, N679K, P681H, N764K, D796Y, N856K, Q954H, N969K, L981F
M: Q19E, A63T
N: P13L, E31-, R32-, S33-, R203K, G204R
ORF9b: P10S, E27-, N28-, A29-

Accession ID: EPI_ISL_8767208
Lineage: BA.1
ORF1a: K856R, S2083-, L2084I, A2710T, T3255I, P3395H, L3674-, S3675-, G3676-, I3758V
ORF1b: P314L, I1566V
S: A67-, I68-, H69V, T95I, G142-, V143-, Y144-, N211-, L212I, G339D, S371L, S373P, S375F, K417N, N440K, G446S, S477N, T478K, E484A, Q493R, G496S, Q498R, N501Y, Y505H, T547K, D614G, H655Y, N679K, P681H, N764K, D796Y, N856K, Q954H, N969K
M: Q19E
N: P13L, E31-, R32-, S33-, R203K, G204R
ORF9b: P10S, E27-, N28-, A29-

Accession ID: EPI_ISL_8038546
Lineage: A
S: I68-, H69-, V70-, S71-, G72-, T73-, N74-, G75-, T76-
ORF8: L84S

Accession ID: EPI_ISL_8038547
Lineage: A
S: I68-, H69-, V70-, S71-, G72-, T73-, N74-, G75-, T76-, N679-, ...

Phylogeny of the genus Flavivirus
Spike mutation D614G alters SARS-CoV-2 fitness
Tracking Changes in SARS-CoV-2 Spike: Evidence that D614G Increases Infectivity of the COVID-19 Virus
COVID-19 re-infection
Classification of Omicron (B.1.1.529): SARS-CoV-2 Variant of Concern. World Health Organization
Basestack Platform for Nanopore Sequencing Analysis
A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology
Nextstrain: real-time tracking of pathogen evolution
Increased mortality in community-tested cases of SARS-CoV-2 lineage B.1.1.7. Nature
Risk of mortality in patients infected with SARS-CoV-2 variant of concern 202012/1: matched cohort study
SARS-CoV-2 Lineages and Sub-Lineages Circulating Worldwide: A Dynamic Overview
Footprints of SARS-CoV-2 genome diversity in Pakistan
Loss of furin cleavage site attenuates SARS-CoV-2 pathogenesis