key: cord-0979181-q9chzf6i authors: Sayan, Murat; Arikan, Ayse; Isbilen, Murat title: Variant analysis of SARS-CoV-2 strains with phylogenetic analysis and the Coronavirus Antiviral and Resistance Database date: 2022-01-11 journal: Journal of comparative effectiveness research DOI: 10.2217/cer-2021-0208 sha: 10f06a391a954b8f6f0925bc2549ced2c6584e77 doc_id: 979181 cord_uid: q9chzf6i

Aims: This study determined SARS-CoV-2 variations by phylogenetic and virtual phenotyping analyses. Materials & methods: Strains isolated from 143 COVID-19 cases in Turkey in April 2021 were assessed. Illumina Nextera XT library preparation kits were used for next-generation sequencing. Phylogenetic (neighbor-joining method) and virtual phenotyping analyses (Coronavirus Antiviral and Resistance Database [CoV-RDB] by Stanford University) were used for variant analysis. Results: B.1.1.7–1/2 (n = 103, 72%), B.1.351 (n = 5, 3%) and B.1.525 (n = 1, 1%) were identified among 109 SARS-CoV-2 variations by phylogenetic analysis, and B.1.1.7 (n = 95, 66%), B.1.351 (n = 5, 4%), B.1.617 (n = 4, 3%), B.1.525 (n = 2, 1.4%), B.1.526-1 (n = 1, 0.6%) and missense mutations (n = 15, 10%) were reported by CoV-RDB. The two methods agreed in 85% of cases, and B.1.1.7 (alpha) was the most frequent SARS-CoV-2 variation in Turkey in April 2021. Conclusion: The Stanford CoV-RDB analysis method appears useful for SARS-CoV-2 lineage surveillance.

Since its first emergence in December 2019, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the causative agent of coronavirus disease 2019 (COVID-19), has accumulated many genetic variations due to its high mutation rate during replication. Most of these changes are not detrimental and therefore do not contribute to viral evolution [1].
These low-effect or no-effect changes, which are called silent mutations, do not alter the basic structure and characteristics of the virus, whereas changes in the structural and nonstructural proteins of SARS-CoV-2 can affect the viral antigenic phenotype and confer a fitness advantage. Consequently, emerging variants of SARS-CoV-2 may increase the rate of virus transmission, leading to hospitalizations and increased mortality rates in all age groups [1]. Therefore, for precise management of the ongoing COVID-19 pandemic, SARS-CoV-2 variations should be monitored. The WHO classifies SARS-CoV-2 variants according to their genetic characteristics associated with transmissibility, increased virulence and ability to escape current diagnostic methods, vaccines and therapeutics, as variants can reduce the neutralizing activity of certain monoclonal antibodies and of polyclonal antibodies found in the sera of people recovering from infection [2, 3]. While four variants, defined as alpha (501Y.V1/B.1.1.7), beta (501Y.V2/B.1.351), gamma (501Y.V3/P.1) and delta (lineage B.1.617), are in the variant of concern (VOC) category, eta, iota, kappa and lambda have been designated as SARS-CoV-2 variants of interest (VOI) [2]. Detrimental variants of SARS-CoV-2 are largely caused by mutations in the spike glycoprotein, which mediates cell attachment and is the main target of neutralizing antibodies [3, 4]. These variants continue to spread globally, posing a major public health threat. As of 17 August 2021, cases of alpha, beta, gamma and delta had been reported in 190, 138, 82 and 148 countries, respectively [5]. The more opportunity a virus has to spread, the more it will evolve.
Therefore, early detection of new cases and monitoring of SARS-CoV-2 genomic sequences for variations are important to predict the dominant virus circulating within the population, monitor how SARS-CoV-2 changes over time into new variations that might impact health and update the geographic distribution of variants [6, 7]. While SARS-CoV-2 can be detected either through viral nucleic acid, mainly by reverse transcription real-time polymerase chain reaction (RT-qPCR) assay, or through viral antigen or antibodies against these antigens [8], these tests cannot discriminate variants. Currently, PCR-based variant screening assays are widely used in routine diagnostic settings for tracking these variants; however, sequencing of the whole or partial spike gene is the most accurate approach to identify variants associated with a specific trait or population [9]. Comprehensive analysis by next-generation sequencing (NGS) and bioinformatics for the ongoing genomic surveillance of SARS-CoV-2 enables the monitoring of viral spread, evolution and variation patterns worldwide in the fight against COVID-19 [10–12]. Phylogenetic analysis is widely viewed as the gold standard in genomic epidemiology [13–15]. However, with the rapid design of new virtual phenotyping technologies, identification of SARS-CoV-2 mutations can be achieved in a short time and at a low cost. Of these, the Coronavirus Antiviral and Resistance Database (CoV-RDB) by Stanford University, which is freely accessible [16], has been designed to promote comparisons between different candidate compounds against COVID-19, as well as rapid large-scale identification of SARS-CoV-2 mutations, since August 2020 [17]. CoV-RDB explores nucleotide sequences utilizing predetermined consensus SARS-CoV-2 sequences.
When performing analysis with CoV-RDB, the database instructions recommend entering the sequence as plain text if only one sequence is analyzed and using the FASTA format if more than one sequence is submitted. The upper limit currently given by CoV-RDB is 100 sequences of ∼30,000 nucleotides each. Although CoV-RDB is currently available for clinical diagnosis, its variant diagnostic performance has not been well assessed. The objectives of this study were to characterize SARS-CoV-2 genomes by NGS in Turkish patients with COVID-19 and to identify nucleotide variations by phylogenetic analysis and CoV-RDB virtual phenotyping.

Ethical approval for this study was received from the Near East University Scientific Research Ethics Committee (decision number: 1383 NEU/2021/93). In total, 143 SARS-CoV-2 strains isolated from SARS-CoV-2-infected cases in Kocaeli, Istanbul and Ankara in Turkey at the beginning of April 2021 were included in the study. These strains were included because they had been screened with PCR variant screening kits and identified as probable SARS-CoV-2 variants.

SARS-CoV-2 real-time polymerase chain reaction

A fully automatic rotary nucleic acid magnetic particle extraction system, the Auto Extractor GeneRotex96 (Tianlong Science and Technology Co., Xi'an City, China), was used for SARS-CoV-2 RNA isolation from the nasal/oropharyngeal swab samples. For SARS-CoV-2 diagnosis, a routine RT-qPCR kit targeting two genes (BioSpeedy, Bioeksen Inc., Istanbul, Turkey), which is officially preferred by the Ministry of Health under pandemic conditions, was used. Two variant-specific screening PCR kits (BioSpeedy SARS-CoV-2 N501Y/variant plus kit, Bioeksen Inc., İstanbul, Turkey and Diagnovital SARS-CoV-2 N501Y, delHV 69-70, E484K mutation detection kit, RTA Laboratories Inc., Istanbul, Turkey) were used in this study.
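The CoV-RDB submission limits noted earlier in this section (FASTA input for more than one sequence, at most 100 sequences per submission) imply that a set such as the 143 sequences in this study has to be split before upload. The Python sketch below illustrates one way to do this. It is an illustration only: the function names and the placeholder records are assumptions and are not part of CoV-RDB or of the authors' workflow.

```python
from typing import Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)

MAX_SEQS_PER_BATCH = 100  # upper limit per submission as stated in the text


def parse_fasta(text: str) -> List[Record]:
    """Parse FASTA-formatted text into (header, sequence) pairs."""
    records: List[Record] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def batch_records(records: List[Record],
                  batch_size: int = MAX_SEQS_PER_BATCH) -> Iterator[List[Record]]:
    """Yield consecutive batches of at most batch_size records."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


def to_fasta(records: List[Record]) -> str:
    """Serialize records back to FASTA text, one sequence per line."""
    return "".join(f">{header}\n{seq}\n" for header, seq in records)


# Placeholder records standing in for 143 spike sequences (hypothetical data).
records = [(f"sample_{i}", "ACGT") for i in range(1, 144)]
batches = list(batch_records(records))
print([len(batch) for batch in batches])  # [100, 43]
```

Each batch can then be written out with `to_fasta` and submitted separately; a single sequence would instead be pasted as plain text, as the database instructions recommend.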
Strains with concordant positive results on the variant PCR screening kits were chosen for NGS.

SARS-CoV-2 spike next-generation sequencing polymerase chain reaction

SARS-CoV-2 real-time PCR products were purified using a NucleoFast 96 PCR kit (Macherey-Nagel GmbH, Dueren, Germany) and quantitated by spectrophotometry (Nanodrop N1000, Thermo Fisher Inc., MA, USA). The nucleic acid concentration was 0.2 ng/µl in the sample. Standardized samples were processed with Nextera XT (Illumina Inc., CA, USA) for NGS. With reference to the SARS-CoV-2 Wuhan-Hu-1 isolate (GenBank accession number MN908947.3), the spike glycoprotein receptor binding domain between nucleotide positions 21709 and 23193 was targeted. The region between primers 118F and 1652R (∼1500 bp) was sequenced. The sequencing primer pair was R: 5′-acacctgtgcctgttaaacca-3′ and F: 5′-gacaaagttttcagatcctcagttttaca-3′ [18]. NGS was carried out on the MiSeq (Illumina Inc., CA, USA) platform. The spike NGS PCR amplification protocol was run under the following conditions: 45 °C for 10 min and 95 °C for 2 min, followed by 40 cycles of 95 °C for 10 s, 57 °C for 30 s and 72 °C for 30 s. Alignment of the resulting sequences was performed with MiSeq Reporter, based on BWA software [19]. The sequenced data were aligned to the reference genome with BWA, then processed with the BaseRecalibrator and ApplyBQSR programs recommended by the Genome Analysis Toolkit (GATK; Broad Institute, Inc., MA, USA; open source under a BSD 3-clause "New or Revised" license) and recalibrated according to base-read quality. Variant calling was performed with the HaplotypeCaller program, and variants with a mapping quality below 50, a read depth below 15 and a variant quality (QUAL) below 500 were eliminated from the analysis with the VariantFiltration program. The sequences of the samples for this region were created by introducing the detected mutations into the reference genome.
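The filtering thresholds and the consensus construction described above can be sketched in Python. This is a minimal illustration rather than the authors' GATK pipeline: the `Variant` class, the toy reference and the variant records are hypothetical, only single-nucleotide substitutions are handled, and a variant is assumed to be dropped when it fails any one of the three thresholds (the text does not state how the criteria are combined).

```python
from dataclasses import dataclass
from typing import List

# Thresholds taken from the text.
MIN_MAPQ = 50    # mapping quality
MIN_DEPTH = 15   # read depth
MIN_QUAL = 500   # variant quality (QUAL)


@dataclass
class Variant:
    """Hypothetical minimal record for one called variant."""
    pos: int      # 1-based position in the reference
    ref: str      # reference base
    alt: str      # alternate base
    mapq: float
    depth: int
    qual: float


def passes_filters(v: Variant) -> bool:
    """Assumption: a variant is kept only if it meets all three thresholds."""
    return v.mapq >= MIN_MAPQ and v.depth >= MIN_DEPTH and v.qual >= MIN_QUAL


def apply_snvs(reference: str, variants: List[Variant]) -> str:
    """Build a sample sequence by substituting SNVs into the reference."""
    seq = list(reference)
    for v in variants:
        if len(v.ref) != 1 or len(v.alt) != 1:
            raise ValueError("this sketch handles single-nucleotide variants only")
        if seq[v.pos - 1] != v.ref:
            raise ValueError(f"reference mismatch at position {v.pos}")
        seq[v.pos - 1] = v.alt
    return "".join(seq)


# Toy reference and calls (hypothetical, not study data).
reference = "ACGTACGTAC"
calls = [
    Variant(pos=2, ref="C", alt="T", mapq=60, depth=120, qual=900),  # kept
    Variant(pos=5, ref="A", alt="G", mapq=40, depth=200, qual=950),  # low mapping quality
    Variant(pos=8, ref="T", alt="C", mapq=60, depth=10, qual=800),   # low depth
]
kept = [v for v in calls if passes_filters(v)]
consensus = apply_snvs(reference, kept)
print(consensus)  # ATGTACGTAC
```

Insertions and deletions, such as the HV 69-70 deletion, would need coordinate-aware handling that this sketch deliberately omits.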
The neighbor-joining method with the Kimura 80 distance was applied together with sequences of all SARS-CoV-2 variants from the GenBank database, using CLC Sequence Viewer 8.0 software (Qiagen, CLC bio A/S, Aarhus, Denmark). Bootstrap support values were obtained from 1000 replicates in phylogenetic tree construction. Because of the large number of samples, the phylogenetic tree was constructed as circular and rooted. The consensus reference sequence of SARS-CoV-2 used in this study, Wuhan-Hu-1 (MN908947.3), is available from the GenBank database [20]. CoV-RDB/SARS-CoV-2 Mutations Analysis by Stanford University [21] was used to compare the nucleotide sequences of the SARS-CoV-2 strains with the consensus SARS-CoV-2 reference sequence and identify SARS-CoV-2 mutations of the spike gene. The obtained SARS-CoV-2 variants/lineages were designated according to the WHO categorization and the Centers for Disease Control and Prevention (CDC) SARS-CoV-2 Variant Classification and Definitions [22].

One hundred and forty-three spike gene sequences were included in the study. The sequenced data were analyzed for variations using phylogenetic analysis and virtual phenotyping. Phylogenetic analysis can reveal detailed genomic characterization and evolutionary development of organisms. Because phylogenetic analysis is regarded as the most accurate gene tree rooting method, the SARS-CoV-2 variations obtained using the newly designed CoV-RDB were compared with those obtained by phylogenetic analysis. Based on the variant classification, 109 (76%) and 122 (85%) SARS-CoV-2 variations were reported by phylogenetic analysis and CoV-RDB, respectively. Of the variations detected by CoV-RDB, 15 (10%) were missense mutations. While the variations were obtained as lineages by phylogenetic analysis, CoV-RDB provided the mutation patterns and protein substitutions in addition to the lineages.
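For reference, the Kimura 80 (two-parameter) distance used for the neighbor-joining tree described earlier in this section is computed for a pair of aligned sequences as d = −½ ln(1 − 2P − Q) − ¼ ln(1 − 2Q), where P and Q are the proportions of sites that differ by a transition and by a transversion, respectively. The Python sketch below is a generic illustration with made-up toy sequences; it is not the CLC Sequence Viewer implementation, and skipping gaps and ambiguous bases is an assumption made here for simplicity.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k80_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        # Assumption: sites with gaps or ambiguous bases are ignored.
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine<->pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent for the K80 correction")
    return -0.5 * math.log(term1) - 0.25 * math.log(term2)


# Toy example: one transition (C->T) in ten aligned sites, so P = 0.1, Q = 0.
d = k80_distance("ACGTACGTAC", "ACGTACGTAT")
print(round(d, 4))  # 0.1116
```

A neighbor-joining tree would then be built from the matrix of such pairwise distances, with bootstrap replicates obtained by resampling alignment columns.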
Figure 1 illustrates different lineages obtained by the neighbor-joining method and Table 1 ( 144) were considered missense mutations, as they involve different amino acid changes for which the impact has not been well identified. The distribution of SARS-CoV-2 variations as lineages and amino acid mutations identified by phylogenetic analysis and by CoV-RDB is given in Table 1. When the variations obtained by CoV-RDB were compared with those obtained by phylogenetic analysis, a similarity of 121 (85%) was observed between the two variant detection methods. The highest similarity between the two methods was observed in the identification of B.1.351 (100%), followed by B.1.1.7 (92%). Similarity rates of SARS-CoV-2 variations by phylogenetic analysis and CoV-RDB are given in Table 2. Overall, B.1.1.7 (alpha) was the most frequent SARS-CoV-2 variation in Turkey in April 2021.

Continuous genomic characterization of SARS-CoV-2 followed by variant analysis with powerful online tools is crucial, as it provides important information on changes in COVID-19 epidemiology, clinical disease outcomes and the efficiency of diagnostics, vaccines and therapeutics that arise from viral genome diversity [23]. In the current study, we sequenced the spike gene of SARS-CoV-2 strains from COVID-19 cases in Turkey in April 2021, as the S gene is key for SARS-CoV-2 surveillance to identify nucleotide variations [15, 24–26]. In the SARS-CoV-2 spike sequences, we detected nucleotide variations in 76% and 85% of the strains by phylogenetic analysis and CoV-RDB analysis, respectively. The genomic findings revealed that although two major VOCs, including B. [29] and by , 148 countries confirmed the presence of the delta variant [30] . Tracking changes in the SARS-CoV-2 spike reveals that SARS-CoV-2 variations should be monitored continuously by genome sequence analysis in Turkey and in other countries.
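The similarity figures reported in the results above are proportions of sequences on which the two methods return the same call. The Python sketch below shows one way such overall and per-lineage agreement could be computed. The paired calls are hypothetical toy data (not the study data in Table 2), per-lineage agreement is defined here relative to the phylogenetic call as an assumption, and the final line assumes that the reported 121 concordant results refer to all 143 sequences.

```python
from collections import Counter
from typing import Dict, List


def overall_agreement(calls_a: List[str], calls_b: List[str]) -> float:
    """Fraction of paired samples on which both methods give the same call."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("call lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(calls_a, calls_b))
    return matches / len(calls_a)


def per_lineage_agreement(reference_calls: List[str],
                          test_calls: List[str]) -> Dict[str, float]:
    """For each lineage called by the reference method, the fraction of
    those samples that received the same call from the test method."""
    totals = Counter(reference_calls)
    matches = Counter(r for r, t in zip(reference_calls, test_calls) if r == t)
    return {lineage: matches[lineage] / n for lineage, n in totals.items()}


# Hypothetical paired calls for six samples.
phylo = ["B.1.1.7", "B.1.1.7", "B.1.1.7", "B.1.1.7", "B.1.351", "wildtype"]
covrdb = ["B.1.1.7", "B.1.1.7", "B.1.1.7", "B.1.617", "B.1.351", "wildtype"]

print(per_lineage_agreement(phylo, covrdb))
# {'B.1.1.7': 0.75, 'B.1.351': 1.0, 'wildtype': 1.0}

# Reported figure, assuming 121 concordant calls out of 143 sequences.
print(round(121 / 143 * 100))  # 85
```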
During the pandemic, it is important to identify variants as quickly as possible. In this study, we evaluated the sequenced data for SARS-CoV-2 variations by two different variant detection methods to better understand the diagnostic power of tools commonly used in variant analysis. As there are no data in the literature that reflect this comparison, we evaluated the detection performance of a virtual phenotyping method against the gold standard method, phylogenetic analysis. The findings showed that the two sequence analysis methods agreed in 85% of cases. Interestingly, we observed the highest similarity between the two methods in the identification of B.1.351 (100%), followed by B.1.1.7 (92%). The similarity of the results suggests that CoV-RDB, which provides more rapid sequence exploration, may also be an appropriate alternative approach for determining SARS-CoV-2 mutations. Although spike sequencing and analysis are used as the gold standard for accurate genomic surveillance, SARS-CoV-2 PCR variant screening kits were used before NGS to distinguish particular SARS-CoV-2 variants circulating in Turkey among all SARS-CoV-2 PCR-positive cases. The current findings showed that 24% and 15% of the strains were identified as wildtype by phylogenetic analysis and CoV-RDB, respectively, although these strains had been classified as SARS-CoV-2 variants by multiplex PCR kits. Durner et al. demonstrated the feasibility of Y501 variant-specific PCR for fast and reliable detection of UK SARS-CoV-2 variants in routine diagnosis, and their suspected variant was confirmed by the reference laboratory [31]. Similarly, Zhao et al. reported both the specificity and the sensitivity of SARS-CoV-2 variant detection based on multiplex PCR-matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI-TOF-MS) as 100% [32]. In another study, the positive and negative predictive values of an RT-qPCR assay for screening the spike N501Y mutation were 100% [33].
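The sensitivity, specificity and predictive values cited above are all derived from a 2×2 comparison of a screening assay against a reference method. A minimal Python sketch of these standard definitions follows; the function name is an assumption and the counts are invented for illustration, so they do not come from this study or from the cited studies.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard 2x2 metrics for a screening test versus a reference method.

    tp/fp: screen-positive samples confirmed / not confirmed by the reference
    fn/tn: screen-negative samples that are variant / not variant by the reference
    """
    return {
        "sensitivity": tp / (tp + fn),  # variants correctly flagged
        "specificity": tn / (tn + fp),  # non-variants correctly cleared
        "ppv": tp / (tp + fp),          # flagged samples that are true variants
        "npv": tn / (tn + fn),          # cleared samples that are truly non-variant
    }


# Invented counts for illustration only.
metrics = diagnostic_metrics(tp=45, fp=5, fn=0, tn=50)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because only screen-positive strains were sequenced in this study, a full 2×2 table of this kind cannot be built from the reported data; the sketch only clarifies the terms used in the cited work.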
According to the current findings, variant screening PCR kits could be a good choice for selecting variant strains for NGS analysis, which saves time and cost, especially for developing countries.

A limitation of this study is that our genomic analysis identifies variations of cases infected with COVID-19 only in the provinces of Istanbul, Kocaeli and Ankara in April 2021. To reveal the genomic variations of SARS-CoV-2 across the whole of Turkey, more cases from many different cities should be included and these cases should be investigated periodically to provide updated surveillance.

In the COVID-19 pandemic, variant emergence is possible and may be rapid. Therefore, SARS-CoV-2 strains should be constantly monitored. Phylogenetic analysis and Stanford CoV-RDB analysis methods seem useful for this surveillance.

• Genomic characterization of SARS-CoV-2 provides important information on phenotypic characteristics, including disease transmission, disease severity, diagnostic escape and immune escape due to emerging new coronavirus variants.
• Next-generation sequencing is widely used for genomic characterization of SARS-CoV-2, followed by variant analysis with phylogenetic analysis.
• With the rapid design of new virtual phenotyping technologies, identification of SARS-CoV-2 mutations can also be achieved in a short time and at low cost.
• B.1.1.7 (alpha) was the most frequent SARS-CoV-2 variation in Turkey in April 2021.
• The Coronavirus Antiviral and Resistance Database (CoV-RDB) by Stanford University, which is freely accessible at https://covdb.stanford.edu/, has been designed to promote comparisons between different candidate compounds against COVID-19, as well as rapid large-scale identification of SARS-CoV-2 mutations, since August 2020.
• The current findings showed that the two sequence analysis methods agreed in 85% of cases.
• Phylogenetic analysis and Stanford CoV-RDB analysis methods seem useful for tracking SARS-CoV-2 strains.

The authors have no relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript. This includes employment, consultancies, honoraria, stock ownership or options, expert testimony, grants or patents received or pending, or royalties. No writing assistance was utilized in the production of this manuscript.

Future Science Group 10.2217/cer-2021-0208

Coronavirus disease (COVID-19): virus evolution
The World Health Organization (WHO), Tracking SARS-CoV-2 Variants
SARS-CoV-2 variants, spike mutations and immune escape
Emergence of genomic diversity and recurrent mutations in SARS-CoV-2
The World Health Organisation. COVID-19 Weekly epidemiological update On COVID
Spike mutation D614G alters SARS-CoV-2 fitness
Emerging variants of SARS-CoV-2 and novel therapeutics against coronavirus (COVID-19)
Summary of the available molecular methods for detection of SARS-CoV-2 during the ongoing pandemic
Methods for the detection and identification of SARS-CoV-2 variants March 3 th
Bioinformatics resources for SARS-CoV-2 discovery and surveillance
Next-generation sequencing reveals the progression of COVID-19
Computational strategies to combat COVID-19: useful tools to accelerate SARS-CoV-2 and coronavirus research
Whole-genome sequencing of SARS-CoV-2 reveals the detection of G614 variant in Pakistan
Phylogenetic analysis of SARS-CoV-2 in Boston highlighting the impact of super-spreading events
Detection of a SARS-CoV-2 variant of concern in South Africa
Coronavirus Antiviral & Resistance Database
Coronavirus Antiviral Research Database (CoV-RDB): an online database designed to facilitate comparisons between candidate anti-coronavirus compounds
40 minutes RT-qPCR assay for screening spike N501Y and HV69-70del mutations
Bethesda (MD): National Library of Medicine (US), National Center for Biotechnology Information
The Centers for Disease Control and Prevention (CDC) SARS-CoV-2 variant classifications and definitions
Analysis of SARS-CoV-2 genomic epidemiology reveals disease transmission coupled to variant emergence and allelic variation
The impact of mutations in SARS-CoV-2 spike on viral infectivity and antigenicity
Estimated transmissibility and impact of SARS-CoV-2 lineage B.1.1.7 in England
Tracking changes in SARS-CoV-2 spike: evidence that D614G increases infectivity of the COVID-19 virus
The World Health Organization (WHO) Weekly epidemiological update on COVID
The World Health Organization (WHO) Weekly epidemiological update on COVID-19 13
The World Health Organization (WHO) Weekly epidemiological update on COVID
The World Health Organization (WHO) Weekly epidemiological update on COVID
Fast and cost-effective screening for SARS-CoV-2 variants in a routine diagnostic setting
A novel strategy for the detection of SARS-CoV-2 variants based on multiplex PCR-MALDI-TOF MS
Real-time RT-PCR allelic discrimination assay for detection of N501Y mutation in the 2 spike protein of SARS-CoV-2 associated with variants of concern

The authors state that they have obtained appropriate institutional review board approval or have followed the principles outlined in the Declaration of Helsinki for all human or animal experimental investigations. In addition, for investigations involving human subjects, informed consent has been obtained from the participants involved.