key: cord-0308499-ur7q72dw authors: Lai, E.; Becker, D.; Brzoska, P.; Cassens, T.; Davis-Turak, J.; Diamond, E.; Furtado, M.; Gandhi, M.; Gort, D.; Greninger, A. L.; Hajian, P.; Hayashibara, K.; Kennedy, E. B.; Laurent, M.; Lee, W.; Leonetti, N. A.; Lozach, J.; Lu, J.; Nguyen, J. M.; O'Donovan, K. M. C.; Peck, T.; Radcliffe, G. E.; Ramirez, J. M.; Roychoudhury, P.; Sandoval, E.; Walsh, B.; Weinell, M.; Wesselman, C.; Wesselman, T.; White, S.; Williams, S.; Wong, D.; Yu, Y.; Creager, R. S. title: A method for variant agnostic detection of SARS-CoV-2, rapid monitoring of circulating variants, detection of mutations of biological significance, and early detection of emergent variants such as Omicron date: 2022-01-09 journal: nan DOI: 10.1101/2022.01.08.22268865 sha: 4c3137e9b05041f45e7d30dc64182fb733b4eb8c doc_id: 308499 cord_uid: ur7q72dw The rapid emergence of new SARS-CoV-2 variants raises a number of public health questions including the capability of diagnostic tests to detect new strains, the efficacy of vaccines, and how to map the geographical distribution of variants to better understand patterns of transmission and possible load on healthcare resources. Next-Generation Sequencing (NGS) is the primary method for detecting and tracing the emergence of new variants, but it is expensive, and it can take weeks before sequence data is available in public repositories. Here, we describe a Polymerase Chain Reaction (PCR)-based genotyping approach that is significantly less expensive, accelerates reporting on SARS-CoV-2 variants, and can be implemented in any testing lab performing PCR. Specific Single Nucleotide Polymorphisms (SNPs) and indels are identified that have high positive percent agreement (PPA) and negative percent agreement (NPA) compared to NGS for the major genotypes that circulated in 2021. Using a 48-marker panel, testing on 1,128 retrospective samples yielded a PPA and NPA in the 96.3 to 100% and 99.2 to 100% range, respectively, for the top 10 most prevalent lineages. The effect on PPA and NPA of reducing the number of panel markers was also explored. In addition, with the emergence of Omicron, we also developed an Omicron genotyping panel that distinguishes the Delta and Omicron variants using four (4) highly specific SNPs. Data from testing demonstrates the capability to use the panel to rapidly track the growing prevalence of the Omicron variant in the United States in December 2021. Since the beginning of the COVID-19 pandemic, variants have emerged that have the potential to evade vaccines, cause diagnostic test performance issues, or cause more severe disease. [1] [2] [3] [4] [5] Monitoring and surveillance of the genetic lineages of SARS-CoV-2 positive samples are critical to the timely identification of emerging variants. [6] [7] [8] SARS-CoV-2 genetic lineages in the United States are primarily monitored by Next-Generation Sequencing (NGS) on a random selection of approximately five percent (5%) of SARS-CoV-2 positive samples. 9,10 While extremely accurate for the detection of existing circulating variants, NGS does not allow for the timely identification of emerging variants. A more focused approach, such as genotyping using Single Nucleotide Polymorphism (SNP) assays, offers significant advantages in terms of cost, throughput, and efficient results reporting (1 to 2 days turnaround time as compared to 10 to 14 days for NGS). [11] [12] [13] [14] Additionally, SARS-CoV-2 positive samples with undetermined genotyping variant classification could provide a prescreened sample set for NGS, and potentially allow for early identification of emerging variants. The National Institutes of Health's (NIH's) Rapid Acceleration of Diagnostics (RADx SM ) initiative created a Variant Task Force (VTF) in January 2021 to assess the impact of emerging SARS-CoV-2 variants on in vitro diagnostic testing. 15 In July 2021, the NIH RADx VTF also initiated an effort to develop a SARS-CoV-2 Polymerase Chain Reaction (PCR) assay for variant agnostic detection of SARS-CoV-2, as well as early detection and monitoring of SARS-CoV-2 variants. The aims of this study were to: 1) identify SARS-CoV-2 markers useful for the detection of SARS-CoV-2 positive samples across all variants; 2) develop a panel of SNP markers that can be used to accurately assign lineages to SARS-CoV-2 positive samples; and 3) implement a genotyping approach for the early detection of new and re-emerging variants that signals when markers need to be updated. . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted January 9, 2022. ; https://doi.org /10.1101 /10. /2022 This paper outlines the results of the performance validation of the initial genotyping assay and associated marker sets. Additionally, this paper describes how this approach was rapidly adapted to develop a targeted panel of four (4) mutations-three (3) for Omicron and one (1) for Delta-for the purpose of identifying Omicron, and how this Omicron genotyping panel was implemented in several diagnostic labs. Future results from this testing, as well as the associated statistics and trends, will be made available on a publicly accessible dashboard. 16 Data analysis for identifying SARS-CoV-2 markers was performed using the Variant Analysis for Diagnostic Monitoring (DxM) system (ROSALIND). Genome sequences and metadata used for the selection of markers in this study were obtained through a Direct Connectivity Agreement for complete daily worldwide downloads from the GISAID EpiCov database. [17] [18] [19] Sequences not tagged with the "is_complete' and sequences with "n_content" of more than 0.05 were excluded. Pairwise whole-genome alignments of all sequences were performed using LASTZ v1.04.02 with NCBI Reference Sequence: NC_045512.2 as the SARS-CoV-2 reference genome. 20, 21 The Bioconductor package for genetic variants, VariantAnnotation v1.20.2, was then used for the translation into amino acids in R v3.3.2, and the identification of amino acid substitutions or frameshifts were used to call a unique mutation incident. 22, 23 Selection of the lineages considered for the marker panel was performed by combining the top 100 most frequent lineages reported worldwide for the 120-day period between May 12, 2021 and September 11, 2021 (data not shown) . 1,200,791 sequences representing 393 lineages were analyzed. The top 10 most unique mutations for each World Health Organization (WHO) label were then identified, and multiple combinations of these unique mutations were evaluated to classify a viral sequence into a WHO label with at least 90% overall accuracy. Additional mutations were added to ensure coverage . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 9, 2022. ; https://doi.org /10.1101 /10. /2022 Performance. 24 A classifier algorithm was developed to measure the presence, absence, and combination of mutations to accurately assign the WHO label classification. A dedicated system was established to host the classifier algorithm and provide a web application with Application Programmer Interface (API) capabilities for standardized data submission and processing. 25 This system was established on a secure virtual private cloud instance on the Google Cloud Platform (GCP) with the ability to process thousands of specimens per minute. A panel of SNP assays was developed using methods previously described. 27 Primers were selected based on mapping to genome regions with a mutation frequency of less . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 9, 2022. ; https://doi.org/10.1101/2022.01.08.22268865 doi: medRxiv preprint than one percent (1%), ensuring no major polymorphisms interfere with the primers. Primer sets were designed such that amplicon sizes were below 150 base pairs (bp). Minor groove binder (MGB) probes were designed to achieve optimal discrimination between the two (2) alleles by taking the position, nucleotide composition, melting temperature (Tm), and the type of allele into consideration. The Tm of the primers ranged from 59-62°C and the Tm of the probes ranged from 59-65°C. Viral RNA was extracted using the MagMAX Viral/Pathogen II Nucleic Acid Isolation Kit (Thermo Fisher Scientific). Real-time reverse transcription PCR using the selected panel was performed using the TaqPath™ 1-Step RT-qPCR Master Mix, CG (Thermo Fisher Scientific) on a QuantStudio™ 7 Real-Time PCR System or ProFlex™ 2 x 384-well PCR System (Thermo Fisher Scientific) followed by endpoint data collection using the QuantStudio™ 7 Real-Time PCR System. Data were analyzed using the TaqMan™ Genotyper v1.6 software (Thermo Fisher Scientific). Normalized reported emission of (Rn) VIC (x-axis) versus Rn FAM (y-axis) from amplification of the reference and mutant alleles was used by the software algorithm to obtain genotype calls. The specific assays for each of the markers are available commercially. 28 The SARS-CoV-2 sequencing and consensus sequence generation methods used in this study are previously described. 29, 30 Results The complete genotyping assay panel consisted of 45 lineage specific markers and three (3) positivity markers (Table 1) . . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 9, 2022. ; https://doi.org/10.1101/2022.01.08.22268865 doi: medRxiv preprint Table 1 . 48-, 24-, 16-, 12-, and eight (8)-marker set configurations . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 9, 2022. ; https://doi.org/10.1101/2022.01.08.22268865 doi: medRxiv preprint The three (3) variant agnostic markers selected to detect SARS-CoV-2 positivity were as follows: 1) the S Gene: D614G (S:A23403G) mutation-a nonsynonymous mutation resulting in the replacement of aspartic acid with glycine at position 614 of the viral spike protein; 2) a conserved sequence in nsp10 (nucleotides 13025-13441); and 3) a conserved sequence identified by the CDC in the N Gene SC2 region (nucleotides 29461-29482). 31 A total of 1,128 retrospective samples (1,031 SARS-CoV-2 positive and 97 SARS-CoV-2 negative) were evaluated using the variant agnostic positivity markers ( Table 2 ). The combined markers were detected in all but seven (7) of the 1,031 SARS-CoV-2 positive samples. The PPA using any combination of two (2) or more markers is greater than or equal to 98.9% with the criteria being that one (1) marker detected is enough to make a positive call. Additionally, the PPA using one (1) marker is greater than or equal to 96%. There were no false positive results (data not shown). The performance of the genotyping assay panel and the associated classifier was determined by in silico and in vitro studies with retrospectively collected SARS-CoV-2 specimens. . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 9, 2022. ; https://doi.org /10.1101 /10. /2022 A bioinformatics simulation was performed using GISAID SARS-CoV-2 sequence data from the first week of each month beginning November 2020 through October 2021. 323,148 GISAID sequences were analyzed. With the 48-marker set, simulated PPA ranged from 80.7% to 99.9% and simulated NPA ranged from 98.1% to 100% for the top 10 WHO lineages. The performance for the Kappa variant was impacted by reporting from Asia and Oceania where many Kappa positive samples were misclassified as Delta. We next determined the clinical performance of the genotyping assay and classifier. The 1,031 SARS-CoV-2 positive samples were genotyped and classified with the 48 markers shown in Table 1 . The classifications were then compared to the Phylogenetic Assignment of Named Global Outbreak Lineages (Pango) lineage assignment based on the whole-genome sequences in the GISAID database (Table 4) . 32 The PPA ranged from 96.3% to 100% and the NPA ranged from 99.2% to 100% for the top 10 WHO lineages. The classifier categorized an additional 78 samples as undetermined (data not shown). Pango assigned 77 of these samples to 14 lineages for which the genotyping assay does not include specific markers (Zeta, B.1, B.1.1.507, B.1.2, B.1.221, B.1.241, B.1.517, B.1.596, B.1.609, B.1.625, B.1.628, B.1.634, B.1.637, and C.36.3) , and did not classify one (1) of these samples. . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 9, 2022. ; https://doi.org/10.1101/2022.01.08.22268865 doi: medRxiv preprint Table 4 To optimize assay performance in terms of sample input, reductions of the 48-marker panel were explored. We assessed the performance of 24-, 16-, 12-, and eight (8)marker sets that were defined based on mutation combination performance and targeted lineage prevalence during the 120-day period between May 12, 2021 and September 11, 2021 ( Table 5 ). Each of the panels also included two (2) of the variant agnostic positivity markers (nsp10 gene and S:D614G), which were used as assay internal controls. The 48-, 24-, and 16-marker sets identified the top 10 most prevalent WHO lineages as of September 11, 2021 (Alpha, Beta, Gamma, Delta, Epsilon, Eta, Iota, Kappa, Lambda, and Mu), while the 12-and eight (8)-marker sets identified eight (8) and six (6) of the top 10 WHO lineages, respectively. . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 9, 2022. ; https://doi.org/10.1101/2022.01.08.22268865 doi: medRxiv preprint Table 5 The identification of samples that could not be determined by the classifier was further investigated. The number of undetermined samples for each marker set are shown in is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 9, 2022. ; https://doi.org /10.1101 /10. /2022 One of the aims of this study was to develop a genotyping approach for the early detection of emerging variants. An increase in the number of undetermined calls by the classifier provides a signal for focused sequencing of those samples, potentially allowing early detection of new variants. To test this hypothesis, a bioinformatics simulation was performed using a modification of the 12-marker panel. The two (2) Delta-specific markers were removed to simulate what would have been observed is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 9, 2022. ; https://doi.org /10.1101 /10. /2022 In response to the emergence of the Omicron variant in November 2021 in South Africa, we rapidly developed markers specific for this newly emerging variant. Sequence analysis of the first 132 Omicron sequences revealed three (3) markers-ORF1ab:A2710T, ORF1ab:T13195C, and S:T547K-found in high percentages of these sequences. Based on in silico modeling, there was greater than 99% concurrence between the Pango assignment based on the GISAID sequence and the combined . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 9, 2022. ; https://doi.org/10.1101/2022.01.08.22268865 doi: medRxiv preprint three (3) markers (data not shown). Subsequently, we developed a genotyping assay consisting of the three (3) Omicron-specific markers and one (1) Delta-specific marker (S:T19R). A total of 1,631 SARS-CoV-2 positive samples were collected and genotyped (Table 6 ). Sequencing confirmed that these samples consisted of 615 Omicron, 992 Delta, and three (3) Omicron and 11 Delta. We next deployed the four (4)-marker panel in two CLIA-certified labs and genotyped 5,372 SARS-CoV-2 positive samples collected mainly in the States of Washington, California, and New York in December 2021. Using the four (4)-marker panel, we determined that the relative prevalence of the Omicron variant grew from approximately 15% on December 9, 2021 to approximately 80% on December 21, 2021, while Delta decreased from 80% to 20% over the same period (Figure 4 ). . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 9, 2022. ; https://doi.org /10.1101 /10. /2022 that were sequenced and should be considered for the development of new assays. We also demonstrated that there are marker combinations that are highly specific for certain variants. Routine use of these genotyping markers could provide early warning that a . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 9, 2022. ; https://doi.org /10.1101 /10. /2022 new or re-emergent variant is circulating. Importantly, genotyping with this assay is a quick and efficient method enabling results reporting in 1 to 2 days compared to 10 to 14 days with NGS. In addition, genotyping assays are less expensive than sequencing. As such, genotyping can be used to monitor a higher percentage of SARS-CoV-2 positive samples than the five percent (5%) random sampling by sequencing currently in practice in the United States. Samples that cannot be assigned to a known variant would be prime candidates for sequencing. Finally, we have demonstrated that the Omicron variant can be identified with high precision with two (2) to three (3) markers. Incorporating the Omicron specific markers with the markers defined to detect previous variants can provide a framework for the detection of the next new variant. The genotyping markers for Omicron effectively highlighted the transition from Delta to Omicron as the dominant variant. As we illustrated in this report, a static marker set with an omission of the Delta-specific markers experienced a significant decline in accuracy within four (4) months of the emergence of Delta in the United States. To prevent a loss of marker accuracy, we demonstrated an approach to detect new and emerging variants using a classifier algorithm for the recurring analysis of active marker sets across regional and global GISAID sequence data. As emerging variants develop, anomalies in classifier calls and the resulting discordance with sequencing classification will continue to highlight the need for marker modifications. Discrepancies such as these occurred during the retrospective review at the emergence of Delta in the United States. This approach can be applied on a regional or global scale. In this manner, a classifier algorithm would assess the accuracy of current marker sets based on daily analysis of new viral sequences added to GISAID, creating an adaptive and closed-loop process for low-cost, rapid variant identification with emerging variant detection. Indeed, we recently created a free, live dashboard of a real-time genotyping platform. 16 This illustrates the symbiotic nature of using genotyping markers in conjunction with targeted sequencing. The ultimate goal of these efforts is to provide real-time measurement of variant prevalence with the benefits of increased global awareness and rapid identification of emergent variants. . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted January 9, 2022. ; https://doi.org/10.1101/2022.01.08.22268865 doi: medRxiv preprint GISAID EpiCoV Database. Freunde von GISAID e.V. Germany 2021 [cited 2021 Dec 28]. Available from: https://www.gisaid.org . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted January 9, 2022. ; https://doi.org /10.1101/2022. . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted January 9, 2022. ; https://doi.org /10.1101 /10. /2022 COVID-19 Virus Emergence in late 2020 of multiple lineages of SARS-CoV-2 Spike protein variants affecting amino acid position 677. medRxiv Evolving Threat Sixteen novel lineages of SARS-CoV-2 in South Africa Transmission of SARS-CoV-2 on mink farms between humans and mink and back to humans A bloody mess': confusion reigns over naming of new COVID variants Genomic epidemiology identifies emergence and rapid transmission of SARS-CoV-2 B.1.1.7 in the United States. medRxiv CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity Detecting SARS-CoV-2 variants with SNP genotyping Track Omicron's spread with molecular data PHE Outbreak Surveillance Team, PHE Epidemiology Cell, PHE Contact Tracing Data Team, PHE Health Protection Data Science Team, PHE Joint Modelling Team, et al. SARS-CoV-2 variants of concern and variants under investigation in England Looming unknowns with SARS-CoV-2 variants RADx Variant Task Force Program for Assessing the Impact of Variants on SARS Molecular and Antigen Tests ROSALIND Tracker for COVID-19 disease and diplomacy: GISAID's innovative contribution to global health GISAID's Role in Pandemic Response Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome VariantAnnotation: a Bioconductor package for exploration and annotation of genetic variants Response/Pages/Sequence-Based-Surveillance-Submission CoV-2 Mutation Panel User Guide CoV-2 Mutation Panel. Thermo Fisher Scientific Rapid displacement of SARS-CoV-2 variant B.1.1.7 by B.1.617.2 and P.1 in the United States. medRxiv [Preprint] 2021 Sensitive Recovery of Complete SARS-CoV-2 Genomes from Clinical Samples by Use of Swift Biosciences' SARS-CoV-2 Multiplex Amplicon Sequencing Panel Virology Surveillance and Diagnosis Branch, Influenza Division, National Center for Immunization and Respiratory Diseases The authors gratefully acknowledge the originating laboratories responsible for obtaining the specimens, the submitting laboratories where genetic sequence data were generated and shared via the GISAID Initiative, and the GISAID EpiCov Data Curation Finally, the authors thank Exponent Collaborative for graphic design contributions. Tracking Changes in SARS-CoV-2 Spike: Evidence that D614G Increases Infectivity of . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 9, 2022. ; https://doi.org /10.1101 /10. /2022 Available from: https://www.cdc.gov/coronavirus/2019-ncov/lab/multiplex-primerprobes.html . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 9, 2022. ; https://doi.org /10.1101/2022. is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 9, 2022. ; https://doi.org /10.1101 /10. /2022