key: cord-0947108-41dk3xyk authors: Guerrero-Preston, R.; Rivera Amill, V.; Caraballo, K.; Arias Garcia, A.; Sanchez Torres, R.; Tadeu Zamuner, F.; Zanettini, C.; MacKay, M. J.; Baits, R.; Beaubier, N.; Khullar, G.; Metti, J.; Pipic, U.; Purcell-Wiltz, A.; Vale, K.; Perez, G.; De Jesus, L.; Miranda, Y.; Ortiz, D.; Garcia Negron, A.; Viera, L.; Ortiz, A.; Acevedo, J.; Romaguera, J.; Jimenez, I.; Marchionni, L.; Rodriguez-Orengo, J.; Baez, A.; Mason, C. E.; Sidransky, D. title: Precision Health Diagnostic and Surveillance Network uses S Gene Target Failure (SGTF) combined with sequencing technologies to identify emerging SARS-CoV-2 variants. date: 2021-05-07 journal: nan DOI: 10.1101/2021.05.04.21256012 sha: 96b00e397ff50fd68020e0381f154fc3fc44b061 doc_id: 947108 cord_uid: 41dk3xyk
Several genomic epidemiology tools have been developed to track the public and population health impact of SARS-CoV-2 community spread worldwide. The SARS-CoV-2 Variant of Concern (VOC) B.1.1.7, also known as 501Y.V1, shows increased transmissibility and has rapidly become the dominant VOC in the United States (US). Our objective was to develop an evidence-based genomic surveillance algorithm that combines RT-PCR and sequencing technologies to identify VOCs. Deidentified data were obtained from 508,969 patients tested for COVID-19 with the TaqPath COVID-19 RT-PCR Combo Kit (ThermoFisher) in four CLIA-certified clinical laboratories in Puerto Rico (n=86,639) and in three CLIA-certified clinical laboratories in the US (n=422,330). TaqPath data revealed a frequency of S Gene Target Failure (SGTF) >47% for the last week of March 2021 in both Puerto Rico and US laboratories. The monthly frequency of SGTF in Puerto Rico increased exponentially from 4% in November 2020 to 47% in March 2021. The weekly SGTF rate in US samples was high (>8%) from late December to early January, and then also increased exponentially through April (48%).
The exponential increase in SGTF prevalence in Puerto Rico is concurrent with a sharp increase in VOCs among all SARS-CoV-2 sequences from Puerto Rico uploaded to GISAID (n=461). B.1.1.7 frequency increased from <1% in the last week of January 2021 to 51.5% of viral sequences from Puerto Rico collected in the last week of March 2021. The exponential increase in SGTF and B.1.1.7 prevalence in Puerto Rico and the US requires an urgent response. According to the proposed evidence-based algorithm, approximately 50% of all positive samples should be managed as potential B.1.1.7 carriers with VOC quarantine and contact tracing protocols while their lineage is confirmed by WGS in surveillance laboratories. Patients infected with VOCs should be effectively triaged for isolation, contact tracing and follow-up treatment purposes.
Coronavirus disease 2019 (COVID-19), an infection caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is the first global pandemic of the 21st century. Several genomic epidemiology tools have been developed to track the public and population health impact of SARS-CoV-2 community spread worldwide [1-5]. More than 137 million cases and close to 3 million deaths have been reported since the beginning of the pandemic in January 2020 (https://coronavirus.jhu.edu). A SARS-CoV-2 Variant of Concern (VOC), known as 20I/501Y.V1, VOC 202012/01, or B.1.1.7, was detected in the United Kingdom in November 2020 and has now spread to multiple countries worldwide [6-8]. Genomic epidemiology studies reveal that B.1.1.7 possesses many non-synonymous substitutions of biological/immunological significance, in particular Spike mutations HVΔ69-70, N501Y and P681H, as well as ORF8 Q27stop and ORF7a [7, 9, 10]. B.1.1.7 shows increased transmissibility and has rapidly become the dominant VOC in the United States (US) (https://covid.cdc.gov) [11-14].
medRxiv preprint doi: https://doi.org/10.1101/2021.05.04.21256012; this version posted May 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.
The HVΔ69-70 mutation is a deletion in the SARS-CoV-2 21765-21770 genome region that removes Spike amino acids 69 and 70. HVΔ69-70 causes target failure in the TaqPath COVID-19 RT-PCR Combo Kit (ThermoFisher) assay, catalog number A47814 (TaqPath) [15]. TaqPath is designed to co-amplify sections of three SARS-CoV-2 viral genes: Nucleocapsid (N); Open Reading Frame 1ab (ORF1ab); and Spike (S) [16]. The Spike HVΔ69-70 deletion prevents the oligonucleotide probe from binding its target sequence, leading to what has been termed S gene dropout or S gene target failure (SGTF) [6]. SGTF is associated with significantly higher viral loads in samples tested by TaqPath [16]. S gene target late amplification (SGTL) has also been observed in a subset of samples in which the cycle threshold (Ct) [17] value for the S gene is >5 units higher than the maximum Ct value obtained for the other two assay targets, N and ORF1ab.
The US and other countries where B.1.1.7 rapidly became the dominant SARS-CoV-2 variant require immediate and decisive action to minimize COVID-19 morbidity and mortality [14, 18]. However, the US does not have a national genomic epidemiology surveillance network for COVID-19 whole genome sequencing (WGS) in place. Therefore, only a small fraction of all new cases is being sequenced ad hoc. SGTF has been shown to correlate highly with the Δ69-70 deletion. SGTF can therefore be used as a proxy to monitor SARS-CoV-2 lineage prevalence and geo-temporal distribution, and may be a near-direct measure of B.1.1.7 [15, 19].
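The SGTF and SGTL definitions above can be sketched as a small classification rule. This is only an illustrative sketch, not the authors' implementation or the TaqPath interpretive software: the function name, the labels, and the use of `None` to encode an undetected target are assumptions made for the example. The only threshold taken from the text is the >5 Ct gap that defines SGTL.

```python
from typing import Optional

# From the text: SGTL means the S gene Ct is more than 5 units above the
# maximum Ct of the other two targets (N and ORF1ab).
SGTL_DELTA = 5.0


def classify_s_target(n_ct: Optional[float],
                      orf1ab_ct: Optional[float],
                      s_ct: Optional[float]) -> str:
    """Label one positive sample from its three TaqPath Ct values.

    Assumption for this sketch: None means the target did not amplify.
    The label strings are hypothetical, chosen for readability.
    """
    # Both reference targets are needed to judge the S gene signal.
    if n_ct is None or orf1ab_ct is None:
        return "not_evaluable"
    # N and ORF1ab amplified but S did not: S gene target failure.
    if s_ct is None:
        return "SGTF"
    # S amplified, but much later than the other two targets.
    if s_ct - max(n_ct, orf1ab_ct) > SGTL_DELTA:
        return "SGTL"
    return "S_detected"


if __name__ == "__main__":
    print(classify_s_target(20.0, 21.0, None))   # S dropout
    print(classify_s_target(20.0, 21.0, 27.5))   # late S amplification
    print(classify_s_target(20.0, 21.0, 22.0))   # all three targets detected
```

A stricter rule could additionally require N and ORF1ab Ct values below a cutoff before calling SGTF, so that low viral load samples are not mistaken for S dropout.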
In an urgent response to the SARS-CoV-2 global pandemic, a consortium of researchers and scientists working in academia, industry, and clinical laboratories implemented a Precision Health Diagnostic and Surveillance Network (PHx) in March 2020. PHx's original objective was to augment SARS-CoV-2 molecular testing capacity and implement a genomic surveillance network in Baltimore, New York and Puerto Rico [20]. The present work describes the development of an evidence-based genomic surveillance algorithm that combines RT-PCR and sequencing technologies to identify VOCs.
The PRECEDE/PROCEED Model (PPM) [21] was selected to provide the evaluation framework for PHx conceptualization and implementation (Supplementary Figure 1). Weekly remote meetings began in March 2020 to perform Social, Epidemiological, Educational, Behavioral and Environmental assessments in New York, Puerto Rico and Baltimore, using the NIH I-Corps Program framework [22]. School of Medicine faculty from the University of Puerto Rico in San Juan, Johns Hopkins University in Baltimore and Weill Cornell in New York City were involved in the conceptualization and implementation of PHx. The Center for Puerto Rican Studies of Hunter College led the New York initiative. The Puerto Rico initiative was led by the Puerto Rico Public Health Trust (PRPHT), and the Baltimore initiative was led by LifeGene-Biomarks, which also coordinated the PHx consortium. A team of investigators from the Medical Sciences Campus of the University of Puerto Rico (MSC), Johns Hopkins University School of Medicine, and LifeGene-Biomarks obtained IRB approval (IRB2770120).
Real-time PCR data were analyzed, interpreted and exported as .csv files using Applied Biosystems COVID-19 Interpretive Software (version 1.3). Ct values from pooled samples were removed from the data set before the analysis. Scatter plots and boxplots were prepared to visualize the Ct values. Data were summarized and correlation analyses were performed. R (version 4.0.3) was used for biostatistical analyses and data visualization. Secondary data analysis was performed on data downloaded from GISAID.
TaqPath data from 508,969 patients revealed an SGTF frequency of >47% for the last week of March 2021 in both Puerto Rico and US laboratories. The overall frequencies of SGTL (15.1%), SGTF (9.2%), and SGTF with N and ORF1ab Cts <28 (2.5%) in Puerto Rico were high from March 2020 through March 2021. SGTF increased exponentially in Puerto Rico, from 4% in November 2020 to 47% in March 2021. The average weekly SGTF rate in US samples was high (>8%) from late December to early January, and then also increased exponentially through April (48%) (Figure 1).
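The weekly SGTF frequencies and the exponential rise reported above can be illustrated with a short calculation. The sketch below is not the authors' R analysis: the record layout, label strings, and function names are hypothetical, and the doubling-time formula assumes pure exponential growth of the SGTF fraction, which can only hold while the fraction is well below 100%.

```python
import math
from collections import defaultdict
from datetime import date


def weekly_sgtf_frequency(records):
    """Return {(iso_year, iso_week): SGTF fraction among positive samples}.

    `records` is an iterable of (collection_date, label) pairs, one per
    positive sample. The "SGTF" label is a hypothetical encoding.
    """
    totals = defaultdict(int)
    sgtf = defaultdict(int)
    for collected, label in records:
        iso = collected.isocalendar()
        week = (iso[0], iso[1])  # (ISO year, ISO week number)
        totals[week] += 1
        if label == "SGTF":
            sgtf[week] += 1
    return {week: sgtf[week] / totals[week] for week in sorted(totals)}


def doubling_time(p_start, p_end, elapsed):
    """Doubling time, in the units of `elapsed`, assuming the frequency
    grows as p(t) = p_start * 2 ** (t / T)."""
    return elapsed * math.log(2) / math.log(p_end / p_start)


if __name__ == "__main__":
    toy_records = [  # made-up samples, for illustration only
        (date(2021, 3, 22), "SGTF"),
        (date(2021, 3, 23), "S_detected"),
        (date(2021, 3, 29), "SGTF"),
        (date(2021, 3, 30), "SGTF"),
        (date(2021, 3, 31), "SGTF"),
        (date(2021, 4, 1), "SGTL"),
    ]
    print(weekly_sgtf_frequency(toy_records))
    # Puerto Rico figures quoted in the text: 4% (Nov 2020) to 47% (Mar 2021).
    print(doubling_time(0.04, 0.47, elapsed=4))
```

Plugging in the Puerto Rico figures quoted in the text (4% to 47% over four months) gives a doubling time of roughly 1.1 months under this simplified assumption. This is an arithmetic illustration, not a result reported by the authors.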
Genomic epidemiology tools that can quickly identify and track COVID-19 VOCs in real time improve our understanding of the transmissibility, pathogenicity, morbidity and mortality of each variant detected in geographically defined populations [24, 25]. This approach will enable the deployment of targeted, evidence-based strategies to quickly screen for COVID-19 VOCs and identify clusters, leading to a decrease in community transmission. PHx, a consortium of researchers and scientists working in academia, industry, and clinical laboratories, developed an evidence-based method to screen SARS-CoV-2 positive samples for COVID-19 VOCs.
The PPM was a highly effective framework to guide the conceptualization and implementation of PHx. Evaluation frameworks such as PPM can improve the understanding of the relationship between complex variables such as community attitudes, knowledge, and screening test utilization and implementation, which determine the uptake of any screening intervention [21, 26]. Given the complexity of behavioral change processes during a global pandemic such as COVID-19, predisposing factors and barriers identified during the implementation of PHx can guide SARS-CoV-2 public policy and funding decision-making. Lessons learned from PHx can inform the urgent deployment of precision health clinical and surveillance networks.
The most important lesson learned during this year-long period is that the business model of clinical laboratories, which operate with very small profit margins, does not leave much leeway for collaborative clinical or research efforts. The main limitations of this study are the lack of established workflows, public policy guidelines and funding streams for the implementation and administration of a genomic surveillance network. The convenience samples and data used for this report were gathered ad hoc by academic institutions, public and private organizations, as well as state and federal agencies. Data integrity, uniformity and reliability are thus compromised, and the results should be interpreted accordingly. Uniform sample handling and management workflows, needed to assure data reproducibility, are not in place. For example, clinical laboratories discard their samples after diagnosis, which for COVID-19 EUA-approved tests is a qualitative decision based on proprietary algorithms designed by test manufacturers. These closed PCR tests require neither Ct interpretation nor molecular biology expertise from the user. In addition, WGS is just entering the clinical and regulatory setting. Therefore, clinical laboratory scientists and Department of Health staff are not usually trained to sequence samples, analyze WGS data, or develop genomic surveillance programs based on RT-PCR or WGS data. The combination of these complex factors, buttressed by sample and data management asymmetry between clinical and sequencing laboratories, as well as state and federal agencies, introduces barriers to sample and data workflows, eventually impinging on the interpretation of results.
Our results suggest that a genomic surveillance network plays a critical role during the current stage of the COVID-19 pandemic. Patients infected with VOCs should be quarantined immediately, and VOC contact tracing should be vigorously implemented to curtail community spread of VOCs. The evidence-based Molecular Epidemiology and Genomic Surveillance algorithm developed by PHx can quickly identify emerging VOCs and is a valuable tool for identifying individual carriers of highly infectious variants, who can then be effectively triaged for isolation, contact tracing and treatment purposes.
Further information and requests should be directed to Rafael Guerrero-Preston (rguerrero@lifegenedna.com). Genomic data are available on GISAID (see Supplemental S3 for accession numbers).
[1] Coast-to-Coast Spread of SARS-CoV-2 during the Early Epidemic in the United States
[2] Genetic grouping of SARS-CoV-2 coronavirus sequences using informative subtype markers for pandemic spread visualization
[3] An interactive web-based dashboard to track COVID-19 in real time
[4] Nextstrain: real-time tracking of pathogen evolution
[5] Enterovirus D68 outbreak detection through a syndromic disease epidemiology network
[6] Tracking SARS-CoV-2 VOC 202012/01 (lineage B.1.1.7) dissemination in Portugal: insights from nationwide RT-PCR Spike gene drop out data
[7] Investigation of novel SARS-COV-2 variant: Variant of Concern 202012/01: Technical briefing document on novel SARS-CoV-2 variant
[8] Transmission of SARS-CoV-2 Lineage B.1.1.7 in England: Insights from linking epidemiological and genetic data. medRxiv
[9] Recurrent emergence and transmission of a SARS-CoV-2 spike deletion H69/V70. bioRxiv
[10] SARS-CoV-2 genomic surveillance identifies naturally occurring truncations of ORF7a that limit immune suppression. medRxiv
[11] Emergence and rapid transmission of SARS-CoV-2 B.1.1.7 in the United States
[12] Travel from the United Kingdom to the United States by a Symptomatic Patient Infected with the SARS-CoV-2 B.1.1.7 Variant - Texas
[13] Genomic epidemiology identifies emergence and rapid transmission of SARS-CoV-2 B.1.1.7 in the United States. medRxiv
[14] Emergence of SARS-CoV-2 B.1.1.7 Lineage - United States
[15] Early introductions and community transmission of SARS-CoV-2 variant B.1.1.7 in the United States
[16] S-variant SARS-CoV-2 is associated with significantly higher viral loads in samples tested by
[17] The Sequence Alignment/Map format and SAMtools
[18] Genomic epidemiology identifies emergence and rapid transmission of SARS-CoV-2 B.1.1.7 in the United States
[19] Tracking SARS-CoV-2 lineage B.1.1.7 dissemination: insights from nationwide spike gene target failure (SGTF) and spike gene late detection (SGTL) data, Portugal, week 49 2020 to week 3 2021
[20] Genomic surveillance of SARS-CoV-2 in the Bronx enables clinical and epidemiological inference. medRxiv
[21] The PRECEDE-PROCEED model as a tool in Public Health screening: a systematic review
[22] I-Corps at NIH: Entrepreneurial Training Program Creating Successful Small Businesses
[23] SARS-CoV-2 Entry Related Viral and Host Genetic Variations: Implications on COVID-19 Severity, Immune Escape, and Infectivity
[24] Evaluating the Effects of SARS-CoV-2 Spike Mutation D614G on Transmissibility and Pathogenicity
[25] Genomic epidemiology reveals multiple introductions of SARS-CoV-2 from mainland Europe into Scotland
[26] Rapid SARS-CoV-2 whole-genome sequencing and analysis for informed public health decision-making in the Netherlands