key: cord-0972984-xnk21afc
authors: Stromberg, Z. R.; Theiler, J.; Foley, B. T.; Myers y Gutierrez, A.; Hollander, A.; Courtney, S. J.; Deshpande, A.; Martinez-Finley, E. J.; Mitchell, J.; Mukundan, H.; Yusim, K.; Kubicek-Sutherland, J. Z.
title: Fast Evaluation of Viral Emerging Risks (FEVER): A computational tool for biosurveillance, diagnostics, and mutation typing of emerging viral pathogens
date: 2021-05-27
journal: nan
DOI: 10.1101/2021.05.25.21257811
sha: a601fe55aecdd7b01226521b972eeb607c590d22
doc_id: 972984
cord_uid: xnk21afc

Viral pathogens can rapidly evolve, adapt to novel hosts and evade human immunity. The early detection of emerging viral pathogens through biosurveillance, coupled with rapid and accurate diagnostics, is required to mitigate global pandemics. However, RNA viruses can mutate rapidly, hampering biosurveillance and diagnostic efforts. Here, we present a novel computational approach called FEVER (Fast Evaluation of Viral Emerging Risks) to design assays that simultaneously accomplish: 1) broad-coverage biosurveillance of an entire class of viruses, 2) accurate diagnosis of an outbreak strain, and 3) mutation typing to detect variants of public health importance. We demonstrate the application of FEVER to generate assays to simultaneously 1) detect sarbecoviruses for biosurveillance; 2) diagnose infections specifically caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2); and 3) perform rapid mutation typing of the D614G SARS-CoV-2 spike variant associated with increased pathogen transmissibility. These FEVER assays had a high in silico recall (predicted positive) of up to 99.7% of 525,708 SARS-CoV-2 sequences analyzed and displayed sensitivities and specificities as high as 92.4% and 100%, respectively, when validated in 100 clinical samples. The D614G SARS-CoV-2 spike mutation PCR test was able to identify the single nucleotide identity at position 23,403 in the viral genome in 96.6% of SARS-CoV-2-positive samples without the need for sequencing. This study demonstrates the utility of FEVER to design assays for biosurveillance, diagnostics, and mutation typing to rapidly detect, track, and mitigate future outbreaks and pandemics caused by emerging viruses.

(SARS-CoV-2) most recently in 2019 (6, 7). Outbreaks and pandemics often occur when a zoonotic viral strain evolves and escapes human immunity (8, 9). The early detection of these pathogens is required to minimize outbreaks and prevent pandemics by guiding early interventions (10). However, RNA viruses mutate rapidly, so biosurveillance tools must be able to detect groups of viral pathogens with diverse genome sequences and emerging variants, often requiring time-consuming and expensive procedures performed by trained personnel that are not easily accessible in all regions of the world (11, 12). Once spillover occurs and a viral isolate enters human circulation, a rapid and accurate diagnostic test is required to limit its spread (13-16). However, development of accurate diagnostics at the onset of an outbreak can be challenging due to limited data availability depending on the location of the outbreak, and the process for validation and implementation of a new test can also be time-consuming (17). Having a broadly applicable computational approach to assay design that can accommodate pathogen emergence and variant evolution would help alleviate some of these challenges.

The copyright holder for this preprint (this version posted May 27, 2021; https://doi.org/10.1101/2021.05.25.21257811), which was not certified by peer review, is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

applications to respond rapidly to emerging pathogens and mitigate their global impact (43). To this end, we have developed FEVER (Fast Evaluation of Viral Emerging Risks), a computational approach that can generate either high-coverage or strain-specific diagnostic assays so that the same detection platform can be used for both broad-based biosurveillance and targeted diagnostic applications. Here, we demonstrate the applicability of FEVER for simultaneous biosurveillance of pan-Sarbecoviruses and targeted diagnosis of SARS-CoV-2 in a cohort of 100 human patients. We also demonstrate mutation typing of the D614G SARS-CoV-2 spike variant using this approach as an example of a rapid method for tailored surveillance of pathogen evolution and

in the database covered by the probe. "Exact" coverage corresponds to an exact match of probe and sub-sequence. The Hamming distance between two strings of equal length is defined as the number of characters that would have to be changed in one string to agree with the other string. Given a database of SARS-CoV-2 sequences, the single-probe design problem is to find a string of length k, subject to the constraints of GC-content and hairpin aversion, that covers as many SARS-CoV-2 sequences as possible. For the multi-probe design problem, we seek n probes instead of just one, and an n-probe design is said to cover a SARS-CoV-2 sequence if any one of the probes covers it. We selected k=31 because of practical considerations for molecular beacon synthesis, allowing for a 12 bp stem region (6 bp on each side of the k-mer). Also, k=31 was long enough to bind to target sequences in the SARS-CoV-2 database without accidentally covering potential background sequences.
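The coverage definitions above can be illustrated with a minimal Python sketch. It is not the FEVER implementation: the function names (`hamming`, `covers`, `coverage`), the `max_mismatches` parameter, and the toy sequences are assumptions introduced here for illustration only.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def covers(probe: str, sequence: str, max_mismatches: int = 0) -> bool:
    """True if some window of `sequence` is within `max_mismatches` of `probe`."""
    k = len(probe)
    if max_mismatches == 0:
        return probe in sequence  # "exact" coverage
    return any(
        hamming(probe, sequence[i:i + k]) <= max_mismatches
        for i in range(len(sequence) - k + 1)
    )


def coverage(probes, database, max_mismatches: int = 0) -> float:
    """Fraction of sequences covered by at least one probe (n-probe coverage)."""
    hit = sum(
        any(covers(p, s, max_mismatches) for p in probes) for s in database
    )
    return hit / len(database)


# Toy database (hypothetical sequences, far shorter than real genomes).
toy_db = ["ACGTACGTAA", "TTACGTACGG", "GGGGCCCCAA"]
print(coverage(["ACGTACG"], toy_db))             # single-probe design
print(coverage(["ACGTACG", "GGGCCCC"], toy_db))  # two-probe design
```

In this sketch a sequence counts once no matter how many windows match, which mirrors the statement that an n-probe design covers a sequence if any one of its probes does.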

Our FEVER algorithm works by extracting all k-mers from each of the sequences in the database and then restricting consideration to those that meet the minimum GC-content and hairpin-aversion requirements. For every one of these k-mers, we counted how many viral sequences were covered, being careful not to count a sequence twice even if the k-mer appears twice in it. Finally, we selected the k-mer with the largest count. To ensure that our probes were not susceptible to viral mutation, a multi-probe design approach was applied. The first probe was designed using a naïve approach that produces the optimal single probe, that is, the one with the highest coverage of the queried sequence database. The next probe was designed by eliminating the viral sequences covered by the first probe and using this truncated database to find the single probe that best covers the remaining viral sequences. This process was repeated until 100% of the database was covered. For both pan-Sarbecovirus sequences and SARS-CoV-2 sequences, it took only 2 probes to obtain 100% coverage of the sequence database at the time of design.

were assessed visually using the Variant Visualizer, which is under development and will be available at https://cov.lanl.gov. Second, we used an in silico validation tool (46) to assess the inclusivity of the FEVER assays compared with the U.S. CDC N1 and N2 assays (26). Each assay (forward primer, reverse primer and TaqMan probe) was evaluated against each SARS-CoV-2 sequence from the GISAID (47) public database that was longer than 29 kb (n = 525,708) to ensure only complete genomes were assessed. The GISAID sequence data used in this analysis were accessed on February 15, 2021. Results are presented in Table 2. For this analysis, a false-negative was assigned if one or more assay oligonucleotides satisfied any of the following conditions: (i) three or more mismatches to a target sequence, (ii) a predicted melting temperature less than 40°C between the oligonucleotide and a target sequence, or (iii) when primer/target mismatches

CDC assays were evaluated using genomic RNA from various viruses obtained from BEI

control, and water was used as a negative control.
For the FEVER assays, RNA from heat-inactivated SARS-CoV-2, isolate USA-WA1/2020 (BEI Resources, NR-52347) was used as a positive control, and water was used as a negative control. The RNase P assay described by the U.S. CDC (26) was used as an extraction control for each sample.

SNPs of importance to public health. The computational approach starts with a curated multiple sequence alignment. The FEVER algorithm does not, however, require sequences to be aligned in order to generate assays. The alignment was performed to avoid overrepresentation of any single isolate and ensure broad coverage of the pathogen of interest. In March of 2020, we developed two alignments using the sequences available in GISAID: one with SARS-CoV-2 sequences and one with Sarbecovirus sequences. Of note, our FEVER assays were developed within one month of the U.S. CDC assays (41) and also used viral sequence information obtained from GenBank. Next, our FEVER algorithm was applied to design probes using the Sarbecovirus alignment for biosurveillance and the SARS-CoV-2 alignment for diagnostics, while also screening the results using parameters set for GC-content, hairpin propensity, and match recall to achieve high coverage as well as reagent manufacturing and performance compatibility. The FEVER algorithm identifies sets of probes that cover 100% of each multiple sequence alignment. Two pan-Sarbecoviruses

specific probes were needed to detect all input sequences. After the FEVER probes were designed, primers flanking the probe sites were designed using the PrimerDesign-M tool (44, 45) for use in RT-PCR assays (Table 1). Few variants were visually detected in the region of amplification of these primers and probes (Fig. S1). Further, we designed TaqMan SNP Genotyping PCR probes and primers to detect the previously identified A23403G (D614G amino acid change) mutation in the spike gene (40) (Table 1). Mutations in the SARS-CoV-2 genome conferring a fitness advantage have been reported (40). However, rapid methods to characterize SARS-CoV-2 mutations are lacking. Together, the primers and probes for all three components (biosurveillance, diagnostics, and mutation typing) are referred to collectively as the FEVER assays.
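The multi-probe selection that yields probe sets covering 100% of the input sequences follows a greedy pattern: pick the k-mer covering the most sequences, drop the covered sequences, and repeat. The sketch below illustrates that pattern under simplifying assumptions and is not the authors' code: it uses exact-match coverage only, omits the hairpin screen, and the GC bounds, the `max_probes` cap, the tie-breaking rule, and all function names are hypothetical.

```python
from collections import defaultdict


def gc_fraction(kmer: str) -> float:
    """Fraction of G and C bases in a k-mer."""
    return (kmer.count("G") + kmer.count("C")) / len(kmer)


def candidate_kmers(database, k, gc_min, gc_max):
    """Map each admissible k-mer to the set of sequence indices containing it."""
    hits = defaultdict(set)
    for idx, seq in enumerate(database):
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip ambiguous bases; keep only k-mers inside the GC window.
            if set(kmer) <= set("ACGT") and gc_min <= gc_fraction(kmer) <= gc_max:
                hits[kmer].add(idx)  # a set counts each sequence only once
    return hits


def greedy_probes(database, k, gc_min=0.4, gc_max=0.6, max_probes=10):
    """Greedily pick probes until every sequence is covered or no k-mer helps."""
    hits = candidate_kmers(database, k, gc_min, gc_max)
    uncovered = set(range(len(database)))
    probes = []
    while uncovered and hits and len(probes) < max_probes:
        # Best k-mer on the still-uncovered sequences; ties broken by k-mer text.
        best = max(hits, key=lambda km: (len(hits[km] & uncovered), km))
        if not hits[best] & uncovered:
            break  # remaining sequences have no admissible k-mer
        probes.append(best)
        uncovered -= hits[best]
    return probes, uncovered


# Toy database (hypothetical); the paper uses k=31 on full genomes.
toy_db = ["AACGTCAA", "TTACGTCT", "GGATCCGG"]
probes, uncovered = greedy_probes(toy_db, k=5)
print(probes, uncovered)
```

Greedy selection of this kind is a standard heuristic for set cover; it does not guarantee the smallest possible probe set, only that each added probe covers as many remaining sequences as possible.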

In silico inclusivity test. We evaluated our FEVER assays using a public web-based validation tool against 525,708 sequences of SARS-CoV-2 (46). The in silico evaluation only included complete SARS-CoV-2 genome sequences that were at least 29 kb long. In general, a low number of predicted failures (false-negatives) was observed in comparison with the number of successes or true-positives (total of perfect matches, single mismatches, and double mismatches) (Table 2). Recall (true positives divided by the sum of true positives and false negatives) was used to assess relative assay performance. Of the FEVER and U.S. CDC assays, the SARS-CoV-2 specific FEVER probe targeting ORF1ab had the best predicted performance, with a recall of 99.7%. The pan-Sarbecovirus probes targeting the 5' UTR and envelope gene were also predicted to be highly inclusive for SARS-CoV-2, with recalls of 96.4% and 98.6%, respectively. In contrast, the SARS-CoV-2 specific probe targeting the spike gene had a 90.5% recall, suggesting that the spike gene is a suboptimal target due to its high genetic variability (40, 49). For instance, the U.S.

(Table 3). Excluding SARS-CoV-2, the SARS-CoV Urbani strain was the only commercially available genomic RNA from a Sarbecovirus at the time of testing. The SARS-CoV Urbani strain correctly tested negative by the U.S. CDC assays (N1 and N2), FEVER_ORF1ab, and FEVER_Spike targets. The SARS-CoV strain Urbani tested positive by the

Table 4). Samples that initially produced an inconclusive result were re-tested and the 12 that remained inconclusive following the second test were removed from the analysis of agreement, sensitivity, and specificity. We compared the four FEVER assays to the results obtained with the U.S. CDC assays and found that all four FEVER assays displayed 100% specificity (no false positives were detected) and varying sensitivities ranging from 74.2% to 92.4% (Table 4, Table S1). However, the U.S. CDC assays were performed in a Clinical Laboratory Improvement Amendments (CLIA)-certified laboratory, while the FEVER assays were performed in a separate facility on a different thermocycler for research purposes only.

inconclusive. Samples initially recorded as inconclusive were re-tested and the 12 that remained inconclusive were removed from the analysis of agreement, sensitivity, and specificity.
b Agreement = no. of samples with identical results for an individual FEVER assay and U.S. CDC assay / 88
c Sensitivity = true positive / (true positive + false negative) with 95% confidence intervals
d Specificity = true negative / (true negative + false positive) with 95% confidence intervals
95% confidence intervals were calculated using the modified Wald method. N/A, not applicable.
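The agreement, sensitivity, and specificity formulas above, together with a modified Wald interval, can be computed as in the following sketch. The modified Wald method is taken here to be the Agresti-Coull adjustment, which adds z²/2 successes and z² trials before applying the Wald formula. The function names and the counts in the example are hypothetical and are not taken from Table 4.

```python
import math

Z95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def modified_wald(successes: int, n: int, z: float = Z95):
    """Agresti-Coull ("modified Wald") interval for a binomial proportion."""
    n_adj = n + z * z
    p_adj = (successes + z * z / 2) / n_adj
    half_width = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    # Clip to [0, 1] because the adjusted interval can overshoot the bounds.
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)


def summarize(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Agreement, sensitivity, and specificity against a reference assay."""
    total = tp + fn + tn + fp
    return {
        "agreement": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "sensitivity_ci": modified_wald(tp, tp + fn),
        "specificity": tn / (tn + fp),
        "specificity_ci": modified_wald(tn, tn + fp),
    }


# Hypothetical counts for illustration only (not the study data).
for name, value in summarize(tp=45, fn=5, tn=30, fp=0).items():
    print(name, value)
```

Recall in the in silico analysis uses the same ratio as sensitivity, true positives divided by the sum of true positives and false negatives, so the same helper applies to predicted matches and failures.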

have incorporated a SNP Genotyping PCR assay. For this mutation typing test, we detect the A23403G (D614G amino acid change) mutation in the spike gene, which is associated with greater infectivity (40) (Table 1). Control RNA from isolate USA-WA1/2020 (BEI Resources, NR-52347) was used as a positive control, and results showed that USA-WA1/2020 tested positive

(Table S2). A fragment of the spike gene was amplified, and the A23403G SNP was confirmed in a subset of samples (n=17) by sequencing (Table S3). These results indicate that most (57/59) patient isolates contained the mutated G614 form of SARS-CoV-2 that became prevalent globally in April 2020 (40).
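The mutation typing above is performed by PCR. When sequences are available, as in the sequencing confirmation, the same A-versus-G call can be tallied in silico. The sketch below is only an illustrative counterpart to the PCR test. It assumes each sequence is already in reference genome coordinates, so that the base of interest sits at a fixed 1-based position (23,403 for a full genome). The function names and toy fragments are hypothetical.

```python
from collections import Counter


def type_snp(seq: str, pos: int, ref_base: str = "A", alt_base: str = "G") -> str:
    """Classify the base at a 1-based position as 'ref', 'alt', or 'undetermined'."""
    if pos < 1 or pos > len(seq):
        return "undetermined"
    base = seq[pos - 1].upper()
    if base == ref_base:
        return "ref"
    if base == alt_base:
        return "alt"
    return "undetermined"  # ambiguous call (e.g. N) or an unexpected base


def tally(seqs, pos: int, ref_base: str = "A", alt_base: str = "G") -> Counter:
    """Count reference, alternate, and undetermined calls across sequences."""
    return Counter(type_snp(s, pos, ref_base, alt_base) for s in seqs)


# Toy fragments (hypothetical): position 4 is the middle base of a GAT/GGT
# codon, mirroring the Asp-to-Gly change. For full genomes use pos=23403.
toy = ["TTGATGC", "TTGGTGC", "TTGGTGC", "TTGNTGC"]
print(tally(toy, pos=4))
```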

Early detection of emerging pathogens is required to mitigate outbreaks and prevent pandemics (10). Assays used for diagnosing infections are often used for biosurveillance; however, due to their high specificity, they are not usually suited for broad-spectrum surveillance applications or capable of identifying pathogen mutations (43, 56). Therefore, the user must often choose between obtaining diagnostic or biosurveillance information, but not both. FEVER (Fast Evaluation of Viral Emerging Risks) is a computational approach designed to simultaneously facilitate broad-spectrum biosurveillance, pathogen-specific diagnostics, and SNP variant tracking for pandemic response to any viral pathogen of interest. In this manuscript, we demonstrate the application of FEVER to the ongoing COVID-19 pandemic as an initial proof of concept: we present FEVER assays to detect all sarbecoviruses for future biosurveillance, SARS-CoV-2 for strain-specific diagnostics, and the D614G spike SNP associated with increased transmissibility.

Here, we used an RT-PCR format to evaluate the performance of the FEVER assays as compared to the U.S. CDC assays. The success of RT-PCR as a diagnostic approach relies heavily on the design quality of the primers and probes (57-59) as well as factors such as probe format, the ability to incorporate degenerate nucleotides, the number of target genomes that can be included in the design process to account for genetic diversity, and the ability to avoid cross-

important that assays are able to differentiate infections caused by other common coronaviruses, in which case a false positive could result in unnecessary and costly mitigation procedures (61).

For this reason, the U.S. CDC designed its N3 pan-Sarbecovirus assay; however, it was unfortunately removed from their assay in March of 2020 in order to expedite reagent manufacturing, consequently removing the biosurveillance component of the U.S. CDC COVID-19 assay (35, 37). This situation highlights the urgent need for a computational approach that can develop both biosurveillance and diagnostic assays that are screened for design issues that may affect their performance and manufacturing.

Current probe design methods are also limited by over-representing single isolates in a sequence database, as indicated by high coverage in silico but assay failure when novel variants emerge (31). In the case of SARS-CoV-2, the high number of false-negative results reported highlights the need for screening and updating assay designs as sequences become available (62-64), which has been a primary goal of https://cov.lanl.gov. Mutations in the probe or primer binding sites are particularly problematic since these can lead to false-negative results, and infected individuals may continue to spread the disease (65). Pandemic preparedness requires a balanced approach combining targeted diagnostic assays designed to detect a specific outbreak strain with high-coverage biosurveillance assays that can account for pathogen mutation, and ideally this approach can be utilized to detect a variety of viral pathogens including those that are highly mutagenic. FEVER was designed to fill this void. In the case of SARS-CoV-2, our assays were developed within a month of the U.S. CDC assays, also using publicly available sequence data, but with a different computational approach in mind. The U.S. CDC designed highly specific

two SARS-CoV-2 specific assays that have contributed to combatting the current pandemic; however, it is unclear how useful these assays will be in detecting the next coronavirus outbreak.

In addition to optimized sequence analysis, FEVER incorporates physical constraints set by the user to include GC-content, hairpin propensity, sequence length, and even mismatch tolerance. This ensures that probes selected for testing are thermodynamically compatible for increased assay performance as well as future multiplexing. It is critical that assay sensitivity is optimized, as a 10-fold increase in the limit of detection for SARS-CoV-2 has been shown to result in an increase in the false-negative rate by 13% (66). Here, the limit of detection for both the FEVER and U.S. CDC assays in NP swab samples was 1 copy/µL, which matches previously published

In summary, the FEVER computational approach provides a novel tool to combat emerging viral pathogens. FEVER can be used for simultaneous biosurveillance of viruses, diagnostics of outbreak or pandemic isolates, and mutation typing to rapidly track the spread of variants impacting public health. In addition to PCR, FEVER can generate probes that are compatible with
Sarbecovirus assays for future biosurveillance applications. Additionally, FEVER mutation typing assays provide a rapid means for monitoring the spread of potential vaccine escape variants that may significantly impact a population. Future work will focus on applying FEVER towards the development of assays for the detection of other upper respiratory pathogens including influenza as well as testing FEVER probes in ultra-sensitive amplification-free nucleic acid biosensing platforms. FEVER provides a holistic computational approach to combine biosurveillance and diagnostics in order to better combat emerging viral pathogens.


Avian flu, SARS, MERS, Ebola, Zika… what next? Vaccine

Influenza-associated hospitalizations in the United States

Ebola--a growing threat?

Chikungunya outbreak in

Zika virus outbreak: 'a perfect storm'. Emerging microbes & infections

SARS, the First Pandemic of the 21st Century

Candidate Agents for the Next Global Pandemic? A Review

Emerging Pandemic Diseases: How We Got to COVID-19

Development and characterization of a highly specific and sensitive SYBR green reverse transcriptase PCR assay for detection of the 2009 pandemic H1N1 influenza virus on the basis of sequence signatures

Signature Erosion in Ebola Virus Due to Genomic Drift and Its Impact on the Performance of Diagnostic Assays

Rapid development of nucleic acid diagnostics

Commercial Dual-Target Diagnostic Assay

Time Reverse Transcription PCR Panel for Detection of Severe Acute Respiratory Syndrome Coronavirus 2

SARS-CoV-2 Testing: Trials and

Comparative Performance of SARS-CoV-2 Detection Assays Using Seven Different Primer-Probe Sets and One Assay Kit

Tracking Changes in SARS-CoV-2 Spike: Evidence that D614G Increases Infectivity of the COVID-19 Virus

Emergence of a new SARS-CoV-2 variant in the UK

tool for walking across variable genomes

A multiple-alignment based primer design algorithm for genetically highly variable DNA targets

A public website for the automated assessment and validation of SARS-CoV-2 diagnostic PCR assays

GISAID: Global initiative on sharing all influenza data - from vision to reality

Multiple assays in a real-time RT-PCR SARS-CoV-2 panel can mitigate the risk of loss of sensitivity by new genomic variants during the COVID-19 outbreak

Sensitivity of Nasopharyngeal Swabs and Saliva for the Detection of Severe Acute Respiratory Syndrome Coronavirus 2

Saliva or Nasopharyngeal Swab Specimens for Detection of SARS-CoV-2

Is the Patient Infected with SARS-CoV-2?

Emerging and Reemerging Viral Pathogens

Primer-BLAST: a tool to design target-specific primers for polymerase chain reaction

Oli2go: an automated multiplex oligonucleotide design tool

Classification and specific primer design for accurate detection of SARS-CoV-2 using deep learning

Primer design for quantitative real-time PCR for the emerging

Evolutionary and structural analyses of SARS-CoV-2 D614G spike protein mutation now documented worldwide

Amplification of human β-glucuronidase respiratory tract specimens

Presenting Characteristics, Comorbidities, and Outcomes Among 5700 Patients Hospitalized With COVID-19 in the

False Negative Tests for SARS-CoV-2 Infection - Challenges and Implications

The Limit of Detection Matters: The Case for Benchmarking Severe Acute Respiratory Syndrome Coronavirus 2 Testing

Diagnostics for SARS-CoV-2 detection: A comprehensive review of the FDA-EUA COVID-19 testing landscape

Coronavirus RNA Proofreading: Molecular Basis and Therapeutic Targeting

No evidence for distinct types in the evolution of SARS-CoV-2

Selective pressure on SARS-CoV-2 protein coding genes and glycosylation site prediction

The SARS-CoV-2 Spike variant D614G favors an open conformational state

Spike mutation D614G alters SARS-CoV-2 fitness

SARS-CoV-2 genomic surveillance in Taiwan revealed novel ORF8-deletion mutant and clade possibly associated with infections in Middle East

Comparative genomics tools applied to bioterrorism defence

Introduction of the South African SARS-CoV-2 variant 501Y.V2 into the UK

Figure 2. The FEVER and U.S. CDC assays show concordance in detecting SARS-CoV-2 in patient nasopharyngeal swab samples. The U.S. CDC N1 cycle threshold (CT) values were compared to (a) FEVER_5'UTR, (b) FEVER_Env

RT-PCR assays CT values (blue dots). The U.S. CDC N2 CT values were compared to

FEVER_Env, (g) FEVER_ORF1ab, and (h) FEVER_Spike RT-PCR assays CT values (purple dots). One sample was removed for the N2 vs FEVER_ORF1ab

FEVER_Spike comparisons because it was not detected by these assays