key: cord-0879606-in3nbt3f
authors: Smith, Elizabeth; Zhen, Wei; Manji, Ryhana; Schron, Deborah; Duong, Scott; Berry, Gregory J.
title: Analytical and Clinical Comparison of Three Nucleic Acid Amplification Tests for SARS-CoV-2 Detection
date: 2020-08-24
journal: J Clin Microbiol
DOI: 10.1128/jcm.01134-20
sha: 4b72408b3719ce8bd6a9d82ef6f0e464f6e7b98b
doc_id: 879606
cord_uid: in3nbt3f

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) was first identified in December 2019 and has quickly become a worldwide pandemic. In response, many diagnostic manufacturers have developed molecular assays for SARS-CoV-2 under the Food and Drug Administration (FDA) Emergency Use Authorization (EUA) pathway. This study compared three of these assays, the Hologic Panther Fusion SARS-CoV-2 assay (Fusion), the Hologic Aptima SARS-CoV-2 assay (Aptima), and the BioFire Defense COVID-19 test (BioFire), to determine analytical and clinical performance as well as workflow. All three assays showed similar limits of detection (LODs) using inactivated virus, with 100% detection, ranging from 500 to 1,000 genome equivalents/ml, whereas use of a quantified RNA transcript standard showed the same trend but had values ranging from 62.5 to 125 copies/ml, confirming variability in absolute quantification of reference standards. The clinical correlation found that the Fusion and BioFire assays had a positive percent agreement (PPA) of 98.7%, followed by the Aptima assay at 94.7%, compared to the consensus result. All three assays exhibited 100% negative percent agreement (NPA). Analysis of discordant results revealed that all four samples missed by the Aptima assay had cycle threshold (C(t)) values of >37 by the Fusion assay. In conclusion, while all three assays showed similar relative LODs, we showed differences in absolute LODs depending on which standard was employed. In addition, the Fusion and BioFire assays showed better clinical performance, while the Aptima assay showed a modest decrease in overall PPA. These findings should be kept in mind when making platform testing decisions.

more challenging, some are infected with the virus but do not display any signs or symptoms of illness, which likely has contributed to the high rate of transmission among humans (5).

Accurate and sensitive viral detection methods are key to quickly diagnose infections and mitigate transmission. Nucleic acid amplification tests (NAATs) are highly sensitive and specific methods for the detection of SARS-CoV-2 RNA in respiratory specimens. The majority of the SARS-CoV-2 NAATs available today are based on real-time reverse transcription PCR (RT-PCR) methods, including the BioFire Defense COVID-19 test (BioFire) and the Hologic Panther Fusion SARS-CoV-2 assay (Fusion) evaluated in the present study. Clinical comparative data have been obtained for the Fusion assay (6-9), but to our knowledge, there have been no comparative studies of the BioFire assay. Recently, Hologic has developed a second NAAT, the Aptima SARS-CoV-2 assay (Aptima), which has been granted Food and Drug Administration (FDA) Emergency Use Authorization (EUA); this NAAT is based on target capture and transcription-mediated amplification (TMA) technologies and is run on the Panther instrument. As with the BioFire assay, comparative evaluations have yet to be performed for this new molecular assay.

An increase in testing capability is critically needed to manage the current testing demands and to monitor and control the spread of the virus going forward. While the FDA has worked quickly to review and authorize more molecular diagnostic platforms to make more options available to meet demand, real-world clinical performance and comparative data are still lacking. These data are urgently needed to better understand the benefits and limitations of each test and how best to incorporate them into local and national testing strategies.

Here, we present an analytical and clinical evaluation of these three sample-to-answer NAATs for the qualitative detection of SARS-CoV-2 RNA in symptomatic patients: the Fusion assay, the Aptima assay, and the BioFire single-target COVID-19 assay.

Specimen collection and storage. Nasopharyngeal (NP) swabs were obtained from patients with clinical signs/symptoms of COVID-19. Each collection used a sterile swab made from Dacron, rayon, or nylon, which was placed into 3 ml of sterile universal transport medium (UTM; from various manufacturers) after collection. Samples were then transported at room temperature, stored at 2 to 8°C for up to 72 h, and tested as soon as possible after collection. For the retrospective sample set, after routine patient testing, aliquots of samples were taken and stored at -80°C until testing of the three comparator NAATs could take place.

Study design. Samples were selected from specimens received for routine SARS-CoV-2 testing between April and May of 2020 that were initially tested by the Fusion assay. The selection included symptomatic patient samples that were chosen at random without bias toward specific age or gender. A total of 150 NP samples (75 negative and 75 positive) were used for this study: 101 retrospective specimens (51 negative and 50 positive) and 49 prospective, fresh specimens (24 negative and 25 positive). Specimens were selected to represent our laboratory's positivity rate at the time this study was designed (50 to 60% at beginning of April 2020) and also included positive specimens spanning the range of positivity, including those with low viral loads (characterized by high cycle threshold [Ct] values obtained by results from initial clinical testing). Ct values obtained from the Fusion assay during comparative testing are also shown in Fig. S1 in the supplemental material. The 101 retrospective specimens were initially tested per routine patient care and then immediately aliquoted and frozen at -80°C, remaining frozen until this study was performed. Retrospective sample aliquots were thawed and immediately tested on the Fusion, Aptima, and BioFire systems. Testing for prospective samples was performed within 48 h of collection. Prospective samples were run on the Fusion, Aptima, and BioFire systems using the original UTM sample and aliquoted directly into Hologic lysis tubes and into the BioFire sample injection vial. This work was conducted as a quality improvement activity in order to complete assay validation for the Aptima and BioFire assays.

Hologic Panther Fusion SARS-CoV-2 assay. The Panther Fusion SARS-CoV-2 assay (Fusion) (Hologic, Inc., San Diego, CA) was performed on the Fusion instrument according to the manufacturer's instructions for use. This assay targets two unique regions of the open reading frame 1ab (ORF1ab) section of the SARS-CoV-2 viral genome. A 500-μl aliquot of the primary NP swab specimen is transferred into a specimen lysis tube containing 710 μl of lysis buffer and is then loaded onto the Fusion instrument. From this tube, the instrument removes 360 μl for extraction. Each specimen is processed with an internal control (IC), which is added via the working Fusion capture reagent-S. Nucleic acid is purified using capture oligonucleotides and a magnetic field and eluted in 50 μl, and 5 μl of the eluted nucleic acid is added to a Fusion reaction tube. The Fusion assay amplifies and detects two conserved regions of the ORF1ab section of the SARS-CoV-2 viral genome. Both amplicons are detected by probes using the same fluorescent reporter, with amplification of either or both regions contributing to a single fluorescent signal and single Ct value. Reporting of a positive specimen requires only 1 of the 2 targets to be detected (ORF1ab region 1 or ORF1ab region 2).

Hologic Aptima SARS-CoV-2 assay. The Aptima SARS-CoV-2 assay (Aptima) (Hologic, Inc., San Diego, CA) is a NAAT that uses target capture and transcription-mediated amplification (TMA) technologies for the isolation and amplification of SARS-CoV-2 RNA. This assay targets two unique regions of the ORF1ab section of the SARS-CoV-2 viral genome and is performed on the Panther instrument. All testing was performed according to the manufacturer's instructions and is briefly described here. A 500-μl aliquot of the primary NP swab specimen is transferred into a specimen lysis tube containing 710 μl of lysis buffer, and this tube is then loaded onto the Panther instrument. From the specimen lysis tube, 360 μl is taken for each reaction. Each specimen is processed with an IC, which is added via the working target capture reagent. Nucleic acid is purified using capture oligonucleotides and a magnetic field, and the purified nucleic acid is used as the template for the TMA reaction. After amplification, chemiluminescent probes hybridize to amplicons and emit light measured by a luminometer in relative light units (RLUs). The IC signal and SARS-CoV-2-specific signal are differentiated by kinetic profiles of the labeled probes (rapid versus slow). Assay results are determined by a cutoff based on the total number of RLUs and the kinetic curve type.

BioFire COVID-19 test. The BioFire COVID-19 test (BioFire) (BioFire Defense, Salt Lake City, UT) was performed on the BioFire FilmArray Torch 12-bay system according to the manufacturer's instructions. This assay targets two unique regions of the ORF1ab section and one unique region of the ORF8 section of the SARS-CoV-2 viral genome. The test pouch (one per sample to be tested) is prepared by injecting hydration solution into the pouch hydration port. Using a provided transfer pipette, approximately 300 μl of sample is added to the sample injection vial, followed by the addition of a single-use sample buffer tube to the sample injection vial, which is then inverted to mix. The sample mix is injected into the pouch sample port. Within the pouch, the sample is lysed by agitation (bead beating), and all nucleic acid is extracted and purified using magnetic bead technology. Nested multiplex PCR is performed, and endpoint melting curve data are used to detect and generate a result for each target assay of the BioFire COVID-19 test.

Analytical sensitivity. The limit of detection (LOD) was determined using quantified, inactivated (gamma-irradiated) SARS-CoV-2 material from isolate USA-WA1/2020 (NR-52287; BEI Resources, Manassas, VA) and a SARS-CoV-2 synthetic RNA quantified control containing five gene targets (E, N, ORF1ab, RdRP, and S genes of SARS-CoV-2) from Exact Diagnostics, Fort Worth, TX (SKU COV019). The BEI Resources material was provided at a concentration of 4.1 × 10^9 genome equivalents (GE)/ml (2.8 × 10^5 50% tissue culture infective doses [TCID50]/ml before inactivation of the virus), from which the following serial dilutions were prepared (in GE/ml): 1,000, 500, 250, 125, 62.5, and 31.3. Dilutions were prepared using Ambion RNA storage solution (catalog no. AM7001; Thermo Fisher Scientific) to limit the potential for degradation of the RNA and aliquoted (with replicates ranging from 5 to 10, as shown in Table 1) for testing across the different platforms. The same process was followed for the Exact Diagnostics control, which had a starting concentration of 200,000 copies/ml and was diluted to make the same concentrations of serial dilutions as those made for the BEI Resources material. The LOD was defined as the lowest dilution at which all replicates were positive (100% detection rate).
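The LOD rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the replicate counts in `example_hits` are invented for demonstration (the actual per-dilution results are those reported in Table 1, which is not reproduced here). Only the dilution series, the stock concentration, and the 100% detection criterion come from the text.

```python
def twofold_series(start, steps):
    """Return `steps` concentrations, halving each time from `start`."""
    return [start / 2 ** i for i in range(steps)]


def limit_of_detection(hits):
    """Lowest concentration at which all replicates were positive.

    `hits` maps concentration -> (n_positive, n_tested).
    Returns None if no concentration reached 100% detection.
    """
    fully_detected = [
        conc for conc, (n_pos, n_tested) in hits.items()
        if n_tested > 0 and n_pos == n_tested
    ]
    return min(fully_detected) if fully_detected else None


# Dilution panel from the text: 1,000 down to 31.25 (reported as 31.3).
series = twofold_series(1000.0, 6)

# Overall dilution of the BEI stock (4.1e9 GE/ml) needed to reach 1,000 GE/ml.
stock_dilution_factor = 4.1e9 / series[0]

# HYPOTHETICAL replicate results, for illustration only (not study data).
example_hits = {
    1000.0: (5, 5),
    500.0: (5, 5),
    250.0: (4, 5),
    125.0: (3, 5),
    62.5: (1, 5),
    31.25: (0, 5),
}

print(series)                            # [1000.0, 500.0, 250.0, 125.0, 62.5, 31.25]
print(stock_dilution_factor)             # 4100000.0
print(limit_of_detection(example_hits))  # 500.0
```

The stock dilution factor of roughly four million illustrates why small quantification or pipetting errors at the stock level can carry through to every point of the panel.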

Workflow evaluation. Workflow was evaluated using a calibrated timer to measure the time needed for each step being evaluated, including hands-on time (HoT), assay run time, and total turnaround time (TAT). HoT, assay run time, and TAT were calculated using the throughput of samples per run.

Table 1 footnotes. (c) The numbers of replicates at each dilution that gave initial equivocal results and required repeat testing were 0 (1,000 GE/ml), 1 (500 GE/ml), 1 (250 GE/ml), 5 (125 GE/ml), 2 (62.5 GE/ml), and 0 (31.3 GE/ml). (d) Dilutions for the Exact Diagnostics RNA are given in copies/ml. (e) The numbers of replicates at each dilution that gave initial equivocal results and required repeat testing were 0 (1,000 copies/ml, 500 copies/ml, and 250 copies/ml), 2 (125 copies/ml), 6 (62.5 copies/ml), and 4 (31.3 copies/ml).

Statistical methods. The consensus result was based on the majority results of the three NAATs (Fusion, Aptima, and BioFire) and was defined as follows: consensus positive equals a positive result in ≥2 of 3 NAATs; consensus negative equals a negative result in ≥2 of 3 NAATs. The final result of each NAAT was based on each manufacturer's results interpretation algorithm.
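The majority-rule consensus defined under Statistical methods can be expressed as a short Python sketch. The function name and the use of `None` for an invalid result are assumptions made for illustration; the handling of a sample with one invalid result and two agreeing valid results follows the way such a sample is treated in the results of this study.

```python
def consensus(calls):
    """Majority call across the three NAATs.

    `calls` maps assay name -> True (positive), False (negative),
    or None (invalid / no reportable result; an assumed encoding).
    Returns "positive", "negative", or None when fewer than two
    assays agree.
    """
    n_pos = sum(1 for call in calls.values() if call is True)
    n_neg = sum(1 for call in calls.values() if call is False)
    if n_pos >= 2:
        return "positive"
    if n_neg >= 2:
        return "negative"
    return None


# Two of three positive -> consensus positive.
print(consensus({"Fusion": True, "Aptima": False, "BioFire": True}))   # positive

# One invalid result, two concordant negatives -> consensus negative.
print(consensus({"Fusion": None, "Aptima": False, "BioFire": False}))  # negative
```

Each assay's own call is then compared with this consensus to count agreements and discordant results.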

Analytical sensitivity. Both quantified, inactivated SARS-CoV-2 virus (BEI Resources) and Exact Diagnostics SARS-CoV-2 synthetic RNA quantified control were used to prepare serial dilution panels (1,000 to 31.3 [GE or copies]/ml, in 2-fold dilutions) to determine the LOD of each assay. The LOD was defined as the lowest dilution in which all replicates were detected (100% positivity rate), using the results interpretation algorithm per the instructions for use for each assay. Using these criteria, the LOD using inactivated SARS-CoV-2 virus was 1,000 GE/ml for the Fusion assay, 500 GE/ml for the Aptima assay, and 500 GE/ml for the BioFire test (Table 1). In addition, the LOD as determined using the Exact Diagnostics synthetic RNA transcript was 62.5 copies/ml for both the Fusion and Aptima assays and 125 copies/ml for the BioFire test.

Clinical performance. After testing of the 150 clinical specimens on all three platforms, consensus results were determined for each sample (consensus positive is a positive result in ≥2 of 3 NAATs; consensus negative is a negative result in ≥2 of 3 NAATs), and results from each NAAT were compared to the consensus result. The Fusion and BioFire tests exhibited the highest PPA of 98.7%, while the Aptima assay had a PPA of 94.7% (Table 2). NPAs were 100% for all three NAATs, with no false-positive results for any platform. Cohen's kappa values were 0.987, 0.947, and 0.987 for the Fusion, Aptima, and BioFire tests, respectively, all of which indicate an "almost perfect" level of agreement with the consensus result. In addition, McNemar's chi-square test was performed and showed no significant difference between each assay and the consensus result (Table 2).
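The agreement statistics above can be reproduced from 2×2 counts with a minimal Python sketch. The counts used below are not copied from Table 2; they are inferred from the reported percentages and discordant counts, assuming 75 consensus-positive and 75 consensus-negative samples (with one negative excluded for Fusion). The continuity correction in the McNemar calculation is also an assumption, since the text does not state whether one was applied.

```python
import math


def agreement_stats(tp, fn, fp, tn):
    """Return (PPA, NPA, Cohen's kappa) for one assay versus consensus."""
    n = tp + fn + fp + tn
    ppa = tp / (tp + fn)
    npa = tn / (tn + fp)
    p_observed = (tp + tn) / n
    # Chance agreement from the marginal totals of assay and consensus.
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return ppa, npa, kappa


def mcnemar_p(b, c):
    """McNemar chi-square p value (1 df, continuity corrected; assumed).

    b and c are the two discordant cell counts. By convention here,
    returns 1.0 when there are no discordant pairs.
    """
    if b + c == 0:
        return 1.0
    statistic = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2))


# Counts INFERRED from the reported percentages (see assumptions above).
inferred_counts = {
    "Fusion": dict(tp=74, fn=1, fp=0, tn=74),
    "Aptima": dict(tp=71, fn=4, fp=0, tn=75),
    "BioFire": dict(tp=74, fn=1, fp=0, tn=75),
}

for assay, cells in inferred_counts.items():
    ppa, npa, kappa = agreement_stats(**cells)
    p_value = mcnemar_p(cells["fn"], cells["fp"])
    print(f"{assay}: PPA={ppa:.1%} NPA={npa:.1%} kappa={kappa:.3f} p={p_value:.3f}")
```

Under these assumptions the sketch returns PPAs of 98.7%, 94.7%, and 98.7% and kappa values of 0.987, 0.947, and 0.987, matching the values reported in the text, with McNemar p values above 0.05 for all three assays.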

Initial equivocal or invalid results occurred for three samples out of the sample set: two samples with equivocal results for the BioFire test and one sample with an invalid result for the Fusion assay. The initial BioFire results for the two samples were one of three targets detected (one of the ORF1ab targets); repeat testing gave the same result for both samples, for an overall interpretation of "detected" as described by the manufacturer. The initial invalid result by the Fusion assay repeated as invalid, and this sample was removed from the overall agreement analysis for the Fusion assay; since this sample was negative by both the Aptima and BioFire tests, giving a consensus negative result, this sample was kept in the overall agreement analyses for the Aptima and BioFire tests. There were no discordant results among the prospective, fresh sample set; all of the discordant samples occurred among the retrospective sample set. Of the three NAATs evaluated, the Aptima assay had the most discordant results with the consensus (n = 4), followed by the Fusion and BioFire tests, each with one discordant result. Discordant sample results are shown in Table 3. The four discordant samples for Aptima had Ct values of ≥37.3 by the Fusion assay, indicating lower viral titers in these samples. The discordant BioFire sample had a Ct value by the Fusion assay of 35.7. For the five Fusion/Aptima discordant samples, the BioFire test detected only two of three targets in each sample (samples GSD-3, GSD-6, GSD-23) or detected one of three targets in each sample twice for an overall result of "detected" (GSD-4, GSD-48) (Table 3).

Workflow. Workflow parameters along with basic assay characteristics are presented in Table 4. These additional steps for BioFire account for the increased HoT per specimen, as shown in Table 4.

The present study compared the analytical sensitivities (LODs), clinical performance, and workflows of three SARS-CoV-2 NAATs for 150 NP swab specimens. Our two independent LOD analyses revealed that while all three assays had an LOD that was within 1 dilution factor of each other within a given control material, absolute LODs with quantified, inactivated virus (500 to 1,000 GE/ml) were nominally 4- to 16-fold higher, depending on the assay, than the values obtained when using quantified synthetic RNA (62.5 to 125 copies/ml). This difference in absolute LOD values reflects the inherent difficulty in comparing standards that have been prepared and quantitated differently and that have very high stock concentrations requiring significant dilutions for LOD panel testing (i.e., any quantification error and/or pipetting error of the stock will be magnified in a dilution series). Of interest, the LOD calculations for BioFire using the synthetic RNA standard showed amplification in only two of three gene targets, since ORF1ab is included in the Exact Diagnostics control but ORF8 is not included. Considering this discrepancy, the higher LOD of the BioFire assay relative to the Fusion and Aptima assays when using the quantified synthetic RNA may be due to the lack of ORF8 target material.
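The size of the gap between the two standards can be made explicit with a small arithmetic sketch in Python. The LOD values are those reported in this study; the ratio is only nominal, because it divides GE/ml by copies/ml, two units that were quantified by different methods. The variable and function names are illustrative.

```python
# LODs reported in this study.
LOD_INACTIVATED_VIRUS = {"Fusion": 1000.0, "Aptima": 500.0, "BioFire": 500.0}  # GE/ml
LOD_SYNTHETIC_RNA = {"Fusion": 62.5, "Aptima": 62.5, "BioFire": 125.0}         # copies/ml


def fold_difference(virus_lod, rna_lod):
    """Nominal ratio of the inactivated-virus LOD to the synthetic-RNA LOD."""
    return {assay: virus_lod[assay] / rna_lod[assay] for assay in virus_lod}


folds = fold_difference(LOD_INACTIVATED_VIRUS, LOD_SYNTHETIC_RNA)
print(folds)  # {'Fusion': 16.0, 'Aptima': 8.0, 'BioFire': 4.0}
```

Within either standard the assays differ by no more than one 2-fold dilution step, whereas the same assay differs by 4- to 16-fold between standards, which is why relative rather than absolute LODs are the more reliable basis for comparison.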

Our clinical correlation showed that the Fusion and BioFire assays had similar PPAs (98.7%), while the Aptima assay showed a slight decrease in PPA, at 94.7%, due to four false-negative results among samples in the frozen retrospective set. The differences seen among all three assays and the consensus result were not statistically significant when analyzed using McNemar's chi-square test. Cohen's kappa values also showed almost perfect agreement (range, 0.947 to 0.987) between each assay and the consensus result. NPAs were 100% for all NAATs, suggesting that each of these assays has high specificity.

All discordant results were false negatives compared to the consensus result, with Fusion and BioFire each demonstrating one missed positive and Aptima demonstrating four missed positives. A closer analysis of discordant results showed that the sample missed by the Fusion assay (GSD-23) had only two gene targets detected by BioFire (2a/2e), and the sample missed by the BioFire assay (GSD-7) had a Ct value of 35.7 by the Fusion assay. All four samples missed by the Aptima assay (GSD-3, -4, -6, and -48) had Ct values ranging from 37.3 to 40.5, suggesting that the Aptima assay may be slightly less sensitive than the Fusion and BioFire assays. Of note, two of the specimens missed by the Aptima assay (GSD-4 and GSD-48) were also each equivocal twice by the BioFire assay and resulted as positive per the EUA instructions for use (IFU), suggesting that these were also weak positives by the BioFire assay.

Comparisons of workflows between the Fusion and the Aptima assays showed that they were quite similar, both being optimal in high-volume testing situations (>1,000 tests/24 h), whereas the BioFire assay had an advantage of fast sample-to-answer time, allowing for the faster detection and diagnosis that is useful in urgent situations.

Other comparator studies of the Fusion assay have recently been reported, including the following: a study comparing the Fusion assay to the modified CDC assay, GenMark ePlex, and DiaSorin Simplexa (6); a study comparing Cepheid, ID NOW, and GenMark assays using Fusion as the reference standard (7); a study comparing Fusion, Cepheid, DiaSorin, and cobas assays and a laboratory-developed test (LDT) (8); and a study comparing Fusion to an in-house LDT targeting the envelope gene (9). In general, these reports show that the Fusion SARS-CoV-2 assay is comparable, if not superior, to other molecular platforms available for SARS-CoV-2 testing. In particular, these evaluations showed that the Fusion, DiaSorin, and Cepheid assays tended to have better analytical sensitivity and fewer false negatives in clinical testing than other assays. The low rate of false negatives is a critical performance aspect, as missed positive results allow for further spread of the disease and potential patient mismanagement. These comparison studies, while not clinical trials as would be typical of new FDA-cleared in vitro diagnostic assays, have provided a window into the relative real-world clinical performance characteristics of these assays, which has been a significant knowledge gap since these SARS-CoV-2 EUA molecular assays first became available. Among these knowledge gaps are relative performance data for the Aptima and BioFire tests. To our knowledge, this is the first report comparing both the Aptima and BioFire assays to any other NAAT for SARS-CoV-2 detection.

It is worth noting that the LOD for the Fusion SARS-CoV-2 assay determined in this study was 1,000 GE/ml when inactivated virus was used, whereas the LOD determined in a previous study by our laboratory was 83 ± 36 copies/ml (6). An additional LOD analysis performed as part of this study, once again using the same synthetic quantified RNA standard as in that previously published work, showed a comparable LOD of 62.5 copies/ml. These additional data illustrate the difficulty of comparing absolute LOD values when using different standards, specifically inactivated virus versus synthetic RNA standards.

This study does have limitations, being a single-site study with a limited number of NP swab specimens (n = 150) included in the clinical evaluation. However, this specimen set was selected to be representative of our patient population (including samples selected without bias toward specific age or gender) and positivity rate (50 to 60% in the beginning of April 2020) and included samples with a range of viral loads (low, moderate, high [see Fig. S1 in the supplemental material]). One additional potential limitation is that the Fusion assay was used to select positive results, but any bias introduced by this selection should be minimal, considering that the proportion of positives analyzed mirrored our true positivity rate at the time of this study. Some initial equivocal or invalid results (n = 3) by BioFire and Fusion required retesting, but only one remained unresolved on retesting and was removed from the agreement analysis for the Fusion assay.

In conclusion, our data show that the Fusion, Aptima, and BioFire SARS-CoV-2 NAATs exhibit similar LODs, even when tested using two different quantified standards. In addition, the Fusion and BioFire assays have comparable clinical performance for detection of SARS-CoV-2 in NP swabs, with a PPA of 98.7%, while the Aptima assay showed a slightly lower PPA, at 94.7%. All three assays demonstrated 100% NPA, suggesting high specificity. These performance characteristics, as well as testing volume and workflow requirements, should be considered when making testing platform decisions.

Supplemental material is available online only. SUPPLEMENTAL FILE 1, PDF file, 0.4 MB.

Washington State 2019-nCoV Case Investigation Team. 2020. First case of 2019 novel coronavirus in the United States

COVID-19) situation summary

Coronavirus COVID-19 global cases by the

COVID-19): people who are at higher risk for severe illness

WHO-Joint Mission on Coronavirus Disease. 2020. Report of the WHO-China Joint Mission on Coronavirus Disease

Comparison of four molecular in vitro diagnostic assays for the detection of SARS-CoV-2 in nasopharyngeal specimens

Clinical evaluation of three sample-to-answer platforms for the detection of SARS-CoV-2

Comparison of commercially available and laboratory developed assays for in vitro detection of SARS-CoV-2 in clinical laboratories

Comparison of the Panther Fusion and a laboratory-developed test targeting the envelope gene for detection of SARS-CoV-2

The measurement of observer agreement for categorical data

Interrater reliability: the kappa statistic

R: a language and environment for statistical computing. R Foundation for Statistical Computing

Gregory Berry has previously given education seminars for Hologic, Inc., and BioFire Diagnostics and has received honorariums.