Axell-House, Dierdre B.; Lavingia, Richa; Rafferty, Megan; Clark, Eva; Amirian, E. Susan; Chiao, Elizabeth Y. The Estimation of Diagnostic Accuracy of Tests for COVID-19: A Scoping Review. J Infect, 2020-08-31. DOI: 10.1016/j.jinf.2020.08.043

OBJECTIVES: To assess the methodologies used in the estimation of diagnostic accuracy of SARS-CoV-2 real-time reverse transcription polymerase chain reaction (rRT-PCR) and other nucleic acid amplification tests (NAATs), and to evaluate the quality and reliability of the studies employing those methods.

METHODS: We conducted a systematic search of English-language articles published between December 31, 2019 and June 19, 2020. Studies of any design that performed tests on ≥10 patients and reported or inferred correlative statistics were included. Studies were evaluated using elements of the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) guidelines.

RESULTS: We conducted a narrative and tabular synthesis of studies organized by their reference standard strategy or comparative agreement method, resulting in six categorizations. Critical study details were frequently unreported, including the mechanism for patient/sample selection and researcher blinding to results, which leads to concern for bias.

CONCLUSIONS: Current studies estimating the test performance characteristics of SARS-CoV-2 tests have imperfect study designs and statistical methods. The included studies employ heterogeneous methods and overall have an increased risk of bias. Employing standardized guidelines for study designs and statistical methods will improve the process for developing and validating rRT-PCR and NAATs for the diagnosis of COVID-19.

After its emergence in December 2019, the virus now known as SARS-CoV-2 was identified and sequenced in early January 2020,1 allowing for the rapid development of diagnostic testing based on the detection of viral nucleic acid (i.e., real-time reverse transcription polymerase chain reaction [rRT-PCR]).2 Because infected patients can present with non-specific symptoms or be asymptomatic, the development of accurate diagnostic tests for both clinical and epidemiological purposes was a crucial step in the response to the COVID-19 pandemic.3 In the United States, the spread of SARS-CoV-2 rapidly outpaced the capacity to test for it, prompting the Food and Drug Administration (FDA) to relax regulatory requirements in order to increase testing availability. The FDA granted the first Emergency Use Authorization (EUA) for a SARS-CoV-2 rRT-PCR diagnostic test on February 4, 2020. Consequently, hundreds of tests for SARS-CoV-2, among them rRT-PCRs, other types of nucleic acid amplification tests (NAATs), and automated and/or multiplex methods based on proprietary platforms, obtained EUAs. As of August 4, 2020, the FDA has granted EUAs to 203 diagnostic tests, including 166 molecular tests, 35 antibody assays, and 2 antigen tests. Although the FDA began requiring the submission of validation methods and results as part of the EUA application for SARS-CoV-2 diagnostic tests, these tests were not initially required to undergo the rigorous assessment that would normally be part of the FDA approval process.
Researchers also began developing alternative nucleic acid-based methodologies to detect SARS-CoV-2, including reverse-transcription loop-mediated isothermal amplification (RT-LAMP), among others. Concurrently with rapid test production, publications emerged reporting clinical diagnostic test performance characteristics, such as "sensitivity" and "specificity," though some lacked the rigorous methodologies usually required to formally estimate diagnostic accuracy. Here we present a scoping review of the literature with two main objectives: 1) to assess the methodologies used in the estimation of diagnostic accuracy of SARS-CoV-2 tests and 2) to evaluate the quality and reliability of the studies employing those methods.

Searches were performed through MEDLINE (Ovid), EMBASE (Elsevier), Scopus, Web of Science, CINAHL, and PubMed, following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines,4 for articles published between December 1, 2019 and June 19, 2020. The following search string was used: (2019-nCOV or SARS-CoV-2 or SARS-CoV2 or COVID-19 or COVID19 or COVID) and ("positive agreement" or "negative agreement" or "overall agreement" or "diagnostic accuracy" or "positive rate" or "positivity rate" or "test performance" or "reference standard" or "gold standard" or sensitivity or specificity or "percent agreement" or "concordance" or "test agreement" or "predictive value" or "false negative" or "false positive") and ("polymerase chain reaction" or PCR or "reverse transcriptase" or "nucleic acid amplification test" or NAAT or isothermal or "RT-LAMP" or "RT-PCR" or "molecular test"). The literature hub LitCovid's "Diagnosis" section was screened in its entirety once and then daily for relevant titles. We liberally screened articles by title and abstract for further evaluation.

Articles were included if they met the following criteria on screening: 1) peer-reviewed publication; 2) the study evaluated the diagnostic test accuracy of a NAAT; 3) the diagnostic test was performed on ≥10 patients; and 4) diagnostic/clinical sensitivity, specificity, other correlative statistics, or test positive rate were either identified by name or included in the publication as numerical values whose calculation we could reproduce. Exclusion criteria were: 1) pre-print status; 2) guidelines, consensus, review, opinion, and other summary articles; 3) entirely pregnant or pediatric populations; and 4) overlap of the study population with another included publication.

Four authors independently extracted data and two authors reviewed the data for accuracy. For study characteristics, we extracted: first author name, country, study design, patient population, total number of patients or samples included in test performance calculations, and number of cases according to rRT-PCR (Tables 1-5) or total number of cases based on a positive result on any platform tested (Table 6). For patient characteristics, we extracted age and sex. For index test and reference standard characteristics, we extracted: test type (NAAT) or definition (clinical diagnosis, composite reference standards), specimen (NAAT), dry versus collection-liquid specimen status (for studies evaluating Abbott ID NOW), proprietary automated and/or multiplex systems, henceforth called "platforms" (NAAT), and target genes of primers (NAAT). For outcomes, we extracted the values of test performance characteristics with their designation according to the original authors, without our interpretation.
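For readers who want to adapt the search, the Boolean string above can be assembled programmatically. The snippet below is an illustrative reconstruction from the three term groups reported in the text; it mirrors the published string rather than any particular database's exact syntax.

```python
# Illustrative reconstruction of the published Boolean search string from its three term groups.
# The grouping and terms come from the text; the exact operators accepted by each database may differ.

virus_terms = ["2019-nCOV", "SARS-CoV-2", "SARS-CoV2", "COVID-19", "COVID19", "COVID"]
accuracy_terms = [
    '"positive agreement"', '"negative agreement"', '"overall agreement"',
    '"diagnostic accuracy"', '"positive rate"', '"positivity rate"', '"test performance"',
    '"reference standard"', '"gold standard"', "sensitivity", "specificity",
    '"percent agreement"', '"concordance"', '"test agreement"', '"predictive value"',
    '"false negative"', '"false positive"',
]
test_terms = [
    '"polymerase chain reaction"', "PCR", '"reverse transcriptase"',
    '"nucleic acid amplification test"', "NAAT", "isothermal", '"RT-LAMP"',
    '"RT-PCR"', '"molecular test"',
]

def or_group(terms):
    """Join a list of terms into a single parenthesized 'or' group."""
    return "(" + " or ".join(terms) + ")"

# Combine the three groups with 'and', matching the structure of the published string.
query = " and ".join(or_group(group) for group in (virus_terms, accuracy_terms, test_terms))
print(query)
```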
For this reason, we indicate these outcomes as "reported" (r): reported sensitivity (rSN), specificity (rSP), positive predictive value (rPPV), negative predictive value (rNPV), accuracy (rAcc), positive percent agreement (rPPA), negative percent agreement (rNPA), overall agreement (rOA), and kappa coefficient. Additionally, we extracted "positive rate," a non-standard term used by the included studies to refer to the number of positive NAATs in a population of patients suspected to have COVID-19 (Table 1), or to the number of positive samples in a total population of positive samples after repeat testing (Table 2). We constructed 2 x 2 contingency tables and reproduced the test performance characteristic calculations to demonstrate how the original authors obtained the values (Supplementary Table 1). We report additional pertinent study data in Supplementary Table 2: enrollment dates, number of sites of enrollment, symptomatic status, and chest radiology status. No articles were excluded on the basis of quality, in order to present the most comprehensive summary of the currently available evidence.

We presented the extracted data in tabular form, mirrored by a descriptive synthesis,5 in two broad categories: diagnostic accuracy studies for rRT-PCR (Tables 1-3), and diagnostic accuracy or comparative agreement studies of two NAATs (Tables 4-6). Tables are thematically divided based on the reference standard strategy or the approach to obtaining comparative agreement measures. Diagnostic accuracy studies for rRT-PCR were arranged alphabetically in tables by first author last name (Tables 1-3). Diagnostic accuracy and comparative agreement studies for two NAATs were arranged by decreasing order of studies per methodology, then alphabetically by methodology or platform (Tables 4-6) for easy comparison. Due to significant diversity in methods and reporting of results, we reported grouped summary data for study characteristics, patient characteristics, and outcomes.

We used the framework of the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2)6 to evaluate our selected articles (Supplementary Table 3). We collected data, or noted their absence, for a narrative description of risk of bias and concerns of applicability based on the QUADAS domains. For assessment of bias in patient selection, we evaluated author conflicts of interest, study design type, inclusion/exclusion criteria, method of patient enrollment, and reporting of patient demographics and characteristics (i.e., symptomatic status). For assessment of bias in the reference standard and index test, we evaluated the accuracy of the reference standard, the description of duration of symptoms at the time of testing, whether the threshold to determine a positive test was prespecified, and researcher blinding to reference standard and index test results. For assessment of bias in flow and timing, we evaluated whether the reference standard was the same for all patients, the sequence and timing of the performance of the reference standard and index test, whether test performance characteristics were calculated based on sample numbers or patient numbers, and whether indeterminate or invalid results were included in test performance calculations.

Our search yielded 1537 articles, with 816 unique articles after deduplication. After screening titles and abstracts, 130 articles underwent full-text evaluation. Ultimately, 49 articles were included in our review (Figure 1).
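As a concrete reference for the metrics listed above (rSN, rSP, rPPV, rNPV, rAcc, rPPA, rNPA, rOA, and kappa), the following is a minimal sketch of how such values can be reproduced from a 2 x 2 contingency table, in the spirit of Supplementary Table 1. The function name and counts are hypothetical and are not drawn from any included study.

```python
# Minimal sketch (hypothetical counts): reproducing diagnostic accuracy / agreement metrics
# from a 2 x 2 table of index test results against a comparator treated as the reference.

def performance(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute common test performance characteristics from a 2x2 contingency table."""
    total = tp + fp + fn + tn
    sens = tp / (tp + fn)              # sensitivity / positive percent agreement (PPA)
    spec = tn / (tn + fp)              # specificity / negative percent agreement (NPA)
    ppv  = tp / (tp + fp)              # positive predictive value
    npv  = tn / (tn + fn)              # negative predictive value
    acc  = (tp + tn) / total           # overall agreement / accuracy
    # Cohen's kappa: chance-corrected agreement between index test and comparator
    p_yes = ((tp + fp) / total) * ((tp + fn) / total)
    p_no  = ((fn + tn) / total) * ((fp + tn) / total)
    p_e   = p_yes + p_no
    kappa = (acc - p_e) / (1 - p_e)
    return {"sensitivity/PPA": sens, "specificity/NPA": spec, "PPV": ppv,
            "NPV": npv, "overall agreement": acc, "kappa": kappa}

# Hypothetical example: 90 true positives, 2 false positives, 10 false negatives, 98 true negatives.
print(performance(tp=90, fp=2, fn=10, tn=98))
```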
Three studies, with 19 to 1014 patients, report a "positive rate" as the number of positive rRT-PCRs out of the number of suspected cases of COVID-19, with a range of 38.42% to 59% (Table 1).7-9 The studies do not report these values as "sensitivity" directly; however, two of the studies expressed concern that their low calculated positive rates (38.42% and 47.4%, respectively) were indicative of a failure of rRT-PCR to diagnose COVID-19.8,9 In terms of quality assessment, the studies lack specific details as to how patients were classified as having suspected COVID-19 infection. The accuracy of clinical diagnosis based on case definitions is unclear but is likely not ideal for diagnosis. Additionally, the duration of symptoms at the time of clinical diagnosis or rRT-PCR testing was not provided (Supplementary Table 3).

Eight studies, with a range of 36 to 22,061 patients per study, attempted to determine the accuracy of rRT-PCR by comparing the initial rRT-PCR result to the result after multiple repeated samples from the patient were submitted for rRT-PCR testing, which was treated as the reference standard (Table 2).10-17 Three studies reported this value as a "positive rate," ranging from 51.25% to 88%,10,14,17 and five reported sensitivity, with a range of 57.9% to 94.6%.11-13,15,16 Of these studies, only He et al included an rSP of 100%, calculated from patients who remained negative for SARS-CoV-2 after repeated sample testing (Supplementary Table 1).13 Green et al included patients in their study regardless of whether they were tested once or multiple times, using data from these subsets of patients to make assumptions for estimating clinical test characteristics. In addition, this study conducted multiple different NAATs and rRT-PCRs on patients, whereas other studies employing this strategy used only one type of NAAT. The authors also do not clarify whether patients who had repeat SARS-CoV-2 tests were consistently tested with the same NAAT/rRT-PCR test or a different one. They calculated test performance characteristics differently from the other studies: two estimates of sensitivity were produced, one in which the rate of false negatives for single-tested patients was assumed to be 0%, and one in which the "false negative" rate was assumed to be the same as that observed among repeat-tested patients in their study, approximately 16.8%.12 However, the details of how they calculated test characteristics were not presented. To clarify the two assumptions made in the calculations, we reproduced both estimates (Supplementary Table 1); an illustrative sketch is also given below.

In terms of quality assessment, most of the studies used a non-cohort design, and six consisted only of patients who were determined to have COVID-19 by rRT-PCR, i.e., cases only (Table 2).10,11,14-16 Five of the studies required patients to have had a well-performed chest CT or chest X-ray, inclusion criteria that excluded patients who would otherwise have been pertinent to the study of test diagnostic accuracy (Supplementary Table 3).10,11,13,16,17 The studies involved repeating rRT-PCR several times to form a reference standard, but each patient received a different number of repeat tests over a different time period, so in effect each patient received a different reference standard.
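To make the two bounding assumptions attributed to Green et al concrete, the sketch below works through one plausible reading of that description using invented counts; it is not the study's actual data or computation.

```python
# Hypothetical illustration of the two bounding assumptions described for repeat-testing studies
# (counts are invented, not taken from Green et al or any other included study).

# Repeat-tested patients: initially negative but positive on a later test are treated as "false negatives".
repeat_initial_pos = 500      # repeat-tested patients whose first rRT-PCR was positive
repeat_later_pos   = 100      # repeat-tested patients positive only on a later rRT-PCR
fn_rate_repeat = repeat_later_pos / (repeat_initial_pos + repeat_later_pos)

single_tested_pos = 1000      # patients tested only once, with a positive result

# Assumption 1: single-tested negatives contain no missed cases (false-negative rate of 0%).
sens_optimistic = (repeat_initial_pos + single_tested_pos) / (
    repeat_initial_pos + single_tested_pos + repeat_later_pos)

# Assumption 2: single-tested patients have the same false-negative rate as repeat-tested patients,
# so some of their negative results are assumed to be missed true cases.
missed_single = single_tested_pos * fn_rate_repeat / (1 - fn_rate_repeat)
sens_pessimistic = (repeat_initial_pos + single_tested_pos) / (
    repeat_initial_pos + single_tested_pos + repeat_later_pos + missed_single)

print(f"false-negative rate among repeat-tested patients: {fn_rate_repeat:.1%}")
print(f"sensitivity under assumption 1: {sens_optimistic:.1%}")
print(f"sensitivity under assumption 2: {sens_pessimistic:.1%}")
```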
One study tracked negative-to-positive conversion over 1 to 49 days,12 and another over 1 to 14 days,13 raising the concern that a patient could have become infected in the time between the initial test and the final test, confounding the results. One study counted invalid results as negative and indeterminate results as positive when calculating test performance characteristics;12 otherwise, the rationale for and handling of invalid and indeterminate results were not reported in these studies.

Three studies determined the accuracy or agreement of rRT-PCR or automated rRT-PCR platforms/instruments compared to a reference standard based on the results of several tests, a "composite reference standard" (Table 3).18-20 There were between 58 and 184 patients per study. Suo et al considered a positive result on either repeated rRT-PCR or serology to indicate a positive result according to the reference standard; the reported sensitivity of the initial rRT-PCR result was 40%, rSP 100%, rPPV 100%, and rNPV 16%.19 Zhen et al compared rRT-PCR performed according to the US CDC protocol to a composite reference standard in which the consensus result of 3 or more of 4 molecular assays was considered the correct result. The rRT-PCR had an rPPA of 100%, an rNPA of 98%, and a Cohen's kappa coefficient of 0.98.20 Cradic et al did not study rRT-PCR but studied three automated molecular assays and used a composite reference standard of the consensus result of two or more of the three assays. While Abbott ID NOW had an rPPA of 91%, the Roche cobas 6800 and DiaSorin Simplexa assays each had an rPPA of 100%.18 These studies either did not report how samples were selected for evaluation (Supplementary Table 3),19,20 or reported that only samples with sufficient residual volume that had been properly stored were selected.18 In the study by Suo et al, part of the reference standard testing was performed some days after the initial test, after the patient had been discharged from the hospital, leading to potential exposure for initial infection or reinfection.19

The performance of other nucleic acid amplification test methods compared to standard rRT-PCR

Fourteen studies compared other nucleic acid amplification test methods for detecting SARS-CoV-2 to rRT-PCR (Table 4). Suo et al evaluated digital droplet polymerase chain reaction (ddPCR), with rSN 94%, rSP 100%, rPPV 100%, rNPV 63%, and rAcc 95%.19 Bulterys et al evaluated an isothermal amplification method, with rSN 82.8% and Cohen's kappa 0.86.31 Wang, Cai, and Zhang et al evaluated one-step single-tube nested quantitative polymerase chain reaction (OSN-qRT-PCR), with a Cohen's kappa of 0.737.32

Regarding evaluation of quality (Supplementary Table 3), the majority of studies did not report how patient samples were selected for evaluation.21,22,24,25,27-30,32 In the study conducted by Bulterys et al, sample selection was a convenience selection of samples with residual volume that had been stored correctly.31 Most studies did not report the symptomatic status of the patients21-28,30-32 or patient demographics.21-25,27-32 Problematically, many of the studies did not report when the reference standard was conducted on the patient samples relative to the index test, or whether actions that could potentially alter test results (such as freeze/thaw cycles) occurred between the reference standard and the index test.21-24,27,28,31
Four studies calculated test performance characteristics based on the number of samples rather than the number of patients.23,27,30,32 The management of indeterminate and invalid test results went largely unreported.21-25,27,30

The performance of NAAT platforms compared to rRT-PCR as the reference standard

Fifteen studies compared automated NAAT platforms to various rRT-PCR assays to determine test performance characteristics (Table 5). Other studies evaluated AusDiagnostics (rSN 100%, rSP 92.16%),43 Hologic Panther Fusion (rPPA 98.7%, rNPA 98.1%),44 Luminex NxTAG (rSN 97.8%, rSP 100%),45 Mesa Biotech Accula (rPPA 68.0%, rNPA 100%),46 and QIAstat-Dx (rSN 100%, rSP 93%)47 compared to rRT-PCR. With regard to quality evaluation (Supplementary Table 3), most studies did not report the method of sample collection or patient recruitment,33,35,37,41-47 and four studies conducted a convenience selection of samples, including enrichment for positive samples.34-36,39 Eight studies conducted test performance calculations on sample numbers instead of patient numbers.33,35,36,38,39,42-44 Four studies calculated test performance characteristics with indeterminate or inconclusive results counted as "positive,"35,38,39,42 and the management of indeterminate/inconclusive as well as invalid results went unreported in an additional three studies.33,37,47 No study reported blinding of researchers to the reference standard or index test results.

Ten studies, containing between 15 and 524 patients per study, evaluated the agreement between two different types of NAAT platforms (Table 6). In these studies, some platforms were identified as the "comparator" or "reference" platform, including Cepheid Xpert Xpress,49 Abbott RealTime,34,48 Hologic Panther Fusion,50,51 and Roche cobas 6800,52,55 and these are listed as "Platform #1" in Table 6. Three studies did not identify any studied platform as the "comparator" or "reference standard," and instead reported only general, non-directional measures of agreement such as overall agreement or Cohen's kappa; alternatively, their calculations of PPA and NPA were identical regardless of which platform was treated as the comparator (Supplementary Table 1).39,53,54 Regarding quality evaluation (Supplementary Table 3), the samples used for calculating test performance characteristics were reported to be selected for enrichment of positive samples,34,39 selected for diversity of viral load,52,54 otherwise curated,50 or the method of selecting samples was unreported.49,51,53,55 The symptomatic status of patients was largely unreported.39,49-55 Five studies included samples for which one test was conducted, followed by interim freezing, cooling, or other storage, before performance of the second test.39,50-52,54 Two studies did not report the sequence of testing on the two platforms or the interim handling or storage of the samples.49,53 Researcher blinding to either platform's results was not reported in any study.

In our scoping review of 49 articles concerning test performance characteristics of rRT-PCR and other NAATs used for the diagnosis of COVID-19, we observed several overarching themes. Clinical diagnosis by the case definition for COVID-19 used in the early period of the pandemic does not correlate well with positive rates of COVID-19 rRT-PCR (Table 1).
The result of the initial rRT-PCR performed on a patient, if negative, may not be reflective of the result after multiple repeated rRT-PCRs for that patient (Table 2). Several alternative NAAT methods, many of which are easier or faster to perform, may be comparable to standard rRT-PCR (Table 4). Proprietary multiplex, automated, and/or point-of-care methods are comparable in accuracy to rRT-PCR (Table 5) and to each other (Table 6), although the Abbott ID NOW SARS-CoV-2 test appears to have lower comparative agreement with other platforms.34,48-52 These findings should be viewed cautiously, as the SARS-CoV-2 tests in these studies have not undergone the rigorous evaluation necessary for FDA approval, due to the emergency state generated by the COVID-19 pandemic. In addition, during our scoping review, we found substantial heterogeneity among available studies in terms of test types, reference standards, metrics, and details of study design and methodology.

We categorized the included studies by four different reference standard strategies: clinical diagnosis/case definitions (Table 1), repeated index testing (Table 2), composite reference standard (Table 3), and rRT-PCR (Tables 4 and 5). Additionally, we identified a fifth category, in which, instead of using a reference standard, comparative agreement between two NAAT platforms was calculated (Tables 5 and 6).

The main limitation of the first group of studies (Table 1) was the use of a "case definition" as the reference standard to report a "positive rate" of rRT-PCR. During novel disease outbreaks, standard case definitions are often developed to assist clinicians in case identification before a diagnostic test is available. Unfortunately, the studies included in this group were unable to use a clear case definition; instead, they refer to a population of "suspected cases," for which the definition is not reported.7-9 Because this group enrolled patients in China prior to February 15, 2020, during the period in which the Chinese National Guideline for Diagnosis and Treatment of COVID-19 (NGDTC) published five different versions of the COVID-19 case definition, the case definitions in use at the time of these studies varied.56 A recent study estimated that if a single guideline (specifically, version 5 of the NGDTC) had been used to identify cases from the beginning of the outbreak to February 20, 2020, there would have been more than three times as many identified cases in Hubei province.56 This is relevant to our review because the two largest studies that evaluated the rRT-PCR positive rate of patients with a clinical diagnosis of COVID-19 took place in Wuhan, Hubei province, and included patients evaluated before February 14, 20207,8 (Supplementary Table 2). This sensitivity of case counts to the case definition in use complicates the legitimacy and interpretation of the "positive rate" of rRT-PCR referred to in these studies.

The second group assessed rRT-PCR test performance characteristics via repeated index rRT-PCR testing (Table 2). Most studies in this group reported "sensitivity" by dividing the number of participants with positive baseline rRT-PCRs by the total number of participants who eventually had a positive rRT-PCR after repeated measurements. While such an approach may have some advantages over the use of a case definition alone as a reference standard, this strategy is, nonetheless, an imperfect solution with its own set of inherent limitations.
SARS-CoV-2 infection is transient, and the associated viral loads are time-varying because of the natural pathophysiology of the infection. Therefore, the time interval between each repeated test becomes crucially important, and even relatively small time differences (and/or the lack of uniformly used intervals) could complicate the interpretation of re-test results and their quality as reference standards. Furthermore, repeated use of the same test as a reference standard for itself does not eliminate the inaccuracies or limitations of the test. Such comparisons ultimately reflect the reliability of the test (assuming a short, uniform time interval between tests), rather than providing a true view of test accuracy.

The third group of three studies calculated test performance characteristics of rRT-PCR according to a composite reference standard (Table 3). Using arbitrary rules to combine multiple different and imperfect tests inevitably creates a reference standard with some degree of bias.57 Furthermore, all three studies in this group included the test under evaluation as part of the composite reference standard, which leads to additional bias, as described below.58 Use of a biased composite standard is likely to lead to reduced sensitivity, among other errors affecting true test performance characteristics.59

The fourth group of studies evaluated SARS-CoV-2 diagnostic tests that are under development, as well as proprietary testing platforms (most of which are based on standard rRT-PCR methods). These studies used traditional rRT-PCR as a reference standard; results are summarized in Tables 4 and 5, respectively. Importantly, while these studies were not designed to estimate the accuracy of rRT-PCR, their results indicate that the index tests did not identify significantly more positive samples than rRT-PCR. Finally, the last group of studies compared SARS-CoV-2 NAAT platforms (Table 6). These comparative accuracy studies examined the agreement between two non-reference standard tests. Although most of the testing platforms evaluated in these studies were based on standard rRT-PCR, the agreement between two non-reference standard tests is not equivalent to test accuracy, as mentioned previously.

This scoping review is limited by the lack of reporting of several key study features in the majority of the articles evaluated, which is an important indicator of quality and potential bias. Based on the QUADAS-2 criteria, most of the included studies raised concern for bias (Supplementary Table 3). The most prominent concerns were unclear inclusion/exclusion criteria, unclear methods of enrollment/selection of patients and samples, and unclear handling of indeterminate/inconclusive and invalid results. Additionally, many of the studies were conducted with a so-called "two-gate" (case-control) design, in which cases and controls were known and selected ahead of time, rather than performing the test on a group of patients or samples with suspected COVID-19. These factors likely introduce bias that significantly confounds the results of the studies; thus, the accuracy of the tests in other settings with different prevalences (such as asymptomatic screening or other age groups) may not be truly generalizable. Furthermore, few studies were able to evaluate both the index and reference tests simultaneously or within a short period of time, which is key to avoiding biases caused by changes in the patient's true disease status; this bias can also affect the diagnostic accuracy of the index test.
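To illustrate the composite reference standard strategy used by the third group (for example, a consensus of at least 3 of 4 assays, as described for Zhen et al), the sketch below uses invented sample-level results. Because the index assay also contributes to the composite, the example embodies the concern noted above about including the test under evaluation in its own reference standard.

```python
# Hypothetical sketch of a consensus composite reference standard (positive if >=3 of 4 assays
# are positive) and the PPA/NPA of one index assay against that composite.
# The sample results below are invented for illustration only.

samples = [
    # each row: results of four assays (True = target detected); the first assay is also the index test
    (True,  True,  True,  True),
    (False, False, False, False),
    (True,  True,  True,  False),
    (False, True,  True,  True),   # index test misses a composite-positive sample
    (False, False, True,  False),
]

def composite(results, threshold=3):
    """Composite reference: positive if at least `threshold` of the assays are positive."""
    return sum(results) >= threshold

tp = fp = fn = tn = 0
for results in samples:
    index_positive = results[0]
    ref_positive = composite(results)
    if index_positive and ref_positive:
        tp += 1
    elif index_positive and not ref_positive:
        fp += 1
    elif not index_positive and ref_positive:
        fn += 1
    else:
        tn += 1

ppa = tp / (tp + fn)   # positive percent agreement with the composite standard
npa = tn / (tn + fp)   # negative percent agreement with the composite standard
print(f"PPA = {ppa:.0%}, NPA = {npa:.0%}")
```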
The best approach to determining diagnostic test performance characteristics in the absence of a "gold" standard is an open question in diagnostic accuracy methodology. While many methods have been described, there are only a few well-defined statistical approaches that use a reference standard in lieu of a gold standard, reviewed elsewhere.60 Latent class analysis is one commonly used approach in situations in which neither the true error rates of the reference standard nor the true prevalence of the disease is known. This approach uses the results of a set of imperfect tests to estimate parameters related to sensitivity, specificity, and prevalence, often using maximum likelihood methods. However, this is not the only method available, and every method has its own strengths and limitations.57 Therefore, studies that attempt to estimate test characteristics should interpret their results carefully, accounting for and clarifying the inherent limitations of assessing accuracy-related metrics when a gold standard is unavailable.

Evaluation of the performance characteristics of SARS-CoV-2 diagnostic tests is vital to controlling the ongoing COVID-19 pandemic. While more than 200 SARS-CoV-2 diagnostic tests have received FDA EUAs, we have found in this scoping review that the performance of few of these tests has been assessed appropriately. The lack of robust test performance evaluation that we noted in many studies published to date is undoubtedly due in part to the critical need for tests, which resulted in accelerated test development. However, our scoping review also uncovered imperfect methods for estimating diagnostic test performance in the absence of a gold standard and demonstrated that the accuracy of these tests should be interpreted with caution. Future studies would benefit from employing statistical methods, such as latent class analysis and the other methods referenced above, to accurately analyze their data. Indeed, instituting national requirements for test performance analysis and reporting, perhaps based on the existing FDA guidance on diagnostic tests,61 would advance the goal of standardizing the evaluation of SARS-CoV-2 diagnostic test performance. Such an initiative would lead to statistically robust conclusions regarding the accuracy of the index test, which would in turn support hospitals and clinicians as they determine the optimal test to use for COVID-19 diagnosis.

Table 2. Abbreviations: AIGS: automatic integrated gene detection system, BAL: bronchoalveolar lavage, CI: confidence interval, ddPCR: digital droplet polymerase chain reaction, E: envelope, iAMP: isothermal amplification, IQR: interquartile range, κ: kappa statistic, n/a: not applicable, N: nucleocapsid, No.: number, NPS: nasopharyngeal swab, nr: not reported, OPS: oropharyngeal swab, ORF1ab: open reading frame 1ab, OSN-qRT-PCR: one-step single-tube nested quantitative real-time polymerase chain reaction, rAcc: reported accuracy, RdRp: RNA-dependent RNA polymerase, Ref Stnd: reference standard, rNPA: reported negative percent agreement, rNPV: reported negative predictive value, rOA: reported overall agreement, rPPA: reported positive percent agreement, rPPV: reported positive predictive value, rRT-PCR: real-time reverse transcription polymerase chain reaction, rSN: reported sensitivity, rSP: reported specificity, RT-LAMP: reverse transcription loop-mediated isothermal amplification, RT-RAA: reverse-transcription recombinase-aided amplification, S: spike, y: years.
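As an illustration of the latent class approach discussed above, the following minimal sketch simulates three imperfect, hypothetical tests and recovers prevalence, sensitivities, and specificities with an EM algorithm under the (strong) assumption of conditional independence given true infection status. It is not an analysis of any data from the included studies.

```python
# Minimal sketch of latent class analysis for diagnostic tests without a gold standard,
# fit by an EM algorithm assuming conditional independence of tests given true status.
# All data are simulated; nothing here comes from the studies reviewed above.
import random

random.seed(0)

# --- simulate results of 3 imperfect tests on 2000 subjects ---
true_prev, true_sens, true_spec = 0.30, [0.85, 0.75, 0.90], [0.99, 0.98, 0.97]
subjects = []
for _ in range(2000):
    diseased = random.random() < true_prev
    row = [int(random.random() < (s if diseased else 1 - sp))
           for s, sp in zip(true_sens, true_spec)]
    subjects.append(row)

# --- EM estimation of prevalence, sensitivities, and specificities ---
prev, sens, spec = 0.5, [0.7] * 3, [0.9] * 3
for _ in range(200):
    # E-step: posterior probability that each subject is truly positive
    post = []
    for row in subjects:
        p_pos, p_neg = prev, 1 - prev
        for r, s, sp in zip(row, sens, spec):
            p_pos *= s if r else (1 - s)
            p_neg *= (1 - sp) if r else sp
        post.append(p_pos / (p_pos + p_neg))
    # M-step: update parameters from the posterior weights
    prev = sum(post) / len(post)
    pos_mass = sum(post)
    neg_mass = len(post) - pos_mass
    for j in range(3):
        sens[j] = sum(p * row[j] for p, row in zip(post, subjects)) / pos_mass
        spec[j] = sum((1 - p) * (1 - row[j]) for p, row in zip(post, subjects)) / neg_mass

print(f"estimated prevalence: {prev:.2f}")
print("estimated sensitivities:", [round(s, 2) for s in sens])
print("estimated specificities:", [round(s, 2) for s in spec])
```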
References

1. A Novel Coronavirus from Patients with Pneumonia in China
2. Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding
3. Presymptomatic Transmission of SARS-CoV-2 - Singapore
4. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement
5. Synthesis without meta-analysis (SWiM) in systematic reviews: reporting guideline
6. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies
7. Correlation of Chest CT and RT-PCR Testing in Coronavirus Disease 2019 (COVID-19) in China: A Report of 1014 Cases
8. Positive rate of RT-PCR detection of SARS-CoV-2 infection in 4880 cases from one hospital in
9. Comparison of different samples for 2019 novel coronavirus detection by nucleic acid amplification tests
10. Chest CT Findings in Coronavirus Disease-19 (COVID-19): Relationship to Duration of Infection
11. Sensitivity of Chest CT for COVID-19: Comparison to RT-PCR
12. Clinical Performance of SARS-CoV-2 Molecular Testing
13. Diagnostic performance between CT and initial real-time RT-PCR for clinically suspected 2019 coronavirus disease (COVID-19) patients outside Wuhan, China
14. Testing for SARS-CoV-2: Can We Stop at Two? Clin Infect Dis
15. Diagnosis of the Coronavirus disease (COVID-19): rRT-PCR or CT?
16. Frequency and Distribution of Chest Radiographic Findings in COVID-19 Positive Patients
17. Detection and analysis of nucleic acid in various biological samples of COVID-19 patients
18. Clinical Evaluation and Utilization of Multiple Molecular In Vitro Diagnostic Assays for the Detection of SARS-CoV-2
19. ddPCR: a more accurate tool for SARS-CoV-2 detection in low viral load specimens
20. Comparison of Four Molecular In Vitro Diagnostic Assays for the Detection of SARS-CoV-2 in Nasopharyngeal Specimens
21. Development of a reverse transcription-loop-mediated isothermal amplification as a rapid early-detection method for novel SARS-CoV-2
22. Evaluation of rapid diagnosis of novel coronavirus disease (COVID-19) using loop-mediated isothermal amplification
23. Real-time reverse transcription loop-mediated isothermal amplification for rapid detection of SARS-CoV-2
24. A Novel Reverse Transcription Loop-Mediated Isothermal Amplification Method for Rapid Detection of SARS-CoV-2
25. Rapid and visual detection of 2019 novel coronavirus (SARS-CoV-2) by a reverse transcription loop-mediated isothermal amplification assay
26. Multiple-centre clinical evaluation of an ultrafast single-tube assay for SARS-CoV-2 RNA
27. A Reverse-Transcription Recombinase-Aided Amplification Assay for Rapid Detection of the 2019 Novel Coronavirus (SARS-CoV-2)
28. Multiplexing primer/probe sets for detection of SARS-CoV-2 by qRT-PCR
29. Triplex Real-Time RT-PCR for Severe Acute Respiratory Syndrome Coronavirus 2
30. Development of an automatic integrated gene detection system for novel Severe acute respiratory syndrome-related coronavirus (SARS-CoV-2)
31. Comparison of a laboratory-developed test targeting the envelope gene with three nucleic acid amplification tests for detection of SARS-CoV-2
32. Novel One-Step Single-Tube Nested Quantitative Real-Time PCR Assay for Highly Sensitive Detection of SARS-CoV-2
33. Evaluation of the COVID19 ID NOW EUA assay
34. Comparison of two commercial molecular tests and a laboratory-developed modification of the CDC 2019-nCoV RT-PCR assay for the detection of SARS-CoV-2
35. Comparison of Abbott ID Now, Diasorin Simplexa, and CDC FDA EUA methods for the detection of SARS-CoV-2 from nasopharyngeal and nasal swabs from individuals diagnosed with COVID-19
36. Validation and verification of the Abbott RealTime SARS-CoV-2 assay analytical and clinical performance
37. Multi-Center Evaluation of the Cepheid Xpert Xpress SARS-CoV-2 Assay for the Detection of SARS-CoV-2 in Oropharyngeal Swab Specimens
38. Comparison of Commercially Available and Laboratory Developed Assays for in vitro Detection of SARS-CoV-2 in Clinical Laboratories
39. Multicenter Evaluation of the Cepheid Xpert Xpress SARS-CoV-2 Test
40. Rapid and sensitive detection of SARS-CoV-2 RNA using the Simplexa™ COVID-19 direct assay
41. Clinical Evaluation of the cobas SARS-CoV-2 Test and a Diagnostic Platform Switch during 48 Hours in the Midst of the COVID-19 Pandemic
42. Comparison of SARS-CoV-2 detection from nasopharyngeal swab samples by the Roche cobas 6800 SARS-CoV-2 test and a laboratory-developed real-time RT-PCR test
43. Interpret with caution: An evaluation of the commercial AusDiagnostics versus in-house developed assays for the detection of SARS-CoV-2 virus
44. Comparison of the Panther Fusion and a laboratory-developed test targeting the envelope gene for detection of SARS-CoV-2
45. Clinical performance of the Luminex NxTAG CoV Extended Panel for SARS-CoV-2 detection in nasopharyngeal specimens of COVID-19 patients in Hong Kong
46. Comparison of the Accula SARS-CoV-2 Test with a Laboratory-Developed Assay for Detection of SARS-CoV-2 RNA in Clinical Nasopharyngeal Specimens
47. Evaluation of the QIAstat-Dx Respiratory SARS-CoV-2 Panel, the first rapid multiplex PCR commercial assay for SARS-CoV-2 detection
48. Comparison of Abbott ID Now and Abbott m2000 methods for the detection of SARS-CoV-2 from nasopharyngeal and nasal swabs from symptomatic patients
49. Performance of Abbott ID NOW COVID-19 rapid nucleic acid amplification test in nasopharyngeal swabs transported in viral media and dry nasal swabs, in a New York City academic institution
50. Five-minute point-of-care testing for SARS-CoV-2: Not there yet
51. Clinical Evaluation of Three Sample-To-Answer Platforms for the Detection of SARS-CoV-2
52. Comparison of Cepheid Xpert Xpress and Abbott ID Now to Roche cobas for the Rapid Detection of SARS-CoV-2
53. The Detection of SARS-CoV-2 using the Cepheid Xpert Xpress SARS-CoV-2 and Roche cobas SARS-CoV-2 Assays
54. Comparison of Two High-Throughput Reverse Transcription-Polymerase Chain Reaction Systems for the Detection of Severe Acute Respiratory Syndrome Coronavirus 2
55. Clinical evaluation of a SARS-CoV-2 RT-PCR assay on a fully automated system for rapid on-demand testing in the hospital setting
56. Effect of changing case definitions for COVID-19 on the epidemic curve and transmission parameters in mainland China: a modelling study
57. Evaluation of diagnostic tests when there is no gold standard. A review of methods
58. Using a combination of reference tests to assess the accuracy of a new diagnostic test
59. Value of composite reference standards in diagnostic research
60. Diagnostic test evaluation methodology: A systematic review of methods employed to evaluate diagnostic tests in the absence of gold standard - An update
61. Food and Drug Administration CfDaRH. Statistical Guidance on Reporting Results from Studies Evaluating Diagnostic Tests