key: cord-0731284-rbhv7ezx authors: Brooks, Zoe C; Das, Saswati title: COVID-19 Testing: Impact of Prevalence, Sensitivity, and Specificity on Patient Risk and Cost date: 2020-08-28 journal: Am J Clin Pathol DOI: 10.1093/ajcp/aqaa141 sha: 12c3d535ca2d1052a4266707d4e9af83dcda1d9e doc_id: 731284 cord_uid: rbhv7ezx OBJECTIVES: To illustrate how patient risk and clinical costs are driven by false-positive and false-negative results. METHODS: Molecular, antigen, and antibody testing are the mainstay for identifying infected patients and fighting the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). To evaluate the test methods, sensitivity (percent positive agreement [PPA]) and specificity (percent negative agreement [PNA]) are the most commonly used metrics, followed by the positive and negative predictive values, that is, the probability that a positive or negative test result represents a true positive or negative patient. The number, probability, and cost of false results are driven by combinations of prevalence, PPA, and PNA of the individual test selected by the laboratory. RESULTS: Molecular and antigen tests that detect the presence of the virus are relevant in the acute phase only. Serologic assays detect antibodies to SARS-CoV-2 in the recovering and recovered phase. Each testing methodology has its advantages and disadvantages. CONCLUSIONS: We demonstrate the value of reporting probability of false-positive results, probability of false-negative results, and costs to patients and health care. These risk metrics can be calculated from the risk drivers of PPA and PNA combined with estimates of prevalence, cost, and Reff number (people infected by 1 positive SARS-CoV-2 carrier). False-positive and -negative test results are driven by the risk drivers of prevalence of SARS-CoV-2 (or of antibodies) in the test population, percent positive agreement (PPA; sensitivity), and percent negative agreement (PNA; specificity) of each test process.
The gold standard at present for diagnosing suspected cases of COVID-19 is molecular testing, such as real-time reverse transcription polymerase chain reaction (RT-PCR), which is a nucleic acid amplification test that detects unique sequences of SARS-CoV-2. 5 Antigen tests that also detect the presence of SARS-CoV-2 do not amplify viral components and are less sensitive (more likely to produce a false-negative result) than molecular tests. Negative antigen tests should be confirmed with a molecular test before considering a person negative for COVID-19. Molecular and antigen tests detect patients in the acute phase only. A study by Yong et al 6 illustrated the shortcomings of RT-PCR as the only diagnostic method in surveillance, because of its inability to detect past infection, and the added value of serologic testing. Serology tests can detect both active and past infections if the antibodies are captured within the relevant timeframe after the onset of the disease. 7 Serologic assays detect IgG and IgM antibodies to SARS-CoV-2, which develop 1 to 3 weeks after infection. Testing for IgG may be a superior marker of sustained immunity to SARS-CoV-2. 8 More scientific data on the immune response to SARS-CoV-2 is required to design evidence-based recommendations for all testing scenarios and interpretation guidelines. 9 On May 27, 2020, the Centers for Disease Control and Prevention issued interim guidelines for COVID-19 antibody testing, stating "Although serologic tests should not be used at this time to determine if an individual is immune, these tests can help determine the proportion of a population previously infected with SARS-CoV-2 and provide information about populations that may be immune and potentially protected. Serologic test results may assist with identifying persons who may qualify to donate blood that can be used to manufacture convalescent plasma as a possible treatment for those who are seriously ill from COVID-19." 
Contrary to early hopes to use serologic testing to issue "immunity passports" to return to work and society, the CDC now states clearly that "Serologic test results should not be used to make decisions about returning persons to the workplace." 9 ❚Table 1❚ describes the purpose of the 3 types of tests, with advantages, disadvantages, and risks. The International Organization for Standardization (ISO)/International Electrotechnical Commission (IEC) guide 51 defines risk as "the combination of the probability of occurrence of harm and the severity of that harm." 10 To evaluate and select test methods, laboratory professionals usually compare sensitivity (PPA) and specificity (PNA), followed by positive predictive value (PPV) and negative predictive value (NPV), the probability that a positive or negative test result represents a true-positive or true-negative patient, respectively, in the population tested. These metrics alone do not adequately or easily project the levels of patient risk or clinical costs associated with each test method. To estimate the probability of harm, we calculated the probability that a positive result is a false positive (PFP) and the probability that a negative result is a false negative (PFN). PFP is the number of false-positive results as a percent of all positive results; it is the complement of PPV: PFP = 1 - PPV. PFN is the number of false-negative results as a percent of all negative results; it is the complement of NPV: PFN = 1 - NPV. We roughly estimated the cost of false results, and from those we projected the severity of harm as the costs incurred by patients and health care institutions. PPA and PNA are inherent to the test method. Probabilities of true and false results in clinical settings change with prevalence of the virus or antibody in the population tested. "In a population where the prevalence is 5%, a test with 90% sensitivity and 95% specificity will yield a positive predictive value of 49%. 
In other words, less than half of those testing positive will truly have antibodies. Alternatively, the same test in a population with an antibody prevalence exceeding 52% will yield a positive predictive value greater than 95%, meaning that less than one in 20 people testing positive will have a false-positive test result." 11 As of May 4, 2020, the Food and Drug Administration (FDA) required that clinical agreement data should demonstrate a minimum overall 90.0% PPA (sensitivity) and 95.0% PNA (specificity). 11 Most, but not all, values for sensitivity and specificity reported by the FDA on May 21, 2020, meet their goals. In the United Kingdom, recommended standards are set higher, at 98% PPA and 98% PNA. 12 Recommendations are theoretical goals, and manufacturers' test results are created under controlled ideal conditions. The Foundation for Innovative New Diagnostics (FIND), working in partnership with the WHO, maintains a diagnostics resource center that includes an interactive dashboard showing SARS-CoV-2 sensitivity and specificity, as assessed in laboratory on-site evaluation studies. 13 We chose to model their meta-analysis results as the baseline in simulations, as we believe these are more representative of current test performance in use in testing laboratories. ❚Table 2❚ shows baseline FIND PPA and PNA values for each test type, plus the number of different sample types, companies, individual test names, test formats, or targets detected. Index sample types include nasopharyngeal swab, lower respiratory system, sputum, tracheal aspirate, capillary blood, serum, and plasma. Test formats include integrated systems, manual isothermal amplification, manual PCR, rapid diagnostic tests (with and without reader), chemiluminescence immunoassay, enzyme-linked immunosorbent assay, and more. Notice the large number of companies and test names. Targets include RNA with and without extraction, nucleocapsid protein, nucleoprotein antigens, IgG, IgM, and IgA. 
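The predictive-value relationships defined above (PFP = 1 - PPV, PFN = 1 - NPV) can be illustrated with a short Python sketch. It is not the authors' calculator; the function name `predictive_values` is hypothetical, and the inputs are the quoted example values (90% sensitivity, 95% specificity, prevalence of 5% and 52%).

```python
def predictive_values(prevalence, ppa, pna):
    """Return PPV, NPV, PFP and PFN as fractions.

    prevalence, ppa (sensitivity) and pna (specificity) are fractions
    between 0 and 1. Assumes at least one positive and one negative
    result is expected, so the denominators are non-zero.
    """
    true_pos = prevalence * ppa
    false_neg = prevalence * (1 - ppa)
    true_neg = (1 - prevalence) * pna
    false_pos = (1 - prevalence) * (1 - pna)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    # PFP and PFN are the complements of PPV and NPV.
    return {"PPV": ppv, "NPV": npv, "PFP": 1 - ppv, "PFN": 1 - npv}


# Quoted example: same test, two different prevalence levels.
low = predictive_values(prevalence=0.05, ppa=0.90, pna=0.95)
high = predictive_values(prevalence=0.52, ppa=0.90, pna=0.95)

for label, result in (("5% prevalence", low), ("52% prevalence", high)):
    print(label, {name: f"{value:.1%}" for name, value in result.items()})
```

At 5% prevalence the PPV comes out just under one half, in line with the quoted 49%, and at 52% prevalence it exceeds 95%.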
We modeled the impact of ±10% in PPA (sensitivity) from baseline. We modeled up to 100% PNA (specificity), with a lower limit of -10% from baseline. Prevalence of the SARS-CoV-2 virus and antibody is unknown and may vary widely between locations. Estimating prevalence is complicated by the existence of false-positive and false-negative tests. We modeled changes in prevalence for all tests from 2% to 20%, with an estimated baseline of 11%. The impact of the risk drivers of prevalence, PPA, and PNA on the risk metrics of PFP and PFN is shown in ❚Figure 1❚, ❚Figure 2❚, and ❚Figure 3❚. Figure 1 illustrates how increasing prevalence of true-positive samples impacts the PFP and the PFN. The number of patients who are positive for the SARS-CoV-2 virus or antibody increases with prevalence. Prevalence is governed [...] Figure 2 shows the impact of modeled changes of ±10% from baseline PPA for each test type, with prevalence and PNA constant at baseline. Higher PPA indicates a larger percent of positive test results in true-positive samples. True-positive test results increase, but the number of false positives is not affected by PPA. As true-positive tests increase with PPA, the constant number of false-positive tests (which is driven by PNA) forms a smaller portion of all positive results, decreasing PFP from 30.3% to 26.2% for molecular tests. Antigen tests have a lower range of PPA and a higher PNA, causing a smaller change in PFP from 20.2% to 17.2%. As PPA increases for antibody tests, PFP decreases from 36.6% to 32.1%. As PPA increases, the number of true-positive test results increases and false negatives decrease. Figure 3 shows the impact of modeled changes in PNA on false test results for each test type. When PNA reaches 100%, all truly negative samples test negative, so the probability of false positives decreases to zero. As PNA increases from 86.3% to 100%, PFP decreases from 56.3% to 0% for molecular tests. 
Antigen tests have a higher range of PNA (88.4%-100%) with a resultant change in PFP from 60.3% to 0%. Antibody tests, with a range of PNA from 86.0% to 100%, show a range of PFP decreasing from 62.3% to 0%. Notice that PPA had less impact than prevalence or PNA on probability of false-positive tests. As PNA increases, the number of true-negative results increases; false negatives are unchanged in number but form a smaller portion of all negatives, driving the PFN down. PFN decreases from 2.0% to 1.7% for molecular, 5.1% to 4.5% for antigen, and 4.3% to 3.8% for antibody tests.
❚Figure 3❚ Impact of changes in percent negative agreement (PNA; specificity) on false results with baseline prevalence and percent positive agreement (PPA).
Laboratories invest a great deal of effort in test selection to minimize patient risk and clinical cost caused by false results. Table 1 presented the different clinical interpretation of each type of test. False-positive and false-negative results drive patient risk and clinical care costs. The authors estimated costs for the United States in May 2020 as shown in ❚Table 3❚, with the understanding that these are rough estimates. The potential harm of false-positive and false-negative results, 14 as discussed in Table 1, is applied in ❚Figure 4❚, ❚Figure 5❚, ❚Figure 6❚, and ❚Figure 7❚ to create a rough estimate of patient and clinical care costs for the United States. These costs are used as a model to illustrate the process of converting risk drivers of prevalence plus method PPA (sensitivity) and PNA (specificity) to risk metrics of the number and cost of erroneous results. Figure 4 shows how costs are applied to true- and false-positive patient samples. Individual costs were roughly estimated based on research and opinion. The total cost for each sample is calculated by adding all the checked costs and multiplying by the Reff where indicated. 
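The PNA sweep described for Figure 3 can be approximated with the sketch below for the molecular test. This is an illustration only, not the authors' model: the function name is hypothetical, and the baseline molecular PPA of 86.15% is an assumption (the midpoint of the 77.5% and 94.8% molecular PPA values quoted in the discussion), because Table 2 is not reproduced in this text. Prevalence is the stated 11% baseline.

```python
def false_result_probabilities(prevalence, ppa, pna):
    """Return (PFP, PFN) as fractions for one combination of risk drivers."""
    true_pos = prevalence * ppa
    false_neg = prevalence * (1 - ppa)
    true_neg = (1 - prevalence) * pna
    false_pos = (1 - prevalence) * (1 - pna)
    pfp = false_pos / (true_pos + false_pos)  # share of positive results that are false
    pfn = false_neg / (true_neg + false_neg)  # share of negative results that are false
    return pfp, pfn


PREVALENCE = 0.11     # baseline prevalence stated in the text
ASSUMED_PPA = 0.8615  # assumption: midpoint of the quoted 77.5% and 94.8% values

# Sweep PNA from the modeled lower limit up to 100%.
pna_sweep = {
    pna: false_result_probabilities(PREVALENCE, ASSUMED_PPA, pna)
    for pna in (0.863, 0.90, 0.958, 1.00)
}

for pna, (pfp, pfn) in pna_sweep.items():
    print(f"PNA {pna:.1%}: PFP {pfp:.1%}, PFN {pfn:.1%}")
```

Under these assumptions the sweep follows the pattern reported for molecular tests: PFP falls steeply to zero as PNA approaches 100%, while PFN declines only slightly.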
An online calculator is available at https://awesome-numbers.com/risk-calculator/ for readers to modify costs and model various scenarios with user-input variables of prevalence, PPA, PNA, and Reff.
• Health care system costs to obtain, perform, and report the test were roughly estimated to be $200.
• Although costs are much higher for hospitalized patients, "A single symptomatic COVID-19 case could incur a median direct medical cost of $3,045 during the course of the infection alone." 15
• A report from Johns Hopkins University put the cost of hiring 100,000 new community health workers for contact tracing at an estimated $3.6 billion, and the Association of State and Territorial Health Officials has echoed that estimate as the minimum requirement in a memo to Congress. 16
[...] having SARS-CoV-2 antibodies, such as persons with a history of COVID-19-like illness. or 3. Employ an orthogonal testing algorithm in which persons who initially test positive are tested with a second test." 19 We set the cost to confirm positive antibody tests at $50, as a new sample is not required.
[...] true-positive result, multiplied by (1 + Reff) to account for other people infected. With Reff set at 1.0, false-negative molecular tests cost $11,290. False-negative antigen tests are confirmed with an orthogonal test to incur total costs of $400. False-negative antibody tests incur the same costs as true negatives for testing plus self-isolation for antibody tests ($1,600).
Figure 5 presents the impact of increased prevalence on cost of false results. The x-axis represents the modeled value of prevalence; the y-axis shows patient and clinical cost of error per 1,000 samples tested. Cost of false-positive results decreases slightly as prevalence increases because the number of true-negative samples decreases from 980 to 800 per 1,000 samples. False-positive tests are a fraction of true-negative samples, which is driven by PNA. 
The number of true-positive samples increases from 20 to 200 per 1,000 samples as prevalence increases from 2% to 20%, driving up true-positive and false-negative test results and costs. False-negative tests are a fraction of true-positive samples, which is driven by PPA. Costs vary between test types due to variation in baseline PPA and PNA. Costs are based on patient and clinical cost in Figure 4. The cost of each false-negative molecular test result is much higher than for other tests. Figure 6 shows the impact of PPA on cost of false results, with prevalence and PNA at baseline. The x-axis shows the baseline PPA for each test type ±10%; the y-axis shows patient and clinical costs as shown in [...] results down. Again, because false-negative molecular tests cost more than false-negative antigen or antibody tests, their costs show the greatest impact. If one looks only at the statistical indicator of probability of false results, the impact on cost is not apparent. Figure 7 shows the impact of PNA on cost of false results, with prevalence and PPA at baseline. The x-axis shows the baseline PNA for each test type ±10% (to a maximum of 100%); the y-axis shows patient and clinical costs as shown in [...] PNA has no impact on false-negative test results. Again, because false-negative molecular tests cost more than false-negative antigen or antibody tests, their costs show the greatest impact. Table 3 presents the total cost per 1,000 samples tested of false results for each test type, with modeled variations in risk drivers of prevalence, PPA, and PNA. In each case, molecular tests carry the greatest risk of cost of false results due to the high cost of false-negative results. The specific numbers vary with baseline prevalence, PPA, PNA, and costs of each test type. The authors combined PPA and PNA values from user evaluation studies with estimates of prevalence, cost, and Reff number to illustrate a model showing how patient risk and clinical cost are driven by test selection. 
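The way false-negative costs scale with prevalence can be sketched as follows. This covers only the false-negative component, because per-result false-positive costs are not given in this text. The per-result costs are the stated values for Reff = 1.0 ($11,290 molecular, $400 antigen, $1,600 antibody); the function name is hypothetical and the molecular PPA of 86.15% is an assumed baseline (midpoint of the quoted 77.5% and 94.8%).

```python
# Stated cost of one false-negative result with Reff = 1.0 (US dollars).
COST_PER_FALSE_NEGATIVE = {"molecular": 11_290, "antigen": 400, "antibody": 1_600}

ASSUMED_MOLECULAR_PPA = 0.8615  # assumption, not a value reported in this text


def false_negative_cost(prevalence, ppa, cost_per_result, samples=1000):
    """Expected cost of false-negative results per `samples` tested."""
    truly_positive = samples * prevalence        # e.g. 20 at 2%, 200 at 20%
    false_negatives = truly_positive * (1 - ppa)  # positives the test misses
    return false_negatives * cost_per_result


cost_low = false_negative_cost(
    0.02, ASSUMED_MOLECULAR_PPA, COST_PER_FALSE_NEGATIVE["molecular"])
cost_high = false_negative_cost(
    0.20, ASSUMED_MOLECULAR_PPA, COST_PER_FALSE_NEGATIVE["molecular"])

print(f"Molecular false-negative cost per 1,000 tests at 2%:  ${cost_low:,.0f}")
print(f"Molecular false-negative cost per 1,000 tests at 20%: ${cost_high:,.0f}")
```

Because false negatives are a fixed fraction (1 - PPA) of truly positive samples, this cost component grows in direct proportion to prevalence, which is why the expensive molecular false negatives dominate the totals.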
Knowledge of the PFP and the PFN adds valuable information to method evaluation and review. Statistical indicators of PPA, PNA, PPV, NPV, PFP, PFN, or even the number of false results alone cannot evaluate risk as the patient risk and clinical cost of the analytical method selected. It would be worthwhile repeating this exercise with locally verified costs, prevalence, and Reff number. The authors have posted an online calculator at https://awesome-numbers.com/risk-calculator/ to allow readers to simulate changes with their projected variables and estimates of cost in local currency. ISO/IEC guide 51 defines risk as "the combination of the probability of occurrence of harm and the severity of that harm." 10 Examination of only PPA and PNA does not give an indication of patient risk as the number and clinical cost of false results. Risk as the probability and severity of false-positive and false-negative results can be extrapolated from manufacturers' claims and/or user data for PPA and PNA plus estimates of prevalence, Reff number, and cost for your health care setting. Reff values for each US state can be found at https://rt.live/. 20 We estimated costs roughly for the United States but did not enter a value for loss of life in our equations, as human life is invaluable. It may be wise, if difficult, to factor that in when evaluating cost in your location and currency. The relationships between the various acronyms can be confusing. Increased percent positive agreement, PPA (sensitivity), drives the number and cost of false-negative results down but has no impact on the number of false positives. Increased percent negative agreement, PNA (specificity), drives the probability of false positives (PFP) and the resultant patient risk and health care cost down. PNA (specificity) has no impact on the number of false negatives. 
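The relationships between the acronyms can be checked numerically with expected counts per 1,000 samples. The sketch below is illustrative; the names `Counts` and `expected_counts` are hypothetical. The inputs use the 11% baseline prevalence and molecular values quoted in the text (PPA 77.5% versus 94.8%, PNA 86.3% versus 95.8%).

```python
from typing import NamedTuple


class Counts(NamedTuple):
    true_pos: float
    false_neg: float
    true_neg: float
    false_pos: float


def expected_counts(prevalence, ppa, pna, samples=1000):
    """Expected numbers of each result type among `samples` tests."""
    positives = samples * prevalence
    negatives = samples - positives
    return Counts(
        true_pos=positives * ppa,
        false_neg=positives * (1 - ppa),   # depends on PPA only
        true_neg=negatives * pna,
        false_pos=negatives * (1 - pna),   # depends on PNA only
    )


# Vary PPA with PNA fixed: false negatives change, false positives do not.
low_ppa = expected_counts(0.11, ppa=0.775, pna=0.958)
high_ppa = expected_counts(0.11, ppa=0.948, pna=0.958)

# Vary PNA with PPA fixed: false positives change, false negatives do not.
low_pna = expected_counts(0.11, ppa=0.948, pna=0.863)
high_pna = expected_counts(0.11, ppa=0.948, pna=0.958)

for name, counts in (("low PPA", low_ppa), ("high PPA", high_ppa),
                     ("low PNA", low_pna), ("high PNA", high_pna)):
    print(f"{name}: FN={counts.false_neg:.1f}, FP={counts.false_pos:.1f}")
```

The counts make the distinction explicit: PPA changes only the number of false negatives and PNA changes only the number of false positives, although both still affect PFP and PFN because those are ratios over all positive or all negative results.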
We found it thought-provoking that, as prevalence increases from 2% to 20%, the cost of false molecular test results increases by over $250,000 for every 1,000 molecular tests performed. This happens because the number of true-positive and very costly false-negative tests increases in proportion to prevalence. With the baseline PNA of 95.8%, there are few false-positive results (41 at prevalence of 2% and 33 at prevalence of 20%), and the decrease in their cost makes little difference to the total costs. For each 1,000 samples tested, selecting a molecular test with PPA of 94.8% instead of 77.5% would save patients and the health care system over $200,000. A test with PNA of 100.0% instead of 86.3% reduces patient and clinical cost by over $300,000. Similar patterns were observed for antigen and antibody tests. Acceptable risk is "a state achieved in a measuring system where all known potential events have a degree of likelihood for or a level of severity of an adverse outcome small enough such that, when balanced against all known benefits - perceived or real - patients, physicians, institutions, and society are willing to risk the consequences." 22 The COVID-19 pandemic has brought "patients, physicians, institutions, and society" together as never before; ask them if they are willing to risk the consequences of your chosen method. What is their maximum acceptable risk level as the number and cost of false results? Although methods report a qualitative result, these are typically based on quantitative measurements and cutoff levels. The same concept can be applied to risk-based standards through on-site method validation experiments and daily quality control to maintain risk within acceptable risk limits. Three types of laboratory tests play critical roles in the diagnosis and management of COVID-19. The existing practice of examining PPA and PNA fails to project risk as the probability and severity of harm. The PFP decreases as prevalence and PNA increase. 
The PFN increases with prevalence and decreases with PNA. Measuring risk metrics as the number and cost of false results adds a great deal of insight that is masked by the usual statistical metrics. Patient risk and clinical cost are governed by the number, clinical implications, and cost of false-positive and false-negative patient results for each test type. Small changes in statistical metrics can produce large changes in risk metrics. Knowledge of the clinical implications and cost of false-positive and -negative test results can add valuable insight to test selection and guide decisions of repeating test results for confirmation with an orthogonal method. We provided an online calculator to encourage and enable future studies with localized statistical indicators and cost.
Corresponding author: Zoe C. Brooks, ART; zoe@awesomenumbers.org.
World Health Organization. Pneumonia of unknown cause-China: disease outbreak news
WHO Director General's opening remarks at the media briefing on COVID-19-11
Clinical characteristics of coronavirus disease 2019 in China
Novel coronavirus (2019-nCoV) situation reports
The novel coronavirus (2019-nCoV) outbreak: think the unthinkable and be prepared to face the challenge. Diagnosis (Berl)
Connecting clusters of COVID-19: an epidemiological and serological investigation
Serology characteristics of SARS-CoV-2 infection since the exposure and post symptoms onset
Serological and molecular findings during SARS-CoV-2 infection: the first case study in Finland
"Immunity passports" in the context of COVID-19
Safety Aspects-Guidelines for Their Inclusion in Standards
emergency-situations-medical-devices/eua-authorized-serologytest-performance
Target product profile: antibody tests to help determine if people have recent infection to SARS-CoV-2: Version
FIND Foundation for Innovative New Diagnostic. Test performance dashboard
False negative tests for SARS-CoV-2 infection-challenges and implications
The potential health care costs and resource use associated with COVID-19 in the United States
A national plan to enable comprehensive COVID-19 case finding and contact tracing in the US
Coronavirus (COVID-19) update
Centers for Disease Control and Prevention. Interim guidelines for COVID-19 antibody testing
Rt Covid-19
Effect of alert level 4 on effective reproduction number: review of international COVID-19 cases. medRxiv
Laboratory Quality Control Based on Risk Management, Approved Guideline. CLSI Document EP23-A