key: cord-0691492-arv7y8bd
authors: Sempos, Christopher T; Tian, Lu
title: Adjusting Coronavirus Prevalence Estimates for Laboratory Test Kit Error
date: 2020-08-17
journal: Am J Epidemiol
DOI: 10.1093/aje/kwaa174
sha: 3e67185a6406762bda3529276554b10641d76a65
doc_id: 691492
cord_uid: arv7y8bd

Testing representative populations to determine the prevalence, or percent, of the population with active SARS-CoV-2 infection and/or antibodies to infection is being recommended as essential for making public policy decisions to open up or to continue enforcing national, state, and local government rules to "shelter in place". However, all laboratory tests are imperfect and have estimates of sensitivity and specificity less than 100% - in some cases considerably less than 100%. That error will lead to biased prevalence estimates. If the true prevalence is low, possibly in the range of 1-5%, then testing error will lead to a constant background of bias that will most likely be larger, and possibly much larger, than the true prevalence itself. As a result, what is needed is a method for adjusting prevalence estimates for testing error. In this paper we outline methods for adjusting prevalence estimates for testing error, both prospectively in studies being planned and retrospectively in studies that have been conducted. If employed, these methods would also help to harmonize study results within countries and worldwide. Adjustment can lead to more accurate prevalence estimates and to better policy decisions. However, adjustment will not improve the accuracy of an individual test.

Testing representative populations for the SARS-2 coronavirus (SARS-CoV-2), or for antibodies in those who have had the disease, is being recommended as essential for making public policy decisions to open up or to continue enforcing national, state, and local government rules to "shelter in place" (1, 2). Important objectives of testing are to estimate either the percent of the population currently infected with SARS-CoV-2 or the percent of the population who have developed antibodies to SARS-CoV-2 after exposure, i.e., IgM and IgG (3-5). While cross-sectional studies are useful in estimating the current prevalence and trends in prevalence over time, it must be recognized that all laboratory tests have measurement error.

Two key statistics used to characterize laboratory test performance are sensitivity and specificity. Sensitivity is defined as the ability of a test to correctly identify those who have the disease (6). It is calculated as the proportion who test positive among those having the disease (Table 1). Specificity, on the other hand, is defined as the ability of the test to correctly identify those who do not have the disease (6). It is calculated as the proportion who test negative among those who do not have the disease (7, 8). Similarly, one may use the positive predictive value (PPV) and the negative predictive value (NPV) to characterize laboratory performance. Specifically, the PPV is the probability that a positive test sample is confirmed to be a case; the NPV is the probability that a negative test sample is confirmed to be negative, i.e., a control sample. No laboratory test is 100% sensitive and specific, and many will likely include substantial measurement error, as recent results have shown (9-12). That measurement error will result in biased prevalence estimates.
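These four quantities follow directly from the cell counts of the standard 2 x 2 table. As a minimal illustration (a sketch of ours, not part of the original article), the following Python function computes sensitivity, specificity, PPV, and NPV from the four cells, using the conventional labels a (true positives), b (false positives), c (false negatives), and d (true negatives):

```python
def test_performance(a, b, c, d):
    """Compute test performance metrics from a 2x2 table.

    a: true positives   (diseased, test positive)
    b: false positives  (not diseased, test positive)
    c: false negatives  (diseased, test negative)
    d: true negatives   (not diseased, test negative)
    """
    sensitivity = a / (a + c)   # test positive among the diseased
    specificity = d / (b + d)   # test negative among the non-diseased
    ppv = a / (a + b)           # true cases among positive tests
    npv = d / (c + d)           # true non-cases among negative tests
    return sensitivity, specificity, ppv, npv

# Counts from the worked example in Table 2 (true prevalence 1%, N = 1,000,000):
print(test_performance(a=10_000, b=9_900, c=0, d=980_100))
# -> (1.0, 0.99, 0.5025..., 1.0): sensitivity 100%, specificity 99%, PPV ~50%, NPV 100%
```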
Consequently, it is important to understand the impact of laboratory test error and how it changes with the true prevalence. There is an urgent need for a strategy to adjust for that error in estimating prevalence, which in turn may affect other important population summary statistics, such as the case-fatality rate. In this paper, we recommend a strategy for adjusting prevalence estimates based on our experience in successfully adjusting laboratory measurements of vitamin D as part of the Vitamin D Standardization Program, tailored to the unique circumstances surrounding COVID-19 testing (13, 14).

To date, most emphasis has been placed on the sensitivity of test kits to identify patients with SARS-CoV-2 infection using, for example, reverse transcription polymerase chain reaction testing (15). That was done initially because the focus was on clinical diagnostic testing of people who displayed COVID-19 symptoms or who were at high risk of infection. The main concern was not to miss cases that should be treated and/or quarantined in order to prevent the spread of the infection. Many states have also encouraged universal testing for SARS-CoV-2 in specific populations. In addition, many state and local governments are attempting to document the percent of the population that has been infected with SARS-CoV-2 using serology assays, under the assumption that those individuals may have developed immunity that will last for some period of time, in order to determine how and when to relax "shelter-in-place" decrees. Public Health England is conducting representative surveys to estimate the incidence of SARS-CoV-2 infection as well as trends in the prevalence of antibodies to prior infection (16, 17).

The true COVID-19 prevalence is currently thought to be quite low - possibly in the range of 0-5% - in many areas (18, 19). In that case, it is essential to understand the impact of specificity in addition to sensitivity, as even small deviations of specificity from 100% may yield a set of positive samples that is largely composed of False Positives. For example, assume that a cross-sectional study is being conducted to determine the percentage of the population that has developed antibodies to SARS-CoV-2. Moreover, assume that the test kit of interest has outstanding performance characteristics: sensitivity = 100% and specificity = 99% (Table 2). Also assume that the true COVID-19 prevalence, the proportion of the population with antibodies to SARS-CoV-2, among those tested is 1%. Then among 1 million persons tested, 10,000 COVID-19 cases will be correctly identified as True Positives by the test kit, and there will be no False Negatives - a sensitivity of 100% (Tables 1 and 2). Among the 990,000 truly uninfected individuals, there will be 9,900 False Positives and 980,100 True Negatives, based on a specificity of 99%. Therefore, the False Positive Rate - the proportion of those not infected with COVID-19 among all those who tested positive (3) - will be approximately 50%, i.e., [9,900/(9,900 + 10,000)] × 100 = 49.7%. At a true prevalence of 5%, the False Positive Rate will still be approximately 16%. On the other hand, when the sensitivity and specificity are both 95% and the true prevalence is 1%, the False Positive Rate will be 83.9% (Figure 1). As the true prevalence increases, the False Positive Rate decreases; however, at a true prevalence of 5% it will still be as high as 50%.
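The arithmetic behind these figures is easy to reproduce. Below is a minimal Python sketch (ours, for illustration) of the False Positive Rate as defined above - the proportion of truly uninfected persons among all positive tests, i.e., 1 - PPV - as a function of true prevalence, sensitivity, and specificity:

```python
def false_positive_rate(prevalence, sensitivity, specificity):
    """Proportion of truly uninfected persons among all who test positive
    (the 'False Positive Rate' as defined in the text, i.e., 1 - PPV)."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * (1 - specificity)
    return false_positives / (true_positives + false_positives)

# Reproduces the worked examples above:
print(false_positive_rate(0.01, 1.00, 0.99))  # 0.497 -> 49.7%
print(false_positive_rate(0.05, 1.00, 0.99))  # 0.160 -> ~16%
print(false_positive_rate(0.01, 0.95, 0.95))  # 0.839 -> 83.9%
print(false_positive_rate(0.05, 0.95, 0.95))  # 0.500 -> 50%
```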
Consequently, studies to determine prevalence in representative samples need to have a plan embedded in their study design to determine the sensitivity and specificity of the laboratory test kits used. Moreover, because laboratory error will vary from study to study even if the same test kit is used, it is essential that each study include a harmonization plan. Some test kits provide a continuous quantitative result; other serology tests may be semiquantitative, where a numerical value in arbitrary units is compared against a cut-off value to determine a positive result. Whether or not these tests provide a linear range of results is still being established (20, 21). However, even in this situation it is still important to develop a framework that can be used to adjust the bias of crude prevalence estimates (18, 19).

The framework would consist of: (i) selecting an established, well-validated test, with documented sensitivity and specificity as close to 100% as possible, to use as the reference-point assay or test kit; (ii) using that reference-point test kit to develop a series of true positive and true negative test samples; and (iii) using that set of test samples to estimate the sensitivity/specificity, or the PPV/NPV, of the study test kit in the study. As we will mention later, it may also be important to know the sensitivity and specificity of the reference-point assay or test kit. This is similar to what assay manufacturers are required to do in validating their test kits, except that here it would often be carried out in the field. For example, the way in which samples are collected, the way the assay is used and cared for in the field, and the way results are recorded may differ from the conditions and procedures used by the assay manufacturer to validate the assay. Those differences may contribute to measurement error. As described below, the framework for determining levels of sensitivity and specificity should resemble normal conditions of use as much as possible, and should take into account sources of error, including those that may occur in the pre-analytical, analytical, and post-analytical phases (22-24).

Estimates of sensitivity and specificity could then be used to adjust the crude prevalence estimates from representative surveys using Equation 1 in the Appendix. Specifically, the adjusted prevalence can be estimated as:

adjusted prevalence = (observed prevalence + specificity - 1) / (sensitivity + specificity - 1), (Equation 1)

where the crude or observed prevalence is the proportion of positive tests using the test kit, and sensitivity and specificity are their respective estimates. Moreover, if everyone uses the same framework in every US state and in countries around the world, then data could be pooled to provide even larger datasets that could be used to study COVID-19 in greater detail. Harmonization is a process that brings laboratory results from different laboratories into alignment with each other (25, 26). Alternatively, if the reference-point assay is instead used to verify study samples that tested positive and negative with the study test kit, the PPV and NPV of the study test kit can be estimated directly, and the adjusted prevalence estimated as:

adjusted prevalence = observed prevalence × PPV + (1 - observed prevalence) × (1 - NPV), (Equation 2)

where PPV and NPV are their respective estimates. The first option provides sensitivity and specificity estimates with which to adjust the prevalence, while the second provides PPV and NPV estimates, which can also be used to calculate the adjusted prevalence. In either case, the resulting adjusted prevalence is the same. Another possible modification to both options is to select two (or more) reference-point assays: one reference-point assay or test kit might have 100% sensitivity but an unsatisfactory specificity, while another might be just the reverse.
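To make the two options concrete, here is a minimal Python sketch (ours, not from the paper) of both adjustment equations; Equation 1 is the classical Rogan-Gladen form. The example numbers assume sensitivity = specificity = 95% and an observed prevalence of 8.6%, for which the corresponding PPV and NPV are approximately 44.2% and 99.8%:

```python
def adjust_option1(observed, sensitivity, specificity):
    """Option I (Equation 1): adjust the crude prevalence using sensitivity/specificity."""
    return (observed + specificity - 1) / (sensitivity + specificity - 1)

def adjust_option2(observed, ppv, npv):
    """Option II (Equation 2): adjust the crude prevalence using PPV/NPV."""
    return observed * ppv + (1 - observed) * (1 - npv)

# Both options recover the same adjusted prevalence (about 4%):
print(adjust_option1(0.086, 0.95, 0.95))    # 0.040
print(adjust_option2(0.086, 0.442, 0.998))  # ~0.040 (small difference from rounded PPV/NPV)
```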
For example, an assay with 100% sensitivity could be used to verify the study's positive samples, while the assay with 100% specificity would be used to verify the negative study samples. As a result, using the two assays together might lead to a more precise test result. By the same token, it might also be possible for prevalence studies themselves to take the same approach and use two assays that complement each other in sensitivity and specificity to increase the accuracy of case and non-case identification. In both cases, one test kit would be used to determine whether RNA from the virus, or antibodies to the virus, are present, and the other would be used to confirm that they are not present. In practice, the test results for the same sample from two assays may be correlated, and the performance of the combined assays needs to be assessed empirically. Based on how the test materials are prepared, we have proposed two different sets of equations for calibrating the population prevalence estimate to a specific method or methods, based either on sensitivity/specificity or on PPV/NPV. Those two equations highlight the different approaches of the two suggested options. In the end, much work will need to be done to develop a working harmonization system based on either option.

Three further examples can help to show the potential impact of test kit error (30). First, assume that test kit sensitivity and specificity are both 95% and that the observed prevalence is 8.6%; using Equation 1, the adjusted prevalence is 4%. This combination of sensitivity and specificity corresponds to a PPV of 44.2% and an NPV of 99.8%, and the adjusted prevalence using option II would again be 4%. The second example assumes a much higher observed prevalence of 36%; again assuming that test kit sensitivity and specificity are both 95%, the adjusted percentage is 34.4%. In this case, adjustment had little effect on the estimate of the true prevalence. The third example has a direct application to the use of casually collected SARS-CoV-2 positivity rate data to make a policy decision. The New York public school system is going to use a 3% SARS-CoV-2 positivity rate to determine whether school instruction will be in-person or virtual. Say the observed positivity rate is 4%. Easy decision? However, if test kit sensitivity is 100% and specificity is 99% - a near-perfect test kit - then the adjusted, or true, percentage is 3.03% (Appendix, Equation 1). Moreover, if the specificity is ever so slightly lower - at 0.985 - then the true positivity rate is 2.54%! In this case, a difference in specificity of 0.005 is the difference between in-person and virtual instruction. Surely, such consequential decisions demand the accuracy afforded by adjustment.

These results reinforce the point discussed above and illustrated in Figure 1: when testing is restricted to symptomatic individuals, among whom the true prevalence is high, the impact of test kit error is likely to be much smaller. But when testing is opened to all, and especially in studies of representative samples where the true prevalence in many areas is likely to be small - possibly on the order of 0-5% - adjustment for test kit error is essential in determining the true prevalence. Therefore, states need to adjust the crude estimates posted on websites, if possible, so that they can be interpreted properly. First, an essential point that must be emphasized is that adjustment will neither change nor improve the accuracy of an individual test result for a qualitative yes/no assay; that is not the case for continuous data (13, 39, 40). Second, we acknowledge that we have left a great many details to be resolved. Developing a harmonization plan is a complex, long-term effort.
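The school-threshold arithmetic can be checked with the same Equation 1 sketch introduced above (again ours, for illustration only):

```python
def adjust_option1(observed, sensitivity, specificity):
    """Equation 1: adjusted prevalence from the crude positivity rate."""
    return (observed + specificity - 1) / (sensitivity + specificity - 1)

# Observed positivity 4%, decision threshold 3%:
print(adjust_option1(0.04, 1.00, 0.990))  # 0.0303 -> above the 3% threshold
print(adjust_option1(0.04, 1.00, 0.985))  # 0.0254 -> below the 3% threshold
```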
To plan for that, studies need to assess the sensitivity and specificity of the test kit(s) to be used and to collect and bank duplicate/triplicate samples - nasal and throat swabs, or excess plasma/serum - for use in future efforts to retrospectively harmonize study data. Once a harmonization system is in place, stored samples could be used to develop retrospective adjustment procedures.

All laboratory assays contain measurement error, which needs to be estimated empirically. That is true of all COVID-19 assays. In representative cross-sectional COVID-19 studies, even small deviations from 100% sensitivity and specificity will result in biased prevalence estimates. This is equally true for studies estimating the proportion of the population that is currently infected and for those estimating the proportion that has developed antibodies to past exposure. In this paper, we have outlined a series of steps that may be used to adjust representative studies for test kit error and to harmonize results within countries and worldwide.

This work was funded by the National Institutes of Health. The authors thank Dr. Westgard as well as Mr. Lawrence Kessenich for their kind help with thoughtful comments and suggestions for improving the paper. Conflict of interest: the authors have no conflicts of interest to declare.

References

Covid-19: testing times.
Coronavirus and the race to distribute reliable diagnostics.
Fundamental principles of epidemic spread highlight the immediate need for large-scale serological assays to assess the stage of the SARS-CoV-2 epidemic.
Researchers applaud Spanish COVID-19 serological survey. The Scientist Magazine. Accessed 6/22/2020.
Epidemiology: Beyond the Basics.
LC, eds. Infectious Disease Epidemiology.
Working document of Commission services: current performance of COVID-19 test methods and devices and proposed performance criteria.
Test performance evaluation of SARS-CoV-2 serological assays.
Evaluation of nine commercial SARS-CoV-2 immunoassays.
Evaluation of a COVID-19 IgM and IgG rapid test; an efficient tool for assessment of past exposure to SARS-CoV-2.
Evaluation of antibody testing for SARS-CoV-2 using ELISA and lateral flow immunoassays.
Standardizing serum 25-hydroxyvitamin D data from four Nordic population samples using the Vitamin D Standardization Program protocols: shedding new light on vitamin D status in Nordic individuals.
Vitamin D Assays and the Definition of Hypovitaminosis D: Results from the 1st International Conference on Controversies in Vitamin D.
COVID-19 Testing: The Threat of False-Negative Results.
Initial data from the COVID-19 Infection Survey.
Incidence of SARS-CoV-2 infection and prevalence of immunity to SARS-CoV-2 in the UK general population as assessed through repeated cross-sectional household surveys with additional serial sampling and longitudinal follow-up - an Office for National Statistics survey.
Estimation of SARS-CoV-2 infection fatality rate by real-time antibody screening of blood donors.
EUA authorized serology test performance (06/17/2020).
Centers for Disease Control and Prevention. Interim Guidelines for COVID-19 Antibody Testing.
Swabs collected by patients or health care workers for SARS-CoV-2 testing.
Potential preanalytical and analytical vulnerabilities in the laboratory diagnosis of coronavirus disease 2019 (COVID-19).
Key Facts in Covid-19 Testing - Westgard.
Roadmap for harmonization of clinical laboratory measurement procedures.
The roadmap for harmonization: status of the International Consortium for Harmonization of Clinical Laboratory Results.
Detection of 2019 novel coronavirus (2019-nCoV) by real-time RT-PCR.
Coronavirus Disease 2019 (COVID-19). CDC Viral Test for COVID-19.
Validation of a SARS-CoV-2 spike protein ELISA for use in contact investigations and serosurveillance.
Antibody responses to SARS-CoV-2 in patients with COVID-19.
Role of the National Institute of Standards and Technology (NIST) in Support of the National Institutes of Health, Office of Dietary Supplements Vitamin D Initiative.
Standardization of measurements of 25-hydroxyvitamin D3 and D2.
Specifications for trueness and precision of a reference measurement system for serum/plasma 25-hydroxyvitamin D analysis.
Establishing an Accuracy Basis for the Vitamin D External Quality Assessment Scheme (DEQAS).
25-Hydroxyvitamin D Assays: An Historical Perspective From DEQAS.
Accuracy-Based Vitamin D Survey: Six Years of Quality Improvement Guided by Proficiency Testing.
The estimation of calibration equations for variables with heteroscedastic measurement error.
General Steps to Standardize the Laboratory Measurement of Serum Total 25-Hydroxyvitamin D.
The Vitamin D Standardization Program (VDSP) Manual for Retrospective Laboratory Standardization of Serum 25-Hydroxyvitamin D Data.
COVID-19 proficiency testing.

Table 2. Expected results of testing 1,000,000 persons with a test kit having sensitivity = 100% and specificity = 99%, assuming a true prevalence of 1%.

Test result    Infected        Not infected     Total
Positive       10,000 (a)      9,900 (b)        19,900 (a+b)
Negative       0 (c)           980,100 (d)      980,100 (c+d)

Positive Predictive Value (PPV) (%) = a/(a+b) × 100 = 10,000/19,900 × 100 = 50%
Negative Predictive Value (NPV) (%) = d/(c+d) × 100 = 980,100/980,100 × 100 = 100%
Assumptions: Sensitivity = 100%; Specificity = 99%; true prevalence = 1%.