key: cord-0986622-xuk9ceo6 authors: Pilcher, Christopher D; Westreich, Daniel; Hudgens, Michael G title: Group Testing for Sars-Cov-2 to Enable Rapid Scale-Up of Testing and Real-Time Surveillance of Incidence date: 2020-06-27 journal: J Infect Dis DOI: 10.1093/infdis/jiaa378 sha: 8c10b05b53886f2e8264d2036d0ccfbde290307c doc_id: 986622 cord_uid: xuk9ceo6

High-throughput molecular testing for SARS-CoV-2 may be enabled by group testing. In group testing, pools of specimens are screened, and individual specimens are tested only after a pool tests positive. Several labs have recently published examples of pooling strategies applied to SARS-CoV-2 specimens, but overall guidance on efficient pooling strategies is lacking. We therefore developed a model of the efficiency and accuracy of specimen pooling algorithms based on available data on SARS-CoV-2 viral dynamics. For a fixed number of tests, we estimate that programs using group testing could:
- screen 2 to 20 times as many specimens as individual testing;
- increase the total number of true positive infections identified; and
- improve the positive predictive value of results.
We compare outcomes that may be expected in different testing situations and provide general recommendations for group testing implementation. A free, publicly available web calculator is provided to help inform laboratory decisions on SARS-CoV-2 pooling algorithms.

Molecular tests of nasopharyngeal (NP) swab fluid for virus RNA remain the test of choice for early detection of SARS-CoV-2 infection, both to identify new cases and to assess individual contagiousness. However, molecular tests have high cost, limited throughput and imperfect specificity. These limitations make them poorly suited to large-scale testing of populations with low expected rates of positivity. Blood banks and HIV testing programs have used group testing to address the problem of high-throughput molecular screening for acute viral infections [1, 2]. In group testing, we first screen pools of specimens. When a pool is negative, we declare all specimens in it negative. When a pool is positive, we retest sub-pools or individual specimens, depending on the strategy [2].

Several labs have recently published clinical validation studies in which SARS-CoV-2 RNA-positive NP specimens from patients with COVID-19 were tested in pools with RNA-negative clinical specimens [3] [4] [5] [6] [7]. These papers examined unmodified assays and ad hoc pooling strategies, with pools of (variously) 5, 10, 32 and 64 specimens. None of these studies documented PCR inhibition arising from pooling NP fluid samples. Furthermore, two studies confirmed that pooling lowers the analytic sensitivity of SARS-CoV-2 RNA PCR assays, as expected, because negative samples dilute the RNA in positive samples [3, 4]. For example, Abdalhamid et al. [3] used an assay from the CDC to test clinical specimens in pools of 5. Compared with individual testing, pooled testing produced cycle threshold (Ct) values that were on average 2.24 and 2.67 Ct higher for targets N1 and N2, respectively (calculated from Table 1 in [3]). These shifts are consistent with the increase of log2(5) = 2.32 Ct expected from a 5-fold dilution when the target doubles with each PCR cycle. Based in part on these clinical validations, China used group testing to screen the population of Wuhan [8]. Regional programs using expanded group testing are ongoing in Israel [7] and in the US state of Nebraska [3].

Unfortunately, NP swab group testing for SARS-CoV-2 has met with widespread skepticism. This is based in part on the perception that individual-specimen NP swab testing already has a "sensitivity problem": diagnostic sensitivity in symptomatic patients has been found to be in the range of 60 to 90 percent [9, 10]. To the extent that specimens have low virus loads, pooling dilution will reduce clinical detection even further [2].

In this study, we first seek to describe the distribution of NP RNA viral loads that actually occur during the initial "detection window" of acute SARS-CoV-2 infection. Second, we use these data to estimate how group testing will affect the outcomes of SARS-CoV-2 molecular testing efforts. Finally, we provide preliminary guidance for immediate implementation of efficient group testing algorithms for SARS-CoV-2.

Estimating the effects of testing and pooling approaches on testing outcomes requires some knowledge of the distribution of biomarkers that will be found in the testing population. In testing for acute infections, the problem is complicated by the rapid flux of viral loads, as well as antibody levels, over time in infected individuals. If these dynamics are well described, however, one can estimate changes in clinical case detection compared to individual testing, provided individuals arrive for testing uniformly during the detection window. Specifically, uniform presentation reduces the problem to measuring how testing choices affect the length of time during which an average individual can be detected.

We therefore sought to describe the nasopharyngeal viral loads that occur in infected individuals during the time window when RNA is detectable by standard PCR assays. We reviewed recent papers containing data on SARS-CoV-2 viral dynamics [9] [10] [11] [12] [13] [14] [15]. Most presented individual-level data in visual form only. Two papers displayed NP viral loads from multiple individuals who had been sampled frequently within days of first detection [14] or last detection [13]. These plots appeared to show a rapid rise in viral load at the start of the detection window and a similarly rapid fall at its end. In an analysis differentiating non-critical from critical cases of COVID-19, Tan and colleagues [10] confirmed this abrupt onset and equally abrupt ending of shedding among non-critical patients.
They also showed that within a few days after symptom onset, NP RNA was already detectable by PCR at peak levels, an average of 14 cycles before the cutoff. Because each PCR cycle corresponds to a two-fold change and 14 × log10(2) ≈ 4.2, this suggests that average viral loads were at least 4.2 log10 above the cutoff. Several papers have contrasted these typical dynamics with those in patients with critical COVID-19. Among critically ill individuals, these papers documented:
- delayed onset of nasopharyngeal shedding [14];
- very high levels of peak shedding [15];
- slower viral decay [10, 15]; and
- a longer detection window [9, 10].

Based on this information, we proposed a model of respiratory virus dynamics intended to represent conservatively the SARS-CoV-2 dynamics of individuals during the detection window of typical (i.e., non-critical) acute infection. The model is illustrated in Figure 1(a). Parameters were as follows:
- detection window = 14 days [10, 12];
- peak viral load = 4.2 log10VL above the detection cutoff [10];
- rate of viral increase = +1.0 log10VL per day; and
- slope of viral decay = -1.0 log10VL per day.

As a check on these parameters, we used this model to estimate the viral load distribution expected in a hypothetical testing population in which every individual followed average dynamics and presented uniformly for testing. The distribution predicted by the model agreed closely with the distributions of SARS-CoV-2 viral load found in recent studies among individuals first testing positive for SARS-CoV-2 [3, 9] (Figure 1(c)).

Estimating dilution effects. We next used the above viral dynamic model to estimate the sensitivity of pooled testing for SARS-CoV-2 RNA, as follows. First, we calculated the average detection window expected with and without dilution, as illustrated in Figure 1(b). We then estimated pooled testing sensitivity, relative to individual testing sensitivity, by dividing the pooled detection window by the individual detection window.
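The model, the distribution check and the window-ratio calculation can be sketched in Python. Several details are our own reading rather than statements from the text:
- Loads are expressed in log10 units above the assay cutoff.
- A 4.2-day rise plus a 4.2-day decay is shorter than the 14-day window, so the two phases are assumed to be joined by a plateau at peak load lasting 5.6 days.
- Pooling N specimens is treated as lowering each specimen's load by log10(N). Under ideal doubling per cycle, this equals log2(N) PCR cycles.

All function names are hypothetical. Under these assumptions the sketch gives a median of about 11.6 Δ-Ct, the figure reported in the Results. It also gives a Ct shift of about 2.32 cycles for a 5-fold dilution.

```python
import math

# Parameters from the text; loads are log10 units above the assay cutoff.
WINDOW_DAYS = 14.0   # detection window with individual testing
PEAK_LOG10 = 4.2     # peak load above cutoff
RISE_PER_DAY = 1.0   # +1.0 log10 per day
DECAY_PER_DAY = 1.0  # -1.0 log10 per day

# Assumption: rise and decay are joined by a plateau at peak that fills the
# rest of the window (14 - 4.2 - 4.2 = 5.6 days).
DECAY_START = WINDOW_DAYS - PEAK_LOG10 / DECAY_PER_DAY


def log10_load(t: float) -> float:
    """Load above cutoff t days after first detectability (<= 0 means undetectable)."""
    return min(RISE_PER_DAY * t, PEAK_LOG10,
               PEAK_LOG10 - DECAY_PER_DAY * (t - DECAY_START))


def ct_shift(pool_size: int) -> float:
    """Extra PCR cycles after pool_size-fold dilution, assuming ideal doubling per cycle."""
    return math.log2(pool_size)


def detection_window(pool_size: int) -> float:
    """Days a specimen stays detectable after pool_size-fold dilution (closed form)."""
    loss = math.log10(pool_size)  # dilution lowers the load by log10(N)
    if loss >= PEAK_LOG10:
        return 0.0
    return WINDOW_DAYS - loss / RISE_PER_DAY - loss / DECAY_PER_DAY


def detection_window_numeric(pool_size: int, steps: int = 140_000) -> float:
    """Cross-check: scan the trajectory on a fine grid and add up detectable time."""
    loss = math.log10(pool_size)
    dt = WINDOW_DAYS / steps
    return dt * sum(1 for i in range(steps) if log10_load((i + 0.5) * dt) > loss)


def relative_sensitivity(pool_size: int) -> float:
    """Pooled / individual analytic sensitivity, assuming uniform presentation."""
    return detection_window(pool_size) / detection_window(1)


def median_delta_ct() -> float:
    """Median load among uniformly presenting positives, in PCR cycles above cutoff."""
    # Time spent above level x (for x below the peak) is WINDOW - x/RISE - x/DECAY;
    # setting that to half the window and solving for x gives the median in log10 units.
    x = 0.5 * WINDOW_DAYS / (1.0 / RISE_PER_DAY + 1.0 / DECAY_PER_DAY)
    return x / math.log10(2)


print(f"Ct shift for 5-fold pooling: {ct_shift(5):.2f}")  # 2.32
print(f"median delta-Ct: {median_delta_ct():.1f}")        # 11.6
for n in (5, 10, 25):
    print(f"pool of {n:>2}: window {detection_window(n):.2f} d, "
          f"relative sensitivity {relative_sensitivity(n):.3f}")
```

The closed form and the grid scan agree because, on this trajectory, a dilution of log10(N) simply trims log10(N) days from each end of the window.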
These estimates rest on three assumptions:
- Infected individuals are equally likely to present at any time during the detection window.
- Specimens are pooled by an ordinary procedure: they are processed individually before pooling, and the same volume of fluid goes into the assay whether it comes from a pool or from an individual specimen.
- Similar interpretation criteria (e.g., detection cutoffs for positive status) are used for both individual specimens and pools.

Estimating testing program outcomes. We used the above model and estimation procedures to adapt a previously described software package designed to optimize group testing in acute HIV [2]. All calculations assumed a representative assay with analytic sensitivity 0.95 and specificity 0.99, both of which we judged to be reasonable for molecular testing. To determine a possible upper limit on pool size, we estimated the maximum fold-dilution of a specimen that would reduce the analytic sensitivity of pooled testing by less than 20% compared to individual testing.

Here we distinguish "analytic sensitivity" from "diagnostic sensitivity", following Saah and Hoover [16]. Diagnostic sensitivity is the probability that a testing protocol correctly identifies an individual with COVID-19 as infected. As noted above, it may be as low as 60% in some clinical circumstances [9, 10]. Analytic (or test) sensitivity is the probability that an assay correctly classifies as positive a sample whose viral load is above the molecular limit of detection. Reductions in analytic sensitivity produce proportional, relative reductions in diagnostic sensitivity. For example, suppose a testing protocol that does not involve pooling has a diagnostic sensitivity of 70%. If adding specimen pooling reduces the analytic sensitivity by 10%, the new diagnostic sensitivity would be 70% × 90% = 63% [16].
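Both the pool-size limit and the proportional scaling of diagnostic sensitivity reduce to short calculations. The Python sketch below is based on our reading of the viral dynamic model:
- a 14-day window;
- loads changing by 1.0 log10 per day on the way up and on the way down; and
- uniform presentation, so the fractional loss of detectable days equals the fractional loss of analytic sensitivity.

The function names are hypothetical.

```python
import math

WINDOW_DAYS = 14.0
RISE_PER_DAY = DECAY_PER_DAY = 1.0  # log10 per day, from the text


def max_pool_size(max_relative_loss: float) -> int:
    """Largest pool whose dilution shortens the detection window by at most
    max_relative_loss (uniform presentation: window loss = sensitivity loss)."""
    days_lost = max_relative_loss * WINDOW_DAYS
    # Each log10 of dilution costs 1/RISE days at the start and 1/DECAY days at the end.
    max_log10_dilution = days_lost / (1.0 / RISE_PER_DAY + 1.0 / DECAY_PER_DAY)
    return math.floor(10 ** max_log10_dilution)


def diagnostic_sensitivity(baseline: float, relative_analytic_loss: float) -> float:
    """Scale a protocol's diagnostic sensitivity by the relative analytic loss [16]."""
    return baseline * (1.0 - relative_analytic_loss)


print(max_pool_size(0.20))                          # 25
print(f"{diagnostic_sensitivity(0.70, 0.10):.0%}")  # 63%
```

A 20% loss of the 14-day window is 2.8 days, or 1.4 days at each end. At 1.0 log10 per day, this allows a dilution of 10^1.4 ≈ 25-fold.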
We addressed the outcomes of two kinds of pooling strategies:
- Two-stage (N:1): one pool of N specimens, followed if necessary by retesting of individual specimens.
- Three-stage (kN:N:1, where typically k = N, giving N²:N:1): one pool of kN specimens, followed if necessary by retesting of k sub-pools of N specimens each, followed if necessary by retesting of the individual specimens in each positive sub-pool.

For either strategy, we identified group testing algorithms that would increase specimen throughput, increase actual case identification and increase the positive predictive value of results. We did this at prevalences ranging from 1 per 1,000 to 10 percent positive tests. We reported the following outcomes:
- average time to results, measured as the mean number of rounds of testing, with individual testing assumed to require one round;
- efficiency, defined as the expected number of specimens screened (or, alternatively, individual results obtained) per molecular assay used, where individual testing allows screening of one specimen per assay;
- reduction in sensitivity compared to individual testing, given the above assumptions; and
- positive predictive value (PPV).

The PPV calculation assumes uncorrelated errors between rounds of testing. Under this assumption, group testing usually leads to substantial increases in PPV [2]. A free, publicly available web calculator of the model is available to help inform laboratory decisions on SARS-CoV-2 pooling algorithms (http://www.bios.unc.edu/~mhudgens/SARS-CoV-2.pooling.home.html).

The proposed viral dynamic model is summarized in Figure 1. Figure 1(c) shows that the distribution of viral load values predicted by the model is similar to the distributions published in clinical studies of SARS-CoV-2 testing.
For example, our model predicted that 50% of viral loads would have Δ-Ct values greater than 11.6. This is similar to the finding of Zhao and colleagues [9], who reported that 50% of viral loads were greater than 11 Δ-Ct among newly positive testers with non-critical COVID-19 in Hong Kong. The model indicated that pool sizes greater than 25 are expected to reduce analytic sensitivity by more than 20 percent, calculated as follows. First, as illustrated in Figure 1(b), our assumption of uniform presentation during the detection window means that a 20% loss of analytic sensitivity is due to a loss of …

These SARS-CoV-2-specific estimates were then incorporated into a previously described software package [2]. Table 1 and Figures 2 and 3 show predicted outcomes for selected algorithms. Gains in efficiency appeared to be large, allowing 2 to 20 times as many specimens to be processed with the same number of tests. When prevalence was greater than 1%, simple pooling schemes and smaller pools were closer in efficiency to larger and/or more complex pooling schemes (e.g., 6:1 "mini-pools" at prevalence p = 0.05). Below 1% prevalence, larger mini-pools could be several-fold more efficient than 5:1 mini-pools in terms of results per test used (Figure 3). Adding the intermediate pool stage also generally resulted in much higher testing efficiency below 1% prevalence. When prevalence was 1 in 1,000, larger pools, and particularly three-stage pools, were substantially more efficient (Table 1, Figure 3).

The results of this analysis suggest that group testing schemes should be effective at expanding the capacity and throughput of molecular testing for SARS-CoV-2. Depending on the testing scenario, simple-to-implement algorithms can generate between 2 and 20 results for every molecular test used.
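Standard group-testing formulas can approximate the outcome measures defined in the Methods (efficiency, mean rounds to results and PPV) and the comparison of two- and three-stage schemes. The Python sketch below is not the authors' software [2]. It makes the following simplifying assumptions:
- a fixed assay sensitivity of 0.95 and specificity of 0.99 at every stage, ignoring the dilution losses modeled above;
- errors that are independent across rounds; and
- "average time to results" read as the mean number of rounds per specimen.

All function names are hypothetical.

```python
from typing import Dict, Optional

SE, SP = 0.95, 0.99  # assay sensitivity and specificity assumed in the text


def pool_positive(p: float, size: int, se: float = SE, sp: float = SP) -> float:
    """P(a pool of `size` specimens of unknown status tests positive) at prevalence p."""
    all_negative = (1.0 - p) ** size
    return se * (1.0 - all_negative) + (1.0 - sp) * all_negative


def master_and_sub_positive(p: float, in_sub: int, outside_sub: int,
                            se: float = SE, sp: float = SP) -> float:
    """P(master pool and one of its sub-pools both test positive), with `in_sub`
    unknown specimens in that sub-pool and `outside_sub` elsewhere in the master."""
    sub_all_negative = (1.0 - p) ** in_sub
    # A positive in the sub-pool is also in the master: each test detects it w.p. se.
    # Otherwise the sub-pool can only be a false positive; the master depends on the rest.
    return ((1.0 - sub_all_negative) * se * se
            + sub_all_negative * (1.0 - sp) * pool_positive(p, outside_sub, se, sp))


def individual_ppv(p: float, se: float = SE, sp: float = SP) -> float:
    """PPV when every specimen is tested once."""
    true_pos = p * se
    return true_pos / (true_pos + (1.0 - p) * (1.0 - sp))


def two_stage(p: float, n: int, se: float = SE, sp: float = SP) -> Dict[str, float]:
    """N:1: test the pool, then retest its specimens individually if it is positive."""
    pool_pos = pool_positive(p, n, se, sp)
    true_pos = p * se * se  # pool and individual test both positive
    false_pos = (1.0 - p) * (1.0 - sp) * pool_positive(p, n - 1, se, sp)
    return {
        "results_per_assay": 1.0 / (1.0 / n + pool_pos),
        "mean_rounds": 1.0 + pool_pos,  # specimens in positive pools need a 2nd round
        "ppv": true_pos / (true_pos + false_pos),
    }


def three_stage(p: float, n: int, k: Optional[int] = None,
                se: float = SE, sp: float = SP) -> Dict[str, float]:
    """kN:N:1 (k = n by default, i.e. N^2:N:1)."""
    k = n if k is None else k
    master_pos = pool_positive(p, k * n, se, sp)
    both_pos = master_and_sub_positive(p, n, (k - 1) * n, se, sp)
    # Per master pool: 1 test, k sub-pool tests if positive, n tests per positive sub-pool.
    tests_per_specimen = 1.0 / (k * n) + master_pos / n + both_pos
    true_pos = p * se ** 3
    false_pos = (1.0 - p) * (1.0 - sp) * master_and_sub_positive(p, n - 1, (k - 1) * n, se, sp)
    return {
        "results_per_assay": 1.0 / tests_per_specimen,
        "mean_rounds": 1.0 + master_pos + both_pos,
        "ppv": true_pos / (true_pos + false_pos),
    }


for p in (0.001, 0.01, 0.05):
    two, three = two_stage(p, 10), three_stage(p, 5)
    print(f"p={p}: 10:1 {two['results_per_assay']:.1f}/assay, PPV {two['ppv']:.2f} | "
          f"25:5:1 {three['results_per_assay']:.1f}/assay, PPV {three['ppv']:.2f} | "
          f"individual PPV {individual_ppv(p):.2f}")
```

Even in this simplified form, a 25:5:1 scheme at 1 per 1,000 prevalence yields roughly 19 results per assay, against roughly 14 for a 25:1 pool. Repeated testing of positives raises PPV well above that of single individual tests.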
The highest gains in efficiency and testing performance were predicted for testing situations where the expected prevalence of disease is low. Testing is at present generally limited to those with a high likelihood of having the disease. As a result, the potential for greater efficiency with group testing in low-prevalence settings has been underestimated. Situations with low expected prevalence of disease include:
- screening low-risk, asymptomatic health care workers;
- performing universal testing in health care facilities or in the general population; and
- conducting large surveillance studies.

Indeed, it is difficult to see how molecular testing can be rapidly scaled up in such settings without group testing. It is equally difficult to see how alternatives such as antibody tests, which identify past infection, will be able to identify new cases in an ongoing epidemic.

Importantly, our results indicate that the 5-specimen mini-pool protocol recently demonstrated by Abdalhamid and colleagues [3] can increase efficiency even when 10 to 25% of samples are positive. It could thus be an effective standard protocol. A standard mini-pool protocol allows labs to tailor testing for specific sample sets. For example, a laboratory processing samples from a general population survey would anticipate much greater efficiency at larger pool sizes of 15 or 25, and perhaps from three-stage testing (Figure 3). Given sufficient fluid volume, the larger pools could simply be created by pooling the standard mini-pools at the end of a standard processing procedure.

It is essential to reiterate that pooling can sacrifice the analytic sensitivity of molecular tests for some low viral load specimens [2] [3] [4] [5] [6]. However, early results suggest that SARS-CoV-2 infection has fast-on/fast-off viral dynamics in NP fluid [13, 14], making it an ideal candidate for group testing.
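The simple two-stage formula can check the claim that 5:1 mini-pools still save assays at 10 to 25% positivity. The sketch below again assumes sensitivity 0.95 and specificity 0.99 at each stage and ignores dilution losses. This is a simplification and not the authors' software. The helper names are hypothetical.

```python
SE, SP = 0.95, 0.99  # assumed assay performance at every stage


def two_stage_efficiency(p: float, n: int, se: float = SE, sp: float = SP) -> float:
    """Results obtained per assay used for an N:1 algorithm at prevalence p."""
    all_negative = (1.0 - p) ** n
    pool_pos = se * (1.0 - all_negative) + (1.0 - sp) * all_negative
    return 1.0 / (1.0 / n + pool_pos)


def break_even_prevalence(n: int, step: float = 0.01) -> float:
    """Highest prevalence on a grid at which N:1 pooling still beats individual testing."""
    best = 0.0
    for i in range(1, int(round(1.0 / step))):
        p = i * step
        if two_stage_efficiency(p, n) > 1.0:  # efficiency falls as prevalence rises
            best = p
    return best


for p in (0.10, 0.25):
    print(f"5:1 at p={p:.2f}: {two_stage_efficiency(p, 5):.2f} results per assay")  # 1.68, 1.08
print(f"5:1 stops saving assays above p~{break_even_prevalence(5):.2f}")          # 0.30
```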
In particular, the rate of viral increase in acute SARS-CoV-2 infection appears to be substantially faster than in HIV-1 [17, 18], and group testing is successful, and indeed standard, for acute HIV-1 [1]. Our viral dynamics model should be reassessed as more information becomes available on the window of detection and on the speed of viral load increase and/or decrease. The effects of pool size on dilution and analytic sensitivity should then be reevaluated. In particular, if viral load rises and falls more gradually than we assumed, smaller maximum pool sizes may be desirable.

However, some loss of sensitivity due to pooling may be acceptable, because in many settings there are at present too few molecular tests for all the individuals who need them. In such a situation, the comparison is not between pooled sensitivity and the sensitivity of individual testing, since individual testing is impossible. The comparison is instead between pooled sensitivity and the sensitivity of not testing at all. (Not testing, of course, has a sensitivity of 0% (95% CL 0%, 0%).)

Adding serologic tests will help address the limited diagnostic sensitivity of SARS-CoV-2 RNA testing. For instance, Zhao and colleagues [9] examined the first week of illness, when antibody tests had lower diagnostic sensitivity (38 percent) than NP RNA testing (67 percent). Even then, combining antibody and RNA results increased diagnostic sensitivity to 79%. In serial antibody/viral load testing algorithms, group testing for viral RNA has been shown to be especially sensitive and efficient [1, 2]. An example is an algorithm in which only NP specimens from antibody-negative individuals are tested. Removing antibody-positive individuals reduces the number of RNA positives in a sample set and reduces the proportion of specimens that contain low viral loads. Estimating algorithms for this situation would require modifying our present model to take antibody test dynamics into account.
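A rough consistency check on the combined figure is possible under one assumption of ours: that antibody and RNA tests miss infections independently. The study [9] reports the combined value empirically and does not state this assumption. Under independence, calling a case positive when either test is positive would give 1 - (1 - 0.38)(1 - 0.67), or about 79.5%. This is close to the observed 79%. A minimal Python sketch with a hypothetical helper name:

```python
def either_positive_sensitivity(se_a: float, se_b: float) -> float:
    """Sensitivity of a 'positive if either test is positive' rule,
    assuming (as a simplification) that the two tests miss infections independently."""
    return 1.0 - (1.0 - se_a) * (1.0 - se_b)


antibody, rna = 0.38, 0.67  # first-week sensitivities reported in [9]
print(f"{either_positive_sensitivity(antibody, rna):.1%}")  # 79.5%
```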
The model of viral dynamics for this study was based on rapidly emerging clinical data. As more groups report the results of group testing and viral dynamic studies, the assumptions of our model may change. Our group testing web tool will be updated accordingly and transparently, including a linked change log.

In summary, the need for group testing to make widespread high-throughput molecular testing feasible is clear. Caution around group testing has been reasonable, but numerous types of data now suggest that SARS-CoV-2 is an ideal candidate for group testing. The logistical issues involved [2] can be especially challenging for smaller laboratories. However, Wuhan tested 6.5 million residents over a period of days in May 2020, showing that group testing can be efficiently implemented at scale [8]. The specimen pooling protocols needed for such efforts have been published [3, 4, 5]. Laboratories can implement them cleanly as long as they are assured of appropriate regulatory clearances, reimbursement, and technical support. Regardless of location, the first step is for authorities to task large laboratories (both public health and commercial) with expanding testing. Authorities should also encourage the use of group testing when test availability is a limiting factor.

Table 1 … CoV-2. Pool sizes suggested are those predicted to give the highest number of specimens screened per test used, while not reducing analytic sensitivity by more than 20 percent in a specimen pool. All estimates reflect assumed viral dynamics, dilution effects and baseline assay performance in individual specimens (see text). They also assume that assay results are interpreted similarly (e.g., using the same cycle threshold for a qPCR assay) when testing pools or individual specimens.
1 Two- and three-stage algorithms are described in METHODS, Estimating testing program outcomes.
2 Time to results is estimated as the mean number of testing rounds required to obtain all results, because the time to complete a run varies with the assay and platform used by a laboratory. Individual testing is assumed to require one round. In group testing, most negative results require one round, but some require two or three. All positive results require two rounds (two-stage testing) or three rounds (three-stage testing). These estimates depend on sensitivity and specificity.
3 The number of test results generated in each group testing scenario was divided by the number of assays used in the process. This ratio can be compared implicitly with individual testing, where it is always 1. When test kits and supplies are the limiting factor, the results-to-tests ratio indicates the increase in testing capacity that a laboratory can expect from an algorithm.
Greater efficiencies can be achieved if larger pool sizes (and thus greater dilution and lower sensitivity) are allowed.
5 Positive predictive value (PPV) is the probability that, given a final positive result, the specimen is truly positive. The substantial increases in PPV of group testing over individual testing result from the retesting of positives in the group testing procedure, assuming uncorrelated errors between testing rounds. The model emphasizes that molecular tests with imperfect specificity (0.99 in our models) have inherently limited utility in low-prevalence situations such as SARS-CoV-2 surveillance [5], where false positive individual results could swamp true positives.

[1] Detection of acute infections during HIV testing in North Carolina.
[2] Optimizing screening for acute human immunodeficiency virus infection with pooled nucleic acid amplification tests.
[3] Assessment of Specimen Pooling to Conserve SARS CoV-2 Testing Resources.
[4] Evaluation of COVID-19 RT-qPCR test in multi-sample pools.
[5] Sample Pooling as a Strategy to Detect Community Transmission of SARS-CoV-2.
[6] SARS-CoV-2 by real-time RT-PCR using minipools of RNA prepared from routine respiratory samples.
[7] The Hebrew University-Hadassah COVID-19 diagnosis team. Large-scale implementation of pooled RNA-extraction and RT-PCR for SARS-CoV-2 detection. medRxiv preprint.
[8] Here's How Wuhan Tested 6.5 Million for Coronavirus in Days.
[9] Antibody responses to SARS-CoV-2 in patients of novel coronavirus disease 2019.
[10] Guohong Deng. Viral Kinetics and Antibody Responses in Patients with COVID-19.
[11] Viral load of SARS-CoV-2 in clinical samples.
[12] Virological assessment of hospitalized patients with COVID-2019.
[13] Clinical features and dynamics of viral load in imported and non-imported patients with COVID-19.
[14] Viral dynamics in mild and severe cases of COVID-19.
[15] Temporal profiles of viral load in posterior oropharyngeal saliva samples and serum antibody responses during infection by SARS-CoV-2: an observational cohort study.
[16] "Sensitivity" and "specificity" reconsidered: the meaning of these terms in analytical and diagnostic settings.