key: cord-0560447-vls3zviw
authors: Srivastava, Abhishek; Mishra, Anurag; Parekh, Trusha Jayant; Jena, Sampreeti
title: Implementing Stepped Pooled Testing for Rapid COVID-19 Detection
date: 2020-07-19
journal: nan
DOI: nan
sha: a70a9fe42a7c76dd65df563b1484a0ce383bf30d
doc_id: 560447
cord_uid: vls3zviw

COVID-19, a viral respiratory pandemic, has rapidly spread throughout the globe. Large-scale and rapid testing of the population is required to contain the disease, but such testing is prohibitive in terms of resources, cost and time. Recently, RT-PCR based pooled testing has emerged as a promising way to boost testing efficiency. We introduce a stepped pooled testing strategy, a probability-driven approach which significantly reduces the number of tests required to identify infected individuals in a large population. Our comprehensive methodology incorporates the effect of false negative and false positive rates to accurately determine not only the efficiency of pooling but also its accuracy. Under various plausible scenarios, we show that this approach significantly reduces the cost of testing and also reduces the effective false positive rate of tests when compared to a strategy of testing every individual of a population. We also outline an optimization strategy to obtain the pool size that maximizes the efficiency of pooling given the diagnostic protocol parameters and local infection conditions.

Coronavirus Disease 2019 (COVID-19), a viral infectious respiratory illness, has recently emerged as a major threat to public health and economic stability in countries around the world. It has spread globally at an alarming pace and the World Health Organization (WHO) has declared it a pandemic. In the absence of a cure or a vaccine, large-scale testing and quarantine are recognized as among the most effective strategies for containing its spread.
While there are various known diagnostic methods for COVID-19, including nucleic acid testing, protein testing and computed tomography [1], they can be prohibitive in terms of cost and time. Pooled testing is a promising strategy to boost testing efficiency. In pooled testing, the samples collected from patients are divided and grouped into pools, and each pool is then tested for the disease. If a pool tests negative, each sample in that pool must be negative too. This basic idea reduces the overall cost and time of testing large populations. Pooled testing was first proposed during World War II [2] and has been a part of diagnostic methodology ever since [3]. It has been employed several times to test for infections including malaria [4], flu [5] and HIV [6, 7]. One of the first implementations of laboratory pooled testing for COVID-19 was demonstrated by Yelin et al. [8] for pools as large as 32 or 64 samples. Today, physicians and public health officials in India [9-15] and many other countries around the globe [16-20] are using pooled testing to determine the spread of this pandemic in a rapid and cost-efficient manner.

A variety of strategies have been proposed over the past several years to implement pooled testing [21, 22]. They can be broadly classified into two types: adaptive and non-adaptive. Adaptive methods [22-30] employ a sequential testing approach, thus requiring fewer tests in total but more time, as each step of testing informs the next. Non-adaptive pooled testing methods [29, 31, 32], on the other hand, usually involve a matrix-type pooling that allows several pools to be tested simultaneously; the results are then collated to pinpoint the infected samples. These methods are faster but can require a greater number of tests in total.
While many of these methods might be mathematically efficient, their practical implementation is usually challenging [8, 29], which limits the complexity that can be incorporated, no matter the benefits. Hence, it is imperative to modify and verify any proposed method according to clinical constraints. Here, we present a probability-driven pooled testing approach that can significantly reduce the number of tests required to identify infected patients in large populations. The method divides and tests pools of samples in a hierarchical (stepped) manner. This approach is not limited to COVID-19 and can be applied to other infectious scenarios with minor modifications. The mathematical model used for implementing and optimizing this strategy is presented along with representative results for various probable real-life scenarios. Under various plausible scenarios, this strategy reduces the cost of testing by 30% to 90% compared to a strategy of individually testing everyone in a population and cuts the false positive rate up to one-third of an individual test. It can be used to rapidly determine the efficiency boost that can be obtained by pooling a desired number of samples together, provided we know the accuracy of the testing method and the rate of infection in the population being tested. It can also suggest the optimal pool size that should be used to minimize the number of tests needed per 1000 people.

The stepped pooled testing strategy is applicable to any testing method that involves sample collection, such as the Reverse Transcription-Polymerase Chain Reaction (RT-PCR) test [1], which is being widely used for testing COVID-19. We begin by assuming that the sample(s) collected from each patient are enough for N_max tests only (for instance, if we are able to collect 3 swabs per patient then N_max = 3). This number determines the number of steps of the stepped pooled testing strategy.
Our strategy extends the 2-step model described by Hanel and Thurner [30]. The stepped pooled testing strategy goes as follows:

1. We test a pool of M samples.
2. If the outcome is negative (not infected), we can surmise that all M samples in the pool are infection free.
3. If the pooled sample tests positive (infected), we split the samples from these M patients into two sub-pools of size M/2 each and repeat steps 1 and 2. Note that at every step of this process we need to use a fresh sample from each patient to make the new sub-pools, because the sample from the previous step is not reusable.
4. This process is repeated N_max − 1 times, after which we are left with a single sample from each patient in the sub-pools. If a sub-pool at this stage tests positive for infection, we individually test every patient in this sub-pool.

This strategy is most effective when the sub-pools can be halved evenly at every step, i.e. when the pool size M is an integer multiple of $2^{N_{\max}-2}$. The initial size of the pool M can be optimized to maximize the effective number of people tested per test or, equivalently, to minimize the number of tests needed per 1000 people. A flowchart for this strategy is shown in Fig. 1. Probabilistic calculations along this tree enable us to estimate the expected number of tests for a pool of a given size as well as the overall chance of false negatives. The probability of a pool of M samples being infected (i.e. at least 1 out of M positive) is

$$p(M) = 1 - (1 - p)^{M},$$

where p is the infection rate in the population. The probability of the pool testing positive is [33]

$$G_{+}(M) = (1 - F_{-})\,p(M) + F_{+}\,\bigl(1 - p(M)\bigr),$$

where $F_{-}$ and $F_{+}$ are the false negative and false positive rates of the test. Note that we have assumed that the false negative and false positive rates for a pool of samples are the same as those for a single sample. This can be justified based on the limits of detection of the commonly used RT-PCR protocols. Please refer to Appendix A for details.

Following the flowchart in Fig. 1, we can deduce that T(M), the expected number of tests for a pool of size M, is given by a recursive function $\hat{Z}(M, s)$ that terminates when we get to N_max steps:

$$T(M) = \hat{Z}(M, N_{\max}),$$

$$\hat{Z}(m, s) = \begin{cases} 1 + 2\,G_{+}\,\hat{Z}(m/2,\, s-1) & \text{for } s > 1 \\ m & \text{for } s = 1 \end{cases} \tag{4}$$

[Fig. 1: Flowchart of the stepped pooled testing strategy (Step 1, Step 2, Step 3, ..., test individually).]

Here m denotes the subpool size and s denotes the step number. It follows that the number of persons per test, which we call the test efficiency amplification K, is given by

$$K = \frac{M}{T(M)}.$$

Correspondingly, the number of tests needed per 1000 people is

$$T_{1000} = \frac{1000}{K} = \frac{1000\,T(M)}{M}.$$

The total probability of showing a false negative at the end of all steps can also be calculated using a recursive formula. To better understand the calculation for this step, it helps to write the probabilities at each step as shown in Fig. 5. The pooled test false negative rate $F^{\text{pool}}_{-}$ can then be written as a recursive formula in which m denotes the subpool size and s denotes the step number.

The Indian Council of Medical Research (ICMR) issued an advisory [34] for pool testing and suggested limiting the pool size M to 5 to avoid dilution. ICMR also suggested a staggered approach to the use of pooled testing: (a) for areas with an infection rate below 2%, pooled testing should be used; (b) for infection rates between 2% and 5%, pooled testing should be used for community and asymptomatic patient testing; and (c) for areas with an infection rate above 5%, pooled testing should not be used. We will use these numbers as a guide for demonstrating our method. It should be noted that larger pool sizes, up to M = 64, have been reported in other studies [8]. These are also in agreement with our calculations regarding limits of detection (see Appendix A).

In Figs. 2 to 4, we show the results for a representative set of parameters. We find that the false negative rate increases as we make the pool size larger, while the number of tests per 1000 people initially decreases. There is therefore an optimum pool size that achieves maximum efficiency (i.e. minimum T_1000).
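The stepped procedure can also be explored numerically. The following is a minimal Monte Carlo sketch of the four steps listed earlier, not the authors' analytic recursion. It assumes that infections are independent with prevalence `p`, that a pool has the same false negative rate `fn` and false positive rate `fp` as a single test (as assumed in the text), and that a positive pool is split into two halves. The function and parameter names (`run_test`, `stepped_pool`, `simulate`, `steps` for N_max) are hypothetical.

```python
import random


def run_test(truly_positive, fn, fp, rng):
    """Outcome of one test on a pool or on a single sample."""
    if truly_positive:
        return rng.random() >= fn  # detected unless a false negative occurs
    return rng.random() < fp       # flagged only if a false positive occurs


def stepped_pool(infected, steps, fn, fp, rng):
    """Process one pool; return (tests used, per-person declared-positive flags).

    `steps` plays the role of N_max: the number of samples left per patient.
    """
    m = len(infected)
    if steps == 1 or m == 1:
        # Last sample (or a pool of one): test everyone individually.
        return m, [run_test(x, fn, fp, rng) for x in infected]
    if not run_test(any(infected), fn, fp, rng):
        return 1, [False] * m      # pool negative: everyone declared clear
    half = m // 2                  # pool positive: split into two sub-pools
    t1, d1 = stepped_pool(infected[:half], steps - 1, fn, fp, rng)
    t2, d2 = stepped_pool(infected[half:], steps - 1, fn, fp, rng)
    return 1 + t1 + t2, d1 + d2


def simulate(p, pool_size, steps, fn, fp, n_pools=20000, seed=0):
    """Estimate tests per 1000 people and the overall false negative rate."""
    rng = random.Random(seed)
    tests = missed = infected_total = 0
    for _ in range(n_pools):
        infected = [rng.random() < p for _ in range(pool_size)]
        t, declared = stepped_pool(infected, steps, fn, fp, rng)
        tests += t
        infected_total += sum(infected)
        missed += sum(1 for i, d in zip(infected, declared) if i and not d)
    tests_per_1000 = 1000 * tests / (n_pools * pool_size)
    fn_rate = missed / infected_total if infected_total else 0.0
    return tests_per_1000, fn_rate


if __name__ == "__main__":
    # Illustrative parameters only; they are not taken from the paper's tables.
    print(simulate(p=0.02, pool_size=4, steps=3, fn=0.1, fp=0.02))
```

Because the simulation follows each patient through the tree, it does not rely on any independence approximation between a pool and its sub-pools, which makes it a convenient cross-check for a closed-form estimate.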
Figure 2 reveals that, for the same pool size, a population with a higher infection rate requires more tests and has an overall lower accuracy (higher false negative rate). This is consistent with what we would expect clinically. In Fig. 3, we show the effect of the false negative rate on stepped pooled testing. Interestingly, a diagnostic test with a higher false negative rate would go through more samples in fewer tests, but at the cost of a higher overall pool test false negative rate, making this trade-off possibly undesirable. In Fig. 4, we see the effect of the number of steps N_max (also the number of samples per patient) on the pooling strategy. As in the previous two parameter sweeps, the number of tests required per 1000 people is non-monotonic and has an optimal pool size for which the pooling is most efficient (note that for N_max = 4, T_1000 is minimized at M = 48, which is beyond the visible horizontal axis). On the other hand, the false negative rate steadily increases but still remains below the false negative rate of a single test. Using multiple samples significantly reduces the number of tests needed without compromising the overall false negative rate of the pooling strategy. Table II summarizes the results for a broad set of plausible scenarios to demonstrate the efficiency of this strategy.

In addition to predicting the efficiency and accuracy of different pooling strategies, we can also use this method to calculate the optimal pool size that leads to the least number of tests (i.e. minimizes T_1000). Figures 2 to 4 clearly demonstrate the existence of such an optimum. In Table III, we show various possible testing scenarios and the corresponding optimal pool size. The results in this section show that stepped pooled testing can reduce the overall pool false negative rate below the false negative rate of an individual test.
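To illustrate how an optimal pool size arises, the sketch below treats only the simplest two-step case (one pooled test followed by individual tests of every member of a positive pool). It is an assumption-laden illustration rather than the multi-step optimization used for Table III: it assumes independent infections with prevalence `p`, and a pool-positive probability of (1 − fn)·p(M) + fp·(1 − p(M)) with p(M) = 1 − (1 − p)^M, i.e. pools share the error rates of a single test. The function names and the search bound `max_size` are hypothetical.

```python
def pool_positive_prob(p, m, fn=0.0, fp=0.0):
    """Probability that a pool of m samples tests positive."""
    p_infected = 1.0 - (1.0 - p) ** m  # at least one infected sample in the pool
    return (1.0 - fn) * p_infected + fp * (1.0 - p_infected)


def two_step_tests_per_1000(p, m, fn=0.0, fp=0.0):
    """Expected tests per 1000 people for two-step pooling with pool size m."""
    # One pooled test, plus m individual tests whenever the pool tests positive.
    expected_tests = 1.0 + m * pool_positive_prob(p, m, fn, fp)
    return 1000.0 * expected_tests / m


def best_pool_size(p, fn=0.0, fp=0.0, max_size=64):
    """Pool size in [2, max_size] that minimizes the expected tests per 1000."""
    return min(range(2, max_size + 1),
               key=lambda m: two_step_tests_per_1000(p, m, fn, fp))


if __name__ == "__main__":
    for rate in (0.01, 0.02, 0.05):
        m = best_pool_size(rate)
        print(rate, m, round(two_step_tests_per_1000(rate, m), 1))
```

With an error-free test, the search returns a pool size of 11 at 1% prevalence and 5 at 5% prevalence, showing how the optimum shrinks as the infection rate rises.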
We propose a new stepped pooled testing strategy that can significantly reduce the cost of testing a large population. The strategy also reduces the chance of a false negative in almost all scenarios because an infected patient's sample is likely to be tested multiple times. Even in the simplest case with two samples per individual (i.e. two steps, also called Dorfman pooling [2]) and an initial pool size of 2, we can significantly reduce the number of tests required per 1000 individuals, by up to 33.7% for populations with a high infection rate and up to 46.5% for populations with a low infection rate. As the number of steps and the initial pool size are increased, the testing efficiency progressively improves, albeit at the cost of a slightly higher false negative rate. Nevertheless, barring the cases with a very high infection rate, the pooled false negative rate is still below that of an individual test.

Based on our results, we make several suggestions about the effective pool size and the number of samples that should be collected from an individual. This methodology should be customized dynamically and regularly based on evolving local levels of infection. The most significant benefits of this strategy can be realized by collecting 2 or 3 samples from each individual and pooling them into groups of 4 to 6. Increasing the number of steps N_max means collecting more samples from each patient being tested. Hence, the value of N_max should be chosen pragmatically in consultation with the physician or health professional. Finally, we note that machine learning methods may be implemented to utilize data collected on disease spread and dynamically adapt this strategy for maximum efficiency. We leave this as a topic for future research.
[4] High-Throughput Pooling and Real-Time PCR-Based Strategy for Malaria Detection
[5] Evaluation of the pooling of swabs for real-time PCR detection of low titre shedding of low pathogenicity avian influenza in turkeys
[6] Screening for the Presence of a Disease by Pooling Sera Samples
[7] A methodology for deriving the sensitivity of pooled testing, based on viral load progression and pooling dilution
[8] Evaluation of COVID-19 RT-qPCR test in multi-sample pools
[9] India assesses Covid-19 sample pooling for tests, says top scientist. How it helps
[10] We will begin pool testing trials to speed up diagnosis: Satyendar Jain
[11] Group of researchers and data scientists develop a new algorithm to prepare India for mass-testing of COVID-19
[12] ICMR suggests using pooled samples for molecular testing
[13] COVID-19 pool testing should be encouraged in UP: Yogi Adityanath
[14] Centre allows COVID-19 pool testing, plasma therapy in Maharashtra
[15] 24-hr shifts, TB kits: ICMR maths for 1 lakh tests daily
[16] Pooling Method for Accelerated Testing of COVID-19
[17] To ease global virus test bottleneck, Israeli scientists suggest pooling samples
[18] Nebraska Public Health Lab begins pool testing COVID-19 samples
[19] Corona 'pool testing' increases worldwide capacities many times over
[20] We 'pool' coronavirus samples to test 1,000s at a go; we've done 30,000 since Sunday - Noguchi
[21] Evaluation of Pool-based Testing Approaches to Enable Population-wide Screening for COVID-19
[22] Group Testing for COVID-19: How to Stop Worrying and Test More
[23] Efficient and Practical Sample Pooling for High-Throughput PCR Diagnosis of COVID-19
[24] Increasing testing throughput and case detection with a pooled-sample Bayesian approach in the context of COVID-19
[25] Variable pool testing for infection spread estimation
[26] Noisy Pooled PCR for Virus Testing
[27] Multi-Stage Group Testing Improves Efficiency of Large-Scale COVID-19 Screening
[28] Pooling RT-PCR or NGS samples has the potential to cost-effectively generate estimates of COVID-19 prevalence in resource limited environments
[29] Pooled RNA extraction and PCR assay for efficient SARS-CoV-2 detection
[30] Boosting test-efficiency by pooled testing strategies for SARS-CoV-2
[31] Rapid, Large-Scale, and Effective Detection of COVID-19 Via Non-Adaptive Testing
[32] Evaluation of Group Testing for SARS-CoV-2 RNA
[33] A pool that contains infected samples may not necessarily test positive because the test has non-zero false negative and false positive rates. Hence G_+(M) is not the same as p(M).
[34] Advisory on Feasibility of Using Pooled Samples for Molecular Testing of COVID-19
[35] Pitfalls of quantitative real-time reverse-transcription polymerase chain reaction
[36] The Laboratory Diagnosis of COVID-19 Infection: Current Issues and Challenges
[37] Improved molecular diagnosis of COVID-19 by the novel, highly sensitive and specific COVID-19-RdRp/Hel real-time reverse transcription-polymerase chain reaction assay validated in vitro and with clinical specimens
[38] Assay Techniques and Test Development for COVID-19 Diagnosis

[Table IV: Viral load in respiratory and non-respiratory specimens. Columns: specimen type; mean (range) viral load (RNA copies/mL) in RdRp-P2-negative but COVID-19-RdRp/Hel-positive specimens.]

The authors are thankful to Dr. Saumya Srivastava, MBBS and Vertika Srivastava for useful discussions, and to Dr. Hanel for providing more details about his model via email.

One of the key advantages of real-time PCR assays utilizing target-sequence-specific primers (as is the case with all COVID-19 test kits) is their wide dynamic range. This enables the analysis of samples with widely varying levels of target RNA. The resolving power of RT-PCR is mostly limited by the efficiency of RNA-to-cDNA conversion, a real concern when the target RNA is scarce. Thus, determination of the Limit of Detection (LOD), by performing serial dilutions of the positive control sample and obtaining standard curves, is a critical step in the validation of any testing kit or protocol.
The highest dilution of the standard curve, provided in the assay performance evaluation report of any RT-PCR assay kit, delineates the lowest concentration that can be quantified with confidence. Thus, pooling patient samples as proposed by the current model is unlikely to influence the probability of a false negative prediction by the assay if the effective target concentration is maintained above the LOD. However, if the intensity values recorded are comparable to that of the LOD, they should be recorded only as a qualitative (yes/no) prediction [35].

Target RNA selection plays a big role in the assay sensitivity. Targets include RNA-dependent RNA polymerase (RdRp), hemagglutinin-esterase (HE), and open reading frames ORF1a and ORF1b. The World Health Organization (WHO) recommends first-line screening with the E gene assay followed by a confirmatory assay using the RdRp gene. Tang et al. [36] developed and compared the performance of three novel real-time RT-PCR assays targeting the RdRp/Hel, S, and N genes of SARS-CoV-2. Among them, the COVID-19-RdRp/Hel assay had the lowest limit of detection in vitro and higher sensitivity and specificity.

In this section, we calculate the maximum possible pool size (M*) that is consistent with the LOD of current COVID-19 tests.

Calculation. The LOD of the COVID-19-RdRp/Hel assay is 11.2 RNA copies/reaction [36]. Assuming a reaction volume of 25 µL, this is equivalent to 448 RNA copies in one mL of sample. From Table IV, we find that the mean viral load for nasopharyngeal/nasal swabs is 1.74 × 10^4 RNA copies/mL. Assuming a pool of M samples with only one infected sample, and that samples are pooled first and RNA is extracted afterwards, the net effective viral load in the pooled sample will be (1.74/M) × 10^4 copies/mL. In the standardized protocol for RNA extraction and the RT-PCR procedure, 200 µL of pooled sample is diluted with 250 µL of solvent and loaded for RNA extraction. Purified RNA is diluted into 50 µL of solvent.
10 µL of the diluted solution is used per well of the PCR assay, with a total reaction volume of 25 µL [38]. Thus, the net effective viral load per PCR well (in units of RNA copies/mL of solvent) is

$$\frac{1.74 \times 10^{4}}{M} \times \frac{200}{50} \times \frac{10}{25} = \frac{2.784 \times 10^{4}}{M}.$$

Requiring this to remain above the LOD of 448 RNA copies/mL gives $M \le 62.1$. Thus, the largest possible pool size consistent with the LOD of an RT-PCR test is M* = 62. This value is consistent with earlier literature [8, 28].
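The arithmetic behind the maximum pool size can be checked with a short script. This is a sketch under stated assumptions: RNA extraction is treated as lossless, so all RNA in the 200 µL of pooled sample ends up in the 50 µL of purified solution (the 250 µL of solvent added before extraction does not change the number of copies), and exactly one sample in the pool is infected. The function and parameter names are hypothetical. Under these assumptions the volumes quoted above reproduce the stated M* = 62.

```python
import math


def lod_per_ml(lod_copies_per_reaction, reaction_ul=25.0):
    """Convert an LOD in copies/reaction into copies/mL of reaction volume."""
    return lod_copies_per_reaction / (reaction_ul / 1000.0)


def max_pool_size(viral_load_per_ml, lod_copies_per_reaction,
                  sample_ul=200.0, elution_ul=50.0,
                  template_ul=10.0, reaction_ul=25.0):
    """Largest pool size that keeps one positive sample above the LOD."""
    # Extraction concentrates the sample (200 uL -> 50 uL), then loading
    # 10 uL of template into a 25 uL reaction dilutes it again.
    factor = (sample_ul / elution_ul) * (template_ul / reaction_ul)
    load_in_well_for_pool_of_one = viral_load_per_ml * factor
    limit = lod_per_ml(lod_copies_per_reaction, reaction_ul)
    return math.floor(load_in_well_for_pool_of_one / limit)


if __name__ == "__main__":
    # Values quoted in the text: 1.74e4 copies/mL, LOD 11.2 copies/reaction.
    print(max_pool_size(1.74e4, 11.2))
```

Changing `viral_load_per_ml` shows how sensitive the limit is to the specimen type: a specimen with a tenth of the viral load supports a pool roughly a tenth as large.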