key: cord-0559097-yev55ijr authors: Jaszkiewicz, Andrzej title: Modified Dorfman procedure for pool tests with dilution -- COVID-19 case study date: 2020-11-30 journal: nan DOI: nan sha: 06b8136678ecda9d6a7061de94bbd3a2fd04523e doc_id: 559097 cord_uid: yev55ijr The outbreak of the global COVID-19 pandemic results in unprecedented demand for fast and efficient testing of large numbers of patients for the presence of SARS-CoV-2 coronavirus. Beside technical improvements of the cost and speed of individual tests, pool testing may be used to improve efficiency and throughput of a population test. Dorfman pool testing procedure is one of the best known and studied methods of this kind. This procedure is, however, based on unrealistic assumptions that the pool test has perfect sensitivity and the only objective is to minimize the number of tests, and is not well adapted to the case of imperfect pool tests. We propose and analyze a simple modification of this procedure in which test of a pool with negative result is independently repeated up to several times. The proposed procedure is evaluated in a computational study using recent data about dilution effect for SARS-CoV-2 PCR tests, showing that the proposed approach significantly reduces the number of false negatives with a relatively small increase of the number of tests, especially for small prevalence rates. For example, for prevalence rate 0.001 the number of tests could be reduced to 22.1% of individual tests, increasing the expected number of false negatives by no more than 1%, and to 16.8% of individual tests increasing the expected number of false negatives by no more than 10%. At the same time, a similar reduction of the expected number of tests in the standard Dorfman procedure would yield 675% and 821% increase of the expected number of false negatives, respectively. This makes the proposed procedure an interesting choice for screening tests in the case of diseases like COVID-19. Identification of infected patients is crucial for providing appropriate treatment and limit further transmission of COVID-19 disease. Large number of infected or potentially infected patients results in unprecedented demand for fast and efficient testing of large populations. For example, the results of a recent simulation study reported by Larremore et al. [16] demonstrate that effective screening depends largely on frequency of testing and the speed of reporting and could turn the COVID-19 tide within weeks. Some countries have decided already to test the whole population. For example, in October 2020 Slovakia decided to test all adults for SARS-CoV-2 [13] . However, due to the cost, availability, and efficiency of PCR test, Slovakia's government decided to use antigen tests which are known to have lower sensitivity and specificity. Beside technical improvements of the cost and speed of individual tests, pool testing is an interesting option to improve efficiency and throughput of a population test. Dorfman pool testing procedure is one of the best known and studied methods of this kind [7] . In this procedure, the subjects are grouped into a number of non-overlapping pools and the specimens from the pool subjects are tested together. If the result of a test for a pool is negative, all subjects are assumed to be negative. Otherwise, all subjects from the pool are individually tested. Since, the prevalence rate is usually much below 50%, many pools may contain only negative subjects and may be efficiently identified. Dorfman procedure procedure is, however, based on some assumptions that are not realistic in the context of COVID-19 and many other diseases, and thus limit its practical applications. This procedure assumes that the pool test has perfect sensitivity and the only objective is efficiency usually expressed as minimization of the number of tests. This assumption may seem justified in the case of PCR tests, including tests for presence of SARS-CoV-2 RNA, since in (reverse transcription) polymerase chain reaction the target DNA is exponentially multiplied, thus theoretically even extremely small amount of target DNA/RNA could be detected. In practice, the sensitivity of pool test may become lower due to the dilution effect [20] , [25] . Indeed some recent studies confirm that dilution significantly affects sensitivity of SARS-CoV-2 PCR tests [1] , [5] , [23] . For example, Bateman et al. [5] estimated that pools of 5 (with one positive subjects) lead to 93% sensitivity, pools of 10 lead to 91% sensitivity, and pools of 50 lead to 81% sensitivity, while estimated sensitivity for undiluted specimens is 99%. Thus, dilution effects are significant and need to be appropriately taken into account. With lower sensitivity of the pool test many subjects may be false negative. Such situations are highly undesired in the context of diseases like COVID-19 since they make impossible providing appropriate treatment and limiting further transmission. Thus, test efficiency should not be the only objective [19] . Although a number of authors proposed improvements of Dorfman testing procedure or other pool testing schemes, these modifications usually focus on improving efficiency (number of tests) [15] , [24] , for example, by using a hierarchy of sub-pools tested in more than two stages. Hierarchical approach, however, even further increases the risk of a subject being false negative, since a subject is assumed negative it at least one of its (sub-)pool tests was negative. Other works take into account only reduced specificity with the assumption of perfect sensitivity [2] , [8] , [11] . There are relatively few studies taking into account accuracy related objectives with imperfect sensitivity in pool tests [4] , [3] , [14] , [15] . Still, some of the approaches make the assumption that the imperfect sensitivity is constant and the same for both individual and pool tests [3] , [9] , [10] , [15] , [17] , in other words, they do not take into account the dilution effect. Some works taking into account the dilution effect concern the problem of estimating the prevalence rate [21] , [22] , [26] , not the problem of identifying positive/negative subjects. Furthermore, even the studies that consider dilution effect with false negatives as (partial) objective [4] , [14] , [27] do not propose modifications of the Dorfman test procedure that could reduce the number of false negatives. In general, in our opinion, the development of efficient and precise pool testing procedures did not obtain sufficient interest from the research community, due to limited practical interest before the outbreak of COVID-19 pandemic. We show, that the standard Dorfman pool testing procedure is not well adapted to the case of imperfect pool tests with the objective of minimizing the number of false negatives and we propose a simple modification of this procedure in which test of a pool with negative result is independently repeated up to a given number of times. This arXiv:2012.00673v1 [stat.ME] 30 Nov 2020 approach significantly reduces the number of false negatives with a relatively small increase of the number of tests. We present also a numerical study based on the recently reported data about the influence of the dilution effect on the sensitivity of SARS-CoV-2 PCR tests [5] . We show that the modified procedure is better on both the expected number of false negatives and the number of tests than the standard Dorfman procedure for a wide range of parameters. We show also that the proposed procedure may significantly reduce the expected number of tests with practically negligible increase of the expected number of false negatives compared to individual tests. Several authors proposed repeating test of pools (retesting) under some conditions [9] , [10] , [17] . In particular our procedure could bee seen as the special case of the procedure proposed by Graff and Roeloffs [9] with r1 = 1 and r2 = r. Still there are some important differences between these works and our approach, namely: • These works do not take into account dilution effect, which significantly influences the analysis and performance of pool testing. • The presented analysis differs from ours, in particular we use Bayes' theorem to update posterior probabilities. • We present a simulation study verifying the presented analysis. • We do not propose to repeat test of pools with positive results, since although it could slightly reduce the number of false positives, it increases the number of tests. At the same time, the number of false positives has relatively low importance in the context of diseases like COVID-19. Furthermore, in [18] Nyongesa described a procedure in which a pool with negative result is retested just once, however, this procedure was analyzed only from the point of view of prevalence rate estimation without dilution effect. The contributions of this paper are: • A modified Dorfman pool testing procedure for pool tests with dilution effect designed to improve the number of false negatives at a moderate cost of increased number of tests is proposed. • Analytical formulas for the number of tests, the number of false negatives and the number of false positives in the proposed procedure are derived. • The analytical formulas are verified in a computational simulation study. • The practical contribution of the paper is that through a computational case study using recent data about SARS-CoV-2 PCR tests sensitivity it is shown that the proposed procedure may be highly beneficial in the case of diseases like COVID-19. The paper is organized in the following way. In the next Section, we define basic concepts and notation. Then, we present the analysis of the standard Dorfman pool testing procedure from the point of view of the number of tests, the number of false negatives and the number of false positives. In Section four, we introduce the proposed procedure and analyze its properties. In the fifth Section, we present a computational case study for SARS-CoV-2 PCR tests. Finally, we present conclusions and directions for further research. Let p = P (Ps = 1) be the probability of a subject being positive (prevalence rate), where Ps ∈ {0, 1} is a random variable describing that the subjects is positive (Ps = 1) or negative (Ps = 0). Let Sp be the test specificity, i.e. the conditional probability that the outcome of the test is negative given the subjects is negative. Let Se be test sensitivity, i.e. the conditional probability that the outcome of the test is positive given the subjects is positive. Probability of a test being true negative in a given test is Probability of a test being true positive in a given test is Se p. Probability of a test being false negative in a given test is (1−Se)p. Let SpI and SeI denote specificity and sensitivity of an individual test. If each subject is tested individually, the expected number of false negatives E(F N ), the expected number of false positives E(F P ), and the number of tests T per subject are: Consider a pool of size n. The probability that there is at least one positive subject in the pool (pool is positive) is Let SpP and SeP denote specificity and sensitivity of a pool test. It is reasonable to assume that SpP = SpI = Sp, since in both cases we deal with a sample containing no virus RNA. where SeP (n, k) is the sensitivity of a pool test with the pool of size n and k positive subjects in the pool, and P r(k; n, p) is the probability of having k positive subjects in the pool of size n that could be derived from binomial distribution. SeP is the same for each pool assuming constant pool size and homogeneous subjects (i.e. constant p). In practice SpP = SpI = Sp and SeI are close to 1. SeP , SeP (n, k) ≤ SeI and SeP (n, k) decreases with growing n and increases with k. Let T ∈ {0, 1} denote the random variable being outcome of the pool test. The probability of a pool test being (true or false) positive is: and the probability of a pool test being (true or false) negative is: The detailed analysis is presented in Appendix A. In this Section we present the analytical formulas for the expected number of false negatives E(F N ), number of false positives E(F P ), and expected number of tests E(T ). SeP (n, k)P r(k − 1; n − 1, p) p The weak point of pool tests is decreased sensitivity compared to an individual test. In result, the number of false negatives may be increased. This weak point could be, however, resolved by repeating test for the same pool, if the results of the previous tests were negative, until the last test was positive or a given number of tests were negative. In other words, we assume that the pool test is positive, if at least one result was positive and we assume that the pool test is negative, if all results were negative. The decision tree of the repeated pool test is presented in Figure 1 . Note, that we assume that for each pool test, different samples from each subject are used, so the results of the repeated tests may be assumed to be independent. Let T ∈ {0, 1} denote the random variable being outcome of the pool test (with potential repeats). The probability of a pool test being (true or false) positive is: and the probability of a pool test being (true or false) negative is: where: is the repeated test specificity and Se P = E Se P (n, k), k = 1, . . . , n ≥ SeP is the repeated test sensitivity (see Appendix B). Lemma IV.1. P (T = 1) ≥ P (T = 1) and P (T = 0) ≤ P (T = 1). Proof. It is direct consequence of Sp P ≤ Sp and Se P ≥ SeP The detailed analysis is presented in Appendix B. In this Section we present the analytical formulas for the expected number (1 − SeI Se P (n, k))P r(k − 1; n − 1, p) (16) Se P (n, k)P r(k − 1; n − 1, p) p Lemma IV.2. With the same parameters, the modified procedure has lower or equal expected number of false negatives than the standard Dorfman test: Proof. It is a direct consequence of Se P (n, k) ≥ SeP (n, k), k = 1, . . . , n. Lemma IV.3. With the same parameters, the modified procedure has greater or equal expected number of individual tests and total number of tests than the standard Dorfman test: Proof. E(T I ) ≥ E(T I ) results from P (T = 1) ≥ P (T = 1). E(T ) ≥ E(T ) results from E(T I ) ≥ E(T I ) and the fact that already the first iteration of pool tests in the modified procedure requires the same number of tests as the first stage in the standard procedure. Lemma IV.4. The expected number of false negatives if greater or equal than in the case of individual tests: Proof. Se P (n, k)) ∈ [0, 1], k = 1, . . . , n and E(F N ) in nonincreasing with Se P (n, k)), k = 1, . . . , n, thus E(F N ) achieves minimum for Se P (n, k) = 1, k = 1, . . . , n. If Se P (n, k) = 1, k = 1, . . . , n then: Note, that since values of Se P (n, k) become very close to 1 in the repeated pool test, E(F N ) may be very close to E(F N ). We verify the presented analysis and evaluate the proposed procedure using recent data about the influence of the dilution effect on the sensitivity of SARS-CoV-2 PCR tests [5] . Since, however, the authors presented results (values of Se(n, k)) only for some specific values of n = 1, 5, 10, 50 and k = 1 we need to fit to the data a model allowing calculation of Se(n, k) for other values of n and k. We tried first the model proposed in [6] and used also in [4] : with Sp ≥ 0.99, however, this model did not fit the data perfectly (MSE = 0.000781). So, we decided to extend this model with a linear term: This model fitted the data almost perfectly (MSE = 0.000026) with α = 0.032482 and β = −0.001255. We hypothesize that the linear term reflects PCR test dependence not only on the relative but also on the absolute amount of virus RNA and remaining material. We tested the proposed procedure for prevalence rates p = 0.001, 0.002, 0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.3. Our procedure involves two parameters, pool size n and number of iterations of pool test r. Since their values are integer and reasonable ranges of their values are relatively small, all potential settings could be enumerated. We used n = 2, . . . , 50 and r = 2, . . . , 5. We did not try higher numbers of iterations, since they would be highly impractical and 5 iterations was enough to assure the expected number of false negatives very close to that of individual tests. We performed two kinds of computational studies. First, we used the analytical formulas presented above. Second, we simulated the procedures for 100, 000, 000 subjects drawn randomly as either positive or negative with a given prevalence rate. By comparing results of the analytical and simulation studies we verified the analytical formulas. In Table I we present mean relative squared errors between the analytical and simulation results for particular parameters and methods. These results confirm very good agreement between the analytical and simulation study. We compare the proposed procedure to the standard Dorfman procedure and to individual tests. Since false positives are much less important than false negatives in the context of diseases like COVID-19, we decided to use two main objectives i.e. the expected number of false negatives and the expected number of tests. Thus, we deal with a bi-objective optimization problem with two integer variables defining feasible solutions (settings of parameters) [12] . For each value of p, only Pareto-optimal solutions were preserved. A solution is Pareto-optimal (non-dominated, efficient) if there exists no other feasible solution better on one of the objectives and not worse on other objective(s). For solutions that are not Pareto-optimal it is possible to improve one objective without deteriorating other(s), so they do not constitute reasonable parameters for pool tests. All detailed numerical results and the source code used in this computational study are available at dedicated web page 1 . In Figure 2 we present exemplary sets of Pareto-optimal solutions of the standard Dorfman and the proposed procedure for selected values of p. The results are presented relative to the results corresponding to individual tests of all subjects. As it can be seen, the proposed procedure performs especially well for low prevalence rates and is able to assure much lower expected numbers of false negatives than the standard Dorfman procedure with the similar expected number of tests. Note, however, that if reduction of the number of tests is the main objective, the standard Dorfman testing procedure could assure lower expected number of tests than the proposed procedure. Furthermore, the proposed procedure allows for a significant reduction of the expected number of tests with very low increase of the expected number of false negatives compared to individual tests. To analyze it more precisely, in Table II we present Pareto-optimal solutions for the proposed procedure that assure no more than 1%, 10%, and 100% increase of the expected number of false negatives with the minimum expected number of tests. Note, that for p = 0.2 and p = 0.3 no such solutions exist with the expected number of tests lower than the number of tests in individual testing. These results confirm that the proposed procedure may be highly beneficial with low prevalence rates. For example, for p = 0.001 the number of tests could be reduced to 22.1% of individual tests increasing the expected number of false negatives by no more than 1%, and to 16.8% of individual tests increasing the expected number of false negatives by no more than 10%. At the same time, a similar reduction of the expected number of tests (to 26.4% and 18.2%, respectively) in the standard Dorfman procedure would yield 675% and 821% increase of the expected number of false negatives, respectively. Although, we treat the numbers of tests and of false negatives as the main objectives, in Table III we report also the average numbers of false positives for Pareto-optimal solutions in compared procedures. As it could be observed, the number of false positives in the proposed Figure 2 . Pareto optimal solutions for compared procedures procedure is larger than in the standard Dorfman procedure, but is lower than in individual tests. We have presented a modified pool testing procedure which could significantly reduce the number of tests at a very low cost of increasing the number of false negatives. We have derived analytical formulas for the main objectives and verified them iscreening n a simulation study. We have also shown that the proposed procedure may be highly beneficial for SARS-CoV-2 PCR screening tests. In this work we assumed a homogeneous population of subjects with constant prevalence rate. An interesting research direction would be to consider heterogeneous subjects with different prevalence rates [4] . Another possibility would be to apply hierarchical approach [15] in the case of pools tested positively that could perhaps further reduce the expected number of tests. In this study, we assumed that pool test is repeated with the same pool. Another possibility would to re-shuffle the subjects from the negative pools into new pools. It is yet to be established is such approach could bring some benefits. Note, however, that adding a hierarchical approach or pools reshuffling would increase complexity, while an advantage of the proposed procedure is relative simplicity which may be very important for practical implementations. Having k positive subjects given Ps = 1 means that there are k − 1 positive subjects among n − 1 remaining subjects, thus P r(k; n, p|Ps = 1) = P r(k − 1; n − 1, p), so: In an analogous way we may analyse the case T = 1: P (Ps = 1|T = 1) = P (T = 1|Ps = 1)p P (T = 1) = P (T = 1|Ps = 1)p P (T = 1) SeP (n, k)P r(k; n, p|Ps = 1) = k=n k=1 SeP (n, k)P r(k − 1; n − 1, p) Thus: The expected number of subjects in negative pools that are not individually tested is: The probability that one of these subjects is false negative is P (Ps = 1|T = 0) Thus, the expected number of false negative subjects from the pool testing stage is: (1 − SeP (n, k))P r(k − 1; n − 1, p) p P (T = 0) (1 − SeP (n, k))P r(k − 1; n − 1, p) p A subject is individually tested if its pool test result was true or false positive. The expected number of subjects individually tested in the second phase is: So, the expected number of false negatives in the second stage is: In total the expected number of false negative subjects in both stages is: (1 − SeP (n, k))P r(k − 1; n − 1, p) p+ SeP (n, k)P r(k − 1; n − 1, p) p To calculate the expected number of tests we add the number of pool tests and the expected number of subjects tested individually in the second phase: In repeated pool test we need to update test specificity and sensitivity for each value of k. As can be seen from the Figure 1 the repeated pool test specificity with r iterations is and the repeated pool test sensitivity is Se P (n, k) = 1 − 1 − SeP (n, k) r ≥ SeP (n, k), k = 1, . . . , n Obviously Se P (n, k) > SeP (n, k), k = 1, . . . , n, if SeP (n, k) < 1. Note that the sensitivity of the repeated pool test may be higher than the sensitivity of an individual test if Se P = k=n k=1 Se P (n, k)P r(k; n, p) pP > SeI Se P (n, k)P r(k − 1; n − 1, p) p To calculate the expected number of tests we need to take into account that the pool test for a pool is repeated if all previous tests for this pool were negative. Consider iteration number l. The probability that a pool was true negative in iterations 1, . . . , l − 1 is: To calculate the probability that a pool was false negative in iterations 1, . . . , l − 1 we need to consider each k = 1, . . . , n. This probability for a given k is: pP (1 − SeP (n, k)) l−1 and for all k = 1, . . . , n: pP k=n k=1 1 − SeP (n, k) l−1 P r(k; n, p) pP = k=n k=1 1 − SeP (n, k) l−1 P r(k; n, p) thus the probability that a pool was negative in iterations 1, . . . , l − 1 is: (1 − pP )Sp l−1 + k=n k=1 1 − SeP (n, k) l−1 P r(k; n, p) Assessment of Specimen Pooling to Conserve SARS CoV-2 Testing Resources Dna pooling in mutation detection with reference to sequence analysis Optimal Risk-Based Group Testing Adaptive riskbased pooling in public health screening Assessing the dilution effect of specimen pooling on the sensitivity of SARS-CoV-2 PCR tests Group testing with test error as a function of concentration The detection of defective members of large populations The efficiency of pooling in the detection of rare mutations Group testing in the presence of test error; an extension of the dorfman procedure A group-testing procedure in the presence of test error Group testing in presence of classification errors Multiobjective optimization in bioinformatics and computational biology Slovakia to test all adults for sars-cov-2 Comparison of group testing algorithms for case identification in the presence of test error Test sensitivity is secondary to frequency and turnaround time for covid-19 screening Multistage group testing procedure (group screening) Dual Estimation of Prevalence and Disease Incidence in Pool-Testing Strategy Reader reaction: A note on the evaluation of group testing algorithms in the presence of misclassification Informative Dorfman Screening Regression models for group testing data with pool dilution effects Sequential prevalence estimation with pooling and continuous test outcomes Pooling of SARS-CoV-2 samples to increase molecular testing throughput THE USE OF A SQUARE ARRAY SCHEME IN BLOOD TESTING Comparative analysis of triplex nucleic acid test assays in united states blood donors A general regression framework for group testing data, which incorporates pool dilution effects Pooled testing for hiv screening: Capturing the dilution effect