On the Optimal Configuration of a Square Array Group Testing Algorithm

Ugnė Čižikovienė, Viktor Skorniakov

2021-06-29

To date, only lower and upper bounds for the optimal configuration of the Square Array (A2) Group Testing (GT) algorithm have been known. We establish exact analytical formulae and provide a couple of applications of our result. First, we compare the A2 GT scheme to several other classical GT schemes in terms of the gain per specimen attained at the optimal configuration. Second, operating within an objective Bayesian framework with a loss designed to attain its minimum at the optimal GT configuration, we suggest a preferred choice of the group size under natural minimal assumptions: the prior information regarding the prevalence suggests only that grouping and application of A2 are better than individual testing. The same suggestion is provided for the minimax strategy.

The task of identifying infected patients in a given cohort is a frequent one. Though plain consecutive testing of all individuals is an obvious solution, there are many other ways to approach the problem. The term Group Testing (GT) refers to a testing strategy in which the testing of distinct specimens is replaced by the testing of groups of pooled specimens. It appears that the idea was first described in the famous paper of Dorfman [12]. He looked for a cost-saving way to screen U.S. soldiers for syphilis during World War II and suggested the following scheme. Instead of testing each individual blood sample, pool N samples and test the group; if the group tests positive, retest each single individual; if the group tests negative, then all individuals in the group are healthy and no retesting is needed. It is clear that, when the prevalence of the disease is low, only a small fraction of pools needs retesting, which leads to significant cost savings. Since the appearance of [12], the idea has spread to many other fields (quality control, information sciences, environmental sciences, etc.) as well. In the biomedical context, GT is widely applied to screen for infectious diseases like HIV, hepatitis and, most recently, COVID-19 ([40], [5], [25], [35], [38], [1], [10], [11], [16], [22], [30], [27]). It also appears to be a very useful technique in genetics ([13], [9], [28], [7]).

In this paper, we focus on the Square Array (A2) GT algorithm introduced by Phatarfod and Sudbury [37] and later generalized by Berger, Mandell and Subrahmanya [36]. A2 operates as follows. Given $n^2$ specimens, one places them on an $n \times n$ matrix and tests the pools defined by the subsets corresponding to rows and columns. The specimens lying on the intersections of rows and columns exhibiting positive responses are then retested individually and, assuming that the test is perfect, all infected individuals are identified. One of the most important characteristics of any GT algorithm is its optimal configuration. To introduce the concept, consider an arbitrary GT scheme and let $T_N$ denote the total (random) number of tests performed on a cohort of $N$ individuals. The optimal configuration is the cohort size which minimizes the function $N \mapsto \mathbb{E}T_N/N$, i.e., the expected number of tests per individual.
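To make the latter notion concrete, here is a minimal Python sketch for the Dorfman scheme just described. It uses the standard per-individual cost $1/N + 1 - q^N$ (one pooled test plus $N$ retests whenever the pool is positive) and brute-forces the integer minimizer; the prevalence value and the search bound are illustrative assumptions rather than quantities taken from this paper.

    import numpy as np

    def dorfman_tests_per_person(q: float, N: int) -> float:
        """Expected tests per individual for the classical Dorfman scheme:
        one pooled test plus N individual retests whenever the pool is positive."""
        return 1.0 / N + (1.0 - q ** N)

    def optimal_group_size(q: float, max_N: int = 200) -> int:
        """Brute-force the optimal configuration argmin_N E[T_N]/N over N = 2..max_N."""
        sizes = np.arange(2, max_N + 1)
        costs = np.array([dorfman_tests_per_person(q, int(N)) for N in sizes])
        return int(sizes[np.argmin(costs)])

    if __name__ == "__main__":
        p = 0.01                      # illustrative prevalence
        q = 1.0 - p
        N_star = optimal_group_size(q)
        print(N_star, dorfman_tests_per_person(q, N_star))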
Though, for a given prevalence, a numerical solution for the optimal configuration is always possible, analytical formulae provide much more insight and allow, in particular, analytical comparisons of different GT schemes. For the case of the Dorfman scheme described above, the optimal configuration was established by Samuels [31] quite long ago. However, for a couple of its modifications, namely the modified Dorfman scheme [33] and the Sterrett scheme [34], the analytical formulae, though conjectured, remained unknown [24] and were established only recently [44]. The situation with A2 is much the same. To the best of our knowledge, only Hudgens and Kim [17] addressed this problem, and they did not succeed in providing the final solution. Namely, in [17] the authors obtained lower and upper bounds for the optimal A2 configuration, leaving the exact formulae undiscovered. In this paper, we fill in the gap by providing an exact solution similar to those obtained by Samuels [31] and by Skorniakov and Čižikovienė [44] for the classical Dorfman scheme and its modifications.

The paper is organized as follows. In Section 2, we state our results and provide a couple of applications. Section 3 is devoted to discussion. Finally, there are two appendices containing proofs, figures, and a link to the web repository with tabular data.

Before proceeding to the statement of results, we first formulate the Binomial Testing Assumptions (BTA), which are assumed to hold in the remaining part of the paper by default. (BTA1) Each member of the tested cohort is infected independently of the others with the same constant probability $p \in (0, 1)$ (termed prevalence in the sequel). (BTA2) The test under consideration is perfect and there is no dilution effect. That is, pooling does not affect the performance of the test.

The way A2 operates was already described in the introductory Section 1. However, we need an explicit expression for the average number of tests $\mathbb{E}T_N$ required by A2 for a cohort of $N = n^2$, $n \ge 2$, individuals. The latter was derived by Phatarfod and Sudbury [37] and is equal to $\mathbb{E}T_N = 2n + n^2\bigl(1 - 2q^n + q^{2n-1}\bigr)$, where $q := 1 - p$. Therefore, in this parametrization, the average number of tests per person equals
$$t(q, n) := \frac{\mathbb{E}T_N}{N} = 1 + \frac{2}{n} - 2q^n + q^{2n-1}, \qquad (2.2)$$
and the corresponding optimal configuration is $n_{\mathrm{opt}} = n_{\mathrm{opt}}(q) := \operatorname{argmin}_{n \in \{2,3,\dots\}} t(q, n)$. The existence, uniqueness, and bounds on $n_{\mathrm{opt}}$ for various values of $q \in (0, 1)$ were established by Hudgens and Kim [17]. Our reconsideration (given in Theorem 2.1 and Corollary 2.1 below) aimed to sharpen their results (see the discussion in Section 3) and to provide a complete theoretical characterization of A2.

Theorem 2.1. Let $g(q, n) = \frac{2}{n} - 2q^n + q^{2n-1} = t(q, n) - 1$.
(i) For $(q, n)$ ranging in $(1/2, 1) \times (2, \infty)$, the system of equations $g(q, n) = 0$, $\frac{\partial}{\partial n} g(q, n) = 0$ has a unique solution $(q^*, n^*) \approx (0.748416, 4.453524)$.
(ii) For any fixed $q \in (q^*, 1)$ and with respect to $n$, the equation $g(q, n) = 0$ admits two solutions $n_L, n_U$ with $2 < n_L < n^* < n_U < \infty$. On $(n_L, n_U)$, $n \mapsto g(q, n)$ attains values in $(-\infty, 0)$, whereas on $(2, \infty) \setminus [n_L, n_U]$ it attains values in $(0, \infty)$.
(iii) For any fixed $q \in (q^*, 1)$, the region $(n_L, n_U)$ is the one where A2 is efficient, i.e., $t(q, n) < 1$ for $n \in (n_L, n_U)$. In that region, there exists a unique (and therefore global) minimizer $n_{\min}$ of $(2, \infty) \ni n \mapsto t(q, n)$; for $q \in [0.755, 1)$, it is given by (2.4) for some $t^* \in [0, 1]$. The function $(2, \infty) \ni n \mapsto t(q, n)$ also has a unique (and therefore global) maximizer, located in the region $(n_U, \infty)$. Finally, for any fixed $q \in (0, q^*)$, A2 is never optimal, i.e., $(2, \infty) \ni n \mapsto t(q, n)$ attains values in $(1, \infty)$ only.
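The quantities appearing in (2.2) and in Theorem 2.1 are easy to probe numerically. A minimal Python sketch of the brute-force search for $n_{\mathrm{opt}}$ follows; the prevalence values and the upper search bound are illustrative assumptions.

    import numpy as np

    def t_A2(q: float, n: int) -> float:
        """Expected tests per specimen for the square array scheme, formula (2.2)."""
        return 1.0 + 2.0 / n - 2.0 * q ** n + q ** (2 * n - 1)

    def n_opt_A2(q: float, max_n: int = 500) -> int:
        """Integer minimizer of n -> t(q, n) over n = 2..max_n (brute force)."""
        ns = np.arange(2, max_n + 1)
        return int(ns[np.argmin([t_A2(q, int(n)) for n in ns])])

    if __name__ == "__main__":
        for p in (0.20, 0.05, 0.01):          # illustrative prevalences
            q = 1.0 - p
            n = n_opt_A2(q)
            print(p, n, n * n, round(t_A2(q, n), 4))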
Corollary 2.1. Let $g(q, n)$ be as in Theorem 2.1. Then $g(q, 5) = 0$ has a unique solution $q_5 \approx 0.750209961$, and for all $q \in (q_5, 1)$, $n_{\mathrm{opt}}(q)$ belongs to the set (2.5).

Remark 2.1. In Corollary 2.1, the region $(q_5, 1) \subset (q^*, 1)$. The explanation for this truncation stems from the fact that $(q_5, 1)$ is the region where practical application of A2 makes sense. To be more precise, applying Theorem 2.1 for a fixed $q \in (q^*, q_5)$, we have that $t(q, n_{\min}) < 1$ with $n_{\min} = n_{\min}(q)$ given by (2.4). However, this $n_{\min}(q) \in (4, 5)$ and $\min(t(q, 4), t(q, 5)) > 1$ when $q \in (q^*, q_5)$. We touch upon this question briefly in the discussion Section 3 when talking about the relation of our results to those of Hudgens and Kim [17]. Though we do not provide a separate proof of this fact, the technical details can be filled in after an inspection of the proofs presented in Appendix A.

In order to shed light on the performance of A2, we compare it to several other GT schemes in terms of the magnitude of the optimal configuration $N_{\mathrm{opt}}$ and of the gain across the range $p \in (0, 0.249790)$ where application of A2 seems reasonable. In this comparative analysis, for a fixed $q = 1 - p$, we define the gain as $G = G(q) := 1 - t(q, N_*)$. Here, $t(q, N) = \mathbb{E}T_N/N$ is the average number of tests per person when the tested group size equals $N$, whereas $N_*$ stands for the unrounded optimal configuration, i.e., the minimizer of the continuous-argument function $[1, \infty) \ni N \mapsto t(q, N)$. Defined this way, the gain multiplied by 100 is the average number of tests saved per 100 persons in comparison to usual one-by-one testing. Preceded by several remarks, a description of the aforementioned GT schemes follows. To distinguish between the schemes, we assign a one-letter abbreviation to each and, by making use of these letters, superscript all related quantities; e.g., $t^{(A2)}(q, N^{(A2)}_*)$ denotes the value of $t(q, N_*)$ when A2 is the scheme under consideration.

Remark 2.2. Recall that, in Section 2.1, we have introduced the reparametrization of $N$ by the equality $N = n^2$, where $n$ is the number of rows (columns) of the square array used in the definition of A2. In this example and in all that follows, this reparametrization remains in force: writing $t^{(A2)}(q, N)$, we actually mean $t$ given by equation (2.2) with $n$ ranging continuously, unless stated otherwise. When we want to emphasize the reference to (2.2), $n$ is used instead of $N$. For all other schemes considered, reparametrizations of this kind do not apply.

Remark 2.3. The behaviour of the quantities compared is more naturally interpreted in terms of the prevalence $p = 1 - q$. Therefore, $p$, and not $q$, is used in the accompanying graphs. For the same reason and for the sake of convenience, we quite often denote the argument of the function considered by $q$ yet write the explicit formula in terms of $p$.

Remark 2.4. In this example, we have chosen to operate on the continuous scale because it is much easier to perceive visually than the discrete one. However, keeping the practical aspect in view, comprehensive numerical results (see Appendix B) are given on the discrete scale. Comparing the two, one finds that the discrepancies are small.

Dorfman scheme D. The Dorfman scheme was described in the introductory Section 1. For this scheme, Samuels [31] established the rounded optimal configuration $N^{(D)}_{\mathrm{opt}}$ in closed form (see [32] for tabulated numerical results), and one can further derive the asymptotic behaviour of the unrounded configuration $N^{(D)}_*$ and of the gain $G^{(D)}$ as $p \to 0+$.
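The gain just defined is also easy to evaluate numerically. The sketch below compares A2 with D on a grid of prevalences, using (2.2) for A2 and the same standard Dorfman per-person cost as in the earlier sketch; the grid, the integer search bound, and the use of rounded (integer) optimizers in place of the unrounded $N_*$ are simplifications, so the printed interval only approximates the region where A2 outperforms D that is discussed below.

    import numpy as np

    def t_A2(q, n):
        return 1.0 + 2.0 / n - 2.0 * q ** n + q ** (2 * n - 1)   # formula (2.2), n = rows/columns

    def t_D(q, N):
        return 1.0 / N + 1.0 - q ** N                            # standard Dorfman per-person cost

    def gain(cost_fn, q, sizes):
        """G = 1 - (minimal per-person cost over the candidate configurations)."""
        return 1.0 - min(cost_fn(q, s) for s in sizes)

    better = []
    for p in np.linspace(0.001, 0.2497, 500):
        q = 1.0 - p
        if gain(t_A2, q, range(2, 201)) > gain(t_D, q, range(2, 201)):
            better.append(p)

    if better:
        print("A2 beats D for p roughly in [%.4f, %.4f]" % (min(better), max(better)))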
Sterrett scheme S. Sterrett [34] suggested the following modification of the Dorfman scheme: one should retest the members of an initially positive pool sequentially, one by one, until the appearance of the first positive case, and then again apply pool testing to the remaining tail. If the remaining untested set tests positive, one proceeds recursively as before until the remaining set tests negative or the whole set of individuals gets tested. Malinovsky and Albert [24] conjectured the form of the rounded optimal configuration $N^{(S)}_{\mathrm{opt}}$ for $p \in (0, (3 - \sqrt{5})/2)$. Skorniakov and Čižikovienė [44] affirmed their conjecture, establishing along the way an explicit expression for the unrounded configuration $N^{(S)}_*$, from which the corresponding asymptotic relationships follow.

Halving scheme H. This scheme resembles divide-and-conquer sorting and can be described by the following steps. Step 1: test the initial pooled cohort; if it tests negative, finish; if it tests positive, proceed to Step 2. Step 2: divide the cohort into two approximately equal parts consisting of the first and second halves and apply the whole algorithm (starting from Step 1) to the two obtained parts recursively. It is difficult to trace back the first reference discussing this scheme in detail. To the best of our knowledge, it was treated already by Johnson et al. [19]. However, it appears that its asymptotic analysis was first accomplished not so long ago by Zaman and Pippenger [43], whereas in [32] it was discussed afresh without a strict focus on the asymptotic regime $p \to 0+$. There, an explicit expression for the rounded optimal configuration was derived, leading to the corresponding asymptotic relationships.

Figure 1 shows the behaviour of $N_*$ and of the gain $G$ for A2 and the three schemes described above. Note that, in the case of A2, the total optimal pool size $N^{(A2)}_* = n^2_{\min}$ with $n_{\min}$ given in Theorem 2.1. Owing to this squaring, $N^{(A2)}_*$ grows to infinity much faster than its counterparts for the remaining schemes; because of this, in the top left sub-figure the range of $p$ starts quite far from the origin, and the accompanying bottom left sub-figure is given on the log scale. It clearly depicts the relationships following from the formulae given above and shows that the asymptotic slope of $\ln N^{(A2)}_*$ with respect to $-\ln p$ is the largest one. Turning to the sub-figures on the right, one sees that there are ranges where A2 outperforms the other schemes. Within these ranges, the numerically estimated maximal differences include $\max_{p \in R_{A2,D}} 100\,(G^{(A2)}(q) - G^{(D)}(q)) = 6.2179$, attained at $p = 0.017128$ (here $R_{A2,D}$ denotes the range of $p$ where A2 outperforms D). Finally, for a comparison of the behaviour of the optimal pool sizes, consult the discussion Section 3.

In [23], the authors sought the pool size leading to optimal testing under scheme D when the prevalence is unknown. In their work, two approaches were used, both based on the following loss function. Given a scheme X, define
$$L^{(X)}(q, N) := t^{(X)}(q, N) - t^{(X)}(q, N^{(X)}_{\mathrm{opt}}(q)), \qquad (2.8)$$
where $N^{(X)}_{\mathrm{opt}}(q)$ is the optimal configuration when the prevalence $p$ (and hence $q = 1 - p$) is known. It is clear that $L^{(X)}(q, N) \ge 0$ for all $(q, N) \in (0, 1) \times \mathbb{N}$ and, for a given $q$, $L^{(X)}(q, N) = 0$ precisely when $N = N^{(X)}_{\mathrm{opt}}(q)$. In what follows, to distinguish between $N^{(X)}_{\mathrm{opt}}(q)$ and the optimal configuration suitable for unknown $q$, the latter is denoted by $N^{(X)}$. The first approach in [23] was to make use of the minimax strategy and take $N^{(X)}$ as a minimizer of $N \mapsto \sup_q L^{(X)}(q, N)$ (2.9). The second approach was to make use of the Bayesian paradigm and, after putting a prior $\pi$ on $q$, to take $N^{(X)}$ as a minimizer of $N \mapsto \mathbb{E}_\pi L^{(X)}(q, N)$ (2.10). In this example, we have adapted both approaches to the case of A2; a numerical sketch of both criteria is given right below.
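Both criteria are straightforward to approximate on a grid once $t$ is available. The sketch below does this for A2 with the regret-type loss (2.8); the $q$-grid over the interval suggested by Corollary 2.1, the grid resolution and the bound on $n$ are illustrative assumptions, and the Bayesian part uses the plain risk (2.10) rather than the modified criterion (2.11) introduced below, so its output need not coincide with the values reported in the sequel.

    import numpy as np

    def t_A2(q, n):
        # expected tests per specimen for the square array scheme, formula (2.2)
        return 1.0 + 2.0 / n - 2.0 * q ** n + q ** (2 * n - 1)

    q_grid = np.linspace(0.750210, 0.999999, 2000)   # assumed support of the prior, cf. Corollary 2.1
    n_grid = np.arange(2, 101)                       # candidate numbers of rows/columns (N = n^2)

    T = np.array([[t_A2(q, n) for n in n_grid] for q in q_grid])   # T[i, j] = t(q_i, n_j)
    t_opt = T.min(axis=1)                            # t(q, n_opt(q)) approximated on the grid
    L = T - t_opt[:, None]                           # loss (2.8) on the grid

    n_minimax = n_grid[np.argmin(L.max(axis=0))]     # minimize sup_q L(q, n), cf. (2.9)
    n_bayes = n_grid[np.argmin(L.mean(axis=0))]      # minimize E_pi L(q, n) for uniform pi, cf. (2.10)

    print("minimax n:", n_minimax, " Bayes n:", n_bayes)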
When using the Bayesian approach, $\pi$ was taken uniform over $(0.750210, 1)$. Thereby, we have modeled the situation where the only prior information is that application of A2 makes sense (see Corollary 2.1). Also, we have modified (2.10) and used (2.11) instead. To justify our choice, note that (2.10) equals $\mathbb{E}_\pi t^{(X)}(q, N) - c(\pi)$, where $c(\pi) := \mathbb{E}_\pi t^{(X)}(q, N^{(X)}_{\mathrm{opt}}(q))$ does not depend on $N$. Therefore, minimization of the target function amounts to minimization of $\{1, 2, \dots\} \ni N \mapsto \mathbb{E}_\pi t^{(X)}(q, N)$, and in this way the important information carried by the function $q \mapsto t^{(X)}(q, N^{(X)}_{\mathrm{opt}})$ remains unutilized. Figures 3-4 show the graphs of (2.9) and (2.11) for the case X = A2 and the previously mentioned prior $\pi$. Note that, adapting the above to our case, in (2.8), (2.9), (2.10) we have used the function $t$ defined by (2.2). Numerical estimation yielded the following values: $N^{(A2)} = 12^2$ for the minimax approach and $N^{(A2)} = 7^2$ for the Bayesian approach. Finally, it is important to note that, though the strategy discussed above leads to suboptimal testing in a stable environment where reliable estimation of the prevalence is possible, it appears to be a reasonable strategy when the prevalence varies rapidly and is difficult to capture by the data at hand. Therefore, at least at the initial stage, it can be considered a good alternative for optimal testing during pandemics like COVID-19.

It was already mentioned in the introductory Section 1 that, to the best of our knowledge, [17] is the only reference where A2 was treated in the same way as we do here. Therefore, we first discuss our input in comparison with [17] and then turn to a more general setting. Figure 5 illustrates the behaviour of the function $n \mapsto q_n$ appearing in the proof of Theorem 2.1. The minimum of this function, denoted by $q^*$, is an exact lower bound of the region where A2 is efficient on the continuous scale, in the sense that, given prevalence $p \in (0, 1 - q^*)$ (or, alternatively, $q \in (q^*, 1)$), the achievable minimum of $(2, \infty) \ni n \mapsto t^{(A2)}(q, n)$ is strictly smaller than 1. This always holds true for the unrounded optimal configuration, i.e., the minimizer of $(2, \infty) \ni n \mapsto t^{(A2)}(q, n)$ on the continuous scale. In [17], the authors also provide a region of this kind. We utilize that region in Corollary 2.1 and denote it $(q_5, 1)$. Remark 2.1 explains why it can be thought of as the region where A2 makes sense from the practical point of view and how this can be derived from our results. Linking our work to theirs, it is important to mention that, in deriving their proofs, Hudgens and Kim [17] operated on the discrete scale and obtained this region as the one where $\{5, 6, \dots\} \ni n \mapsto t^{(A2)}(q, n)$ is efficient. Having proved that $n = 2, 3, 4$ are never optimal, they have also verified that $(q_5, 1)$ is the region where practical application of A2 makes sense. Summing up, in this direction our input adds the missing part to the theoretical characterization of A2, yet does not bring novelty from the applied point of view.

Turning to formulae (2.4), (2.5), the situation is essentially different. Kim and Hudgens [17] have given only lower and upper bounds on the optimal configuration, whereas (2.4)-(2.5) characterize it exactly and, in particular, enable an exact analysis of $t^{(A2)}(q, N^{(A2)}_{\mathrm{opt}})$ as $p \to 0+$ (see a comment in the forthcoming Subsection 3.2 regarding the importance of analysis of this kind). It is again worth stressing that, in this asymptotic setting, the validity of purely numerical analysis is always questionable, whereas an analytical result leads to accurate and definite conclusions. Several authors have already noted that array-based algorithms can be more efficient in certain settings [36], [21], [20].
Our findings confirm these observations: the comparisons given in Subsection 2.2.1 demonstrate that there are regions where A2 performs better than the other considered schemes. The same behaviour is expected for other, unconsidered schemes satisfying the binomial testing assumptions. Therefore, though initially applied in the frame of genetic screening [4], [3], [6], A2 is an appropriate candidate for other GT applications as well. Since GT is most effective when the prevalence $p$ is low, for any scheme X the asymptotic behaviour of $t^{(X)}(p, N^{(X)}_{\mathrm{opt}})$ as $p \to 0+$ is of particular interest, and in this regime A2 compares favourably with the schemes D and S. This, however, comes at the cost of a quickly increasing maximal tested pool size at the optimal configuration, with relationships inverse to those above. We would like to emphasize that, in the case of A2, the maximal tested pool size at the optimal configuration is $\sqrt{N^{(A2)}_{\mathrm{opt}}}$ (it corresponds to the test of a distinct row/column), whereas in the case of all the remaining schemes it is $N_{\mathrm{opt}}$ itself, since under A2 the $N^{(A2)}_{\mathrm{opt}}$ samples never get tested as a single pooled specimen. This distinction is important since, in the applied setting, one has to take into account the fact that the operating characteristics of the test might depend on the size of the tested pool. That is, the sensitivity and/or specificity of the test might become unacceptably low when the pooled sample is formed out of $N \ge N_{\mathrm{critical}}$ individual samples. Also, subjects might be dependent, and it might be reasonable to take into account the imperfectness of the test even on the single-individual basis. These facts limit the usage of many schemes satisfying the binomial testing assumptions. E.g., scheme H, also known under the name of the optimal testing algorithm and giving the best asymptotic performance as $p \to 0+$, requires large pool sizes. Inspecting Figure 2, one sees that A2 also exhibits such behaviour. Nonetheless, investigations giving a theoretical characterization of the GT scheme considered are important for the following reasons.

Convenience. When choosing between two realizable schemes, the preference does not always fall on the optimal one. If it turns out that the optimal scheme yields only a small surplus in gain, it can be exchanged for a more simply realizable competitor. E.g., scheme D was recently employed in Lithuania [2] for massive COVID-19 testing, and a large number of other countries do the same [8]. Turning to A2, it appears that one of the reasons for its emergence in genetic applications was operational convenience.

Tolerable errors. Though there exist many generalizations allowing imperfectness of the test (e.g., [21], [20], [14]), as noted by several authors [36], [17], [24], the schemes assuming perfect tests can be quite accurate since modern tests exhibit very small errors.

Benchmarking. BTA-based schemes, being simpler to treat analytically, provide theoretically justified benchmark thresholds for more elaborate schemes which assume imperfectness of the test and/or other specific conditions (e.g., dependence of the testing outcome on subject-specific characteristics).

Foundation. More elaborate schemes may emerge on the basis of simpler ones. An example is the scheme S introduced in Subsection 2.2.1, which was built on top of scheme D. In [21] and [14], A2 serves as a basis for extensions incorporating imperfectness of the test and the dilution effect.
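In the spirit of the benchmarking point above, the closed-form characteristics of a BTA-based scheme are easy to validate by simulation. A short Monte Carlo sketch for A2 under BTA follows; the prevalence, array size and replication count are illustrative assumptions, and the empirical average of tests per specimen should be close to the value given by (2.2).

    import numpy as np

    rng = np.random.default_rng(0)

    def a2_tests(status):
        """Number of tests used by A2 on an n x n array of infection indicators (perfect test)."""
        n = status.shape[0]
        rows = status.any(axis=1)                      # row pools testing positive
        cols = status.any(axis=0)                      # column pools testing positive
        retests = int(rows.sum()) * int(cols.sum())    # cells at positive-row/positive-column intersections
        return 2 * n + retests

    def t_A2(q, n):
        return 1.0 + 2.0 / n - 2.0 * q ** n + q ** (2 * n - 1)   # formula (2.2)

    p, n, reps = 0.05, 8, 20000                        # illustrative parameters
    total = sum(a2_tests(rng.random((n, n)) < p) for _ in range(reps))
    print(total / (reps * n * n), t_A2(1.0 - p, n))    # empirical vs theoretical tests per specimen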
Remark A.1. In the proof of Theorem 2.1, Step 1 is a repetition of Lemma 1 in [17]. We have decided to rewrite it here because it is very short and, along the way, some notions used in the sequel appear.

Proof of Theorem 2.1. Step 1. We first show that, for any fixed $n \in (2, \infty)$, there exists a unique $q_n \in (0, 1)$ such that $g(q_n, n) = 0$, $g(q, n) < 0$ for all $q \in (q_n, 1)$ and $g(q, n) > 0$ for all $q \in (0, q_n)$. To this end, note that
$$\frac{\partial}{\partial q} g(q, n) = -2n q^{n-1} + (2n - 1) q^{2n-2} = -2n q^{n-1}\Bigl(1 - \frac{2n-1}{2n}\, q^{n-1}\Bigr) < 0 \quad \forall n \in (2, \infty).$$
Thus, given $n \in (2, \infty)$, $q \mapsto g(q, n)$ is decreasing on $(0, 1)$. Since $g(0+, n) = \frac{2}{n} > 0$ and $g(1-, n) = \frac{2}{n} - 1 < 0$, the claim holds true.

Step 2. From Step 1 it follows that we have a well-defined function $(2, \infty) \ni n \mapsto q_n$ given implicitly by the equation $g(q_n, n) = 0$. Since this function is continuous, its range $I \subset (0, 1)$ is an interval. Further, for $\varepsilon \in (0, 1)$ and $n = 2 + \varepsilon$, $g(1 - \varepsilon, 2 + \varepsilon) > 0$ provided $\varepsilon$ is small enough. Hence, the analysis accomplished in Step 1 implies that $q_{2+\varepsilon} \in (1 - \varepsilon, 1)$. Therefore, $I = [q^*, 1)$ for some $q^* \in (1/2, 1)$. To justify the lower bound $1/2$, note that $g(1/2, n) = \frac{2}{n} - 2^{1-n} + 2^{1-2n} > \frac{2}{n} - 2^{1-n}$, and the right-hand side is positive if and only if $n < 2^n$. Since the latter holds true for all $n > 2$, it follows that $g(1/2, n) > 0$ for all $n > 2$. Also note that $g(q, n) > 0$ for all $(q, n) \in (0, q^*) \times (2, \infty)$, since the opposite would contradict the definition of $q^*$.

Step 3. Fix $q \in (q^*, 1)$. By Step 1 and Step 2, $g(q, n) = 0$ has at least one solution $n = n(q)$ (it suffices to take $n$ such that $q_n = q$). To show that there are two solutions, put $c = \frac{1}{2q}$, make the change of variable $q^n = x$, and rewrite $g(q, n) = 0$ in the form
$$\frac{\ln q}{\ln x} = x(1 - cx), \qquad \text{(A.1)}$$
or, equivalently,
$$h(x) := -x(1 - cx)\ln x = -\ln q, \qquad x \in (0, 1). \qquad \text{(A.2)}$$
Since $1 - cx > 1 - c > 0$ and $-\ln x > 0$ for all $x \in (0, 1)$, it follows that $1 - 2cx > 0$ as well, provided $x$ solves (A.2). Therefore, the range of possible solutions of (A.2) shrinks to $(0, q)$. Moreover, the maximizer $x_0 = x_0(c)$ of $h$ is well defined and decreasing in $c$; therefore, since $c = \frac{1}{2q}$ decreases in $q$, the map $(q^*, 1) \ni q \mapsto x_0(q)$ is increasing. Taking into account that $q \mapsto -\ln q$ is decreasing, we finally deduce that $h(x_0) > -\ln q$ for all $q \in (q^*, 1)$. The monotonicity of $q \mapsto x_0(q)$ and of $q \mapsto -\ln q$ also leads to the conclusion that $q^*$ can be solved from the equation $h(x_0(q)) = -\ln q$ together with a unique $n^* \in (2, \infty)$. Hence (i).

Step 4. Assume the setting of Step 3. Let $0 < x_L = q^{n_U} < x_U = q^{n_L} < q$ denote the two solutions of (A.1). By the above, $h(x) > -\ln q \iff x \in (x_L, x_U)$. Reverting to $(2, \infty) \ni n \mapsto g(q, n)$, this reads as $g(q, n) < 0 \iff n \in (n_L, n_U)$. Note that $n \mapsto g(q, n)$ is positive in a neighborhood of $\infty$. Also, from Step 1 and Step 2, we have that $n \mapsto g(q, n)$ is positive in a right neighborhood of 2 and that $n_U \in (2, \infty)$. Therefore, the continuity of $n \mapsto g(q, n)$ yields that $n_L \in (2, \infty)$ as well. Finally, it is clear that $n_L < n^* < n_U$ (see Figure 5 for a graphical illustration). Hence (ii).

Step 5. In this step, we identify the number and location of the zeroes of the derivative of $(2, \infty) \ni n \mapsto g(q, n)$ for a fixed $q \in (q^*, 1)$.
From the analysis given in Step 4, it follows that $(2, \infty) \ni n \mapsto g(q, n)$ has at least two extremes: there must be a minimum in $(n_L, n_U)$ (since the function is negative there) and a maximum in $(n_U, \infty)$ (since the function is positive there and $\lim_{n \to \infty} g(q, n) = 0+$). To see that there are no other extremes, consider the equation
$$\frac{\partial}{\partial n} g(q, n) = -\frac{2}{n^2} - 2 q^n \ln q\,\bigl(1 - q^{n-1}\bigr) = 0 \qquad \text{(A.3)}$$
and rewrite it, by making use of the notions introduced in Step 3, in the form (A.4). An inspection of (A.4) then shows that it can have at most two solutions, which confirms the assertion stated above.

Step 6. It remains to justify expression (2.4). Let $n(q, t)$ be defined by (A.5). It suffices to prove that, for all $q \in [0.755, 1)$, the following statements hold true: (a) $\max\bigl(g(q, n(q, 0)), g(q, n(q, 1))\bigr) < 0$; and (b) $\exists t \in [0, 1]$ such that $\frac{\partial}{\partial n} g(q, n)\big|_{n = n(q, t)} = 0$. The analytical calculations behind (a) and (b) are standard yet very lengthy and tedious. Therefore, we omit the details and end up with a graphical proof and a sketch of the analytical one. Figures 6-7 show the graph of $q \mapsto \max\bigl(g(q, n(q, 0)), g(q, n(q, 1))\bigr)$, from which it is evident that (a) holds. The analytical proof consists of the following steps. (s1) Calculate $\frac{\partial}{\partial q} g(q, n(q, i))$, $i = 0, 1$. (s2) Check that $\frac{\partial}{\partial q} g(q, n(q, i)) < 0$, $i = 0, 1$, on $[0.755, 1)$ and deduce that $g(q, n(q, i))$ decreases on $[0.755, 1)$. (s3) Conclude that (a) indeed holds since $g(0.755, n(0.755, 0)) \approx -0.002258$ and $g(0.755, n(0.755, 1)) \approx -0.013690$. Turning to (b), first rewrite (A.3) as $-n^2 q^n \ln q\,(1 - q^{n-1}) = 1$. Next, consider the function $h(t, q) = -n^2(q, t)\, q^{n(q,t)} \ln q\,\bigl(1 - q^{n(q,t)-1}\bigr) - 1$ with $n(q, t)$ given by (A.5) and $q \in [0.755, 1)$. Since $t \mapsto h(t, q)$ is continuous, it suffices to show that, for any $q \in [0.755, 1)$, $h(0, q) < 0$ and $h(1, q) > 0$. Figure 8 shows the graphs of $q \mapsto h(0, q)$ and $q \mapsto h(1, q)$; these confirm (b). Concerning the analytical part, the following is the suggested route. (s1) Calculate $\frac{\partial^2}{\partial q^2} h(i, q)$, $i = 0, 1$. (s2) Check that $\frac{\partial^2}{\partial q^2} h(0, q) > 0$ whereas $\frac{\partial^2}{\partial q^2} h(1, q) < 0$ on $[0.755, 1)$ and deduce that $h(0, q)$ is convex whereas $h(1, q)$ is concave on $[0.755, 1)$.

Proof of Corollary 2.1. Uniqueness of $q_5$ was established in Step 1 of the proof of Theorem 2.1. Hudgens and Kim [17] (Lemmas 2, 7, and 14) have demonstrated that $n_{\mathrm{opt}}(q) \notin \{2, 3, 4\}$ for all $q \in (0, 1)$. From their results we also have that $n_{\mathrm{opt}}(q) = 5$ for all $q \in (q^*, 0.755]$. It is straightforward to verify that $t(q, 5) < 1$ for all $q \in (q_5, 0.755]$; hence the claim in the region $(q_5, 0.755]$. For $q \in [0.755, 1)$, the claim follows from Theorem 2.1 by noting that at least one of the numbers in the set (2.5) belongs to $\{n \in (2, \infty) : t(q, n) < 1\}$ because of (a) in Step 6 of the proof of Theorem 2.1.

The material contained in this appendix was produced by making use of the open source computer algebra system SymPy [26] and the Python packages constituting the core kit for scientific numerical programming in Python: NumPy [15], SciPy [39], Matplotlib [18] and pandas [41], [29]. The table accompanying Subsection 2.2.1 is placed in our Open Science Framework repository (file A2_table.csv). For the prevalence $p$ ranging over $(0, 0.249790)$ with a step equal to $10^{-6}$ and for each scheme considered there, the following information is given: the optimal (rounded) pool size $N_{\mathrm{opt}}(q)$ and the gain $G(q) = 1 - t(q, N_{\mathrm{opt}}(q))$; in the case of A2, the optimal number of rows/columns $n_{\mathrm{opt}}(q)$ is reported as well.

(Figure caption) Graph of $q \mapsto \max\bigl(g(q, n(q, 0)), g(q, n(q, 1))\bigr)$ for $q \in [0.765, 1)$; for reference, the function identically equal to 0 is plotted in red.
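The numerical value of $(q^*, n^*)$ reported in Theorem 2.1(i) can be re-derived with the SymPy toolkit mentioned above. A minimal sketch follows; the initial guess passed to the solver is an assumption.

    import sympy as sp

    q, n = sp.symbols('q n', positive=True)
    g = 2 / n - 2 * q**n + q**(2 * n - 1)            # g(q, n) = t(q, n) - 1

    # Tangency point of the zero level set of g: solve g = 0 together with dg/dn = 0.
    sol = sp.nsolve([g, sp.diff(g, n)], [q, n], [0.75, 4.5])
    print(sol)                                       # expected to be close to (0.748416, 4.453524)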
[1] Assessment of Specimen Pooling to Conserve SARS-CoV-2 Testing Resources.
[2] Lithuanian firms to get 30 million euros to test employees.
[3] A two-dimensional YAC pooling strategy for library screening via STS and Alu-PCR methods.
[4] Theoretical analysis of library screening using a N-dimensional pooling strategy.
[5] Informative Retesting.
[6] Efficient pooling designs for library screening.
[7] Combinatorial pooled sequencing: experiment design and decoding.
[8] Wikipedia contributors. COVID-19 testing.
[9] To pool, or not to pool?
[10] Evaluation of pool-based testing approaches to enable population-wide screening for COVID-19.
[11] Simulation of pooled-sample analysis strategies for COVID-19 mass testing.
[12] The detection of defective members of large populations.
[13] Pooling designs and nonadaptive group testing: important tools for DNA sequencing.
[14] Array-based schemes for group screening with test errors which incorporate a concentration effect.
[15] Array programming with NumPy.
[16] Sample Pooling as a Strategy to Detect Community Transmission of SARS-CoV-2.
[17] Optimal Configuration of a Square Array Group Testing Algorithm.
[18] Matplotlib: A 2D graphics environment.
[19] Inspection Errors for Attributes in Quality Control.
[20] Three-Dimensional Array-Based Group Testing Algorithms.
[21] Comparison of Group Testing Algorithms for Case Identification in the Presence of Test Error.
[22] Pooling of samples for testing for SARS-CoV-2 in asymptomatic people.
[23] A note on the minimax solution for the two-stage group testing problem.
[24] Revisiting Nested Group Testing Procedures: New Results, Comparisons, and Robustness. The American Statistician.
[25] Pooled nucleic acid testing to identify antiretroviral treatment failure during HIV infection.
[26] SymPy: symbolic computing in Python.
[27] A pooled testing strategy for identifying SARS-CoV-2 at low prevalence.
[28] Fundamental limits of pooled-DNA sequencing.
[29] The pandas development team. pandas-dev/pandas: Pandas.
[30] Group Testing for Severe Acute Respiratory Syndrome-Coronavirus 2 to Enable Rapid Scale-up of Testing and Real-Time Surveillance of Incidence.
[31] The Exact Solution to the Two-Stage Group-Testing Problem.
[32] Group testing: Revisiting the ideas.
[33] Group testing to eliminate efficiently all defectives in a binomial sample.
[34] On the Detection of Defective Members of Large Populations.
[35] Nucleic Acid Testing to Detect HBV Infection in Blood Donors.
[36] Maximally efficient two-stage screening.
[37] The use of a square array scheme in blood testing.
[38] Two-Stage Hierarchical Group Testing for Multiple Infections with Application to the Infertility Prevention Project.
[39] Paul van Mulbregt, and SciPy 1.0 Contributors. SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python.
[40] Pooled Testing for HIV Screening: Capturing the Dilution Effect.
[41] Wes McKinney. Data Structures for Statistical Computing in Python. In Stéfan van der Walt and Jarrod Millman, editors, Proceedings of the 9th Python in Science Conference, pages 56-61, 2010.
[42] Xin Ming Tu, Eugene Litvak, and Marcello Pagano. On the informativeness and accuracy of pooled testing in estimating prevalence of a rare disease: Application to HIV screening. Biometrika, 82:287-297, 1995.
[43] N. Zaman and N. Pippenger. Asymptotic analysis of optimal nested group-testing procedures. Probability in the Engineering and Informational Sciences, 30:547-552, 2016.
[44] Ugnė Čižikovienė and Viktor Skorniakov. On a couple of unresolved group testing conjectures. Communications in Statistics - Theory and Methods, 2021 (in press).