Yu, Letian; Daly, Fraser; Johnson, Oliver: A negative binomial approximation in group testing. 2022-03-15.

Abstract: We consider the problem of group testing (pooled testing), first introduced by Dorfman. For non-adaptive testing strategies, we refer to a non-defective item as 'intruding' if it only appears in positive tests. Such items cause mis-classification errors in the well-known COMP algorithm, and can make other algorithms produce an error. It is therefore of interest to understand the distribution of the number of intruding items. We show that, under Bernoulli matrix designs, this distribution is well approximated in a variety of senses by a negative binomial distribution, allowing us to understand the performance of the two-stage conservative group testing algorithm of Aldridge.

The group testing (pooled testing) problem was introduced by Dorfman [14], and provides a way of efficiently finding a small number of infected individuals in a large population. The key construction underlying this method is a so-called pooled test: given a subset S of the population, we combine samples from each member of S into a testing pool, and test them all together. We suppose that such a test returns a positive result if and only if at least one person in S is infected. Hence, a negative test allows us to deduce that every person in S is not infected, allowing for efficient screening of individuals. Some inference can be drawn from a positive test, but the analysis is typically more involved. By carrying out a series of pooled tests, we hope to efficiently identify all the infected individuals, using as few tests as possible, and ideally using simple algorithms which do not require intensive computation.
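As a concrete illustration of the pooled-test rule above (a pool is positive if and only if it contains at least one infected member), here is a minimal sketch; the function and variable names are ours, purely for illustration, not from the paper.

```python
# A single pooled test: combine the samples from pool S and test them together.
# The test is positive iff the pool contains at least one infected individual.

def pooled_test(pool, infected):
    """Return True (positive) iff the pool contains an infected individual."""
    return any(i in infected for i in pool)

# A negative test certifies every member of the pool as uninfected.
infected = {3}
assert pooled_test({0, 1, 2}, infected) is False   # all of 0, 1, 2 cleared
assert pooled_test({2, 3, 4}, infected) is True    # pool contains infected item 3
```

A single negative pool thus screens out all its members at the cost of one test, which is the source of the efficiency gains discussed in the text.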
The group testing problem has generated an extensive literature, surveyed for example in [5, 15], and a large variety of variations on the problem exist. Group testing has been proposed as a solution to problems in a wide variety of fields (see [5, Section 1.7] for a survey of some of these applications). In engineering and computer science in particular, it has been used to efficiently search computer memories [24], to give a novel data compression algorithm [21], to detect high-demand items in databases [11], to bound the performance of multi-access communications channels [32], and in many other problems besides. Due to the shortage of tests early in the COVID-19 pandemic, group testing was a natural solution proposed to find infected individuals, and has indeed been deployed at scale in a variety of countries (see [4, Section 6] for a review of some such uses). One key distinction is whether we are allowed to employ adaptive testing strategies (where the choice of individuals to be tested can depend on the outcome of previous tests) or are restricted to non-adaptive ones (where the test design is chosen in advance). Clearly, the ability to search adaptively cannot harm us, and indeed it is known (see [5, Section 1.5]) that in this case Hwang's algorithm [22], based on efficient binary search, requires a number of tests which is asymptotically optimal for a wide range of parameters. However, in many circumstances, adaptive algorithms may not be an option, and there is independent mathematical interest in understanding the performance of algorithms which are non-adaptive or are restricted to a small number of stages. One motivating example for the study of group testing algorithms with restricted stages is that of COVID testing.
It has been known since the early days of the coronavirus pandemic that a sample from one infected individual gives a strong enough PCR positivity signal when mixed in a pool of 32 or 64 samples that group testing is a potentially viable strategy [33] . However, to take advantage of the ability of PCR machines to perform 96 or more tests at a time in parallel [16] we need to use non-adaptive algorithms. Further, as argued in for example [26] , if each round of tests takes a few hours to be processed, then multi-stage binary search algorithms can give information about the infection status of individuals too late to be useful, meaning that the virus could have already been passed on before the results are received. For this reason, in this article we will focus on non-adaptive and two-stage algorithms. When we work non-adaptively, we must declare our testing strategy in advance, and it is perhaps not immediately obvious which strategy to choose. One simple idea, which we will refer to as Bernoulli testing, consists of randomly placing each member of the population into each pool with the same probability p, each such choice being taken independently of one another. In fact, if there are k infected individuals, it is generally a good strategy to choose p = 1/k, so that there is on average one infected individual in each test pool. Of course, Bernoulli testing is not the only strategy, and improved performance can be obtained by placing each person in a fixed number of tests at random [23] , or by more advanced test designs [10] . However, Bernoulli testing is simple to describe and analyse, and generally gives performance [5, Section 2] within a constant multiple of the best possible, so we will focus on it here. Having chosen a particular non-adaptive test strategy, a key question is how we find the infected individuals. 
A variety of algorithms are possible, but one particularly simple one is referred to as COMP after its use in the paper [9]; it dates back at least to the work of Kautz and Singleton [24]. This algorithm works as follows: as mentioned above, each person who appears in a negative test is guaranteed to be not infected. Hence, we can build a list of non-infected people by collecting together the people from each negative test. For definiteness, we simply assume that everyone else is infected. Given enough tests, each non-infected person should appear in at least one negative test, but using arguments based on the coupon-collector problem we can deduce that this may require more tests than we would otherwise hope. If we use insufficiently many tests, the algorithm is likely to fail, with a clear single source of error. That is, if some non-infected person only appears in positive tests, then they will be incorrectly classified as infected. We refer to such a person as 'intruding', and the focus of this paper will be to count the number of intruding individuals, which we will refer to as G. (Each person declared to be non-infected will definitely be so; see [5, Lemma 2.3].) The COMP algorithm succeeds in deducing every individual's infection status exactly if and only if G = 0, but by understanding the distribution of G we can also consider some related issues. First, as mentioned previously, given perfect testing, COMP never classifies an infected person as non-infected, and can be used to provide a quick screening of the population. Knowledge of G tells us precisely how many healthy people would be wrongly quarantined as a result of this screening, so given a particular tolerance of this effect we could choose the number of tests accordingly.
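The COMP rule, and the way an intruding item causes a misclassification, can be seen on a toy example. This is our own sketch, with a hand-picked design purely for illustration:

```python
import numpy as np

def comp_decode(X, y):
    """COMP: clear every item appearing in some negative test; declare the rest infected."""
    cleared = X[y == 0].any(axis=0)      # item appears in at least one negative test
    return ~cleared                      # boolean mask of items COMP declares infected

# Hand-picked toy design: 3 tests, 4 items; only item 0 is infected.
X = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 0, 1, 1]])
defective = np.array([True, False, False, False])
y = (X[:, defective].sum(axis=1) > 0).astype(int)   # outcomes: tests 1, 2 positive, test 3 negative

est = comp_decode(X, y)
# Items 2 and 3 appear in the negative third test and are correctly cleared,
# but item 1 appears only in positive tests: it is 'intruding', so COMP
# misclassifies it as infected (here G = 1).
```

Note that the true infected item 0 is always contained in the declared set, in line with the one-sided guarantee of [5, Lemma 2.3].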
Second, we can regard COMP as the first stage of a 'conservative two-stage group testing' algorithm as described by Aldridge [2], where following an initial screening using COMP we choose to test each person who has not received a clean bill of health using individual testing. [2, Theorem 1] describes the expected number of tests for such a procedure to succeed. If there are k infected people, clearly the second stage requires G + k tests to succeed, so by understanding the distribution of G we can approximate the probability that this two-stage algorithm will succeed, which can give more information than the expected value. Further, given an overall budget of T tests, we might wish to know the optimal number of tests T_1 to use for the initial COMP stage, and analysing the distribution of G will give insight into this. Finally, the purely non-adaptive DD algorithm introduced in [3] uses COMP as a first stage of the analysis, and performs a further analysis based on looking for positive tests that contain exactly one 'non-screened' item. Informally, we know that DD will succeed if the number of intruding items G is much less than the number of defectives k, so again by understanding the distribution of G we gain insight into the performance of DD. The structure of the remainder of the paper is as follows. In Section 2 we give a more formal introduction to the group testing problem, including introducing notation, defining the COMP algorithm and proving some simple properties of the number of intruding items G. In Section 3 we show that G can be well approximated by a negative binomial distribution, first by considering a limiting argument that shows convergence of all falling moments in an asymptotic limit and then giving a more detailed bound based on a novel adaptation of the Stein-Chen method which gives bounds in finite blocklength settings as well.
Section 4 discusses the implications of these results for various group testing algorithms, before a brief conclusion is given in Section 5. We now state the group testing problem in slightly more formal language and introduce some notation similar to that of [5]. We will write n for the total population of individuals, which we will refer to as 'items'. Instead of referring to infected and healthy individuals, we will follow standard group testing terminology by calling them 'defective' and 'non-defective' respectively. We write K for the set of defective items (or defective set for short) and k = |K| for the total number of defective items, and T for the number of tests. As is standard, we can represent a non-adaptive testing strategy by a binary T × n test matrix X, with rows corresponding to tests and columns corresponding to items. Here the entry X_ti = 1 means that item i appears in test t. In this paper we focus on Bernoulli testing, where the (X_ti) are independent Bernoulli random variables with parameter p. We will refer to this as a 'Bernoulli test design with parameter p', and this design will apply throughout. Of particular interest will be the case p = 1/k. The outcome of test t is represented as a binary value Y_t (where Y_t = 1 means a positive test) which can be calculated for this test matrix as

Y_t = ∨_{i ∈ K} X_ti, (1)

where ∨ represents a standard binary OR, capturing the fact that each test is positive if and only if it contains at least one of the items in K. As in [3] we write q_0 = (1 − p)^k, noting that each test is positive independently with probability 1 − q_0 (since it is negative if and only if it contains none of the k defectives). Again, as in [3], [10] and other papers we will often study what is referred to in [5] as the sparse regime. In this setting the number of items n tends to infinity and the number of defectives k = k(n) = n^θ for some explicit parameter θ ∈ (0, 1).
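A minimal simulation of the Bernoulli test design and the outcome rule (1), checking that the observed fraction of negative tests is close to q_0 = (1 − p)^k. The parameter values n = 500, k = 10, T = 100 follow examples used later in the paper; the code itself is our sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, T = 500, 10, 100
p = 1 / k

# Bernoulli test design: each entry X_ti = 1 independently with probability p.
X = (rng.random((T, n)) < p).astype(int)

# Outcome rule (1): test t is positive iff it contains at least one defective
# item; take the defective set K = {0, ..., k-1} without loss of generality.
y = (X[:, :k].sum(axis=1) > 0).astype(int)

# Each test is negative independently with probability q0 = (1 - p)^k, about 0.35 here.
q0 = (1 - p) ** k
observed = (y == 0).mean()
```

With only T = 100 tests the observed fraction fluctuates around q_0 with standard deviation of roughly 0.05, so the agreement is approximate rather than exact.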
In this context it is natural to consider an asymptotic regime where the number of tests T = (c/q_0) k log(n) for some constant c, noting that if p = 1/k then asymptotically q_0 converges to e^{−1}. Here and throughout our work we write log for the natural logarithm. Another asymptotic setting of interest (see for example [1] and [5, Section 5.5]) is referred to as the linear regime. Here, again n tends to infinity, and the number of defectives k = k(n) = βn for some explicit parameter β ∈ (0, 1). In this context, we consider an asymptotic regime where the number of tests T = cn for some constant c. Although in this regime Aldridge [1] proved that no non-adaptive algorithm can outperform individual testing, there is still interest in understanding the performance of two-stage or adaptive algorithms, and analysis of G of the kind presented in this paper can help with this. However, in many practical group testing contexts we are also interested in what we refer to as the finite blocklength setting, following terminology popularised for example by the work of Polyanskiy et al. [27]. In this context we wish to understand the performance of algorithms in solving concrete problems such as n = 500, k = 10 (see [3]), where asymptotic bounds may not necessarily give the best guide to actual performance. Interest in finite blocklength problems was particularly prompted by the COVID pandemic, where for example the use of 96-well PCR plates [16] means that the number of tests may be bounded by (a multiple of) 96. This setting has typically been less well explored than asymptotic settings such as the sparse or linear regimes described above; however, we provide some bounds in this context. At various points in our analysis we will find it useful to work in terms of the falling moments of various random variables, and we will write

M^(s)(X) = E[X(X − 1) · · · (X − s + 1)]

for the sth falling moment of a random variable X. We will also write w(u) for the Hamming weight of the binary vector u.
The COMP algorithm uses the test outputs Y and matrix X to produce an estimate of the defective set K which we will write as K_COMP. In fact it is easier to consider the complement of K_COMP: an item will appear in this complement if it appears in some negative test. Formally speaking,

K_COMP^c = {i : X_si = 1 for some test s with Y_s = 0}. (2)

Notice that if item j really is defective (i.e. j ∈ K) then for every test s with X_sj = 1 we have Y_s = 1 (by the definition of the group testing action in (1)), so that (2) implies that j is not in K_COMP^c. In other words K ⊆ K_COMP (see [5, Lemma 2.3]). COMP is an attractive algorithm because it is simple to perform and interpret, and because of this performance guarantee in one direction. However, in practice if we don't perform enough tests then K_COMP can be significantly larger than K, meaning that many non-defective items would be misclassified, potentially causing problems in healthcare related situations where unnecessary quarantine could result. For this reason, Aldridge [2] proposed what he refers to as a conservative two-stage algorithm. Here, given a total budget of T tests, we should use T_1 of them to perform the COMP algorithm in the usual way, and then the remaining T_2 := T − T_1 tests to perform individual testing of each of the items in K_COMP, which have not been classified as non-defective. While this algorithm may be inferior in performance to a two-stage algorithm which uses the T_1 tests to perform the DD algorithm [3] followed by individual testing, it has the advantage of being transparent to perform for healthcare professionals without a mathematical background. Clearly the conservative two-stage algorithm of Aldridge [2] will succeed in finding all the defective items if the number of second stage tests is greater than or equal to the number of items to be tested. That is, it will succeed when T_2 = T − T_1 ≥ |K_COMP|, meaning that we would like to find the size of K_COMP.
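The conservative two-stage procedure can be sketched as follows. This is our own implementation outline with illustrative parameter choices; the success condition is exactly that the number of unscreened items fits within the second-stage budget.

```python
import numpy as np

def conservative_two_stage(X, y, defective, T2):
    """Stage 1: COMP screening; Stage 2: individually test every unscreened item.
    Succeeds iff the number of unscreened items is at most the budget T2."""
    cleared = X[y == 0].any(axis=0)                # appears in some negative test
    candidates = np.flatnonzero(~cleared)          # K_COMP: the k defectives plus G intruders
    if len(candidates) > T2:
        return None                                # second-stage budget exhausted
    # individual tests resolve the status of each candidate exactly
    return set(map(int, candidates[defective[candidates]]))

rng = np.random.default_rng(1)
n, k, T, T1 = 200, 5, 120, 60                      # illustrative budgets (ours)
p = 1 / k
defective = np.zeros(n, dtype=bool)
defective[:k] = True
X = rng.random((T1, n)) < p                        # Bernoulli first-stage design
y = (X[:, defective].sum(axis=1) > 0).astype(int)
found = conservative_two_stage(X, y, defective, T2=T - T1)
# With enough second-stage budget the recovered set equals the true defective set.
```

Since defectives are never cleared by COMP, the second stage always sees all k defectives; the algorithm only fails when the G intruding items push |K_COMP| over the budget T_2.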
Further, for a given budget of T tests, since larger values of T_1 give smaller K_COMP (more stage one tests allow more non-defective items to be screened out) but leave fewer tests available in the second stage, we would like to find a sensible choice of T_1 that manages this tradeoff. We now define the key property we will study in this paper:

Definition 2.1. We define:
1. a non-defective item to be 'intruding' if it only appears in positive tests;
2. G_i = I(item i is non-defective and intruding);
3. G = (G_1, . . . , G_n), the binary vector with these components, and G = G_1 + · · · + G_n, the total number of intruding items.

More formally, if non-defective item l is intruding then Y_s = 1 for every s with X_sl = 1, so that (by (2)) we know l ∉ K_COMP^c, so l ∈ K_COMP. In other words, if a non-defective item is intruding then COMP will mistakenly declare it to be defective. Since non-intruding items are declared to be non-defective by COMP, we know that the size of K_COMP (which determines the success of the two-stage algorithm of Aldridge [2]) is exactly |K_COMP| = k + G. For a given non-defective item i we can work out the marginal distribution of G_i relatively easily:

Lemma 2.2. Under a Bernoulli test design with parameter p, for any non-defective item i we have P(G_i = 1) = (1 − p q_0)^T.

Proof. Item i is intruding if no test contains item i and no defective item. The probability of the event that test t contains item i and no defective item is p(1 − p)^k = p q_0, so since successive tests are independent, the chance that we avoid this event for each test is (1 − p q_0)^T.

If the G_i were independent, then the analysis of the distribution of G would be easy: using Lemma 2.2, G would be binomial with parameters n − k and (1 − p q_0)^T. However, there is a dependence between the G_i. In fact, if we learn that a given item is intruding, that suggests there might be more positive tests than average, which would imply that other items are more likely to be intruding. We formalise this intuition by showing that the G_i are more likely to be equal to 1 together than independence would imply.
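The marginal probability (1 − p q_0)^T from Lemma 2.2 can be checked by Monte Carlo. Since only the k defective columns and the chosen item's column of X affect that item's intruding status, we simulate just those columns (a shortcut of ours):

```python
import numpy as np

rng = np.random.default_rng(2)
k, T, trials = 5, 50, 20000
p = 1 / k
q0 = (1 - p) ** k

# Columns 0..k-1 are the defective items; column k is the fixed non-defective item i.
cols = rng.random((trials, T, k + 1)) < p
y = cols[:, :, :k].any(axis=2)                  # test outcomes, one row of T tests per trial
# item i is intruding iff it appears in no negative test
appears_in_negative = (cols[:, :, k] & ~y).any(axis=1)
estimate = (~appears_in_negative).mean()
theory = (1 - p * q0) ** T                      # marginal probability from Lemma 2.2
```

With 20000 trials the Monte Carlo standard error is of order 0.001, so the estimate lands close to the theoretical value (about 0.034 for these parameters).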
That means that if COMP fails, it is more likely to fail badly (with a large total G) than a naive analysis based on Lemma 2.2 might suggest. We prove this in two ways, first by showing in Corollary 3.2 that the G_i are pairwise positively correlated, and second by proving a stronger result (Proposition 3.4) which shows that the G_i have the property of association (see Definition 3.3), which is stronger than positive correlation (see the discussion in [17] for example). In fact, we will deduce the positive correlation result (Corollary 3.2) from an expression for the falling moments of G (Proposition 3.1), which may be of independent interest, extending the result for EG implicit in [2, Theorem 1]. We describe the distribution of G as a binomial mixture of binomial distributions as follows. As in [3], if we let M_0 be the number of negative tests then we know that:

M_0 ~ Bin(T, q_0), (3)
G | M_0 = m ~ Bin(n − k, (1 − p)^m). (4)

The first result follows because a test is negative if and only if it contains no defective items, and for Bernoulli testing this occurs independently across tests with probability q_0 = (1 − p)^k. Moreover, the second result follows because a non-defective item is intruding if and only if it does not appear in any of the M_0 negative tests, and each non-defective item is present in a given test independently with probability p. We can use this to prove the following result:

Proposition 3.1. Under a Bernoulli test design with parameter p, the falling moments of G are given by

M^(s)(G) = (n − k)!/(n − k − s)! (1 − q_0(1 − (1 − p)^s))^T (5)

for any integer s ≥ 0.

Proof. See Appendix A.

Using this we can deduce the following result that shows that the G_i have positive pairwise correlation:

Corollary 3.2. Under a Bernoulli test design with parameter p, for any pair of distinct non-defective items i and j we have Cov(G_i, G_j) ≥ 0.

Proof. We consider the variance of G in two different ways: directly,

Var(G) = M^(2)(G) + M^(1)(G) − M^(1)(G)^2, (7)

and (using the symmetry between pairs of G_i implied by the Bernoulli matrix design)

Var(G) = (n − k) Var(G_i) + (n − k)(n − k − 1) Cov(G_i, G_j), (8)

using the fact that G_i^2 = G_i for a binary random variable. Comparing (7) and (8) we obtain that

Cov(G_i, G_j) = (1 − q_0(1 − (1 − p)^2))^T − (1 − p q_0)^{2T} ≥ 0,

using the expressions for M^(2)(G) and M^(1)(G) from Proposition 3.1, and the result follows on cancellation.
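The binomial mixture representation can be verified numerically: averaging the binomial falling moment over M_0 ~ Bin(T, q_0) reproduces the closed form M^(s)(G) = (n − k)!/(n − k − s)! (1 − q_0(1 − (1 − p)^s))^T, which is our reading of the formula of Proposition 3.1. The check below is exact (no simulation), comparing the finite mixture sum against the closed form.

```python
from math import comb, perm

n, k, T = 500, 10, 100
p = 1 / k
q0 = (1 - p) ** k

def falling_moment_exact(s):
    # Average the Bin(n-k, (1-p)^m) falling moment (n-k)_s * (1-p)^(s*m)
    # over the Bin(T, q0) distribution of the number of negative tests M0.
    return sum(comb(T, m) * q0**m * (1 - q0)**(T - m)
               * perm(n - k, s) * (1 - p)**(s * m)
               for m in range(T + 1))

def falling_moment_closed(s):
    # Closed form: (n-k)!/(n-k-s)! * (1 - q0*(1 - (1-p)^s))^T
    return perm(n - k, s) * (1 - q0 * (1 - (1 - p)**s))**T

for s in range(1, 5):
    assert abs(falling_moment_exact(s) / falling_moment_closed(s) - 1) < 1e-9
```

The agreement is a direct consequence of the binomial theorem (the probability generating function argument of Appendix A), so the two computations match to floating-point accuracy.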
However, we can prove a stronger property than just positive correlation. Recall the following definition:

Definition 3.3 (see [17]). Random variables X = (X_1, X_2, . . . , X_n) are (positively) associated if, for all increasing functions f and g,

E[f(X) g(X)] ≥ E[f(X)] E[g(X)].

We will prove the following proposition:

Proposition 3.4. Under a Bernoulli test design, the random variables G = (G_1, G_2, . . . , G_n) are associated.

Proof. See Appendix B.

Combining Proposition 3.4 with Theorem 3.1 of [12], we find that G is larger, in a convex sense, than a binomial random variable H with parameters n − k and (1 − p q_0)^T. That is, we have that Eg(G) ≥ Eg(H) for all real-valued functions g with g(x + 1) − 2g(x) + g(x − 1) ≥ 0 for all positive integers x, and for which the expectations exist (where we note from Property 3.4 of [13] that convex ordering on the integers is equivalent to convex ordering on the real line). This formalises the notion that G is more variable than it would be if the G_i were independent. The function g(x) = x!/(x − s)! satisfies this convexity condition, since direct calculation gives that g(x + 1) − 2g(x) + g(x − 1) = s(s − 1)(x − 1)!/(x − s + 1)! ≥ 0. Hence

M^(s)(G) ≥ M^(s)(H) for all integers s ≥ 0, (10)

i.e., the falling moments of H act as lower bounds for those of G. Indeed, direct calculation gives that the ratio

M^(s)(G)/M^(s)(H) = ρ_s^T, where ρ_s = (1 − q_0(1 − (1 − p)^s))/(1 − p q_0)^s,

so that the ratio between successive falling moments of G and H is increasing in s. Some numerical illustration of this is given in Table 1 below, along with comparison of falling moments of G with those of other distributions which are more suitable than H as approximations of G and which we now discuss in more detail. In this subsection, we start to show that the distribution of G can be approximated by Z, where Z ~ NB(r, q) follows the negative binomial distribution. For concreteness, we use the parameterization where the probability mass function for the negative binomial distribution is

P(Z = z) = Γ(z + r)/(Γ(r) z!) q^r (1 − q)^z, for z = 0, 1, 2, . . . . (12)

Note that in the case of integer r, the normalization constant Γ(z + r)/(Γ(r) z!) equals the binomial coefficient (z + r − 1 choose z), and Z can be interpreted in terms of the number of failures before the rth success in a sequence of Bernoulli trials.
However, we do not require r to be an integer here. The parameter r is sometimes referred to as the dispersion. In this sense it is worth noting that the case r = 1 corresponds to a geometric random variable (as mentioned above) and the limiting regime r → ∞ and q = r/(r + λ) (which corresponds to a mean-preserving limit; see Lemma 3.5 below) gives the mass function of a Poisson random variable with mean λ. However, in the finite blocklength setting we will see that G is often well approximated by a distribution with 1 << r << ∞, meaning that neither the geometric nor the Poisson approximation is adequate. One natural question is that of which negative binomial distribution (which choice of parameters) to use. We argue that one natural choice is based on a standard moment matching argument: since we need two parameters, we need to set EG = EZ and EG^2 = EZ^2. In fact, equivalently, since Proposition 3.1 and Lemma 3.5 below give closed form expressions for the falling moments of G and Z, it is actually easier to solve M^(s)(G) = M^(s)(Z) for s = 1, 2.

Lemma 3.5. If Z ~ NB(r, q), then for any integer s ≥ 1 the falling moments are given by

M^(s)(Z) = Γ(r + s)/Γ(r) ((1 − q)/q)^s. (13)

Proof. Using the probability mass function given by (12) we obtain

M^(s)(Z) = Σ_{z ≥ s} z!/(z − s)! Γ(z + r)/(Γ(r) z!) q^r (1 − q)^z,

and the result follows on reindexing the sum. Hence, combining Proposition 3.1 and Lemma 3.5, we have successfully matched the moments if

r(1 − q)/q = (n − k)(1 − p q_0)^T (14)

and

r(r + 1)((1 − q)/q)^2 = (n − k)(n − k − 1)(1 − q_0(1 − (1 − p)^2))^T, (15)

or, equivalently, if

r = M^(1)(G)^2/(M^(2)(G) − M^(1)(G)^2) and q = r/(r + M^(1)(G)). (16)

In Figure 1, we illustrate the quality of this negative binomial approximation for G in the case n = 500, k = 10 and p = 0.1. We plot the mass function of the negative binomial random variable Z (with parameter choices as in (16)) and the corresponding estimated mass function for G obtained by simulation, for several values of the number of tests T in a non-adaptive algorithm. These experiments show that this negative binomial distribution gives a good approximation to the distribution of G for various different values of T.

Figure 1: Probability mass functions of G (blue, obtained by simulation) and approximating negative binomial random variable Z (black).
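Solving the moment-matching conditions for (r, q) is a short computation; the rearrangement below (r = μ_1^2/(μ_2 − μ_1^2), q = r/(r + μ_1), writing μ_s = M^(s)(G)) is our algebra, using the negative binomial falling moments M^(s)(Z) = Γ(r + s)/Γ(r) ((1 − q)/q)^s from Lemma 3.5.

```python
from math import perm, gamma

n, k, T = 500, 10, 100
p = 1 / k
q0 = (1 - p) ** k

def mG(s):
    # falling moments of G from the binomial-mixture closed form
    return perm(n - k, s) * (1 - q0 * (1 - (1 - p) ** s)) ** T

mu1, mu2 = mG(1), mG(2)
r = mu1 ** 2 / (mu2 - mu1 ** 2)     # dispersion, about 3.66 in this example
q = r / (r + mu1)

def mZ(s):
    # falling moments of NB(r, q)
    return gamma(r + s) / gamma(r) * ((1 - q) / q) ** s

# The first two falling moments of Z now agree with those of G by construction,
# and r sits well away from the geometric (r = 1) and Poisson (r -> inf) cases.
```

This reproduces the dispersion value 3.66 quoted in the text for n = 500, k = 10, p = 0.1, T = 100.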
In this case the group testing parameters are n = 500, k = 10 and p = 0.1. The four plots correspond to T = 60, 80, 100, 120. We also find that for many finite blocklength examples the extra flexibility offered by a two-parameter approximation means that the negative binomial distribution with parameters given by (16) approximates the low order falling moments of G (and hence the variance, skewness and kurtosis) better than either the Poisson or geometric distributions with a single parameter chosen by matching means. This is illustrated in Table 1, again in the case n = 500, k = 10, p = 0.1, T = 100. In this setting the dispersion parameter r of Z is 3.66, which is well separated from both r = ∞, corresponding to the Poisson, and r = 1, corresponding to the geometric. In general, direct calculation shows that for any n, p, q, T such that r ≥ 1, writing X, Z, Y and H for the geometric, negative binomial, Poisson and binomial random variables, respectively, defined in Table 1, the falling moments satisfy the ordering given in (17). This follows by rewriting the expressions in (17) in a common form, which then simplifies directly. Recall from (10) that M^(s)(G) ≥ M^(s)(H); from Table 1 it appears that M^(s)(G) ≥ M^(s)(Y) for small s at least. In the sparse regime described above (in which k = n^θ, p = 1/k and T = (c/q_0) k log(n)), we note that the dispersion parameter r of (16) tends to infinity as n → ∞. This is equivalent to the fact that M^(2)(G)/M^(1)(G)^2 tends to 1. We can deduce that this holds from (16) by writing

M^(2)(G)/M^(1)(G)^2 = (1 − 1/(n − k)) (1 + C p^2)^T, where C = q_0(1 − q_0)/(1 − p q_0)^2, (18)

and noting that in the sparse regime p^2 T = c log(n)/(q_0 k) tends to 0 while n − k tends to infinity. Similarly, in the linear regime (k = βn, T = cn) using (18) we obtain the same conclusion, since p^2 T = c/(β^2 n). A Poisson approximation to the distribution of G may thus be appropriate in the large-n limit for the sparse and linear regimes, but the numerical results of this section make it clear that a negative binomial approximation is a more natural choice for finite blocklength applications. Next, we show that, in this framework, matching the first two moments of G and Z ensures that all the falling moments converge.
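The growth of the moment-matched dispersion r in the sparse regime can be seen numerically; the parameter choices θ = 1/2 and c = 1 below are ours, purely for illustration.

```python
from math import log

# Dispersion r of the moment-matched negative binomial in the sparse regime
# k = n^theta, p = 1/k, T = (c/q0) k log(n), with theta = 1/2 and c = 1 (our
# illustrative choices). r grows with n, consistent with M^(2)(G)/M^(1)(G)^2 -> 1.
def dispersion(n):
    k = round(n ** 0.5)
    p = 1 / k
    q0 = (1 - p) ** k
    T = round((1 / q0) * k * log(n))
    mu1 = (n - k) * (1 - p * q0) ** T
    mu2 = (n - k) * (n - k - 1) * (1 - q0 * (1 - (1 - p) ** 2)) ** T
    return mu1 ** 2 / (mu2 - mu1 ** 2)

rs = [dispersion(n) for n in (10**3, 10**4, 10**5, 10**6)]
# rs increases steadily with n, so the approximating NB drifts towards a Poisson
```

This matches the discussion above: in the sparse regime p^2 T tends to 0, which drives the variance ratio to 1 and the dispersion to infinity.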
We can see this informally by simplifying (5) and (13) respectively. The former gives (to leading order)

M^(s)(G) ≈ (n − k)^s exp(−q_0 p s T), (19)

using the fact that 1 − (1 − p)^s = 1 − (1 − ps + O(p^2)) = ps + O(p^2). The latter gives, for large r,

M^(s)(Z) ≈ r^s ((1 − q)/q)^s. (20)

Note that the moment matching condition of (14) gives that r(1 − q)/q = (n − k)(1 − q_0 p)^T ≈ (n − k) exp(−q_0 p T), meaning that (19) and (20) agree. A more formal comparison of falling moments is given in the following theorem, where we recall that with the usual choice p = 1/k we have q_0 = (1 − p)^k ≈ e^{−1}:

Theorem 3.6. Consider the number of intruding items G and negative binomial Z with parameters given by moment matching, satisfying (16). Under a Bernoulli test design, for any integer s ≥ 1, if we write C = q_0(1 − q_0)/(1 − q_0 p)^2 then the moment ratio M^(s)(G)/M^(s)(Z) satisfies the lower bound (21) and the upper bound (22).

Proof. See Appendix C.

Hence, for any fixed s the ratio M^(s)(G)/M^(s)(Z) → 1, for both
1. the sparse regime with k = n^θ for some θ ∈ (0, 1) and T = cek log n, and
2. the linear regime with k = βn and T = cn.

Here we control the upper bound in (22) using the fact that (see Section 3.2 above) the quantity p^2 T is c log n/(q_0 k) or c/(β^2 n) respectively. Similarly, we control the lower bound (21) using the fact that in both regimes both n − k and the dispersion parameter r tend to infinity (again see Section 3.2). Additionally, the bounds (21) and (22) allow us to control the ratio M^(s)(G)/M^(s)(Z) in the finite blocklength regime, deducing bounds that can be compared with the concrete values given in Table 1 for example. Having seen in Sections 3.2 and 3.3 that a negative binomial distribution seems to be a reasonable approximation for the distribution of G, in this section we adapt the Stein-Chen method to prove explicit error bounds in the approximation of G by a negative binomial distribution. We emphasise that these bounds apply for any finite blocklength application. The error in our approximation of G by Z will be measured in total variation distance, defined by

d_TV(G, Z) = sup_{A ⊆ Z_+} |P(G ∈ A) − P(Z ∈ A)| = inf P(G' ≠ Z'), (23)

where Z_+ = {0, 1, . . .} and the infimum is taken over all couplings (G', Z') of G and Z. First, we briefly review the Stein-Chen method in the context of negative binomial approximation. For a more detailed introduction to this technique more generally, we refer the reader to [28]. Recall from [8] that Z ~ NB(r, q) if and only if

E[(1 − q)(r + Z) g(Z + 1) − Z g(Z)] = 0 (24)

for all test functions g : Z_+ → R for which the expectation exists. This follows as a consequence of the fact that the negative binomial probability mass function of (12) satisfies z P(Z = z) = (1 − q)(r + z − 1) P(Z = z − 1). The key to the analysis is that for each set A ⊆ Z_+ we can define a function f_A : Z_+ → R which satisfies f_A(0) = 0 and the Stein-Chen equation

(1 − q)(r + z) f_A(z + 1) − z f_A(z) = I(z ∈ A) − P(Z ∈ A) (25)

for all z ∈ Z_+. Then for any random variable Y we can take the expectation of (25) over Y to obtain

E[(1 − q)(r + Y) f_A(Y + 1) − Y f_A(Y)] = P(Y ∈ A) − P(Z ∈ A). (26)

Note that (as expected) if Y were negative binomial then both RHS and LHS of (26) would be zero (the latter due to the characterization in (24)). However, we can deduce that if the LHS of (26) is small, then so is the RHS. Indeed, if the LHS of (26) is small uniformly over choices of A then we can deduce a bound in total variation distance since, combining (23) and (25), we have

d_TV(Y, Z) = sup_{A ⊆ Z_+} |E[(1 − q)(r + Y) f_A(Y + 1) − Y f_A(Y)]|.

Having reviewed the Stein-Chen method in general, we will now describe how we bound (26) in this specific case. For our approximating negative binomial distribution for G, we will make the following choices of the parameters q and r, given in (28), where μ = (n − k) e^{−T p q_0} and where, as before, we write q_0 = (1 − p)^k. We remark that these parameter choices do not match the first two moments of Z with those of G, unlike those of (16). Instead, the parameters q and r are chosen to match the first two moments of Z with those of G', to be defined precisely below, in which the binomial mixture which defines G is replaced by a particular Poisson mixture.
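The negative binomial characterization can be sanity-checked numerically on a truncated support. The identity below is our statement of the Stein operator of [8], E[(1 − q)(r + Z)g(Z + 1) − Z g(Z)] = 0, with arbitrary illustrative parameter values.

```python
from math import lgamma, exp, log

r, q = 3.5, 0.4   # illustrative NB(r, q) parameters (ours)

def pmf(z):
    # NB(r, q) mass function as in (12), computed in log space for stability
    return exp(lgamma(z + r) - lgamma(r) - lgamma(z + 1)
               + r * log(q) + z * log(1 - q))

def stein_expectation(g, zmax=500):
    # E[(1-q)(r+Z) g(Z+1) - Z g(Z)], truncated where the tail is negligible
    return sum(pmf(z) * ((1 - q) * (r + z) * g(z + 1) - z * g(z))
               for z in range(zmax + 1))

# The expectation vanishes (up to truncation/rounding) for a range of test functions.
for g in (lambda z: z, lambda z: z * z, lambda z: 1.0 / (1 + z)):
    assert abs(stein_expectation(g)) < 1e-8
```

The identity follows directly from the recursion z P(Z = z) = (1 − q)(r + z − 1) P(Z = z − 1) quoted in the text, by shifting the summation index.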
This may seem a little unnatural at first, but makes sense in the setting of the proof in Appendix D below, and the ultimate effect should be negligible since the distributions of G and G' are close, as our proof demonstrates. Our main result is the following:

Theorem 3.7. Let G be as above, and let Z have a negative binomial distribution with parameters q and r given by (28). Then d_TV(G, Z) satisfies the explicit upper bound (29), in which Γ(·, ·) is the normalised upper incomplete gamma function, defined by

Γ(a, x) = (1/Γ(a)) ∫_x^∞ t^{a−1} e^{−t} dt,

where Γ(·) is the gamma function.

Proof. See Appendix D below.

We conclude this section with some numerical illustrations of the bound of Theorem 3.7. The nature of this bound makes it difficult to extract a rate of convergence or other analytical information about its performance, and so we use these numerical illustrations to gain some understanding of its behaviour. Firstly, we note that although our bound applies for any finite blocklength, there are examples in which it is worse than the trivial bound d_TV(G, Z) ≤ 1, or performs poorly compared to the simulation results we observed in Section 3.2. For example, with n = 500, k = 10, p = 0.1 and T = 100, the bound of Theorem 3.7 gives d_TV(G, Z) ≤ 2.23, which is clearly uninformative. We give some further examples of our upper bound in Table 2.

Table 2: The upper bound of Theorem 3.7 in the case n = 2500 for various values of k, p and T. The symbol "-" indicates that the upper bound is larger than 1, and therefore uninformative.

In many other cases the bound of Theorem 3.7 performs very well, and demonstrates proximity of the distribution of G to negative binomial. It is interesting to note that in many cases the largest share of the error estimate in Theorem 3.7 comes from the first term of the upper bound, which arises from the approximation of the binomial random variable M_0 by a Poisson random variable of the same mean (see the proof in Appendix D for details). Nevertheless, the bound we would obtain without making this approximation generally performs worse than our Theorem 3.7.
We conjecture that this is because the integral in the final term of the upper bound is made smaller by this approximation of M_0 (compared to the corresponding term without this approximation), resulting in a smaller upper bound overall because of the relatively large factors multiplying this integral. As discussed previously, understanding the distribution of G allows us to control the performance of the conservative two-stage algorithm of Aldridge [2]. Specifically, we consider the scenario where T_1 tests are performed in the first stage according to a Bernoulli testing design with parameter 1/k, and then T_2 individual tests are performed afterwards to resolve the status of items we are unsure about. Given a fixed total budget of T tests, we can regard the standard Bernoulli test design as corresponding to T_1 = T and T_2 = 0 (no second stage), and individual testing as corresponding to T_1 = 0 and T_2 = T (no first stage). However, it is natural to consider strategies intermediate to these, to see if better performance can be obtained by choosing some T_1 satisfying 0 < T_1 ≤ T. Given an overall budget of T tests, this leaves us T_2 = T − T_1 individual tests to find the status of k + G items, and we will succeed if T_2 ≥ k + G or, equivalently, if G + T_1 + k ≤ T. Equivalently, the algorithm will fail if G > T − T_1 − k. Aldridge [2, Theorem 1] considers this failure event in terms of expected values. That is, we may wish to choose T_1 to minimise the expected number of tests used, T_1 + k + EG, and as in [2] direct calculation gives that the optimal choice of T_1 to control this expectation is given by (30). Interestingly, since p = 1/k and q_0 ≈ 1/e, this choice makes the expected value EG approximately equal to k, meaning that the average number of intruding non-defectives approximately equals the number of true defectives, so a randomly chosen individual test is positive with probability close to 1/2, maximising the information that we gain from it.
However, instead of simply computing the expected value, we can use the moment values calculated in this paper to bound the error probability under this kind of two-stage strategy. For example, Chebyshev's inequality gives an upper bound on the error probability:

Lemma 4.1. If we use T_1 = kc_1 tests in the first stage and T_2 = kc_2 tests in the second stage, then the error probability is bounded above by the expression (31).

Proof. A standard argument gives P(G > T_2 − k) ≤ Var(G)/(T_2 − k − E[G])², and the result follows.

Note that the expression (31) becomes

(exp(c_1 q_0 p) − 1 + exp(c_1 q_0)/n) / (β(c_2 − 1) exp(c_1 q_0)/(1 − β) − 1)²    (32)

in the linear regime (k = βn). Hence, if we take c_1 = e log((1 − β)/(βe)) as suggested by (30), so that exp(c_1 q_0) ≈ (1 − β)/β, then the denominator of (32) becomes (c_2 − 2)² and the numerator is of order 1/n, so for any fixed c_2 > 2 the error probability tends to zero at rate 1/n as n → ∞.

Further, using the negative binomial approximation of this paper, we can find large deviations bounds on the success probability of the two-stage algorithm. Figure 2 shows that this negative binomial approximation provides accurate bounds on the total number of tests needed in this case. That is, if we write Z for the negative binomial approximation with parameters given by (16), we know that for any g the tail probability P(G > g) ≈ P(Z > g). Using a standard large deviations argument, we know:

Proposition 4.2. If Z is negative binomial with parameters r and q, then for any g > E[Z], the tail probability P(Z > g) satisfies an exponential upper bound expressed in terms of D(v ‖ w) = v log(v/w) + (1 − v) log((1 − v)/(1 − w)), the Kullback–Leibler divergence from a Bernoulli(v) random variable to a Bernoulli(w).

Proof. See Appendix E.

Hence, again in the linear scenario (k = βn), taking T_1 = kβe log((1 − β)/βe) tests in the first stage (as suggested by (30)) and T_2 = kc_2 in the second, the probability of failure will be P(G > k(c_2 − 1)), which we can bound using Proposition 4.2. Using the explicit bounds above, we obtain an upper bound decaying exponentially in k. In Figure 3 we see that the negative binomial approximation Z approximates the distribution of G well, and that the upper bounds implied by (31) and (33) are somewhat tight.
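A Chernoff-type tail bound of the kind underlying Proposition 4.2 can be checked numerically. The sketch below uses illustrative parameters r, q, g (not values from the paper); the optimal exponential tilt u* = log(g/((g + r)(1 − q))), valid for g > E[Z] = r(1 − q)/q, follows by direct calculation from the negative binomial moment generating function.

```python
import math

def nb_pmf(x, r, q):
    # NB(r, q) with P(Z = x) = C(x+r-1, x) q^r (1-q)^x, mean r(1-q)/q.
    return math.exp(math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1)
                    + r * math.log(q) + x * math.log(1 - q))

def chernoff_tail(g, r, q):
    """Chernoff bound P(Z > g) <= exp(-u g) E[e^{uZ}] evaluated at the
    optimal tilt u* = log(g / ((g + r)(1 - q))), for g > E[Z]."""
    u = math.log(g / ((g + r) * (1 - q)))
    mgf = (q / (1 - (1 - q) * math.exp(u))) ** r   # MGF of NB(r, q)
    return math.exp(-u * g) * mgf

r, q = 8.0, 0.4
g = 30                                   # well above the mean r(1-q)/q = 12
exact = 1 - sum(nb_pmf(x, r, q) for x in range(g + 1))
bound = chernoff_tail(g, r, q)
print(exact, bound)
```

The bound dominates the exact tail probability while remaining far below the trivial bound of 1, consistent with the exponential decay discussed above.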
This analysis could be extended to cover a two-stage version of the DD algorithm of [3]. In Stage 1 of this algorithm there would potentially be some items which could be confirmed as 'definitely defective' (because they appear in some positive test whose other items are all guaranteed by other tests to be non-defective). Such items would not need individual testing in Stage 2, which could potentially reduce the number of tests required by up to k; this could be significant in the linear regime at least. However, we leave this question as further work, for reasons of space and due to the complexity of the analysis of non-adaptive DD in [3].

Figure 3: The upper bounds implied by (31) and (33) are provided for comparison (n = 500, k = 10, p = 0.1).

We have identified the key role played by G, the number of intruding non-defective items, in group testing algorithms. Under the standard Bernoulli testing strategy, we have identified the distribution of G, given explicit expressions for its falling moments, and shown how it can be well approximated by a negative binomial distribution with given parameters. This allows us to deduce results concerning the performance of the COMP and DD group testing algorithms.

We first recall the following result:

Lemma A.1 (Falling moments of the binomial distribution). Suppose X ∼ Bin(L, t). Then the sth falling moment of X is given by E[(X)_s] = (L)_s t^s, where (x)_s = x(x − 1) · · · (x − s + 1) denotes the falling factorial.

Proof. The result follows by direct evaluation of the falling moments.

Proof of Proposition 3.1. Recall from (3) and (4) that M_0 ∼ Bin(T, q_0) and G | M_0 = m ∼ Bin(n − k, (1 − p)^m). Hence, using the law of iterated expectation and Lemma A.1, the result follows on taking t = (1 − p)^s in the fact that the probability generating function of a binomial random variable Y ∼ Bin(R, u) satisfies E[t^Y] = (1 − u + ut)^R, by the binomial theorem.

Note that we can give an alternative proof of Proposition 3.1, using [20, Lemma 2.2], which gives a multinomial-type expansion for falling factorials based on the Vandermonde identity.
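The falling-moment calculation of Proposition 3.1 can be verified numerically for small parameters. In the sketch below, the closed form E[(G)_s] = (n − k)_s (1 − q_0(1 − (1 − p)^s))^T is the expression obtained from the Lemma A.1 plus binomial-PGF argument above (the paper's own display is not reproduced here), and the parameter values are illustrative.

```python
import math

def falling(x, s):
    out = 1.0
    for i in range(s):
        out *= (x - i)
    return out

def binom_pmf(x, L, t):
    return math.comb(L, x) * t**x * (1 - t)**(L - x)

def falling_moment_bruteforce(n, k, p, T, s):
    """E[(G)_s] computed from the full pmf of G, by mixing
    G | M0 = m ~ Bin(n-k, (1-p)^m) over M0 ~ Bin(T, q0)."""
    q0 = (1 - p) ** k
    total = 0.0
    for m in range(T + 1):
        weight = binom_pmf(m, T, q0)
        Pm = (1 - p) ** m
        for g in range(n - k + 1):
            total += weight * binom_pmf(g, n - k, Pm) * falling(g, s)
    return total

def falling_moment_closed(n, k, p, T, s):
    # Closed form via Lemma A.1 and the binomial PGF at t = (1-p)^s.
    q0 = (1 - p) ** k
    return falling(n - k, s) * (1 - q0 * (1 - (1 - p) ** s)) ** T

n, k, p, T, s = 20, 3, 0.2, 8, 2
a = falling_moment_bruteforce(n, k, p, T, s)
b = falling_moment_closed(n, k, p, T, s)
print(a, b)
```

The two computations agree to floating-point accuracy, as the law of iterated expectation guarantees.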
This expansion simplifies in the case of binary random variables to give the fact that the falling moment E[(G)_s] can be expressed as a sum over sets of size s. The summation over sets contributes s! (n−k choose s) equal expectation terms, each of which equals the probability that all elements of a specified set are intruding; this corresponds to the event that none of them ever appears in a negative test, so the result follows by independence of all test items.

Proof. We use Proposition 1 of [18], which shows that it is enough to verify that the G_i satisfy the so-called FKG condition (37), where ∨ and ∧ represent the maximum and minimum respectively. By independence, we can describe the distribution of G conditional on M_0, the number of negative tests. Recall that M_0 has a binomial distribution with parameters T and q_0. For any g, we can write P(G = g) as a mixture over the values of M_0, where P_m = (1 − p)^m and R_m = P_m/(1 − P_m). This follows since we can first check which tests the defective items appear in, and then each non-defective item is independently intruding with probability P_m, since it must not appear in any of the m negative tests. Using this expression, and writing P(G = x ∨ y)P(G = x ∧ y) in the corresponding form, we can verify the FKG condition (37) by expanding the resulting double sum in terms α_{m,ℓ}. We can pair up these α_{m,ℓ} terms, noticing that the fact that w(x ∨ y) + w(x ∧ y) = w(x) + w(y) means that α_{m,m} vanishes, and that in general we can write α_{m,ℓ} + α_{ℓ,m} as a product of two bracketed terms, where for brevity we write w_+ = w(x ∨ y), so that w(x ∧ y) = w(x) + w(y) − w_+. Observe that since w_+ ≥ w(x) and w_+ ≥ w(y), both these bracketed terms have the same sign, so α_{m,ℓ} + α_{ℓ,m} ≥ 0 and the FKG condition is satisfied.

C Proof of Theorem 3.6

Proof. Write L = (n − k) for the number of non-defective items. We know that E[(G)_s] takes the form (38), where we write q_0 = (1 − p)^k for brevity. By moment matching, we know that L(1 − q_0 p)^T = r(1 − q)/q, so the relevant ratio simplifies. We will treat the two bracketed (falling factorial and T th power) terms of (38) separately.

1. Falling factorial term. Note that this term can be bounded above as in (39), by a termwise comparison.
Similarly, we can bound (39) from below using the arithmetic mean–geometric mean inequality.

2. T th power term. We can write the second term of (38) as (1 − R)^T, where we write q = 1 − q_0 p and R is given by (40). We first provide an upper bound on R using the fact that

θ(x) := (1 + x)^s − (1 + xs + x²s(s − 1)/2 + x³s(s − 1)(s − 2)/6) ≥ 0

(this result follows using the fact that θ(0) = θ′(0) = θ″(0) = 0 and since θ‴(x) = s(s − 1)(s − 2)((1 + x)^{s−3} − 1) ≥ 0 by Bernoulli's inequality). In (40) this gives an upper bound on R, where we recall that we write C = q_0(1 − q_0)/q². We can give a complementary lower bound on R using a standard argument based on the mean value theorem with f(t) = t^s: rewriting (40), we obtain (41) for some α ∈ (1 − p(1 − q_0)/q, 1), β ∈ (1, 1 + pq_0/q) and γ ∈ (α, β). Then, since we know that (β − α) ≤ p/q and γ ≤ β ≤ 1 + pq_0/q, the expression (41) gives the required lower bound, and the proof is complete.

D Proof of Theorem 3.7

In proving the theorem, our strategy will be to replace G by the mixed Poisson version G′ defined below (bounding the error in making this replacement). We then approximate G′ by a negative binomial distribution, noting that a negative binomial can itself be written as a mixed Poisson with gamma mixing distribution. Lemma D.1 below allows us to transfer our negative binomial approximation problem for a mixed Poisson distribution into a gamma approximation problem for the mixing distribution. We may then bound the appropriate distance from our mixing distribution to gamma to complete the proof. Before proceeding with this programme, we first define a further metric we will need: the Wasserstein distance, denoted by d_W. For any non-negative, real-valued random variables X and Y, we define

d_W(X, Y) = sup_{h ∈ H_W} |E[h(X)] − E[h(Y)]|,

where H_W is the set of absolutely continuous functions h : R_+ → R with ‖h′‖ ≤ 1, and ‖·‖ is the supremum norm defined by ‖g‖ = sup_x |g(x)| for any real-valued function g.

Lemma D.1. Let Z have a negative binomial distribution with parameters r and q, and let H have a mixed Poisson distribution, H | ξ ∼ Po(ξ), for some positive random variable ξ.
Let η ∼ Γ(r, λ) have a gamma distribution with density function (λ^r/Γ(r)) x^{r−1} e^{−λx} for x > 0, where λ = q/(1 − q). Then d_TV(H, Z) ≤ ((2 − q)/(1 − q)) d_W(ξ, η).

Proof. It can easily be checked by direct calculation that Z has the mixed Poisson distribution Z | η ∼ Po(η). Following Stein's method for negative binomial approximation (see [6, 8, 29] and the review in Section 3.4), we let f = f_A satisfy f(0) = 0 and the Stein equation (25), where A ⊆ Z_+, so that we may write the total variation distance in the form (27). This latter expression is closely related to Stein's method for gamma approximation, as developed by Luk [25]; see also [19] and references therein for more recent developments. In particular, it is known that, since η has a gamma distribution, E[ηg′(η) + (r − λη)g(η)] = 0. Letting h(x) = xg′(x) + (r − λx)g(x) (and noting that our earlier calculations show that h is differentiable), we may therefore write the difference of expectations as |E[h(ξ)] − E[h(η)]| ≤ ‖h′‖ d_W(ξ, η). To complete the proof, it remains only to bound ‖h′‖ ≤ (2 − q)/(1 − q). To that end, we note that h′ may be bounded using the estimates on f and its differences, where the penultimate inequality again uses (44). Using the bounds (43), we therefore obtain the stated bound on ‖h′‖, as required, since λ = q/(1 − q).

Proof of Theorem 3.7. We now use Lemma D.1 to establish Theorem 3.7. Recalling that M_0 ∼ Bin(T, q_0), we define M ∼ Po(T q_0). Similarly, recalling that G has the mixed binomial distribution G | M_0 ∼ Bin(n − k, (1 − p)^{M_0}), we define G′ and G″ by G′ | M ∼ Bin(n − k, (1 − p)^M) and G″ | M ∼ Po((n − k)(1 − p)^M). We then write

d_TV(G, Z) ≤ d_TV(G, G′) + d_TV(G′, G″) + d_TV(G″, Z)

and bound each of these three terms separately. Firstly, we bound d_TV(G, G′), where the final inequality comes from the main result of Weba [31] combined with the sharpened value 0.4748 of the constant in the Berry–Esseen theorem due to Shevtsova [30]. Secondly, using the fact that d_TV(Bin(m, p′), Po(mp′)) ≤ p′ for any parameters m and p′ (see page 8 of [7]), we bound d_TV(G′, G″). To complete the proof, it remains only to bound d_TV(G″, Z). To that end, we apply Lemma D.1, noting that the parameters q and r of Z are chosen such that the first two moments of Z match those of G″: straightforward calculations show that E[G″] = µ and Var(G″) = σ².
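The mixed Poisson representation underlying Lemma D.1, namely that NB(r, q) arises as a Poisson distribution whose mean is itself Gamma(r, λ) distributed with rate λ = q/(1 − q), can be illustrated by simulation. The parameter values below are illustrative; numpy's `negative_binomial` uses the same parametrisation as here, with mean r(1 − q)/q and real-valued r permitted.

```python
import numpy as np

rng = np.random.default_rng(2)

# NB(r, q) as a mixed Poisson: eta ~ Gamma(r, rate q/(1-q)),
# then (Z | eta) ~ Po(eta) has the NB(r, q) distribution.
r, q = 5.0, 0.35
lam = q / (1 - q)

eta = rng.gamma(shape=r, scale=1.0 / lam, size=200000)
mixed = rng.poisson(eta)

direct = rng.negative_binomial(r, q, size=200000)

print(mixed.mean(), direct.mean())   # both should be close to r(1-q)/q
```

The empirical means (and, more generally, the full distributions) of the two samples agree, which is the representation exploited throughout Appendix D.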
Lemma D.1 gives a bound on d_TV(G″, Z) in terms of the Wasserstein distance d_W(ξ, η), where ξ = (n − k)(1 − p)^M and η ∼ Γ(r, λ) has a gamma distribution with rate parameter λ = q/(1 − q). Using scaling properties of the gamma distribution and the Wasserstein distance, we may work instead with the variables rescaled by (n − k), where we note that it is straightforward to check that the rescaled gamma variable has mean r/((n − k)λ) < 1. Finally, we write

P(Z > g) ≤ e^{−ug} E[e^{uZ}] for any u > 0,    (45)

where we use the fact that the moment generating function of the negative binomial distribution NB(r, q) is E[e^{uZ}] = (q/(1 − (1 − q)e^u))^r, valid for e^u < 1/(1 − q). Direct calculation then gives that the optimal value of u to substitute is

u* = log(g/((g + r)(1 − q)))

(note that the assumption g > E[Z] = r(1 − q)/q ensures that g/((g + r)(1 − q)) > 1, so that u* > 0 as required in (45)). The result follows on substitution in (45), since this choice of u = u* makes q/(1 − (1 − q)e^{u*}) = q(g + r)/r.

References

Individual testing is optimal for nonadaptive group testing in the linear regime
Conservative two-stage group testing
Group testing algorithms: bounds and simulations
Pooled testing and its applications in the COVID-19 pandemic
Group testing: an information theory perspective. Foundations and Trends in Communications and Information Theory
Stein factors for negative binomial approximation in Wasserstein distance
Negative binomial approximation with Stein's method
Non-adaptive probabilistic group testing with noisy measurements: near-optimal bounds with efficient algorithms
Optimal group testing
What's hot and what's not: tracking most frequent items dynamically
Does positive dependence between individual risks increase stop-loss premiums?
Measuring the impact of dependence between claims occurrences
The detection of defective members of large populations
Biological screens from linear codes: theory and tools. bioRxiv
Association of random variables, with applications
Correlation inequalities on some partially ordered sets
Chi-square approximation by Stein's method with application to Pearson's statistic
Thinning, entropy and the law of thin numbers
Group testing for image compression
A method for detecting all defective members in a population by group testing
Performance of group testing algorithms with near-constant tests-per-item
Nonrandom binary superimposed codes
Stein's method for the gamma distribution and related statistical applications
A strategy for finding people infected with SARS-CoV-2: optimizing pooled testing at low prevalence
Channel coding rate in the finite blocklength regime
Fundamentals of Stein's method. Probability Surveys
Power laws in preferential attachment graphs and Stein's method for the negative binomial distribution
On the absolute constants in the Berry–Esseen type inequalities for identically distributed summands
Bounds for the total variation distance between the binomial and the Poisson distribution in case of medium-sized success probabilities
Born again group testing: multiaccess communications
Evaluation of COVID-19 RT-qPCR test in multi sample pools

The collaboration between Letian Yu and Oliver Johnson was supported by a Charles Kao bursary from the Chinese University of Hong Kong in the summer of 2021.

We note the following bounds on f, taken from Lemma 3 of [8] and Lemmas 2.2 and 2.3 of [29], respectively, stated as (43) and (44), where ∆f(j) = f(j + 1) − f(j) and D^{(r)}f(j) = (j/r + 1)f(j + 1) − (j/r)f(j), so that ∆(D^{(r)}f)(j) = ((j + 1)/r + 1)∆f(j + 1) − (j/r)∆f(j).