key: cord-0677083-hs9cbsq1
cord_uid: hs9cbsq1
title: Conservative two-stage group testing in the linear regime
authors: Aldridge, Matthew
date: 2020-05-06

Inspired by applications in testing for Covid-19, we consider a variant of two-stage group testing called "conservative" (or "trivial") two-stage testing, where every item declared to be defective must be definitively confirmed by being tested by itself in the second stage. We study this in the linear regime, where the prevalence is fixed while the number of items is large. We study various nonadaptive test designs for the first stage, and derive a new lower bound for the total number of tests required. We find that a first-stage design as studied by Broder and Kumar (arXiv:2004.01684), with constant tests per item and constant items per test, is extremely close to optimal for all prevalences, and is optimal in the limit as the prevalence tends to zero. Simulations back up the theoretical results.

Group testing (or pooled testing) is the following problem. Suppose there are n individuals, some of whom are infected with a disease. If a test exists that reliably detects the disease, then each individual can be separately tested for the disease to find out whether they have it or not, requiring n tests. However, in theory, a pooled strategy can do better: we can take samples from a number of individuals, pool the samples together, and test this pooled sample. If none of the individuals is infected, the test should be negative, while if one or more of the individuals are infected then, in theory, the test should be positive. It might then be possible to ascertain which individuals have the disease using fewer than n pooled tests, thus saving resources when tests are expensive or limited.

Experiments show that the group testing paradigm holds for SARS-CoV-2, the virus that causes the disease covid-19; that is, pools of samples with just one positive sample and many negative samples do indeed produce positive results, at least for pools of around 32 samples or fewer [1, 11, 30, 34]. This work led to a great interest in group testing as a possible way to make use of limited tests for covid-19; for a review of such work we point readers to the survey article [6].

Many of these papers use a similar model to the one we use here: the number of individuals n is large; the prevalence p is constant; each individual is infected independently with probability p (the 'i.i.d. prior'); we wish to reduce the average-case number of tests ET; and we want to be certain that each individual is correctly classified (the 'zero-error' paradigm). We emphasise that the fact that p is constant as n → ∞ puts us in the so-called 'linear regime', rather than the often-studied 'sparse regime' where p → 0 as n gets large; the linear regime seems more relevant to applications to covid-19 and other widespread diseases. Later, it will sometimes be convenient to instead consider the 'fixed-k' prior, where there is a fixed number k = pn of infected individuals. We discuss this mathematical convenience further in Subsection 3.3.

We note also that here we are making the mathematically convenient assumption that all tests are perfectly accurate. For work that considers similar techniques to those here with imperfect tests, see, for example, [6, 4]. For more background on group testing generally, we point readers to the survey monograph [7].
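To fix ideas for what follows, here is a minimal sketch (in Python; illustrative only, not code from the paper) of the model just described: individuals are infected independently with probability p, and a pooled test is positive exactly when the pool contains at least one infected individual. The names and parameter values below are purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 1000, 0.027                 # number of individuals and prevalence (i.i.d. prior)
infected = rng.random(n) < p       # True = infected

def pooled_test(pool):
    """A pooled test is positive iff at least one individual in the pool is infected."""
    return bool(infected[pool].any())

# For example, a single pool of 32 randomly chosen individuals:
pool = rng.choice(n, size=32, replace=False)
print("pool positive:", pooled_test(pool))
```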
An important distinction is between nonadaptive testing, where all tests are designed in advance and can be carried out in parallel, and adaptive testing, where each test result is examined before the next test pool is chosen. Recall that we are in the linear regime, where p is constant. For nonadaptive testing, any nonadaptive scheme using T < n tests has error probability bounded away from 0 [2], and if T < (1 − ε)n for some fixed ε > 0, the error probability in fact tends to 1 [10]. So simple individual testing will always be the optimal nonadaptive strategy, unless errors are tolerated - and errors that become overwhelmingly likely as n gets large. (For group testing in the linear regime with errors permitted, see, for example, [32, Appendix B].)

For adaptive testing, the best known scheme is a generalized binary splitting scheme studied by Zaman and Pippenger [35] and Aldridge [3], based on ideas of Hwang [24]. This scheme is the optimal 'nested' strategy [35], and is within 5% of optimal for all p ≤ 1/2 [3]. This algorithm (or special cases, or simplifications of it) was discussed in the context of covid-19 by [21, 22, 27]. However, fully adaptive schemes are unlikely to be suitable for testing for covid-19 or similar diseases, as many tests must be performed one after the other, meaning results will take a very long time to come back.

We propose, rather, using an adaptive strategy with only two stages. Within stages, tests are performed nonadaptively in parallel, so results can be returned in only the time it takes to perform two tests. This provides a good compromise between the speed but inevitable errors (or a full n individual tests) of nonadaptive schemes and the fewer tests but unavoidable slowness of fully adaptive schemes. Two-stage testing goes back to the foundational work of Dorfman [17], and has been discussed more recently in the context of covid-19 by [8, 11, 13, 19, 21, 23, 31] and others.

Although our interest in two-stage testing is inspired by its usefulness in testing for covid-19 or other similar diseases, the results in this paper are theoretical, and our model is chosen for its mathematical appeal rather than direct applicability to the real world. For further discussion of models with greater direct applicability, we again direct readers to [6].

From now on, we adopt standard group testing terminology as in, for example, [18, 5, 7, 6]. In particular, individuals are 'items' and infected individuals are (slightly unfortunately) 'defective items'. A two-stage algorithm that is certain to correctly classify every item works as follows:

1. In the first stage, we perform some fixed number T1 of nonadaptive tests. This will find some nondefective items: any item that appears in a negative test is a definite nondefective (DND). This will also find some defective items: any item that appears in a positive test in which every other item is a DND is itself a definite defective (DD).

2. In the second stage, we must individually test every item whose status we do not yet know - that is, all items except the DNDs and DDs. This requires a random number T2 = n − (#DNDs + #DDs) of tests.

The total number of tests is T = T1 + T2, which has expected value ET = T1 + ET2.

Ruling out DNDs when they appear in a negative test is a simple procedure in practice: following a negative test, a laboratory must simply report which samples were in that pool, and those items are nondefective.
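As an illustration of the two-stage classification just described, here is a minimal sketch (an illustrative Python toy under the assumptions above, not the paper's code): given any 0/1 first-stage design matrix and the true defectivity vector, it finds the DNDs and DDs from the first-stage outcomes and counts the second-stage tests. The function name and argument layout are our own choices.

```python
import numpy as np

def two_stage_counts(X, defective):
    """X: T1-by-n 0/1 design matrix for the first stage; defective: length-n boolean vector.
    Returns (#DNDs, #DDs, T2) for the two-stage procedure described above."""
    T1, n = X.shape
    outcomes = (X @ defective.astype(int)) > 0      # a test is positive iff it contains a defective
    dnd = X[~outcomes].sum(axis=0) > 0              # DND: appears in at least one negative test
    dd = np.zeros(n, dtype=bool)
    for t in np.flatnonzero(outcomes):              # DD: the only non-DND item in a positive test
        items = np.flatnonzero(X[t])
        not_dnd = items[~dnd[items]]
        if len(not_dnd) == 1:
            dd[not_dnd[0]] = True
    T2 = n - dnd.sum() - dd.sum()                   # everything else is tested individually
    return int(dnd.sum()), int(dd.sum()), int(T2)
```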
Further, if the test procedure can be unreliable, the procedure can easily be changed to ruling out items only after they appear in some number d > 1 of negative tests. However, 'ruling in' DDs is trickier: first, information about all the DNDs must be circulated (potentially among many different laboratories, with the privacy problems that entails), then each positive test must be carefully checked to see if all but one of the samples have been previously ruled out as DNDs. Confirming that an item is defective thus involves checking a long chain of test results and pool details, which is complicated, very susceptible to occasional testing errors, and can be difficult to prove to a clinician's or patient's satisfaction.

With these problems in mind, we study the conservative two-stage group testing variant. This adds the rule that every defective item must be definitively 'certified' by appearing as the sole item in a (necessarily positive) test in the second stage. This gives a very simple proof that an item is defective, with a 'gold standard' individual test that will not be susceptible to dilution or contamination from other samples.

So in the first stage of conservative two-stage testing, a nonadaptive scheme is used only to rule out DNDs - that is, items that appear in a negative test and are thus definite nondefectives. However, one may not 'rule in' DDs from the first stage; these must still be tested individually. Thus in the second stage, each remaining item is individually tested, requiring T2 = n − #DNDs tests.

Note that in the first stage of non-conservative two-stage testing, we want to discover both DNDs and DDs, while in the first stage of conservative two-stage testing we can concentrate simply on discovering DNDs. Group testing experts will therefore notice that non-conservative two-stage testing has a lot in common with the 'DD algorithm' of [5, 25, 7], while conservative two-stage testing is more like the 'COMP algorithm' of [14, 5, 25, 7].

Two-stage testing has previously received attention in the sparse p → 0 setting; we direct interested readers to [28] for more details, or [7, Section 5.2] for a high-level overview. In the sparse regime, recovery always requires at least order k log n tests, so the difference between testing up to k DDs or not - that is, the extra number of second-stage tests required by conservative two-stage testing compared to the 'non-conservative' version - makes up a negligible proportion of tests. Thus in the sparse regime, the distinction between conservative and non-conservative group testing is usually mathematically unimportant. It is only in the linear regime we consider here that we have to worry about the cost of definitively confirming items we think are defective, and that the distinction between conservative and non-conservative testing is mathematically important. (When in this paper we give results in the limit p → 0, these results apply equally well to conservative and non-conservative testing.)

In this paper we consider five algorithms for conservative two-stage testing. Recall that the second stage is always 'test every item not ruled out as a DND', so we need only define the first stage.

Individual testing tests nothing in the first stage and tests every item individually in the second stage.
Although very simple, this is provably the best scheme for p ≥ (3 − √5)/2 = 0.382, and is the best conservative two-stage scheme of those we consider here for p > 0.307.

Dorfman's algorithm partitions the items into sets of size s and tests each set in the first stage, then individually tests each item in the positive sets in the second stage. Dorfman's algorithm is the best scheme we consider here for 0.121 < p < 0.307.

Bernoulli first stage, where in the first stage each item is placed in each test independently with the same probability. This scheme is suboptimal, but within 0.2 'bits per test' of optimal for all p. For p > 1/(e + 1) = 0.269, the optimal number of first-stage tests is 0, and we recover individual testing.

Constant tests-per-item first stage, where in the first stage each item is placed in the same number r of tests, with those tests chosen at random. This scheme is suboptimal, but very close to optimal when p is small. For p > 0.269, the optimal number of first-stage tests is 0, and we recover individual testing.

Doubly constant first stage, where in the first stage each item is placed in the same number r of tests and each test contains the same number s of items, with tests chosen at random. This is the best scheme we consider for all p, and is extremely close to our lower bound. For p > 0.307, the optimal number of first-stage tests is 0 and we recover individual testing; while for 0.121 < p < 0.307, the optimal number of tests per item in the first stage is r = 1, and we recover Dorfman's algorithm.

A multi-stage scheme of Mutesa et al. [29] will be used as a comparison from the literature.

We also give a lower bound for the number of tests required for conservative two-stage testing (Theorem 6). Along the way, we also find a new lower bound for usual non-conservative two-stage testing (Theorem 5), which may be of independent interest.

Our main results on the average numbers of tests necessary are illustrated in Figure 1. The top subfigure shows the aspect ratio [3, 6]: the expected number of tests normalised by the number of items, ET/n (lower is better), in the large-n limit. We compare the aspect ratio to that of individual testing with T/n = 1, to the counting bound (see, for example, [9, 7]), which says that one must have ET/n ≥ H(p), where H(p) is the binary entropy, and to our new lower bound of Theorem 6. The middle subfigure shows the rate nH(p)/ET (higher is better) in the large-n limit, which corresponds to the average number of bits of information learned per test [9, 7]. Looking at the rate allows a much clearer comparison of the different schemes when p is small. The rate can be compared to individual testing, with nH(p)/T = H(p), and to the counting bound that the rate is upper-bounded by 1. Our lower bound of Theorem 6 on the number of tests now becomes an upper bound on the rate. The rate of the doubly constant design is so close to the lower bound for the number of tests that it can be difficult to see both lines on the middle subfigure. The bottom subfigure shows a zoomed-in section of the rate graph, which demonstrates just how close to optimal this design is.

While the expressions in our main theorems for the expected number of tests required are smooth for fixed values of the parameters, some parameters must be integers, which means there are 'jumps' in the optimal value of those parameters as they change from one integer to the next.
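For concreteness, the two quantities plotted in Figure 1 are easy to compute from an expected number of tests; a small illustrative helper (not from the paper, with the function names our own):

```python
import numpy as np

def binary_entropy(p):
    """H(p) = p log2(1/p) + (1 - p) log2(1/(1 - p)): the counting-bound limit on ET/n."""
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def aspect_ratio_and_rate(expected_tests, n, p):
    """Aspect ratio ET/n (lower is better) and rate nH(p)/ET (higher is better, at most 1)."""
    return expected_tests / n, n * binary_entropy(p) / expected_tests

# Individual testing at p = 0.027 uses ET = n tests, so its rate is just H(0.027) ≈ 0.179.
print(aspect_ratio_and_rate(1000, 1000, 0.027))
```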
These jumps lead to 'crooked lines' in graphs of the aspect ratio, which then correspond to 'bumpy lines' in graphs of the rate. The 'kink' in the lower bound at p = 0.171 is where the dominant lower bound of Theorem 6 switches from Bound 2 to Bound 3 of that theorem.

Alongside our theoretical results for large n, we present evidence from simulations with n = 1000 items (or just above 1000, if convenient for rounding reasons) and prevalence p = 0.027. We picked this value of p as it corresponded to an estimate by the Imperial College covid-19 Response Team for the prevalence of covid-19 in the UK at the time these simulations were first carried out [20]. Specifically, we used the following algorithms, with parameters chosen according to what is optimal for p = 0.027 in the large-n limit:

Individual testing with n = 1000 items, so T = 1000 tests.

Dorfman's algorithm with n = 1001 items and s = 7 items per test, so T1 = n/s = 143 tests in the first stage.

Bernoulli first stage with n = 1000 items, Bernoulli parameter π = 1/pn = 0.037, and T1 = 190 tests in the first stage, so σ = πn = 1/p = 37.0 items per test on average.

Constant tests-per-item first stage with n = 1000 items, r = 4 tests per item, and T1 = 160 tests in the first stage, so σ = nr/T1 = 25 items per test on average.

Doubly constant first stage with n = 1000 items, r = 4 tests per item, and s = 25 items per test, so T1 = nr/s = 160 tests in the first stage.

We simulated each algorithm 1000 times. Table 1 shows the results of these simulations, displaying the mean number of tests used, alongside the first and ninth deciles. These simulated results are compared with a 'theory' result, which takes the theoretical behaviour of ET as n → ∞ (from Section 3) and plugs in n = 1000.

We see that, compared to individual testing, the other four algorithms each give at least a three-fold reduction in the number of tests required on average, with the constant tests-per-item and doubly constant designs giving a four-fold reduction. The Bernoulli first stage was a significant improvement on Dorfman's algorithm, while the constant tests-per-item and doubly constant designs were a further large improvement. The difference between a constant tests-per-item and a doubly constant first stage was small; this is not surprising, as our theoretical results show that constant tests-per-item is very close to optimal for p this small (see Figure 1).

We see that Dorfman's algorithm performs on average very close to theoretical predictions. The Bernoulli, constant tests-per-item and doubly constant designs require about 6 more tests on average than the n → ∞ asymptotics imply; this is presumably because the expected number of defective items pn = 27 is sufficiently small that rare large defective populations drive up the average number of tests, in a way that becomes increasingly unlikely as pn → ∞. For the numbers we used here, on 10% of occasions there are at least 34 defective items - more than 25% above the expected number of defectives - which would require more tests than expected; but as n → ∞, such 25% excesses of defective items - even just 1% excesses - will become negligibly rare.

Throughout we write ∼ for asymptotic equivalence: a(n) ∼ b(n) means that a(n)/b(n) → 1 as n → ∞, or equivalently that a(n) = (1 + o(1))b(n); and c(p) ∼ d(p) means that c(p)/d(p) → 1 as p → 0.
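Before turning to the individual algorithms, here is a minimal sketch of the kind of simulation summarised in Table 1, for the doubly constant first stage with the parameters above (n = 1000, p = 0.027, r = 4, s = 25). It is an illustrative reimplementation, using the round-based construction described later for this design, and is not the code used for the paper's simulations.

```python
import numpy as np

rng = np.random.default_rng(1)

def conservative_doubly_constant(n=1000, p=0.027, r=4, s=25, reps=1000):
    """Conservative two-stage testing with a doubly constant first stage:
    r rounds, each a uniformly random partition of the n items into n//s pools of size s;
    the second stage individually tests every item never seen in a negative pool."""
    totals = []
    for _ in range(reps):
        defective = rng.random(n) < p
        ruled_out = np.zeros(n, dtype=bool)          # DNDs found in the first stage
        T1 = 0
        for _ in range(r):
            pools = rng.permutation(n).reshape(n // s, s)
            T1 += n // s
            negative = ~defective[pools].any(axis=1)
            ruled_out[pools[negative].ravel()] = True
        totals.append(T1 + n - ruled_out.sum())      # T1 + T2
    return np.mean(totals)

print(conservative_doubly_constant())   # around a four-fold saving on 1000 individual tests
```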
Individual testing has no first round (T1 = 0), then tests every item in the second round (T2 = n). This is a conservative two-stage algorithm with T = 0 + n = n. This corresponds to a rate of H(p), which tends to 0 as p → 0. It is proved in [33] that individual testing is the optimal adaptive algorithm for all p > (3 − √5)/2 = 0.382.

Dorfman's algorithm [17] was the first group testing algorithm. We split the items into n/s groups of size s and test each group in the first stage. (Here s has to be an integer, but since we are assuming n is large we don't have to worry about n/s being an integer.) If a group is positive, we test all its items individually in stage two. Work that discusses Dorfman's algorithm in the context of testing for covid-19 includes [8, 11, 13, 23, 6]. The following theorem is well known, but we include it here for completeness.

Theorem 1. Using a Dorfman first stage with tests of size s, conservative two-stage testing can be completed in

ET = n(1/s + 1 − q^s)

tests on average, where q = 1 − p, as n → ∞. As p → 0, the optimal choice of s is s ∼ 1/√p, and we have ET ∼ 2√p n. The rate tends to 0. Dorfman's algorithm outperforms individual testing for all p < 1 − 1/∛3 = 0.307. Interestingly, Dorfman's algorithm with s = 2 is never optimal.

Proof. Dorfman's algorithm uses T1 = n/s tests in stage 1. Each of the n/s groups is positive with probability 1 − q^s, and if it is positive it requires s more individual tests in stage 2. So the expected number of stage 2 tests is

ET2 = (n/s)(1 − q^s) s = n(1 − q^s).

This is a total of

ET = n/s + n(1 − q^s) = n(1/s + 1 − q^s).

Simple calculus shows this is minimised at s ∼ 1/√p as p → 0, and the final part of the theorem follows.
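The expression in Theorem 1 is easy to optimise numerically over the integer group size s; a small illustrative check (not from the paper):

```python
def dorfman_tests_per_item(p, s):
    """Asymptotic expected tests per item for Dorfman's algorithm: 1/s + 1 - (1 - p)^s."""
    return 1 / s + 1 - (1 - p) ** s

p = 0.027
best_s = min(range(2, 51), key=lambda s: dorfman_tests_per_item(p, s))
print(best_s, round(dorfman_tests_per_item(p, best_s), 4))   # s = 7, about 0.317 tests per item
```

This recovers the group size s = 7 used for Dorfman's algorithm in the simulations above, slightly larger than the continuous approximation 1/√p ≈ 6.1.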
The Bernoulli design is the most commonly used nonadaptive design and the mathematically simplest. In a Bernoulli design, each item is placed in each test independently with probability π. Here we suggest a Bernoulli design for the first stage of a two-stage algorithm. Bernoulli designs have been studied for nonadaptive group testing in the p = o(1) regime by [14, 5, 7] and others. It will be convenient to write σ = πn for the average number of items per test.

Although the Bernoulli first stage is not optimal (see Figure 1), it is close to optimal, and its mathematical simplicity allows us to explicitly find the optimal design parameter π = 1/np and the optimal number T1 of first-stage tests. For the models with slightly better performance, the design parameters must be optimised numerically.

It will be convenient, here and for the following algorithms, to work with the so-called 'fixed-k' prior, where we assume there are exactly k = pn defective items, chosen uniformly at random from the n items. Since we are assuming the number of items n is large, standard concentration inequalities imply the true number of defectives under the i.i.d. prior will in fact be very close to k = pn. We also note that none of the algorithms we consider here will actually take advantage of exact knowledge of k; it is merely a mathematical convenience to make proving theorems easier. The results we prove under this fixed-k prior do indeed hold for the i.i.d. prior also in the large-n limit; see [7, Appendix to Chapter 1] for formal details of how to transfer results between the different prior models.

Theorem 2. Using a Bernoulli(π) first stage with an average of σ = πn items per test, conservative two-stage testing can be completed in

ET ∼ T1 + pn + (1 − p)n exp(−σ e^(−pσ) T1/n)

tests on average, as n → ∞. When the prevalence p is known, the optimum value of π is 1/pn, and we can succeed with

ET ∼ pn + epn ln((1 − p)/p) ∼ epn ln(1/p)

tests on average.

Proof. We need to work out how many nondefective items are discovered by the Bernoulli design. A given nondefective item is discovered by a test if that item is in the test but the test is negative. This happens with probability

π(1 − π)^k ∼ (σ/n) e^(−pσ).

When p is known, simple calculus shows that this is maximised at σ = 1/p, where it takes the value e^(−1)/pn. Thus the probability a nondefective item is not discovered is

(1 − π(1 − π)^k)^T1 ∼ exp(−σ e^(−pσ) T1/n).

Therefore, the total number of tests used by this algorithm on average is

ET ∼ T1 + pn + (1 − p)n exp(−σ e^(−pσ) T1/n).

At the optimal σ = 1/p, this is

ET ∼ T1 + pn + (1 − p)n exp(−T1/(epn)).

We differentiate to find the optimum value T1 = epn ln((1 − p)/(ep)), which gives ET ∼ pn + epn ln((1 − p)/p). Compared to the binary entropy H(p) ∼ p log2(1/p), we see this gives a rate of 1/(e ln 2).

In a constant tests-per-item nonadaptive design, we have a constant number r of tests per item. For convenience, we arrange these in r rounds of T1/r tests, one test per item in each round, with those tests chosen independently and uniformly at random. Rounds can be conducted in parallel, so this is not adding extra stages to our two-stage algorithm. The test for an item in a given round is chosen uniformly at random from the T1/r tests, independently from other items. It will be convenient to write σ = nr/T1 for the average number of items per test. Constant tests-per-item designs are optimal nonadaptive designs in the sparse p → 0 regime [16, 25], so are a good candidate for the nonadaptive stage of a two-stage scheme. It is therefore not surprising that their performance is very close to optimal when p is small (see Figure 1).

Theorem 3. Using a first stage with a constant number r of tests per item and an average number of σ items per test, conservative two-stage testing can be completed in

ET ∼ n(r/σ + p + (1 − p)(1 − e^(−pσ))^r)

tests on average, as n → ∞. As p → 0, we can succeed with

ET ∼ (1/ln 2) pn log2(1/p)

tests on average, and can achieve a rate tending to ln 2 = 0.693. When the prevalence p is known, r and σ can be numerically optimised easily.

Proof. As before, we prove our result under the fixed-k prior. A nondefective item appearing in a given test sees a positive result with probability

1 − (1 − r/T1)^k → 1 − e^(−pσ)

as n → ∞, as we get a positive result unless in that round all k defective items avoid the test that the given item is in. Thus all r tests are positive with probability (1 − e^(−pσ))^r, since splitting the tests into rounds and using the fixed-k prior ensures these events are independent. Therefore the expected number of tests required is

ET ∼ T1 + pn + (1 − p)n(1 − e^(−pσ))^r = n(r/σ + p + (1 − p)(1 − e^(−pσ))^r).

As p → 0, let us choose σ = (ln 2)/p and r = log2(1/p). (Since r must be an integer, we should actually take r to be the nearest integer to log2(1/p). However, since r → ∞ as p → 0, the rounding error is negligible in this limit.) This gives us

ET ∼ n(p log2(1/p)/ln 2 + p + (1 − p)p) ∼ (1/ln 2) pn log2(1/p),

which corresponds to a rate of ln 2.

We now consider a first stage with both constant tests-per-item and constant items-per-test. We take r tests per item and s items per test. Note that double-counting tells us we must have T1 s = nr. We again use a structure where the first stage has multiple parallel rounds. There are r rounds, and each round consists of T1/r = n/s tests, each containing exactly s items, with each item placed in exactly one test, and the design chosen uniformly at random subject to this structure. (A round can be thought of as making n/s rows of s boxes each, with each row representing a test in that round, then placing the n items into those n/s × s = n boxes at random, one item per box.) Note that r and s must be integers. Taking r = 1 and s > 1 recovers Dorfman's algorithm. Taking r = 2 gives the 'double pooling' algorithm of Broder and Kumar [13]. Taking r > 2 gives Broder and Kumar's more general 'r-pooling' algorithm [13].
Work to discuss doubly constant designs in the context of testing for covid-19 includes [11, 13, 31].

Theorem 4. Using a doubly constant first stage with r tests per item and s items per test, conservative two-stage testing can be completed in

ET ∼ n(r/s + p + (1 − p)(1 − q^(s−1))^r)

tests on average, where q = 1 − p, as n → ∞.

When the prevalence p is known, r and s can be numerically optimised. Note that putting r = 1 does indeed give ET ∼ n(1/s + 1 − q^s), as for Dorfman's algorithm. We note that the expression here is the same as that heuristically demonstrated by Broder and Kumar [13], who use the i.i.d. prior as if there were independence between rounds. (They say that they will discuss the accuracy of this approximation in 'the final version' of [13].) By using the fixed-k prior here, we actually do have independence between rounds, so can formally prove the result. In the large-n limit, this then transfers to the i.i.d. prior, as discussed earlier and in [7, Appendix to Chapter 1].

Proof. Fix r and s. Consider a test containing a given nondefective item. There are (n−1 choose s−1) ways for the remaining s − 1 places in the test to be filled, and (n−1−k choose s−1) ways for it to be filled up with other nondefective items. Therefore, the probability that the test is negative is

(n−1−k choose s−1) / (n−1 choose s−1) → q^(s−1)

as n → ∞. Since with the fixed-k prior we have independence between rounds, we have that the probability all the tests containing the nondefective item are positive - and thus that the nondefective item requires retesting in the second stage - is (1 − q^(s−1))^r. All k = pn defective items must be retested in the second stage, of course. Overall, the expected number of tests required is

ET ∼ nr/s + pn + (1 − p)n(1 − q^(s−1))^r,

as in the statement of the theorem. The analysis as p → 0 is the same as for the constant tests-per-item design: we again take s = (ln 2)/p and r = log2(1/p) (or the nearest integers thereto), and the asymptotic behaviour is identical in the p → 0 limit.

We will briefly compare the results of our schemes with a scheme from a Nature paper by Mutesa et al. [29]. This scheme is a multi-stage scheme that 'usually' requires two stages, and uses similar ideas to ours in this paper. However, Mutesa et al. report that their scheme requires ep ln(1/p) tests per item as p → 0, which corresponds to a rate of 1/(e ln 2) = 0.531; this is inferior to the rate of the constant tests-per-item and doubly constant designs we looked at here.

The scheme of [29] works as follows:

1. The first stage uses a Dorfman-like design, where each item is tested once in a test of s1 items. This parameter s1 is chosen to be of the form s1 = a^r2 for positive integers a and r2. If a test is negative, all s1 items can be definitively ruled as nondefective; if a test is positive, the s1 items go through to stage 2.

2. The second stage deals with the s1 items from a positive first-stage test using a doubly constant design, with r2 tests per item and s2 = a^(r2−1) items per test.

If a set of s1 items contains no defective items, that is discovered in the first stage. If the set contains exactly one defective item, that is discovered in the second stage, albeit non-conservatively, without the benefit of a 'gold standard' individual test, as our schemes in this paper always have. If the set contains two or more defective items, it will typically not be possible to identify them (although one can get lucky), and further stages of testing will be needed - whereas the schemes in this paper are always guaranteed to complete in two stages.

The second stage of the scheme of Mutesa et al. does not, as we do here, use a random design subject to the tests-per-item and items-per-test constraints. Rather, the parameters are chosen to allow a 'hypercube' design. The s1 = a^r2 items can be pictured as making up an r2-dimensional a × a × · · · × a hypercube. Then each of the T2 = a r2 tests represents an (r2 − 1)-dimensional slice of the hypercube. We don't believe this changes the mathematics in the n → ∞ limit compared to a fully randomised design like those we consider here, but it may have practical benefits in terms of the simplicity of running the tests in the laboratory, and may have better performance at finite n, although it restricts the choices of the parameters s1, r2, s2, which can no longer be arbitrary integers.
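Returning to the doubly constant design: the 'numerically optimised' choice of r and s mentioned after Theorem 4 can be found by a direct search over small integers using the expression there. A minimal illustrative sketch (not from the paper; function names and search ranges are our own choices):

```python
import itertools

def doubly_constant_tests_per_item(p, r, s):
    """Asymptotic expected tests per item from Theorem 4: r/s + p + (1-p)(1 - q^(s-1))^r."""
    q = 1 - p
    return r / s + p + (1 - p) * (1 - q ** (s - 1)) ** r

def optimise(p, r_max=10, s_max=200):
    """Brute-force search; (1.0, 0, 0) stands for no first stage, i.e. individual testing."""
    best = (1.0, 0, 0)
    for r, s in itertools.product(range(1, r_max + 1), range(2, s_max + 1)):
        value = doubly_constant_tests_per_item(p, r, s)
        if value < best[0]:
            best = (value, r, s)
    return best

print(optimise(0.027))   # roughly (0.24, 4, 25): the parameters used in the simulations above
print(optimise(0.2))     # returns r = 1, recovering Dorfman's algorithm for 0.121 < p < 0.307
```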
In order to see how good our conservative two-stage algorithms are, we will compare the number of tests they require to a theoretical lower bound (Theorem 6). It will be convenient to start with a lower bound for usual non-conservative two-stage testing (Theorem 5), which may be of independent interest. We will then show how to adapt the argument to conservative two-stage testing.

Let us start by thinking about a lower bound on the number of tests necessary for usual two-stage testing.

Theorem 5. The expected number of tests required for two-stage testing is at least

ET ≥ n (1/f(p)) (ln f(p) + 1), where f(p) = max_{w=2,3,...} (−w ln(1 − q^(w−1))) and q = 1 − p.

Proof. Our goal is to bound the expected number T2 of items that are not classified as DND or DD. A nondefective item fails to be classified DND if and only if it appears only in positive tests - that is, if for each test it is in, one of the other items is defective. A defective item fails to be classified DD if - but not only if - for each test it is in, one of the other items is defective. (It is not 'only if', because being classified DD requires one of the item's tests to contain, apart from the item itself, solely definite nondefectives, not merely nondefectives; but we are only seeking a bound.) Let us call an item hidden if every test it is in contains at least one other defective item. Then

ET2 ≥ Σ_i P(H_i),

where H_i is the event that item i is hidden.

We seek a bound at least as good as individual testing, T = n. Then without loss of generality we may assume there are no tests of weight w_t = 1 in the first stage, where the weight w_t of a test t is the number of items it contains. If there is one, remove it and the item it tests; this leaves p the same, does not increase the error probability, and reduces the number of available tests per item.

It will be convenient to write x_ti = 1 if item i is in test t, and x_ti = 0 if it is not. With this notation, the probability that item i is hidden is bounded by

P(H_i) ≥ ∏_{t : x_ti = 1} (1 − q^(w_t − 1)),     (1)

where q = 1 − p, due to a result of [2]. Note that 1 − q^(w_t−1) is the probability of the event that i gets hidden in test t, and the bound (1) follows by applying the FKG inequality to these increasing events; see [2] for details. It will be useful later to write L(i) for the logarithm of the bound (1), so P(H_i) ≥ e^(L(i)), where

L(i) = Σ_{t : x_ti = 1} ln(1 − q^(w_t − 1)).

The expected number of hidden items is

Σ_i P(H_i) ≥ Σ_i e^(L(i)).

We now use the arithmetic mean-geometric mean inequality in the form

(1/n) Σ_i e^(L(i)) ≥ exp((1/n) Σ_i L(i))

to get the bound

Σ_i P(H_i) ≥ n exp((1/n) Σ_i L(i)).     (2)

We now need to bound the term inside the exponential. By manipulations similar to those in [2] we have

Σ_i L(i) = Σ_t w_t ln(1 − q^(w_t − 1)) ≥ −T1 f(p),

with f as in the statement of the theorem. (We introduce the minus sign so that f(p) is positive.) Putting this back into (2), we get

ET2 ≥ n exp(−f(p) T1/n).

Thus the total expected number of tests required is at least

ET ≥ T1 + n exp(−f(p) T1/n).

To find the optimal value of T1, we differentiate this, to get

1 − f(p) exp(−f(p) T1/n) = 0,

with optimum T1 = (n/f(p)) ln f(p), and we are done.

We can now use the machinery of the previous result to prove a lower bound for conservative two-stage testing.

Theorem 6. For conservative two-stage group testing we have the following bounds:

1. ET ≥ n for p ≥ (3 − √5)/2 = 0.382;

2. ET ≥ n (1/g(p)) (ln g(p) + 1), where g(p) = max_{w=2,3,...} (−w ln(1 − q^w));

3. ET ≥ pn + n (1/f(p)) (ln((1 − p) f(p)) + 1).

In the limit as p → 0, this gives an upper bound on the rate of ln 2 = 0.693. Here, f is the same as in Theorem 5. It may be useful to note that Bound 2 dominates for p < 0.171, and Bound 3 dominates for 0.171 < p < 0.382.
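The bounds of Theorems 5 and 6 are straightforward to evaluate numerically. Here is a minimal illustrative sketch (not from the paper), using the expressions for f and g exactly as written above:

```python
import numpy as np

def max_wlog(p, shift, w_max=10_000):
    """max over integer weights w >= 2 of -w ln(1 - q^(w - shift)), with q = 1 - p.
    shift = 1 gives f(p) (Theorem 5, Bound 3); shift = 0 gives g(p) (Bound 2)."""
    q = 1 - p
    w = np.arange(2, w_max)
    return float(np.max(-w * np.log(1 - q ** (w - shift))))

def bound_2(p):                      # ET/n >= (ln g + 1)/g
    g = max_wlog(p, shift=0)
    return (np.log(g) + 1) / g

def bound_3(p):                      # ET/n >= p + (ln((1-p) f) + 1)/f
    f = max_wlog(p, shift=1)
    return p + (np.log((1 - p) * f) + 1) / f

for p in (0.027, 0.171):
    print(p, round(bound_2(p), 4), round(bound_3(p), 4))
# The two bounds are approximately equal near p = 0.171, the 'kink' mentioned earlier.
```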
Comparing these bounds with the results of our algorithms (see Figure 1), we see that testing with a doubly constant first stage is very close to optimal for all p, and extremely close to optimal for p ≤ 1/4. Comparing this result with the theorems from the previous section, we see that the constant tests-per-item and doubly constant designs achieve the optimal rate in the limit p → 0.

Proof. Bound 1 is a universal bound of Ungar [33] that applies to any group testing algorithm. It remains to prove Bounds 2 and 3.

The proof of the bound for conservative two-stage testing proceeds in a similar way to that of Theorem 5. There are two different ways we can count the number of items that require testing in the second stage. For Bound 2, we count every item that appears solely in positive tests - such an item is either defective or a hidden nondefective. For Bound 3, we count all the defective items, of which there are pn on average, plus all nondefective items that appear solely in positive tests.

For Bound 2, the probability a test of weight w is positive is 1 − q^w, where q = 1 − p. We use the same argument as before - this time in less detail. (For conservative two-stage testing we don't have to be so careful about ruling out individual tests in the first round: they can simply be moved into the second round.) The probability an item i appears in only positive tests is

P(E_i) ≥ ∏_{t : x_ti = 1} (1 − q^(w_t)),

by the FKG inequality. Going through exactly the same argument, the expected number of items appearing in only positive tests is at least

n exp(−g(p) T1/n), where g(p) = max_{w=2,3,...} (−w ln(1 − q^w)),

giving an average number of tests

ET ≥ T1 + n exp(−g(p) T1/n).

Optimising T1 the same way gives the final bound

ET ≥ n (1/g(p)) (ln g(p) + 1).

For Bound 3, we must test the average of pn defective items, plus the average of (1 − p)n P(H_i) hidden nondefectives; here (1 − p)n is the average number of nondefectives, and P(H_i) is the probability a given nondefective is hidden. We can use from before the bound P(H_i) ≥ exp(−f(p) T1/n), giving

ET ≥ T1 + pn + (1 − p)n exp(−f(p) T1/n).

Optimising in the same way gives

ET ≥ pn + n (1/f(p)) (ln((1 − p) f(p)) + 1).

For the result as p → 0, we concentrate on Bound 2. In the maximum expression for g(p) we take w = (ln 2)/p, to get

g(p) ∼ ((ln 2)/p) ln 2 = (ln 2)²/p as p → 0,

so Bound 2 gives ET ≥ (1 + o(1)) np ln(1/p)/(ln 2)². Comparing this with H(p) ∼ p log2(1/p), we see we have an upper bound of ln 2 on the rate.

References

Assessment of specimen pooling to conserve SARS-CoV-2 testing resources
Individual testing is optimal for nonadaptive group testing in the linear regime
Rates of adaptive group testing in the linear regime
Pooled testing to isolate infected individuals
Group testing algorithms: bounds and simulations
Pooled testing and its applications in the COVID-19 pandemic
Group testing: an information theory perspective. Foundations and Trends in Communications and Information Theory
Optimization of group size in pool testing strategy for SARS-CoV-2: A simple mathematical model
The capacity of adaptive group testing
Optimal non-adaptive probabilistic group testing in general sparsity regimes. Information and Inference: A Journal of the IMA
Pooled RNA extraction and PCR assay for efficient SARS-CoV-2 detection. medRxiv
Optimal two-stage algorithms for group testing problems
A note on double pooling tests
Non-adaptive probabilistic group testing with noisy measurements: near-optimal bounds with efficient algorithms
Noise-resilient group testing: limitations and constructions
Information-theoretic and algorithmic thresholds for group testing
The detection of defective members of large populations
Multi-stage group testing improves efficiency of large-scale COVID-19 screening
Estimating the number of infections and the impact of non-pharmaceutical interventions on COVID-19 in 11 European countries
Early detection of superspreaders by mass group pool testing can mitigate COVID-19 pandemic. medRxiv
Group testing against COVID-19
Boosting test-efficiency by pooled testing strategies for SARS-CoV-2. arXiv
A method for detecting all defective members in a population by group testing
Performance of group testing algorithms with near-constant tests per item
Trivial two-stage group testing for complexes using almost disjunct matrices
Analysis and applications of adaptive group testing methods for COVID-19. medRxiv
Group testing with random pools: optimal two-stage algorithms
A pooled testing strategy for identifying SARS-CoV-2 at low prevalence
Efficient high throughput SARS-CoV-2 testing to detect asymptomatic carriers. medRxiv
Evaluation of group testing for SARS-CoV-2 RNA. medRxiv
Performance bounds for group testing with doubly-regular designs
The cutoff point for group testing
Evaluation of COVID-19 RT-qPCR test in multi-sample pools. medRxiv
Asymptotic analysis of optimal nested group-testing procedures

The author was supported in part by UKRI Research Grant EP/W000032/1. The author thanks David Ellis, Oliver Johnson, and Jonathan Scarlett for helpful discussions, and Mahdi Cheraghchi, Anthony Macula, and Ugo Vaccaro for useful pointers to relevant literature.