Noisy Adaptive Group Testing using Bayesian Sequential Experimental Design
Marco Cuturi, Olivier Teboul, Jean-Philippe Vert
2020-04-26

Abstract. When test resources are scarce and infection prevalence is low, testing groups of individuals can be more efficient than testing individuals. This can be done by pooling individual samples in groups, and only testing those groups for the presence of a pathogen. The rationale is that if prevalence is low, many of these groups will ideally test negative, clearing all individuals in those groups, whereas individuals appearing in (ideally few) positive groups will require further screening. Forming those groups in order to minimize testing costs while maintaining good detection is the goal of group testing algorithms. We propose a new framework to form such groups that takes into account various constraints of the testing environment, and which can easily incorporate individualized infection priors. Our solution solves a Bayesian sequential experimental design problem: given previous group test results, we sample the posterior distribution of infection status vectors using sequential Monte Carlo samplers; these samples are then fed to an optimizer, which seeks to form groups that maximize the expected information gain of their future test results. To output marginal probabilities of infection, we use loopy belief propagation as a decoder. We show a significant empirical improvement over individualized tests in simulations: our G-MIMAX test procedure has an average specificity/sensitivity that significantly exceeds that of other baselines, including individual tests, as long as the disease prevalence is $\leq 5\%$.

Singling out infected individuals in a population that has little immunity to a pathogen is of paramount importance to control the propagation of an epidemic. As a response to the ongoing COVID-19 epidemic, a recent study (Allen et al., 2020) recommends scaling up test capacity to more than 30 million tests a week, at an estimated cost of 100 billion US dollars, for the US alone. When tests are expensive and the base infection rate is low, an approach first pioneered by Dorfman (1943) consists in pooling individuals into groups (e.g. by pooling nasal swabs) and testing only those pooled samples first (e.g. to detect traces of virus RNA in each pool). In a second stage, only those individuals that belonged to positive groups are tested individually. By properly tuning the size of groups, Dorfman showed that this two-stage group testing procedure can significantly reduce the number of tests needed to identify infected individuals. For these reasons, the Dorfman procedure is reportedly in use to test for SARS-CoV-2 infection (Yelin et al., 2020; Seifried and Ciesek, 2020).
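As a back-of-the-envelope illustration of why pooling helps at low prevalence, the short sketch below computes, for the idealized noiseless two-stage Dorfman procedure, the expected number of tests per individual and the pool size that minimizes it. The prevalence values are illustrative and the helper names are ours, not taken from the paper.

```python
def dorfman_tests_per_individual(q: float, g: int) -> float:
    """Expected tests per individual under noiseless two-stage Dorfman pooling.

    Stage 1 spends 1/g tests per person; a pool of size g is positive with
    probability 1 - (1 - q)^g, in which case its g members are retested
    individually in stage 2.
    """
    return 1.0 / g + (1.0 - (1.0 - q) ** g)


for q in (0.02, 0.05, 0.10):  # illustrative base infection rates
    best_g = min(range(2, 41), key=lambda g: dorfman_tests_per_individual(q, g))
    print(f"q={q:.0%}: best pool size {best_g}, "
          f"{dorfman_tests_per_individual(q, best_g):.2f} tests per individual")
```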
Since Dorfman's seminal work on medical testing, the field of group testing at large has grown significantly: numerous strategies have been proposed and many applications considered, notably quality control (Sobel and Groll, 1959), communications (Berger et al., 1984; Wolf, 1985), molecular biology (Balding et al., 1996; Ngo and Du, 2000), pattern matching (Macula and Popyack, 2004; Clifford et al., 2010), database systems (Cormode and Muthukrishnan, 2005), traitor tracing (Meerwald and Furon, 2011; Laarhoven, 2015), and machine learning (Zhou et al., 2014); see (Aldridge et al., 2019) for a recent and exhaustive review.

Adaptive group testing. Group testing strategies differ in how they design the groups to be tested, and in what assumptions they make on the quality of the tests. They can be non-adaptive, when every group to be tested is designed before or without observing any test results, or adaptive, when the tests are performed in several stages and the groups to be tested at the next stage can be chosen based on the results of the tests performed up to the current stage (Scarlett, 2019). For example, Dorfman's test is a two-stage adaptive strategy. There exists a large body of work on adaptive and non-adaptive group testing in the noiseless setting, i.e., when we assume that a test is positive if and only if at least one individual in the tested group is infected. In that setting, adaptive strategies tend to have better theoretical guarantees and result in more practical algorithms than non-adaptive strategies (Scarlett and Cevher, 2016; Aldridge, 2017; Scarlett, 2019). For example, Hwang (1972) proposed a multi-stage adaptive binary splitting algorithm, which achieves the information-theoretic asymptotic lower bound on the number of tests needed to identify all infected individuals when the population size increases and the proportion of infected individuals vanishes (Baldassini et al., 2013). Additionally, it is known that non-adaptive designs can be suboptimal compared to adaptive strategies in some regimes (Agarwal et al., 2018). Optimal two-stage adaptive algorithms are also well understood in the noiseless setting (Mézard and Toninelli, 2011; De Bonis et al., 2005).

Noisy group testing. With COVID-19 as a backdrop, where RT-PCR tests are known to be noisy (Wikramaratna et al., 2020; Yelin et al., 2020), the noisy regime of group testing is far more relevant. As noise increases, the results output by testing devices gradually depart from the theoretical combinatorial setting, rendering it far less relevant in practice. In the noisy regime, information-theoretic limits of group testing are well understood (Malyutov, 1978; Malyutov and Mateev, 1980; Atia and Saligrama, 2012; Baldassini et al., 2013; Scarlett and Cevher, 2016; Aldridge, 2017), but most existing group testing strategies are non-adaptive (Malyutov and Mateev, 1980; Chan et al., 2011, 2014; Scarlett and Cevher, 2018), with the exception of Cai et al. (2013) and Scarlett (2019). The latter two algorithms have various optimality properties in an asymptotic regime, when the population size increases and the fraction of infected individuals vanishes; however, little is known about the quality of these methods in a non-asymptotic regime, with a small but non-vanishing proportion of infections in the population.
Our contributions. In this work, we depart from the standard asymptotic analysis and propose to phrase the problem of adaptive group design in the noisy regime as a sequential experimental design problem. Deciding which groups to test next requires formulating a hypothesis on the most likely infection states of individuals, given the groups tested so far and prior infection beliefs. Our first proposal is to use loopy belief propagation (Pearl, 1982) to recover an approximation of the marginal distribution of infection states from past group test results. In that sense we follow the work of Sejdinovic and Johnson (2010), with a slightly different setup, as described in Appendix A. We propose a baseline adaptive method by combining that LBP approximation with the so-called informative Dorfman (ID) method of McMahan et al. (2012). Our second and most important contribution is to go beyond marginal approximations, and to use more computationally demanding sequential Monte Carlo (SMC) samplers (Del Moral et al., 2006) to sample from the binary space {0, 1}^n of infection status vectors. We start from the overall approach proposed by Schäfer and Chopin (2013) in the context of sparse Bayesian regression, and consider alternative ways to define Metropolis-Hastings kernels. The main rationale for using posterior approximations is that they can be used to inform a Bayesian experimental design strategy (Chaloner and Verdinelli, 1995), where groups at each stage are selected to maximize the expected utility of their test results, and where after each stage we update our belief about the joint status of all individuals from the tests' results using Bayes' theorem. We consider different utility functions to guide the choice of groups, including minimizing the entropy of the posterior (i.e., maximizing mutual information, or information gain), or maximizing the expected area under the ROC curve (AUC) when we rank individuals by their posterior marginal probability of being infected. Because of its favorable computational properties, we focus in our experiments on maximizing the mutual information of new tests, and we use a greedy forward/backward algorithm to do so. We call that algorithm G-MIMAX (Greedy Mutual Information MAXimization). We show, using a large number of simulations on a set of 70 patients with 2%, 5% and 10% prevalence rates, that our approach converges much faster to an accurate identification of infected individuals than several baselines, including random group mechanisms, which are known to be optimal in an asymptotic regime, as well as individual tests.

Prior on infection. We consider a population of n individuals, who can be either infected or not. We model the infection status of the i-th individual by a binary random variable X_i, where X_i = 1 if that individual is infected and X_i = 0 otherwise, and let X = (X_1, . . . , X_n) ∈ {0, 1}^n be the infection status of the whole population. We assume that a prior probability distribution P^{(0)} for X is given, reflecting our prior knowledge about the infection status of the population. For example, we may model the infection status of individuals as independent random variables, each following a Bernoulli distribution X_i ∼ B(q_i), where q_i may be the same across all individuals and set to a base infection rate, or on the contrary specific to each individual, extrapolated for instance from a questionnaire asking for risk factors ("were you in contact with a sick person in the past week?", "did you experience symptoms?", etc.).
In other words, under this model, the prior probability satisfies, for any x = (x_1, . . . , x_n) ∈ {0, 1}^n,

P^{(0)}(X = x) = ∏_{i=1}^{n} q_i^{x_i} (1 − q_i)^{1−x_i}.

Of course, more informed and non-independent priors (relating for instance two people in the same household) may be considered; as discussed in Section 4.3, we represent the prior P with a set of particles in {0, 1}^n, giving us the flexibility to consider any sort of initial prior.

Group vs. individual testing in the presence of testing noise. Our goal is to infer which individuals are infected and which ones are not. For that purpose, we can run tests, which give us information about the status of one or several individuals. More precisely, we focus on group tests, which for any given group g ⊂ {1, . . . , n} give us information about the binary status of the group: either none of the individuals in the group is infected, in which case the group status is negative, or at least one individual is infected, in which case the group status is positive. A straightforward approach to test the whole population would be to test each individual one by one; however, this raises two issues: (i) it requires a total of n tests, which may be costly if n is large, and may not be necessary if only a small percentage of the population is infected; (ii) tests can be noisy (e.g., nose and throat swab tests analyzed by qRT-PCR can produce false positives and false negatives), so by testing each individual only once there is a risk of error that may not be acceptable. Our goal in this paper is to propose solutions to both problems, by designing group tests that allow us to control the errors made on each individual, while minimizing the number of tests needed.

Probabilistic inference using group tests. Given an integer n, we write [n] for the set of integers {1, . . . , n}. We denote by G the set of all non-empty groups, i.e., non-empty subsets of [n]. With a slight overload of notations that should be clear from context, we equivalently represent a group g ∈ G as a binary vector g ∈ {0, 1}^n, where the i-th element of g is 1 if and only if i ∈ g. We use the notation |g| = g^T 1_n to denote the size of group g (therefore, if we use indexed groups g_i, |g_i| stands for the size of g_i). Given two binary vectors g, x in {0, 1}^n, we write

[g, x] := min(g^T x, 1) ∈ {0, 1}

for the binary status of group g given the binary status of individuals x. In other words, [g, x] is equal to 0 if and only if all entries in x indexed by g are equal to 0. For any distribution P over {0, 1}^n, we will further need the probability P([g, X] = 1) = E_{X∼P}[g, X] that group g is positive when X follows P.

Given a group g ∈ G, a group test associated to g is a binary random variable Y_g meant to assess whether at least one individual in g is infected, i.e., to assess the group status [g, X]. We assume that, conditioned on X, all group tests considered are independent from each other, and that each group test follows a Bernoulli law that depends on the status of the group tested, and on the specificity and sensitivity of the test. More precisely, a test of a group g with sensitivity s_g and specificity σ_g is such that

P(Y_g = 1 | [g, X] = 1) = s_g,    P(Y_g = 0 | [g, X] = 0) = σ_g.    (4)

Put differently, we can write, for any x ∈ {0, 1}^n and y ∈ {0, 1}:

P(Y_g = y | X = x) = (1 − σ_g + ρ_g [g, x])^y (σ_g − ρ_g [g, x])^{1−y},    (5)

where ρ_g := s_g + σ_g − 1. Alternatively, using the logit function ℓ(u) = log(u/(1−u)), we can rewrite this probability in terms of σ̄_g = ℓ(σ_g), s̄_g = ℓ(s_g) and γ_g = log(1 − s_g) − log(σ_g) as follows:

log P(Y_g = y | X = x) = log σ_g − y σ̄_g + [g, x] ( γ_g + y (s̄_g + σ̄_g) ).    (6)
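To make the single-test noise model (4)-(5) concrete, here is a minimal simulation sketch. The representation of a pool as a binary vector follows the text, but the function names and the example values are ours.

```python
import numpy as np

def group_status(g: np.ndarray, x: np.ndarray) -> int:
    """[g, x]: 1 iff the pool g contains at least one infected individual."""
    return int(np.any(g & x))

def sample_test(g: np.ndarray, x: np.ndarray, sensitivity: float,
                specificity: float, rng: np.random.Generator) -> int:
    """Draw one noisy test result Y_g given the infection status vector x."""
    p_positive = sensitivity if group_status(g, x) else 1.0 - specificity
    return int(rng.random() < p_positive)

rng = np.random.default_rng(0)
x = np.array([0, 1, 0, 0, 0, 0, 0, 0], dtype=np.int8)  # second individual infected
g = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=np.int8)  # pool of the first four
print(sample_test(g, x, sensitivity=0.85, specificity=0.97, rng=rng))
```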
We assume that tests can be run simultaneously in parallel, and will therefore not only consider the outcome of one group Y_g, but rather the outcome of k groups G = (g_1, . . . , g_k) ∈ G^k. We denote by G* = ∪_{i≥1} G^i the set of all possible non-empty batches of groups. We equivalently represent a batch of k groups as a matrix in {0, 1}^{n×k}, and we denote, for any x ∈ {0, 1}^n, the vector of group status for all groups in the batch as

[G, x] := ([g_1, x], . . . , [g_k, x]) ∈ {0, 1}^k,

and naturally introduce Y_G = (Y_{g_1}, . . . , Y_{g_k}) for the random vector of test results in the batch, taking values in {0, 1}^k. Note that a group g may appear several times in G, if for example we want to test the same group several times. In that case, with a slight abuse of notations, we use the same notation Y_g for potentially several independent random variables corresponding to the results of testing g several times. Since we assume that the test results of different groups are independent from each other given X, the entries of Y_G are independent Bernoulli variables with parameter s_{g_j} or 1 − σ_{g_j}, depending on whether the corresponding entries of [G, x] are 1 or 0. In other words, we can write compactly the law of Y_G given X as follows, using the individual test likelihoods (5):

P(Y_G = y | X = x) = ∏ [ (1_k − σ + ρ ⊙ [G, x])^{⊗ y} ⊙ (σ − ρ ⊙ [G, x])^{⊗ (1_k − y)} ],    (7)

where σ, s ∈ [0, 1]^k are the vectors of specificities and sensitivities of the k tests, ρ := s + σ − 1_k, the outer ∏ denotes the product of the entries of a vector, and ⊗ and ⊙ stand for element-wise exponentiation and multiplication, respectively. Alternatively, when k is very large and precision becomes an issue, one may consider instead the log-likelihoods (6) to obtain

log P(Y_G = y | X = x) = 1_k^T log(σ) − y^T σ̄ + [G, x]^T ( γ + y ⊙ (s̄ + σ̄) ),    (8)

where log(σ), σ̄, s̄ and γ are k-dimensional vectors with respective entries log(σ_{g_i}), σ̄_{g_i}, s̄_{g_i} and γ_{g_i}, for i = 1, . . . , k. In our experiments, the number k of new tests at every round is of the order of 8, and therefore evaluating (7) directly poses no underflow/overflow challenge as long as the specificity and sensitivity are not too close to 1.

In T-stage adaptive group design, for a given T ∈ N, we iteratively select batches of groups G^{(1)}, . . . , G^{(T)} ∈ G*. After each stage i, we observe the results of the tests Y_{G^{(i)}}, and the choice of the next batch G^{(i+1)} can depend on the results of all previous tests (Y_{G^{(1)}}, . . . , Y_{G^{(i)}}). After observing the test results at the i-th stage, we denote by P^{(i)} the posterior distribution given all previous observations, i.e., for any event A,

P^{(i)}(A) := P(A | Y_{G^{(1)}}, . . . , Y_{G^{(i)}}).    (9)

In particular, P^{(0)} is the prior distribution P that reflects our prior assumption about the population and the quality of the tests. We phrase the problem of adaptive group design as a myopic sequential Bayesian experimental design problem (Chaloner and Verdinelli, 1995), where at each stage we greedily select the batch of groups with largest utility given our current belief:

G^{(i+1)} ∈ arg max_{G ∈ 𝒢^{(i)}} U(G, P^{(i)}),    (10)

where 𝒢^{(i)} ⊂ G* is the set of batches allowed at stage i (which, e.g., may be constrained in terms of number of groups, group size, or whether or not each individual can be tested). The utility U(G, P^{(i)}) of a batch of groups G given our current belief P^{(i)} is itself defined as the expected value, under the current P^{(i)}, of a function φ of the posterior probability P^{(i+1)} that would result from observing the tests of the pools in G:

U(G, P^{(i)}) := E_{Y_G ∼ P^{(i)}} [ φ( P^{(i)}( · | Y_G ) ) ].    (11)

The choice of φ is dictated by our final goal and cost function, e.g., identifying as many infected people as possible with a given specificity; ranking individuals from the least likely to the most likely to be infected; or, more generally, reducing as much as possible the uncertainty about the status of the population in the posterior distribution, for further Bayesian analysis. We now detail and derive algorithms for two particular classes of utility functions: (i) the negative entropy of the posterior, and (ii) the performance of a predictor of individual infections based on the marginal probabilities of being infected.
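Before turning to specific utility functions, here is a minimal sketch of the posterior update (9) for a toy population, maintained exactly by enumerating all 2^n states and reweighting them with the batch likelihood (7). This brute-force representation is only practical for small n (the particle-based alternative is discussed in Section 4.3); a common sensitivity/specificity is assumed for all pools and all helper names are ours.

```python
import itertools
import numpy as np

def all_states(n: int) -> np.ndarray:
    """All 2^n infection status vectors, one per row."""
    return np.array(list(itertools.product([0, 1], repeat=n)), dtype=np.int8)

def batch_likelihood(states, G, y, sensitivity, specificity):
    """P(Y_G = y | X = x) for every state x (rows of `states`); G is n x k."""
    status = (states @ G > 0).astype(float)            # [G, x] for every state
    p_pos = (1 - specificity) + (sensitivity + specificity - 1) * status
    per_test = np.where(y[None, :] == 1, p_pos, 1 - p_pos)
    return per_test.prod(axis=1)

def posterior_update(prior_weights, states, G, y, sensitivity, specificity):
    """One Bayes update of the state distribution after observing y = Y_G."""
    w = prior_weights * batch_likelihood(states, G, y, sensitivity, specificity)
    return w / w.sum()

# toy example: n = 6 individuals, independent Bernoulli(0.05) prior
n, q = 6, 0.05
states = all_states(n)
prior = (q ** states.sum(1)) * ((1 - q) ** (n - states.sum(1)))
G = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]])   # two pools of 3
y = np.array([1, 0])                                             # first pool positive
post = posterior_update(prior, states, G, y, sensitivity=0.85, specificity=0.97)
print(post @ states)   # posterior marginal probability of infection per individual
```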
A standard measure of uncertainty for a random variable is its entropy. To reduce the uncertainty over the population status, we can therefore take for φ the negative entropy function φ_Ent(Q) := −H(Q) = Σ_{x ∈ {0,1}^n} Q(x) log Q(x). The corresponding utility (11) then satisfies

U_Ent(G, P^{(i)}) = E_{Y_G} [ −H( P^{(i)}( · | Y_G ) ) ] = I_{P^{(i)}}(X; Y_G) − H(P^{(i)}),

where, for any pair of random variables X, Y with joint distribution P, I_P is their mutual information (MI) (Cover and Thomas, 1990):

I_P(X; Y) := H_P(X) − H_P(X | Y) = H_P(Y) − H_P(Y | X).    (13)

In other words, the utility of a group G is, up to an additive constant, equal to the MI under P^{(i)} between X and Y_G. Since the additive constant is the same for all groups, we can as well just consider the MI term to define the utility of a group:

U_MI(G, P) := I_P(X; Y_G).

The MI is a standard utility function in Bayesian experimental design (Lindley, 1956; Chaloner and Verdinelli, 1995). In our particular setting, the evaluation of U_MI is facilitated by the following formulation:

Lemma 1. For any batch of groups G = (g_1, . . . , g_k) ∈ G*, and any joint distribution between X and Y_G,

I_P(X; Y_G) = H_P(Y_G) − Σ_{j=1}^{k} ( h_{σ_{g_j}} + γ_{g_j} E_P[g_j, X] ),    (15)

where h(u) := −u log u − (1 − u) log(1 − u) is the binary entropy, and, for any group g, h_{σ_g} = h(σ_g), h_{s_g} = h(s_g), and γ_g = h_{s_g} − h_{σ_g}. In the case of a single group g ∈ G, this reduces to

I_P(X; Y_g) = h( 1 − σ_g + ρ_g E_P[g, X] ) − h_{σ_g} − γ_g E_P[g, X].    (16)

Proof. Let us start with a single group g ∈ G. From (13) we use the fact that I_P(X; Y_g) can also be written as

I_P(X; Y_g) = H_P(Y_g) − H_P(Y_g | X),    (17)

and compute each term in turn. H_P(Y_g) can be computed easily from the law of X since, by (5),

H_P(Y_g) = h( P(Y_g = 1) ) = h( 1 − σ_g + ρ_g E_P[g, X] ).    (18)

For the second term, we notice that, conditionally on X = x, Y_g is a Bernoulli random variable whose expectation only depends on [g, x], which itself can only take the two values 0 and 1. By (4) we deduce that H(Y_g | X = x) equals h(σ_g) when [g, x] = 0 and h(s_g) when [g, x] = 1, which we can summarize as

H(Y_g | X = x) = h_{σ_g} + γ_g [g, x].    (19)

We deduce that

H_P(Y_g | X) = E_P [ h_{σ_g} + γ_g [g, X] ] = h_{σ_g} + γ_g E_P[g, X].    (20)

Plugging (18) and (20) into (17) gives (16). Moving now to the case of a batch G = (g_1, . . . , g_k) ∈ G*, we use the fact that the entries of Y_G are independent from each other given X to write, for any x ∈ {0, 1}^n, and using (19),

H(Y_G | X = x) = Σ_{j=1}^{k} H(Y_{g_j} | X = x) = Σ_{j=1}^{k} ( h_{σ_{g_j}} + γ_{g_j} [g_j, x] ).

As a result,

H_P(Y_G | X) = Σ_{j=1}^{k} ( h_{σ_{g_j}} + γ_{g_j} E_P[g_j, X] ).

Plugging this equation into (13) gives (15).

A second class of utility functions assesses directly the performance of a predictor of infection, based on the estimated probability that each individual is infected. While the quality of a predictor is correlated with the certainty we have about the state of the population, our framework gives the flexibility to directly optimize a specific measure of performance of the predictor, such as the AUC, instead of the negative entropy of the posterior. For that purpose, we consider a quality measure ψ : R^n × {0, 1}^n → R to quantify by ψ(s, x) how good the prediction of infections based on the individual scores s ∈ R^n is when the true infection status of the population is x ∈ {0, 1}^n. Examples of measures include accuracy, specificity, sensitivity or AUC, e.g.,

ψ_sens(s, x) = TP(s, x) / n_+,    ψ_spec(s, x) = TN(s, x) / n_−,    (21)

where TP(s, x) and TN(s, x) are respectively the number of true positives and true negatives when we predict an individual i as infected whenever its score s_i exceeds a given threshold, and n_+ = Σ_i x_i and n_− = Σ_i (1 − x_i) are the total number of infected and non-infected individuals, respectively. Given a distribution P_X over the state of the population, let µ_i(P_X) = P_X(X_i = 1) = E_{P_X} X_i be the probability that the i-th individual is infected, and µ(P_X) = (µ_1(P_X), . . . , µ_n(P_X)) the vector of marginal probabilities of infection for individuals in the population. For each of the measures ψ(s, x) of prediction quality in (21), we can define a corresponding function φ(P_X), where P_X is a distribution on {0, 1}^n, as φ(P_X) = E_{P_X} ψ(µ(P_X), X), and the corresponding utility function from (11).
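As an illustration of the single-group case (16) of Lemma 1, the sketch below evaluates I(X; Y_g) from the probability that the pool is positive under the current belief. The function names are ours and natural logarithms are used.

```python
import numpy as np

def binary_entropy(p: float) -> float:
    """h(p) = -p log p - (1 - p) log(1 - p), in nats, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * np.log(p) - (1 - p) * np.log(1 - p)

def mi_single_group(p_group_positive: float, sensitivity: float, specificity: float) -> float:
    """I(X; Y_g) for one pool, as in the single-group case (16) of Lemma 1.

    p_group_positive is P([g, X] = 1) under the current belief;
    H(Y_g) = h(1 - sigma + rho * p) and H(Y_g | X) = h(sigma) + (h(s) - h(sigma)) * p.
    """
    rho = sensitivity + specificity - 1.0
    h_marginal = binary_entropy(1.0 - specificity + rho * p_group_positive)
    h_conditional = (binary_entropy(specificity)
                     + (binary_entropy(sensitivity) - binary_entropy(specificity))
                     * p_group_positive)
    return h_marginal - h_conditional

# the MI of a pool is largest when its status is most uncertain:
for p in (0.05, 0.3, 0.5, 0.9):
    print(p, round(mi_single_group(p, sensitivity=0.85, specificity=0.97), 4))
```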
In order to implement the sequential Bayesian experimental design procedure described in Section 3, we now describe in more detail the algorithmic components needed to compute the utility of a batch of groups (11), find a batch that maximizes the utility (10), and maintain a computationally tractable description of the posterior distribution after each stage (9).

For any functional φ, Algorithm 1 estimates the utility U(G, P) of a batch G of groups given a distribution P over {0, 1}^n. We assume that the distribution P is provided to the function as a tuple of N weighted particles (ω_i, x_i) ∈ [0, 1] × {0, 1}^n, i = 1, . . . , N. All weights taken together form a probability vector in the N-simplex, ω ∈ Σ_N, whereas all particles taken together form a matrix X. If the population size n is not too large, then we can take N = 2^n to cover the full space {0, 1}^n; however, when n is larger than a few tens, we suggest instead to encode the true posterior after each stage with N ≪ 2^n particles, using the sequential Monte Carlo approach described in Section 4.3. The space complexity of Algorithm 1 is O(N × max(2^k, n)), where n is the number of individuals, k is the number of groups, and N is the size of the support of the distribution P(X), e.g. the number of particles. The time complexity is dictated by line 6, where we repeat 2^k times a call to the utility function φ(P), where P has a support of size N in {0, 1}^n. If this operation has complexity C(N, n), then the time complexity of Algorithm 1 is O(2^k C(N, n)). For example, the entropy utility φ_Ent has C(N, n) = O(N), while for utilities based on marginals we need to compute the n marginals first, in O(N) operations each, hence C(N, n) = O(N n).

Algorithm 1: Compute the utility of a set of groups.

When the utility function is the mutual information (Section 3.1), we can bypass a few steps in the algorithm. Indeed, instead of using Algorithm 1 with the utility function φ_Ent, we can rely on the decomposition of Lemma 1 and evaluate its first term directly from the vector D of probabilities of the 2^k possible test outcomes, and its second term from the matrix A of conditional test probabilities, using the fact that, since Y_G is a product distribution conditioned on X, its conditional entropy is a sum of k binary entropies. The resulting algorithm is shown in Algorithm 2. Compared to using Algorithm 1 with φ = φ_Ent, the computation of F in O(N × 2^k) operations, to compute 2^k entropies over a space of cardinality N (Algorithm 1, line 6), is replaced by the computation of H_2 in O(N × k) (Algorithm 2, line 2) and of H_1 in O(2^k), to compute a single entropy over a space of cardinality 2^k (Algorithm 2, line 7).

Algorithm 2: Compute the MI utility of a set of groups. Input: G ∈ {0, 1}^{n×k} a set of groups; σ, s the specificities and sensitivities of the test for each group in G. Output: the MI utility of the groups, U_MI(G, P).

Given a function that computes the utility of any candidate batch of groups, as described in Section 4.1, the question of finding a batch that maximizes the utility (10) is a complex discrete optimization problem. Any standard algorithm for discrete optimization can in principle be used to find suboptimal solutions, such as greedy forward/backward optimization, simulated annealing or genetic algorithms. We choose to use a greedy approach to maximize the mutual information utility, subject to the constraint that each group should have at most n_max individuals, and that the batch should contain any number m ≤ k of groups. Simply put, we greedily create groups one by one, until we have m groups. Once we have created groups G_j = (g_1, . . . , g_j), we create a new group g_{j+1} by starting from an empty group g = ∅ (line 3) and growing the group iteratively by selecting the individual that adds the most mutual information, g ← g ∪ {u*} where u* ∈ arg max_u I_P(X; Y_{(G_j, g ∪ {u})}), until either we stop making progress in terms of mutual information, or the group has already reached size n_max.
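The sketch below illustrates this greedy forward pass on a weighted particle approximation of the posterior: the mutual information of a candidate batch is computed with the decomposition of Lemma 1 (conditional entropies summed per particle, marginal entropy over the 2^k outcomes), and individuals are added to the new pool while the MI keeps improving. It is a simplified stand-in for Algorithms 2 and 3, not the paper's optimized implementation; a common sensitivity/specificity is assumed for all pools and all names are ours.

```python
import itertools
import numpy as np

def entropy(p: np.ndarray) -> float:
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def batch_mi(particles, weights, G, s, sigma):
    """Estimate I(X; Y_G) from weighted particles, in the spirit of Lemma 1.

    particles: (N, n) binary matrix, weights: (N,) simplex vector,
    G: (n, k) binary matrix of pools, s/sigma: scalar sensitivity/specificity.
    """
    rho = s + sigma - 1.0
    status = (particles @ G > 0)                        # (N, k) pool status per particle
    p_pos = (1 - sigma) + rho * status                  # P(Y_gj = 1 | x_i)
    # H(Y_G | X): tests are independent given X, so entropies add up per particle
    h_cond = -(p_pos * np.log(p_pos) + (1 - p_pos) * np.log(1 - p_pos)).sum(axis=1)
    # H(Y_G): enumerate the 2^k possible test outcomes
    outcomes = np.array(list(itertools.product([0, 1], repeat=G.shape[1])))
    lik = np.where(outcomes[None, :, :] == 1,
                   p_pos[:, None, :], 1 - p_pos[:, None, :]).prod(axis=2)
    return entropy(weights @ lik) - float(weights @ h_cond)

def greedy_group(particles, weights, prev_G, s, sigma, n_max):
    """Grow one new pool by greedily adding the individual that raises the MI most."""
    n = particles.shape[1]
    group = np.zeros(n, dtype=np.int8)
    best = batch_mi(particles, weights, prev_G, s, sigma) if prev_G.shape[1] else 0.0
    while group.sum() < n_max:
        gains = []
        for u in np.flatnonzero(group == 0):
            cand = group.copy()
            cand[u] = 1
            G = np.concatenate([prev_G, cand[:, None]], axis=1)
            gains.append((batch_mi(particles, weights, G, s, sigma), u))
        mi, u = max(gains)
        if mi <= best:          # stop when adding an individual no longer helps
            break
        best = mi
        group[u] = 1
    return group

# toy usage: 200 prior particles over n = 12 individuals, no previous pools
rng = np.random.default_rng(0)
n, N, q = 12, 200, 0.05
particles = (rng.random((N, n)) < q).astype(np.int8)
weights = np.full(N, 1.0 / N)
print(greedy_group(particles, weights, np.zeros((n, 0), dtype=np.int8), 0.85, 0.97, n_max=10))
```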
We additionally consider a variant in which we consider not only greedy addition of individuals to form a group, but also removal, resulting in forward-backward iterations. Algorithm 3 describes how to carry out such forward passes more efficiently than by repeatedly evaluating Algorithm 2, because it leverages the fact that G is built sequentially, column by column. We omit the backward pass, which only consists in changing line 5 (by setting g_w = 1 instead) and removing (rather than adding) ι_{u*} from g in line 16. At each loop index i (line 4), carrying out F forward passes and B backward passes, with F > B, requires executing the body of the loop (lines 5 to 14) F times in forward mode and B times in backward mode. We use the following notations in Algorithm 3: small letters denote constants, small bold letters denote vectors, bold capital letters are matrices, and bold Greek letters are 3D tensors.

Algorithm 3: G-MIMAX: optimize the MI of m prospective group tests with greedy search. Input: the number m of groups to add and an upper bound n_max on group size.

Once a new batch of groups G^{(t)} is selected at the t-th stage of the campaign, we observe the results of the tests Y_{G^{(t)}} ∈ {0, 1}^k and must update the posterior (9):

P^{(t)}(X = x) = P^{(t−1)}(X = x | Y_{G^{(t)}} = y^{(t)}) = [ P(Y_{G^{(t)}} = y^{(t)} | X = x) P^{(t−1)}(X = x) ] / [ Σ_{x′ ∈ {0,1}^n} P(Y_{G^{(t)}} = y^{(t)} | X = x′) P^{(t−1)}(X = x′) ].    (22)

A naive approach to store and update P^{(t)}, and one that is in fact tractable up to n ≈ 25 patients, is to consider all possible 2^n infection status vectors and to update the probability of each state x ∈ {0, 1}^n using (22). At n = 25, this results in enumerating about 33 million possible states, that is, as many binary numbers of length 25. A more frugal approach would be to only consider states with up to a certain proportion of infected individuals (e.g. 20%); this would only lead to a minor decrease, however, since the number of such states remains intractable for moderate n. Instead of the naive approach, we propose two strategies to approximate the posterior P^{(t)}(X):

• A cheap approximation of the posterior that only estimates the marginal posterior distribution of infection of each individual. This approximation is particularly relevant to estimate and optimize utility functions based on marginal posteriors (Section 3.2). We use a loopy belief propagation algorithm for that purpose (Pearl, 1982); see Appendix A for details on the implementation.

• A more expensive sequential Monte Carlo sampler (SMC-S) (Del Moral et al., 2006) targeting the unnormalized numerator of (22), run either from scratch after each new test, or by reweighting the particles sampled at the previous step with the update likelihood above. Based on preliminary tests, we choose to resample at every iteration.
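To make the sampling target concrete, the sketch below writes down the unnormalized log-posterior corresponding to the numerator of (22), together with one cyclic single-site Gibbs sweep over a particle (the first of the kernel variants discussed next). It is a bare-bones illustration with our own names and a shared sensitivity/specificity, not the SMC implementation used in the paper.

```python
import numpy as np

def log_target(x, G, y, prior_q, s, sigma):
    """Unnormalized log-posterior log P(x) + log P(Y_G = y | x) of one state x."""
    status = (x @ G > 0)
    p_pos = (1 - sigma) + (s + sigma - 1) * status
    log_lik = np.where(y == 1, np.log(p_pos), np.log(1 - p_pos)).sum()
    log_prior = (x * np.log(prior_q) + (1 - x) * np.log(1 - prior_q)).sum()
    return log_lik + log_prior

def gibbs_sweep(x, G, y, prior_q, s, sigma, rng):
    """One cyclic Gibbs sweep: resample each coordinate from its full conditional."""
    x = x.copy()
    for i in range(x.shape[0]):
        x0, x1 = x.copy(), x.copy()
        x0[i], x1[i] = 0, 1
        l0 = log_target(x0, G, y, prior_q, s, sigma)
        l1 = log_target(x1, G, y, prior_q, s, sigma)
        p1 = 1.0 / (1.0 + np.exp(l0 - l1))       # P(x_i = 1 | x_-i, tests)
        x[i] = int(rng.random() < p1)
    return x

rng = np.random.default_rng(0)
x = np.zeros(8, dtype=np.int8)                   # start from the all-negative state
G = np.array([[1], [1], [1], [1], [0], [0], [0], [0]])   # one pool of the first four
y = np.array([1])                                # the pool tested positive
for _ in range(50):
    x = gibbs_sweep(x, G, y, prior_q=0.05, s=0.85, sigma=0.97, rng=rng)
print(x)
```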
Building on the overall approach outlined by Schäfer and Chopin (2013), we consider a few variants for the MH kernel, which all seem to provide similar results, as shown in Figure 3. We consider classical Gibbs sampling (Geman and Geman, 1984), looping cyclically across all n components of each particle (4 times by default at each kernel application); a slightly modified Metropolis-Hastings Gibbs kernel introduced by Liu (1996) for discrete spaces, resulting in a larger acceptance rate; and the approach described in Schäfer and Chopin (2013), which uses a global kernel estimated from sparse logistic regression. Although our experiments show that the latter approach is faster than Gibbs sampling, we obtain with our parameter settings a slightly lower performance in terms of specificity/sensitivity in our simulations. Pairing it with (local) Gibbs sampling recovers better performance, with slightly faster sampling than Gibbs alone. In practice, all three methods perform similarly on our final task.

It is important to note that these two approaches are fundamentally different, in the sense that LBP only produces a single vector of marginals µ ∈ [0, 1]^n, whereas the SMC-S approach produces a cloud of weighted particles (ω_i, x_i) ∈ [0, 1] × {0, 1}^n, for i = 1, . . . , N, to encode the approximation P^{(t)} ≈ Σ_{i=1}^{N} ω_i δ_{x_i}. While the former can be used to quickly assess the status of an individual, the latter is more relevant to optimize for groups:

• We use the LBP approximation to evaluate (decode), at any point in the testing campaign, the marginal probability of infection of individuals. This allows us to monitor the performance of group testing strategies even when the testing campaign has not fully ended, as was proposed by Sejdinovic and Johnson (2010). We also propose to use that marginal distribution to define a simple adaptive rule to select groups, building on the informative Dorfman (ID) procedure (McMahan et al., 2012). We propose a modified ID procedure: instead of using an individualized infection prior known beforehand to define groups using ID, we use the LBP marginal at every iteration to re-run the ID selection procedure.

• We use the SMC-S approximation of the posterior to inform our adaptive experimental design, e.g. we feed the particles produced by the SMC-S directly to the greedy optimization of the MIMAX criterion of Algorithm 3.

All of the components we have described (posterior approximation, group optimization) can be pieced together to define an overall group testing strategy. We call such strategies policies: sequential algorithms that propose groups to test using the knowledge of previous tests. A group testing policy is an algorithm that can iteratively decide, given prior beliefs and observed test results, what groups to form next. A policy is therefore a sequence of algorithms that can take previous tests into consideration and output new groups. We call each of these algorithms a group selector. We consider in this work various policies, all designed as predefined sequences of group selectors (although each group selector can be adaptive, the sequence of selectors defined by the policy is not). All the policies we consider require a prior on infection status to be available beforehand. Dorfman policies can only use a prior infection rate q̂ that is shared across all individuals. In the simulations we run in this paper we have assumed no mis-specification, namely the infection rate used in the simulator is the same as that assumed by the policy, q̂ = q, and equivalently for specificity and sensitivity. Recall that group selectors must comply with the constraint that the size of their groups be smaller than a number n_max.
We first describe the group selectors, and then list the policies that combine them. We begin with selectors that require no knowledge other than the base infection rate.

Dorfman splitting (D). This is the first stage of Dorfman tests, splitting all n patients into subgroups of size roughly equal to min(n_max, 1 + 1/√q).

Split only positives (SplitPos). This is the second stage of Dorfman tests, which focuses exclusively on those groups that returned positive in the first wave. (SplitPos:0) tests individually all samples that have appeared in a positive group; (SplitPos:2) uses Hwang's hierarchical approach (1972) to split all positive groups into two subgroups of roughly equal sizes, subsequently tests those smaller groups, and then starts again by splitting the positive subgroups. At each stage, all groups previously flagged as positive are split, until only individuals are tested.

Mézard-Toninelli splitting (MT). This approach selects groups of the same size g at random among all n patients. Given a prior q, the group size is chosen so that a group tests positive (is "accepted") with probability 1/2. In our setting, this translates to the requirement that the group size g satisfies

g = min( n_max, ⌈ log 2 / log(1/(1 − q)) ⌉ ),    (23)

which follows from setting (1 − q)^g = 1/2, with the added constraint on maximal group sizes. Mézard and Toninelli (2011) have proved that, in the absence of noise, this choice is asymptotically optimal.

Origami fixed design (OM3). We also consider predefined groups, as enumerated in the Origami M3 assay matrix (Kainkaryam and Woolf, 2008). This design is a binary matrix G of size 70 × 22, encoding 22 tests of groups whose size is roughly 10. This matrix was proposed together with a deterministic decoder that operates assuming that up to ≈ 5% of individuals are infected, without taking noisy tests into account, as is usual in the combinatorial group testing literature. We therefore expect that assay to be most useful for infection rates below 5%.

Modified Informative Dorfman (MID). Given individualized prior infection probabilities q_1, . . . , q_n, Hwang (1975) proposed an optimal rule that proceeds by grouping individuals by increasing infection probability, with groups of decreasing size. McMahan et al. (2012) proposed the informative Dorfman rule as a generalization of that rule that can adapt to a noisy setting. The rule proceeds by sorting patients by increasing marginal infection probability, and grouping them into groups that are initially large (to clear large subsets of unlikely infected patients) and then smaller (to test individuals likelier to be infected in smaller groups). More precisely, given a sorted list of individuals with increasing infection probabilities q_1, . . . , q_n, McMahan et al. (2012) propose, in their pool-specific optimal Dorfman (PSOD) algorithm, to group together the first c* individuals, where c* is chosen to optimize the expected efficiency of the resulting pool, to remove them from the queue, and to proceed until all individuals are grouped. We modify this rule as follows: in the absence of an individual prior, we recover results from a first wave of tests to produce an LBP marginal approximation that we plug into the rule. We additionally constrain c to be smaller than n_max. Because the informative Dorfman procedure was designed as a single-stage procedure, with no consideration of testing capacity per unit of time, we modify it to fit our sequential scheme as follows: (i) we discard individuals with marginal probabilities smaller than 0.1% or larger than 90%, sort the remaining individuals, and use the efficiency criterion to form groups that cannot be larger than n_max, as in the R package binGroup (Bilder et al., 2010); (ii) because the number of groups resulting from the informative Dorfman rule is random, and likely to be larger than the maximal budget k, we subsample k of them at random at each cycle, in order to be able to propose more informed tests at the next cycle.
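The two prior-only size rules above are easy to tabulate; the sketch below reproduces them under the noiseless reading given in the text (Dorfman first-stage size ≈ 1 + 1/√q, MT size from (1 − q)^g = 1/2, both capped at n_max). Helper names are ours.

```python
import math

def dorfman_group_size(q: float, n_max: int) -> int:
    """First-stage Dorfman pool size, roughly 1 + 1/sqrt(q), capped at n_max."""
    return min(n_max, round(1 + 1 / math.sqrt(q)))

def mt_group_size(q: float, n_max: int) -> int:
    """Pool size such that 1 - (1 - q)^g ≈ 1/2, i.e. g = ceil(log 2 / -log(1 - q))."""
    return min(n_max, math.ceil(math.log(2) / -math.log(1 - q)))

for q in (0.02, 0.05, 0.10):
    print(f"q={q:.0%}: Dorfman split {dorfman_group_size(q, 10)}, MT {mt_group_size(q, 10)}")
```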
Greedy Maximization of Mutual Information (G-MIMAX). We optimize the mutual information utility using k groups and samples produced by an SMC sampler, as described in Algorithm 3. By default the number of forward iterations is F = 1 and the number of backward iterations is B = 0. We do, however, observe in simulations a small increase in performance using F = 4 and B = 3.

We consider the following policies, all composed of one or at most two group selectors:

• Dorfman: starts with Dorfman splitting (D), followed by (SplitPos:0).
• Binary Dorfman: starts with Dorfman splitting (D), followed by (SplitPos:2) applied as many times as needed.
• MT: generates random groups at each stage with the (MT) selector, using the optimal size outlined in (23). For both q = 2% and 5%, that size is above the limit of n_max = 10, and we therefore stick to 10. For q = 10% the rule results in groups of size 7.
• OM3-MID: uses (OM3) for 22 tests, and then switches to (MID) using the LBP marginal.
• G-MIMAX: uses (G-MIMAX) on a sample from the prior first, and then from the posterior distribution using an SMC sampler.

Given a batch of n patients to be tested, we assume that testing capacities allow for k tests per testing cycle, carried out in parallel. For instance, in RT-PCR testing, what we call a testing cycle (not to be confused with the thermal cycles running in PCR) would require from 1 to 4 hours, and the machine would allow for k simultaneous tests. If the population is much larger and the testing capacity K per cycle much bigger, a simple way to adapt our procedure would be to split the population into B batches of n patients, and to allocate to each batch a testing capacity equal to k = K/B. We assume for simplicity that the testing machine has constant specificity σ and sensitivity s, independent of the group size, but cap the maximal group size at a number n_max. Since our simulator can account for varying specificity/sensitivity per group size, we leave simulations in that setting for future work. Our simulations run for a predefined number T of test cycles, during which we can carry out up to k tests per test cycle. We consider settings where Tk < n. The testing simulator is described in Algorithm 4, using the following notation: for a matrix A, we write A_{:i} for its first i rows and A_{i:} for the matrix A stripped of those first i rows. It essentially proceeds by sampling a patient infection status vector and then, through an iterative selection of tests, reaching a marginal distribution of infection that is compared with the sampled ground-truth labels. We use the following parameters in our simulations:

• Number of patients: n = 70.
• Constant specificity σ = 97% and sensitivity s = 85%.
• Base infection rates q = 2%, 5%, 10%.
• Two different testing environments:

1. We consider a PCR machine that can allocate a budget of k = 8 tests per cycle, for that pool of 70 patients. We track the performance over at most T = 5 test cycles (equivalent to a full day if these cycles took 5 hours). We consider a maximal group size n_max of 10.
In that setting, most methods we consider can generate a varying number of tests per cycle, and can therefore saturate the test budget at will. Only Dorfman methods may, on the contrary, use fewer than 8 tests in some cycles, and possibly even terminate (notably for low infection rates) before 5 cycles.

2. In order to study Dorfman baselines in a more favorable setting, we also consider single-test cycles, where T = 24 and only one test (k = 1) is generated at a time, for an infection rate of 5%. In that setting Dorfman methods only use pools of size 6 or less, which is why we also cap n_max at 6 for all methods. This setting is of course very costly for posterior-based methods, since it requires (re)sampling the posterior after every new test.

Algorithm 4: Testing simulator. Input: a prior P_0 on the patients' infection state space {0, 1}^n; Test(G), which returns noisy test results (with specificity and sensitivity σ, s ∈ [0, 1]^2) obtained by pooling individuals according to G; Policy(t, m, y_t, G_t), an algorithm that outputs at iteration t up to m new groups according to past test results y_t, G_t, possibly informed by posterior particles or by a marginal approximation; Sampler(y_t, G_t, P_0), which produces N posterior samples given test results and prior; MarginalSampler(y_t, G_t, P_0), which produces only an approximate marginal distribution. Output: pairs of ground-truth vectors sampled from the prior and marginal predictions of infection. The simulator samples one ground-truth status x_truth ∼ P_0, samples N particles from the prior (ω_0 = 1_N/N, X_0 ∼ P_0^{⊗N}), initializes empty groups and test results, and then, for t = 1 to T, asks the policy for new groups whenever fewer than k groups remain to be tested, runs the tests, resamples particles with Sampler(y_t, G_t, P_0), recomputes the marginal x̂_t with MarginalSampler(y_t, G_t, P_0), and stores the pair a_t = (x_truth, x̂_t), which it returns at the end.

We run 5,000 simulations for each policy, each time sampling an infection status vector from the prior. Each of these simulations produces a ground-truth infection status vector and a sequence of T marginal probability distributions, generated using LBP from the group tests observed up to each cycle t ≤ T. In Figures 1 and 4 we apply the same threshold to the LBP marginals of all simulations of a given policy, to decide, for each simulation, which individuals are classified as positive/negative. We record the resulting sensitivity/specificity for each simulation at that threshold and average these values over the 5,000 simulations (for those ground-truth samples in which only negatives are sampled, which can notably happen with a base rate of 2%, the sensitivity cannot be evaluated and those simulations are therefore only used to record specificity). Using the same threshold, rather than the AUC ranking-based loss that we used in an earlier version of this work, results in a more realistic assessment of these methods: if these testing policies were to be deployed, one would need to decide beforehand on a single threshold to decide on the infection status of individuals, not on a ranking. We measure the progression of the average specificity/sensitivity of each policy as a function of the number of test cycles carried out. In Figures 2 and 3 we plot the average specificity/sensitivity obtained by varying that threshold (recovering a curve of average sensitivity/specificity levels using all experiments with, again, the same threshold), and provide two plots as a function of the number of tests (24 and 32 for low prevalence, 32 and 40 for higher prevalence).
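The per-simulation evaluation described above reduces to thresholding the decoded marginals against the sampled ground truth; here is a minimal helper in that spirit (names and example values are ours).

```python
import numpy as np

def sens_spec_at_threshold(marginals, truth, threshold):
    """Sensitivity and specificity of the decision rule `marginal >= threshold`.

    Returns (sensitivity, specificity); sensitivity is None when the ground
    truth contains no infected individual (as can happen at 2% prevalence).
    """
    predicted = np.asarray(marginals) >= threshold
    truth = np.asarray(truth).astype(bool)
    tp = np.sum(predicted & truth)
    tn = np.sum(~predicted & ~truth)
    sens = tp / truth.sum() if truth.sum() else None
    spec = tn / (~truth).sum() if (~truth).sum() else None
    return sens, spec

# toy example with made-up marginals for 6 individuals
marginals = [0.02, 0.40, 0.01, 0.80, 0.03, 0.05]
truth = [0, 1, 0, 1, 0, 0]
print(sens_spec_at_threshold(marginals, truth, threshold=0.10))
```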
Figure 3 illustrates the robustness of the SMC sampler to the kernel that is used, as seen through the lens of the performance of G-MIMAX.

Our goal in this work is to maximize the efficiency of group testing in the noisy setting, in which test results can be flipped at random. By relying on a particle representation of the posterior, we are able to formulate the problem of designing groups iteratively as a combinatorial problem, solved with a greedy algorithm. We have benchmarked our approach against the celebrated Dorfman procedure and shown a substantial improvement in performance. Our results for the more complex MI approach suggest a fertile ground to improve our algorithms along several directions: quality of posterior sampling, consideration of more complex utility functions φ, and additional efforts on the combinatorial solver tasked to produce groups out of posterior samples. Since our method currently scales exponentially with the number k of groups that need to be designed (which we have equated in this work to the number of tests available per round for that pool of patients), an extension of our work that carries out resampling at each group optimization iteration might be required for longer horizons.

Figure 1: We use setting (1) in §6.2 to plot average specificity/sensitivity over 5,000 simulations for each of the policies we consider, in three base infection settings. Those two numbers are recovered by computing first, given past group test results, an LBP approximation of the marginal distribution of infection of each patient, and thresholding it at a certain level (5, 10 and 15% respectively for 2, 5 and 10% base infection rates) to make a binary decision. These thresholds are arbitrary, and follow from examining Fig. 2. We report these average specificity/sensitivity values as the number of tests effectively carried out grows. The specificity/sensitivity of individual tests (requiring therefore 70 tests) is plotted in red, and can be attained or improved upon in most settings by G-MIMAX with nearly half the number of tests. Here G-MIMAX uses a Gibbs sampler with Liu's modification, 10,000 particles, and the overall algorithm setting described in (Schäfer and Chopin, 2013).

Figure 2: As in Figure 1, we report average specificity/sensitivity for various policies, but use instead a single number of tests for each plot and vary the threshold used to make decisions. We therefore obtain a global specificity/sensitivity curve for all 5,000 experiments using a common set of thresholds.

Figure 4: Progression of average specificity/sensitivity using the second setup in §6.2, where each testing cycle consists of only one test. Note that both Dorfman and Binary Dorfman have an identical initial trajectory, because they share the first 12 initial uniform splits (70 individuals into groups of 6 or 5).

Appendix A. A standard way to compute an approximation of the posterior marginals is to run loopy belief propagation (LBP) until convergence. Here we detail the LBP equations for our setting. Given n individuals and m tests performed with groups g_1, . . . , g_m ⊂ [1, n], LBP alternates passing messages µ_{i→j} = (µ_{i→j}(0), µ_{i→j}(1)) ∈ R^2 from individuals i ∈ [1, n] to groups j ∈ [1, m] with i ∈ g_j, and messages µ̄_{j→i} = (µ̄_{j→i}(0), µ̄_{j→i}(1)) ∈ R^2 from groups j with i ∈ g_j to individuals i, respectively.
Adding a superscript (t) to indicate the messages sent at the t-th iteration of LBP, the messages from an individual i ∈ [1, n] to a group j ∈ [1, m] with i ∈ g_j follow the standard sum-product equations:

µ_{i→j}^{(t)}(x) ∝ q_i^x (1 − q_i)^{1−x} ∏_{j′: i ∈ g_{j′}, j′ ≠ j} µ̄_{j′→i}^{(t−1)}(x),    x ∈ {0, 1},    (24)

where the messages are normalized to sum to one. The messages from a group j ∈ [1, m] to an individual i ∈ [1, n] with i ∈ g_j depend on the result of the test Y_{g_j}: if Y_{g_j} = 0 (negative test), then

µ̄_{j→i}^{(t)}(0) ∝ (1 − s_{g_j}) + (σ_{g_j} − (1 − s_{g_j})) ∏_{i′ ∈ g_j, i′ ≠ i} µ_{i′→j}^{(t)}(0),
µ̄_{j→i}^{(t)}(1) ∝ 1 − s_{g_j},    (25)

and if Y_{g_j} = 1 (positive test), then

µ̄_{j→i}^{(t)}(0) ∝ s_{g_j} − (s_{g_j} − (1 − σ_{g_j})) ∏_{i′ ∈ g_j, i′ ≠ i} µ_{i′→j}^{(t)}(0),
µ̄_{j→i}^{(t)}(1) ∝ s_{g_j}.    (26)

Furthermore, let us make the change of variables, for any (i, j, t) ∈ [1, n] × [1, m] × N,

α_{i→j}^{(t)} := log( µ_{i→j}^{(t)}(0) / µ_{i→j}^{(t)}(1) ).

Then (24) can be rewritten as:

α_{i→j}^{(t)} = log( (1 − q_i)/q_i ) + Σ_{j′: i ∈ g_{j′}, j′ ≠ j} ᾱ_{j′→i}^{(t−1)}.

Similarly, denoting ᾱ_{j→i}^{(t)} := log( µ̄_{j→i}^{(t)}(0) / µ̄_{j→i}^{(t)}(1) ), we can rewrite (25) and (26) as follows: if Y_{g_j} = 0,

ᾱ_{j→i}^{(t)} = log( 1 + (σ_{g_j} − (1 − s_{g_j}))/(1 − s_{g_j}) ∏_{i′ ∈ g_j, i′ ≠ i} (1 + e^{−α_{i′→j}^{(t)}})^{−1} ),

and if Y_{g_j} = 1,

ᾱ_{j→i}^{(t)} = log( 1 − (s_{g_j} − (1 − σ_{g_j}))/s_{g_j} ∏_{i′ ∈ g_j, i′ ≠ i} (1 + e^{−α_{i′→j}^{(t)}})^{−1} ).

After convergence of the messages (denoted as t = ∞), we estimate the posterior marginal of the i-th individual as

P_LBP(X_i = 1 | Y_{g_1}, . . . , Y_{g_m}) ∝ q_i ∏_{j: i ∈ g_j} µ̄_{j→i}^{(∞)}(1),

that is,

P_LBP(X_i = 1 | Y_{g_1}, . . . , Y_{g_m}) = 1 / (1 + e^{α_i}),    where α_i := log( (1 − q_i)/q_i ) + Σ_{j: i ∈ g_j} ᾱ_{j→i}^{(∞)}.

References

Novel impossibility results for group-testing.
The capacity of Bernoulli nonadaptive group testing.
Group testing: an information theory perspective. Foundations and Trends in Communications and Information Theory.
National COVID-19 testing action plan: pragmatic steps to reopen our workplaces and our communities. The Rockefeller Foundation, pages 1-30.
Boolean compressed sensing and noisy group testing.
The capacity of adaptive group testing.
A comparative survey of nonadaptive pooling designs.
Random multiple-access communication and group testing.
binGroup: a package for group testing.
Grotesque: noisy group testing (quick and efficient).
Bayesian experimental design: a review.
Non-adaptive probabilistic group testing with noisy measurements: near-optimal bounds with efficient algorithms.
Non-adaptive group testing: explicit bounds and novel algorithms.
Pattern matching with don't cares and few errors.
What's hot and what's not: tracking most frequent items dynamically.
Optimal two-stage algorithms for group testing problems.
Sequential Monte Carlo samplers.
The detection of defective members of large populations.
Combinatorial group testing and its applications.
Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images.
A method for detecting all defective members in a population by group testing.
A generalized binomial group testing problem.
poolHiTS: a shifted transversal design based pooling strategy for high-throughput drug screening.
Asymptotics of fingerprinting and group testing: tight bounds from channel capacities.
On a measure of the information provided by an experiment.
Peskun's theorem and a modified discrete-state Gibbs sampler.
The separating property of random matrices. Mathematical Notes of the Academy of Sciences of the USSR.
Planning of screening experiments for a nonsymmetric response function.
Informative Dorfman screening.
Group testing meets traitor tracing.
Group testing with random pools: optimal two-stage algorithms.
A survey on combinatorial group testing algorithms with applications to DNA library screening.
Reverend Bayes on inference engines: a distributed hierarchical approach.
Noisy adaptive group testing: bounds and algorithms.
Phase transitions in group testing.
Near-optimal noisy group testing via separate decoding of items.
Sequential Monte Carlo on large binary sampling spaces.
Pool testing of SARS-CoV-2 samples increases test capacity.
Note on noisy group testing: asymptotic bounds and belief propagation reconstruction.
Group testing to eliminate efficiently all defectives in a binomial sample.
Estimating false-negative detection rate of SARS-CoV-2 by RT-PCR. medRxiv.
Born again group testing: multiaccess communications.
Evaluation of COVID-19 RT-qPCR test in multi-sample pools. medRxiv.
Parallel feature selection inspired by group testing.