title: Dynamic group testing to control and monitor disease progression in a population
authors: Srinivasavaradhan, Sundara Rajan; Nikolopoulos, Pavlos; Fragouli, Christina; Diggavi, Suhas
date: 2021-06-20 (arXiv:2106.10765v1 [stat.ME])

Abstract-In the context of a pandemic like COVID-19, and until most people are vaccinated, proactive testing and interventions have been proved to be the only means to contain the disease spread. Recent academic work has offered significant evidence in this regard, but a critical question is still open: Can we accurately identify all new infections that happen every day, without this being forbiddingly expensive, i.e., using only a fraction of the tests needed to test everyone every day (complete testing)? Group testing offers a powerful toolset for minimizing the number of tests, but it does not account for the time dynamics behind the infections. Moreover, it typically assumes that people are infected independently, while infections are governed by community spread. Epidemiology, on the other hand, does explore time dynamics and community correlations through the well-established continuous-time SIR stochastic network model, but the standard model does not incorporate discrete-time testing and interventions. In this paper, we introduce a "discrete-time SIR stochastic block model" that also allows for group testing and interventions on a daily basis. Our model can be regarded as a discrete version of the continuous-time SIR stochastic network model over a specific type of weighted graph that captures the underlying community structure. We analyze that model w.r.t.
the minimum number of group tests needed every day to identify all infections with vanishing error probability. We find that one can leverage the knowledge of the community and the model to inform nonadaptive group testing algorithms that are order-optimal, and therefore achieve the same performance as complete testing using a much smaller number of tests.

Index Terms-Dynamic group testing, SIR stochastic network model, COVID-19 testing

I. INTRODUCTION

COVID-19 has revealed the key role of accurate epidemiological models and testing in the fight against pandemics [1]-[6]. For any new disease or variant of an existing one, we will always need to rapidly develop strategies that allow efficient testing of populations and empower targeted interventions. This poses several daunting challenges: (i) we need to test populations not just once but in a continual manner (on a daily basis), and (ii) we need to estimate the epidemic state of each individual and isolate only the infected ones. And all this under the accuracy and cost limitations imposed by the various types of tests.

Recent works have identified the significance of proactive testing and individual-level intervention for the control of the disease spread (e.g., [7]-[9]), but to the best of our knowledge none of them addresses the challenges above efficiently. Most solutions rely on the idea of "testing everyone individually", which is inefficient for two reasons: on one hand, using cheap rapid testing usually results in many people (false positives) ending up in isolation without reason and at non-negligible societal cost; on the other hand, using accurate tests like PCR can be forbiddingly expensive. As a result, these works need to either neglect the cost of the former or alleviate the cost of the latter by scheduling tests on a (bi)weekly or monthly basis. Therefore, a critical question is still open: can we use accurate/expensive tests more efficiently? In other words, can we identify all new infections that happen each day (complete testing performance), using significantly fewer accurate/expensive tests than complete testing?

Complete accurate testing (e.g., PCR) on a daily basis and isolation of infected individuals can significantly reduce the number of infected people, as the example in Fig. 1 illustrates. Note that even with complete testing new infections still occur due to the delay between testing and receiving the test results (Fig. 1 assumes the usual delay of one day). Still, this is the best performance we can hope for, both in terms of containing infection and alleviating the societal impact of "false" quarantines; we thus ask how many tests we really need to replicate it.

Traditional group testing strategies offer a powerful toolset for minimizing the number of tests, but they do not account for the time dynamics of disease spread, nor for the community structure. When the number of available tests is limited, two strategies are usually applied: sample testing, which tests only a sample of selected individuals, and/or group testing, which pools together diagnostic samples to reduce the number of tests needed to identify infected individuals in a population (e.g., see [10] and references therein). Both examine a static scenario: the state of individuals is fixed (infected or not), and the goal is to identify all infected ones. To the best of our knowledge, our work in [11] was the first paper that targeted community-aware group-test design for the dynamic case.
In that work we used the well-established continuous-time SIR stochastic network model in [12], where individuals are regarded as the vertices of a graph G and an edge denotes a contact between neighboring vertices, and explored group testing strategies that were able to track the epidemic state evolution at an individual level, using a small number of tests. However, due to the complexity of the continuous-time model, we were not able to provide theoretical guarantees for the minimum number of tests, and although we did consider testing delays, we left interventions for future work.

In this paper, we allow interventions, we use discrete-time SIR models for disease spread, and we derive theoretical guarantees. Discrete-time models fit more naturally with testing and intervention (which happen at discrete time intervals), and are more amenable to analysis, enabling us to derive guarantees on the number of tests needed to achieve close-to-complete-testing accuracy. In this paper we use a model called the "discrete-time SIR stochastic block model," which can be considered as a discrete version of the continuous-time SIR stochastic network model over a specific type of weighted graph. The graph used captures knowledge of an underlying community structure, as discussed in Section III. In Appendix A, we compare the continuous-time model from [12] with the discrete-time model introduced in our work and justify the use of our discrete-time model. We also note that our results are applicable to a larger set of SIR models, as discussed in Section IV-C.

Our main conclusion is that we can leverage the knowledge of the community and the dynamic model to inform group testing algorithms that are order-optimal and use a much smaller number of tests than complete testing to achieve the same performance. We arrive at this conclusion building on the following contributions.

We first argue that for discrete-time SIR models, given test results that identify the infection state on the previous day, the problem of identifying the new infections each day reduces to static-case group testing with independent (but not identical) priors. So, existing nonadaptive algorithms such as CCA and/or random testing [13] can be reused. Figure 2 illustrates the sequence of events taking place on each day t. We then derive a new lower bound (Theorem 2) for the number of tests needed in the case of independent (but not identical) priors. The main benefit of the new bound is not that it "improves" upon the well-known entropy lower bound (stated as Lemma 1), but that it has a form that allows us to prove order-optimality of group testing algorithms. In particular, we can prove that under mild assumptions existing nonadaptive algorithms are order-optimal in the static case (Corollary 2). This, in our opinion, is an interesting result on its own, since static group testing with non-identical priors remains a relatively unexplored field compared to i.i.d. probabilistic group testing. Finally, we derive conditions on the discrete-time SIR stochastic block model parameters under which order-optimal group test designs for the static case are also optimal for the dynamic case (Theorem 3). Simulation results show that indeed under these conditions we can achieve the performance of complete individual testing using a much smaller (close to the entropy lower bound) number of tests; for example, over a period of 50 days, group testing needs an average of around 100 tests per day for a population of 1000 individuals.
Our simulations use existing non-adaptive test designs; we do not derive new designs, as the existing ones are sufficient. However, we do use marginal probabilities derived daily from the SIR model to inform the group design: that is, the group tests we use vary from day to day, and their design leverages the knowledge of the underlying system dynamics that depend on the community structure, as well as the previous day's test results.

The rest of the paper is organized as follows: Section II discusses related work; Section III provides our setup and background; Section IV contains our main results; Section V provides numerical evaluation and concludes the paper.

II. RELATED WORK

As stated in our introduction, our work shares similar goals with our prior work in [11], where we considered the well-established continuous-time SIR stochastic network model (see [12]) and focused on how many tests to use and whom to test in order to track the infected individuals in the population. That work also explored how well one can learn the infected individuals given delayed test results, but gave no theoretical guarantees on the methods and did not consider intervention. Our discrete-time model for disease spread in this work, however, is more amenable to analysis and better illustrates the usefulness of group testing, while being at the same time useful for practical reasons (more about this in Section III). We further note that our results are applicable to a more general set of SIR models, as discussed in Section IV-C, Remark 2.

Our model is closely related to the independent cascade model (see for example [14] and references therein), studied in the context of influence maximization in social networks, where we can interpret influence/rumor propagation as infections in our context. A crucial difference is that our model allows multiple opportunities for infection over time, whereas the independent cascade model allows only one opportunity to "infect": in our model, infectious individuals remain infectious until they recover or are isolated. The work in [15] considers a discrete stochastic model for the progression of COVID-19; however, its scope is different, as it tests infrequently (and thus infections are highly correlated), does not consider interventions, does not look for optimal group test designs, and does not provide theoretical guarantees on the number of tests needed.

Since we use the main principles of the SIR model, our work is closely related to epidemic modeling. Works in epidemiology discuss the implications of testing and intervention for COVID-19 employing stochastic network models (see [8], [9] and references therein) but do not consider test designs that exploit the knowledge of the underlying dynamical system. Works in control theory (see [16] and references therein) consider deterministic SIR compartment models (at the population level) and focus on intervention schemes. Here we are interested in both testing and intervention and use an individual-level SIR model.

Our work can be positioned in the general context of community-aware group testing, where infections are not independent and correlations follow from the community structure. Our work in [17], [18] demonstrated that using a known community structure to design group testing strategies and decoders can significantly extend the advantages of group testing by exploiting these structural dependencies.
Concurrently, the works in [15], [19] proposed decoding algorithms that take the community structure into account. Following up on these works in the static case (without temporal dynamics), there have been other recent works with similar goals [20]-[22]. Our work also leverages a known community structure that informs the system dynamics as well as the group test designs.

Further related to static group testing is the work on graph-constrained group testing (see for example [23], [24]), which solves the problem of how to design group tests when there are constraints on which samples can be pooled together, provided in the form of a graph. In our case, no such constraints exist and individuals can be pooled together into tests freely.

III. SETUP AND BACKGROUND

In this section we formalize our setup. Since our work for the dynamic case builds upon existing ones from static group testing, we first review some major results in that area that we also reuse in our paper (Section III-A). We then provide our model (Section III-B) and problem formulation (Section III-C).

A. Preliminary: review of results from static group testing

Traditional group testing typically assumes a population of N individuals out of which some are infected. Three infection models are typically considered: (i) in the combinatorial priors model, a fixed number k of infected individuals is selected uniformly at random among all sets of size k; (ii) in the i.i.d. probabilistic priors model, each individual is infected i.i.d. with probability p; (iii) in the non-identical probabilistic priors model, each item i is infected independently of all others with prior probability p_i, so that the expected number of infected members is k̄ = ∑_{i=1}^N p_i [13]. In this paper we mostly use results that apply to the last case.

A group test τ takes as input samples from n_τ individuals, pools them together and outputs a single value: positive if any one of the samples is infected, and negative if none is infected. More precisely, let U_i = 1 when individual i is infected and 0 otherwise. Then the group testing output y_τ takes a binary value calculated as y_τ = ∨_{i∈D_τ} U_i, where ∨ stands for the OR operator (disjunction) and D_τ is the group of people participating in the test. The usual goal in static group testing is to design a testing algorithm that is able to identify all infection statuses U = (U_1, ..., U_N). These algorithms can be adaptive or non-adaptive. Adaptive testing uses the outcome of previous tests to decide what tests to perform next. An example of adaptive testing is binary splitting, which implements a form of binary search. Non-adaptive testing constructs, in advance, a test matrix G ∈ {0,1}^{T×N} where each row corresponds to one test, each column to one member, and the non-zero elements of each row determine the set D_τ. Although adaptive testing uses fewer tests than non-adaptive testing, non-adaptive testing is often more practical as all tests can be executed in parallel. The main challenge in static group testing is the number of group tests T = T(N) needed to identify the infected members without error or with high probability.
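To make the test model concrete, here is a small illustrative Python sketch (ours, not part of the paper or its code) that builds a random test matrix G with a near-constant number of tests per item and computes the pooled OR outcomes y_τ for a given infection vector U; the function names and parameter choices are our own.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_test_matrix(num_tests, num_items, tests_per_item):
    """Random near-constant column-weight design: each item is placed in
    `tests_per_item` tests chosen uniformly at random without replacement."""
    G = np.zeros((num_tests, num_items), dtype=int)
    for i in range(num_items):
        rows = rng.choice(num_tests, size=tests_per_item, replace=False)
        G[rows, i] = 1
    return G

def test_outcomes(G, U):
    """OR model: a pool is positive iff it contains at least one infected item."""
    return (G @ U > 0).astype(int)   # y_tau = OR over i in D_tau of U_i

# toy example: N = 12 individuals, 2 infected, T = 6 pools
N, T = 12, 6
U = np.zeros(N, dtype=int)
U[[3, 9]] = 1
G = random_test_matrix(T, N, tests_per_item=3)
print("pool results:", test_outcomes(G, U))
```

Random constructions of this flavor are the ones referred to below as i.i.d. Bernoulli and near-constant tests-per-item designs.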
In the following, we present some well-established results that we reuse in our work (hereafter we use the typical asymptotic notation, i.e., f(n) = O(g(n)), f(n) = Ω(g(n)), or f(n) = Θ(g(n)) respectively means that f(n) ≤ c g(n), f(n) ≥ c g(n), or c_1 g(n) ≤ f(n) ≤ c_2 g(n) asymptotically, where c, c_1, c_2 are universal constants):

• For the probabilistic model (ii), any non-adaptive algorithm with a success probability bounded away from zero as N → ∞ must have T = Ω(min{k̄ log N, N}) [25, Theorem 1], [26]. This means that either any non-adaptive group testing algorithm with a number of tests O(k̄ log N) is order-optimal, or individual testing is order-optimal. In particular, random test designs, such as i.i.d. Bernoulli [28]-[30] and near-constant tests-per-item [31], [32], have been proved to be order-optimal in the sparse regime where k̄ = Θ(N^α) and α ∈ (0, 1). In fact, in the same regime, [26] has provided the precise constants for optimal non-adaptive group testing. Conversely, classic individual testing has been proved to be optimal in the linear regime (k̄ = Θ(N)) [33] and the mildly sublinear regime (k̄ = ω(N / log N)) [25].

• For the probabilistic model (iii), a lower bound on the number of tests needed is given by the entropy, stated below:

Lemma 1 (Entropy lower bound). Consider the non-identical probabilistic priors model of static group testing, where each individual i ∈ [N] is infected independently with probability p_i. The number of tests T needed by a non-adaptive algorithm to identify the infection status of all individuals with a vanishing probability of error satisfies T ≥ ∑_{i=1}^N h_2(p_i), where h_2(·) is the binary entropy function.

See Appendix A in [13] for a proof. On the algorithmic side, two known algorithms are: the adaptive laminar algorithms, which need at most 2∑_{i=1}^N h_2(p_i) + 2k̄ tests on average, and the "coupon collector" nonadaptive algorithm (CCA), which needs at most T ≤ 4e(1+δ) k̄ ln N tests to achieve an error probability no larger than 2N^{-δ} whenever p_i ≤ 1/2 [13], [34].

Distinction from traditional group testing. In this paper, we focus on probabilistic infections and non-adaptive test designs, but we differ from traditional group testing in two ways: (a) Traditional group testing examines a static scenario, where the state of individuals is fixed (infected or not); we are instead interested in a dynamic scenario, where the state of an individual may change, even during the test period. This is particularly true since test results may not be available instantaneously but instead with a delay (e.g., after one day). (b) The infection events are not independent; instead, they are correlated, with the correlation induced by the underlying community structure and the dynamic infection spread model we consider (for instance, two individuals who live in the same household are more likely to both be infected or not). This implies that U_i and U_j are not independent, contrary to the assumption in traditional group testing.

B. The discrete-time SIR stochastic block model

We now describe our infection model via the discrete-time SIR stochastic block model with parameters (N, C, q_1, q_2, p_init). Consider a population of size N that is partitioned into multiple communities of size C. For simplicity we assume that N/C is an integer. On any day t ∈ ℕ, each individual can be in one of three states: Susceptible (S), Infected (I) or Recovered (R). Let X_i^{(t)} ∈ {S, I, R} denote the state of individual i on day t, and define the state of the system as X^{(t)} = (X_1^{(t)}, ..., X_N^{(t)}).
A small number of individuals are initially infected, and all new infections occur during "transmissible contacts" between infected and susceptible individuals. Recoveries occur independently of infections. More precisely, on day t = 0, every individual is infected i.i.d. with probability p_init. The following steps repeat every day starting at t = 1:
• An infected individual in some community infects a susceptible one from the same community w.p. q_1, independently of the other infected individuals of the community.
• An infected individual in some community infects a susceptible one from another community w.p. q_2, independently of all other infected individuals.
• An infected individual recovers independently of all other individuals w.p. r.

The discrete-time SIR stochastic block model can be envisioned as a discrete version of the well-established continuous-time SIR stochastic network model [12] on the corresponding weighted graph. It inherits the main properties of the latter; for example, infections are transmitted only from an infected to a susceptible individual, and both infections and recoveries are stochastic. The main difference is that in the continuous-time one, the infections and the recoveries happen according to a continuous-time Markovian process with transmissibility rate β and recovery rate γ, which means that the time until a new state transition (S → I or I → R) is exponentially distributed (with mean 1/β or 1/γ, respectively). Indeed, this makes it possible in the continuous-time model for an individual to get infected and subsequently recover within a single day, whereas this is impossible in our discrete-time model.

Learning the intra-community and inter-community structure to model infection transmissions is, we believe, also practically feasible. Close-contact "community" data is often readily available; for example, students in each classroom of a school could form a community, and so could workers in the same office space. We also note that community-level network models alleviate some of the privacy concerns associated with using contact-tracing data, which tracks the exact pairs of individuals who come in contact with each other on a daily basis.

A useful remark about our model is that the state of an individual X_i^{(t)} takes values in {S, I, R}, whereas a test only reveals the binary infection status U_i^{(t)}, which equals 1 if X_i^{(t)} = I and 0 otherwise. This difference is important in our context, because our tests do not distinguish between susceptible and recovered individuals. In the remainder of the paper, X_i^{(t)} will be called the "SIR state" of individual i, while U_i^{(t)} will be i's "infection status." As a result, whether an individual is infected or not changes with the day, and thus we consider a random variable U_i^{(t)} associated with each individual that describes whether it is infected on day t (t ∈ ℕ).

C. Problem formulation

As can be seen from Fig. 1, testing everyone every day and isolating infected individuals drastically reduces the number of infections that happen on a given day. We assume that the results of a test administered on a particular day are available only the next day (as is usually the case with classic PCR testing for SARS-CoV-2). We isolate only the individuals who test positive, and we do so as soon as the test results are available. Moreover, we bring isolated individuals back into the population only when they are completely recovered. Note that in the SIR model, recovered individuals cannot get infected and play no role in transmitting the infection. Therefore, without loss of generality, we can assume that isolated individuals remain isolated for the rest of the testing period.
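As an illustration of the dynamics just described, the following Python sketch (ours; all names such as one_day and daily_infection_prob are hypothetical) advances the population by one day, under the assumption that every non-isolated infected individual transmits independently w.p. q_1 inside its community and w.p. q_2 outside, as in the model above.

```python
import numpy as np

rng = np.random.default_rng(1)
S, I, R = 0, 1, 2  # susceptible, infected, recovered

def daily_infection_prob(num_inf_same, num_inf_other, q1, q2):
    """Probability that a susceptible individual gets infected today, given the
    number of non-isolated infected individuals inside and outside its community;
    each infected individual transmits independently w.p. q1 (same community)
    or q2 (other community)."""
    return 1.0 - (1.0 - q1) ** num_inf_same * (1.0 - q2) ** num_inf_other

def one_day(state, community, isolated, q1, q2, r):
    """One step of the discrete-time SIR stochastic block model.
    `state` holds values in {S, I, R}; `isolated` marks removed individuals."""
    active_inf = (state == I) & ~isolated
    inf_per_comm = np.bincount(community[active_inf], minlength=community.max() + 1)
    total_inf = inf_per_comm.sum()
    new_state = state.copy()
    for i in np.flatnonzero((state == S) & ~isolated):
        j = community[i]
        p = daily_infection_prob(inf_per_comm[j], total_inf - inf_per_comm[j], q1, q2)
        if rng.random() < p:
            new_state[i] = I
    # individuals infected before today (isolated or not) recover w.p. r
    recoveries = (state == I) & (rng.random(state.size) < r)
    new_state[recoveries] = R
    return new_state

# toy usage: 100 individuals in communities of size 10, 1% initially infected
N, C = 100, 10
community = np.arange(N) // C
state = np.where(rng.random(N) < 0.01, I, S)
isolated = np.zeros(N, dtype=bool)
state = one_day(state, community, isolated, q1=0.02, q2=0.001, r=0.1)
```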
Given these assumptions, the question we ask is whether complete testing is necessary to identify all new infections every day, or whether we can achieve the same performance as complete testing with significantly fewer tests. In particular, how many non-adaptive group tests are necessary and sufficient to identify all new infections (with a vanishing error probability) on each day? Our problem formulation is depicted in Fig. 2. To aid a precise mathematical formulation of the problem, we first introduce some notation.

• I_j^{(t)}: number of new infections in community j that occurred on day t. Note that in the set-up of Fig. 2, this number is also equal to the number of infected individuals remaining in community j after intervention has been decided for day t+1. The new infections which happened on day t will only be identified by the tests administered on day t+1, whose results are available only on day t+2.

• p_j^{(t)}: the probability that an individual in community j who was susceptible at the end of day t-1 gets infected on day t. Note that this is the same for every such individual in community j, by symmetry of the model. Moreover, we can calculate this probability as

p_j^{(t)} = 1 - (1 - q_1)^{I_j^{(t-1)}} (1 - q_2)^{∑_{j'≠j} I_{j'}^{(t-1)}}.

Reduction to static group testing with non-identical probabilistic priors. Note that given I_j^{(t-1)} ∀j, an individual belonging to community j is infected independently of every other individual with probability p_j^{(t)} on day t. Thus, conditioned on the infection status of all individuals on day t-1, the infections which happen on day t are independent (but not identically distributed). Now, in our dynamic testing problem set-up, on day t we perfectly learn the infection statuses of all non-isolated individuals at the time of testing on the previous day t-1. Given this information, we can exactly calculate p_j^{(t-1)} ∀j (see Fig. 3), i.e., the probability that each susceptible individual in community j was later infected because of the non-isolated infected individuals. So, given accurate test results, the dynamic testing problem is transformed daily into the problem of static group testing with non-identical probabilistic priors (model (iii) in Section III-A). Therefore, the precise question we are after is the following: given that each individual in community j is infected with probability p_j^{(t)} independently of every other individual, how many tests are necessary and sufficient to learn the infection statuses with a vanishing probability of error? We answer this question in the next section.

IV. MAIN RESULTS

In this section, we prove our main theoretical results. For brevity, we will use the terms "i.i.d. priors" and "non-identical priors" to refer to the i.i.d. probabilistic priors model (ii) and the non-identical probabilistic priors model (iii) from Section III-A, respectively. The contents of this section are ordered as follows:

• First, we provide a new lower bound on the number of tests required for the problem of static group testing with non-i.i.d. probabilistic priors (Theorem 2). To prove Theorem 2, we use two intermediate results: (a) we show that any test design that "works" for given prior probabilities of infection (p_1, p_2, ..., p_N) also works for reduced prior probabilities (p'_1, p'_2, ..., p'_N) where p'_i ≤ p_i ≤ 0.5 ∀i.
In words, we essentially prove that group testing is easier when the infections are sparser (Theorem 1); and (b) we show the following interesting property of the optimal decoder (Lemma 3): if the optimal decoder correctly infers all the infection statuses when a set D is the set of infected individuals, then it will also correctly infer all the infection statuses when D' ⊂ D is the set of infected individuals.

• Second, we use simple asymptotic arguments to show that some existing group testing strategies (such as CCA [13] for non-identical priors and random testing for i.i.d. priors) are order-optimal for non-identical priors (Corollary 2) when p_max = O(p_min), where p_max is the maximum entry of (p_1, p_2, ..., p_N) and p_min is the minimum entry. The order O(·) is with respect to the population size N.

• Finally, in Theorem 3, we bridge the gap between our dynamic testing problem formulation and the above static testing problem by showing that, under mild conditions on q_1 and q_2, the above two conditions on the prior vector are satisfied every day in the discrete-time SIR stochastic block model parameterized by (N, C, p_init, q_1, q_2). As a result, the existing group testing strategies discussed above are order-optimal even for the dynamic testing problem formulation considered, provided that we use a sufficient number of tests each day to identify all new infections for that day.

[Figure: on each day, identify all individuals who were infected in the community at the beginning of day t-1, isolate them, and perform non-adaptive tests to identify the individuals who are currently infected. Our problem: how many tests are needed?]

We first define some notation specific to this subsection. For a set D ⊆ [N] of infected individuals, U(D) denotes the binary vector with U_i = 1 if i ∈ D and U_i = 0 otherwise; basically, U(D) represents the vector notation for the set of infections given by D. Note that there is a one-to-one correspondence between D and U(D). We will use these two notations interchangeably based on convenience. For a test matrix G, we write G(U) for the vector of test results obtained when G is applied to the infection vector U, and P(U; p) for the prior probability of U under the priors p. The probability of error of a test matrix G and decoder R under priors p is denoted P_err(G, R; p). Given the test results y, the MAP decoder outputs

R_map(y; G, p) ∈ argmax_{D: G(U(D)) = y} P(U(D); p).

In case of ties, the MAP decoder will select the solution which comes earliest lexicographically. In words, the MAP decoder chooses the most likely configuration which explains the test results. We next show that the MAP decoder is the optimal decoder for a fixed G and p, i.e., the MAP decoder minimizes the probability of error amongst all decoders for any G, p.

Remark. Though the MAP decoder is optimal, it is unclear if the optimization problem corresponding to the MAP decoder can be solved efficiently. However, many heuristics such as belief propagation (see for example [17]) and random sampling methods exist which approximate the MAP decoder well. That said, in this work we use the MAP decoder only as a tool for theoretical analysis of the error probability.

Lemma 2 (Optimality of the MAP decoder). For a given test matrix G and priors p, the corresponding MAP decoder minimizes the probability of error for the test matrix under the given priors, i.e., P_err(G, R_map(·; G, p); p) ≤ P_err(G, R; p) ∀R.

We give the proof of Lemma 2 in Appendix B. Given the optimality of the MAP decoder, we will denote by P*_err(G, p) := P_err(G, R_map(·; G, p); p) the optimal probability of error corresponding to a given test design and priors. We also write E(D) for the indicator that the MAP decoder (for the given G and p) is erroneous when D is the set of infected individuals. We next prove a property of the MAP decoder, and this property will be used in the proof of our main result that follows. The following lemma says that it is easier for the MAP decoder to identify a sparser defective set.

Lemma 3. Consider a test matrix G and priors p. Suppose the corresponding MAP decoder is erroneous when identifying the defective set D. Then the MAP decoder is also erroneous when the set of defectives is D ∪ {j}, for any j with p_j ≤ 0.5, i.e., E(D ∪ {j}) ≥ E(D).

For the proof of Lemma 3, we refer the reader to Appendix C.
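For illustration only, here is a brute-force Python sketch (ours) of the MAP decoder described above; it enumerates all 2^N infection vectors, keeps those consistent with the test results, and returns the one with the largest prior probability, breaking ties in favor of the first (lexicographically earliest) candidate. It assumes 0 < p_i < 1 and is meant only to make the definition concrete, not as a practical decoder.

```python
import itertools
import numpy as np

def map_decode(G, y, p):
    """Brute-force MAP decoder for the noiseless OR model: among all infection
    vectors consistent with the pool results y, return the one with the largest
    prior probability; ties go to the candidate enumerated first, i.e., the
    lexicographically earliest 0/1 vector. Exponential in N -- illustration only."""
    N = G.shape[1]
    best_u, best_logp = None, -np.inf
    for bits in itertools.product((0, 1), repeat=N):
        u = np.array(bits)
        if not np.array_equal((G @ u > 0).astype(int), y):
            continue  # this configuration does not explain the test results
        logp = float(np.sum(u * np.log(p) + (1 - u) * np.log1p(-p)))
        if logp > best_logp:
            best_u, best_logp = u, logp
    return best_u

# tiny example: 5 individuals, 3 pools, individual 2 infected
G = np.array([[1, 1, 0, 0, 0],
              [0, 0, 1, 1, 0],
              [0, 1, 1, 0, 1]])
p = np.array([0.05, 0.05, 0.2, 0.05, 0.05])
y = np.array([0, 1, 1])          # the two pools containing individual 2 are positive
print(map_decode(G, y, p))       # -> [0 0 1 0 0]
```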
We next prove the main new result for the static case. In words, the following theorem says that the group testing problem only becomes easier when the infections are sparser. As a result, this allows us to lower/upper bound the group testing problem with non-identical priors by a group testing problem with identical priors.

Theorem 1. Consider a testing matrix G used with two different sets of priors p and p'. Further, let p'_i = p_i for every i ∈ [N], i ≠ j, and p'_j ≤ p_j ≤ 0.5. The two prior vectors are the same everywhere except at index j, where p' is smaller. Then P*_err(G, p') ≤ P*_err(G, p).

Proof. We prove this by showing that when the MAP decoder corresponding to the (G, p) pair is used as a decoder with (G, p'), the probability of error is always lower, i.e., P_err(G, R_map(·; G, p); p') ≤ P_err(G, R_map(·; G, p); p) = P*_err(G, p). As a result, the optimal decoder for the (G, p') pair has a probability of error not exceeding this quantity. Now we can express the probability of error for the MAP decoder of the (G, p) pair. For simplicity of notation in what follows, recall that E(D) is the indicator that this decoder errs when D is the defective set. Splitting the summation over defective sets into the cases j ∈ D and j ∉ D (step (a)), and pulling p_j and 1-p_j out of the respective summations (step (b)), we obtain

P_err(G, R_map(·; G, p); p) = p_j a + (1 - p_j) b,

where a sums the error indicators E(D ∪ {j}) and b sums the error indicators E(D) over all D ⊆ [N]\{j}, each weighted by the prior probability of the statuses of the individuals other than j. Similarly, one could express the probability of error for the same decoder with the pair (G, p') as

P_err(G, R_map(·; G, p); p') = p'_j a + (1 - p'_j) b.

From Lemma 3, we have E(D ∪ {j}) ≥ E(D) and hence a ≥ b. Since a ≥ b and p'_j ≤ p_j, one can verify that p_j a + (1 - p_j) b ≥ p'_j a + (1 - p'_j) b, and thus P_err(G, R_map(·; G, p); p) ≥ P_err(G, R_map(·; G, p); p'), concluding the proof.

Now one could repeatedly apply Theorem 1 to the prior vector p to conclude that any test matrix G can only do better on the reduced uniform prior vector p_min = (p_min, p_min, ..., p_min), where p_min := min_{i∈[N]} p_i. On the other hand, the test matrix G can only do worse on the prior vector p_max = (p_max, p_max, ..., p_max), where p_max := max_{i∈[N]} p_i. This is stated below without a formal proof.

Corollary 1. Consider a test matrix G and a prior vector p with p_i ≤ 0.5 ∀i. Then P*_err(G, p_min) ≤ P*_err(G, p) ≤ P*_err(G, p_max).

As a consequence of the above corollary, the number of tests required to attain a fixed (small) probability of error with prior vector p_min is not more than the number of tests required to attain the same probability of error with prior vector p. This observation allows us to use the lower bound on the number of tests for the case where the priors are identical. This is made precise in the following theorem.

Theorem 2. Consider the non-identical probabilistic priors model with prior vector p such that p_i ≤ 0.5 ∀i, and let p_min := min_{i∈[N]} p_i. Any non-adaptive test design whose probability of error vanishes as N → ∞ requires T = Ω(min{N, N p_min log N}) tests.

Proof. From Corollary 1, if a test matrix G achieves a probability of error ε on the prior vector p, the same test matrix achieves a probability of error not more than ε on the prior vector p_min = (p_min, p_min, ..., p_min). Any strategy that achieves a probability of error → 0 as N → ∞ with the prior vector p_min requires a number of tests equal to Ω(min{N, N p_min log N}). Thus, we need at least as many tests with the prior vector p.

As discussed in Section III-A, the entropy bound in Lemma 1 is an alternate lower bound on the number of tests needed for this problem. We note that the entropy bound might be greater or smaller than the term N p_min log N in Theorem 2. In particular, if p_i ≤ 1/2 ∀i, it is easy to see that ∑_{i=1}^N h_2(p_i) ≥ N h_2(p_min) ≥ N p_min log(1/p_min).
However, the term 1/p_min may be smaller or larger than N; thus our bound, which applies independently of the value of p_min (as long as p_i ≤ 0.5), cannot be directly derived from the entropy bound, and could be either greater or smaller than the entropy bound. Having said that, the main advantage of the lower bound in Theorem 2 is its particular form, which allows the proof of order-optimality of several static group testing algorithms, as we will see in the next subsection.

Now, if the prior vector p is "bounded", in the sense that the maximum and minimum entries of p differ by a constant factor (constant with respect to N), then the lower bound can be rewritten in terms of the maximum entry of p or the mean of p; here we just use the fact that constant factors do not affect the order. We next make this corollary precise.

Corollary 2. Suppose the prior vector p is η-bounded, i.e., p_max ≤ η p_min for a constant η, and p_i ≤ 0.5 ∀i. Then any non-adaptive test design whose probability of error vanishes as N → ∞ requires T = Ω(min{N, N p_max log N}) tests; equivalently, T = Ω(min{N, N p_mean log N}), where p_mean := (1/N) ∑_{i=1}^N p_i.

Suppose p is η-bounded and each p_i ≤ 0.5. The following non-adaptive algorithms can be proved to be order-optimal with respect to the lower bound in Corollary 2:

• The Coupon Collector Algorithm (CCA) from [13] for the prior vector p, as discussed in Section III-A, achieves a probability of error less than 2N^{-δ} with a number of tests less than 4e(1+δ) N p_mean log N (see Theorem 3 in [13]). As a result, w.r.t. the lower bound in Corollary 2, either CCA is order-optimal (if N ≥ N p_mean log N) or individual testing is order-optimal (if N ≤ N p_mean log N).

• As discussed in Section III-A for the group testing problem with identical priors (say every item is infected with the same probability p'), a variety of randomized and explicit algorithms have been proposed which achieve a vanishing probability of error with a number of tests O(N p' log N). From Corollary 1, any test matrix that achieves a vanishing probability of error with the uniform prior p_max should also attain a vanishing probability of error with p, and as a result O(N p_max log N) tests are sufficient for the prior vector p. Consequently, w.r.t. our lower bound in Corollary 2, any of these designs is order-optimal (if N ≥ N p_max log N) or individual testing is order-optimal (if N ≤ N p_max log N).

Given the discussion above, we next show conditions under which the prior probabilities of infection each day (these change every day) are η-bounded and are each not more than 0.5. If these two conditions are satisfied every day for our discrete-time SIR stochastic block model set-up in Fig. 3, then CCA and the other algorithms discussed in Section IV-B are order-optimal for our dynamic testing problem formulation (see Fig. 3). We first define some notation, building upon the notation in Section III-C:
• p_max^{(t)} := max_j p_j^{(t)}, the maximum probability of new infection on day t.
• p_min^{(t)} := min_j p_j^{(t)}, the minimum probability of new infection on day t.

Theorem 3. Consider the set-up of Fig. 3, where the infections follow the discrete-time SIR stochastic block model (N, C, q_1, q_2, p_init) with q_1 ≥ q_2. (i) Under a mild assumption on (q_1, q_2), p_max^{(t)} ≤ 0.5 on every day t. (ii) Under an additional assumption on the parameters, p_max^{(t)} / p_min^{(t)} ≤ η for a constant η, and as a result the prior vector for each day is η-bounded.

Proof. We prove (i) first. Starting from the expression for p_j^{(t)}, we obtain a chain of inequalities in which (a) uses the fact that the total number of infections inside a community and overall cannot be greater than C and N, respectively; (b) follows from the algebraic inequality (1+x)^y ≥ 1 + xy if x ≥ -1 and y ∉ (0, 1); and (c) uses our assumptions about q_1 and q_2, yielding p_max^{(t)} ≤ 0.5.

We next prove (ii). Since q_1 ≥ q_2 in our model, we can upper bound p_max^{(t)} in terms of max_j I_j^{(t-1)}, which is simply the maximum number of infections over all communities (inequality (3)). Likewise, we can lower bound p_min^{(t)}, where step (a) uses q_2 ≤ q_1 (inequality (4)).
Combining (3) and (4), we can bound the ratio p_max^{(t)} / p_min^{(t)}, where (a) follows from the following facts: the function f_1(x) = (1-κx)/(1-x) is increasing for κ ∈ (0, 1), the function f_2(x) = (1-q_2)^x is decreasing for q_2 ∈ (0, 1), and the sum ∑_j I_j^{(t-1)} is lower bounded by max_j I_j^{(t-1)}; and (b) follows from the fact that the function f_3(x) = (1-(1-q_1)^x) / (1-(1-q_2)^x) is decreasing in x ≥ 1 for q_1 ≥ q_2, and therefore the maximum of the ratio is obtained for max_j I_j^{(t-1)} = 1. All proofs of the above statements are provided in Appendix D.

Finally, we make three remarks related to the results introduced in this section.

Remark 1. Both assumptions (i) and (ii) on the parameters in Theorem 3 hold true when the number of communities is a constant, i.e., the size of each community is C = Θ(N) (as is the case when the population is well-mixed, or if we just consider a single community); assumption (i) does not require C = Θ(N). In our simulations, we observed empirically that assumption (ii) also holds when C ≪ N; however, we do not have a formal proof of Theorem 3 for this case.

Remark 2. Our results hold not just for the specific model introduced in Section III (where in particular we assume symmetric intra- and inter-community transmissions) but for any underlying community structure where the two conditions (bounded prior vectors and the value of each prior not exceeding 1/2) are satisfied. For example, one could have a model where an infected individual can transmit the infection only to a subset of his/her fellow community members with probability q_1 (and cannot transmit to the rest of the individuals in his/her community) and only to a subset of individuals outside his/her community with probability q_2. For this example model, the conditions in Theorem 3 are sufficient to prove the two requirements on the prior vector.

Remark 3. Intervention is crucial for our results to hold. Without intervention in our dynamic model, many of the prior probabilities would be greater than 1/2 and our requirements on the prior vector would not be satisfied.

V. NUMERICAL EVALUATION

In this section, we show illustrative numerical results on the necessary and sufficient number of tests required for the discrete-time SIR stochastic block model. We next describe the experimental set-up.

• We simulate multiple instances (or trajectories) of the pipeline in Fig. 2, where the infections follow the discrete-time SIR stochastic block model (N, C, p_init, q_1, q_2), and for different testing strategies. We simulate 200 trajectories and, in Fig. 4, plot the daily average of the quantities across these trajectories.
• For each of these testing strategies, we empirically find the number of tests required on each day to identify all the infections of the previous day. To do this, on each day of a given trajectory, we start with 1000 tests and decrease this number (at a certain granularity) until the testing strategy makes a mistake. The smallest number of tests for which the strategy worked is plotted.
• As a benchmark, we also plot the entropy lower bound of Lemma 1 each day, regardless of whether the conditions required for Theorem 3 hold or not. The reason we use the entropy bound instead of our lower bound in Theorem 2 is that the entropy bound was numerically observed to be larger. Indeed, the lower bound in Theorem 2 contains some accompanying hidden constants which are small for our particular choice of N.

We compare the following testing strategies in our numerical simulations.
• Complete testing. We test every non-isolated individual remaining in the population each day.
• Coupon Collector Algorithm (CCA) from [13].
We showed the order-optimality of this algorithm for the dynamic testing problem at the beginning of Section IV-C. In short, on each day, the CCA algorithm constructs a random non-adaptive test design which depends on p_j^{(t)}. The idea is to place items that are less likely to be infected in a larger number of tests, and vice versa. We refer the reader to [13] for the exact description of the algorithm.
• Random group testing for max probability (Rnd. Grp. max.). Here we construct a randomized design assuming that each individual has a prior probability of infection p_max^{(t)}. From Corollary 1, such a design must also work for the actual priors p_j^{(t)}. We construct a constant column-weight design (see e.g. [32]) where each individual is placed in L = T / (N p_max^{(t)} log 2) tests. Such a test design achieves a vanishing probability of error with O(N p_max^{(t)} log N) tests (see for example [32] for a proof), and hence is order-optimal under the conditions in Theorem 3.
• Random group testing for mean probability (Rnd. Grp. mean). Here we construct a randomized design assuming that each individual has a prior probability of infection p_mean^{(t)}, defined as the mean prior probability of infection across all individuals. Unlike Rnd. Grp. max., there is no guarantee on how many tests are needed by such a design to identify the infection statuses of all individuals. However, the numerical results in Fig. 4 show that such a design requires fewer tests than the CCA or Rnd. Grp. max. designs.

The numerical results in Fig. 4 are illustrated for two different parameter settings of the discrete-time SIR stochastic block model. In both cases, we see that Rnd. Grp. mean requires the smallest number of tests to identify the infection statuses of all non-isolated individuals. Moreover, the number of tests required by all three testing strategies considered is much less than the number required by complete testing. In fact, the numerics in Fig. 4 indicate that if we use a number of tests equal to 1/5 of the number of tests required for complete individual testing, all these algorithms would achieve the same performance as complete individual testing, at least for the particular examples that we considered.

A natural follow-up question to ask is whether there is a systematic way to choose the number of tests that need to be administered, given the upper bounds discussed in Section IV-B. In Appendix E, we discuss one such heuristic and show that it achieves close-to-complete-testing performance.

VI. CONCLUSION

In this work we proposed the problem of dynamic group testing, which asks how to continually test given that infections spread during the testing period. Our numerical results answer the question we started with: in the dynamic testing problem formulation, given a day of testing delay, is it possible to achieve close-to-complete-testing performance with significantly fewer tests? The answer is yes, and in this paper we not only showed numerical evidence supporting this fact, but also gave theoretical bounds on the optimal number of tests needed to achieve it. Although we gave theoretical bounds on the optimal number of tests needed each day, many open questions remain. In particular, it would be interesting to study the same problem when tests are noisy, or when we can use other models of group tests such as the one in [35], or simply when one cannot perfectly learn the states of all individuals on a testing day.
In addition, it would also be of interest to study or use other test designs, for example the ones in [27]-[32], [34], [36]-[39]. Finally, it remains open to see how these results translate to the continuous-time SIR stochastic network model of [12].

VII. ACKNOWLEDGEMENTS

This work was supported in part by NSF grants #2007714, #1705077 and UC-NL grant LFR-18-548554. We also thank Katerina Argyraki for her ongoing support and the valuable discussions we have had about this project.

APPENDIX A

The well-studied continuous-time SIR stochastic network model from [12] has been the main motivation for our discrete-time SIR stochastic block model. In fact, the discrete-time SIR stochastic block model described in Section III-B can be considered as a discretized version of the continuous-time SIR stochastic network model over the weighted graph where two individuals belonging to the same community are connected by an edge with weight q_1, two individuals belonging to different communities are connected by an edge with weight q_2, and recoveries occur at rate r/day; i.e., an infected individual transmits the disease to a susceptible individual in the same community at rate q_1/day and to a susceptible individual in a different community at rate q_2/day.

In Fig. 5, we compare the continuous-time model above and the discrete-time model for a few example values of q_1, q_2 and r for illustration. [Fig. 5: number of infected individuals per day under both models, for q_1 = 0.02, q_2 = 0.001; q_1 = 0.008, q_2 = 0.0004; and q_1 = 0.008, q_2 = 0.0001.] We make a few observations:
• The progression of the disease in the discrete-time and continuous-time models, though not identical, follows a similar pattern, justifying the use of the discrete-time model.
• In both models, 1/q_1 is the expected time for an infected individual to transmit the disease to a susceptible individual in the same community, 1/q_2 is the expected time for an infected individual to transmit the disease to a susceptible individual in a different community, and 1/r is the expected time for an infected individual to recover.
• In the continuous-time model, an individual can get infected and recover on the same day, whereas this is not possible in our discrete-time model (infected individuals can recover starting from the day after they are infected).

APPENDIX B

The optimality of the MAP decoder is a standard result in statistics and signal processing. We nevertheless give the proof in the context of our problem, for completeness.

Lemma 2 (Optimality of the MAP decoder). For a given test matrix G and priors p, the corresponding MAP decoder minimizes the probability of error for the test matrix under the given priors, i.e., P_err(G, R_map(·; G, p); p) ≤ P_err(G, R; p) ∀R.

Proof. As stated at the beginning of Section IV, the probability of error for a (test matrix, decoder) pair under the given priors is

P_err(G, R; p) = E_Y[ P(U ≠ R(Y) | Y) ],

where Y is the set of test results. For the MAP decoder, the term inside E_Y is

P(U ≠ R_map(Y; G, p) | Y) = 1 - max_{U'} P(U = U' | Y).    (5)

Similarly, for any decoder R, we have

P(U ≠ R(Y) | Y) = 1 - P(U = R(Y) | Y) ≥ 1 - max_{U'} P(U = U' | Y).    (6)

Comparing (5) and (6) concludes the proof.

APPENDIX C

Lemma 3. Consider a test matrix G and priors p. Suppose the corresponding MAP decoder is erroneous when identifying the defective set D. Then the MAP decoder is also erroneous when the set of defectives is D ∪ {j} with p_j ≤ 0.5, i.e., E(D ∪ {j}) ≥ E(D).

[Fig. 6 also plots the performance when no one is tested (no testing) and when everyone is tested (Complete).]
• f_3(x) = (1-(1-q_1)^x) / (1-(1-q_2)^x) is decreasing for q_1 ≥ q_2, because of the following: Let c_1 = 1 - q_1 and c_2 = 1 - q_2, so that c_2 ≥ c_1. Then,

d/dx ln f_3(x) = (-c_1^x ln c_1)/(1 - c_1^x) - (-c_2^x ln c_2)/(1 - c_2^x) = (1/x) [ g(c_2^x) - g(c_1^x) ] ≤ 0,   (a)

where (a) follows from the fact that c_1 ≤ c_2 and the function g(c) = c ln c / (1-c) is non-increasing for c ∈ (0, 1). The latter can be seen by taking the derivative g'(c) = (ln c - c + 1)/(1-c)^2, which is always non-positive for c ∈ (0, 1), as ln c ≤ c - 1.

APPENDIX E

Given the results and discussion in Section V, a natural question to ask is whether one could use a number of tests based on the upper bounds discussed in Section IV-B. In particular, we focus on the upper bound for CCA, which implies that CCA achieves a probability of error less than 2N^{-δ} with a number of tests at most 4e(1+δ) N p_mean log N (see Theorem 3 in [13]). Note that the probability of error is small, but not zero, for finite values of N. Here, we use a number of tests equal to 12e N p_mean log N each day (corresponding to an error probability less than 2N^{-2}) and plot the number of errors made by each of the three test designs considered in Section V.

The experimental set-up is as follows: (i) We maintain an estimate of the probability p_j^{(t)} that a susceptible individual belonging to community j becomes infected on day t (see Section III-B for the precise definition), for each community j and for each day t. (ii) At the beginning of day t, we obtain the results of the tests administered on day t-1 and from these results form an estimate of the infection statuses of the tested individuals. (iii) We isolate the individuals who tested positive. (iv) We update the estimates of p_j^{(t)} for each community j, based on the identified infections and the current number of non-isolated individuals in the community. (v) We next construct a testing matrix with T tests and administer these tests; for complete testing, we use T = N^{(t)}, the total number of non-isolated individuals. (vi) Steps (ii)-(v) repeat each day.

Given the above set-up, Figure 6 compares the performance of the test designs described in Section V. We make a few observations:
• We see that the algorithms do not always attain the performance of complete testing. This is due to the fact that for finite N, the probability of error is non-zero. However, as seen from Figure 6, the performance of CCA improves as N increases. On the other hand, Rnd. Grp. max. has the opposite trend; this is not surprising, since the number of tests was chosen based on the upper bound for CCA and as a result there is no guarantee that the same number of tests is sufficient for Rnd. Grp. max.
• In comparison to the plots in Fig. 4, the number of tests used here is much higher during the initial few days, which indicates the looseness of the upper bound; it remains open to show tighter upper bounds for these algorithms.
• If we make an error when identifying the infection statuses on a particular day t, the estimates of p_j^{(t-1)} are not exact, which in turn leads to potentially insufficient choices for the number of tests needed on subsequent days and to inaccurate test designs. This drives error accumulation, and as a result the later days are more prone to error, as also seen in Figure 6.
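A minimal helper (ours) for the rule of thumb just described; we read the logarithm as the natural logarithm, matching the ln N in the CCA bound quoted in Section III-A, and cap the budget at complete testing.

```python
import math

def cca_test_budget(N, p_mean, delta=2):
    """Daily test budget suggested by the CCA upper bound of [13]:
    T = 4e(1+delta) * N * p_mean * ln N gives error probability < 2 N^{-delta}.
    With delta = 2 this is the 12e * N * p_mean * log N rule used above."""
    return math.ceil(4 * math.e * (1 + delta) * N * p_mean * math.log(N))

# example: N = 1000 individuals, mean daily prior 0.5%
budget = cca_test_budget(1000, 0.005)
print(budget, min(budget, 1000))  # never use more tests than complete testing
```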
REFERENCES

[1] Group testing against COVID-19
[2] Coronavirus test shortages trigger a new strategy: group screening
[3] Five people. One test. This is how you get there
[4] Group testing for SARS-CoV-2 allows up to 10-fold efficiency increase across realistic scenarios and testing strategies
[5] Tapestry: A single-round smart pooling technique for COVID-19 testing
[6] Variation in false-negative rate of reverse transcriptase polymerase chain reaction-based SARS-CoV-2 tests by time since exposure
[7] Population-scale testing can suppress the spread of infectious disease
[8] Population-scale testing can suppress the spread of COVID-19
[9] Frequency and accuracy of proactive testing for COVID-19
[10] Group testing: An information theory perspective
[11] An entropy reduction approach to continual testing
[13] Group testing with prior statistics
[14] Maximizing the spread of influence through a social network
[15] Contact tracing enhances the efficiency of COVID-19 group testing
[16] Safety-critical control of compartmental epidemiological models with measurement delays
[17] Group testing for connected communities
[18] Group testing for overlapping communities
[19] Noisy pooled PCR for virus testing
[20] Adaptive group testing on networks with community structure
[21] Group testing with a graph infection spread model
[22] Network group testing
[23] Graph-constrained group testing
[24] Sequential group testing with graph constraints
[25] Optimal non-adaptive probabilistic group testing requires Θ(min{k log n, n}) tests
[26] Optimal group testing
[27] A fast binary splitting approach to non-adaptive group testing
[28] Group testing algorithms: Bounds and simulations
[29] Boolean compressed sensing and noisy group testing
[30] Phase transitions in group testing
[31] Information-theoretic and algorithmic thresholds for group testing
[32] Performance of group testing algorithms with near-constant tests per item
[33] Individual testing is optimal for nonadaptive group testing in the linear regime
[34] Non-adaptive group testing: Explicit bounds and novel algorithms
[35] AC-DC: Amplification curve diagnostics for COVID-19 group testing
[36] Sublinear-time non-adaptive group testing with O(k log n) tests via bit-mixing coding
[37] Efficient algorithms for noisy group testing
[38] On the optimality of the Kautz-Singleton construction in probabilistic group testing
[39] SAFFRON: A fast, efficient, and robust framework for group testing based on sparse-graph codes

Proof of Lemma 3. If the MAP decoder is erroneous when identifying the defective set D, then either (1) there exists some set D' ≠ D such that G(U(D')) = G(U(D)) and P(U(D'); p) > P(U(D); p), or (2) there exists some set D' ≠ D such that G(U(D')) = G(U(D)) and P(U(D'); p) = P(U(D); p) and D' is lexicographically earlier than D. Hence the MAP decoder incorrectly identifies D' as the defective set given that D was the true defective set. We prove the result assuming that the first situation occurred; the proof follows identical arguments for the second situation. Now, we consider two different cases for the individual j that is added to form D ∪ {j}:

(i) If j ∉ D', then from our assumption in (1), notice that the defective set D' ∪ {j} explains the test results of D ∪ {j}: D' gives the same test results as D, and the extra individual j added to both sets will still give the same results. We next claim that P(U(D' ∪ {j}); p) > P(U(D ∪ {j}); p), and consequently the MAP decoder will fail to correctly identify the defective set D ∪ {j}. To prove our claim, we start with our assumption (1), i.e., P(U(D'); p) > P(U(D); p), and manipulate both sides, where in (a) we take out the term corresponding to j, using the fact that j ∉ D and j ∉ D'; (b) follows from multiplying both sides by p_j/(1-p_j); and in (c) we push the p_j term into the first product term, arriving at P(U(D' ∪ {j}); p) > P(U(D ∪ {j}); p).

(ii) If j ∈ D', we again first note that the defective set D' ∪ {j} = D' explains the test results of D ∪ {j}.
We next claim that P(U(D'); p) > P(U(D ∪ {j}); p), and consequently the MAP decoder will fail to correctly identify the defective set D ∪ {j}. To prove our claim, we again start with our assumption (1), i.e., P(U(D'); p) > P(U(D); p), and manipulate both sides, where in (a) we take out the term corresponding to j, noting that j ∉ D but j ∈ D'; (b) follows from the fact that 1-p_j ≥ p_j when p_j ≤ 0.5, so we can replace the (1-p_j) term on the right-hand side by p_j without affecting the inequality; and in (c) we push the p_j term into the first product term, arriving at P(U(D'); p) > P(U(D ∪ {j}); p).

APPENDIX D

In this section we prove some auxiliary statements about the functions f_1(x), f_2(x) and f_3(x) that are used at the end of the proof of Theorem 3:
• f_1(x) = (1-κx)/(1-x) is increasing for κ ∈ (0, 1), because f_1'(x) = -(κ-1)/(x-1)^2 > 0.
• f_2(x) = (1-q_2)^x is decreasing for q_2 ∈ (0, 1), because f_2'(x) = ln(1-q_2) (1-q_2)^x < 0.
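As a quick numerical sanity check (not a proof), the following Python sketch (ours) evaluates the functions above on a grid and verifies the claimed monotonicity; the expression used for f_3 is the one written at the corresponding bullet, and the parameter values are arbitrary examples.

```python
import numpy as np

q1, q2, kappa = 0.02, 0.001, 0.3  # example values with q1 >= q2 and kappa in (0, 1)

f1 = lambda x: (1 - kappa * x) / (1 - x)                   # claimed increasing on (0, 1)
f2 = lambda x: (1 - q2) ** x                               # claimed decreasing
f3 = lambda x: (1 - (1 - q1) ** x) / (1 - (1 - q2) ** x)   # claimed decreasing for x >= 1
g  = lambda c: c * np.log(c) / (1 - c)                     # claimed non-increasing on (0, 1)

def is_monotone(f, xs, increasing):
    """Check monotonicity of f on the grid xs, up to small numerical tolerance."""
    diffs = np.diff(f(xs))
    return bool(np.all(diffs >= -1e-12)) if increasing else bool(np.all(diffs <= 1e-12))

print("f1 increasing on (0,1):   ", is_monotone(f1, np.linspace(0.01, 0.99, 500), True))
print("f2 decreasing on [0,50]:  ", is_monotone(f2, np.linspace(0.0, 50.0, 500), False))
print("f3 decreasing on [1,50]:  ", is_monotone(f3, np.linspace(1.0, 50.0, 500), False))
print("g non-increasing on (0,1):", is_monotone(g, np.linspace(0.01, 0.99, 500), False))
```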