key: cord-0678108-4rr6152i
authors: Lin, Yifan; Ren, Yuxuan; Wan, Jingyuan; Zhou, Enlu
title: Group Testing During the COVID-19 Pandemic: Optimal Group Size Selection and Prevalence Control
date: 2020-08-15
journal: nan
DOI: nan
sha: 630c1e90c3cef1fa4d3da1307ca0612e840dd152
doc_id: 678108
cord_uid: 4rr6152i

Group testing pools multiple samples together and performs tests on these pooled samples to discern the infected samples. It greatly reduces the number of tests, however, with a sacrifice of increasing false negative rates due to the dilution of the viral load in pooled samples. Therefore, it is important to balance the trade-off between number of tests and false negative rate. We explore two popular group testing methods, namely linear array (a.k.a. Dorfman's procedure) and square array methods, and analyze the optimal group size of a pooled sample that minimizes the group false negative number under a constraint of testing capacity. Our analysis shows that when there is reasonably large testing capacity, the linear array method yields smaller false negative number and hence is preferred. When the testing capacity is small, square array method is more feasible and preferred. In addition, we consider testing a closed community in a period of time and determine the optimal testing cycle that minimizes the final prevalence rate of infection at the end of the time period. Finally, we provide a testing protocol for practitioners to use these group testing methods in the COVID-19 pandemic.

Group testing refers to the idea of pooling multiple samples together and performing tests on certain subsets of these samples to discern the infected samples. Due to the resource and time it takes to run the RT-qPCR test for COVID-19, it is nearly impossible to conduct individual test of everyone in a relatively large population. Instead, group testing provides a promising way to save the testing budget while detecting the infected samples out of a large population. Dating back to [1] , many different group testing methods have been developed and analysed over the years, such as those based on compressed sensing (see [2] , [3] ), and information theory (see [4] , [5] ). The COVID-19 pandemic reignites a great interest in group testing, both from academic research (e.g. [6] , [7] , [8] , [9] , [10] , [11] , [12] , [13] and [14] ) and from the public (e.g. [15] and [16] ).

To illustrate group testing, let's first take a look at a simple group testing method, called binary search. Suppose there are eight people to test and only one person is infected (unknown to us). Since we don't know there is only one infected person, we have to test them all in individual testing. In binary search, we first divide people into two groups of four. For the first four people, we pool their samples together and carry out one test. Suppose it is negative, we then confirm these four people are not-infected, and conduct the group testing for the second group of four people. Suppose the result for the second group of four people is positive, we further divide these four people into two halves, and repeat the above procedure. Here we use only six tests in binary search, which saves 25% number of tests compared with individual testing.

There are two paradigms of group testing, as illustrated in [17] . Combinatorial group testing (CGT) assumes a known number of infected people among the tested population. Probabilistic group testing (PGT) assumes that people test positive independently with probability p. From the perspective of sequences of testing, group testing can be classified as adaptive testing and non-adaptive testing. In adaptive testing, first a group is chosen randomly and tested, and the outcome of this test determines the next group to test and so on. In non-adaptive testing, a fixed number of tests are always performed irrespective of the number of infected samples present in the pool, so all the tests can run in parallel. In particular, [18] formulate it as a two-stage process. At the first stage, all individuals are arranged into several groups and each group gets tested. At the second stage, positive/negative result for each individual is deduced.

However, the aforementioned work usually assumes that testing result is accurate. In other words, when samples are pooled together and tested, if there is at least one infected sample, then the test result is always positive. Under this assumption, naturally the main goal for developing different group testing methods is to minimize the total number of tests. [7] gives a scheme for comparing several different group testing methods, and shows that the prevalence rate is the major factor that determines the number of tests of a group testing method. In other words, performance of group testing varies when the prevalence rate changes. [19] , [20] discuss the relationship between these quantities. However, when individual samples are pooled together, the viral load in the infected samples gets diluted and hence leads to false negative detection (i.e., infected samples test negative). We take this dilution effect into consideration, and consider how to optimize the group size to balance the trade-off between number of tests and false negative rate. On the one hand, the bigger the group size is, the less number of tests needed. On the other hand, a bigger group size also causes a larger false negative rate. When the group size is too large, the false negative rate will be so large that renders the group testing method useless. Therefore, we aim to choose the optimal pool size that minimizes the false negative number under a constraint of test capacity.

In addition, we consider testing everyone in a large population in a testing cycle of multiple days to help control the prevalence rate. To model the dynamics of the system, we proposed a testing-quarantine-infection model, where group testing is conducted at the very beginning of each day and people who test positive get quarantined, while the infection keeps spreading. This model requires the testing can be conducted in a short period of time, so we mainly consider non-adaptive group testing methods that can run in parallel. Specifically, we consider linear array and square array group testing methods that are easy to understand and implement in practice.

In summary, the contribution of this paper is two-fold:

• We take into consideration the dilution effect caused by pooling, balance the trade-off between the number of tests and the false negative number, and derive the optimal group size under a stochastic optimization formulation. • We design a testing protocol such that we can test everyone in a testing cycle under a limited testing capacity and keep the prevalence rate low in the population.

The remainder of the paper is organized as follows: Section 2 briefly introduces the dilution effect in group testing. Section 3 studies the linear array and square array methods, and derives the optimal group size that minimizes the false negative number under a constraint of daily testing capacity. Section 4 proposes a testing-quarantine-infection model for group testing, and determines the optimal testing cycle length through simulation. Sensitivity analysis is also conducted in this section with regard to different parameters. Section 5 concludes the paper with the complete testing protocol for practitioners.

Even though group testing is efficient in detecting the infected people using as few number of tests as possible, it is at the sacrifice of decreased sensitivity. In group testing, the false negative rate, defined as the probability of infected people being tested negative, comes from two sources.

• Swabbing. The swab used to take samples might not contain enough amount of the virus from an infected person. • Pooling. When pooling many samples into one sample, those infected samples are pooled with uninfected samples, which will dilute the viral load in the pooled sample and hence decrease the sensitivity of the test.

We focus on the second source of pooling dilution. The impact of pooling dilution has been studied in [21] , [22] , [23] etc.. To the best of our knowledge, pooling dilution particularly for SARS-CoV-2 has been studied in [17] and [10] , till the time of completion of this manuscript. More specifically, [17] utilizes a dilution effect model derived for testing HIV, and argues that it is applicable to other viral testings including SARS-CoV-2. Most recently, based on the mechanism of the RT-qPCR test and the clinical data for SARS-CoV-2 virus, [10] proposes a new statistical model for determining the false negative rate induced by pooling. The basic idea is to measure the false negative rate as the probability that the amount of virus contained in the pooling sample exceeds the detection limit of RT-qPCR test. Assume exactly one infected individual is pooled with N − 1 uninfected individuals to form a group of size N , the false negative rate caused by pooling is estimated as

where F µ k ,σ k (·), k = 1, 2, 3 is the cumulative distribution function (CDF) of normal distribution N (µ k , σ k ), k = 1, 2, 3, d cens is the parameter called detection limit. For more details, such as mathematical deductions, parameters settings, etc., we refer the readers to Appendix D and [10] . Figure 1 shows the relationship between pooling size N and the false negative rate γ. Note that in this so-called "uncensored model" in [10] , the false negative rate from the first source of swabbing is neglected. Remark 1. The results throughout the paper are based on the uncensored model in [10] . If more precise models for the pooling dilution effect become available, they can be used to replace the current model in our approach proposed in this paper.

Among the various group testing methods, we choose to focus on two methods: the linear array method and the square array method. These two methods are non-adaptive, easy to understand and implement in practice, and parallelable. We first introduce these two methods, and then evaluate and compare their performance. Specifically, we will compare the number of tests needed and the false negative number under different parameter regimes of testing capacity and initial prevalence rate.

In the linear array group testing, suppose the total number of people to test is N and the desired group size is n (n < N ). Note that the corresponding pool size is also n. We will form N n linear array groups and perform group testing on each group. Figure 2 shows a linear array of size n = 5. Note that if N is not divisible by n, N n of the groups are of size n and the last group is of size N − N n n. If a group tests positive, we will test each one in the group individually. Without the follow-up diagnostic tests, the non-infected samples in the positive group will be deemed as infected, resulting in false positive detection. Notice that in the linear array method, we need two samples from each person. If only one sample is collected, it will be split into two sub-samples, of which one is used for group testing and the other for possible follow-up diagnostic test.

Another group testing method is square array group test used by [12] . Note that this method is almost the same as double pooling proposed by [9] . In the square array group testing, suppose the total number of people to test is N and we consider the square array of size n × n. Pools of size n are created from each row and each column. We will form N n 2 square arrays of size n × n. A sample is deemed as suspicious if both its row and its column pools test positive. See Figure 3 for a toy example. Before we form each group, we need to swab three samples from the tested individual. If we only have one sample for each, it will be split into three sub-samples. One is used for row testing, one is used for column testing, and the remaining one is used for possible follow-up diagnostic test, when the person is deemed as suspicious in group testing phase. For the remaining N − N n 2 n 2 people, we will conduct individual tests. The follow-up diagnostic test is indispensable since the square array test may result in false positive number (defined as the number of not-infected samples that test positive). To illustrate, consider the case where only the sample labeled by '1' and the sample labeled by '13' are infected in the 5 × 5 square array in Figure 3 . It is very likely that there are four pools that will test positive, namely the first and the third row pools, and the first and the third column pools. If we do not perform the follow-up diagnostic tests, samples labeled by '3' and '11' will be deemed as infected, though none of them are infected actually. We formally define the following notations:

• Total number of people to test: N .

• Initial prevalence rate: p 0 . It measures the probability that an individual is infected.

• Pool size in group testing: n. It is the number of samples that are pooled together in one test.

• Maximum pool size:n. For the linear array test,n = N . For the square array test,n = √ N .

• Optimal pool size: n * . n * ≤n. The pool size which minimizes the expected total false negative number, subject to the constraint of the expected number of total tests.

• Number of pools: M P (n). This is also the number of tests conducted in group testing phase. For the linear array test, the number of pools is N n . For the square array test, the number of pools is 2n N n 2 . • Group size in group testing: g(n). For the linear array test, g = n. For the square array test, g = n × n.

• False negative rate caused by dilution effect: γ(n, d). It is a function of pool size n and number of infected d in a pooled sample.

• Testing capacity: C. It is the maximum number of tests we can conduct.

• Randomness of the sample placements and pool positiveness: (ω, ξ) ∈ (R N × R M P ). Random variable ω represents the placement of infected samples. ω i = 1 means the i th sample is infected. Random variable ξ represents the positiveness of each pool, and hence ξ is dependent on ω. The reason we need the random variable ξ is that, given ω, we still cannot tell the test positiveness of a pool, because of the dilution effect. ξ i = 1 means pool i tests positive, ξ i = 0 means pool i tests negative.

• Number of tests conducted individually: M I (n, (ω, ξ)). Suppose we have a realization of random variables (ω, ξ). For the linear array test, it tells us the numbering of pools that test positive. For the square array test, it tells us the numbering of samples at the intersection of positive rows and columns. Construct a suspect set S that contains the above numbers. For linear array test, M I would be |S|. For square array test, M I would be |S| + (N − N n 2 n 2 ). • Number of follow-up diagnostic tests followed by a single group test: m(n, (ω, ξ)).

• Number of total tests: M (n, (ω, ξ)). M = M P + M I .

• Number of people that test positive: I(n, (ω, ξ)). Given the set S and ω, I = i∈S 1{ω i = 1}.

• Number of infected people: D. D follows a binomial distribution with parameters N and p 0 .

• false negative number: f (n, (ω, ξ)) = D − I.

• Value-at-Risk: VaR (·) [·]. VaR q [·] can used to measure the extreme quantile q. VaR q [Z] := inf{t : P(Z ≤ t) ≥ q}, where 0 < q < 1.

In the following sections, we assume that the events that each individual gets infected are mutually independent. In addition, we suppress the randomness (ω, ξ). We use superscripts 'L' for the linear array method, 'S' for the square array method, and 'B' for the benchmark of purely individual testing. For subscripts, 'I' stands for individual test, 'G' stands for group test. Finally, in order to find the optimal group size, we will first find the optimal pool size n * , and the optimal group size would be g(n * ). The following subsections will focus on finding the optimal pool size.

In this subsection, we evaluate the expected number of tests and the expected total false negative number in the linear array group testing method. We leave the details of derivation for the following to Appendix A.

Denote by E m L (n) the expected value of follow-up diagnostic tests for a single linear array group test of size n.

Denote by E M L (n) the total expected number of tests for the linear array method. Note E M L (n) is lower bounded by N n , the total number of groups.

Denote by E f L G (n) the expected false negative number for a single linear array group test of size n.

For the follow-up diagnostic tests following a single linear array group test of size n, the expected false negative number is

The expected total false negative number, including all follow-up diagnostic tests due to N n arrays of size n and one array of size (N − N n n), is

The closed-form expressions above show the expected values of the total number of tests and false negative number. To see the variability of these quantities, we run simulations to plot the empirical distributions, as shown in Figure 4 . 

In this subsection, we evaluate the expected number of tests and the expected total false negative number in the square array group testing method. We leave the details of derivation for the following to Appendix B.

Denote by E m S (n) the expected value of follow-up diagnostic tests for a single square array of size n × n.

As for the expected total number of tests, denote it by E M S (n) . Note that E M S (n) is lower-bounded by 2n N n 2 + (N − N n 2 n 2 ), which is the number of tests for all row and column groups as well as all individual tests for the remaining samples outside arrays.

Denote by E f S G (n) the expected false negative number for a single square array of size n × n.

Denote by E f S I (n) the expected false negative number for the follow-up diagnostic tests following a single square array of size n × n.

The expected total false negative number, including all tests due to the N n 2 square arrays and the remaining ones outside the arrays, is: Figure 5 shows the empirical distribution of the total number of tests and the distribution of the false negative number for the square array method. Notice that in all the group testing methods, the total expected number of tests and total expected false negative number depend on the pool size. For the purpose of comparing the methods, we will consider their total false negative number under the same testing capacity, in a stochastic programming formulation. 

In this subsection, we will measure the performance of group testing methods (linear array and square array) with respect to the total false negative number. The objective is to minimize the expected total false negative number, subject to the expected total number of tests not exceeding a given testing capacity. Recall that in Section 3, the total number of tests is M (n), the false negative number is f (n). With the same notations, we formulate the problem of selecting the optimal pool size as follows:

To solve (5), we make use of the closed-form expression for the expected total number of tests (i.e., (1), (3)) and find the feasible region of decision variable n, and evaluate the objective function using the closed-form expression for the expected total false negative number (i.e., (2) , (4)).

Under the above formulation, we compare the aforementioned two group testing methods against the benchmark individual testing. Fix the population size to be N = 10000. Under different values of the initial prevalence rate p 0 and Figure 6 : Comparison of false negative number between linear array group testing, square array group testing, and benchmark individual testing. under different settings of p 0 and C.

testing capacity C, we find the optimal pool size for each method and the corresponding total false negative number.

Since the expected total infected samples is the same for both group testing methods, higher expected false negative number also means higher false negative rate. As a benchmark, we randomly select C samples from the total N samples to conduct the individual testing, ignoring the rest. The expected false negative number of benchmark individual testing can be approximated by E f B ≈ (N − C)p 0 . We say a group test method is in its infeasible region if there is no pool size n such that the expected number of tests required is less than C. Figure 6 shows the expected false negative number under optimal pool size for all three methods, leaving the infeasible region blank. In summary, both methods perform better than the benchmark individual testing. The square array group testing has larger feasible region than the linear array group testing does, since the former requires less number of tests on average. However, within its feasible region, the linear array group testing has uniformly less false negative number, and thus smaller false negative rate, than the square array group testing. To see how the testing capacity C affects the performance of both group testing methods, we fix p 0 = 0.001 and show more details of how the test capacity affects the performance of testing result for all methods. Figure 7 provides the relationship between the pool size and the total number of tests, as well as the relationship between the pool size and the total false negative number for both methods. We obtain VaR 0.95 [·] (orange curves) by running simulations 5000 times, and obtain E[·] (blue curves) by expressions (1) to (4) . In the two graphs on the left side of Figure 7 , the pool sizes corresponding to the test numbers below the testing capacity C (i.e., the horizontal lines) are feasible. In the two graphs on the right side of Figure 7 , we find the minimal total false negative number for the feasible pool sizes. Take the square array method for example. Setting C = 300 gives optimal pool size n * = 100 and minimal mean total false negative number f * = 5.26. infeasible  infeasible  infeasible  infeasible  200 infeasible  infeasible  infeasible  infeasible  infeasible  300 infeasible  infeasible  infeasible  infeasible  infeasible  400 infeasible  infeasible  infeasible  infeasible  infeasible  500 infeasible  infeasible  infeasible  infeasible  infeasible  600 25 Table 3 : Benchmark individual testing. p 0 = 0.001 Table 1 and Table 2 summarize more details of testing results for the linear array group test and the square array group test, respectively. In general, if we increase the testing capacity, we will have more feasible solutions, and smaller mean total false negative number. Again, we calculate the benchmark false negative number by randomly selecting C from the N samples for benchmark individual testing, ignoring the rest. Table 3 shows the benchmark result for comparison. Intuitively, given the same pool size for the two group testing methods, the square array method will have larger false negative number because each sample in the square array method will be tested in both the row pool and the column pool. Because of the dilution effect, the probability that infected sample will be tested negative in the square array group test is relatively larger. In conclusion, if we are given a relatively large test capacity, we should use the linear array method. On the other hand, if we face the shortage of testing kits and only have a relatively small test capacity, we prefer using the square array group test. Remark 2. We also notice a widely applied group testing method called general binary search (GBS). GBS is an adaptive group testing method which takes a long time to conduct, and the number of swab samples from each individual is large and uncertain. Therefore, we leave the details of GBS to Appendix C for those who are interested. Remark 3. In case there are no closed-form expressions for the objective function and constraints, we can also apply sample average approximation (SAA) to both constraints and objective function, and solve the approximate deterministic problem in order to find the feasible region of decision variable n and evaluate the objective function to find the optimal pool size within the feasible region. We refer the readers to [24] and [25] for the application of SAA to stochastic programming. Remark 4. We also consider some other performance metrics with regard to the total false negative number and the total number of test, from the risk-averse perspective. For example, the following two sets of performance metrics can be applied.

In this section, we consider testing a large population in a closed community such as college or nursing home. Due to limited daily testing capacity, we can only test the whole population in a testing cycle of multiple days. The daily model in Section 3 chooses the optimal group size to minimize the group false negative number in a day, and the group false negative number will affect the number of people quarantined, which further impacts the prevalence rate at the next day. The influence will propagate to eventually impact the final prevalence rate. Therefore, we propose a testing-quarantine-infection model for this scenario. Before we describe this model, we first make the following assumptions:

• Among untested population, people are randomly chosen to have the test.

• We use a simplified model for the infection process, where prevalence increases exponentially without intervention. • We assume the tests are conducted in the morning, and results can be revealed immediately. Following test results, people who test positive are assumed to be quarantined, either by self-quarantine or hospitalization. Thus, those who test positive will be removed from the whole population in the morning right after the testing. • We assume that people are within a closed community with no infections being imported. As a consequence, once all infected individuals are quarantined, the pandemic ends. • We assume that there are no false positives.

We introduce more sets of notations. The first set of notations are the static quantities during the T -day testing:

• N total : total number of individuals in the closed community.

• T : length of time period for group testing.

• l: testing cycle length. Note that l ≤ T .

• α: daily growth rate of infection.

• p 0 : initial prevalence rate at the beginning of the time period.

• C: daily testing capacity.

Note that the last two notations are the same as those defined in Section 3. The next set of notations are quantities measured before the testing stage of each day:

• N test t : number of people to test in day t. For a testing cycle of l days for N total people, we have

infected ratio among those who haven't been tested.

We consider testing N total people in l days, and each day we test N test t people. At the beginning of day t, we randomly choose N test t people from those who have not been tested yet, and conduct group testing (linear array or square array methods) under a limited test capacity. After group testing, we mark those who already get tested, and quarantine those who test positive, and return the rest of the people back to the population. People who have been tested will not be tested again in this testing cycle. We divide one day into two stages, namely testing stage and infection stage. At the testing stage, people get tested and those who test positive will be quarantined accordingly. At the infection stage, people who are infected will continue to infect the susceptible people in the population.

The following set of notations are quantities measured before the testing stage of each day:

• γ t : group false negative rate calculated in day t. It is calculated as the false negative number divided by the number of infected people in the tested population in day t. • y T I t : increased number of individuals who are tested and infected at day t. • y T Q t : increased number of individuals who are tested, infected and quarantined at day t. • y T N Q t : increased number of individuals who are tested, infected but not quarantined at day t.

• y T N I t : increased number of individuals who are tested but not infected at day t.

The following set of notations are quantities measured after the testing stage of each day:

• N t : remaining population size, i.e., number of individuals that have not been quarantined after the day t; Note N 0 = N total . • Y T I t : number of individuals who are infected and have been tested at day t. Note that for the above notations with subscript t, a notation with t = 0 denotes the corresponding quantity at initial status, if applicable. The following algorithm shows the testing-quarantine-infection model, and outputs the optimal testing cycle length as well as the optimal pool size for each day within the cycle. Note that we use B(N, p) to denote the binomial distribution with parameter N and p. Details and analysis of the algorithm are presented in the following subsections.

Algorithm 1: Testing-quarantine-infection model. input :total population N total , initial prevalence rate p 0 , infection rate α, testing capacity C output :optimal testing cycle length l * , optimal pool size n * t , t = 1, · · · , l * for l ← 1 to T do for replication ← 1 to 100 dô

for t ← 1 to min(l,T ) do before the testing stage; solve (5) with input p t , N test t ,C; find the optimal pool size n * t and the optimal false negative rate γ t ; infected ratio among those not tested:

; tested, infected, and quarantined: ; T =T − t; end end record the final prevalence rate at the end of the given time period; end compute the average final prevalence rate at the end of the given time period; end return the optimal testing cycle length l * that yields the smallest average final prevalence rate; return the optimal group size n * t , t = 1, · · · , l * associated with the optimal testing cycle length;

Given all the pre-determined parameters, which are denoted precisely by the first set of notations in Section 4.1, we consider the factor that influences the prevalence rate. By the definition of p t , we have

With the testing-quarantine-infection model described in Section 4.1, we can analytically compute the final prevalence rate p T as follows:

Eq.6 shows that an increase in quarantining infected individuals, i.e., y T Q t for any t = 1, 2, · · · , T leads to a decrease in N T . Hence, both the first and the second terms in Eq.6 will increase. However, it can be shown that an increase in y T Q t will actually lead to a decrease in the final prevalence rate. To formalize this, let y T Q t = y T Q t + k t , t = 1, 2, · · · , T , where k t ∈ N and p T denote the corresponding final prevalence rate. Then we have

We leave the details of the proofs for Eq.7 to Appendix E. Eq.7 implies that it suffices to focus on y T Q t , t = 1, 2, · · · , T when considering which factors affect final prevalence rate p T . From the above argument, we know that the larger the number of quarantined, the lower the final prevalence rate will be. This result is consistent with the common sense in that the most effective approach for controlling the spread of virus is to quarantine the infected. Note that if testing cycle length l becomes smaller, the daily test number will have to increase, but the false negative rate γ t will increase as well. An increase in the testing number is likely to increase y T Q t , but an increase in the false negative rate decreases y T Q t . Formally, in one testing cycle for N t0 people(t 0 = 0, l, 2l, · · · , [ T l ]l):

Due to this trade-off between the number of tests and the false negative rate, it is important to select the optimal cycle length l to minimize the final prevalence rate. To this end, we compare different cycle lengths(l = 1, 2, · · · , T ) by simulating the above test-quarantine-infection model over a time period of T days. The details and results are described in the next subsection.

In this subsection, we use the square array method, since it uses less number of tests and has a relatively higher accuracy, according to our findings in Section 3.3. We also assume that before the testing, people go through pre-screening, such that the initial prevalence rate p 0 would be kept low (for example 0.1%).

To find the optimal testing cycle length, we compare the final prevalence rates of different testing cycle lengths by simulating the test-quarantine-infection model till the end of T days. When the testing cycle length l is less than T , we run multiple cycles till T days. For every cycle length l, we run 100 simulation replications and average the final prevalence rates over these replications. In an iteration of this T -day time period, if at some day the test capacity constraint can not be satisfied for all possible group sizes, then this iteration will not be recorded.

We set T = 7, N total = 10000, p 0 = 0.001, α = 1.26, C = 300. We consider testing cycle length l ranges from 1 to 7 to make sure at least one complete testing cycle can be done, which means everyone in the population is tested at least once. Given values of parameters N total , α, p 0 , C and l, the simulation is carried out in two stages for each day as mentioned before: the testing stage and the infection stage. We initialize each simulation run as follows:

During day t, 1 ≤ t ≤ 7, we first randomly select N test t individuals who have not been tested so far, then we apply square array tests on these N test t individuals with different pool size to obtain the optimal pool size n * , which minimizes the expected group false negative detection(as expressed in Eq.4) under the daily testing capacity C. The corresponding average number of total tests for pool size is calculated according to Eq.3. Based on the false negative rate, we obtain values of

In the infection stage, we sample newly infected individuals from those who have not been infected yet, based on the infection rate α. Then we update X T I t , X T N I t , X N T I t , X N T N I t and finally, p t . As a benchmark, we also simulate using the individual testing method, which randomly select and test C individuals each day from people who have not been tested so far.

We keep track of the prevalence rate each day within the 7-day time period. Figure 8 shows the prevalence rate under different cycle lengths l and that of the individual testing. The cycle length l = 1 is not feasible due to the daily testing capacity, so it is not included in Figure 8 . The estimated final prevalence rates, total number of quarantined individuals and total number of testing are listed in Table 4 . The optimal pool size and the number of tests each day are shown in Figure 9 . Table 4 : The estimated final prevalence rate, total number of quarantined, and total number of tests for square array group testing with different cycle length l and the benchmark individual testing. From Figure 9 , we can see that the square array method leads to a much lower final prevalence rate than the individual testing. Group testing with cycle length l = 2 or 3 lowers the prevalence rate over time, while cycle length 4 − 7 keeps the prevalence rate almost steady. In contrast, the prevalence rate gets out of control when individual test is used. In the current parameter setting, the optimal testing cycle length is 2 days, leading to a final prevalence rate 0.00022 at the end of 7 days, which is about 80% lower than the initial.

When l increases, the population get tested in each day N test t = N total l decreases. Therefore, the optimal pool size decreases as l increases, as the left of Figure 9 suggests. Compared with the individual testing, group testing uses much less number of tests each day. Table 4 shows that the number of tests is minimized when l = 4. We notice that there are sudden increases of number of tests at day 3 for l = 2, and at day 5 for l = 4. These two jumps are corresponding to the increase of the optimal pool size. The reason for this kind of jump is that when determining the optimal pool size n * under a given test capacity, the feasible region for the pool size n contains discrete integers. Hence the optimal pool size may jump from one integer to another due to some tiny change of the prevalence rate from day to day. In the left of Figure 10 and table 4, we see that the number of newly quarantined individuals each day decreases as the testing cycle length l increases. Recall that the final prevalence rate is also monotonically increasing with respect to l, we confirm our analysis in Section 4.2 that an increase in the number of quarantined each day, y T Q t , leads to a decrease in the final prevalence rate. Furthermore, it is interesting to observe that the optimal cycle length of 2 days initially has the largest false negative rate because of the large pool size, but the false negative rate drops quickly over time and becomes comparable with other cycle lengths. Under current parameter setting, it turns out that when l is small, the larger tested population offsets the disadvantage brought by higher false negative rate, leading to the result that more infected individuals get quarantined and hence less virus carrier in the population. We conduct sensitivity analysis on the simulation output with respect to three key parameters p 0 , α, C, and our decision, the optimal cycle length l. We test for the cases p 0 = 0.0005, 0.0015, α = 1.16, 1.36, and C = 200, 400, respectively. The results for final prevalence rates are shown in Figure 11 , 12, 13. It turns out that small changes of p 0 and α bring significant changes to the final prevalence rate. In contrast, it seems that varying C slightly only has a small impact on the prevalence rate. This is because there exist several ranges of values of C such that we have the same optimal pool size within the range, as it is shown in Table 2 From the above results, it is interesting to see that the optimal cycle length seems to be robust with respect to the parameters p 0 , α, C. In all scenarios that we simulate, it turns out that l = 2 is always the optimal cycle length for controlling the prevalence rate. Moreover, among all the feasible cycle lengths, a smaller optimal cycle length always have a better performance on controlling the virus than a bigger one.

Based on these observations, we have two remarks for the robustness of the optimal pool size with respect to p 0 , α, C. Remark 5. Though the false negative rate γ t is relatively high when l = 2, the testing population size N test t each day is large. The large testing population size will dominate the number of quarantined each day. In Figure 14 , although the number of newly quarantined when l = 2 is not always the largest, it is common that the number of newly quarantined in the first couple of days is the highest among all scenarios, and more quarantined individuals in the beginning will be helpful to control the spread of virus. Remark 6. We note that even though the false negative rate is higher when l = 2, the infected individuals that are deemed to be negative have more opportunities to be correctly diagnosed later on. Specifically, if an infected individual tests negative in the first run, it will have the chance of getting retested for three times when l = 2. However, it will have at most two more tests when l = 3. In the worst case, it will no longer have opportunity to get more test when l = 7. Hence, with a smaller testing cycle length, the false negatives have a higher chance of being corrected in later runs of the testing procedure, which leads to more quarantined individuals, hence lower final prevalence rate.

We consider group testing for a relatively large community during the COVID-19 pandemic. In particular, two non-adaptive group testing methods are considered, namely linear array and square array methods. We take into consideration the dilution effect that will increase the false negative rate in the group testing, and derive the optimal pool size that minimizes the daily total false negative number, under a constraint of testing capacity. In addition, we incorporate the daily model into a testing cycle, propose a testing-quarantine-infection model, and compute the optimal testing cycle length that minimizes the final prevalence rate at the end of the given time period. We find that under certain parameter setting, the shorter the testing cycle length is, the more infected people will be quarantined, and it leads to a lower final prevalence rate in spite of the increased false negative number. The sensitivity analysis shows that the simulation output is sensitive to the infection rate and the initial prevalence rate, while less insensitive to the testing capacity within a certain range.

The testing protocol is summarized as follows.

Algorithm 2: Testing protocol. Input: total population N total , estimate of initial prevalence rate p 0 , estimate of infection rate α, testing capacity C; Run the testing-quarantine-infection model (i.e., Algorithm 1); Output: optimal testing cycle length l * , optimal pool size n * t , t = 1, · · · , l * ; Test the total population in l * days, hence test N total l * in each day; while t ≤ l * do

In day t, take three swabs from each individual; Form N total l * (n *

The average number of tests for a single group of size n is E m L (n) . 

The total expected number of tests for the linear array group test is:

The average number of false negatives for a single group test of size n is E f L G (n) . 

The total expected number of false negatives for the linear array group test is:

B Derivation for the Square Array Group Test

The average number of tests for a single group of size n is E m S (n) . 

Where: 

P(A :,j tests positive | d of A :,j contain virus) 

The total average number of false negatives for square array test is:

General binary search (GBS) is the generalization of the binary splitting procedure (BSP). BSP aims to identify exactly one of the infected people in total sample N . BSP algorithm can be found in Appendix C.1. GBS simply attempts to perform the BSP δ times to identify at most δ infected samples in a given population of size N . Note that δ plays the similar role as pool size n in the square array test. With small δ we may underestimate the number of infected samples. In such case, GBS stops searching after finding δ infected samples, causing false negative in group test. We recommend to test the remaining group at the end of GBS. If we already find δ infected samples and the remaining group tests positive, we should conduct individual tests for the remaining group. If δ is too large, we would have small pool size in GBS test, which leads to inefficient binary searching but also small false negative rate in group test. Since the closed form solution for the expected total number of test needed for GBS is hard to derive, we resort to Monte Carlo simulation. The distribution of the total number of tests and the distribution of the false negative number are shown in the first two graphs of Figure 15 . The relationship between δ and the total number of tests, as well as the total false negative number, is shown in the last two graphs of Figure 15 . As δ increases, the total number of tests will increase and the false negative number will decrease, since the pool size will be smaller. GBS performs badly in our setting, due to the high splitting dilution effect, since we need to split each sample into C sub-samples(It is impossible to collect C sample from the subject all at once). In addition, GBS is adaptive, and we have to wait until the previous test result is revealed to continue the whole procedure. The long time it takes to conduct GBS group test makes it hard to implement in practice. Update N = N − |G| ( |G| is the number of uninfected items in G , remove these from the pool); end end if N > 0 and δ > 0 then Test the N samples individually; end

In Section 2 we mentioned a new proposed model about the false negative induced by pooling proposed by [10] . Here we briefly introduce some detailed information about this model. First, they examine how the RT-qPCR test is conducted.

• The sample is treated to transcribe the target RNA sequence into DNA sequence.

• The sample is placed in a PCR machine, which can measure the concentration by making the target DNA fluorescent. • A reaction is made to approximately double the DNA sequence. This is called a cycle of amplification.

• A time series of the concentration of DNA over time is recorded and then a linear regression is applied on this data. The initial DNA concentration is the linear regression value at the origin. • The return value of the test corresponds to − log 2 of the initial number of viral DNA in the sample.

Denote by C t the − log 2 of C, the number of virus DNA contained in a sample, which is interpreted as the number of amplification cycles needed to make the intensity reach a threshold. Besides, there is a parameter d cens , which is called the limit of detection, meaning that one positive sample will not be detected if its C t value is no less than d cens . Compared with the single individual case, the intensity curve of the pooled case is pushed rightward. Consequently, the number of amplification cycles to reach the threshold becomes large, which is more likely to be overpass the limit of detection, and that is why pooling operations will increase the false negative rate from the perspective of methodology.

Next, C t is modeled as a random variable X. Based on the re-simulated data derived by the clinical data in [26] , the distribution of X is established as follows, which is referred to as mixture model.

where the parameters are estimated as d cens = 37.2, π 1 = 0.33, π 2 = 0.54, π 3 = 0.13; µ 1 = 20.13, µ 2 = 29.41, µ 3 = 34.81; σ 1 = 3.60, σ 2 = 3.02, σ 3 = 1.31. {f µ k ,σ k (x), k = 1, 2, 3} are the probability density function (PDF) of Gaussian distribution N (µ k , σ 2 k ), and {F µ k ,σ k (x), k = 1, 2, 3} are the CDF of Gaussian distribution N (µ k , σ 2 k ). Note that here d cens = 37.2 is corresponding to the maximum of the re-simulated data of C t , meaning that the false negative rate caused by taking swab(i.e., the first source) is neglected.

With the model for random variable X, the false negative rate induced by pooling with N individuals can be deduced. It is assumed that there is one infected individual who is pooled with other N − 1 negative individuals. The virus of that infected individual need to carry at least N 2 −dcens virus DNA to allow it to be detected even if it is diluted in a group with pool size N . Denote by γ the false negative rate, it is estimated as γ = 1 − P (C ≥ N 2 −dcens ) = 1 − P (− log 2 C ≤ d cens − log 2 N ) = 1 − P (C t ≤ d cens − log 2 N )

For a given group test, if the number of infected individuals is greater than 1, we re-scale the pool size to treat it as if it has only one infected individual. For instance, if we have pool size N but with δ infected individuals among them, then we treat the case as there is one infected individual in N D individuals. The reason that we can apply re-scaling is, in RT-qPCR test, the total volume for the sample get tested is a pre-determined constant. In other words, no matter how large is the group size and how many infected individuals, the volume gets tested are unchanged. From the analysis above, we know that C t is determined by the viral load in the tested sample. Therefore, different test pools will have the same number of viral load as long as the ratio of infected individuals to the pool size is the same. E Analysis for p T with respect to y T Q t In Section 4.2 we establish the relationship between the initial prevalence rate p 0 and final prevalence rate p T . Furthermore, we deduce that the increase of y T Q t will reduce p T . We leave the detailed proof here.

First we show the Eq.6: = α(X T I t−1 + X N T I t−1 − y T Q t ) Therefore,

Let q t = y T Q t Nt−1 and apply Eq.9, we will obtain

Hence the final prevalence rate p(T ) can be expressed as

Notice that for t = 1, 2, · · · , T , we have T j=T −t+1

Bring Eq.11 into Eq.10 and we simplify Eq.10 into

which is exactly Eq.6.

For the Eq.7, first we recall a lemma that if x = A B ∈ (0, 1) and C ≥ D ≥ 0, then we have

if B − D > 0. In our case, notice that

Notice that

Also, N T − T t=1 k T −t+1 > 0 holds as well since the sum of the increased number of quarantined individuals can not be greater than the number of non-quarantined individuals at the end of day T in the original case(without increase of y T Q t ). Applying the above-mentioned lemma, we have p T ≤ p T which completes the proof of Eq.7.

The detection of defective members of large populations

Boolean compressed sensing and noisy group testing

Boolean compressed sensing: LP relaxation for group testing

Search for sparse active inputs: A review

Group testing: An information theory perspective

Analysis and applications of adaptive group testing methods for COVID-19. medRxiv

A comparison of group testing architectures for COVID-19 testing

Group testing against COVID-19. Working Papers 2020-02

A note on double pooling tests

Group testing as a strategy for the epidemiologic monitoring of COVID-19

Group testing with nested pools

COVID-19 mathematical modeling for cornell's fall semester

Pooling of samples for testing for SARS-CoV-2 in asymptomatic people. The Lancet Infectious Diseases

On accelerated testing for COVID-19 using group testing

Group testing for coronavirus-called pooled testing-could be the fastest and cheapest way to increase screening nationwide

Safer reopening will require millions more Covid-19 tests per day. One solution: 'pool testing

Group testing for COVID-19: How to stop worrying and test more

Conservative two-stage group testing

On the cut-off point for combinatorial group testing

Individual testing is optimal for nonadaptive group testing in the linear regime

Group testing with a dilution effect

Refinement of a viral transmission risk model for blood donations screened by nat in different pool sizes and repeat test algorithms

A methodology for deriving the sensitivity of pooled testing, based on viral load progression and pooling dilution

The sample average approximation method for stochastic discrete optimization

Sample average approximation of expected value constrained stochastic programs

An analysis of SARS-CoV-2 viral load by patient age. medRxiv

The authors gratefully acknowledge the support by the National Science Foundation under Grant CAREER CMMI-1453934 and the Air Force Office of Scientific Research under Grant FA9550-19-1-0283. The authors would like to thank Professor Peter Frazier from Cornell University for insightful discussions and feedback.