key: cord-0990802-dw6nt84r authors: nan title: Hierarchical Pooling Strategy Optimization for Accelerating Asymptomatic COVID-19 Screening date: 2020-11-06 journal: Ieee Open Journal of the Computer Society DOI: 10.1109/ojcs.2020.3036581 sha: f08ee72fe6952422ee4b99775c44f2bcef9e7fff doc_id: 990802 cord_uid: dw6nt84r Testing has been a major factor that limits our response to the COVID-19 pandemic. The method of sample pooling and group test has recently been introduced and adopted. However, it is still not clearly known how to determine the appropriate group size. In this paper, we treat asymptomatic COVID-19 screening acceleration as an optimization problem, and solve the problem using an analytical approach and an algorithmic procedure. We develop a two-level hierarchical pooling strategy for accelerating asymptomatic COVID-19 screening. In the first level, a population is divided into groups, which results in inter-group acceleration. In the second level, a group is divided into subgroups, which results in intra-group and inter-subgroup acceleration. By using our analytical methods and numerical algorithms, we determine the optimal group size and the optimal subgroup size, which minimize the total number of tests, maximize the speedup of the hierarchical pooling strategy, and minimize both time and cost of testing. It is discovered that the optimal group size and the optimal subgroup size are determined by the fraction of infected people. Furthermore, the optimal group size, the optimal subgroup size, and the achieved speedup grow sublinearly with the reciprocal of the fraction of infected people. Our research has important social implications and financial impacts. For example, if the fraction of infected people is 0.01, by using group size of 25 and subgroup size of 5, we can achieve speedup of at least 11, which means that months of testing time can be reduced to days, and over 91% of the testing cost can be saved. Such results have not been available in the known literature. The paper makes significant progress and great advance in pooling strategy optimization for accelerating asymptomatic COVID-19 screening, and represents the contribution of computer science to the global pandemic. A coronavirus test requires a number of time consuming steps in the laboratory, which can take several hours. Testing has been a major factor that limits our response to the COVID-19 pandemic [1] . As governments reopen more businesses and public spaces, the number of infected people will surge, especially when there are asymptomatic people [2] . The method of sample pooling and group test has recently been introduced [7] , [8] and adopted [4] , [5] . The strategy involves pooling samples from multiple people. If the test result of a group of k (k ≥ 2) samples is negative, we know that all the individual samples are negative. If the test result of a group of samples is positive, then the individual samples need to be tested one by one. If the percentage of infected people is low, this pooling method can potentially significantly reduce the required number of tests and substantially save the necessary cost of tests. For example, recently, the City of Wuhan successfully screened 300 asymptomatic individuals from 9,899,828 people in only 19 days (May 14 -June 1, 2020), by using the pooling method with group size of five, involving 63 testing laboratories, 1,451 scientists and professionals, and 701 examination equipments (24 hours a day without interruption), and reaching a peak testing capacity of 1 million per day 1 However, it is still not clearly known how to determine the appropriate group size, although some attempt has been made. For instance, it has been recommended that the batch size should be powers of two [9] , which depends on the frequency of positive samples out of all samples. It is clear that the choice of the best group size can reduce the time and cost of testing to the maximum extent, and therefore, will have tremendous practical impact to COVID-19 detection, prevention, response, and control. In this paper, we treat asymptomatic COVID-19 screening acceleration as an optimization problem, and solve the problem using an analytical approach and an algorithmic procedure. The main contributions of the paper can be summarized as follows. r We develop a two-level hierarchical pooling strategy for accelerating asymptomatic COVID-19 screening. In the first level, a population is divided into groups, which results in inter-group acceleration. In the second level, a group is divided into subgroups, which results in intragroup and inter-subgroup acceleration. r By using our analytical methods and numerical algorithms, we determine the optimal group size and the optimal subgroup size, which minimize the total number of tests, maximize the speedup of the hierarchical pooling strategy, and minimize both time and cost of testing. r It is discovered that the optimal group size and the optimal subgroup size are determined by the fraction of infected people. Furthermore, the optimal group size, the optimal subgroup size, and the achieved speedup grow sublinearly with the reciprocal of the fraction of infected people. r Our research has important social implications and financial impacts. For example, if the fraction of infected people is 0.01, by using group size of 25 and subgroup size of 5, we can achieve speedup of at least 11, which means that months of testing time can be reduced to days, and over 91% of the testing cost can be saved. Such results have not been available in the known literature. The paper makes significant progress and great advance in pooling strategy optimization for accelerating asymptomatic COVID-19 screening, and represents the contribution of computer science to the global pandemic. (Note: This paper is a substantially extended version of an earlier work reported in [6] , where only a one-level pooling strategy was studied.) In Section II, we consider inter-group (i.e., group level) acceleration, and find the optimal group size. In Section III, we consider intra-group (i.e., subgroup level or inter-subgroup) acceleration, and find the optimal subgroup size for a given group size. In Section IV, we conduct joint optimization for 1 http://www.xinhuanet.com/local/2020-06/03/c_1126066386.htm both inter-group acceleration and intra-group acceleration, and simultaneously find the optimal group size and the optimal subgroup size. In Section V, we address some practical issues. In Section VI, we conclude the paper. In this section, we develop our method to find the optimal group size by using inter-group acceleration. We define the following quantities. r p 0 : the probability that one individual test result is positive; r q 0 : the probability that one individual test result is negative; r p 1 : the probability that one group test result is positive; r q 1 : the probability that one group test result is negative. The value of p 0 is given and known in advance. It is clear that For a single group of samples, if the test result of the group is negative (which happens with probability q 1 ), only one test is enough; if the test result of the group is positive (which happens with probability p 1 ), (k + 1) tests are required, one for group test, and k for individual tests. Hence, the expected number of tests for one group using the pooling method is The total number of tests for a population of N (which is divided into N/k groups and N k) is Since the number of tests without using pooling is N, the speedup of the pooling strategy is Our objective is to maximize the speedup. It is clear that maximizing S(k) is equivalent to minimizing To have ∂F (k)/∂k = 0, we need Unfortunately, it is not clearly known how to find an analytical and closed-form solution to the above equation of k at this stage. We now develop a numerical algorithm to find k. We define Our purpose is to solve the equation which implies that G(k) is a concave function. Figure 1 illustrates G(k) for p 0 = 0.1. It is observed that G(k) is an increasing function of k when k is less than 35, and G(k) is a decreasing function of k when k is greater than 35. In other words, there are two solutions to the equation G(k) = 0. One is between 3 and 4, and the other is between 54 and 55. Figure 2 illustrates the speedup S(k) for p 0 = 0.1. It is observed that as k increases, S(k) increases and reaches its maximum value at k = 4, and then decreases. However, when k exceeds 55, S(k) increases again; nevertheless, the increment is very little and not noticeable. Furthermore, the speedup beyond k = 55 is less than 1, i.e., the pooling method is not effective any more. Therefore, we only need to find the smaller solution of k. Our numerical procedure to find k which satisfies G(k) = 0 is essentially the standard bisection method (which is described in [3] , p. 22), based on the observation is that G(k) is an increasing function of k around the smaller solution of k. Since the k found is a real value, we round it to the nearest integers, i.e., the optimal group size is k * = k . We say that the pooling method is effective, if there is at least one k ≥ 2, such that S(k) > 1; and that the pooling method is ineffective, if S(k) ≤ 1 for all k ≥ 2. Intuitively, if p 0 is too big, the pooling method becomes ineffective. Let p * 0 be the largest value of p 0 such that the pooling method is effective. Using numerical verification, we can find that the pooling method is effective when p 0 = 0.306 (with k = 3 and S(3) = 1.00092) and ineffective when p 0 = 0.307. Therefore, we can confirm that p * 0 is in the range (0.306,0.307). In Figure 3 , we show the speedup as a function of the group size for p 0 = 0.001, 0.002, 0.003, ..., 0.010. It is observed that as k increases, S(k) increases significantly, especially when p 0 is small; however, beyond certain point, S(k) decreases noticeably. Hence, there is an optimal choice k * , such that the speedup is maximized. In Table 1 , we demonstrate the optimal group size k * obtained by our numerical algorithm for p 0 = 10 −1 , 10 −2 , 10 −3 , ..., 10 −7 . We have the following important observations. r As p 0 becomes smaller, the probability q 1 = q k * 0 that a group test result is negative becomes higher. For instances, when p 0 = 0.01, the chance for a negative group test result is q 11 0 = 0.99 11 = 0.8953382. When p 0 = 0.001, the chance for a negative group test result is q 32 0 = 0.999 32 = 0.9684911. Such higher chance will balance the potential higher cost for individual tests in case a group test result is positive. r As p 0 decreases, the optimal group size and the achieved speedup increase rapidly. In particular, we have for That is, both k * and S(p 0 ) grow sublinearly with 1/p 0 , a quite impressive and nontrivial result. It is worth to mention that the optimal group size k * is determined by the fraction p 0 of infected people and independent of the size N of the population, since the equation G(k) = 0 only involves q 0 (actually p 0 ), not N. One important (and surprising) observation from Table 1 is that the achieved speed is approximately (and a little bit less than) k * /2, that is, S(k * ) ≈ k * /2. Equivalently, for k * , the expected number of tests for one group is approximately 2. This gives us an opportunity to derive a closed-form expression of k * . Let us consider the equation that is, Let x = q k 0 . Then, we have Since k * needs to be an integer, we set ( z means the nearest integer of z) which has been verified to be consistent with Table 1 . In this section, we develop our method to find the optimal subgroup size by using intra-group acceleration. The basic idea of intra-group acceleration is to divide a group into subgroups. When the test result of a group is positive, the group of size k is divided into subgroups of size m. For each subgroup, if the test result of the subgroup is negative, only one test is enough; if the test result of the subgroup is positive, (m + 1) tests are required, one for subgroup test, and m for individual tests. By using this method, the original k tests for individual samples can possibly be reduced. This method is more effective for small p 0 and large k. To develop and analyze intra-group acceleration, we define the following quantities. r p 2 : the probability that one subgroup test result is positive under the condition that the test result of a group is positive. r q 2 : the probability that one subgroup test result is negative under the condition that the test result of a group is positive. Note that both p 2 and q 2 are conditional probabilities. It is clear that p 2 = 1 − q 2 , and where p 1 is the probability that one group test result is positive (i.e., the condition), q m 0 is the probability that all samples in a subgroup are negative (i.e., one subgroup test result is negative), and (1 − q k−m 0 ) is the probability that at least one of the remaining (k − m) samples in the same group is positive (to keep the condition). Under the condition that the test result of a group is positive, the expected number of tests for one subgroup using the pooling method is Under the condition that the test result of a group is positive, the expected number of tests for one group using intra-group acceleration is Since the number of tests for one group without intra-group acceleration is k, the speedup of the intra-group acceleration method is To maximize S group (m), we need to minimize To have ∂F (m)/∂m = 0, we need Again, there is no analytical and closed-form solution to the above equation of m at this stage. However, m can be obtained by a numerical algorithm (i.e., the standard bisection method) based on the observation is that is an increasing function of m. We use m * to represent the solution to the equation G(m) = 0 rounded to the nearest integer. The expected number of tests for one group using intragroup acceleration is where T group can be more accurately expressed as The total number of tests for a population of N (which is divided into N/k groups) using both inter-group acceleration and intra-group acceleration is The speedup of the pooling strategy with both inter-group acceleration and intra-group acceleration is In Table 2 , we demonstrate the optimal subgroup size m * (given the optimal group size k * of Section II) obtained by our numerical algorithm for p 0 = 10 −2 , 10 −3 , ..., 10 −7 . We have the following important observations. r It is observed that by using the method of intra-group acceleration, the expected number T group of tests for one group is noticeably reduced and noticeable speedup S group (m * ) within a group can be obtained. Of course, such speedup is gained with probability p 1 , i.e., when the test result of a group is positive. r The speedup S(k * , m * ) of the pooling strategy with both inter-group acceleration and intra-group acceleration is noticeably improved. Furthermore, as p 0 becomes smaller, the ratio S(k * , m * )/k * increases, which means that S(k * , m * ) is closer to k * , and the speedup can almost be doubled (compared with Table 1 ). r It is worth to mention that when k * is too small (e.g., k * = 4 for p 0 = 10 −1 ), the method of intra-group acceleration is not effective and does not lead to fewer number of tests. One important observation from Table 2 is that the speedup of the intra-group acceleration method is approximately m * /2, that is, S group (m * ) ≈ m * /2. This gives us an opportunity to derive a closed-form expression of m * . Let us consider the equation Let x = q m 0 . Then, we have Since m * needs to be an integer, we set which has been verified to be consistent with Table 2 . Our optimal group size k * in Section II is obtained based on the assumption that the method of intra-group acceleration is not used. With the reduced number T group of tests for one group by using intra-group acceleration, it is likely that k * can be increased, which creates more room for improving the speedup. The increased group size certainly affects the choice of the optimal subgroup size m * . Fortunately, based on the analytical expressions of m * in Section III, it is possible to conduct joint optimization for both inter-group acceleration and intra-group acceleration, i.e., to simultaneously find the optimal group size and the optimal subgroup size, when both inter-group acceleration and intra-group acceleration are involved. Recall that and T group (k) = q k 0 + (T group (k) + 1)(1 − q k 0 ), and where S(k, m(k)), as well as m(k), T group (k), and T group (k) are all viewed as functions of k. In Figure 4 , we display the speedup S(k, m(k)) for p 0 = 0.01. It is clear that S(k, m(k)) is a concave function of k, and there is an optimal choice of k = 25, which maximizes S(k, m(k)). To maximize S(k, m(k)), we need ∂S(k, m(k))/∂k = 0, where ∂S(k, m(k)) ∂k and 4. Speedup S(k, m(k) ) vs. group size k (p 0 = 0.01). The equation ∂S(k, m(k))/∂k = 0 can be solved by using the standard bisection method, based on the observation is that ∂S(k, m(k))/∂k is a decreasing function of k. Once the optimal group size k * is available, the corresponding optimal subgroup size m(k * ) and the speedup S(k * , m(k * )) can be calculated easily. In Table 3 , we demonstrate the optimal group size k * , the optimal subgroup size m(k * ), and the speedup S(k * , m(k * )) for p 0 = 10 −1 , 10 −2 , 10 −3 , ..., 10 −7 . We have the following important observations. r The optimal group size k * is significantly greater than the optimal group size of inter-group acceleration (see Table 1 ). In particular, we have for p 0 = 10 −r , k * ≥ (25/16)4 r = 1.5625 × 4 log 10 (1/p 0 ) = 1.5625(1/p 0 ) log 10 4 = 1.5625(1/p 0 ) 0.602 . That is, the optimal group size grows sublinearly with 1/p 0 . r The optimal subgroup size m(k * ) is noticeably greater than the optimal subgroup size of intra-group acceleration (see Table 2 ). In particular, we have for p 0 = 10 −r , m * ≥ 2 r = 2 log 10 (1/p 0 ) = (1/p 0 ) log 10 2 = (1/p 0 ) 0.301 . That is, the optimal subgroup size grows sublinearly with 1/p 0 . r The achieved speedup S(k * , m(k * )) with joint optimization for both inter-group acceleration and intra-group acceleration is significantly greater than the speedup of intra-group acceleration with given and fixed group sizes (see Table 2 ). The most important factor that determines the gap is the optimality of k * . Furthermore, if the speedup is regarded as a function S(p 0 ) of p 0 , as r increases, we have S(10 −(r+1) )/S(10 −r ) > 4, and for p 0 = 10 −r , we have S(10 −r ) > (11/16)4 r = 0.6875 × 4 log 10 (1/p 0 ) = 0.6875(1/p 0 ) log 10 4 = 0.6875(1/p 0 ) 0.602 . That is, the speedup grows sublinearly with 1/p 0 . Also notice that S(k * , m(k * )) is approximately (and a little bit less than) k * /2 for small p 0 . r It is worth to mention that such two-level joint optimization is effective and applicable to any p 0 . We would like to mention the following issues related to the applicability of the research in this paper. r Availability of p 0 -In this paper, it has been assumed that the value of p 0 , i.e., the fraction of infected people, is available in advance. In reality, the value of p 0 can be estimated accurately by testing a group of n random samples, where n is reasonably large but still much less than the population size N, so that the time spent to obtain p 0 is negligible and does not reduce the speedup too much. r Independence of Samples -In this paper, it has been assumed that individual sample test results are independent of each other. In reality, there might be correlation among individual sample test results. Such correlation exists for people from the same family, the same company, the same school, and so on. One effective way to reduce the impact of sample correlation is to randomize the samples, so that people from the same social group are not tested together. Theoretically, the impact of such dependency on our methods, analysis, and algorithms needs deeper investigation. r Limitation on Group Size -In this paper, it has been assumed that the group size and the subgroup size can be arbitrarily large. In practice, there can be limitation on the number of samples that can be combined into one test. The impact of such restriction on the performance of our hierarchical pooling strategy deserves more careful study. r Applications in Real Testing -Although we believe that our hierarchical pooling strategy can be readily applied to accelerating asymptomatic COVID-19 screening of any scale, it is still exciting to actually use our methodology in a real community, city, or country. However, such effort which involves joint endeavor of social, medical, and governmental agencies, is certainly beyond the scope of this paper. r Generality of Our Strategy -Although our hierarchical pooling strategy has been developed for asymptomatic COVID-19 testing, we believe that our general-purpose analytical methods and numerical algorithms are also applicable to accelerating the testing of other deceases that have already been existing or may appear in the future. We have developed a two-level hierarchical pooling strategy for accelerating asymptomatic COVID-19 screening. We have also been able to determine the optimal group size and the optimal subgroup size, which minimize the total number of tests, maximize the speedup of the hierarchical pooling strategy, and minimize both time and cost of testing. It is found that the optimal group size, the optimal subgroup size, and the achieved speedup grow sublinearly with the reciprocal of the fraction of infected people. Our method is effective in supporting faster and cheaper asymptomatic COVID-19 screening. There are further research directions. One challenge is to derive a closed-form expression of the optimal group size for our two-level hierarchical pooling strategy. For another further investigation, we notice that the hierarchical testing system in this paper has only two levels, i.e., group and subgroup. It is interesting to consider a hierarchical acceleration system with more levels (e.g., for very small p 0 and not too small m(k * )), in which, there are groups, which are divided into subgroups, which are further divided into sub-subgroups, and so on, with group level, subgroup level, and sub-subgroup level acceleration. For such a multi-level testing system, it is necessary to determine the optimal group, subgroup, and sub-subgroup sizes. It is conceivable that such a multi-level acceleration mechanism is more powerful and more effective in producing higher speedup. We believe that the analytical approach and algorithmic procedure developed in this paper can be extended towards this direction. Pooling samples could accelerate new coronavirus testing Coronavirus test shortages trigger a new strategy: Group screening Numerical Analysis Statistical methods for batch screening of input populations by stage and group in COVID-19 nucleic acid testing FAST: A flexible, accurate and speedy test for COVID-19 Pooling strategy optimization for accelerating asymptomatic COVID-19 screening Pooling of samples for testing for SARS-CoV-2 in asymptomatic people Virologists show that sample pooling can massively increase coronavirus testing capacity Efficient and practical sample pooling for high-throughput PCR diagnosis of COVID-19 The author would like to thank the three anonymous reviewers for their constructive comments. Special thanks are due to Dr. Andrew Li of Carnegie Mellon University and Mr. Tiansheng Huang of the South China University of Technology for inspiring discussion on the derivation of q 2 and the value of p * 0 . This research was supported in part by the Hunan University Coronavirus Disease Special Research Project. A preliminary version of the paper was posted on Research Square, June 9, 2020.