Improving Group Testing via Gradient Descent

Sundara Rajan Srinivasavaradhan, Pavlos Nikolopoulos, Christina Fragouli, Suhas Diggavi

January 28, 2022 (arXiv:2201.12325v1 [cs.IT])

Abstract: We study the problem of group testing with non-identical, independent priors. So far, the pooling strategies proposed in the literature take the following approach: a hand-crafted test design is proposed along with a decoding strategy, and guarantees are provided on how many tests suffice to identify all infections in a population. In this paper, we take a different, yet perhaps more practical, approach: we fix the decoder and the number of tests, and we ask, given these, what is the best test design one could use? We explore this question for the Definite Non-Defectives (DND) decoder. We formulate a (non-convex) optimization problem whose objective function is the expected number of errors for a particular design. We find approximate solutions via gradient descent, which we further improve with informed initialization. We illustrate through simulations that our method can achieve significant performance improvements over traditional approaches.

I. INTRODUCTION

Group testing has recently attracted significant attention in the context of COVID ([1]-[6]), and several countries (including India, Germany, the US, and China) have already deployed preliminary group-testing strategies ([7], [8]). Group testing has a rich history in academia, and a number of variations and setups have been examined so far ([9]-[12]). Simply stated, group testing assumes a population of $N$ individuals out of which some are infected, and the goal is to design testing strategies and corresponding decoding algorithms that identify the infections from the test results. Most works revolve around proposing a particular hand-crafted test design (e.g., a random Bernoulli design) coupled with a decoding strategy (e.g., Definite Defectives, Definite Non-Defectives), and providing guarantees on the number of tests required to achieve vanishing probability of error. Additionally, order-optimality results have been proved in the asymptotic regime, where the population size tends to infinity.

To the best of our knowledge, the following complementary question remains unexplored: given a fixed decoding strategy and a given number of tests $T$ (perhaps smaller than what is needed to achieve zero error), what is the best test design one may use? We examine this question in the context of nonadaptive group testing, under the assumption of a Definite Non-Defectives (DND) decoder, which eliminates false negatives by construction.

In this paper, we show that the above problem can be formulated as a non-convex continuous optimization problem. More specifically, the problem requires finding a test-design matrix $G$ that minimizes the expected number of erroneous identifications (i.e., false positives). This, however, presents two challenges: (a) the analytical computation of the expected number of false positives turns out to be computationally difficult; and (b) because $G \in \{0,1\}^{T \times N}$, we are faced with a combinatorial optimization problem. To address these challenges, we proceed as follows:
• First, we provide a lower bound on the expected number of errors, which we use as a proxy in the optimization problem; this bound can be computed in $O(N^2)$ time.
• We then relax the combinatorial optimization problem based on an equivalence result; the objective function in the relaxed formulation, as well as its gradient, can also be computed in $O(N^2)$ time, thus enabling the use of gradient descent (GD).
• To further improve the performance of our method, we propose two enhancements to GD: (i) informed initialization using classic test designs, such as the Constant Column Weight (CCW) design and the Coupon Collector Algorithm (CCA); and (ii) stochastic re-initialization of the solution state every few gradient iterations (e.g., every 100 iterations), in a way that allows GD to explore various neighborhoods while ensuring that the objective value does not increase by much with each re-initialization.

Numerical evaluations show that the GD-based approaches can significantly outperform classical test designs, achieving up to 58% fewer errors with the DND decoder on simulated infection models. Rather surprisingly, GD-based designs also significantly outperform classical test designs when the decoder is switched to Definite Defectives (DD), indicating transferability to other decoders.

Related work: We give here a brief overview of group testing; the exact problem we consider is detailed in Section II-A. Three infection models are usually studied in the group testing literature: (i) in the combinatorial priors model, a fixed number $k$ of individuals, selected uniformly at random, are infected; (ii) in the i.i.d. probabilistic priors model, each individual is infected i.i.d. with some probability $p$; (iii) in the non-identical probabilistic priors model, each individual $i$ is infected independently of all others with prior probability $p_i$, so that the expected number of infected members is $\bar{k} = \sum_{i=1}^{N} p_i$. Infection models (i) and (ii) have received the most attention from researchers (see, for example, [13]-[23]). Infection model (iii) is the most general, yet also the least studied one [24]; we refer the reader to [10] for an excellent summary of existing work on the above infection models. Our work assumes infection model (iii) with non-identical probabilistic priors and includes (ii) as a special case. Tangentially, recent works have considered correlated infection models; see, for example, [25]-[31].

II. PROBLEM FORMULATION AND PRELIMINARIES

In this section, we first precisely formulate the problem of interest, and then state a simple lemma on combinatorial optimization that is used in our work.

A. Problem setup

We consider the noiseless nonadaptive group testing problem with non-identical priors. There are $N$ individuals in the population, where individual $i$ is infected independently with probability $p_i$. We assume that the values $p_i$ are known a priori. Let $U_i$ be the infection status of individual $i$: $U_i = \mathbb{1}\{\text{individual } i \text{ is infected}\}$, so that $U_i \sim \mathrm{Ber}(p_i)$. We denote by $U = (U_1, U_2, \ldots, U_N)$ the vector of infection statuses.

Testing matrix: A testing matrix $G \in \{0,1\}^{T \times N}$ is a $T \times N$ binary matrix. Row $t$ of the testing matrix represents the individuals participating in test $t$, i.e., $G_{ti} = 1$ means that individual $i$ participates in test $t$. The test results corresponding to a particular realization of $U = (U_1, U_2, \ldots, U_N)$ and $G$ are defined as the vector $Y = (Y_1, \ldots, Y_T)$, where

$$Y_t = 1 - \prod_{i=1}^{N} (1 - G_{ti} U_i). \tag{1}$$

In words, test $t$ gives a positive result if any of the individuals participating in the test is infected, and otherwise gives a negative result. In (1), $Y_t = 1$ if and only if there exists $i$ such that both $G_{ti} = 1$ and $U_i = 1$ (individual $i$ is infected).
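To make (1) concrete, here is a minimal sketch of the test outcome map, written in NumPy; the function name and the example values are our own illustration, not code from the paper.

```python
import numpy as np

def test_results(G: np.ndarray, U: np.ndarray) -> np.ndarray:
    """Noiseless group test outcomes, per (1): Y[t] = 1 iff test t
    contains at least one infected individual."""
    return (G @ U > 0).astype(int)

# Example: 2 tests over 3 individuals; only individual 3 (index 2) is infected.
G = np.array([[1, 1, 0],    # test 1 pools individuals 1 and 2
              [0, 1, 1]])   # test 2 pools individuals 2 and 3
U = np.array([0, 0, 1])
print(test_results(G, U))   # -> [0 1]: test 1 negative, test 2 positive
```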
In order to infer $U$ from $Y$, a decoding algorithm $r: \{0,1\}^T \to \{0,1\}^N$ constructs an estimate $\hat{U}$ of the infection statuses from the test results. In this work, we fix the decoding algorithm, which we describe next.

DND decoder: The Definite Non-Defective (DND) decoder is a well-known decoding algorithm that forms an estimate of $U$ by identifying as healthy those individuals who have participated in at least one negative test, and labeling every other individual as infected; i.e., it operates under the principle "every item is defective unless proved otherwise". More precisely, it outputs an estimate $\hat{U}$ where

$$\hat{U}_i = \prod_{t=1}^{T} \big(1 - G_{ti}(1 - Y_t)\big). \tag{2}$$

$\hat{U}$ has zero false negatives by construction: it can be seen that $\hat{U}_i = 1$ whenever $U_i = 1$. The number of errors (false positives) that the DND decoder makes for a particular realization $U$ is given by

$$N_{\mathrm{err}}(G, U) = \sum_{i=1}^{N} (1 - U_i)\, \hat{U}_i, \tag{3}$$

and as a result the expected number of errors under the DND decoder for a given $G$ is $E(G) = \mathbb{E}\left[N_{\mathrm{err}}(G, U)\right]$.

Further, when $U_i$ is fixed to be $0$, $\hat{U}_i$ is a function of $G$ and $U \setminus \{i\}$, where $U \setminus \{i\} \triangleq (U_1, \ldots, U_{i-1}, U_{i+1}, \ldots, U_N)$ denotes the vector $U$ without its $i$-th entry. Thus, fixing $U_i = 0$ and using (1) and (2), we have

$$\hat{U}_i = \prod_{t=1}^{T} \gamma_{t,i}(U), \quad \text{where } \gamma_{t,i}(U) \triangleq 1 - G_{ti} \prod_{j \neq i} (1 - G_{tj} U_j);$$

here $\gamma_{t,i}(U) = 0$ exactly when test $t$ contains individual $i$ and no infected individual, i.e., when test $t$ proves individual $i$ healthy. Using this, and the fact that $U_i$ is independent of $U \setminus \{i\}$, we rewrite (3) as

$$E(G) = \sum_{i=1}^{N} (1 - p_i)\, \mathbb{E}\Big[\prod_{t=1}^{T} \gamma_{t,i}(U)\Big], \tag{4}$$

where the expectation is taken over $U_j \sim \mathrm{Ber}(p_j)$, $j \neq i$.

Our goal: We want to minimize $E(G)$ across all binary matrices $G$ of size $T \times N$, i.e., solve

$$G^* \in \arg\min_{G \in \{0,1\}^{T \times N}} E(G). \tag{5}$$

Discussion: We first observe that $\gamma_{t,i}$ is not independent of $\gamma_{t',i}$ for $t \neq t'$, as they potentially share common $U_j$ terms. As a result, the expectation of the product term in (4) is not trivially the product of expectations, which makes the computation of $E(G)$ intractable in general (one could, of course, estimate $E(G)$ using Monte-Carlo methods, belief propagation, etc.). In Section III we provide a lower bound on $E(G)$ which can be computed efficiently, and which we use as a proxy for $E(G)$. We also note that, in principle, (5) could be formulated for any decoder, not just the DND decoder. However, the particular form of $E(G)$ for the DND decoder admits a nice structure, for which we can propose an approximate solution using lower-bounding techniques (Section III). For decoders such as the Definite Defective decoder or belief-propagation-based ones, we currently do not have an approach to compute a non-trivial lower bound; this remains a challenging open problem.
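As the discussion notes, $E(G)$ can be estimated by Monte-Carlo methods. The following self-contained sketch (our own illustration; all names are ours) implements the DND rule in (2) and averages the false-positive count in (3) over sampled infection vectors.

```python
import numpy as np

def dnd_decode(G: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """DND estimate, per (2): declare individual i healthy iff it appears
    in at least one negative test; label everyone else infected."""
    in_negative_test = G[Y == 0].sum(axis=0) > 0
    return (~in_negative_test).astype(int)

def monte_carlo_E(G: np.ndarray, p: np.ndarray,
                  num_trials: int = 10_000, seed: int = 0) -> float:
    """Monte-Carlo estimate of E(G), the expected number of DND false
    positives in (4)."""
    rng = np.random.default_rng(seed)
    errors = 0
    for _ in range(num_trials):
        U = (rng.random(p.shape[0]) < p).astype(int)   # U_i ~ Ber(p_i)
        Y = (G @ U > 0).astype(int)                    # test outcomes, per (1)
        U_hat = dnd_decode(G, Y)
        errors += int(((U == 0) & (U_hat == 1)).sum()) # false positives, per (3)
    return errors / num_trials
```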
B. A relaxation lemma

We now take a detour to prove a simple result that allows one to relax combinatorial optimization problems over the vertices of an $n$-dimensional hypercube. One could extend this technique to optimization over other finite sets as well.

Lemma 1. Let $g: \{0,1\}^n \to \mathbb{R}$. In order to solve

$$\arg\min_{x \in \{0,1\}^n} g(x), \tag{6}$$

it is sufficient to solve

$$\arg\min_{q \in [0,1]^n} f(q), \tag{7}$$

where $f(q) \triangleq \mathbb{E}_{X \sim q}\, g(X)$ can be envisioned as a continuous extension of $g(x)$. The expectation in the above expression is taken w.r.t. the distribution where each $X_i \sim \mathrm{Ber}(q_i)$ and the $X_i$'s are independent of each other. We refer the reader to Appendix A for the proof.

Remark: There is a long history of using relaxation techniques to approximate solutions of combinatorial optimization problems (see [35] for an overview); most of these focus on linear programming relaxations. In Lemma 1 there is no assumption on $g(\cdot)$ whatsoever, and the resulting relaxation need not be a linear program. Moreover, it may not be easy to compute $f(\cdot)$ in all cases, nor its gradient $\nabla f(\cdot)$. In cases where exactly computing or approximating the gradient is easy (as is indeed the case in this work), one can use first-order optimization techniques such as GD.

III. OUR APPROACH

In this section, we delineate our approach to finding an approximate solution to (5). Following the discussion at the end of Section II-A, our approach is three-fold. First, we lower bound $E(G)$ by another function $E_{LB}(G)$ whose computation turns out to be tractable; we then use $E_{LB}(G)$ as a proxy for $E(G)$. Next, we use Lemma 1 to show that it is sufficient to consider a continuous relaxation of the resulting combinatorial optimization problem. Finally, we show that the objective function of the continuous relaxation and its gradient can be computed efficiently, thus enabling gradient descent.

A. A lower bound on E(G)

As a first step, the following theorem states and proves a lower bound on $E(G)$.

Theorem 1. For a given testing matrix $G$, and under the DND decoder, the expected number of errors (see (4)) satisfies

$$E(G) \ge E_{LB}(G) \triangleq \sum_{i=1}^{N} (1 - p_i) \prod_{t=1}^{T} \mathbb{E}\left[\gamma_{t,i}(U)\right] = \sum_{i=1}^{N} (1 - p_i) \prod_{t=1}^{T} \Big(1 - G_{ti} \prod_{j \neq i} (1 - G_{tj}\, p_j)\Big).$$

Proof. Recall the expression for $E(G)$ in (4). Using the FKG inequality (see [36]-[38], or the proof of Lemma 4 in [18]) one can show that

$$\mathbb{E}\Big[\prod_{t=1}^{T} \gamma_{t,i}(U)\Big] \ge \prod_{t=1}^{T} \mathbb{E}\left[\gamma_{t,i}(U)\right].$$

A rigorous proof of this statement can be found in Appendix B. The idea is to show that each $\gamma_{t,i}$ is a non-decreasing function of $U$ (under a suitable partial ordering); the result then follows as an application of the FKG inequality. The theorem follows by substituting this bound into (4) and computing $\mathbb{E}[\gamma_{t,i}(U)] = 1 - G_{ti} \prod_{j \neq i}(1 - G_{tj}\, p_j)$. □

In all numerical evaluations we performed, $E(G)$ and the lower bound $E_{LB}(G)$ were highly correlated (we provide example scatter plots in Figure 2 in Appendix C), which indicates that minimizing $E_{LB}(G)$ is a viable alternative to minimizing $E(G)$. Given the above discussion, we propose using $E_{LB}(G)$ as a proxy for $E(G)$; more precisely, we propose to solve the following optimization problem:

$$\arg\min_{G \in \{0,1\}^{T \times N}} E_{LB}(G). \tag{8}$$

B. Continuous relaxation

We next use Lemma 1 to argue that a continuous relaxation of (8) is equivalent to (8). Before stating the result, we give a definition: we say that $G \sim Q$ (read as "$G$ is distributed according to the distribution matrix $Q$") if each $G_{ti} \sim \mathrm{Ber}(Q_{ti})$ for all $t, i$, and the $G_{ti}$'s are independent of each other.

Corollary 1. In order to solve the optimization problem

$$\arg\min_{G \in \{0,1\}^{T \times N}} E_{LB}(G), \tag{9}$$

it is sufficient to solve

$$\arg\min_{Q \in [0,1]^{T \times N}} \mathbb{E}_{G \sim Q}\left[E_{LB}(G)\right]. \tag{10}$$

This is a direct corollary of Lemma 1, where the objective function is $E_{LB}(G)$ and we associate a parameter $Q_{ti}$ with each $G_{ti}$. Thus, we arrive at the following approximate formulation, for which the objective function (and its gradient) can be computed in $O(N^2)$ time (see Section III-C). The hope is that solving (11) gives sufficiently good choices of $G \sim Q^*$; our experimental results in Section V indicate that this is indeed the case.

Approximate formulation: Solve for

$$Q^* \in \arg\min_{Q \in [0,1]^{T \times N}} f(Q), \quad \text{where } f(Q) \triangleq \mathbb{E}_{G \sim Q}\left[E_{LB}(G)\right]. \tag{11}$$

Given the above formulation, we can use techniques such as gradient descent to select the testing matrix $G$. In essence, we are searching over the continuous space of distribution matrices $Q$. If the gradient of $f(Q)$ can be computed efficiently, one can use GD to converge to a local minimum $Q^*$ and pick a $G \sim Q^*$.

C. Computing f(Q) and its gradient

We now give a closed-form expression for $f(Q)$ and briefly discuss the computational complexity of evaluating $f(Q)$ and its gradient; the details are deferred to Appendices D, E and F. We have

$$f(Q) = \mathbb{E}_{G \sim Q}\left[E_{LB}(G)\right] \stackrel{(a)}{=} \sum_{i=1}^{N} (1 - p_i) \prod_{t=1}^{T} \Big(1 - Q_{ti} \prod_{j \neq i} (1 - Q_{tj}\, p_j)\Big), \tag{12}$$

where in (a) the expectation is pushed inside the product terms, as $E_{LB}(G)$ is linear when viewed as a function of any single $G_{ti}$ and the entries of $G$ are independent. In Appendix D we discuss an $O(N^2)$ algorithm that simplifies the computation of $f(Q)$ above; a sketch in this spirit follows below.
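To illustrate the closed form (12) and the flavor of the Appendix D algorithm, here is a NumPy sketch of ours (not the authors' code) that evaluates $f(Q)$ using leave-one-out products per row, computed via prefix/suffix cumulative products so that no division by possibly-zero factors is needed.

```python
import numpy as np

def leave_one_out_prod(a: np.ndarray) -> np.ndarray:
    """For each row t, out[t, i] = product over j != i of a[t, j],
    via prefix/suffix cumulative products (O(T*N), no division)."""
    T, N = a.shape
    prefix = np.ones((T, N))
    suffix = np.ones((T, N))
    prefix[:, 1:] = np.cumprod(a[:, :-1], axis=1)           # prod of a[t, :i]
    suffix[:, :-1] = np.cumprod(a[:, :0:-1], axis=1)[:, ::-1]  # prod of a[t, i+1:]
    return prefix * suffix

def f_of_Q(Q: np.ndarray, p: np.ndarray) -> float:
    """Objective f(Q) in (12): the expectation of E_LB(G) over G ~ Q."""
    B = leave_one_out_prod(1.0 - Q * p)   # B[t, i] = prod_{j != i} (1 - Q_tj p_j)
    A = 1.0 - Q * B                       # one factor of (12) per test t
    return float(((1.0 - p) * np.prod(A, axis=0)).sum())
```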
Given (12), one can derive an expression for the gradient $\nabla f(Q)$ by calculating each partial derivative $\partial f(Q)/\partial Q_{lm}$; the details of the derivation can be found in Appendix E. Moreover, in Appendix F we discuss how to compute $\nabla f(Q)$ in $O(N^2)$ time.

IV. GRADIENT DESCENT BASED TEST DESIGNS

Leveraging the approximate formulation in (11), we here explore a GD approach to find good choices of $G$. Our proposed approach uses informed initialization with information provided by traditional group test designs; it can thus be viewed as a way to refine and improve existing designs via local search. Moreover, we propose a variation of GD that numerically seems to converge to good choices of $G$ in many situations even without informed initialization.

We use the following two group test design algorithms as baselines for comparison:
• Constant Column Weight (CCW) design (see [17], [39]). This design was introduced in the context of group testing with identical priors, but we adapt it so that it is applicable to non-identical priors as well. We construct a randomized $G$ assuming that all individuals have the same prior probability of infection $p_{\mathrm{mean}}$, defined as the mean prior probability of infection $p_{\mathrm{mean}} = \frac{1}{N}\sum_{i=1}^{N} p_i$ (an assumption that holds trivially if the priors are identical). The testing matrix $G$ is constructed column-by-column by placing each individual in a fixed number ($\approx 0.69\,T/(N p_{\mathrm{mean}})$) of tests, chosen uniformly at random.
• Coupon Collector Algorithm (CCA) from [24]. The CCA algorithm was introduced in [24] for the case of non-identical, independent priors. In short, CCA constructs a random nonadaptive test design $G$ by sampling each row independently from a certain distribution (we refer the reader to [24] for its exact description). The idea is to place individuals who are less likely to be infected in more tests, and vice-versa.

We are now ready to describe the gradient descent approaches we use to search for $G$. The high-level procedure for our algorithms is as follows:
• Consider the approximate formulation in (11) and pick an initial point $Q^{(0)}$.
• At each gradient iteration $l$, update $Q^{(l)} \leftarrow Q^{(l-1)} - \epsilon\, \nabla_Q f(Q^{(l-1)})$, where $\epsilon$ is the step size. Project $Q^{(l)}$ onto $[0,1]^{T \times N}$ by resetting negative entries to 0 and entries greater than 1 to 1.
• Stop based on some stopping criterion (e.g., a limit on the number of gradient steps, or a convergence check).
• Let $Q^*$ be the resulting output. Sample a matrix $G^* \sim Q^*$ and return it.

As it turns out, in our experiments the choice of initialization plays a significant role in finding good choices of $G$. We propose the following initializations:
• GD + CCW init. We first sample a testing matrix according to the CCW design and set $Q^{(0)}$ to this matrix; GD proceeds from this initialization.
• GD + CCA init. We first sample a testing matrix according to the CCA design and set $Q^{(0)}$ to this matrix; GD proceeds from this initialization.

Notably, any other state-of-the-art test design could be used as the initialization. In principle, the above approach can be perceived as a way to refine existing test designs via local search.

Alternatively, we also propose a modification of the GD approach, called GD + sampling, that helps avoid getting stuck in a local minimum by encouraging GD to explore multiple neighborhoods. The idea is to use stochastic re-initialization of the solution state every few gradient iterations, while ensuring that the value of the objective function is approximately preserved. First note that the objective value $f(Q)$ is the mean of $f(G)$ with $G \sim Q$; it is therefore reasonable to expect that typical realizations of $G$ are such that $f(G)$ is close to $f(Q)$. Given this observation, we propose the following: start from the all-zeros initialization, but every few gradient iterations replace the current solution state $Q^{(l)}$ by a sample $G_s \sim Q^{(l)}$. This encourages GD to explore different neighborhoods while (approximately) preserving the monotonicity of GD; a minimal code sketch follows below.
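Below is a compact sketch of the projected GD loop with the GD + sampling re-initialization; it assumes the `f_of_Q` helper from the earlier sketch. For clarity, the gradient here exploits the fact that $f$ is affine in each entry of $Q$ (a coordinate's partial derivative equals $f$ with that entry set to 1 minus $f$ with it set to 0), rather than the $O(N^2)$ scheme of Appendix F; the step size and iteration counts are illustrative, not the paper's.

```python
import numpy as np

def grad_f(Q, p, f):
    """Exact gradient of f at Q: f is multilinear, hence affine in each
    entry, so df/dQ[l, m] = f(Q | Q_lm = 1) - f(Q | Q_lm = 0).
    Brute force for clarity; the paper's Appendix F does this in O(N^2)."""
    T, N = Q.shape
    g = np.zeros((T, N))
    for l in range(T):
        for m in range(N):
            saved = Q[l, m]
            Q[l, m] = 1.0
            f1 = f(Q, p)
            Q[l, m] = 0.0
            f0 = f(Q, p)
            Q[l, m] = saved          # restore the perturbed entry
            g[l, m] = f1 - f0
    return g

def gd_design(p, T, step=0.05, iters=300, resample_every=100, seed=0, Q0=None):
    """Projected GD on f(Q) with stochastic re-initialization (GD + sampling).
    Q0 may be a CCW/CCA sample for the informed initializations; the
    all-zeros matrix otherwise ('GD + 0 init')."""
    rng = np.random.default_rng(seed)
    N = p.shape[0]
    Q = np.zeros((T, N)) if Q0 is None else Q0.astype(float).copy()
    for it in range(1, iters + 1):
        Q = np.clip(Q - step * grad_f(Q, p, f_of_Q), 0.0, 1.0)  # step + project
        if it % resample_every == 0:
            Q = (rng.random((T, N)) < Q).astype(float)  # replace Q by G_s ~ Q
    return (rng.random((T, N)) < Q).astype(int)         # final design G* ~ Q*
```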
V. NUMERICAL EVALUATIONS

In this section, we present simulation results demonstrating the improvement that our GD-based approaches provide.

Test designs compared: We compare the testing matrices $G$ obtained via each of the following methods: CCW, CCA, GD + CCW init., GD + CCA init., and GD + sampling. For completeness, we also consider the trivial all-zeros initialization for GD (which we call GD + 0 init), where the initial point $Q^{(0)}$ is set to all zeros.

Set-up: We first fix the prior probabilities of infection $(p_1, p_2, \ldots, p_N)$: each $p_i$ is sampled from an exponential distribution with mean 0.05, and set to 1 if $p_i > 1$. We repeat for 10 such prior distributions. For each design, we estimate $E(G)$ via Monte-Carlo simulations.

Metrics: We use the false positive (FP) rate, defined as the fraction of uninfected individuals incorrectly declared infected, to measure performance w.r.t. the DND decoder. Recall that the DND decoder yields 0 false negatives (FN) by construction.

Transferability to other decoders: As our GD methods aim for optimal designs under the DND decoder, a natural follow-up question is how they perform with other decoders. We compare the performance of each test design w.r.t. the Definite Defectives (DD) decoder; we refer the reader to Section 2.4 in [10] for a precise description of the DD decoder. DD has 0 FP by construction, so in this case we use the false negative (FN) rate as the performance measure. One could also consider other decoders, such as ones based on belief propagation, but these produce both FPs and FNs, and consequently the comparison between methods is not trivial; it requires weighing FPs against FNs, which can be application specific.

Observations: In Figure 1a, we plot the FP rate of each test design w.r.t. the DND decoder, as a function of $T$. We observe that the GD-based methods significantly outperform CCW and CCA. Notably, the improvement of our enhanced GD with informed initialization or sampling seems inversely proportional to $T$, which is of practical importance. Next, in Figure 1b, we plot the FN rate of each test design w.r.t. the DD decoder, as a function of $T$. The performance trend here is similar to what was observed with the DND decoder, which further supports the usefulness of our GD-based approach and its transferability to other decoders.

APPENDIX A: PROOF OF LEMMA 1

Proof. We first note that for any $q \in [0,1]^n$ we have

$$\mathbb{E}_{X \sim q}\, g(X) \ge \min_{x \in \{0,1\}^n} g(x), \tag{13}$$

since the expectation of a random variable is at least as large as the minimum value over its support. Since the above holds for any $q$, we have $\min_{q \in [0,1]^n} \mathbb{E}_{X \sim q}\, g(X) \ge \min_{x \in \{0,1\}^n} g(x)$. Conversely, let $x^*$ be a minimizer of $g(x)$ in (6). The choice $q^* = x^*$ (i.e., $X = x^*$ with probability 1) gives

$$\mathbb{E}_{X \sim q^*}\, g(X) = g(x^*) = \min_{x \in \{0,1\}^n} g(x). \tag{14}$$

From (13) and (14) we conclude that $\min_{q \in [0,1]^n} \mathbb{E}_{X \sim q}\, g(X) = \min_{x \in \{0,1\}^n} g(x)$. Hence, to obtain a solution to (6), we obtain a solution $q^*$ of (7), and any $X \sim q^*$ (sample each $X_i \sim \mathrm{Ber}(q^*_i)$) is a solution to (6). □
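As a sanity check of Lemma 1, the following toy example (our own, with an arbitrary set function $g$ on $\{0,1\}^3$) verifies numerically that the relaxed minimum of $f(q) = \mathbb{E}_{X \sim q}\, g(X)$ coincides with the combinatorial minimum of $g$ and is attained at a vertex.

```python
import itertools
import numpy as np

def g(x):  # an arbitrary (non-linear) set function on {0,1}^3
    return (x[0] ^ x[1]) + 2 * x[2] * (1 - x[0]) + 0.5 * x[1]

def f(q, num_samples=50_000, seed=0):  # Monte-Carlo estimate of E_{X~q} g(X)
    rng = np.random.default_rng(seed)
    X = (rng.random((num_samples, 3)) < q).astype(int)
    return float(np.mean([g(x) for x in X]))

best = min(itertools.product([0, 1], repeat=3), key=g)
print(best, g(best))                   # exact minimizer over vertices: (0,0,0), value 0
print(f(np.array(best)))               # relaxed objective at that vertex: same value
print(f(np.array([0.5, 0.5, 0.5])))    # an interior point is no better (about 1.25)
```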
APPENDIX B: PROOF OF THE FKG-BASED CLAIM

In the proof of Theorem 1 we claimed that

$$\mathbb{E}\Big[\prod_{t=1}^{T} \gamma_{t,i}(U)\Big] \ge \prod_{t=1}^{T} \mathbb{E}\left[\gamma_{t,i}(U)\right].$$

We prove this using the Fortuin-Kasteleyn-Ginibre (FKG) inequality (see [36]-[38], or the proof of Lemma 4 in [18]), restated here for convenience.

Lemma 2 (FKG inequality). Consider a finite distributive lattice $\Gamma$ with partial ordering $\prec$ and meet ($\wedge$) and join ($\vee$) operators. Consider a probability measure $\mu$ on $\Gamma$ that is log-supermodular, i.e.,

$$\mu(x \wedge y)\, \mu(x \vee y) \ge \mu(x)\, \mu(y) \quad \text{for all } x, y \in \Gamma.$$

Then any two functions $f$ and $g$ that are non-decreasing on $\Gamma$ are positively correlated, i.e., $\mathbb{E}_\mu[f g] \ge \mathbb{E}_\mu[f]\, \mathbb{E}_\mu[g]$.

Remark: Consider $\Gamma = \{0,1\}^N$ with the partial ordering $\prec$ where $a \prec b$ if every coordinate of $b$ is at least as large as the corresponding coordinate of $a$. When the meet and join operators coincide with coordinatewise logical AND and logical OR respectively, this is a distributive lattice. It can be verified that any product measure $\mu$ on $\Gamma$ is log-supermodular. As a result, any two functions $f$ and $g$ that are non-decreasing on $\Gamma$ are positively correlated, i.e., $\mathbb{E}_\mu[fg] \ge \mathbb{E}_\mu[f]\mathbb{E}_\mu[g]$. Consequently, given any $M$ non-negative, non-decreasing functions $f_1, f_2, \ldots, f_M$, one can inductively apply the FKG inequality to obtain

$$\mathbb{E}_\mu\Big[\prod_{m=1}^{M} f_m\Big] \ge \prod_{m=1}^{M} \mathbb{E}_\mu[f_m]. \tag{15}$$

Given (15), what remains to be shown is that each $\gamma_{t,i}(U)$ is non-negative and non-decreasing as a function of $U \in \{0,1\}^N$. Non-negativity is straightforward: we have $G_{tj} U_j \ge 0$, hence $0 \le 1 - G_{tj} U_j \le 1$, so $\prod_{j \neq i} (1 - G_{tj} U_j) \le 1$ and $\gamma_{t,i}(U) = 1 - G_{ti}\prod_{j \neq i}(1 - G_{tj} U_j) \ge 0$. To see that $\gamma_{t,i}(U)$ is non-decreasing, consider $U \prec U'$. Then $1 - G_{tj} U'_j \le 1 - G_{tj} U_j$ for every $j$, so $\prod_{j \neq i} (1 - G_{tj} U'_j) \le \prod_{j \neq i} (1 - G_{tj} U_j)$, and therefore $\gamma_{t,i}(U) \le \gamma_{t,i}(U')$; hence $\gamma_{t,i}$ is non-decreasing. Applying (15) with $f_t = \gamma_{t,i}$, we obtain the claim. □

APPENDIX D: COMPUTING f(Q) IN O(N²) TIME

Here we give an $O(N^2)$ algorithm to compute the objective function $f(Q)$ in (11). We assume $T \le N$, so $T = O(N)$, throughout. We first restate the expression for $f(Q)$ in (12):

$$f(Q) = \sum_{i=1}^{N} (1 - p_i) \prod_{t=1}^{T} \Big(1 - Q_{ti} \prod_{j \neq i} (1 - Q_{tj}\, p_j)\Big).$$

Note that this can be rewritten as

$$f(Q) = \sum_{i=1}^{N} (1 - p_i)\, \mathcal{F}[i], \quad \text{where } \mathcal{F}[i] \triangleq \prod_{t=1}^{T} \big(1 - Q_{ti}\, \mathcal{G}[t,i]\big),$$

and the intermediate terms are defined as $\mathcal{G}[t,i] \triangleq \prod_{j \neq i} (1 - Q_{tj}\, p_j)$. Thus, we first compute and store $\mathcal{G}[t,i]$ for all $t, i$ (each row's leave-one-out products can be computed in $O(N)$ time, e.g., via prefix and suffix products), which is then used to compute $\mathcal{F}[i]$ for all $i$ in $O(TN)$ time; overall, this yields an $O(TN) = O(N^2)$ runtime.

APPENDIX E: GRADIENT OF f(Q)

Here we give an expression for the gradient $\nabla f(Q)$ by calculating each partial derivative $\partial f(Q)/\partial Q_{lm}$:

$$\frac{\partial f(Q)}{\partial Q_{lm}} \stackrel{(a)}{=} \sum_{i=1}^{N} (1 - p_i) \Big[\prod_{t \neq l} \big(1 - Q_{ti}\, \mathcal{G}[t,i]\big)\Big] \cdot \frac{\partial}{\partial Q_{lm}} \big(1 - Q_{li}\, \mathcal{G}[l,i]\big), \tag{16}$$

where in (a) we separate out the term corresponding to $t = l$ from the product over $t$ and then apply the derivative to it. The remaining derivative equals $-\mathcal{G}[l,m]$ for $i = m$, and $Q_{li}\, p_m \prod_{j \notin \{i,m\}} (1 - Q_{lj}\, p_j)$ for $i \neq m$.

APPENDIX F: COMPUTING ∇f(Q) IN O(N²) TIME

The computation of the gradient follows an approach similar to the computation of the objective function $f(Q)$. We again assume $T \le N$, so $T = O(N)$, throughout. Restating the expression for the gradient in (16), and as in the case of the objective function computation, we first simplify and rewrite it in terms of the intermediate terms $\mathcal{G}[t,i]$ and $\mathcal{F}[i]$ of Appendix D, together with the analogous leave-one-out products over tests, $\prod_{t \neq l}(1 - Q_{ti}\,\mathcal{G}[t,i])$; reusing these stored products yields an overall $O(N^2)$ runtime.
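The closed-form partials can be validated numerically. The sketch below is our own (it reuses `leave_one_out_prod` and `f_of_Q` from the earlier sketches): it implements the partial derivative in (16) for a single entry and checks it against the exact multilinearity identity $\partial f/\partial Q_{lm} = f(Q \,|\, Q_{lm}=1) - f(Q \,|\, Q_{lm}=0)$.

```python
import numpy as np

def partial_f(Q, p, l, m):
    """Closed-form partial derivative df/dQ[l, m], per (16)."""
    a = 1.0 - Q * p                        # a[t, j] = 1 - Q_tj p_j
    B = leave_one_out_prod(a)              # B[t, i] = G[t, i] in Appendix D
    A = 1.0 - Q * B                        # per-test factors of (12)
    C = leave_one_out_prod(A.T).T          # C[t, i] = prod_{t' != t} A[t', i]
    r = a[l].copy()
    r[m] = 1.0                             # drop factor j = m from row l
    D = leave_one_out_prod(r[None, :])[0]  # D[i] = prod_{j not in {i, m}} a[l, j]
    dA = Q[l] * p[m] * D                   # inner derivative, case i != m
    dA[m] = -B[l, m]                       # inner derivative, case i == m
    return float(((1.0 - p) * C[l] * dA).sum())

# Check against the multilinearity identity (exact, division-free).
rng = np.random.default_rng(1)
T, N = 4, 6
p = np.clip(rng.exponential(0.05, N), 0.0, 1.0)
Q = rng.random((T, N))
l, m = 2, 3
Q1 = Q.copy(); Q1[l, m] = 1.0
Q0 = Q.copy(); Q0[l, m] = 0.0
assert abs(partial_f(Q, p, l, m) - (f_of_Q(Q1, p) - f_of_Q(Q0, p))) < 1e-9
```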
REFERENCES

[1] Group testing against covid-19.
[2] Coronavirus test shortages trigger a new strategy: Group screening.
[3] Five people. One test. This is how you get there.
[4] Group testing for sars-cov-2 allows up to 10-fold efficiency increase across realistic scenarios and testing strategies.
[5] Tapestry: A single-round smart pooling technique for covid-19 testing.
[6] Variation in false-negative rate of reverse transcriptase polymerase chain reaction-based sars-cov-2 tests by time since exposure.
[7] The mathematical strategy that could transform coronavirus testing.
[8] Pooled sample testing and screening testing for covid-19.
[9] The detection of defective members of large populations.
[10] Group testing: an information theory perspective.
[11] Combinatorial Group Testing and Its Applications, Series on Applied Mathematics.
[12] Revisiting nested group testing procedures: new results, comparisons, and robustness.
[13] Boolean compressed sensing and noisy group testing.
[14] Group testing algorithms: Bounds and simulations.
[15] Non-adaptive group testing: Explicit bounds and novel algorithms.
[16] Phase transitions in group testing.
[17] Performance of group testing algorithms with near-constant tests per item.
[18] Individual testing is optimal for nonadaptive group testing in the linear regime.
[19] Optimal non-adaptive probabilistic group testing requires θ(min{k log n, n}) tests.
[20] Optimal group testing, ser. Proceedings of Machine Learning Research.
[21] Group testing with nested pools.
[22] A fast binary splitting approach to nonadaptive group testing.
[23] Information-theoretic and algorithmic thresholds for group testing.
[24] Group testing with prior statistics.
[25] Community aware group testing.
[26] Group testing for connected communities.
[27] Group testing for overlapping communities.
[28] Noisy pooled pcr for virus testing.
[29] Contact tracing enhances the efficiency of covid-19 group testing.
[30] Adaptive group testing on networks with community structure.
[31] Network group testing.
[32] Dynamic group testing to control and monitor disease progression in a population.
[33] An entropy reduction approach to continual testing.
[35] Combinatorial optimization: exact and approximate algorithms.
[36] Correlation inequalities on some partially ordered sets.
[37] On the fkg-inequality for measures on a partially ordered space.
[38] Encyclopedia of Mathematics.
[39] Improved group testing rates with constant column weight designs.