key: cord-0590185-gpkr87ae
authors: Ghosh, Sabyasachi; Agarwal, Rishi; Rehan, Mohammad Ali; Pathak, Shreya; Agrawal, Pratyush; Gupta, Yash; Consul, Sarthak; Gupta, Nimay; Goyal, Ritika; Rajwade, Ajit; Gopalkrishnan, Manoj
title: A Compressed Sensing Approach to Group-testing for COVID-19 Detection
date: 2020-05-16
journal: nan
DOI: nan
sha: ef9125a2db9ee309baaaa86254528fc93c2f5fd1
doc_id: 590185
cord_uid: gpkr87ae

We propose Tapestry, a novel approach to pooled testing, with application to COVID-19 testing using quantitative Polymerase Chain Reaction (PCR), that can result in shorter testing time and conservation of reagents and testing kits. Tapestry combines ideas from compressed sensing and combinatorial group testing with a novel noise model for PCR. Unlike Boolean group-testing algorithms, the input is a quantitative readout from each test, and the output is a list of viral loads for each sample. While other pooling techniques require a second confirmatory assay, Tapestry obtains individual sample-level results in a single round of testing. When testing $n$ samples with $t$ tests, as many as $k = O(t / \log n)$ infected samples can be identified at clinically acceptable false-positive and false-negative rates. This makes Tapestry viable even at prevalence rates as high as 10%. Tapestry has been validated in simulations as well as in wet-lab experiments with oligomers. Clinical trials with COVID-19 samples are underway. An accompanying Android application, Byom Smart Testing, which makes the Tapestry protocol straightforward to implement in testing centres, is available for free download.

The coronavirus disease of 2019 (COVID-19) crisis has led to widespread lockdowns in several countries and has had a major negative impact on the economy. Early identification of infected individuals enables quarantining and thus helps control the spread of the disease. Infected individuals are often asymptomatic for many days. Widespread testing with the RT-PCR (reverse transcription polymerase chain reaction) method can help identify the infected individuals. However, widespread testing is not an available option in many countries due to constraints on resources such as time, basic equipment, skilled manpower, and reagents. The current low rate of COVID-19 infection in the world population means that most samples tested are not infected, so most tests are wasted on uninfected samples.

Group testing is the process of pooling together samples of $n$ different people into multiple pools and testing the pools instead of each individual sample. A negative result on a pool implies that all samples in it were negative. This saves a large amount of testing resources, especially at low infection rates. Group testing for medical applications has a long history, dating back to the 1940s, when it was proposed for the testing of blood samples for syphilis [1]. Simple group-testing schemes have already been applied in the field by several research labs [2, 3, 4] for COVID-19 testing. Such schemes require pooling of samples and a second round of RNA extraction for all samples in positive pools. This second round of RNA extraction can increase the time to result, and it is laborious to perform in laboratories where the RNA extraction process is done manually. In situations where the result needs to be delivered fast, or where a second round of RNA extraction must be avoided, these schemes are less attractive. Tapestry has a number of salient features, which we describe in the remainder of this paper.
The organization of the paper is as follows. We first present a brief overview of the RT-PCR method in Sec. 2. The precise mathematical definition of the computational problem solved in this paper is put forth in Sec. 3.1. We describe traditional and CS-based group-testing algorithms for this problem in Secs. 3.2, 3.3 and 3.4. The sensing matrix design problem is described in Sec. 3.5. Results on synthetic data are presented in Sec. 4, followed by results on a limited amount of data from lab experiments performed with oligomers to mimic the clinical situation as closely as possible. In Sec. 5 we compare our work to two recent related approaches, and in Sec. 6 we conclude with a discussion of different scenarios in which our work could be deployed.

In the RT-PCR method [9], a sample is collected from a patient, from a biological matrix that would with high likelihood contain copies of the virus should the patient be infected. For COVID-19, nasal swabs are most common, but other biological matrices such as saliva, oropharyngeal swabs, bronchial lavage, stool, or blood samples may also be collected in certain cases. The sample is then dispersed into a liquid medium. The RNA molecules of any virus present in this liquid medium are converted into complementary DNA (cDNA) via a process called reverse transcription. DNA fragments called primers, complementary to cDNA from the viral genome, are then added; they attach themselves to specific sections of the cDNA from the viral genome if the virus is present in the sample. The cDNA of these specific viral genes then undergoes a process of exponential amplification in an RT-PCR machine, where it is put through several cycles of alternate heating and cooling in the presence of Taq polymerase and appropriate reagents. This triggers the creation of many new identical copies of specific portions of the target DNA, roughly doubling in number with every cycle of heating and cooling. The reaction volume contains sequence-specific fluorescent markers which report the total amount of amplified DNA of the appropriate sequence. The resulting fluorescence is measured, and its increase can be observed on a computer screen in real time. The cycle at which the amount of fluorescence exceeds the threshold level is known as the threshold cycle $C_t$, and it is a quantitative readout from the experiment. A smaller $C_t$ indicates a greater number of copies of the virus. $C_t$ usually takes values between 16 and 32 cycles in real experiments. PCR can detect even single molecules: a single molecule typically yields a $C_t$ value of around 37 cycles, though there can be wide variation. The test takes about 3-4 hours to run.

3 Testing Methods

Let $x$ denote a vector of $n$ elements, where $x_i$ is the viral load (or virus concentration) of the $i$-th person. Throughout this paper we assume that only one sample per person is extracted, so $x$ contains the viral loads corresponding to $n$ different people. Owing to the low infection rate for COVID-19 so far, $x$ is considered to be a sparse vector with at most $k \ll n$ positive-valued elements. Note that $x_i = 0$ implies that the $i$-th person is not infected. In nonadaptive group testing, small and equal volumes of the samples of subsets of these $n$ people are pooled together according to a sensing or pooling matrix $A = (A_{ji})_{m \times n}$ whose entries are either 0 or 1. The viral loads of the pools are given by
$$z_j = \sum_{i=1}^{n} A_{ji} x_i, \qquad (1)$$
where $A_{ji} = 1$ if a portion of the sample of the $i$-th person is included in the $j$-th pool.
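For concreteness, the pooling step of Eqn. 1 amounts to a single matrix-vector product. The following is a minimal sketch with a synthetic pooling matrix and synthetic viral loads (purely illustrative, not a matrix used in our experiments):

```python
import numpy as np

# Minimal illustration of Eqn. 1: m = 3 pools formed from n = 6 samples.
# A[j, i] = 1 means a fixed unit volume of sample i goes into pool j.
A = np.array([[1, 1, 0, 0, 1, 0],
              [0, 1, 1, 0, 0, 1],
              [1, 0, 0, 1, 0, 1]])

# Sparse viral-load vector: only sample 3 is infected.
x = np.array([0.0, 0.0, 0.0, 740.0, 0.0, 0.0])

z = A @ x  # z_j = sum_i A_ji * x_i: pooled viral loads
print(z)   # [0. 0. 740.] -- only the pool containing sample 3 is positive
```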
In all, some $m < n$ pools are created and individually tested using RT-PCR, so that $1 \le j \le m$. We thus have the relationship
$$z = Ax, \qquad (2)$$
where $z$ is the $m$-element vector of viral loads in the mixtures, and $A$ denotes the $m \times n$ binary 'pooling matrix' (also referred to as a 'sensing matrix' in the compressed sensing literature). In particular, $z_j = A_j x$, where $A_j$ is the $j$-th row of $A$ and $A_j x$ denotes the product of this row vector with the column vector $x$. Each positive RT-PCR test yields a noisy version of $z_j$, which we refer to as $y_j$. The relation between the 'clean' and noisy versions is given by
$$y_j = z_j (1+q)^{e_j}, \qquad (3)$$
where $e_j \sim \mathcal{N}(0, \sigma^2)$, and $\sigma$ and $q \in (0,1)$ are system-specific constants. The factor $(1+q)^{e_j}$ reflects the stochasticity in the growth of the number of DNA molecules during PCR. Here $\sigma$ is known and constant; we have found a value of around 0.1 to be typical, and for $q$ we find a value of around 0.95 to be typical. Equivalently, for positive tests we have
$$\log y_j = \log z_j + e_j \log(1+q). \qquad (4)$$
In the case of negative tests, $y_j$ as well as $z_j$ are 0-valued, and no logarithms need be computed. The core computational problem is to estimate $x$ given $y$ and $A$. It should be noted that although we have treated each element of $x$ as a fixed quantity, it is in reality a random variable of the form $x_i \sim \mathrm{Poisson}(\lambda_i)$, where $\lambda_i \ge 0$. If the matrix $A$ contains only ones and zeros, this implies that $z_j \sim \mathrm{Poisson}(A_j \lambda)$, using the fact that the sum of independent Poisson random variables is also a Poisson random variable.

Combinatorial Orthogonal Matching Pursuit (COMP) is a Boolean nonadaptive group-testing method. It uses the simple idea that if a mixture $y_j$ tests negative, then every sample $x_i$ for which $A_{ji} = 1$ must be negative. In other words, all samples contributing to a mixture that tests negative must be negative (i.e., non-infected); all other samples are considered positive. This algorithm guarantees that there are no 'false negatives', but it can produce a very large number of 'false positives'. For example, a sample $x_k$ will be falsely reported as positive if every mixture $y_j$ it is part of also contains at least one other genuinely positive sample. The COMP algorithm is largely insensitive to noise. Moreover, a small variant of it can also produce a list of 'sure positives' after identifying the sure negatives: this happens when a positive mixture $y_j$ contains only one sample $x_i$, not counting the samples declared sure negatives in the earlier step. The performance guarantees for COMP have been analyzed in [8], which shows that COMP requires $ek(1+\delta)\log n$ tests for an error probability less than $n^{-\delta}$; this analysis has been extended to the case of noisy test results as well [8]. However, COMP can result in a large number of false positives, and it does not predict viral loads.
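The COMP decoding rule, together with the noise model of Eqns. 3-4, is straightforward to simulate. Below is a minimal sketch in which the pooling matrix, the loads, and the values $\sigma = 0.1$, $q = 0.95$ are illustrative stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 40, 16, 2
sigma, q = 0.1, 0.95                     # typical values from Sec. 3.1

# Random sparse binary pooling matrix: each sample enters 3 pools.
A = np.zeros((m, n), dtype=int)
for i in range(n):
    A[rng.choice(m, size=3, replace=False), i] = 1

# Sparse viral-load vector x with k infected samples.
x = np.zeros(n)
infected = rng.choice(n, size=k, replace=False)
x[infected] = rng.uniform(1, 1000, size=k)

# Noise model of Eqn. 3: y_j = z_j * (1 + q)^(e_j), e_j ~ N(0, sigma^2).
# Note y_j = 0 exactly when z_j = 0, since the noise is multiplicative.
z = A @ x
y = z * (1.0 + q) ** rng.normal(0.0, sigma, size=m)

# COMP: any sample appearing in a negative (zero-valued) pool is negative;
# all remaining samples are declared positive (possibly falsely).
in_negative_pool = A[y == 0].sum(axis=0) > 0
declared_positive = np.flatnonzero(~in_negative_pool)
print(sorted(infected), "are among", list(declared_positive))
```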
Group testing is intimately related to the field of compressed sensing (CS) [10], which has emerged as a significant sub-area of signal and image processing [5]. In compressed sensing, an image or signal $x$ with $n$ elements is directly acquired in compressed format via $m$ linear measurements of the form $y = Ax + \eta$. Here the measurement vector $y$ has $m$ elements, $A$ is a matrix of size $m \times n$, and $\eta$ is a vector of noise values. If $x$ is a sparse vector with $k \ll n$ non-zero entries, and $A$ obeys the so-called restricted isometry property (RIP), then exact recovery of $x$ from $y, A$ is possible [11] if $\eta = 0$. In the case of measurement noise, the recovery of $x$ produces a solution that is provably close to the original $x$. A typical recovery problem, P0, consists of the following optimization:
$$\text{(P0)} \quad \min_x \|x\|_0 \ \text{ subject to } \ \|y - Ax\|_2 \le \varepsilon, \qquad (5)$$
where $\varepsilon$ is an upper bound (possibly a high-probability upper bound) on $\|\eta\|_2$, and $\|x\|_0$ is the number of non-zero elements of $x$. In the absence of noise, a unique and exact solution to this problem is possible with as few as $2k$ measurements in $y$ if $x$ has $k$ non-zero elements [11]. Unfortunately, the optimization problem P0 is NP-hard, and the algorithm requires brute-force subset enumeration. Instead, the following problem P1 (often termed 'Basis Pursuit Denoising' or BPDN) is solved in practice:
$$\text{(P1)} \quad \min_x \|x\|_1 \ \text{ subject to } \ \|y - Ax\|_2 \le \varepsilon. \qquad (6)$$
P1 is a convex optimization problem which yields the same solution as P0 (under similar conditions on $x, A$) at significantly lower computational cost, albeit with $O(k \log n)$ measurements (i.e., typically more than $2k$) [5, 11].

The order-$k$ restricted isometry constant (RIC) of a matrix $A$ is defined as the smallest constant $\delta_k$ for which
$$(1 - \delta_k)\|x\|_2^2 \le \|Ax\|_2^2 \le (1 + \delta_k)\|x\|_2^2$$
holds for all $k$-sparse vectors $x$ (i.e., all vectors with at most $k$ non-zero entries). The matrix $A$ is said to obey the order-$k$ restricted isometry property if $\delta_k$ is small, i.e., well below 1. This property essentially implies that no $k$-sparse vector (other than the zero vector) can lie in the null space of $A$. Unique recovery of $k$-sparse signals requires that no $2k$-sparse vector lie in the null space of $A$ [11]; a matrix $A$ which obeys the RIP of order $2k$ satisfies this property. It has been proved that matrices with entries drawn randomly and independently from distributions such as the Rademacher or Gaussian obey the RIP of order $k$ with high probability [12], provided they have at least $O(k \log n)$ rows. The solutions to the optimization problems P0 and P1 in Eqns. 5 and 6 respectively are provably robust to noise [5], with the recovery error worsening as the noise magnitude increases. The error bounds for P0 in Eqn. 5 are of the form [13], for solution $\hat{x}$,
$$\|x - \hat{x}\|_2 \le \frac{2\varepsilon}{\sqrt{1 - \delta_{2k}}},$$
whereas those for P1 in Eqn. 6 have the form [13] $\|x - \hat{x}\|_2 \le \varepsilon\, C(\delta_{2k})$, where $C(\delta_{2k})$ is a monotonically increasing function of $\delta_{2k} \in (0,1)$ that has a small value in practice. Over the years, a variety of techniques for compressive recovery have been proposed; these algorithms use different forms of sparsity and incorporate different types of constraints on the solution, and we use some of them for our experiments, as described in Sec. 3.4.

Our approach toward group testing for COVID-19 involves a two-stage procedure. In the first stage, we apply the COMP algorithm described in Sec. 3.2 to identify the sure negatives (if any) in $x$, forming a set $X$. Let $Y$ be the set of zero-valued measurements in $y$ (i.e., negative tests), and let $\bar{X}, \bar{Y}$ be the complements of $X, Y$ respectively. Let $y_{\bar{Y}}$ be the vector of the $m - |Y|$ measurements which yielded a positive result, and let $x_{\bar{X}}$ be the vector of the $n - |X|$ samples that excludes the $|X|$ surely negative samples. Let $A_{\bar{X},\bar{Y}}$ be the submatrix of $A$, of size $(m - |Y|) \times (n - |X|)$, which excludes the rows corresponding to zero-valued measurements in $y$ and the columns corresponding to negative elements in $x$. In the second stage, we apply a CS algorithm to recover $x_{\bar{X}}$ from $y_{\bar{Y}}$ and $A_{\bar{X},\bar{Y}}$. To avoid symbol clutter, we henceforth simply write $y, x, A$, even though they respectively refer to $y_{\bar{Y}}, x_{\bar{X}}, A_{\bar{X},\bar{Y}}$.
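The reduction from $(y, A)$ to $(y_{\bar{Y}}, A_{\bar{X},\bar{Y}})$ amounts to deleting the rows of negative tests and the columns of the samples they exonerate. A minimal sketch (the function and variable names are ours, for illustration):

```python
import numpy as np

def comp_reduce(y, A):
    """COMP stage: drop negative tests and the samples they exonerate.

    Returns the reduced measurements y_Ybar, the reduced matrix A_Xbar_Ybar,
    and the indices of the surviving (still-suspect) samples.
    """
    negative_tests = (y == 0)                           # the set Y
    sure_negative = A[negative_tests].sum(axis=0) > 0   # the set X
    keep_rows = ~negative_tests                         # Y-bar
    keep_cols = ~sure_negative                          # X-bar
    return (y[keep_rows],
            A[np.ix_(keep_rows, keep_cols)],
            np.flatnonzero(keep_cols))
```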
Note that the CS stage following COMP is very important, for the following reasons:
1. COMP typically produces a large number of false positives; the CS algorithms help reduce their number, as we shall see in later sections.
2. COMP does not estimate viral loads, unlike CS algorithms.
3. Unlike CS algorithms, COMP treats the measurements in $y$ as binary, thus discarding a lot of useful information.

However, running the COMP algorithm prior to the CS algorithm is also very important, for the following reasons:
1. It identifies the sure negatives in $x$ from the negative measurements in $y$, and therefore effectively reduces the size of the problem to be solved by the CS step from $(m, n)$ to $(m - |Y|, n - |X|)$.
2. It can be seen from Eqn. 4 that the multiplicative noise in $y$ is essentially heteroscedastic: the 0-valued measurements in $y$ are noiseless, while the others are noisy. In such a case, direct application of CS algorithms without the preceding COMP step is fraught with challenges; it is easier to first discard the obvious negatives.

For CS recovery, we employ the following algorithms: the non-negative LASSO (NN-LASSO), orthogonal matching pursuit (OMP), and Sparse Bayesian Learning (SBL). For problems of small size, we also apply a brute-force search algorithm that solves problem P0 from Eqn. 5 combinatorially.

The LASSO (least absolute shrinkage and selection operator) is a penalized version of the constrained problem P1 in Eqn. 6. It is widely used in statistics and signal processing, and seeks to minimize the cost function
$$J(x) := \frac{1}{2}\|y - Ax\|_2^2 + \lambda\|x\|_1,$$
where $\lambda$ is a regularization parameter which imposes sparsity on $x$. The LASSO has rigorous theoretical guarantees for recovery of $x$ as well as recovery of the support of $x$ (i.e., recovery of the set of non-zero indices of $x$); we refer the reader to Chapter 11 of [14] for the relevant theorems. Given the non-negative nature of $x$, we implement a variant of the LASSO with a non-negativity constraint, leading to the optimization problem
$$\min_{x \ge 0} \frac{1}{2}\|y - Ax\|_2^2 + \lambda\|x\|_1.$$

Selection of $\lambda$: Criteria are defined in [14] for the selection of $\lambda$ under i.i.d. Gaussian noise, so as to guarantee statistical consistency. In practice, however, cross-validation (CV) can be used for the choice of $\lambda$. For this, the measurements in $y$ are divided into two randomly chosen disjoint sets: one for reconstruction ($R$) and the other for validation ($V$). The NN-LASSO is executed independently for multiple values of $\lambda$ from a candidate set $\Lambda$. For each value of $\lambda$, an estimate $\hat{x}_\lambda$ is produced using measurements only from $R$, and the CV error $v_e(\lambda) := \sum_{i \in V} (y_i - A_i \hat{x}_\lambda)^2$ is computed. The value of $\lambda$ which yields the least $v_e(\lambda)$ is chosen, and the final estimate of $x$ is obtained by executing the NN-LASSO on all measurements from $R \cup V$. If $V$ is large enough, then $v_e(\lambda)$ is a good estimate of the actual error $\|x - \hat{x}_\lambda\|_2$, as has been shown for Gaussian noise [15]. CV is also a method of choice for parameter selection in CS under a variety of other noise models, such as Poisson [16], and we have experimentally observed that it works well even with our noise model in Eqn. 4.
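The CV procedure above translates into a simple loop over $\Lambda$. The sketch below uses scikit-learn's Lasso with positive=True as a stand-in NN-LASSO solver; note that this solver scales the data-fidelity term by $1/(2m)$, so its alpha parameter corresponds to $\lambda$ only up to that constant factor:

```python
import numpy as np
from sklearn.linear_model import Lasso   # positive=True gives an NN-LASSO

def nnlasso_cv(y, A, lambdas, rng, frac_val=0.25):
    """Choose lambda by cross-validation, then refit on all measurements."""
    m = len(y)
    V = rng.choice(m, size=max(1, int(frac_val * m)), replace=False)
    R = np.setdiff1d(np.arange(m), V)              # reconstruction set
    best_lam, best_err = None, np.inf
    for lam in lambdas:
        model = Lasso(alpha=lam, positive=True, fit_intercept=False,
                      max_iter=10000)
        model.fit(A[R], y[R])                      # estimate x_lambda from R
        v_e = np.sum((y[V] - A[V] @ model.coef_) ** 2)   # CV error
        if v_e < best_err:
            best_lam, best_err = lam, v_e
    final = Lasso(alpha=best_lam, positive=True, fit_intercept=False,
                  max_iter=10000)
    final.fit(A, y)                                # refit on R union V
    return final.coef_, best_lam
```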
However, in this particular application we may have to deal with situations where $m$ and $n$ are small, for example $m = 16$, $n = 40$. In such a scenario CV will yield unstable results, since $|V|$ will be very small. In such a case, one resorts to the so-called 'discrepancy principle' (DP) [17]. As per our noise model in Eqn. 4, the expected value of $R(y, Ax) := \|\log y - \log Ax\|_2$ should be close to $\sigma \log(1+q)\sqrt{m}$; moreover, its variance is quite small and independent of $m$. Hence, following the discrepancy principle, we seek the $\lambda \in \Lambda$ for which $|R(y, A\hat{x}_\lambda) - \sigma \log(1+q)\sqrt{m}|$ is minimized. Previous work in [18] employed such a technique in the context of image deblurring under Poisson-Gaussian noise, with a square-root based data-fidelity term of the form $\|\sqrt{y + 3/8} - \sqrt{A\hat{x}_\lambda + 3/8}\|_2$.

Orthogonal Matching Pursuit (OMP) [19] is a greedy approximation algorithm for solving the optimization problem in Eqn. 5; rigorous theoretical guarantees for OMP have been established in [20]. OMP proceeds by maintaining a set $H$ of 'selected coefficients' of $x$, corresponding to columns of $A$. In each round, a column of $A$ is picked greedily, based on the criterion of maximum absolute correlation with the residual vector $r := y - \sum_{k \in H} A_k \hat{x}_k$. Each time a column is picked, all the coefficients selected so far (i.e., those in $H$) are updated by computing the orthogonal projection of $y$ onto the subspace spanned by the columns in $H$. The OMP algorithm can be quite expensive computationally. Moreover, in order to maintain non-negativity of $x$, the orthogonal projection step requires the solution of a non-negative least squares problem, further adding to the computational cost. However, a fast implementation of a non-negative version of OMP (NN-OMP) has been developed in [21], and it is the implementation we adopt here. For the choice of $\varepsilon$ in Eqn. 5, we can use CV or DP as described in Sec. 3.4.1.

Sparse Bayesian Learning (SBL) [22, 23] is a non-convex optimization algorithm based on Expectation-Maximization (EM) that has empirically shown reconstruction performance superior to most other CS algorithms, at manageable computational cost [24]. In SBL, we consider Gaussian noise in $y$ and a Gaussian prior on the elements of $x$, leading to the model
$$y = Ax + \eta, \quad \eta \sim \mathcal{N}(0, \sigma^2 I_m), \quad x_i \sim \mathcal{N}(0, \varphi_i) \text{ independently for each } i.$$
Since both $x$ and $\varphi$ (the vector of the $\{\varphi_i\}_{i=1}^{n}$ values) are unknown, the optimization over these quantities is performed using an EM algorithm. In the following, we denote $\Phi := \mathrm{diag}(\varphi)$, and write $\Phi^{(l)}$ for the estimate of $\Phi$ in the $l$-th iteration. The E-step of the EM algorithm involves computing $Q(\Phi \mid \Phi^{(l)}) := E_{x \mid y; \Phi^{(l)}}[\log p(y, x; \Phi)]$. It is to be noted that the posterior distribution $p(x \mid y; \Phi^{(l)})$ has the form $\mathcal{N}(\mu, \Sigma)$, where $\mu = \Sigma A^T y / \sigma^2$ and $\Sigma = (A^T A / \sigma^2 + (\Phi^{(l)})^{-1})^{-1}$. The M-step involves maximization of $Q(\Phi \mid \Phi^{(l)})$, leading to the update $\Phi^{(l+1)} = \mathrm{diag}(\mu_i^2 + \Sigma_{ii})$. The E-step and M-step are executed alternately until convergence. Convergence to a fixed point is guaranteed, though the fixed point may or may not be a local minimum; however, all local minima are guaranteed to produce sparse solutions for $x$ (even in the presence of noise), because most of the $\varphi_i$ values shrink towards 0. The SBL procedure can also be modified to dynamically update the Gaussian noise variance $\sigma^2$ if it is unknown. All these results can be found in [23]. Unlike NN-LASSO or NN-OMP, the SBL algorithm from [23] expressly assumes Gaussian noise; we nevertheless use it as-is in this paper for the simplicity it affords, and choose the noise standard deviation required for the SBL updates via CV or DP, as described in Sec. 3.4.1. Also unlike NN-OMP and NN-LASSO, the basic SBL algorithm imposes no explicit non-negativity constraint; in our implementation, non-negativity is simply imposed at the end of the optimization by setting to 0 any negative-valued elements of $\mu$, though more principled, albeit more computationally expensive, approaches can be adopted [25].
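The E- and M-steps above can be transcribed directly into code. The following minimal sketch assumes a known noise variance $\sigma^2$ and omits the non-negativity post-processing and any pruning refinements:

```python
import numpy as np

def sbl(y, A, sigma2, n_iter=100, floor=1e-8):
    """Basic SBL via EM, following the updates in Sec. 3.4.3."""
    m, n = A.shape
    phi = np.ones(n)                               # prior variances phi_i
    mu = np.zeros(n)
    for _ in range(n_iter):
        # E-step: posterior p(x | y; Phi) = N(mu, Sigma)
        Sigma = np.linalg.inv(A.T @ A / sigma2 + np.diag(1.0 / phi))
        mu = Sigma @ A.T @ y / sigma2
        # M-step: phi_i <- mu_i^2 + Sigma_ii (small phi_i shrinks x_i to 0);
        # the floor only guards against division by zero.
        phi = np.maximum(mu ** 2 + np.diag(Sigma), floor)
    return mu
```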
The sensing matrix $A$ must obey some properties specific to this application, such as non-negativity. For ease of pipetting, it is desirable that the entries of $A$ be binary, where $A_{ji} = 0$ indicates that sample $i$ did not contribute to pool $j$, and $A_{ji} = 1$ indicates that a fixed unit volume of sample $i$ was pipetted into pool $j$. Also for ease of pipetting, including saving pipetting time, it is desirable that $A$ be sparse. Sparsity additionally ensures that not too many samples contribute to a pool, and that a single sample does not contribute to too many pools. The former is important because the volume of sample added to a PCR reaction is typically fixed: increasing the pool size means that each sample contributes a smaller fraction of that volume, which leads to dilution and manifests as a shift of the $C_t$ value towards larger numbers. If care is not taken in this regard, this can affect the power of PCR to discriminate between positive and negative samples. The latter is important because the contribution of one sample to a large number of pools could lead to depletion of that sample.

The restricted isometry property (RIP) of sensing matrices is a sufficient condition for good CS recovery, as described in Sec. 3.3. However, matrices which obey the aforementioned physical constraints are not guaranteed to obey the RIP. Instead, we consider sensing matrices which are adjacency matrices of expander graphs. A left-regular bipartite graph $G((V_I, V_O), E \subseteq V_I \times V_O)$, with the degree of each vertex in $V_I$ equal to $d$, is said to be a $(k, 1-\alpha)$ expander graph for some integer $k > 0$ and some real-valued $\alpha \in (0,1)$ if, for every subset $S \subseteq V_I$ with $|S| \le k$, we have $|N(S)| \ge (1-\alpha) d |S|$, where $N(S)$ denotes the union of the sets of neighbors of the nodes in $S$. Intuitively, a bipartite graph is an expander if every 'not too large' subset has a 'large' boundary. It can be proved that a randomly generated left-regular bipartite graph is an expander with high probability [26]. Moreover, it has been shown in [6] that the adjacency matrix $A$ of a $(k, 1-\alpha)$ expander graph obeys the RIP-1 property (a version of the RIP in the $\ell_1$ norm) of order $k$; that is, for any $k$-sparse vector $x$,
$$(1 - 2\alpha)\, d\, \|x\|_1 \le \|Ax\|_1 \le d\, \|x\|_1.$$
This property again implies that the null space of $A$ cannot contain vectors that are 'too sparse' (apart from the zero vector). This summarizes the motivation behind the use of expanders in compressive recovery of sparse vectors, and also in group testing.

Although randomly generated left-regular bipartite graphs are expanders with high probability, we would still need to verify whether a particular such graph is a good expander in order to prove theorems for it. In the application at hand, this can prove to be a critical limitation, since matrices of various sizes may have to be served, depending on the number of samples arriving in a batch at the testing centre and the number of tests available to be performed. Hence, we have chosen to employ deterministic procedures to design such matrices, based on objects from combinatorial design theory known as Kirkman triples [27]. Generalizations of this kind of object, under the name of the 'social golfer problem', are an active area of research.
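Sampling a random left-regular bipartite graph of the kind discussed above takes only a few lines; the sketch below draws an adjacency matrix with $d = 3$ pools per sample. As noted, whether a particular draw is a good expander would still need to be verified, which is precisely the limitation that motivates the deterministic constructions we describe next:

```python
import numpy as np

def random_left_regular(m, n, d=3, seed=0):
    """Adjacency matrix of a random d-left-regular bipartite graph: each of
    the n left nodes (samples) is joined to d of the m right nodes (pools)."""
    rng = np.random.default_rng(seed)
    A = np.zeros((m, n), dtype=int)
    for i in range(n):
        A[rng.choice(m, size=d, replace=False), i] = 1
    return A
```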
We first recall Kirkman triple systems, which are Steiner triple systems with an extra property. Steiner triple systems consist of $n = m(m-1)/6$ column vectors of length $m$, with entries either 0 or 1, such that each column has exactly three 1s and no two columns have dot product more than 1 [28]. If the sum of columns $i$ through $i + m/3 - 1$ equals $\mathbf{1} \in \mathbb{R}^m$ for every $i \equiv 1 \pmod{m/3}$, then the Steiner triple system is called a Kirkman triple system [27]. Kirkman triple systems have many nice combinatorial properties that make them ideally suited to our application. The most obvious is that they are sparse matrices, since each column has only three 1s. The dot product between two columns being at most 1 ensures that no two samples participate in more than one test together. This has favourable consequences in terms of placing an upper bound on the mutual coherence of the matrix, defined as
$$\mu(A) := \max_{i \ne j} \frac{|A_i^T A_j|}{\|A_i\|_2 \|A_j\|_2},$$
where $A_i$ refers to the $i$-th column of $A$. Matrices with lower $\mu(A)$ values have lower worst-case upper bounds on the reconstruction error [29]. These bounds are looser than the RIC-based bounds seen in previous sections; however, unlike the RIC, the mutual coherence is efficiently computable. Another benefit is that the bipartite graph of samples and tests associated with a Kirkman matrix has high girth, i.e., the shortest cycle in the graph is long. Hence the graph is locally tree-like and has good expansion properties; as a result, the RIP-1 property holds, and the matrix has good reconstruction properties under compressed sensing.

A practical benefit of Kirkman triples that is not shared by Steiner triples is that Kirkman triples can be served for a number of samples far smaller than $n = m(m-1)/6$ while keeping pools balanced. In fact, we can choose $n$ to be any integer multiple of $m/3$ and still ensure that every pool receives the same number of samples. This allows us to characterize the properties of the full Kirkman matrix, and to use that analysis to predict how it will behave in the clinical situation, where the pooling matrix to be served may require very specific values of $m, n$ depending on the prevalence rate.

As mentioned earlier, the mutual coherence is efficient to compute and optimize over; hence there is a large body of literature on designing CS matrices by minimizing $\mu(A)$ with respect to $A$, for example [30]. We followed such a procedure for designing the sensing matrices used in some of our experimental results in Sec. 4.2, using simulated annealing to update the entries of $A$, starting from an initial condition in which $A$ is a random binary matrix. For synthetic experiments, we compared such matrices with random matrices, biregular random sparse graphs, and Kirkman matrices. We found that matrices of Kirkman triples perform very well empirically in the regime of sizes we are interested in, and hence the results are reported using only Kirkman or Steiner triple matrices.
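Because the mutual coherence is cheap to evaluate, it is a convenient objective for design procedures such as the simulated annealing mentioned above. A minimal sketch of computing $\mu(A)$:

```python
import numpy as np

def mutual_coherence(A):
    """mu(A) = max over i != j of |<A_i, A_j>| / (||A_i|| ||A_j||),
    computed over the columns A_i (assumed non-zero)."""
    G = A.T.astype(float) @ A.astype(float)        # Gram matrix of columns
    norms = np.sqrt(np.diag(G))
    C = np.abs(G) / np.outer(norms, norms)
    np.fill_diagonal(C, 0.0)                       # exclude the i == j terms
    return C.max()

# For any Steiner/Kirkman matrix (three 1s per column, pairwise dot
# product at most 1), mu(A) <= 1/3, since each column has norm sqrt(3).
```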
In the class of 'adaptive group testing' techniques, the $n$ samples are distributed into two or more groups, each of smaller size, and the smaller groups are then individually tested. In one particular adaptive method called generalized binary splitting (GBS) [31], this procedure is repeated (in a binary-search fashion) until a single infected sample is identified. This requires $O(\log n)$ sequential tests, where each test requires mixing up to $n/2$ samples. The identified sample is then discarded, and the entire procedure is performed on the remaining $n - 1$ samples. Such a procedure does not introduce any false negatives, and does not require prior knowledge of the number of infected samples $k$. It requires a total of only $O(k \log n)$ tests, where $k$ is the number of infected samples. However, such a multi-stage method is impractical to deploy due to its sequential nature, since each RT-PCR stage requires nearly 3-4 hours. Moreover, each mixture that is tested contains contributions from as many as $O(n)$ samples, which can lead to significant dilution or may be difficult to implement in the lab. Hence we do not pursue this particular approach further in this work, although it could be very useful in settings where each individual test has a quick turn-around time.

In this section, we show a suite of experimental results on synthetic data as well as on real data.

Signal/measurement generation: For the case of synthetic data, we generated signal vectors $x$ of dimension $n = 105$ with varying levels of sparsity. In all cases, $m = 45$ noisy measurements in $y$ were simulated following the noise model in Eqn. 4.

[Table 1 appears here.]

Note that for a Poisson random variable with mean $\lambda$, the ratio of the standard deviation to the mean is $\sqrt{\lambda}/\lambda = 1/\sqrt{\lambda}$, which becomes smaller and smaller as $\lambda$ increases; hence the Poisson fluctuations in $x$ (see Sec. 3.1) are negligible for the large viral loads considered here. Kirkman triple sensing matrices were used in these experiments for generating the measurements. The recovery algorithms were tested on $Q = 100$ randomly generated signals. The different signals had different supports, and the magnitudes of the non-zero elements were drawn uniformly at random from the range [1, 1000].

Algorithms tested: The following algorithms were compared:
1. COMP (see Table 1);
2. COMP followed by NN-LASSO (see Table 2);
3. COMP followed by SBL (see Table 3);
4. COMP followed by NN-OMP (see Table 4);
5. COMP-BF, i.e., COMP followed by brute-force search, only for small problem sizes (see Tables 5 and 7).

Comparison criteria: In the following, $\hat{x}$ denotes the estimate of $x$. Most numerical algorithms do not produce vectors that are exactly sparse, owing to issues such as the choice of convergence criterion; instead, the recovered signals contain many entries of tiny magnitude. Since support recovery is of paramount importance in this application (to identify which samples in $x$ were infected), we employ the following post-processing step: all entries of $\hat{x}$ whose magnitude falls below a threshold $\tau := 0.02 \times x_{\min}$ are set to zero. Here $x_{\min}$ refers to the least possible value of the viral load, which can be obtained offline from practical experiments (even on single samples); in these synthetic experiments we simply set $x_{\min} := 1$. The various algorithms were compared with respect to criteria including the number of false positives, the number of false negatives, and the recovery error (RMSE) between $x$ and $\hat{x}$.

All algorithms were evaluated on 100 randomly generated sparse signals, given the same sensing matrix. The average value as well as the standard deviation of all quality measures (over the 100 signals) are reported in Tables 1, 2, 3, 4, 5 and 7. A comparison of Table 1 with Tables 2, 3, 4, 5 and 7 indicates that COMP followed by NN-LASSO/SBL/NN-OMP/brute-force search significantly reduces the number of false positives at the cost of a rare false negative. The RMSE is also significantly improved, since COMP does not estimate viral loads. We note that the experimental results reported in these tables are quite encouraging, since the experiments are challenging due to small $m$ and fairly large $k, n$, albeit with testing on synthetic data.
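The post-processing and comparison criteria described above can be made precise in a few lines; a minimal sketch (the thresholding follows the description above, with $x_{\min} = 1$ as in our synthetic experiments, and the specific error measure shown is a relative recovery error):

```python
import numpy as np

def evaluate(x_true, x_hat, x_min=1.0):
    """Threshold tiny entries of the estimate, then count support errors
    and report the relative recovery error."""
    tau = 0.02 * x_min
    x_hat = np.where(np.abs(x_hat) < tau, 0.0, x_hat)   # post-processing
    pos_true, pos_hat = x_true > 0, x_hat > 0
    false_pos = int(np.sum(pos_hat & ~pos_true))
    false_neg = int(np.sum(~pos_hat & pos_true))
    rel_err = np.linalg.norm(x_true - x_hat) / np.linalg.norm(x_true)
    return false_pos, false_neg, rel_err
```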
Brute-force method: We refer to COMP-BF as the method in which we apply COMP followed by a brute-force search to solve problem P0 in Eqn. 5. This is computationally feasible only if $\binom{n}{k}$ is 'reasonable' (note that the effective $n$ is often reduced after application of COMP), so we employ it only for small-sized matrices. The method essentially enumerates all possible supports of $x$ of size $k$. For each such candidate support set $Z$, a data-fidelity cost consistent with the noise model in Eqn. 4, of the form
$$J(x) := \|\log y - \log(Ax)\|_2^2 \ \text{ subject to } \ x \ge 0, \ \mathrm{supp}(x) \subseteq Z,$$
is minimized using the fmincon routine of MATLAB, which implements an interior-point optimizer. Results with the COMP-BF method are shown in Table 5. The special advantage of the brute-force method is that it requires only $m = 2k$ mixtures, which is often fewer than $O(k \log n)$. However, such a method requires prior knowledge of $k$, or an estimate thereof. We employ a method to estimate $k$ directly from $y, A$; this is described in Sec. 4.3. The results in Table 5 assume that the exact $k$ was known, or that the estimator predicted the exact $k$. However, we observed that the estimator from Sec. 4.3 can sometimes over-estimate $k$. Hence, we also present results with COMP-BF in which the brute-force search assumed the sparsity to be (over-estimated as) $k + 1$ instead of $k$; these are shown in Table 7. A comparison of Tables 5 and 7 shows that the RMSE deteriorates if $k$ is incorrectly estimated; however, there is no adverse effect on the number of false negatives, and only a small adverse effect on the number of false positives.

Comparison with Dorfman pooling: We also compared our algorithms with the popular two-stage Dorfman pooling method (an adaptive method) with regard to the number of tests required. In the first stage of the Dorfman technique, the $n$ samples are divided into $\sqrt{nk}$ pools, each of size $\sqrt{n/k}$. Each of these pools is tested, and a negative result leads to all members of that pool being declared negative (i.e., non-infected). The pools that test positive are passed on to a second stage, in which all of their members are individually tested. The comparison with respect to the number of tests is shown in Table 6, assuming (a) that the number of infected samples $k$ is known in advance, and (b) that the $k$ infected samples are distributed across distinct pools (a worst-case situation). Comparing Tables 1, 2, 3 and 4 with the two-stage Dorfman method in Table 6 shows that our methods require far fewer tests, albeit with a slight increase in the number of false negatives. Moreover, all our methods are single-stage methods, unlike the Dorfman method, which requires two stages of testing. If the value of $k$ is unknown, which will usually be the case in the field, the estimation method from Sec. 4.3 can be used.
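For reference, the worst-case test count of two-stage Dorfman pooling under assumptions (a) and (b) above is easy to tabulate. The sketch below uses the pool size $\sqrt{n/k}$ discussed above; the rounding choices are ours:

```python
import numpy as np

def dorfman_worst_case_tests(n, k):
    """Two-stage Dorfman pooling with pools of size ~sqrt(n/k); in the
    worst case the k infected samples land in k distinct pools, all of
    which must then be retested sample by sample."""
    pool_size = max(1, int(round(np.sqrt(n / k))))
    n_pools = int(np.ceil(n / pool_size))
    return n_pools + k * pool_size                 # stage 1 + stage 2

print(dorfman_worst_case_tests(105, 10))           # 65, versus m = 45 here
```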
We acquired real data in the form of test results on pooled samples from two labs: one at the National Centre for Biological Sciences (NCBS) in India, and the other at the Wyss Institute at Harvard Medical School, USA. In both cases, viral RNA was artificially injected into $k$ of the $n$ samples, where $k \ll n$, and a total of $m$ mixtures was created from these $n$ samples. For the datasets obtained from NCBS, we had $m = 16$, $n = 40$, $k \in \{1, 2, 3, 4\}$; for the data from the Wyss Institute, we had $m = 24$, $n = 60$, $k = 3$. The results for all these datasets are presented in Table 8. The pooling matrices in both cases were obtained by performing a simulated annealing procedure to minimize the mutual coherence (see Sec. 3.5), starting with a random sparse binary matrix as the initial condition. We see that the CS algorithms reduce the false positives, albeit with the introduction of occasional false negatives for higher values of $k$. We also refer the reader to our work in [32] for a more in-depth description of results on real experimental data.

CS algorithms require $O(k \log n)$ measurements for successful recovery, assuming RIP-1 or RIP-2 properties of the sensing matrix. However, in practice $k$ is always unknown, which raises the question of how many measurements are needed at a minimum for a particular problem instance. To address this, we adopt the technique from [34] to estimate $k$ on the fly from the compressive measurements; this technique does not require signal recovery for estimating $k$. The relative error in the estimate of $k$ is shown to be $O(\sqrt{\log m / m})$ [35], which diminishes as $m$ increases (irrespective of the true $k$). The advantage of this estimate of $k$ is that it can drive the COMP-BF algorithm, and it can also act as an indicator of whether there exist any false negatives.

There exists very little literature on applying CS to COVID-19 testing. While preparing this manuscript, we came across two references on arXiv: [36] and [37]. Both references adopt a nonadaptive CS-based approach. Compared to them, our work differs in the following ways (also see [32]):
1. Real/synthetic data: The work in [36] is largely theoretical, whereas our work, as well as that in [37], reports results on real data.
2. Noise model: Our work uses the physically derived noise model in Eqn. 4 (as opposed to only Gaussian noise). This noise model is not considered in [36, 37].
3. Algorithms: The work in [36] adopts the BPDN technique (i.e., P1 from Eqn. 6) as well as the brute-force search method for reconstruction. The work in [37] uses the LASSO, albeit without explicit imposition of a non-negativity constraint. We, on the other hand, use the LASSO with a non-negativity constraint, the brute-force method, and other techniques such as SBL and NN-OMP, all in combination with COMP. The work in [36] assumes knowledge of the (Gaussian) noise variance for the selection of $\varepsilon$ in the estimator in Eqn. 6, whereas we use cross-validation for all our estimators. The technique in [37] uses a slightly different form of cross-validation for selecting the regularization parameter of the LASSO.
4. Sensing matrix design: The work in [36] uses randomly generated expander graphs, whereas we use Kirkman triple matrices. The work in [37] uses randomly generated sparse Bernoulli matrices or Reed-Solomon codes. Each sample in our matrix participates in 3 pools, as opposed to the 6 pools used in [37], which is advantageous from the point of view of pipetting time.
5. Sparsity estimation: Our work uses an explicit sparsity estimator (Sec. 4.3) and does not rely on any assumption regarding the prevalence rate. The work in [37] reports the detection of up to 5 infected samples out of $n = 384$ (a prevalence rate of roughly 1.3%) with 48 tests, with an average number of false positives below 2.75 and an average number of false negatives below 0.33. On synthetic simulations with their 48 × 384 Reed-Solomon code based matrix (released by the authors), over a total of 100 vectors $x$ with $\ell_0$ norm equal to 5, we obtained with COMP-NNLASSO an average of 1.51 false positives and 0.02 false negatives, with standard deviations of 1.439 and 0.14 respectively. Using COMP-SBL instead of COMP-NNLASSO, with all other settings unchanged, we obtained an average of 1.4 false positives and 0.0 false negatives, with standard deviations of 1.6 and 0.1 respectively.
We have designed an Android app named Byom Smart Testing to make our Tapestry protocol easy to deploy in the future. The app can be accessed at the following link: https://drive.google.com/file/d/1tfrzKfbewVqwOOKhpCb1WplMfsUfAOe0/view?usp=sharing. We are also sharing our code and some amount of data at the link below: https://github.com/atoms-to-intelligence/tapestry.

Future work: Future work will involve extensive testing on real data, and extensive implementation of a variety of algorithms for sensing matrix design as well as signal recovery, keeping in mind the accurate statistical noise model.

References

The detection of defective members of large populations
COVID-19 screening and testing
An introduction to compressive sampling
Combining geometry and combinatorics: A unified approach to sparse signal recovery
Viral dynamics in mild and severe cases of COVID-19
Non-adaptive probabilistic group testing with noisy measurements: Near-optimal bounds with efficient algorithms
How is the COVID-19 virus detected using real time RT-PCR
Group testing and sparse signal recovery
The restricted isometry property and its implications for compressive sensing
A simple proof of the restricted isometry property for random matrices
Introduction to compressed sensing
Statistical Learning with Sparsity: The LASSO and Generalizations
On the theoretical analysis of cross validation in compressive sensing
Minimax optimal convex methods for Poisson inverse problems under $\ell_q$-ball sparsity
Morozov's discrepancy principle and Tikhonov-type functionals
Variance-stabilization-based compressive inversion under Poisson or Poisson-Gaussian noise with analytical bounds
Orthogonal matching pursuit: recursive function approximation with application to wavelet decomposition
Orthogonal matching pursuit for sparse signal recovery with noise
Fast non-negative orthogonal matching pursuit
Sparse Bayesian learning and the relevance vector machine
Sparse Bayesian learning for basis selection
A review of sparse recovery algorithms
Rectified Gaussian scale mixtures and the sparse non-negative least squares problem
A fast noniterative algorithm for compressive sensing using binary measurement matrices
Kirkman's schoolgirl problem
Steiner triple systems
Stable restoration and separation of approximately sparse signals
On optimization of the measurement matrix for compressive sensing
Combinatorial group testing and its applications
Tapestry: A single-round smart pooling technique for COVID-19 testing
Deterministic matrices matching the compressed sensing phase transitions of Gaussian random matrices
On the fly estimation of the sparsity degree in compressed sensing using sparse sensing matrices
Sparsity estimation from compressive projections via sparse random matrices
Low-cost and high-throughput testing of COVID-19 viruses and antibodies via compressed sensing: System concepts and computational experiments
Efficient high throughput SARS-CoV-2 testing to detect asymptomatic carriers