Combinatorial Group Testing and Sparse Recovery Schemes with Near-Optimal Decoding Time
Mahdi Cheraghchi, Vasileios Nakos (2020-06-15)

In the long-studied problem of combinatorial group testing, one is asked to detect a set of $k$ defective items out of a population of size $n$, using $m \ll n$ disjunctive measurements. In the non-adaptive setting, the most widely used combinatorial objects are disjunct and list-disjunct matrices, which define incidence matrices of test schemes. Disjunct matrices allow the identification of the exact set of defectives, whereas list-disjunct matrices identify a small superset of the defectives. Apart from the combinatorial guarantees, it is often of key interest to equip measurement designs with efficient decoding algorithms. The most efficient decoders should run in sublinear time in $n$, and ideally near-linear in the number of measurements $m$. In this work, we give several constructions with an optimal number of measurements and near-optimal decoding time for the most fundamental group testing tasks, as well as for central tasks in the compressed sensing and heavy hitters literature. For many of those tasks, the previous measurement-optimal constructions needed time either quadratic in the number of measurements or linear in the universe size. Most of our results are obtained via a clean and novel approach which avoids list-recoverable codes or related complex techniques which were present in almost every state-of-the-art work on efficiently decodable constructions of such objects.

The study of combinatorial group testing dates back to the Second World War, suggested by Dorfman [Dor43] in the context of testing blood samples collected from a large population of draftees. In an abstract formulation, a population of n individuals contains up to k defectives, for a known parameter k, and tests are conducted to identify the exact set of defectives. Each test identifies a subset of the individuals and returns positive if and only if the set contains at least one defective individual. The basic combinatorial goal is to minimize the number of tests required to identify the exact set of defectives in the worst case. This article focuses on non-adaptive tests, where the tests are all pre-determined and can be conducted in parallel. In this case, the test design can be identified by a binary incidence matrix with n columns and one row per test. Since its inception, group testing has found countless uses both in theory and practice. Practical applications include a wide range of areas such as molecular biology and DNA library screening (cf. [BKB+95, CD08, KM95, FKKM97, Mac99, ND00, STR03, WHHL06, WLHD08] and the references therein), the Human Genome Project (cf. [CD06, Section VI.46]), multiple access communication [Wol85], data compression [HL00], pattern matching [CEPR07], secure key distribution [CDH07], network tomography [CKMS12], and quality control [SG59], among others. The reader is invited to consult [DH00, DH06] for a more comprehensive discussion of the application areas. Finally, the original idea of using group testing for pooling samples in medical tests has recently gained renewed interest during the COVID-19 pandemic due to the prevalent shortage of test kits (cf. [ZRB20, BK20, YAST+20, Tec20, Eur20, Oma20, Ben20, NHL20]).
In theoretical computer science, group testing falls under the broader umbrella of sparse recovery, where the general framework deals with recovery of sparse structures (such as high-dimensional vectors with few nonzero entries, or their approximations) via queries from a restricted class (such as linear queries, as in compressed sensing [FR13], disjunctive queries, which define group testing, or sampling Fourier coefficients of the underlying vector [CT06, HIKP12a, Kap17, NSW19]). The general area of sparse recovery provides a fundamental toolkit for the study of streaming and sublinear-time algorithms, and technology from that area lies at the heart of the latest improvements for Subset Sum [ABJ+19, BN20] and Linear Programming [vdBLSS20]. As a combinatorial construct, sparse recovery, and more specifically group testing, is related to the notion of selectors [CGÖR00] and related pseudorandom objects.

Disjunct Matrices. The combinatorial guarantee for a test design to allow for identification of the set of defectives is studied in the literature under several essentially equivalent notions, such as superimposed codes, cover-free families (or codes), and disjunct matrices (Definition 2.1; see [GRS19, Chapter 19], [DH00, Chapter 4] and [DR83] for a detailed discussion). Roughly speaking, a disjunct test matrix for k defectives satisfies the following: for every set S of k columns and a column i ∉ S, there is a row at which the columns in S have zeros whereas the i-th column has a 1. A lower bound of Ω(k^2 log_k n) on the number of rows has been proved several times in the literature [DR82, DRR89, Rus94]. The best known construction achieves m = O(k^2 min{log n, (log_k n)^2}) rows (by a combination of the Kautz-Singleton construction [KS64] and Porat-Rothschild [PR08]). The notion of disjunctness can be naturally extended to also allow for accurate recovery in the presence of false positives and negatives in the test outcomes. Two central problems in group testing are explicit construction of test designs and efficient recovery of the defectives from test outcomes. While a simple probabilistic argument can achieve an upper bound of O(k^2 log n) tests (cf. [DH00, Chapter 4]), an explicit construction (in polynomial time in the matrix size) matching this upper bound [PR08] can be significantly more challenging. From the recovery perspective, any disjunct matrix allows recovery in nearly linear time in the size of the matrix: the decoder can simply output the subset of the columns of the test matrix whose supports are contained in the support of the test outcomes. For large population sizes, however, it is desirable to have a sublinear-time recovery algorithm that runs in polynomial time (or even nearly linear time) in the number of tests, which can potentially be exponentially faster than the naive decoder above.

List-Disjunct Matrices. Another important combinatorial object, introduced independently in [Che12, INR10], is that of a list-disjunct matrix. That object guarantees recovery of a small superset of the defective items, but features the advantage that the number of rows can be much smaller than what a disjunct matrix would allow (essentially by a factor of k). Furthermore, it has a number of additional notable advantages and applications. Using list-disjunct matrices, one can design two-stage group testing schemes, by first narrowing the universe down to a small set, and then performing a test on each element independently.
Thus, in scenarios where two-stage testing is possible, for example in DNA library screening or data forensics [GAT05], this results in huge savings. Moreover, list-disjunct matrices can be used for constructions of monotone encodings and multi-user tracing families [AH09], vote storage systems [MNS07], and for designing state-of-the-art heavy hitter sketches (as we show in this work). Last but not least, they can be used as an intermediate tool towards the construction of (efficiently decodable) disjunct matrices; indeed this was the main motivation in [Che12, INR10]. At least for noiseless testing, there is a simple bit-masking trick that can augment any disjunct or list-disjunct matrix with additional rows to enable sublinear recovery (e.g., see [LCPR19]). The augmentation blows up the number of rows by a logarithmic factor in n, and thus a long line of work has been devoted to obtaining better trade-offs between rows and recovery time. From now on, we shall refer to decoding time as the time needed for recovery of the defectives or a small list containing them. We also stress the difference between the "for-all" guarantee (uniform) and the "for-each" guarantee (non-uniform). A matrix satisfies the first guarantee if it enables recovery for all vectors simultaneously, while a randomized matrix (i.e., a distribution over matrices) satisfies the for-each guarantee if it enables recovery of a fixed vector with some target probability. Disjunct and list-disjunct matrices are defined with the for-all guarantee in mind.

Work on sublinear-time group testing and on related problems. Sublinear-time decoding in group testing (including disjunct and list-disjunct matrices, and the probabilistic and non-uniform cases) has been explored in [GI04, CM05, CH08, INR10, NPR11, CJBJ17, VJN17, LCPR19, BCS+19b]. In the context of the similar tasks of heavy hitters and compressed sensing (see below), sublinear time has been investigated in [GMS05, GLPS10, PS12, HIKP12a, HIKP12b, GNP+13, LNNT16, Kap16, Nak17b, Nak17a, Kap17, CKSZ17, GLPS17, LN18b, NS19], to name a few. There is also a decent amount of literature on variants of the group testing problem, such as sparse group testing [GGJZ19], graph-constrained group testing [Che13, SW19] and threshold group testing [Che13]. Our focus in this paper is the most standard setup of the problem, although our techniques could potentially apply to the aforementioned settings as well.

Heavy Hitters and Compressed Sensing. A closely related problem is the task of finding heavy hitters in data streams. Given a long stream of updates (i, ∆) to a vector x ∈ R^n, each causing x_i ← x_i + ∆, upon query one must detect the coordinates i ∈ [n] which satisfy |x_i| ≥ (1/k)‖x‖_p (the heavy hitters; the sketch following this paragraph pins down this interface). The goal is to keep a small-space representation of x which allows finding the heavy hitters quickly, as well as rapid updates. The most interesting and well-studied cases correspond to p = 1 and p = 2. The heavy hitters problem is one of the core problems in streaming algorithms and has also served, implicitly or explicitly, as a subroutine in many streaming and compressed sensing algorithms; cf. [GMS05, HNO08, KNPW11, HIKP12a, HIKP12b, Iwe13, IK14, GLPS17, JW18] to name a few. It has also been an active area of research, with many important results discovered in the 2000s [CCF02, CM04, CH09], as well as more recently [BCIW16, Woo16, BCI+17, LNNT16, LN18a, AIV19, BGL+18, BDW19].
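To pin down the update/query semantics just described, here is a minimal reference implementation that stores x explicitly; it is our own illustration (the class and method names are ours, not the paper's), uses Θ(n) space, and only fixes the behavior that a small-space sketch must emulate.

```python
# A plain reference point for the turnstile heavy hitters interface:
# it keeps x explicitly, so it is NOT a sketch -- it merely specifies
# what update() and heavy_hitters() must compute.

class TurnstileVector:
    def __init__(self, n):
        self.x = [0.0] * n

    def update(self, i, delta):
        # stream update (i, delta): x_i <- x_i + delta
        self.x[i] += delta

    def heavy_hitters(self, k, p=1):
        # all i with |x_i| >= (1/k) * ||x||_p, for p in {1, 2}
        norm = sum(abs(v) ** p for v in self.x) ** (1.0 / p)
        return [i for i, v in enumerate(self.x) if abs(v) >= norm / k]
```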
Another closely related area is compressed sensing [CT06, Don06, GLPS10, HIKP12a], which focuses on the design of a set of linear measurements Φ ∈ R^{m×n} such that, given y = Φx, it is possible to recover an approximation to the best k-sparse approximation of x with respect to some ℓ_p norm. This problem is analogous to the heavy hitters problem, albeit with the difference that one desires to recover most of the heavy hitters in an ℓ_p sense, rather than all of them. Since the literature on the topic is vast, we refer the reader to the survey of Gilbert and Indyk [GI10], the introduction in [NS19], and the text [FR13]. Henceforth group testing, heavy hitters and compressed sensing may be referred to using the umbrella term sparse recovery.

Our Contributions. We give several schemes for the sparse recovery problem, almost all of which feature near-optimal (nearly linear) decoding time, improving upon several results in the literature and setting the record straight for some of the most well-studied variants of the problem. We thus show that previous trade-offs between measurement complexity and decoding time can be greatly improved. In particular, we contribute the following.

• Combinatorial Group Testing

1. A construction of list-disjunct matrices with the optimal O(k log(n/k)) number of rows and O(k log^2(n/k)) decoding time. The best previous sublinear-time scheme in terms of measurements suffered from quadratic decoding time in k and did not achieve the optimal number of rows. We thus essentially settle the measurement and the decoding time complexity of list-disjunct matrices.

2. A construction of k-disjunct matrices with m = O(k^2 min{log n, (log_k n)^2}) rows and O(m + k log^2(n/k)) decoding time. Moreover, our construction can use an off-the-shelf construction of disjunct matrices (which may have an inefficient decoder) as a black box, so any improvement on the construction of disjunct matrices will immediately improve our result as well, resulting in a construction of disjunct matrices with the same number of rows and near-optimal decoding time.

3. An explicit construction of k-disjunct matrices with m = O(k^2 log n) rows that allows decoding in time nearly linear in m.

4. State-of-the-art error-correcting disjunct and list-disjunct matrices, associated with decoding procedures which are faster by almost a factor of k than previous schemes with the same number of rows.

5. A scheme with O(k log n) decoding time and measurements for the "for-each" version of the group testing problem, improving upon recent work which obtained the same number of measurements but with decoding time quadratic in k. This result essentially settles the "for-each" complexity of the group testing problem.

• Heavy Hitters

6. A "for-all" streaming algorithm with s = O(k log(n/k)) space usage for the heavy hitters problem in the strict turnstile model, allowing finding a list of size O(k) that contains all (1/k)-heavy hitters. The query time is near-linear in s and the update time is O(log^2 k · log(n/k)). In contrast, the previous algorithm with the same space required Ω(n log n) query time and Ω(k log(n/k)) update time.

7. A "for-all" streaming algorithm for the standard version of the heavy hitters problem in the strict turnstile model, matching the space usage s of previous constructions and allowing queries in time near-linear in s. Previous constructions suffered from Ω(nk) query time.

• Compressed Sensing

8.
A significantly stronger ℓ_2/ℓ_2 weak identification system than what was available before in the compressed sensing literature.

The most efficient previous sublinear-time schemes employed list-recoverable codes technology and the list-decoding view of pseudorandom objects such as expanders and extractors, or related ideas. On the other hand, most of our results stem from a very elegant technique and a unifying result which, morally, is the following: "There exists a row-optimal (k, 5k log(n/k))-list-disjunct matrix associated with a very efficient decoding procedure". Interestingly, in contrast to list-recoverable-codes-type arguments, which come with relatively large constants and many parameters to fine-tune, the aforementioned result and its implications require minimal understanding of coding theory, and are thus of potential practical impact.

When referring to group testing, all matrices and vectors have entries in {0, 1}, with 0 corresponding to "false" and 1 to "true". Without loss of generality, we can assume that k, n are powers of 2 by rounding to the closest power of 2 from above, unless noted otherwise. We will associate [n] with {0, 1}^{log n} via the obvious bijection. Moreover, all matrices and vectors are zero-indexed; that is, for a vector x ∈ {0, 1}^n the entries are x_0, x_1, . . . , x_{n−1}. We also denote supp(x) = {i ∈ [n] : x_i = 1}. For binary strings r, s, we write r∘s for the concatenation of r and s, obtained by writing r followed by s. For a test matrix M ∈ {0, 1}^{m×n} (where the number of columns n is called the population or the universe size) and a vector x ∈ {0, 1}^n, the vector y = M ⊙ x corresponds to the test outcomes: y_q = 1 if and only if there exists i ∈ supp(x) with M_{q,i} = 1. For i ∈ [n] we denote by M_i the i-th column of M, and by M^q the q-th row of M. For S ⊆ [n], we write M_S for the union of the supports of the columns of M indexed by S. When referring to heavy hitters or compressed sensing we will work with the standard notion of addition and multiplication on numbers of Θ(log n) bits. We will say that i is a (1/k)-heavy hitter if |x_i| ≥ (1/k)‖x‖_p, where p is clear from context. For a non-negative integer α, viewed as a binary string of the appropriate length, and an integer ℓ no larger than that length, we denote by bpref_ℓ(α) the integer obtained from the first ℓ bits of the binary representation of α. For example, bpref_2((1100)_2) = (11)_2 = 3, and bpref_3((11011)_2) = (110)_2 = 6, where we use the notation (·)_2 for the binary representation of an integer.

Definition 2.1 (Disjunct Matrices). A matrix M ∈ {0, 1}^{m×n} is called k-disjunct if for every set S ⊆ [n] of size k and every j ∈ [n] \ S there exists a row q ∈ [m] such that M_{q,j} = 1 and M_{q,j′} = 0 for all j′ ∈ S.

Definition 2.2 (List-Disjunct Matrices). A matrix M ∈ {0, 1}^{m×n} is called (k, ℓ)-list-disjunct if for every two disjoint sets S, T ⊆ [n] with |S| = k, |T| = ℓ + 1 there exist an element j ∈ T and a row q ∈ [m] such that M_{q,j} = 1 and M_{q,j′} = 0 for all j′ ∈ S.

Definition 2.3 (Associated list for list-disjunct matrices). Given y = M ⊙ x with M a (k, ℓ)-list-disjunct matrix and |supp(x)| ≤ k, we refer to the list L of elements i ∈ [n] satisfying supp(M_i) ⊆ supp(y) as the associated list of x with respect to M. Put simply, L corresponds to the elements i ∈ [n] which appear to be "defective" under the measurements defined by M. Note that |L| ≤ k + ℓ.

The above notions can be strengthened to tolerate errors as follows. In the error-correcting version of Definition 2.2 one requires that, for every two disjoint sets S, T ⊆ [n] of size k and ℓ, respectively, the following holds. Let M_S and M_T respectively denote the unions of supports of the columns of M picked by S and T.
Then, informally, the set M_T \ M_S must remain large enough that the list-disjunctness guarantee survives up to e_0 false positives and e_1 false negatives in the test outcomes; we call such a matrix (k, ℓ, e_0, e_1)-list-disjunct. We could similarly define error-correcting disjunct matrices ([Che12, Definition 1]), but since our results behave differently in the cases of false positives and false negatives, the definition in [Che12] is not the most suitable for our needs. We refer the reader to Theorems 3.8 and 3.9 for error-correcting k-disjunct matrices. It is shown in [NPR11, Proposition 2] that any (k, ℓ, e_0, e_1)-list-disjunct matrix guarantees recovery of a set of size less than k + ℓ containing all defective items in the presence of up to e_0 false positives and e_1 false negatives in the test outcomes. Lower bounds in [Che12, NPR11] show that (k, Θ(k), e_0, e_1)-error-correcting list-disjunct matrices require Ω(k log(n/k) + e_0 + ke_1) rows. Similarly, any k-disjunct matrix that can tolerate e_0 false positives and e_1 false negatives requires Ω(k^2 log_k n + e_0 + ke_1) rows.

In this section, we present our results on disjunct matrices, list-disjunct matrices, group testing and heavy hitters, based on the definitions given in the preliminaries. In what follows, C, C_L, C_FP > 1 are absolute constants. All our results assume, without loss of generality, that k ≤ γn for some absolute constant γ; otherwise storing the identity matrix is asymptotically the best solution. Our starting point and one of our strongest tools is the following theorem.

Theorem 3.1. There exists a (k, C_L·k log(n/k))-list-disjunct matrix M ∈ {0, 1}^{m×n} with m = C·k log(n/k), allowing decoding in time O(k log(n/k)). M is the vertical concatenation of log(n/k) matrices M^{(log k)}, . . . , M^{(log n)} such that (i) each such submatrix has Ck rows and exactly one non-zero per column, and (ii) every such submatrix can tolerate up to C_FP·k false positives. Furthermore, M can be stored in O(k log(n/k)) space, and for every ℓ and every choice of B = O(k log(n/k)) columns i_1, i_2, . . . , i_B ∈ [n], we can find the rows q_{i_1}, . . . , q_{i_B} ∈ [Ck] where the aforementioned columns have their non-zero element in M^{(ℓ)} in time O(k log^2(n/k) · log^2(k log(n/k)) · log log(k log(n/k))).

The last sentence of the above theorem, namely the claim about storing the matrix M in small space and the fast batch location, is particularly important for our application to the heavy hitters problem. For the group testing applications, this property is irrelevant. The matrix M will also be called the identification matrix.

Theorem 3.2 (List-Disjunct Matrices). There exists a (k, k)-list-disjunct matrix M ∈ {0, 1}^{m×n} with m = O(k log(n/k)) that allows decoding in O(k log^2(n/k)) time.

In comparison, the best previous construction of efficiently decodable list-disjunct matrices requires either k^2 poly(log n) decoding time and O(k log n · log log_k n) rows [NPR11], or O(k log^2(n/k)) rows and a similar decoding time ([NPR11] gives another construction, using Parvaresh-Vardy codes, with much less clean time and measurement bounds and decoding time polynomial in k).

Theorem 3.3 (Disjunct Matrices). There exists a k-disjunct matrix M ∈ {0, 1}^{m×n} with m = O(k^2 min{log n, (log_k n)^2}) that allows decoding in O(m + k log^2(n/k)) time.

The best previous constructions of efficiently decodable disjunct matrices are: i) [INR10], which achieves m = O(k^2 log n) rows and Ω(k^4 log n) decoding time, and ii) [NPR11], which achieves O(k^2 log n + k log n · log log_k n) rows and m log^2 n decoding time. We strictly improve upon the measurement and the decoding time complexity of previous work, obtaining the cleanest bounds.

Theorem 3.4 (Explicit Disjunct Matrices).
We can construct, in time polynomial in the universe size, a k-disjunct matrix with m = O(k^2 log n) rows that allows decoding in time m · poly(log n), whenever k = O(log n/ log log n) or k = ω((log n/ log log n)^{1+o(1)}).

Of course, the small intermediate range of k where the above result does not apply can be eliminated by slightly rounding k up to k^{1+o(1)}, resulting in m = k^{2+o(1)} log n rows and m·poly(log n) decoding time for all k.

Theorem 3.5 ("For-all" Heavy Hitters). There exists a streaming algorithm with space usage O(k log(n/k)) which keeps a (non-linear) representation of a vector x ∈ R^n and, upon query, if x ∈ R^n_+, always returns a list L of size O(k) which contains every (1/k)-heavy hitter. The query time is O(k poly(log n)) and the update time is O(log(n/k) · log^2 k).

In contrast to the result appearing in [LNW18], which achieved Ω(n log n) query time and O(k log(n/k)) update time, our algorithm achieves nearly optimal query and update time. The non-linearity of the sketch does not play a role in the number of measurements, but only in achieving the desired update time. It is shown in [LNW18, Theorems 4, 5] that if we drop the assumption of the strict turnstile model, or additionally demand accurate estimates (up to (1/k)‖x‖_1) of the coordinates in L, then no such linear sketch exists unless it has Ω(k^2) rows. The next result is a streaming algorithm for the more common version of the heavy hitters problem, where one wants to find every (1/k)-heavy hitter and no i with x_i ≤ (1/(2k))‖x‖_1. This greatly improves upon the scheme appearing in [NNW12], which has the same space usage but requires Ω(nk) query time.

Theorem 3.6 ("For-all" Heavy Hitters with Estimates). There exists a streaming algorithm using space O(k^2 · min{log n, (log n/(log k + log log n))^2}) which keeps a (non-linear) representation of a vector x ∈ R^n and, upon query, if x ∈ R^n_+, always returns a list L containing every (1/k)-heavy hitter and no i ∈ [n] with x_i ≤ (1/(2k))‖x‖_1. Moreover, for every i ∈ L it returns an estimate x′_i with |x_i − x′_i| ≤ (c/k)‖x‖_1, where c < 1 is an arbitrarily small absolute constant. The query time is k^2 poly(log n) and the update time is O(k · min{log n, (log n/(log k + log log n))^2}) + O(log^3 n).

Remark 3.7. One could also ask whether the update time linear in k in the above theorem is necessary, or, even more interestingly, whether one can decode k-disjunct matrices and perform queries for heavy hitters faster than quadratic time in k. After all, as will be revealed in Section 8, we need to point-query only O(k) coordinates, so it is not immediately evident that the quadratic time bound in k is necessary (we might need Ω(k^2) measurements, but an algorithm might not need to read all of them). However, it seems that performing point queries is indeed a bottleneck, since i) an easy argument (which we leave to the reader) shows that any k-disjunct matrix must have at least n − m columns of sparsity at least k, and ii) any (1/k)-incoherent matrix (from which known heavy hitters sketches follow) must have column sparsity Ω(k) as long as m ≤ n/log k [NN13, Theorem 10]. This constitutes strong evidence that it is impossible to beat quadratic decoding/query time and linear (in k) update time, unless one uses a near-linear (in n) number of measurements. It is also worth noting that any subsequent improvement of sketches that enable ℓ_1 point queries immediately translates, via our framework, to a streaming algorithm with sublinear query time. Thus, we may consider sublinear query time essentially closed up to log factors.
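To make the bottleneck in Remark 3.7 concrete, the sketch below shows the standard ℓ_1 point query against a 0/1 measurement matrix in the strict turnstile model (x ≥ 0 entrywise), where every bucket containing coordinate i upper-bounds x_i. The accessor `rows_of` is a hypothetical name of ours; the point is that each query costs Θ(column sparsity) reads, so column sparsity Ω(k) and O(k) candidates already force Ω(k^2) work.

```python
# Illustrative l1 point query against y = Mx for a 0/1 matrix M in the
# strict turnstile model: every bucket containing column i satisfies
# y_q >= x_i, so the minimum over i's buckets overestimates x_i by at
# most the mass colliding with i. rows_of(i) is a hypothetical accessor
# for the rows where column i has a 1.

def point_query(y, rows_of, i):
    # Theta(column sparsity) bucket reads per queried coordinate.
    return min(y[q] for q in rows_of(i))

def filter_candidates(y, rows_of, candidates, threshold):
    # O(|candidates| * column sparsity) total -- the quadratic
    # bottleneck when both quantities are Omega(k).
    return [i for i in candidates if point_query(y, rows_of, i) >= threshold]
```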
We give the following two constructions of efficiently decodable error-correcting matrices; the first is particularly efficient for false negatives, while the second for false positives. Both results are significantly faster than what was attainable by previous techniques using the same number of rows. We find it quite interesting that, while we are able to construct fast error-correcting disjunct matrices with respect to either false positives or false negatives, we cannot construct fast error-correcting disjunct matrices that facilitate both simultaneously; see the explanation in Section 10. It would be interesting to have a combination of these methods which could give the best of both worlds. The first result of the preceding theorem improves by almost a factor of k what is attainable by the techniques in [NPR11, BCS+19a] (see also the comment following); the techniques in [CJBJ13, CJBJ17, LCPR19] result in schemes with a strictly larger number of rows. For both results in Theorem 3.8, the argument in [NPR11] obtains near-linear decoding, albeit with a slight loss of log log_k n; [CJBJ13, CJBJ17, LCPR19] can be modified to obtain near-linear decoding but with a log n factor overhead in the measurement complexity.

Theorem 3.9 (False Negatives). There exists a k-error-correcting disjunct matrix achieving m = O(k^2 log n + k·e_1) which can tolerate up to e_1 false negatives and allows decoding in time O(m · poly(log n)).

This theorem improves upon what was known and achievable using previous techniques, both in terms of measurements and decoding time, and achieves the optimal dependence on e_1, the number of false negatives.

Comment on the conversion in [NPR11]. The authors in [NPR11], using a black-box conversion (Section 3 and Subsection 5.2), claim a construction of (k, n, e_0, e_1)-list-disjunct matrices with m = O(k log n · log log_k n + (log log_k n)·e_0 + k(log log_k n)·e_1) rows and m^2 poly(log m) decoding time; this can be derived from Corollary 12 in that paper by plugging in an optimal construction of error-correcting list-disjunct matrices (in exactly the same spirit as Corollaries 13 and 14, but without the demand of explicitness). However, we note that Theorem 11 in that paper is incorrect (the disjunctness property with the given adversarial error bounds e_0, e_1 does hold combinatorially, but the given decoding algorithm cannot attain the reported error tolerance), and thus Corollaries 12, 13, and 14, as well as Corollary 22 and Theorem 24, are not totally correct with respect to all parameters. The reason is that in Theorem 11 the authors write down the recursive relation in terms of the universe size, but it should also incorporate the number of false positives and false negatives, i.e., t(i, e_0, e_1) instead of t(i). This translates to the existence of a (k, n, e_0, e_1)-list-disjunct matrix with m = O(k log n · log log_k n + (log_k n)·e_0 + (log_k n)·e_1) rows instead; the log log_k n factors on the false positives and false negatives in Corollaries 12, 13, and 14 (which are written down as log log n factors) should be log_k n instead, and the log log n factors in Corollary 22 and Theorem 24 should be log n. To be more descriptive, the conversion in Theorem 12 and Corollary 12 of [NPR11] builds a binary recursion tree with log_k n nodes, and uses O(k log(n^{1/2^i}/k) + e_0 + ke_1) rows for every node at the i-th level of the tree; thus, the factor k log n is indeed multiplied by the number of levels, i.e.,
log log_k n; but the factor that multiplies e_0 + ke_1 is the total number of nodes, which is log_k n. For another way to see why the bound claimed in [NPR11] is not achievable by that construction, observe that an adversary can put all their available false negatives and/or false positives in the list-disjunct matrix at one of the leaves of the recursion tree, totally sabotaging the decoding procedure.

Theorem 3.10. There exists a randomized construction of a matrix M ∈ {0, 1}^{m×n} with m = O(k log n) such that the following holds. Given y = M ⊙ x with |supp(x)| ≤ k, we can find x in time O(k log n), with failure probability e^{−Ω(k)} + 1/poly(n).

This theorem improves upon the recent work of [BCS+19b], which achieved the same number of rows but required running time quadratic in k. Our result essentially settles the non-uniform case of the group testing problem.

One of the central problems in compressed sensing is the design of an ℓ_2/ℓ_2 scheme, which is a matrix Φ ∈ R^{m×n} such that, given y = Φx, we can find x′ satisfying ‖x − x′‖_2 ≤ (1 + ε)‖x_{−k}‖_2. The goal is to randomly design Φ satisfying the above with the optimal number of rows, and enabling computation of such an x′ in sublinear time (it can be proved that it suffices to pick x′ to be O(k)-sparse). Almost all sublinear-time algorithms (precisely, all but [NS19]) proceed by reducing the problem to the construction of an ℓ_2/ℓ_2 weak identification system. This is a matrix Ψ ∈ R^{m×n} such that, given y = Ψx, we can find x′ satisfying ‖(x − x′)_{−k/2}‖_2 ≤ (1 + ε)‖x_{−k}‖_2 with probability 1 − δ; here x_{−k} is the vector obtained after zeroing out the k largest-in-magnitude coordinates of x. As yet another intriguing consequence of our techniques, we give the strongest weak identification ℓ_2/ℓ_2 system available in the literature. For how this translates to ℓ_2/ℓ_2 schemes, we refer the reader to Section 11.

Theorem 3.11. There exists a randomized construction of an ℓ_2/ℓ_2 weak identification system which allows finding the desired x′ in time O(m log^2 m).

A comparison with previous work follows.

• The construction in [GLPS10] requires m = Θ((k/ε) log(n/k) · log(1/δ)).
• The construction in [LNW18] achieves m = O((k/ε) log(n/k) + ε^{−1} log(n/k) log(1/δ)), but in order to run in near-linear time in m it requires storing an additional inversion table of size Ω(n).
• The strongest result in [PS12, GNP+13] obtains decoding time Ω((k/ε)^{2^{1/α}} · poly(log n)) for any α < 1. The main source of sub-optimality is the invocation of a list-recoverable code based on the Loomis-Whitney inequality [NPRR18].

To the best of our knowledge, our work is the first to construct a near-optimal weak system with near-optimal decoding time (without using an additional inversion table as in [LNW18]). In fact, we are able to obtain stronger results for the general ℓ_2/ℓ_2 problem, but the argument is very lengthy and somewhat outside the scope of the technical contribution of this paper, so we decided to leave the most general result for a future publication.

First of all, using bit-masking it is possible to augment the standard linear-time constructions with the identity code in order to obtain a disjunct matrix with O(k^2 log^2 n) rows and similar running time, and a list-disjunct matrix with O(k log^2(n/k)) rows and similar running time, as for example in [LCPR19]. This argument also applies to the heavy hitters problems and the variants of compressed sensing, by resorting to the expander code [Spi96]. A minimal sketch of the basic bit-masking trick follows.
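The following is our own minimal illustration of the bit-masking trick in its simplest, noiseless form: each bucket of a hashing scheme carries log n extra tests, one per bit of the item index, so a bucket containing a single defective reveals that defective's index directly. The variants cited above replace the raw bits (the "identity code") with codewords of an error-correcting code such as the expander code; the function names and the hash `h` below are ours.

```python
# Bit-masking in its simplest form: each of the `buckets` groups is
# augmented with one "occupancy" test plus logn bit tests; item i joins
# bit test j of its bucket iff bit j of i is 1. A bucket holding exactly
# one defective is decoded by reading its bits back. Illustrative only.

def encode(defectives, logn, buckets, h):
    tests = [[0] * (logn + 1) for _ in range(buckets)]
    for i in defectives:
        row = tests[h(i)]
        row[0] = 1                       # bucket is nonempty
        for j in range(logn):
            if (i >> j) & 1:
                row[j + 1] = 1           # bit test j fires
    return tests

def decode(tests):
    # Correct whenever every nonempty bucket holds a single defective;
    # collisions produce the bitwise OR of the colliding indices.
    return [sum(1 << j for j in range(len(row) - 1) if row[j + 1])
            for row in tests if row[0]]
```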
Many endeavors in the sparse recovery literature have been dedicated to obtaining the "ultimate goal" of optimal measurements and near-optimal decoding time. The approach of [Che12] lies in connecting disjunct and list-disjunct matrices (and related objects that suffice for the group testing problem) to extractors and expanders, and then using the nice list-recoverability properties of specific instantiations of those objects. The approach of [INR10] is again closely related to list-recoverable codes. In a nutshell, the authors consider code concatenation, where the inner code is a random code that gives a list-disjunct matrix, and the outer code is a Reed-Solomon code. The concatenated code enables list-recovery, returning a list of size O(k^2), which can then be filtered using Lemma 5.3. One of the crucial observations of that work is that it suffices to use an inner code which forms a list-disjunct matrix instead of a disjunct matrix; the latter approach would require Ω(k^3 log n/log k) rows. The first approach of [NPR11] for list-disjunct matrices is again a reduction to list-recoverable codes, making use of the list-decoding view of Parvaresh-Vardy codes. The second approach is to apply recursively the trivial list-recoverable code of block size 2. In particular, they group together coordinates that agree in the first (log n)/2 bits, and use a list-disjunct matrix on this instance to obtain a list L_1. Second, they group together coordinates that agree in the second half of their bits and obtain a list L_2 using, again, a list-disjunct matrix. Thus, they can guarantee that the set of defective items lies in the set L_1 × L_2, which is much smaller than n. Applying the same idea recursively, and observing that the universe shrinks by a square-root factor in each recursive call, yields in total a running time of O(k^2 poly(log n)) and a slight sub-optimality in the number of rows. This technique has also found use in subsequent works, such as [PS12, GNP+13, LN18a, IKWÖ19]. Instead of the trivial list-recoverable code, a slightly more efficient one based on the Loomis-Whitney inequality [GNP+13, NPRR18] can be used, but this leads to a much less clean trade-off between the number of rows and the decoding time, and the total scheme is still sub-optimal in both measurement and time complexity. There are many other papers which (possibly with minor modifications) implicitly or explicitly construct efficiently decodable list-disjunct matrices, [CJBJ13, CJBJ17, LCPR19, BCS+19a] to name a few. However, all of those approaches fall short of bypassing the barriers mentioned in the first paragraph of this section, and thus of obtaining the "ultimate goal" of group testing.

Our approach departs from previous work and is inspired by the solution for the case k = 1. First of all, we observe that it suffices to solve a slightly weaker, but, as it turns out, crucially easier problem: find a list of size O(k log(n/k)) that contains all defective items, or in other words construct a (k, O(k log(n/k)))-list-disjunct matrix whose decoding routine runs in time linear in the number of rows, as we show below. In particular, for every ℓ ∈ {log k, . . . , log n}, we combine every set of coordinates which agree in the first ℓ bits into a single coordinate, and hash these to O(k) buckets; this yields in total O(k log(n/k)) samples.
Our algorithm then starts from ℓ = log k and processes prefixes in increasing length ℓ, trying to gradually find the prefixes of all the defective items by discarding prefixes that do not participate in a positive test. Our proof roughly proceeds by showing that the number of possible trajectories of the non-defective items can be upper-bounded by the Catalan number of order Θ(k log(n/k)). The hashing scheme then allows for a union bound over all possible supports and all possible trajectories. This immediately gives that the output list may contain up to O(k log(n/k)) coordinates, which completes the proof. Our argument can tolerate up to O(k) false positives, or even up to O(k log(n/k)) if they are not "very" adversarially chosen; the latter guarantee (more than) suffices for our application to the heavy hitters problem. For the error-correcting schemes and the ℓ_2/ℓ_2 weak system we need delicate twists in the hashing scheme and more involved analyses, which make use of the generalized Catalan numbers to bound the running time. Unfortunately, for ℓ_2/ℓ_2 compressed sensing the argument is not as clean (though the algorithm is still quite compact), and happens to be quite subtle on the technical level, mostly because one needs to handle dependent events in a careful manner. Our approach is distinct from every previous work on the topic in several ways. Lastly, we also provide a novel reduction from near-linearly decodable error-correcting disjunct matrices to "less" efficiently decodable error-correcting list-disjunct matrices, which does not require list-recoverability as the reductions in [CH09, INR10] do. From a coding-theoretic perspective, one can view the key to our progress as bypassing the "for-all" demand of list-recoverable codes: list-recovery ensures that for all choices of lists corresponding to symbols of the codeword, some "desirable" condition holds, namely that the set of possible codewords agreeing with all symbol-lists is small. In our case, we show that this is not needed, since the lists not only depend on the hidden set of defective items, but also are formed in a serial fashion, while one learns the bits of the defective items in chunks of appropriate size.

Connection to Tree Codes. The above discussion might bear similarities to the constructions of tree codes [Sch96]. For alphabets Σ, Γ and parameters s, δ, a (truncated) tree code is an online encoding of strings in Σ^s by strings in Γ^s, with the property that the encodings of any two strings that diverge at some position differ, after the divergence point, in at least a δ fraction of the positions. In our case, one could imagine setting s = log n, Σ = {0, 1} and Γ = {0, 1}^{log k + O(1)} to pass to an efficiently decodable k-disjunct matrix. However, it turns out that the structural condition demanded by tree codes is too strong (and rather inflexible when trying to perform a disjunct or list-disjunct matrix construction) for group testing applications, as tree codes demand a worst-case Hamming distance bound between any two strings x, x′. On the other hand, what our approach requires is a Hamming distance bound that holds on average.

Sublinear-Time Sparse Recovery Frameworks. In the sparse recovery literature, there are in principle two available frameworks for sublinear-time decoding. The first includes, as mentioned in the first paragraph of this section, bit-masking with an error-correcting code. This approach is particularly effective in compressed sensing tasks (such as ℓ_2/ℓ_2) where it is not necessary to detect every heavy coordinate of the vector x ∈ R^n.
In that case, one can recover a constant fraction of the coordinates and then exploit the linearity of the sketch to set up a clean-up process. This can be achieved by subtracting the detected coordinates from x [BI11, GLPS10, HIKP12a, HIKP12b, GNP+13, Kap16, Kap17, LNW18, CKSZ17, CI17], in order to recover a heavy enough subset of the top k coordinates rather than all of them. This bit-masking plus clean-up process is less effective for heavy hitters, because there one desires to recover all of them rather than a constant fraction. It is not applicable to group testing, for subtraction is not possible in that model. Moreover, bit-masking always results in measurement-sub-optimal schemes in the "for-all" case, in every variation of the sparse recovery problem. The other available framework is based on list-recoverable codes and related techniques [Che12, INR10, NPR11, GNP+13, LNNT16, GLPS17, LN18a]. This type of machinery is more powerful in the "for-all" model, yielding better bounds in terms of measurement complexity, but it usually comes with decoding time polynomial in k and complicated algorithms. Our approach adds one more framework to the sparse recovery toolkit, and we demonstrate its power by obtaining a sequence of new and essentially optimal results that were not possible using previous arguments. We believe that further progress in the field could stem from hybrid approaches which combine more than one framework.

We will use the following technical tools throughout the work.

Definition 5.1 (Naive Decoder). Given y = M ⊙ x, for every i ∈ [n] declare i defective if and only if y_r = 1 for every r such that M_{r,i} = 1.

The following are two well-known corollaries on the performance of the naive decoder on disjunct and list-disjunct matrices.

Lemma 5.2 (Naive Decoder and Point-Queries for Disjunct Matrices).

We shall use the following well-known constructions of k-disjunct matrices, which follow from standard constructions of incoherent matrices (based either on Reed-Solomon codes, by Kautz and Singleton [KS64], or on codes on the Gilbert-Varshamov bound, by Porat and Rothschild [PR08]).

Theorem 5.4 (Disjunct Matrices). There exists a k-disjunct matrix with O(k^2 min{log n, (log_k n)^2}) rows. In particular, there exist explicit k-disjunct matrices with i) O(k^2 log n) rows and O(k log n) non-zeros per column (via [PR08]), and strongly explicit k-disjunct matrices with ii) O((k log_k n)^2) rows and O(k log_k n) non-zeros per column (via [KS64]).

Theorem 5.5 ([Che12]). There exists a (k, k)-list-disjunct matrix with O(k log(n/k)) rows and O(log(n/k)) non-zeros per column.

Lemma 5.6 (Generalized Catalan numbers, [Slo07]). For natural integers d, n ≥ 2, the number of rooted d-ary trees with exactly n nodes is $\frac{1}{(d-1)n+1}\binom{dn}{n}$.

In this section we present our construction of the identification matrix M ∈ {0, 1}^{m×n}. This matrix has m ≤ Ck log(n/k) rows and n columns. As has already been stated, it is the vertical concatenation of log(n/k) matrices, each consisting of Ck rows. As also stated in the preliminaries, we pick n to be a power of 2 and identify [n] with {0, 1}^{log n} via the obvious bijection. The construction appears in Algorithm 1. In each matrix M^{(ℓ)}, all columns that agree in the first ℓ bits of their binary representation have a 1 in the same row: a random function h_ℓ : {0, 1}^ℓ → {0, . . . , Ck − 1} is chosen, which maps (groups of) elements to rows.
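As a concrete illustration, here is our own sketch of this construction in Python (cf. Algorithm 1): each M^{(ℓ)} is stored implicitly through its hash function, measurements are indexed by (level, bucket) pairs, and a salted use of Python's built-in hash() stands in for the O(k log(n/k))-wise independent family that the analysis below actually requires.

```python
# Sketch of the identification matrix. Column i of M^(ell) has its
# single 1 in row h_ell(bpref_ell(i)); the matrix is never materialized,
# only the hash functions are stored. A 64-bit salt plus Python's hash()
# replaces the limited-independence family used in the analysis.
import random

def create_matrix(k, n, C=16, seed=0):
    logk, logn = k.bit_length() - 1, n.bit_length() - 1  # k, n powers of 2
    rng = random.Random(seed)
    salts = {ell: rng.getrandbits(64) for ell in range(logk, logn + 1)}

    def h(ell, prefix):                 # row holding the 1 inside M^(ell)
        return hash((salts[ell], prefix)) % (C * k)

    def measure(defectives):            # y = M (.) x, as the set of
        y = set()                       # positive (level, bucket) tests
        for ell in range(logk, logn + 1):
            for i in defectives:
                y.add((ell, h(ell, i >> (logn - ell))))  # bpref_ell(i)
        return y

    return h, measure, logk, logn
```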
One can view each matrix M^{(ℓ)} as a hashing scheme: all i ∈ {0, 1}^{log n} are first mapped via bpref_ℓ to 2^ℓ values, which are in turn hashed to O(k) buckets; this means that if two elements i, i′ ∈ {0, 1}^{log n} agree in their first ℓ bits, then they necessarily contribute to the same measurement in M^{(ℓ)}.

Algorithm 1 (Construction of the identification matrix M): for each ℓ from log k to log n, pick a fresh random hash function h_ℓ : {0, 1}^ℓ → {0, . . . , Ck − 1}, and let M^{(ℓ)} have a 1 in row h_ℓ(bpref_ℓ(i)) of column i, for every i ∈ [n].

In what follows, we will not explicitly use the definition of list-disjunct matrices the way it is stated; rather, we will argue the desired guarantees of our identification matrix ad hoc, from first principles. In the following we define B = C·k. We also remind the reader that M consists of the vertical concatenation of M^{(log k)}, . . . , M^{(log n)}. We will say that a prefix p of length ℓ contains an item i if bpref_ℓ(i) = p. We assume that we work on a machine with word size Ω(log n), so that indexing and concatenating strings of length Θ(log n) takes O(1) time. We are now ready to proceed with the proof of Theorem 3.1. For that, we continue with two lemmas. The first lemma proves the easy fact that no defective item will be left out of the output list L.

Lemma 7.1. If i is defective, then at the end of the execution of Algorithm 2, i ∈ L.

Proof. For every ℓ ∈ {log k, . . . , log n}, the value of z in Line 6 of Algorithm 2 depends on h_ℓ(bpref_ℓ(i)) and will always be 1. Hence, in the next line the test will always evaluate to false, and i will not be discarded from the list. In the end, when ℓ = log n, we conclude that i will be in L.

What remains is to bound the size of the list L that our algorithm outputs. We will show that with probability 1 − e^{−C_1·k log(n/k)} the list will always have at most C_L·k log(n/k) items in it, where C_1 is an absolute constant. To prove Lemma 7.2, we note that the execution of our decoding algorithm produces a binary forest consisting of k trees rooted at level log k. First of all, define the complete binary tree 𝒯 of depth log n rooted at the empty string; taking a path to the left appends 0 to the current string, otherwise it appends 1. A string p of length ℓ has children p∘0 and p∘1 of length ℓ + 1. Moreover, the relation T ⊆_tree 𝒯 will denote the fact that T is a connected sub-tree of 𝒯 rooted at the empty string. Given the above definitions, we can think of our algorithm as performing the following natural procedure. It starts with all strings of length log k, maintaining a list L of possible prefixes; in the beginning, all k possible binary strings are in L. For each length ℓ it considers every p ∈ L and checks whether p participates in a negative test. If this is not the case, p is replaced by the two prefixes that can be obtained by appending a 0 or a 1 to it; the prefix p itself is always discarded. For a given k-sparse vector x ∈ {0, 1}^n, we define the trajectory τ_x of the items to be the set of all possible prefixes p of x that might be inserted in L at some point during the execution of the algorithm and are not discarded in Line 8. We also define the trajectory of the defective items, denoted τ′_x for a vector x ∈ {0, 1}^n, as the set of possible prefixes p containing a defective item which will be inserted in L at some point during the execution of the algorithm, for some ℓ ∈ {log k, . . . , log n}. It should be clear that τ_x (and τ′_x) is a forest consisting of k trees rooted at level log k, with at least |supp(x)| leaves because of Lemma 7.1.
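Before the counting argument, here is a matching sketch of the decoder just described (our rendering of Algorithm 2), which consumes the h/measure helpers from the construction sketch above: live prefixes are filtered against the positive tests of the current level and the survivors are extended by one bit.

```python
# Sketch of the decoder: maintain a list of live prefixes, keep only
# those whose bucket tested positive, and extend each survivor by one
# bit per level. Uses h, logk, logn and a measurement set y as produced
# by create_matrix()/measure() above.

def identify(y, h, logk, logn):
    live = list(range(1 << logk))      # all prefixes of length log k
    for ell in range(logk, logn + 1):
        live = [p for p in live if (ell, h(ell, p)) in y]
        if ell < logn:                 # extend survivors by one bit
            live = [c for p in live for c in (2 * p, 2 * p + 1)]
    return live                        # a superset of the defectives

# Example: h, measure, logk, logn = create_matrix(2, 16)
# identify(measure({3, 12}), h, logk, logn) is guaranteed to contain
# both 3 and 12, possibly alongside a few colliding non-defectives.
```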
The proof proceeds by showing that for any x ∈ {0, 1}^n with |supp(x)| ≤ k it holds that |τ_x| + |τ′_x| = O(k log(n/k)) with probability 1 − e^{−Ω(k log(n/k))}, which then implies Lemma 7.2.

Proof of Lemma 7.2. Define the forest f_x = τ_x \ τ′_x, i.e., the trajectory of the non-defective items once they are separated from the defective items. It obviously suffices to prove that |f_x| ≤ C_L·k log(n/k), since L at the end can contain at most k + |f_x| ≤ k + C_L·k log(n/k) elements. Similarly, the prefixes inserted in L are in total |τ_x| + 2|τ′_x| = O(k log(n/k)), hence the bound on the running time. Fix sets E^{(log k)}, . . . , E^{(log n)} ⊆ [Ck] of size at most C_FP·k, which correspond to the false positives in each of the matrices M^{(log k)}, . . . , M^{(log n)}. For a prefix p of length ℓ ≥ log k, define a binary random variable Y_p such that Y_p = 0 if and only if the bucket h_ℓ(p) contains no prefix of a defective item and does not belong to E^{(ℓ)}. In words, Y_p captures the event that p participates in a positive test (either by the existence of a defective item or by a false positive); observe that for p not containing a defective item we have Pr[Y_p = 1] ≤ (k + C_FP·k)/(Ck) = (1 + C_FP)/C. For p of length smaller than log k we deterministically set Y_p = 1. We now bound the probability that the forest f_x is large via a union-bound computation in which: (1) follows by the observation that f_x can be extended to a binary tree rooted at the empty string by adding k + k log(n/k) additional nodes, (2) follows by a union bound over all binary trees with (C_L + 1)k log(n/k) + k internal nodes, and (3) follows by Lemma 5.6. In the final estimate we use the fact that the binomial coefficient $\binom{n}{k}$ is increasing for 0 ≤ k ≤ n/2 and the standard inequality $\binom{a}{b} \le (ae/b)^b$ for integers a, b. By choosing C ≥ 2·(4^3(1 + C_FP)) and C_L such that 2^{C_L} ≥ e^{3+C_1}·C, we obtain that the latter bound is at most e^{−C_1·k log(n/k)}. This completes the proof of the lemma.

We are now ready to prove Theorem 3.1.

Proof of Theorem 3.1. The proof, apart from the last sentence of the statement, follows immediately by combining Lemmas 7.2 and 7.1. For the other part: since, as mentioned, every h_ℓ is an O(k log(n/k))-wise independent hash function, storing all of them in a straightforward way would require Ω(k log^2(n/k)) words of space, which is prohibitive. However, we may observe that the h_ℓ have the same range and domains of exponentially increasing size, so we can pack them into a single hash function. Let g : {0, 1}^{log n + 1} → {0, . . . , Ck − 1} be a random ((C_L + 1)k log(n/k))-wise independent hash function. For log k ≤ ℓ ≤ log n, define h_ℓ as the restriction of g to the binary strings whose value in the decimal system lies in the set {2^ℓ, 2^ℓ + 1, . . . , 2^{ℓ+1} − 1}. Now, all we need for Lemma 7.2 to go through is that every (C_L + 1)k log(n/k) points are independently mapped to {0, 1, . . . , Ck − 1} under application of potentially different h_ℓ; this is guaranteed by the ((C_L + 1)k log(n/k))-wise independence of g. Furthermore, we shall use as g the standard construction of κ-wise independent hash functions via polynomials of degree κ − 1, for κ = (C_L + 1)k log(n/k).

In this section we show how to use Theorem 3.1 to obtain Theorems 3.2, 3.3, 3.5, 3.6 and 3.10.

Proof of Theorem 3.2. We augment the matrix guaranteed by Theorem 3.1 with the matrix guaranteed by Theorem 5.5. By the first matrix we can find a list L of size O(k log(n/k)), which in turn can be filtered by the decoding algorithm in time O(log(n/k)) per element, using Lemma 5.3.

Proof of Theorem 3.3. We augment the matrix guaranteed by Theorem 3.2 with the matrix guaranteed by Theorem 5.4.
By the first matrix we can find a list L of size 2k, which in turn can be filtered by the decoding algorithm in time O(k log_k n) per element, using Lemma 5.2. The total number of rows is O(k log(n/k) + k^2 min{log n, (log_k n)^2}) = O(k^2 min{log n, (log_k n)^2}) and the running time is O(k log^2(n/k) + m). In particular, performing point queries using part i) of Theorem 5.4 can be done in time O(k)·O(k log n) = O(k^2 log n), whereas using part ii) of Theorem 5.4 it can be done in time O(k)·O(k log_k n) = O(k^2 log_k n). In both cases, we obtain the claimed running time.

Proof of Theorem 3.5. Let us first prove the theorem in the case where we can store the whole matrix M, and afterwards turn the obtained scheme into a streaming algorithm with the desired guarantees. We augment the matrix M guaranteed by Theorem 3.1 with the matrix guaranteed by [LNW18], along with a single-row matrix consisting of the all-1s vector. The third matrix gives us ‖x‖_1. The second matrix has O(log(n/k)) non-zeros per column. Moreover, it allows filtering (via point queries) any list L of indices i ∈ [n] in time O(|L| log(n/k)), similarly to list-disjunct matrices, such that we are left with a list of size O(k) that contains all i ∈ L with x_i ≥ (1/k)‖x‖_1. Using the first matrix, we shall show that given y = Mx with x ∈ R^n_+ we can find a list of size O(k log(n/k)) that contains every (1/k)-heavy hitter. Combining with the second matrix we shall obtain the desired result. We set each measurement (bucket) to 1 if y_q ≥ ‖x‖_1/k, and to 0 otherwise; we can implement this test since we know ‖x‖_1 exactly. Thus, we obtain a vector y ∈ {0, 1}^{Ck log(n/k)}, on which we run the group testing procedure guaranteed by Theorem 3.1. Note that if a (1/k)-heavy hitter participates in a measurement q then y_q = 1. Otherwise, in each of the sub-matrices M^{(log k)}, . . . , M^{(log n)} there can be at most k false positives, since in each sub-matrix every i ∈ [n] participates in exactly one measurement. The guarantee of Theorem 3.1 yields the desired result.

We now turn the above scheme into an efficient low-space data structure. We first pick the low-space representation of M guaranteed by Theorem 3.1, and observe that for a fixed ℓ all the fetches in Line 6 of Algorithm 2 can be performed in time O(k log(n/k) log^2(k log(n/k)) · log log(k log(n/k))), by the last sentence of Theorem 3.1. Since there are only log(n/k) levels, we get k·poly(log n) decoding time. Implemented naively, the update time is O(k log^2(n/k)), as we have to evaluate h_{log k}, . . . , h_{log n}, each being O(k log(n/k))-wise independent. To improve the update time, we invoke an argument from [KNPW11, AY20] (which results in a non-linear sketch). First of all, we can keep a buffer of size Θ(k log(n/k)) which stores updates (i, ∆), performing all of them (and flushing the buffer) when it fills up or when a query arrives. Note that an update is not performed upon arrival, but only when the buffer is full or upon a query. All the updates can be performed in time O(k log^2(n/k) log^2(k log(n/k)) log log(k log(n/k))), using the fast batch location of Theorem 3.1. This yields an amortized cost of O(log(n/k) log^2 k). We shall show how to de-amortize this cost, for a worst-case cost of O(log(n/k) log^2 k). We keep two buffers buf_0, buf_1 of size B in words, for B = O(k log(n/k)). Each time the algorithm receives an update, it puts it into buf_0.
Once buf_0 reaches its maximum size, the next update is put into buf_1 and, in parallel, buf_0 is flushed by performing all its updates, at O(k log^2(n/k) · log^2(k log(n/k)) log log k / B) = O(log(n/k) · log^2(k log(n/k)) log log k) steps per arriving update, using the fast batch location guaranteed by Theorem 3.1 (this means that we perform a certain number of operations and then pause, waiting for the next update, and continue the execution once the next update arrives). Once buf_1 becomes full, the roles of buf_0 and buf_1 are switched. Upon query, we may flush both buffers using the fast batch location guaranteed by Theorem 3.1. This de-amortizes the update time, spreading it over O(k log(n/k)) steps, and increases the query time by an additive O(k log^2(n/k) log^2(k log(n/k)) log log(k log(n/k))) = O(k poly(log n)) term. We can similarly de-amortize the update time of the algorithm in [LNW18]. Putting everything together gives the desired result.

Proof (Theorem 3.6). It is proved in [NNW12] that there exists a matrix with k^2·min{log n, (log n/(log log n + log k))^2} rows and k log n column sparsity which allows deterministic ℓ_1 point queries for the heavy hitters problem, i.e., finding x′_i such that |x_i − x′_i| ≤ (c/k)‖x‖_1. Those constructions are the same as the standard constructions of k-disjunct matrices, i.e., via a random code (hashing to Θ(k) buckets, and repeating Θ(k log n) times) or via Reed-Solomon codes with alphabet and block size equal to Θ(k log n/(log k + log log n)). Moreover, they have low-space representations: for the former, only constant-wise independent hash functions per repetition are required, while the latter is strongly explicit. Moreover, we can perform a point query and an update using the first matrix in time O(k log n). For the Reed-Solomon construction, every column i ∈ [n] corresponds to a polynomial of degree O(log n/(log log n + log k)), and every repetition corresponds to a symbol of the codeword; thus we can perform updates and point queries for i ∈ [n] in time O(k(log n/(log k + log log n))^2). These matrices are 0-1 matrices, and thus in the strict turnstile model every bucket value upper-bounds every coordinate participating in it. Augmenting the sketch constructed in Theorem 3.5 with the aforementioned sketch of [NNW12], the query algorithm finds the list L guaranteed by Theorem 3.5, and then discards all i ∈ L with x′_i < (1/k)‖x‖_1. This yields the desired result.

We note that using our approach we can also obtain a streaming algorithm with s = O(k^2 log^2 n) for ℓ_1 heavy hitters in the (general) turnstile model, where we still care about all i ∈ [n] with |x_i| ≥ (1/k)‖x‖_1 and we do not have the assumption that x_i ≥ 0; the query time is O(m log n). We sketch how. We pick O(k log n) permutations of [n], and run in each permuted universe a variant of the decoding algorithm of Theorem 3.5 (starting from level ⌈log(Ck)⌉ for some large enough constant C), where instead of implementing the test we keep the top O(k log n) coordinates with the largest estimates for every ℓ ∈ {⌈log(Ck)⌉, . . . , log n}. It can be proved that this algorithm returns a list of size O(k log n) that contains all ℓ_1 heavy hitters; part of the argument is common with the one in Section 11. Using the sketches in [NNW12] we can find point estimates for every x_i, i ∈ L, and we may keep the top 2k coordinates.
This result is not novel, since a scheme with the same guarantee can be obtained by bootstrapping the incoherent matrix of [NNW12] with a linear-time decodable error-correcting code that can correct a constant fraction of errors [Spi96]. What is possibly novel is that error-correcting code machinery is not necessary in order to obtain O(k^2 log^2 n) running time. It is known how to achieve a smaller number of rows [LN18a], but the dependence of the decoding time on k was k^7 poly(log k) and the argument was significantly more complicated.

Proof (Theorem 3.10). We augment the matrix M guaranteed by Theorem 3.1 with two matrices M′, M′′. The matrix M′ has O(k log(n/k)) rows and n columns, with each column having exactly one 1 in a random position. The matrix M′′ has O(k log n) rows and is the vertical concatenation of O(log n) matrices, each one with O(k) rows: each column in each sub-matrix has exactly one 1, at a random position. As before, using M we obtain a list L of size O(k log(n/k)) which contains all defective items. Using M′ we can next filter out the non-defective items from L to obtain a list of size O(k) in O(k log(n/k)) time, using standard point queries. The probability that a non-defective item participates in the same measurement as a defective item in M′ is O(1/log(n/k)). Thus, by an application of the additive form of the Chernoff bound, we discard all but O(k) non-defective items in L, with probability 1 − e^{−Ω(k)}. Filtering out the rest of the items using M′′ to obtain the defective items can be done in O(k log n) time; the proof of correctness is standard.

It is natural to ask whether our construction can be derandomized in order to obtain explicit list-disjunct and disjunct matrices. For k = O(1), for example, there are polynomially many events involved in the analysis of Theorem 3.1, and we can use the method of conditional expectations to derandomize the hashing scheme in polynomial time. For large k, it could be possible to perform the method of conditional expectations by exploiting the tree-like structure of the events. However, we note that getting list-disjunct matrices with the optimal number of rows seems out of reach at the moment, since the notion is closely connected with explicit constructions of extractors and unbalanced bipartite expanders (more generally, [Che12] allows the whole spectrum of "condenser graphs", ranging from lossless expanders to extractor graphs). Indeed, all explicit constructions of list-disjunct matrices we are aware of follow essentially from such expander graphs, and the notion of list-disjunctness is essentially a notion of expansion on sets of size k. In fact, this can be made rigorous, at least for certain natural cases. When the decoding algorithm is the naive decoder that performs point queries, this can be seen as the list-decoding view of expander graphs (where the test matrix is the adjacency matrix of a left-regular bipartite graph), as described by Vadhan [Vad10, Section 5]. (Vadhan's characterization [Vad10, Proposition 7] is qualitatively the following: for an unbalanced bipartite graph and a right vertex set T, let LIST(T) denote the set of left vertices whose neighbors all fall in T; then the graph is an expander if and only if, for every T, the set LIST(T) is small. On the other hand, the naive decoder for list-disjunct matrices, when seen as a graph, precisely computes the set LIST(T) from the set T of positive test outcomes, and list-disjunctness ensures that this list size is small. It is worthwhile to mention that for the related notion of binary compressed sensing matrices, an equivalence with expander graphs has been proved in [BGI+08, Theorem 2].) For disjunct matrices, however, explicit constructions are obtained via incoherent matrices, where things are better understood. In particular, we know a polynomial-time construction of a (1/k)-incoherent matrix [PR08] with O(k^2 log n) rows, and thus of a k-disjunct matrix. Trying to derandomize the scheme in Theorem 3.1, we found a surprisingly simple construction of explicit k-disjunct matrices.
Trying to derandomize the scheme in Theorem 3.1, we found a surprisingly simple construction of explicit k-disjunct matrices. It should still be the case, however, that one can use the additional factor of k allowed by k-disjunct matrices to derandomize a variant of Theorem 3.1, by resorting to a bound over a smaller number of events. The challenge is to formalize the correct notion of average Hamming distance after the split point, as also mentioned in the discussion on tree codes in Section 4, if that is at all possible. We shall make use of the following two theorems.

Theorem 9.1. For $k = O(\log n/\log\log n)$, there exists an explicit k-disjunct matrix with $O(k^2\log n)$ rows which is decodable in time poly(k, log n).

Theorem 9.2 (Corollary 14 in [Che12]). There exists a strongly explicit (k, O(k))-list-disjunct matrix with $O(k\cdot 2^{O((\log\log k)^2)}\log n)$ rows. The column sparsity is $2^{O((\log\log k)^2)}\log n$.

We are now ready to proceed with the proof of Theorem 3.4. The idea is that for small k there is a small number of events, so we can perform the method of conditional expectations, while as k gets larger we can construct a (sub-optimal) efficiently decodable list-disjunct matrix by exploiting the additional multiplicative factor of k we have in our possession.

Footnote 7: Vadhan's characterization [Vad10, Proposition 7] is qualitatively the following. For an unbalanced bipartite graph and a right vertex set T, let LIST(T) denote the set of left vertices whose neighbors all fall in T. Then the graph is an expander if and only if for every T the set LIST(T) is small. On the other hand, the naive decoder for list-disjunct matrices, when seen as a graph, precisely computes the set LIST(T) from the set T of positive test outcomes, and list-disjunctness ensures that this list is small. It is worthwhile to mention that for the related notion of binary compressed sensing matrices, an equivalence with expander graphs has been proved in [BGI+08, Theorem 2].

Proof. (Theorem 3.4) For $k = O(\log n/\log\log n)$, Theorem 9.1 gives $O(k^2\log n)$ rows and decoding time $\mathrm{poly}(k, \log n) = k^2\,\mathrm{poly}(\log n)$. We shall therefore focus on $k = \omega((\log n/\log\log n)^{1+o(1)})$. The algorithm uses a hierarchical decomposition similar to that of Theorem 3.1, albeit with a different argument and functionality. Let $D = \log n/\log\log n$, rounded to a power of 2, let $d = \log D$, and let $H = \Theta(D)$ be chosen such that $n \le D^H = \Theta(n)$. For each $\ell \in [H]$ we keep an appropriate list-disjunct matrix $M^{(\ell)}$ to guide the search for the defective items, in the following way. A (k, O(k))-list-disjunct matrix is first constructed over the universe $\{0,1\}^{d\cdot\ell}$ using Theorem 9.2 (Line 3 in Algorithm 3), and then extended to a matrix $M^{(\ell)}$ over $\{0,1\}^{\log n}$ by grouping together the coordinates $i \in \{0,1\}^{\log n}$ with the same value of $\mathrm{bpref}_{d\ell}(i)$. The total number of rows is $O(k^2\log n)$ for our regime of interest. The decoding algorithm processes the prefixes in a similar way to Algorithm 2. It proceeds in iterations, processing strings of length $\ell d$ in the ℓ-th iteration. At all times it maintains a list L of size O(k) such that, at the end, L contains all defective items. In the ℓ-th iteration, for each $p \in L$ it performs a point query on p using $M^{(\ell)}$ and Lemma 5.3, in order to decide whether to proceed and examine the D children $p\circ p'$, $p' \in \{0,1\}^d$. Correctness is immediate from the definition of list-disjunct matrices. The total running time is that of $H\cdot O(k)\cdot D$ point queries, each of cost proportional to the column sparsity of $M^{(\ell)}$, i.e., $k\cdot 2^{O((\log\log k)^2)}\cdot\mathrm{poly}(\log n) = k^2\,\mathrm{poly}(\log n)$ in total. At the end of the algorithm we obtain a list L of size O(k) which contains every defective item; using the standard explicit construction of k-disjunct matrices with $O(k\log n)$ column sparsity from [PR08], we can then find exactly the defective items by Lemma 5.2.
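The decoding loop just described can be summarized by the following sketch. Here point_query is an assumed oracle standing in for the tests of $M^{(\ell)}$ together with Lemma 5.3; the interface is ours, for illustration only.

```python
def prefix_search(point_query, levels, d, start_prefixes):
    """Walk the tree of arity D = 2**d level by level, keeping only the
    prefixes whose point query reports a possible defective below them."""
    live = list(start_prefixes)          # candidate prefixes (bit strings)
    for level in range(levels):
        nxt = []
        for p in live:
            for child in range(1 << d):
                q = p + format(child, '0{}b'.format(d))
                if point_query(level, q):    # assumed oracle
                    nxt.append(q)
        live = nxt
    return live   # a small superset of the defective items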
This section is devoted to proving Theorems 3.8 and 3.9.

Theorem 10.1 (Error-Correcting Disjunct and List-Disjunct Matrices [Che12, NPR11]). There exist:
1. A (k, k, e_0, e_1)-error-correcting list-disjunct matrix with $m = O(k\log(n/k) + e_0 + ke_1)$ rows. The column sparsity is $O(\log(n/k) + e_0/k + e_1)$.
2. A k-disjunct matrix with $m = O(k^2\log n + e_0 + ke_1)$ rows which can tolerate up to e_0 false positives and e_1 false negatives. The column sparsity is $O(k\log n + e_0/k + e_1)$.

First of all, let us reduce to the case where $e_0 \le k\log k$ (this reduction also applies in the presence of false negatives, as can easily be inferred). Indeed, to construct an (n, k, e_0, 0)-list-disjunct matrix we may vertically concatenate $1 + \lceil e_0/(k\log k)\rceil$ error-correcting (n, k, e'_0, 0)-list-disjunct matrices with $e'_0 \le k\log k$. To decode, we run the decoding procedures of all those matrices in parallel and halt when the first one stops, returning the list L it returns. Note that at least one matrix receives at most $k\log k$ false positives, and its decoding procedure runs correctly, as desired. Increasing the number of false positives can only increase the running time of the decoding algorithm we are going to present, and thus the procedure that finishes first runs in time $O(k^{1+\alpha}\,\mathrm{poly}(\log n))\cdot(1 + e_0/(k\log k)) = O(k^{\alpha}\cdot m\cdot\mathrm{poly}(\log n))$. Let us now drop the dependence on e_0 from the notation, assuming that we have at most $k\log k$ false positives.

We shall show how to construct a matrix M with $O(k\log(n/k))$ rows, associated with a decoding procedure that allows us to find a list of size $k^{1+\alpha}\,\mathrm{poly}(\log n)$ which contains every defective item, i.e., an analog of Theorem 3.1. The list can then be filtered at the same cost using the matrix guaranteed by Theorem 10.1. The construction of M appears in Algorithm 4, and the decoding procedure in Algorithm 5. Both the construction and the decoding algorithm are in the same spirit as Algorithms 1 and 2, respectively, but learn $d = \lceil\alpha\log k\rceil$ bits at a time instead of one. It is not hard to infer that no defective item will be left out of L, the list output by Algorithm 5. What remains is to bound |L|. We will show that, with probability $1 - e^{-C_1 k\log(n/k)}$ for an absolute constant $C_1$, the list never holds more than $C_L k\log_k(n/k)$ items.

Proof. For a given $x \in \{0,1\}^{\log n}$, we define the trajectory $\tau_x$ to be the set of all prefixes p that might be inserted into L at some point during the execution of the algorithm and do not get discarded in Line 20. We also refer to the trajectory of the defective items, denoted $\tau'_x$, as the set of prefixes p that contain a defective item and are inserted into L at some point during the execution of the algorithm. A prefix p of (block) length ℓ has $D = 2^d$ children $p\circ p'$, $p' \in \{0,1\}^d$, of length ℓ + 1. We will imagine a degree-D tree $\mathcal{T}$ rooted at the empty string; the relation $T \subseteq_{\mathrm{tree}} \mathcal{T}$ will denote the fact that T is a connected subtree of $\mathcal{T}$ rooted at the empty string. It clearly suffices to prove that $|\tau_x \setminus \tau'_x| \le C_L k\log_k(n/k)$ for every x and every choice of false positives; this ensures that the output of Identify-Under-Errors($M\odot x$) has size $O(k\log_k(n/k))$. We will prove a stronger fact, concerning collections of sets $E^{(\ell,r)}$ of corrupted measurements, $(\ell,r) \in [H]\times[R]$, satisfying among other conditions $|E^{(\ell,r)}| \le Ck$ and (condition 4) $\sum_{r\in[R]}|E^{(\ell,r)}| \le k\log k$ for each $\ell \in [H]$; call such a collection of sets "bad" if it causes the algorithm to maintain a list larger than $C_L k\log_k(n/k)$. The set $\bigcup_{(\ell,r)\in[H]\times[R]} E^{(\ell,r)}$ corresponds to the set of false positives, i.e., the measurements that an adversary can corrupt in order to sabotage the decoding algorithm.
Note that it suffices to prove the claim with equality in condition 4 for all ℓ ∈ [H], since adding more false positives can only hurt the decoding algorithm. For fixed ℓ, counting the possible splits of $\sum_{r\in[R]}|E^{(\ell,r)}| = k\log k$ corresponds to putting $k\log k$ balls into R bins such that each bin receives at most Ck balls. Relaxing the upper bound on the capacity of a bin, the number of valid choices for the sets $E^{(\ell,r)}$, $r \in [R]$, is upper-bounded by $\binom{k\log k + R - 1}{R-1} \le 2^{k\log k + R - 1}$.

For a prefix p of length $\log k + \ell\cdot d$ and $r \in [R]$, define the Bernoulli random variable $Y_{p,r}$ to be 1 if p participates in a positive test in $M^{(\ell,r)}$ (i.e., either due to the existence of a defective item or due to a false positive), and 0 otherwise. Moreover, let the Bernoulli random variable $Y_p = \mathrm{maj}_{r\in[R]} Y_{p,r}$. For fixed ℓ there can be at most $R/4 = C\log k/4$ repetitions $r \in [R]$ with $|E^{(\ell,r)}| > 4k$; call any r that does not satisfy this "desirable". For p not containing a defective item and a desirable r, we have that $\Pr[Y_{p,r} = 1] = O(1/C)$, since at most 4k measurements of $M^{(\ell,r)}$ are corrupted and at most k contain a defective item, out of $\Theta(Ck)$ measurements in total. An application of the Chernoff bound to the variables $\{Y_{p,r}\}_{r\ \mathrm{desirable}}$ yields that $Y_p = 1$ with probability at most $(10/C)^{\log k}$, for sufficiently large C. Given the above considerations, the probability that a bad collection of sets exists is at most
$$(k+1)\binom{n}{k}\cdot 2^{(k\log k + R - 1)\cdot(\log_D(n/k)+1)}\cdot 2^{8\alpha C_L k\log(n/k)}\cdot\left(\frac{10}{C}\right)^{C_L k\log(n/k)} \le (k+1)\,e^{k\log(en/k)}\cdot 2^{3k\log(n/k)}\cdot 2^{8\alpha C_L k\log(n/k)}\cdot\left(\frac{10}{C}\right)^{C_L k\log(n/k)}.$$
It is not hard to see that for any α we can pick constants C, C_L with C < C_L such that the latter quantity is at most $e^{-Ck\log(n/k)}$. This ensures that at most $C_L k\log_k(n/k)$ prefixes are ever present in L during the execution of the algorithm, and thus the bound on the running time follows. This finishes the proof.

Thus, the first claim of Theorem 3.8 follows as discussed. We may obtain the second claim of Theorem 3.8 by augmenting the matrix guaranteed by the first claim with the disjunct matrix guaranteed by Theorem 10.1, in order to filter out the non-defective items.

We shall now show how to set up a hashing scheme and then invoke the framework of [LNNT16]. That framework is similar in spirit to list-recoverable codes, albeit with linking between the lists, which in that work (focused on the heavy hitters problem in the for-each case) allows faster decoding and optimal measurement complexity. The idea of linking has also been used before in [GLPS17]. What we are essentially doing here is constructing a (k, O(k), 0, e_1)-list-disjunct matrix with $O(k^2\log n + ke_1)$ rows. This is of course unacceptable for list-disjunct matrices, but fine for disjunct matrices. Our approach is closely related to the implicit reduction of [INR10] to list-disjunct matrices, albeit with the techniques of [LNNT16] and an added twist (two-layer hashing) to make the decoding time nearly linear (instead of polynomial) in m. The need for two-layer hashing adds complexity to the whole argument. Formally, the technical contribution of this subsection is the following lemma.

Lemma 10.4. There exists a (k, O(k), 0, e_1)-error-correcting list-disjunct matrix with $m = O(k^2\log n + ke_1)$ rows which allows decoding in time $m\cdot\mathrm{poly}(\log n)$.

Using Lemma 10.4, the non-defective items can then be filtered out within the desired time bound using Theorem 10.1. We thus focus on proving Lemma 10.4; we re-iterate that the above guarantee, although quite undesirable for list-disjunct matrices, suffices for our application to disjunct matrices.

Proof of Lemma 10.4. First of all, by a similar reduction as in Theorem 3.8, we can assume that $e_1 \le k\log n$.
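The budget-splitting reductions used in this section (for e_0 just above, and for e_1 in the proof of Lemma 10.4) share a simple decoder pattern: decode all copies in lockstep and return the first list produced, since by pigeonhole at least one copy received at most its share of the errors. A minimal sketch, modeling each copy's decoder as a generator that yields None until it halts (an interface we assume for illustration):

```python
def decode_concatenated(decoder_gens):
    """Run the decoders of all vertically-concatenated copies in
    lockstep; return the first candidate list any of them produces."""
    gens = list(decoder_gens)
    while gens:
        for g in list(gens):
            try:
                out = next(g)        # advance this decoder by one step
            except StopIteration:
                gens.remove(g)       # this copy's decoder gave up
                continue
            if out is not None:      # first decoder to halt wins
                return out
    return None
```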
Let C be a large enough absolute constant, and let R = Ck and Q = log n/log log n. Let enc : $\{0,1\}^{\log n} \to \{0,1\}^{O(\log n)}$ be the encoding function of a constant-rate error-correcting code that corrects a constant fraction of errors in linear time; such a code is available in [Spi96]. We split enc(i) into Q blocks of length $C\log\log n$, and denote the q-th block by enc(i)_q. Let us also pick a d-regular connected expander F, for some constant d, on the vertex set [Q]; for $q \in [Q]$ we let $\Gamma(q) \subseteq [Q]$ be the set of neighbours of q. We shall pick the following two collections of hash functions: functions $g_\rho : \{0,1\}^{O(\log n)} \to [Ck/\log n]$ for $\rho \in [R]$, and functions $h_{\rho,q} : \{0,1\}^{O(\log n)} \to \{0,1\}^{C\log\log n}$ for $\rho \in [R]$ and $q \in [Q]$. For a set $S \subseteq \{0,1\}^{O(\log n)}$ we denote $h_{\rho,q}(S) = \bigcup_{i\in S} h_{\rho,q}(i)$. The following claim constitutes the crux of the reduction.

Claim 10.5. The hash functions can be chosen such that for every set of codewords $S \subseteq \{0,1\}^{O(\log n)}$ with $|S| \le k$ and every $j \in S$, there exist at least R/2 repetitions $\rho \in [R]$ for which the following two items hold simultaneously: 1. $|g_\rho^{-1}(g_\rho(j)) \cap (S\setminus\{j\})| \le \log n$; 2. for at least 9/10 of the indices $q \in [Q]$, $h_{\rho,q}(j) \notin h_{\rho,q}\big(g_\rho^{-1}(g_\rho(j)) \cap (S\setminus\{j\})\big)$.

Proof. We pick every function at random. Fix S and j as in the condition of the claim. For $\rho \in [R]$ and $b \in [Ck/\log n]$, note that $|g_\rho^{-1}(b)\cap S|$ is the sum of at most k Bernoulli random variables with total expectation $|S|\cdot(\log n/(Ck)) \le \log n/C$. By an application of the Chernoff bound we get $|g_\rho^{-1}(g_\rho(j))\cap(S\setminus\{j\})| \le \log n$ with probability $1 - e^{-\Theta(\log n)}$. The probability that $|g_\rho^{-1}(g_\rho(j)) \cap (S\setminus\{j\})| > \log n$ in more than R/4 − 1 repetitions is $e^{-\Theta(R\log n)} = e^{-\Theta(k\log n)}$, by a Chernoff bound; the constant in the exponent can be tuned arbitrarily large by setting C sufficiently large. A union bound over all pairs (j, S), the number of which is at most $k\binom{n}{k} \le e^{k\log(en/k)+\log k}$, yields item 1. Let us condition on item 1 being true, and choose one of the at least 3R/4 + 1 repetitions $\rho \in [R]$ for which item 1 holds for the pair (j, S). For that repetition, item 2 fails for a given index q with probability $e^{-(C-1)\log\log n}$. Hence, with probability $1 - e^{-(C-1)\log\log n\cdot\Omega(Q)} = 1 - e^{-\Theta(\log n)}$, item 2 holds for at least 9/10 of the indices $q \in [Q]$. Thus, by the Chernoff bound over all 3R/4 + 1 aforementioned repetitions, we get that with probability $1 - e^{-\Omega(k\log n)}$ there exist at least R/2 repetitions such that items 1 and 2 hold simultaneously for the pair (j, S). We may now take a union bound over all (j, S) with $|S| \le k$, $j \in S$ to conclude the claim.

For every $\rho \in [R]$, $b \in [Ck/\log n]$ and $q \in [Q]$, let us now partition $\{0,1\}^{O(\log n)} \cap g_\rho^{-1}(b)$ according to the label $O_{\rho,b,q}(i) \in \{0,1\}^{(d+2)C\log\log n}$, which consists of the block enc(i)_q, the "name" $h_{\rho,q}(i)$, and the names $h_{\rho,q'}(i)$ for the d neighbours $q' \in \Gamma(q)$; that is, every $i \in \{0,1\}^{O(\log n)} \cap g_\rho^{-1}(b)$ belongs to the part indexed by $O_{\rho,b,q}(i)$. Note that for each ρ, b there are $2^{(d+2)C\log\log n}$ such parts. For each (ρ, b, q) we now pick a $(\log n, 2^{(d+2)C\log\log n}, 0, 20\log\log n)$-error-correcting list-disjunct matrix $M^{(\rho,b,q)}$, guaranteed by Theorem 10.1, over a universe of size $2^{(d+2)C\log\log n}$, and extend it over $\{0,1\}^{O(\log n)}$ by placing every $i \in g_\rho^{-1}(b)$ in the measurements corresponding to $O_{\rho,b,q}(i)$. In other words, for each (ρ, b, q) we group together the coordinates with the same value of $O_{\rho,b,q}(i)$ and pick a list-disjunct matrix over the smaller universe of size $2^{(d+2)C\log\log n}$. The total number of rows is $R\cdot(Ck/\log n)\cdot Q\cdot O(\log n\log\log n) = O(k^2\log n)$.

Claim 10.6. For any adversarial choice of $k\log n$ false negatives and for every $j \in \{0,1\}^{\log n}$, there exist at least 3R/4 + 1 indices $\rho \in [R]$ for which the following holds: denoting $b = g_\rho(\mathrm{enc}(j))$, at least 3Q/4 of the list-disjunct matrices $M^{(\rho,b,q)}$, $q \in [Q]$, receive fewer than $20\log\log n$ adversarial errors.

Proof. Assume this were not the case. Then the number of false negatives would be strictly more than $\frac{R}{4}\cdot\frac{Q}{4}\cdot 20\log\log n > k\log n$, which is a contradiction.
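In code, the label $O_{\rho,b,q}(i)$ just described looks as follows; enc_block, h, and Gamma are assumed callbacks implementing enc(·)_q, $h_{\rho,q}(\cdot)$, and the expander neighbourhood Γ(q), respectively.

```python
def linking_label(i, rho, q, enc_block, h, Gamma):
    """Build the (d+2)-part label O_{rho,b,q}(i): the block content,
    the item's "name" in block q, and its names in the d neighbouring
    blocks of the expander (the decoder later uses these to link list
    entries across blocks)."""
    parts = [enc_block(i, q),                 # s_1: the q-th block of enc(i)
             h(rho, q, i)]                    # s_2: the "name" of i in block q
    parts += [h(rho, qq, i) for qq in Gamma(q)]   # s_3 .. s_{d+2}
    return tuple(parts)
```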
We are now in a position to discuss the decoder. For every $(\rho, b) \in [R]\times[Ck/\log n]$ we shall first show how to find a list $L_{\rho,b}$ of size O(k). We shall then keep the elements j that appear in at least R/2 of the lists $L_{\rho,b}$; for each such j we may return $i = \mathrm{enc}^{-1}(j)$, which can be computed in $O(\log n)$ time by the guarantee of [Spi96]. Since for fixed ρ every $j \in \{0,1\}^{O(\log n)}$ is associated with exactly one choice of b, an averaging argument shows that the number of returned j is O(k). What needs to be shown is that we do not miss any j = enc(i) where i is defective.

Obtaining $L_{\rho,b}$. We run the naive decoder of Definition 5.1 on every list-disjunct matrix $M^{(\rho,b,q)}$ to get a list $L_{\rho,b,q}$ of size at most $2\log n$. Our approach closely follows [LNNT16] from now on. Fix ρ, b. We build a graph G on the vertex set $[Q]\times[2^{(d+2)C\log\log n}]$. For every $q \in [Q]$ we split every $s \in L_{\rho,b,q}$ (a binary string of length $(d+2)C\log\log n$) into d + 2 sub-strings of length $C\log\log n$, henceforth denoted $s_1, s_2, \ldots, s_{d+2}$. We say that $s_2$ is the "name" of s; if there are multiple elements s with the same name, we just keep one of them. We say that s "suggests" the edges connecting $(q, s_2)$ to $(q', s_\ell)$, where $q'$ ranges over the neighbours in $\Gamma(q)$ and $s_\ell$, $\ell \in \{3, 4, \ldots, d+2\}$, is the name that s records for q′. If $s_\ell$ is also the name of some element of $L_{\rho,b,q'}$ that suggests the same edge, we add that edge to G. This results in a graph with at most $(d/2)\cdot Q\cdot(2\log n) = O(\log^2 n/\log\log n)$ edges. We restrict G to the union of its non-isolated vertices, ensuring that it has $O(\log^2 n/\log\log n)$ vertices. As in [LNNT16], we may now perform spectral clustering ([LNNT16, Theorem 1]) on G to find all $\epsilon_0$-spectral clusters, for some constant $\epsilon_0$. As in [LNNT16], this yields a set of corrupted codewords in $\{0,1\}^{O(\log n)}$, which we decode to obtain $L_{\rho,b}$.

To prove correctness, fix a set of defective items S and $i \in S$. Using Claims 10.5 and 10.6, we may infer that there exist at least R/2 repetitions $\rho \in [R]$ such that $j = \mathrm{enc}(i) \in L_{\rho,g_\rho(j)}$; the proof is totally analogous to that of [LNNT16, Theorem 2]. This happens, in particular, because there exist at least R/2 indices $\rho \in [R]$ for which there are at least 3Q/4 indices $q \in [Q]$ such that i) the list-disjunct matrix $M^{(\rho,g_\rho(j),q)}$ is decoded correctly, since it receives fewer than $20\log\log n$ false negatives (Claim 10.6), and ii) j = enc(i) does not "collide" with another defective item in that matrix (Claim 10.5). These two conditions constitute the analog of [LNNT16, Lemma 1]. This ensures that enc(i) is present in L for every $i \in S$. Hence no defective item is lost, and since |L| = O(k) as argued, this finishes the proof of the lemma.

Putting Lemma 10.4 together with Theorem 10.1, we obtain Theorem 3.9. We re-iterate that the above approach does not perform well with respect to false positives; see also the discussion at the end of this section. It is also quite specific to the construction of disjunct matrices, since it yields a quadratic bound on the number of measurements (roughly speaking, this happens because one needs to repeat Θ(k) times in order to boost the success probability enough for a union bound over all possible sets of defectives).

Does there exist a simpler approach for decoding disjunct matrices? It is natural to wonder whether there is a simpler way of proving Theorem 3.9, avoiding the spectral graph theory framework of [LNNT16] and its ad-hoc incorporation via the complicated two-layer hashing scheme presented above.
We present the following stand-alone reduction, from disjunct matrices that can correct up to e_1 false negatives to (k, n, 0, e_1)-error-correcting list-disjunct matrices, which could be useful in such an attempt. Implicit and explicit reductions from disjunct to list-disjunct matrices appear also in [Che12, INR10], but in order to work they demand list-recovery technology to combine the answers from the list-disjunct matrices; the reduction we present here does not require any such combination.

Lemma 10.7. Assume that, in the regime $k \le c\log n$, there exists a (k, O(k), 0, e_1)-error-correcting list-disjunct matrix with R(k, n, e_1) rows which is decodable in time poly(R(k, n, e_1)). Then, for $k > c\log n$, there exists a k-disjunct matrix that can tolerate up to e_1 false negatives, with $m = O((k^2/\log n)\cdot R(\log n, n, e_1/k))$ rows. Furthermore, the matrix is decodable in time $O((k^2/\log n)\cdot\mathrm{poly}(R(\log n, n, e_1/k)))$. In particular, if $R(k, n, e_1) = O(k\log(n/k) + ke_1)$, we obtain an error-correcting (k, n, e_1)-disjunct matrix with $m = O(k^2\log n + ke_1)$ rows which is efficiently decodable in $m\,\mathrm{poly}(\log n)$ time (for all values of k).

Proof. We pick Ck hash functions $h_\rho : [n] \to [10k/\log n]$, $\rho \in [Ck]$, and for every ρ and every bucket $b \in [10k/\log n]$ we place a $(\log n, O(\log n), 0, e_1/k)$-error-correcting list-disjunct matrix $M^{(\rho,b)}$ over the coordinates in $h_\rho^{-1}(b)$. The total number of rows is $Ck\cdot(10k/\log n)\cdot R(\log n, n, e_1/k)$.

Fix a set $S \subseteq [n]$ of defective items. The probability that $i \in S$ satisfies $|h_\rho^{-1}(h_\rho(i))\cap S| > \log n$ in repetition ρ is at most $e^{-2\log n}$ by the Chernoff bound (as in the previous proof, $|h_\rho^{-1}(h_\rho(i))\cap S|$ is controlled by the sum of k − 1 independent Bernoulli random variables of expectation $\frac{\log n}{10k}$ each). The probability that $|h_\rho^{-1}(h_\rho(i))\cap S| > \log n$ in most repetitions ρ is at most $e^{-Ck\log n/10}$. By a union bound over all $k\cdot\sum_{j=0}^{k}\binom{n}{j} \le k^2\binom{n}{k}$ pairs (i, S) with $|S| \le k$, $i \in S$, we get that for every such pair (i, S) it holds that $|h_\rho^{-1}(h_\rho(i))\cap S| \le \log n$ in most repetitions ρ. Let us call this property "expansion". Moreover, for every $i \in S$ there can be at most k repetitions ρ for which $M^{(\rho,h_\rho(i))}$ receives more than $e_1/k$ errors.

This means that we can perform the following in order to find a list L that contains all defective items: run the decoding procedure associated with every $M^{(\rho,b)}$, halting it after $\mathrm{poly}(R(\log n, n, e_1/k))$ steps, and return all elements that appear in at least Ck/2 of the lists. By the expansion property, for every $i \in S$ and for more than Ck/2 of the repetitions ρ, the decoding procedure of $M^{(\rho,h_\rho(i))}$ runs on a valid instance of sparsity less than log n with at most $e_1/k$ errors; hence it runs correctly, yielding the desired result. The total running time is at most $(Ck)\cdot(10k/\log n)\cdot\mathrm{poly}(R(\log n, n, e_1/k))$.

We now bound |L| by O(k). Note that the number of pairs (i, ρ), where i is not defective but is returned by the decoding procedure of $M^{(\rho,h_\rho(i))}$, is $O(\log n)\cdot(10k/\log n)\cdot Ck = O(k^2)$. Thus, a simple averaging argument shows that at most O(k) non-defective coordinates i can be returned by $M^{(\rho,h_\rho(i))}$ in the majority of repetitions ρ; we conclude that |L| = O(k). Using the error-correcting disjunct matrix guaranteed by Theorem 10.1 (part 2), we may then find exactly the defective items using point queries. When $R(k, n, e_1) = O(k\log(n/k) + ke_1)$, the above reduction gives the desired result in the regime $k > c\log n$. In the regime $k \le c\log n$ we may use the assumed list-disjunct matrix itself, together with the reduction from e_1 false negatives to $k\log n$ false negatives, to find a list of size 2k that can then be filtered using the matrix of Theorem 10.1.
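A minimal sketch of the voting decoder from this proof; small_decode(ρ, b) is an assumed callback that runs (and truncates after the prescribed number of steps) the decoding procedure of $M^{(\rho,b)}$, returning a candidate list.

```python
from collections import Counter

def vote_decode(num_reps, num_buckets, small_decode, threshold):
    """Run every small decoder, then keep the items that were returned
    in at least `threshold` (= Ck/2 in the proof) repetitions."""
    votes = Counter()
    for rho in range(num_reps):
        seen = set()
        for b in range(num_buckets):
            seen.update(small_decode(rho, b))
        votes.update(seen)          # count each item once per repetition
    return [i for i, c in votes.items() if c >= threshold]
```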
The main observation for establishing the running time is, again, that in this regime poly(k, log n) = k² poly(log n), which gives the desired result.

Note that the above reduction, as well as the hashing scheme in Theorem 3.9, fails in the presence of false positives. An adversary with $e_0 = \Theta(k^2\log n)$ false positives in their possession can choose a defective item i and place $\Omega(k\log n)$ false positives in each matrix $M^{(\rho,h_\rho(i))}$. Each such matrix has $O(\log^2 n)$ rows, and hence, for the interesting case $k \gg \log n$, the adversary can corrupt all the list-disjunct matrices in which i participates. It is unclear how to decode in such a scenario, if that is possible at all. A natural approach would be to discard each such $M^{(\rho,h_\rho(i))}$, since it would appear as a matrix overloaded with false positives; however, this would lead to not detecting i.

This section is devoted to proving Theorem 3.11. In what follows, for a vector $x \in \mathbb{R}^n$ and a set S, we let $x_S$ be the vector obtained by zeroing out every coordinate $i \notin S$, and we let $x_{-k}$ be the vector obtained by zeroing out the k largest-magnitude coordinates of x, breaking ties arbitrarily. We shall use the anti-concentration of Gaussians, and in particular the following fact.

Claim 11.1. There exists an absolute constant $C_g > 1$ such that $\Pr_{g\sim N(0,1)}\left[\frac{1}{C_g} \le |g| \le C_g\right] \ge 9/10$.

Table 1 (summary of the constants in Section 11; we shall set $C_g \ll B \ll C \ll C_F$):
- $C_g$: governs the concentration and anti-concentration of Gaussians.
- $C$: governs the number of rows per matrix $M^{(\ell,r)}$.
- $C_T$: governs the number of non-discarded elements in Line 16.
- $B$: governs the size of the sets $E^{(\ell,r)}$ in Claim 11.4.
- $C_F$: governs the size of the forest $|\mathcal{F}|$ in Claim 11.4.

Anti-concentration of Gaussians for ℓ2/ℓ2 compressed sensing has also been used in [NS19], but for a crucially different reason. That work also demands a much stronger anti-concentration property of the Gaussian distribution (the point there was to avoid arguing via weak identification systems in the first place, in order to achieve optimality in decoding time and column sparsity, which is not a focus here). Though they might seem related at first glance, our approach and the one in [NS19] are quite different on many levels, the most obvious being that [NS19] does not build a weak identification system and does not perform the standard iterative loop employed in the compressed sensing literature. Note also that Claim 11.1 uses nothing beyond the fact that the Gaussian is a continuous distribution. It is possible to use the machinery of FT-mollification [KNW10] to work with discretized versions of Gaussians which can be stored in small space; however, in order to keep the exposition as elementary as possible, we do not elaborate on how to do so here.
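For concreteness, $C_g = 10$ already satisfies Claim 11.1: $\Pr[|g| < 1/10] = 2\Phi(0.1) - 1 \approx 0.08$, while $\Pr[|g| > 10]$ is negligible. A quick Monte-Carlo check (the specific value $C_g = 10$ is our illustrative choice, not one fixed by the text):

```python
import numpy as np

rng = np.random.default_rng(0)
g = rng.standard_normal(10_000_000)
C_g = 10.0
frac = np.mean((np.abs(g) >= 1 / C_g) & (np.abs(g) <= C_g))
print(frac)   # ~0.920, comfortably at least 9/10
```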
Lemma 11.2. There exists a randomized construction of a matrix $M \in \mathbb{R}^{m\times n}$ with $m = O\!\left((k/\epsilon)\log(n/k) + \frac{1}{\epsilon}\cdot\frac{\log(n/k)}{\log\log(n/k)}\cdot\log(1/\delta)\right)$ rows, together with a decoding procedure which, with probability $1 - \delta$, returns a list L of size $O((k/\epsilon)\log(n/k))$ containing all but at most k/10 of the coordinates among the top k whose magnitude is at least $\sqrt{\epsilon/k}\,\|x_{-k}\|_2$.

Proof. The construction of M appears in Algorithm 6. The number of rows is $H\cdot R\cdot(Ck/\epsilon) = O\!\left((k/\epsilon)\log(n/k) + \frac{1}{\epsilon}\cdot\frac{\log(n/k)}{\log\log(n/k)}\cdot\log(1/\delta)\right)$. The decoding procedure on M is depicted in Algorithm 7; its running time is $H\cdot(D\cdot C_T(k/\epsilon)\log(n/k))\cdot R\cdot O(1) = O(m\log^2 n)$.

We now prove correctness. Let $\mathcal{H}$ be the set of coordinates among the top k whose magnitudes satisfy $|x_i|^2 \ge (\epsilon/k)\|x_{-k}\|_2^2$. For the sake of the analysis (and to avoid a lengthy induction argument that would otherwise be needed), we define $\mathrm{est}_p$ for every length-ℓ prefix $p \in \{0,1\}^{\log k+\ell\cdot d}$, exactly as in Algorithm 6. Note that the algorithm avoids computing all $\mathrm{est}_p$ by considering only a very small subset of all prefixes, namely $O((k/\epsilon)\log^3(n/k))$ of them.

Claim 11.3. With probability $1 - \delta/6$, the following holds for all but k/10 of the elements $i \in \mathcal{H}$: for every $\ell \in [H]$, the prefix $p_\ell = \mathrm{bpref}_{\log k+\ell\cdot d}(i)$ satisfies $\mathrm{est}_{p_\ell} \ge (1/C_g)\sqrt{\epsilon/k}\,\|x_{-k}\|_2$.

Proof. Fix $i \in \mathcal{H}$ and consider $p_\ell = \mathrm{bpref}_{\log k+\ell\cdot d}(i)$. For $r \in [R]$, the random variable $M^{(\ell,r)}_{h_{\ell,r}(p_\ell)}x$ is distributed as a Gaussian with variance at least $(\epsilon/k)\|x_{-k}\|_2^2$. Claim 11.1 implies that $|M^{(\ell,r)}_{h_{\ell,r}(p_\ell)}x|$ is at least $(1/C_g)\sqrt{\epsilon/k}\,\|x_{-k}\|_2$ with probability 9/10, and hence a Chernoff bound across all $r \in [R]$ gives that, with probability $\eta := 1 - e^{-c\log(1/\delta)/k - c\log(n/k)/\log\log(n/k)}$ for some absolute constant c, $\mathrm{est}_{p_\ell}$ is at least $(1/C_g)\sqrt{\epsilon/k}\,\|x_{-k}\|_2$. A union bound over all $\ell \in [H]$ then shows that (after adjusting the constant c) with probability η it holds that $\mathrm{est}_{p_\ell} \ge (1/C_g)\sqrt{\epsilon/k}\,\|x_{-k}\|_2$ for all $\ell \in [H]$ simultaneously.

Consider now Bernoulli random variables $Y_i$, $i \in \mathcal{H}$, where $Y_i = 1$ if $\mathrm{est}_{p_\ell} < (1/C_g)\sqrt{\epsilon/k}\,\|x_{-k}\|_2$ for some $\ell \in [H]$. The above discussion gives that $Y_i = 1$ with probability at most 1 − η. We would like to apply the additive form of the Chernoff bound in order to argue that, with probability $1 - \delta/6$, at most k/10 of the $Y_i$ equal 1, from which the claim would follow. What prevents us from doing so is that the $Y_i$ are not independent. To circumvent this, we make a coupling argument and stochastically dominate the $\{Y_i\}_{i\in\mathcal{H}}$ by a collection of Bernoulli random variables $\{Z_i\}_{i\in\mathcal{H}}$ which have larger expectations and are independent. As an intuitive explanation, note that if two $i, i' \in \mathcal{H}$ satisfy $p_\ell = \mathrm{bpref}_{\log k+\ell\cdot d}(i) = \mathrm{bpref}_{\log k+\ell\cdot d}(i')$, then the probability that $\mathrm{est}_{p_\ell}$ falls below the threshold only decreases, since it is a median of the absolute values of R Gaussians with twice the variance; a similar conclusion holds when they have different prefixes whose prefixes participate in the same measurement. Thus, we may first define variables $W_{i,\ell}$, distributed like $\mathrm{est}_{p_\ell}$ but with Gaussians that are fresh and independent across the R repetitions and across i, and then define $Z_i = 1$ if $W_{i,\ell}$ falls below the threshold for some $\ell \in [H]$. The $Z_i$ are jointly independent and stochastically dominate the $Y_i$, i.e., $\Pr\{Y_i = 1\} \le \Pr\{Z_i = 1\}$. Thus the same analysis as above applies to the $Z_i$, and we can now safely apply the Chernoff bound to the $Z_i$ to conclude the claim.

We also need the following claim.

Claim 11.4. Call a prefix p of length $\log k + \ell\cdot d$ a "potentially false prefix" if
1. (light prefix) $\big\|x_{\{i\in\{0,1\}^{\log n}:\,\mathrm{bpref}_{\log k+\ell\cdot d}(i)=p\}}\big\|_2 < \frac{1}{\sqrt{2}\,C_g}\sqrt{\epsilon/k}\,\|x_{-k}\|_2$, and
2. (large estimator) $\mathrm{est}_p > \frac{1}{C_g}\sqrt{\epsilon/k}\,\|x_{-k}\|_2$.
With probability $1 - \delta/2$, the conclusion of Claim 11.3 holds and at most $O((k/\epsilon)\log(n/k))$ potentially false prefixes are inserted into L during the execution of the algorithm without being discarded in Line 16.

Proof. In order to analyze $\mathrm{est}_p$ for potentially false prefixes, define $\overline{\mathrm{est}}_p = \mathrm{median}_{r\in[R]}\,\|x_{h_{\ell,r}^{-1}(h_{\ell,r}(p))}\|_2^2$, where ℓ is the length of p; note that $\|x_{h_{\ell,r}^{-1}(h_{\ell,r}(p))}\|_2^2$ is exactly the variance of the measurement in repetition r, which is distributed as $N(0, \|x_{h_{\ell,r}^{-1}(h_{\ell,r}(p))}\|_2^2)$. We may proceed along the same lines as the arguments in Theorems 3.1 and 3.8, albeit with some careful, yet non-trivial, twists. First, we set $D = \Theta(\log(n/k)/\log\log(n/k))$ and $R = \Theta(\log\log(n/k))$, whereas in the scheme of Theorem 3.1 we chose D = 2, R = 1, and in Theorem 3.8 we chose $D = \Theta(k^\alpha)$, $R = \Theta(\log k)$. Consider sets (to be specified later) $E^{(\ell,r)} \subseteq [Ck/\epsilon]$ with $|E^{(\ell,r)}| \le Bk/\epsilon$ for all $(\ell,r) \in [H]\times[R]$, where B is an absolute constant to be chosen later. A prefix p of length $\log k + \ell\cdot d$ is called "undesirable" if $h_{\ell,r}(p) \in E^{(\ell,r)}$ for more than R/10 repetitions $r \in [R]$. Walking through the proof of Theorem 3.8, we can infer that the probability that there exist choices of the sets $E^{(\ell,r)}$ and a $(C_F(k/\epsilon)\log(n/k))$-sized forest $\mathcal{F}$ of k trees, rooted at all binary strings of length log k, such that at least $|\mathcal{F}|/10$ of the prefixes $p \in \mathcal{F}$ are undesirable, is governed by the quantity $\binom{Ck/\epsilon}{Bk/\epsilon}$; similarly to the proofs of Theorems 3.1 and 3.8, using Lemma 5.6 and the fact that $\binom{a}{b} \le (ae/b)^b$, we can set the constants so that this probability is less than δ/6. In particular, we may set C/B to be an arbitrarily large constant, and C_F sufficiently large with respect to C. From now on, let us condition on this event. Given the above, let us define, for $(\ell,r) \in [H]\times[R]$, the set $E^{(\ell,r)}$ to consist of the buckets hit by one of the top k coordinates, together with the buckets whose squared ℓ2 mass, excluding the top k coordinates, is at least $\frac{1}{2C_g^2}\cdot\frac{\epsilon}{k}\|x_{-k}\|_2^2$. Clearly $|E^{(\ell,r)}| \le k + 2C_g^2\cdot(k/\epsilon)$, so at this point we may choose the constant B such that $B \ge 3C_g^2$.
As long as $C_F > 100\cdot 2C_g^2$, any $(C_F(k/\epsilon)\log(n/k))$-sized forest $\mathcal{F}$ of prefixes has at least $\frac{99}{100}|\mathcal{F}|$ prefixes which are light, and hence at least $(99/100)|\mathcal{F}| - (10/100)|\mathcal{F}| = (89/100)|\mathcal{F}|$ prefixes which are both light and desirable (i.e., not undesirable). Lightness and desirability of a prefix p imply that $\overline{\mathrm{est}}_p < \frac{1}{C_g^2}\cdot\frac{\epsilon}{k}\|x_{-k}\|_2^2$. Given the above, the probability that all of those prefixes satisfy $\mathrm{est}_p \ge \frac{1}{C_g}\sqrt{\epsilon/k}\,\|x_{-k}\|_2$ can be bounded by $\big(e^{-\Omega(R)}\big)^{(89C_F/100)\cdot(k/\epsilon)\log(n/k)} \le \delta/6$. Thus the above two conditions and the condition of Claim 11.3 hold together with probability $1 - \delta/2$. Conditioning on the above, the execution of Algorithm 7 ensures that all but k/10 of the coordinates $i \in \mathcal{H}$ are not displaced by any other prefix in Line 16, as long as $C_T > 1 + C_g^2 + C_F$. This yields the proof of the claim.

Given the above considerations, we are now ready to prove the lemma. Claim 11.4 ensures that, with probability $1 - \delta/2$, there can be at most $O((k/\epsilon)\log(n/k))$ prefixes p with an estimate larger than $\frac{1}{C_g}\sqrt{\epsilon/k}\,\|x_{-k}\|_2$: i) those not satisfying item 1 of the claim, which are $O(k/\epsilon)\cdot H = O((k/\epsilon)\log(n/k))$ in total, and ii) those satisfying item 1, which are again $O((k/\epsilon)\log(n/k))$, as proved in the claim. The execution of the algorithm, along with Claim 11.3, ensures that all but k/10 of the elements of $\mathcal{H}$ are returned in the list L at the end; let those elements constitute the set T. A standard argument as in previous work [GLPS10, NS19] shows that T indeed satisfies the conclusion of the lemma, finishing the proof.

Remark 11.5. In fact, one can prove that the list L output by Algorithm 7 has size $O\big((k/\epsilon)\cdot\frac{\log(n/k)}{\log\log(n/k)}\big)$. However, this would make no overall difference in our argument.

Given the list L from Lemma 11.2, we shall prune it down to 2k coordinates. The construction is exactly the same as in previous work [GLPS10, GNP+13, LNW18], and the proof of correctness follows from [LNW18]. The additional sensing matrix we keep is a CountSketch matrix, i.e., a vertical concatenation of $R' = O(\log(1/\epsilon) + \log(n/k) + \log(1/\delta)/k) = O(\log(n/k) + \log(1/\delta)/k)$ matrices $C^{(1)}, \ldots, C^{(R')} \in \{-1,0,+1\}^{O(k)\times n}$, each having a random sign at one random position in every column. For every $i \in L$ we compute the approximation $x'_i = \mathrm{median}_{r\in[R']}\,\sigma_{r,i}\,(C^{(r)}x)_{q(r,i)}$, where q(r, i) is the position of the non-zero entry of the i-th column of $C^{(r)}$ and $\sigma_{r,i}$ its sign; we then keep the coordinates $i \in L$ with the largest approximations in magnitude. With probability $1 - \delta/2$, for all but 4k/10 of the elements $i \in L$ the approximations satisfy $|x_i - x'_i| \le c\sqrt{\epsilon/k}\,\|x_{-k}\|_2$ (a "good approximation"), for some absolute constant $c \ll 1$. Note that, in expectation, the number of badly approximated coordinates $i \in L$ is at most $e^{-\Omega(R')}\cdot|L|$; using martingale arguments to handle the dependency issues, one can show that the probability that more than k/10 coordinates $i \in L$ are not well approximated is at most δ/2 [LNW18]. As in [GLPS10], we may zero out $x'_i$ for every i for which $x'_i$ is not among the top 2k coordinates. Outputting the resulting 2k-sparse vector x′ gives the desired guarantee (for correctness see [GLPS10]). This yields an overall failure probability of δ.
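A minimal CountSketch sketch of the point queries used in this pruning step. The hash functions are ad hoc and the parameters illustrative; the construction described above stores one random position and one random sign per column.

```python
import random
import statistics

class CountSketch:
    def __init__(self, buckets, reps, seed=0):
        rng = random.Random(seed)
        self.buckets = buckets
        self.salts = [rng.randrange(1 << 30) for _ in range(reps)]
        self.tables = [[0.0] * buckets for _ in range(reps)]

    def _pos_sign(self, r, i):
        # One position and one sign per (repetition, column).
        h = hash((self.salts[r], i))
        return h % self.buckets, 1.0 if (h >> 31) & 1 else -1.0

    def update(self, i, delta):
        for r in range(len(self.tables)):
            q, s = self._pos_sign(r, i)
            self.tables[r][q] += s * delta

    def estimate(self, i):
        # Median of sign-corrected counters, as in the text.
        vals = []
        for r in range(len(self.tables)):
            q, s = self._pos_sign(r, i)
            vals.append(s * self.tables[r][q])
        return statistics.median(vals)
```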
The standard way to obtain a scheme for the general ℓ2/ℓ2 problem is to vertically stack a sequence of weak systems, associated with matrices $\Psi^{(1)}, \Psi^{(2)}, \ldots$ with (sparsity, fineness of approximation) parameters $(k_1, \epsilon_1), (k_2, \epsilon_2), \ldots$, respectively. Using the first weak system we obtain a (2k)-sparse vector $x^{(1)}$ and use it to gain access to the sketch $\Psi^{(2)}(x - x^{(1)})$. In turn, this returns a vector $x^{(2)}$, which we feed to the next weak system to gain access to $\Psi^{(3)}(x - x^{(1)} - x^{(2)})$, and so on. Choosing $(k_j, \epsilon_j) = (k/2^j, \epsilon/(1.5)^j)$ we obtain a conversion analogous to those in [GLPS10, HIKP12a, GNP+13, GLPS17, CI17, LNW18]. Other ways of obtaining ℓ2/ℓ2 schemes from weak systems appear in [LNW18, Section D.2].
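Schematically, the iterative conversion reads as follows. The sketch access and the recover routine are our illustrative abstractions of the weak systems $\Psi^{(j)}$, not a concrete interface from the text.

```python
def iterate_weak_systems(sketches, weak_systems):
    """sketches[j].apply(approx) is assumed to return
    Psi^(j)(x - current approximation); weak_systems[j].recover maps
    that sketch to a sparse correction (a dict coord -> value)."""
    approx = {}                      # running sparse approximation of x
    for j, system in enumerate(weak_systems):
        y = sketches[j].apply(approx)        # access Psi^(j)(x - x^)
        chunk = system.recover(y)            # next sparse correction
        for i, v in chunk.items():
            approx[i] = approx.get(i, 0.0) + v
    return approx
```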
Conclusion. Sublinear-time sparse recovery is an extensively studied area with several open problems, most of which seem to resist attack by the available machinery. Thus, new techniques are needed in order to bypass the current barriers and improve the known trade-offs between measurement complexity and decoding time. In this work, we made a major step in this direction by introducing a new sublinear-time algorithmic framework, enabling us to close many of the outstanding open problems and make significant progress on others. It is interesting to consider interactions between our ideas and the list-recovery framework, notably the closely related framework of [LNNT16]. For list-disjunct matrices, list-recovery-based arguments, including [LNNT16], are unsuitable on their own for optimal measurement complexity, let alone for schemes with near-optimal decoding time. Our intuition is that further progress should be possible by setting up an appropriate error-correcting mechanism on top of our construction. It should also be noted that the construction of efficiently decodable list-disjunct matrices is significantly more challenging than that of disjunct matrices; this is expected, since the former can be used to construct the latter. Furthermore, one should first aim at designing very efficient schemes for ℓ1/ℓ1 and ℓ2/ℓ1 compressed sensing before trying to improve error-correcting group testing schemes (and, in particular, list-disjunct matrices), since the latter seem to be much harder, for two reasons: i) the errors are completely adversarial, whereas in compressed sensing they depend on the underlying structure of the vector x; and ii) one cannot perform a subtract-and-repeat iterative process and thus has to identify all defective items in one stage, in contrast to compressed sensing, where it suffices to identify most heavy hitters. This makes a major difference.

References

- Fast modular subset sum using linear sketching
- Optimal monotone encodings
- (Learned) frequency estimation algorithms under Zipfian distribution
- Faster update time for turnstile streaming algorithms
- BPTree: an ℓ2 heavy hitters algorithm using constant memory
- Beating CountSketch for heavy hitters in insertion streams
- Sublinear-time non-adaptive group testing with O(k log n) tests via bit-mixing coding
- An optimal algorithm for ℓ1-heavy hitters in insertion streams and related problems
- A German exception? Why the country's coronavirus death rate is low
- Combining geometry and combinatorics: a unified approach to sparse signal recovery
- Nearly optimal distinct elements and heavy hitters on sliding windows
- Sparse recovery with partial support knowledge
- A note on double pooling tests
- Efficient pooling designs for library screening
- Top-k-convolution and the quest for near-linear output-sensitive subset sum
- Finding frequent items in data streams
- Handbook of Combinatorial Designs, Second Edition (Discrete Mathematics and Its Applications)
- New constructions of one- and two-stage pooling designs
- An unexpected meeting of four seemingly unrelated problems: graph testing, DNA complex screening, superimposed codes and secure key distribution
- k-mismatch with don't cares
- Deterministic radio broadcasting
- A survey on nonadaptive group testing algorithms through the angle of decoding
- Finding the frequent items in streams of data
- Noise-resilient group testing: Limitations and constructions
- Improved constructions for non-adaptive threshold group testing
- Nearly optimal deterministic algorithm for sparse Walsh-Hadamard Transform
- GROTESQUE: noisy group testing (quick and efficient)
- Efficient algorithms for noisy group testing
- Graph-constrained group testing
- An adaptive sublinear-time block sparse Fourier transform
- An improved data stream summary: the count-min sketch and its applications
- What's hot and what's not: tracking most frequent items dynamically
- Near-optimal signal recovery from random projections: Universal encoding strategies?
- Pooling Designs and Nonadaptive Group Testing
- Compressed sensing
- The detection of defective members of large populations
- Bounds on the length of disjunctive codes
- A survey of superimposed code theory
- Superimposed distance codes (Problems of Control and Information Theory)
- Pool testing of SARS-CoV-2 samples increases worldwide test capacities many times over
- Group testing problems with sequences in experimental molecular biology
- A Mathematical Introduction to Compressive Sensing
- Indexing information for data forensics
- Nearly optimal sparse group testing
- Linear-time list decoding in error-free settings (extended abstract)
- Sparse recovery using sparse matrices
- Approximate sparse recovery: optimizing time and measurements
- For-all sparse recovery in near-optimal time
- Improved time bounds for near-optimal sparse Fourier representations
- ℓ2/ℓ2-foreach sparse recovery with low risk
- Essential coding theory
- Nearly optimal sparse Fourier transform
- Simple and practical algorithm for sparse Fourier transform
- Group testing for image compression
- Sketching and streaming entropy via approximation theory
- Sample-optimal Fourier sampling in any constant dimension
- On the optimality of the Kautz-Singleton construction in probabilistic group testing
- Efficiently decodable non-adaptive group testing
- Improved approximation guarantees for sublinear-time Fourier algorithms
- Perfect ℓp sampling in a data stream
- Sparse Fourier transform in any constant dimension with nearly-optimal sample complexity in sublinear time
- Sample efficient estimation and recovery in sparse FFT via isolation on average
- Group testing problems in experimental molecular biology
- Fast moment estimation in data streams in optimal space
- On the exact space complexity of sketching and streaming small norms
- Nonrandom binary superimposed codes
- Dimension-independent sparse Fourier transform
- SAFFRON: A fast, efficient, and robust framework for group testing based on sparse-graph codes
- Deterministic heavy hitters with sublinear query time
- Sublinear-time algorithms for compressive phase retrieval
- Heavy hitters via cluster-preserving clustering
- On low-risk heavy hitters and sparse recovery schemes
- Probabilistic nonadaptive group testing in the presence of errors and DNA library screening
- Deterministic history-independent strategies for storing information on write-once memories
- Almost optimal phaseless compressed sensing with sublinear decoding time
- On fast decoding of high-dimensional signals from one-bit measurements
- A survey on combinatorial group testing algorithms with applications to DNA library screening
- On accelerated testing for COVID-19 using group testing
- Sparsity lower bounds for dimensionality reducing maps
- On deterministic sketching and streaming for sparse recovery and norm estimation
- Efficiently decodable error-correcting list disjunct matrices and applications
- Worst-case optimal join algorithms
- Stronger ℓ2/ℓ2 compressed sensing; without iterating
- (Nearly) sample-optimal sparse Fourier transform in any dimension; RIPless and filterless
- Ricketts provides update on coronavirus testing
- Explicit non-adaptive combinatorial group testing schemes
- Sublinear time, measurement-optimal, sparse recovery for all
- On the upper bound of the size of the r-cover-free families
- Coding for interactive communication
- Group-testing to eliminate efficiently all defectives in a binomial sample
- The on-line encyclopedia of integer sequences
- Linear-time encodable and decodable error-correcting codes
- Group testing with DNA chips: Generating designs and decoding experiments
- Unconstraining graph-constrained group testing
- Pooling method for accelerated testing of COVID-19
- The unified theory of pseudorandomness
- Group testing using left-and-right-regular sparse-graph codes
- Modern computer algebra
- On error-tolerant DNA screening
- Molecular biology and pooling design
- Born-again group testing: multiaccess communications
- New algorithms for heavy hitters in data streams
- Evaluation of COVID-19 RT-qPCR test in multi-sample pools (medRxiv)
- Noisy pooled PCR for virus testing