Lower Bounds for Semi-adaptive Data Structures via Corruption

Pavel Dvořák, Bruno Loff
2020-05-05

Abstract. In a dynamic data structure problem we wish to maintain an encoding of some data in memory, in such a way that we may efficiently carry out a sequence of queries and updates to the data. A long-standing open problem in this area is to prove an unconditional polynomial lower bound on the trade-off between the update time and the query time of an adaptive dynamic data structure computing some explicit function. Ko and Weinstein provided such a lower bound for a restricted class of semi-adaptive data structures computing the Disjointness function. There, the data are subsets $x_1,\dots,x_k$ and $y$ of $\{1,\dots,n\}$, the updates can modify $y$ (by inserting and removing elements), and a query is an index $i \in \{1,\dots,k\}$ (query $i$ should answer whether $x_i$ and $y$ are disjoint, i.e., it should compute the Disjointness function applied to $(x_i, y)$). The semi-adaptiveness places a restriction on how the data structure may be accessed in order to answer a query. We generalize the lower bound of Ko and Weinstein to work not just for Disjointness, but for any function having high complexity under the smooth corruption bound.

1 Introduction

The best known unconditional lower bounds in this model are polylogarithmic: certain dynamic problems require $t = \Omega((\log n / \log\log n)^2)$ [14], and certain problems with Boolean queries require $t = \Omega((\log n / \log\log n)^{3/2})$ [15]. The major unsolved question in this area is to prove a polynomial lower bound on $t$. For example, consider the dynamic reachability problem, where we wish to maintain a directed $n$-vertex graph in memory, under edge insertions and deletions, while being able to answer reachability queries ("is vertex $i$ connected to vertex $j$?"). Is it true that any scheme for the dynamic reachability problem requires $t = \Omega(n^\delta)$ for some constant $\delta > 0$? Indeed, such a lower bound is known under various complexity-theoretic assumptions^1; the question is whether such a lower bound may be proven unconditionally.

In an influential paper [19], Mihai Pătraşcu proposed an approach to this unsolved question. He defines a data structure problem, called the multiphase problem. Let us represent partial functions $f : \{0,1\}^n \times \{0,1\}^n \to \{0,1\}$ as total functions $f : \{0,1\}^n \times \{0,1\}^n \to \{0,1,*\}$, where $f(x,y) = *$ whenever $f(x,y)$ is not defined. Then, associated with a partial Boolean function $f : \{0,1\}^n \times \{0,1\}^n \to \{0,1,*\}$ and a natural number $k \geq 1$, we may define the corresponding multiphase problem of $f$ as the following dynamic process:

Phase I (Initialization). We are given $k$ inputs $x_1, \dots, x_k \in \{0,1\}^n$, and are allowed to preprocess this input in time $nk \cdot t_p$.

Phase II (Update). We are then given another input $y \in \{0,1\}^n$, and we have time $n \cdot t_u$ to read and update the memory locations of the data structure constructed in Phase I.

Phase III (Query). Finally, we are given a query $i \in [k]$, and we have time $t_q$ to answer whether $f(x_i, y) = 1$. If $f(x_i, y)$ is not defined, the answer may be arbitrary.

Typically we will have $k = \mathrm{poly}(n)$.
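To make the three phases concrete, here is a minimal Python sketch of the multiphase problem for $f = \mathrm{DISJ}$, instantiated with a trivial scheme that just stores its inputs verbatim. The class and method names are our own illustration; the paper only specifies the abstract phases and their time budgets ($nk \cdot t_p$, $n \cdot t_u$, and $t_q$).

```python
# A minimal sketch of the three phases, for f = DISJ, instantiated with a
# trivial scheme that stores its inputs verbatim. Names are illustrative.

def disj(x, y):
    """DISJ(x, y) = 1 iff x and y, as 0/1 indicator vectors, are disjoint."""
    return int(all(xi * yi == 0 for xi, yi in zip(x, y)))

class MultiphaseDISJ:
    def __init__(self, xs):        # Phase I: preprocess x_1, ..., x_k
        self.memory = list(xs)     # trivial encoding: store inputs verbatim
        self.y = None
    def update(self, y):           # Phase II: receive y, update the memory
        self.y = y
    def query(self, i):            # Phase III: answer whether DISJ(x_i, y) = 1
        return disj(self.memory[i], self.y)

scheme = MultiphaseDISJ([(1, 0, 1), (0, 1, 0)])
scheme.update((0, 1, 0))
print(scheme.query(0), scheme.query(1))   # prints: 1 0
```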
Let us be more precise, and consider randomized solutions to the above problem. A scheme for the multiphase problem of $f$, with word size $w$, space $s$, update time $t_u$ and query time $t_q$, consists of the following:

- An encoding $E : (\{0,1\}^n)^k \to (\{0,1\}^w)^s$, which maps the inputs $x = (x_1, \dots, x_k)$ given in Phase I to a memory of $s$ cells of $w$ bits each.
- For each $y \in \{0,1\}^n$, a decision tree $U_y : (\{0,1\}^w)^s \to (\{0,1\}^w)^u$ of depth $\leq n \cdot t_u$, which reads $E(x)$ and produces a sequence $U_y(E(x))$ of $u$ updated cells.^2
- For each $i \in [k]$, a decision tree $Q_i : (\{0,1\}^w)^s \times (\{0,1\}^w)^u \to \{0,1\}$ of depth $\leq t_q$.^3

For all $x \in (\{0,1\}^n)^k$, $y \in \{0,1\}^n$, and $i \in [k]$ with $f(x_i, y) \neq *$, it must hold that $Q_i(E(x), U_y(E(x))) = f(x_i, y)$. In a randomized scheme for the multiphase problem of $f$, each $U_y$ and $Q_i$ is a distribution over decision trees, and it must hold that for all $x \in (\{0,1\}^n)^k$, $y \in \{0,1\}^n$, and $i \in [k]$ with $f(x_i, y) \neq *$,
$\Pr[Q_i(E(x), U_y(E(x))) = f(x_i, y)] \geq 1 - \varepsilon.$

^1 See [17, 1]. Strictly speaking, these conditional lower bounds only work if the preprocessing time, which is the time taken to encode the data into memory, is also bounded. But we will ignore this distinction.

^2 In the usual way of defining the update phase, we have a read/write decision tree $U_y$ which changes the very same cells that it reads. But when $w = \Omega(\log s)$, this can be seen to be equivalent, up to constant factors, to the definition we present here, where we have a decision tree $U_y$ that writes the updates to a separate location. In order to simulate a scheme that uses a read/write decision tree, we may use a hash table with $O(1)$ worst-case lookup time, such as cuckoo hashing. Then we have a read-only decision tree $U'_y(E(x))$ whose output is the hash table containing all the $i \in [s]$ which were updated by $U_y(E(x))$, associated with their final value in the execution of $U_y(E(x))$.

^3 All our results hold even if $Q_i$ is allowed to depend arbitrarily on $x_i$. This makes for a less natural model, however, so we omit this from the definitions.

The value $\varepsilon$ is called the error probability of the scheme. Pătraşcu [19] considered this problem where $f = \mathrm{DISJ}$ is the Disjointness function: $\mathrm{DISJ}(x, y) = 1$ if $x_j \cdot y_j = 0$ for every $j \in [n]$, and $\mathrm{DISJ}(x, y) = 0$ otherwise. He conjectured that any scheme for the multiphase problem of DISJ must have $\max\{t_p, t_u, t_q\} \geq n^\delta$ for some constant $\delta > 0$. Pătraşcu showed that such lower bounds on the multiphase problem for DISJ would imply polynomial lower bounds for various dynamic data structure problems; for example, they would imply that dynamic reachability requires $t = \Omega(n^\delta)$. He also showed that these lower bounds hold true under the assumption that 3SUM has no sub-quadratic algorithm. Finally, Pătraşcu defined a 3-player Number-On-Forehead (NOF) communication game, such that lower bounds on this game imply matching lower bounds for the multiphase problem. The game associated with a function $f : \{0,1\}^n \times \{0,1\}^n \to \{0,1\}$ is as follows:

1. Alice is given $x_1, \dots, x_k \in \{0,1\}^n$ and $i \in [k]$, Bob gets $y \in \{0,1\}^n$ and $i \in [k]$, and Charlie gets $x_1, \dots, x_k$ and $y$.
2. Charlie sends a private message of $\ell_1$ bits to Bob, and then he is silent.
3. Alice and Bob communicate $\ell_2$ bits, and want to compute $f(x_i, y)$.

Pătraşcu [19] conjectured that if $\ell_1$ is $o(k)$, then $\ell_2$ has to be bigger than the communication complexity of $f$. However, this conjecture turned out to be false. The randomized communication complexity of DISJ is $\Omega(n)$ [20, 11, 3], but Chattopadhyay et al. [7] construct a protocol for $f = \mathrm{DISJ}$ where both $\ell_1, \ell_2 = O(\sqrt{n} \cdot \log n)$. So the above communication model is more powerful than it appears at first glance.^4

However, a recent paper by Ko and Weinstein [12] succeeds in proving lower bounds for a simpler version of the multiphase problem, which translate to lower bounds for a restricted class of dynamic data structure schemes. They manage to prove a lower bound of $\Omega(\sqrt{n})$ for the simpler version of the multiphase problem which is associated with the Disjointness function $f = \mathrm{DISJ}$.
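As a sanity check of the model, here is a toy instantiation of the triple $(E, U_y, Q_i)$ for DISJ: $E$ stores each $x_j$ in $n/w$ cells, the update writes $y$ into $n/w$ fresh cells, and the query probes the cells of $x_i$ and of $y$, giving $t_q = O(n/w)$. All names and the bit-string cell representation are our own; the point is only that $O(n/w)$ is the naive benchmark against which the lower bounds below are measured.

```python
# A toy instantiation of (E, U_y, Q_i) with w-bit cells; names are ours.

w = 8  # word size in bits

def pack(bits):
    """Split a 0/1 string into w-bit cells."""
    return [bits[j:j + w] for j in range(0, len(bits), w)]

def E(xs):                  # encoding: s = k * (n/w) cells
    return [pack(x) for x in xs]

def U(y, memory):           # update: u = n/w output cells (ignores memory here)
    return pack(y)

def Q(i, enc, upd):         # query: probes 2n/w cells in total
    return int(all(a == '0' or b == '0'
                   for xc, yc in zip(enc[i], upd)
                   for a, b in zip(xc, yc)))

xs = ['110' + '0' * 13, '001' + '0' * 13]   # k = 2 inputs, n = 16
y = '001' + '0' * 13
print(Q(0, E(xs), U(y, E(xs))), Q(1, E(xs), U(y, E(xs))))   # prints: 1 0
```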
The main contribution of our paper is to generalize their lower bound to any function $f$ which has large complexity according to the smooth corruption bound under a product distribution. Disjointness is such a function [2], but so are Inner Product, Gap Orthogonality, and Gap Hamming Distance [21]. Our proof method is significantly different: Ko and Weinstein use information complexity to derive their lower bound (similarly to [3, 4]), whereas we construct a large nearly-monochromatic rectangle. Our proof is reminiscent of [6], but proceeds via a more direct bucketing argument. We furthermore show that this lower bound holds not only for deterministic schemes, but also for randomized ones.

A scheme $D$ for the multiphase problem of $f$ is called semi-adaptive if any path in the decision tree $Q_i : (\{0,1\}^w)^s \times (\{0,1\}^w)^u \to \{0,1\}$ first queries the first part of the input (the $E(x)$ part), and then queries the second part of the input (the $U_y(E(x))$ part). If $D$ is randomized, then this property must hold for every randomized choice of $Q_i$. We point out that the reading of the cells within each part is completely adaptive; the restriction is only that the data structure cannot read cells of $E(x)$ once it has started to read cells of $U_y(E(x))$.

Ko and Weinstein state their result for deterministic data structures, i.e., $\varepsilon = 0$, so that the data structure always returns the correct answer.

Theorem 3 (Ko and Weinstein [12]). Let $k \geq \omega(n)$. Any semi-adaptive deterministic data structure that solves the multiphase problem of the DISJ function must have either $t_u \cdot n \geq \Omega(k/w)$ or $t_q \geq \Omega(\sqrt{n}/w)$.

To prove the lower bound, they reduce the semi-adaptive data structure to a random process with low correlation.

Theorem 4 (Ko and Weinstein [12]). Let $x_1, \dots, x_k$ be random variables over $\{0,1\}^n$, each of them independently distributed according to the same distribution $\mu_1$, and let $y$ be a random variable over $\{0,1\}^n$ distributed according to $\mu_2$ (independently of $x_1, \dots, x_k$). Let $D$ be a randomized semi-adaptive scheme for the multiphase problem of a partial function $f$, with error probability $\tilde{\varepsilon}$. Then for any parameter $p \leq k$ there exist an index $i \in [k]$ and a random variable $z \in \{0,1\}^m$ such that:
1. $I(x_i : y \mid z) \leq O(t_u \cdot n \cdot w / p)$;
2. $I(x_i : y, z) \leq t_q \cdot w + o(t_q \cdot w)$ and $I(y : z) \leq t_q \cdot w + o(t_q \cdot w)$;
3. $\Pr[f(x_i, y) \neq z_m \text{ and } f(x_i, y) \neq *] \leq \tilde{\varepsilon}$, where $z_m$ denotes the last bit of $z$.

Ko and Weinstein [12] proved Theorem 4 for deterministic schemes, for the DISJ function, and in the case where $\mu_1 = \mu_2$. However, their proof actually works for any (partial) function $f$ and for any two, possibly distinct, distributions $\mu_1$ and $\mu_2$. Moreover, their proof also works for randomized schemes. The statement given above is the resulting statement for randomized schemes and an arbitrary function $f$. To complete the proof of their lower bound, Ko and Weinstein proved that if we set $p$ (and $k$) large enough so that $I(x_i : y \mid z) \leq o(1)$, then such a random variable $z$ cannot exist when $f$ is the DISJ function. It is this second step which we generalize.

Let $f : X \times Y \to \{0,1\}$ be a function and $\mu$ a distribution over $X \times Y$. We will prove that the existence of a random variable $z$ given by Theorem 4 implies that, for any $b \in \{0,1\}$, any balanced product distribution $\mu$, and any function $g$ which is "close" to $f$, there is a $\rho$-almost $b$-monochromatic rectangle for $g$ which is large (according to $\mu$) in terms of $t_q$. This technique is known as the smooth corruption bound [5, 6] or smooth rectangle bound [10]. We denote the smooth corruption bound of $f$ by $\mathrm{scb}^{\rho,\lambda}_{\mu}(f)$; if $\mathrm{scb}^{\rho,\lambda}_{\mu}(f) \geq s$, then for some $b$ and some $g$ close to $f$, every $\rho$-almost $b$-monochromatic rectangle for $g$ has size (under $\mu$) at most $2^{-s}$. We will define the smooth corruption bound formally in the next section. Thus, using Theorem 4 as a black box, we generalize Theorem 3 to any function of large smooth corruption bound.
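The semi-adaptiveness restriction defined above is easy to state operationally: along any root-to-leaf path of $Q_i$, every probe into $E(x)$ must come before every probe into $U_y(E(x))$. A minimal Python checker for probe logs (the log format with 'E'/'U' tags is our own invention) might look as follows.

```python
# Checking semi-adaptiveness of a single query path, stated operationally:
# all probes into E(x) must precede all probes into U_y(E(x)).

def is_semi_adaptive(probe_log):
    """probe_log: sequence of ('E', cell) / ('U', cell) probes on one path."""
    started_update_part = False
    for part, _cell in probe_log:
        if part == 'U':
            started_update_part = True
        elif started_update_part:   # an E(x) probe after a U-probe: forbidden
            return False
    return True

print(is_semi_adaptive([('E', 3), ('E', 7), ('U', 0), ('U', 2)]))  # True
print(is_semi_adaptive([('E', 3), ('U', 0), ('E', 7)]))            # False
```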
Theorem 5. Let $f : \{0,1\}^n \times \{0,1\}^n \to \{0,1,*\}$ be a partial function, and let $\mu$ be a product distribution over $\{0,1\}^n \times \{0,1\}^n$ which is $\tilde{\alpha}$-balanced according to $f$. Let $\lambda \geq 0$, set $\alpha = \tilde{\alpha} - \lambda$ and $\varepsilon = \tilde{\varepsilon} + \lambda$, and suppose that $2\varepsilon < \alpha$. Then any semi-adaptive randomized scheme for the multiphase problem of $f$, with error probability bounded by $\tilde{\varepsilon}$, must have either $t_u \cdot n \geq \Omega(k/w)$, or $t_q \cdot w \geq \Omega(\alpha \cdot \mathrm{scb}^{\rho,\lambda}_{\mu}(f))$ with $\rho = O(\varepsilon/\alpha)$.

We point out that the $\Omega$ and $O$ in the bound given above hide absolute constants independent of $\alpha$, $\varepsilon$ and $\lambda$. As a consequence of our main result, and of previously known bounds on corruption, we are able to show new lower bounds of $t_q = \Omega(n/w)$ against semi-adaptive schemes for the multiphase problem of the Inner Product, Gap Orthogonality and Gap Hamming Distance functions (where the gap is $\sqrt{n}$). These lower bounds hold assuming that $t_u = o(k/(wn))$. They follow from the small discrepancy of the Inner Product, and from a bound shown by Sherstov on the corruption of Gap Orthogonality, followed by a reduction to Gap Hamming Distance [21]. Our result also gives an alternative proof of the same lower bound proven by Ko and Weinstein [12] for the Disjointness function, namely $t_q = \Omega(\sqrt{n}/w)$. This follows from the bound on the corruption of Disjointness under a product distribution, shown by Babai et al. [2].

The paper is organized as follows. In Section 2 we give important notation, and the basic definitions from information theory and communication complexity. The proof of Theorem 5 appears in Section 3. The various applications appear in Section 4.

2 Preliminaries

We use a notational scheme where sets are denoted by uppercase letters, such as $X$ and $Y$; elements of the sets are denoted by the same lowercase letters, such as $x \in X$ and $y \in Y$; and random variables are denoted by the same lowercase boldface letters, such as $x$ and $y$. We use lowercase Greek letters, such as $\mu$, to denote distributions. If $\mu$ is a distribution over a product set, such as $X \times Y \times Z$, and $(x, y, z) \in X \times Y \times Z$, then $\mu(x, y, z)$ is the probability of seeing $(x, y, z)$ under $\mu$. We will sometimes denote $\mu$ by $\mu(\mathrm{x}, \mathrm{y}, \mathrm{z})$, using non-italicized lowercase letters corresponding to $X \times Y \times Z$. This allows us to use the notation $\mu(\mathrm{x})$ and $\mu(\mathrm{y})$ to denote the x- and y-marginals of $\mu$, for example; if we then use the same notation with italicized lowercase letters, we get the marginal probabilities, i.e., for each $x \in X$ and $y \in Y$,
$\mu(x) = \sum_{y, z} \mu(x, y, z)$ and $\mu(y) = \sum_{x, z} \mu(x, y, z)$.

If $y \in Y$, then we also use the notation $\mu(\mathrm{x} \mid y)$ to denote the x-marginal of $\mu$ conditioned on seeing the specific value $y$. Then for each $x \in X$ and $y \in Y$, we have
$\mu(x \mid y) = \mu(x, y) / \mu(y)$.

We also write $(x, y, z) \sim \mu$ to mean that $(x, y, z)$ are random variables chosen according to the distribution $\mu(\mathrm{x}, \mathrm{y}, \mathrm{z})$, i.e., for all $(x, y, z) \in X \times Y \times Z$, $\Pr[x = x, y = y, z = z] = \mu(x, y, z)$. Naturally, if $A \subseteq X \times Y \times Z$, then $\mu(A) = \sum_{(x,y,z) \in A} \mu(x, y, z)$. We let $\mathrm{supp}(\mu)$ denote the support of $\mu$, i.e., the set of $(x, y, z)$ with $\mu(x, y, z) > 0$.

We now formally define the smooth corruption bound and related measures from communication complexity, and refer to the book by Kushilevitz and Nisan [13] for more details. At the end of this section we provide the necessary notions of information theory which are used in the paper; for more details on these we refer to the book by Cover and Thomas [8].

Let $f : X \times Y \to \{0,1,*\}$ be a partial function and $\mu(\mathrm{x}, \mathrm{y})$ be a distribution over $X \times Y$. We say that $f$ is $\lambda$-close to a function $g : X \times Y \to \{0,1,*\}$ under $\mu$ if $\Pr_{(x,y) \sim \mu}[f(x,y) \neq g(x,y)] \leq \lambda$. We say that $\mu$ is $\alpha$-balanced according to $f$ if $\mu(f^{-1}(0)) \geq \alpha$ and $\mu(f^{-1}(1)) \geq \alpha$. A rectangle is a set $R = A \times B$ with $A \subseteq X$ and $B \subseteq Y$. For $b \in \{0,1\}$ and $\rho \geq 0$, a rectangle $R$ is $\rho$-almost $b$-monochromatic for $f$ under $\mu$ if
$\mu(\{(x, y) \in R : f(x, y) \neq b, f(x, y) \neq *\}) \leq \rho \cdot \mu(R)$.
Let $M^{\rho,b}_{\mu}(f)$ be the set of $\rho$-almost $b$-monochromatic rectangles for $f$ under $\mu$.
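To make the last definition concrete, here is a small Python check of whether a given rectangle is $\rho$-almost $b$-monochromatic under a given distribution. The toy function and the uniform product distribution in the example are made up for illustration.

```python
from itertools import product

# A direct check of the definition: is R = A x B rho-almost b-monochromatic
# for f under mu? Partial functions return '*' on undefined inputs.

def is_rho_almost_b_monochromatic(A, B, f, mu, rho, b):
    mass_R = sum(mu.get((x, y), 0.0) for x, y in product(A, B))
    bad = sum(mu.get((x, y), 0.0) for x, y in product(A, B)
              if f(x, y) not in (b, '*'))    # mass of wrong answers inside R
    return bad <= rho * mass_R

X = Y = [0, 1, 2]
mu = {(x, y): 1 / 9 for x, y in product(X, Y)}   # uniform product distribution
f = lambda x, y: int(x == y)                     # a toy total function

print(is_rho_almost_b_monochromatic([0, 1], [2], f, mu, 0.0, 0))   # True
print(is_rho_almost_b_monochromatic([0, 1], [0], f, mu, 0.4, 0))   # False
```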
The complexity measure mono quantifies how large almost $b$-monochromatic rectangles can be [5]:
$\mathrm{mono}^{\rho,b}_{\mu}(f) = \max\{\mu(R) : R \in M^{\rho,b}_{\mu}(f)\}$, and $\mathrm{mono}^{\rho}_{\mu}(f) = \max_{b \in \{0,1\}} \mathrm{mono}^{\rho,b}_{\mu}(f)$.

Using mono we can define the corruption bound of a function as
$\mathrm{cb}^{\rho}_{\mu}(f) = \log \frac{1}{\mathrm{mono}^{\rho}_{\mu}(f)}$,
and the smooth corruption bound as
$\mathrm{scb}^{\rho,\lambda}_{\mu}(f) = \max_{b \in \{0,1\}} \max_{g\ \lambda\text{-close to}\ f} \log \frac{1}{\mathrm{mono}^{\rho,b}_{\mu}(g)}$.
Thus, if $\mathrm{scb}^{\rho,\lambda}_{\mu}(f) \geq s$ then there is a $b \in \{0,1\}$ and a function $g$ which is $\lambda$-close to $f$ under $\mu$ such that for any $\rho$-almost $b$-monochromatic rectangle $R$ for $g$ under $\mu$ it holds that $\mu(R) \leq 2^{-s}$.

The notion $\mathrm{mono}^{\rho}_{\mu}$ is related to the discrepancy of a function,
$\mathrm{disc}_{\mu}(f) = \max_{R} |\mu(R \cap f^{-1}(0)) - \mu(R \cap f^{-1}(1))|$,
where the maximum is taken over all rectangles $R$. It is easy to see that for a total function $f$ it holds that $\mathrm{disc}_{\mu}(f) \geq (1 - 2\rho) \cdot \mathrm{mono}^{\rho}_{\mu}(f)$ for any $\rho$. Thus, Theorem 5 will also give us lower bounds for functions of small discrepancy.

We now define several measures from information theory. If $\mu'(\mathrm{z})$ and $\mu(\mathrm{z})$ are two distributions such that $\mathrm{supp}(\mu') \subseteq \mathrm{supp}(\mu)$, then the Kullback-Leibler divergence of $\mu'$ from $\mu$ is
$D_{\mathrm{KL}}(\mu' \| \mu) = \sum_{z \in \mathrm{supp}(\mu')} \mu'(z) \log \frac{\mu'(z)}{\mu(z)}$.

With Kullback-Leibler divergence we can define mutual information, which measures how close (according to KL divergence) a joint distribution is to the product of its marginals. If we have two random variables $(x, y) \sim \mu(\mathrm{x}, \mathrm{y})$, then we define their mutual information to be
$I(x : y) = D_{\mathrm{KL}}(\mu(\mathrm{x}, \mathrm{y}) \| \mu(\mathrm{x}) \times \mu(\mathrm{y}))$.
If we have three random variables $(x, y, z) \sim \mu(\mathrm{x}, \mathrm{y}, \mathrm{z})$, then the mutual information of $x$ and $y$ conditioned on $z$ is
$I(x : y \mid z) = \mathbb{E}_{z \sim \mu(\mathrm{z})}[I(x : y \mid z = z)]$.

We present several facts about mutual information; the proofs can be found in the book of Cover and Thomas [8].

Fact 6 (Chain Rule). For any random variables $x_1, x_2, y$ and $z$ it holds that
$I(x_1, x_2 : y \mid z) = I(x_1 : y \mid z) + I(x_2 : y \mid x_1, z)$.

Since mutual information is never negative, we have the following corollary.

Corollary 7. For any random variables $x, y$ and $z$ it holds that $I(x : y) \leq I(x : y, z)$.

The $\ell_1$-distance between two distributions is defined as
$\|\mu - \mu'\|_1 = \sum_{z} |\mu(z) - \mu'(z)|$.
There is a relation between $\ell_1$-distance and Kullback-Leibler divergence.

Fact 8 (Pinsker's inequality). $\|\mu - \mu'\|_1 \leq \sqrt{2 \cdot D_{\mathrm{KL}}(\mu' \| \mu)}$.
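The quantities just defined are easy to compute for small distributions. The following Python snippet implements $D_{\mathrm{KL}}$, mutual information, and a numerical check of Pinsker's inequality on a made-up joint distribution (all logarithms base 2).

```python
from math import log2, sqrt

# Numerical versions of the definitions above: KL divergence, mutual
# information, and a check of Pinsker's inequality on a toy distribution.

def kl(p, q):
    """D_KL(p || q); assumes supp(p) is contained in supp(q)."""
    return sum(p[z] * log2(p[z] / q[z]) for z in p if p[z] > 0)

def mutual_information(mu):
    """I(x : y) = D_KL(mu(x, y) || mu(x) x mu(y))."""
    px, py = {}, {}
    for (x, y), p in mu.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    prod = {(x, y): px[x] * py[y] for (x, y) in mu}
    return kl(mu, prod)

mu = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
info = mutual_information(mu)
l1 = sum(abs(p - 0.25) for p in mu.values())  # product of marginals is uniform
print(info, l1, l1 <= sqrt(2 * info))         # Pinsker holds: True
```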
3 The Proof of Theorem 5

Let $f : \{0,1\}^n \times \{0,1\}^n \to \{0,1,*\}$ be a partial function. Suppose there is a semi-adaptive randomized scheme $D$ for the multiphase problem of $f$ with error probability bounded by $\tilde{\varepsilon}$, such that $t_u \cdot n \leq o(k/w)$. Let $\mu(\mathrm{x}, \mathrm{y}) = \mu_1(\mathrm{x}) \times \mu_2(\mathrm{y})$ be a product distribution over $\{0,1\}^n \times \{0,1\}^n$ which is $\tilde{\alpha}$-balanced according to $f$. Let $b \in \{0,1\}$ and let $g : \{0,1\}^n \times \{0,1\}^n \to \{0,1,*\}$ be a partial function which is $\lambda$-close to $f$ under $\mu$. We will prove that there is a large almost $b$-monochromatic rectangle for $g$.

Let $x_1, \dots, x_k$ be independent random variables, each distributed according to $\mu_1$, and let $y$ be an independent random variable distributed according to $\mu_2$. Let the random variable $z \in \{0,1\}^m$ and the index $i \in [k]$ be given by Theorem 4 applied to the random variables $x_1, \dots, x_k, y$ and the function $f$. For simplicity we denote $x = x_i$. We will denote the joint distribution of $(x_1, \dots, x_k, y, z)$ by $\mu(\mathrm{x}_1, \dots, \mathrm{x}_k, \mathrm{y}, \mathrm{z})$. Note that here the notation is consistent, in the sense that $\mu(\mathrm{x}_i, \mathrm{y}) = \mu_1(\mathrm{x}_i) \times \mu_2(\mathrm{y})$ for all $i \in [k]$. We will then need to keep in mind that $\mu(\mathrm{z})$ is the z-marginal of the joint distribution of $(x_1, \dots, x_k, y, z)$.

By $f(x, y) \neq_* z_m$ we denote the event that the random variable $z_m$ gives us the wrong answer on an input from the support of $f$, i.e., that $f(x, y) \neq *$ and $f(x, y) \neq z_m$ hold simultaneously. By Theorem 4 we know that $\Pr[f(x, y) \neq_* z_m] \leq \tilde{\varepsilon}$. Since $f$ and $g$ are $\lambda$-close under $\mu$, the distribution $\mu$ is still balanced according to $g$, and $g(x, y) \neq_* z_m$ holds with small probability, as stated in the next observation.

Observation 9. Let $\alpha = \tilde{\alpha} - \lambda$ and $\varepsilon = \tilde{\varepsilon} + \lambda$. For the function $g$ it holds that:
1. The distribution $\mu(\mathrm{x}, \mathrm{y})$ is $\alpha$-balanced according to $g$.
2. $\Pr[g(x, y) \neq_* z_m] \leq \varepsilon$.

Proof. Let $b' \in \{0,1\}$. We will bound $\mu(g^{-1}(b'))$:
$\tilde{\alpha} \leq \Pr[f(x, y) = b'] \leq \Pr[g(x, y) = b'] + \Pr[f(x, y) \neq g(x, y)] \leq \mu(g^{-1}(b')) + \lambda$.
Thus, by rearranging we get $\mu(g^{-1}(b')) \geq \tilde{\alpha} - \lambda = \alpha$. The proof of the second bound is similar:
$\Pr[g(x, y) \neq_* z_m] \leq \Pr[f(x, y) \neq_* z_m] + \Pr[f(x, y) \neq g(x, y)] \leq \tilde{\varepsilon} + \lambda = \varepsilon$.

Let $c$ be the bound on $I(x : y, z)$ and $I(y : z)$ given by Theorem 4. Since $I(x : z) \leq I(x : y, z)$, we have $I(x : z), I(y : z) \leq t_q \cdot w + o(t_q \cdot w) = c$. We will prove that if we assume that $t_u \cdot n \leq o(k/w)$ and we choose $p$ large enough (the $p$ of Theorem 4), then we can find a rectangle $R$ which is $24\varepsilon'$-almost $b$-monochromatic for $g$ under $\mu$, where $\varepsilon' = O(\varepsilon/\alpha)$, and whose measure is $\mu(R) \geq 2^{-O(c/(\alpha - \varepsilon))}$. By rearranging, we get the bound from Theorem 5.

Let us sketch how we find such a rectangle $R$. We will first fix the random variable $z$ to a value $z$ such that $x$ and $y$ are not very correlated conditioned on $z = z$, i.e., the joint distribution $\mu(\mathrm{x}, \mathrm{y} \mid z)$ is very similar to the product distribution of the marginals $\mu(\mathrm{x} \mid z) \times \mu(\mathrm{y} \mid z)$. Moreover, we will pick $z$ in such a way that the probability of error $\Pr[g(x, y) \neq_* z_m \mid z = z]$ is still small. Then, since $\mu(\mathrm{x}, \mathrm{y} \mid z)$ is close to $\mu(\mathrm{x} \mid z) \times \mu(\mathrm{y} \mid z)$, the probability of error under the latter distribution will be small as well, i.e., if $(x', y') \sim \mu(\mathrm{x} \mid z) \times \mu(\mathrm{y} \mid z)$, then $\Pr[g(x', y') \neq_* z_m]$ will also be small. Finally, we will find subsets $A \subseteq \mathrm{supp}(\mu(\mathrm{x} \mid z))$ and $B \subseteq \mathrm{supp}(\mu(\mathrm{y} \mid z))$ of large mass (under the original distributions $\mu_1$ and $\mu_2$), while keeping the probability of error on the rectangle $R = A \times B$ sufficiently small.

Let us then proceed to implement this plan. Let $\beta = \alpha - \varepsilon$; we will show that $\beta$ is a lower bound for the probability that $z_m$ is equal to $b$. Let $\gamma$ be the bound on $I(x : y \mid z)$ given by Theorem 4, i.e., $I(x_i : y \mid z) \leq \gamma = O(\frac{t_u \cdot n \cdot w}{p})$, and let $Z = \{z \in \{0,1\}^m : z_m = b\}$.

Lemma 10. There exists $z \in Z$ such that:
1. $I(x : y \mid z = z) \leq \frac{5}{\beta} \cdot \gamma$;
2. $D_{\mathrm{KL}}(\mu(\mathrm{x} \mid z) \| \mu(\mathrm{x})) \leq \frac{5}{\beta} \cdot c$ and $D_{\mathrm{KL}}(\mu(\mathrm{y} \mid z) \| \mu(\mathrm{y})) \leq \frac{5}{\beta} \cdot c$;
3. $\Pr[g(x, y) \neq_* z_m \mid z = z] \leq \frac{5}{\beta} \cdot \varepsilon$.

Proof. First, $\Pr[z_m = b] \geq \Pr[g(x, y) = b] - \Pr[g(x, y) \neq_* z_m] \geq \alpha - \varepsilon$; thus, by rearranging we get $\Pr[z_m = b] \geq \alpha - \varepsilon = \beta$. By expanding the information $I(x : y \mid z)$ we find
$\gamma \geq I(x : y \mid z) = \mathbb{E}_{z \sim \mu(\mathrm{z})}[I(x : y \mid z = z)]$,
and so by the Markov inequality, $\Pr_{z \sim \mu(\mathrm{z})}[I(x : y \mid z = z) > \frac{5}{\beta}\gamma] < \frac{\beta}{5}$. Similarly, for the information $I(x : z)$:
$c \geq I(x : z) = \mathbb{E}_{z \sim \mu(\mathrm{z})}[D_{\mathrm{KL}}(\mu(\mathrm{x} \mid z) \| \mu(\mathrm{x}))]$,
and so $\Pr_{z \sim \mu(\mathrm{z})}[D_{\mathrm{KL}}(\mu(\mathrm{x} \mid z) \| \mu(\mathrm{x})) > \frac{5}{\beta} c] < \frac{\beta}{5}$. The bound for $I(y : z)$ is analogous. Let $e_z = \Pr_\mu[g(x, y) \neq_* z_m \mid z = z]$. Then
$\varepsilon \geq \Pr[g(x, y) \neq_* z_m] = \mathbb{E}_{z \sim \mu(\mathrm{z})}[e_z]$,
and so $\Pr_{z \sim \mu(\mathrm{z})}[e_z > \frac{5}{\beta}\varepsilon] < \frac{\beta}{5}$. Thus, the probability that $z$ violates any of the conditions of the lemma is less than $\frac{4\beta}{5} < \Pr[z_m = b]$, and by a union bound we may infer the existence of the sought $z \in Z$.

Let us now fix the $z \in Z$ given by the previous lemma. Let $\mu_z(\mathrm{x}, \mathrm{y}) = \mu(\mathrm{x}, \mathrm{y} \mid z)$ be the distribution $\mu(\mathrm{x}, \mathrm{y})$ conditioned on $z = z$, and let $\bar{\mu}_z(\mathrm{x}, \mathrm{y}) = \mu(\mathrm{x} \mid z) \times \mu(\mathrm{y} \mid z)$ be the product of its marginals. Let $S$ be the support of $\mu_z(\mathrm{x}, \mathrm{y})$, and let $S_x$ and $S_y$ be the supports of $\mu_z(\mathrm{x})$ and $\mu_z(\mathrm{y})$, respectively, i.e., $S_x$ and $S_y$ are the projections of $S$ onto $X$ and $Y$. Pinsker's inequality will give us that $\mu_z$ and $\bar{\mu}_z$ are very close. Let $\delta = \sqrt{\frac{10}{\beta} \cdot \gamma}$.

Lemma 11. $\|\mu_z - \bar{\mu}_z\|_1 \leq \delta$.

Proof. Indeed, by Pinsker's inequality, $\|\mu_z - \bar{\mu}_z\|_1 \leq \sqrt{2 \cdot D_{\mathrm{KL}}(\mu_z \| \bar{\mu}_z)}$. The quantity under the square root is $2 \cdot D_{\mathrm{KL}}(\mu(\mathrm{x}, \mathrm{y} \mid z) \| \mu(\mathrm{x} \mid z) \times \mu(\mathrm{y} \mid z))$, which by definition of mutual information equals $2 \cdot I(x : y \mid z = z)$, and by Lemma 10 this is at most $\frac{10}{\beta}\gamma = \delta^2$.

For the sake of reasoning, let $(x', y') \sim \bar{\mu}_z(\mathrm{x}, \mathrm{y})$ be random variables chosen according to $\bar{\mu}_z$. Let $\varepsilon' = \frac{5}{\beta} \cdot \varepsilon + \delta$. It then follows from Lemma 10 and Lemma 11 that:

Lemma 12. $\Pr[g(x', y') \neq_* z_m] \leq \varepsilon'$.

Proof. We prove that $\Pr[g(x', y') \neq_* z_m] - \Pr[g(x, y) \neq_* z_m \mid z = z] \leq \delta$. Since $\Pr[g(x, y) \neq_* z_m \mid z = z] \leq \frac{5}{\beta} \cdot \varepsilon$ by Lemma 10, the lemma then follows. Let $B = \{(x, y) \in S_x \times S_y : g(x, y) \neq z_m, g(x, y) \neq *\}$. Thus, we have
$\Pr[g(x', y') \neq_* z_m] - \Pr[g(x, y) \neq_* z_m \mid z = z] = \bar{\mu}_z(B) - \mu_z(B) \leq \|\mu_z - \bar{\mu}_z\|_1 \leq \delta$,
by the triangle inequality and Lemma 11.

Let $c' = \frac{5}{\beta} \cdot c$. We will prove that the ratio between $\mu_z(x')$ and $\mu(x')$ is larger than $2^{O(c')}$ with only small probability (when $x' \sim \mu_z(\mathrm{x})$). The same holds for $\mu_z(y')$ and $\mu(y')$.

Lemma 13. $\Pr_{x' \sim \mu_z(\mathrm{x})}[\mu_z(x')/\mu(x') > 2^{7c'}] \leq \frac{1}{7}$, and the same bound holds for $y' \sim \mu_z(\mathrm{y})$.

Proof. We prove the lemma for $\mu_z(\mathrm{x})$; the proof for $\mu_z(\mathrm{y})$ is analogous. By Lemma 10 we know that $D_{\mathrm{KL}}(\mu(\mathrm{x} \mid z) \| \mu(\mathrm{x})) \leq c'$. We expand the Kullback-Leibler divergence:
$c' \geq D_{\mathrm{KL}}(\mu(\mathrm{x} \mid z) \| \mu(\mathrm{x})) = \mathbb{E}_{x' \sim \mu_z(\mathrm{x})}\left[\log \frac{\mu_z(x')}{\mu(x')}\right]$,
and then use the Markov inequality:
$\Pr_{x' \sim \mu_z(\mathrm{x})}\left[\log \frac{\mu_z(x')}{\mu(x')} > 7c'\right] \leq \frac{c'}{7c'} = \frac{1}{7}$.

We now split $S_x$ and $S_y$ into buckets $C^x_\ell$ and $C^y_\ell$ (for $\ell \geq 1$), where the $\ell$-th buckets are
$C^x_\ell = \{x \in S_x : 2^{\ell-1} \leq \mu_z(x)/\mu(x) < 2^\ell\}$ and $C^y_\ell = \{y \in S_y : 2^{\ell-1} \leq \mu_z(y)/\mu(y) < 2^\ell\}$.
In the bucket $C^x_\ell$ there are elements of $S_x$ whose probability under $\mu_z(\mathrm{x})$ is approximately $2^\ell$-times bigger than their probability under $\mu(\mathrm{x})$. By Lemma 13, it holds that with high probability the elements $x' \in S_x$ and $y' \in S_y$ lie in the buckets $C^x_\ell$ and $C^y_\ell$ for $\ell \leq 7c'$. Thus, if we find a bucket $C^x_{\ell_1}$ with $\ell_1 \leq 7c'$ which has probability at least $\frac{1}{2^{O(c')}}$ under $\mu_z(\mathrm{x})$, then it also has probability at least $\frac{1}{2^{O(c')}}$ under $\mu(\mathrm{x})$. The same holds for the buckets $C^y_\ell$.
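The bucketing step can be pictured concretely: each support element is assigned the dyadic index $\ell$ with $2^{\ell-1} \leq \mu_z(x)/\mu(x) < 2^\ell$. The snippet below computes these indices on made-up distributions; in the proof, Lemma 13 guarantees that under $\mu_z$ only a small fraction of the mass has index above $7c'$.

```python
from math import floor, log2

# Dyadic bucketing of support elements by the ratio mu_z(x) / mu(x).
# The two distributions below are made up for illustration.

def bucket_index(mu_z_x, mu_x):
    """Index ell such that 2**(ell - 1) <= mu_z_x / mu_x < 2**ell."""
    return floor(log2(mu_z_x / mu_x)) + 1

mu   = {'a': 0.50, 'b': 0.25, 'c': 0.25}   # original marginal mu(x)
mu_z = {'a': 0.10, 'b': 0.15, 'c': 0.75}   # marginal conditioned on z = z

buckets = {}
for x in mu:
    buckets.setdefault(bucket_index(mu_z[x], mu[x]), []).append(x)
print(buckets)   # 'c' has ratio 3, so it lands in bucket ell = 2; elements
                 # with ratio below 1 get ell <= 0 (the proof uses ell >= 1)
```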
In the next lemma we will show that there are buckets $C^x_{\ell_1}$ and $C^y_{\ell_2}$ of large probability under $\bar{\mu}_z$ such that the probability of error on $C^x_{\ell_1} \times C^y_{\ell_2}$ is still small. For indices $\ell_1, \ell_2$, let $e(\ell_1, \ell_2) = \Pr[g(x', y') \neq_* z_m \mid x' \in C^x_{\ell_1}, y' \in C^y_{\ell_2}]$.

Lemma 14. There exist buckets $C^x_{\ell_1}$ and $C^y_{\ell_2}$ with $1 < \ell_1, \ell_2 \leq 7c'$ such that:
1. $\Pr[x' \in C^x_{\ell_1}] \geq \frac{1}{6 \cdot 7c'}$;
2. $\Pr[y' \in C^y_{\ell_2}] \geq \frac{1}{6 \cdot 7c'}$;
3. $e(\ell_1, \ell_2) \leq 6\varepsilon'$.

Proof. We prove that $\ell_1, \ell_2$ exist via the probabilistic method. Let $\ell_1$ and $\ell_2$ be the (random) bucket indices of $x'$ and $y'$, respectively; thus $\Pr[\ell_1 = \ell] = \Pr[x' \in C^x_\ell]$ and $\Pr[\ell_2 = \ell] = \Pr[y' \in C^y_\ell]$. Let $B_1, B_2 \subseteq L = \{1, \dots, 7c'\}$ be the sets of indices of small probability, i.e., for $i \in \{1, 2\}$,
$B_i = \{\ell \in L : \Pr[\ell_i = \ell] < \frac{1}{6 \cdot 7c'}\}$.
We will prove that with high probability we have $7c' \geq \ell_1 > 1$ and $\ell_1 \notin B_1$; the argument for $\ell_2$ is analogous. There is only small probability that $\ell_1$ is in $B_1$:
$\Pr[\ell_1 \in B_1] \leq |B_1| \cdot \frac{1}{6 \cdot 7c'} \leq \frac{1}{6}$.
Together with Lemma 13, we have that $\ell_i \in B_i$ or $\ell_i = 1$ or $\ell_i > 7c'$ holds (for $i = 1$ or $i = 2$) with probability at most $\frac{2}{3} + \frac{2}{2^{c'}}$. By Lemma 12, we have that $\Pr[g(x', y') \neq_* z_m] \leq \varepsilon'$. We will prove that there are $\ell_1$ and $\ell_2$ such that $e(\ell_1, \ell_2) \leq 6\varepsilon'$; this is equivalent to the third bound of the lemma. By expanding the probability we have
$\varepsilon' \geq \Pr[g(x', y') \neq_* z_m] = \mathbb{E}[e(\ell_1, \ell_2)]$,
and thus, by the Markov inequality, $\Pr[e(\ell_1, \ell_2) > 6\varepsilon'] \leq \frac{1}{6}$. By a union bound we conclude that there must exist $1 < \ell_1, \ell_2 \leq 7c'$ such that $\Pr[\ell_1 = \ell_1], \Pr[\ell_2 = \ell_2] \geq \frac{1}{6 \cdot 7c'}$ and $e(\ell_1, \ell_2) \leq 6\varepsilon'$.

As a corollary we prove that the rectangle $C^x_{\ell_1} \times C^y_{\ell_2}$ given by the previous lemma is a good rectangle under the original distribution $\mu$.

Corollary 15. Let $R = C^x_{\ell_1} \times C^y_{\ell_2}$, where $C^x_{\ell_1}$ and $C^y_{\ell_2}$ are the buckets given by Lemma 14, and let $B = \{(x, y) \in R : g(x, y) \neq z_m, g(x, y) \neq *\}$. Then
$\Pr_{(x,y) \sim \mu}[(x, y) \in R] \geq \frac{1}{36 \cdot 2^{26c'}}$ and $\Pr_{(x,y) \sim \mu}[(x, y) \in B] \leq 24\varepsilon' \cdot \Pr_{(x,y) \sim \mu}[(x, y) \in R]$.

Proof. By Lemma 14, we get $\mu_z(C^x_{\ell_1}) = \Pr[x' \in C^x_{\ell_1}] \geq \frac{1}{6 \cdot 7c'}$, and by the definition of the buckets, $\mu_z(C^x_{\ell_1}) \leq 2^{\ell_1} \cdot \mu(C^x_{\ell_1})$. By rearranging we get
$\Pr_{x \sim \mu_1}[x \in C^x_{\ell_1}] \geq \frac{1}{6 \cdot 7c' \cdot 2^{\ell_1}}$.
The bound for $\Pr[y \in C^y_{\ell_2}]$ is analogous; thus, since $\ell_1, \ell_2 \leq 7c'$ and $(7c')^2 \leq 2^{12c'}$, we have
$\Pr[(x, y) \in R] \geq \frac{1}{36 \cdot (7c')^2 \cdot 2^{\ell_1 + \ell_2}} \geq \frac{1}{36 \cdot 2^{26c'}}$.
(Here and below, we crucially use the fact that $x, y$ are given by a product distribution.) Now we prove the second bound for $R$. By the definition of the buckets,
$\Pr_{(x,y) \sim \mu}[(x, y) \in B] = \sum_{(x,y) \in B} \mu(x)\mu(y) \leq \sum_{(x,y) \in B} \frac{\mu_z(x)}{2^{\ell_1 - 1}} \cdot \frac{\mu_z(y)}{2^{\ell_2 - 1}} = \frac{\Pr[(x', y') \in B]}{2^{\ell_1 + \ell_2 - 2}} \leq \frac{6\varepsilon' \cdot \Pr[(x', y') \in R]}{2^{\ell_1 + \ell_2 - 2}}$,
where the last inequality holds by Lemma 14; moreover, $\Pr[(x', y') \in R] = \mu_z(C^x_{\ell_1}) \cdot \mu_z(C^y_{\ell_2}) \leq 2^{\ell_1 + \ell_2} \cdot \Pr_{(x,y) \sim \mu}[(x, y) \in R]$. Thus, by rearranging we get
$\Pr[(x, y) \in B] \leq 6\varepsilon' \cdot 2^2 \cdot \Pr[(x, y) \in R] = 24\varepsilon' \cdot \Pr[(x, y) \in R]$.

Proof of Theorem 5. Suppose that $t_u \cdot n \leq o(k/w)$. Let $R$ be the rectangle given by Corollary 15. It holds that the rectangle $R$ is $24\varepsilon'$-almost $b$-monochromatic for $g$ under $\mu$. Therefore, for the function $g$ it holds that
$2^{-\mathrm{scb}^{24\varepsilon', \lambda}_{\mu}(f)} \geq \mathrm{mono}^{24\varepsilon', b}_{\mu}(g) \geq \Pr[(x, y) \in R] \geq \frac{1}{36 \cdot 2^{26c'}}$.   (1)
We need to argue that $\varepsilon'$ is $O(\varepsilon/\alpha)$. By definition, $\varepsilon' = \frac{5}{\beta} \cdot \varepsilon + \delta$. We recall that $\delta = \sqrt{\frac{10}{\beta} \cdot \gamma}$ and $\gamma = O(\frac{t_u \cdot n \cdot w}{p})$. Thus, since $t_u \cdot n \leq o(k/w)$, we can set $p$ large enough for $\delta$ to be smaller than an arbitrary constant while still keeping $p \leq o(k)$. By the assumption we have $2\varepsilon < \alpha$; thus $\frac{\varepsilon}{\alpha - \varepsilon} \leq \frac{2\varepsilon}{\alpha}$, and we conclude that $\varepsilon' = O(\varepsilon/\alpha)$. Since $c' = \frac{5}{\beta}(t_q \cdot w + o(t_q \cdot w))$ and $\beta = \alpha - \varepsilon \geq \alpha/2$, we get the result by rearranging Inequality (1).

4 Applications

In this section we apply Theorem 5 to derive lower bounds for several explicit functions: Inner Product (IP), Disjointness (DISJ), Gap Orthogonality (ORT) and Gap Hamming Distance (GHD). For $x, y \in \{0,1\}^n$,
$\mathrm{IP}(x, y) = \sum_{i \in [n]} x_i \cdot y_i \bmod 2$, and $\mathrm{DISJ}(x, y) = 1$ if and only if $\sum_{i \in [n]} x_i \cdot y_i = 0$.
For $x, y \in \{-1, 1\}^n$, the partial functions $\mathrm{ORT}_n$ and $\mathrm{GHD}_n$ are defined by
$\mathrm{ORT}_n(x, y) = 1$ if $|\langle x, y \rangle| \leq \sqrt{n}$, and $\mathrm{ORT}_n(x, y) = 0$ if $|\langle x, y \rangle| \geq 2\sqrt{n}$;
$\mathrm{GHD}_n(x, y) = 1$ if $\Delta(x, y) \geq \frac{n}{2} + \sqrt{n}$, and $\mathrm{GHD}_n(x, y) = 0$ if $\Delta(x, y) \leq \frac{n}{2} - \sqrt{n}$,
where $\Delta(x, y)$ denotes the Hamming distance of $x$ and $y$; on all other inputs $\mathrm{ORT}_n$ and $\mathrm{GHD}_n$ are undefined.

Sherstov [21] provided a lower bound on the communication complexity of GHD via a lower bound on the corruption bound of $\mathrm{ORT}_{n, \frac{1}{8}}$, followed by a reduction to GHD.

Theorem 18 (Sherstov [21]). Let $\rho > 0$ be sufficiently small and let $\mu_3$ be the uniform distribution over $\{0,1\}^n \times \{0,1\}^n$. Then $\mathrm{cb}^{\rho}_{\mu_3}(\mathrm{ORT}_{n, \frac{1}{8}}) \geq \rho \cdot n$.

By this theorem and Theorem 5 we get a lower bound for data structures for $\mathrm{ORT}_{n, \frac{1}{8}}$. By the reductions used by Sherstov [21] we also get lower bounds for ORT and GHD:
$\mathrm{ORT}_{n, \frac{1}{8}}(x, y) = \mathrm{ORT}_{64n}(x^{64}, y^{64})$ and $\mathrm{ORT}_n(x, y) = \mathrm{GHD}_{10n+15}(\dots)$,
where $s^i$ denotes $i$ copies of the string $s$ concatenated together (the exact padding in the second reduction is as in [21]).

Let $D$ be a semi-adaptive randomized scheme for the multiphase problem of the presented functions with sufficiently small error probability. By the theorems presented in this section and by Theorem 5, we can derive the following lower bounds for $t_q \cdot w$, assuming that $t_u \cdot n \leq o(k/w)$: $t_q \cdot w \geq \Omega(n)$ for IP, ORT and GHD, and $t_q \cdot w \geq \Omega(\sqrt{n})$ for DISJ.
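The four functions of this section are easy to state as predicates; the sketch below uses the $\sqrt{n}$ gap conventions given above (IP and DISJ on 0/1 vectors, ORT and GHD on $\pm 1$ vectors), with inputs inside the promise gap returned as None to mark that the function is undefined there.

```python
from math import isqrt

# The four explicit functions of this section, with sqrt(n) gaps as above.

def ip(x, y):
    return sum(a * b for a, b in zip(x, y)) % 2

def disj(x, y):
    return int(all(a * b == 0 for a, b in zip(x, y)))

def ort(x, y):
    n, dot = len(x), abs(sum(a * b for a, b in zip(x, y)))
    if dot <= isqrt(n): return 1
    if dot >= 2 * isqrt(n): return 0
    return None                      # inside the promise gap: undefined

def ghd(x, y):
    n = len(x)
    dist = sum(a != b for a, b in zip(x, y))
    if dist >= n / 2 + isqrt(n): return 1
    if dist <= n / 2 - isqrt(n): return 0
    return None

print(ip((1, 0, 1), (1, 1, 1)), disj((1, 0, 1), (0, 1, 0)))  # prints: 0 1
```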
References

[1] Popular conjectures imply strong lower bounds for dynamic problems.
[2] Complexity classes in communication complexity theory.
[3] An information statistics approach to data stream and communication complexity.
[4] How to compress interactive communication.
[5] A strong direct product theorem for corruption and the multiparty communication complexity of disjointness.
[6] Information complexity versus corruption and applications to orthogonality and gap-hamming.
[7] A little advice can be very helpful.
[8] Elements of information theory.
[9] The cell probe complexity of dynamic data structures.
[10] The partition bound for classical communication complexity and query complexity.
[11] The probabilistic communication complexity of set intersection.
[12] An adaptive step toward the multiphase conjecture.
[13] Communication complexity.
[14] The cell probe complexity of dynamic range counting.
[15] Crossing the logarithmic barrier for dynamic boolean data structure lower bounds.
[16] Tight bounds for the partial-sums problem.
[17] Towards polynomial lower bounds for dynamic problems.
[18] Logarithmic lower bounds in the cell-probe model.
[19] Towards polynomial lower bounds for dynamic problems.
[20] On the distributional complexity of disjointness.
[21] The communication complexity of gap hamming distance.
[22] Some complexity questions related to distributive computing (preliminary report).