key: cord-0043257-5vgktd3o authors: Amano, Kazuyuki title: On the Size of Depth-Two Threshold Circuits for the Inner Product Mod 2 Function date: 2020-01-07 journal: Language and Automata Theory and Applications DOI: 10.1007/978-3-030-40608-0_16 sha: 657f136f91c7dcccda9c0457a74902703b7e2822 doc_id: 43257 cord_uid: 5vgktd3o In this paper, we study the size of depth-two threshold circuits computing the inner product mod 2 function [Formula: see text] (mod 2). First, we reveal that [Formula: see text] can be computed by a depth-two threshold circuit of size significantly smaller than a folklore construction of size [Formula: see text]. Namely, we give a construction of such a circuit (denoted by [Formula: see text] circuit) of size [Formula: see text]. We also give an upper bound of [Formula: see text] for the case that the weights of the top threshold gate are polynomially bounded (denoted by [Formula: see text] circuit). Second, we give new lower bounds on the size of depth-two circuits of some special form; the top gate is an unbounded weight threshold gate and the bottom gates are symmetric gates (denoted by [Formula: see text] circuit). We show that any such circuit computing [Formula: see text] has size [Formula: see text] for every constant [Formula: see text]. This improves the previous bound of [Formula: see text] based on the sign-rank method due to Forster et al. [JCSS ’02, FSTTCS ’01]. Our technique has a unique feature that the lower bound is obtained by giving an explicit feasible solution to (the dual of) a certain linear programming problem. In fact, the problem itself was presented by the author over a decade ago [MFCS ’05], and finding a good solution is an actual contribution of this work. The problem of proving strong lower bounds on the size (i.e., the number of gates) of depth-two threshold circuits computing an explicit Boolean function is a big challenge in complexity theory. Currently, we cannot refute that every function in the class NEXP (non-deterministic exponential time) can be computed by a polynomial-size depth-two circuit consisting of threshold gates with unbounded weights (denoted by THR • THR circuit). There is a long line of research aiming for understanding the computational power and the limitation of depth-two threshold circuits (e.g, [5, 9, 10, 13, 14] or see an excellent book [12, Chapter 11.10] ). The strongest known lower bound on the size of THR • THR circuits for a function in NP is Ω(n 3/2 ) due to Kane and Williams [13] . In this paper, we focus on the size complexity of depth-two threshold circuits for the inner product mod 2 function: IP2 n (x 1 , . . . , x n , y 1 , . . . , y n ) = n i=1 (x i ∧ y i ) (mod2). The inner product mod 2 function IP2 n has been widely studied in the context of depthtwo threshold circuits (e.g., [7, 10, 13] ). It is a long standing open question whether IP2 n has a polynomial size depth-two threshold circuit with unbounded weights threshold gates in both layers. If we restrict the weights of threshold gates in one of two layers to be polynomial, then strong lower bounds are known. Let MAJ denote the class of threshold functions whose weights are bounded to be Z ∩ [−poly(n), poly(n)]. Hajnal et al. [10] proved that every MAJ • THR circuit computing IP2 n has size Ω(2 (1/3− )n ) using the discriminator method. An exponential lower bound were also shown by Nisan [16] using a communication complexity argument. Forster et al. [7, 8] proved that every THR • MAJ circuit computing IP2 n has size Ω( √ 2 n /poly(n)) by lowerbounding the sign-rank of the communication matrix of IP2 n . Note that IP2 n has an O(n) size threshold circuit of depth-three; in the first layer, we use n gates to compute x i ∧ y i for each i, and then in the second and third layer, we use O(n) gates to compute the parity of the outputs of them. If the gates at the bottom layer are restricted to be And, Exclusive-or or Symmetric gates, stronger lower bounds for IP2 n are known (see Table 1 ). Remark that, in recent years, several results providing the separation between depth-two and depth-three threshold circuits were given for realvalued functions (e.g., [6, 18] ). However, to the best of our knowledge, the arguments used in these works can not directly be applied for Boolean functions. The contribution of this work is twofold. First, we consider upper bounds on the size of depth-two threshold circuits for IP2 n . Although we know that lower bounds are more preferable, we pursuit upper bounds because we think that the lack of knowledge on good upper bounds for the problem is one of the reasons why we could not obtain a good lower bound. It is folklore that IP2 n can be computed by a THR • AND circuit (hence also by a THR • THR circuit) of size 2 n by applying the inclusion-exclusion formula. Namely, To the best of our knowledge, no asymptotically better bound has not been published. Note that IP2 n has 2n input variables and the construction via the DNF representation of IP2 n needs ∼ 3 n gates. In this work, we show that IP2 n has a depth-two threshold circuit of size significantly smaller than 2 n . Namely, we give an explicit construction of a THR • THR circuit of size O(1.682 n ) as well as a MAJ • THR circuit of size O(1.899 n ) computing IP2 n . The second contribution of this work is to give a new lower bound on the size of depth-two threshold circuits with some special restriction on the bottom gates. A symmetric gate is a gate that takes Boolean inputs whose output is depending only on the number of one's in inputs. Let THR • SYM denote depth-two circuits consisting of a threshold gate with unbounded weights at the top and symmetric gates at the bottom. In [7] , Forster established a breakthrough result that the sign-rank of the 2 n × 2 n Hadamard matrix is Ω( √ 2 n ). Here the sign-rank of a matrix M = (M i, j ) with nonzero entries is the least rank of a matrix A = (A i, j ) with M i, j A i, j > 0 for all i and j. By combining this result and a simple fact that the communication matrix of any symmetric function has rank at most n + 1, Forster et al. [8] established an Ω( √ 2 n /n) lower bound on the size of THR • SYM circuits for IP2 n . In this paper, we improve their bound to Ω((1.5 − ) n ). Although the improvement is somewhat limited, our method has a unique feature; the lower bound is obtained by giving an explicit feasible solution to a certain linear programming problem. Over a decade ago, building on the work of Basu et al. [3] , the author developed an LP-based method to obtain a lower bound on the size of THR • SYM circuits for IP2 n [1] . In [1] , we showed that the problem of obtaining a lower bound on the size of such circuits can be reduced to the problem of solving a certain linear programming problem. Then we solved an obtained linear programming problem over 2 16 variables using an LP solver to establish a lower bound of Ω(1.3638 n ) on the size of THR • SYM circuits for IP2 n . However, the problem of determining a highest lower bound that can be obtained by our LP-based method was left as an open problem in [1] . In this work, we show that this limit is in fact Ω((1.5− ) n ), surpassing the Ω( √ 2 n /n) bounds obtained by the sign-rank method. We achieve this by giving an explicit feasible solution to the dual of the linear programming problem presented in [1] and estimating the value of the objective function. Showing this is an actual contribution of the second part of this work. The rest of the paper is organized as follows. In Sect. 2, we introduce some notations. In Sect. 3, we give new upper bounds on the size of depth-two threshold circuits for IP2 n . Then in Sect. 4, we review an LP-based lower bounds method presented in our previous work [1] , and establish a new lower bound on the size of THR • SYM circuits for IP2 n . For an integer n ≥ 1, [n] denotes the set {1, 2, . . . , n}. The inner product mod 2 function IP2 n is a Boolean function over 2n variables defined by For a Boolean predicate P, let P denote the Iverson bracket function defined as P = 1 if P is true and P = 0 if P is false. Let x 1 , . . . , x n ∈ {0, 1} be Boolean variables. A linear threshold function is a Boolean function of the form for some w 1 , . . . , w n , t ∈ R. Similarly, an exact threshold function is a Boolean function of the form We call w 1 , . . . , w n the weights and t the threshold. It is well known that, without loss of generality, we can assume that the weights and the threshold are integers of absolute value 2 O(n log n) [15] . Hence, hereafter, we assume that the weights and the threshold are all integers. A gate that computes a linear threshold function is called a threshold gate. The class of all linear threshold functions (exact threshold functions, respectively) is denoted by THR (ETHR, respectively). As usual, a depth-two circuit such that the top gate computes a function in C, and every bottom gate computes a function in D is called a C • D circuit. For example, a THR • THR circuit is a depth-two circuit with threshold gates of unbounded weights in both layers. The size of a depth-two circuit is defined to be the number of gates in the bottom layer. The size complexity of a Boolean function f for C • D circuits is the minimum size of a C • D circuit computing f . A majority gate is a gate computing a linear threshold function with additional restriction that w i ∈ {−1, 1} for all i. Here the threshold t can be an arbitrary value, i.e., is not restricted to be the half of the number of input variables. The class of functions computed by a majority gate is denoted by MAJ. In our definition, a majority gate is allowed to read a variable multiple times. For example, we can say that the function is computed by a majority gate of fan-in 1 + 2 + 3 = 6. Remark that a majority gate is often defined as a gate that computes a linear threshold function with polynomially bounded weights. If we adapt this definition of majority gates, the size complexity may be reduced by at most a polynomial factor. However, such a difference will not affect all the results described in this paper. A function f : {0, 1} n → R is called symmetric if the value of f depends only on the number of ones in the input. A gate that computes a symmetric function is called a symmetric gate and the class of all symmetric functions is denoted by SYM. Note that a symmetric gate is usually defined as a Boolean gate, i.e., it outputs a binary value. In this paper, we extend the domain from {0, 1} to R. By this extension, the set of symmetric functions turns out to be closed under linear combinations. This property is useful when we view a threshold-of-symmetric circuit as (the sign of) a real polynomial (see Sect. 4.1). Note also that a symmetric gate can simulate all of AND, OR, the modulo gate. It can also simulate a restrict version of the majority gate where the gate reads each variable at most once and all the weights are restricted to be 1. In this section, we give upper bounds on the size of depth-two threshold circuits for IP2 n , which is significantly smaller than a folklore bound of O(2 n ). We begin with two simple lemmas about exact threshold functions. Both lemmas were appeared in [11] . Lemma 1 [11] . Suppose that a Boolean function f can be computed by a THR•ETHR circuit of size s. Then, f can be computed by a THR • THR circuit of size at most 2s. The same relationship holds for MAJ • ETHR and MAJ • THR circuits. Lemma 2 [11] . The AND of an arbitrary number of exact threshold functions is also an exact threshold function. In other words, the class of exact threshold functions is closed under the AND operation. Before stating our upper bounds, we describe an idea of our construction. Consider the function IP2 2 (x 1 , x 2 , y 1 , y 2 ). Define two exact threshold functions g 1 and g 2 as follows. It is easy to verify that Then, when n is even, IP2 n (x 1 , . . . , x n , y 1 , . . . , y n ) is given by By expanding the product in Eq. (1), we can obtain a polynomial of 3 n/2 terms in which each term is an AND of exact threshold functions. By Lemma 2, we can express each term by a single ETHR gate. Therefore, we have a THR•ETHR circuit of size O(3 n/2 ) = O(1.733 n ) for IP2 n , and also have a THR • THR circuit of the same order by Lemma 1. It is natural to expect that we can obtain a better bound by considering IP2 k for k > 2 as a base case. These ideas can be summarized as the following theorem. where w i ∈ Z and C i ∈ ETHR for i ∈ [ ]. Then, 1. The size complexity of IP2 n for THR • ETHR circuits as well as THR • THR circuits Proof (Sketch). First, observe that IP2 n is just a PARITY of n/k copies of IP2 k . Replace each IP2 k with a constructed -gate THR • ETHR circuit. The PARITY of n/k THR of ETHRs can be written as the sign of the product of n/k sums of ETHRs. Applying distributivity to the product of sums, we get a sum of n/k products of ETHRs. But the product of a bunch of ETHRs can be written as one ETHR, so we get a THR of n/k ETHRs, completing the proof of Statement 1 of the theorem. The proof for Statement 2 is similar. With the aid of computers, we found a formula of length 8 for IP2 4 as well as a formula of total weight 13 for IP2 4 that lead us to the following theorems. It is elementary to verify that IP2 4 (x 1 , . . . , x 4 , y 1 , . . . , y 4 ) = sgn ⎛ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝ −3 + 2 i∈ [7] g i (z 1 , z 2 , z 3 , z 4 ) This gives a desired bound by Theorem 3. Proof of Theorem 5. Let {x 1 , . . . , x 4 , y 1 , . . . , y 4 } denote the input variables for IP2 4 . For i ∈ [4], we write z i := x i + y i . We introduce the following twelve exact threshold functions and write them as g 1 , . . . , g 12 . It is elementary to verify that [12] This gives a desired bound by Theorem 3. It is plausible that our bounds can further be improved by considering IP2 k for k ≥ 5 as a base case. We remark that, for the case of MAJ • THR circuits, the following argument says that there is a barrier at O( √ 2 n ): The proof of Theorem 5 actually gives a construction of MAJ • ETHR circuits for IP2 n . By applying the "discriminator lemma" developed in [10] carefully, we can prove an Ω(2 (1/2− )n ) lower bound on the size complexity of IP2 n for MAJ • ETHR circuits. Currently, we do not know such a barrier for THR • THR circuits. In this section, we show Ω((1.5 − ) n ) lower bounds on the size of depth-two circuits for IP2 n where the top gate is a threshold gate and the bottom gates are symmetric gates. In Sect. 4.1, we review our LP-based method presented in our previous work [1] , and then we establish the lower bound in Sect. 4.2. Throughout this section, we label the input variables of IP2 n as {x 1 , . . . , x 2n } and define IP2 n (x 1 , . . . , x 2n ) := i∈[n] x 2i−1 x 2i (mod 2). This indexing is different from the one used in the previous section, but will be convenient for a later discussion. As defined before, we call a depth-two circuit with unbounded weights threshold gate at the top and symmetric gates at the bottom as a THR • SYM circuit. For a Boolean function f , the size complexity of IP2 n for THR • SYM circuits is simply denoted by s( f ). Throughout of this section, we treat a THR • SYM circuit as the sign of a real polynomial. Definition 6. We say that a real polynomial P(x 1 , . . . , x n ) sign represents a Boolean function f on n variables if, for every (x 1 , . . . , x n ) ∈ {0, 1} n , We consider a polynomial P : where w S ∈ R and h S is a symmetric function over the set of variables S . The support of P is defined by supp(P) := {S ⊆ X | w S 0}. Obviously, s( f ) is equal to the minimum size of the support of a polynomial P of the form (2) that sign represents f . A point of our method is to define the parameter z k , which gives a lower bound on s( f ), by introducing a certain linear programming problem. Recall that the input variables of IP2 n is X := {x 1 , . . . , x 2n }. Let z 0 = z 1 = 1. For k ≥ 2, the parameter z k is defined inductively (on k) such that z k is the minimum value of the objective function of the following linear programming problem. Let X k = {x 1 , x 2 , . . . , x 2k } be the first 2k variables of X. The program has 2 2k real-valued variables {q T } T ⊆X k and 2k + 4 k 2 constraints. T :v∈T The key observation is the following. ). Suppose that k ≥ 2. Let z k−1 and z k−2 be real numbers such that s(IP2 n ) ≥ z k−1 · s(IP2 n−(k−1) ) and s(IP2 n ) ≥ z k−2 · s(IP2 n−(k−2) ) for every n. Let z k be the minimum value of the objective function of the LP problem (3) . Then s(IP2 n ) ≥ z k · (IP2 n−k ) The following corollary is immediate from Fact 7. In the following, we give a sketch of the proof of Fact 7 for completeness. Let f : {0, 1} X → R be a real function and ρ : X → {0, 1, * } be a partial assignment to X. Let res(ρ) denote the set of variables that assigned a constant by ρ, i.e., res(ρ) := {v ∈ X | ρ(v) * }. The restriction of f by ρ, denoted by f | ρ , is the function obtained by setting x i to ρ(x i ) if x i ∈ res(ρ) and leaving x i as a variable otherwise. The restriction of a polynomial P of the form (2), denoted by P| ρ , is defined similarly. First, replace each h S in P by h S | ρ , which is a symmetric function over the set of variables S − res(ρ). Then, if there are two (or more) functions h S 1 | ρ and h S 2 | ρ such that S 1 es(ρ) = S 2 es(ρ), then they are merged into a single symmetric function. This is possible by the fact that the linear combination of two (or more) symmetric functions over the same set of variables is also a symmetric function. For a polynomial P of the form (2), we decompose P into P T 's for T ⊆ X k in such a way that Letq T be the number of terms in P T . Note that We use the following fact that is easy to verify but useful. ). Let ρ 1 and ρ 2 be two partial assignments such that res(ρ 1 ) = res(ρ 2 ). Proof of Fact 7 (sketch). Suppose that a polynomial P of the form (2) sign-represents IP2 n . In what follows, we consider two types of pairs of partial assignments. Type 1. Choose i ∈ [k] and then choose u ∈ {x 2i−1 , x 2i }. The unchosen variable in {x 2i−1 , x 2i } is denoted by v. Let ρ 1 and ρ 2 be two partial assignments such that res(ρ 1 ) = res(ρ 2 ) = {x 2i−1 , x 2i }, (ρ 1 (v), ρ 1 (u)) = (0, 1) and (ρ 2 (v), ρ 2 (u)) = (1, 1). A key observation is that for every such pair of partial assignments (ρ 1 , ρ 2 ), we have IP2 n | ρ 1 ≡ IP2 n−1 and IP2 n | ρ 2 ≡ IP2 n−1 . This implies that the polynomial P| ρ 1 − P| ρ 2 sign represents IP2 n−1 . Fact 9 says that P T | ρ 1 − P T | ρ 2 is vanished if v T . Hence, we have where the last inequality follows from the assumption in the statement of Fact 7. Let q T :=q T /s(IP2 n−k ) for T ⊆ X k . By dividing both side of (4) by s(IP2 n−k ), we have which is the first constraint in the LP problem (3). We also consider another type of partial assignments. Type 2. Choose i, j ∈ [k] such that i j, and then choose v ∈ {x 2i−1 , x 2i } and u ∈ {x 2 j−1 , x 2 j }. Let v and u be the unchosen variables in {x 2i−1 , x 2i } and {x 2 j−1 , x 2 j }, respectively. Let ρ 1 and ρ 2 be two partial assignments such that res(ρ 1 ) = res(ρ 2 ) = {x 2i−1 , x 2i , x 2 j−1 , x 2 j }, (ρ 1 (v), ρ 1 (v ), ρ 1 (u), ρ 1 (u )) = (0, 1, 1, 0) and (ρ 2 (v), ρ 2 (v ), ρ 2 (u), ρ 2 (u )) = (1, 1, 0, 0). Similar to the case of Type 1, we have IP2 n | ρ 1 ≡ IP2 n−2 and IP2 n | ρ 2 ≡ IP2 n−2 , and hence P| ρ 1 −P| ρ 2 sign represents IP2 n−2 . In addition, P T | ρ 1 −P T | ρ 2 is vanished if |T ∩{u, v}| is zero or two. Hence, we have T :|{u,v}∩T |=1q where the last inequality follows from the assumption of the statement in Fact 7. This inequality is equivalent to which is the second constraint in the LP problem (3). If P is an optimal polynomial for IP2 n , then which is equivalent to Therefore, the minimum value z k of the objective function of the LP program (3) satisfies s(IP2 n ) ≥ z k · s(IP2 n−k ). This completes the proof of Fact 7. The LP problem (3) can easily be generated and solved by using a computer when k is small. In our previous work [1] , we have succeeded to solve these problems by an LP solver for k ≤ 8 (see Table 2 ). During this work, we could extend the table up to k = 10. The best lower bound obtained in this way is Ω(1.3808 n ), but still weaker than a bound of s(IP2 n ) = Ω( √ 2 n /n) due to Forster et al. [7, 8] . Obviously, the best possible lower bound that could be obtained by our approach is s(IP2 n ) ≥ z n ∞ where z ∞ := lim k→∞ (z k ) 1/k . However, finding the value of z ∞ was left as an open problem in [1] . Then, for x ∈ {0, 1} 2k , the first constraint in LP (6) can be written as On the complexity of depth-2 circuits with threshold gates On XOR lemmas for the weight of polynomial threshold functions Polynomials that sign represent parity and descartes' rule of signs Harmonic analysis of polynomial threshold functions A short list of equalities induces large sign rank The power of depth for feedforward neural networks A linear lower bound on the unbounded error probabilistic communication complexity Relations between communication complexity, linear arrangements, and computational complexity Majority gates vs. general weighted threshold gates Threshold circuits of bounded depth Exact threshold circuits Boolean Function Complexity, Advances and Frontiers Super-linear gate and super-quadratic wire lower bounds for depthtwo and depth-three threshold circuits On the computational power of depth-2 circuits with threshold and modulo gates Threshold Logic and Its Applications The communication complexity of threshold gates Lower bounds on threshold and related circuits via communication complexity Depth separations in neural networks: what is actually being separated Minimal sign representation of boolean functions: algorithms and exact results for low dimensions Acknowledgement. The author would like to thank anonymous referees for their helpful comments and suggestions. This work is supported in part by JSPS Kakenhi 18K11152 and 18H04090. In this section, we show that z ∞ ≥ 1.5 establishing a new lower bound on the size complexity of IP2 n for THR • SYM circuits.Theorem 10. For every k ≥ 1,Hence, s(IP2 n ) = Ω((1.5 − ) n ) for every > 0.Although we only prove the lower bound, we strongly believe that our bound on z ∞ is tight, i.e., z ∞ = 1.5. Note that s(IP2 n ) ≤ 2 n by the construction described in Introduction and the fact that AND is contained in SYM. To the best of our knowledge, this is the best known upper bound on s(IP2 n ) 1 .Proof of Theorem 10. The proof is done by giving a feasible solution to the dual of the LP problem (3), and then estimating the value of the objective function.We define Z k to beThe dual of (3) is given byThe LP duality theorem guarantees that the maximum value of the objective function in this dual program (6) equals to z k . Since LP (6) is a maximization problem, any feasible solution gives a lower bound on z k .Here we present a feasible solution to LP (6) that will be analyzed in the proof. Defineas follows:4k 2 if both of u and v are odd and t u,v = 0 otherwise. Note that we inspired this solution through actually solving LP (6) using an LP solver.In order to show the feasibility of s • t, it is enough to verify that the first constraint in LP (6) is satisfied. For x ∈ {0, 1} 2k , letThis can easily be verified by observing that the LHS of this equation is equal to, completing the proof of the feasibility of s • t. We proceed to the estimation of the value of the objective function. The proof is by the induction on k. For k ≤ 10, we can verify the theorem by a direct calculation (see Table 2 ). Suppose that k ≥ 11. By the definition of s • t and the inductive assumption, we haveBy an elementary but somewhat lengthy calculation, we can show that z k ≥ 1.5 k 1 − 1 √ k k as desired. The detailed calculations are omitted due to the page restriction and will appear in the full version of the paper.