key: cord-0646749-r1b974mu authors: Bu, Kaifeng; Koh, Dax Enshan; Li, Lu; Luo, Qingxian; Zhang, Yaobo title: Effects of quantum resources on the statistical complexity of quantum circuits date: 2021-02-05 journal: nan DOI: nan sha: 79d5f0e1ab5406f5a6274b0fc1b2c319f79fddd3 doc_id: 646749 cord_uid: r1b974mu We investigate how the addition of quantum resources changes the statistical complexity of quantum circuits by utilizing the framework of quantum resource theories. Measures of statistical complexity that we consider include the Rademacher complexity and the Gaussian complexity, which are well-known measures in computational learning theory that quantify the richness of classes of real-valued functions. We derive bounds for the statistical complexities of quantum circuits that have limited access to certain resources and apply our results to two special cases: (1) stabilizer circuits that are supplemented with a limited number of T gates and (2) instantaneous quantum polynomial-time Clifford circuits that are supplemented with a limited number of CCZ gates. We show that the increase in the statistical complexity of a quantum circuit when an additional quantum channel is added to it is upper bounded by the free robustness of the added channel. Finally, we derive bounds for the generalization error associated with learning from training data arising from quantum circuits. While quantum circuits are believed to provide an advantage over their classical counterparts, not all of them are capable of doing so. There are several well-known restricted classes of quantum circuits that can be shown to be efficiently simulable by a classical computer. These include the stabilizer circuits [27] , the matchgate circuits [28, 29] as well as these circuits augmented with various supplementary resources [30] [31] [32] [33] [34] . These classical simulation results show that if one hoped to outperform classical algorithms at a given machine learning task, it is necessary to utilize resources outside these classically simulable restricted classes of quantum circuits. A key step in both classical and quantum machine learning is the building of learning models based on training data. Here, the power of a learning model depends on its statistical complexity, i.e., its ability to fit functions, which has been quantified by various measures. These measures include the Vapnik-Chervonenkis (VC) dimension [35, 36] (which has been used to determine the sample complexity of PAC learning [37] and classical neural networks [38] ), the metric entropy (also known as covering number) [39] , the Rademacher complexity [40] (which has been studied in the context of classical neural networks [41] [42] [43] [44] ) and the Gaussian complexity [40] . In building learning models using quantum circuits, various quantum resources, or quantum effects, are typically at play. These include magic [45] [46] [47] , entanglement [48, 49] and coherence [50] [51] [52] . But how do changes in the amounts of these quantum resources affect the statistical complexity of these quantum-circuit-based learning models? In recent work [53] , we partially addressed this question by focusing on a specific resource, namely the resource of magic. In particular, we utilized the (p, q) group norm to quantify the amount of magic in quantum circuits and showed how the statistical complexity of the quantum circuit scales with the depth and width of the circuit and the amount of magic it contains. In this work, we extend our previous results and address the above question more generally by considering the Rademacher and Gaussian complexities as measures of model complexity and utilizing the framework of general resource theories [54, 55] , which offers a powerful paradigm for the quantification and operational interpretation of quantum effects [46] . In a quantum resource theory, quantum channels are categorized as being either a free channel or a resource channel. Free channels are those that are available or inexpensive and resource channels are those that are limited or expensive to use. In this work, we consider quantum-circuit-based learning models in the following two contexts: (1) quantum circuits with access to only a restricted set of channels O, and (2) quantum circuits with access to a restricted set of channels O together with an additional resource channel Ψ ∈ O (for example, we could take O to be the set of stabilizer circuits and Ψ to be the T gate). We show that by adding a resource channel to a set of free channels, the Radamacher and Gaussian complexities are increased by an amount that is bounded by the free robustness of the resource channel multiplied by the number of times the channel is used. Using this result, we derive an upper bound on the generalization error associated with learning from the training data arising from such circuits. Consider an n-qubit quantum circuit that implements a quantum channel 1 Φ. For example, Φ = Φ(θ ) could represent a parametrized quantum circuit with gates parametrized by the parameters θ ∈ R α (for an example, see Fig. 1 ). Let x ∈ F n 2 be an n-bit input string (for example, x could be the binary representation of a collection of pixel values of an image of a handwritten digit). By Born's rule, if we feed the computational basis state | x into the circuit Φ and make a measurement in the computational basis, the probability of measuring y ∈ F n 2 is given by where f Φ : F n 2 × F n 2 → [0, 1] is a real-valued function induced by the channel Φ that maps input-output pairs ( x, y) to probability values. Let Ω be a set of quantum channels. We define the function class F(Ω) as follows: (2) We now introduce the Rademacher and Gaussian complexities [40] , which quantify the richness of sets of real-valued functions and can be used to provide bounds for the generalization error associated with learning from training data. Let G be a set of real-valued functions and let S = (z 1 , . . . , z m ) ∈ R m be a set of m samples. The (empirical) Rademacher complexity of G with respect to S iŝ where the expectation is taken over i.i.d. Rademacher random variables, i.e., ε i ∼ Rad for each i ∈ {1, . . . , m}. Recall that the Rademacher random variable X has probability mass function Similarly, the (empirical) Gaussian complexity of G with respect to S iŝ where the expectation is taken over the i.i.d. random Gaussian variables with zero mean and unit variance, i.e., ε i ∼ N (0, 1) for each i ∈ {1, . . . , m}. Note that the empirical Rademacher and Gaussian complexities depend on the samples S = (z 1 , . . . , z m ). By averaging over samples S taken from a product distribution D m , we obtain the expected Rademacher and Gaussian complexities: In the rest of the main text, we will focus on the Rademacher complexity; similar results hold for the Gaussian complexity, which we relegate to Appendix E. C. Statistical complexity in the quantum resource theory framework Quantum resource theories are characterized by a restricted set of channels, called free channels, which map free states to free states; any channel that is not a free channel is called a resource channel. Let O be a set of n-qubit free channels and let Ψ / ∈ O be an n-qubit resource channel. Define O Ψ := O ∪ { Ψ } to be the class of channels formed by appending Ψ to O. In addition, to take into account the case where the resource channel is used more than once, for each k ∈ Z + , define and at most k of the Φ i 's are Ψ . (8) It is easy to see that the above sets form a nested hierarchy In this work, we will be interested in the statistical complexities of the function classes F(Ω) formed by taking Ω to be the sets in the nested hierarchy in Eq. (9), where F(·) is given by Eq. (2). We first consider the Rademacher complexity of F(O Ψ ). Theorem 1. Given m independent samples S = ( z 1 , . . . , z m ) and a resource channel Ψ, the Rademacher complexity of F(O Ψ ) is bounded as follows: (10) where γ(Ψ) is the free robustness of Ψ with respect to the set O, defined as Therefore, for any probability distribution D on the sample space, if each sample z i is chosen independently according to D for i = 1, . . . , m, we have The proof of Theorem 1 is presented in Appendix B. Theorem 1 tells us that with access to the resource channel, the Rademacher complexity is bounded by the free robustness of the channel. Next, let us consider the case where the resource channel can be used multiple times. In this case, the relevant function class is that defined by Eq. (8). By Eq. (9), the following relationship follows immediatelŷ Theorem 2. Given m independent samples S = ( z 1 , . . . , z m ) and a resource channel Ψ, we have the following bound where γ * = min { 1 + 2γ max,n , (1 + 2γ(Ψ)) k }, and γ max,n is the maximal free robustness over quantum channels on n qubits. Therefore for any probability distribution D on the sample space, with each sample z i chosen independently according to D for i = 1, . . . , m, we have The proof of Theorem 2 is presented in Appendix C. Theorem 2 tells us that the Rademacher complexity for the case where we have access to multiple copies of a resource channel has an upper bound that depends on the Rademacher complexity for the case where there is no resource channel, the free robustness of the resource channel Ψ and the number of times Ψ is used. Now, let us give some examples to illustrate our results. Example 1: Consider quantum circuits whose gates all belong to the Clifford group, and denote the set of Clifford channels associated with such circuits by ST AB. Of interest to us is the Rademacher complexity of F(ST AB) with respect to m independent samples S = ( z i ) m i=1 , denoted byR S (F(ST AB)). As we shall show in Appendix D, we get the following bound for stabilizer circuitŝ . Now, while such circuits can be efficiently simulated on a classical computer, by the Gottesman-Knill theorem [27] , circuits formed from the Clifford+T universal gate set, where T = diag[1, e iπ/4 ], are believed to preclude efficient classical simulation [56, 57] . This motivates us to consider quantum circuits consisting of both Clifford gates and the T gate. We define the set ST AB (k) T to be the set of quantum channels formed from Clifford unitaries and at most k T gates. As we shall show in Appendix D, the following upper bound holds for the Rademacher complexity of ST AB (k) T : where we used the fact that the free robustness of the T gate is upper bounded by √ 2/2. Example 2: Consider the instantaneous quantum polynomialtime (IQP) circuits, a restricted model of quantum computation that has been proposed as a candidate for demonstrating quantum computational supremacy in the near term [6, 8, 9, 58] . The structure of IQP circuits is quite simple: each circuit has the form H ⊗n DH ⊗n , where D is a subcircuit with gates chosen from { Z,CZ,CCZ } (see Fig. 2 ). Let us define I to be the set of IQP circuits for which the gates in D are from the gate set { Z,CZ } and which contains at least one CZ gate (the case in which the circuits do not contain a CZ gate is trivial). Note that each circuit in I is also a Clifford circuit. Moreover, I is a finite set, and the size of I is O(2 n 2 ). Thus, we have the following bound While I can be efficiently simulated on a classical computer [27] , IQP circuits formed from the gate set I +CCZ are hard to simulate classically [6, 8, 58] , which motivates us to consider the set I (k) CCZ of IQP circuits with at most k CCZ gates. As we shall show in Appendix D, the following bound holds: . Given a sample z = ( x, y) (e.g., y = g( x) for some unknown function g), let us consider the loss function l( z i , Φ) = 1 − f Φ ( z i ) where f Φ is defined by Eq. (1). Then the expected error with respect to some unknown probability distribution D on Z n 2 × Z n 2 is er D (Φ) = E z∼D l( z, Φ) Given m independent samples S = ( z 1 , . . . , z m ), the empirical error is The difference between er S and er D is called the generalization error, which determines the performance of the function f on the unseen data drawn from the unknown probability distribution. The Rademacher complexity provides a bound on the generalization error by the following result. where γ * = min { (1 + 2γ(Ψ)) k , 1 + 2γ max,n }. In this paper, we investigated the effects of quantum resources on the statistical complexity of quantum circuits. We considered the Rademacher and Gaussian complexities of the quantum-circuit-based learning model in two cases: (1) quantum circuits with access to only a restricted set of channels O, and (2) quantum circuits with access to a restricted set of channels O together with an additional resource channel Ψ ∈ O. We show that by adding a resource channel to a set of free channels, the Radamacher and Gaussian complexities are increased by an amount that is bounded by the free robustness of the resource channel multiplied by the number of times the channel is used. We applied our results to two special cases: (1) stabilizer circuits that are supplemented with a limited number of T gates and (2) instantaneous quantum polynomial-time Clifford circuits that are supplemented with a limited number of CCZ gates. Using this result, we derive an upper bound on the generalization error associated with learning from the training data arising from such circuits. Our results reveal a new connection between quantum resources and the statistical complexity of quantum circuits, which paves the way for further research into the statistical complexity of learning models based on quantum circuits, like the variational quantum eigensolver and the quantum neural network. Furthermore, from a quantum resource theoretic point of view, our results also provide a new operational interpretation of free robustness in general resource theories. While we focused on the quantum circuit model in this paper, it will be interesting to generalize our results to other computational models such as measurement-based quantum computation (MBQC), tensor networks, etc. Besides the Rademacher and Gaussian complexities, there are also other measures of statistical complexity of function classes, such as the metric entropy, the VC dimension (or more generally, the pseudo-dimension [59] ), and the topological entropy [60] . It will be interesting to see the effects of quantum resources using these other measures of statistical complexity. We leave this problem for further research. where { ε i } i are i.i.d Rademacher random variables. Proposition 5 ( [40, 61] ). The Rademacher complexity satisfies the following properties: (2) For any c ∈ R, we haveR (3) For any c ∈ R m , we haveR where A + c : (5) Given a Lipschitz function φ : R → R with Lipschitz constant L and φ (0) = 0, we havê When the set A is finite, Massart's lemma gives an upper bound for the Rademacher complexity of A. Lemma 6 (Massart's lemma [61] ). Given a finite set A ⊂ R m , then we havê where |A| denotes the size of the finite set A. We now state an important result of the Rademacher complexity, which allows it to be estimated from a single sample set S = (z 1 , . . . , z m ). and (A9) Appendix B: Proof of Theorem 1 Proof. First, let us rewrite the Rademacher complexity as followŝ where ε = (ε 1 , ..., ε m ) ∈ { ± } n , f = ( f ( z 1 ), ..., f ( z m )) and ε, The inequalityR S (F(O)) ≤R S (F(O Ψ )) comes directly from the definition of Rademacher complexity and the fact that O ⊂ O Ψ . Hence, we only need to prove thatR Let us define the set A as follows To finish the proof, we need the following two lemmas about the basic properties of the set A defined in (B3). Lemma 8. Given the set A defined in (B3), we have Proof. Based on the definition of the set A, we have where the first and second inequality comes from the fact that Φ 1 , Φ 2 ∈ Conv(O), and the last inequality comes from Lemma 9. for any k ≥ 0. Proof. By the definition of free robustness, there exist channels Φ 1 , Φ 2 ∈ Conv(O) such that Therefore, for any channel Φ ∈ O where the first equality comes from that fact thatR S (∑ i F i ) = ∑ iRS (F i ) where each F i is a function class, and the facts that Rademacher complexity is invariant under convex combination andR S (cF) = |c|R s (F). We are now ready to prove the lemma. Proof. Based on Lemma 10, we have the following inequality: Besides, for any Φ ∈ O (k) where γ ≤ γ max,n . Therefore, we have Quantum algorithms for supervised and unsupervised machine learning Quantum machine learning: what quantum computing means to data mining Quantum machine learning Quantum machine learning: a classical perspective Machine learning & artificial intelligence in the quantum domain: a review of recent progress Classical simulation of commuting quantum computations implies collapse of the polynomial hierarchy The computational complexity of linear optics Average-case complexity versus approximate simulation of commuting quantum computations How many qubits are needed for quantum computational supremacy Quantum algorithm for linear systems of equations Quantum Recommendation Systems Quantum support vector machine for big data classification Quantum principal component analysis Quantum Computing in the NISQ era and beyond Noisy intermediate-scale quantum (NISQ) algorithms Variational quantum algorithms A variational eigenvalue solver on a photonic quantum processor Quantum chemistry in the age of quantum computing Generalized unitary coupled cluster wave functions for quantum computation A quantum approximate optimization algorithm Classification with quantum neural networks on near term processors Quantum circuit learning Quantum machine learning in feature Hilbert spaces Supervised learning with quantum-enhanced feature spaces Trainability of dissipative perceptron-based quantum neural networks Training deep quantum neural networks The Heisenberg representation of quantum computers Quantum circuits that can be simulated classically in polynomial time Matchgates and classical simulation of quantum circuits Classical simulation complexity of extended Clifford circuits Further extensions of Clifford circuits and their classical simulation complexities Efficient classical simulation of matchgate circuits with generalized inputs and measurements Efficient classical simulation of Clifford circuits with nonstabilizer input states Computational power of matchgates with supplementary resources On the uniform convergence of relative frequencies of events to their probabilities Necessary and sufficient conditions for the uniform convergence of means to their expectations Learnability and the Vapnik-Chervonenkis dimension Nearly-tight VC-dimension bounds for piecewise linear neural networks ε-entropy and ε-capacity of sets in functional spaces Rademacher and Gaussian complexities: Risk bounds and structural results Norm-based capacity control in neural networks Spectrally-normalized margin bounds for neural networks Exploring generalization in deep learning Sizeindependent sample complexity of neural networks The resource theory of stabilizer quantum computation Application of a Resource Theory for Magic States to Fault-Tolerant Quantum Computing Quantifying the magic of quantum channels Quantum entanglement An introduction to entanglement measures Quantifying superposition Quantifying coherence Colloquium: Quantum coherence as a resource On the statistical complexity of quantum circuits A mathematical theory of resources Quantum resource theories Adaptive quantum computation, constant depth quantum circuits and Arthur-Merlin games Classical simulation of quantum computation, the Gottesman-Knill theorem, and slightly beyond Achieving quantum supremacy with sparse and noisy commuting quantum computations Pseudo-dimension of quantum circuits Depth-width trade-offs for neural networks via topological entropy Understanding machine learning: From theory to algorithms The theory of quantum information Improved simulation of stabilizer circuits In this section, we list several basic properties of the Rademacher complexity, which may be found in [61] . Given a subset A of R m , the Rademacher complexity of A is defined aŝAppendix D: Proof of Example 1 and 2 By Choi's representation of quantum channels [62] , the function f Φ can be written as followswhereSince Φ is a (unitary) stabilizer circuit and |Λ is a pure stabilizer state, Φ ⊗ I(|Λ Λ|) is a stabilizer state on 2n qubits. Since the number of pure stabilizer states on 2n qubits is 2 (0.5+o(1))(2n) 2 [63] , let us consider the vectorHence, we havewhere the first inequality comes from Massart's Lemma (see Lemma 6) and the second inequality comes from the fact that · 2 ≤ √ m · ∞ . Now, let us assume that we have access to the T gate. In this case, let us define the corresponding sets of quantum channels ST AB T and ST AB 2/2 (we leave open the question about whether this bound is tight) as the T gate written as a quantum channel Φ T may be decomposed as followswhere S = diag[1, i] is the phase gate and Z = diag[1, −1] is the Pauli Z gate . By Theorem 2, we get the following upper bound on the Rademacher complexity of ST AB (k)T : Since I are the IQP circuits with only Z and CZ as internal gates, I is a finite set with size |I| = O(2 n 2 ). Hencêwhere the first inequality comes from Massart's Lemma (see Lemma 6) and the second inequality comes from the fact thatNow, let us consider IQP ciruits with access to the CCZ gate. Let us define I (k)CCZ to be the set of IQP circuits with at most k CCZ gates. Then, the size of ITherefore, by Massart's Lemma, we havêAppendix E: Results about the Gaussian complexityIn the main text, we focused on the Rademacher complexity. In this appendix, we will show that similar results hold for the Gaussian complexity.Theorem 11. Given m independent samples S = ( z 1 , ..., z m ) and a resource channel Ψ, then we have the following boundwhere γ(Ψ) is the free robustness with respect to the set O, that is,Therefore, for any probability distribution D on the sample space, if each sample z i is chosen independently according to D for i = 1, . . . , m, then we haveProof. The proof is the same as that for the Rademacher complexity, except that we will need to replace the set A in Lemma 8 and Lemma 9 byTheorem 12. Given m independent samples S = ( z 1 , . . . , z m ) and a resource channel Ψ, we have the following boundwhere γ * = min { 1 + 2γ max,n , (1 + 2γ(Ψ)) k }, and γ max,n is the maximal free robustness over quantum channels on n qubits. Given a probability distribution D on the sample space, if each sample z i chosen independently according to D for i = 1, . . . , m, then we haveProof. This result also holds for the Gaussian complexity because the Gaussian complexity also satisfies convexity and invariance under convex combination. Given a set of real-valued functions F, the Rademacher and Gaussian complexity with respect to a given sample S = (z 1 , ...., z m ) may alternatively be defined as follows:where the expectation is taken over the i.i.d Rademacher variables ε 1 , ε 2 , ..., ε m , andwhere the expectation is taken over i.i.d random Gaussian variables with zero mean and variance 1, i.e., g i ∼ N (0, 1). As the only difference betweenR S andR S is that the latter involves taking an absolute value | · |, it is easy to see thatR S and G S satisfy the following bounds. where γ * = min { (1 + 2γ(Ψ)) k , 1 + 2γ max,n } and γ max,n is the maximal free robustness over quantum channels on n qubits.The proof is the same as that of Theorem 2.