title: Quantum machine learning with differential privacy
authors: Watkins, William M; Chen, Samuel Yen-Chi; Yoo, Shinjae
date: 2021-03-10

Quantum machine learning (QML) can complement the growing trend of using learned models for a myriad of classification tasks, from image recognition to natural speech processing. A quantum advantage arises due to the intractability of quantum operations on a classical computer. Many datasets used in machine learning are crowdsourced or contain some private information. To the best of our knowledge, no current QML models are equipped with privacy-preserving features, which raises concerns as it is paramount that models do not expose sensitive information. Thus, privacy-preserving algorithms need to be implemented with QML. One solution is to make the machine learning algorithm differentially private, meaning the effect of a single data point on the training dataset is minimized. Differentially private machine learning models have been investigated, but differential privacy has yet to be studied in the context of QML. In this study, we develop a hybrid quantum-classical model that is trained to preserve privacy using a differentially private optimization algorithm. This marks the first proof-of-principle demonstration of privacy-preserving QML. The experiments demonstrate that differentially private QML can protect user-sensitive information without diminishing model accuracy. Although the quantum model is simulated and tested on a classical computer, it demonstrates potential to be efficiently implemented on near-term quantum devices (noisy intermediate-scale quantum [NISQ]). The approach's success is illustrated via the classification of spatially classed two-dimensional datasets and a binary MNIST classification. This implementation of privacy-preserving QML will ensure confidentiality and accurate learning on NISQ technology.

name and the confidence levels outputted from the "black box" model. Furthermore, in many applications, a hostile adversary also may have access to the model parameters. In mobile applications, the model usually is stored on the device to reduce communication with a central server [39]. Differential privacy (DP) is an optimization framework to address these issues. DP involves a trade-off between accuracy and the power to protect the identity of the data. Differentially private QML will allow private and efficient processing of big data. We hypothesize that the benefits of QML will offset the decrease in accuracy arising from DP. This research aims to create a hybrid quantum-classical model based on a variational quantum circuit (VQC) and train it using a differentially private classical optimizer. The classification of two-dimensional (2D) data into two classes is used to test the efficiency of the DP-VQC. As controls in the experiment, we will compare its accuracy to classical neural networks (with and without DP) and a non-private quantum circuit. Two classification tasks are used as benchmarks to compare the efficiencies of private and non-private VQCs to their classical analogs. The novel work detailed in Section III represents the main contribution of this research, exploring how we develop a novel framework that ensures privacy-preserving QML and employ it in two benchmark examples (as follows):
• Demonstrate differentially private training on VQC-based ML models.
• Demonstrate that a (ε, 10⁻⁵)-DP VQC trains to accuracies exceeding 90% for an MNIST task with ε between 0.5 and 1.0.

Section II introduces the concept of differentially private ML and the required QML background. Section III illustrates the proposed differentially private QML. Section IV describes the experimental settings and performance of the proposed differentially private quantum learning and is followed by additional discussions in Section V. Section VI is the conclusion.

Supervised learning is an ML paradigm that learns or trains a function that maps input to output given input-output pairs [40]. That is, given the training dataset {(x_i, y_i)}, it is expected that after successful training, the learned function f_θ is able to output the correct or approximate value y_j given the testing case x_j. To make the training possible, we must specify the loss function or cost function L(ŷ, y), which defines how close the output of the ML model ŷ = f_θ(x) is to the ground truth y. The learning or training of an ML model generally aims to minimize the loss function. In classification tasks, the model is trained to output discrete labels or targets y given the input data x. For example, in computer vision applications, it is common to train ML models to classify images. The most famous example is the MNIST dataset [41]. In MNIST, there are around 70,000 images of handwritten digits of the numbers 0-9. In this case, the ML model is trained to output the probability distribution P(y_i|x). Here, P(y_i|x) represents the probability of label y_i for each number i ∈ {0, …, 9} given the input data, which is an image in this scenario. In classification, the cross-entropy loss is the common choice for the loss function. It can be written in the following formulation:

L(ŷ, y) = −∑_{c=1}^{M} y_{o,c} log(ŷ_{o,c}),

where
• M = the number of classes.
• log = the natural log.
• y_{o,c} = the binary indicator (0 or 1) of whether class label c is the correct classification for observation o.
• ŷ_{o,c} = the predicted probability that observation o is of class c.

The loss function then is used to optimize the model parameters θ. In current DL practice, the model parameters are updated via various gradient-descent methods [42]. The "vanilla" form of gradient descent is

θ ← θ − η ∇_θ L,

where θ is the model parameter, L is the loss function, and η is the learning rate or step size of each updating step. Mini-batch stochastic gradient descent (SGD) simplifies ML by approximating the loss gradient when the dataset is large or when it is impractical to calculate the loss for the whole dataset at once. Suppose the training data include N points; then define a randomly sampled subset of points B. This is the mini-batch. Equation 3 approximates the gradient over the whole training set, (1/N) ∑_i ∇_θ L(f_θ(x_i), y_i), with a loss gradient calculated for a subset of the training set, the mini-batch:

g_B = (1/|B|) ∑_{i∈B} ∇_θ L(f_θ(x_i), y_i),    (3)

where B is the mini-batch set randomly sampled from the complete set of inputs and associated ground-truth labels. This batch gradient is used in the step update rule instead of the total loss gradient: θ ← θ − η g_B. The batch gradient is recalculated N/|B| times per epoch, and the model parameters are updated for each gradient batch. However, this vanilla form does not always work. For example, it may easily get stuck in local optima [42], or it can make the model difficult to train or converge. There are several gradient-descent variants that are successfully applied in DL [42][43][44].
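To make the mini-batch procedure concrete, the following is a minimal PyTorch sketch of cross-entropy training with vanilla SGD updates. The linear model, random data, and the specific batch size are placeholders for illustration, not the classifiers used later in the paper.

```python
import torch
import torch.nn as nn

# Toy stand-ins: a linear classifier and random 2D data with two classes.
model = nn.Linear(2, 2)
X = torch.randn(128, 2)
y = torch.randint(0, 2, (128,))

loss_fn = nn.CrossEntropyLoss()      # cross-entropy, as in the loss above
eta, batch_size = 0.05, 32           # learning rate and mini-batch size

for epoch in range(5):
    perm = torch.randperm(len(X))    # randomly sample mini-batches B
    for start in range(0, len(X), batch_size):
        idx = perm[start:start + batch_size]
        loss = loss_fn(model(X[idx]), y[idx])

        model.zero_grad()
        loss.backward()              # g_B, the mini-batch gradient of Eq. (3)
        with torch.no_grad():
            for p in model.parameters():
                p -= eta * p.grad    # vanilla update: theta <- theta - eta * g_B
```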
Based on previous works [21, 30], we use the RMSProp optimizer to optimize our hybrid quantum-classical model. RMSProp [43] is a special kind of gradient-descent method with an adaptive learning rate that updates the parameters θ as

E[g²]_t = α E[g²]_{t−1} + (1 − α) g_t²,
θ_{t+1} = θ_t − η g_t / √(E[g²]_t + ε),

where g_t is the gradient at step t and E[g²]_t is the weighted moving average of the squared gradient, with E[g²]_{t=0} = g_0². In this paper, the hyperparameters are set for all experiments as follows: learning rate η = 0.05, smoothing constant α = 0.9, and ε = 10⁻⁸.

Because of the power of superposition and entanglement generated by quantum gates, quantum computing can create a huge speedup in certain difficult computational tasks and afford quantum advantages to ML [45, 46]. A qubit is the basic unit of quantum information processing and can consist of any two-state system, e.g., the spin of an electron or the polarization of a photon. Such a state is written as |ψ⟩ = α|1⟩ + β|0⟩, where the probabilities of measuring |1⟩ and |0⟩ are |α|² and |β|², respectively. Because all classical operations can be considered a set of reversible logical operations, analogous quantum operations can be formalized. These operators are unitary and can be thought of as successive rotations, such that the logic operators are equivalent to quantum rotations. The basic components of quantum rotations are the Pauli matrices

σ_x = [[0, 1], [1, 0]],  σ_y = [[0, −i], [i, 0]],  σ_z = [[1, 0], [0, −1]].

With the Pauli matrices, we can define the single-qubit rotations along the X-, Y-, and Z-axes as

R_x(θ) = e^{−iθσ_x/2},  R_y(θ) = e^{−iθσ_y/2},  R_z(θ) = e^{−iθσ_z/2}.

The general single-qubit rotation can be constructed from two of the single-qubit rotations R_x, R_y, and R_z. For example, the quantum NOT gate, also known as the "Pauli-X gate," corresponds to a π rotation about the X-axis [47]. The true power of quantum computing stems from quantum entanglement, which can be achieved by using two-qubit quantum gates. The controlled-NOT (CNOT) gate, shown in Eq. 9, is a gate commonly used to entangle qubits:

CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]].    (9)

It reverses the state of the second qubit if the first qubit (the control qubit) is in the |1⟩ state. Its operation can be described as follows, where |Ψ⟩ is a single-qubit state: if |Ψ⟩ is in the state α|0⟩ + β|1⟩, which means the system is in |Ψ⟩ ⊗ |0⟩, then under the CNOT operation the state becomes α|00⟩ + β|11⟩, an entangled state. The set of CNOT and single-qubit rotation operators allows for a rich group of quantum algorithms that already have been shown to be faster than their classical counterparts, for example, in factorization problems [48] and database searching [49]. The quantum algorithm output is the observation of the final quantum state. On a real quantum computing device, the expectation values can be retrieved through repeated measurements (shots). In simulation, the expectation values ⟨0| U_0† U_1† ⋯ U_n† B̂ U_n ⋯ U_1 U_0 |0⟩ of an observable B̂ can be calculated analytically. For a more detailed review of quantum computing, measurements, and algorithms, refer to [47, 50, 51]. In recent years, quantum computing has become feasible due to many breakthroughs in condensed matter physics and engineering. Companies such as IBM [52], Google [7], and D-Wave [53] are creating NISQ devices [6]. However, noise limits the reliability and scale at which quantum circuits can be used. For example, quantum algorithms requiring large numbers of qubits or large circuit depth cannot be faithfully implemented on these NISQ devices. Because current cloud-based quantum devices are not suitable for the training described in this research, quantum circuit simulators are used [54].
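The entangling action of the CNOT gate described above can be checked with a few lines of NumPy. This is a small illustrative sketch; the amplitudes 0.6 and 0.8 are arbitrary values chosen only so that |α|² + |β|² = 1.

```python
import numpy as np

# Single-qubit state |Psi> = alpha|0> + beta|1> and an ancilla qubit in |0>.
alpha, beta = 0.6, 0.8
psi = np.array([alpha, beta])          # amplitudes of |0> and |1>
zero = np.array([1.0, 0.0])

# Two-qubit product state |Psi> (x) |0> in the basis {|00>, |01>, |10>, |11>}.
state = np.kron(psi, zero)

# CNOT of Eq. (9): flip the second qubit when the first (control) qubit is |1>.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])

entangled = CNOT @ state
print(entangled)                        # [0.6, 0.0, 0.0, 0.8] = alpha|00> + beta|11>
```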
VQCs are a special kind of quantum circuit, equipped with tunable or learnable parameters that are subject to iterative optimization [11, 12]. Figure 1 presents the basic components of a VQC. VQCs potentially can be robust against device noise as they can absorb the noise effects into their parameters during the optimization process [9, 10]. Numerous efforts have been made to design quantum algorithms based on VQCs [9, 10], including the calculation of chemical ground states [8] and optimization problems [55]. Several theoretical studies have shown that VQCs are more capable than conventional deep neural networks [56][57][58][59] in terms of the number of parameters or convergence speed. Recent results have numerically demonstrated that certain quantum architectures can perform better than their classical counterparts under specific conditions. For example, quantum convolutional neural networks (QCNNs) can learn faster (with fewer training epochs) than classical CNNs and reach higher accuracies, even when the number of parameters is similar [60, 61]. In [21], a demonstration shows that a quantum long short-term memory (LSTM) can learn much faster (i.e., reach comparable accuracies with fewer training epochs) than a classical LSTM in function approximation tasks when the number of parameters is similar. VQCs have been applied in several classic ML tasks, such as classification [11-20, 60-62], function approximation [11, 21], solving differential equations [22], sequential learning [21, 33, 63], and generative modeling [23-26]. Recent results have demonstrated the successful application of VQCs at the forefront of ML, for example, in metric learning [28, 29], deep reinforcement learning [30-32, 64], and speech recognition [34].

Many technology companies collect data about the online presence of their users, and these data are shared, sometimes publicly, to use in focused marketing. This can create a breach in privacy because anonymizing data requires more than just erasing the name from each data entry [65]. Privacy also can be breached by ML models that use crowdsourced information and data scraped from the Internet. Previous studies have shown that models memorize their training samples, and even models with millions of parameters can be attacked to output memorized data [37].

FIG. 1: Basic components of a VQC. The gates R_y(arctan(x_i)) and R_z(arctan(x_i²)) represent rotations along the y- and z-axes by the angles arctan(x_i) and arctan(x_i²), respectively. Arctan is used because the input values are not in the interval [−1, 1]. The CNOT gates are used to entangle quantum states from each qubit, and R(α, β, γ) represents the general single-qubit unitary gate with three parameters. The gates labeled R_y(arctan(x_i)) and R_z(arctan(x_i²)) are for state preparation and are not subject to iterative optimization. Parameters labeled α_i, β_i, and γ_i are optimized iteratively. The dashed box denotes one layer of a quantum subcircuit. The dial to the far right represents that the circuit has one output, which is the σ_z measurement of the first qubit.

FIG. 2: Information in data under the view of differential privacy. In a DP context, general information is that of the entire population in the data. On the other hand, private information is specific to a particular data entry.
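A circuit of the kind described in the Figure 1 caption can be sketched in a few lines of PennyLane. This is a minimal, illustrative version; the number of qubits, number of layers, weight initialization, and the default.qubit simulator are assumptions made for the sketch rather than the exact configuration used in the experiments.

```python
import pennylane as qml
import numpy as np

n_qubits, n_layers = 2, 2
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def vqc(x, weights):
    # State preparation (variational encoding): fixed, not trained.
    for i in range(n_qubits):
        qml.RY(np.arctan(x[i]), wires=i)
        qml.RZ(np.arctan(x[i] ** 2), wires=i)
    # Repeated layer: CNOT entanglement + general single-qubit rotations.
    for layer in range(n_layers):
        for i in range(n_qubits - 1):
            qml.CNOT(wires=[i, i + 1])
        for i in range(n_qubits):
            alpha, beta, gamma = weights[layer, i]
            qml.Rot(alpha, beta, gamma, wires=i)
    # Single output: sigma_z measurement of the first qubit.
    return qml.expval(qml.PauliZ(0))

weights = 0.01 * np.random.randn(n_layers, n_qubits, 3)
print(vqc(np.array([0.3, -1.2]), weights))
```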
Section I detailed the necessity of protecting information through privacy-preserving training algorithms. In other words, anonymizing data requires more than just censoring personally identifiable information (PII) from each data entry [65]. The solution requires using DP to curtail privacy leaks. DP is a powerful framework to restrict the information that adversaries can obtain from attacking a trained ML model, but it is not an all-powerful technique. There are two kinds of information under the perspective of DP: general information and private information. General information refers to information that does not specify any particular data entry and can be seen as a general property of the underlying population. On the other hand, private information refers to information that is specific to an individual data entry (Figure 2). For a concrete example [65], consider a study about smokers. An adversary may still learn information from the trained model, e.g., a differentially private query could show that smoking correlates with lung cancer, yet it is impossible to deduce whether or not a specific person is involved in the study. This is known as general information. It remains possible to deduce that an individual smoker is likely to have lung cancer, but this deduction is not due to her/his presence in the study. DP does protect an individual's private information. The power of DP is that deductions about an individual cannot be influenced by the fact that the person did or did not participate in the study [65].

DP seeks to create a randomized machine, characterized by the hyperparameters ε and δ, which gives roughly the same output for two similar datasets. In the context of ML, the output here is the trained model. This means an adversary cannot deduce the dataset from the output even with auxiliary information or infinite computing resources. Figure 3 illustrates the concept of DP by comparing the output between two datasets, where one individual, X, opts out of the dataset. Changing the input means the output could be very different, but DP ensures that the output distributions differ by, at most, a factor of e^ε (up to the slack δ). In other words, DP combats extraction attacks by having the output be just as likely produced from a model trained with or without a given training point [66].

In DP, we are interested in mechanisms M, which are randomized algorithms. Suppose x_i is the number of elements of type i ∈ X, so that a database is represented as a vector x ∈ N^{|X|}. Additionally, there is an ℓ₁-norm defined, such that ‖x − y‖₁ ≤ 1 represents the fact that x and y are neighboring databases, i.e., they differ by up to one record [65]. Rigorously, the definition of DP is [65]: a randomized algorithm M with domain N^{|X|} is (ε, δ)-differentially private if, for all S ⊆ range(M) and for all x, y ∈ N^{|X|} such that ‖x − y‖₁ ≤ 1,

Pr[M(x) ∈ S] ≤ e^ε Pr[M(y) ∈ S] + δ,

where
• M = the randomized algorithm.
• N^{|X|} = the set of records; the union of the input and label sets in the ML context.
• S = a subset of the outputs of the randomized algorithm; some subset of all possible model configurations or parameters in the ML context.
• x = set of records used for model training.
• y = another set of records for model training, neighboring x.
• ε = privacy loss for the randomized algorithm.
• δ = cutoff on DP; the percentage chance that the model does not preserve privacy.

(ε, δ)-DP is a relaxation of ε-DP because there is a chance δ that the privacy is broken. DP gives the worst-case scenario privacy loss, so a smaller epsilon does not necessarily mean the privacy is better. However, the additional noise typically means that accuracy is worse.
An important characteristic in determining the effectiveness of a differentially private algorithm is the privacy loss. The privacy loss is defined for a given observation ξ ∈ range(M) and quantifies how likely ξ is to be observed from M(x) versus M(y) [66]:

L^{(ξ)} = ln( Pr[M(x) = ξ] / Pr[M(y) = ξ] ).

Combining these two equations shows that a (ε, δ)-differentially private algorithm has a privacy budget of ε. For ML, we can interpret the randomized algorithm M : A → B as a training algorithm with a training set x ∈ A, which produces a model b ∈ B [39, 67]. The definition of DP implies that two training sets that differ only by the omission of a record should be (almost) equally likely to output a given model, i.e., the set of parameters completely describing the model.

The most basic technique to ensure DP is the Gaussian mechanism, as defined in [39, 65, 68]. Every deterministic function f(d) has a defined sensitivity S_f = max ‖f(d) − f(d′)‖₂, given that d, d′ are adjacent databases. The Gaussian mechanism, which releases f(d) + N(0, S_f² σ²), is then (ε, δ)-differentially private for a sufficiently large noise multiplier σ (e.g., σ ≥ √(2 ln(1.25/δ)) / ε for ε < 1 [65]). There is an infinite number of pairs (ε, δ) that can be defined for a given noise multiplier σ, although usually, as in [67], δ will be defined as a constant. Likewise, for ML, the most important techniques for creating DP are to add Gaussian noise, as well as to clip the loss gradients [39]. The gradient clip reduces the effect any single data entry can have on the model training, making membership inference difficult. The hyperparameters associated with these operations are the noise multiplier, σ, and a cutoff for the ℓ₂ norm, S [39, 67]. After calculating the gradients, if the batch gradient has an ℓ₂ norm greater than the cutoff, it is scaled down to have a norm equal to the cutoff. After clipping the gradient, the gradient for the mini-batch has Gaussian noise added with a standard deviation equal to the ℓ₂ norm cutoff multiplied by the noise factor, σ. As in Equation 13, a relationship between ε, δ, σ, and S exists, but its calculation is beyond the scope of this review. More information about the privacy loss calculator can be found in Appendix A 1. This modification to the optimizer algorithm can be applied to any ML algorithm (SGD, Adam, RMSProp, etc.). The DP-SGD algorithm is based on the techniques from [39, 68]. Details specific to the software package used in this study to implement DP, PyVacy, are available in Appendix A 2.

In this work, we propose a hybrid quantum-classical framework interfacing differentially private classical optimization algorithms with VQC-based QML algorithms. In a hybrid quantum-classical model architecture, the quantum circuits are used to generate the output, mostly in the form of quantum measurements. The measured expectation values then can be used to evaluate the loss function on a classical computer, which in turn is used to evaluate the model's performance and adjust the circuit parameters. The updated circuit parameters are then fed back to the quantum computer. This iterative process gradually trains the quantum circuit to achieve the desired results. The DP training in such a hybrid quantum architecture exists in the gradient calculation process, which is on the classical computer. Figure 5 presents the proposed scheme.

FIG. 5: The proposed differentially private hybrid training scheme, showing the circuit output f(x; θ), its gradient ∇_θ f(x; θ), the privatized gradient ∇_θ^DP f(x; θ), and the update of the parameters θ.

A quantum circuit operates on the quantum state. To make QML useful, the first step is to encode the classical data into a quantum state.
Amplitude encoding is a technique to encode a classical vector (α_0, …, α_{2^n−1}) into an n-qubit quantum state |Ψ⟩ = α_0 |00⋯0⟩ + ⋯ + α_{2^n−1} |11⋯1⟩. The advantage of using this encoding method is that it is possible to significantly reduce the number of qubits and, potentially, the number of parameters of the quantum circuit. An N-dimensional input vector requires only log₂ N qubits to encode. Refer to [69, 70] for details regarding this encoding procedure.

In variational encoding, the input values are used as quantum rotation angles. A single-qubit rotation along the j-axis by angle α is given by

R_j(α) = e^{−i α σ_j / 2} = cos(α/2) I − i sin(α/2) σ_j,

where I is the identity matrix and σ_j is the Pauli matrix with j = x, y, z. In this work, given an input vector x with N dimensions, we rotate each qubit by R_i(x), i ∈ [0, N):

R_i(x) = R_z(arctan(x_i²)) R_y(arctan(x_i)).

Each single-qubit state is initialized by a rotation about the y-axis and then one about the z-axis. This allows our inputs, x ∈ X, to be encoded into a quantum state of N qubits. Figure 1 depicts this particular encoding scheme. For a detailed review of different quantum encoding schemes, refer to [69].

Modern DL practices depend heavily on gradient-based optimization methods. Classically, the gradients of DL models are calculated using backpropagation methods [71]. In QML, the corresponding method is the parameter-shift rule, which can calculate the analytical gradients of quantum models [11, 54]. For the parameter-shift rule, knowledge of certain observables is given. A VQC's output can be modeled as a function f(x; θ) of its parameters θ. Then, in most cases, the partial derivative of the VQC, ∇_θ f(x; θ), can be evaluated with the same quantum circuit, only with the parameters shifted [11]. We illustrate the procedure as follows: consider a quantum circuit with a parameter θ whose output can be modeled as the expectation value of some observable B̂ for a prepared state |ψ⟩ = U(θ) U_0(x) |0⟩,

f(x; θ) = ⟨x| M_θ(B̂) |x⟩,    (17a)

where |x⟩ = U_0(x) |0⟩ and M_θ(B̂) = U†(θ) B̂ U(θ). It can be shown that a finite shift s exists such that Equation 17b stands [11]:

∂f(x; θ)/∂θ = [ f(x; θ + s) − f(x; θ − s) ] / (2 sin s).    (17b)

This implies that the quantum circuit can be shifted to allow for a calculation of the quantum gradient with the same circuit.

Now that DP and our VQC architecture are introduced, we unveil our differentially private optimization algorithm, the first of its kind to ensure privacy-preserving QML. Our differentially private optimization framework starts by calculating the quantum gradient using the parameter-shift rule. Next, we apply Gaussian noise and clipping mechanisms to this gradient, ∇_θ f(x; θ). The differentially private gradient, ∇_θ^DP f(x; θ), is then used in the parameter update step instead of the non-private gradient. This parameter update rule can be SGD, adaptive momentum, or RMSProp. In this study, we solely use RMSProp to update parameters. The privatized gradient takes the form

∇_θ^DP f(x; θ) = (1/|B|) [ ∑_{i∈B} clip_S( ∇_θ ⟨x_i| M_θ(B̂) |x_i⟩ ) + N(0, S² σ²) ],

where M_θ(B̂) is defined in Equation 17a, clip_S(·) rescales its argument to an ℓ₂ norm of at most S, and S, σ are the hyperparameters implicitly defining the level of privacy (ε, δ). This novel framework seamlessly incorporates privacy-preserving algorithms into the training of a VQC, ensuring (ε, δ)-differential privacy. In this work, we choose the standard classification task to demonstrate the proof-of-concept result. However, the proposed framework is rather generic and can be applied to any hybrid quantum-classical ML scenario. To demonstrate the hypothesized quantum advantage, this study compares differentially private VQCs (DP-VQCs) to non-private VQCs, as well as private and non-private neural networks.
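The pieces above, parameter-shift gradients followed by clipping and Gaussian noise, can be put together in a compact sketch. The toy circuit, the shift s = π/2, and the clip/noise values below are illustrative assumptions, not the authors' exact implementation; for simplicity, the gradient is taken of the circuit output itself rather than of a loss.

```python
import pennylane as qml
import numpy as np

dev = qml.device("default.qubit", wires=2)

@qml.qnode(dev)
def circuit(x, theta):
    qml.RY(np.arctan(x[0]), wires=0)
    qml.RY(np.arctan(x[1]), wires=1)
    qml.CNOT(wires=[0, 1])
    qml.RY(theta[0], wires=0)
    qml.RY(theta[1], wires=1)
    return qml.expval(qml.PauliZ(0))

def parameter_shift_grad(x, theta, s=np.pi / 2):
    # Eq. (17b): df/dtheta_i = [f(theta_i + s) - f(theta_i - s)] / (2 sin s)
    grad = np.zeros_like(theta)
    for i in range(len(theta)):
        plus, minus = theta.copy(), theta.copy()
        plus[i] += s
        minus[i] -= s
        grad[i] = (circuit(x, plus) - circuit(x, minus)) / (2 * np.sin(s))
    return grad

def dp_update(theta, batch, clip=1.0, sigma=1.0, eta=0.05):
    # Clip each per-example quantum gradient, sum, add Gaussian noise, average, step.
    grads = []
    for x in batch:
        g = parameter_shift_grad(x, theta)
        grads.append(g * min(1.0, clip / (np.linalg.norm(g) + 1e-6)))
    g_dp = (np.sum(grads, axis=0)
            + np.random.normal(0.0, clip * sigma, theta.shape)) / len(batch)
    return theta - eta * g_dp

theta = 0.01 * np.random.randn(2)
batch = [np.random.randn(2) for _ in range(8)]
theta = dp_update(theta, batch)
```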
We also illustrate the efficacy of our differentially private QML framework. Two different types of classification are investigated as benchmarks: 1) labeling points in a 2D plane and 2) a binary classification on the MNIST dataset, differentiating between the '0' and '1' digits. The 2D datasets are standard benchmarks from scikit-learn [72] that are useful in QML because the inputs are low dimensional and thus easy to simulate on classical computers [46]. Meanwhile, the MNIST dataset is used to study the performance of the proposed model with larger dimensional inputs. We implement the model with several open-source software packages. The high-level quantum algorithms are implemented with PennyLane [73]. The quantum simulation backend is Qulacs [74], which is a high-performance choice when the number of qubits is large. The hybrid quantum-classical model is built with the PyTorch interface [75]. For differentially private optimization, we employ the PyVacy package [76]. An ε is calculated from the DP hyperparameters S and σ. Because all tasks are classifications, cross-entropy is used as the loss function for all training. According to [65, 66], the probability of breaking ε-DP should be δ ∼ O(1/n) for n samples. A δ larger than 1/n always will be able to satisfy DP simply by releasing nδ complete records. Therefore, ε is determined by the hyperparameter choice, and δ is set to 10⁻⁵ for the entire study. As part of the investigation into differentially private QML, classical and quantum classifiers are compared. For both the MNIST and 2D classifiers, the quantum circuit has two modules that contain the parameters for the unitary transforms comprising the two quantum subcircuits.

Exp         | LR   | Mom. | Batch | ℓ2 Clip | Noise Mult. | # of Iter. | δ
non-private | 0.05 | 0.5  | 32    | n/a     | n/a         | n/a        | n/a
DP          | 0.05 | 0.5  | 32    | 1.0     | varies      | 5          | 10⁻⁵

FIG. 6: First quantum circuit block for 2D classification. The single-qubit gates R_y(arctan(x_i)) and R_z(arctan(x_i²)) represent rotations along the y-axis and z-axis by the angles arctan(x_i) and arctan(x_i²), respectively. The state is prepared with variational encoding. The dashed box denotes one layer of a quantum subcircuit that is repeated twice. At the end of this circuit, two qubits are measured, and the Z expectation values are calculated. The output from this circuit is a 2D vector.

Three 2D classification datasets from scikit-learn are considered. The generated datasets are divided into training, validation, and testing sets with 60%, 20%, and 20% proportions, respectively. Different datasets are used because the decision boundary between the two classes is increasingly nonlinear and more difficult to classify. Thus, they make good benchmarks for DP training. The leftmost plots of Figure 11 display the input sets, which are named "blobs," "moons," and "circles" based on the shapes they form. The more transparent points are those not part of the training set but instead used for testing the model's performance.

FIG. 7: Second quantum circuit block for 2D classification. The gates labeled R_y(arctan(x_i)) and R_z(arctan(x_i²)) are for state preparation. x_1 and x_2 are the outputs of the first circuit block. The dashed box denotes one block of a quantum circuit that is repeated twice. At the end of this circuit, two qubits are measured, and the Z expectation values are calculated. The output from this circuit is a 2D vector.
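The three 2D benchmarks and the 60/20/20 split can be generated directly with scikit-learn. The sample counts, noise levels, and random seeds below are illustrative defaults rather than the values used in the paper.

```python
from sklearn.datasets import make_blobs, make_moons, make_circles
from sklearn.model_selection import train_test_split

def make_splits(X, y, seed=0):
    # 60% train, 20% validation, 20% test.
    X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=seed)
    X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=seed)
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)

datasets = {
    "blobs":   make_blobs(n_samples=500, centers=2, random_state=0),
    "moons":   make_moons(n_samples=500, noise=0.1, random_state=0),
    "circles": make_circles(n_samples=500, noise=0.1, factor=0.5, random_state=0),
}
splits = {name: make_splits(X, y) for name, (X, y) in datasets.items()}
```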
In the context of cross-entropy loss, the outputs are interpreted as the probability that the 2D point belongs to class one or two, respectively. As a baseline for the study, the classical neural network is trained both with (1.628, 10⁻⁵)-DP and without privacy. As detailed in Table II, the classical and quantum classifiers are almost equally successful for the blobs and moons sets. On the other hand, Figures 10 and 9 demonstrate that the DP-VQC affords superior performance for the circles set, as the quantum classifier's accuracy is between 13% and 17% higher than that of the DP-neural network. The last two columns of Figure 11 depict the decision boundary and accuracy of privacy-preserving (0.681, 10⁻⁵)-differentially private classical neural networks and VQCs. The comparison of Figure 8 and Figure 9 demonstrates that the neural network's efficiency under DP training differs for different datasets. For the moons input set, the accuracy degradation from DP is somewhat significant at 10%. Yet with the circles set, the accuracy decreases by 40%, and the final loss is double that of the non-private loss. Figures 9 and 10 illustrate that the VQC converges faster than the classical classifier, implying a potential quantum advantage over a classical neural network.

The MNIST classification task is prepared similarly to the 2D classification problem. Because of the computational complexity of simulating large quantum systems, the problem is reduced to a binary classification distinguishing the handwritten digits '0' and '1.' The digits are grayscale images with a total of 784 pixels. The variational quantum classifier uses amplitude loading (described in Section III A 1) to compress the number of inputs to fit within 10 qubits. Therefore, the 784 inputs are padded with additional zeros to make the inputs 1024 dimensional. Next, amplitude loading transforms the 1024 pixels into a 10-qubit quantum state for operating the variational quantum classifier. The neural network uses the same padded 1024 pixels as an input. The hidden layer has only one node, and the output layer has two nodes. Hence, the classical model has 1029 parameters divided between the two weight matrices and biases. The design for this classical benchmark aims to limit the number of parameters for a fair comparison to the quantum model. The quantum classifier has two quantum subcircuits. The first has 10 inputs, eight layers of unitary transforms, and four outputs (Figure 12). Each qubit has a tunable unitary transform per layer, so there are 8 × 10 × 3 = 240 parameters in the first subcircuit. The second subcircuit has four inputs, two outputs, and four layers (Figure 13), so it has 4 × 4 × 3 = 48 tunable parameters associated with the rotations of quantum bits. Consequently, the VQC has 288 parameters. Importantly, this represents roughly only a quarter (27.99% exactly) of the number of parameters associated with the analogous classical neural network used for the same classification task. The MNIST results are summarized in Table III. Multiple levels of privacy are created by iterating the noise multiplier from 1.0 to 5.0. The privacy budget for such noise is between 1.73 and 0.07, respectively. Figure 14 and Table III exemplify that the accuracies of both neural networks and VQCs decrease as ε decreases. This emphasizes the trade-off between utility and privacy in differentially private algorithms.

FIG. 11: The solid points are those used in training, and the transparent ones are part of the testing set. Total accuracy after 30 epochs is displayed on the lower right of each plot.
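As a pointer to how the pad-then-amplitude-load step described above could look in code, here is a minimal PennyLane sketch using the AmplitudeEmbedding template. The padding and normalization keyword arguments are assumptions about a reasonable implementation, not the authors' code, and the random image is a stand-in for a flattened MNIST digit.

```python
import pennylane as qml
import numpy as np

n_qubits = 10                       # 2**10 = 1024 amplitudes
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def encode(image):
    # Pad the 784-pixel image to 1024 values and load it as state amplitudes.
    qml.AmplitudeEmbedding(image, wires=range(n_qubits), pad_with=0.0, normalize=True)
    return qml.probs(wires=range(n_qubits))

image = np.random.rand(784)         # stand-in for a flattened MNIST digit
probs = encode(image)               # 1024 probabilities, one per basis state
```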
Differentially private training is becoming more critical because larger models have been shown to memorize more data, e.g., language models [37]. One of the latest state-of-the-art language models, GPT-2, has 1.5 billion parameters and was found to memorize 18 times more information when compared to a 124 million parameter language model. The aforementioned study demonstrates that training data extraction attacks are practical. This necessitates an implementation of a privacy-preserving algorithm, i.e., DP, to curtail memorization and data extraction attacks. This study has presented the implementation and successful proof-of-concept application of DP to QML models. The application can be extended to a myriad of applications that require privacy-preserving learning and the power advantage stemming from QML. One potential application is facial recognition. These models must train on thousands of faces whose identities are not protected. Therefore, this area would intrinsically benefit from DP, and QML could create even more accurate predictions [17]. QCNN would be another logical application of a private QML algorithm, as QCNNs already are being investigated with the MNIST and other benchmarks [61, 62, 77]. Our results show that the private VQC distinguishes between the '0' and '1' digits with an accuracy exceeding 90%. As such, it is expected that a privacy-preserving framework would benefit these application scenarios, including QCNN. With recent QML developments impacting a spectrum of applications, such as speech recognition [34], quantum recurrent neural networks (QRNNs) and quantum LSTMs for sequential learning [21, 33, 63], and even certain emerging applications in medical imaging [78], we expect the framework described in this work would benefit these new scenarios as well. An important point to consider is the limitation of extending these results to real-world quantum devices. High-dimensional input, such as MNIST, takes an extremely long time to run on a cloud-based quantum computer. However, it is possible to run. One limitation of this study is that the VQCs are simulated with noise-free quantum computers. A future study could investigate the results of running privacy-preserving quantum optimization on a noisy simulator or a cloud-based quantum computer.

This research seeks to show that a differentially private variational quantum classifier can be trained to identify the decision boundary between two classes. Figure 11 shows that the given hyperparameters achieve nearly perfect classification. After 30 epochs, both the quantum and classical classifiers achieve accuracies greater than 95% for data organized into blobs and concentric circles. On the other hand, the classical network achieves 99% accuracy for the moons classification, while the moons dataset proved to be the most difficult input for the quantum classifier to classify, achieving merely 86% accuracy. It may be conjectured that the VQC had difficulties in learning the highly convex decision boundary necessary for the moons input set. In spite of that, the VQC generally trains just as well as a classical neural network with only 66% of the total parameters.
While DP training usually causes models to fail to capture the long tail of a data distribution, the DP-QML training is just as successful as the non-private algorithm for the blobs and moons datasets, where only a modest accuracy penalty occurs. While the accuracy of the private training for the circles classification is much lower than its non-private counterpart, the DP-VQC still is much more successful at the task than the classical differentially private neural network. Our study demonstrates that a quantum advantage can offset the usual compromise between privacy and accuracy seen in other DP applications [39, 67]. The MNIST binary classification problem creates an even more compelling case for the QML algorithm being advantageous compared to a classical ML algorithm. Figure 14 demonstrates that a privacy-preserving variational quantum classifier can learn to distinguish between the handwritten digits '0' and '1' from the MNIST dataset to an accuracy of nearly 100%. The same figure shows that a classical neural network also can accomplish the task. The quantum advantage arises because the quantum network has only 288 parameters compared to the 1029 parameters characterizing the classical neural network. Furthermore, the differentially private VQC attains better accuracy than the classical neural network for εs between 0.4 and 1.4 (shown in Table III). This range of ε is sufficiently small for differentially private techniques to attain good privacy as defined in [39]. This work mainly focuses on the numerical demonstration of potential quantum advantages, leaving the theoretical investigation for future work. Overall, the QML algorithm attains the same accuracy in the MNIST classification task as the classical ML algorithm with only 28% of the number of parameters, making it more parameter-efficient than the classical ML algorithm.

In this work, a QML algorithm in a differentially private framework is developed, and the quantum advantage is maintained when the ML algorithm is improved to preserve privacy. This research also shows that VQCs maintain their quantum advantage under DP in the classification of the handwritten digits '0' and '1' and in 2D nonlinear classifications with careful selection of hyperparameters. This novel framework combines differentially private optimization with QML. Including DP in the algorithm ensures privacy-preserving learning. We also demonstrate a capacity for high-fidelity privacy and high accuracy in variational quantum classifiers with two different benchmarks. Notably, we show the superior performance, in terms of convergence, of differentially private QML over classical DP-ML. These results indicate the potential benefits quantum computing will bring to privacy-preserving data analytics.

The package used in this research, PyVacy, implements a privacy loss calculation based on the TensorFlow privacy calculator [76]. First, the Renyi differential privacy (RDP) epsilon and order are calculated. Then, the RDP loss is determined. Usually, this is more feasible because a composition property holds for (α, ε)-RDP (shown in Proposition 1 of [79]). Thus, the Renyi divergences can be exactly added together given the same order. At the end of the calculation, the (α, ε)-RDP is converted into the (ε, δ)-differential privacy loss. (α, ε)-RDP is defined by the Renyi divergence [79]: a mechanism M is (α, ε)-RDP if, for all neighboring databases x and y,

D_α( M(x) ‖ M(y) ) = (1 / (α − 1)) log E_{z∼M(y)} [ ( Pr[M(x) = z] / Pr[M(y) = z] )^α ] ≤ ε.

The weights for the classical neural networks are initialized uniformly between −1/√7 and +1/√7 for the first matrix and between −1/√2 and +1/√2 for the second matrix.
For the variational quantum classifiers, the angle parameters of the unitary transforms are sampled from a normal distribution scaled by 0.01. A micro-batch of size m and a mini-batch of size n are defined, where n/m micro-batches make up a mini-batch. The loss and its gradient are calculated for each micro-batch, and a total norm of the gradients is calculated. The micro-batch gradients are scaled and summed together with additional noise. This accumulated gradient then serves as the effective gradient for the mini-batch. This creates DP by limiting the effect each training point has on the batch gradient.

The Python package PyVacy is used to add DP to the RMSProp optimizer employed in this study. The DP optimizer crucially overrides the step() method and defines a micro-batch step method. Normally, the loss and its gradients are calculated for each mini-batch. In PyVacy, each mini-batch is split into micro-batches. In each micro-batch, the loss and the loss gradient are calculated. Then, the micro-batch step function is called, clipping the gradients. After all the micro-batches, the step function is called to add Gaussian noise and update the parameters according to these altered gradients [76]. The micro-batch step takes the parameter gradients for each micro-batch and calculates the effective mini-batch gradients. First, the total norm of the parameter gradients, N, is calculated. Then, the scaled micro-batch gradients are added to a new parameter, called the accumulated gradient. The gradients are scaled by a coefficient that scales them down to a total norm equal to the norm cutoff S, or leaves them unchanged if no clipping is needed: c = min( S / (N + 10⁻⁶), 1 ). The accumulated gradients add the micro-batch gradients together to create a new effective gradient for the mini-batch. This effective gradient is scaled so that the loss gradients calculated from a given micro-batch do not have overly large norms. This creates DP by limiting the effect each training point has on the batch gradient [39]. In the overridden step method, the accumulated gradients have Gaussian noise added. The Gaussian noise is proportional to the norm cutoff and noise multiplier, S and z, respectively. The accumulated gradient then is scaled by the ratio of micro-batch to mini-batch sizes, and the micro-batch size usually is set to 1. This has the effect of using the accumulated gradients in place of the original parameter gradients in the step update rule.
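The accumulate-clip-then-noise procedure described above can be rendered schematically in PyTorch. This is a sketch that mimics the logic of the micro-batch accumulation, not PyVacy's actual code; the gradient shapes and hyperparameter values are illustrative.

```python
import torch

def dp_minibatch_gradient(microbatch_grads, S=1.0, z=1.0, minibatch_size=32):
    """Accumulate clipped micro-batch gradients, then add Gaussian noise.

    `microbatch_grads` is a list of flattened gradients, one per micro-batch
    (micro-batch size 1 in the usual setting). S is the norm cutoff and z the
    noise multiplier, as in the text."""
    accumulated = torch.zeros_like(microbatch_grads[0])
    for g in microbatch_grads:
        N = g.norm(p=2)
        c = torch.clamp(S / (N + 1e-6), max=1.0)   # clipping coefficient
        accumulated += c * g                       # accumulate scaled micro-batch gradient
    accumulated += torch.normal(0.0, S * z, size=accumulated.shape)
    return accumulated / minibatch_size            # effective mini-batch gradient

grads = [torch.randn(50) for _ in range(32)]       # toy per-micro-batch gradients
g_eff = dp_minibatch_gradient(grads)
# g_eff replaces the original parameter gradient in the optimizer's step update.
```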
[1] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[4] I. Sutskever, O. Vinyals, and Q. V. Le, "Sequence to sequence learning with neural networks," in Advances in Neural Information Processing Systems, pp. 3104-3112, 2014.
[5] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, et al., "Mastering the game of Go with deep neural networks and tree search."
"Quantum computing in the NISQ era and beyond"
"Quantum supremacy using a programmable superconducting processor"
"A variational eigenvalue solver on a photonic quantum processor"
"Variational quantum algorithms"
"Noisy intermediate-scale quantum (NISQ) algorithms"
"Quantum circuit learning"
"Circuit-centric quantum classifiers"
"Classification with quantum neural networks on near term processors"
"Parameterized quantum circuits as machine learning models"
"Transfer learning in hybrid classical-quantum neural networks"
"Classification with quantum machine learning: A survey"
"Towards building a facial identification system using quantum machine learning techniques"
"Quantum unsupervised and supervised learning on superconducting processors"
"Hybrid quantum-classical classifier based on tensor network and variational quantum circuit"
"A hybrid system for learning classical data in quantum states"
"Quantum long short-term memory"
"Solving nonlinear differential equations with differentiable quantum circuits"
"Quantum generative adversarial networks"
"QuGAN: A generative adversarial network through quantum states"
"Quantum generative adversarial networks for learning and loading random distributions"
"Quantum generative adversarial network for generating discrete data"
"Quantum semi-supervised generative adversarial network for enhanced data classification"
"Quantum embeddings for machine learning"
"A unified classification framework with quantum metric learning"
"Variational quantum circuits for deep reinforcement learning"
"Reinforcement learning with quantum variational circuit"
"Quantum enhancements for deep reinforcement learning in large spaces"
"Recurrent quantum neural networks"
"Decentralizing feature extraction with quantum convolutional neural network for automatic speech recognition"
"Label-only membership inference attacks"
"The secret sharer: Evaluating and testing unintended memorization in neural networks"
"Extracting training data from large language models"
"Model inversion attacks that exploit confidence information and basic countermeasures"
"Deep learning with differential privacy"
"Artificial intelligence: A modern approach"
"The MNIST database of handwritten digits"
"An overview of gradient descent optimization algorithms"
"Lecture 6.5 - RMSProp: Divide the gradient by a running average of its recent magnitude"
"Adam: A method for stochastic optimization"
"Quantum computation and quantum information"
"Quantum machine learning in feature Hilbert spaces"
"Quantum computing"
"Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer"
"A fast quantum mechanical algorithm for database search"
"Quantum computing: An introduction"
"Quantum computers"
"The IBM Q experience and Qiskit open-source quantum computing software"
"Efficient arbitrary simultaneously entangling gates on a trapped-ion quantum computer"
"Evaluating analytic gradients on quantum hardware"
"A quantum approximate optimization algorithm"
"Expressibility and entangling capability of parameterized quantum circuits for hybrid quantum-classical algorithms"
"Entanglement in a quantum annealing processor"
"The expressive power of parameterized quantum circuits"
"The power of quantum neural networks"
"Quantum convolutional neural networks for high energy physics data analysis"
"Hybrid quantum-classical graph convolutional network"
"Hybrid quantum-classical convolutional neural networks"
"Learning temporal data with variational quantum recurrent neural network"
"Quantum reinforcement learning in continuous action space"
"The algorithmic foundations of differential privacy"
"Differential privacy and machine learning: A survey and review"
"Differential privacy has disparate impact on model accuracy"
"A general approach to adding differential privacy to iterative training procedures"
"Information encoding"
"Transformation of quantum states using uniformly controlled rotations"
"Handwritten digit recognition with a back-propagation network"
"Scikit-learn: Machine learning in Python"
"PennyLane: Automatic differentiation of hybrid quantum-classical computations"
"Qulacs: A fast and versatile quantum circuit simulator for research purpose"
"PyTorch: An imperative style, high-performance deep learning library"
"PyVacy: Privacy algorithms for PyTorch"
"Quantum algorithms for deep convolutional neural networks"
"Hybrid quantum convolutional neural networks model for COVID-19 prediction using chest X-ray images"
"Rényi differential privacy"