Privacy-Utility Trade-Off
Hao Zhong; Kaifeng Bu
April 26, 2022

Abstract—In this paper, we investigate the privacy-utility trade-off (PUT) problem, which considers the minimal privacy loss at a fixed expense of utility. Several different kinds of privacy in the PUT problem are studied, including differential privacy, approximate differential privacy, maximal information, maximal leakage, Rényi differential privacy, Sibson mutual information and mutual information. The average Hamming distance is used to measure the distortion caused by the privacy mechanism. We consider two scenarios: global privacy and local privacy. In the global privacy framework, the privacy-distortion function is upper-bounded by the privacy loss of a special mechanism, and lower-bounded by the optimal privacy loss under any possible prior input distribution. In the local privacy framework, we generalize a coloring method for the PUT problem.

Advances in the digital technologies for producing and using data provide numerous reliable and convenient services in the era of Big Data. Along with this prosperity, security issues and challenges have led to increasing privacy concerns. Present-day examples include patient data breaches, facial recognition data abuse, and surveillance measures deployed without consent during the COVID-19 pandemic. Traditional semantic security for cryptosystems is not always achievable because of the auxiliary information available to the adversary. Hence, differential privacy was introduced by Dwork et al. [5], [6] to deal with this problem. The original differential privacy is a typical divergence-based privacy notion, since it is exactly the max-divergence between the output distributions on neighboring databases. This measure requires an upper bound on the worst-case change, which may come at a high cost in utility. As a result, several relaxations of differential privacy have been studied recently. We categorize the existing measures into two main classes: divergence-based privacy and mutual-information-based privacy. Besides, a δ-approximation can be applied to a particular privacy notion to relax the strict relative shift. The most general divergence-based privacy notion to date, Rényi differential privacy, was given in [20]. This α-Rényi-divergence-based measure keeps track of the privacy cost and allows a tighter analysis of mechanism composition [13], [21], [22]. All the divergence-based measures consider the worst-case guarantee of privacy, and thus they protect the information of every single individual. Mutual-information-based privacy concerns the expected information leakage from the input (plain) database to the output (synthetic or sanitized) database. A straightforward measure is based on Shannon's mutual information, which makes use of related results in information theory [4], [30]. Another measure, called max-information, was also given by Dwork et al. [9]; it aims at bounding the change in the conditional probability of events relative to their a priori probability. The two most common mutual-information-based privacy measures are based on Sibson's [26] and Arimoto's α-mutual information [1], respectively. Plenty of works [10], [12], [15], [19], [29], [31] based on α-mutual information have been published, concerning data processing and mechanism composition.
Notably, the case α = ∞ leads to a more recently defined privacy notion, maximal leakage, which was first proposed by Issa et al. [16]. Compared with most mutual-information metrics, it is easier to compute and analyze. In Ref. [7], Dwork et al. defined approximate (ε, δ)-differential privacy. It is a differential privacy notion with a shift tolerance, which provides more flexibility when designing mechanisms. In another paper [9], Dwork et al. unified this idea with max-information. Let D and D′ be two adjacent databases, i.e., databases differing in only one individual. It is worth noting that, for (ε, δ)-differential privacy, a mechanism Q is approximately private if and only if there exists a distribution which is statistically δ-close to Q(D) and max-divergently ε-close to Q(D′) [8]. The main goal is to find a privacy-preserving mechanism that does not destroy statistical utility, i.e., to solve the privacy-utility trade-off (PUT) problem. A PUT problem with a rate-distortion flavor is to determine the minimal compression during a valid communication [18]. As an important topic in information theory, rate distortion has been studied and applied in many fields, including complexity and learning theory. For example, the information bottleneck (IB) problem, which was initiated by Tishby [27], has attracted a lot of attention as it provides an explanation for the success of machine learning [14], [24], [25], [28]. As for the PUT problem, the global differential privacy-distortion trade-off with average Hamming distance was studied in [30]; the local differential privacy-distortion trade-off with average Hamming distance was studied in [18]; the local maximal leakage-distortion trade-off was studied in [23]; and the Arimoto mutual information-distortion trade-off with hard Hamming distance was studied in [19]. In this paper, we study the minimal privacy loss under a given Hamming distortion for seven different privacy notions: differential privacy, approximate differential privacy, maximal information, maximal leakage, Rényi differential privacy, Sibson mutual information and mutual information. Our main contributions in this work are listed as follows: 1) We obtain lower and upper bounds on the privacy-distortion functions for the privacy notions mentioned above. Moreover, when the prior input distribution is unknown, we obtain closed-form expressions for the differential privacy- and rate-distortion functions. 2) We generalize the coloring method to solve the PUT problems for the local approximate differential privacy- and maximal leakage-distortion trade-offs. The term "local" means the data server is untrusted and the individual information is calibrated before uploading. 3) If the input space consists of independently and identically distributed samples, we show analytical closed-form results for the differential privacy-, maximal leakage- and rate-distortion functions, together with lower and upper bounds for the remaining privacy-distortion functions. The remainder of this paper is organized as follows. In Section II, we introduce the basic notions and concepts. In Section III, we investigate global privacy distortion, i.e., the privacy-utility trade-off when there is a curator with access to the plain data. In Section IV, we study local privacy distortion and demonstrate our results with two figures. Moreover, we show the privacy-distortion trade-off for parallel composition. Finally, we conclude in Section V. Let X be the input data space, which is a finite set. A randomized mechanism is a conditional distribution Q_{Y|X} mapping the discrete random variable X to an output Y ∈ Y.
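To fix ideas, the following minimal Python sketch (assuming NumPy) represents a local mechanism on a finite alphabet as a row-stochastic matrix, applies it to i.i.d. inputs, and estimates the fraction of changed symbols, which is the per-symbol Hamming distortion used throughout the paper. The randomized-response form, alphabet size, privacy parameter and prior are illustrative choices for this sketch, not prescriptions from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def randomized_response(r: int, eps: float) -> np.ndarray:
    """r x r mechanism matrix: keep the true symbol with probability
    e^eps / (e^eps + r - 1), otherwise report one of the other symbols uniformly."""
    keep = np.exp(eps) / (np.exp(eps) + r - 1)
    Q = np.full((r, r), (1.0 - keep) / (r - 1))
    np.fill_diagonal(Q, keep)
    return Q                              # row x is the conditional distribution Q(. | x)

def apply_mechanism(Q: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Sample one output symbol per input symbol according to the rows of Q."""
    r = Q.shape[1]
    return np.array([rng.choice(r, p=Q[xi]) for xi in x])

# toy data: i.i.d. symbols from a prior p on {0, ..., r-1}
r, m, eps = 4, 10_000, 1.0
p = np.array([0.4, 0.3, 0.2, 0.1])
x = rng.choice(r, size=m, p=p)
y = apply_mechanism(randomized_response(r, eps), x)

# empirical per-symbol distortion, i.e. the fraction of symbols changed by the mechanism
print(np.mean(x != y))                    # about (r - 1) / (e^eps + r - 1), here roughly 0.52
```

Randomized response is used as the running example here because Section IV observes that it is an optimal choice in the local model.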
We assume that the data X is drawn from some distribution p with full support on X. For simplicity, we denote p(x) and Q_{Y|X}(y|x) as p_x and Q(y|x) for x ∈ X and y ∈ Y, respectively. In general, the prior distribution is unknown. Hence, we assume it belongs to some set P of probability distributions on X, which is called the source set and discloses the prior knowledge about the input distribution. For example, Kalantari et al. [18] divided source sets into the following three classes.

Definition 1 ([18]). A source set P is classified into three classes as follows:
Class I. Source sets P whose convex hull Conv(P) includes the uniform distribution.
Class II. Source sets P not belonging to Class I, with ordered probability values; without loss of generality, p_1 ≥ p_2 ≥ · · · ≥ p_{|X|}.
Class III. Source sets P that cannot be classified as Class I or Class II.

Note that Class I contains two special cases: (1) P contains only the uniform distribution, and (2) P contains all possible probability distributions, which means there is no auxiliary information about the input distribution. Class II contains the special case in which P consists of a single non-uniform probability distribution. The input space X can represent the attributes of a single individual or of several individuals. A straightforward way to estimate the utility of a mechanism is to calculate the distortion between X and Y, where smaller distortion corresponds to greater usefulness. This is because smaller distortion implies one can gain more information about the original data from the released one. Let Y be a synthetic database, i.e., it lies in the same universe as X. Then we define the distortion as the expected Hamming distance
E_{p,Q}[d(X, Y)] = Σ_{x∈X} Σ_{y∈Y} p_x Q(y|x) d(x, y),
where d(x, y) is the Hamming distance between x and y, i.e., the number of attributes in which they differ. Two elements x and y are called neighbors if d(x, y) = 1. In this paper, we focus on the accuracy-preserving mechanisms that do not distort the data above some threshold D.

Definition 2. Let n be the number of attributes of each element of X. A mechanism Q is (p, D)-valid if E_{p,Q}[d(X, Y)] ≤ nD, where E_{p,Q} denotes the average over the randomness of the input and the mechanism. Moreover, Q is (P, D)-valid if Q is (p, D)-valid for any p ∈ P. The set of all (p, D)-valid ((P, D)-valid, respectively) mechanisms is denoted by Q(p, D) (Q(P, D), respectively).

Due to convexity, it is direct to see that Q(P, D) = Q(Conv(P), D). Hence, without loss of generality, we always assume that P is convex. In this paper, we investigate different notions of privacy, which we list as follows.

Definition 3. Let p be a distribution with full support on X and let ε represent a non-negative real number.
(1) [ε-differential privacy, [5], [6]] A mechanism Q is called ε-differentially private if for any y ∈ Y and neighboring elements x and x′ ∈ X,
Q(y|x) ≤ e^ε Q(y|x′).
In this case, we denote the minimal such ε as the differential privacy loss of the mechanism Q.
(2) [(ε, δ)-differential privacy, [8]] Given a positive number δ < 1, a mechanism Q is called (ε, δ)-differentially private if for any y ∈ Y and neighboring elements x and x′ ∈ X,
Q(y|x) ≤ e^ε Q(y|x′) + δ.
In this case, we denote the minimal such ε as the approximate differential privacy loss of the mechanism Q.
(3) [Maximal information, [9]] A mechanism Q has maximal information ε if for any y ∈ Y and x ∈ X,
Q(y|x) ≤ e^ε Σ_{x′∈X} p_{x′} Q(y|x′).
In this case, we denote the minimal such ε as the maximal information of the mechanism Q.
(4) [Maximal leakage, [16]] A mechanism Q has maximal leakage ε if
Σ_{y∈Y} max_{x∈X} Q(y|x) ≤ e^ε.
In this case, we denote the minimal such ε as the maximal leakage of the mechanism Q.
(5) [Rényi differential privacy, [20]] Given a positive number α > 1, a mechanism Q is (α, ε)-Rényi differentially private if for any neighboring elements x and x′ ∈ X, the Rényi divergence satisfies D_α(Q(·|x) ∥ Q(·|x′)) ≤ ε. In this case, we denote the minimal such ε as ε_{α,R}(Q) for the mechanism Q.
(6) [Sibson α-mutual information privacy, [10], [26], [29]] Given a positive number α > 1, a mechanism Q has maximal α-mutual information ε if the Sibson mutual information I_α(X; Q(X)) ≤ ε. In this case, we denote the minimal such ε as ε_{α,S}(Q) for the mechanism Q.
(7) [Mutual information privacy, [4], [30]] A mechanism Q is called ε-mutual information private if the mutual information I(X; Q(X)) ≤ ε. In this case, we denote the minimal such ε as the mutual information privacy loss of the mechanism Q.
The minimal ε above, written ε_★(Q), is called the privacy loss of the mechanism Q, where ★ indicates the corresponding privacy notion. We now introduce the privacy-distortion function, which describes the minimal achievable privacy loss under restricted distortion. It is defined as follows.

Definition 4. Given a source set P and a distortion threshold D,
ε*_★(P, D) := min_{Q ∈ Q(P, D)} ε_★(Q)
is called the corresponding privacy-distortion function, for ★ representing any of the privacy notions aforementioned. The set of mechanisms achieving the minimum is denoted by Q*_★(P, D). In particular, if P consists of only a single element p, then we write them as ε*_★(p, D) and Q*_★(p, D). Conversely, given a restricted privacy cost ε ≥ 0, we denote the smallest expected distortion achievable by D*_★(P, ε) := min{D : ε*_★(P, D) ≤ ε}.

A global privacy framework relies on a data server which collects all the data directly from individuals and publishes them in a privacy-protected synthetic database (see Fig. 1). The n-individual database X is drawn from X = {1, 2, · · · , r}^n with respect to a probability distribution p in P, where r is the total number of types the individuals belong to. For a fixed x, all databases in X can be divided into n + 1 categories based on their Hamming distance to x. Let N_k(x) be the set consisting of all elements with exactly k attributes different from x, 0 ≤ k ≤ n. Then the size of the set N_k(x) is n_k := C(n, k)(r − 1)^k. To avoid notational confusion, ε̃*_★ is used to denote the privacy-distortion function in the global case. Here, we have the following result, which holds for every privacy notion in Definition 3.

Lemma 5. Given a source set P, let be a positive integer not greater than , we have the following properties of the privacy-distortion function:
(1) For any p ∈ P, ε̃*_★(P, D) ≥ ε̃*_★(p, D).
(2) Let Q_u be the uniform mechanism, i.e., Q_u(y|x) = 1/r^n for any x, y ∈ X. Then for (r − 1)/r ≤ D ≤ 1, ε̃*_★(P, D) = ε̃_★(Q_u) = 0.
(3) Let Q_W be the mechanism given by Wang et al. [30]; then ε̃*_★(P, D) ≤ ε̃_★(Q_W).

Proof of Lemma 5. By Definition 4, any mechanism Q ∈ Q*_★(P, D) is (P, D)-valid. Thus, Q is (p, D)-valid for any p ∈ P. Hence, ε̃_★(Q) = ε̃*_★(P, D) ≥ ε̃*_★(p, D). Conversely, for any (P, D)-valid mechanism Q, the definition of the privacy-distortion function implies ε̃*_★(P, D) ≤ ε̃_★(Q). Besides, it is easy to verify that Q_u is (P, D)-valid for (r − 1)/r ≤ D ≤ 1 and that Q_W is (P, D)-valid for 0 < D ≤ 1. This completes the proof.

Theorem 6. Let P be a source set such that each probability distribution in P has full support, and let p* := r · max_{p∈P} min_{x∈X} p_x. Then we have the following results on the privacy-distortion function with respect to the privacy notions in Definition 3. (1) (Differential privacy) (2) (Approximate differential privacy) (3) (Maximal information) where p̃_max := max_{x∈X} p̃_x and p̃ := arg max_{p∈P} min_{x∈X} p_x. In particular, ε̃*_★(P, D) simplifies when P contains the uniform distribution (see below).

Proof of Theorem 6. Based on Lemma 5, we already know the value of ε̃*_★(P, D) for (r − 1)/r ≤ D ≤ 1. Otherwise, for 0 < D < (r − 1)/r we consider the upper and lower bounds, respectively. (1) First, let us consider the upper bounds on the privacy-distortion functions. By Lemma 5, ε̃*_★(P, D) is upper-bounded by ε̃_★(Q_W). Hence, it suffices to calculate or upper-bound ε̃_★(Q_W). For simplicity, we denote ( −1) ( − ) and 1 − by and , respectively. Note that if 0 < D < (r − 1)/r then − < < 1 < . Let us start with the α-type notions, i.e., ε̃_{α,S}(P, D) and ε̃_{α,R}(P, D)
, by the definition of Sibson -mutual information, we havẽ We now show that if P contains the uniform distribution , then˜, ( ) obtains the maximal value at = . To prove this claim, let us consider the following optimization problem. It is obvious that the feasible region of (5) consists of all the probability distributions. By the convexity of Sibson mutual information (see Lemma 21 in Appendix A), this optimization problem is convex. To find the optimal value, we utilize the Karush-Kuhn-Tucker (KKT) conditions [2] . The KKT conditions for convex problem (5) are as follows. a) For ∈ X, is the Lagrangian defined as follows. satisfies the above KKT conditions. Note that feasible solution of KKT conditions is also optimal for convex problem [2] . Thus˜, 1.3) For˜, let 1 = −1 and 2 = − , then 0 < 1 < / < 2 < and we havẽ ), by the relation between Sibson mutual information and maximal leakage (see Lemma 20 in Appendix A), we havẽ ), by the relation between Sibson mutual information and mutual information (see Lemma 20 in Appendix A), we havẽ (2) Second, let us consider the lower bounds on the privacydistortion functions. For any mechanism in Q (P, ), it is ( , )-valid for any ∈ P. Thus, for Therefore, we have We complete the proof by contradiction. 2.1) For˜, assume there exists a ∈ Q (P, ) such that ( ) < := log( − 1) * (1− −1 )− . Given an arbitrary vector ∈ X, we can always find ∈ N −1 ( ) and ∈ N ( ) for any positive integer no larger than . Since and are neighboring, we have 2) The -DP can be regarded as ( , 0)-DP. Thus, for˜, we have˜ * (P, ) ≥ log ( −1) ( * − ) . 2.3) For˜, assume there exists a ∈ Q (P, ) such that ( ) < := log (1− * ), then for any , ∈ X and ∈ P, 2.4) For˜, assume there exists a ∈ Q (P, ) such that For any vector ∈ X, we can always find ∈ N −1 ( ) and ∈ N ( ) for any positive integer no larger than . Since and are two neighbors, then by the probability preservation property of Rényi divergence (see Lemma 22 in Appendix A), we have By Hölder inequality, However, the inequality (6) implies that Contradiction! Thus * , * , (P, ) ≥ − − 1 log + − 1 log 1 − / 1 2 (1 − * ) ( + 1) + * . We restrict the input distribution to = arg max ∈ P min ∈X ( ). By Lemma 5,˜ * (P, ) ≥˜ * ( , ). To evaluate˜ * ( , ), let us consider the following convex optimization problem. All the conditions in (7) imply feasible is ( , )-valid and thus the optimal value is˜ * ( , ). It is difficult to solve (7) directly. Instead, let us consider the following relaxed optimization problem. (7) implies ( 1) in (8) . Thus, the optimal value of (8) is not greater than˜ * ( , ). By the convexity of mutual information (see Lemma 21 in Appendix A), optimization problem (8) It is seen in Theorem 6 that the difference between lowerand upper-bounds is decreasing in * . In fact, the difference vanishes for some special cases. Assume that the source set P in Theorem 6 contains the uniform distribution . Then arg max ∈ P min ∈X ( ) = , which implies * = 1 and = − . Hence, we have the following results on privacydistortion functions. (2) (Approximate differential privacy) In particular,˜ * (P, ) = 0 if ( −1) ≤ ≤ 1 for all notions aforementioned. In this section, we assume |X| = > 0 and consider the local privacy framework, i.e., the privacy mechanism is applied before data uploading (see Fig. 2 ). The distance metric ( , ) is actually the discrete distance, i.e., Hence, for any ≠ , and are neighborhood. Given 0 < ≤ 1, belongs to Q ( , ) if and only if =1 ( | ) ≥ 1 − . 
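In the local model the validity condition above reads Σ_x p_x Q(x|x) ≥ 1 − D. The sketch below (Python, assuming NumPy) evaluates this condition and a few of the privacy losses of Definition 3 for a finite mechanism matrix. The function names and the randomized-response test case are illustrative, and the handling of zero probabilities is one possible convention rather than anything specified in the paper.

```python
import numpy as np

def dp_loss(Q: np.ndarray) -> float:
    """eps-DP loss in the local model: max over columns y and input pairs of
    log Q(y|x) / Q(y|x'); infinite if some ratio is unbounded."""
    eps = 0.0
    for y in range(Q.shape[1]):
        col = Q[:, y]
        if col.max() == 0:
            continue                      # the output y is never produced: no constraint
        if col.min() == 0:
            return float("inf")
        eps = max(eps, float(np.log(col.max() / col.min())))
    return eps

def maximal_leakage(Q: np.ndarray) -> float:
    """log sum_y max_x Q(y|x); note that this loss does not depend on the prior."""
    return float(np.log(Q.max(axis=0).sum()))

def mutual_information(p: np.ndarray, Q: np.ndarray) -> float:
    """Shannon mutual information I(X; Y) in nats for prior p and mechanism Q."""
    joint = p[:, None] * Q
    py = joint.sum(axis=0)
    mask = joint > 0
    ratio = joint[mask] / (p[:, None] * py[None, :])[mask]
    return float((joint[mask] * np.log(ratio)).sum())

def is_locally_valid(p: np.ndarray, Q: np.ndarray, D: float) -> bool:
    """(p, D)-validity in the local model: E[d(X, Y)] = 1 - sum_x p_x Q(x|x) <= D."""
    return 1.0 - float(np.sum(p * np.diag(Q))) <= D + 1e-12

# randomized response on r = 4 symbols as a test mechanism
r, eps = 4, 1.0
keep = np.exp(eps) / (np.exp(eps) + r - 1)
Q = np.full((r, r), (1.0 - keep) / (r - 1))
np.fill_diagonal(Q, keep)
p = np.array([0.4, 0.3, 0.2, 0.1])

print(dp_loss(Q))                         # equals eps = 1.0 for randomized response
print(maximal_leakage(Q), mutual_information(p, Q))
print(is_locally_valid(p, Q, D=0.6))
```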
We claim that a good choice of output space is the synthetic one i.e., Y = X (See Lemma 24 in Appendix C). Let us start with the source set of Class I, for which we take = 1 into Theorem 7 and get the following results directly. (2) (Approximate differential privacy) * (P, ) = log (4) (Maximal leakage) * (P, ) = log (1 − ), 0 < < −1 , 0, −1 ≤ ≤ 1; (5) (Rényi differential privacy) By taking = 1 into Theorem 6, one can obtain the properties of privacy-distortion functions for any source set P. However, if P does not contain the uniform distribution then Theorem 6 fails to give the lower bounds sometimes. Thus, in this subsection, we introduce a different method to address the PUT problem for the source set of Class II, and we mainly study the differential privacy-, approximate differential privacy-and maximal leakage-distortion trade-off. First, we consider the scenario in which the distribution on X is given. Before proving Theorem 9, we need introduce the following useful lemmas. On the other hand, if ≥ (1 − ) ( −1) , we define mechanism as follows. Thus, * ( , ) = 0. Lemma 11. For 0 < < 1, if * ( , ) > 0, then there exists a mechanism ∈ Q * ( , ) such that (1) ≤ ( | ) ≤ ( − 1| − 1) ≤ · · · ≤ (1|1) ≤ 1; (2) ( | ) = * ( , ) ( |1) + for 2 ≤ ≤ . To prove Lemma 11, we improve the coloring method introduced by Kalantari et al. [18] . The details of the proof are provided in Appendix B, C and D. Proof of Theorem 9. For ≥ (1 − ) ( −1) , Lemma 10 proves the statement. For 0 < < (1 − ) ( −1) , let * be a mechanism satisfying the conditions in Lemma 11. Then . Let us consider the following optimization problem, where 0 = 1 and +1 = . Note that { * ( | )} 1≤ ≤ is a feasible solution of (9), and hence * ( , ) ≥ 1 . Conversely, let { * 1 , * 2 , · · · , * } be the optimal solution of (9). We define the following mechanism. For any ∈ [1, ], =1 ( | ) = 1, that is, is a conditional probability. The first condition in (9) implies that is within Q ( , ). The other two imply that Thus, we have 1 = * ( , ). Moreover, condition ( 1) in (9) combined with the assumption is non-increasing in yields that The optimal values of optimization problem (9) and (10) Then problem (10) can be rewritten as the following optimization problem. . The dual of linear program (11) is the minimization linear program as below.˜3 By strong duality theorem, we have 3 ( 1 ) =˜3 ( 1 ). Fix 2 ≥ 0, we define the following linear program. It is easy to calculate the coordinate of the intersection point, i.e., We claim that for any ∈ [1, − 1], 1 > 2 > · · · > −1, > +1, > · · · > −1, > 0, 0 < 1 < 2 < · · · < −1, < +1, < · · · < −1, . (14) Therefore, the feasible region of (13) is unbounded, and corner points are located from top left to bottom right as follows. We now prove (14) . By the definition of ( ) , ( ) is increasing in , and hence is positive. Thus, is decreasing in . On the other hand, line with negative slope − ( ) passes point for any 1 ≤ ≤ , which implies is increasing in . The slope of the objective function in (13) determines which corner point will be reached last. Hence, we divide . Thus, −1, is optimal solution of (13) and for 0 ≤ 2 < 1. Let Thus, In a summary, we have that if 0 < < (1 − ) (1) , then * ( , ) = log Next, let us consider the case when the input distribution belongs to some set P of Class II. Proposition 12. The approximate differential privacydistortion with P of Class II is * (P, ) = min is the optimal value of PV. Proof. 
Recalling optimization problem (13), we can also obtain the approximate differential privacy-distortion function as the solution to the following optimization problem: min P , 3 where Λ(k) = Σ_{i=r−k+1}^{r} p_i for the corresponding p in P. Thus, we have the result.

By taking δ = 0, we obtain the following result for the differential privacy-distortion function with a given input distribution, which is stronger than the corresponding result in [18], where only the optimization problem, rather than its solution, is provided.

Theorem 13. Given a distribution p with p_1 ≥ p_2 ≥ · · · ≥ p_r, we have ε*(p, D) = max 0, min .

A recent paper [23] studied the privacy-utility trade-off by taking maximal leakage as the privacy measure. We recover these results by the coloring argument, which we put in Appendix E. if Λ(k − 1) < ≤ Λ(k) and 1 ≤ k ≤ r. An equivalent expression of Theorem 14, giving the smallest distortion with fixed leakage, was given by Saeidian et al. [23]. Assume that the prior belongs to some set P of Class II. Recalling the definition of D*_★(P, ε) in Definition 4, we have the following straightforward result, where D*(p, ε) = ( − ) + +1 ( − ) if log k ≤ ε < log(k + 1) and 1 ≤ k ≤ r.

In this subsection, we demonstrate the results in the local privacy framework by figures. Our results are first applied to binary datasets. In particular, each attribute associated with an individual is the answer to some binary classification problem, that is, a yes/no question or a setting with a 0-1 outcome. In general, X = {0, 1} and r = 2. For the local privacy mechanism, we compare the privacy-distortion functions with unknown input distribution by taking r = 2 in Theorem 8. It is shown in Fig. 3 that a high accuracy requirement (i.e., D < 0.5) leads to the following phenomenon: the privacy costs for the divergence-based measures are numerically higher than those for the mutual-information-based ones. This is reasonable, since divergence-based privacy pays more attention to individuals than to the collective. Notably, the randomized response mechanism is always an optimal choice.

Fig. 3. Privacy-distortion trade-off curves for differential privacy, 0.1-approximate differential privacy, maximal leakage, and mutual information; privacy-distortion trade-off ranges for Rényi differential privacy and Sibson mutual information with α = 2.
Fig. 4. Privacy-distortion trade-off curves for differential privacy, 0.1-approximate differential privacy, and maximal leakage with a given prior distribution.

We now turn to the privacy-distortion functions with knowledge of the prior distribution. To demonstrate the power of Theorem 9, we consider an example with r = 4 and prior distribution p = (p_1, p_2, p_3, p_4) = (0.4, 0.3, 0.2, 0.1). As shown in Fig. 4, maximal leakage is the most relaxed privacy notion among these three popular measures. Notably, the curves for differential privacy and approximate differential privacy jump to zero at D = 0.6 and D = 0.54, respectively. This is because the special mechanism mentioned in the proof of Lemma 10 is (p, D)-valid if and only if D ≥ (1 − δ)(1 − p_1).

To create a database consisting of data with local privacy guarantees, the data server usually collects i.i.d. samples from all possible individual uploads (see Fig. 5). In other words, the input space X^n contains elements which are independently and identically distributed samples from X = {1, 2, · · · , r} with a full-support distribution p, i.e., the distribution of an element of X^n is the product distribution p^{⊗n}. Then the global privacy mechanism on X^n induced by a mechanism Q on X applies Q independently to each coordinate. Due to the independence of sampling, we have Lemma 16.
Letˆ * (P, ) be the privacy-distortion function over X . Then (1)ˆ * (P, ) = * (P, / ) for differential privacy and Rényi differential privacy. (2)ˆ * (P, ) = * (P, / ) for maximal information, maximal leakage, Sibson mutual information and mutual information. Proof of Lemma 16. By the relations among these privacy notions (See Lemma 20 in Appendix A), it suffices to prove the cases for maximal information and the -type notions. Let be a mechanism from X to Y . 1)(Maximal information) 2)(Rényi differential privacy) 3)(Sibson mutual information) , the results above complete the proof. Based on the additivity of composition and Theorem 8 , we get the following results directly. (2) (Approximate differential privacy) Proof of (2) in Theorem 17. For given ∈ (0, 1), where * = max , ( | ). Thus, * −1 (P, ) ≤ˆ * (P, ) ≤ * (P, ). Theorem 18. let be a given prior distribution such that (1) (Approximate differential privacy) * ( , ) = max 0, min if ( −1) < ≤ ( ) and 1 ≤ ≤ . This paper investigates the privacy-utility trade-off problem for seven popular privacy measures in two different scenarios: global and local privacy. For both global and local privacy, our results provide some upper and lower bounds on privacydistortion function, which reveals the relationships between privacy and distortion in a quantitative way. In particular, with known prior distributions, we obtain the analytical closed form of privacy-distortion function for approximate differential privacy. To the best of our knowledge, this is the first result on the privacy-uitility tradeoff for approximate differential privacy. In addition to the privacy measures we used here, it is still possible to consider other privacy measures, such as the privacy measure defined by Arimoto -mutual information and -mutual information. Moreover, the application privacyutility trade-offs in machine learning is an important problem, which we leave it for further study. VI. ACKNOWLEDGMENTS K. Bu thanks the support from ARO Grant W911NF-19-1-0302 and the ARO MURI Grant W911NF-20-1-0082. To clearly show the properties of different notions for any given mechanism , we turn to the following definition equivalent to Definition 3. Let be a fixed distribution with full support on X. (1) (Maximal divergence) The differential privacy loss of is defined as . (2) (Approximate maximal divergence) Given a postive real number < 1, the approximate differential privacy loss of is defined as (3) (Maximal information) The maximal information between and ( ) is denoted by Thus, ( ) ≤ max{ ( 1 ), ( 2 )}. Next, let ( ) ≤ for = 1 and 2. Then for any ∈ Y and ∈ X, we have Thus, }. This completes the proof. The following two lemmas reveal the probability preservation properties of the -type privacy measurements. Let be a prior distribution such that 1 ≥ 2 ≥ · · · ≥ . For any mechanism , we write it as the matrix form below. . To reinforce the understanding of prior distribution, we define the accumulation function (0) = 0 ( ) := = − +1 for 1 ≤ ≤ . By Lemma 10 and 33, there exists mechanism costing zero approximate privacy loss as well as mechanism causing no maximal leakage under specific assumption sacrificing accuracy. In this appendix, we assume that privacy loss is inevitable, i.e., the threshold is at most (1 − ) ( −1) for approximate differential privacy and ( −1) for maximal leakage. For any mechanism with ( ) > 0, we say two distinct entries ( | ) and ( | ) of the same column consist of a critical pair if ( | ) = ( ) ( | ) + . 
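In other words, a critical pair consists of two entries of the same column that attain the (ε, δ) bound with equality, Q(y|i) = e^{ε(Q)} Q(y|j) + δ, where ε(Q) is the approximate differential privacy loss. The following Python/NumPy sketch computes this loss for a finite mechanism in the local model and lists the pairs attaining equality; the numerical tolerance and the treatment of zero entries are choices made for this sketch, not specified in the paper.

```python
import numpy as np
from itertools import permutations

def approx_dp_loss(Q: np.ndarray, delta: float) -> float:
    """Smallest eps such that Q(y|i) <= e^eps * Q(y|j) + delta for every column y and
    every ordered pair i != j (in the local model every pair of inputs is neighboring)."""
    eps = 0.0
    for y in range(Q.shape[1]):
        for i, j in permutations(range(Q.shape[0]), 2):
            gap = Q[i, y] - delta
            if gap > 0:
                if Q[j, y] == 0:
                    return float("inf")   # the (eps, delta) bound cannot hold for finite eps
                eps = max(eps, float(np.log(gap / Q[j, y])))
    return eps

def critical_pairs(Q: np.ndarray, delta: float, tol: float = 1e-9):
    """Entries of the same column attaining Q(y|i) = e^{eps(Q)} * Q(y|j) + delta,
    i.e. the pairs on which the coloring below is based."""
    eps = approx_dp_loss(Q, delta)
    if not np.isfinite(eps):
        return []
    pairs = []
    for y in range(Q.shape[1]):
        for i, j in permutations(range(Q.shape[0]), 2):
            if abs(Q[i, y] - (np.exp(eps) * Q[j, y] + delta)) < tol:
                pairs.append((i, j, y))   # Q(y|i) is the bigger entry, Q(y|j) the smaller
    return pairs
```

For a randomized-response matrix with δ smaller than its diagonal entries, for instance, each column's diagonal entry forms a critical pair with every off-diagonal entry of that column.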
We now color every single entry of Q following the rules below.
• Color the element black if it is the bigger one in a critical pair.
• Color the element red if it is the smaller one in a critical pair.
• Color the element white otherwise.
Notice that if Q(y|i) = 0 for some i, then every element of the y-th column is not greater than δ, with equality if the element is black. Similarly, for any mechanism Q with ε(Q) > 0, we color every single entry of Q following the rules below.
• Color the element black if it is the biggest among the entries of the same column.
• Color the element red if it is the smallest among the entries of the same column.
• Color the element white otherwise.
Lemma 24 (Size of the output space). There exists a mechanism Q ∈ Q*_★(P, D) such that |Q(X)| ≤ r for the notions aforementioned, except for approximate privacy with positive δ. To prove Lemma 24, we introduce the transformation on mechanisms given by Kalantari et al. [18]. Proof of Lemma 24. For any Q ∈ Q(P, D), assume that |Q(X)| = t > r, that is, Q(y|·) > 0 for y ≤ t and Q(y|·) = 0 for y > t. We create a mechanism Q̃ defined as follows. Note that |Q̃(X)| = t − 1, and for any p ∈ P, E_{p,Q̃}[d(X, Y)] ≤ E_{p,Q}[d(X, Y)], where the equality holds if and only if t > r + 1. It follows that Q̃ belongs to Q(P, D). Hence, it is sufficient to prove that ε_★(Q̃) ≤ ε_★(Q). for 0 ≤ , ≤ 1. Then the function is convex since its Hessian ∇² is positive semi-definite. Therefore, we have, for 1 ≤ i ≠ j ≤ r, Hence, ε_{α,S}(Q̃) ≤ ε_{α,S}(Q). 5) The function (1/(α−1)) exp((α−1) ε_{α,S}(Q)) is convex (refer to Lemma 21 in Appendix A), and thus so is the situation for fixed α. Hence, we have In this subsection, we consider the properties of Q ∈ T_1 T_2 (Q*(p, D)). By Lemmas 25, 26 and 27, Q satisfies the following properties. 1) Q(1|1) ≥ Q(2|2) ≥ · · · ≥ Q(r|r). 2) All the off-diagonal elements of a row in Q have the same color. 3) If a diagonal element is not black, then the off-diagonal elements of the same row are red. There are more traits of the color pattern of Q. Contradiction. The proof that there is no row with all red elements is similar. By Property 28, there exists no black off-diagonal element. Moreover, if a diagonal element is not black, then it is white.
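As an illustration of the second (maximal-leakage) rule set, the sketch below colors a mechanism matrix column by column. The tie-breaking convention for equal or constant columns is an assumption of this sketch and is not fixed by the text.

```python
import numpy as np

def color_for_maximal_leakage(Q: np.ndarray) -> np.ndarray:
    """Color the entries of a mechanism matrix Q (rows = inputs x, columns = outputs y)
    by the maximal-leakage rules above: 'B' if the entry is the largest in its column,
    'R' if it is the smallest, 'W' otherwise.  Ties are resolved by applying the
    'black' rule last -- a convention of this sketch, not of the paper."""
    colors = np.full(Q.shape, "W", dtype="<U1")
    for y in range(Q.shape[1]):
        col = Q[:, y]
        colors[col == col.min(), y] = "R"
        colors[col == col.max(), y] = "B"
    return colors

# randomized response: the diagonal entries are the column maxima, so they come out black
keep, flip = 0.6, 0.4 / 3
Q = np.full((4, 4), flip)
np.fill_diagonal(Q, keep)
print(color_for_maximal_leakage(Q))
```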
REFERENCES
[1] Information measures and capacity of order α for discrete memoryless channels.
[2] Convex Optimization.
[3] Elements of Information Theory.
[4] Differential privacy as a mutual information constraint.
[5] Calibrating noise to sensitivity in private data analysis.
[6] Differential privacy. 33rd International Colloquium on Automata, Languages and Programming.
[7] Our data, ourselves: Privacy via distributed noise generation.
[8] The Algorithmic Foundations of Differential Privacy.
[9] Generalization in adaptive data analysis and holdout reuse.
[10] Robust generalization via α-mutual information.
[11] Robust generalization via f-mutual information.
[12] On conditional Sibson's α-mutual information.
[13] Individual privacy accounting via a Rényi filter.
[14] The information bottleneck problem and its applications in machine learning.
[15] Convexity/concavity of Rényi entropy and α-mutual information.
[16] An operational measure of information leakage.
[17] I. Issa, A. B. Wagner, and S. Kamath, An operational approach to information leakage.
[18] Robust privacy-utility tradeoffs under differential privacy and Hamming distortion.
[19] J. Liao, O. Kosut, L. Sankar, and F. du Pin Calmon, Tunable measures for information leakage and applications to privacy-utility tradeoffs.
[20] Rényi differential privacy.
[21] Semi-supervised knowledge transfer for deep learning from private training data.
[22] Scalable private learning with PATE.
[23] Optimal maximal leakage-distortion tradeoff.
[24] On the information bottleneck theory of deep learning.
[25] Opening the black box of deep neural networks via information.
[26] R. Sibson, Information radius, Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete.
[27] The information bottleneck method.
[28] Deep learning and the information bottleneck principle.
[29] S. Verdú, α-mutual information.
[30] On the relation between identifiability, differential privacy and mutual-information privacy.
[31] Strong asymptotic composition theorems for Sibson mutual information.

Combining all the properties of Q in Appendix D, one can obtain the following lemma: if ε*(p, D) > 0 then there exists a mechanism Q ∈ Q*(p, D) such that max_j Q(i|j) = Q(i|i) for 1 ≤ i ≤ r. (3) The privacy losses ( ), ( ), , ( ) and ( ) provide global privacy guarantees, and upper-bounded by ( ). (9) ε_{α,S}(Q) is non-decreasing in α, lower-bounded by ( ) and upper-bounded by ( ). The convexity of privacy metrics has long been studied. As we see in Section III, the following results spark the ideas in the development of optimal mechanisms. Lemma 21 (Convexity of privacy). For fixed p, as functions of Q_{Y|X}: (1) ( ), ( ) and ( ) are quasi-convex; Proof of Lemma 21. The results of (2)-(5) are well known and one may refer to the publications mentioned above. We now prove (1). Let Q_1 and Q_2 be (ε_1, δ)- and (ε_2, δ)-differentially private mechanisms, respectively, for some δ ∈ [0, 1). Then for any y ∈ Y and neighbors x, x′ in X, Q_i(y|x) ≤ e^{ε_i} Q_i(y|x′) + δ, followed by ε_{α,S}(Q̃) ≤ ε_{α,S}(Q). 6) As α tends to 1 in 5), the same inequality holds for the mutual information privacy loss. This explains why we choose the synthetic version as the released database in the local privacy framework. Next, inspired by the construction of the transformation above, we introduce several transformers which are useful in the coloring scheme. For any mechanism Q with Q(σ(1)|σ(1)) ≥ Q(σ(2)|σ(2)) ≥ · · · ≥ Q(σ(r)|σ(r)) for some permutation σ over {1, 2, · · · , r}, we define the transformer T_1 over Q as follows. Lemma 25. For approximate differential privacy and maximal leakage, if a mechanism Q satisfies ε(Q) > 0, then T_1 Q is still a well-defined mechanism such that 1) We skip the proof for simplicity. It is worth noting that if we color T_1 Q following the rules aforementioned, then T_1 Q(i|j) and Q(σ(i)|σ(j)) have the same color for 1 ≤ i, j ≤ r. Lemma 26.
There exists ∈ Q * ( , ) for approximate differential privacy or maximal leakage, such that all the offdiagonal elements in the same row have the same color.Proof. We denote the number of black (red, white, respectively) off-diagonal elements in the th row by ( ) ( ( ) , ( ) , respectively).where Δ is small enough such that 0 ≤ ≤ 1 and does not increase. Moreover, is also a ( , )-valid mechanism.2) If ( ) , ( ) > 0, then we definewhere Δ is small enough such that 0 ≤ ≤ 1 and does not increase. Moreover, is also a ( , )-valid mechanism.3) If ( ) , ( ) > 0, then we definewhere Δ is small enough such that 0 ≤ ≤ 1 and does not increase. Moreover, is also a ( , )-valid mechanism.Notice the transformation can be done repeatedly until satisfies the claim.Through the composition of the operators above, we obtain a transformer T 2 such that for any optimal mechanism ,all the off-diagonal elements of a row in T 2 have the same color.Lemma 27. There exists ∈ Q * ( , ) for approximate differential privacy or maximal leakage, such that if a diagonal element is not black, then the off-diagonal elements of the same the row are red.Proof. LetQ * ( , ) ⊂ Q * ( , ) be the set of ( , )-valid mechanisms with smallest distortion. Choose an arbitrary mechanism ∈Q * ( , ). Let denote the set of the line numbers such that ( | ) is not black and ( | ) is not red for ≠ .We define On the contrary,Contradiction! Hence, ( | ) ≥ ( | ) for 1 ≤ ≤ .Proposition 31. For the coloring scheme of approximate differential privacy, all the off-diagonal elements in the first row of are red.Proof. Assume that all the off-diagonal elements in the th row are red and all the off-diagonal elements in the th row are white. Since the sum of each row is one, we haveSimilar with the proof of Property 30, this inequality implies ( | ) ≥ ( | ). In other words, all the red rows are arranged above the white ones in .Proposition 32. For the coloring scheme of maximal leakage, all the diagonal elements are black in .Proof. We claim that if the only possible white diagonal element is ( | ), then ( | ) = 0 for ≠ . Otherwise, we can transfer some non-zero off-diagonal elements' value of the th row to ( | ) until it turns black, resulting in a optima l mechanism with smaller distortion. Contradiction to ∈Q * ( , ). Thus ( | ) = 0 for ≠ , leading to the fact ( | ) = 1. This is not possible since ( | ) is white.Combing all aforementioned properties of , one can obtain Lemma 11. The condition of equivalence for zero maximal leakage is given as follows .Sufficiency. For ≥ ( −1) , we define mechanism as follows.= 1 0 · · · 0 1 0 · · · 0 . . . . . . . . . . . .One can easily obtain thatCombining theses two lemmas leads to the proof of Theorem 14.Proof of Theorem14. For ≥ ( −1) , Lemma 33 proves the statement.For 0 < < ( −1) , let * be a mechanism satisfying the conditions in Lemma 34. Then * ( , ) = log =1 * ( | ). Consider the following optimization problem,where 0 = 1 and +1 = 0. Notice that { * ( | )} 1≤ ≤ is a feasible solution of (15), and hence * ( , ) ≥ 6 . Conversely, let { * 1 , * 2 , · · · , * } be the optimal solution of (15). We define the following mechanism.For any ∈ [1, ], =1 ( | ) = 1 which leads to the fact that is a conditional probability. The first restricted condition in the scenario of (15) implies that is within Q ( , ). The other two imply that * ( , ) ≤ ( ) = log ∑︁ By letting := − +1 for 1 ≤ ≤ , we obtain the following standard linear programming. ≥ 0 for 1 ≤ ≤ .Moreover, 6 = log(− 7 ). 
The dual of (16) is the minimization linear program as below.7 := min 1 , 2 , 3 (−1 + ) 1 − 2 + 3 subject to (17) (1) ( − ) 1 − 2 + 3 ≥ − for 1 ≤ ≤ (2) 1 , 2 , 3 ≥ 0.By strong duality theorem, we have 7 =˜7. Similar with the proof of Theorem 9, we can obtain the following result by fixing 2 ≥ 0. if ( −1) < ≤ ( ) and 1 ≤ ≤ .
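The derivations above reduce the privacy-distortion computations to small linear programs and invoke strong duality. As an independent numerical cross-check (not a transcription of the paper's programs (9)-(17)), the local differential privacy-distortion value for a known prior can also be approximated by bisecting over t = e^ε and testing, with an off-the-shelf LP solver, whether the distortion constraint is feasible under a columnwise ratio bound of t. The sketch below assumes SciPy; the function names, the prior and the distortion threshold are illustrative (the prior matches the Fig. 4 example).

```python
import numpy as np
from scipy.optimize import linprog

def ldp_feasible(p: np.ndarray, D: float, t: float) -> bool:
    """Is there a mechanism Q on {0,...,r-1} whose columnwise ratios are bounded by t
    (i.e. eps-local DP with e^eps = t) and whose expected 0/1 distortion is at most D?"""
    r = len(p)
    n = r * r                              # decision variables Q[x, y], flattened row-major
    idx = lambda x, y: x * r + y
    A_ub, b_ub = [], []
    for y in range(r):                     # Q[x, y] <= t * Q[xp, y] for every ordered pair x != xp
        for x in range(r):
            for xp in range(r):
                if x == xp:
                    continue
                row = np.zeros(n)
                row[idx(x, y)] = 1.0
                row[idx(xp, y)] = -t
                A_ub.append(row)
                b_ub.append(0.0)
    row = np.zeros(n)                      # distortion constraint: 1 - sum_x p_x Q[x, x] <= D
    for x in range(r):
        row[idx(x, x)] = -p[x]
    A_ub.append(row)
    b_ub.append(D - 1.0)
    A_eq = np.zeros((r, n))                # every row of Q is a probability distribution
    for x in range(r):
        A_eq[x, x * r : x * r + r] = 1.0
    res = linprog(np.zeros(n), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  A_eq=A_eq, b_eq=np.ones(r), bounds=(0, 1), method="highs")
    return res.status == 0                 # status 0 = feasible (and trivially optimal, c = 0)

def local_dp_privacy_distortion(p: np.ndarray, D: float, t_hi: float = 1e6, iters: int = 50) -> float:
    """Bisection on t = e^eps over the feasibility problem above; returns an approximation
    of the local differential privacy-distortion value for prior p and threshold D > 0."""
    if ldp_feasible(p, D, 1.0):
        return 0.0
    t_lo = 1.0                             # infeasible endpoint; t_hi is assumed feasible for D > 0
    for _ in range(iters):
        mid = 0.5 * (t_lo + t_hi)
        if ldp_feasible(p, D, mid):
            t_hi = mid
        else:
            t_lo = mid
    return float(np.log(t_hi))

p = np.array([0.4, 0.3, 0.2, 0.1])         # the prior used in the Fig. 4 example
print(local_dp_privacy_distortion(p, D=0.3))
```

For this prior the returned value drops to 0 once D ≥ 1 − p_1 = 0.6, consistent with the jump of the differential-privacy curve observed in Fig. 4; for smaller D it provides a numerical reference point against which the closed form of Theorem 13 can be checked.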