Title: ppAURORA: Privacy Preserving Area Under Receiver Operating Characteristic and Precision-Recall Curves with Secure 3-Party Computation
Authors: Unal, Ali Burak; Pfeifer, Nico; Akgun, Mete
Date: 2021-02-17

Abstract: Computing an AUC as a performance measure to compare the quality of different machine learning models is one of the final steps of many research projects. Many of these models are trained on privacy-sensitive data, and there are several approaches, such as ε-differential privacy, federated machine learning and cryptography-based methods, for when the datasets cannot be shared or evaluated jointly at one place. In this setting, computing a global performance measure such as the AUC can itself be a problem, since the labels might also contain privacy-sensitive information. There have been approaches based on ε-differential privacy to deal with this problem, but to the best of our knowledge, no exact privacy preserving solution has been introduced. In this paper, we propose an MPC-based framework, called ppAURORA, with private merging of sorted lists and novel methods for comparing two secret-shared values, selecting between two secret-shared values, converting the modulus, and performing division, to compute the exact AUC as one could obtain on the pooled original test samples. With ppAURORA, computation of the exact area under the precision-recall curve and the receiver operating characteristic curve is possible even when ties between prediction confidence values exist. To show the applicability of ppAURORA, we use it to evaluate a model trained to predict acute myeloid leukemia therapy response, and we also assess its scalability via experiments on synthetic data.
The experiments show that we efficiently compute exactly the same AUC with both evaluation metrics in a privacy preserving manner as one can obtain on the pooled test samples in the plaintext domain. Our solution provides security against semi-honest corruption of at most one of the servers performing the secure computation. Recently, privacy preserving machine learning studies have aimed at protecting sensitive information during training and/or testing of a model in scenarios where data is distributed between different sources and cannot be shared in plaintext [Mohassel and Zhang, 2017, Wagh et al., 2018, Juvekar et al., 2018, Mohassel and Rindal, 2018, Ünal et al., 2019, Damgård et al., 2019, Byali et al., 2020]. However, privacy protection in the computation of the area under curve (AUC), one of the most widely used measures to compare machine learning models with binary outcome, has not been addressed sufficiently. While there are no studies in the literature enabling such a computation for the precision-recall (PR) curve, there are several differential privacy based approaches for the receiver operating characteristic (ROC) curve [Chaudhuri and Vinterbo, 2013, Boyd et al., 2015, Chen et al., 2016]. Briefly, they protect the privacy of the data by introducing noise into the computation so that one cannot recover the original data employed in the computation. However, due to the nature of differential privacy, the resulting AUC differs from the one that could be obtained by using non-perturbed prediction confidence values (PCVs). To the best of our knowledge, there exists no approach in the literature for private computation of the exact AUC. In this paper, we propose privacy preserving computation of the area under the receiver operating characteristic and precision-recall curves (ppAURORA), based on a secure 3-party computation framework, to address the need for an efficient, private and secure computation of the exact AUC.
We compute the area under the PR curve (AUPR) and the ROC curve (AUROC) with ppAURORA. We address two different cases of the ROC curve in ppAURORA with two versions of the AUROC computation. The first is designed to compute the exact AUC from PCVs with no ties. In case of a tie between PCVs of samples from different classes, it only approximates the metric based on the order of the samples, and it fails when the values of both axes change at the same time. In order to compute the exact AUC even in case of a tie, we introduce a second version with a slightly higher communication cost than the first. Besides the AUC itself, both versions hide the number of samples belonging to each class from all participants of the computation, information which could otherwise be used to infer the order of the labels of the PCVs [Whitehill, 2019]. Furthermore, since we do not provide the data sources with the ROC curve, they cannot regenerate the underlying true data. Therefore, both versions are secure against such attacks [Matthews and Harel, 2013]. We utilize the with-tie version of the AUROC computation to compute the AUPR, since the values of both axes can change at the same time even if there is no tie. To achieve privacy preserving AUC computation with ppAURORA, we introduce a novel 3-party computation framework. The framework consists of privacy preserving sorting and four operations with novel and efficient realizations: select share (MUX), modulus conversion (MC), compare (CMP) and division (DIV). MUX selects one of two secret shared values based on a secret shared bit. MC privately converts the ring of a secret shared value from Z_{2^{ℓ-1}} to Z_{2^ℓ}. CMP compares two secret shared values, determines whether the first argument is larger than the second without revealing the values, and outputs the result in secret shared form. DIV performs the division of two secret shared values without sacrificing the privacy of the values.
Note that our new DIV is specifically customized for efficient and secure AUC computation. In this section, we describe the scenarios in which ppAURORA is applicable; note that it is not limited to these scenarios. End-to-end MPC-based Collaborative Learning: Recently, researchers proposed multi-party computation (MPC) based training and testing of several machine learning algorithms [Mohassel and Zhang, 2017, Wagh et al., 2018, Damgård et al., 2019]. Their approaches can train a model privately and collaboratively, and make predictions on the test data of the data sources involved in the computation. However, the privacy preserving collaborative evaluation of the model is lacking. To fill this gap, one can integrate ppAURORA at the end of the process once the PCVs are secret shared among two computing parties. One possible scenario would be building a model to predict hospitalization times for COVID-19 patients. Usually, the personal data cannot be shared easily due to its private nature, but even sharing the hospitalization times might be problematic if the hospitals do not want competitors to learn this piece of information. Nevertheless, one can assume that a model built on data from many hospitals will perform much better than models built on individual datasets, and that the global AUC will allow for a better model selection than an average of locally computed AUCs. This privacy preserving global AUC computation can be performed with our ppAURORA. Evaluation of Models Trained by Federated Learning: In some cases, the data is not allowed to be shared at all. For such cases, federated learning has been widely utilized to train a collaborative machine learning model without gathering the data in one place. Each data source updates the model with its own data in an online learning process and passes the model to the next data source until the model converges or the iteration limit is reached.
Once the data sources obtain the trained model, they make predictions on their own test data. In order to collaboratively evaluate the performance of the model, they can utilize ppAURORA. Security Model: In this study, we prove the full security of our solution (i.e., privacy and correctness) in the presence of semi-honest adversaries that follow the protocol specification but try to learn information from the execution of the protocol. We consider a scenario where a semi-honest adversary corrupts a single server and an arbitrary number of data owners in the simulation paradigm [Lindell, 2017, Canetti, 2001], where two worlds are defined: the real world, where parties run the protocol without any trusted party, and the ideal world, where parties perform the computation through a trusted party. Security requires that the view of an adversary, produced by a simulator S in the ideal world, is indistinguishable from the view of an adversary A in the real world. The universal composability framework [Canetti, 2001] introduces an adversarial entity called the environment Z, which gives inputs to all parties and reads outputs from them. The environment is used to model the security of end-to-end protocols in which several secure protocols are composed arbitrarily. Security is then modeled as the requirement that no environment can distinguish whether it interacts with the real world and the adversary A or with the ideal world and the simulator S. We also provide privacy in the presence of a malicious adversary corrupting any single server, as formalized by Araki et al. [2016]: a malicious party, which arbitrarily deviates from the protocol description, cannot learn anything about the inputs and outputs of the honest parties. Notations: In our secure protocols, we use additive secret sharing over three different rings Z_L, Z_K and Z_P, where L = 2^ℓ, K = 2^{ℓ-1}, P = 67 and ℓ = 64.
We denote the two shares of x over Z_L, Z_K and Z_P by (x_0, x_1), (x^K_0, x^K_1) and (x^P_0, x^P_1), respectively. If a value x is shared over the ring Z_P, each bit of x is additively shared in Z_P. This means x is shared as a vector of 64 shares, where each share takes a value between 0 and 66. We also use boolean sharing of a single bit, denoted by (x^B_0, x^B_1). Secure multi-party computation was proposed in the 1980s [Yao, 1986, Goldreich et al., 1987]. These studies showed that multiple parties can jointly compute any function without learning anything about the inputs of the other parties. Let us assume that there are n parties I_1, ..., I_n and that I_i has a private input x_i for i ∈ {1, ..., n}. All parties want to compute an arbitrary function (y_1, ..., y_n) = f(x_1, ..., x_n), with I_i receiving the result y_i. MPC allows the parties to compute the function through an interactive protocol such that I_i learns only y_i. We first explain 2-out-of-2 additive secret sharing and how addition (ADD) and multiplication (MUL) are computed. In additive secret sharing, an ℓ-bit value x is shared additively in the ring Z_L as the sum of two values: x_0 + x_1 ≡ x mod L, where I_i knows only x_i for i ∈ {0, 1}. All arithmetic operations are performed in the ring Z_L. For additive secret sharing, we use protocols based on Beaver's multiplication triples [Beaver, 1991]. Addition: z = x + y. I_i locally computes z_i = x_i + y_i. In order to add a constant c to a shared value x, I_i locally computes z_i = x_i + c and I_{1-i} locally computes z_{1-i} = x_{1-i}. Multiplication: z = x · y. Multiplication is performed using a pre-computed multiplication triple (a, b, c) with c = a · b [Beaver, 1991]. I_i computes e_i = x_i − a_i and f_i = y_i − b_i, and sends e_i and f_i to I_{1-i}. I_i reconstructs e and f, and then computes z_i = i · e · f + f · a_i + e · b_i + c_i.
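The Beaver-triple arithmetic above can be checked with a short plaintext simulation. The sketch below is ours, not the paper's implementation: S_2's triple generation and both parties' local steps are collapsed into one routine, and all values are visible because nothing here is actually communicated.

```python
import random

L = 2 ** 64  # ring Z_L with ell = 64, as in the paper


def share(x):
    """Split x into two additive shares over Z_L."""
    x0 = random.randrange(L)
    return x0, (x - x0) % L


def beaver_mul(x_sh, y_sh):
    """Plaintext simulation of Beaver-triple multiplication.

    In the real protocol S_2 generates the triple (a, b, c = a * b) and
    each party only sees its own shares plus the reconstructed e and f."""
    a0, a1 = share(random.randrange(L))
    b0, b1 = share(random.randrange(L))
    a, b = (a0 + a1) % L, (b0 + b1) % L
    c0, c1 = share(a * b % L)
    # the parties exchange masked shares and reconstruct e = x - a, f = y - b
    e = (x_sh[0] - a0 + x_sh[1] - a1) % L
    f = (y_sh[0] - b0 + y_sh[1] - b1) % L
    # z_i = i*e*f + f*a_i + e*b_i + c_i, exactly as in the text
    z0 = (0 * e * f + f * a0 + e * b0 + c0) % L
    z1 = (1 * e * f + f * a1 + e * b1 + c1) % L
    return z0, z1


x, y = 1234567, 89012
z0, z1 = beaver_mul(share(x), share(y))
# (z0 + z1) % L reconstructs x * y modulo L
```

Summing the two output shares gives e·f + f·a + e·b + c = (x − a)(y − b) + (y − b)a + (x − a)b + ab = xy, which is why the local formula works.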
The multiplication triple is generated via homomorphic encryption or oblivious transfer; I_i cannot perform multiplication locally. One of the most common ways to summarize plot-based model evaluation metrics is the area under curve (AUC), which is applicable to various evaluation metrics. In this study, we employ it to measure the area under the ROC curve and the PR curve. In machine learning problems with binary outcome, the ROC curve captures the sensitivity and the specificity of the classifier by plotting the false positive rate (FPR) on the x-axis and the true positive rate (TPR) on the y-axis. The AUC summarizes this plot by measuring the area between the curve and the x-axis, which is the area under the ROC curve (AUROC), computed as in Equation 1. This formula only approximates the exact AUROC in case of a tie in V, depending on the order of the samples. As an extreme example, let V have 10 samples with the same PCV, where the first 5 samples have label 1 and the second 5 samples have label 0. Such a setting outputs AUROC = 1. Conversely, if the samples with label 0 come first and the samples with label 1 later, we obtain AUROC = 0. In order to define an accurate formula for the AUROC under such tie conditions, let ξ be the vector of indices, in ascending order, at which the PCV of the sample differs from that of the preceding sample, with 0 ≤ |ξ| ≤ M, where |ξ| denotes the size of the vector. Assuming that ξ[0] = 0, the AUROC can be computed as in Equation 2. As Equation 2 indicates, one only needs the TPR and FPR values at the points where the PCV changes to obtain the exact AUROC. We will benefit from this observation in the privacy preserving AUROC computation. Another model evaluation metric for machine learning problems with binary outcome is the PR curve, which plots recall on the x-axis and precision on the y-axis; it is generally preferred over the ROC curve in scenarios with class imbalance.
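To make the tie issue concrete, here is a plaintext sketch (ours, not the paper's code) contrasting the order-based formula with a tie-aware trapezoidal computation that only emits a point where the PCV changes:

```python
def auroc_by_order(labels):
    """No-tie formula: for each negative, count the positives ranked above it.
    `labels` must already be sorted by descending PCV."""
    pairs = tp = 0
    for y in labels:
        if y == 1:
            tp += 1
        else:
            pairs += tp
    P = sum(labels)
    N = len(labels) - P
    return pairs / (P * N)


def auroc_exact(labels, scores):
    """Tie-aware AUROC: trapezoids between the points where the PCV changes."""
    order = sorted(range(len(labels)), key=lambda i: -scores[i])
    P = sum(labels)
    N = len(labels) - P
    area = 0.0
    tp = fp = prev_tp = prev_fp = 0
    for k, i in enumerate(order):
        tp += labels[i]
        fp += 1 - labels[i]
        # emit a (FPR, TPR) point only where the PCV changes (or at the end)
        if k == len(order) - 1 or scores[order[k + 1]] != scores[i]:
            area += (fp - prev_fp) * (tp + prev_tp) / 2
            prev_tp, prev_fp = tp, fp
    return area / (P * N)


# ten samples with identical PCVs: the order-based formula is misled,
# while the tie-aware one credits each tied pair half a point
ties = [1] * 5 + [0] * 5
print(auroc_by_order(ties))           # 1.0
print(auroc_exact(ties, [0.7] * 10))  # 0.5
```

With the labels reversed, `auroc_by_order` drops to 0.0 while the tie-aware version still returns 0.5, mirroring the extreme example in the text.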
The AUC summarizes this plot by measuring the area under the PR curve (AUPR). Since both precision and recall can change at the same time even without a tie, we measure the area by using Equation 2, with T being the precision and F being the recall. In this section, we give the definitions of the basic operations of the framework that we use in ppAURORA. Note that we include only the operations with novelty due to the page limit; one can refer to the Supplement for the other operations. Selecting One of Two Secret Shared Values: Algorithm 1 performs a 3-party computation to select one of two secret shared values based on a secret shared bit (functionality F_MUX). At the beginning of the protocol, S_0 and S_1 hold (x_0, y_0, b_0) and (x_1, y_1, b_1), respectively. At the end of the secure computation, S_0 and S_1 hold fresh shares of z = x − b(x − y). We utilize the randomized encoding of multiplication [Applebaum, 2017]. As shown in Equation 3, we need to multiply two values owned by different parties in the computation of b_0(x_1 − y_1) and b_1(x_0 − y_0). We assume that S_2 is the computation party and performs these multiplications via the randomized encoding. Modulus Conversion: Algorithm 2 describes our 3-party protocol realizing the functionality F_MC, which converts shares over Z_K to fresh shares over Z_L, where L = 2K. Assuming that S_0 and S_1 hold the shares x^K_0 and x^K_1, respectively, the first step for S_0 and S_1 is to mask their shares using the shares of a random value r ∈ Z_K sent by S_2. Afterwards, they reconstruct (x + r) ∈ Z_K by first computing y^K_i = x^K_i + r^K_i for i ∈ {0, 1} and then exchanging these values. Along with the shares of r ∈ Z_K, S_2 also sends, in boolean shares, the information of whether the summation of the shares of r wraps, so that S_0 and S_1 can convert r from the ring Z_K to the ring Z_L.
Once they reconstruct y ∈ Z_K, S_0 and S_1 can change the ring of y to Z_L by adding K to one of the shares of y if y^K_0 + y^K_1 wraps. After the conversion, the remaining step is to fix the value of y ∈ Z_L. In case (x + r) ∈ Z_K wraps, which we identify using PC, S_0 or S_1 or both add K to their shares, depending on the boolean shares of the outcome of PC. If both add, there is no net change to the value of y ∈ Z_L, since 2K ≡ 0 mod L. At the end, S_i subtracts r_i ∈ Z_L from y_i ∈ Z_L and obtains x_i ∈ Z_L for i ∈ {0, 1}. Comparison of Two Secret Shared Values: Algorithm 3 defines the 3-party protocol for the functionality F_CMP, which compares two secret shared values x and y and outputs zero if x ≥ y, and 1 otherwise. In Algorithm 3, we find the value of (x − y) ∧ 2^{ℓ-1}, which is the most significant bit (MSB) of (x − y). The protocol first converts the ring of d from Z_K to Z_L by calling MC. Finally, S_0 and S_1 subtract the shares of d over Z_L from the shares of x − y and map the result to 1 if it equals K, and to 0 otherwise. Division: Algorithm 4 defines the 3-party protocol realizing the functionality F_DIV, which computes x/y. S_0 and S_1 hold shares of x and y, and DIV outputs fresh shares of x/y. DIV computes x/y correctly when S_0 and S_1 know an upper bound on x and y. Thus, it is not a general purpose method over Z_L; we specifically designed it for ppAURORA, since S_0 and S_1 know the upper bound of the inputs to DIV in the computation of the AUC. There are some frameworks in the literature that also perform 3-party secure computation. The recently published SecureNN [Wagh et al., 2018] has shown successful results in terms of performance in private CNN training. We compared our secure protocols with the corresponding building blocks from SecureNN [Wagh et al., 2018] that one would need to compute the AUC. Table 1 summarizes the comparison. It is clear that our protocols have lower costs.
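The core observation behind CMP, that the MSB of (x − y) in Z_L acts as a sign bit, can be sketched in plaintext. The sketch assumes, as the shared-value setting does, that the operands fit in ℓ − 1 bits:

```python
ELL = 64
L = 2 ** ELL


def msb_compare(x, y):
    """Return 0 if x >= y and 1 otherwise, by reading the most significant
    bit of (x - y) mod 2^ell. Assumes 0 <= x, y < 2^(ell - 1)."""
    return ((x - y) % L) >> (ELL - 1)
```

In the protocol this bit is never revealed in the clear: it is extracted from secret shares with the help of MC and PC, and the parties end up holding fresh shares of the comparison result.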
Therefore, we can deduce that our framework outperforms SecureNN in AUC computation. Our novel protocols (MUX, MC, CMP, DIV) require at most 5 communication rounds, and round complexity is the most important factor affecting the performance of secure computation protocols. Our new protocols thus show an improvement in terms of round complexity.

Algorithm 1: Select Share (MUX)
  Input: S_0 and S_1 hold (x_0, y_0, b_0) and (x_1, y_1, b_1), respectively.
  Output: S_0 and S_1 get z_0 and z_1, respectively, where z = x − b(x − y).
  S_2 divides z into two shares (z_0, z_1) and sends z_0 and z_1 to S_0 and S_1, respectively.

Algorithm 2: Modulus Conversion (MC)
  Input: S_0 and S_1 hold x^K_0 and x^K_1, respectively.
  Output: S_0 and S_1 get x_0 and x_1, respectively.
  2. S_0 and S_1 hold a common random bit n.
  3. S_2 picks a random number r ∈ Z_{2^{ℓ-1}} and generates r^K.

In this section, we give the description of our protocol for the ppAURORA computation. In ppAURORA, we have data owners that outsource their PCVs and the ground truth labels in secret shared form, and three non-colluding servers that perform 3-party computation on the secret shared PCVs to compute the AUC. The protocol starts with outsourcing by the data sources. Afterward, the servers perform the desired calculation privately. Finally, they send the shares of the result back to the data sources. The communication between all parties is performed over a secure channel (e.g., TLS).

Algorithm 3: Compare (CMP)
  Input: S_0 and S_1 hold (x_0, y_0) and (x_1, y_1), respectively.
  Output: S_0 and S_1 get z_0 and z_1, respectively, where z equals zero if x ≥ y and K otherwise.
  2. S_0 and S_1 hold a common random bit f.

Algorithm 4: Division (DIV)
  Input: S_0 and S_1 hold (x_0, y_0) and (x_1, y_1), respectively.
  Output: S_0 and S_1 get z_0 and z_1, respectively, where z = x/y.
  2. S_0, S_1 and S_2 know the common scaling factor F.
  3. S_0 and S_1 know the upper limit of x and y, denoted by U.
  4. S_0 and S_1 hold two common random values r_0 and r_1, where r_0 < L/2U and r_1 < L/2U.
  5. For each i ∈ {0, 1}, S_i executes Steps 6-7.
  6. S_i computes a_i = r_1 · x_i + r_0 · y_i and b_i = r_1 · y_i.
  7. S_i sends a_i and b_i to S_2.
  8. S_2 reconstructs a and b and computes c = aF/b.
  9. S_2 creates two shares of c, denoted by c_0 and c_1, and sends them to S_0 and S_1, respectively.
  10. S_0 computes z_0 = c_0.
  11. S_1 computes z_1 = c_1 − r_0 F/r_1.

Outsourcing: At the start of ppAURORA, each data owner H_i has a list of PCVs and corresponding ground truth labels for i ∈ {1, ..., n}. Each data owner H_i sorts its whole list T_i according to the PCVs in descending order, divides it into two additive shares T_{i0} and T_{i1}, and sends T_{i0} and T_{i1} to S_0 and S_1, respectively. We refer to S_0 and S_1 as proxies. Sorting: After the outsourcing phase, S_0 and S_1 hold the shares of the individually sorted lists of PCVs of the data owners. Afterwards, the proxies need to perform a merging operation on each pair of individually sorted lists and continue with the merged lists until they obtain the globally sorted list of PCVs. This can be viewed as a binary tree whose leaves are merged upwards into the root node, which in our case is the globally sorted list. Due to the high complexity of privacy preserving sorting, we make the sorting parametric to adjust the trade-off between privacy and practicality. Let δ = 2a + 1, for a ∈ N, be the parameter that determines the number of PCVs added to the globally sorted list in each iteration; let T_i^k and T_j^k be the shares held by S_k of two individually sorted lists of PCVs for k ∈ {0, 1}, with |T_i| ≥ |T_j|, where |·| is the size operator. At the beginning, the proxies privately compare the lists elementwise.
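Stepping back to DIV for a moment, the masking trick in Algorithm 4 can be verified with a plaintext simulation. The concrete values of F, r_0 and r_1 below are illustrative assumptions chosen so that the integer divisions are exact; the real protocol only requires the bounds stated above.

```python
import random

L = 2 ** 64
F = 10_000  # scaling factor: four decimal digits of precision


def share(v):
    """Split v into two additive shares over Z_L."""
    v0 = random.randrange(L)
    return v0, (v - v0) % L


def div(x_sh, y_sh, r0, r1):
    """Sketch of DIV: S_0 and S_1 mask (x, y) with common randomness,
    S_2 divides the masked values, and S_1 removes the mask offset."""
    a = [(r1 * x_sh[i] + r0 * y_sh[i]) % L for i in (0, 1)]
    b = [(r1 * y_sh[i]) % L for i in (0, 1)]
    # S_2 reconstructs only the masked a = r1*x + r0*y and b = r1*y
    c = ((a[0] + a[1]) % L) * F // ((b[0] + b[1]) % L)
    c0 = random.randrange(L)
    c1 = (c - c0) % L
    # since c = x*F/y + r0*F/r1, subtracting the offset leaves shares of x*F/y
    return c0, (c1 - r0 * F // r1) % L


z0, z1 = div(share(7), share(2), r0=8, r1=4)
# (z0 + z1) % L reconstructs 7 * F / 2 = 35000
```

The upper bound U matters because r_1·x + r_0·y must not wrap in Z_L; otherwise S_2's division over the reconstructed integers would be meaningless.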
They utilize the results of the comparison in MUXs to privately exchange the shares of PCVs in each pair if the PCV in T_j is larger than the PCV in T_i. In the first MUX, they input the share in T_i^k first and then the share in T_j^k, along with the share of the comparison result, to select the larger of the PCVs and move it to T_i^k. In the second MUX, they reverse the order to select the smaller of the PCVs and move it to T_j^k. We call this stage shuffling. Then, they move the top PCV of T_i^k to the merged list of PCVs. If δ > 1, they continue comparing the top PCVs of the lists and moving the larger of them to the merged list. Once they have moved δ PCVs to the merged list, they shuffle the lists again. The proxies follow this shuffling-moving cycle until the PCVs in T_i^k are exhausted. The purpose of the shuffling is to increase the number of candidates for a specific position and, consequently, lower the chance of matching a PCV in the individually sorted lists to a PCV in the merged list. The highest possible chance of such a matching is 50%, which results in a very low chance of guessing the matching of all PCVs in the list. Regarding the effect of δ on privacy, it is important to note that δ needs to be an odd number to ensure that shuffling always increases the number of candidates; an even value of δ may cause ineffective shuffling during the sorting. Furthermore, δ = 1 provides the utmost privacy, meaning that the chance of guessing the matching of all PCVs is 1 over the number of all possible mergings of the two individually sorted lists. However, the execution time of sorting with δ = 1 can be relatively high. For δ > 1, the execution time is lower, but the number of possible matchings of PCVs in the individually sorted lists to the merged list decreases as δ increases. As a guideline, one can choose δ based on how much privacy loss any matching could cause for the specific task.
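The shuffling-moving cycle can be mirrored by a plaintext sketch. This is a simplification under stated assumptions: in the protocol every comparison and swap runs obliviously on secret shares via CMP and MUX, so no party sees the values moved here in the clear.

```python
def merge_descending(t1, t2, delta=1):
    """Merge two descending-sorted PCV lists with the shuffle-and-move cycle."""
    t1, t2 = list(t1), list(t2)
    merged = []
    while t1 and t2:
        if len(t1) < len(t2):  # keep t1 as the longer list
            t1, t2 = t2, t1
        # shuffling: elementwise compare-swap so t1 holds the larger of each pair
        for j in range(len(t2)):
            if t2[j] > t1[j]:
                t1[j], t2[j] = t2[j], t1[j]
        # moving: after the shuffle, the head of t1 is the global maximum
        merged.append(t1.pop(0))
        for _ in range(delta - 1):  # for delta > 1, move by head comparison
            if not t1 or not t2:
                break
            src = t1 if t1[0] >= t2[0] else t2
            merged.append(src.pop(0))
    merged.extend(t1 or t2)
    return merged


print(merge_descending([9, 5, 1], [8, 7, 3, 2], delta=3))
# [9, 8, 7, 5, 3, 2, 1]
```

The elementwise compare-swap preserves the descending order of both lists, which is why the head of the longer list is guaranteed to be the global maximum right after each shuffle.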
In case δ > 1 and |T_j^k| = 1 at some point in the sorting, the sorting continues as if it had just started with δ = 1, to make sure that the worst case for guessing the matching is still secured. More details of the sorting phase are in the Supplement. Once S_0 and S_1 obtain the globally sorted list of PCVs, they calculate the AUROC based on this list by employing one of the two versions of the AUROC computation, depending on whether ties exist in the list.

Algorithm 5: Secure AUROC computation without ties
  Input: T_i is a share of the globally sorted list of PCVs and labels.
  1. For each i ∈ {0, 1}, S_i executes Steps 2-11.

Algorithm 6: Secure AUROC computation with ties
  Input: T_i is a share of the globally sorted list of PCVs and labels.
  1. For each i ∈ {0, 1}, S_i executes Steps 2-18.

In Algorithm 5, we compute the AUROC as shown in Equation 1, because we assume that there is no tie in the sorted list of PCVs. At the end of the secure computation, the shares of the numerator N and denominator D are computed, where AUROC = N/D. S_i for i ∈ {0, 1} knows the number of test samples M. Thus, S_i can determine the upper bounds of N and D, and DIV can be used to calculate AUROC = N/D. Thanks to the high numeric precision of their outputs, most machine learning algorithms yield distinct PCVs for the samples. Therefore, this approach to computing the AUROC is applicable to most machine learning tasks. However, in case of a tie between samples from the two classes in the PCVs, it does not guarantee the exact AUROC; depending on the order of the samples, it only approximates the score. To obtain a more accurate AUROC, we propose another version with a slightly higher communication cost in the next section. To detect ties in the list of PCVs, S_0 and S_1 compute the difference between each PCV and the PCV following it. S_0 computes the modular additive inverse of its shares.
The proxies apply a common random permutation to the bits of each share in the list to prevent S_2 from learning the non-zero relative differences. They also permute the list of shares using a common random permutation to shuffle the order of the real test samples. Then, they send the list of shares to S_2. S_2 XORs the two shares and maps the result to one if it is greater than zero, and to zero otherwise. Then, the proxies privately map the PCVs to zero if they equal their preceding PCV, and to one otherwise. This phase is depicted in Algorithm 7. In Algorithm 6, S_0 and S_1 use these mappings to take only the PCVs which differ from their subsequent PCV into account in the computation of the AUROC, based on Equation 2. In Algorithm 6, the DIV method is used because the upper limits of the numerator and denominator are known, as in the AUROC computation described in Section 5.1.1. As in the AUROC computation described in Section 5.1.2, S_0 and S_1 map a PCV in the globally sorted list to zero if it equals the previous PCV and to one otherwise by running Algorithm 7. Then, we use Equation 2 to calculate the AUPR, as shown in Algorithm 8. In the AUPR calculation, the denominator of each precision value is different; therefore, we need to perform a division in each iteration. Since the proxy servers can determine the upper bounds of the numerators and denominators, we can use the DIV operation to perform these divisions.

Algorithm 8: Secure AUPR computation with ties
  Input: C_i = (con_1^i, ..., con_M^i), where C_i is a share of the globally sorted list of PCVs and M is the number of PCVs.
  1. S_0 and S_1 hold a common random permutation π for M items.
  2. S_0 and S_1 hold a list of common random values R.
  3. S_0 and S_1 hold a list of common random permutations σ for ℓ items.
  4. For each i ∈ {0, 1}, S_i executes Steps 5-13.

We provide all semi-honest simulation-based security proofs for the MUX, MC, CMP and DIV functions defined in our framework, as well as for the computations of ppAURORA, in the Supplement.
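Here is a plaintext sketch of the tie-aware AUPR computation in the spirit of Equation 2, with T as precision and F as recall. The trapezoidal rule and the starting point at (recall 0, precision 1) are our assumptions for illustration, not necessarily the paper's exact conventions.

```python
def aupr_exact(labels, scores):
    """Trapezoidal area under the PR curve, emitting a point only where
    the PCV changes (so tied PCVs are handled correctly)."""
    order = sorted(range(len(labels)), key=lambda i: -scores[i])
    P = sum(labels)
    area, tp = 0.0, 0
    prev_rec, prev_prec = 0.0, 1.0  # assumed starting point of the curve
    for k, i in enumerate(order):
        tp += labels[i]
        if k == len(order) - 1 or scores[order[k + 1]] != scores[i]:
            rec, prec = tp / P, tp / (k + 1)
            area += (rec - prev_rec) * (prec + prev_prec) / 2
            prev_rec, prev_prec = rec, prec
    return area
```

For labels [1, 0, 1, 1] with distinct descending scores this yields 55/72 ≈ 0.764; since the recall denominator is the fixed positive count but each precision has its own denominator, a private version needs one DIV per emitted point, as noted above.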
Synthetic Dataset: Since we aimed to analyze the scalability of ppAURORA in different settings, we generated a synthetic dataset with no restriction other than having the PCVs between 0 and 1. Experimental Setup: We conducted our experiments on Amazon EC2 t2.xlarge instances. For the LAN setting, we utilized only instances in the Frankfurt region. For the WAN setting, we additionally selected one instance each from London and Paris. We conducted the correctness experiments in the LAN setting. We assume that the result needs to be precise in the first three decimal places; thus, we set the precision of the division operation to cover up to the fourth decimal place. To assess the correctness, we computed the AUROC with ppAURORA and compared it to the result obtained without privacy. Both settings gave AUROC = 0.693. To check the correctness of the no-tie version of the AUROC computation in ppAURORA, we randomly picked one of the samples in each tie and generated a subset of the samples with no ties. We obtained the same AUROC with the no-tie version of ppAURORA as with the non-private computation. Additionally, we computed the AUPR with ppAURORA and verified that both the private and the non-private computation gave AUPR = 0.844. These results indicate that, without sacrificing the privacy of the data, ppAURORA computes exactly the same AUC as one could obtain on the pooled test samples. To justify the collaborative evaluation, we performed 1000 repetitions of the non-private AUROC computation with N ∈ {5, 10, 20, 40, 80, 160} samples randomly drawn from the whole data. The AUC becomes more and more stable as we increase the number of samples. We evaluated the no-tie and with-tie versions of the AUROC computation and the AUPR computation of ppAURORA with δ = 1 in settings where the number of data sources is 16 and the number of samples is N ∈ {64, 125, 250, 500, 1000}.
The results showed that ppAURORA scales almost quadratically in terms of both the communication cost among all parties and the execution time of the computation. Figure 1a displays the results. We also analyzed the performance of all computations of ppAURORA for a varying number of data sources. We fixed δ = 1 and the number of samples in each data source to 1000, and experimented with D data sources, where D ∈ {2, 4, 8, 16}. Similar to the previous analysis, ppAURORA scales roughly quadratically with the number of data sources; Figure 1b summarizes the results. We also analyzed the effect of δ ∈ {1, 3, 5, 11, 25, 51, 101} by fixing the number of data sources to 8 and the number of samples in each data source to 1000. The execution time, shown in Figure 1c, decreases logarithmically with increasing δ. In all analyses, since the dominating factor is the sorting, the execution times of the computations are close to each other. Additionally, our analysis showed that the LAN setting is 12 to 14 times faster than the WAN setting on average, due to the high round trip time of WAN, which is approximately 13.2 ms. However, even with such a scaling factor, ppAURORA can be deployed in real life scenarios if the alternative is a more time-consuming approval process required for gathering all data in one place, while still protecting the privacy of the data. We provide detailed results as tables in the Supplement. In this work, we presented a novel secure 3-party computation framework and its application, ppAURORA, to privately compute the AUC of the ROC and PR curves even when there are ties in the PCVs. We proposed four novel protocols in the framework: MUX to select one of two secret shared values, MC to convert the ring of a secret shared value from Z_{2^{ℓ-1}} to Z_{2^ℓ}, CMP to compare two secret shared values, and DIV to divide two secret shared values. They have low round and communication complexities. The framework and its application ppAURORA are secure against passive adversaries in the honest majority setting.
We implemented ppAURORA in C++ and demonstrated that it computes correctly and privately, and that it scales quadratically with the number of parties and the number of samples. To the best of our knowledge, ppAURORA is the first method that enables computing the exact AUC (AUROC and AUPR) privately and securely. In this section, we give the definitions of the basic operations that we utilize in addition to the novel operations given in the main paper. In the two-party setting, multiplication is performed using a multiplication triple [Beaver, 1991], which is generated via homomorphic encryption or oblivious transfer. In our 3-party setting, S_2 generates the multiplication triple and sends its shares to S_0 and S_1. S_0 and S_1 hold (x_0, y_0) and (x_1, y_1), respectively, and the secure multiplication outputs fresh shares of xy (functionality F_MUL).

Algorithm 9: Multiplication (MUL)
  Input: S_0 and S_1 hold (x_0, y_0) and (x_1, y_1), respectively.
  Output: S_0 and S_1 get z_0 and z_1, respectively, where z = x · y.

In this function, a value r in the ring Z_K, whose bits are secret shared between S_0 and S_1 in the field Z_P with P = 67, is compared with a common value y. At the end of the secure computation, S_2 learns a bit n' = n ⊕ (r > y). We adopt the definition of this functionality F_PC from Wagh et al. [2018], where it is called Private Compare (PC). The only change we made is that S_2 sends fresh boolean shares of n' to S_0 and S_1. PC is described in Algorithm 10 in the Supplement. In this section, we provide the detailed results of the experiments with ppAURORA for computing the AUROC and AUPR in Tables 2 and 3, respectively. We analyzed the stability of the AUROC with respect to the number of test samples to justify the collaborative evaluation. We experimented with N ∈ {5, 10, 20, 40, 80, 160} test samples randomly chosen from the DREAM Challenge dataset. In order to have a fair evaluation, we repeated these experiments 1000 times.
The experiments showed that the AUROC becomes more reliable and stable as we increase the number of test samples. In case a data source has no additional test samples, the collaborative evaluation can be the best option. Figure 2 summarizes the results of the analysis. We include some sorting examples to demonstrate the process. In Figure 3, we show the merging of two lists of PCVs of the same size. In this example, δ = 1. With this setting, we do not decrease the number of possible mergings of two individually sorted lists, which can be computed as the binomial coefficient (|L_1| + |L_2| choose |L_1|), where L_1 and L_2 are the individually sorted lists and |·| denotes the size of a list. Since the cost of fully private merging is high and may not be necessary for some applications, we can use an intermediate solution. By setting δ to a higher odd value, we can speed up the merging process. Figures 4 and 5 demonstrate an example of such a merging with δ = 3. The total number of shufflings is 5 and 3, respectively. Algorithm 10 (Private Compare, PC) — input: S_0 and S_1 hold the bit shares of r, respectively, a common input y and a common random bit n; output: S_0 and S_1 get boolean shares ⟨n′⟩^B_0 and ⟨n′⟩^B_1 of n′ = n ⊕ (r > y), respectively. S_0 and S_1 hold common random values s_j ∈ Z*_p for all j ∈ [ℓ] and a random permutation π for ℓ elements. S_0 and S_1 additionally hold common random values u_j ∈ Z*_p. Let t = (y + 1) mod 2^ℓ. Each P_i executes Steps 5–17 of the algorithm. In Figure 6, we show the sorting when the number of PCVs in the second list is 1 to justify why we set δ = 1 regardless of the initial δ, for privacy reasons. By setting δ, we ensure that the number of possible PCVs in each position of the global sorted list is the same as one would see in such a merging without privacy. The beginning and the end of the global sorted list have two possible matchings; however, due to the nature of the individual sorting, the number of possible PCVs for the other positions is only 3, which is lower than in Figure 3.
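Assuming the count of possible mergings is the standard interleaving number, i.e. the binomial coefficient (|L_1| + |L_2| choose |L_1|) — an interpretation of the formula above, not quoted from the main paper — it can be cross-checked against a brute-force recursion that enumerates merge orders:

```python
from math import comb

def count_mergings(n1, n2):
    """Number of interleavings of two sorted lists of sizes n1 and n2:
    choose which of the n1 + n2 output slots receive elements of L1."""
    return comb(n1 + n2, n1)

def brute_force(n1, n2):
    """Enumerate merge orders directly: at each step the next output
    element comes either from L1 or from L2."""
    if n1 == 0 or n2 == 0:
        return 1
    return brute_force(n1 - 1, n2) + brute_force(n1, n2 - 1)
```

With δ = 1, every one of these interleavings is consistent with what the servers observe, which is what makes the merging fully private.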
Here, we provide semi-honest simulation-based security proofs for the MUX, MC, CMP and DIV functions we have defined in our framework. Since the protocols we propose for the AUC calculation use MUX, MC, CMP, DIV and previously defined functions, we prove the security of the main protocols in the F-hybrid model by proving the security of each function we call. Lemma 10.1. The protocol MUL in Algorithm 9 in the Supplement securely realizes the functionality F_MUL. Figure 3 (caption): For each red arrow, i.e. for each shuffling, we utilize the PC operation on the PCVs from the two lists which are at the same index to select the larger of them, and employ the result of this comparison along with MUXs to put the larger one into L_1 and the other into L_2. Afterward, we move the top of the first list to the global sorted list. Since δ = 1, we perform shuffling after we move the top element. Proof. In order to prove the correctness of our protocol, we show that ⟨z⟩_0 + ⟨z⟩_1 = xy. Next, we prove the security of our protocol. During the protocol execution, S_2 sends a multiplication triple to S_0 and S_1 and does not receive any values. Thus the view of S_2 is empty and it is very easy to prove security in case S_2 is corrupted. S_i, where i ∈ {0, 1}, sees ⟨a⟩_i, ⟨b⟩_i, ⟨c⟩_i, e_i and f_i. These values are uniformly distributed random values and hence can be perfectly simulated. Lemma 10.2. The protocol MUX in Algorithm 1 in the main paper securely realizes the functionality F_MUX. Proof. We first prove the correctness of our protocol. ⟨z⟩_i is the output of S_i, where i ∈ {0, 1}. We need to prove that Reconstruct(⟨z⟩_i) equals the value selected by the secret shared selection bit. Figure 4 (caption): The merging of two same-size lists with δ = 3. We again start with shuffling and then move the top PCV of the first list into the global sorted list. Afterward, we compare the second element of the first list and the top element of the second list via PC. The proxies reconstruct the result of this comparison and move the share of the larger of the compared PCVs to the global sorted list.
They continue until they move δ PCVs to the global sorted list. Then, they shuffle and repeat the same procedure until there is no PCV left in the first list. Next, we prove the security of our protocol. S_2 gets M_2, M_3, M_5 and M_6. All these values are uniformly random because they are generated using the uniformly random values r_0, r_1, r_2, r_4. S_2 computes M_2·M_5 + M_3·M_6. The computed value is still uniformly random because it contains the uniformly random values r_0, r_1, r_2, r_4. As a result, any value learned by S_2 is perfectly simulated. For each i ∈ {0, 1}, S_i learns a fresh share of the output. Thus S_i cannot associate the share of the output with the shares of the inputs, and any value learned by S_i is perfectly simulatable. Lemma 10.3. The protocol PC in Algorithm 10 in the Supplement securely realizes the functionality F_PC. Proof. The proof of PC is given in Wagh et al. [2018]. Lemma 10.4. The protocol MC in Algorithm 2 in the main paper securely realizes the functionality F_MC in the F_PC-hybrid model. Proof. First, we prove the correctness of our protocol by showing that the output shares reconstruct to x over the ring Z_L. In the protocol, y = (x + r) mod K and isWrap(x, r, K) = (r > y), that is, isWrap(x, r, K) = 1 if r > y, and 0 otherwise. At the beginning, S_0, S_1 and S_2 call F_PC to compute c = (r > y), and S_0 and S_1 obtain the boolean shares ⟨c⟩_0 and ⟨c⟩_1, respectively. Besides, S_2 also sends the boolean shares ⟨w⟩_0 and ⟨w⟩_1 of w = isWrap(⟨r⟩_0, ⟨r⟩_1, K) to S_0 and S_1, respectively. If isWrap(⟨y⟩_0, ⟨y⟩_1, K) is 1, then S_0 adds K to ⟨y⟩_0 to change the ring of y from K to L. To convert r from ring K to ring L, S_0 and S_1 add K to their shares of r based on their boolean shares ⟨w⟩_0 and ⟨w⟩_1, respectively: if ⟨w⟩_0 = 1, then S_0 adds K to its share ⟨r⟩_0, and S_1 does the same with its share. Next, we need to fix the summation of x and r, that is, the value y. In case x + r ≥ K, we cannot fix the summation value y in ring L by simply converting it from ring K to ring L.
This summation should be x + r in ring L rather than (x + r) mod K. To handle this problem, S_0 and S_1 add K to their shares of y based on their shares ⟨c⟩_0 and ⟨c⟩_1. As a result, we convert the values y and r to ring L and fix the value of y if necessary. The final step to obtain ⟨x⟩_i for party S_i is simply to subtract ⟨r⟩_i from ⟨y⟩_i, where i ∈ {0, 1}. Next, we prove the security of our protocol. S_2 is involved in this protocol only in the execution of F_PC, whose proof we give above. At the end of the execution of F_PC, S_2 learns n′. However, n′ = n ⊕ (x > r) and S_2 does not know n. Thus n′ is uniformly distributed and can be perfectly simulated with randomly generated values. S_i, where i ∈ {0, 1}, sees only fresh shares, which can be perfectly simulated with randomly generated values. Lemma 10.5. The protocol CMP in Algorithm 3 in the main paper securely realizes the functionality F_CMP in the F_MC-hybrid model. Proof. First, we prove the correctness of our protocol. Assume that we have an ℓ-bit number u. v = u − (u mod 2^(ℓ−1)) is either 0 or 2^(ℓ−1); v is represented only by the most significant bit (MSB) of u. In our protocol, ⟨z⟩_i is the output of S_i, where i ∈ {0, 1}. We need to prove that Reconstruct(⟨z⟩_i) is 1 if x < y and 0 otherwise. S_i, where i ∈ {0, 1}, computes ⟨d⟩^K_i = (⟨x⟩_i − ⟨y⟩_i) mod K, which is a share of d over K. S_i computes ⟨d⟩_i, which is a share of d over L, by invoking MC. Note that z = x − y − Reconstruct(⟨d⟩_i) and all bits of z are 0 except the MSB of z, which is equal to the MSB of (x − y). Now we need to map z to 1 if it is equal to K and 0 if it is equal to 0. S_0 sends ⟨z⟩_0 and ⟨z⟩_0 + K in random order to S_2, and S_1 sends ⟨z⟩_1 to S_2. S_2 reconstructs two different values, divides these values by K, creates two additive shares of them, and sends these shares to S_0 and S_1. Since S_0 and S_1 know the order of the real MSB value, they correctly select the shares of its mapped value. Second, we prove the security of our protocol.
S_i, where i ∈ {0, 1}, sees ⟨d⟩_i, which is a fresh share of d, and ⟨a[0]⟩_i and ⟨a[1]⟩_i, one of which is a fresh share of the MSB of x − y and the other a fresh share of the complement of the MSB of x − y. Thus the view of S_i can be perfectly simulated with randomly generated values. Lemma 10.6. The protocol DIV in Algorithm 4 in the main paper securely realizes the functionality F_DIV. Proof. We first prove the correctness of our protocol. ⟨z⟩_i is the output of S_i, where i ∈ {0, 1}. We prove that Reconstruct(⟨z⟩_i) = xF/y. The DIV method produces correct results when S_0 and S_1 know the upper limit of the values x and y. In this case, the values r_0 and r_1 are chosen in such a way that the values r_1·⟨x⟩_0 + r_0·⟨y⟩_0 + r_1·⟨x⟩_1 + r_0·⟨y⟩_1 and r_1·⟨y⟩_0 + r_1·⟨y⟩_1 do not wrap around Z_L. If wrapping occurs around Z_L, the DIV method produces a wrong result. Next, we prove the security of our protocol. S_2 gets ⟨a⟩_0, ⟨a⟩_1, ⟨b⟩_0 and ⟨b⟩_1. All these values are uniformly random because they are generated using the uniformly random values r_0 and r_1. As a result, any value learned by S_2 is perfectly simulated. For each i ∈ {0, 1}, S_i learns a fresh share of the output. Thus S_i cannot associate the share of the output with the shares of the inputs, and any value learned by S_i is perfectly simulatable. Lemma 10.7. The protocol in Algorithm 5 in the main paper securely computes the AUROC in the (F_MUL, F_DIV)-hybrid model. Proof. In the protocol, we separately calculate the numerator N and the denominator D of the AUROC, which can be expressed as AUROC = N/D. Let us first focus on the computation of D. It is equal to the number of samples with label 1 multiplied by the number of samples with label 0. In the end, we have the number of samples with label 1 in TP and calculate the number of samples with label 0 as P − TP. Then, the computation of D is simply the multiplication of these two values. In order to compute N, we employ Equation 1 in the main paper.
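One reading of the masked values in the DIV proof above — an assumption of this sketch, since the shares ⟨a⟩_i and ⟨b⟩_i are only named there — is that the helper reconstructs a = r_1·x + r_0·y and b = r_1·y, so that a/b = x/y + r_0/r_1 and the quotient is blinded by the offset r_0/r_1 known to the other parties. The identity can be verified exactly with rationals:

```python
from fractions import Fraction
import random

def blinded_division(x, y):
    """Blind x/y as (r1*x + r0*y) / (r1*y) = x/y + r0/r1, then unblind.

    The helper only ever sees the masked values a and b; subtracting the
    known offset r0/r1 recovers the true quotient.
    """
    r0 = random.randrange(1, 10**6)
    r1 = random.randrange(1, 10**6)
    a = r1 * x + r0 * y
    b = r1 * y
    return Fraction(a, b) - Fraction(r0, r1)
```

In the actual fixed-point protocol the same algebra is done over Z_L, which is why the proof requires that the masked values do not wrap around the ring.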
We have already shown the denominator part of it. For the numerator part, we need to multiply the current TP by the change in FP and sum up these multiplication results. ⟨A⟩ ← MUL(⟨TP⟩, ⟨FP⟩ − ⟨pFP⟩) computes the contribution of the current sample to the numerator, and we accumulate all the contributions in N, which is the numerator part of Equation 1 in the main paper. Therefore, we can conclude that we correctly compute the AUROC. Next, we prove the security of our protocol. S_i, where i ∈ {0, 1}, sees {⟨RL⟩}_{j∈M}, {⟨A⟩}_{j∈M}, ⟨D⟩ and ⟨ROC⟩, which are fresh shares of these values. Thus the view of S_i is perfectly simulatable with uniformly random values. Lemma 10.8. The protocol in Algorithm 7 in the main paper securely marks the locations of ties in the list of prediction confidences. Proof. For the correctness of our protocol, we need to prove that for each index j in T, t[j].con = 0 if (C[j] − C[j + 1]) = 0 and t[j].con = 1 otherwise. We first calculate the difference of successive items in C. Let us assume we have two additive shares (⟨a⟩_0, ⟨a⟩_1) of a over the ring Z_L. If a = 0, then (L − ⟨a⟩_0) ⊕ ⟨a⟩_1 = 0, and if a ≠ 0, then (L − ⟨a⟩_0) ⊕ ⟨a⟩_1 ≠ 0, where L − ⟨a⟩_0 is the additive modular inverse of ⟨a⟩_0. We use this fact in our protocol. S_0 computes the additive inverse of each item ⟨c⟩_0 in C_0, which is denoted by c′_0, XORs c′_0 with a common random number in R, which is denoted by c″_0, and permutes the bits of c″_0 with a common permutation σ, which is denoted by c‴_0. S_1 XORs each item ⟨c⟩_1 in C_1 with a common random number in R, which is denoted by c″_1, and permutes the bits of c″_1 with a common permutation σ, which is denoted by c‴_1. S_i, i ∈ {0, 1}, permutes the values in C_i by a common random permutation π, which is denoted by D_i. After receiving D_0 and D_1, S_2 maps each item d of D to 0 if ⟨d⟩_0 ⊕ ⟨d⟩_1 = 0, which means ⟨d⟩_0 + ⟨d⟩_1 = 0, and maps it to 1 if ⟨d⟩_0 ⊕ ⟨d⟩_1 ≠ 0, which means ⟨d⟩_0 + ⟨d⟩_1 ≠ 0.
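The zero test on additive shares used in Lemma 10.8 — (L − ⟨a⟩_0) ⊕ ⟨a⟩_1 = 0 exactly when the shared value is 0 — can be checked directly in a plaintext sketch (the ring size is illustrative):

```python
import random

L = 2**32  # illustrative ring size

def share(v):
    """Split v into two additive shares over Z_L."""
    s0 = random.randrange(L)
    return s0, (v - s0) % L

def looks_zero(a0, a1):
    """True iff a0 + a1 = 0 (mod L): the additive inverse of a0 equals a1,
    and XOR of two words is zero exactly when they are equal."""
    return ((L - a0) % L) ^ a1 == 0
```

This is why the protocol can delegate the zero test to S_2 on masked, bit-permuted values: equality of the two words survives a common XOR mask and a common bit permutation, while the values themselves stay hidden.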
After receiving a new share of D from S_2, S_i, i ∈ {0, 1}, removes the dummy values and permutes the remaining values by the inverse permutation π^(−1). Therefore, our protocol correctly maps the items of C to 0 or 1. We next prove the security of our protocol. S_i, where i ∈ {0, 1}, calculates the difference of successive prediction values. The view of S_2 is D, which includes real and dummy zero values. S_i XORs each item of C_i with fresh boolean shares of zero, applies a random permutation to the bits of each item of C_i, applies a random permutation π to C_i and adds dummy zero and non-zero values. Thus the differences, the indices j where D[j] = 0 and the indices j where D[j] ≠ 0 are uniformly random. The numbers of zero and non-zero values are not known to S_2 due to the dummy values. With the common random permutations σ_j, j ∈ M, and the common random values R[j], j ∈ M, each item in C is hidden. Thus S_2 cannot infer anything about the real values in C. Furthermore, the number of repeating predictions is not known to S_2 due to the random permutation π. Lemma 10.9. The protocol in Algorithm 6 in the main paper securely computes the AUROC in the (F_MUL, F_MUX, F_DIV)-hybrid model. Proof. In order to compute the AUROC in case of ties, we utilize Equation 2 in the main paper, of which we calculate the numerator and the denominator separately. The calculation of the denominator D is the same as in Lemma 10.7. The computation of the numerator N has two components, N_1 and N_2. N_1, more precisely the numerator of the first term of Equation 2, is similar to the no-tie version of the privacy preserving AUROC computation. This part corresponds to the rectangular areas under the ROC curve. The decision of adding an area A to the cumulative area N_1 is made based on the result of the multiplication of A by t.con. t.con = 1 indicates that the sample is one of the points where the prediction confidence changes, and t.con = 0 otherwise. If it is 0, then A becomes 0 and there is no contribution to N_1. If it is 1, then we add A to N_1.
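In plaintext, the rectangle-plus-triangle accumulation behind N_1 and N_2 can be sketched as a reference computation. This is not the secure protocol — it groups samples by tied PCVs directly — but it mirrors the doubled numerator (2·N_1 + N_2) over the doubled denominator, and with no ties it reduces to the rectangle-only accumulation of Equation 1:

```python
from itertools import groupby

def auroc_with_ties(scores_and_labels):
    """Trapezoidal AUROC on (PCV, label) pairs, labels in {0, 1}.

    Accumulates doubled rectangle areas (2*N1) and triangle areas (N2)
    over groups of tied PCVs, then divides by the doubled denominator.
    """
    pairs = sorted(scores_and_labels, key=lambda p: -p[0])
    P = sum(label for _, label in pairs)
    N = len(pairs) - P
    tp = 0
    n1 = n2 = 0
    # Each group of equal PCVs contributes one segment of the ROC curve.
    for _, group in groupby(pairs, key=lambda p: p[0]):
        g = list(group)
        d_tp = sum(label for _, label in g)
        d_fp = len(g) - d_tp
        n1 += tp * d_fp      # rectangle below the previous TP level
        n2 += d_tp * d_fp    # triangle from the simultaneous TP/FP increase
        tp += d_tp
    return (2 * n1 + n2) / (2 * P * N)
```

Equivalently, this counts each positive-negative pair as 1 if the positive scores higher and 1/2 on a tie, which is the rank-statistic view of the AUROC.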
On the other hand, N_2, which is the numerator of the second term of Equation 2, accumulates the triangular areas. We compute the possible contribution of the current sample to N_2. In case this sample is not one of the points where the prediction confidence changes, which is determined by t.con, the value of A is set to 0. If it is, then A remains the same. Finally, A is added to N_2. Since there is a division by 2 in the second term of Equation 2 in the main paper, we multiply N_1 by 2 so that the two terms have a common denominator. Afterwards, we sum N_1 and N_2 to obtain N. In order to have the factor 2 in the common denominator, we also multiply D by 2. As a result, we correctly compute the numerator and the denominator of the AUROC. Next, we prove the security of our protocol. S_i, where i ∈ {0, 1}, sees {⟨RL⟩}_{j∈M}, {⟨A⟩}_{j∈M}, {⟨pFP⟩}_{j∈M}, {⟨pTP⟩}_{j∈M}, ⟨D⟩ and ⟨ROC⟩, which are fresh shares of these values. Thus the view of S_i is perfectly simulatable with uniformly random values. Lemma 10.10. The protocol in Algorithm 8 in the main paper securely computes the AUPR in the (F_MUL, F_MUX, F_DIV)-hybrid model. Proof. In order to compute the AUPR, we utilize Equation 2 in the main paper, of which we calculate the numerator and the denominator separately. We perform nearly the same computation as the AUROC computation in case of ties. The main difference is that we need to perform a division to calculate each precision value, because the denominators of the precision values differ. The rest of the computation is the same as the computation in Algorithm 6 in the main paper; the reader can follow the proof of Lemma 10.9. Next, we prove the security of our protocol. S_i, where i ∈ {0, 1}, sees {⟨RL⟩}_{j∈M}, {⟨T_PC⟩}_{j∈M}, {⟨A⟩}_{j∈M}, {⟨pPC⟩}_{j∈M}, {⟨pRC⟩}_{j∈M}, ⟨D⟩ and ⟨ROC⟩, which are fresh shares of these values. Thus the view of S_i is perfectly simulatable with uniformly random values. Lemma 10.11.
The sorting protocol in Section 5 in the main paper securely merges two sorted lists in the (F_CMP, F_MUX)-hybrid model. Proof. First, we prove the correctness of our merge sort algorithm. L_1 and L_2 are two sorted lists. In the merging of L_1 and L_2, the corresponding values are first compared using the secure CMP operation. The larger values are placed in L_1 and the smaller values are placed in L_2 after the secure MUX operation is called twice. This process is called shuffling because it shuffles the corresponding values in the two lists. After the shuffling process, we know that the largest element of the two lists is the top element of L_1. Therefore, it is removed and added to the global sorted list L_3. In the next step, the top elements of L_1 and L_2 are compared with the CMP method. The comparison result is reconstructed by S_0 and S_1, and the top element of L_1 or L_2 is removed based on the result of CMP and added to L_3. The selection operation also yields the largest element of L_1 and L_2, because L_1 and L_2 are sorted and the selection operation selects the larger of their top elements. We have shown that the shuffling and selection operations yield the largest element of the two sorted lists. This ensures that our merge sort algorithm, which uses only these operations, correctly merges two sorted lists into one ordered list. Next, we prove the security of our merge sort algorithm. In the shuffling operation, the CMP and MUX operations are called. CMP outputs fresh shares of the comparisons of the corresponding values in L_1 and L_2. The shares of these comparison results are used in the MUX operations, and the MUX operation generates fresh shares of the corresponding values. Therefore, S_0 and S_1 cannot precisely map these values to the values in L_1 and L_2. In the selection operation, CMP is called and the selection is performed based on the reconstructed output of CMP.
S_0 and S_1 are still unable to map the values added to L_3 to the values in L_1 and L_2 precisely, because at least one shuffling operation took place before these repeated selection operations. One shuffling and δ − 1 selection operations are performed repeatedly until L_1 is empty. After each shuffling operation, the fresh shares of the larger corresponding values in L_1 and the fresh shares of the smaller corresponding values in L_2 are stored. The views of S_0 and S_1 are perfectly simulatable with random values due to the shuffling process performed at regular intervals. In some cases, it would be possible to use unshuffled values in the selection operations. To prevent this, the following rules are followed in the execution of the merge protocol. If the two lists do not have the same length, the longer list is chosen as L_1. If δ is greater than the length of L_2, it is set to the largest odd value smaller than or equal to the length of L_2, so that any unshuffled values in L_1 are not used in the selection operations. If the length of L_2 is reduced to 1 at some point during the sorting, δ is set to 1. Thus L_2 will have one element until the end of the merge and shuffling is done before each selection. Araki et al. [2016] defined the notion of privacy against malicious adversaries in the client-server setting. In this setting, the servers performing the secure computation on the shares of the inputs to produce the shares of the outputs do not see the plain inputs and outputs of the clients. This notion of privacy says that a malicious party cannot break the privacy of the inputs and outputs of the honest parties. This setting is very similar to ours. In our framework, two parties exchange a seed which is used to generate common random values between them. The two parties randomize their shares using these random values, which are not known to the third party.
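The shuffle-and-select merging above can be simulated in plaintext for the δ = 1 case. This sketch ignores secret sharing entirely and assumes descending order, matching the protocol's convention of moving the largest element to the global sorted list first:

```python
def shuffle_step(l1, l2):
    """Elementwise compare-swap: the larger of each aligned pair goes to l1."""
    for i in range(min(len(l1), len(l2))):
        if l1[i] < l2[i]:
            l1[i], l2[i] = l2[i], l1[i]

def merge_sorted_desc(l1, l2):
    """Plaintext simulation of the shuffle-then-select merge with delta = 1."""
    l1, l2 = list(l1), list(l2)
    if len(l1) < len(l2):       # the longer list is chosen as L1
        l1, l2 = l2, l1
    out = []
    while l1:
        shuffle_step(l1, l2)
        out.append(l1.pop(0))   # after a shuffle, L1's head is the global max
    out.extend(l2)              # the remainder of L2 is already sorted
    return out
```

The compare-swap preserves the descending order of both lists (the elementwise maximum of two descending sequences is descending), which is why popping L_1's head after each shuffle always yields the global maximum.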
It is very easy to add fresh shares of zero to the outputs of the two parties with common random values shared between them. In our algorithms, we do not explicitly state the randomization of the outputs with fresh shares of zero. Thus, our framework provides privacy against a malicious party by relying on the security of a seed shared between the two honest parties.

References:
SecureML: A system for scalable privacy-preserving machine learning
SecureNN: Efficient and private neural network training
GAZELLE: A low latency framework for secure neural network inference
ABY3: A mixed protocol framework for machine learning
A framework with randomized encoding for a fast privacy preserving calculation of non-linear kernels for machine learning applications in precision medicine
New primitives for actively-secure MPC over rings with applications to private machine learning
FLASH: Fast and robust framework for privacy-preserving machine learning
BLAZE: Blazing fast privacy-preserving machine learning
A stability-based validation procedure for differentially private machine learning
Differential privacy for classifier evaluation
Differentially private regression diagnostics
How does knowledge of the AUC constrain the set of possible ground-truth labelings?
An examination of data confidentiality and disclosure issues related to publication of empirical ROC curves
How to simulate it: A tutorial on the simulation proof technique
Universally composable security: A new paradigm for cryptographic protocols
High-throughput semi-honest secure three-party computation with an honest majority
How to generate and exchange secrets
How to play any mental game
Efficient multiparty protocols using circuit randomization
Garbled circuits as randomized encodings of functions: a primer
A crowdsourcing approach to developing and assessing prediction algorithms for AML prognosis