key: cord-0577127-6sxwmrav authors: Fitzsimons, Jack K.; Mantri, Atul; Pisarczyk, Robert; Rainforth, Tom; Zhao, Zhikuan title: A note on blind contact tracing at scale with applications to the COVID-19 pandemic date: 2020-04-10 journal: nan DOI: nan sha: ec6f213d8901bc2e0da73d1421129a84c72503dc doc_id: 577127 cord_uid: 6sxwmrav The current COVID-19 pandemic highlights the utility of contact tracing, when combined with case isolation and social distancing, as an important tool for mitigating the spread of a disease [1]. Contact tracing provides a mechanism of identifying individuals with a high likelihood of previous exposure to a contagious disease, allowing additional precautions to be put in place to prevent continued transmission. Here we consider a cryptographic approach to contact tracing based on secure two-party computation (2PC). We begin by considering the problem of comparing a set of location histories held by two parties to determine whether they have come within some threshold distance while at the same time maintaining the privacy of the location histories. We propose a solution to this problem using pre-shared keys, adapted from an equality testing protocol due to Ishai et al [2]. We discuss how this protocol can be used to maintain privacy within practical contact tracing scenarios, including both app-based approaches and approaches which leverage location history held by telecoms and internet service providers. We examine the efficiency of this approach and show that existing infrastructure is sufficient to support anonymised contact tracing at a national level. Coronavirus disease 2019 (COVID- 19) has spread rapidly around the globe, with many governments and health organisations battling to slow or contain the spread of the disease [3, 4] . In particular, geofencing and contact tracing have been used extensively to try to contain the virus and to identify those potentially at risk due to direct contact with confirmed cases. The aim of contact tracing is two-fold: to help and manage people who may have been exposed to infectious disease, and to stop the chain of transmission in order to control the outbreak. The prevalence of mobile telephones, and in particular smartphones, provide a potentially powerful tool for contact tracing efforts: the vast majority of users carry their devices with them throughout the day, and the phones themselves are capable of generating detailed location histories, either via GPS or signal strength . However, any possible implementation using them has to tackle several issues. Firstly, there is an issue of privacy. With the availability of users' location, there is a growing concern that private data can be used by an unauthorized entity in ways that infringe the individual right to privacy, for example by profiling user's behavior, targeting the user by stalking and spamming or drawing inferences about user's sensitive data. Unfortunately, there is an ongoing battle between personal privacy, the approaches used by governments, and the effectiveness of the approaches applied. For example, in South Korea and Israel, there has been a large public push back against the sharing of individuals' intimate information and the use of mass surveillance approaches respectively [5] . Secondly, any scheme that is used has to be both scalable and effective. For example, schemes that rely on a smartphone application that has to be installed by users on their smartphones can only be effective if a significant proportion of the population has a smartphone, takes an active step to install the app, and allows it to run continuously. Further, in many areas, a minority of people have access to smartphones but may carry other GSM mobile devices that can be a source of information. For example in 2018 India had a smartphone penetration rate of 26% in comparison to 88% subscription rate to wireless phones [6] . Even in the countries which do not have this problem, there can be important groups in the society, such as older people, who have smaller smartphone ownership rates; this is critically important for COVID-19 as older people are in a high risk group [7, 8] . Such solutions can also struggle to get a large enough proportion of smartphone owners to use the app. Furthermore, if they rely on a single source of data such as Bluetooth, they might miss many other ones that are more widespread such as GPS or GSM triangulation. In this work we offer a practical framework for identifying whether two parties have come in close direct contact based on their absolute locations without disclosing either parties' location history. To achieve this, we have developed a simple framework which utilizes techniques from secure twoparty computation (2PC), whereby one party holds a database of private location data within a virtual private cloud or firewall and the other party holds a database of locations of confirmed COVID-19 patient locations for the same time period. The first database might be coming, for example, from a telecommunications service provider while the second is held by the government (see Figure 1 ). The goal is to identify if any of the locations in the former database lie within a threshold distance of locations in the latter database without revealing either database directly. In the literature this problem is known under the name of private proximity testing [9] The approach taken in this work is to first reduce the problem of private proximity testing into a blind contact tracing by an appropriate tessellation of the surface of the Earth [9, 10] . We then blindly compute a number of matches between each row of the two data sets over a fixed period of time (e.g. two weeks). In this way, we also hide the exact timestamp of the matching. We achieve this by modifying the perfectly secure protocol for equality testing presented in [2] . We offer three use cases of how the protocol may be used in practical application to enable collaboration between government and individuals or IT/telecom companies. Finally, we outline the implementation details for such an approach, showing that the resource intensity of such protocols is easily within capacity of modern data centers. Many institutions and governments have already introduced contact tracing smartphone applications or are currently working on them. Singapore has released TraceTogether [11], Poland published code for a prototype of their app [12] , British NHS and Germany's Fraunhofer Institute for Telecommunications are in the completion phases of their solution, all of which are based on the Bluetooth technology [13] . Chan et al. [14] present a thorough review of contact tracing smartphone apps based on Bluetooth and presents a new third-party free protocol. The authors specifically do not consider approaches based on absolute-location, such as ours, due to a cryptographic difficulty of protecting infected patients' locations. However, we believe our approach offers a simple and efficient solution that achieves that objective. The solution closely related to ours has been proposed by MIT [15] . Their smartphone-Private Kit: Safe Paths-builds location trail from GPS data of users. In its current iteration, data of infected patients is shared with the government who redacts them. This is a similar setup to ours however we stress that our solution offers perfect security against active adversaries. In its third iteration, Private Kit: Safe Paths will not rely on sharing information with the government but will instead perform a multi-party computation based on location trails of users. We do not know the proposed protocol, and therefore do not comment on its security, however both iterations require high participation rates for the solution to be effective. Upon completion of this work, we became aware of the independent work by Berke et al [16] , which presents an approach for contact tracing problem via the private-set intersection problem. As far as we know, this is the only result that achieves the same functionality (c.f. Definition 2.1 below) in the context of contact tracing. In the setting of computational security, the authors introduce a hash-based protocol with only two parties: the sender (server) and the receiver (client) without requiring the trusted dealer. They also provide an adaption of a Diffie-Hellman protocol as an example. On the other hand, we take the information-theoretic approach with the added assumption of trusted randomness dealer. Other cryptographic articles that are mostly related to ours concern private proximity testing. In particular, Narayanan et al. [9] did a reduction of private proximity testing protocols to private equality testing. However it outputs time and position to the receiver, which we try to avoid in our setting.Šeděnka and Gasti [17] have studied proximity testing on Earth based on distance computation. However their functionality necessarily has a security weakness that allows for triangulation. In this article, we are interested in the secure computation for two-parties with a trusted dealer. The task of a trusted dealer is to instantiate a randomness distribution phase that will allow the two parties (sender and receiver) to securely compute a joint function. To aid with exposition, we will use a running example with a receiver called Alice and a sender called Bob. Given the receiver and sender locations, denoted as {p at } and {p bt } respectively, the receiver learns how many times she was in proximity to the sender i.e. how many times D(p at , p bt ) < ∆, where D is a distance measure, ∆ is the threshold, and we assume that the locations are provided at discrete time points. The sender receives no output and additionally, we require both the parties not to learn anything about the other party's location or database. This is a modified blind proximity testing problem. However, we can reduce it to a blind contact tracing problem by tessellating the surface of the Earth (the choice of the grid for which is described in Section 4.1). After doing this, the locations of Alice and Bob {x t }, {y t } are discrete and correspond to the unit cells in the grid. This reduction allows us to consider the joint function as equality on the input of the sender and receiver. The goal is to check for all t if x t = y t and output the total number of matches, N , to the receiver without leaking anything about the t. However, when the receiver's input matches the sender's input for all the t then receiver can trivially guess the sender's location. This functionality is inspired by the privacy-preserving equality testing where output y t is leaked to the receiver when x t = y t . Note that both the functionalities of equality testing and contact tracing achieve the same task of privacy-preserving equality with the same security against the malicious sender but different security requirements when receiver is adversarial. The ideal functionality for blind contact tracing can be defined in the following way. Definition 2.1. Ideal functionality for blind contact tracing. The ideal functionality for blind contact tracing S bct ({x t }, {y y }) for a receiver with input {x t } and sender with input {y t } is given by the following procedure: 3. Return N to the receiver and nothing to the sender. This provides correctness and perfect security against dishonest parties (sender or receiver). Our concrete protocol for the blind contact tracing is adapted from the equality testing protocol of Ishai et al. [2] . The blind contact tracing protocol is divided into two phases: key generation and computation. The first stage consists of three parties -trusted dealer, sender, receiver and proceeds in the following way. The sender and receiver requests correlated randomness from the trusted dealer and sends a pre-agreed (between sender and receiver) input t. The trusted dealer samples a tuple of permutations (P t , Q) where P t is 2-wise independent permutation chosen from a family of permutation and Q is uniformly independent, chosen from S n . It also generates uniformly random string r t . The trusted dealer sends the set ({P t }, Q) to sender and ({r t }, {s Q(t) }) to receiver where s Q(t) = P t (r t ). Upon receiving the randomness in the key generation phase, the sender and receiver proceed with the computation phase. It consists of one round of communication between the sender and the receiver. As a first step: receiver performs a one-time pad of their private input {x t } with the random strings {r t } to obtain u t where u t = x t + r t . The receiver sends {u t } to the sender. In the next step, the sender performs one-time pad of their private input {y t } with the string obtained from the receiver in the first step (u t ) and permutes it using the permutation P t . The sender sends v Q(t) where v Q(t) = P t (u t − y t ) to the receiver. As the final step, the receiver checks if s Q(t) is equivalent to the string v Q(t) received from sender for every t. The receiver calculates the total number of matches N . We describe the blind contact tracing in Protocol 1. Receiver's input: private location data {x t ∈ X|t ∈ [n]} Sender's input: private location data {y t ∈ X|t ∈ [n]} Receiver's output: N (total number of matches) 1. Key generation (trusted dealer, sender, receiver): 1.1. Sample random 2-wise independent permutations P t : X → X and uniformly independent permutation Q ∈ S n where S n is the nth symmetric group and random strings r t ∈ X. 1.2. The sender gets ({P t }, Q) and the receiver gets ({r t }, {s Q(t) }), where s Q(t) = P t (r t ). 2. Computation (sender, receiver): 2.1. The receiver computes u t = x t + r t and sends the collection {u t } to sender. 2.2. The sender computes v Q(t) = P t (u t − y t ) and sends the permuted collection {v Q(t) } to the receiver. 2.3. The receiver then compares s Q(t) to v Q(t) and outputs N , where N := Q(t) w Q(t) and w Q(t) = 1 if s Q(t) = v Q(t) otherwise w Q(t) = 0. We now present an informal security argument, with more formal demonstrations presented in Appendix A. The correctness follows from the correctness of the one-time pad and permutation (2-wise and uniform). For the privacy of the receiver's input against a dishonest sender we argue in the following way. Note that the sender receives ({P t }, Q) from trusted dealer during the key generation phase and {x t + r t } from the receiver in the computation phase. The security of {x t } reduces to the one-time pad, assuming the trusted dealer doesn't collude with the sender and generates uniformly random bits. The privacy of the sender's input against a dishonest receiver is the following. In a single run of protocol, the receiver obtains ({r t }, {s Q(t) }) from the trusted dealer and {v Q(t) = P t (u t − y t )} from the sender. The sender's input privacy follows from the security guarantee of 2-wise random permutation and under the same assumption of no-collusion between the trusted dealer and the receiver. The receiver checks if v Q(t) = s Q(t) , but it doesn't get any information about t since Q is a random permutation over t. It is not difficult to verify that the blind contact tracing protocol is perfectly secure against active adversaries and is optimal in terms of communication complexity ( as shown in [2] . We also note that information theoretic security in two-party computation is not possible in the plain model unless certain assumptions such as trusted agents or shared secrets are made. Thus, our assumption of the trusted dealer can be justified in the sense that it allows us to achieve a strong security notion (information-theoretic) while preserving the efficiency of the protocol. Since the trusted dealer is never involved in the computation phase, as long as it do not collude with either of the parties, the protocol remain secure against any other attack by the trusted dealer. Up until now, we have presented our protocol in terms of an arbitrary receiver and sender. However, in the implementation and depending on the objectives one needs to specify who the receiver is. Here we consider three cases specific to COVID-19. 1. An individual is the receiver and the government is the sender. The privacy of the individual, in that case, is guaranteed straightforwardly by the security of blind contact tracing protocol. In practice, this requires direct communication between the individual and the government while the individuals would have to possess the database with their location trails. 2. Service Provider (IT/Telecom company) is the receiver, the governmental body is the sender. This is a scenario under which the service provider is the receiver and it notifies the users whether they were in contact with an infected person or not. In practice, this is easier to implement than the previous scenario. Again, blind contact tracing protocol offers perfect security in that case under the assumption the service provider is not colluding with the government, which is in any case, the assumption whenever any such service is used. 3. Government is the receiver. The scenario under which the government is the receiver might be particularly useful if the government wants to learn about the statistics of people who were in the proximity of infected patients. If it wishes so, the government could then either directly or via service provider inform the individual of their risk, without knowing their identity. To determine the feasibility of our approach to applications, such us contract tracing for COVID-19, we implement the blind contact tracing protocol using severless infrastructure available on cloud computing platforms. The first crucial elements to consider in the implementation is the input data and the grid construction that we apply to it. The second one is the resource intensiveness. We discuss both in more detail in the following sections. The raw input to the protocol may come in a variety of forms such as GPS location information, GSM triangulation data, Bluetooth beacons, and router mac addresses amongst others. In many cases, we can expect the input to be a continuous value such as longitude and latitude. To make this data fit the paradigm of the blind contact tracing, we need to map locations to some discrete tiling over the surface of the earth. There are a few ways to do this such as constructing a square grid or hexagonal patterns, which are more efficient in terms of search queries, as seen in [9] . Both approaches satisfy the protocols correctness. The second consideration is that the Earth is not flat. To avoid error we must scale the mapping of location to tiles depending on the location on earth we are considering. In [17] , the Haversine approximation as outlined which approximates Earth as a sphere and has an approximation error rate of < 0.1%. We used an alternative approach, recommended by the FCC [18] , which uses an ellipsoid approximation to earth and has significantly less error over distance under a few hundred kilometers. To scale GPS coordinates to kilometers under this approximation we calculate two constants which we multiply the latitude and longitude by An important property of the proposed solution is its ability to scale to deal with contact tracing at a national level. To this end, it is trivial to show how the solution scales at a constant rate for the data. The approach is easily run in parallel across containers to meet the demand required. To get a ballpark estimate of the demand let us consider the scenario for a small, medium and large state based on estimated figures taken from Ireland, Poland and Japan roughly at the time of publication. We have also included similar comparisons at a synthetic city level of abstraction. We search a 3 × 3 overlapping grid of 7 × 7 meters each about a location to look for a match to avoid edge effects. We also check a time window of 40 minutes around the timestamp associated with a location. Time is quantized into 20 minute periods and we search for paths crossed in a single day for exemplary purposes. Of course, if we were to ignore edge effects we could reduce the over head by a small factor or if we wished to increase accuracy we could sample a finer resolution of accuracy about each point. This trade off is left to the users of the protocol. In this work we endeavour to overcome the potential challenges faced by computational security approaches in the context of contact tracing. The proposed protocol offers an information theoretic security and is optimal in terms of communication complexity. Furthermore, our solution has advantages in terms of effectiveness. It can use different sources of geolocation data such as GPS, cell-towers triangulation, Bluetooth or WiFi routers, and is not restricted to a single mode. Importantly it can be used on historic data collected by different means. It can be readily implemented on existing infrastructure by governmental institutions and third parties that have access to geolocation data including telecommunication service providers or IT enterprises. As such, it does not require installation of a separate smartphone application. By using cloud computing infrastructure, we have developed a scalable key generation service to facilitate these efforts. Access to the service, along with a lightweight library in Python to leverage service, is available to the wider community upon request 1 . receiver's input privacy when the sender is cheating and b) the sender's input privacy when the receiver is cheating. It is important to note that we assume that a trusted dealer behaves honestly and does not collaborate with any malicious party. Definition A.1 (Correctness). We say that a blind contact tracing protocol P x,y (where both the parties sender and receiver are honest) is correct if for all inputs (x, y) the parties output from running the protocol P x,y is exactly equal to the output of parties in ideal blind contact tracing functionality S bct (x, y). Theorem A.2 (Correctness). The blind contact tracing protocol shown in Protocol 1 is perfectly correct. That is for every random 2-wise permutation P t : X → X, uniformly random permutation Q ∈ S n , every message r t ∈ X, s Q(t) (with s Q(t) = P t (r(t))) and every input {x t ∈ X|t ∈ [n]}, {y t ∈ X|t ∈ [n]} it holds that receiver obtains N : On the other hand, the sender obtains no output. Proof. The correctness follows straightforwardly from the property of 2-wise permutation P t and uniformly random permutation Q as well as the correctness of one-time pad. The output N is incremented on the receiver side if and only if x t − y t is nonzero. To see this, let's consider v Q(t) − s Q(t) = P t (u t − y t ) − P t (r t ) = P t (u t − y t − r t ) As we know, the receiver computes u t = x t + r t , therefore Eq. 1 can be equivalently written as Therefore, v Q(t) = s Q(t) if and only if x t = y t . Thus, N := Q(t) w Q(t) = Q(t) z Q(t) . Definition A.3 (Privacy against malicious party). We say that a blind contact tracing protocol P(x, y) is: 1. secure against dishonest receiver if it does not leak anything about sender's input y except N (total number of i's where x i = y i ). 2. secure against dishonest sender if it does not leak anything about receiver's input x. Theorem A.4 (Privacy against malicious party). The blind contact tracing protocol shown in Protocol 1 is perfectly secure against a malicious sender or a malicious receiver (assuming the trusted dealer doesn't collude with any adversarial party). Proof. The security proof of Protocol 1 is only a slight modification of the one presented in [2] and uses a simulation proof technique [19] . We shall construct a simulator that creates a view for the adversary's which is indistinguishable from the real execution of the protocol. The simulator holds all the information from the key generation part i.e. ({P t }, Q, {r t }, {s Q(t) }). Now consider the two cases of malicious sender or malicious receiver. 1. Malicious sender: The simulator generates randomly u t and send it to the adversary (the sender). The simulator receives v Q(t) from the adversary and computes y t = u t − P −1 t (v Q(t) ). It sends y t to the ideal blind contact tracing functionality S bct . 2. Malicious receiver -The simulator calculates x t = u t − r t . and inputs it to the ideal blind contact tracing functionality S bct . If the ideal functionality outputs 1, the simulator sends s Q(t) to the adversary. If it outputs 0, the simulator chooses randomly v Q(t) such that v Q(t) = s Q(t) and sends it to the adversary. Quantifying sars-cov-2 transmission suggests epidemic control with digital contact tracing On the power of correlated randomness in secure computation Impact of non-pharmaceutical interventions (NPIs) to reduce COVID19 mortality and healthcare demand Report 12 -the global impact of covid-19 and strategies for mitigation and suppression Coronavirus privacy: Are South Korea's alerts too revealing? Digital india: Technology to transform a connected nation Are smartphones ubiquitous?: An in-depth survey of smartphone adoption by seniors Statement -older people are at highest risk from COVID-19, but all must act to prevent community spread Location privacy via private proximity testing Sky and telescope. Virtues of the Haversine Tracesecure: Towards privacy preserving contact tracing Sudheesh Singanamalla, Jacob Sunshine, and Stefano Tessaro. Pact: Privacy sensitive protocols and mechanisms for mobile contact tracing Apps gone rogue: Maintaining persoanl privacy in an epidemic Assessing disease exposure risk with location data: A proposal for cryptographic preservation of privacy Privacy-preserving distance computation and proximity testing on earth, done right Coast Guard, and US Coast Guard. The Code of Federal Regulations of the United States of America How to simulate it-a tutorial on the simulation proof technique A Security proofs We require a blind contact tracing protocol to satisfy two properties: correctness and input privacy. The former deals with the setting when all the parties (sender, receiver, dealer) follow the steps of the protocol as described in Protocol 1. For the latter We would like to acknowledge Oracle Cloud, Amazon Web Services and Strategic-Blue for their support. All authors would like to thank Joe Fitzsimons for his insights and discussions. A.M gratefully acknowledges funding from the AFOSR MURI project "Scalable Certification of Quantum Computing Devices and Networks". R.P. thanks EPSRC (UK). A.M would also like to thank Ashima Arora for some helpful discussion regarding implementation.