key: cord-0439583-rrmxsm9b
authors: Demirag, Didem; Ayday, Erman
title: Tracking and Controlling the Spread of a Virus in a Privacy-Preserving Way
date: 2020-03-29
journal: nan
DOI: nan
sha: ded1b304dbf83825056115e67f68d3f7f48339d5
doc_id: 439583
cord_uid: rrmxsm9b

Today, tracking and controlling the spread of a virus is a crucial need for almost all countries. Doing this early would save millions of lives and help countries keep a stable economy. The easiest way to control the spread of a virus is to immediately inform the individuals who recently had close contact with the diagnosed patients. However, to achieve this, a centralized authority (e.g., a health authority) needs detailed location information from both healthy individuals and diagnosed patients. Thus, such an approach, although beneficial to control the spread of a virus, results in serious privacy concerns, and hence privacy-preserving solutions are required to solve this problem. Previous works on this topic either (i) compromise privacy (especially privacy of diagnosed patients) to have better efficiency or (ii) provide unscalable solutions. In this work, we propose a technique based on private set intersection between physical contact histories of individuals (that are recorded using smart phones) and a centralized database (run by a health authority) that keeps the identities of the positive diagnosed patients for the disease. Proposed solution protects the location privacy of both healthy individuals and diagnosed patients and it guarantees that the identities of the diagnosed patients remain hidden from other individuals. Notably, proposed scheme allows individuals to receive warning messages indicating their previous contacts with a positive diagnosed patient. Such warning messages will help them realize the risk and isolate themselves from other people. We make sure that the warning messages are only observed by the corresponding individuals and not by the health authority. We also implement the proposed scheme and show its efficiency and scalability via simulations.

A pandemic, which typically occurs due to uncontrollable spread of a virus, is a major threat for the mankind. It may have serious consequences including people losing their lives and economical devastation for countries. To decrease the severity of such consequences, it is crucial for countries to track the spread of a virus before it becomes widespread.

Main threat during such a spread is the individuals that had close contact with the carriers of the disease (i.e., people carrying the virus before they are diagnosed with the disease or before they start showing symptoms). Thus, it is very beneficial to identify and warn individuals that were in close contact with a carrier right after the carrier is diagnosed. If a country can identify who had close contact with the already diagnosed patients, by sending warnings to its citizens and telling them to self-quarantine themselves, the spread of the virus can be easily controlled. By doing so, individuals that receive such warnings (that they had close contact with one or more diagnosed patients) can take self-measures immediately. This is also economically preferred instead of completely shutting down a country.

However, implementing such an approach is not trivial due to privacy reasons. First of all, due to patient confidentiality, identities of diagnosed patients cannot be shared with other individuals. Similarly, healthy individuals do not want to share sensitive information about themselves (e.g., their whereabouts) with the authorities of the country.

In this work, we propose a privacy-preserving technique that allows individuals receive warnings if they have been in close proximity of diagnosed patients in the past few weeks (that is determined based on the incubation period of the virus). We propose keeping the (physical) contact histories of individuals by using communication protocols in their smart phones. These contact histories are then used to determine if an individual was in close contact with a diagnosed patient in the past few weeks, and if so, the individual receives a warning. In order to do this in a privacy-preserving way, the proposed system uses private set intersection on the background as the cryptographic building block (between the local contact histories of the individuals and database keeping the identities of the diagnosed patients).

The proposed scheme guarantees that (i) identities or the whereabouts of the diagnosed patients are not revealed to any other individuals, (ii) contact histories of the individuals are not shared with any other parties, (iii) warning received by an individual (saying that they were in close proximity of a diagnosed patient) is only observed by the corresponding individual and no one else, and (iv) the individual that receives a warning can anonymously share their demographics with the healthcare officials only if they want to. Furthermore, we also propose an extension of the proposed scheme against malicious individuals that may try to temper their local contact histories in order to learn the diagnosis of some target individuals. We also implement and evaluate the proposed technique to show its efficiency and practicality.

The rest of the paper is organized as follows. In the next section, we summarize the related work. In Section 3, we provide brief background about the cryptographic building blocks we use in this paper. In Section 4, we describe the proposed solution in detail. In Section 5, we implement and evaluate the proposed scheme. In Section 6, we provide an extension of the proposed scheme and discuss its potential other uses. Finally, in Section 7, we conclude the paper.

The importance of outbreak and disease surveillance (without considering the privacy) has been studied by many previous works [2, 3] . Some countries prefer cell phone tracking-based systems (for diagnosed patients) in order to warn other individuals about the locations of diagnosed patients. However, such systems compromise privacy of individuals to track the spread of a virus [13] .

[4] provides a study of existing contact tracing mobile apps for COVID-19 in terms of their privacy considerations. Authors show that none of the existing apps and none of the existing schemes (except for private messaging systems) can protect privacy of diagnosed patients and other exposed individuals at the same time. However, private messaging systems (in which, a diagnosed patient anonymously sends messages to its previous physical contacts) lack scalability and they heavily rely on proxy servers to obfuscate the identify of the diagnosed patient.

Many apps, including [10] [11] [12] track individuals' location and save it in a local database. However, such an approach compromises location privacy of diagnosed patient since in these approaches, such information is gathered by the health authorities. In [10] [11] [12] , aggregate location paths (of diagnosed patients) are sent to the individuals and individual gets notification if they were in close proximity of a diagnosed patient. This results in another privacy vulnerability since individuals can observe the paths of diagnosed patients. Even though only the aggregate location paths are shared by the app, if the number of diagnosed patients is few in a given area, this may result in a significant privacy leak for the whereabouts of diagnosed individuals. As opposed to these approaches, in this work, we propose a scheme that protects the privacy of diagnosed individuals as well as the healthy ones.

Here, we briefly introduce the cryptographic building blocks we use in this paper.

Private set intersection cardinality (PSI-CA) aims to compute the cardinality of the intersection of two sets belonging to two parties without disclosing either set to the other party [5] . At the end of the protocol between the client and server, the client learns only the size of the intersection. The only information that is leaked to the respective parties is the upper bounds for the size of the client and server's inputs. We provide the details of the PSI-CA algorithm within the proposed protocol (in Section 4.4).

Similar to PSI-CA, authorized private set intersection (APSI) also aims to compute the intersection of two sets (belonging to two parties) in a privacy-preserving way [6] . Different from PSI-CA, APSI requires the authorization of the client input first. Thus, client input is signed by a mutually trusted authority and client also provides these signatures as the input of the algorithm. We provide the details of the APSI algorithm when we introduce an extension of the proposed algorithm against a malicious individual that may temper with their local contact history (in Section 6).

In this section, we first introduce our system and threat models and then, we describe the proposed solution in detail.

The proposed system includes healthy (or not yet diagnosed) individuals with smart phones, diagnosed patients for the disease, and a health authority (e.g., ministry of health or NIH). Individuals interact with the database of the health authority.

In the following, we will describe the proposed scheme assuming a single database for the health authority. For the sake of generality, one can also assume multiple local databases for the health authority (e.g., located in different geographical regions). The health authority keeps the identities of the diagnosed patients and it does not want this information to be learnt by other parties. Individuals keep their local (physical) contact histories in their smart phones and they do not want this information to be observed by other parties (including the health authority). Also, when an individual receives a warning about a contact with a diagnosed patient, the individual wants to make sure that no other party can observe this warning. We consider a semi-honest attacker model for the parties that involve in the protocol. That is, each party in the system follows the protocol honestly but they may be curious to learn sensitive information of the other parties. On the other hand, individuals may try to learn the identities of diagnosed patients or the health authority may try to learn the contact histories of the individuals. As we will discuss in detail later, the proposed algorithm protects the parties against these threats. Finally, we assume all communications between parties (between smart phones of two individuals or between an individual and the database of the health authority) are encrypted, and hence robust against eavesdroppers.

Each individual keeps a vector in their local smart phone for their physical contact history. When an individual A spends some amount of time within close proximity of an individual B, their phone records the ID of person B. For IDs of the individuals, we propose using the hash of their IMEI or phone numbers. To measure the proximity between the individuals, we propose using the Bluetooth signals on their devices. Thus, to add individual A as a contact, B needs to spend at least t seconds within r radius of A. This process is also illustrated in Figure 1 . As a result of this interaction, the new record in the contact history of A is the ID of B. Each individual may also separately keep the time and the duration of the contact (to have more insight about their risk, as will be discussed later).

x y z a b c

Alice's contact history Bob's contact history

To add as a contact: Bob spends at least t seconds within r radius of Alice It has been shown that the strengths of Bluetooth signals can be used to approximate the distance between two devices [8, 14] . Alternatively, one can also use (i) just the Bluetooth coverage (e.g., when two devices are in the range of each other for more than t seconds, they can update their local contact histories with each others' IDs), (ii) GPS informa-tion (when two devices are within their Bluetooth coverage, they can exchange GPS information to measure their distance more accurately and update their local contacts if they spend more than t seconds within a close range of each other), or (iii) NFC signal coverage (when the devices are within NFC signal coverage of each other, which is about 3 feet maximum, for more than t seconds, they can update their local contact histories with each others' IDs). Since this part is not the main contribution of the paper, we do not go into the details of establishing the contact histories.

It is important to make sure that an individual cannot add a contact in their contact list aiming to learn whether a target person has positive diagnosis or not. For instance, knowing the IMEI number of a target person, an attacker may construct its local contact list only from the ID of that target, and hence learn the diagnosis of the target. To prevent such an attack, we consider two options: (i) make sure the local contact histories of individuals are stored in such a way that data cannot be accessed or modified by the individuals (e.g., local contact history can be encrypted in the device by the key of the health authority or contact history can be stored in a storage for which the individual does not have read/write permission). Or, (ii) each new contact B of an individual A also includes a digital signature that is signed by a centralized authority (e.g., the telecom operator). To do so, if individuals A and B spend a certain amount of time within close proximity of each other (measured as discussed before), they both send the contact request to the operator, the operator signs and sends back the signed contact record to both parties, and each party keeps the contact records and the corresponding signature together. This way, an attacker cannot fake new contacts in its local contact history. The validity of these signatures are then verified when the local contact history of an individual is compared with the diagnosed patients in the health authority's database (we discuss this in detail in Section 6).

Using either of these techniques, the developed algorithm makes sure that contact history cannot be tempered with by the individual. Note that if the storage of the local contact list would be an overhead, it is also possible to store the local contacts of an individual at a cloud server (encrypted by individual's key) and update the local contacts periodically.

When an individual is diagnosed (e.g., by a hospital) with the disease, the ID (hash of IMEI or phone number) of the positive diagnosed patient is stored in the database of the health authority (e.g., ministry of health), as shown in Figure 2 .

It is important to note that only the hospital and the health authority know the ID of the diagnosed individual.

Alice diagnosed positive for the disease Alice's ID is stored by the database Figure 2 : Updating the database of health authority with the IDs of the diagnosed patients.

The application on an individual's smart phone sends queries to the health authority's database following a random schedule. This schedule can be determined by the system to avoid an overload to the database. It is also important not to allow the individual to send queries at any time in order to control the system's bandwidth.

An individual A uses their contact history to query the database of the health authority and the goal is to identify whether there is an intersection between the local contact history of the individual and the IDs of the diagnosed patients in the authority's database (as shown in Figure 3 ). Size of this intersection reveals the number of diagnosed people with whom A had been in close proximity in the previous a few weeks. As the size of the intersection increases, the risk of individual A being infected also increases. The proposed algorithm provides the result of this intersection to individual A as a warning message. Using the warning, the individual may take early precautions (e.g., have a test or quarantine themselves). To compute this intersection in a privacy-preserving way, we use the private set intersection cardinality (PSI-CA) protocol, in which parties that are involved in the protocol obfuscate their inputs (sensitive information) and compute the result of this intersection. Eventually, only individual A learns the result of the intersection and the health authority does not learn any information about the contact history of individual A or the result of the intersection. We also make sure that individual A does not learn anything about the database content of the health authority (e.g., IDs of diagnosed patients). We provide the details of this protocol in the following. Figure 4 illustrates the details of the proposed PSI-CA based protocol between an individual (client) and the health authority (server). As input to the protocol, client has its local contact list and server has the list of positive diagnosed patients (steps c.1 and s.1 in the figure).

Client masks its input with the random exponent R c and obtains the list of a i -s and computes X = g R c (X is similar to an ElGamal public key). Client sends the list of a i values and X to the server (step c.2 in the figure).

Server permutes its input list and applies H(.) on the list (step s.2 in the figure). Server masks a i values with its random exponent R s , shuffles the resulting list, and computes Y = g R s , which is a public-key like value (step s.3 in the figure). Server creates the list of ts j -s by applying the one-way function H(.) over the multiplication of X R s and exponentiation of hs j -s to random value R s (step s.4 in the figure). Server sends shuffled and masked a i values, Y , and ts j -s to the client.

As the last step, client does the matching between the list of ts j -s that it received from the server and its own list of Figure 4 : Details of the PSI-CA based protocol between an individual (client) and the health authority (server).

Once an individual receives a warning as a result of the proposed algorithm, they can choose to (i) provide (anonymous) information back to the health authority to help the authority to track the spread and/or (ii) share their local contact history with the health authority to get further information about their risk.

In (i), the individual shares their demographics, the size of intersection they obtain as a result of the proposed algorithm, and their location with the health authority. Using such information received from different individuals, the health authority can have a clear idea about how the virus spreads in the population. In (ii), the health authority, using the local contact history of the individual and further details about the contacts of individual (duration and location of each contact), can provide a more detailed risk information to the individual (e.g., if the duration of the contact with a diagnosed patient is long, then the risk of individual also increases). As discussed before, such contact details (i.e., duration or location) are not used in the proposed privacy-preserving algorithm; they can be collected and kept by the individual and may be shared with the health authority to get more insight about the risk.

We implemented and evaluated the proposed privacypreserving search algorithm. We ran our experiments on ma-cOS High Sierra, 2.3 GHz Intel Core i5, 8GB RAM, and 256GB hard disk. We used MD5 as the hash function to hash the IMEI numbers of individuals in the local contact lists and in the health authority's database. We used the implementation of PSI-CA in [5] , in which q and p are 160 and 1024 bits, respectively. We ran each experiment for 20 times and reported the average.

We show the results of the evaluation in Tables 1 and 2. In Table 1 , we set the size of client's local contact list to 1, 000 and vary the size of server's database. In Table 2 , we set the size of server's database to 100, 000 and vary the size of client's local contact list. Our results show that the online phase of the protocol can be efficiently completed by the parties even when the input sizes of both party are significantly large. Note that the server does not need to run the offline part of the algorithm for each client separately. Instead, the server can use the same offline computation during its interaction with every client. Also, a client can conduct its offline steps as it generates its local contact list.

Security and correctness of the proposed algorithm depends on the security and correctness of the original PSI-CA algorithm. Thus, we refer to [5] for the security and correctness of the proposed algorithm.

In this section, we first discuss an extension of the proposed algorithm, in which we prevent a malicious individual from modifying their local contact history. Then, we discuss about potential additional features of the proposed technique.

As discussed, a malicious individual may temper with their local contact history to learn the diagnosis of some target individuals. To prevent this, one option is to record each contact along with a corresponding digital signature from a centralized authority, such as the telecom operator (as discussed in Section 4.2). Here, we describe how such signatures can be used when computing the intersection between the individual and the health authority's database. For this, we propose using the authorized private set intersection (APSI) protocol, in which the entries in the local contact history of an individual are digitally signed by a centralized authority and the validity of these signatures are verified by the health authority during the protocol. We discuss the details of this APSI-based protocol in the following. Figure 5 illustrates the details of APSI-based protocol between an individual (client) and the health authority (server) in the semi-honest setting. First, a common input (N, e, g, H, H ) is determined for the protocol. N = pq is the RSA modulus, where p and q are safe primes. e is the public exponent. g is a random element in Z * N . Also, H and H are the hash functions, modeled as random oracles. All computations are done in mod N. Both server and the client have the same input sets as in PSI-CA protocol. In addition to these sets, client also has a list of RSA signatures (σ i )-s, where σ i = H(c i ) d mod N (steps c.1 and s.1 in the figure).

As an offline step, server permutes its input list and masks its input. For this, the server first applies the function H(.) over the list of its input elements and exponentiates each element with randomness 2R s . Then, hash function H is applied to the list to obtain ts j values (step s.2 in the figure) . These values are then sent to the client.

As an online step, client masks σ i -s by multiplying them with g R c:i , where R c:i is the randomness (step c.2 in the figure). Client sends the resulting list that contains a i values to the server.

At server's online step, server computes Y = g 2eR s . Server exponentiates a i values with 2eR s and obtains the list of a i -s (step s.3 in the figure). Server sends Y and a i -s to the client.

At the last step, client obtains the tc i values by applying the function H (.) over the product of a i and Y −R c:i . In order to get the size of the intersection, client finds the matches between the lists of ts j -s and tc i -s (step c.3 in the figure). At step c.4, a notification is generated based on the output of APSI. Based on the evaluation results reported in [1] , when the database size of the health authority is around 3 million and the local contact history of the individual is 6: (i) the offline part of the protocol takes around 200 minutes for the health authority (and in our protocol, this part can be done once for all the individuals) and offline time for an individual takes negligible time, (ii) online part of the protocol takes around 2 milliseconds both for the health authority and the individual, (iii) computation cost of the online part scales linearly with the size of the individual's local contact history, and (iv) communication costs for the individual and the health authority are around 750B and 4GB, respectively. It is worth noting that the communication cost of the individual scales linearly with the size of the individual's local contact history and the health authority can send the same message to all individuals (it does not need to send a separate 4GB data to each individual, and hence the communication cost of the health authority can be optimized).

The proposed technique can also be used to provide real-time whereabouts of diagnosed individuals. Having this informa-tion, healthy individuals would know which locations to stay away at any given time. In fact, such an approach has been used by some countries during the recent Coronavirus disease pandemic [9] . However, we believe such a usage of the system may cause social chaos and it may also result in deanonymization of diagnosed patients' identities (even if the information is shared in a differentially-private way [7] ). Therefore, we prefer not to include this functionality in the proposed system.

In this paper, we have proposed a privacy-preserving technique to control the spread of a virus in a population. The proposed technique is based on private set intersection between physical contact histories of individuals (that are recorded using smart phones) and a centralized database (run by a health authority) that keeps the identities of the positive diagnosed patients for the disease. We have shown that individuals can receive warning messages indicating their previous contacts with a positive diagnosed patient as a result of the proposed technique. While doing so, neither of the parties that involve in the protocol obtain any sensitive information about each other. We believe that the proposed scheme can efficiently help countries control the spread of a virus in a privacy-preserving way, Figure 5 : Details of the APSI based protocol between an individual (client) and the health authority (server).

without violating privacy of their citizens.

Countering GATTACA: Efficient and secure testing of fully-sequenced human genomes (full version)

Big brother is watching-using digital disease surveillance tools for near real-time forecasting

Google trends: a web-based tool for real-time surveillance of disease outbreaks

Contact tracing mobile apps for COVID-19: Privacy considerations and related trade-offs

Fast and private computation of cardinality of set intersection and union

Practical private set intersection protocols with linear computational and bandwidth complexity

Differential privacy

Distance estimation of smart device using Bluetooth

Governments around the world are increasingly using location data to manage the Coronavirus

Application -Fighting the Corona Virus

Apps gone rogue: Maintaining personal privacy in an epidemic

Privacy-preserving contagious disease tracking

Cellphone tracking could help stem the spread of Coronavirus. Is privacy the price

Position measurement using Bluetooth