key: cord-0553556-sui17lsf
authors: Yu, Ssu-Hsin
title: PrivyTRAC: Privacy and Security Preserving Contact Tracing System
date: 2020-06-15
journal: nan
DOI: nan
sha: 0ee55dee595b8d84878146ee9ca25389769fcfb5
doc_id: 553556
cord_uid: sui17lsf

Smartphone location-based methods have been proposed and implemented as an effective alternative to traditional labor-intensive contact tracing methods. However, there are serious privacy and security concerns that may impede widespread adoption in many societies. Furthermore, these methods rely solely on proximity to patients, based on Bluetooth or GPS signals for example, ignoring the lingering effects of viruses, including the virus that causes COVID-19, that remain present in the environment. This results in inaccurate risk assessment and incomplete contact tracing. A new system concept, called PrivyTRAC, preserves user privacy, increases security and improves the accuracy of smartphone contact tracing. PrivyTRAC enhances users' and patients' privacy by letting users conduct self-evaluation based on risk maps downloaded to their smartphones. No user information is transmitted to external locations or devices, and no personally identifiable patient information is embedded in the risk maps, because the maps are processed from anonymized and aggregated locations of confirmed patients. The risk maps consider both spatial proximity and temporal effects to improve the accuracy of the infection risk estimation. Experiments conducted in the paper illustrate the improvement of PrivyTRAC over proximity-based methods in terms of true and false positives. An approach to further improve infection risk estimation by incorporating both positive and negative local test results from contacts of confirmed cases is also described.

As severe travel restrictions due to COVID-19 are being gradually relaxed in order to minimize further damage to the economy, it is expected that there will be continued cases of infection and pockets of community transmission until vaccines become widely available.
To prevent sporadic cases from becoming sources of another outbreak, rigorous contact tracing is essential. Compared to the traditional approach of interviewing patients, the smartphone location-based approach has proven to be an effective alternative that accomplishes comprehensive contact tracing with much lower labor demands. However, there are serious privacy and security concerns associated with the smartphone-based methods that may impede adoption and widespread use in many societies [1]. The effectiveness of the smartphone contact tracing approach can be significantly improved with active cooperation from the public, which requires alleviating the security and privacy concerns. We propose a privacy-preserving, secure and accurate COVID-19 contact tracing system called PrivyTRAC: Privacy and Security Preserving Contact Tracing System. PrivyTRAC
1. is a smartphone location-based contact tracing system with security and privacy inherent in the tracing mechanism, protecting the privacy of both the patients and the public, and
2. infers from confirmed cases the COVID-19 infection risk at given places and times, which allows individuals to self-assess exposure risk from their movement histories.
The proposed system, illustrated in Fig. 1, is built on two innovations (see Section 3 for details). The first is a mechanism that utilizes infection risk maps (Step 1 in Fig. 1), which are processed from anonymized, aggregated patient location information and which individual users can download to a smartphone App (Step 2). The users can then evaluate locally their own risks of contracting COVID-19 due to contacts (Step 3). This mechanism does not require users to upload their personally identifiable information (PII) to an external platform, nor do the users need to broadcast their PII to other smartphones in the vicinity. The second innovation is the estimation of the spatio-temporal infection risk maps that allow users to self-assess their own risks.
The risk maps consider not only the spatial proximity to COVID-19 patients, as in most contact tracing approaches, but also the temporal effect of how long the virus can stay virulent in the environment. We have learned that the novel coronavirus can remain contagious on surfaces for an extended period of time [2, 3] and that virus-containing droplets can travel significant distances [4, 5]. The risk-based approach, which considers not only instantaneous proximity to the patients but also the lagging effects after a patient has left, thus offers a more accurate assessment of the risk of contracting the virus. Furthermore, as researchers continue to learn the factors that influence the virus spread, new findings can be quickly incorporated into our risk estimation model to refine the risk maps. Many smartphone location-based contact tracing approaches have been proposed or implemented recently [2, 5, 6, 7, 8, 9, 10]. They generally fall into two categories. One is to aggregate all smartphone movements of a population, whether a person is ill or not, in a centralized repository to identify encounters with the confirmed COVID-19 cases. The other is to use a phone's Bluetooth radio to record encounters with all other phones in proximity [3] and later alert the user when the unique cell phone ID of a confirmed case matches an ID stored in the user's phone. These two approaches suffer from two main drawbacks: the first is privacy and security, and the second is accuracy. Their mechanisms require exposing individuals' locations and unique IDs, either in a centralized, externally maintained repository or to all other phones in the vicinity. Privacy concerns may cause people to be less willing to adopt the tools, and hence render the tools less effective.
Moreover, whether individuals' movement data and their encounters with other people are stored centrally or locally, there is always an inherent risk that sophisticated and determined hackers can exploit software weaknesses to acquire personally identifiable information (PII) of individuals and their encounters, similar to the actions taken by some data brokers and aggregators for advertising purposes. In fact, the Cybersecurity and Infrastructure Security Agency (CISA) has issued a warning about Advanced Persistent Threat (APT) actors exploiting the COVID-19 pandemic to collect bulk personal information [4]. Besides privacy and security concerns, those contact-tracing tools offer an incomplete measure for determining the infection risk. As we have learned, the novel coronavirus can survive on surfaces for an extended period of time and virus-containing droplets can travel extended distances, depending on the surface materials and other environmental factors. A person does not need to be in the immediate vicinity of a patient to be exposed to the virus through indirect contacts. Hence, determining a person's risk of contracting the virus based solely on direct proximity to the infected does not provide an accurate risk assessment.

PrivyTRAC Approach

Our contact-tracing approach PrivyTRAC is illustrated in Fig. 1. The system consists of two main components: one residing on individual users' smartphones and the other on the server. On the user side, the smartphone App regularly (or as requested by the users) downloads up-to-date spatio-temporal exposure risk maps from the server. The risk maps quantify the likelihoods of infection at particular locations and times. The App then cross-checks the maps against the user's smartphone location data. Using the risk maps and the location data, the App computes the aggregated risk of contracting the virus.
If the person's risk exceeds a certain threshold, the App would notify the person and suggest follow-on actions, such as visiting a testing site for confirmation. Under this process, a user's private location history and PII never leave the user's own phone, nor are they recorded by other phones. Patients' privacy is preserved as well, since the patients remain anonymous and the downloaded risk maps contain only processed and aggregated location information from many patients, making it extremely challenging to extract PII. Furthermore, the ability for users to maintain control of their private data and decide when the service is activated will significantly encourage adoption by the public. On the server side, the spatio-temporal risk maps are computed and continually updated as new cases are reported. The movement histories of confirmed cases are collected by public health agencies. The server software computes the infection risks at particular locations and times, based on the aggregated locations of the patients from the public health agencies and on factors affecting the virus' persistent virulence. Those factors include distance from an infected person, duration of the virus' survival on various surface materials, length of exposure, and environmental conditions such as temperature and sunlight. By incorporating the disease vector and the environment's effects on virulence rather than merely considering direct encounters based on smartphone proximity, the resulting infection risk estimation is more comprehensive and accurate. To evaluate the risk of contracting the disease from an infected person, we assume that the probability of infection decreases exponentially with distance and time. For the risk analysis, we consider a spatio-temporal grid consisting of 1 meter by 1 meter by 1 second cells.
If a person with the disease stays in a 1 meter by 1 meter area centered at location $(x_p, y_p)$ for 1 second around time $t_p$, a person in an area of the same size around location $(x, y)$ for 1 second around time $t$ is assumed to have a probability of contracting the disease given by Eq. (1). If there are a total of $N$ such spatio-temporal cells $(x_i, y_i, t_i)$, $i = 1, \ldots, N$, with a patient present, regardless of whether they are occupied by the same patient or not, a person in the 1 m by 1 m by 1 s cell $(x, y, t)$ has a probability of being infected given by Eq. (2). Note that the cells $(x_i, y_i, t_i)$ are not necessarily due to the same patient. Hence, we can easily aggregate multiple patients in the same risk map. If a person is in the area for a certain duration, we consider each contact within one second as an independent event. Hence, the person's overall probability of contracting the disease is given by Eq. (3), where $s$ represents a sequence of $M$ such 1 m by 1 m by 1 s spatio-temporal cells that the person of interest occupies in the vicinity of infected people. Based on the above analysis, an area risk map can be created from Eq. (2). By using the aggregated locations and times of people with the disease, $(x_i, y_i, t_i)$, $i = 1, \ldots, N$, the risk of contracting the disease at any location and time $(x, y, t)$ can be computed. When users download the spatio-temporal risk map into their smartphone App, their own individual risk, conditioned on the location history $s$ stored on the smartphone, can be computed by the App using Eq. (3). The App can apply a risk-based metric, for example, to advise whether a person should seek further medical help based on their probability of contracting COVID-19 due to the contacts. To illustrate how the risk of contracting the disease varies with the distance from an infected person and the elapsed time since the person's presence, we plot the risk in Fig. 2.
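As a minimal sketch of how the three risk computations described above could be coded, the Python below assumes a simple kernel for the per-cell risk: exponential decay in Euclidean distance with a length scale `sigma_d`, and exponential decay in elapsed time with a decay time `sigma_t`, with zero risk before the patient is present. This kernel, the noisy-OR combination used for the map risk, the parameter defaults and all function names are illustrative assumptions, not the paper's exact Eqs. (1) to (3).

```python
import math

def cell_risk(x, y, t, xp, yp, tp, sigma_d=1.0, sigma_t=50.0):
    """Assumed per-cell kernel: risk at cell (x, y, t) from one patient cell
    (xp, yp, tp). Decays exponentially with distance (meters) and with the
    time elapsed since the patient was present (seconds)."""
    dt = t - tp
    if dt < 0:
        return 0.0  # assumption: no risk before the patient occupies the cell
    d = math.hypot(x - xp, y - yp)
    return math.exp(-d / sigma_d) * math.exp(-dt / sigma_t)

def map_risk(x, y, t, patient_cells, **kernel_params):
    """Risk-map value at (x, y, t) from N patient cells, combined as
    independent exposures: 1 - prod_i (1 - p_i)."""
    p_not_infected = 1.0
    for xp, yp, tp in patient_cells:
        p_not_infected *= 1.0 - cell_risk(x, y, t, xp, yp, tp, **kernel_params)
    return 1.0 - p_not_infected

def path_risk(path, patient_cells, **kernel_params):
    """Overall risk for a user whose location history `path` is a sequence of
    M cells (x, y, t), treating each one-second contact as independent."""
    p_not_infected = 1.0
    for x, y, t in path:
        p_not_infected *= 1.0 - map_risk(x, y, t, patient_cells, **kernel_params)
    return 1.0 - p_not_infected

# Hypothetical usage: one patient cell at the origin at time 0, and a user who
# stands 1 m away for two seconds, 50 s after the patient was present.
patients = [(0.0, 0.0, 0.0)]
history = [(1.0, 0.0, 50.0), (1.0, 0.0, 51.0)]
user_risk = path_risk(history, patients)
```

In this arrangement only `map_risk` would need patient data, so it corresponds to the server-side map computation, while `path_risk` is the step that could run on the phone against a downloaded map.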
Fig. 2 shows that even though the patient is no longer present (or in proximity) after the initial 1 second, the risk of contracting the disease persists, albeit diminishing as the elapsed time increases. Hence, to fully account for the risk of contracting the disease for contact tracing purposes, it is essential to consider not only the immediate proximity to the patient but also the lingering effects of the virus in time. To illustrate the difference between the risk-based and the proximity-based contact tracing approaches, we conducted simulated experiments of the two methods. In the experiments, we consider an area of 100 meters by 100 meters (Fig. 3). For comparison purposes, we also implemented a spatial proximity-based metric. That is, a person is advised to seek testing if the person is within a certain distance from the patient at any time. We plot in Fig. 4 the probability of correctly identifying a person who actually contracted the disease (true positive) versus the probability of falsely advising a healthy person to seek medical help (false positive), obtained by varying the risk threshold for the risk-based approach (blue lines) and the distance threshold for the proximity-based approach (red lines). An ideal system would have a true positive probability of 1 at a false positive probability of 0, i.e. the upper left corner in the plots. The four plots in Fig. 4, from left to right, show decay-time choices in Eq. (1) of 10, 50, 100 and 150 seconds, respectively. If the ability of the virus to infect diminishes quickly over time (e.g. a decay time of 10 s), the difference between risk-based and proximity-based contact tracing is small (Fig. 4(a)). On the other hand, as the decay time increases (50 s, 100 s), the risk-based approach performs significantly better than the proximity-based approach (Fig. 4(b), (c)).
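The threshold sweep behind such true-positive versus false-positive curves can be sketched as follows. The function name `roc_points` and the toy scores are hypothetical and only illustrate the bookkeeping; they are not the simulation data from the experiments. For the risk-based approach the score would be a person's estimated infection probability; for the proximity-based approach, a score of minus the minimum distance makes "within a distance threshold" equivalent to "score at or above a threshold".

```python
def roc_points(scores, infected, thresholds):
    """Return (false_positive_rate, true_positive_rate) pairs, one per
    threshold. A person is flagged for testing when score >= threshold.
    Assumes at least one infected and one healthy person."""
    n_pos = sum(infected)
    n_neg = len(infected) - n_pos
    points = []
    for th in thresholds:
        flagged = [s >= th for s in scores]
        tp = sum(f and i for f, i in zip(flagged, infected))
        fp = sum(f and not i for f, i in zip(flagged, infected))
        points.append((fp / n_neg, tp / n_pos))
    return points

# Toy, made-up values for four simulated people.
risk_scores = [0.9, 0.8, 0.2, 0.1]        # risk-based score per person
min_distances = [1.0, 6.0, 2.0, 9.0]      # closest approach to a patient (m)
truly_infected = [True, True, False, False]

risk_curve = roc_points(risk_scores, truly_infected, [0.05, 0.5, 0.85])
prox_curve = roc_points([-d for d in min_distances], truly_infected,
                        [-10.0, -3.0, -1.5])
```

Plotting each list of pairs with the false positive rate on the horizontal axis gives one curve per method, which is the form of comparison described for Fig. 4.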
It becomes apparent in the cases where the virus' ability to infect diminishes slowly that the proximity-based approach cannot fully identify the infected people without incurring unacceptable false positives. The reason the risk-based and the proximity-based measures differ can be seen in Fig. 5. In the figure, we plot the probability of contracting the disease on the x axis and, on the y axis, 1 over the exponential of the minimum distance from the infected person (i.e. $e^{-\text{min distance}}$), for a decay time of 50 seconds in Eq. (1). The quantity $e^{-\text{min distance}}$ is chosen such that a higher number means a person is closer to the patient at some point in time and hence is subject to a higher probability of being infected. Each point on the scatter plot represents a case in the experiment. The marginal distributions (histograms) of the cases are also plotted on the top and on the right for the risk-based and the proximity-based measures respectively. If the proximity-based and risk-based measures were similar, we would expect to see a positive correlation on the plot. Even though most cases that are in proximity to the patient tend to have higher probabilities of infection, there is a significant number of cases that are far enough from the patient spatially but are still subject to high infection probabilities. This is mainly due to the delayed effect when the patient has left a location but the virus in the environment still has the ability to infect. The experiments conducted in this paper utilize the probabilistic infection risk model in Eq. (1), which assumes exponential decay of infection risk in space and time. Since various environmental factors can affect how far virus-containing droplets can reach and how long the virus can remain contagious in air and on surfaces, the model can be extended to improve its accuracy by incorporating those factors if they are available. For example, the two spatial variances in Eq.
(1) can be replaced by a 2-dimensional covariance matrix to capture the effects of prevailing wind speed and direction on the spread of the droplets. The temporal variance can be a function of the local temperature and humidity. In a confined environment, where infection by way of indirect contacts is possible, we can incorporate surface properties as part of the model. Another direction to refine the infection risk estimation is to consider reported cases of infection as observations. Consider the scenario where several people who came into contact with a patient in a particular area have been tested for infection. Some of them tested positive while others tested negative. The test results and their movement histories in the area are observations of the underlying risk model. The initial risk model is constructed mainly based on the knowledge of the average reach and decay time of the virus. The new test results, positive or negative, provide additional information to adjust the model parameters so as to better align with local conditions. A Bayesian probabilistic model is well suited for this purpose. Based on such a model, we can refine the model parameters by computing their posterior distributions given the test outcomes of L individuals, each of which is either positive (1) or negative (0). Computational methods such as Markov chain Monte Carlo (MCMC) [12] can be applied to compute the posterior distribution. The refined model parameters and the resultant spatio-temporal infection risk probability in Eq. (5) then provide a more precise local risk map. By using the improved risk map, we can further reduce the chance of missing positive cases and optimize the use of resources by avoiding unnecessary testing.

The impacts of severe travel restrictions are enormous and their cost has been felt throughout the economy. Effective contact tracing is a key step in relaxing those measures while keeping the infection under control.
Traditional contact tracing measures based on interviews with patients are labor-intensive and error-prone. Contact tracing through personal electronic devices such as smartphones has been proposed as an effective measure to overcome these challenges. Although smartphone-based contact tracing has been successfully implemented in some countries, the privacy implications and security concerns can impede broad adoption of similar measures in other societies. Without widespread adoption, the effectiveness of electronic contact tracing can be severely limited. The proposed electronic contact-tracing approach PrivyTRAC respects the privacy of individuals and increases the security of the system through a decentralized mechanism. By preserving privacy and enhancing security, it is expected to promote cooperation from the public and facilitate adoption by public health agencies. Additionally, PrivyTRAC improves the accuracy of infection risk estimation and consequently the effectiveness of contact tracing. As public health agencies are struggling to meet the expected demand for qualified personnel for traditional contact tracing measures, PrivyTRAC can be an effective tool to fill the resource gap. The proposed capability also provides valuable actionable information for authorities to better allocate resources and plan follow-on actions. First, based on the estimated infection risks and visitor foot traffic, authorities can identify and prioritize areas that require disinfection. Second, decision makers can pre-position test kits and allocate other health resources according to locations, populations and severity of the virus exposure. Similarly, when a person is potentially exposed to the virus, the user App can suggest local test locations that best balance test site workloads and convenience. The system concept is applicable to COVID-19 as well as to other contagious diseases.
By adjusting the disease vector, risk model and environmental effects, the system can be tailored to other infectious diseases in the future. The system architecture remains the same. The risk maps are tailored to different diseases in the same App environment.

The effect of environmental parameters on the survival of airborne infectious agents
Likelihood of survival of coronavirus in a respiratory droplet deposited on a solid surface
How Coronavirus Spreads through the Air: What We Know So Far. Scientific American
Turbulent Gas Clouds and Respiratory Pathogen Emissions
COVID-19 contact tracing apps are coming to a phone near you. How will we know whether they work?
Contact Tracing
Coronavirus contact-tracing apps: can they slow the spread of COVID-19? Nature
PACT: Private Automated Contact Tracing
Exposure Notifications: Using technology to help public health authorities fight COVID-19
COVID-19 Contact Tracing and Data Protection Can Go Together
CDC: How COVID-19 Spreads
Gaussian Markov Random Fields: Theory and Applications. Chapman & Hall/CRC Monographs on Statistics & Applied Probability