key: cord-0556630-8xi7gjfb authors: Huang, Jianwei; Yegneswaran, Vinod; Porras, Phillip; Gu, Guofei title: On the Privacy and Integrity Risks of Contact-Tracing Applications date: 2020-12-06 journal: nan DOI: nan sha: a1fd3653a6bed707c52e17bb6a6c273faf395600 doc_id: 556630 cord_uid: 8xi7gjfb Smartphone-based contact-tracing applications are at the epicenter of the global fight against the Covid-19 pandemic. While governments and healthcare agencies are eager to mandate the deployment of such applications en masse, they face increasing scrutiny from the popular press, security companies, and human rights watch agencies that fear the exploitation of these technologies as surveillance tools. Finding the optimal balance between community safety and privacy has been a challenge, and strategies to address these concerns have varied among countries. This paper describes two important attacks that affect a broad swath of contact-tracing applications. The first, referred to as the contact-isolation attack, is a user-privacy attack that can be used to identify potentially infected patients in your neighborhood. The second is a contact-pollution attack that affects the integrity of contact-tracing applications by causing them to produce a high volume of false-positive alerts. We developed prototype implementations and evaluated both attacks in the context of the DP-3T application framework, but these vulnerabilities affect a much broader class of applications. We found that both attacks are feasible and realizable with a minimal attacker work factor. We further conducted an impact assessment of these attacks by using a simulation study and measurements from the SafeGraph database. Our results indicate that attacks launched from a modest number (on the order of 10,000) of monitoring points can effectively decloak between 5% and 40% of infected users in a major metropolis, such as Houston.
Contact-tracing applications (CTAs) have emerged as a critical tool used by many countries to combat the spread of the Covid-19 virus within their borders. To date, more than 70 countries have launched contact-tracing smartphone apps. These apps rely on either Bluetooth Low Energy (BLE) or GPS to track user movements and contacts. Many countries have mandated the installation of these apps on the smartphones of all of their citizens. Among the most widely deployed tracing apps is India's Aarogya Setu application [1], with over 50 million users in the first two weeks. Another widely deployed application is China's contact-tracing software, which uses a color-coding system (red, yellow, green) to classify citizens based on their infection risk profiles [9]. Russia's CTA requires its citizens to download a QR code and to declare intended travel routes to navigate through major cities, such as Moscow [20]. The value proposition of CTAs is that they provide a service to inform users when they have experienced close-proximity contact with another CTA user who has tested positive for Covid-19. When deployed ubiquitously throughout a community, the application enables users to continually assess their Covid-19 exposure risk. The majority of CTAs advertise an ability to preserve two important properties. (i) User privacy: the CTA will not reveal the identity of any Covid-19-infected CTA user. (ii) Exposure report integrity: the CTA service provides its users with accurate and timely exposure reports that enable them to continually monitor their personal infection risk. However, the emergence of these applications has spawned significant community discussion and serious concerns regarding the extent to which these applications meet these privacy and reliability assertions. In particular, applications adopting a centralized tracking approach have faced significant criticism from human rights organizations and the security community [4].
An appalling example is the case of the BeAware application from Bahrain, where users became unwitting contestants in a television game show, with the host randomly calling citizens and rewarding them for following social-distancing guidelines. Other countries adopting centralized data-collection approaches include Qatar and Norway. Wen et al. [46] analyzed the privacy usage in 41 CTAs and found that some applications expose identifiable information that can enable tracking of specific users. Sun et al. [42] evaluated existing applications and proposed a venue-access-based contact-tracing solution that preserves user privacy while enabling proximity contact tracing. To alleviate the privacy concerns introduced by centralized data-collection strategies, both Google and Apple have released alternative CTA frameworks that have been adopted by many countries. These frameworks espouse a fully decentralized approach, with no exchange or upload of personal information or location data, except for random numerical strings that change every 10-20 minutes. Despite these efforts to reduce user privacy concerns, serious risks in using CTAs persist, and multiple recent studies [14, 42] have concluded that the vast majority lack security controls and often leak sensitive user information [46]. This paper provides a deeper examination of the privacy and integrity properties of the dominant CTA frameworks. We describe and formalize two adversarial strategies to violate the privacy and integrity assumptions of a wide range of CTAs, including those following the Google and Apple frameworks. The first attack, which leads to the de-anonymization of Covid-positive CTA users, is the contact-isolation attack. This attack uses a combination of device spoofing, pool testing, and device re-identification techniques to efficiently narrow potential contacts to a single victim.
Pool testing has been used in various scenarios, including group blood testing [30], de-anonymizing Internet sensors [25], and, most recently, waste-water-based Covid community tracking [16]. We extend this concept to the problem of Covid victim identification and demonstrate that it can be implemented across communities with a modest adversary work factor. In addition, we further evaluated how attackers may leverage the Bluetooth MAC addresses obtained in the contact-isolation attack to de-anonymize infected users by extracting persistent identifiers (WiFi MAC, SSID) and using online SSID location databases, such as Wigle [22]. The second attack, contact-pollution, affects CTA integrity by polluting the Covid contact database with inaccurate contact information, leading to false-positive notification alerts. The contact-pollution attack combines contact-tracing relay attacks with replay attacks. While relay and replay attacks have been independently published in prior work [42], the specific application of this attack mix in combination with Bluetooth range extenders has not been explored. We implemented the contact-isolation and contact-pollution attacks on Android phones and evaluated them against the DP-3T (Decentralized Privacy-Preserving Proximity Tracing) framework [7]. We also assessed the implications of these attacks by using the point-of-interest (POI) data from SafeGraph [15], which provides aggregated and anonymized datasets on social distancing and foot traffic to businesses. Such data is invaluable in tracking the spread of pandemics such as Covid. Based on our analysis, we found that a relatively small number of sensors (10,000), placed in strategic locations across a city, could be used to effectively identify devices associated with infected users. Large-scale de-anonymization of infected individuals would also be possible if these sensors could be integrated with video cameras and combined with face-search databases [21].
In summary, the contributions in this paper include the following:
- Description and formalization of two important attacks affecting CTAs.
- A prototype system that demonstrates the feasibility of both attacks and supports their evaluation against the DP-3T application.
- An analytical evaluation of an adversarial use of these attacks at scale, using simulation and the SafeGraph database.
- Discussion of the implications, ethical considerations, and potential countermeasures to the work reported here.
To mitigate the spread of Covid-19, CTAs are being widely deployed on smartphones all over the world. These applications work by automatically exchanging data with nearby devices. When the owner of a cellphone with a CTA is identified as infected, the CTA uploads a report to the server. The reporting process varies across applications and countries, but in many cases it is either voluntary or a combination of self-reporting, health-authority-based reporting, and analyses of data gathered from other sources, such as public transportation and banking. Infection reporting triggers contact-notification messages that are distributed to all CTA users who may have had close contact with the patient over the past several days. By using CTAs, users and authorities have been able to effectively track and reduce the spread of Covid-19 in many countries. The high-level architecture of a typical CTA is illustrated in Figure 1. When two users, A and B, become proximate, their phones exchange a key code. The code is generated from the private information of users using advanced cryptographic algorithms (e.g., RSA, SHA-1). Most applications exchange data via the advertisement message in Bluetooth Low Energy (BLE), which persistently broadcasts messages to nearby devices. If user A subsequently becomes infected, he updates his status in the application.
The design of the application determines which of two possible strategies will be used for distributing the notification messages. Centralized Applications. For centralized applications, user A provides his anonymized ID, plus codes gathered from other proximate phones, to the centralized server. The server decrypts these codes or performs contact matching in the database to identify all potentially affected users and then alerts them. Decentralized Applications. For decentralized applications, user A provides the codes/keys used in the past 14 days to the server. All users of the application periodically download the database and perform local contact matching. In this case, alerts are sent from the client-side applications installed on user devices. Table 1 provides a summary of a few of the major published security flaws in CTAs. In most cases, governments have been quick to respond to published flaws by issuing security patches, rethinking architectural designs, or scrapping the application. While security flaws seem more egregious and pervasive in centralized architectures, they affect both classes of applications. None of these disclosures specifically discusses the attacks described in this paper; however, these currently documented adversarial techniques could be combined with the attacks presented here to increase their effectiveness. For example, the GPS-based triangulation attack could be combined with the Bluetooth-based contact-isolation attack. As mentioned in Section 2.1, most CTAs communicate with nearby devices to record close contacts. To satisfy the privacy and feasibility requirements of CTAs, the data-exchange protocol should be easily implementable and able to guarantee the anonymity of exchanged messages. Hence, Bluetooth, one of the most widely used wireless communication protocols, serves as the data-exchange protocol in almost all contact-tracing applications.
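The decentralized matching step described above can be sketched as follows. This is a minimal illustration in Python; the function and identifier names are ours, not taken from any specific CTA, and real applications match cryptographically derived identifiers rather than plain strings.

```python
# Sketch of decentralized exposure matching: the client downloads the
# identifiers published for confirmed-positive users and intersects
# them with its own locally recorded encounters. All names here are
# illustrative assumptions, not a real CTA's API.

def local_contact_matching(recorded_encounters, reported_identifiers):
    """Return the recorded encounters that appear in the published report."""
    reported = set(reported_identifiers)
    return [enc for enc in recorded_encounters if enc in reported]

# Example: the device recorded three identifiers; one of them later
# appears in the server's published list of positive cases.
encounters = ["id_a", "id_b", "id_c"]
published = ["id_x", "id_b"]
matches = local_contact_matching(encounters, published)
# A non-empty result is what triggers a client-side exposure alert.
```

Because the matching happens entirely on the client, the server never learns who was exposed; this is the property the contact-isolation attack later abuses.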
Specifically, CTAs leverage the advertisement message to actively send the generated code to nearby devices. Meanwhile, the applications continue collecting advertisement messages and extracting the codes of nearby devices from them. The benefit of advertisement messages is that no device-specific information is included in the message. As an example, the detailed workflow of the sender side of the data-exchange protocol in the Google and Apple contact-tracing frameworks is shown in Figure 2. When the local CTA requests to send the state code to nearby devices, the framework first generates a temporary key. Each time the framework sends an advertisement message, a rolling proximity identifier is generated from the temporary key. The default interval for updating the identifier is 15 minutes. Since the identifier-generation algorithm is open-sourced, it is possible to associate different identifiers and, given enough identifiers, to identify the device. To mitigate this security issue, the temporary key is updated every 24 hours. Different applications generate the exchanged code from different inputs. We collected 12 popular CTAs from different countries and analyzed the information used (see Table 2). Most applications generate a temporary key periodically and use an advanced cryptographic algorithm to generate the code. Central to the appeal of CTAs is the premise that they provide users with a timely and accurate risk-assessment service while also protecting each user's personal Covid-19 infection status. More specifically, CTAs typically advertise an ability to preserve two important properties: (i) user privacy, which ensures that each user's Covid-19 status is protected from other users, and (ii) exposure risk integrity, which ensures that users receive timely and accurate risk indicators when near-proximity exposure to other Covid-19-infected CTA users occurs.
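The rolling-identifier scheme described above can be sketched as follows. This is a deliberately simplified model: the actual Google/Apple framework derives identifiers with HKDF and AES, whereas HMAC-SHA256 stands in here, and all function names are ours.

```python
import hashlib
import hmac
import os

# Simplified sketch of the rolling-identifier scheme: a temporary key
# is drawn once every 24 hours, and a short-lived identifier is derived
# from it for each ~15-minute interval. HMAC-SHA256 stands in for the
# framework's HKDF/AES construction.

def new_temporary_key() -> bytes:
    return os.urandom(16)  # rotated every 24 hours

def rolling_identifier(temp_key: bytes, interval_number: int) -> bytes:
    # interval_number increments every 15 minutes
    msg = interval_number.to_bytes(4, "big")
    return hmac.new(temp_key, msg, hashlib.sha256).digest()[:16]

tek = new_temporary_key()
id_now = rolling_identifier(tek, 100)
id_next = rolling_identifier(tek, 101)
# Identifiers change every interval, but anyone holding the temporary
# key (e.g., after a positive report is published) can regenerate and
# link every identifier derived from it for that day.
```

This linkage within a key's 24-hour lifetime is precisely why publishing temporary keys of confirmed cases lets clients match past encounters, and why the key is rotated daily.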
One may observe that these two properties are mutually contradictory, as the satisfaction of one property will likely prevent satisfaction of the other. For example, if a sequestered CTA user meets only one other person during a given week and then subsequently receives a Covid-19 exposure report, it is trivial to infer that the other user is Covid-19 positive. However, withholding the exposure report to avoid such a de-anonymization would violate the requirement for accurate and timely report delivery that ensures the sequestered user can properly assess their personal risk. Even when considered separately, we find that adversarial models exist that directly violate each of these two properties across multiple CTA frameworks. In this section, we discuss these models and present specific methods to implement attacks within a reasonable adversary work factor. The first attack (i.e., the contact-isolation attack) enables one to selectively violate the user-privacy property of CTAs. The second attack (i.e., the contact-pollution attack) enables adversaries to prevent CTAs from satisfying the exposure-risk-integrity property. These attacks are quite general, affect a broad class of applications, and are agnostic to whether the CTA follows the centralized or decentralized model. Our threat model assumes that attackers have full access to their own devices. Attackers can modify the application stack running on their phones, the Bluetooth drivers of their devices, the data exchanged with other devices, and the data uploaded to remote servers from their devices. However, attackers have no access to other user devices or servers. We also make no assumptions about the ability of attackers to exploit remote servers or other user devices. The model further assumes that attackers can fully reverse-engineer the client workflow of target applications by analyzing the clients' source code or binaries.
Attackers know how to (i) create multiple accounts; (ii) upload records to servers; (iii) exchange data with other devices; and (iv) check notification messages. We do not consider applications that deploy strict server-side restrictions to stop attackers from emulating these four operations. Although attackers know the workflow of clients, we assume that all communication between different user devices, and all communication between devices and servers, is well-protected by the HTTPS and Bluetooth protocols. For automated large-scale de-anonymization, we assume that sensors are placed at popular locations (cafes, gas stations, etc.) in the city. We further assume that these sensors have integrated video cameras for effective device-to-victim persona mapping. To clearly illustrate the attacks, we first formalize the workflow of typical CTAs, assuming an attacker is a user of the target application. When the attacker has close contact with other users, each of these devices will mutually exchange anonymized data. The data distributed by the attacker's device is denoted c_att, and the data that the attacker's device receives is denoted C_att = {c_1, c_2, ..., c_n}, gathered from the corresponding contact devices D_att = {d_1, d_2, ..., d_n}. A user k who is confirmed to be infected will report their infection to the server, which sets r_k to true. If the attacker had recent close contact with user k, then d_k ∈ D_att and r_k = true, and the attacker is now considered to be at risk for the infection. In this case, notification messages, denoted N, will be sent to the attacker. Here, N is Σ R_att, where R_att is the set of infection reports r_k associated with the attacker's recorded contacts. By abusing notification messages, the contact-isolation attack allows an attacker to identify the device whose owner is confirmed Covid-positive, which breaks the anonymity of CTAs. When the devices of confirmed cases in R_att are R_att_confirmed = {r_c1, r_c2, r_c3, ..., r_cm}, the contact-isolation attack aims to divide C_att into several sublists C_1, C_2, ..., C_s. The attacker then creates s new accounts. For each new account u_i, the attacker uploads C_i to the server.
If the notification message is sent to u_i, there must be at least one confirmed case in C_i. If |C_i| is 1, the device is exactly the one that belongs to the confirmed case. If not, by repeating the approach, the attacker can eventually reduce |C_i| to 1 and find the device.

Fig. 3. Illustration of the contact-isolation attack (pool testing). Many applications are simulated in the device simulator. The attack device exchanges the data from simulated devices with normal devices, monitors 802.11 activity, and captures a video feed of visitors entering the location. When a normal user updates his status on the server, only the simulated application whose data was exchanged with the confirmed case will receive the notification message.

False-positive (and false-negative) alerts are a significant concern for most security applications and healthcare assessment tests. The same holds true for CTAs. For example, Schneier et al. [11] observe that CTAs typically define a risk exposure based on a six-foot proximity of two CTA users for more than ten minutes. Unfortunately, the dependencies on GPS and Bluetooth signaling are imprecise due to a range of factors inherent in RF propagation, mobile GPS signal inaccuracies, and location-update delays. While our paper refrains from making any such high-level determinations on the utility of Covid CTAs, the key takeaway is that high false-positive or false-negative rates can render these applications useless. This section describes how a contact-pollution attack corrupts the contact database of target applications and results in many false-positive notification messages. Increasing the false-positive rate of a contact-tracing application will decrease its reliability among users, initially making them paranoid and subsequently cavalier toward the alerts produced by the system. Existing CTAs assume that all the data from other devices is trustworthy because they use advanced cryptographic algorithms to generate the exchanged data.
However, since we can both send and receive data, there are two avenues for the attacker to insert fake contact data that pollutes the contact database. 1. If the attacker relays messages between two users who are outside each other's actual range (e.g., via a Bluetooth range extender), they will still receive fake contact data from each other. 2. If the attacker has multiple devices in different places, these devices can communicate with each other and share the received data. Thus, C_att contains the data received through all of the attacker's devices, and the attacker keeps sending C_att to all other devices within range of his devices. In this case, when Alice is shopping while Bob is attending classes, if they are in range of the attacker's devices, they will receive fake contact data from the attacker, who is impersonating both of them. While the first approach effectively doubles the attack range, the second approach extends it indefinitely. Both approaches will result in additional entries r_k in R_Alice and R_Bob. Thus, this attack can stimulate a large number of fake notification messages when r_j = true is inserted into the R sets of other users. This section describes how the previously described attacks can be implemented. As discussed in Section 2, most CTAs leverage Bluetooth to collect contact information; therefore, our attacks are primarily focused on the Bluetooth medium. We use the DP-3T framework as an example to illustrate how we can apply the attacks to real-world CTA scenarios. The architecture of our prototype implementation for the contact-isolation attack is illustrated in Figure 5. Our implementation is divided into four components. The first component, the Contact Collector, is deployed on the attackers' devices and is used to exchange data with normal CTA users. The second component, the Contact Distributor, separates the collected data into several groups and sends them to the Device Simulator based on Algorithm 1. The initial value of {C_i} is C_att, so we divide C_att into n parts to produce the C_i for the first run.
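The Contact Distributor's first-round split of C_att into n pools can be sketched as below; the function name and round-robin assignment are our own illustrative choices, not details of the prototype.

```python
# Sketch of the Contact Distributor's first-round split: the collected
# contact list C_att is divided into n roughly equal pools, one pool
# per simulated account. Round-robin assignment is an assumption of
# this sketch; any balanced partition would do.

def split_into_pools(c_att, n):
    pools = [[] for _ in range(n)]
    for i, contact in enumerate(c_att):
        pools[i % n].append(contact)
    return pools

pools = split_into_pools(["d1", "d2", "d3", "d4", "d5"], 2)
# pools -> [["d1", "d3", "d5"], ["d2", "d4"]]
```

Each pool is then uploaded under a distinct account, so a notification to an account implicates only the contacts in that account's pool.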
Each Device Simulator will have a unique account created by the attackers. When the Device Simulator receives the contact data, it uploads the data to the server and waits for notification messages. All received notification messages are forwarded to the Contact Analyzer, which extracts the device information of confirmed cases. To collect the contact information, the target application runs as it would for a normal user. We use Frida [8] to dynamically instrument the target application. Since most CTAs are based on Bluetooth, we hooked the Bluetooth API to intercept the beacon data from other devices before it is handled by the application. More specifically, we hooked MessageListener.onFound on Android. When C_i is sent to the Contact Distributor, C_i is divided into several subsets that are sent to different simulated devices to communicate with the server. Device Simulator. The Device Simulator creates a set of simulated devices. Each device maintains a simulated user and the communication between the device and the server. Our implementation extracts the relevant logic of the target application and repackages it, so each simulated device is actually a process running on our device. The challenge in the Device Simulator is how to create the necessary number of accounts, as most CTAs require personal information to sign up; however, only a few of the required personal attributes are verified. Most applications verify only the phone number, which can be easily obtained from an online temporary phone-number service. Since the number of accounts needed in most cases is under 30, we can manually register the accounts for our attacks. DP-3T does not impose strict restrictions on account creation. For confirmed infection cases, users are identified by a Covidcode, which is assigned to users by the government or a hospital. Other users do not need to upload any information to the server.
The only challenge in launching the contact-isolation attack is a possible firewall deployed on the server side, which is not part of the application itself. Therefore, we deployed DP-3T on our local machine with a simple firewall, which rejects connections from the same IP address when the number of connections reaches a threshold (10 in our evaluation). Contact Analyzer. When all the N_i have been collected, the Contact Analyzer checks whether any of them is true. If so, the Analyzer refines the sets C_i for the simulated users to obtain another set of N_i. The refinement procedure is shown in Algorithm 1:

    for N_k ∈ {N_i} do
        if N_k == true then
            positive_counts = positive_counts + 1
            T = T ∪ N_k
    set_pool = n / positive_counts
    counts = 0
    for N_p ∈ T do
        set_size = |N_p| / set_pool
        for j from 1 upto set_pool do
            C_(counts*set_pool+j) = {the (set_size*j)th to (set_size*j+j)th elements in N_p}
            counts = counts + 1

In this scenario, the target set includes frequent contacts (e.g., coworkers, neighbors). The attacker walks around the neighborhood, apartment, dorm, or office building, choosing a different (potentially overlapping) set of houses or offices in each round. In each round, the attacker spoofs a different device identifier. By repeating this experiment for a few rounds and using pool testing, the attacker can quickly narrow down to a small number of infected neighbors. The key point is that the attacker can have a potentially unlimited number of encounters with each contact, since they reside in the same community. In addition, Bluetooth range extenders also improve the feasibility of the attack, as interactions no longer have to be close physical interactions. If the attacker creates n accounts for pool testing, and the number of users of the application in the community is N, the attacker only needs log_n N rounds of pool testing to identify all the infected users.
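The pool-testing loop above can be sketched end to end as follows. This is a simplified model: `is_notified` abstracts the server's exposure alert for a pool's account (an assumption of the sketch), and pools are refined until each surviving pool holds a single device, matching the |C_i| = 1 termination condition.

```python
# Sketch of pool testing for contact isolation: each round splits the
# remaining candidate contacts into n pools (n >= 2), one pool per
# freshly created account. Pools whose account receives a notification
# still contain at least one confirmed case and are split again;
# singleton pools that are notified identify a confirmed case.
# `is_notified` stands in for the server's alert and is an assumption.

def isolate_confirmed(contacts, n, is_notified):
    suspects = [contacts]
    confirmed = []
    while suspects:
        pool = suspects.pop()
        if not is_notified(pool):
            continue                   # no confirmed case in this pool
        if len(pool) == 1:
            confirmed.append(pool[0])  # |C_i| == 1: case identified
            continue
        size = max(1, len(pool) // n)
        suspects.extend(pool[i:i + size] for i in range(0, len(pool), size))
    return confirmed

# Simulated ground truth: device "d7" belongs to the confirmed case.
infected = {"d7"}
oracle = lambda pool: any(d in infected for d in pool)
result = isolate_confirmed([f"d{i}" for i in range(16)], 4, oracle)
# result -> ["d7"]
```

With 16 candidates and n = 4 this resolves in two rounds, consistent with the log_n N bound stated above.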
This scenario requires three additional capabilities for uncontrolled de-anonymization of infected users, to deal with the challenges involved in scaling and device re-identification. First, we address scaling by using a pool-testing strategy that reuses a set of identifiers across groups of contacts. Second, we track WiFi broadcast activity in addition to Bluetooth activity and use temporal correlations to map WiFi addresses to Bluetooth activity. While recent Android and iOS versions implement MAC-address randomization, this happens only during the probe stage, and per-SSID MAC consistency is maintained to facilitate automatic re-authentication. In addition, automatic WiFi probes also broadcast the names of recent networks to which a device has connected, leaking additional information such as the names of home or work networks, airports, cafes visited, etc. This information can be used on its own or combined with other device re-identification attacks [12] to track repeated visits by users, helping to narrow the precision from groups of devices to individual devices. Thus, 802.11 (SSID, MAC) pairings provide a means to go from an ephemeral identifier (the Bluetooth MAC) to a persistent identifier. Third, information from the integrated video camera could be used to derive the mapping from device identifiers to individuals. In addition, private SSID information can be used for location tracking of individuals via public databases, such as Wigle [22]. To associate WiFi probes with Bluetooth addresses, we differentiate candidate (probe, address) pairs by checking their co-occurrence and the times at which we first detected them. Since the signal strengths of WiFi and Bluetooth differ, we may not be able to identify all pairs if they appear only once. That is, for most devices, the attacker will be able to associate the WiFi probe and the Bluetooth address if the device communicates with the attack device multiple distinct times (encounters).
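The temporal-correlation step above can be sketched as follows. The encounter logs and all names are fabricated for illustration; the idea is simply that intersecting a Bluetooth address's co-occurring WiFi probes across encounters shrinks its candidate set.

```python
# Sketch of pairing Bluetooth addresses with WiFi probes by temporal
# correlation: in each encounter window the monitor logs the Bluetooth
# addresses and WiFi probe sources it saw; intersecting a Bluetooth
# address's candidate WiFi sets across windows narrows the pairing.
# Encounter logs here are fabricated for illustration.

def pair_bt_to_wifi(encounters):
    """encounters: list of (bt_addresses_seen, wifi_probes_seen) per window."""
    candidates = {}
    for bt_seen, wifi_seen in encounters:
        for bt in bt_seen:
            if bt not in candidates:
                candidates[bt] = set(wifi_seen)
            else:
                candidates[bt] &= set(wifi_seen)  # keep co-occurring probes
    return candidates

logs = [
    ({"bt1", "bt2"}, {"wifiA", "wifiB"}),  # first visit: two of each
    ({"bt1"}, {"wifiA"}),                  # second visit: only bt1/wifiA
]
pairs = pair_bt_to_wifi(logs)
# pairs["bt1"] -> {"wifiA"} after two encounters
```

This mirrors the observation in the next section that most pairs resolve after two encounters: a single window usually leaves several candidates, and the second window disambiguates them.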
Based on our analysis in Section 5.3, most pairs can be identified with two encounters. Each visit to a monitoring location (e.g., a store or cafe) would likely result in multiple encounters. The prototype implementation of the contact-pollution attack has three main components, as illustrated in Figure 6. We continue to use the Contact Collector to gather contact information from the devices of other users. All collected contact information is transmitted to the Contact Concentrator. Once the Contact Concentrator receives new data, all the Contact Polluters receive a new database. The Contact Polluters use this information to continuously broadcast as many contact messages as they can. (The actual number of messages in practice is limited by the physical limitations of devices; details are described in Section 5.2.) Contact Concentrator. The Contact Concentrator is implemented as an HTTP server, which provides two API functions for the Contact Collector and Contact Polluter, respectively, to upload and download contact information. To further spread false contacts, attackers need to collect the contact information from the original user and replay it to another user. To introduce n false contacts into the database, attackers must participate in at least 2n interactions with other users. This section presents our evaluation results, which focus on the practicality and impact of the two attacks. More specifically, our evaluation seeks to answer the following research questions:
R1. How feasible are the proposed attacks? (I) Are the attacks practical for real-world applications? (II) How much time do attackers need to complete the attacks?
R2. How can the attacks affect the users of CTAs? (I) Is it possible to de-anonymize the victims? (II) How many users are at risk?
To answer R1.I, we evaluated the challenges in launching the attacks on real-world applications (Section 5.2).
We also analyzed the limitations and performance of each attack to answer R1.II. For R2.I, we tested existing techniques for identifying users by their Bluetooth MAC addresses (Section 5.3). The results show that user identities can be leaked to attackers in specific cases. Finally, we empirically evaluated the number of possible victims in different cities with the database from SafeGraph to answer R2.II (Section 5.4). To answer the first set of questions, we evaluated our attacks on the sample application and the backend of DP-3T. We simulated the attacks with different victim devices and different attacker devices. The distance threshold at which devices exchange data is 5 meters, and the time threshold is set to 1 second to simplify the evaluation process. We obtained POI datasets from SafeGraph. The datasets contain the addresses, open hours, visitor counts, and popular times of over 1 million places. We leveraged these datasets to explore the question of large-scale attack-deployment strategies that could maximize the attack coverage across city populations. We performed further analyses regarding de-anonymization based on the results of our attacks. This paper uses Houston and Bryan, TX, and Chicago, IL, as examples to show how the attacks can possibly affect the users of CTAs. To simulate the attack on the sample application of DP-3T, we deployed the server on AWS. The devices we used in the simulation are shown in Table 3. For iOS devices, we made some modifications to the source code of the client to avoid jailbreaking the devices. The modifications include the following elements. (i) When a device exchanges data with another device, the data and information of the other device are stored in a local file. (ii) All contact-tracing-related operations (i.e., broadcast data, retrieved data, and data uploaded to the server) are logged with timestamps.
For Android devices, we use Frida to instrument the application and implement functionality analogous to what we implemented on iOS devices. To store the data and device information of another device, we hooked the MessageListener.onFound API function and extracted the parameters sent to it. To log all contact-tracing-related operations, we looked into the source code of the sample application and found the corresponding methods (Table 6), which were hooked to log operations. Account Creation. The contact-isolation attack requires the creation of accounts to simulate users. However, most applications limit the number of accounts each person can create by requiring unique information (e.g., medical ID, phone number) from users. Table 4 shows the required information for the applications that we tested. Unfortunately, due to possible privacy issues, not all applications require users to provide sensitive unique information (e.g., a passport number or medical ID). Instead, they use a phone number or device ID to identify users, which can be easily forged by attackers. In our evaluation, we successfully registered users in all applications in Table 4 except TraceTogether, which requires a valid ID in Singapore. Stop Corona, SwissCovid, and eRouska use the Google and Apple framework to gather anonymous IDs from other users, and a device ID is the only information required to create accounts. Thus, we created multiple users by emulating Android phones using BlueStacks. MyTrace, Aarogya Setu, Rakning, and COVIDSafe require a phone number to create an account, but the phone number is verified only during registration. Hence, we leveraged online temporary phone-number services to receive the registration tokens and created accounts in these four applications. To simulate the contact-isolation attack, four devices (A, B, C, and D) are placed so that the attacker device (D) is the only device that communicates with all three other devices.
A, B, and C are normal devices, and the user of C is assumed to be Covid-positive. The attacker's device D has close contact with all three devices. The goal of the attacker is to pinpoint the confirmed case. As mentioned in Section 4, D creates three accounts, one per device (denoted V_A, V_B, and V_C, respectively), and exchanges data with each of the three devices under the corresponding identity. When all data has been exchanged, C reports its status to the server and the other devices pull the report from the server. The data stored in each device/account is shown in Table 5. The recorded encounters of V_A, V_B, and V_C are assigned by our tool according to Algorithm 1. The results show that only the data from C can be matched to the server's database. Hence, C is the confirmed case, and D can obtain the corresponding device information from its log file. We leveraged Bluetooth advertisement messages to simulate the contact-pollution attack, as the Bluetooth advertisement message is the most popular communication channel for CTAs. In our evaluation, we implemented a tool for macOS that automatically rebroadcasts the advertisement messages it receives. The four devices are carefully placed such that A, B, and C can all communicate with D but are out of range of each other. Since D replays the messages it receives, c_A, c_B, and c_C are all rebroadcast. Hence, A, B, and C receive and record the messages because they are within range of D. We then use C to report its positive diagnosis to the server. The results show that both A and B received notification messages even though they never had close contact with C. The number of advertisement messages that a device can send at the same time is limited by hardware constraints. According to the Bluetooth specification, the maximum time interval between two messages is four seconds.
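The per-account matching step described above can be sketched as a set intersection. The data model below is our own simplification (encounter codes as opaque strings; one virtual account per victim device), not the DP-3T wire format:

```python
# Contact-isolation check (simplified sketch): attacker device D runs one
# virtual account per nearby victim, so a match between an account's recorded
# encounters and the server's published report isolates the infected device.

def isolate_infected(accounts, published_reports):
    """accounts: dict mapping virtual-account name -> set of encounter codes
    recorded from exactly one victim device.
    published_reports: set of encounter codes derivable from keys that
    infected users uploaded to the server.
    Returns the account names whose single contact is confirmed infected."""
    return {name for name, codes in accounts.items() if codes & published_reports}

accounts = {
    "V_A": {"ephA1", "ephA2"},   # codes exchanged with device A
    "V_B": {"ephB1"},            # codes exchanged with device B
    "V_C": {"ephC1", "ephC2"},   # codes exchanged with device C
}
published = {"ephC1", "ephC2", "ephX9"}  # C reported positive
assert isolate_infected(accounts, published) == {"V_C"}
```

Because each virtual account talked to exactly one device, a positive match against the published report unambiguously names the infected contact.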
For most applications, the duration required to record a close contact is at least five minutes, which means we need to send advertisement messages continuously during the five-minute period. Each advertisement message takes about 100 milliseconds to send (this varies across devices). Adding the time cost of other logic in the application (including regenerating and capturing messages), adding an advertisement message to our list costs around 400 milliseconds. Thus, we can target at most 4000/400 = 10 devices at the same time with one attack device. Although the number of victims is limited, the distance between attackers and victims can be much larger: by leveraging commodity long-range Bluetooth transmitters, attackers may be able to launch attacks from as far away as 100 meters.

Table 6. Summary of operation methods hooked in Android applications:
GoogleExposureClient.getTemporaryExposureKeyHistory: retrieves the key history from the data store on the device.
GoogleExposureClient.provideDiagnosisKeys: provides the keys of confirmed cases retrieved from the server.
GoogleExposureClient.getExposureInformation: provides a list of ExposureInformation objects, from which the level of exposure risk can be gauged.

Since devices often randomize their Bluetooth MAC addresses, it is still hard to identify the real person behind a Bluetooth MAC address (M_bt) obtained by the basic contact-isolation attack. Hence, we evaluate how we can further de-anonymize user devices by associating WiFi probe messages with observed Bluetooth MAC addresses. Our objective here is to move from an ephemeral identifier to a more persistent one, i.e., a (WiFi MAC address, SSID) tuple. A WiFi probe message is sent when WiFi is enabled but no connection has yet been established. These messages are typically broadcast with the names of recently connected SSIDs.
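The capacity bound above follows directly from the stated timing parameters; a quick sanity check (the 4 s maximum advertising interval is from the text, as is the ~400 ms per-message cost we observed):

```python
# Advertisement budget per attack device: the maximum interval between two
# messages (4000 ms) divided by the per-message cost we measured (~400 ms,
# including the regeneration and capture logic).
MAX_INTERVAL_MS = 4000
COST_PER_MESSAGE_MS = 400
max_concurrent_targets = MAX_INTERVAL_MS // COST_PER_MESSAGE_MS
assert max_concurrent_targets == 10

# Each close contact must also be sustained: with a 5-minute recording
# threshold and one message at most every 4 s, each target needs at least
# 300 / 4 = 75 replayed advertisements over the contact window.
messages_per_target = 300 // 4
assert messages_per_target == 75
```

The second figure (75 messages per target) is our own arithmetic from the stated thresholds, not a number reported in the evaluation.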
Since most private SSIDs are unique, we can use a tag, (M_wifi, SSID), to uniquely identify a user device if we can associate M_bt with the corresponding tag. Both iOS and Android randomize M_wifi to preserve user privacy. However, the implementation randomizes the MAC address only across different SSIDs (i.e., the MAC address remains consistent for a given SSID over extended periods). To associate M_bt with the tag of the same device, we can monitor the entry and exit times of devices to find the possible pairs and further confirm the exact association when they subsequently reappear. Based on the assumptions above, we designed the following approach to automatically identify the tag, tag_inf, of a given M_bt that has been identified as belonging to an infected user. 1. Tag Collection. For each M_bt detected at time t_1, we consider all tags detected within 2(r_wifi - r_bt)/s seconds before t_1. Here, r_wifi and r_bt represent the detection radii of WiFi probe messages and Bluetooth advertisement messages, and s is the average speed of users, which varies across scenarios. In other words, we consider every tag detected at a time t_2 such that t_2 >= t_1 - 2(r_wifi - r_bt)/s. All qualifying tags are stored in T, and we use the pair (M_bt, T) to record the encounter. 2. Tag Matching. When the M_bt_inf of an infected user is identified by the contact-isolation attack, we perform retrospective tag matching for (M_bt_inf, T_inf). The candidate tags corresponding to an infected user are determined as follows: (1) tag_cand ∈ T_inf; (2) for every recorded pair (M_bt, T) with tag_cand ∈ T, M_bt must have been identified as belonging to an infected user. Finally, among all candidate tags that satisfy these requirements, tag_inf is the one that appears most frequently. The aforementioned approach works well if all the infected users appear multiple times and their WiFi modules are always enabled.
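The two-step procedure above can be sketched as follows. The data shapes are our own simplification: encounters as a list of (M_bt, T) pairs, with each tag a (WiFi MAC, SSID) tuple:

```python
from collections import Counter

def match_tag(m_bt_inf, encounters, infected_bt):
    """Return the best candidate (wifi_mac, ssid) tag for an infected M_bt.
    encounters: list of (m_bt, tags) pairs, where tags is the set of tags
    seen within the time window around that Bluetooth sighting.
    infected_bt: set of Bluetooth MACs flagged by the contact-isolation attack."""
    # Step 1 output restricted to the infected MAC: build T_inf.
    t_inf = [tags for m, tags in encounters if m == m_bt_inf]
    counts = Counter(tag for tags in t_inf for tag in tags)
    # Step 2: keep a tag only if every pair containing it has an infected M_bt.
    candidates = [tag for tag in counts
                  if all(m in infected_bt
                         for m, tags in encounters if tag in tags)]
    # tag_inf is the most frequent surviving candidate (None if no candidate).
    return max(candidates, key=lambda t: counts[t], default=None)

encounters = [
    ("bt_inf", {("w1", "home-ap"), ("w2", "cafe-ap")}),
    ("bt_inf", {("w1", "home-ap")}),
    ("bt_other", {("w2", "cafe-ap")}),   # bt_other was never flagged infected
]
assert match_tag("bt_inf", encounters, {"bt_inf"}) == ("w1", "home-ap")
```

Here ("w2", "cafe-ap") is rejected because it also co-occurs with a non-infected Bluetooth MAC, which mirrors condition (2) above.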
If an infected user appears only once or his WiFi module is disabled, false positives can be introduced when another tag happens to be in T_inf but never appears again. False negatives may result from multiple causes: for example, (1) users may have explicitly turned off WiFi probing, or (2) users may stay in range for a very short amount of time. Finally, if an infected user always happens to appear alongside the same unrelated normal user, we will not be able to narrow the result down to a single infected user. However, such cases are a small fraction of the dataset, and even then our approach still narrows the result down to a smaller suspicious user set. We evaluated the feasibility of the approach in a simulated environment. Table 7 shows the parameters considered in the simulation, whose value ranges were derived from published literature; the frequency and radius values are based on the observations of existing works [31] [32] [19] (e.g., the detection radius of Bluetooth advertisement messages is 10 m). Here, 10,000 users (of which 100 are infected) move from the outside toward our attack devices at a constant speed, stay for several seconds, and then leave at the same speed. When the data is distributed evenly, we generated 26,874 (M_bt, T) pairs in total and successfully identified 72 (out of 80) tags that belong to infected users. For the remaining 20 infected users, we observed neither their Bluetooth nor WiFi MACs because they never came in range. We also evaluated the effect of each parameter on the detection rate: we fixed the parameters of infected users to specific values while the parameters of other users remained evenly distributed (in each case, all other parameters for both infected tags and normal tags are chosen from a uniform distribution, as described in Table 7). The simulation results are shown in Figure 7. We find that the WiFi probing frequency can significantly affect the detection rate.
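As a rough intuition for why probe frequency dominates, consider a toy model (ours, not the paper's simulator): a device moving at speed s stays inside the WiFi capture radius r for about 2r/s seconds, so with a probe interval of p seconds the chance of capturing at least one probe is roughly the ratio of dwell time to interval:

```python
# Toy dwell-time model of probe capture (an illustrative assumption, not the
# simulation described in the text).
def detection_chance(radius_m, speed_mps, probe_interval_s):
    dwell = 2 * radius_m / speed_mps          # seconds spent in capture range
    return min(1.0, dwell / probe_interval_s)

# A pedestrian (2 m/s) passing a 50 m radius sensor probing every 60 s:
assert round(detection_chance(50, 2.0, 60), 2) == 0.83
# Sparse probing (every 300 s) misses most such passes:
assert detection_chance(50, 2.0, 300) < 0.2
```

The same model also shows why slow-moving or lingering users are reliably captured even at low probe rates (dwell time far exceeds the interval).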
The reason is that when the frequency is too low, the attack device is likely to miss the probe messages, especially when users are moving at high speed. Another important factor is the number of visit encounters: the eight false negatives occur when the tags show up only once and the other tags in the same pair cannot all be excluded. In this case, we cannot narrow the suspects down to the exact tag, but we can generate a smaller list of possible tags. Since the simulation does not incorporate information beyond what might be observed in real environments, we believe that it demonstrates the practical viability of the attack. With a unique tag, attackers would be able to physically locate the user [44] (e.g., by leveraging WiFi location data published by Wigle [22]) or track the user in the future [29]. In addition, video feeds could be combined with tag database information to obtain a picture of candidate victims, who might then be de-anonymized using face-detection search engines like PimEyes [21]. We leverage the POI database from SafeGraph to approximate the number of users possibly affected by our attacks. We assume that attackers are able to deploy as many attack devices as they want at many locations. The relevant fields we obtain from the database include location_name, brands, street_address, latitude, longitude, city, raw_counts (the number of unique visitors for the period), and visits_by_day. We cannot compute city coverage by simply adding up all the visitor counts, because visitors may overlap across places. Unfortunately, due to privacy considerations, the data needed to measure this overlap directly (e.g., unique visitor IDs) is not available. Two considerations apply to the overlap: (i) overlap of visitors across days at the same sensor location, and (ii) overlap across locations.
Based on a 2017 study from Sense360, the average number of monthly customer visits at a single location for popular chains ranged between 1 and 2.2, with McDonald's being the most popular [18]. Table 8 provides a summary of monthly customer-frequency values for the most popular fast-food chains. On the other hand, by comparing raw_counts with the sum of visits_by_day, we can obtain the overlap across days in a given month for each location. Based on the dataset from May 2020 to June 2020, the average overlap across days for locations in the United States is 0.3674 (computed as 1 - raw_visitor_counts/Σ(visits_by_day)). That means the average frequency of SafeGraph visitors at each location (Σ(visits_by_day)/raw_visitor_counts) was 1.58 during the month. To address the cross-location overlap problem, we assume different overlap ratios at different radii and use them to estimate city coverage. For example, {overlap: 0.05, radius: 5} means that for every two places within 5 miles of each other, we assume the overlap between them is 5%. We use the Haversine formula [40] to calculate the distance between each pair of places from their coordinates. The coverage calculation is shown in Algorithm 2.

Algorithm 2: Approximate coverage of victims in a city.
Input: dataset of place patterns S, name of the target city N, population of the city n, overlap ratio p, and radius r to which the overlap applies.
Output: approximate coverage of victims in the city, c.
1: L = ∅
2: for i ∈ S do
3:   if i.city == N then
4:     L = L ∪ {i}
5: for i ∈ L do
6:   for j ∈ L do
7:     if i ≠ j then
8:       dist = haversine(i.lat, i.long, j.lat, j.long)
9:       if dist ≤ r then
10:        i.visitor_counts = i.visitor_counts - (p/2) * min(i.raw_counts, j.raw_counts)
11:        if i.visitor_counts ≤ 0 then
12:          remove i from L
13: unique_counts = 0
14: for i ∈ L do
15:   unique_counts = unique_counts + i.visitor_counts
16: c = unique_counts / n

In our evaluation, we tested the coverage for three cities: Houston, TX; Bryan, TX; and Chicago, IL.
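Algorithm 2 can be sketched as a short, runnable Python function. Field names follow the SafeGraph columns quoted above; the toy places in the test are illustrative rows, not real dataset entries, and we treat the radius as kilometers here:

```python
from math import radians, sin, cos, asin, sqrt

def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance between two coordinates, in kilometers."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

def coverage(places, city, population, overlap, radius_km):
    """Sketch of Algorithm 2: estimate the fraction of the city's population
    reachable by sensors, discounting assumed cross-location overlap."""
    L = [dict(p) for p in places if p["city"] == city]
    for i in L:
        for j in L:
            if i is not j and haversine(i["latitude"], i["longitude"],
                                        j["latitude"], j["longitude"]) <= radius_km:
                # Each member of a close pair gives up half the assumed overlap.
                i["visitor_counts"] -= (overlap / 2) * min(i["raw_counts"],
                                                           j["raw_counts"])
    # Places driven to zero or below are dropped (lines 11-12 of Algorithm 2).
    unique = sum(p["visitor_counts"] for p in L if p["visitor_counts"] > 0)
    return unique / population

places = [
    {"city": "Bryan", "latitude": 30.67, "longitude": -96.37,
     "raw_counts": 100, "visitor_counts": 100},
    {"city": "Bryan", "latitude": 30.68, "longitude": -96.37,
     "raw_counts": 50, "visitor_counts": 50},
    {"city": "Houston", "latitude": 29.76, "longitude": -95.37,
     "raw_counts": 999, "visitor_counts": 999},
]
# Two Bryan places ~1.1 km apart with 20% assumed overlap: 95 + 45 visitors.
assert abs(coverage(places, "Bryan", 1000, 0.2, 5) - 0.14) < 1e-9
```

The p/2 discount is applied per ordered pair, so a close pair loses the full assumed overlap p in total, matching line 10 of the pseudocode.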
To calculate the coverage, we obtained the populations of these cities from the United States Census Bureau [17]; for Houston, the estimated population is 2,326,090. The coverage of possible victims under different overlap ratios and radii is illustrated in Figures 8(a), 8(b), and 8(c). The results show that the attacks can affect a relatively large portion of the population. Even when we assume the visitor overlap between any two sensors within a 5 km range is as high as 30%, the coverage of possible victims is still about 25%. For Chicago, this number drops to 5%, which is still significant. From the analysis of SafeGraph data for these three cities, we can see that our attack is effective in both big cities and small towns. While it is difficult to get exact numbers for overlap and coverage, a slight variance in these results does not fundamentally discount the potential of the attack. More importantly, they have implications for how the system is engineered. A lower overlap implies a higher attack coverage, but also fewer opportunities to narrow down the victim set using pool testing. (Since the contact-isolation attack relies on encounters rather than visits, and each visit might comprise multiple encounters, the de-anonymization attack may succeed with a single visit.) Conversely, a higher overlap means lower coverage but more opportunities to winnow the set down using pool-testing strategies. Specifically, a lower overlap corresponds to more victims of the contact-pollution attack, and a higher overlap corresponds to a higher success rate for the contact-isolation attack. This paper is not the first to uncover security flaws in CTAs. However, to our knowledge, this is the first academic work to systematically analyze CTAs and point out the use of device simulation, pool testing, and online databases to enable large-scale de-anonymization of CTA users.
We also raise awareness of the vulnerability of contact-tracing databases to data-pollution attacks, as well as the potential of Bluetooth range extenders in scaling such attacks. To our knowledge, we are also the first to use the SafeGraph database to assess the significance of contact-tracing attacks. The following summarizes and acknowledges related prior work that informed our research across various domains, including social media, Bluetooth, and contact-tracing protocols. SafeGraph Studies. Prior studies have used SafeGraph data to analyze mobility during the pandemic [26, 35], social-distancing compliance by region and political view [27, 28, 34], and the impact of this pandemic on categories of retail business [35]. Government policy studies on the effects of uncoordinated Covid-19 social-distancing measures have also employed SafeGraph [33]. These studies use the mobile-device geographic reporting information anonymized within the SafeGraph POI dataset to measure human foot-traffic patterns. This paper uses this information to identify human cluster points in a region, so as to maximize the number of potential CTA users that a stationary attack device would reach. Bluetooth Security. As one of the most popular wireless communication protocols, Bluetooth has been a hot topic for security researchers. Ben et al. [41] found eight vulnerabilities, collectively named BlueBorne, in the Bluetooth implementations of different platforms. Daniele et al. [23] proposed the Bluetooth Impersonation AttackS (BIAS), which allow attackers to impersonate other devices; BIAS abuses the switch-role message in the Bluetooth protocol to bypass the authentication process. The BlueFrag exploit (CVE-2020-0022) allows a remote attacker within proximity to silently execute arbitrary code with the privileges of the Bluetooth daemon [5]. Contact-Tracing Protocols. To establish a common privacy-preserving standard for contact-tracing protocols, Google and Apple announced a two-phase exposure notification solution on April 10, 2020 [32]. Their solution provides a set of APIs for developing CTAs.
The specifications and code are all available from the official website. At almost the same time, an independent group started by academics at EPFL released DP-3T [43], which is fully open source. We built and evaluated our prototype attacks on the DP-3T platform, but our attacks are generic and can be extended to these other platforms. Lars et al. show how the Google/Apple contact-tracing framework and DP-3T could be abused to physically locate infected persons in a large metropolitan city [24]. However, their de-anonymization technique differs from ours in that it relies on continuous tracking of Bluetooth activity from infected users across the city from multiple vantage points, and does not leverage device spoofing, WiFi information, or online databases. In contrast, our attack requires only a single vantage point. Thus, the two efforts are complementary and could be combined for greater efficacy. Serge [45] also finds that DP-3T is vulnerable to relay and replay attacks, which can introduce false alerts. However, that work does not actually demonstrate the feasibility of the attack, and its proposed mitigation requires interactions between devices, which Bluetooth advertisement messages do not support. Krzysztof [38] proposed a more practical solution that additionally checks time and location explicitly. Similarly, the PACT protocol [39] uses a seed and a timestamp to generate encounter chirps. Both solutions improve resilience against the contact-pollution attack but remain vulnerable to the contact-isolation attack. In addition, these mitigations may not always work against contact-pollution attacks, as sharing location with other devices is not always available, and checking encounter time is insufficient to completely defend against contact-pollution attacks (e.g., the attacker may still broadcast recent encounter codes collected from other users).
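The seed-plus-timestamp idea can be illustrated with a simplified sketch (ours, not PACT's actual construction): a chirp derived from (seed, epoch) only counts as a match if the epoch at which it was heard is close to the epoch for which it was generated, so a captured chirp replayed much later is rejected:

```python
import hashlib

def chirp(seed, epoch):
    """Derive a short encounter chirp from a seed and a generation epoch."""
    return hashlib.sha256(seed + epoch.to_bytes(4, "big")).hexdigest()[:16]

def matches(recorded, seed, epochs, window=1):
    """recorded: list of (chirp_hex, epoch_heard) pairs kept by a receiver.
    When an infected user publishes (seed, epochs), a recorded chirp matches
    only if it was heard near the epoch that generated it; stale replays fail."""
    return [(c, e) for c, e in recorded
            for gen in epochs
            if chirp(seed, gen) == c and abs(gen - e) <= window]

seed = b"infected-seed"
fresh = (chirp(seed, 100), 100)      # heard in the epoch it was generated
replayed = (chirp(seed, 100), 250)   # captured earlier, replayed much later
hits = matches([fresh, replayed], seed, epochs=range(90, 110))
assert fresh in hits and replayed not in hits
```

As the text notes, this defeats stale replays but not near-real-time relays of recently collected codes, which stay inside the acceptance window.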
Finally, there have been recent academic efforts such as Epione [37], which uses cryptographic techniques such as private set intersection cardinality (PSI-CA) and private information retrieval (PIR) to protect user privacy. Our proposed contact-isolation and contact-pollution attacks are fundamental (much like DDoS) and affect this system as well. Section 7 discusses some potential countermeasures that could be deployed against such attacks. This section discusses the ethical considerations of this work, potential countermeasures for the two attacks presented in the paper, and some limitations of the adversary models. A full evaluation of attack efficacy, and an analysis of deployment challenges using existing live contact-tracing deployments, raises prohibitive ethical concerns. Such evaluations could hinder server operations, pollute contact-tracing databases, and cause false notification messages to be sent to users. Therefore, we do not propose executing experiments that interact with fielded contact-tracing services. Rather, the evaluation approach presented in Section 5 involves experimental devices that are divided into an attack set and a victim set, in which neither device set has contact with non-evaluation devices. All adversary models evaluated in this paper were operated in isolation. Another important ethical consideration is the appropriate strategy for vulnerability disclosure. Given that the specific attacks affect a broad class of applications, there is no efficient and optimal strategy for privately disseminating this information. Instead, we intend to share the specific attacks with popular framework designers for further guidance. Identity Verification. For the attacks presented in this paper, an adversary needs to create multiple accounts for uploading data and receiving notification messages.
Based on our observations, most applications do not impose strict validation that could prevent an attacker from producing large account sets. For example, Smittestopp requires only a phone number to create a new account, which can be obtained using online phone-number services. Hence, employing multiple factors or restricting acceptable verification credentials can shrink the pool of accounts that drives these attacks. Unfortunately, due to possible privacy issues, such an approach is difficult to implement, may reduce the desired adoption rate of the CTA, or may be illegal in some countries. Integrity/Consistency Check. An assumption of the attacks is that the server does not check the integrity of the data or the consistency between the data uploaded by users and its origin device. Though attackers are able to send fake data to other users or to the server, additional integrity and consistency checks could be applied to guarantee that the data-exchange process cannot be sniffed or spoofed. To launch the contact-isolation attack, attackers need to receive a notification for each subset C_i, which means they must repeatedly request notifications with a small encounter list. To mitigate this attack, the server can impose restrictions before sending notifications to users; for example, it may require a minimum-length encounter list before issuing notifications. However, if this becomes widely deployed, attackers may attempt to evade the restriction by padding their submissions with fake (synthetic) encounters. Therefore, a threshold-length encounter list should be coupled with integrity and consistency checks that vet and reject fake encounters. We now discuss some limitations associated with our adversary models. First, our attacks require the deployment of devices that must achieve physical proximity to victims, which limits the scalability of the attacks to the number of devices that can be created, deployed, and maintained.
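The subset-query strategy the contact-isolation attack relies on can be sketched as a binary search over encounter subsets (our illustration; it assumes a single infected encounter and that the attacker can observe whether a given subset triggers a notification):

```python
# Pool-testing sketch: isolate the infected encounter in O(log n) subset
# queries against a hypothetical notification oracle.
def find_infected(encounters, notifies):
    """notifies(subset) -> True if the server would alert for this subset."""
    pool = list(encounters)
    while len(pool) > 1:
        half = pool[: len(pool) // 2]
        pool = half if notifies(half) else pool[len(pool) // 2:]
    return pool[0] if notifies(pool) else None

infected = "eph42"
oracle = lambda subset: infected in subset
assert find_infected(["eph%d" % i for i in range(40, 48)], oracle) == "eph42"
```

A server-side minimum-length requirement forces each of these queries to be padded with synthetic encounters, which is why the threshold on its own is insufficient without checks that reject fake encounters.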
The cost may be unacceptable if the attack goal is to introduce a large volume of fake data into the database. Second, our attacks are highly dependent on the application implementation: while the adversary models presented here are broadly applicable to a breadth of CTA schemes, the attack implementations need to be individually tailored per CTA. Finally, our attacks are most effective in scenarios where a large portion of the user population runs a monolithic CTA. This paper examines two adversary models that can be applied to a wide range of Covid-19 CTAs that are emerging across the globe and are even being legally mandated in numerous countries. CTAs are broadly categorized into centralized coordination strategies, which raise inherent public privacy concerns, and decentralized peer-oriented strategies that claim to reduce end-user privacy concerns. However, we show that our adversary models apply equally well to both the centralized and decentralized strategies. The first adversary model, the contact-isolation attack, is a privacy attack against app users who become infected with Covid-19: an attacker can deploy probes that essentially mine infected user identities at locations where humans commonly congregate. This attack combines large-scale device spoofing with pool testing, device re-identification strategies, and online databases to de-anonymize infected user devices. We evaluate the efficacy of this attack using both a simulation study and the SafeGraph POI dataset. The second adversary model employs a data-poisoning attack that enables an adversary to undermine the reliability of the CTA, resulting in false-positive alarms that would be highly disruptive to any contact-tracing user base. We also discuss some mitigation strategies that could reduce the efficacy of both attacks.
References
Kuwait and Norway contact tracing apps among most dangerous for privacy.
Coronavirus: Security flaws found in NHS contact-tracing app.
Covid-19 apps pose serious human rights risks.
Frida: A world-class dynamic instrumentation framework | Inject JavaScript to explore native apps on Windows, macOS, GNU/Linux, iOS, Android, and QNX.
India's Covid-19 contact tracing app could leak patient locations.
Me on Covid-19 contact tracing apps.
Privacy issues discovered in the BLE implementation of the COVIDSafe Android app.
Qatar's contact tracing app puts over a million people's info at risk.
Report: The proliferation of Covid-19 contact tracing apps exposes significant security risks.
Great Lakes researchers look to wastewater for data on Covid-19.
McDonald's customers more loyal than Starbucks.
Contact tracing apps: Russia is different.
PimEyes: The first face recognition search engine available to everyone.
BIAS: Bluetooth Impersonation AttackS.
Anel Muhamedagic, Thien Duc Nguyen, Alvar Penning.
Mapping internet sensors with probe response attacks.
Konstantinos Pelechrinis, and Alexandros Labrinidis. Effectiveness and compliance to social distancing during Covid-19.
The spread of social distancing.
Private precaution and public restrictions: What drives social distancing and industry foot traffic in the Covid-19 era? Technical report.
I know your MAC address: Targeted tracking of individual using Wi-Fi.
Combinatorial group testing and its applications.
How talkative is your mobile device? An experimental study of Wi-Fi probe requests.
Exposure notification.
Christos Nicolaides, Dean Eckles, and Sinan Aral. Interdependence and the cost of uncoordinated responses to Covid-19. Proceedings of the National Academy of Sciences.
Are Republicans or Democrats social distancing more?
Foot traffic patterns by state and industry.
SafeGraph Incorporated. Stopping Covid-19 with new social distancing dataset.
Epione: Lightweight contact tracing with strong privacy.
Delayed authentication: Preventing replay and relay attacks in private contact tracing.
The PACT protocol specification. Private Automated Contact Tracing Team.
The cosine-haversine formula.
Seyit Camtepe, and Damith Ranasinghe. Vetting security and privacy of global Covid-19 contact tracing applications.
Apostolos Pyrgelis, Daniele Antonioli.
Why MAC address randomization is not enough: An analysis of Wi-Fi network discovery mechanisms.
Analysis of DP3T between Scylla and Charybdis.
A study of the privacy of Covid-19 contact tracing apps.