key: cord-1051626-ox4jrism authors: Ahmed, Nadeem; Michelin, Regio A.; Xue, Wanli; Putra, Guntur Dharma; Ruj, Sushmita; Kanhere, Salil S.; Jha, Sanjay title: DIMY: Enabling privacy-preserving contact tracing date: 2022-03-26 journal: J Netw Comput Appl DOI: 10.1016/j.jnca.2022.103356 sha: 316f0428d857415c4da4b9ecbe7e98da15038f2d doc_id: 1051626 cord_uid: ox4jrism The infection rate of COVID-19 and the rapid mutation ability of the virus has forced governments and health authorities to adopt lockdowns, increased testing, and contact tracing to reduce the virus’s spread. Digital contact tracing has become a supplement to the traditional manual contact tracing process. However, although several digital contact tracing apps are proposed and deployed, these have not been widely adopted due to apprehensions surrounding privacy and security. In this paper, we present a blockchain-based privacy-preserving contact tracing protocol,“Did I Meet You” (DIMY). The protocol provides full-lifecycle data privacy protection on the devices as well as the back-end servers to address most of the privacy concerns associated with existing protocols. We have employed Bloom filters to provide efficient privacy-preserving storage and have used the Diffie–Hellman key exchange for secret sharing among the participants. We show that DIMY provides resilience against many well-known attacks while introducing negligible overheads. DIMY’s footprint on the storage space of clients’ devices and back-end servers is also significantly lower than other similar state-of-the-art apps. The outbreak of the COVID-19 pandemic has changed many aspects of people's daily lives. One of the characteristics of COVID-19 is its airborne transmission, which makes it highly contagious. Moreover, a person infected with COVID-19 can be asymptomatic, thus spreading the virus without showing any symptoms. Anyone who comes into close contact 1 with an infected person is at an elevated risk of contracting the coronavirus. The hike in number of infected cases due to recent strains such as the Delta variant has led governments to speed up the vaccination rates and enforce lockdowns, quarantines and recommend social distancing, aiming to prevent the spread of COVID-19. However, despite these precautionary measures, the rate of spread of COVID-19 is putting the public health systems of many countries under strain. Contact tracing is an established case investigation technique that has proven successful in dealing with other outbreaks such as Ebola and SARS. Contact tracing aims to establish the close contacts of an infected person so that they may be tested/isolated to break the chain of infection. Traditionally, the contact tracing process is performed manually in a reactive manner, triggered when a person tests positive for the virus. This is achieved by conducting a face-to-face interview to establish contacts made by the person while infectious 2 . This approach, although useful, has some limitations: i) It requires a large, trained workforce to cope with the caseload. ii) It is hard for people to remember everyone they have met while infected in the last 2-3 weeks. iii) A person may have met people that are strangers. Proactive contact tracing [1, 2, 3, 4] has recently been proposed to mitigate these issues by maintaining a record of all close contacts made by a person and utilising these records if that person tests positive. One way of implementing proactive contact tracing is to mandate record-keeping of people's attendance at public venues such as offices, restaurants, etc. This can be done manually, for example, through QR codes that direct attendees to register their details. However, this increases the risk to individuals' data privacy and allows for tracking the user's behaviour. A more popular option is to employ smartphone-based digital contact tracing apps that can exchange Bluetooth Low Energy (BLE) messages with each other to record this contact. The digital contact tracing app is typically composed of two main entities, the smartphones acting as clients and a back-end server. In this model, the smartphones of two individuals with tracing apps installed would exchange some random identification code (this identification code does not reveal any sensitive information about their actual identities) when they were in close J o u r n a l P r e -p r o o f Journal Pre-proof proximity. The backend is typically maintained by health organisations (or the government), and once a person is diagnosed with COVID-19, they can opt to share the local list of contacts stored on their smartphone with the back-end server to identify at-risk users. The popularity of digital contact tracing apps can be gauged by the fact that more than 45 such apps have been proposed or are being used in different countries [5] . These COVID-19 digital contact tracing apps are based on different architectures distinguishable in several aspects, including anonymous ID generation and exchange, risk analysis and notifications, etc. For details, readers are referred to a recent survey on digital contact tracing apps by Ahmed et al. [6] , in which the architectures are classified as centralised, decentralised and hybrid, according to the distribution of functionalities among the clients and the back-end server. However, recent security and privacy analysis of these apps has revealed several risks and issues [6] , [7] , [8] . These apps operate on different trust models. Apps based on the centralised architecture (such as [9] , [10] , etc.) generally collect sensitive data at a trusted server and only provide the privacy protection against the malicious users. This trust model makes these apps vulnerable to server-side breaches and malicious actions by the server. On the other hand, apps based on decentralised and hybrid architectures assume an honest-but-curious server model whereby the server will try to harvest sensitive information, if available. Apps such as [11] and [2] that are based on the decentralised architecture share the anonymous identifiers of the positive cases with all users for matching, making these apps vulnerable to linkage attacks, whereby malicious users can discover the real identities of persons diagnosed with COVID-19 [7] . Apps based on hybrid systems perform the risk analysis and notification process at the server instead of revealing the anonymous IDs of positive cases to other users for matching, as proposed in the decentralised architecture. However, these apps suffer from high communication and processing costs. For example, the DESIRE protocol [12] uses three BLE messages to advertise a single anonymous ID from a device [11] , while the ContraCorona app [13] employs three non-colluding servers (submission, matching and notification servers) to manage the contact tracing process. This paper proposes a new hybrid privacy-preserving digital contact tracing protocol called "Did I Meet You" (DIMY). This protocol is designed to provide full life cycle data privacy protection that prevents contact tracing data from being used arbitrarily. We take a holistic view of the security, privacy, and operational requirements for digital contact tracing (details appear in Table 3 and Section 6). We employ Eliptic Curve Diffie-Hellman key exchange, Shamir secret sharing, Bloom filters and the blockchain technologies in an integrated manner to address most of the concerns associated with existing contact tracing protocols. We make the following specific contributions: • We take a security by design approach and provide location privacy, the confidentiality of sensitive information (health, personal and social contacts), and protec-tion against common security attacks on the system. This is achieved by taking a systems-oriented approach to employ a combination of techniques such as ephemeral identifiers, Diffie-Hellman key exchange, Shamir secret sharing mechanism, Bloom filters and the blockchain. • We propose the use of Bloom Filters to store close contact information at the individual device level as well as the backend. The use of Bloom filters serves three important purposes: (i) It prevents information leakage not only at the client level (for example, because of device theft or coercion attacks) but also from authorities operating the backend and governments that can issue subpoenas to access information stored on the backend. (ii) It considerably reduces client device and backend storage requirements. (iii) Bloom filter storage hides the details of the encounter to attenuate the privacy threat at the backend. • As opposed to traditional apps that employ centralised servers at the backend, we have used a blockchain-based backend design in the system. This provides scalability, transparency and trust on backend operations, and ensures the integrity, non-repudiation and verification capabilities for the management of the data uploaded from positively identified cases once appended as blockchain transactions. Although, it is well known that Blockchain suffers from performance overheads as compared with a centralised solution, the properties of a blockchain-based solution are necessary for the design of a comprehensive privacy-preserving contact tracing protocol. We also evaluate the performance of our implementation based on the Hyperledger and show that DIMY provides low latency and consumes less resources while supporting high throughput under moderate loads. • We consider a comprehensive threat model and provide safeguards against several types of adversaries, including malicious users, external actors, backend (administrators) and government (discussed in detail in Section 6.1). We also provide comprehensive security and privacy analysis of our proposed solution and show that DI-MY provides resilience against common attacks such as linkage, enumeration, social graph construction and replay (discussed in detail in Section 6.3). This paper is organised as follows. We discuss related work in Section 2. Section 3 introduces the background information necessary to understand the building blocks of our proposed solution. We detail the design of our DIMY protocol in Section 4. In Section 5, we compare the salient features of the DIMY with other existing protocols. Section 6 provides a security and privacy analysis of DIMY protocol, while Section 7 details the performance analysis of our proposed solution. Section 8 concludes this paper. There have been a number of digital contact tracing apps proposed, developed and deployed. The MIT Technology Review summarised the salient features of 47 such apps [5] . These apps follow different development approaches and address multiple aspects in terms of privacy, security, performance, reliability, etc. In an earlier work [6] , we classified these tracing apps in centralised, decentralised and hybrid categories according to the underlying application architecture and the functionalities delegated to client devices and the server. BlueTrace protocol [1] is one of the first proposed digital contact tracing protocols that is based on a centralised architecture. This protocol was used in the Singaporean TraceTogether [9] and the Australian CovidSafe [10] apps. ROBERT [14] is Another protocol that is based on the centralised architecture named was proposed by Inria and Fraunhofer AIESEC that is also based on the centralised architecture. In the centralised architecture, a central server is responsible for handling major tasks such as ID generation, risk analysis and notification, etc. Typically in this architecture, a user enrols with the central authority, which periodically (typically every 10-15 min) generates a unique temporary ID for each client. This temporary ID is sent to the user and is used in his/her advertisement messages. The user records the received temporary IDs locally when in the proximity of other contacts running the same app. If a user gets diagnosed with COVID-19, a health officer authorises the user to upload (share) the list of all captured IDs to the centralised server for risk analysis and notification of the close contacts. The central server in the BlueTrace protocol can access the personally identifiable information collected at the registration stage and map each client to their temporary IDs. This raises privacy issues as this sensitive data can be used for other purposes besides digital contact tracing. In contrast to BlueTrace, the ROBERT protocol does not store any user identifiable information on the server. Temporary IDs are still created at the server without been linked with the devices used by the clients. ROBERTS's notification process requires the uploading of IDs used by a device to check whether they have come in contact with a COVID-19 positive case or not. This is in contrast with BlueTrace, where the server can identify at-risk users and contact them proactively. ROBERT protocol, however, similar to other protocols based on centralised architectures, has a high potential to function creep, in which it can be re-purposed into a mass surveillance system [11] . Another potential issue associated with centralised architectures is the construction of partial social graphs (discussed in detail in Section 6) that enable linkability of infected cases and their contacts. A server breach can also result in the loss of sensitive data stored at the server. The decentralised architecture differs from the centralised version by pushing more functionalities to the user's devices. There is still a server involved, however, the role played by the server is more in terms of orchestrating the clients. This approach claims to improve user privacy by generating temporary IDs in the user's devices. Additionally, exposure risk processing is also performed at the device level. Generally, devices generate random seeds for forming their temporary IDs. These IDs are exchanged with other users who they come in contact with. Once a user is diagnosed positive with COVID-19, all the device's seeds (some of the apps upload IDs instead of seeds) are uploaded to the server. Any user who wishes to check whether they are at-risk can download the seeds (or IDs) uploaded by the diagnosed users. The device can then perform matching locally, with the user notified of the result. The server is neither involved in the ID generation nor the atrisk analysis and notification process. There are a number of protocols that follow the decentralised architecture, such as DP-3T [11] , PACT-East Coast [2] , Google Apple Encounter Notification (GAEN) [3] , [4] and TCN [15] . They have minor differences in the implementation of sub-components with the basic design following the general functionality described in this section. Apps based on decentralised architectures provide enhanced privacy protection against server-based attacks as devices generate their own anonymous IDs. However, decentralised apps are known to be vulnerable to linkability attacks, whereby a user who has received the IDs generated by an infected user can link the IDs with the real user's identity [7] . These apps are also subject to enumeration attacks, enabling counting all positive cases by each user. Hybrid architectures balance the tasks between the client and the server. The server is responsible for performing the risk analysis and notification process, while the client manages the generation of temporary IDs. Desire [12] is one of the example protocols that follow the hybrid architecture. Devices using the hybrid protocol cryptographically generate and exchange IDs with other devices. A contact between two devices is represented by a unique encounter ID, which the app generates by combining its own and the received temporal IDs. A user who tests positive can optionally upload the generated encounter IDs to the server. Any user who wants to check the risk of exposure sends their collected encounter IDs to the server for matching. The server performs risk analysis and notifies any user who is deemed to be at risk. The Desire protocol uses 256-bit IDs that are broadcast for generating encounter IDs (or tokens) using the Diffie-Hellman key exchange. This design choice requires at least three BLE message exchanges (Advertisement, Scan Request and Scan Response), resulting in an increase in energy consumption for devices [16] . We have listed the modalities involved in the design of the three types of architectures commonly used for digital contact tracing. We have also highlighted some common issues related J o u r n a l P r e -p r o o f Journal Pre-proof to privacy and security associated with apps based on these architectures. Our proposed solution, DIMY, can be broadly classified as a hybrid architecture that generates the IDs on the devices and performs risk-analysis and notification tasks at the server. For DIMY, we utilise Bloom filters to encode the encounter ID generated by the devices and to store the encounters at the backend. The DP-3T protocol also suggests the use of Cuckoo filters but in a different context. In their proposal, the server has access to all the seeds uploaded by users who have tested positive and can generate the IDs used by positive cases. They proposed encoding these IDs in a Cuckoo filter to hide them from other users who are performing local risk-analysis. In comparison, our use of Bloom filters provides better privacy protection as these are used to hide the encounter information both at the device level and at the backend. We also employ blockchain technology to manage the backend processing. BeepTrace [17] is another framework that has proposed using two blockchains: 'tracing chain' to manage tracing/contact matching with anonymised user data, and the other 'notification chain' to manage notifications at the backend. In contrast, we propose using a single blockchain (Hyperledger Fabric) and enhancing its privacy protection further by using Bloom filter-encoded data storage. Additionally, we rely on the smart contract functionality to perform the exposure risk-analysis and matching in a privacy-preserving manner. Lv et al. [18] propose Bychain, a new blockchain that stores contact information securely. Bychain protects user identities using Zero-Knowledge protocols. Though the contact information is stored on the blockchain, the authors do not discuss how data is retrieved and used when individuals test positive for COVID-19. The proposed protocols rely on support from GPS-equipped provers and witnesses to record contact information using LTE, WiFi, and BLE. Our proposed design, in contrast, is an end-toend BLE based solution for contact tracing relying on widely available BLE modules on smartphones. In the context of digital contact tracing, we also note that the use of cryptographically generated IDs and the Diffie-Hellman key exchange mechanism has been proposed in [13] , [19] , [12] . Similarly, the k-out-of-n secret sharing mechanism for ID distribution has been proposed as an extension to the standard protocols in [11] , [13] . In our proposed protocol, the secret sharing mechanism is coupled with the Diffie-Hellman key exchange. We integrate these security and privacy-preserving techniques with efficient set membership using Bloom filters and additionally employ blockchain technology at the backend. Table 1 highlights the key technologies used in DIMY and places our proposal in the context of existing protocols. This section introduces key technologies that form the building blocks of our proposed solution, including Diffie-Hellman key exchange, Shamir secret sharing, Bloom filters, and blockchain. [12] × × × H Contra Corona [13] × × H BeepTrace [17] × × × H ByChain [18] × × × H DIMY H Diffie-Hellman (DH) [20] is a public key distribution system that addresses secret key distribution over an insecure channel. It enables two users to communicate with each other to arrive at a common symmetric secret key to encrypt/decrypt their future communications. This secret key is computed in such a manner that an eavesdropper cannot reconstruct the shared secret key, in a computationally feasible context, even if they have heard all the messages exchanged. This key distribution mechanism is based on the discrete logarithm problem. Let G be a multiplicative group of prime order n. Let g ∈ G be a generator. Party A chooses r 1 ∈ Z p , computes g r 1 and sends to party B. B chooses r 2 ∈ Z p , computes g r 2 and sends to party A. On receiving g r 2 , A computes (g r 2 ) r 1 = g r 1 r 2 , similarly, on receiving g r 1 , B computes (g r 1 ) r 2 = g r 1 r 2 . Due to the hardness of the discrete logarithm problem, an adversary cannot compute r 1 , given g r 1 . Hence, it cannot construct the common key. In our contact tracing protocol, G is an elliptic curve group. In our proposed protocol, we use a secret sharing scheme [21] to provide information privacy and secure communication between the devices participating in contact sharing. The basic idea revolves around making shares of a secret that can be securely distributed over many devices by a threshold secret sharing mechanism. A secret sharing scheme consists of two phases, called sharing and reconstruction. In a k-out-of-n secret sharing scheme (also referred to as (k, n)-secret sharing scheme), there is a unique player called the dealer who wants to share parts of secret S among n players, P 1 , P 2 , ..., P n . The dealer, assumed as honest, creates n shares of the secret S (S 1 , S 2 , ..., S n ) and sends every player a share (say S i to player P i ) of the secret S in a way that any group of k or more players can reconstruct the secret. All shares are necessary for the reconstruction of the secret if we keep k = n. A k-out-of-n secret sharing scheme, in general, has to satisfy the following two properties: 1. Recoverability: The secret can be reconstructed given any k shares. 2. Secrecy: No information can be known about the secret given any number of shares < k. We employ Bloom Filter (BF) for logging contact information on the devices and at the back-end blockchain. A Bloom filter [22] is a probabilistic data structure used to represent set membership. It supports an efficient mechanism for set membership queries. When queried, the BF will return true (with a false positive) if the queried data exists in the filter. A BF is implemented as a bit array BF[i], i ∈ (1, n), of n bits accessed via h independent hash functions H 1 (x)...H h (x), each of which maps an element x in a set S of m elements to one of the l bits within the bit array. Querying the presence/membership of an element x in the set using a BF requires checking h ] returns 1 only if all h corresponding bits are set to 1). In BFs, false-positives (FP) are possible, but false-negatives (FN) are not. A FP ψ(m, k, n) is the probability that a membership test performed for an element x not stored in BF(S ) returns 1, in which the parameter m specifies the size of the bit array (Bloom filter length), k specifies the number of hash functions, and n is the cardinality of the stored set. Even if an exact expression for ψ(m, k, n) is available [23] , virtually all work in the field relies on a simple, but tight, approximation: Simply, an FP is due to the collision of two different elements being mapped to the same bit position. Blockchain technology was initially introduced in 2008 to maintain a public ledger of Bitcoin transactions [24] . This technology allows network participants to create chronologically sequential immutable blocks ensuring integrity, trust and transparency and providing digital evidence [25] . It enables creating solutions that do not rely on a central authority; rather, the chain is spread over several nodes in a distributed manner. It also ensures information integrity by linking the blocks in the chain by hashing the previous blocks. There are three main types of blockchain; public, private and permissioned. The public instances are blockchains that allow any peer to participate in the network. Some examples are Bitcoin [24] and Ethereum [26] . Private blockchains are restricted networks that allow only some nodes to participate while relying on a central authority to manage the nodes. For example, Ethereum can also run in a private instance, however, while executed privately, there is no connection/interaction with the public instance. In the permissioned blockchain, a group of participants perform the node access control. The main example is Hyperledger, in which organisations are responsible for managing the network. The Hyperledger Fabric [27] blockchain instance supports the deployment of chaincode, a small piece of source code developed and embedded in the blockchain. This code is executed once a node sends a transaction to its address. It uses a consensus based on the Byzantine general's problem, known as Redundant Byzantine Fault Tolerance (RBFT), for generating an agreement on the order and correctness of the set of transactions in a block. RBFT defines a leader to conduct an election with the existing nodes connected to a given organisation, providing speedy agreement. The validation of the transactions takes place in the smart contract. This consensus protocol makes the Hyperledger Fabric blockchain a good choice to support our solution, where the organisations are modelled as health authorities (see Section 4) . Using this blockchain technology in our solution ensures the data integrity, transparency of operations and decentralised data storage. We first provide an overview of the DIMY protocol with a detailed description of the building blocks appearing in subsequent sections. Figure 1 shows the overall architecture of our proposed solution. Consistent with other decentralised and hybrid architecture-based contact tracing approaches, devices participating in DIMY periodically generate random ephemeral identifiers. These identifiers are used in the Diffie-Hellman key exchange (Refer to Section 3.1) to establish a secret key representing the encounter between two devices that come in contact with each other. For example, Alice generates a random number X At at time t and calculates its ephemeral identifier E phID At = g X At ∈ {0, 1} 128 (g ∈ G is a generator and G is an elliptic curve group of order p). After generating their E phID, devices employ the k-out-of-n secret sharing scheme to produce n secret shares of the E phIDs. Devices now broadcast these secret shares, at the rate of one share per minute, through BLE advertisement messages. A device can reconstruct the E phID advertised from another device if it has stayed in the communication range of this device for at least k minutes, enabling it to collect k secret shares of E phIDs. Assume that Alice is able to reconstruct the E phID Bt = g Y Bt advertised by Bob where Y Bt is a random number generated by Bob at time t. Alice now computes the secret encounter identifier EncID ABt = (g X At ) Y Bt . Bob also computes the same encounter identifier Enc ABt having received k advertisements from Alice. A novel aspect of our proposed solution is the use of Bloom filters for storing contact information. Each device maintains a Daily Bloom Filter (DBF) and inserts all the constructed encounter identifiers in the DBF created for that day. The encounter identifier is deleted as soon as it has been inserted in the Bloom filter. Devices maintain DBF on a 21 days rotation basis, identified as the incubation period for COVID-19. DBFs older than 21 days automatically get deleted. Our solution employs blockchain at the backend. Once a user is diagnosed with COVID-19, they can volunteer to upload their encounter information to the blockchain. Health Authorities (HA) then generate an authorisation access token from the blockchain that is passed on to the device. The user's device combines 21 DBF into one Contact Bloom Filter (CBF) and uploads this filter to the blockchain. The blockchain stores the uploaded CBF as a transaction inside a block (in-chain storage) and appends the block to the chain. A user who wants to check whether they have come in close contact with any user who was diagnosed positive can query the blockchain on a daily basis. A device combines all of the locally stored DBFs (the maximum number is limited to 21) in a single Bloom filter called the Query Bloom Filter (QBF). The QBF is part of the query that gets uploaded to the blockchain. The blockchain matches the QBF with CBF stored as a transaction in the blockchain and returns "matched" or "not matched" as a response. If the response from the blockchain is negative, the device deletes its QBF. Conversely, if the user is at-risk, the QBF is stored separately for further verification by HA in the contact tracing process. All communications between the HAs and the blockchain is assumed to be secure and authenticated using standard techniques such as digital certificates and TLS. Communications between the client's devices and the backend are also encrypted. As an extension to our base protocol, we also assume an anonymous communication channel (through the use of proxies or anonymous networks) [13] between the users and the backend in an attempt to hide the linking of IP addresses with the real user's identities. We now explain each component of the proposed solution in more details. In this section, we briefly discuss the notion of encounter representation in the context of contact tracing apps. One simple way to achieve contact representation involves using device IDs. In this scheme, which we refer to as an ID-based scheme, each device is assigned a temporal ID, either by a central authority server or computed locally at the device. The devices advertise and exchange these IDs. The presence of an ID in the local storage thus represents an encounter with that user (device). In an alternate scheme that we refer to as the shared secret-based scheme, encounters can be represented by a shared secret between two participants. Both participants exchange specific messages to arrive at a shared secret only known to the parties in communication. In ID-based schemes, all devices in the vicinity of the device A store the same ID advertised by A. In contrast, in the shared secret-based scheme, each device pair computes a different shared secret among them. Concretely, if three devices A, B and C meet each other and advertise ID A , ID B and ID C respectively, according to the ID-based scheme, A would store: {ID B , ID C }, B will store: {ID A , ID C } and C will store:{ID A , ID B }. If these devices are instead using the shared secret-based scheme, they will end up storing secrets as A: We have used the shared secret-based representation for recording the encounter between neighbouring devices as these provide more resilience against replay attacks discussed in Section 6. This component pertains to generating anonymous device IDs that are used as device advertisements. We consider two common design options: i) Each device generates its pseudoanonymous random identifier. This is the approach taken by most decentralised and hybrid contact tracing apps such as PACT-East Coast, DP-3T and GAEN. ii) A centralised server generates these identifiers for the registered devices that are then periodically transferred to the devices. This approach is used in apps based on a centralised architecture, such as Trace-Together and COVIDSafe (AU). In our solution, each device generates its ephemeral IDs, which are valid for 30 minutes. This provides privacy protection against exposing a user's contact details (mapping of IDs to real identities) to the backend. We have kept the size of E phID as 16 Bytes (128 bits Once devices generate the E phIDs, the advertisement phase can commence. For this phase, instead of directly advertising the E phID, we use a k-out-of-n secret sharing (Shamir's J o u r n a l P r e -p r o o f Journal Pre-proof There are multiple advantages of using this k-out-of-n secret sharing. First, the devices need to be in contact for at least k minutes to receive at least k parts of the secret. Setting k = 15 minutes automatically takes care of the duration of close contact. Second, using secret sharing eliminates the possibility of replay attacks (discussed in Section 6.3). Finally, the combination of Diffie-Hellman key exchange and Shamir secret sharing provides extra protection against an adversary trying to capture the EphIDs for malicious use. A computationally bounded adversary, Eve, who is listening for BLE advertisements from Alice and Bob, has to first collect at least k shares of Alice and Bob's advertisements. Then, given Eve knows g a and g b , it can still not compute g ab because of the hardness of the computational Diffie-Hellman problem [28] . Additionally, we advertise part of the hash of the E phID in each share. Figure 3 shows the BLE advertisement message format with the 3-Byte hash of the E phID (Hash(E phID)) included in the advertisement. 4 The hash value helps in identification of shares belonging to the same E phID, and as integrity check for the reconstructed E phID. In our scheme, a device will simply discard the shares without attempting reconstruction if it has not received at least k shares or if the hash values fail the integrity check. An additional issue associated with using k-out-of-n secret sharing is the address carryover mechanism due to the rotation of the E phID after each Epoch (30 minutes) 5 . Suppose a re-ceiver device B comes into contact with A when the 10th share of a particular E phID is being broadcast by A at time t 0 and moves away when it has received the 10th share of the next E phID generated by A at time t 1 . Device B has thus maintained contact for 20 minutes, however, the logging mechanism would fail to register this contact as it has only received 10 chunks each of two different E phIDs. To address this issue, we use the simultaneous advertisement of multiple E phIDs with overlapping intervals as proposed in [13] . A device always broadcast two E phIDs, rotating one identifier in such a way that the start of each identifier is staggered by 15 advertisement intervals. In Figure 4 , a device is broadcasting two E phIDs, alternately every 30 sec. A receiver device which it comes into contact with at time t 0 and leaves at time t 1 is able to register this contact as it has received enough shares of E phID B while in contact. Figure 3 shows the message format of the BLE advertisement messages employed in our solution. We have used the ADV NON CONN IND message format for connectionless advertisements for the chunks of EphID. After sufficient shares of E phIDs are exchanged with neighbouring devices, each pair of devices can compute a secret symmetric encounter identifier (referred to as EncID). Each device inserts the encounter identifier in the local DBF and then discards it. We have used a Bloom filter to preserve the data privacy, reduce the storage requirement and improve the query efficiency compared with other normal data storage structures. The design of the filter involves various parameters such as the number of entries to be stored in the filter (n), the size of the filter in bits (m), the number of hashes (k) and the false positive rate (p). Figure 5 shows the false-positive rates for increasing values of encounters n inserted in the DBF and CBF with different values of m and k. Considering DIMY uses the secret-sharing mechanism (with at least 15 minutes to register a contact), we assume n as 1,000 for DBF as a worst-case representing the maximum number of close contacts per day. As the CBF can hold a maximum of 21 DBFs, the worst-case n for CBF is 21000. The FPR analysis shows that the worst FPR is given by m-k combination of 50 KB-2 Hashes, and the best FPR by 100 KB-4 Hashes. As the hashing is performed at client devices, we take the size of the filter m = 100KB with k as 3 to reduce the computations and battery consumption. The DBF and CBF, in this setting, have FPR of 1 in 19 Million and 1 in 2303, respectively. Once a user is diagnosed with COVID-19, they can get an authorisation code from the health authorities to upload their locally stored contact data to the back-end blockchain. Figure 2 shows the information exchange (CBF and QBF to the backend and response from the backend). All communications between the HAs and the blockchain is assumed to be secure and authenticated using standard techniques such as digital certificates and TLS. Communications between client's devices and the backend is also encrypted. As an extension to our base protocol, we also assume an anonymous communication channel (through use of proxies/anonymous networks) [13] between the users and the backend in an attempt to hide the linking of IP addresses with the real user's identities. The device combines their DBF covering the last 21 days into a single CBF of size 100KB (equal in size to the DBF). The set union function is utilised as the combination process for the DBFs to construct a CBF. For example, all '1'-bit existing information in the DBFs are accumulated into one CBF by performing a bit-wise OR merging [29] . This merged CBF is theoretically equivalent to performing 21 i=1 DBF i and its false probability is similar to using a standard Bloom filter. The CBF is then sent to the back-end blockchain for logging as a transaction. The system supports querying by uploading the QBF (encoded with the DBF from the last 21 days). The user's device uploads this query to the blockchain to check whether someone in close contact has tested positive. DBF, CBF, and QBF are of the same size of 100KB, serving three distinct purposes: First, it reduces the amount of data transferred to the backend, i.e., instead of uploading 21 100KBsized DBFs, we only use one 100KB CBF or QBF for upload. Second, this aggregation of multiple DBFs into a single CBF and QBF hides the details about the day/time of encounter to attenuate the privacy threat at the backend. Third, equal-sized CBF and QBF are employed to support efficient Bloom filter matching through set intersection operation at the backend (explained further in Section 4.6). We use Hyperledger Fabric [30] for the blockchain's implementation, as it provides a modular permissioned blockchain platform, which allows flexibility in modelling the Bloom filter on transactions. The Hyperledger Fabric network is designed to be maintained by a consortium of Health Authorities (HAs), which comprises stakeholders in the healthcare sector, e.g., relevant government agencies and hospitals. Each HA maintains a set of peer nodes to host the ledgers, execute smart contracts, and maintain a set of orderer nodes for consensus protocol. HAs and their corresponding peers are identifiable by cryptographic primitives that comply with the X.509 standard for public-key certificates. The HAs interact with the blockchain through multiple smart contracts. We have designed a smart contract that is capable of performing the following functionalities: • Issuing access tokens: A user who tests positive requires authorisation by HAs to upload their CBF to the blockchain. The HA transacts with the blockchain to issue a temporary access token by providing corresponding HA credentials to the smart contract. Upon successful credential validation, the smart contract records the token to the blockchain. HA provides this access token to the user, that is only valid for 24 hours. • Processing CBF: The smart contract validates the temporary access token provided by the user who uploads their CBF. Upon successful validation, the smart contract records the CBF permanently to the blockchain and updates the ledger's state. • Processing QBF: The smart contract handles queries from users concerning contacts with positive cases by checking the user's QBF against stored CBF in the ledger. Then, the smart contract returns the matching result, which will either be matched or not matched. CBFs stored at the blockchain are managed and queried by the Hyperledger. Given the on-chain data storage capacity of a single transaction is 4MB [31] , the Hyperledger can add a minimum of 1 or a maximum of 40 CBFs (40x100KB = 4MB) in a single transaction. Interaction with the blockchain is provided only through REST APIs. The query mechanism does not require any access authentication. The REST APIs are provided by HAs, which imply that multiple identical APIs are available. The contact verification process is performed through a smart contract at the blockchain. Each day 6 the app combines all the DBFs of the last 21 days (or the available DBFs if the app has been in operation for less than 21 days) into a single QBF of size 100KB by performing a bit-wise OR operation on the DBFs ( 21 i=1 DBF i ). The user also appends to the J o u r n a l P r e -p r o o f Journal Pre-proof This search equates to finding the intersection of the two equal-sized Bloom filters CBF and QBF, by performing a bitwise-AND operation on CBF and QBF to approximate their set intersection. As we have used three hashes for constructing the DBFs, at least three bits must match in the CBF and QBF to indicate a possible close contact between the person who has uploaded the QBF and a positive case encoded in the CBF. Let t denote the number of bits set in the intersection set, the backend returns "No match" if t < 3. For t >= 3, a match is declared and returned to the app in the form of a warning. We can approximate the FPR for the intersection set by (t/m)k. Since t is always less than or at most equal to the set bits in any CBF and QBF, the FPR for the intersection set is ≤ FPR for CBF, and is ≤ FPR for QBF [29] . Note that we have calculated the FPR for CBF and QBF to be 1 in 2303 for worst case analysis assuming 21000 close contact entries (Section 4.4.1). We introduced three architectures commonly used for designing digital contact tracing apps in Section 2, and discussed the design of our proposed solution in the previous section. This section compares the salient features of our proposed solution with representative apps from the three architectures. Table 2 highlights the salient features and their equivalent in selected apps. Our proposed solution, DIMY, generates a temporal ID on the client's device in line with other decentralised and hybrid apps. DIMY is also optimised for storage, both on 7 T old date can be a maximum of 21 days old, thus any CBF stored at the blockchain that is older than 21 days is not matched. This automatically takes care of CBFs pertaining to COVID-19 cases that are no longer infectious. the client's device as well as the backend. The design involves storing contact information in fixed-size DBFs. The back-end blockchain only stores a single Bloom filter (CBF, size 100KB) per positive case that has encoded information of DBFs for up to the last 21 days. This reduces the storage requirement at the backend considerably compared with other apps listed in Table 2 . Another salient design option is where to perform the risk analysis and notification. Apps based on centralised and hybrid architectures perform this step at the centralised server, while apps based on a decentralised architecture perform this locally, on the device. DIMY performs the matching of contact information on the back-end blockchain. However, the blockchain cannot infer any extra information as the matching is performed on contact information encoded in Bloom filters. Our proposed solution is device-centric because it performs most of the privacy-preserving operations on the device. This includes EphID generation, computing k-out-of-n shares and broadcasting these shares using BLE messages, and encoding received contact information on DBF after enough shares have been received to construct a shared secret. Desire also uses the Diffie-Hellman key exchange and the generation of local IDs on the devices. In contrast, Desire uploads the shared secrets collected by a user who has been diagnosed with COVID-19 to a server for matching, to be performed at a later stage. DIMY requires uploading the least amount of data when compared with other apps. A single 100KB sized CBF is uploaded from a COVID positive client to the blockchain. This is in contrast with uploading all contact IDs required in apps that follow the centralised architecture, and uploading multiple seeds or the PETs table on apps that use decentralised and hybrid architectures. DIMY also requires client devices to upload QBL, a Bloom filter for matching transactions stored on the blockchain. DIMY client devices only download the risk analysis results in the form of a binary notification similar to centralised and hybrid apps. Apps based on the decentralised architecture involves the downloading of either seeds/chirps from the server in order for matching to be performed on the devices. This section is dedicated to an analysis of security and privacy guarantees provided by the design of DIMY. In this section, we describe the adversaries considered in the design of DIMY protocol, their capabilities and the risks that they pose. We categorise the adversaries into four groups, users, external actors, back-end server (administrators), and the government. i) Users have access to in-app information as well as passive information captured through eavesdropping. App users are also assumed to have access to the open-source app code. Furthermore, we assume users can only have access to data stored on other smartphones through theft or coercion. ii) External actors are devices that are active on the Bluetooth network that are able to either passively eavesdrop on messages exchanged between app users or that can actively insert their own messages. iii) Back-end server is assumed to be honestbut-curious (follows protocols honestly, is curious to know the data but does not alter the data), where the administrators have access to all data received and stored at the back-end server. iv) The government can access any information stored on individual smartphones or the back-end server through subpoenas to investigate a group or individual user of the app. HAs have a special status in our threat model. They are assumed semi-trusted in that any app user can voluntarily contact them for upload authorisation, advise or manual contact tracing thus revealing their true identity. However, without this voluntary contact, HAs cannot learn anything about the contact tracing data captured by the system. We list the requirements associated with digital contact tracing protocols in Table 3 . We classify these requirements into three categories: security, privacy and operational. We also highlight which of these properties are achievable in DIMY. In this section, we provide additional details about these properties. A QBF is sent to the back-end blockchain for matching with stored CBFs. Since the check is performed on Bloom filters by a smart contract run by peer nodes, it returns a match only if a match exists. This is ensured by the fact that false negatives are not possible in Bloom filters matching and the assumption that the smart contracts implementation is correct. We note that both CBF and QBF are sent through a secure channel. Thus DIMY satisfies the completeness property. A user who was not in close proximity with a COVID positive patient will receive a match (False positive) with low probability due to the following reasons. i) Bloom filter matching process produces false positives (as computed by Equation 1). ii) False positives due to replay/relay attacks. Our design does J o u r n a l P r e -p r o o f Journal Pre-proof not permit replay attacks, however relay attacks are still possible due to inherent characteristics of radio-based communication. An attacker, who was not in proximity with a COVID positive patient, cannot construct an encounter identifier for a QBF that will match with the CBF. Moreover, an attacker needs to be in close proximity of a user to collect sufficient shares before trying to break the encounter identifier established through Diffie-Hellman process. The attacker will not be able to compute the encounter identifier because of the hardness of the computational DH problem. In our proposed protocol, the blockchain is adopted at the backend, which provides transparency on the integrity, trustworthiness, and availability of the data stored on the chain. This is assuming correct implementation of smart contracts for the blockchain. The blockchain stores only CBFs. A user who receives a match from the blockchain in response to the QBF, cannot be certain which COVID positive user they have met as the riskanalysis is performed at the backend without sharing the CBFs uploaded by infected users. There is additional protection as the match response could be a false positive due to probabilistic nature of Bloom filters. Here we ignore the case in which a person has been in contact with only one person during the last 21 days or uploads only one-entry QBF and receives a "match". In such a case, the identity of the COVID positive patient is known. Similarly, identities of users who have been warned are also kept private assuming risk-analysis is performed by a correctly implemented smart contract without any human involvement, and use of secure communication channel between users and the blockchain. Assuming a device is compromised. Since the EncID and EphIDs are not known, and the secrets corresponding to the EphIDs of the device X At are not known, the attacker cannot know the identity of users in close proximity with the device as all contacts are encoded in Bloom filters using one-way hashing. Bloom filters also provide an implicit privacy protection against possible back-end breaches significantly reducing the backend's ability to construct social graphs based on the diagnosed user's contacts. The only information the backend can infer is an estimate of the number of contacts encoded in the CBF uploaded by a positive case, without identifying who these contacts are. As users do not have direct access to the CBF stored in the blockchain, they are unable to extract any sensitive information. The devices only broadcast shares of the anonymous identifiers E phID via BLE advertisements. These E phIDs are first used to construct encounter IDs that are then encoded in the Bloom filter, which stops underlying linkages from being created between these pseudo-identifiers and concrete IDs in the real world. This binary data encoded in a Bloom filter becomes semantically meaningless to any other user, and even the backend cannot associate the reported data with an infected person or any specific individual. Moreover, all encounter information constructed through the Diffie-Hellman exchange is deleted once it is encoded in the DBF, hence protecting data in case a device gets physically stolen or the user is forced to reveal app data under coercion. No location information is collected by the protocol. Thus, it is hard to extract any extra information that could assist a compromised backend to use the stored data for any other purpose. However, the DIMY protocol is also susceptible to privacy attacks launched by malicious users/external actors who may use a modified application to collect other contextual information regarding the contacts. Multiple malicious users may work together, combining their information, to collect a large number of recorded broadcasts with metadata on time and location, etc. Another potential privacy concern around contact tracing apps is known as the function creep [11] . Function creep refers to the evolution of the app to include functionalities other than the original ones, i.e., the app has the potential to be turned into an instrument of mass surveillance, violating human rights. For our proposed protocol, data storage in bloom filters and the use of blockchain provides trust in the intended use of the data. The operational requirements listed in Table 3 have already been discussed in Section No 4. In this section, we will explore the resilience of our proposed design against common attacks launched against digital contact tracing apps. In this type of attack, the goal of an adversary (an external actor) is to inject malicious contact entries such that these result in false positives during the contact verification stage. An adversary can capture the advertised messages by a user's device and later on replays these at another location. Our proposed solution provides a safeguard against such attacks by using the secret sharing scheme. An adversary must capture at least k shares of a message, before taking these shares to another location for rebroadcasting. However, in order to be counted as false positives, the originator of the messages must also have matching entries in their logs. The only way this attack would be possible is if the adversary moves back and forth between two different locations, collecting shares and rebroadcasting these to ensure the existence of symmetric contact information. An adversary's objective during a relay attack is the same as it is in a replay attack. An adversary can capture a user's advertised shares and immediately relay the captured message at the same location, extending the range of the message. Our proposed solution is susceptible to relay attacks that are inherent to all schemes using BLE messages. It is possible to rebroadcast shares such that two users, Alice and Bob, have symmetrical contact information even though they were not in direct contact with each other. We point out that the adversary has to be in direct communication range of both Alice and Bob for k minutes to force the symmetric contact information. As a consequence, if either Alice or Bob tests positive, the other user would be informed of a 'false positive' close contact. The adversary's goal in this type of attack is to link the BLE information broadcasts with a device identifier to track a particular device. A passive actor can listen for BLE messages and transfer these to a central tracking server. The server can diffuse information from multiple tracking devices to estimate the position and movement pattern of the device being tracked. In most cases, the Bluetooth MAC address is randomised for a short period to limit this tracking. In our proposed solution, we use chunks of E phIDs that are different from each other and use the hash of E phID to link all these shares together. An adversary can use this hash in combination with the randomised MAC address to perform limited local tracking. This attack can be launched by an app user or an external actor with the goal to discover the presence of a user at a known location unless some location privacy preservation mechanism is employed [32] . This is accomplished by linking contextual information, such as the mobile phone model advertised in some of the apps that are based on centralised architectures. This type of attack is not possible in our proposed protocol, due to the use of ephemeral identifiers and the suppression of other information that links the device with a particular user. An adversary (external actor or malicious app user) may want to estimate the number of users who have uploaded their contact tracing data after testing positive with COVID-19. In our proposed protocol, the encounter data is first encoded in Bloom filters before being stored on the blockchain. A user is allowed to query the blockchain for matching any encounter record, without revealing the records stored on the blockchain. An adversary is thus unable to launch an enumeration attack. Note that as the HAs authorise all uploads of CBFs to the blockchain, and there are multiple HAs that exist in the system, they can collude with each other to arrive at the total number of COVID-19 cases that have uploaded CBFs to the blockchain. In this type of attack, an adversary generates fake advertisements to consume the storage and battery resources of other devices. Digital contact tracing apps are prone to this attack irrespective of their underlying architecture. In our proposed protocol, a malicious user can force other devices to process BLE advertisements containing shares of fake E phIDs. The aim of this attack is to deanonymise a user based on the information collected either through the system or by using a side-channel. The system has to be resilient against this attack launched by any of the adversary described in Section 6.1. For malicious users and external actors, our system provides safeguard by not sharing the information regarding close contacts and positive cases with other users. Additionally, all encounter data is stored in Bloom filters at the devices as well as the backend. This provides protection from anyone who can get access (legal/illegal) to the devices or the backend. An address carryover attack is possible for contact tracing apps when the change over time of a randomised Bluetooth MAC address and the temporary identifier are not synchronised. A listener can thus easily link the multiple Bluetooth MAC addresses advertised within the same identifier's life time or vice versa. Our proposed solution relies on the simultaneous advertising of two identifiers to enable the correct contact information to be captured (discussed in Section 4.3). This mechanism may result in a carryover attack for tracking purposes, whereby an adversary can associate multiple advertised identifiers with the BT MAC addresses used by a device. An adversary can associate the E phID hash that is being advertised along with random MAC and chunks of the raw E phID to track a user's device, as long as that device is within the communication range. Social graph construction enables the identification of a person's close contacts. This is an imperative part of manual contact tracing in which a health official conducts interview with a positive case to identify their at-risk close contacts. In our proposed solution, we have employed two mechanisms that prevent the construction of social graphs. First, we generate ephemeral IDs on the devices as opposed to by the server. This means that the backend cannot link an E phID with a user. Second, we have employed Bloom filters to hide the contact information of positive cases from the backend. The backend is thus unable to construct a social graph without access to the contact history. Table 4 summarises this section with details of the attacks that could be launched against various architectures, including against our proposed design. In this section, we present a quantitative evaluation of our proposed blockchain-based backend solution, in terms of throughput, latency and resource consumption. This section is divided in to two parts. We first describe the proof of concept implementation using a local back-end server hosting the blockchain. We note that for these experiments, we generated synthetic data on the device level and supplied this to the blockchain. In the second part, we elaborate on a real-world deployment scenario, whereby we implemented the front-end apps on both Android and iOS, and hosted the blockchain on the AWS cloud [33] . We implemented a proof-of-concept of our proposed framework using Hyperledger Fabric v2.0, as it allows flexibility in modelling Bloom filters in a permissioned blockchain environment. We opted to use a permissioned blockchain to control the app user's access by regulatory organisations, such as the health authority. We consider the standard configuration of a solo orderer node with one communication channel as the consensus mechanism. Experiments have been conducted on a GPUequipped server with 12 cores of CPU and 64GB of memory, running Ubuntu Linux 18.04 LTS. We deployed different numbers of Hyperledger nodes as Docker containers on the server. We implemented the core functions of DIMY transactions as chaincodes written in the Go programming language by selecting native Go implementation of the Bloom Filter v2.0.3 8 and a non-cryptographic murmur hashing function for Go in the Hyperledger Fabric. We benchmarked our proof-of-concept implementation using Caliper v0.3.2 9 , an official tool from the Hyperledger foundation that allows blockchain designers to measure the performance of the implementation of a specific blockchain. We measured the performance of our implementation in terms of throughput, latency, CPU and memory consumption, repeating the measurements for 30 seconds. In this experiment, we examined the throughput and latency of different DIMY blockchain transactions of uploading CBF, token issue and querying through QBF, when a load of 50 tx/second was sent to the blockchain. We define throughput as the rate at which transactions are successfully executed and latency as the time required to complete the transactions. Please note that although a complete processing cycle of our architecture includes the serial execution of multiple DIMY transactions, we examined the throughput and latency only for each individual transaction. For instance, we assume that the CBF is already uploaded to the blockchain and we only measure the throughput and latency for executing query QBF. This definition does not consider the network latency, which could be impacted by different external factors. We plot the results in Figure 6 with a constant transaction rate of 50 tx/second. Among the three DIMY blockchain transactions, QBF upload and matching show the lowest latency, while uploading CBF has the highest latency, at around 4700 ms. Additionally, issue tokens and upload CBF operations failed to achieve the throughput of 50 tx/second with peaks at about 45 and 39 transactions per second, respectively. The latency for uploading 50 simultaneous CBFs is the highest compared to other operations, as this operation involves transaction insertion and consensus mechanisms at the blockchain. QBF matching, on the other hand, performs very well in terms of latency and throughput. We note that upload CBF and issue token transactions are only performed once for each identified COVID-19 positive case. On the other hand, devices upload QBF for matching once in a 24-hour cycle, making this the most frequently executed operation. The blockchain thus performs well to support low latency and high throughput for the upload QBF operation. We compare the CPU and memory consumption of different operations performed on the blockchain and show the results in Figure 7 . We captured CPU and memory usage of the backend employing a Caliper monitor when we applied a load of 50 tx/second to the Hyperledger Fabric network for 30 seconds. The results show that the query operation using QBF requires the most resources in terms of average memory consumption and CPU utilisation. Moreover, the CPU utilisation considerably varies between the three test operations. The upload QBF operation provides high throughput with low latency while requiring the most resources. This result highlights that resource requirement for QBF matching operation is a fundamental design factor for provisioning the backend, as QBF matching is the most used operation on the backend. The following experiment aims to examine the effect of using different sized Bloom filters on the resulting latency of QBF matching operations on the blockchain. We executed different transaction rate loads, namely 75, 100 and 125 tx/second, noting both maximum and the average latency for QBF sizes varied from 10KB to 1000KB. The results shown in Figure 8 demonstrate that the average latency remains less than 100 ms for all QBF sizes less than 900KB, while the maximum observed latency remains less than 100 ms for QBF sizes up to 100KB. The maximum latency starts increasing considerably, especially for higher transaction rates, if the size of the QBF is increased beyond 100KB. This result shows that our chosen value of 100KB for Bloom filters is optimal to minimise the maximum observed latency for different transaction rates. Lastly, we investigate the throughput and latency for querying QBF with different number of Hyperledger peer nodes (2, 4, and 8 nodes) and plot the results in Figure 9 . During this experiment, we started from 50 QBF tx/second and gradually increased the load to 500 tx/second. The blockchain was able to deliver increasing throughput in line with the transaction workload. Besides, maximum latency remains below 100 ms for transaction rate of up to 500 tx/second for all explored cases except the 2-node configuration that shows an increase in latency beyond the 250 tx/second. However, the average latency stays low (less than 50 ms) for all cases. We note that this observed performance is achieved with the hardware resources used for this proof of concept described in Section 7.1. Next, we implemented the DIMY protocol on both Android and iOS platforms and deployed the backend blockchain implementation on the AWS cloud. Figure 10 shows the screenshots of the DIMY app running on an iOS device. For the backend, we instantiated a single AWS EC2 instance (t2.small with a single 2.4GHz CPU and 2GB RAM) and ran two HyperLedger nodes and one orderer node as Docker containers. Note that this AWS instance has considerably lower resources as compared to our local GPU based server. We ran the scalability analysis on the AWS setup measuring the throughput and the latency. Figure 11 shows the throughput and average latency analysis for this setup when a load of 50 tx/sec is applied for three different DIMY operations i.e., query QBF, token issue and upload CBF. The backend can provide a throughput that matches the applied load for both query QBF and token issue operations. However, the resource-constrained AWS instance could not cope with the 50 tx/sec load for upload CBF operation resulting in very low throughput and high average latency. Figure 12 show the throughput and latency analysis for the query QBF operation when the rate of transactions is varied from 10 to 90 tx/sec. The backend is able to match the throughput with the applied load up to 60 tx/sec giving less than 100 ms latency. The average and maximum latencies increase with a drop in throughput for any further increase in applied load beyond 60 tx/sec. In summary, results from this small scale deployment of the DIMY protocol with the back-end blockchain hosted on AWS indicate the importance of adequate resource provisioning on the cloud to achieve effective performance especially for the upload CBF operation. In this paper, we have presented the design and security and privacy evaluation for DIMY, a new privacy-preserving digital contact tracing protocol. Our protocol design integrates several privacy-preserving techniques, assuming a threat model that includes malicious users and an honest-but-curious backend. We employed Bloom filters to enhance the privacy protection as well as to considerably cut down storage requirements both on the client's device and the backend. Our protocol is resilient against most of the security and privacy attacks commonly launched against digital contact tracing apps. The proposed protocol incurs negligible overheads and supports low latency operations on the backend side, as demonstrated in our performance evaluations. As our future work, we plan to deploy DIMY on our university campus, for which we have already acquired the ethical approval for the pilot deployment. We also aim to perform J o u r n a l P r e -p r o o f Journal Pre-proof experiments to evaluate the performance of the frontend implementations for iOS and Android in terms of battery and storage consumption. Finally, we plan to conduct further scalability analysis of the blockchain-based backend operating on the AWS cloud for the real-world setting. Nadeem Ahmed, Regio A. Michelin, Wanli Xue, Guntur Dharma Putra, Sushmita Ruj, Salil S. Kanhere, Sanjay Jha  We focus on the privacy and security of digital contact tracing data to prevent its misuse  Recent security and privacy analysis of existng apps has revealed several risks and issues  Our proposed method enhances the data privacy and reduces the storage requirements both on mobile devices and the backend  We consider a comprehensive threat model and provide safeguards against several type of adversaries  Performance evaluatons have proved the efcacy of the proposed protocol Bluetrace: A privacy-preserving protocol forcommunity-driven contact tracing across borders The pact protocol specifications Exposure notification api A flood of coronavirus apps are tracking us. now it's time to keep track of them Centralized or decentralized? the contact tracing dilemma Analysis of dp3t: Between scylla and charybdis Desire: A third way for a european exposure notification system leveraging the best of centralized and decentralized systems Contra corona: Contact tracing against the coronavirus by bridging the centralized-decentralized divide for stronger privacy TCN protocol for decentralized, privacy-preserving contact tracing DESIRE: A practical assessment Beeptrace: Blockchain-enabled privacy-preserving contact tracing for covid-19 pandemic and beyond Decentralized blockchain for privacy-preserving large-scale contact tracing Towards defeating mass surveillance and sars-cov-2: The pronto-c2 fully decentralized automatic contact tracing system New directions in cryptography How to share a secret Space/time trade-offs in hash coding with allowable errors Probability and computing: Randomized algorithms and probabilistic analysis Bitcoin: A peer-to-peer electronic cash system Block-def: A secure digital evidence framework using blockchain Hyperledger -open source blockchain technologies The decision diffie-hellman problem Cardinality estimation and dynamic length adaptation for bloom filters Hyperledger fabric: a distributed operating system for permissioned blockchains Fastfabric: Scaling hyperledger fabric to 20, 000 transactions per second Location privacy challenges in mobile edge computing: Classification and exploration 2021 IEEE International Conference on Blockchain and Cryptocurrency (ICBC) His research interests cover distributed systems and the IoT. He also looks into blockchain applicatons for securing IoT She serves as a reviewer of Mathematcal Reviews, and an Associate Editor of Elsevier Journal, Informaton Security and Applicatons. She is a recipient of the Samsung aRh award, NetApp Faculty Fellowship, Cisco Academic arant and IBM hCSP grant He is a Professor of Computer Science and Engineering at UNSW Sydney, Australia. His research interests include the IoT, blockchain, pervasive computng, cybersecurity and applied machine learning. He is a Senior Member of the IEEE and ACM, an Humboldt Research Fellow and an ACM Distnguished Speaker. He serves as the Editor in Chief of the Ad Hoc Networks journal and as Associate Editor of IEEE TNSM, ChMChM and PMC. He has served on the organising commitee of several This work has been supported by the Cyber Security Cooperative Research Centre Limited (CSCRC), whose activities are partially funded by the Australian Government's Cooperative Research Centres Programme. The authors would like to thank Wei Song for help in implementation of the DIMY front-end mobile apps.