key: cord-0158858-q7ieqx83 authors: Ratliff, Zachary; Khoury, Joud title: SNARKs to the rescue: proof-of-contact in zero knowledge date: 2020-05-26 journal: nan DOI: nan sha: fa74213290a4d36045d28b2d61dc3270b6932cc1 doc_id: 158858 cord_uid: q7ieqx83 This paper describes techniques to help with COVID-19 automated contact tracing, and with the restoration efforts. We describe a decentralized protocol for"proof-of-contact"(PoC) in zero knowledge where a person can publish a short cryptographic proof attesting to the fact that they have been infected and that they have come in contact with a set of people without revealing any information about any of the people involved. More importantly, we describe how to compose these proofs to support broader functionality such as proofs of nth-order (transitive) exposure which can further speed up automated contact tracing. The cryptographic proofs can be publicly verified. In both cases, the burden (and control) is on (and with) the person proving contact or health and not on (with) third parties or healthcare providers rendering the system more decentralized, and accordingly more scalable. Contact tracing, identifying and notifying individuals who have been in close contact with an infected individual, is widely recognized as an essential tool in protecting against the spread of the novel COVID-19 virus. Automated approaches to contact tracing can help significantly scale the effort relative to manual approaches alone which tend to be slower and more labor intensive. Implementations of automated contact tracing systems must however address the privacy concerns of individuals in order to enjoy widespread adoption, something that early straightforward attempts failed to do [1, 2] . There is a large body of recent proposals for auto-mated Bluetooth-based contact tracing systems with differing privacy guarantees. We refer the reader to [3] for a recent survey. The systems fall into two general categories based on their information flows: decentralized vs centralized. With decentralized approaches, a user's mobile device generates ephemeral randomized tokens that are regularly broadcast to (and received by) nearby mobile devices. Devices save the tokens they broadcast and the ones they receive for a defined period of time. Once an individual tests positive, he can opt to report all the tokens his application generated. Reporting is done with the help of the healthcare provider or some third party. Other individuals who saw the token, and accordingly were in close physical proximity to the infected individual, learn they are at-risk and may seek testing and/or quarantine as a result. On the other hand, centralized approaches use a central server to generate ephemeral tokens that users share with each other and reporting involves the central server making the connections and alerting users [3] . The majority of these existing proposals for automated contact tracing are slow to react and do not adequately address exposure risk. Specifically, they only alert the first level of individuals who have come in close contact with the infected individual after the latter tests positive. Such first order contact tracing may not be fast enough to control the spread of the virus in a timely manner given that there is a period of time in which individuals can be asymptomatic but infectious, and this period is generally longer than the virus incubation period. Consider for example the following scenario. Alice is asymptomatic but infectious at time t 0 and comes in contact with Bob. Bob gets infected and comes in contact with Charlie at time t 1 ≥ t 0 + P I where P I is the virus incubation period. Alice starts showing symptoms and tests positive at time t 2 ≥ t 1 , at which point Bob gets notified. Bob may not show symptoms, may wait to get tested, or may not even get tested. Even if Bob gets tested at time t 3 > t 2 , there is a period of time (could be several days) during which Charlie is not even aware of the exposure risk, and is going about his business as usual. We propose a new protocol for privacy-preserving contact tracing that does not suffer from these limitations. Our protocol permits an individual A who tests positive to quickly furnish a cryptographic proof attesting to the the following statements: Anyone, including, B can publicly verify the proof, and seek testing if the proof checks and they are involved. Using this first proof, individual(s) B who came in close proximity to A can then quickly publish a cryptographic proof attesting to the fact that B was in close proximity to the individual who tested positive (in this case A), and that B was in close proximity to other individual(s) C at t ≥ t + P I . This allows individual(s) C who came in close proximity to B to realize their exposure risk in a timely manner, and act accordingly. Our protocol relies on zero-knowledge succinct non-interactive arguments of knowledge (zk-SNARK) as the cryptographic building block. Cryptographic proofs are succinct, consisting of only a few hundred bytes, and take just a few milliseconds to verify. The zero knowledge property ensures that a person verifying these proofs only learns the statement "I was close to someone who tested positive for the virus" or "I was close to someone who was close to someone who tested positive for the virus" and so on, but nothing else such as who the person is or where the interaction occurred (with some caveats discussed later in the paper). Our approach is fully decentralized requiring little assistance from the healthcare provider, and may be extended to support broader functionality (e.g., proof of health/immunity). We start by describing the simple proof-of-contact protocol, and extend it to support transitive exposure by composing proofs of contact using proof-carrying data (PCD) [4] . Additionally, we demonstrate a simple SNARK-based construction of proof-of-health/immunity, useful for a society in the restoration phases of a pandemic. In summary, our protocol offers the following benefits: • Efficiency is achieved using efficient preprocessing zk-SNARK construction and performing the signature verification outside the SNARK to reduce prover cost [5] . • No trusted third parties or databases required. The public registry need not be trusted. We only require that the zk-SNARK for the desired functionality is correctly setup. • Strong end-to-end privacy guarantees. Proximity tokens are not shared; not with third parties nor with healthcare providers. • Correctness. A valid proof guarantees the authenticity of the user's test results and the validity of the statement. • Adoption/Practicality. A medical organization only needs to sign records using an existentially unforgeable and publicly verifiable signature scheme. This is a simple task for the medical organization and it deters malicious (noninfected) users from seeking signatures. • Decentralization. The burden is on the infected person to actually prove and publish. This allows better scaling (instead of requiring providers or third parties to centrally manage patient proximity data) and potentially better privacy since the user (the stakeholder) has full control over their private data and can share at will. • Exposure risk using proof composition. Transitive exposure risk is available in a timely manner through proof composition. This allows users who may have had secondhand contact with an individual who tested positive for a virus, to learn in zero-knowledge about this exposure. Contact tracing requires monitoring and recording physical interactions between clients. For example, if Alice walks into a cafe where Bob is eating, a method for detecting and measuring their proximity is needed. There have been several works proposing various means of proximity sensing between mobile phones, including using Bluetooth [6] , WiFi [7] , and audio [8] signals. Regardless of the underlying technology, we assume a mobile phone frequently broadcasts proximity tokens that are received by nearby phones. For example, within each epoch, the phone frequently broadcasts its unique token, and receives tokens from nearby phones. This simplified model has been adopted by the majority of decentralized privacypreserving contact tracing protocols [3] . We review the definitions of arithmetic circuits, preprocessing zero knowledge succinct non-interactive arguments of knowledge (pp-zk-SNARKs) and we refer the reader to [9] for details. First, we introduce arithmetic circuit satisfiability in Field F. An F-arithmetic circuit C : Here a is called the witness (auxiliary input) and x is the public input and the output is 0 l . The language of the circuit is defined by L C = {x : ∃a, C(x, a) = 0 l }. Here x ∈ F n (i.e., x is represented as n field elements), a ∈ F h , and the output in F l . A hashing circuit for example takes the (private) input/witness a and its hash x, and asserts that H(a) = x. A preprocessing zk-SNARK (pp-zk-SNARK) for Farithmetic circuit satisfiability comprises three algo-rithms (G, P, V ), corresponding to the Generator, the Prover, and the Verifier. G(λ, C) → (pk, vk) Given a security parameter λ and the F-arithmetic circuit C, sample a keypair comprising a public proving key pk and a public verification key vk. P (pk, x, a) → (π) Given the public prover key pk and any (c, a) ∈ R C , generate a succinct proof π attesting that Proof-carrying data (PCD) captures the security guarantees necessary for recursively composing zk-SNARKs. More specifically, given a compliance predicate , a PCD system checks that a computation involving a set of incoming messages z in , private local data z loc , and outgoing message z out , is -compliant. Formally, a proof-carrying data system consists of three polynomial-time algorithms (G, P, V ) corresponding to the Generator, Prover, and Verifier. G(λ, ) → (pk, vk) Given a security parameter λ and the compliance predicate expressed as a F-arithmetic circuit, sample a keypair comprising a public proving key pk and a public verification key vk. P (pk, z in , π in , z loc , z out ) → (z out , π out ) Given the public prover key pk, a set of input messages z in along with compliance proofs π in , local input z loc , and output z out , generate a succinct proof π out attesting that z out is -compliant. Consider an existentially unforgeable signature scheme S = (G S , S S , V S ) (e.g., ECDSA) with private signing key v s and public verification key p s . Let H, H 1 , H 2 be three collision-resistant hash functions. Let (G, P, V ) be a pp-zk-SNARK. The baseline protocol builds on [10] and works as follows: • Trusted setup phase: a trusted entity sets up the system and runs the generator algorithm G(λ, C) → (pk, vk); we describe the circuit C in more detail shortly. During this phase, each healthcare provider obtains a certificate for its signing key signed by a trusted certification authority. • Each user generates a private random string S • User A generates a random token every time period t (the epoch e.g., 5 minute intervals) as T A,t = H 1 (S, t), and frequently broadcasts the token. We omit the time subscript hereafter whenever it is clear. • Whenever user A receives a proximity token from user B at time t, she computes h = H 2 (T A , T B , t) and stores it for 14 days. User B computes the same output. Here we sort the tokens (e.g., lexicographically) before passing them to the hash function. • User A then generates a short cryptographic proof using P (pk, (h, h s ), (S, T A , T B , t )) → π attesting to these facts • User A publishes tuple (π, h, h s , s) to some public registry. If the public registry already contains a tuple with the value h, then the user does not upload these values (in order to prevent linkability). Several techniques may be used here for network unlinkability (e.g., the user app can either use mixing or onion routing solutions, or the provider can publish the material on behalf of the user). • User B checks the public registry periodically to find a matching h and can quickly verify the proof using V (vk, (h, h s ), π). If the proof checks, user B verifies the signature V S (p s , s, h s ) given h s and the public verification key p s of the healthcare provider. • User B seeks testing, and can show the proof-ofcontact to her healthcare provider to expedite the process if needed. Linkability Tokens are never shared, or published. Only the hash of two tokens is published after a user tests positive. This means different tokens may not be linked as belonging to the same user. The same is true with linking different hashes. Recall when reporting a positive test, user A publishes h = H 2 (T A , T B , t) for all proximity edges. Only user B or some dishonest user C who forms a clique with A and B at time t may learn h. Since user C is part of the clique, h does not leak additional information. User C cannot use h to create valid proofs on behalf of A or B without knowledge of their private strings S. Identification After seeing a proof containing h, a curious user B who keeps track of all physical encounters can a posteriori identify the infected person in some form. This attack is common to the majority of the decentralized systems [3] . We observe that some form of this leakage is inherent to the protocol. For example, if user B has only encountered one person before getting alerted, user B will be able to identify the infected person no matter how privacy-preserving the alert/protocol is. This may be acceptable in some cases, for example, learning that the "tall person in the dairy aisle at the grocery store" tested positive. A more recent decentralized protocol that mitigates identification attacks has been proposed using "parroting" [11] . As discussed earlier, we think our general idea can be applied to this class of protocols as well. The same technique may be used by user A to prove their health i.e., that user A received a negative test from her healthcare provider within the past day or week. The same is true for proving immunity with the antibodies test that is gaining traction as communities look towards restoration. Using zk-SNARKs to do this avoids any dissemination of the sensitive test results to third parties keeping users in control and maintaining decentralization. It also provides assurances to the other party that allows them to trust the result. As discussed earlier, it can be beneficial to provide more granular nth order exposure risk data to users to limit the spread of the virus. For example, a user may want to know whether they have had transitive exposure to a virus. Consider that Alice comes in contact with both Bob and Charlie independently of one another. Later, Bob tests positive for the virus, and Alice is alerted that she is at risk. Although Charlie did not directly come in contact with a carrier of the virus, he may find it useful to know that someone he came in contact with has. This transitive approach to contact tracing could enable more informative statistics for users such as a risk profile, i.e., a risk score based on how many degrees of exposure an individual has. Someone who is four transitive hops away from a virus carrier would be at lower risk from someone who is two hops away. A strawman approach to extending the proof-ofcontact protocol for transitive proofs works as fol-lows: • As in the original protocol, a trusted entity sets up the system and runs the generator algorithm G(λ, C 2 ) → (pk 2 , vk 2 ); here C 2 is an additional circuit with corresponding prover and verifier keys (pk 2 , vk 2 ), for proving transitive exposure. • User B checks the public registry periodically to find a matching h i (from some user A who tested positive) and can quickly verify the proof using V (vk, (h i , h s ), π). If the proof checks, user B verifies the signature V S (p s , s, h s ) given h s and the public verification key p s of the healthcare provider. • User B then generates a short cryptographic proof using → π attesting to these facts • User B publishes tuple (π, h i , h j ) to the public registry. • User C checks the public registry periodically to find a matching h j and can quickly verify the proof using V (vk 2 , (h i , h j ), π). If the proof checks, user C can recursively verify the next proof in the chain until eventually arriving at the original proof. Finally, user C verifies the original proof using V (vk, (h i , h s ), π). Observe that in this case the SNARK includes the constraint t 2 − t 1 ≤ 3, corresponding to the 3 day incubation period of COVID-19. The parameter is configurable, however, in general the time that Bob comes in contact with Charlie should come after the time Bob came in contact with Alice plus the incubation period. This will reduce the number of false positives that arise when Bob alerts Charlie of 2nd-order exposure even though Bob could not have possibly become contagious from Alice yet. The above protocol suffers from a linkability flaw with the uploaded (h i , h j ) pairs. An adversary observing the public registry can deduce that whoever uploaded h i must have came in contact with the person who uploaded h j . In order to circumvent this drawback, we modify the protocol to use proof-carrying data (PCD). Using PCD, previous proofs in the chain are verified and a proof that this verification was performed correctly is provided. The PCD system hides the details of intermediate proofs, while allowing a user to verify that the entire chain is valid. Instead of uploading the pairs (h i , h j ), transitive proofs consist only of single h values which are indistinguishable from random. For proof-of-contact, we represent the compliance predicate as the hospital signature verification algorithm V S (p s , s, h s ), coupled with the steps necessary to prove that the randomness of h i is consistent with the randomness of some h j . More formally, a user who tested positive can perform the -compliant computation M 0 that takes as input z in = (h s , s, p s ), z loc = (t, t , T B1 ) and outputs h i satisfying the following constraints: The user then uploads the value h i along with a cryptographic proof attesting that M 0 is compliant. For proving transitive exposure, a user B who sees the value h i along with the PCD proof π i attesting to first-hand exposure, performs the -compliant computation M 1 that takes as input z in = (h i , π i ), z loc = (t 1 , t 2 , T C ) and outputs h j satisfying the following constraints: Additionally, user B runs a verifier circuit over π i and provides a cryptographic proof that V (vk, h j , π i ) = 1 and M 1 is -compliant. Figure 1 illustrates the complete flow from proof-of-contact to proof of transitive exposure. Choice of digital signature scheme Encoding the digital signature verification scheme inside the compliance predicate is expensive with respect to circuit size. For this reason, we choose the RSA digital signature scheme which can be represented efficiently over F p by choosing public exponent e = 3 and performing modular multiplication via radix √ p arithmetic as suggested in [5] . In some cases, contact tracing by measuring proximity between users may not be sufficient for effectively curbing the spread of a virus. A virus that lives for extended periods on surfaces could transmit from one user to another even though they have never been in close contact. For example, if a contagious user Alice sits on a park bench, Bob, who visits the park the next day, may become infected from sitting on the same bench. If Alice tests positive, it would be ideal that users who are at risk from the surface spread of the virus are alerted. One approach is to place Bluetooth devices around public spaces, and have them participate in the contact tracing protocol. The devices could exchange tokens with users and verify proofs in the usual way. After discovering a matching token, and verifying the corresponding proof, the device uploads a transitive proof of exposure, which alerts users of the surface transmission risk. Suppose rather than using PCD, the Bluetooth device on the park bench simply uploads its secondary tokens, i.e., the tokens exchanged with users within 14 days of Alice's park visit. Although Bob is alerted of his surface contact, he must trust that the park Figure 1 : Overview of proof-carrying data for transitive exposure bench Bluetooth device is acting honestly since he has no way of verifying that an infected user actually came in contact with the park bench. By using PCD, Bob maintains all the security and privacy guarantees that hold from the original contact tracing protocol. We implemented a simplified proof-of-concept zeroknowledge proof-of-contact SNARK using the libsnark library [12] . The library uses the NP-complete language R1CS to express the arithmetic circuits that represent the SNARK. There are existing R1CS gadgets for performing useful functionality, such as comparisons and collision-resistant hashing. It includes an implementation of the subset-sum collisionresistant hashing gadget, which we use as an efficient one-way hash. We characterize the performance of our proof-ofcontact SNARK in terms of the running time and key sizes for both the prover and verifier (Table 1) . Since the generator phase is only executed once during setup, we provide concrete numbers on the size of the arithmetic circuit (3060 gates) but disregard the time of the generator (166 ms). The circuit did not account for sorting. There are a few existing proposals for privacypreserving contact tracing [ [17] . Although most of these works suggest similar tech- niques for estimating and exchanging proximity information between users, the underlying cryptographic protocols and their privacy guarantees differ. [10] [14] use randomly generated pseudonyms that nearby users can exchange over Bluetooth. Individuals who test positive for a virus can upload their generated pseudonyms to a public registry, allowing other users to match the tokens they have collected with those in the registry. Both works suggest that healthcare workers should be the ones to upload users' tokens to the public registry after giving a positive test diagnosis in order to prevent malicious polluting of the database. Similar to the protocol introduced in this work, mixing can be applied to prevent linkability via traffic analysis. Apple and Google have released a protocol specification that closely resembles that of [10] and [14] . Users generate a rolling pseudorandom identifier and some associated encrypted metadata, that nearby users exchange over Bluetooth. The pseudorandom identifiers are derived using the current time and temporary exposure keys, which get distributed after a positive diagnosis. Finally, [16] proposes partitioning GPS and time data into discrete spatiotemporal points and obfuscating these points using a one-way hash function. Infected users upload their obfuscated location histories after redacting personally identifiable information such as the GPS coordinates that represent a home or work address. Using private-set intersection (PSI), individuals can privately determine whether or not their location history overlaps with that of infected users. The privacy guarantees of such an approach differ significantly from that offered in [10] [14] [15] , and those presented in this paper. The approaches described above provide different flavors of privacy and decentralization. How-ever, each solution places an increased burden on the healthcare providers relative to the zero-knowledge SNARK technique we have outlined. Our approach requires only that healthcare workers sign positive diagnoses rather than generate one-time codes or upload tokens to a public registry. Additionally, our approach is fully decentralized and supports broader functionality such as proofs of transitive exposure, not currently supported by the other proposed solutions. Help speed up contact tracing with tracetogether Contact tracing mobile apps for covid-19: Privacy considerations and related trade-offs Centralized or decentralized? the contact tracing dilemma Proof-carrying data: Secure computation on untrusted platforms (high-level description) Photoproof: Cryptographic image authentication for any set of permissible transformations Face-to-face proximity estimationusing bluetooth on smartphones Inferring personto-person proximity using wifi signals Soundbased proximity detection with mobile phones Scalable zero knowledge via cycles of elliptic curves Anonymous collocation discovery: Harnessing privacy to tame the coronavirus Automated exposure notification with enhanced privacy guarantees Decentralized privacy preserving proximity tracing Pan-european privacy-preserving proximity tracing Privacy-preserving contact tracing -apple and google Assessing disease exposure risk with location data: A proposal for cryptographic preservation of privacy Apps gone rogue: Maintaining personal privacy in an epidemic