title: Risk score learning for COVID-19 contact tracing apps
authors: Murphy, Kevin; Kumar, Abhishek; Serghiou, Stylianos
date: 2021-04-17

Digital contact tracing apps for COVID-19, such as the one developed by Google and Apple, need to estimate the risk that a user was infected during a particular exposure, in order to decide whether to notify the user to take precautions, such as entering into quarantine, or requesting a test. Such risk score models contain numerous parameters that must be set by the public health authority. In this paper, we show how to automatically learn these parameters from data. Our method needs access to exposure and outcome data. Although this data is already being collected (in an aggregated, privacy-preserving way) by several health authorities, in this paper we limit ourselves to simulated data, so that we can systematically study the different factors that affect the feasibility of the approach. In particular, we show that the parameters become harder to estimate when there is more missing data (e.g., due to infections which were not recorded by the app), and when there is model misspecification. Nevertheless, the learning approach outperforms a strong manually designed baseline. Furthermore, the learning approach can adapt even when the risk factors of the disease change, e.g., due to the evolution of new variants, or the adoption of vaccines.

Digital contact tracing (DCT) based on mobile phone technology has been proposed as one of many tools to help combat the spread of COVID-19. Such apps have been shown to reduce COVID-19 infections in simulation studies (e.g., Ferretti et al., 2020b; Abueg et al., 2021; Cencetti et al., 2020) and in real-world deployments (Wymant et al., 2021; Salathe et al., 2020; Masel et al., 2021; Rodríguez et al., 2021; Kendall et al., 2020; Huang et al., 2020). For example, over a 3-month period, Wymant et al. (2021) estimated that the app used in England and Wales led to about 284,000-594,000 averted infections and 4,200-8,700 averted deaths.

DCT apps work by notifying the app user if they have had a "risky encounter" with an index case (someone who had been diagnosed as having COVID-19 at the time of the encounter). Once notified, the user may then be advised by their Public Health Authority (PHA) to enter into quarantine, just as they would if they had been contacted by a manual contact tracer. The key question we focus on in this paper is: how can we estimate the probability that an exposure resulted in an infection? If the app can estimate this reliably, it can decide whom to notify based on a desired false positive / false negative rate.

To quantify the risk of an encounter, we need access to some (anonymized) features which characterize the encounter. In this paper, we focus on the features collected by the Google/Apple Exposure Notification (GAEN) system (Google-Apple, 2020), although our methods are more broadly applicable.

Generalizable Insights about Machine Learning in the Context of Healthcare

In this paper, we show that it is possible to learn interpretable risk score models using standard machine learning tools, such as multiple instance learning and stochastic gradient descent, provided we have suitable data. However, some new techniques are also required, such as replacing hard thresholding with soft binning, and using constrained optimization methods to ensure monotonicity of the learned function.
We believe these methods could be applied to learn other kinds of risk score models.

The probability of infection from an exposure event depends on many factors, including the duration of the exposure, the distance between the index case (transmitter) and the user (receiver), and the infectiousness of the index case, as well as other unmeasured factors, such as mask wearing, air flow, etc. In this section, we describe a simple probabilistic model of COVID-19 transmission which only depends on the factors that are recorded by the Google/Apple app. This model forms the foundation of the GAEN risk score model in Sec. 3. We will also use this model to generate our simulated training and test data, as we discuss in Sec. 4.

Let τ_n be the duration (in minutes) of the n'th exposure, and let d_n be the distance (in meters) between the two people during this exposure. (For simplicity, we assume the distance is constant during the entire interval; we will relax this assumption later.) Finally, let σ_n be the time since the onset of symptoms of the index case at the time of exposure, i.e., σ_n = t^sym_{i_n} − t^exp_{j_n}, where i_n is the index case for this exposure, j_n is the user, t^sym_{i_n} is the time of symptom onset for i_n, and t^exp_{j_n} is the time that j_n gets exposed to i_n. (If the index case did not show symptoms, we use the date that they tested positive, and shift it back in time by 7 days, as a crude approximation.) Note that σ_n can be negative. Ferretti et al. (2020a) show that σ_n can be used to estimate the degree of contagiousness of the index case.

Given the above quantities, we define the "hazard score" for this exposure as follows:

    f_hazard(τ_n, d_n, σ_n; φ) = τ_n × f_dist(d_n; φ) × f_inf(σ_n; φ),   (1)

where τ_n is the duration, d_n is the distance, σ_n is the time since symptom onset, f_dist(d_n; φ) is the simulated risk given distance, f_inf(σ_n; φ) is the simulated risk given time since symptom onset, and φ are the parameters of the simulator (as opposed to ψ, which are the parameters of the risk score model that the phone app needs to learn).

The "correct" functional form for the dependence on distance is unknown. In this paper, we follow prior work which models both short-range (droplet) effects and longer-range (aerosol) effects using the following simple truncated quadratic model:

    f_dist(d_n; φ) = min(1, D^2_min / d_n^2).   (2)

That work proposes setting D^2_min = 1, based on an argument from the physics of COVID-19 droplets.

The functional form for the dependence on symptom onset is better understood. In particular, Ferretti et al. (2020a) consider a variety of models, and find that the one with the best fit to the empirical data is a scaled skewed logistic distribution, which we use as f_inf(σ_n; φ) (Eq. 3).

In order to convert the hazard score into a probability of infection (so that we can generate samples), we use a standard exponential dose response model (Smieszek, 2009; Haas, 2014):

    p(y_n = 1 | x_n) = 1 − exp(−λ s_n),   (4)

where y_n = 1 iff exposure n resulted in an infection, x_n = (τ_n, d_n, σ_n) are the features of this exposure, and s_n = f_hazard(x_n; φ) is the hazard from Eq. (1). The parameter λ is a fixed constant with value 3.1 × 10^−6, chosen to match the empirical attack rate reported in (Wilson et al., 2020).

A user may have multiple exposures. Following standard practice, we assume each exposure could independently infect the user, so

    Pr(Y_j = 1 | X_j) = 1 − ∏_{n ∈ E_j} (1 − p(y_n = 1 | x_n)) = 1 − exp(−λ Σ_{n ∈ E_j} s_n),

where E_j are all of j's exposure events, and X_j is the corresponding set of features. To account for possible "background" exposures that were not recorded by the app, we can add a p_0 term, to reflect the prior probability of an exposure for this user (e.g., based on the prevalence). This model has the form

    Pr(Y_j = 1 | X_j) = 1 − (1 − p_0) exp(−λ Σ_{n ∈ E_j} s_n).   (8)
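To make the transmission model concrete, here is a minimal sketch in Python/NumPy of the hazard score (Eq. 1), the truncated quadratic distance term (Eq. 2), and the dose response with a background term (Eq. 8). The symptom onset curve f_inf is a placeholder bell curve, since we do not reproduce the exact skewed logistic fit of Ferretti et al. (2020a) here; apart from λ = 3.1 × 10^−6 and D^2_min = 1, all names and constants are illustrative rather than taken from the paper or the GAEN codebase.

```python
import numpy as np

LAMBDA = 3.1e-6   # dose-response scaling, matched to the attack rate in Wilson et al. (2020)
D2_MIN = 1.0      # D^2_min in the truncated quadratic distance model (Eq. 2)

def f_dist(d):
    """Truncated quadratic distance risk: constant within D_min, then 1/d^2 decay."""
    return np.minimum(1.0, D2_MIN / np.square(d))

def f_inf(sigma):
    """Placeholder infectiousness curve vs. days since symptom onset.
    The paper uses the skewed logistic fit of Ferretti et al. (2020a);
    here we substitute a simple bell curve peaked near symptom onset for illustration."""
    return np.exp(-0.5 * (sigma / 3.0) ** 2)

def hazard(tau, d, sigma):
    """Hazard score s_n = tau * f_dist(d) * f_inf(sigma) (Eq. 1)."""
    return tau * f_dist(d) * f_inf(sigma)

def prob_infection(hazards, p0=0.0):
    """Probability a user is infected given the hazards of all their exposures (Eq. 8):
    P = 1 - (1 - p0) * exp(-lambda * sum_n s_n)."""
    return 1.0 - (1.0 - p0) * np.exp(-LAMBDA * np.sum(hazards))

# Example: one 15-minute exposure at 1 m, two days after the index case's symptom onset.
s = hazard(tau=15.0, d=1.0, sigma=2.0)
print(prob_infection(np.array([s]), p0=1e-4))
```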
Suppose we have a set of J users. The probability that user j gets infected is given by Eq. (8). However, it remains to specify the set of exposure events E_j for each user. To do this, we create a pool of N randomly generated exposure events, uniformly spaced across the 3d grid of distances (80 points from 0.1m to 5m), durations (20 points from 5 minutes to 60 minutes), and symptom onset times (21 points from -10 to 10 days). This gives us a total of N = 33600 exposure events.

We assume that each user j is exposed to a random subset of one or more of these events (corresponding to having encounters with different index cases). In particular, let u_jn = 1 iff user j is exposed to event n. Hence the probability that j is infected is given by P_j = 1 − exp(−λ S_j), where S_j = Σ_{n=1}^{N} u_jn s_n and s_n = f_hazard(x_n; φ), where x_n = (τ_n, d_n, σ_n).

We can write this in a more compact way as follows. Let U = [u_jn] be a J × N binary assignment matrix, and let s be the vector of event hazard scores. For example, suppose there are J = 3 users and N = 5 events, and we have the following assignments: user j = 1 gets exposed to event n = 1; user j = 2 gets exposed to events n = 1, n = 2 and n = 3; and user j = 3 gets exposed to events n = 3 and n = 5. Thus

    U = [ 1 0 0 0 0
          1 1 1 0 0
          0 0 1 0 1 ].   (9)

The corresponding vector of true infection probabilities, one per user, is then given by p = 1 − exp(−λ U s). From this, we can sample a vector of infection labels, y, one per user. If a user gets infected, we assume they are removed from the population, and cannot infect anyone else (i.e., the pool of N index cases is fixed). This is just for simplicity. We leave investigation of more realistic population-based simulations to future work.

In the real world, a user may get exposed to events that are not recorded by their phone. We therefore make a distinction between the events that a user was actually exposed to, encoded by u_jn, and the events that are visible to the app, denoted by v_jn. We require that v_jn = 1 =⇒ u_jn = 1, but not vice versa. For example, Eq. (9) might give rise to the following censored assignment matrix, in which the app fails to record user 2's exposure to event 2 and user 3's exposure to event 5:

    V = [ 1 0 0 0 0
          1 0 1 0 0
          0 0 1 0 0 ].   (10)

This censored version of the data is what is made available to the user's app when estimating the exposure risk, as we discuss in Sec. 4. We assume censoring happens uniformly at random. We leave investigation of more realistic censoring simulations to future work.

DCT apps do not observe distance directly. Instead, they estimate it based on bluetooth attenuation. In order to simulate the bluetooth signal, which will be fed as input to the DCT risk score model, we use the stochastic forward model from (Lovett et al., 2020). This models the attenuation as a function of distance using a log-normal distribution, whose mean is a linear function of log-distance with offset φ_α and slope φ_β (Eq. 11). Using empirical data from MIT Lincoln Labs, Lovett et al. (2020) estimate the offset to be φ_α = 3.92 and the slope to be φ_β = 0.21. In this paper, we assume this mapping is deterministic, even though in reality it is quite noisy, due to multipath reflections and other environmental factors (see e.g., (Leith and Farrell, 2020)). Fortunately, various methods have been developed to try to "denoise" the bluetooth signal. For example, the England/Wales GAEN app uses an approach based on unscented Kalman smoothing (Lovett et al., 2020). We leave the study of the effect of bluetooth noise on our learning methods to future work.
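The data-generation procedure can be summarized in a few lines of NumPy: build the assignment matrix U of Eq. (9), sample labels from p = 1 − exp(−λ U s), and drop entries to obtain the app-visible matrix V of Eq. (10). This is a simplified sketch; the bag-size distribution and hazard values below are placeholders rather than the exact simulator settings used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
LAMBDA = 3.1e-6

def sample_assignments(num_users, num_events, max_bag_size=4):
    """Assign each user a random bag of exposure events; returns the binary J x N matrix U (Eq. 9)."""
    U = np.zeros((num_users, num_events), dtype=int)
    for j in range(num_users):
        bag_size = rng.integers(1, max_bag_size + 1)            # illustrative, not the paper's negative binomial
        events = rng.choice(num_events, size=bag_size, replace=False)
        U[j, events] = 1
    return U

def sample_labels(U, hazards):
    """Sample per-user infection labels y ~ Bernoulli(1 - exp(-lambda * U s))."""
    p = 1.0 - np.exp(-LAMBDA * (U @ hazards))
    return rng.binomial(1, p)

def censor(U, censor_prob):
    """App-visible matrix V (Eq. 10): each recorded exposure is dropped independently with probability censor_prob."""
    keep = rng.binomial(1, 1.0 - censor_prob, size=U.shape)
    return U * keep

hazards = rng.uniform(0.0, 60.0, size=1000)   # stand-in for s_n = f_hazard(x_n; phi)
U = sample_assignments(num_users=500, num_events=len(hazards))
y = sample_labels(U, hazards)
V = censor(U, censor_prob=0.2)
print(U.sum(), V.sum(), y.mean())
```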
In this section, we describe the risk score model used by the Google/Apple Exposure Notification (GAEN) system. This can be thought of as a simple approximation to the biophysical model we discussed in Sec. 2. For every exposure recorded by the app, the following risk score is computed:

    r_n = f_risk(τ_n, a_n, c_n; ψ) = τ_n × f_ble(a_n; ψ) × f_con(c_n; ψ),   (12)

where τ_n is the duration, a_n is the bluetooth attenuation, c_n is a quantized version of the symptom onset time, and f_ble and f_con are functions defined below. We can convert the risk score into an estimated probability of being infected using

    q_n = Pr(y_n = 1 | x̃_n; ψ) = 1 − exp(−µ r_n),   (13)

where x̃_n = (τ_n, a_n, c_n) are the observed features recorded by the GAEN app (the app-visible counterpart of x_n), and µ is a learned scaling parameter, analogous to λ for the simulator.

The GAEN risk score model makes a piecewise constant approximation to the risk-vs-attenuation function. In particular, it defines 3 attenuation thresholds, θ^ble_{1:3}, which partition the real line into 4 intervals. Each interval or bucket is assigned a weight, w^ble_{1:4}, as shown in Fig. 1.

Figure 1: Three thresholds define four attenuation buckets, each of which can be given a weight. We define θ^ble_0 = 0 and θ^ble_4 = 100 as boundary conditions. These weights encode the assumption that smaller attenuations (corresponding to closer exposures) have higher risk than larger attenuations.

Overall, we can view this as approximating f_dist(d_n; φ) by f_ble(a_n; ψ), where

    f_ble(a_n; ψ) = w^ble_b if θ^ble_{b−1} ≤ a_n < θ^ble_b, for b ∈ {1, 2, 3, 4}.   (14)

The health authority can compute the days since symptom onset for each exposure, σ_n = t^sym_{i_n} − t^exp_{j_n}, by asking the index case when they first showed symptoms. (If they did not show any symptoms, they can use a heuristic, such as the date of their test shifted back by several days.) However, sending the value of σ_n directly was considered to be a security risk by Google/Apple. Therefore, to increase privacy, the symptom onset value is mapped into one of 3 infectiousness or contagiousness levels; this mapping is defined by a lookup table, as shown in Fig. 2. The three levels are called "none", "standard" and "high". We denote this mapping by c_n = LUT(σ_n). Each infectiousness level is associated with a weight. (However, we require that the weight for level "none" is w^con_1 = 0, so there are only 2 weights that need to be specified.) Overall, the mapping and the weights define a piecewise constant approximation to the risk vs symptom onset function defined in Eq. (3). We can write this approximation as f_inf(σ_n; ψ) ≈ f_con(LUT(σ_n); ψ), where

    f_con(c_n; ψ) = w^con_{c_n}, for c_n ∈ {1, 2, 3}.   (15)

3.3. Comparing the simulated and estimated risk score models

In Fig. 3 (left), we plot the probability of infection according to our simulator as a function of symptom onset σ_n and distance d_n, for a fixed duration τ_n = 15 minutes. This surface combines the (slightly asymmetric) bell-shaped symptom onset curve of Eq. (3) along one axis with the truncated quadratic distance curve of Eq. (2) along the other. We can approximate this risk surface using the GAEN piecewise constant approximation, as shown in Fig. 3 (right).

We combine the risk from multiple exposure windows by adding the risk scores, reflecting the assumption of independence between exposure events. For example, if there are multiple exposures within an exposure window, each with a different duration and attenuation, we can combine them as follows. First we compute the total time spent in each attenuation bucket:

    τ_{nb} = Σ_{k=1}^{K_n} τ_{nk} I(θ^ble_{b−1} ≤ a_{nk} < θ^ble_b),   (16)

where K_n is the number of "micro exposures" within the n'th exposure window, τ_{nk} is the time spent in the k'th micro exposure, a_{nk} is the corresponding attenuation, and I(·) is the indicator function. (This gives a piecewise constant approximation to the distance-vs-time curve for any given interaction between two people.) We then compute the overall risk for this exposure using the following bilinear function:

    r_n = Σ_{b=1}^{N_B} Σ_{c=1}^{N_C} w^ble_b w^con_c τ_{nb} I(c_n = c),   (17)

where N_B = 4 is the number of attenuation buckets, and N_C = 3 is the number of contagiousness levels. If we have multiple exposures for a user, we sum the risk scores to get R_j = Σ_{n∈E_j} r_n, which we can convert to a probability of infection using Q_j = 1 − exp(−µ R_j).
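The following sketch shows how a piecewise constant GAEN-style risk score in the spirit of Eqs. (14)-(17) can be computed for a single exposure window. The threshold and weight values are arbitrary placeholders chosen for illustration, not a configuration recommended by any health authority.

```python
import numpy as np

# Illustrative parameters psi (placeholders, not a recommended configuration).
THETA_BLE = np.array([50.0, 60.0, 70.0])       # attenuation thresholds theta^ble_{1:3} (dB)
W_BLE     = np.array([2.0, 1.0, 0.5, 0.0])     # weights for the 4 attenuation buckets
W_CON     = np.array([0.0, 1.0, 2.0])          # weights for the 3 contagiousness levels ("none" fixed at 0)
MU        = 1e-5                               # scaling parameter mu

def bucket_times(durations, attenuations):
    """Eq. (16): total time spent in each attenuation bucket within one exposure window."""
    edges = np.concatenate(([0.0], THETA_BLE, [100.0]))   # theta^ble_0 = 0, theta^ble_4 = 100
    bucket = np.digitize(attenuations, edges[1:-1])        # bucket index 0..3 per micro exposure
    tau = np.zeros(4)
    np.add.at(tau, bucket, durations)
    return tau

def risk_score(durations, attenuations, contagiousness_level):
    """Eq. (17): bilinear risk for one exposure window with a single contagiousness level c_n."""
    tau = bucket_times(durations, attenuations)
    return W_CON[contagiousness_level] * np.dot(W_BLE, tau)

def prob_infected(risk_scores):
    """Q_j = 1 - exp(-mu * sum_n r_n): combine all exposure windows for one user."""
    return 1.0 - np.exp(-MU * np.sum(risk_scores))

# One exposure window with three micro exposures of varying attenuation, "high" contagiousness.
r = risk_score(durations=np.array([5.0, 10.0, 5.0]),
               attenuations=np.array([48.0, 55.0, 72.0]),
               contagiousness_level=2)
print(r, prob_infected(np.array([r])))
```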
In this section, we discuss how to optimize the parameters of the risk score model using machine learning. We assume access to a labeled dataset, {(X_j, Y_j) : j = 1 : J}, for a set of J users. This data is generated using the simulator described in Sec. 2. We assume the lookup table mapping from symptom onset to infectiousness level is fixed, as shown in Fig. 2; however, we assume the corresponding infectiousness weights w^con_2 and w^con_3 are unknown. Similarly, we assume the 3 attenuation thresholds θ^ble_{1:3}, as well as the 4 corresponding weights w^ble_{1:4}, are unknown. Finally, we assume the scaling factor µ is unknown. Thus there are 10 parameters in total to learn; we denote these by ψ = (w^con_{2:3}, θ^ble_{1:3}, w^ble_{1:4}, µ).

It is clear that the parameters are not uniquely identifiable. For example, we could increase µ and decrease w^con, and the effects would cancel out. Thus there are many parameter settings that all obtain the same maximum likelihood. Our goal is just to identify one element of this set. In the sections below, we describe some of the challenges in learning these parameters, and then our experimental results.

We optimize the parameters by maximizing the log-likelihood, or equivalently, minimizing the binary cross entropy:

    L(ψ) = − Σ_{j=1}^{J} [ Y_j log Q_j + (1 − Y_j) log(1 − Q_j) ],   (18)

where Y_j ∈ {0, 1} is the infection label for user j coming from the simulator, and Q_j = 1 − exp(−µ R_j) is the estimated probability of infection, which is computed using the bilinear risk score model in Eq. (17).

A major challenge in optimizing the above objective arises from the fact that a user may encounter multiple exposure events, but we only see the final outcome for the whole set, not for the individual events. This problem is known as multi-instance learning (see e.g., (Foulds and Frank, 2010)). For example, consider Eq. (9). If we focus on user j = 2, we see that they were exposed to a "bag" of 3 exposure events (from index cases 1, 2, and 3). If we observe that this user gets infected (i.e., Y_j = 1), we do not know which of these events was the cause. Thus the algorithm does not know which set of features to pay attention to, so the larger the bag each user is exposed to, the more challenging the learning problem. (Note, however, that if Y_j = 0, then we know that all the events in the bag must be labeled as negative, since we assume (for simplicity) that the test labels are perfectly reliable.)

In addition, not all exposure events get recorded by the GAEN app. For example, the index case i may not be using the app, or the exposure may be due to environmental transmission (known as fomites). This can be viewed as a form of measurement "censoring", as illustrated in Eq. (10), which is a sparse subset of the true assignment matrix in Eq. (9). This can result in a situation in which all the visible exposure events in the bag are low risk, but the user is infected anyway, because the true cause of the user's infection is not part of X_j. This kind of false positive is a form of label noise which further complicates learning.
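As a concrete illustration of the learning objective, the function below evaluates the bag-level binary cross entropy of Eq. (18) from per-exposure risk scores and the app-visible assignment matrix. It is written in plain NumPy for readability; in practice one would implement it in an autodiff framework so that gradients with respect to ψ flow through the risk scores.

```python
import numpy as np

def bag_nll(V, risk_scores, labels, mu, eps=1e-12):
    """Bag-level binary cross entropy (Eq. 18).

    V           : J x N binary matrix of app-visible exposures (Eq. 10)
    risk_scores : length-N vector of per-exposure risk scores r_n (Eq. 17)
    labels      : length-J vector of infection outcomes Y_j
    mu          : scaling parameter of the learned dose-response model
    """
    R = V @ risk_scores                # total risk per user
    Q = 1.0 - np.exp(-mu * R)          # estimated probability of infection Q_j
    Q = np.clip(Q, eps, 1.0 - eps)     # numerical safety for the logs
    return -np.mean(labels * np.log(Q) + (1 - labels) * np.log(1 - Q))

# Smoke test with random placeholder inputs.
rng = np.random.default_rng(0)
V = rng.integers(0, 2, size=(100, 50))
r = rng.uniform(0.0, 5.0, size=50)
y = rng.integers(0, 2, size=100)
print(bag_nll(V, r, y, mu=0.1))
```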
In this section, we discuss some algorithmic issues that arise when trying to optimize the objective in Eq. (18).

We want to ensure that the risk weights are monotonic in attenuation, so that closer exposures (smaller attenuations) never receive lower risk than more distant ones. To do this, we order the attenuation buckets from low risk to high risk, so that τ_{n1} is the time spent in the lowest-risk bin (largest attenuation), and τ_{n4} is the time spent in the highest-risk bin (smallest attenuation). Next we reparameterize w^ble as follows:

    w^ble_b = w^ble_{b−1} + ∆^ble_b, for b = 2, 3, 4.   (19)

Then we optimize over (w^ble_1, ∆^ble_2, ∆^ble_3, ∆^ble_4), where we use projected gradient descent to ensure ∆^ble_b > 0. We can use a similar trick to ensure the risk is monotonically increasing with the infectiousness level.

The loss function is not differentiable with respect to the attenuation thresholds θ^ble_b. We consider two solutions to this. In the first approach, we use a gradient-free optimizer for θ^ble in the outer loop (e.g., grid search), and a gradient-based optimizer for the weights w^ble_b in the inner loop. In the second approach, we replace the hard binning in Eq. (16) with soft binning, in which the indicator I(θ^ble_{b−1} ≤ a_{nk} < θ^ble_b) is replaced by a smooth approximation built from σ_t(x) = 1 / (1 + e^{−t x}), the sigmoid (logistic) function, where t is a temperature parameter (Eq. 20). The effect of t is illustrated in Fig. 4. When learning, we start with a small temperature, and then gradually increase it until we approximate the hard-threshold form, as required by the GAEN app. The advantage of the soft binning approach is that we can optimize the thresholds, as well as the weights, using gradient descent. This is much faster and simpler than the nested optimization approach, in which we use grid search in the outer loop. Furthermore, preliminary experiments suggested that the two approaches yield similar results. We will therefore focus on soft binning for the rest of this paper.
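A minimal sketch of these two tricks is shown below, assuming a difference-of-sigmoids form for the soft bucket indicator (any smooth approximation to the interval indicator would work equally well) and a cumulative-sum parameterization of the ordered weights; the projection step of projected gradient descent is implemented here by clipping the increments to be positive. All names and values are illustrative.

```python
import numpy as np

def sigmoid(x, t):
    """Temperature-controlled sigmoid sigma_t(x) = 1 / (1 + exp(-t x))."""
    return 1.0 / (1.0 + np.exp(-t * x))

def soft_bucket_times(durations, attenuations, thresholds, t):
    """Soft version of Eq. (16): smooth bucket indicators built from a difference of
    sigmoids; approaches hard binning as the temperature t grows large."""
    edges = np.concatenate(([0.0], thresholds, [100.0]))
    a = attenuations[:, None]                                            # K_n x 1
    soft_ind = sigmoid(a - edges[:-1], t) - sigmoid(a - edges[1:], t)    # K_n x 4
    return durations @ soft_ind                                          # time per bucket

def monotone_weights(w1, deltas):
    """Eq. (19): w^ble_b = w^ble_{b-1} + Delta^ble_b, so the weights are ordered."""
    return w1 + np.concatenate(([0.0], np.cumsum(deltas)))

def project_positive(deltas, floor=1e-6):
    """Projection step of projected gradient descent: keep increments strictly positive."""
    return np.maximum(deltas, floor)

tau = soft_bucket_times(durations=np.array([5.0, 10.0]),
                        attenuations=np.array([48.0, 72.0]),
                        thresholds=np.array([50.0, 60.0, 70.0]),
                        t=0.5)
tau = tau[::-1]   # reorder so index 0 is the lowest-risk bucket (largest attenuation)
w_ble = monotone_weights(0.1, project_positive(np.array([0.2, 0.3, 0.4])))
print(tau, w_ble)
```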
In this section, we describe our experimental results. We generated a set of simulated exposures from a fine uniform quantization of the three-dimensional grid corresponding to duration × distance × symptom onset. For each point in this 3d grid, we sampled a label from the Bernoulli distribution parameterized by the probability of infection under the infection model described in Sec. 2. After creating this "pool" of exposure events, we next assigned a random bag of size k of these events to each user. The value k is sampled from a truncated negative binomial distribution with parameters (p, r), where the truncation parameter is the maximum bag size b. Figure 5 shows the probabilities of bag sizes for different settings of the maximum bag size b. Note that larger bag sizes make the multi-instance learning problem harder, because of the difficulty of "credit assignment".

Negative bags are composed of all negative exposures. We consider two scenarios for constructing positive bags: (i) each positive bag contains exactly one positive exposure and the rest are negative exposures; (ii) each positive bag contains N positive exposures, where N is sampled uniformly from {1, 2, 3}. When there is only one positive in a bag, the learning problem is harder, since it is like finding a "needle in a haystack". (This corresponds to not knowing which of the people that you encountered actually infected you.) When there are more positives in a bag, the learning problem is a little easier, since there are "multiple needles", any of which are sufficient to explain the overall positive infection.

To simulate censored exposures, we censor the positive exposures in each positive bag independently and identically with probability varying in the range {0.05, 0.1, 0.2, ..., 0.8}. We do not censor the negative exposures, to prevent the bag sizes from changing too much from the control value.

Figure 8 (see also Fig. 7): The AUC worsens with increasing model mismatch, but the learned scores still outperform the 'Swiss' manual configuration.

We use 80% of the data for training, and 20% for testing. We fit the model using 1000 iterations of (projected) stochastic gradient descent, with a batch size of 100. We then study the performance (as measured by the area under the ROC curve, or AUC) as a function of the problem difficulty along two dimensions: (i) multi-instance learning (i.e., bag-level labels), and (ii) label noise (i.e., censored exposures).

We compare the performance of the learned parameters with that of two baselines. The first corresponds to the oracle performance using the true probabilities coming from the simulator. The second is a more realistic comparison, namely the risk score configuration used by the Swiss version of the GAEN app (Salathe et al., 2020), whose parameters have been widely copied by many other health authorities. This configuration uses 2 attenuation thresholds of θ^ble_1 = 53 and θ^ble_2 = 60. The corresponding weights for the 3 bins are w^ble_1 = 1.0, w^ble_2 = 0.5 and w^ble_3 = 0.0 (so exposures in the third bin are effectively ignored). The infectiousness weight is kept constant for all symptom onset values, so it is effectively ignored.

We do 5 random trials, and report the mean and standard error of the results. Figure 6 and Fig. 7 show the results for the two ways of constructing positive bags: (i) each positive bag containing exactly one positive exposure, and (ii) each positive bag containing up to three positive exposures. Not surprisingly, the oracle performance is highest. However, we also see that the learned parameters outperform the Swiss baseline in all settings, often by a large margin. In terms of the difficulty of the learning problem, the general trend is that the AUC decreases with increasing bag size, as expected. The AUC also decreases as the censoring probability is increased. The only exception is the case of fully observed exposures, when each positive bag can have up to three positive exposures, where the AUC initially increases as the bag size is increased (up to a bag size of 4) and then starts decreasing again. This is expected, since the learning algorithm gets more signal to identify positive bags. This signal starts fading again as the bag size increases beyond 4.

For a given risk score r_n, we take the functional form of the learned risk score model to be p_l(y_n | x_n) = 1 − exp(−µ r_n) in all earlier experiments. This functional form matches the exponential dose response model used in our simulator, where the probability of infection is given by p_s(y_n | x_n) = 1 − exp(−λ s_n) for hazard s_n (note that r_n and s_n still have different functional forms and are based on different inputs). Here we simulate model mismatch in p(y_n | x_n) and run preliminary experiments on how it might impact the performance of the learned risk score model. In more detail, we use p_s(y_n | x_n) = 1 − f_t(−λ s_n) for the dose response model in the simulator, where f_t is the truncated Taylor series approximation of the exponential with t terms. This reflects the fact that our simulator may not be a very accurate model of how COVID-19 is actually transmitted. The functional form of the learned model p_l(y_n | x_n) is kept unchanged, since that is required by the app.
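For concreteness, the mismatched dose response can be sketched as follows; this is our own illustration of the truncated Taylor construction, not code from the simulator.

```python
import numpy as np
from math import factorial

def truncated_exp(x, t):
    """f_t(x): Taylor series of exp(x) truncated to t terms."""
    return sum(x ** i / factorial(i) for i in range(t))

def prob_infection_mismatched(hazard, lam, t):
    """Mismatched simulator dose response: p = 1 - f_t(-lambda * s)."""
    return 1.0 - truncated_exp(-lam * hazard, t)

# As t grows, the mismatched model approaches the exponential dose response.
for t in (2, 4, 6, 8):
    print(t, prob_infection_mismatched(hazard=1e5, lam=3.1e-6, t=t))
```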
We vary t ∈ {2, 4, 6, 8} and plot the AUC for the learned model in Figure 8. We fix the bag size to 4, with each positive bag containing up to three positive exposures (sampled uniformly from {1, 2, 3}). As expected, the AUC gets worse as the model mismatch increases; however, the learned scores still outperform the manual 'Swiss' configuration. We hypothesize that increasing the model capacity for computing the risk score r_n (e.g., using a multilayer neural network or Gaussian processes) could impart some robustness to model mismatch, but we leave this as a direction for future work.

Although there have been several papers which simulate the benefits of using digital contact tracing apps such as GAEN (e.g., (Ferretti et al., 2020b; Abueg et al., 2021)), all of these papers assume the risk score model is known. This is also true for papers that focus on inferring individual risk using graphical models of various forms (e.g., (Cencetti et al., 2020; Herbrich et al., 2020)). The only paper we are aware of that tries to learn the risk score parameters from data is (Sattler et al., 2020). They collect a small ground truth dataset of distance-attenuation pairs from 50 pairs of soldiers walking along a prespecified grid, whereas we use simulated data. However, they only focus on distance and duration, and ignore the infectiousness of the index case. By contrast, our risk score model matches the form used in the GAEN app, so the resulting learned parameters could (in principle) be deployed "in the wild". In addition, (Sattler et al., 2020) use standard supervised learning, whereas we consider the case of weakly supervised learning, in which the true infection outcome of each individual exposure event is not directly observed; instead, users only get to see the label for their entire "bag" of exposure events (which may also have false negatives due to censoring). Such weakly supervised learning is significantly harder from an ML point of view, but is also a much more realistic reflection of the kind of data that would be available in practice.

We have shown that it is possible to optimize the parameters of the risk score model for the Google/Apple COVID-19 app using machine learning methods, even in the presence of considerable missing data and model mismatch. The learned risk score configuration achieved much higher AUC scores than those produced by a widely used baseline.

Limitations

The most important limitation of this paper is that this is a simulation study, and does not use real data (which was not available to the authors). The second limitation is that we have assumed a centralized, batch setting, in which a single dataset is collected, based on an existing set of users, and then the parameters are learned and broadcast back to new app users. However, in a real deployment, the model would have to be learned online, as the data streams in, and the app parameters would need to be periodically updated. (This can also help if the data distribution is non-stationary, due to new variants or the adoption of vaccines.) The third limitation is that we have not considered privacy issues. The GAEN system was designed from the ground up to be privacy preserving, and follows similar principles to the PACT (Private Automated Contact Tracing) protocol (https://pact.mit.edu/). In particular, all the features are collected anonymously.
However, joining these exposure features x with epidemiological outcomes y in a central server might require additional protections, such as the use of differential privacy (see e.g., (Abadi et al., 2016)), which could reduce statistical performance. (The privacy / performance tradeoff in DCT is discussed in more detail in (Bengio et al., 2021).) We could avoid the use of a central server if we used federated learning (see e.g., (Kairouz and McMahan, 2021)), but this raises many additional challenges from a practical and statistical standpoint. The fourth limitation is more fundamental, and stems from our finding that learning performance degrades when the recorded data does not contain all the relevant information. This could be a problem in practical settings, where app usage may be rare, so that the causes of most positive test results remain "unexplained". Without enough data to learn from, any ML method is limited in its utility.

References

Deep learning with differential privacy.
Modeling the combined effect of digital exposure notification and non-pharmaceutical interventions on the COVID-19 epidemic in Washington state.
Digital proximity tracing app notifications lead to faster quarantine in non-household contacts: results from the Zurich SARS-CoV-2 cohort study.
Inherent privacy limitations of decentralized contact tracing apps.
Risk scoring calculation for the current NHSx contact tracing app.
Using real-world contact networks to quantify the effectiveness of digital contact tracing and isolation strategies for the COVID-19 pandemic. medRxiv.
The timing of COVID-19 transmission.
Quantifying SARS-CoV-2 transmission suggests epidemic control with digital contact tracing.
A review of multi-instance learning assumptions.
Exposure notifications: Using technology to help public health authorities fight COVID-19.
Quantitative Microbial Risk Assessment.
CRISP: A probabilistic model for individual-level COVID-19 infection risk estimation based on contact data.
Performance of digital contact tracing tools for COVID-19 response in Singapore: cross-sectional study.
Advances and Open Problems in Federated Learning.
Epidemiological changes on the Isle of Wight after the launch of the NHS Test and Trace programme: a preliminary analysis. The Lancet Digital Health.
Measurement-based evaluation of Google/Apple Exposure Notification API for proximity detection in a light-rail tram.
Configuring Exposure Notification Risk Scores for COVID-19. Linux Foundation Public Health.
Inferring proximity from Bluetooth Low Energy RSSI with unscented Kalman smoothers.
Quantifying meaningful adoption of a SARS-CoV-2 exposure notification app on the campus of the University of Arizona.
A population-based controlled experiment assessing the epidemiological impact of digital contact tracing.
Early evidence of effectiveness of digital contact tracing for SARS-CoV-2 in Switzerland.
Risk estimation of SARS-CoV-2 transmission from Bluetooth Low Energy measurements.
A mechanistic model of infection: why duration and intensity of contacts should be included in models of disease spread.
Quantifying SARS-CoV-2 infection risk within the Apple/Google exposure notification framework to inform quarantine recommendations. medRxiv.
The epidemiological impact of the NHS COVID-19 App.