key: cord-1034512-wepfozf8 authors: Bulchandani, Vir B.; Shivam, Saumya; Moudgalya, Sanjay; Sondhi, S. L. title: Digital Herd Immunity and COVID-19 date: 2020-04-15 journal: Physical biology DOI: 10.1088/1478-3975/abf5b4 sha: 05fb980e114aaf619558a02d14cff684cacc2efe doc_id: 1034512 cord_uid: wepfozf8 A population can be immune to epidemics even if not all of its individual members are immune to the disease, so long as sufficiently many are immune - this is the traditional notion of herd immunity. In the smartphone era a population can be immune to epidemics even if not a single one of its members is immune to the disease - a notion we call"digital herd immunity", which is similarly an emergent characteristic of the population. This immunity arises because contact-tracing protocols based on smartphone capabilities can lead to highly efficient quarantining of infected population members and thus the extinguishing of nascent epidemics. When the disease characteristics are favorable and smartphone usage is high enough, the population is in this immune phase. As usage decreases there is a novel"contact-tracing phase transition"to an epidemic phase. We present and study a simple branching-process model for COVID-19 and show that digital immunity is possible regardless of the proportion of non-symptomatic transmission. Recent events have challenged the public health infrastructure worldwide for controlling the spread of contagious diseases. This difficulty is partly due to the novel pathogen involved and partly due to some unusual characteristics of COVID-19 [1] . Specifically, the infection appears to be transmitted through a large number of asymptomatic and pre-symptomatic cases [1, 2] . In the early days of the pandemic, few public health interventions were available; access to testing was limited, there was no vaccine and mechanisms for transmission of the disease were poorly understood. Faced with this situation, most governments chose to restrict their populations' movements through the introduction of social distancing measures. Some countries complemented this approach with the established method of "contact tracing" [3] , in which individuals who have been exposed to newly identified infections are isolated before they have a chance to infect others. Both interventions have proved to be effective for reducing the effective reproduction number R. At the time of writing, it is widely appreciated that multiple "layers of protection" serve to diminish R; interventions adopted more recently include vaccines [4] [5] [6] (though these are still far from ubiquitous) and technological improvements to face masks and indoor air filtration [7, 8] . This paper is motivated by an urgent practical question that arose at the start of the COVID-19 pandemic: given a disease with a high rate of non-symptomatic transmission, is contact tracing a useful intervention? One reason for focusing on contact tracing is that it is among the first lines of defence when faced with a new contagious disease; it requires little research overhead compared to pathogen-specific interventions such as vaccines. As such, this question has practical relevance well beyond the spe-cific context of COVID-19 [3] . Conventional wisdom dictates that traditional contact tracing, executed by teams of health officials who interview newly infected individuals, fails if non-symptomatic transmission is too frequent [3, 9] . Fortunately, we live in the smartphone era, and it has been noted that using these devices to record contacts can make the task of tracing them entirely solvable by automating it. This idea has been spelled out in a series of papers [10, 11] (see also [12] [13] [14] ) and is the basis for a large set of contacttracing apps and an Exposure Notifications System developed by Apple and Google [15] . It is reasonably clear that a perfect digital contact-tracing network will halt a nascent epidemic. The point of this paper is to quantify how far an imperfect contact-tracing network can hinder an epidemic with substantial non-symptomatic transmission. We contribute to this emerging field of infectious disease control along two axes. First, we present a simple model of the early stages of the spread of COVID-19, which allows us to obtain analytical estimates, as a function of a varying amount of non-symptomatic transmission, of the fraction of the population that needs to participate in a digital contact-sharing network in order to prevent new epidemics. While modeling with various differences from our own became available while we were working on this problem [10, 16, 17] , we feel that our approach has the virtue of making the existence and values of the estimated compliance thresholds transparent. Our estimates for the fraction of the population that needs to own a contacttracing app to avert a COVID-19 epidemic range from 75% − 95% for R 0 = 3, depending on the fraction of asymptomatic transmission, θ = 20% − 50%, that takes place. For smaller values of R 0 due to social distancing this fraction is lower. Our second contribution is to frame the overall dis-cussion in a language more familiar to physicists and students of complex phenomena more generally-that of phases, phase transitions and emergent properties. The bottom line here is the idea that the immunity of a population to epidemic growth is an emergent, or collective, property of the population. For traditional vaccination or epidemic-induced "herd immunity", this feature gets conflated with the fact that individuals can be immune to the disease at issue. But mass digital contact tracing now makes it possible for the population to be immune to epidemic growth even as no individual has immunity to the underlying disease. We propose to refer to this as the existence of a "digital herd immunity". This fits well into the general idea of an emergent property, which does not exist at the level of the microscopic constituents but exists for the collective [18, 19] . We would be remiss if we did not note that epidemiologists have previously referred to this state of affairs as "herd protection" and "sustained epidemic control" [10] . Our intention with the proposed terminology is both to frame a public health goal by including the word digital and to emphasize the emergent nature of a herd immunity. Our application of ideas from statistical physics to the theory of contact tracing yields three main dividends compared to earlier approaches. First, we are able to analytically quantify the effectiveness of contact tracing for any recursive depth n, from the limit of traditional, manual contact tracing, for which n = 1, to perfect digital tracing, for which n = ∞ in principle. Describing this crossover, which was beyond existing theoretical techniques [3, 20, 21] , allows us to sharply formulate an important, general principle of disease control: for any proportion of non-symptomatic transmission, tracing and isolation based on a sufficiently widespread contacttracing network, of sufficient recursive depth, can prevent epidemic spread. The second insight gained through our approach is a point of principle that has been neglected in the public discourse surrounding this issue, namely that once effective contact-tracing protocols are in place, nascent epidemics are extinguished with a probability near one [22] . Third, our study of the universal properties of the contact-tracing phase transition allows us to capture the full probability distribution for epidemic sizes as the critical threshold for epidemic control is approached. Our analysis suggests that the contact-tracing phase boundary is not even visible at the resolution of influential previous studies for COVID-19 [16] , that attempted to estimate the critical threshold for contact tracing by numerically simulating the epidemic-size distribution function. In the balance of this paper we do the following. We begin by summarizing the case for contact tracing as an effective strategy for combating COVID-19. We then introduce a simple branching-process model for the spread of this disease, that incorporates the key features of asymptomatic transmission, pre-symptomatic transmission and recursive contact tracing. We find that there is always a critical fraction 0 ≤ φ c < 1 of app own-ership, such that take-up of contact-tracing apps by a fraction φ > φ c of the population is sufficient to prevent epidemic spread. We provide an analytical formula for this threshold, which is verified against detailed numerical simulations, and characterize the universal features of the resulting "contact-tracing phase transition". In the final discussion, we turn to practical matters that arise in real-world implementations of digital contact tracing, such as the need for population-wide testing. A. Motivation and relevance to COVID-19 Traditional contact tracing is a multi-stage process. First, one identifies symptomatic, infected individuals. Next, one finds the people they came into close contact with during their infectious period. Finally, one treats or isolates these people before they can go on to infect others. Manual contact tracing becomes difficult for infections that have a period before the onset of symptoms when an exposed person is contagious (the Ω period). Further delay in finding the symptomatic person and their contacts could lead to tertiary infections, making it difficult to control an outbreak. For COVID-19, the incubation period is thought to be around 5-6 days, while the Ω period is estimated to be 1-3 days [2, 23]. The time before becoming contagious, or the latent period L, is around 4 days. Stochasticity of these times aside, it is reasonable to expect that if Ω < L on average, and if the exposed contacts of an individual can be traced before they become infectious, then an epidemic could be prevented. However, the delays typical for manual contact tracing, even just one or two days, can render contact tracing completely ineffective for COVID-19, given the typical L and Ω periods; this conclusion is supported by detailed numerical simulations [10, 16] . This is where digital contact tracing comes in. A smartphone application could enable instant isolation of an infected person and their network of contacts. This halts the transmission chain, because infected contacts cannot infect others during their latent period. The question immediately arises of how widespread such tracing needs to be in order to prevent an epidemic, and this question is the focus of our paper. Below we present a simple model that captures the essential features of disease spread necessary to tackle this problem. To place our work in context, the classic quantitative analyses of the efficacy of contact tracing [3, 9] , from before the smartphone era, showed that traditional, manual contact-tracing protocols become useless when the rate of non-symptomatic spreading, θ, is too high. By contrast, app-based approaches allow for "recursive" contact tracing, whereby contacts of contacts can be traced to an arbitrary recursive depth, at no additional cost. The effectiveness of recursive contact tracing has been studied in previous work; mathematically rigorous results exist in simple limits [21, 24] and detailed numerical simulations have been performed in analytically inaccessible regimes [20] . Some recent works have provided quantitative estimates for the effectiveness of non-recursive contact-tracing in the specific context of COVID-19 [10, 11, 16] . Our results should be viewed as complementary to these studies. One advantage of the model that we propose is its simplicity; this allows for a more thorough analytical understanding of the contacttracing phase transition than in previous works, for any recursive depth, 0 ≤ n ≤ ∞. Suppose we have an epidemic spreading through an infinite population of susceptibles, in discrete time, and infecting R t members of the population at each time step t = 0, 1, 2, . . . (here, R is an "effective reproduction number" that depends on the detailed properties of the epidemic spread, including the basic reproduction number R 0 ). This is a generic model for a spreading epidemic at short times. The total number of infections I tot scales as and if R < 1, the epidemic has been controlled. We want to understand which R best captures the effect of mobile-phone-based contact tracing. From a statistical physics perspective, R is the single relevant parameter controlling the epidemic spread, and drives a phase transition from an "epidemic phase" to an "immune phase" as R decreases below R = 1, which we shall elaborate on below. For the purposes of modelling epidemic spread, the key question is which "microscopic" degrees of freedom must be included to obtain a realistic estimate for R. To this end, we consider three parameters that implicitly determine R : the fraction of the population that will present asymptomatic cases (θ), the fraction of the population using a contact-tracing application (φ), and the basic reproduction number for an individual who eventually shows symptoms (R S ). R S is a combined measure of the number of pre-symptomatic infections and the efficacy of quarantine: in the limit of perfect isolation after showing symptoms, R S is precisely the number of people that a symptomatic individual infects during their Ω period, as defined above. We assume that R S is independent of whether a symptomatic individual is on the contact-tracing network or not. The effects of these parameters on the growth of the epidemic (or R) are studied using a simple branchingprocess model, where all infectious individuals are either symptomatic (S) or asymptomatic (A), and either on the app-based contact tracing network (C) or not (N). In an an uncontrolled setting, all types of infectious cases are FIG. 1. An illustrative realization of our branching-process model. a) An in-network asymptomatic individual (CA) infects R0 = 3 people, whose category of infection is chosen independently and uniformly subject to parameters θ, the fraction of the population that presents asymptomatic cases, and φ, the fraction of the population using a contact-tracing app. b) A CS infection triggers an alert on the contact network, but the CS individual and everybody else in their generation is still able to infect people before the contact network is triggered (while we depict RS = 2, we also simulate more realistic values RS = 0, 1). The arrows show the alerts sent to everyone connected to the CS individual by the contact network. c) Once the alert is sent out, everybody in the contact-network-connected component of the CS individual can be quarantined immediately (thick circles) without giving rise to further disease spread, since they are in the latent (non-contagious) periods of their infections. Meanwhile, every individual off the contact network continues to transmit the disease freely. The "recursive" aspect of such contact tracing corresponds to the arrow going back in time in Fig. (1b) ; the middle branch of infections would be missed by a traditional, non-recursive approach. assumed to proliferate with R 0 = 3, which is a reasonable estimate [25] for COVID-19 [26] . Suppose that the outbreak starts from a single infected individual at time t = 0 ("Patient Zero"). In our discrete time (generational) model, Patient Zero infects R 0 other people at time t = 1, and each new infection is assigned to one of the categories {CA, CS, N A, N S} randomly, with probabilities that are determined by the values of θ and φ. Individuals infected at the beginning of each generation are assumed not to infect anyone else after that generation has elapsed. Whenever a symptomatic individual on the contact network (CS) is encountered during this branching process, the contact network is triggered, and all people connected to the CS individual by the network, through either past or present infections, are placed in quarantine. As discussed earlier, since pre-symptomatic infections are common for COVID-19, our model includes the possibility that a CS individual infects R S people by the time they trigger the contact network. As a consequence, non-CS individuals in the same generation are also allowed to infect the next generation before the activation of the contact network (see Fig. 1c ). A few timesteps of the model are illustrated explicitly in Fig. 1 , together with the implementation of recursive contact tracing via removing connected components of the contact graph. Different combinations of the parameters θ, φ and R S lead to an effective reproduction number R distinct from the bare reproduction number R 0 , and we expect epidemic growth to be suppressed whenever R < 1. Numerical simulations of this model were performed on 10, 000 nodes with 100 initial infections, without replacement; the results are summarized in Fig. 2 . The location of the phase boundaries were verified to be independent of both doubling the system size and doubling the number of samples averaged per point shown on the phase diagram, to within the resolution of the phase diagram. Our numerics are consistent with the hypothesis that for any given fraction of asymptomatic transmission 0 ≤ θ < 1, and any presymptomatic reproduction number 0 ≤ R S ≤ R 0 , there is a critical point φ c (θ), corresponding to the onset of "digital herd immunity": epidemic control occurs for a fraction of app owners φ c (θ) < φ ≤ 1. For realistic COVID-19 parameter values, R 0 = 3, R S = 1 and θ = 0.2 − 0.5 [10, 25] , we find that φ c (θ) = 75%−95%, illustrating that when both presymptomatic and asymptomatic transmission are taken into account, the rate of app coverage necessary to prevent an epidemic can be rather high. Some practical implications of this point are raised in the final discussion. We now describe the sense in which our branchingprocess model exhibits a phase transition. Consider the disease dynamics seeded by a single initial infection, Patient Zero, at t = 0. As the disease spreads, there are two possibilities: either the epidemic seeded by Patient Zero terminates at some finite time, or it continues to spread indefinitely. In branching process theory [27] , this dichotomy is captured by the "probability of ultimate extinction", q, which is the probability that the epidemic seeded by Patient Zero terminates at some t < ∞. To make the connection with statistical physics, consider the quantity ρ = 1 − q, which is the probability that the epidemic seeded by Patient Zero spreads for all time. This defines an order parameter for the epidemicto-immune phase transition, in the following sense. If ρ > 0, an epidemic can spread with non-zero probability, and the population is in an "epidemic phase". If ρ = 0, epidemics are almost surely contained, and the population is in an "immune phase". In fact, ρ is precisely where the tuning parameters are θ, the rate of asymptomatic transition, and φ, the fraction of contact-tracing app ownership among the population. RS denotes the basic reproduction number for pre-symptomatic transmission. Each phase diagram was generated from 4000 microscopic simulations of 20 generations of disease evolution on 10, 000 nodes, of which 100 nodes were initially infected at random. A black square denotes epidemic control (average growth in the cumulative number of infections over the generations 16 − 20 is less than 0.25% of total population). This is grayscaled continuously to white for late-time growth exceeding 2.5% per generation or full epidemic spread before 20 generations have elapsed. Dashed red curves show exact results for 10-step contact tracing, while dashed blue curves show an easy-to-use approximation to the exact result. Both formulae are presented in Sec. II C. The exact critical point for θ = 0 and RS = 2, derived in Appendix A, is marked by a cyan arrow. Bottom: Sample simulations from the encircled region in the RS = 0 phase diagram. Curves (solid) denote cumulative number of infections as a percentage of total population, averaged over 10 samples (dashed), with θ = 0.5 fixed and φ varied from φ = 0.6 to φ = 0.9. the order parameter for a site percolation phase transition [28] on the infinite, rooted, Cayley tree with bulk coordination number z = 1 + R 0 . We argued above that the epidemic-to-immune phase transition is driven by a single relevant parameter, the effective reproduction number, R. Let us now make this statement precise. For simple epidemic models, the underlying branching process is Markovian, and we can define R to be the mean number of new infections generated by an infected node. It is then a rigorous result that ρ = 0 for R ≤ 1 and ρ > 0 otherwise [27] . For such models, the connection between epidemic spread and percolation transitions has been known for some time [29] [30] [31] [32] [33] . By contrast, for the model studied in this paper, the possibility of tracing successive contacts means that the disease dynamics is no longer Markovian; there are correlations between generations that preclude a simple definition of R, and earlier theoretical results do not apply. For the same reason, our model has no simple interpretation in terms of site or bond percolation once contact tracing is included (i.e. for φ > 0 and n ≥ 1). The main technical innovation in our work is surmounting this breakdown of the Markov property: we develop generating function methods that allow for exact summation of non-Markovian contact-tracing processes, to any desired order. We show that despite intergenerational correlations, the critical behaviour is determined by a function R n (φ, θ), which can be viewed as a "mean number of new infections", suitably averaged over time. In particular, R n (φ, θ) controls the ultimate fate of the epidemic, and the critical line for n-step contact tracing is given by an implicit equation (2) in φ and θ; details of the calculation and a full expression for R n (φ, θ) are presented in the Supplementary Material. Taking the limit as n → ∞ yields the exact critical line for contact tracing to arbitrary recursive depth; however, for any n ≥ 1, this does not seem to be expressible in closed form, except at its endpoints. Fortunately, the function R n is found to converge rapidly in its arguments with increasing n. Fig. 2 depicts results from Eq. (2) with ten-step contact tracing, n = 10, and shows excellent agreement with stochastic numerical simulations. We note that the values obtained for n = 10 are already well-converged, in the sense that they do not vary significantly for n > 10. The sensitivity of the efficacy of contact tracing to tracing depth is addressed in more detail in a companion paper [34] . We now discuss the universal properties of this phase transition. For concreteness, let us parameterize the critical line as (φ c (θ), θ). For fixed θ in the domain of φ c , a transition from an epidemic to an immune phase occurs as φ → φ c (θ) − . We call this transition the "contacttracing phase transition", because it is controlled by the population fraction on the contact-tracing network. To capture the universal properties of this transition, it is helpful to consider the random variable |C|, which is the size of the infected cluster seeded by Patient Zero in a single realization of the branching process. On the immune side of the transition, |C| is almost surely finite, and the risk of epidemics is captured by the mean cluster size, E φ,θ (|C|). On the epidemic side of the transition, the mean cluster size diverges, and the order parameter ρ φ,θ = P φ,θ (|C| = ∞) better quantifies the risk of epidemic spread. In the vicinity of the critical point φ = φ c (θ), both quantities are characterized by universal The non-locality of the branching-process dynamics increases with the contact-tracing depth, n. The figure depicts a growing cluster of asymptomatic infections that are all on the contact network. As the recursion depth n for contact tracing increases, a single symptomatic infection allows for isolation of increasingly large asymptomatic clusters, up to a size that grows exponentially in n. In the limit n → ∞, this results in a discontinuous phase transition at (φ, θ) = (1, 1); see Appendix C for details. critical exponents, In the Supplementary Material, we show that γ = β = 1 for any finite n, demonstrating that the contact-tracing phase transition lies in the universality class of mean-field site percolation [35] . The scaling theory of percolation transitions then implies a universal scaling form for the distribution of epidemic sizes, where f is a scaling function with exponentially decaying tails, and the cluster correlation length As the recursive tracing depth n → ∞, we find that such mean-field behaviour breaks down at the endpoint of the critical line (φ, θ) = (1, 1), which exhibits a discontinuous phase transition, reflecting the non-locality of the underlying branching process; see Fig. 3 . The emergence of a discontinuous percolation transition on the Bethe lattice is highly unusual, and suggests that the non-local character of the n = ∞ contact-tracing transition fundamentally distinguishes it from the percolationtype phase transitions that have arisen in related settings [29-33, 36, 37] . Although Eq. (2) is exact, the full expression for R n is a little cumbersome to rapidly adapt and use. We therefore present a simpler, approximate formula for R = R ∞ (φ, θ), based on linear interpolation and a "mean-field" assumption, whereby inter-generational correlations are neglected. The resulting approximation to R, derived in Appendix D, is given by In Fig. 2 , the critical threshold R = 1 predicted by Eq. (6) is compared to the exact result, Eq. (2), for 10-step contact tracing, and shows reasonably good quantitative agreement. We have introduced a simple branching-process model for early-stage epidemic spread, which both retains a degree of analytical and numerical tractability and is sufficiently expressive to model complicated features of COVID-19 spreading and control, for example presymptomatic transmission, as distinct from asymptomatic transmission, and recursive contact tracing. Using this model, we have obtained predictions for the app take-up fraction needed to provide digital herd immunity as a function of R 0 and the asymptomatic transmission frequency θ. Our conclusions are significant beyond the immediate context of COVID-19: this paper can be read as a theoretical demonstration that digital contact tracing is effective even when the rate of non-symptomatic transmission is high and not everybody is on the contact network. This was not at all clear, even as a matter of principle, at the start of the COVID-19 pandemic. Subsequent developments, in countries such as South Korea, Vietnam, Japan and Taiwan, have confirmed that digital contact tracing can be deployed successfully against COVID-19 [38] . India's digital contact tracing effort, which provided the initial stimulus for our study, was also substantial [39] , with over 100 million individual downloads of the mobile application. However, despite such successes, serious adoption of digital contact tracing during the present crisis has been rather limited. This is surprising, given its basic importance as a tool for epidemic prevention [3] . We now examine some of the reasons behind this neglect. One concern, which has severely hindered the widespread adoption of digital contact-tracing technology in Europe and the USA, is protecting the privacy of individuals on the contact tracing network. As a purely technical problem, this is easily surmountable but a lack of public faith in governmental handling of data has no simple remedy. The seriousness of this obstacle depends on the details of the social contract between the state and its population, which tends to be more rigid in countries where digital contact tracing has been successful [38] . A more tractable problem is that recursive digital contact tracing can rapidly lead to an excessive number of "alerts", in the sense of Fig. 1 , unless it is accompanied by a widespread testing program to minimize the number of false positives. Conversely, if widespread testing is already in place, then digital contact tracing becomes a highly effective tool for epidemic prevention. Unfortunately, widespread testing was not in place for many months after the start of the pandemic. Published estimates for the required testing rate [40, 41] further tended to underestimate the testing overhead due to recursive tracing of asymptomatic infections. A revised estimate, which correctly accounts for both tracing contacts of contacts and asymptomatic infections, is given in Appendix E. We end with some comments towards the future. In terms of statistical mechanics, it would be useful to generalize our computations to take real-world heterogeneity of R 0 into account [42] and to examine the course of epidemics on realistic graphs when φ < φ c (see Ref. [34] for an extension of the model introduced above to arbitrary contact networks). In terms of infectious disease control we believe digital control has a promising future-it is a non-trivial idea with enormous scalability and can be greatly strengthened by combining it with wearable diagnostics that track the health status of the wearer. Variants of COVID-19 will be with us for the forseeable future and influenza is already endemic. It seems realistic to aim at greatly reducing the annual incidence of both of these diseases by adding digital control to the existing toolkit. Finally, we believe that one should start to think of the herd immunity of a population as deriving from a combination of natural immunity, vaccination, use of personal protective equipment and digital immunity. We would like to thank the Principal Scientific Adviser to the Government of India, Professor K. VijayRaghavan, for interesting us in this question and for discussions of India's Aarogya Setu contact tracing app, Professor Bryan Grenfell for sharing his wisdom regarding epidemiology at an extremely hectic time and Dr. Shoibal Chakravarty for continuing discussions on all aspect of India's COVID-19 challenges. In this appendix, we derive the critical behaviour along the line θ = 0. Since all infections on this line are symptomatic, we denote R S ≡ R. As we demonstrate below, the exact critical point can be obtained in closed form, and is found to be which matches our numerical phase diagram to within the resolution of the plot; see Fig. 2 for an example with R = 2. Let us now summarize some basic facts about the critical behaviour of percolation-type transitions. Recall [35] that standard site percolation on the Bethe lattice exhibits five metric-independent critical exponents, which we may denote {α, β, γ, δ, ∆}. The three scaling relations imply that only two of these are independent. There are three more metric-dependent critical exponents, {ν, ρ, η}, which can be expressed in terms of the previous exponents using the hyperscaling relations in six dimensions. We will focus on the two independent exponents γ and β, since these are the easiest to calculate. They control the critical behaviour of the mean cluster size E p (|C|) and the percolation probability P p (|C| = ∞) (probability of formation of an infinite cluster) respectively. In terms of the site occupation probability p and its critical value p c , they are defined by Site percolation on the Bethe lattice with z ≥ 3 lies in the universality class of mean-field percolation, and exhibits critical exponents γ = β = 1 [35] . To compute the mean cluster size, we adapt a method introduced by Fisher and Essam for counting clusters of a given size in the Bethe lattice [43] . We define a probability generating function for the cluster size where |C| denotes the cluster size and P φ (|C| = s | initial node infected) denotes the probability of obtaining a cluster of size s from an initial infected node. Since susceptible individuals are on the network (C) with probability φ and off the network (N ) with probability (1 − φ), B(φ, x) can be expressed as Using Eq. (A4), the expression for the mean cluster size reads If a node of type C infects another node of type C, the latter cannot transmit infection further. However, a node of type N can infect other nodes freely. This implies the recurrence relations for the coefficients of each probability generating function. Differentiating at x = 1, and using the normalization constraint B C/N (φ, x = 1) = 1, we obtain Solving these linear equations, we find that Thus, using Eqs. (A10) and (A5), we obtain the mean cluster size: . Denoting the roots of the denominator by we may write Since 0 ≤ φ ≤ 1, it follows that the mean cluster size has a simple pole at φ = φ + , and assumes a physical, positive value only when φ > φ + . Thus, the exact critical point lies at φ c = φ + . (The unphysical, negative value in Eq. (A12) in the percolating regime φ < φ c reflects the divergence of the mean cluster size due to the infinite cluster. Meaningful results in the percolating regime can be recovered by conditioning on the event {|C| < ∞} [35] , but we will not pursue this here.) In the vicinity of the critical point φ c = φ + , we obtain from which the critical exponent γ = 1 can be read off. To derive β, we define the probabilities of infinite cluster formation from a source infection that is respectively on or off the contact network: We now derive recurrence relations to compute ρ C and ρ N . Since a C node can give rise to an infinite cluster only through infecting an N node, we can obtain an expression for ρ C in terms of ρ N as follows. Note that (1 − (1 − φ)ρ N ) is the probability that an infected N node does not lead to an infinite cluster, and hence (1 − (1 − φ)ρ N ) R is the probability that none of the R infected nodes lead to infinite clusters. Thus, the probability that at least one of the nodes infected by an initial C node leads to an infinite cluster is given by Similarly, noting that an N node can lead to an infinite cluster via either C or N nodes, the probability that at least one of the infected nodes leads to an infinite cluster is given by Eqs. (A15) and (A16) reduce to a single equation for ρ N : Expanding to second order in ρ N yields In terms of the roots defined in Eq. (A11), we have Hence we recover the result φ c = φ + . It further follows that and consequently that which suggests a critical exponent β = 1. To confirm this exponent, we define the probability of formation of an infinite cluster from a single infected site, ρ = φρ C + (1 − φ)ρ N . Since ρ N is small in the vicinity of φ c , we can linearize Eq. (A15), to obtain Rφ c + 1 (R + 2)φ c + 1 from which the critical exponent β = 1 is immediate. First, it is useful to define one probability generating function per type of initial node: where α ∈ {CA, CS, N A, N S}. The probability generating function for cluster sizes, given any type of infected initial node, is then We shall find the exact critical line for n-step contact tracing by determining when the mean size of an infected cluster, We first obtain exact recurrence relations for the generating functions B α by enumerating possibilities at a given node. When the initial infected node is off the contact network, i.e. of type N A or N S, we obtain since nodes off the network can infect any other type of node (cf. Eq. (A7)). When the initial infected node is of type CS, any infections in the next generation that are on the network will be detected (cf. Eq. (A6)), and we obtain where it is useful to define The analogous result for B CA is rather more involved. The essential difficulty is that for n-step contact tracing, the recurrence relation for B CA involves n generations beyond the initial node, rather than just one. This is because the possibility arises of multi-generational clusters of CA nodes, that escape detection until they infect a CS node at some later generation 1 < m ≤ n. (Put differently, the underlying branching process is not Markovian.) The simplest way to proceed is to study all the configurations of clusters originating from an initial infected node of type CA, and organize the sum according to the generation in which the CA cluster connected to the initial CA node is detected, schematically We first define a function and its composition to recursively "propagate" the generating function from one generation to the next, after the addition of a CA node: f (x, . . . f (x, y) . . .)) j times j ≥ 1 . (B7) The generating function for the processes without detection in generations j ≤ n reads {no detection, gens. j ≤ n} = f (n−1) (x, g (1) ), where g (1) is the generating function for all processes that do not lead to a CS node in one generation of disease spread. Similarly the generating function for all processes that end in detection in generation j, reads {first detection in gen. j} = (f (j) (x, g (2) ) − f (j) (x, g (3) )), where g (2) (resp. g (3) ) are generating functions for all processes that lead to (resp. do not lead to) creation of a CS node in generation j, and the subtraction ensures that only processes that give rise to at least one CS node in generation j are included. Using Eqs. (B6), (B8), and (B9), we find that It can be verified that B CA is correctly normalized, i.e. that B CA x=1 = 1. To compute the mean cluster size, we first compute derivatives of the generating functions B α . Noting that B α x=1 = 1 and using Eqs. (A7) and (B4), we obtain Eliminating all variables other than ∂ x B x=1 , we can write the derivative of B CA in the form and in terms of these coefficients {E n , F n }, the mean cluster size reads where is is useful to define It is clear that the mean cluster size diverges when It remains to compute F n , as defined in Eq. (B12). To this end, let us introduce functions of φ and θ, with arguments other than x suppressed. By the chain rule, these satisfy the recurrence relations where we defined a (i) j = f (j) (1, g (i) (1)). Since we are only concerned with the coefficient of ∂ x B x=1 in ∂ x B CA x=1 , let us write c for the terms b (i) j of interest. Combining the above expressions, we find where the b with b (1) and The exact critical line for n-step contact tracing is thus given by the implicit equation for φ and θ. We now outline a procedure to obtain the critical line using the percolation probability of an infected initial node. The probability of formation of an infinite cluster is given by where ρ α is the probability of formation of an infinite cluster starting from a node of type α. Since nodes off the network (N S and N A) can infect any type of node, we obtain the recurrence relations Further, since the CS node can only infect anyone outside the network, we obtain Similar to the case of the generating function in the previous section, obtaining the recurrence relation for ρ CA is more involved. We proceed by enumerating the minimum number of processes such that the expression for the formation of an infinite cluster ρ CA can be expressed in terms of the ρ α 's. For depth n contact tracing, this yields a term that schematically reads {first detection in gen. j}. To determine the critical line, it is sufficient to linearize the recurrence relations for {ρ α } similar to the calculation in App. A. We thus obtain where R N is defined in Eq. (B14). To count the different kind of processes that enter Eq. (B27), we divide all processes of j generations into three types, which we denote as follows. (B29) Note that processes in C (j) and E (j) lead to no detection in generation j and T (j) leads to a detection in generation j that activates the contact network. Examples of these processes for j = 2 are shown in Fig. 4 . To linear order in p, we find that the terms in ρ CA in Eq. (B27) can be expressed as where τ runs over all the processes in the sets described in Eq. (B29), α is summed over all types of nodes unless other wise stated, and we have defined n α (τ ), P τ , and ρ (D) CA as follows. n α (τ ) denotes the number of nodes of the type α on the edge of a process τ . P τ is the probability of having a process τ , i.e. the product of the individual probabilities {p α } of all the nodes in the process including the root CA node, and in Eq. (B30) we divide by a factor of p CA in order to determine the probabilities of the processes given that the root is of type CA. ρ (D) CA is the probability of formation of an infinite cluster due a CA that has been detected due to a CS node in the same generation, which reads ρ (D) where we have used Eq. (B28). To evaluate the terms in Eq. (B30), we define generating functions corresponding to each of the processes in Eq. (B29) as (B32) These generating functions can be enumerated recursively as In terms of these generating functions, Eq. (B30) reads Using Eqs. (B24), (B28), (B31), and (B35), we finally obtain an expression for ρ of the form using which we identify the critical line to be While we are not able to derive a more explicit expression for R n (φ, θ) using this approach, we have verified using Mathematica that it yields the same critical line of Eq. (B23) for several values of n. That is, we find that Following the derivation of critical exponents in App. A, it is clear that for any finite n, the critical exponents β and γ are controlled by the behaviour of the function near the critical line R n (θ, φ) = 1. Unfortunately, it does not seem possible to obtain R n (φ, θ) in closed form. However, since R n (φ, θ) is analytic for any finite n, the critical behaviour on the transition line is expected to be determined by the leading terms in its Taylor expansion, which are linear in φ for given θ and vice versa. Numerically, we find that this is indeed the case; see Fig. 5 . The numerical data suggests that γ = β = 1 everywhere except on the line φ = 1, whose critical behaviour is discussed below. When the tracing depth is infinite, there is a discontinuous transition as θ → 1 − , in the sense that the "order parameter", i.e. the probability ρ(θ) of formation of an infinite infected cluster, jumps discontinuously from 0 to 1 at θ c = 1. To see this, note that when φ = 1, the evolution of an asymptomatic cluster from an asymptomatic source is described by the following branching process where X A n denotes the total number of new asymptomatic infections in generation n, the indicator functions 1 X A n =R n 0 reflect the fact that a single symptomatic case will terminate the branching process, and The probability generating function of X A n then satisfies the recurrence relation where f (y) is the probability generating function for the descendants of a single asymptomatic node, Using these results, it can be shown by induction that The probability that the branching process is extinct in generation n is then The extinction probability is thus It follows that the critical point occurs at θ = θ c = 1, and that the probability of formation of an infinite cluster, which is usually regarded as the order parameter for percolation transitions, behaves like Such discontinuous behaviour of the order parameter indicates that the transition has a first-order character. For example, at θ = 1 all asymptomatic clusters are infinite and the critical exponent δ is not even defined. However, several other critical exponents are defined, a scenario reminiscent of one-dimensional site percolation, which also has a critical probability p c = 1 and a mixture of continuous and discontinuous behaviour as p → p − c . For any finite tracing depth n, the exact critical point along the line φ = 1 is found to lie at with mean-field critical exponents, γ = β = 1. While this can be obtained from the results of App. B, we provide an intuitive explanation here. When φ = 1, all nodes are on the contact network, and hence only two types need to be considered: symptomatic (S) and asymptomatic (A), which occur with probabilities (1 − θ) and θ respectively. Since any infinite cluster must consist entirely of A nodes, we can directly obtain the recurrence relation for the probability ρ of formation of an infinite cluster. For contact tracing with recursive depth n, the only event that does not rule out the existence of an infinite cluster is the formation of an n-generation tree in which every node is asymptomatic. This occurs with probability θ Nn , where where N n is the total number of nodes in such a tree excluding the root node. Given such a tree, infinite clusters can originate from any of the R n 0 asymptomatic leaves in generation n. This yields the following recurrence relation for ρ: (C11) Expanding Eq. (C11) to second order in ρ, it is straightforward to derive Eq. (C9) and that Thus, β = 1 for any finite n. A similar argument yields the mean cluster size which has the same critical point as the percolation probability, and yields a critical exponent γ = 1, since Appendix D: Mean-field-like estimate for the critical line In this section, we derive a simple, approximate formula for R = R ∞ (φ, θ), based on linear interpolation and a "mean-field" assumption, whereby inter-generational correlations are neglected. It is first helpful to label the possible types of infected individual by α ∈ {CA, CS, N A, N S}, and note that susceptible individuals of each type occur with the independent probabilities given in Table D1 : Now suppose that there are no correlations between generations. Then the effective reproduction number is simply an average over the possible types of node: R = α∈{CA,CS,N A,N S} p α R α . (D2) Here, the probabilities p α are given as in Table D1 , while R N A = R 0 , R N S = R S , and R CS = R S (1−φ), corresponding to the average number of live nodes generated by a symptomatic individual on the contact network. However, R CA is essentially undetermined in this approach, since discarding correlations in time also discards contact tracing, to which the effective value of R CA is highly sensitive. We will therefore treat R CA as a variational parameter, to be estimated self-consistently. In the "best" case, asymptomatic transmission within the network is completely suppressed, and R CA = (1 − φ)R 0 . In the "worst" case, asymptomatic transmission within the network is not suppressed at all, and R CA = R 0 . A simple way to proceed is to solve for the unique linear interpolation between these cases that passes through the known endpoint (φ, θ) = (1, 1) of the non-perturbative critical line (see Appendix C). The resulting approximation to R is given by Appendix E: Estimating the number of tests required for successful contact tracing The number of tests per symptomatic infection required to successfully implement the digital contact tracing scheme described in this paper depends on the basic reproduction number R 0 , the fraction of asymptomatic cases θ, the recursive depth of contact tracing n, the typical number of contacts per individual N and the number of new cases N C per day (this is the true number of cases, that includes undetected, asymptomatic infections). In this Appendix, we estimate the number of tests required to implement single-step contact tracing (n = 1); our calculation is easily generalized to larger recursive depths n > 1. First suppose that R 0 < 1/θ. Then isolating (1 − θ)N C symptomatic cases and tracing their contacts during their possible pre-symptomatic contagious period, for a total of N T = N (1 − θ)N C tests per day, leaves θN C asymptomatic cases per day that can spread the disease. In the next generation, these give rise to R 0 θN C new infections. The effective reproduction R is then given by R = new cases old cases = R 0 θN C N C = R 0 θ < 1. Thus epidemic spread is prevented within a handful of disease generations. Next suppose that R 0 > 1/θ. Again, we start by testing the symptomatic cases and the contacts they encountered during their pre-symptomatic contagious period, leading to N T = N (1 − θ)N C tests per day. During the second generation of infections, this cohort yields (1 − θ)R 0 θN C hitherto undetected symptomatic cases per day, all of which arise from asymptomatic sources. For these cases, there are R 0 distinct possibilities that need to be considered. The probability that a given asymptomatic source generates k symptomatic infections is given by and the average number of tests per symptomatic case is (R 0 + k + 1) × N/k, where k = 1, 2, . . . , R 0 , which can be understood as follows. For each symptomatic infection, their contacts from the pre-symptomatic period as well as the period when they were likely infected need to be tested, yielding a total of k × 2N tests. Once the asymptomatic source is found, we must additionally trace their N contacts, to find R 0 − k asymptomatic infections in the current generation, and finally test their N contacts to prevent future transmission. Thus the total number of contacts traced in the event Eq. (E2) is (2k + 1 + R 0 − k) × N = (R 0 + k + 1) × N . Putting this all together, the average number of contacts that need to be tested per symptomatic case is in terms of which the required daily testing rate is As an illustrative example, with R 0 = 3, θ = 1/2, N eff = 3.3N N C , and assuming around 10 significant contacts per person and around 50, 000 cases per day, N T ≈ 1.2 million tests per day. The effective R achievable in this manner is Notice that R < 1 for θ < θ c , where the critical value of asymptomatic transmission is given by For R 0 = 3, this is θ c ≈ 0.69. For higher rates of asymptomatic transmission, higher orders of recursive contact tracing are necessary to prevent an epidemic, with a concomitant increase in the number of daily tests. Factors that make an infectious disease outbreak controllable Safety and efficacy of the chadox1 ncov-19 vaccine (azd1222) against sars-cov-2: an interim analysis of four randomised controlled trials in brazil, south africa, and the uk Efficacy and safety of the mrna-1273 sars-cov-2 vaccine Safety and efficacy of the bnt162b2 mrna covid-19 vaccine An evidence review of face masks against covid-19 Review of ventilation strategies to reduce the risk of disease transmission in high occupancy buildings Contact tracing and disease control Quantifying SARS-CoV-2 transmission suggests epidemic control with digital contact tracing Proximity: a recipe to break the outbreak (2020) A high-resolution human contact network for infectious disease transmission Epimap: Towards quantifying contact networks for understanding epidemiology in developing countries Progress in Location-Based Services 2014 , Lecture Notes in Geoinformation and Cartography Exposure Notifications: Using technology to help public health authorities fight COVID-19 Feasibility of controlling COVID-19 outbreaks by isolation of cases and contacts, The Lancet Global Health Automated and partly automated contact tracing: a systematic review to inform the control of covid-19 More Is Different On emergence from the perspective of physical science The effectiveness of contact tracing in emerging epidemics Contact tracing in stochastic and deterministic epidemic models More precisely, the statistical physics of the epidemicto-immune phase transition implies that in the limit of an infinite population, epidemics are suppressed almost surely beyond a certain threshold population fraction φ > φc on the contact-tracing network and sufficiently large recursive depth n Serial interval of novel coronavirus (covid-19) infections, International journal of infectious diseases Exact and approximate formulas for contact tracing on random trees Early dynamics of transmission and control of COVID-19: a mathematical modelling study Given the greatly increased state of awareness of the disease at this point it seems reasonable to assume that such higher values are not relevant to disease dynamics today Probability : An Introduction Next, embed the branching processes model in this tree in the natural way, with Patient Zero at the root, and "occupy" nodes according to their probability of infection. The the probability of formation of an infinite cluster of infections, ρ, is precisely the probability that occupied sites percolate on the Cayley tree However, on the Cayley tree the bond and site percolation thresholds are equal; since the site percolation picture is slightly more natural from the branching process viewpoint On the critical behavior of the general epidemic process and dynamical percolation Field theoretic formulation of an epidemic process with immunisation Epidemic models and percolation Spread of epidemic disease on networks Second look at the spread of epidemics on networks Recursive contact tracing in reed-frost epidemic models Percolation Self-organized critical forestfire model Forest-fire model with immune trees Why many countries failed at covid contacttracing-but some got it right Why we need at least 500,000 tests per day to open the economy -and stay open Why We Must Test Millions a Day Superspreading and the effect of individual variation on disease emergence Some cluster size and percolation problems Here, we derive the exact critical line for digital herd immunity using two complementary approaches. In App. B 1, we obtain recurrence relations for the full probability generating function for the size of infected clusters, which allows us to identify when the mean size of an infected cluster diverges. In App. B 2, we study the processes involved in the cluster growth and determine when the percolation probability of the infected cluster approaches zero.Appendix C: Critical behaviour along the line φ = 1Here, we study the critical behaviour along the line φ = 1. In the limit of infinite tracing depth, we find that there is a discontinuous phase transition at θ c = 1. However, for any finite contact-tracing depth n, the critical point θ c < 1, and the transition shows the same critical exponents as mean-field percolation.