key: cord-0939920-aq0480jq authors: Levesque, Jérôme; Maybury, David W.; Shaw, R.H.A. David title: A model of COVID-19 propagation based on a gamma subordinated negative binomial branching process date: 2020-11-10 journal: J Theor Biol DOI: 10.1016/j.jtbi.2020.110536 sha: 6b41a70d25f4532f9be815ff119b2f7828c47d72 doc_id: 939920 cord_uid: aq0480jq We build a parsimonious Crump-Mode-Jagers continuous time branching process of COVID-19 propagation based on a negative binomial process subordinated by a gamma subordinator. By focusing on the stochastic nature of the process in small populations, our model provides decision making insight into mitigation strategies as an outbreak begins. Our model accommodates contact tracing and isolation, allowing for comparisons between different types of intervention. We emphasize a physical interpretation of the disease propagation throughout which affords analytical results for comparison to simulations. Our model provides a basis for decision makers to understand the likely trade-offs and consequences between alternative outbreak mitigation strategies particularly in office environments and confined work-spaces. Combining the asymptotic limit of our model with Bayesian hierarchical techniques, we provide US county level inferences for the reproduction number from cumulative case count data over July and August of this year. As of June 20, 2020, there have been more than 8 million confirmed global cases of COVID-19, a respiratory illness caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Early indications suggest a case/infection fatality rate of between 0.5% to 2% [1, 2, 3, 4] with poor prognosis strongly dependent on comorbidity factors such as advanced age, diabetes, and other poor health conditions [5] . The Centers for Disease Control and Prevention in the United States gives an overall current symptomatic case fatality ratio of 0.4% [6] while studies involving seroprevalence indicate a median infection fatality rate of 0.25% [7] . Canada has seen over 100,000 cases and the entire world has engaged in costly outbreak mitigation strategies to prevent excess deaths. Governments around the world have focused on controlling COVID-19 outbreaks primarily by reducing direct human-to-human contact through varying degrees of society-wide lockdowns and strong social distancing measures. By limiting the opportunity for infectious contacts, the hope is that the infection rate will remain low enough to prevent medical support systems from becoming overwhelmed while also reducing the effective reproduction number of the disease. Evidence suggests that government policies are having a positive effect [8] , but some strategies may also become prohibitively expensive in the not too distant future. An alternative outbreak controlling strategy to lock-downs is contact tracing with isolation. In this strategy, health authorities trace the human-to-human contacts of an infected person and isolate those contacts who are at risk of having become infected. If the probability of isolating potentially infected contacts is high and the time to isolation is sufficiently short, contact tracing with isolation may offer better cost benefit performance relative to lock-downs in keeping society safe [9] . Modelling the spread of infectious diseases falls into two broad classes [10] : deterministic modelling, which captures the thermodynamic limit and large scale behaviour of the underlying epidemiological phenomenon, and stochastic modelling, which describes the microscopic statistical nature of the generative process. Traditional compartmental models (e.g., SIRD), usually expressed as a set of coupled ordinary differential equations, fall into the first class while branching processes, in which each infected individual randomly generates "offspring", belong to the second class. In this paper, we focus on a stochastic formulation of COVID-19 following Hellewell et al. [11] . In the early stages of an epidemic, in a well-mixed and homogeneous population with an infected population that is small relative to the susceptible population, branching models well approximate the dynamics of propagation [12] . A formal mathematical treatment of this statement can be found in [13] , and [14] , with extensions to explosive growth models found in [15] . Branching model approximations for epidemic application appear at least as early as the mid-1950s [16, 17] , and continue to find use in public health domains (see for example, [18, 19, 20] ). Branching processes have also found application in modelling past epidemics such as Ebola Virus Disease outbreaks in West Africa [21, 22] . For an introduction to branching processes, see [23] , chapter 11, and for a wider introduction to the application of branching processes in other biological fields see [24] . In the current COVID-19 pandemic, branching models have found use in estimating the probability that an imported case into a susceptible population will lead to sustained transmissions [25] , and in developing the efficacy of contact tracing strategies [11] . Given the high variance of the observed case count data with COVID-19, especially during the start of a local outbreak, branching process provide and excellent framework for capturing the stochastic nature of super-spreading events [26] . For understanding the challenges in predicting regional trajectories of COVID-19, and for generating relevant policy insights, [27] emphasize the importance of parsimonious models. The authors also stress the usefulness of branching processes, including self-exciting point processes, when analyzing individual COVID-19 count data with parsimonious constructions. In [11] , the authors develop a branching process to model propagation in the presence of contact tracing with isolation. The model uses a negative binomial distribution to generate secondary cases produced by an infected individual with new infections assigned a time of infection through draws from a serial interval distribution. By truncating the serial interval distribution through isolation events, the authors show that in most of their scenarios contact tracing and case isolation is enough to control a new outbreak of COVID-19 within 3 months. While the construction in [11] provides a rich base for numerical simulations, to gain further insight, we extend the model to a fully continuous time setting which provides us with a complete generative model, including expressions for the generating and characteristic functions. Furthermore, each part of our model has a direct physical interpretation of the underlying disease propagation mechanism. The model balances fidelity and parsimony so that the model can 1) be calibrated to data relatively easily, 2) provide semi-analytic tractability that allows for trade-off analysis between different mitigation strategies 3) generate realistic simulated sample paths for comparing interventions. Our code is available as R packages 1 2 . In this paper, we build a Crump-Mode-Jagers (CMJ) branching process model through a subordinated Lévy process. CMJ constructions contain the triple of random processes (λ x , ξ x (·), χ x (·)) defined by, • λ x is a random variable that denotes an infected person's communicable period; • ξ(t) = #{k : σ(ω, k) ≤ t} counts the number of infected people over event space Ω(ω) in time t. ξ x (t − σ x ) denotes the random number of infected people created by an infected person at every moment of her communicable period over the interval [σ x , t); • χ x (t − σ x ) is a random characteristic of the infected person within the interval [σ x , t); } is the number of infectious existing at moment t). Our model begins by generating infections from an infected individual through a compound Poisson process where the event times represent transmission events (σ x ). We assume that an individual is infectious from the moment she becomes infected. The number of new infections at each transmission event is a draw from the logarithmic distribution [28] (ξ(t)) and consequently, the resulting generative process is the negative binomial process (see, for example, Quenouille [29] ). The stochastic counting processes remains "on" during the communicable period and then shuts "off" at the end-that is, the communicable period is the random lifetime (λ x ) in the CMJ language. We model the communicable period as a gamma distributed random variable, Γ(a, b), with meant = a/b. By subordinating our resulting negative binomial process with a gamma process for the communicable period, we arrive at our model of COVID-19 propagation-a gamma negative binomial branching process (GNBBP). (For details on subordinated Lévy processes, see [30] .) The model we present assumes a well-mixed population in that each child in the branching processes has the same statistical properties as the parent. While our model includes the property of super-spreader events, generated by the fat tail of the counting process, it does not include heterogeneous mixing scenarios whereby susceptible contacts for an infected individual diminishes each time a new person becomes infected. Furthermore, the branching model assumes an infinite reservoir of susceptible people for the virus to spread (the process branches without regard to population constraints). This approximation holds provided that the susceptible population is not near exhaustion. We focus on the early days of an outbreak with infected population sizes small compared to the total susceptible population over time scales not hierarchically larger than the communication window. It is this limit that is most relevant to decision makers for setting workplace policies. We model the propagation of COVID-19 by assuming that people become infectious immediately after contracting the virus and that they can infect others throughout the duration of their communicable period. We assume the population is homogeneous and that each new infected individual has the same statistical properties as previously infected people. Specifically, we assume that an infected person infects Q(t) other people during the given time interval [0, t] according to a compound Poisson process, where the number of infectious events, N (t), follow a Poisson counting process with arrival rate λ, and Y i , the number infected at each event, follows the logarithmic distribution, The characteristic function for Q(t) reads, with λ = −r ln(1 − p); thus Q(t) follows a negative binomial process, In this process, during a communicable period, t, an infected individual infects Q(t) people based on a draw from the negative binomial with mean rtp/(1 − p). The infection events occur continuously in time according to the Poisson arrivals. However, the communicable period, t, is in actuality a random variable, T , which we model as a gamma process 3 with density, which has a mean ofT = at/b. By promoting the communicable period to a random variable, the negative binomial process changes into a Lévy process with characteristic function, where η(u), the Lévy symbol, and ψ(s), the Laplace exponent, are respectively given by, with, Z(t) is the random number of people infected by a single infected individual over her random communicable period. Without loss of generality, we absorb t into a (or alternatively set t = 1, representing a single lifetime) giving a mean communicable periodT = a/b. The gamma process smears out the end of the communicable period. We see that R 0 = E[Z(1)] = arp/(b(1−p)), and thus our process has the same mean as the negative binomial process with a fixed stopping time of t = a/b. In fact, since λ = −r ln(1 − p) we have the simple relationship, = Mean number of infectious events in a lifetime × (12) Mean number infected at each event. The variance of the our counting process is over-dispersed relative to the the negative binomial, The model has four parameters: 3 We apply a gamma process subordinator to the negative binomial process. • p sets the number of infected people per infectious interaction. The mean number of infected people per infectious event is, gives the arrival rate of infectious events. • a, b together set the mean communicable period,t = a/b, and determine the variance along with the skewness and kurtosis of the gamma distribution, Γ(a, b). In the limit b → 0 with a/b finite, the gamma distribution becomes a delta function at the mean time and we recover the negative binomial process evaluated at t = a/b. The characteristic function eq.(6) of the stopped stochastic process allows us to explore the model's analytical properties, which can help decision makers better understand trade-offs in small environments. Given a random characteristic χ(t), such as the number of infectious individuals at time t, (e.g., where λ x is the random communicable period) the expectation of the process follows, Defining the Malthusian parameter, α > 0, if it exists, by, we can change eq.(16) into a renewal equation, which has the solution, Thus the asymptotic behaviour of the solution is governed by the pair parameters α, and β. Recall that ξ(t) = #{k : σ(ω, k) ≤ t} counts the number of infections during the observation window [0, t] over event space Ω(ω). In our model we have, where λ and µ are respectively the Poisson arrival rate and the mean of logarithmic distribution, and where, Therefore, which leads to the expected result for the mean of direct infections per individual, Using eq.(22) and eq.(17) we find that, which we can solve for the Malthusian parameter, α, by Newton-Raphson. The asymptotic solution to eq. (16) given that the Malthusian parameter exists is, If R 0 < 1 the branching process will not experience asymptotic exponential growth, instead we can solve eq.(16) for its long term limit, In the CMJ framework, we have the generating function, with the number of infected by time t composed of infections at time σ i , The probability of extinction reads, and for our model, we arrive at the transcendental relationship, Again, we can apply Newton-Raphson and solve for the extinction probability Q. In addition to the extinction probability for our branching process, we can estimate the average number of total infected people at extinction if extinction occurs by considering the theory random graphs. The branching process is a directed bipartite graph (it is a tree) and given the generating function for the process, we know the distribution of the outgoing edges from a randomly chosen vertex. In [31] , the authors extend Erdos-Renyi constructions of random graphs to graphs with arbitrary vertex degree. They compute the mean component size for graphs, including graphs excluding the giant component, if it exists. The total number infected corresponds to the random characteristic E(χ(t)) = 1 and thus eq.(27) has the non-Malthusian growth solution, In [31] , the authors consider two generating functions, • G 0 (s): the generating function for the probability distribution of the vertex's degree; and • G 1 (s) = G ′ 0 (s)/G ′ 0 (1): the generating function for the probability distribution of the outgoing edges from a randomly chosen vertex. Eq.(6) with e iu → s is G 1 (s) in the notation of [31] and for our purposes we do not need an explicit formula for G 0 (s). The average component size of the graph, in the absence of a giant component, is [31] which matches the renewal equation solution eq.(33) if G ′ 0 (1) = G ′ 1 (1)-that is, the generating two functions intersect tangentially at s = 1. At G ′ 1 (1) = 1 a phase transition occurs and the giant component emerges. The fraction of the graph occupied by the giant component is, where Q is the extinction probability for the distribution of outgoing edges, Q = G 1 (Q). Since the fraction of the graph that does not belong to the giant component is composed precisely of those graphs which have gone extinct in our process, we impose G 0 (Q) = Q. Thus, we demand that the two generating functions intersect on the 45 • line at Q < 1 when α > 0. The average component size in this case becomes [31] , where, As Q → 1 we see that G 1 (Q) and G 0 (Q) increasingly intersect tangentially, finally becoming tangent at Q = 1, which is consistent with our observation in eq.(34). We takex to be the average size of the total infected population at extinction, if extinction occurs. The process in eq.(6) represents the spread of the disease from an infected individual without any mitigation strategies. Imagine that we can trace, contact, and isolate infected individuals with a success probability q and with an mean time to isolation ofm