key: cord-0264308-grakaxc3
authors: Chu, A.; Huber, G.; McGeever, A.; Veytsman, B.; Yllanes, D.
title: A random-walk based epidemiological model
date: 2020-11-06
journal: nan
DOI: 10.1101/2020.11.04.20226308
sha: fa35304b125de5b278e1d58d44707329f8a06eb5
doc_id: 264308
cord_uid: grakaxc3

Random walkers on a two-dimensional square lattice are used to explore the spatio-temporal growth of an epidemic. We have found that a simple random-walk system generates nontrivial dynamics compared with traditional well-mixed models. Phase diagrams characterizing the long-term behaviors of the epidemics are calculated numerically. The phase boundary separating those sets of parameters leading to outbreaks dying out and those leading to indefinite growth is mapped out in detail. The functional dependence of the basic reproductive number R0 on the model's defining parameters reveals the role of spatial fluctuations and leads to a novel expression for R0. Special attention is given to simulations of inter-regional transmission of the contagion. The attack rate and the (growing) radius of gyration of the affected zones are used as measures of the severity of the outbreaks, in cases where R0 is not sufficiently prescriptive to chart the epidemic dynamics.

Classic epidemiological models 1, 2 often assume panmictic populations: every infected person has an equal chance to affect any other person in the population. While these models have been successful in describing an ideal dynamics of epidemic spread, they do not account for inhomogeneity: an infected person has higher probability to transmit the disease to a member of their household 3, 4 or to a person in their locale. The overall reduction of long-distance travel due to the COVID-19 pandemic makes the latter cause of inhomogeneity especially relevant: many infected persons (but not all) spread the infection only in their immediate vicinity. Likewise, the local neighborhoods of both susceptible and infected persons are dynamic -they vary in space and time depending on the local state of the epidemic. These spatio-temporal inhomogeneities are studied in this paper.

There are two approaches to geographical inhomogeneity: detailed analyses based on contact-tracing and mobility data, 5, 6 or analyses based on stylized models. 4, 7, 8 The first approach, while potentially highly accurate, requires a large number of parameters and is not robust with respect to unforeseen changes in mobility patterns. The second approach, in contrast, requires a small number of well-defined parameters and provides a sound intuition about their effects on the outcome. It is especially useful when planning interventions and policy changes, or investigating various "what if" scenarios. It is the latter approach that we have taken here.

In the present paper, we treat a very simple model: a random walker (modeled on an infected person) takes τ steps (a positive integer) on a 2D square lattice. After each step, the visited site can become infected with probability p. In this case, at the site of the new infection, it branches off another random walker, also with lifetime τ, that can also further spread the contagion to further sites, and so on. We study the resulting epidemic in space and time.

This model depends on two parameters: τ, representing the length of time of the infectious period, and p, representing the infectiousness. It also depends on the "walking pattern", which may include steps to the neighboring sites or jumps to more distant locations.

Panmictic models predict that the spread of epidemics depends only on the basic reproduction number R 0 , the number of infected persons produced by one walker in a fully susceptible population, which, in turn, can be estimated in the panmictic approximation as

We will refer to this estimate as the naïve approximation. However, the random-walk model exhibits a surprisingly rich behavior, well beyond the "naïve approximation" above. We will show that the spread of the epidemics depends on the interplay between p and τ. Moreover, the detailed geometry of the epidemics depends on the stochasticity of individual walks and the nature of random walks on the plane. In subsequent sections, we describe our model in full, study its behavior using numerical simulations and theory, and draw conclusions for real-world epidemics.

We model the geographic distribution of the population by an infinite 2D square lattice. At any given time, each site can be in one of three states. Although the model proposed here is fundamentally different than an SIR model, the same nomenclature is adopted; namely, each site in the lattice is either susceptible, infected, or removed. (Figure 1(b) ). All sites are initialized as susceptible, with the exception of the origin which contains a single infectious agent. An infected site generates a brand new random walker, which starts to move on the lattice (Figure 1(a) ). If the walker lands on a susceptible site (sites are represented as square cells in the figure), it becomes infected with probability p. After τ steps an infectious agent becomes removed. Thus, there are two fundamental parameters that control the outbreak: p, the probability of infection, and τ, the number of steps for which an infective agent (walker) is active before being removed.

Another feature of the model is the way the infectious agent chooses its next move. The model is quasilocal: most of the time the walker moves to one of the eight adjacent sites, but sometimes it makes a longer jump. Namely, if the walker is at r 0 = (x 0 , y 0 ), its next location, r, is given by

with u and θ drawn from uniform distributions:

Here X is the integer part of X (the largest integer number not exceeding X), and U(a, b) is the uniform distribution in the interval [a, b]. 1 This function produces a one-step walk to one of the eight adjacent sites approximately 66.8% of the time; 95.7% of jumps are within a distance of 2 sites. The random-variate generator for the jump distances r in eq. (2) is derived from a probability density function of the form φ (r) = A/r 4 , where the constant A is of order the lattice constant cubed a 3 (in this paper, a = 1 is chosen throughout). The exponent of −4 corresponds to interactions of the form ∼ 1/r d+σ when d = 2, σ = 2, a case of special interest in a number of early studies, 9, 10 because it sits at the margin between long-range and short-range interactions. However, in the context of epidemic models, as noted by Bunde et al. 11 , Grassberger 7 , and Hallatschek and Fisher 8 , it presents no special difficulties.

In the initial stage of the outbreak, the infected person (the index case) is completely surrounded by the susceptible population.

The key parameter describing this state is R 0 : the number of people infected by the index case. The value of R 0 depends on p and τ in some form, but the naïve approximation, R 0 ≈ pτ, must fail, as can be seen whenever the path of even a single walker self intersects; i.e., the approximation does not account for the recurrence of a two-dimensional random walker. We ran 100 000 realizations of the simulation, collecting how many agents were infected by the index case. Table 1 shows the R 0 calculated for p and τ tabulated on log-log axes, 2 an idea that will be explored further in the discussion of the phase diagram. The breakdown of the naïve approximation is evident in the lack of constancy of R 0 along the diagonals of the table, which is particularly apparent when pτ is large. 

where K and τ 0 are constants that depend on how the walker's next jump is selected. For the jump generator in equation (2) we have

As shown in Figure 2 , this formula is in excellent agreement with the numerical experiments without any adjustable parameters.

The progression of the outbreak beyond the initial stage is illustrated by Figure 3 . Depending on the parameters, the outbreak may die out or spread indefinitely, and it is a natural starting point to describe this dependence by a phase diagram. Phase diagrams are used in physics and related fields as a visualization of how pressure and temperature (or other thermodynamic variables) affect the bulk thermodynamic state of a substance. Here, phase diagrams are used in an analogous manner to provide insight into how p and τ affect the long-term behavior of an epidemic (i.e., how likely it is to die out after a certain amount of time has passed). In contrast to traditional phase diagrams, it is challenging to define discrete phases for the behavior of an epidemic, so the measures used in this section of our study are continuous variables with finite gradients between phases.

Each point in the phase diagram describes the expected behavior of an epidemic with the given p and τ inputs. The data from running 100 realizations of each of these outbreaks were used to produce a histogram of the growth rate (defined as the number of new infections) at the final time step, as shown in Figure 7 in the Appendix. The final time step is chosen by the following function:

This sliding criterion adjusts for outbreaks with longer τ, i.e., when infective agents have longer lifespans. Notably, the histogram exhibits a large spike at zero new infections, which represents the fact that a significant fraction of outbreaks will die out quickly, since, due to the intrinsic stochasticity of our model, not every infection reaches epidemic-level proportions. From this histogram, a cutoff of 10 new infections at the final time step is used as a proxy for determining whether an outbreak has

All rights reserved. No reuse allowed without permission.

(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted November 6, 2020. ; https://doi.org/10.1101/2020.11.04.20226308 doi: medRxiv preprint 

All rights reserved. No reuse allowed without permission.

(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted November 6, 2020. ; https://doi.org/10.1101/2020.11.04.20226308 doi: medRxiv preprint phase variable = number of realizations with growth rate below cutoff total number of realizations .

Conveniently, the phase variable is already normalized on the scale 0 to 1, and represents what fraction of epidemics, with a given p and τ, are expected to die out. The plot of this phase variable is shown in Figure 4 . The determination of the phase boundary allows us to quickly predict the behavior of the system in the regions above or below the boundary, even in the absence of extensive simulations. For each horizontal cross-section of the phase diagram (corresponding to a 1D phase diagram for a fixed value of τ), we have fitted the phase variable as a function of p to a smooth interpolating function in order to find the location of the phase boundary, defined as the point with p = 0.5. The resulting phase boundary is well approximated by a hyperbola in the p, τ plane. In particular, a least-squares fit gives the curve p boundary ≈ 2.0τ −1.1 boundary . It is customary in epidemiology 2 to predict the progression of epidemics based on the value of R 0 . Accordingly, on Figure 4 "iso-R 0 " lines were overlaid on the phase diagram: each point on an iso-R 0 line has p and τ values which produce outbreaks with a common R 0 . As expected, the phase boundary indeed coincides with an iso-R 0 line, having R 0 ≈ 1.39 (and not R 0 = 1, because of the role of fluctuations inherent in a particle-based, spatial model). As shown in the Methods section, this finding corresponds to a robust theoretical prediction.

In order to complement the mapping out of the phase diagram, the random-walk outbreak model was also used to simulate real-world infectious dynamics. In this section, the effects of having multiple regions with separate p values (representing adjacent states or provinces with different distancing or shelter-in-place policies) is explored. The spatial aspect of this scenario is simplified to two infinite half-planes with distinct p values.

Most notably, the random-walk model corroborates the hypothesis that it is possible for an infection to grow very slowly (constantly on the brink of completely dying out) until it eventually reaches a region with a higher value of p, upon which it exhibits rapid growth. This is demonstrated in Figure 5 . Moreover, there is also a significant backflow of infectors returning from the high-p region to the original low-p region.

In this section, we explore two alternative metrics to characterize the incidence and growth of an epidemic. First, the attack rate is generally defined as the proportion of exposed susceptibles that have been infected. Given the set of infected sites, I, the set of removed sites, R, and the set of visited sites (which includes infected and removed ones), V , we may define the attack rate in the context of our lattice-based model. Using |X| to represent the cardinality of set X, attack rate is calculated as

In our spatial model, we can also study the growth of the affected area. To this end, a simple and intuitive metric is the

All rights reserved. No reuse allowed without permission.

(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted November 6, 2020. ; https://doi.org/10.1101/2020.11.04.20226308 doi: medRxiv preprint 

The attack rate and radius of gyration for selected p and τ values along two cross-sections of the phase diagram near the phase boundary were plotted and are shown in Figure 6 . These two transects of the phase diagram were chosen to capture the effects of crossing the boundary from different positions. For each transect, 10 different p, τ pairs are sampled and then the attack rate and radius of gyration are calculated for each pair. These computations are averaged from 1000 realizations of the model for 800 time steps. For the fixed-τ interval, τ = 50 and p ranges from 0.01 to 0.10 (in increments of 0.01). For the fixed-p interval, p = 0.4 and τ ranges from 1 to 10 (in increments of 1).

Figures 6(b) and 6(d) show how the attack rate increases with p and τ. Notice that for very small values of τ, below the phase boundary, the attack rate can decrease with time. This is because we normalize by the number of visited sites, which is always one at start but increases with time. The attack rate of eq. (8), hence, is initially equal to p but can decrease for dying epidemics as more sites are visited but not infected.

The primary result from figure 6(c), which is shown in log-log scale, is the linear growth of the radius of gyration with time for developing epidemics. The curves for τ = 1, τ = 2, and τ = 3 are below the phase boundary, and, hence, they plateau, indicating that the infection has died out and the radius is no longer increasing. Figure 6 (e) shows largely the same results as figure 6(c). However, this plot has a more punctuated change after t = τ = 50, when several infections begin to plateau and die out. This behavior is explained by the low p values, which cause most outbreaks to stop growing after the lifetime of the initial walker expires at t = 50. This linear growth of R g translates into a quadratic, rather than exponential, growth for the cumulative incidence of the epidemic, as observed in recent works. [12] [13] [14] 

We see that a simple lattice model provides a number of interesting insights about epidemic spread. One of the most useful features of idealized models like this one is that they provide an understanding of which factors are salient for the outcome, and which are not. This understanding can be used in the construction of more detailed and realistic models, in the same way that a pencil sketch is useful when starting a full-size oil painting.

One of the most important questions when studying epidemics is whether the value of R 0 is sufficient to describe the epidemics behavior, or whether a detailed analysis of the contact network is necessary. Our model shows that the answer depends on what aspect of the epidemic is interrogated. The spatial features of the outbreak are not determined by R 0 alone, but depend on the the individual infectivity and the number of individual contacts separately. In contrast, the phase boundary between infinite spread and localization of the epidemic is mainly determined by R 0 , though, as a consequence of having a single index case, not by the canonical value R 0 = 1.

A significant idealization in our model is the motion of the infective agents: the random walkers can go anywhere on the lattice and do not have a memory of their origins. In real life, people tend to have permanent residences and return there daily. These periodic movements would work to make the outbreaks even more localized than those in our model. Introducing more realistic walking patterns is, therefore, a compelling direction for future work. Another interesting possibility would be to use the end state of one simulation, with a lattice including both removed and still susceptible sites, as the initial condition to simulate the effect of subsequent epidemic waves. Such a numerical experiment has a direct bearing on the question of thresholds to herd immunity in a fully spatial model, a topic that has received scant attention.

All rights reserved. No reuse allowed without permission.

(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted November 6, 2020. ; https://doi.org/10.1101/2020.11.04.20226308 doi: medRxiv preprint Figure 6 . Study of the radius of gyration of the set of infected sites and of the attack rate performed across two transects of the phase diagram. 6(a) Location of the two transects in the phase diagram that are examined in detail. 1000 outbreaks of 800 time steps are simulated for each p and τ combination and the attack rate and radius of gyration are calculated. Note that both of the studied transects cross the boundary between phases. The boundary is located at τ = 50, p = 0.0337 and τ = 5, p = 0.4. 6(b) The attack rate for the fixed-p interval. 6(c) The radius of gyration for the fixed-p interval. 6(d) The attack rate for the fixed-τ interval. 6(e) The radius of gyration for the fixed-τ interval.

All rights reserved. No reuse allowed without permission.

(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted November 6, 2020. ; https://doi.org/10.1101/2020.11.04.20226308 doi: medRxiv preprint

The phase boundary introduced above separates the cases where an outbreak dies out from those where it spreads indefinitely. Let P 0 be the probability that the outbreak dies out given one index case. We assume that the number of people infected by the index case has a Poisson distribution with mean R 0 . The meaning of R 0 is thus the same as in classical epidemiology: the number of people infected by the index case in a naïve population. 2 The probability that the number of secondary infections is zero, and that the outbreak dies out immediately, is exp(−R 0 ). The probability that the index case infects exactly n persons is exp(−R 0 )R n 0 /n! If the index case infected n persons, the epidemic dies out only if each secondary outbreak caused by each of these n persons dies out. The probability of such an event P 1 is, strictly speaking, different from P 0 since the secondary infections operate in a different environment than the index case. However, we neglect this and stipulate that P 1 ≈ P 0 . The probability that n secondary outbreaks die out is therefore P n 0 . Then we can write down

At 0 ≤ R 0 ≤ 1 the only solution for this equation is P = 1. This means that at R 0 ≤ 1 the epidemics is always localized. At R 0 > 1 equation (10) gets a solution between 0 and 1. If we draw the phase boundary at the point where exactly 50% of outbreaks survive, then the critical value of R 0 is R 0 = 1.39. This agrees extremely well with Figure 4 . If we have N index cases, the probability that the epidemic is localized is, to first approximation, P N 0 . At large N this probability quickly approaches zero, for R 0 > 1. It is important to note that the phase boundary, unlike the other features of the epidemic, depends only on R 0 .

Asymptotics for R 0 In this section, we derive a theoretical estimate for R 0 : the number of infections produced by one random walker in a susceptible population. We use ideas similar to those in [15] [16] [17] .

Let us consider a walker that starts at the point x. Let C(τ, m, x, w, y) be the probability for this walker to end up at the point y after making τ steps, while visiting the point w exactly m times. The probability that the walker does not infect the site w during this trip is (1 − p) m , since the probability to infect at each visit is p. Summing over w, y and m, we get the total number of infected sites as

where C (τ, p, x, w, y) = ∞ ∑ m=0

(1 − p) m C(τ, m, x, w, y).

This function is (up to a normalization) the generating function introduced in Rudnick and Gaspari [17, Section 3.2] . To calculate it, we introduce a transformation similar to the transition from the canonical ensemble to the grand canonical ensemble in statistical physics. Namely, let us take a complex variable z and write down the series:

The inverse transformation is

where the complex integral is taken around the point z = 0. As we shall see below, the value of this integral is defined by the singularity of H at z → 1. Let P(τ, x, y) be the probability for a random walker starting at the point x to end at the point y. We introduce two functions G(z, x, y) = ∞ ∑ τ=0 z τ P(τ, x, y),

A remarkable theorem [17, Section 3.3] states that function H can be expressed through the functions G and G 1 :

H(z, p, x, w, y) = G(z, x, y) − G(z, x, w)G(z, w, y) 1 + pG 1 (z, w, w) .

All rights reserved. No reuse allowed without permission.

(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted November 6, 2020. ;

Integration of G over y and w is easy due to the normalization. The only nontrivial contribution is from the function G 1 in the denominator of this expression. In two dimensions, this function can be expressed as [17, Section 3.5]

where χ(q) is the structure factor, i.e. the Fourier transform of the jump probability function. More precisely, let p(x) be the probability for the walker to go from the point 0 to the point y in one step. We then define the structure factor χ(q) as

At small q, we can expand the structure factor as

which gives near the singularity z → 1

Here q 0 ≈ 1 is the upper limit in the integral (17) (assuming the lattice constant to be 1). Using these asymptotics, and integrating equations (11), (14) , and (16), we get equation (4) with

Note that we use a different normalization from the one used in the Rudnick and Gaspari 17 : they count the number of random walks, whereas we count the probability. This means that the singularities of the functions H, G and G 1 are at z = 1 instead of z = z c , and that χ(0) = 1. The coefficients K and τ 0 depend only on the structure factor χ. To calculate these two, let us generate a large number N of jumps according to equations (2) and (3). Let us take q along the x axis. If jump number i lands at the point r i = (x i , y i ), then it contributes the term

to χ. Due to symmetry, the contribution of the second term is zero, and we arrive at

In our experiments, we generate N = 10 6 points, which gives c ≈ 0.458, and the resulting values of K and τ 0 in equation (5).

Code and data for figures and tables can be found at https://github.com/czbiohub/random-walk-epidemic-model.

Histogram of epidemic growth rate across several realizations of the outbreak for p = 0.5 and τ = 4. Note that 1000 independent runs are used here to provide additional detail for the reader, but the model uses 100 runs for computational efficiency.

All rights reserved. No reuse allowed without permission.

(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted November 6, 2020. ; https://doi.org/10.1101/2020.11.04.20226308 doi: medRxiv preprint

A primer on stochastic epidemic models: Formulation, numerical simulation, and analysis

The mathematics of infectious diseases

Transmission of SARS-COV-2 infections in households -Tennessee and Wisconsin

A minimal model for household effects in epidemics

Mobility network modeling explains higher SARS-CoV-2 infection rates among disadvantaged groups and informs reopening strategies

No reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity

Modelling the impact of testing, contact tracing and household quarantine on second waves of COVID-19

Two-dimensional SIR epidemics with long range infection

Acceleration of evolutionary spread by long-range dispersal

Critical exponents for long-range interactions

Percolation with long-range interactions

Universality classes for spreading phenomena: A new model with fixed static but continuously tunable kinetic exponents

Fractal kinetics of COVID-19 pandemic

The COVID-19 pandemic: growth patterns, power law scaling, and saturation

Differences in power law growth over time and indicators of COVID-19 pandemic progression worldwide

Number of distinct sites visited by N random walkers

Territory covered by n diffusing particles

Elements of the Random Walk: An Introduction for Advanced Students and Researchers

Competing interests: The authors declare no competing interests.