key: cord-0152299-ak2gc3nz
authors: Clum, Charles; Mixon, Dustin G.
title: Parameter estimation in the SIR model from early infections
date: 2020-08-10
journal: nan
DOI: nan
sha: 6a01b1983cead2f2b390b38f2eaf0c3cb4c56c59
doc_id: 152299
cord_uid: ak2gc3nz

A standard model for epidemics is the SIR model on a graph. We introduce a simple algorithm that uses the early infection times from a sample path of the SIR model to estimate the parameters this model, and we provide a performance guarantee in the setting of locally tree-like graphs.

During an epidemic, government leaders are expected to help maintain public health while simultaneously preventing an economic meltdown. In the absence of a vaccine, decision makers must choose between various non-pharmaceutical interventions. This decision requires an informative forecast of the epidemic at a very early time. To obtain such a forecast, it is helpful to have a parametrized model for epidemics. What follows is a particularly popular compartmental model that originates from the classic work of Kermack and McKendrick [4] .

Definition 1 (SIR model). Fix a simple, connected graph G and parameters λ, µ ≥ 0. Consider a continuous-time Markov chain in which the state is a partition (S, I, R) of V (G). For the initial state, draw v ∼ Unif(V (G)) and put

In fact, (s(t), i(t), r(t)) is also a continuous-time Markov chain in this case. The initial conditions are s(0) = n − 1, i(0) = 1, r(0) = 0, and the process transitions (s, i, r) → (s − 1, i + 1, r)

with rate λis and (s, i, r) → (s, i − 1, r + 1)

with rate µi. Assuming n is large and λn =: β, then putting σ := s/n, ι := i/n, and ρ := r/n, we may pass to the mean-field approximation:

This approximation is popular because it is much easier to interact with. The approximation is good once the number of infected vertices becomes a fraction of n, and the approximation is better when this fraction is larger [5] . This suggests an initial condition of the form σ(t 0 ) = 1 − δ − γ, ι(t 0 ) = δ, ρ(t 0 ) = γ for some small t 0 , δ, γ > 0. For simplicity, we translate time so that t 0 = 0. We argue there is no hope of determining (β, µ) from data of the form {ι(t) + ρ(t)} t∈[0, ] for small > 0. (While the following argument is not rigorous, it conveys the main idea.) Notice that for t ∈ [0, ], it holds that σ(t) ≈ 1, and so ι(t) ≈ δe (β−µ)t and

Then our data takes the form

δ · e (β−µ)t =: a + be ct .

We can expect to determine a, b, and c by curve fitting. However, we don't know δ or γ, but rather their sum. As such, we claim that (a, b, c) only determines β − µ. Indeed, for every choice of (β, µ) such that β − µ = c, it could be the case that

which would then be consistent with the data (a, b, c). Of course, additional information about (β, µ) could conceivably be extracted from higher-order terms, since a + be ct is merely an approximation of ι(t) + ρ(t). However, we expect any such signal to be dwarfed by noise in the data. While the short-term behavior of ι is exponential with rate β − µ, the long-term behavior is instead governed by the quotient R 0 := β/µ, known as the basic reproductive number. This can be seen by dilating time by substituting s = µt. In this variable, the mean-field approximation instead takes the form

That is, R 0 (together with initial conditions) determines ι modulo time dilation, and notably, whether the curve ever exceeds the capacity of the medical care system. However, R 0 = β/µ cannot be determined from β − µ.

Overall, the complete graph is not amenable to determining (λ, µ) from {U (t)} t∈[0, ] . However, real-world social networks are far from complete. Like social networks, expander graphs have low degree, but considering their spectral properties, one might presume that they are just as opaque as the complete graph. Surprisingly, this is not correct! In this paper, we show how certain graphs (including certain expander graphs) are provably amenable to determining (λ, µ) from {U (t)} t∈[0, ] .

In the following section, we introduce our approach. Specifically, we isolate infections that pass across bridges in a local subgraph of the social network, and then we estimate λ and µ from these infection statistics. Section 3 gives the proof of our main result: that our approach provides decent estimates of λ and µ in the setting of locally tree-like graphs. Our proof makes use of the vast literature on SIR dynamics on infinite trees. We conclude in Section 4 with a discussion of opportunities for future work.

We start with the simple example in which G = K 2 . According to the SIR process, one of the two vertices is infected, and then it either infects the other vertex or it recovers before doing so. Let Z denote the random amount of time it takes for the second vertex to become infected. Notice that Z = ∞ with probability µ λ+µ . On the other hand, if we condition on the event Z < ∞, then the distribution of Z is exponential with rate λ + µ. (This is a consequence of the fact that the minimum and minimizer of independent exponential random variables are independent.) Notice that if we could estimate

then we could recover (λ, µ), as desired. For example, if we had access to multiple independent draws of the SIR model on K 2 , then we could obtain such estimates. For certain types of graphs, we can actually simulate this setup, and this is the main idea of our approach. In practice, we will not have the time to determine whether Z < ∞, and so we instead truncate Z ← min{Z, τ } for some threshold τ > 0. In particular, we write Z ∼ CI(λ, µ, τ ) to denote a random variable with distribution

We seek to estimate λ and µ given τ and estimates of the following quantities:

First, we show that good estimates of p and q yield good estimates of λ and µ:

Lemma 4. Suppose m := (λ + µ)τ ≥ 2, and take P and Q such that

for some > 0. Then

Proof. First, observe that

Since m ≥ 2, we have (m + 1)e −m ≤ 1/2, and so Thus,

and similarly for (1 − P )/Q.

Next, we produce estimates P and Q given independent realizations of Z:

Proof. For convenience, we put k := |A| and identify A = [k]. We have EL i = p, Var L i = p(1 − p), and L i ∈ [0, 1] almost surely. As such, we may apply Bernstein's inequality for bounded random variables (see Theorem 2.8.4 in [7] ):

Put δ = 1 − e − . Then both of the following hold with probability 1 − 2e −c kp(1−p) :

Note that this implies

Next, we estimate Q. Conditioned on A , the random variables {Z a } a∈A are all distributed like a τ -truncated version Y of a random variable X ∼ Exp(λ + µ), and there exist absolute constants C 1 , C 2 > 0 such that

As such, we may apply Bernstein's inequality for subexponential random variables (see Theorem 2.8.1 in [7] ):

The result follows from the union bound.

If we had access to the infected vertices at time t 0 , we could use the formulas in Lemma 5 to obtain estimators of the SIR parameters that provide a good approximation to the true parameters: Lemma 6. Consider the SIR model on a graph G with parameters λ and µ. Select r, t 0 , t 1 > 0 and put τ := t 1 − t 0 . Let B(r) denote the subgraph of G induced by vertices of distance at most r from U (0). The set of bridges in B(r) with one vertex in I(t 0 ) and another vertex in

For any fixed k, > 0, define the events

Proof. Let S denote the vertices in G of distance exactly r from U (0). Notice that for every vertex v ∈ V (B(r)) \ S, the edges incident to v in B(r) are precisely the edges incident to v in G. As such, the SIR processes on G and on B(r) are identical until the stopping time

Let B(r) denote the subgraph of G induced by vertices of distance at most r from the minimizer of T .

For each quantity T (v), Z a , A I , A I , B a , b(a), P, Q,λ I ,μ I defined in the statement of the lemma, there is a corresponding quantity defined by replacing the SIR process on G with the SIR process on B(r), and we denote these variables byT (v),Z a ,Ã,Ã ,B a ,b(a),P ,Q,λ,μ. Each of these variables equals its counterpart over the event

It remains to bound P(F|E 2 ). Conditioned onÃ and {b(a)} a∈Ã , then for each a ∈Ã, the Markov property implies thatZ a has distribution CI(λ, µ, τ ). Also, the vertices {b(a)} a∈Ã are pairwise distinct almost surely. Indeed, ifb(a) =b(a ), then if we delete the edge {a,b(a)}, we can still traverse a walk from a to U (0) to a tob(a ) =b(a), implying {a,b(a)} was not a bridge. As such, conditioned onÃ and {b(a)} a∈Ã , the variables {Z a } a∈Ã are jointly independent. Then Lemmas 4 and 5 together imply

Of course, in our setup, we do not have access to I(t 0 ), but rather U (t 0 ) = I(t 0 ) ∪ R(t 0 ), and so we cannot apply Lemma 6 directly. Instead, we assume that λ is sufficiently large compared to µ that U (t 0 ) is a decent approximation of I(t 0 ). This approach is summarized in Algorithm 1. As we will see, the approximation I(t 0 ) ≈ U (t 0 ) introduces some error in our estimators. To analyze the performance of Algorithm 1, it is convenient to focus on a certain (large) family of graphs. We say a graph G is (r, η)-locally tree-like if for a fraction 1 − η of the vertices v ∈ V (G), it holds that the subgraph induced by the vertices of distance at most r from v is a tree. For example, it is known that for every fixed choice of d ∈ N with d > 1 and c ∈ (0, 1 4 ), there exists γ > 0 such that a random d-regular graph on n vertices is (c log d−1 n, n −γ )-locally tree-like with probability approaching 1 as n → ∞; see Proposition 4.1 in [2] . By focusing on this class of graphs, we may apply the vast literature on SIR dynamics on infinite trees to help analyze the performance of Algorithm 1.

Theorem 7 (Main Result). Fix d ∈ N with d ≥ 3 and c, γ > 0, and consider any sequence of d-regular, (c log d−1 n, n −γ )-locally tree-like graphs on n vertices with n → ∞. Select any α and β such that 0 < α < β < c 2e(d−1)λ , and put

For each n, let E 0 denote the event that the subgraph B(r) induced by vertices of distance at most r from U (0) is a tree, and let E ∞ denote the event that U (∞) contains a vertex of distance greater than r from U (0). Suppose λ ≥ 6µ > 0. Then one of the following holds:

, meaning there is a subsequence of n for which, with probability at least 1 − o(1), no vertex outside of B(r) will ever be infected, in which case the infection does not spread to even a constant fraction of the graph.

(b) For each n, the following holds:

with probability tending to 1 as n → ∞.

. The proof is given in the next section, and Figure 1 illustrates the actual behavior of Algorithm 1 for comparison. The factors of 8 in (1) are due to the approximation I(t 0 ) ≈ U (t 0 ), and they have not been optimized. We suspect that these factors can be replaced by terms that approach 1 as d → ∞, but this requires a different technique. We also suspect the hypothesis λ ≥ 6µ is an artifact of our proof. As the following lemma indicates, the threshold (d − 2)λ > µ would be more natural; this threshold arises from standard results on Galton-Watson processes (see the lecture notes [1] , for example).

Lemma 8. Consider any sequence {G n } of d-regular graphs on n vertices with n → ∞ such that G n is (r n , η n )-locally tree-like with r n → ∞ and η n → 0. For each n, consider the SIR model on G n with parameters λ and µ, and let E ∞ denote the event that U (∞) contains a vertex of distance greater than r from U (0).

Proof. Restrict to the event E 0 , and put {v} = U (0). Deleting v from B(r n ) produces d connected components, each of which can be viewed as a subgraph of the infinite (d − 1)-ary tree T that is rooted by the corresponding neighbor w of v. The SIR evolution on T determines a Galton-Watson process that gives the eventual number Z m of unsusceptible vertices at distance m ≥ 1 from w:

where X m,k denotes the number of vertices infected by the kth infected vertex in T that has distance m from w. The random variables {X n,k } n,k≥1 are independent with distribution matching a random variable that we denote X. We can describe the distribution of X as follows. Draw random variables T ∼ Exp(µ) and C 1 , . . . , C d−1 ∼ Exp(λ). If N denotes the number of k ∈ [d − 1] for which C k < T , then N is distributed like X. Conditioned on T , this number is a binomial with parameters d − 1 and 1 − e −λT . Hence,

Since in addition, it holds that P{X > 1} > 0, Theorem 1.7 in [1] gives that ∞ m=1 Z m is finite almost surely. Put M := sup{m : Z m > 0}. Then M < ∞ almost surely. In particular, a union bound over the d different neighbors of v gives

where the last step uses the fact that the survivor function of M vanishes at infinity.

Restrict to the event E 0 , and put {v} = U (0). As before, we identify a Galton-Watson process to analyze, but this one is slightly different: Delete one of the neighbors of v from B(r n ) and identify the connected component C containing v with a subgraph of a (d − 1)-ary tree T with root v. The SIR evolution on T determines a Galton-Watson process that gives the eventual number Z m of unsusceptible vertices at distance m ≥ 1 from v:

where X m,k denotes the number of vertices infected by the kth infected vertex in T that has distance m from v. The random variables {X m,k } m,k≥1 are independent with distribution matching a random variable that we denote X. We see that P(E c ∞ |E 0 ) is at most the extinction probability q of this process. By Theorem 1. 

it follows that q ≤ p 0 1−p 0 −p 1 . Before estimating p 0 and p 1 , it is helpful to introduce some notation. An infected parent with d − 1 children infects X of these children. The parent recovers exponentially with rate µ, and we let R denote the recovery time. Simultaneously, each child is infected exponentially with rate λ, and so we denote independent random variables I i ∼ Exp(λ) such that

gives the time of infection for child i. Then

Next, the order statistic I (2) has the same distribution as 1

This identity combined with the fact that x → µx µ+λx is an increasing function gives

3 Proof of Theorem 7

The last statement follows from Lemma 8. To prove the remainder of the result, we will assume that (a) does not hold and prove that (b) holds. Since (a) does not hold, there exists some κ > 0 such that P(E 0 ∩ E ∞ ) > κ for all sufficiently large n. Select n sufficiently large, and let F denote the failure event that (1) does not hold for some o(1) function to be identified later. We wish to show that P(F|E 0 ∩ E ∞ ) = o(1). Let E 1 denote the event that U (t 1 ) ∪ ∂U (t 1 ) is contained within distance r of U (0). In particular, on the event E 0 ∩ E 1 , all of infected and recovered vertices at time t 1 reside in a tree with root U (0). Also, selecting k := α(µ+λ)

· c log d−1 n, let E 2 denote the event that |U (t 0 )| ≥ k. We will make use of two simple inequalities involving arbitrary events A, B, and C. First,

Furthermore, if P(C c |B) = 1, then

Two applications of (2) gives

To bound the third term in (4), we let P λ,µ denote the probability measure corresponding to the SIR model with parameters λ and µ. Then, for sufficiently large n, we have

As such, it suffices to show that P λ,0 (E 0 ∩ E c 1 ) = o(1). We accomplish this by analyzing the corresponding branching process:

Lemma 9. Let G denote the infinite d-ary tree. Consider the process {H t } t≥0 of induced subgraphs of G in which H 0 is induced by the root vertex, and then for each v ∈ V (G)\V (H t ) that is a G-child of some vertex in H t , it holds that v is added to H t at unit rate. Let B m denote the first time at which a vertex of distance m from the root vertex of G is added to H t . Then B m ≥ m/(2ed) with probability ≥ 1 − e −m/2 .

Proof. Let B m,r denote the rth time at which a vertex of distance m from the root vertex of G is added to H t , and note that B m = B m,1 . Let c, θ > 0 be given (to be selected later). Then Markov's inequality gives

where the last identity, which appears in Theorem 1 in [5] , follows from analyzing a certain martingale. In our case, X 0 := B 1,1 ∼ Exp(d) and X r := B 1,r+1 − B 1,r ∼ Exp(d − r) for r ∈ {1, . . . , d − 1} are independent. It follows that

Overall,

Selecting θ = ed and c = 1/(2ed) gives the result.

Lemma 10. Suppose G is (r, η)-locally tree-like, consider the SIR process on G with µ = 0, and take any t 1 ≤ r−2 2e(d−1)λ . Then P(E c 1 |E 0 ) ≤ 2e −(r−2)/2 .

Proof. By time dilation, we may take λ = 1 without loss of generality. Condition on E 0 . After time T ∼ Exp(d), there are two infected vertices u and v. Removing the edge {u, v} from H produces two (d − 1)-ary trees, with root vertices u and v. Extend these trees to infinite (d−1)-ary trees H u and H v . Let B u denote the first time at which a vertex of distance r − 2 from the root vertex is infected in H u , and similarly for B v . Then by assumption on t 1 , we have

Finally, we apply the union bound and Lemma 9 to get

Overall, (5) and Lemma 10 together give P(E c 1 |E 0 ∩ E ∞ ) = o(1). Next, we may combine this bound with (3) to bound the second term in (4): (1) . To this end, it is convenient to consider the stopping time T 0 := inf{t :

t 0 and c < 1, it follows that

Next, our assumptions that α < c 2e(d−1)λ , (d − 2)λ > µ, and c < 1 together imply

from which it follows that k < r. On the event E ∞ , it holds that U (∞) induces a tree that contains a path of length greater than r, from which is follows that |U (∞)| > r. As such,

This allows us to continue (7):

To continue, we show that P(

Proof. The random number N of transitions that occur over the interval (0, T 0 ] is given by

Conditioned on the sequence of states of the discrete time Markov chain {M n } n≥0 , the transition times {X n } n≥1 are independent and exponentially distributed with (deterministic) parameters λe(I n , S n ) + µ|I n |. For any sequence of states in the event {T 0 < ∞}, the first N of these parameters are all at least µ + λ. Put E := {T 0 < ∞} and denote P E (A) := P(A|E). Then

Put t 0 := 4k µ+λ . Conditioning on N , we may apply Bernstein's inequality for subexponential random variables (see Theorem 2.8.1 in [7] ). In particular, we let C > 0 denote a universal constant such that any random variable of the form X ∼ Exp(λ) satisfies X − EX ψ 1 ≤ C λ . Then

where the last step applies the bound N ≤ 2k and c 1 := 2c max(C 2 ,C) .

Overall, (6), (8) and Lemma 11 together give that the second term in (4) is o (1) . It remains to show that the first term in (4) is o (1) . To this end, we first show that P(

where the last step applies (8) and Lemma 11. Combining this with (5) and Lemma 10 after a union bound then gives

as claimed. As such, it suffices to show that

. For this, it is convenient to consider the event A t 0 = {e(I(t 0 ), S(t 0 )) > 0} that the infection is still spreading at time t 0 . Since E 1 is the event that the infection stays within B(r) up to time t 1 (i.e., after t 0 ) and E ∞ is the event that the infection eventually escapes B(r), it follows that

Since λ ≥ 6µ and d ≥ 3, we may select any ∈ ( 2µ µ+λ , 1 − 2 d ); we will refine our choice later. Defining B t 0 := {|R(t 0 )| < |U (t 0 )|}, we then have

We bound the first term of (9) by analyzing the underlying discrete time Markov chain: Almost surely, the end state of this process takes the form (S, ∅, V (G) \ S). Importantly, {M n } n≥0 is a discrete time Markov chain in which, conditioned on M n , it holds that R n+1 strictly contains R n with probability µ|I n | λe(I n , S n ) + µ|I n | .

We will consider this process until the stopping time N := inf{n : e(I n , S n ) = 0}.

Specifically, for each n ∈ {1, . . . , N }, let X n indicate whether the nth transition recovers a vertex, i.e., R n strictly contains R n−1 . For each n > N , let X n be Bernoulli with success probability µ µ+λ , all of which are independent of each other and of M N . Put M n := M min(n,N ) . Let P n denote the (random) probability measure conditioned on the state history {M j } n j=0 . Notice that in the event {n < N }, we have the bound P n {X n+1 = 1} = P{X n+1 = 1|M n } = µ|I n | λe(I n , S n ) + µ|I n | ≤ µ µ + λ almost surely. Meanwhile, in the complementary event {n ≥ N }, X n+1 is Bernoulli with success probability λ µ+λ and independent of M n , and so P n {X n+1 = 1} = P{X n+1 = 1|M n } = P{X n+1 = 1} = µ µ + λ almost surely. Overall, P n {X n+1 = 1} ≤ µ µ+λ almost surely. Next, let J denote a set of positive integers j 1 < · · · < j m . Then the law of total probability gives

Next, P jm−1 {X j = 1 ∀j ∈ J \ {j m }} ∈ {0, 1}, and so

Take expectations of both sides and apply induction to get

Next, let N 0 denote the number of transitions that have occurred by time t 0 . In the event A t 0 , it holds that N 0 < N , and so |R(t 0 )| = N 0 j=1 X j . Also, we have N 0 ≤ 2|U (t 0 )|. As such,

µ+λ . This then gives Overall, Lemma 12 gives that the first term in (9) is o(1). It remains to bound to second term in (9). To do so, we restrict to the event E 0 ∩ E 1 ∩ E 2 ∩ B t 0 and argue that F occurs with probability o(1). We adopt the notation from Lemma 6 and Algorithm 1. Since B(r) is a tree, A consists of the vertices in U (t 0 ) that have a neighbor in S(t 0 ), while A I = A ∩ I(t 0 ). For every a ∈ A \ A I ⊆ R(t 0 ), since a cannot infect b(a), it holds that Z a = τ . It follows that Q = Q I and P = |A I | |A| P I , and sô λ λ I = P P I = |A I | |A| ,μ µ I = 1 − P 1 − P I = 1 − |A I | |A| P I 1 − P I = 1 + 1 − |A I | |A| · P I 1 − P I .

As such,λ λ I ≤ 1 andμ µ I ≥ 1. To obtain bounds in the other directions, we require a lemma: 

In this paper, we introduced a simple algorithm to estimate SIR parameters from early infections. There are many interesting directions for future work. First, we do not believe that Theorem 7 captures the true performance of Algorithm 1, especially in light of Figure 1 , and this warrants further investigation. Next, it would be interesting to consider other types of estimators. Notice that since Algorithm 1 explicitly makes use of certain properties of the underlying graph, it is clear why it fails for the complete graph. Does the behavior of the maximum likelihood estimator have a similarly transparent dependence on the underlying graph? We focused on locally tree-like graphs in part because there is a rich literature on SIR dynamics over infinite trees, but it would be interesting to analyze the performance of Algorithm 1 on other graph families. Also, there is a multitude of compartmental models for epidemics with various choices of probability distributions for transition times between compartments. Finally, one might consider alternative models for what data is available.

For example, to model asymptomatic infections, one might assume that a random fraction of infected vertices are not known to be infected.

The simple Galton-Watson process: Classical approach

Local Kesten-McKay law for random regular graphs

The first birth problem for an age-dependent branching process

A Contribution to the Mathematical Theory of Epidemics

Limit theorems for sequences of jump Markov processes

Early transmission dynamics in Wuhan, China, of novel coronavirusinfected pneumonia

High-dimensional probability: An introduction with applications in data science

The authors thank Boris Alexeev for interesting discussions that inspired this work. DGM was partially supported by AFOSR FA9550-18-1-0107 and NSF DMS 1829955.

Since U (t 0 ) induces a subtree of G, Lemma 13 gives that |A| ≥ (1 − 2 d )|U (t 0 )|, and sowhere the last two steps use the fact that d ≥ 3 and the choice = (1 + 0 ) 2µ µ+λ for some small 0 > 0. It follows thatwhere the last two steps use the facts that λ ≥ 6µ and 0 is small. Next, considerSincek and τ both increase with factors of log n, Lemma 6 implies that (λ I ,μ I ) converges to (λ, µ) in probability. As such,where the last step holds when n is large. The result then follows from the fact that (λ I ,μ I ) converges to (λ, µ) in probability.