title: Optimal Control and Numerical Methods for Hybrid Stochastic SIS Models
authors: Tran, Ky; Yin, George
date: 2021-01-09

This work focuses on optimal controls of a class of stochastic SIS epidemic models under regime switching. Assuming that a decision maker can either influence the infectivity period or isolate infected individuals, our aim is to minimize the expected discounted cost due to illness, medical treatment, and the adverse effect on society. In addition, a model incorporating vaccination is proposed. Numerical schemes are developed by approximating the continuous-time dynamics using Markov chain approximation methods. It is demonstrated that the approximation schemes converge to the optimal strategy as the mesh size goes to zero. Numerical examples are provided to illustrate our results.

This work focuses on optimal controls and numerical methods for a class of stochastic epidemic models under regime switching. A SIS system stems from the compartmental models [18], which subdivide the population into susceptible (S) and infected (I) classes. The word SIS is an abbreviation of susceptible-infected-susceptible: a susceptible individual becomes infected, and later on becomes susceptible again; see [1, 4, 18] for detailed discussions, and [13] and the references therein for different applications. The models include, in particular, sexually transmitted diseases and diseases transmitted by bacteria, for which there is no permanent immunity. For such diseases, a promising model is the following classical deterministic SIS epidemic model:

dS(t) = [µN − βS(t)I(t) + γI(t) − µS(t)]dt, S(0) = s_0,
dI(t) = [βS(t)I(t) − (µ + γ)I(t)]dt, I(0) = i_0, (1.1)

subject to S(t) + I(t) = N, along with the initial values s_0 > 0 and i_0 > 0, where S(t) and I(t) are the numbers of susceptible and infected individuals at time t in a population of size N, respectively; µ and γ^{−1} are the average death rate and the average infectious period, respectively. The parameter β is the disease transmission coefficient, with β = λ/N and λ being the disease contact rate of an infective individual. More specifically, λ is the average number per day of adequate contacts of an infective, an adequate contact being one after which a susceptible individual becomes infected. There is a variety of applications of SIS models. To mention just a few, we refer to [12] for a white-noise parameter perturbation of system (1.1), and [13] for a SIS model with Markovian switching; for SIS models with vaccination, we refer to [15, 22, 33, 34]. All the papers above focus on the asymptotic properties of the diseases. The reader can also find related work on the long-time behavior of various epidemic models in [9, 26, 30]. Although SIS epidemic models and various other epidemic models have been well studied, work on optimal control of stochastic epidemic models is relatively scarce, and the problem is largely open. The objective of such control problems is to identify effective strategies for minimizing the impact of infectious diseases through a set of mixed control strategies including treatments, vaccination, isolation, health promotion campaigns, etc. Most of the published papers focus on deterministic epidemics, and the primary method applied to solve the associated control problems is Pontryagin's maximum principle.
To mention just a few, we refer to [25] for a work from the early days of this area, [2, 5, 6, 8, 14] for deterministic SIR models with isolation and/or vaccination, and [7, 28] for controlled epidemic network models. In [17], the authors studied two stochastic SIS models, with external parameter noise and with scaled additive noise, to minimize the long-term average costs. In [19], the authors investigated SIS meta-population models on networks by comparing different approaches numerically. The recent work [11] studies a SIS model under complete and incomplete observation of the state process by investigating the associated Hamilton-Jacobi-Bellman equation and the Kolmogorov forward equation, respectively. Recently, increasing attention has been devoted to optimal control of the deterministic SIR model owing to the appearance of highly infectious diseases such as COVID-19; see [3, 10, 24]. To the best of our knowledge, not much attention has been given to stochastic hybrid SIS models to date. The main difficulties come from the complexity of the model. Even for the stochastic SIS model without regime switching, the methods and results in [11] only work under a set of strict assumptions on the model and the cost function. In this work, we focus on a stochastic SIS model under Markovian switching. We develop numerical approximation schemes by approximating the continuous-time dynamics by Markov chains and then showing that the approximations converge to the correct optimal strategy as the mesh size goes to zero. Note that the Markovian switching is added to account for environmental changes that cannot be modeled by Brownian-type noise but rather display jump behavior. We use the Markov chain approximation methodology developed by Kushner and Dupuis [21]; see also [29]. Motivated by recent developments in the modeling of epidemics, we consider two different models. The first one is a direct extension of (1.1), in which the decision maker is able to control the recovery rate to some extent or isolate a part of the infected individuals. In the second model, we incorporate vaccination into the formulation; that is, the decision maker is able to control the recovery rate and also to decide on the fraction of vaccinated individuals. Moreover, treating a general cost function, we take into account any cost associated with the control and either the outbreak size or the infectious burden, under the assumption that control resources are limited. The rest of the work is organized as follows. Section 2 begins with the problem formulation. Section 3 presents numerical algorithms based on the Markov chain approximation method. Section 4 focuses on a model with the incorporation of vaccination. In Section 5, we present several examples. Finally, the paper is concluded with some further remarks. To facilitate the reading, all proofs are placed in an appendix at the end of the paper in order not to interrupt the flow of the presentation. Inspired by the recent trend of modeling using regime-switching models, we use a continuous-time Markov chain to model environmental changes that are not covered by the usual diffusion models. Assume throughout the paper that both the Markov chain α(t) and the scalar standard Brownian motion w(·) are defined on a complete filtered probability space (Ω, F, {F(t)}, P), where {F(t)} is a filtration satisfying the usual conditions (i.e., it is right continuous and increasing, and F(0) contains all the null sets).
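To make the switching mechanism concrete, the following sketch simulates a sample path of a continuous-time Markov chain from a given generator. It is a minimal Python illustration; the generator values, the horizon, and the seed are assumptions chosen for demonstration, not quantities from the paper.

```python
import numpy as np

def simulate_ctmc(Lambda, ell0, T, rng):
    """Simulate a continuous-time Markov chain on [0, T].

    Lambda : (m0, m0) generator matrix, rows summing to zero.
    ell0   : initial state (0-based index).
    Returns the jump times and the states entered at those times.
    """
    times, states = [0.0], [ell0]
    t, ell = 0.0, ell0
    while True:
        rate = -Lambda[ell, ell]            # total jump intensity in state ell
        if rate <= 0.0:                     # absorbing state: no more jumps
            break
        t += rng.exponential(1.0 / rate)    # exponential holding time
        if t >= T:
            break
        jump = Lambda[ell].copy()
        jump[ell] = 0.0
        ell = int(rng.choice(len(jump), p=jump / rate))  # embedded chain step
        times.append(t)
        states.append(ell)
    return times, states

rng = np.random.default_rng(0)
Lambda = np.array([[-1.0, 1.0], [2.0, -2.0]])  # illustrative two-state generator
print(simulate_ctmc(Lambda, 0, 10.0, rng))
```

Held piecewise constant between jumps, such a path plays the role of α(·) below: it switches the coefficients λ_{α(t)}, µ_{α(t)}, γ_{α(t)}, and σ_{α(t)} of the diffusion models.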
Starting with the classical deterministic SIS epidemic model given in (1.1), we normalize the population size to one by replacing S and I with S/N and I/N, respectively. We obtain

dS(t) = [µ − λS(t)I(t) + γI(t) − µS(t)]dt,
dI(t) = [λS(t)I(t) − (µ + γ)I(t)]dt, (2.1)

where S(t) and I(t) are the fractions of susceptible and infected individuals at time t, respectively. Throughout the paper, we use i and s to denote the state variables of I(t) and S(t), respectively, with i_0 and s_0 being the initial data. Taking into account the environmental noise, the system parameters µ, λ, γ may experience abrupt changes, which we model by a Markov chain α(·) as in [13]:

dS(t) = [µ_{α(t)} − λ_{α(t)}S(t)I(t) + γ_{α(t)}I(t) − µ_{α(t)}S(t)]dt,
dI(t) = [λ_{α(t)}S(t)I(t) − (µ_{α(t)} + γ_{α(t)})I(t)]dt. (2.2)

We suppose that α(·) is a continuous-time Markov chain taking values in a finite set M = {1, 2, …, m_0} and generated by Λ = (Λ_{ιℓ})_{m_0×m_0}. The long-time behavior of (2.2) has been studied in [13]. A key parameter of the system is λ_{α(t)}, the average number of contacts per infected individual per day, which can be perturbed by white noise; that is, λ_{α(t)} → λ_{α(t)} + σ_{α(t)}ẇ(t), with w(·) a scalar standard Brownian motion independent of α(·). Thus,

dS(t) = [µ_{α(t)} − λ_{α(t)}S(t)I(t) + γ_{α(t)}I(t) − µ_{α(t)}S(t)]dt − σ_{α(t)}S(t)I(t)dw(t),
dI(t) = [λ_{α(t)}S(t)I(t) − (µ_{α(t)} + γ_{α(t)})I(t)]dt + σ_{α(t)}S(t)I(t)dw(t). (2.3)

As in [11, 17, 19], we suppose that the natural development of the disease can be influenced by a decision maker. In particular, the decision maker is able to control the magnitude of the recovery rate to some extent. Such a control can be performed by increasing the treatment capacity or the efficiency of medication. To be more specific, γ_{α(t)} is the recovery rate without any action by the decision maker, while γ_{α(t)} + C(t) is the recovery rate at time t if a control C(t) ≥ 0 is applied at t. As in [19], C(t) can be interpreted as the product of two factors: the proportion of the infected population treated at time t and the treatment effectiveness ζ. We can also regard C(t) as a per-capita rate of isolation, which has a direct effect only on infected individuals. Thus, the model under consideration can be regarded as a standard isolation model; see [5, 14]. To indicate that we have limited resources to handle the epidemic, we suppose that C(·) takes values in a nonempty compact subset U of [0, ∞) with 0 ∈ U. The controlled version is given by

dS(t) = [µ_{α(t)} − λ_{α(t)}S(t)I(t) + (γ_{α(t)} + C(t))I(t) − µ_{α(t)}S(t)]dt − σ_{α(t)}S(t)I(t)dw(t),
dI(t) = [λ_{α(t)}S(t)I(t) − (µ_{α(t)} + γ_{α(t)} + C(t))I(t)]dt + σ_{α(t)}S(t)I(t)dw(t). (2.4)

Before proceeding further, we state the following result regarding the existence of a unique positive solution to (2.4).

Theorem 2.1. For any given initial value (s_0, i_0, ℓ_0) ∈ (0, 1)² × M satisfying s_0 + i_0 = 1 and C(t) ≡ c_0 ∈ U, equation (2.4) has a unique global solution (S(t), I(t), α(t)) ∈ (0, 1)² × M, and S(t) + I(t) = 1 for any t ≥ 0 with probability one.

Given that I(t) + S(t) = 1, the fraction of infected individuals I(t) obeys the stochastic Lotka-Volterra model with Markovian switching given by

dI(t) = I(t)[λ_{α(t)}(1 − I(t)) − µ_{α(t)} − γ_{α(t)} − C(t)]dt + σ_{α(t)}I(t)(1 − I(t))dw(t). (2.5)

As in [5, 14], we choose the eradication time τ as the terminal time of planning (that is, the time horizon). Formally, let ξ be a positive constant such that ξ < 1; then

τ = inf{t ≥ 0 : I(t) ≤ ξ}. (2.6)

To make the definition in (2.6) meaningful, we assume that the initial fraction of infected individuals I(0) is strictly greater than ξ. Thus, τ is the first time at which the state variable I drops to ξ. Let A_{i,ℓ} denote the collection of all admissible controls with initial value (i, ℓ) ∈ (0, 1) × M. A strategy C(·) is in A_{i,ℓ} if C(t) is F(t)-adapted and C(t) ∈ U for any t ≥ 0. We suppose that increasing the recovery rate is costly and that the treatment of the infected individuals also creates additional costs. The cost is described in terms of a bounded cost function F by

J(i, ℓ, C(·)) = E_{i,ℓ} ∫_0^τ e^{−δt} F(I(t), α(t), C(t))dt, (2.7)

where δ > 0 is the discounting factor and E_{i,ℓ} denotes the expectation with respect to the probability law when the process (I(t), α(t)) starts from the initial condition (i, ℓ).
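Before turning to optimality, it is instructive to evaluate the cost of a fixed strategy by plain simulation. The sketch below combines an Euler-Maruyama discretization of (2.5) with per-step regime switching and accumulates the discounted running cost up to the eradication time τ. The generator, the regime-dependent parameters, and the running cost F are illustrative assumptions; only the structure of the dynamics follows (2.5) and (2.7).

```python
import numpy as np

rng = np.random.default_rng(1)
Lambda = np.array([[-1.0, 1.0], [2.0, -2.0]])   # assumed generator of alpha(.)
lam = np.array([2.0, 1.5])                      # contact rates, per regime
mu  = np.array([0.2, 0.3])                      # death rates
gam = np.array([0.5, 0.4])                      # recovery rates
sig = np.array([0.3, 0.2])                      # noise intensities
delta, xi, dt, T_max = 0.05, 0.02, 1e-3, 200.0

def F(i, ell, c):                               # assumed running cost
    return 1.0 + (ell + 1) * i + (ell + 1) * i * c**2

def discounted_cost(i0, ell0, c):
    """One Monte Carlo sample of J(i0, ell0, C = c) for model (2.5)."""
    i, ell, t, J = i0, ell0, 0.0, 0.0
    while t < T_max and i > xi:
        # regime jump with probability ~ -Lambda[ell, ell] * dt
        if rng.random() < -Lambda[ell, ell] * dt:
            others = [k for k in range(len(Lambda)) if k != ell]
            w = Lambda[ell, others] / (-Lambda[ell, ell])
            ell = int(rng.choice(others, p=w))
        J += np.exp(-delta * t) * F(i, ell, c) * dt
        drift = i * (lam[ell] * (1.0 - i) - mu[ell] - gam[ell] - c)
        diffusion = sig[ell] * i * (1.0 - i)
        i += drift * dt + diffusion * np.sqrt(dt) * rng.normal()
        i = min(max(i, 0.0), 1.0)               # keep the fraction in [0, 1]
        t += dt
    return J

print(np.mean([discounted_cost(0.5, 0, c=1.0) for _ in range(200)]))
```

Such forward simulation can only score a given strategy; identifying the minimizing strategy C*(·) is the purpose of the dynamic programming machinery developed below.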
The goal is to minimize the cost functional and find an optimal strategy C*(·) such that

J(i, ℓ, C*(·)) = V(i, ℓ) := inf_{C(·)∈A_{i,ℓ}} J(i, ℓ, C(·)). (2.8)

Formally, the associated Hamilton-Jacobi-Bellman equation of the underlying problem is given by

min_{c∈U} { (σ_ℓ²i²(1 − i)²/2)V_{ii}(i, ℓ) + i[λ_ℓ(1 − i) − µ_ℓ − γ_ℓ − c]V_i(i, ℓ) + Σ_{ι∈M} Λ_{ℓι}V(i, ι) + F(i, ℓ, c) } = δV(i, ℓ)

for all (i, ℓ) ∈ [ξ, 1] × M, with the boundary condition V(ξ, ℓ) = 0 for any ℓ ∈ M.

Remark 2.1. In the examples in Section 5, we will work with a specific cost function of the form

F(i, ℓ, c) = a_0 + a_1 i + a_2 ic²,

where a_0, a_1, and a_2 are the weighting factors representing the cost per unit time of the components 1, i, and ic², respectively. In particular, ∫_0^τ a_0 dt is the cost due to the time period needed for outbreak eradication, while ∫_0^τ a_1 I(t)dt is the cost that infected individuals create for society due to lost working hours and standard medical care, not including the treatment C(·). The term ∫_0^τ a_2 I(t)C²(t)dt is the cost of treating infected individuals. We assume that this cost is proportional to the number of infected individuals and that the cost per patient depends quadratically on the treatment effort C(t). As in [5], if one is interested in minimizing the sum of the eradication time and the total epidemic size, one can use the cost function given by

F(i, ℓ, c) = a_0 + a_1 λ_ℓ i(1 − i),

where a_0 and a_1 are the weighting factors representing the cost per unit time of the epidemic duration and the cost of a single new infection, respectively.

Our standing assumptions are as follows.

(A) 1. The system parameters µ_ℓ, γ_ℓ, λ_ℓ are all nonnegative for each ℓ ∈ M.
3. For any ℓ ∈ M, the cost function F(·, ℓ, ·) is bounded, continuous, and nonnegative.

Under assumption (A), for any initial value (i, ℓ) ∈ (0, 1) × M and C(t) ≡ c ∈ U, equation (2.5) has a unique global solution (I(·), α(·)) ∈ (0, 1) × M for all t ≥ 0; see Theorem 2.1. It follows from the boundedness of F(·) that V(i, ℓ) < ∞ for any (i, ℓ) ∈ (0, 1) × M. Although our work is motivated by the presence of a Markovian switching random environment, our results and the numerical schemes we develop can also be applied to the corresponding control problems for diffusion models without switching; in particular, one can simply take M = {1}.

We are now in a position to construct a numerical procedure for solving the optimal control problem. Following the Markov chain approximation method in [21, 29], we construct a controlled Markov chain in discrete time to approximate the controlled switching diffusion. Let h > 0 be a discretization parameter for the continuous state variable. Define

S_h := ({ξ + kh : k ∈ Z} ∩ [ξ, 1]) × M.

Let {(I_n^h, α_n^h) : n ∈ Z_+} be a discrete-time controlled Markov chain with state space S_h such that the controlled Markov chain well approximates the local behavior of the controlled diffusion (I(t), α(t)). At any discrete-time step n, the magnitude of the control component C_n^h must be specified. The space of controls is U. The sequence C^h = {C_n^h : n ∈ Z_+} is said to be admissible if C_n^h is σ{I_k^h, α_k^h : k ≤ n; C_k^h : k < n}-measurable and C_n^h ∈ U for each n ∈ Z_+. The collection of all admissible control sequences for initial state (i, ℓ) will be denoted by A_{i,ℓ}^h. For (i, ℓ) ∈ S_h and c ∈ U, we define a family of interpolation intervals ∆t^h(i, ℓ, c); the values of ∆t^h(i, ℓ, c) will be specified later. Then we define

t_0^h = 0, t_n^h = Σ_{k=0}^{n−1} ∆t^h(I_k^h, α_k^h, C_k^h), η_h = min{n ∈ Z_+ : I_n^h ≤ ξ}.

For (i, ℓ) ∈ S_h and C^h ∈ A_{i,ℓ}^h, the cost functional for the controlled Markov chain is defined as

J^h(i, ℓ, C^h) = E Σ_{n=0}^{η_h−1} e^{−δt_n^h} F(I_n^h, α_n^h, C_n^h)∆t^h(I_n^h, α_n^h, C_n^h). (3.2)

The corresponding dynamic programming equation for the discrete approximation is given by

V^h(i, ℓ) = min_{c∈U} { e^{−δ∆t^h(i,ℓ,c)} Σ_{(i′,ℓ′)∈S_h} p^h((i, ℓ), (i′, ℓ′)|c)V^h(i′, ℓ′) + F(i, ℓ, c)∆t^h(i, ℓ, c) } if i > ξ, and V^h(i, ℓ) = 0 if i ≤ ξ,

for any (i, ℓ) ∈ S_h.
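The dynamic programming equation can be solved by a fixed-point iteration once the transition probabilities p^h and the intervals ∆t^h are specified; the sketch below uses the locally consistent upwind construction of [21] written out in the next subsection. The running cost matches the form of (5.2), while the generator and the regime parameters are illustrative placeholders, so this is a demonstration of the scheme under stated assumptions rather than a reproduction of the paper's figures.

```python
import numpy as np

Lambda = np.array([[-1.0, 1.0], [2.0, -2.0]])          # assumed generator
lam = np.array([2.0, 1.5]); mu = np.array([0.2, 0.3])  # assumed regime parameters
gam = np.array([0.5, 0.4]); sig = np.array([0.3, 0.2])
delta, xi, h = 0.05, 0.02, 0.005
U = np.linspace(0.0, 3.0, 16)            # 16 control levels, as in Section 5
grid = np.arange(xi, 1.0 + 1e-12, h)     # i-grid from xi to 1
n, m0 = len(grid), len(Lambda)

def F(i, ell, c):                        # cost of the form (5.2), 0-based ell
    return 1.0 + (ell + 1) * i + (ell + 1) * i * c**2

V = np.zeros((n, m0))                    # V = 0 at i <= xi is built in
for sweep in range(200000):
    diff = 0.0
    for k in range(1, n):                # Gauss-Seidel sweep over the grid
        i = grid[k]
        a = sig**2 * i**2 * (1.0 - i)**2 # diffusion coefficient per regime
        for ell in range(m0):
            best = np.inf
            for c in U:
                b = i * (lam[ell] * (1.0 - i) - mu[ell] - gam[ell] - c)
                Q = a[ell] + h * abs(b) - h**2 * Lambda[ell, ell]
                dt = h**2 / Q
                pu = (a[ell] / 2 + h * max(b, 0.0)) / Q    # move to i + h
                pd = (a[ell] / 2 + h * max(-b, 0.0)) / Q   # move to i - h
                kp = min(k + 1, n - 1)   # at i = 1 the up-probability is 0
                val = pu * V[kp, ell] + pd * V[k - 1, ell]
                for io in range(m0):     # regime-switching transitions
                    if io != ell:
                        val += h**2 * Lambda[ell, io] / Q * V[k, io]
                best = min(best, np.exp(-delta * dt) * val + F(i, ell, c) * dt)
            diff = max(diff, abs(best - V[k, ell]))
            V[k, ell] = best
    if diff < 1e-8:                      # tolerance level as in Section 5
        break
print(V[n // 2])                         # approximate V(0.5, ell), both regimes
```

Recording the minimizer over U at each grid point yields the approximate optimal feedback control; by the convergence results below, both values and controls converge as h → 0.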
Let E_{i,ℓ,n}^{h,c} and Cov_{i,ℓ,n}^{h,c} denote the conditional expectation and covariance given {I_k^h, α_k^h, C_k^h : k ≤ n; I_n^h = i, α_n^h = ℓ, C_n^h = c}. Our objective is to define transition probabilities p^h((i, ℓ), (i′, ℓ′)|c) so that the controlled Markov chain {(I_n^h, α_n^h)} is locally consistent with the controlled switching diffusion (2.5), in the sense that the following conditions hold:

E_{i,ℓ,n}^{h,c}[∆I_n^h] = b(i, ℓ, c)∆t^h(i, ℓ, c) + o(∆t^h(i, ℓ, c)),
Cov_{i,ℓ,n}^{h,c}[∆I_n^h] = a(i, ℓ)∆t^h(i, ℓ, c) + o(∆t^h(i, ℓ, c)),
P{α_{n+1}^h = ι | I_n^h = i, α_n^h = ℓ, C_n^h = c} = Λ_{ℓι}∆t^h(i, ℓ, c) + o(∆t^h(i, ℓ, c)) for ι ≠ ℓ,
sup_n |∆I_n^h| → 0 as h → 0,

where ∆I_n^h = I_{n+1}^h − I_n^h, b(i, ℓ, c) = i[λ_ℓ(1 − i) − µ_ℓ − γ_ℓ − c], and a(i, ℓ) = σ_ℓ²i²(1 − i)². Using the procedure in [21], for (i, ℓ) ∈ S_h, we define

p^h((i, ℓ), (i ± h, ℓ)|c) = (a(i, ℓ)/2 + hb^±(i, ℓ, c)) / Q^h(i, ℓ, c),
p^h((i, ℓ), (i, ι)|c) = h²Λ_{ℓι} / Q^h(i, ℓ, c) for ι ≠ ℓ,
Q^h(i, ℓ, c) = a(i, ℓ) + h|b(i, ℓ, c)| − h²Λ_{ℓℓ},
∆t^h(i, ℓ, c) = h² / Q^h(i, ℓ, c),

where for a real number r, r^+ = max{r, 0} and r^− = max{−r, 0}. To proceed, we construct a continuous-time interpolation of the approximating chain. For use in this construction, we define n^h(t) = max{n : t_n^h ≤ t} for t ≥ 0. The discrete-time processes associated with the controlled Markov chain {(I_n^h, α_n^h)} are defined as follows. Let

B_n^h = Σ_{k=0}^{n−1} E_k^h[∆I_k^h], M_n^h = Σ_{k=0}^{n−1} (∆I_k^h − E_k^h[∆I_k^h]),

where E_k^h denotes the conditional expectation given the history up to step k. The piecewise constant interpolation processes, denoted by (I^h(·), α^h(·), C^h(·), B^h(·), M^h(·)), are naturally defined by setting, for instance, I^h(t) = I_n^h for t ∈ [t_n^h, t_{n+1}^h). By local consistency,

I^h(t) = i + B^h(t) + M^h(t) + ε_1^h(t), with lim_{h→0} sup_{0≤t≤T_0} E|ε_1^h(t)| = 0 for each T_0 ∈ (0, ∞).

Define τ^h = t_{η_h}^h. The cost functional from (3.2) can be rewritten as

J^h(i, ℓ, C^h) = E_{i,ℓ} ∫_0^{τ^h} e^{−δt}F(I^h(t), α^h(t), C^h(t))dt, (3.10)

modulo an error that vanishes as h → 0. For our analysis, it is more convenient to use the notion of relaxed controls. We first briefly recall this notion, which arises naturally in the weak convergence analysis for the approximation of optimal control problems. A relaxed control is a Borel measure m(·) on U × [0, ∞) such that m(U × [0, t]) = t for all t ≥ 0; the set of such measures, endowed with the weak topology, is denoted by R(U × [0, ∞)). Given a relaxed control m(·), there is a probability measure m_t(·) defined on the σ-algebra B(U) such that m(dc dt) = m_t(dc)dt. With the given probability space, we say that m(·) is an admissible relaxed stochastic control for (w(·), α(·)), or that (m(·), w(·), α(·)) is admissible, if (i) m(·) is a random variable taking values in R(U × [0, ∞)), and for each fixed ω, m(·, ω) is a deterministic relaxed control; (ii) the function defined by m(A × [0, t]) is F(t)-adapted for any A ∈ B(U). As a result, with probability one, there is a measure m_t(·, ω) on the Borel σ-algebra B(U) such that m(dc dt) = m_t(dc)dt. Note that for a sequence of ordinary controls C^h = {C_n^h : n ∈ Z_+}, the associated relaxed control representation m^h(dc dt) belongs to R(U × [0, ∞)). Note also that the limits of the relaxed control representations of ordinary controls might not be ordinary controls, but only relaxed controls. With the notion of relaxed control given above, we can write (3.10) as

J^h(i, ℓ, m^h) = E_{i,ℓ} ∫_0^{τ^h} ∫_U e^{−δt}F(I^h(t), α^h(t), c) m_t^h(dc)dt.

Note also that the value function defined in (2.8) can be rewritten as

V(i, ℓ) = inf_{m(·)} E_{i,ℓ} ∫_0^τ ∫_U e^{−δt}F(I(t), α(t), c) m_t(dc)dt,

where the infimum is taken over admissible relaxed controls. The proof of the next lemma can be obtained similarly to that of [32, Theorem 3.1].

Lemma 3.4. The process α^h(·) converges weakly to α(·), a Markov chain with generator Λ = (Λ_{ιℓ}).

The following theorems establish the convergence of the approximating Markov chain to the original controlled switching diffusion, as well as the convergence of the value functions. To facilitate the reading, all proofs are placed in an appendix at the end of the paper.

Theorem 3.5. Let H^h(·) = (I^h(·), α^h(·), B^h(·), M^h(·), m^h(·), τ^h). Then the following assertions hold. (a) The sequence {H^h(·)} is tight. (b) Any weak limit H(·) = (I(·), α(·), B(·), M(·), m(·), τ) of a convergent subsequence is such that I(·), B(·), and M(·) have continuous paths with probability one. (c) The limit m(·) is an admissible relaxed control,

B(t) = ∫_0^t ∫_U b(I(s), α(s), c) m_s(dc)ds, (3.13)

and M(·) is a continuous martingale with quadratic variation ∫_0^t a(I(s), α(s))ds. (d) The limit processes satisfy I(t) = i + B(t) + M(t) for t ≤ τ; that is, (I(·), α(·), m(·)) solves the relaxed-control analogue of (2.5).

Our main convergence result is given below.

Theorem 3.6. Assume (A). Then, for each (i, ℓ), V^h(i, ℓ) → V(i, ℓ) as h → 0.

Building on our approach to the SIS models, we consider a SIS epidemic model with vaccination proposed in [15, 22]; see also [33, 34] for closely related models. We assume that S(t), I(t), and V(t) are the fractions of susceptible, infected, and vaccinated individuals at time t, respectively. The evolution of (S(·), I(·), V(·)) is given by

dS(t) = [µ(1 − q(t)) − λS(t)I(t) + γI(t) − µS(t) − p(t)S(t) + εV(t)]dt,
dI(t) = [λS(t)I(t) − (µ + γ)I(t)]dt,
dV(t) = [µq(t) + p(t)S(t) − (µ + ε)V(t)]dt, (4.1)

where q(·) is the fraction of vaccinated newborns, p(·) is the proportional coefficient of vaccination for the susceptible, and ε is the rate at which vaccinated individuals lose their immunity.
The other parameters are understood as in the preceding sections; that is, µ and γ^{−1} are the average death rate and the average infectious period, respectively, and λ is the disease contact rate of an infective individual. Note that if V(0) = p(t) = q(t) = 0 for t ≥ 0, then V(t) = 0 for any t ≥ 0 and system (4.1) reduces to (2.1). Using the same methods as in the preceding sections, we obtain the following controlled SIS system under regime switching in a random environment:

dS(t) = [µ_{α(t)}(1 − q(t)) − λ_{α(t)}S(t)I(t) + (γ_{α(t)} + C(t))I(t) − µ_{α(t)}S(t) − p(t)S(t) + ε_{α(t)}V(t)]dt − σ_{α(t)}S(t)I(t)dw(t),
dI(t) = [λ_{α(t)}S(t)I(t) − (µ_{α(t)} + γ_{α(t)} + C(t))I(t)]dt + σ_{α(t)}S(t)I(t)dw(t),
dV(t) = [µ_{α(t)}q(t) + p(t)S(t) − (µ_{α(t)} + ε_{α(t)})V(t)]dt, (4.2)

where α(·) is a Markov chain taking values in M = {1, 2, …, m_0} and w(·) is a scalar standard Brownian motion independent of α(·). We suppose that C(·) takes values in a nonempty compact subset U of [0, ∞), that V_p and V_q are nonempty compact subsets of [0, 1], and that p(·) and q(·) take values in V_p and V_q, respectively. Compared to the related formulations in [5, 14], we also allow for the possibility that vaccinated individuals lose their immunity. We suppose that the initial value (s_0, i_0, v_0) ∈ (0, 1)² × [0, 1) satisfies s_0 + i_0 + v_0 = 1. Thus, we exclude the trivial case i_0 = 0 and unrealistic cases such as i_0 = 1 with s_0 = 0. A fundamental result on the existence of a unique global solution of (4.2) is given below.

Theorem 4.1. For any given initial value (s_0, i_0, v_0, ℓ_0) ∈ (0, 1)² × [0, 1) × M satisfying s_0 + i_0 + v_0 = 1, and C(t) ≡ c_0 ∈ U, p(t) ≡ p_0 ∈ V_p, q(t) ≡ q_0 ∈ V_q, equation (4.2) has a unique global solution (S(t), I(t), V(t), α(t)) ∈ (0, 1)² × [0, 1) × M, and S(t) + I(t) + V(t) = 1 for any t ≥ 0 with probability one.

Since S(t) + I(t) + V(t) = 1 for any t ≥ 0, we need only consider the last two equations; that is,

dI(t) = I(t)[λ_{α(t)}(1 − I(t) − V(t)) − µ_{α(t)} − γ_{α(t)} − C(t)]dt + σ_{α(t)}I(t)(1 − I(t) − V(t))dw(t),
dV(t) = [µ_{α(t)}q(t) + p(t)(1 − I(t) − V(t)) − (µ_{α(t)} + ε_{α(t)})V(t)]dt. (4.3)

Let A_{i,v,ℓ} denote the collection of all admissible controls with initial value (i, v, ℓ) ∈ (0, 1) × [0, 1) × M satisfying 0 < i + v ≤ 1; that is, the control component is (C(·), p(·), q(·)), and a strategy (C(·), p(·), q(·)) is in A_{i,v,ℓ} if (C(t), p(t), q(t)) is F(t)-adapted and takes values in U × V_p × V_q for any t ≥ 0. Theorem 4.1 then indicates that A_{i,v,ℓ} ≠ ∅. For a control strategy (C(·), p(·), q(·)) ∈ A_{i,v,ℓ}, we define the cost functional as

J(i, v, ℓ, C(·), p(·), q(·)) = E_{i,v,ℓ} ∫_0^τ e^{−δt}F(I(t), α(t), C(t), p(t), q(t))dt, (4.4)

where δ > 0 is the discounting factor and E_{i,v,ℓ} denotes the expectation with respect to the probability law when the process (I(t), V(t), α(t)) starts from the initial condition (i, v, ℓ). The goal is to minimize the cost functional and find an optimal strategy (C*(·), p*(·), q*(·)) such that

J(i, v, ℓ, C*(·), p*(·), q*(·)) = V(i, v, ℓ) := inf_{(C(·),p(·),q(·))∈A_{i,v,ℓ}} J(i, v, ℓ, C(·), p(·), q(·)). (4.5)

We use the same method as in the preceding section to construct a discrete-time controlled Markov chain {(I_n^h, V_n^h, α_n^h) : n ∈ Z_+} with state space T_h, the h-grid on {(i, v) : i ≥ 0, v ≥ 0, i + v ≤ 1} together with the regime component. At any discrete-time step n, the magnitude of the control component (C_n^h, p_n^h, q_n^h) must be specified. The space of controls is U × V_p × V_q. The sequence (C^h, p^h, q^h) is said to be admissible if it satisfies the following conditions: (a) (C_n^h, p_n^h, q_n^h) is σ{I_k^h, V_k^h, α_k^h : k ≤ n; (C_k^h, p_k^h, q_k^h) : k < n}-measurable; (b) (C_n^h, p_n^h, q_n^h) ∈ U × V_p × V_q for all n ∈ Z_+; (c) (I_n^h, V_n^h, α_n^h) ∈ T_h for all n ∈ Z_+. The class of all admissible control sequences (C^h, p^h, q^h) for initial state (i, v, ℓ) will be denoted by A_{i,v,ℓ}^h. We need to define the transition probabilities p^h((i, v, ℓ), (i′, v′, ℓ′)|c, p, q) so that the controlled Markov chain {(I_n^h, V_n^h, α_n^h) : n ∈ Z_+} is locally consistent with respect to the controlled diffusion (4.2). To proceed, we denote by

b_I(i, v, ℓ, c) = i[λ_ℓ(1 − i − v) − µ_ℓ − γ_ℓ − c],
b_V(i, v, ℓ, p, q) = µ_ℓq + p(1 − i − v) − (µ_ℓ + ε_ℓ)v,
a(i, v, ℓ) = σ_ℓ²i²(1 − i − v)²,

the drift and diffusion coefficients of (4.3). In particular, for (i, v, ℓ) ∈ T_h and (c, p, q) ∈ U × V_p × V_q, we define

p^h((i, v, ℓ), (i ± h, v, ℓ)|c, p, q) = (a(i, v, ℓ)/2 + hb_I^±(i, v, ℓ, c)) / Q^h(i, v, ℓ, c, p, q),
p^h((i, v, ℓ), (i, v ± h, ℓ)|c, p, q) = hb_V^±(i, v, ℓ, p, q) / Q^h(i, v, ℓ, c, p, q),
p^h((i, v, ℓ), (i, v, ι)|c, p, q) = h²Λ_{ℓι} / Q^h(i, v, ℓ, c, p, q) for ι ≠ ℓ,
Q^h(i, v, ℓ, c, p, q) = a(i, v, ℓ) + h(|b_I(i, v, ℓ, c)| + |b_V(i, v, ℓ, p, q)|) − h²Λ_{ℓℓ}, (4.6)

∆t^h(i, v, ℓ, c, p, q) = h² / Q^h(i, v, ℓ, c, p, q). (4.7)
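A small sketch makes the transition law (4.6)-(4.7) concrete. The function below returns the successor distribution and the interpolation interval at one grid point of the (I, V, α) chain, following the upwind construction written above; the parameter arrays and the test point are illustrative assumptions.

```python
import numpy as np

def transition_law(i, v, ell, c, p, q, h, params):
    """Successor probabilities and interpolation interval dt at (i, v, ell),
    following the construction in (4.6)-(4.7)."""
    lam, mu, gam, sig, eps, Lambda = params
    s = 1.0 - i - v                                   # susceptible fraction
    bI = i * (lam[ell] * s - mu[ell] - gam[ell] - c)  # drift of I in (4.3)
    bV = mu[ell] * q + p * s - (mu[ell] + eps[ell]) * v
    a = (sig[ell] * i * s) ** 2                       # diffusion coefficient
    Q = a + h * (abs(bI) + abs(bV)) - h**2 * Lambda[ell, ell]
    probs = {
        (i + h, v, ell): (a / 2 + h * max(bI, 0.0)) / Q,
        (i - h, v, ell): (a / 2 + h * max(-bI, 0.0)) / Q,
        (i, v + h, ell): h * max(bV, 0.0) / Q,
        (i, v - h, ell): h * max(-bV, 0.0) / Q,
    }
    for io in range(len(Lambda)):                     # regime switches
        if io != ell:
            probs[(i, v, io)] = h**2 * Lambda[ell, io] / Q
    return probs, h**2 / Q

params = (np.array([2.0, 1.5]), np.array([0.2, 0.3]), np.array([0.5, 0.4]),
          np.array([0.3, 0.2]), np.array([0.1, 0.1]),
          np.array([[-1.0, 1.0], [2.0, -2.0]]))
probs, dt = transition_law(0.3, 0.2, 0, 1.0, 0.5, 0.5, 0.01, params)
print(round(sum(probs.values()), 12), dt)   # probabilities sum to 1
```

By construction, the probabilities are nonnegative and sum to one, and the first and second conditional moments of one step match the drift and diffusion of (4.3) up to o(∆t^h), which is exactly the local consistency required by the convergence theory.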
For (i, v, ℓ) ∈ T_h and (C^h, p^h, q^h) ∈ A_{i,v,ℓ}^h, the cost functional for the controlled Markov chain is given by

J^h(i, v, ℓ, C^h, p^h, q^h) = E Σ_{n=0}^{η_h−1} e^{−δt_n^h} F(I_n^h, α_n^h, C_n^h, p_n^h, q_n^h)∆t^h(I_n^h, V_n^h, α_n^h, C_n^h, p_n^h, q_n^h), (4.8)

where η_h = min{n ∈ Z_+ : I_n^h ≤ ξ}. The value function is

V^h(i, v, ℓ) = inf_{(C^h,p^h,q^h)∈A_{i,v,ℓ}^h} J^h(i, v, ℓ, C^h, p^h, q^h). (4.9)

The main convergence result in this case is given below.

Theorem 4.2. Let V(·) and V^h(·) be the value functions defined in (4.5) and (4.9), respectively. Then V^h(i, v, ℓ) → V(i, v, ℓ) as h → 0.

Throughout this section, we suppose the discounting factor is δ = 0.05. Also, we work with M = {1, 2} and ξ = 0.02.

Example 5.1. We start with the model given by (2.5); that is,

dI(t) = I(t)[λ_{α(t)}(1 − I(t)) − µ_{α(t)} − γ_{α(t)} − C(t)]dt + σ_{α(t)}I(t)(1 − I(t))dw(t).

For (i, ℓ) ∈ (ξ, 1] × M, we take the initial control C_0(i, ℓ) ≡ max U and set the initial value V_0^h(i, ℓ) ≡ 0. We outline how to find the sequence V_n^h(·) as follows. For each (i, ℓ) ∈ S_h and control c ∈ U, we compute

V_{n,c}^h(i, ℓ) = e^{−δ∆t^h(i,ℓ,c)} Σ_{(i′,ℓ′)} p^h((i, ℓ), (i′, ℓ′)|c)V_n^h(i′, ℓ′) + F(i, ℓ, c)∆t^h(i, ℓ, c).

Note also that if i ≤ ξ, then V_n^h(i, ℓ) = 0 for any ℓ and n. We choose the control C_{n+1}^h(i, ℓ) and record an improved value V_{n+1}^h(i, ℓ) by

C_{n+1}^h(i, ℓ) ∈ argmin_{c∈U} V_{n,c}^h(i, ℓ), V_{n+1}^h(i, ℓ) = min_{c∈U} V_{n,c}^h(i, ℓ).

The iterations stop as soon as the increment V_n^h(·) − V_{n+1}^h(·) reaches a predetermined tolerance level. We set the error tolerance to 10^{−8}. Our specific example is motivated by [13, Example 6.2.1], from which the system parameters and the generator Λ of the Markov chain α(·) are taken. The set of controls is given by U = {k/5 : 0 ≤ k ≤ 15}; thus, there are 16 control levels. In the first experiment, we consider the cost function

F(i, ℓ, c) = 1 + ℓi + ℓic²; (5.2)

see Remark 2.1 for its interpretation. The value function and optimal control are shown in Figure 1. In each regime ℓ, there is a level L(ℓ) such that the optimal control is the maximum control effort for i < L(ℓ). It appears that the value function in regime 2 is higher than that in regime 1, while the optimal control in regime 2 is strictly smaller than that in regime 1, possibly because of the higher cost function in regime 2; that is, F(i, 2, c) > F(i, 1, c) for any (i, c) ∈ (0, 1) × U. In the second experiment, we consider the cost function aiming at minimizing the sum of the eradication time and the total epidemic size. The value function and optimal control are shown in Figure 2. The optimal control in each regime has a shape similar to that in the preceding experiment. Although F(i, 2, c) > F(i, 1, c) for any (i, c) ∈ (0, 1) × U, the optimal control for i ∈ (0.4, 0.6) in regime 2 is higher than that in regime 1. In the third experiment, the value function and optimal control are shown in Figure 3. The results in Figure 3 tell us that the control cost is small enough that we should apply the maximum possible control in any regime.

Example 5.2. We consider the model with vaccination given by (4.2); that is, the pair (I(·), V(·)) evolves according to (4.3). We use the same parameters and control set as in the preceding example. In addition,

F(i, ℓ, c, p, q) = 1 + ℓi + 2ℓic² + (0.1)p + (0.1)q.

We take the initial control and the initial value analogously to Example 5.1. We outline how to find the sequence of costs V_n^h(·) as follows. For each (i, v, ℓ) ∈ T_h and control (c, p, q) ∈ U × V_p × V_q, we compute

V_{n,c,p,q}^h(i, v, ℓ) = e^{−δ∆t^h(i,v,ℓ,c,p,q)} Σ_{(i′,v′,ℓ′)} p^h((i, v, ℓ), (i′, v′, ℓ′)|c, p, q)V_n^h(i′, v′, ℓ′) + F(i, ℓ, c, p, q)∆t^h(i, v, ℓ, c, p, q),

and then minimize over (c, p, q) as before. The iterations stop as soon as the increment V_{n+1}^h(·) − V_n^h(·) reaches the tolerance level. Intuitively, the shape of the optimal control will depend on how large the coefficients of the cost function are. The value function and optimal controls are displayed in Figures 4, 5, 6, and 7. It can be seen that, with the presence of an effective vaccination, the value function shown in Figure 4 is much lower than that of Figure 1 in Example 5.1. Moreover, Figure 5 reveals that one should use a lower control C(·) compared to the case with no vaccination. Figures 6 and 7 provide the optimal vaccination plan.
It appears that the optimal vaccination plan depends on the regime of the environment. The numerical studies emphasize the importance of effective vaccines with low costs in controlling epidemics.

Figure 7: The optimal control q(·) in regime 1 (left) and in regime 2 (right).

This paper focused on numerical methods for optimal control of SIS epidemic models. We considered two models incorporating treatments, isolation, and vaccination. Using the Markov chain approximation method, we are able to treat a hybrid model affected by two types of environmental fluctuations with a general cost consideration. The convergence of the algorithms was proved. Several numerical examples were used to demonstrate the performance of our algorithm. Some interesting questions deserve further investigation. One can study a set of mixed controls consisting of vaccination, isolation, social distancing, etc. One can also apply the approach in this work to investigate a variety of control problems for epidemic models, which has become an urgent issue due to COVID-19. Thus, the techniques and the simulation study might be of interest to researchers in various disciplines.

Proof of Theorem 2.1. The proof is a special case of that of Theorem 4.1 given below when we take V(0) = p(t) = q(t) = 0 for any t ≥ 0 with probability one.

Proof of Theorem 4.1. Since the coefficients of (4.2) are locally Lipschitz continuous, there is a unique local solution (S(t), I(t), V(t), α(t)) on t ∈ [0, ζ), where ζ is the explosion time. Now, by (4.2), S + I + V satisfies

d(S + I + V)(t) = µ_{α(t)}[1 − (S + I + V)(t)]dt, (A.1)

so that S(t) + I(t) + V(t) = 1 for all t ∈ [0, ζ). Moreover, from the third equation in (4.2),

V(t) = v_0 e^{−∫_0^t (µ_{α(u)}+ε_{α(u)})du} + ∫_0^t e^{−∫_s^t (µ_{α(u)}+ε_{α(u)})du}(µ_{α(s)}q_0 + p_0S(s))ds ≥ 0. (A.2)

Let k_0 be a sufficiently large positive integer such that s_0, i_0 ∈ (1/k_0, 1). For each k ≥ k_0, we define

τ_k = inf{t ∈ [0, ζ) : min{S(t), I(t)} ≤ 1/k}.

Clearly, the sequence {τ_k} is monotonically increasing. Let τ_∞ := lim_{k→∞} τ_k; then τ_∞ ≤ ζ. It suffices to show that τ_∞ = ∞ with probability one. If this were false, there would exist T > 0 and ε > 0 such that P{τ_∞ ≤ T} > ε. Therefore, we can find some k_1 ≥ k_0 such that

P{τ_k ≤ T} ≥ ε for all k ≥ k_1. (A.3)

We fix ω ∈ Ω. For t ∈ [0, τ_∞), by (A.1) and (A.2), I(t) ∈ (0, 1]. To proceed, we consider a function Φ(s, i) (for instance, Φ(s, i) = 1/s + 1/i on (0, 1]²). Note that for our study, it suffices to consider Φ to be independent of the switching states. We have Φ(s, i) > 0 and LΦ(s, i) ≤ KΦ(s, i) for some positive constant K independent of (s, i, v, ℓ), where L denotes the generator associated with (4.2). By Dynkin's formula, we obtain for k ≥ k_1 and t ∈ [0, T] that

EΦ(S(τ_k ∧ t), I(τ_k ∧ t)) = Φ(s_0, i_0) + E ∫_0^{τ_k∧t} LΦ(S(u), I(u))du ≤ Φ(s_0, i_0) + K ∫_0^t EΦ(S(τ_k ∧ u), I(τ_k ∧ u))du,

and hence, by Gronwall's inequality,

EΦ(S(τ_k ∧ T), I(τ_k ∧ T)) ≤ Φ(s_0, i_0)e^{KT}. (A.6)

Note that for each ω ∈ {τ_k ≤ T}, I(τ_k(ω)) ≤ 1/k or S(τ_k(ω)) ≤ 1/k. It follows from the definition of Φ(·, ·) that Φ(S(τ_k(ω)), I(τ_k(ω))) ≥ k. In view of (A.3) and (A.6), we obtain Φ(s_0, i_0)e^{KT} ≥ kε, leading to a contradiction as k → ∞. Thus, ζ = τ_∞ = ∞ with probability one. The conclusion follows.

The proofs below are motivated by [21, 29]. Let D[0, ∞) denote the space of functions that are right continuous and have left-hand limits, endowed with the Skorohod topology. All the weak convergence analysis will be on this space or its k-fold products D^k[0, ∞) for appropriate k. We will provide a sketch of the proofs and refer to the well-known references for the details.

Proof of Theorem 3.5. (a) The tightness of {α^h(·)} is obvious by Lemma 3.4. For the other components, we use the tightness criteria in [20, p. 47] for sequences of processes ζ^h(·) with paths in D^k[0, ∞), stated for arbitrary T_0, ρ ∈ (0, ∞). The detailed proof of the tightness is standard; see [29]. Thus, H^h(·) is tight. As a result, a subsequence of H^h(·) converges weakly to a limit H(·) = (I(·), α(·), B(·), M(·), m(·), τ). Since the sizes of the jumps of I^h(·), B^h(·), and M^h(·) go to zero as h → 0, the processes I(·), B(·), and M(·) have continuous paths with probability one. By local consistency, the quadratic variation of M^h(·) can be written as ∫_0^t a(I^h(u), α^h(u))du + ε_3^h(t), with lim_{h→0} sup_{0≤t≤T_0} E|ε_3^h(t)| = 0 for each T_0 ∈ (0, ∞). By the Burkholder-Davis-Gundy inequality, there is a positive constant K such that

E|M^h(t)|² ≤ KE[∫_0^t a(I^h(u), α^h(u))du + ε_3^h(t)] ≤ K(t + 1)

for any h > 0. Thus, the family {M^h(t) : h > 0} is uniformly integrable. For any ρ > 0,

E[(M^h(t + ρ) − M^h(t))² | F^h(t)] = E[∫_t^{t+ρ} a(I^h(u), α^h(u))du | F^h(t)] + ε_4^h(ρ), (A.7)

where E|ε_4^h(ρ)| → 0 as h → 0.
To characterize M^h(·), let r be an arbitrary integer, t > 0, ρ > 0, and let {t_k : k ≤ r} be such that t_k ≤ t < t + ρ for each k. Let Ψ(·) be a real-valued continuous function of its arguments with compact support. Then, in view of (A.7) and the martingale property of M^h(·), we have

EΨ(I^h(t_k), α^h(t_k), M^h(t_k) : k ≤ r)[M^h(t + ρ) − M^h(t)] = 0.

Letting h → 0 and using the weak convergence together with the uniform integrability of {M^h(t)}, one concludes that M(·) is a continuous martingale with quadratic variation ∫_0^t a(I(u), α(u))du; assertions (c) and (d) then follow.

Proof of Theorem 3.6 and Theorem 4.2. The proofs are modifications of that of [29, Theorem 7]; hence, we omit the details for brevity.

References

[1] Infectious diseases of humans: dynamics and control.
[2] Optimal control of deterministic epidemics.
[3] Optimal immunity control by social distancing for the SIR epidemic model.
[4] Mathematical epidemiology: past, present, and future, Infectious Disease Modelling.
[5] Optimal control of epidemic size and duration with limited resources.
[6] Time-optimal control strategies in SIR epidemic models.
[7] Optimal vaccination and treatment of an epidemic network model.
[8] An explicit optimal isolation policy for a deterministic epidemic model.
[9] Classification of asymptotic behavior in a stochastic SIR model.
[10] Contact rate epidemic control of COVID-19: an equilibrium view.
[11] Optimal control and the value of information for a stochastic epidemiological SIS-model.
[12] A stochastic differential equation SIS epidemic model.
[13] The SIS epidemic model with Markovian switching.
[14] Optimal control of epidemics with limited resources.
[15] Global analysis of SIS epidemic models with variable total population size.
[16] Brownian Motion and Stochastic Calculus.
[17] Stochastic contagion models without immunity: their long term behaviour and the optimal level of treatment, Central European Journal of Operations Research (CEJOR).
[18] A contribution to the mathematical theory of epidemics, Proceedings of the Royal Society of London, Series A, Containing Papers of a Mathematical and Physical Character.
[19] Stochastic epidemic metapopulation models on networks: SIS dynamics and control strategies.
[20] Approximation and Weak Convergence Methods for Random Processes.
[21] Numerical Methods for Stochastic Control Problems in Continuous Time.
[22] Qualitative analyses of SIS epidemic model with vaccination and varying total population size.
[23] Stability analysis for SIS epidemic models with vaccination and constant population size.
[24] Epidemic control via stochastic optimal control.
[25] On the optimal control of a deterministic epidemic.
[26] Long-term analysis of a stochastic SIRS model with general incidence rates.
[27] Stochastic partial differential equation SIS epidemic models: modeling and analysis.
[28] Optimal vaccine allocation to control epidemic outbreaks in arbitrary networks.
[29] Numerical methods for controlled regime-switching diffusions and regime-switching jump diffusions.
[30] Extinction and permanence in a stochastic SIRS model in regime-switching with general incidence rate.
[31] Hybrid Switching Diffusions: Properties and Applications.
[32] Discrete-time singularly perturbed Markov chains: aggregation, occupation measures, and switching diffusion limit.
[33] Stationary distribution of stochastic SIS epidemic model with vaccination under regime switching.
[34] The threshold of a stochastic SIS epidemic model with vaccination.