A Dynamic Population Model of Strategic Interaction and Migration under Epidemic Risk

Ezzat Elokda, Saverio Bolognani, Ashish R. Hota — September 7, 2021
(The research leading to these results was partly supported by the Swiss National Science Foundation (SNSF) via the NCCR Automation.)

Abstract: In this paper, we show how a dynamic population game can model the strategic interaction and migration decisions made by a large population of agents in response to epidemic prevalence. Specifically, we consider a modified susceptible-asymptomatic-infected-recovered (SAIR) epidemic model over multiple zones. Agents choose whether to activate (i.e., interact with others), how many other agents to interact with, and which zone to move to, on a time-scale which is comparable with the epidemic evolution. We define and analyze the notion of equilibrium in this game, and investigate the transient behavior of the epidemic spread in a range of numerical case studies, providing insights on the effects of the agents' degree of future awareness, strategic migration decisions, as well as different levels of lockdown and other interventions. One of our key findings is that the strategic behavior of agents plays an important role in the progression of the epidemic and can be exploited in order to design suitable epidemic control measures.

Infectious diseases or epidemics spread through society by exploiting social interactions. As a disease becomes more prevalent, individuals reduce their social interactions and even migrate to safer locations in a strategic and nonmyopic manner [1], [2], which plays a significant role in epidemic evolution. Accordingly, past work has explored decentralized or game-theoretic protection strategies against epidemics on (static) networks [3]-[7]. More recently, the evolution of network topology and epidemic states on a comparable time-scale has been studied in the framework of activity-driven networks [8], [9]. Game-theoretic decision-making in this framework was recently studied in [10], where myopic, boundedly rational agents decide whether to activate or not as a function of the epidemic prevalence. To the best of our knowledge, there have been few rigorous game-theoretic formulations that model agents that
• decide their degree of activation, and consequently influence the resulting network topology,
• decide whether to migrate to different locations, and
• maximize both current and long-run future payoff on the same time-scale as the epidemic evolution.

In this work, we present a framework to address the above research gap. Motivated by the presence of asymptomatic carriers in COVID-19 [11], [12], we build upon the SAIR epidemic model studied in [13], [14]. We consider a large population regime where the state of an individual agent is characterized by its infection state and its location (or zone). At discrete time instants, each agent decides its degree of activation and its next location. The agent is then paired randomly with other agents, and its infection state evolves following an augmented SAIR epidemic model which also takes into account unknowingly recovered agents, as described in Section II. Agents maximize a discounted infinite horizon expected reward which is a function of the aggregate infection state, the zonal distribution of the agents, and the policy followed by the population.
In contrast with the conventional assumption of a static population distribution in the classical population game setting [15], epidemic evolution leads to a dynamically evolving population, which makes the analysis challenging. We utilize the recently developed framework of dynamic population games [16], in which the authors show a reduction of the dynamic setting to a static population game setting [15]. This simplifies the analysis in comparison to existing approaches such as anonymous sequential games [17], [18] and mean field games [19]-[21], and is particularly useful for epidemic models. As a consequence of the reduction, standard evolutionary models [15] can be adapted for the coupled dynamics of the agents' states and strategic decision making, which evolve on the same time-scale.

The paper is structured as follows: the dynamic population model is presented in Section II, and its stationary equilibria are analyzed in Section III. The evolutionary update of agents' policies is modeled in Section IV. Numerical experiments reported in Section V provide compelling insights into agents' behavior, the effects of lockdown measures, and strategic mobility patterns. For instance, we observe that if recovered agents are exempt from lockdown measures, then susceptible and asymptomatic agents can sustain an increased level of activity without much impact on the peak and total infections. Their strategic behavior does not lead to a higher infection level, and the social welfare improves due to overall higher activity levels.

We consider a homogeneous population of non-atomic agents or individuals. The state and the dynamics of this population are described by the following elements.

We augment the SAIR epidemic model to distinguish between recovered agents who are aware of being recovered and those who are recovered but unaware of ever having been infected. Specifically, each agent can be in one of the following infection states: susceptible (S), asymptomatically infected (A), symptomatically infected (I), recovered (R), and unknowingly recovered (U) (agents that have recovered without ever showing symptoms). Agents in states R and U are immune from further infection. Moreover, each agent resides in one of Z zones or locations. Formally, we define the state of each agent as (s, z) ∈ S × Z, where S = {S, A, I, R, U} and Z is the set of zones. The state distribution of the population is denoted by d ∈ D := ∆(S × Z), where ∆(X) is the space of probability distributions supported on X. We write d[s, z] to denote the proportion of agents with infection state s residing in zone z.

We consider a dynamic environment that evolves in discrete time (e.g., each time interval representing a day). At each time step, each agent strategically chooses:
• its activation degree a ∈ A = {0, 1, . . . , a_max}, which denotes the number of other agents it chooses to interact with (a = 0 signifies no activation), and
• the zone z̄ ∈ Z where to move for the next day.
The combined action is denoted (a, z̄) ∈ A × Z. A (Markovian) policy is denoted by π : S × Z → ∆(A × Z), and it maps an agent's state (s, z) ∈ S × Z to a randomization over the actions (a, z̄) ∈ A × Z. The set of all possible policies is denoted by Π. Explicitly, π[a, z̄ | s, z] is the probability that an agent plays (a, z̄) when in state (s, z). All agents are homogeneous and follow the same policy π. Further, agents that have never shown symptoms act in the same way, i.e., π[· | S, z] = π[· | A, z] = π[· | U, z]. Note that π is time-varying; agents change their strategies as the epidemic unfolds. The dynamics of π are detailed in Section IV.
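To fix ideas, here is a minimal sketch of how the population state d and the policy π can be represented as arrays (Python/NumPy; the variable names and the illustrative initial condition are ours, not the paper's):

```python
import numpy as np

S_, A_, I_, R_, U_ = range(5)  # infection states S, A, I, R, U
Z = 2                          # number of zones
A_MAX = 6                      # maximum activation degree a_max

# Population state d[s, z]: proportion of agents with infection state s in zone z.
d = np.zeros((5, Z))
d[S_, 0], d[A_, 0], d[I_, 0] = 0.87, 0.02, 0.01  # epidemic seeded in zone 0
d[S_, 1] = 0.10                                  # zone 1 initially infection-free
assert np.isclose(d.sum(), 1.0)                  # d is a distribution over S x Z

# Policy pi[s, z, a, z_next]: probability of action (a, z_next) in state (s, z).
# Initialized uniform over activation degrees, with no planned migration.
pi = np.zeros((5, Z, A_MAX + 1, Z))
for z in range(Z):
    pi[:, z, :, z] = 1.0 / (A_MAX + 1)

# Agents that have never shown symptoms follow the same policy:
pi[A_] = pi[S_]
pi[U_] = pi[S_]
```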
We now derive a dynamic model of the evolution of the state distribution d when the agents adopt a policy π. We denote the policy-state pair (π, d) as the social state. The state of each agent changes at every time step according to transition probabilities encoded by the stochastic matrix

P[s⁺, z⁺ | s, z](π, d) = Σ_{(a, z̄) ∈ A × Z} π[a, z̄ | s, z] p[s⁺, z⁺ | s, z, a, z̄](π, d),   (1)

where p[s⁺, z⁺ | s, z, a, z̄](π, d) denotes the probability distribution over the next state when an agent in infection state s and zone z chooses action (a, z̄) in social state (π, d). Note that the Markov chain P[s⁺, z⁺ | s, z](π, d) is not time-homogeneous, as the social state (π, d) is time-varying. The state transition function is defined as

p[s⁺, z⁺ | s, z, a, z̄](π, d) = P[s⁺ | s, z, a](π, d) · P[z⁺ | z̄],   (2)

where the zone transition probabilities are assumed to be independent of (π, d) and given by P[z⁺ | z̄] = 1 if z⁺ = z̄ and P[z⁺ | z̄] = 0 otherwise, i.e., agents reach their chosen zone deterministically.

In order to derive the infection state transition probabilities P[s⁺ | s, z, a](π, d) (schematically represented in Figure 1), we combine the transition rules of the (augmented) SAIR model with the specific activation actions as follows.
• At a given time, an agent in state s in zone z chooses its activation degree a according to policy π. Then, it is paired randomly with up to a other individuals in zone z, with the probability of being connected with another agent being proportional to the activation degree of the target agent (analogous to the configuration model [22]).
• The agent could also fail to pair with one or more of the a other individuals. This occurs with higher probability when the total amount of activity in the zone (to be defined hereafter) is low.
• Once the network is formed, a susceptible agent becomes asymptomatically infected with probability β_A ∈ [0, 1] per asymptomatic neighbor and with probability β_I ∈ [0, 1] per symptomatically infected neighbor.
• An asymptomatic agent becomes symptomatically infected with probability δ_A^I ∈ (0, 1], and recovers without being aware of it with probability δ_A^U ∈ [0, 1].
• A symptomatically infected agent recovers with probability δ_I^R ∈ (0, 1].
• An individual in state U becomes aware of its recovery with probability δ_U^R ∈ [0, 1] (for example, via serological tests on the population).
• The network thus formed is discarded at the next time step, and the process repeats.

[Figure 1: Infection state transition diagram. Self loops are not shown.]

Note that, with the exception of the transition from state S to A, all other state transition probabilities are defined via exogenous parameters and do not depend on the social state. In order to compute the transition probability from S to A, we define the total amount or mass of activity in zone z as

e_z(π, d) = Σ_{s ∈ S} Σ_{(a, z̄) ∈ A × Z} a · π[a, z̄ | s, z] · d[s, z],   (3)

which is determined by the mass of active agents and their degrees of activation. Similarly, the masses of activity by asymptomatic and symptomatic agents in zone z are

e_z^A(π, d) = Σ_{(a, z̄)} a · π[a, z̄ | A, z] · d[A, z],   e_z^I(π, d) = Σ_{(a, z̄)} a · π[a, z̄ | I, z] · d[I, z].

In order to account for the event of failing to pair with an agent when the amount of activity in the zone e_z(π, d) is low, we introduce a small constant amount ε > 0 of fictitious activation that does not belong to any of the agents. Consequently, in zone z, the probability of not interacting with any agent, the probability of a randomly chosen agent being asymptomatic, and the probability of a randomly chosen agent being symptomatic are, respectively,

p_z^0(π, d) = ε / (e_z(π, d) + ε),   p_z^A(π, d) = e_z^A(π, d) / (e_z(π, d) + ε),   p_z^I(π, d) = e_z^I(π, d) / (e_z(π, d) + ε).   (4)

Note that, for a given ε, the probability of encountering an infected agent (symptomatically or not) goes to zero as the amount of infections goes to zero, as desired.
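Continuing the sketch above, the activity masses in (3) and the contact probabilities in (4) could be computed as follows (ε is the fictitious activation mass; this is our reading of the reconstructed formulas, not the authors' code):

```python
EPS = 1e-3  # fictitious activation mass epsilon > 0

def activity_masses(pi, d):
    """Per-zone activity mass e_z of (3), plus the A and I contributions."""
    degrees = np.arange(A_MAX + 1)
    # act[s, z] = d[s, z] * sum_{a, z_next} a * pi[a, z_next | s, z]
    act = np.einsum('szab,a->sz', pi, degrees) * d
    e = act.sum(axis=0)             # e_z: total activity in each zone
    return e, act[A_], act[I_]      # e_z, e_z^A, e_z^I

def contact_probabilities(pi, d):
    """Eq. (4): per contact, prob. of no partner / asymptomatic / symptomatic."""
    e, eA, eI = activity_masses(pi, d)
    denom = e + EPS
    return EPS / denom, eA / denom, eI / denom  # p_z^0, p_z^A, p_z^I
```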
As a result, the probability that a susceptible agent does not get infected upon activation with degree a is

P[S | S, z, a](π, d) = (1 − β_A p_z^A(π, d) − β_I p_z^I(π, d))^a.

It is easy to see that when a susceptible agent does not interact with any other agent (i.e., a = 0), it remains susceptible. When it participates in exactly one interaction (a = 1) in zone z, the probability that its neighbor is asymptomatic is p_z^A(π, d) and the probability that it is symptomatic is p_z^I(π, d), so the agent avoids infection with probability 1 − β_A p_z^A(π, d) − β_I p_z^I(π, d). When it draws a > 0 independent agents to interact with, it must not get infected in any of the interactions to remain susceptible, and this occurs with the specified probability. As a consequence, we have

P[A | S, z, a](π, d) = 1 − P[S | S, z, a](π, d).

The remaining transition probabilities follow directly as

P[I | A] = δ_A^I,  P[U | A] = δ_A^U,  P[A | A] = 1 − δ_A^I − δ_A^U,
P[R | I] = δ_I^R,  P[I | I] = 1 − δ_I^R,  P[R | U] = δ_U^R,  P[U | U] = 1 − δ_U^R,  P[R | R] = 1,

where we have suppressed (z, a, π, d) for better readability. These expressions completely specify P[s⁺ | s, z, a](π, d) (Figure 1) and therefore the state transition function (2).

Each agent's own state (s, z) and action (a, z̄) yields an immediate reward for the agent, composed of a reward r_act[s, z, a] for its activation decision, a reward r_mig[s, z, z̄] for its migration decision, and a reward r_dis[s] for how the agent's health is affected by the disease. Formally,

r[s, z, a, z̄] = r_act[s, z, a] + r_mig[s, z, z̄] + r_dis[s].   (5)

The activation reward is defined as

r_act[s, z, a] = o[a] − c[s, z, a],

where o[a] ∈ R_+ denotes the social benefit of interacting with a other agents and is assumed to be non-decreasing in a with o[0] = 0, and c[s, z, a] ∈ R_+ denotes the cost imposed by the authority in zone z to discourage activity. We assume that the costs c[s, z, a] are non-decreasing in a and satisfy

c[R, z, a] ≤ c[S, z, a] = c[A, z, a] = c[U, z, a] ≤ c[I, z, a]

element-wise, since lockdown measures can be more stringent against individuals showing symptoms and more benign for individuals who are known to be immune. The migration reward encodes the non-negative cost of migrating to a new zone. In this work, we define

r_mig[s, z, z̄] = −c_mig if z̄ ≠ z, and 0 otherwise,

with migration cost c_mig ≥ 0. However, one may consider a richer cost function that incorporates specific travel restrictions between zones, etc. The third term in (5) encodes the cost of being ill:

r_dis[s] = −c_dis if s = I, and 0 otherwise.

We now introduce the strategic decision-making process of the agents. The immediate expected reward of an agent in state (s, z) when it follows policy π is

R[s, z](π) = Σ_{(a, z̄) ∈ A × Z} π[a, z̄ | s, z] r[s, z, a, z̄],

with r[s, z, a, z̄] as defined in (5). The expected discounted infinite horizon reward of an agent in state (s, z) with discount factor α ∈ [0, 1), following the homogeneous policy π, is recursively defined as

V[s, z](π, d) = R[s, z](π) + α Σ_{(s⁺, z⁺)} P[s⁺, z⁺ | s, z](π, d) V[s⁺, z⁺](π, d),   (6)

or, equivalently in vector form, V(π, d) = (I − α P(π, d))^{-1} R(π). Equation (6) is the well-known Bellman equation. Note that it is continuous in the social state (π, d), as I − α P(π, d) is guaranteed to be invertible for α ∈ [0, 1). While an agent can compute the expected discounted reward V(π, d) at a given social state (π, d), the policy π may not be optimal for the agent. Thus, we assume that each agent chooses its current action (a, z̄) in order to maximize

Q[s, z, a, z̄](π, d) := r[s, z, a, z̄] + α Σ_{(s⁺, z⁺)} p[s⁺, z⁺ | s, z, a, z̄](π, d) V[s⁺, z⁺](π, d),   (7)

i.e., the agent is aware of the immediate reward and the effect of its action on its future state; however, it assesses the future reward based on a stationarity assumption on the social state (π, d). In other words, the agent chooses its action to maximize a single-stage deviation from the homogeneous policy π [23, Section 2.7], and assumes that its own actions are not going to affect the social state significantly. We start by introducing the notion of best response based on the single-stage deviation reward defined in (7).
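As an aside, once the transition matrix P(π, d) and the immediate rewards are assembled (we leave that assembly abstract here), evaluating (6) and (7) reduces to a linear solve and a matrix-vector product. A minimal sketch, with states flattened to a single index and all names our own:

```python
ALPHA = 0.9  # discount factor alpha

def value_function(P, R, alpha=ALPHA):
    """Solve the Bellman equation (6): V = R + alpha * P @ V.

    P: (n, n) row-stochastic transition matrix P(pi, d) over flattened
    states (s, z); R: (n,) immediate expected rewards under policy pi."""
    n = P.shape[0]
    # (I - alpha * P) is invertible for alpha in [0, 1).
    return np.linalg.solve(np.eye(n) - alpha * P, R)

def q_values(r_sz, p_sz, V, alpha=ALPHA):
    """Single-stage deviation rewards (7) for one state (s, z).

    r_sz: (m,) immediate reward of each action (a, z_next);
    p_sz: (m, n) next-state distribution for each action."""
    return r_sz + alpha * p_sz @ V
```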
Definition 1 (Best response). The best response of an agent in state (s, z) at the social state (π, d) is the set-valued correspondence B_{s,z} : Π × D ⇒ ∆(A × Z) given by

B_{s,z}(π, d) := argmax_{σ ∈ ∆(A × Z)} Σ_{(a, z̄) ∈ A × Z} σ[a, z̄] Q[s, z, a, z̄](π, d).

The above notion of best response is from the perspective of an individual agent in state (s, z) when all other agents are following the homogeneous policy π and their states are distributed as per d. The agent will choose any distribution σ over the actions which maximizes its expected single-stage deviation reward Q at the current state (s, z). Consequently, a social state (π, d) is stationary when agents in all states are playing their best response when they follow the policy π, and the state distribution d is stationary under this policy. Thus, we have the following definition of a stationary equilibrium.

Definition 2 (Stationary equilibrium). A stationary equilibrium is a social state (π, d) ∈ Π × D which satisfies

π[· | s, z] ∈ B_{s,z}(π, d) for all (s, z) ∈ S × Z, and d = d P(π, d).

Thus, at the equilibrium, the state Markov chain P(π, d) in (1) is time-homogeneous, and the agents behave optimally in the corresponding Markov decision process [23]. Note that we denote stationary equilibria with boldface notation. When (π, d) is a stationary equilibrium, π corresponds to the Nash equilibrium under state distribution d.

Theorem 1 (Theorem 1 in [16]). A stationary equilibrium (π, d) for the proposed dynamic population game is guaranteed to exist.

We refer to [16] for the details of the proof, which relies on a fixed-point argument. There, a general dynamic population game is considered in which the state and action spaces are finite, and the state transition and reward functions are continuous in the social state (π, d). By definition, our model is an instance of such a dynamic population game.

The following proposition shows that the stationary distribution d does not have any asymptomatic or infected agents (i.e., eventually, the epidemic dies out). The final stationary distribution is, however, not unique.

Proposition 1. Let A*_{s,z} := argmax_{a ∈ A} (o[a] − c[s, z, a]) denote the reward-maximizing activation degrees of an agent in infection state s ∈ {S, R} and zone z, let r*_act[s, z] denote the corresponding maximal activation reward, and let Z̄_s := argmax_{z ∈ Z} r*_act[s, z]. Then, any social state (π, d) satisfying (i) d is supported only on the compartments S and R, (ii) agents in states S and R activate with degrees in A*_{s,z}, and (iii) no agent resides in a zone from which migrating to a zone in Z̄_s is a profitable single-stage deviation, is a stationary equilibrium.

The proof is presented in the extended version [24]. Proposition 1 shows how stationary equilibria can be computed without solving a fixed-point problem, and directly identifies some dominant strategies for the agents. The identification of dominant strategies is insightful for the design of interventions (e.g., lockdown measures for the different compartments). For example, it is possible to verify that the maximum activation degree a = a_max corresponds to the dominant activation strategy for agents in R (knowingly immune agents) whenever they are exempt from lockdown costs. Notice that the activation caused by immune agents appears in the denominator of the probability that a generic agent interacts with an infectious agent (see (4)), and therefore looser lockdown measures for recovered agents can be used to reduce the spreading, as shown in the simulations in Section V.

Different stationary equilibria correspond to drastically different outcomes in terms of the impact of the epidemic on the population. For this reason, we investigate the transient behavior leading to the equilibrium, which is determined by the state dynamics from Section II-C, i.e., d⁺ = d P(π, d), and by the way in which agents update their policies. For this second part, we take inspiration from the evolutionary dynamic models in classical population games [15], and more precisely from the perturbed best response dynamics. We assume that the agents are not perfectly rational, with bounded rationality factor λ ∈ [0, ∞). When making a decision on which action to play, they follow the logit choice function [15, Section 6.2], given by

π̃[a, z̄ | s, z](π, d) = exp(λ Q[s, z, a, z̄](π, d)) / Σ_{(a', z̄')} exp(λ Q[s, z, a', z̄'](π, d)).
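A numerically stable sketch of this logit choice rule, applied to the vector of Q-values of a single state (s, z) (the max-shift is a standard implementation trick, not part of the model):

```python
LAMBDA = 10.0  # bounded rationality factor lambda

def logit_choice(q, lam=LAMBDA):
    """Perturbed best response: probabilities proportional to exp(lam * Q)."""
    w = np.exp(lam * (q - q.max()))  # subtract max for numerical stability
    return w / w.sum()
```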
For λ = 0, the logit choice function results in a uniform distribution over all actions; in the limit λ → ∞, we recover the perfect best response. At finite values of λ, it assigns higher probabilities to actions with higher payoffs. In order to model the fact that agents update their policies gradually, we consider the discrete-time update

π⁺ = (1 − η) π + η π̃(π, d),

where η ∈ (0, 1] is a parameter that controls the rate of policy change: for η < 1, agents have inertia in their decision making, while for η = 1 agents promptly update their decisions to the perturbed best response π̃. Note that this update model leads to a perturbed version of the equilibrium policy π at the rest points, rather than the exact policy [15].

We present a select number of case studies to showcase
• the effect of agents' strategic activation decisions on the spread of the epidemic,
• the impact of lockdown measures on both epidemic containment and social welfare, and
• the effect of strategic migration decisions on how the epidemic spreads across multiple locations.

For this purpose, we consider an infectious epidemic characterized by β_A = β_I = 0.2, δ_A^I = δ_A^U = 0.08, and δ_I^R = 0.04. The agents can activate up to degree a_max = 6, and the activation reward is linear in the activation degree, with a unit reward for maximum activation, o[a_max] = 1. The illness is quite severe, with a discomfort cost c_dis = 10. Initially, we let the agents choose an activation degree uniformly at random and plan no move. The agents are highly rational (λ = 10) and, unless otherwise stated, update their decisions with inertia η = 0.2. We consider both a single-zone and a two-zone setting, and denote the zones by Z1 and Z2. In all cases, the epidemic starts in Z1 with 2% of that zone's population asymptomatic (A) and 1% infected (I). We further consider that authorities can enforce lockdown regulations through the parameter a_lock[s, z], which represents the maximum allowed activation degree and can differ between zones and for agents in different infection states. Lockdown is implemented by setting c[s, z, a] = 0 if a ≤ a_lock[s, z], and c[s, z, a] = 3 o[a] otherwise. Regardless of the lockdown measures, we always assume that the discomfort of the illness is sufficient to prevent symptomatically infected agents from activating. As a consequence, the main threat of the epidemic is due to the presence of asymptomatically infected agents in the population. The sketch below illustrates one simulated day of these coupled dynamics.
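Continuing the earlier sketches, one day of the coupled evolution — the state update d⁺ = d P(π, d) followed by the inertial policy update — might look as follows. Here build_P and build_Q are hypothetical stand-ins for the transition and deviation-reward assembly of the model; they are not spelled out in the paper:

```python
ETA = 0.2  # policy-update inertia eta

def step(pi, d, build_P, build_Q, eta=ETA):
    """One day of the coupled state and policy dynamics.

    build_P(pi, d) -> (n, n) matrix P(pi, d) over flattened states (s, z),
    ordered to match d.reshape(-1); build_Q(pi, d) -> Q[s, z, a, z_next]."""
    P = build_P(pi, d)
    d_next = (d.reshape(-1) @ P).reshape(d.shape)      # d+ = d P(pi, d)
    Q = build_Q(pi, d)
    pi_tilde = np.empty_like(pi)
    for s in range(pi.shape[0]):
        for z in range(pi.shape[1]):
            q = Q[s, z].reshape(-1)                    # Q over actions (a, z_next)
            pi_tilde[s, z] = logit_choice(q).reshape(pi.shape[2:])
    pi_next = (1 - eta) * pi + eta * pi_tilde          # inertial logit update
    # Never-symptomatic agents are indistinguishable and share one policy;
    # copying the S policy to A and U is our simplification of that constraint.
    pi_next[A_] = pi_next[S_]
    pi_next[U_] = pi_next[S_]
    return pi_next, d_next
```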
We first investigate the single-zone scenario, with a focus on how agents with different degrees of future awareness react under various lockdown measures, and the resulting effects on the epidemic spread. Figure 2 shows an example with lockdown degree a_lock = 2 and three cases. In cases (2a) and (2b), the lockdown is enforced on the whole population, and the cases differ in the agents' degree of future awareness. In (2a), the agents are completely myopic. Notice how they simply adhere to the lockdown degree, which aligns with standard epidemic models. Farsighted agents (2b), on the other hand, actively adjust their activation decisions in response to the epidemic threat, and volunteer to limit their activity beyond the lockdown requirement at peak infection times. The reduction in activity levels leads to a less severe epidemic spread, with smaller total and peak infections. Case (2c) also considers farsighted agents, but this time recovered agents are exempt from lockdown and thus activate at the maximum degree, as per their dominant strategy. This leads to a significant reduction in the total amount of infections. In fact, due to the prevalence of activity by immune agents, it becomes less likely for a susceptible agent to encounter an infected agent. Consequently, susceptible, asymptomatic, and unknowingly recovered agents also increase their level of activation. Nevertheless, the peak infection remains largely unchanged.

This insight is further explored in Figure 3, which shows the effect of different lockdown measures on three main performance indices: the total infections (R + U at the end of the epidemic), the peak infections (highest value of I), and the average welfare (mean reward (5) in the population over the duration of the epidemic). We depict the same cases considered in Figure 2 and perform a parameter sweep over the strictness of the lockdown a_lock. Additionally, we showcase the effect of performing serological tests, at rate δ_U^R = 0.05, to increase the number of knowingly immune agents. We observe that myopic agents perform poorly along all the performance metrics. For farsighted agents, the exemption of recovered agents and serological tests lead to significant improvements in the average welfare, because the knowingly immune agents are able to achieve their maximum activation degree without increasing the threat of the epidemic.

We showcase the effect of strategic migration in a setting with two zones, with zone Z1 initially holding 90% of the total population, and zone Z2 initially infection-free. In both zones, (knowingly) recovered agents are exempt from lockdown, and the lockdown restrictions for the other agents differ between the zones. Namely, Z1 has a looser lockdown, with a maximum allowed activation degree of 4, whereas Z2 only allows a maximum of 2. The migration cost is c_mig = 2, and the agents are farsighted with α = 0.9. Note that, with these parameters, susceptible agents will want to move to Z1 when the epidemic is not prevalent, as per Proposition 1. The inertia in the policy update is η = 0.1. Additionally, both zones perform serological testing at rate δ_U^R = 0.01.

The resulting epidemic spread is displayed in Figure 4. First, notice how the proportion of unknowingly recovered agents decays, in contrast to Figure 2, in which no serological testing is performed. We now focus on the strategic migration behavior of agents who are, or believe themselves to be, susceptible (S, A, U). Note that symptomatically infected and recovered agents never move, since they face no further risk of infection and the activation costs are the same for them in both zones. Initially, the epidemic risk is still small in Z1, and the occupants of Z2 start moving there to benefit from the more lenient lockdown measures. This trend soon reverses, however, with the rise of infections in Z1: the strategic agents elect to move to the zone with the stricter lockdown to escape the epidemic risk. Since a proportion of the movers are asymptomatically infected, this leads to an outbreak of the epidemic in Z2 as well, with a lower, but still significant, peak infection than in Z1. Eventually, once the infections in Z1 have decreased sufficiently (at approximately day 50), some Z2 residents move back to Z1, initiating a second wave of infections in Z1. At the end of the epidemic, all the remaining agents move from Z2 as per their dominant strategy.
In this paper, we propose a model of strategic behavior at both the individual and the societal level based on first principles, demonstrate its potential to explain complex activation and migration patterns, and show how it can guide the design of more effective epidemic control measures. We characterize the stationary equilibria in the proposed dynamic population game setting and illustrate how a better understanding of the emergent behavior can be leveraged to design effective mitigation strategies. For instance, we show that withdrawing restrictions on recovered agents leads to higher levels of activity by susceptible agents without increasing the peak and total infection levels. This observation provides a rigorous justification for conducting large-scale serological testing and letting individuals with known immunity interact freely, thereby significantly improving the welfare of society. Natural extensions of this model include considering the behavior of vaccinated individuals and designing optimal intervention strategies that incorporate the strategic response of agents.

Proof of Proposition 1. The first property follows from the fact that states A, I, and U are transient in the infection state Markov chain (see Figure 1) and evolve independently of the agents' migrations. We omit the formal steps, which follow standard Markov chain theory arguments. We now observe that, given a distribution that is only supported on the two compartments S and R, no transition between compartments of the extended SAIR model is possible. The reward for agents in both these compartments then becomes independent of the actions of others, and the set A*_{s,z} defines their dominant activation strategies.

We now consider the migration strategies of the agents at the equilibrium (π, d). We first consider an agent in infection state s ∈ {S, R} residing in zone z ∈ Z̄_s. In this zone, the activation reward is maximal among all zones for an agent in state s. Since the infection state remains unchanged, and migration is costly (c_mig > 0), migrating to a different zone does not lead to a beneficial single-stage deviation for the agent.

We now consider an agent in a zone z ∈ Z^n_s := Z \ (Z̄_s ∪ Z^0_s). According to policy π, the agent does not migrate to a different zone. Consequently, the value function for the agent is

V[s, z](π, d) = r*_act[s, z] / (1 − α) ≥ r*_act[s, z] − c_mig + α r*_act[s, z̄] / (1 − α), for z̄ ∈ Z̄_s.

In other words, an agent in a zone in Z^n_s does not find it beneficial to migrate anywhere else in a single-stage deviation.

It remains to show that an agent in a zone z ∈ Z^0_s finds it beneficial to move to a zone in Z̄_s, and consequently, we must have d[s, z] = 0 for z ∈ Z^0_s. Under policy π, we have π[a ∉ A*_{s,z} or z̄ ∉ Z̄_s | s, z] = 0. Consequently, the value function for the agent is

V[s, z](π, d) = r*_act[s, z] − c_mig + α r*_act[s, z̄] / (1 − α) > r*_act[s, z] / (1 − α), for z̄ ∈ Z̄_s;

in other words, the policy π yields a higher value compared to any policy that does not include migration. Similarly, one can show that migrating to a zone in Z^n_s instead does not yield a beneficial single-stage deviation, since r*_act[s, z'] < r*_act[s, z̄] for any z' ∈ Z^n_s and z̄ ∈ Z̄_s. ∎
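For concreteness, rearranging the stay-or-migrate comparison above (using the value functions as reconstructed here; the corresponding equations are garbled in the extracted text, so this is our inference) gives an explicit threshold separating Z^n_s from Z^0_s:

```latex
% Staying in zone z beats a single-stage migration to \bar z \in \bar{Z}_s iff
% the one-time migration cost outweighs the discounted gain in activation reward:
\[
  \frac{r^{*}_{\mathrm{act}}[s,z]}{1-\alpha}
  \;\ge\; r^{*}_{\mathrm{act}}[s,z] - c_{\mathrm{mig}}
  + \frac{\alpha\, r^{*}_{\mathrm{act}}[s,\bar z]}{1-\alpha}
  \quad\Longleftrightarrow\quad
  c_{\mathrm{mig}} \;\ge\; \frac{\alpha}{1-\alpha}
  \bigl(r^{*}_{\mathrm{act}}[s,\bar z] - r^{*}_{\mathrm{act}}[s,z]\bigr).
\]
% Zones satisfying this inequality belong to Z^n_s; zones violating it form Z^0_s.
```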
References
[1] More Americans are leaving cities, but don't call it an urban exodus.
[2] Pandemic population change across metro America: Accelerated migration, less immigration, fewer births and more deaths.
[3] Decentralized protection strategies against SIS epidemics in networks.
[4] Game-theoretic vaccination against networked SIS epidemics and impacts of human decision-making.
[5] Networked SIS epidemics with awareness.
[6] Disease dynamics on a network game: A little empathy goes a long way.
[7] A differential game approach to decentralized virus-resistant weight adaptation policy over complex networks.
[8] An analytical framework for the study of epidemic models on activity driven networks.
[9] Analysis and control of epidemics in temporal networks with self-excitement and behavioral changes.
[10] Impacts of game-theoretic activation on epidemic spread over dynamical networks.
[11] Clinical characteristics of 24 asymptomatic infections with COVID-19 screened among close contacts in Nanjing, China.
[12] Public health policy: COVID-19 epidemic and SEIR model with asymptomatic viral carriers.
[13] Modelling a pandemic with asymptomatic patients, impact of lockdown and herd immunity, with applications to SARS-CoV-2.
[14] Implications of asymptomatic carriers for infectious disease transmission and control.
[15] Population games and evolutionary dynamics.
[16] Dynamic population games.
[17] Anonymous sequential games.
[18] Equilibria of dynamic games with many players: Existence, approximation, and market structure.
[19] Large population stochastic dynamic games: Closed-loop McKean-Vlasov systems and the Nash certainty equivalence principle.
[20] Discrete time, finite state space mean field games.
[21] Stationary equilibria of mean field games with finite state and action space.
[22] Networks: An Introduction.
[23] Competitive Markov decision processes.
[24] A dynamic population model of strategic interaction and migration under epidemic risk.